
Microsoft Data Stack

Data Engineering, SQL Server, Airflow, Superset, Docker

Overview

Microsoft Data Stack is a containerized data engineering project that demonstrates an end-to-end data pipeline: multi-source ingestion, a layered warehouse architecture, and data quality monitoring dashboards.

Architecture

text
+==============================================================================+
|                        MICROSOFT DATA STACK                                  |
+==============================================================================+
|                                                                              |
|    +-----------+     +-----------+                                           |
|    | REST APIs |     | CSV Files |                                           |
|    +-----------+     +-----------+                                           |
|          |                 |                                                 |
|          +--------+--------+                                                 |
|                   |                                                          |
|                   v                                                          |
|            +-------------+                                                   |
|            |   Airflow   |                                                   |
|            |(Orchestrate)|                                                   |
|            +------+------+                                                   |
|                   |                                                          |
|                   v                                                          |
|    +==============================================+                          |
|    |               SQL SERVER 2022                |                          |
|    +==============================================+                          |
|    |  Raw  -->  Staging  -->  DW  -->  Analytics  |                          |
|    +==============================================+                          |
|                   |                                                          |
|                   v                                                          |
|            +-------------+                                                   |
|            |  Superset   |                                                   |
|            | (Dashboards)|                                                   |
|            +-------------+                                                   |
|                                                                              |
+==============================================================================+
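
The Raw, Staging, DW, and Analytics flow in the diagram can be illustrated with a small runnable sketch. This is not the project's code: it uses Python's built-in sqlite3 as a stand-in for SQL Server 2022 so it runs without Docker. The table names (raw_sales, stg_sales, dw_fact_sales, mart_sales_by_region) and the validation rules are hypothetical.

```python
import sqlite3


def run_layers(rows):
    """Push (order_id, region, amount) text rows through four layers.

    sqlite3 stands in for SQL Server here; all table names are
    hypothetical and chosen only to mirror the diagram.
    """
    con = sqlite3.connect(":memory:")
    cur = con.cursor()

    # Raw: land every value as text, exactly as received.
    cur.execute("CREATE TABLE raw_sales (order_id TEXT, region TEXT, amount TEXT)")
    cur.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", rows)

    # Staging: cast types, normalize text, drop rows failing basic checks
    # (here: order_id and amount must start with a digit).
    cur.execute(
        """
        CREATE TABLE stg_sales AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               UPPER(TRIM(region))       AS region,
               CAST(amount AS REAL)      AS amount
        FROM raw_sales
        WHERE order_id GLOB '[0-9]*' AND amount GLOB '[0-9]*'
        """
    )

    # DW: remove exact duplicates so each fact row appears once.
    cur.execute(
        "CREATE TABLE dw_fact_sales AS "
        "SELECT DISTINCT order_id, region, amount FROM stg_sales"
    )

    # Analytics: aggregate into a small mart for reporting.
    cur.execute(
        """
        CREATE TABLE mart_sales_by_region AS
        SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
        FROM dw_fact_sales
        GROUP BY region
        """
    )
    result = cur.execute(
        "SELECT region, orders, revenue FROM mart_sales_by_region ORDER BY region"
    ).fetchall()
    con.close()
    return result
```

Each layer reads only from the one before it, so a bad row can be traced back to the exact value that landed in the raw table. In the actual stack, Airflow would presumably schedule these steps as separate tasks.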

Key Features

  • Multi-Source Ingestion - REST APIs and CSV file support
  • Layered Architecture - Raw, Staging, Data Warehouse, and Analytics layers
  • Quality Monitoring - Tracks records received, stored, and rejected
  • Pre-Built Dashboards - Sales analytics and data quality views
  • Data Marts - Aggregated analytics for reporting
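
As a rough illustration of how multi-source ingestion and the received/stored/rejected counters could fit together, here is a minimal sketch. It is an assumption-laden example, not the project's implementation: it uses only the standard library (the project itself lists pandas and requests), takes the API response as an already-fetched JSON string, and the field names order_id and amount and the validation rule are hypothetical.

```python
import csv
import io
import json
from dataclasses import dataclass


@dataclass
class QualityStats:
    """Counters matching the three quality metrics named above."""
    received: int = 0
    stored: int = 0
    rejected: int = 0


def parse_csv(text):
    """Turn CSV text (with a header row) into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_api(payload):
    """Turn a JSON array payload (as a REST API might return) into records."""
    return json.loads(payload)


def is_valid(record):
    """Hypothetical rule: order_id is present and amount is a number >= 0."""
    try:
        return bool(record["order_id"]) and float(record["amount"]) >= 0
    except (KeyError, TypeError, ValueError):
        return False


def ingest(records, stats):
    """Count every record, keep the valid ones, and tally the rejects."""
    stored = []
    for record in records:
        stats.received += 1
        if is_valid(record):
            stored.append(record)
            stats.stored += 1
        else:
            stats.rejected += 1
    return stored
```

Both sources feed the same ingest function with a shared QualityStats object, so received always equals stored plus rejected regardless of where a record came from.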

Tech Stack

  • Database - SQL Server 2022
  • Orchestration - Apache Airflow 2.8
  • Visualization - Apache Superset 3.1
  • Processing - Python (pandas, requests)
  • Infrastructure - Docker Compose

Quick Start

bash
# Clone and start
git clone https://github.com/AlharbiAbdullah/microsoft_data_stack
cd microsoft_data_stack
docker-compose up --build -d

# Access services
# Airflow:   http://localhost:8080
# Superset:  http://localhost:8088
# SQL Server: localhost:1433
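
Containers can take a while to come up after the start command returns. The following is a small, hypothetical helper (not part of the repository) that checks whether the three ports listed above accept TCP connections. It only confirms that a port is open, not that the service behind it has finished initializing.

```python
import socket

# Ports taken from the service list above.
SERVICES = {"Airflow": 8080, "Superset": 8088, "SQL Server": 1433}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(host="localhost", services=SERVICES):
    """Map each service name to whether its port is reachable."""
    return {name: is_port_open(host, port) for name, port in services.items()}
```

Calling check_services() returns a dict such as {"Airflow": True, ...}, which can be polled in a loop until every value is True.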