Microsoft Data Stack
Overview
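Microsoft Data Stack is a containerized data engineering project that demonstrates an end-to-end data pipeline. Apache Airflow orchestrates ingestion from REST APIs and CSV files into SQL Server 2022, where data moves through Raw, Staging, Data Warehouse, and Analytics layers, and Apache Superset serves dashboards for sales analytics and data quality monitoring.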
Architecture
text
+============================================================+
|                    MICROSOFT DATA STACK                    |
+============================================================+
|                                                            |
|              +-----------+       +-----------+             |
|              | REST APIs |       | CSV Files |             |
|              +-----------+       +-----------+             |
|                    |                   |                   |
|                    +---------+---------+                   |
|                              |                             |
|                              v                             |
|                      +---------------+                     |
|                      |    Airflow    |                     |
|                      | (Orchestrate) |                     |
|                      +-------+-------+                     |
|                              |                             |
|                              v                             |
|         +=========================================+        |
|         |             SQL SERVER 2022             |        |
|         +=========================================+        |
|         |  Raw --> Staging --> DW --> Analytics   |        |
|         +=========================================+        |
|                              |                             |
|                              v                             |
|                      +---------------+                     |
|                      |   Superset    |                     |
|                      | (Dashboards)  |                     |
|                      +---------------+                     |
|                                                            |
+============================================================+
Key Features
- Multi-Source Ingestion - REST APIs and CSV file support
- Layered Architecture - Raw, Staging, Data Warehouse, and Analytics layers
- Quality Monitoring - Tracks records received, stored, and rejected
- Pre-Built Dashboards - Sales analytics and data quality views
- Data Marts - Aggregated analytics for reporting
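
The Quality Monitoring feature tracks records received, stored, and rejected. As a rough illustration of how such counts could be derived for a single CSV batch, here is a minimal sketch using only the Python standard library. The column names (`order_id`, `amount`), the validation rule, and the `QualityReport` type are hypothetical and are not taken from the project's code.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass


@dataclass
class QualityReport:
    """Per-batch counts (hypothetical type, not from the repository)."""
    received: int = 0
    stored: int = 0
    rejected: int = 0


def is_valid(row: dict) -> bool:
    """Assumed rule: order_id is present and amount is a non-negative number."""
    if not (row.get("order_id") or "").strip():
        return False
    try:
        return float(row.get("amount") or "") >= 0
    except ValueError:
        return False


def load_batch(csv_text: str) -> tuple[list[dict], QualityReport]:
    """Return the rows that pass validation together with the quality counts."""
    report = QualityReport()
    staged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        report.received += 1
        if is_valid(row):
            staged.append(row)
            report.stored += 1
        else:
            report.rejected += 1
    return staged, report


# Four rows: one has no order_id and one has a non-numeric amount.
sample = "order_id,amount\n1001,25.50\n,12.00\n1003,abc\n1004,7\n"
staged_rows, report = load_batch(sample)
print(report)  # QualityReport(received=4, stored=2, rejected=2)
```

In a layered design like the one described here, the accepted rows would typically move on to the Staging layer, while the counts feed a data quality view.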
Tech Stack
- Database - SQL Server 2022
- Orchestration - Apache Airflow 2.8
- Visualization - Apache Superset 3.1
- Processing - Python (pandas, requests)
- Infrastructure - Docker Compose
Quick Start
bash
# Clone and start
git clone https://github.com/AlharbiAbdullah/microsoft_data_stack
cd microsoft_data_stack
docker-compose up --build -d
# Access services
# Airflow: http://localhost:8080
# Superset: http://localhost:8088
# SQL Server: localhost:1433
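
Containers may need some time before they accept connections after `docker-compose up` returns. The following is a small sketch for checking whether the three ports listed above accept TCP connections. The names `SERVICES`, `is_port_open`, and `check_services` are hypothetical helpers, and an open port only shows that something is listening, not that the service has finished initializing.

```python
import socket

# Host ports taken from the Quick Start above; the dict name is illustrative.
SERVICES = {"Airflow": 8080, "Superset": 8088, "SQL Server": 1433}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(host: str = "localhost") -> dict:
    """Map each service name to whether its port currently accepts connections."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'reachable' if up else 'not reachable'}")
```

If a service shows as not reachable shortly after startup, waiting a little and running the check again is usually the first thing to try.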