EnvPipe

EnvPipe is a data product for environmental data analysis, built with Databricks.
It focuses on air quality, weather, and health data, enabling end-to-end data processing from ingestion to insight-ready outputs.

Developed during my 2025 summer internship at NTT Data.

Architecture

EnvPipe combines a hub-and-spoke topology with the medallion architecture:

Bronze layer (spokes): ingestion of raw data from multiple domains.
Silver layer (hub): integrated and standardized datasets, following the Inmon approach.
Gold layer (spokes): insight-ready data products for analytics and health risk assessment.
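
The flow between the three layers can be illustrated with a minimal sketch in plain Python. This is not the project's code (the notebooks run on Databricks): the record fields, function names, and sample values are hypothetical. The only external fact used is the WHO 2021 guideline of 15 µg/m³ for the 24-hour mean of PM2.5, and comparing a two-hour average against it is purely for illustration.

```python
from statistics import mean

# WHO 2021 air quality guideline for PM2.5, 24-hour mean, in µg/m³.
WHO_PM25_24H = 15.0

# Bronze (spokes): raw records per domain, kept as ingested.
# Field names and values are hypothetical.
bronze_weather = [
    {"time": "2025-07-01T00:00", "temperature_2m": "21.4"},
    {"time": "2025-07-01T01:00", "temperature_2m": "20.9"},
]
bronze_air = [
    {"time": "2025-07-01T00:00", "pm25": "18.0"},
    {"time": "2025-07-01T01:00", "pm25": "10.0"},
]


def to_silver(weather, air):
    """Silver (hub): cast types and join the domains on their timestamp."""
    air_by_time = {row["time"]: float(row["pm25"]) for row in air}
    return [
        {
            "time": row["time"],
            "temperature_c": float(row["temperature_2m"]),
            "pm25": air_by_time[row["time"]],
        }
        for row in weather
        if row["time"] in air_by_time
    ]


def to_gold(silver):
    """Gold (spokes): derive an insight-ready summary from the integrated data."""
    avg_pm25 = mean(row["pm25"] for row in silver)
    return {"avg_pm25": avg_pm25, "exceeds_who_24h": avg_pm25 > WHO_PM25_24H}


silver = to_silver(bronze_weather, bronze_air)
gold = to_gold(silver)
```

Keeping the join in a single Silver step means every Gold output reads from the same integrated records instead of re-joining raw sources.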

Data Sources

- Weather data from Open-Meteo APIs:
  - Historical Weather API
  - Weather Forecast API
- Air quality data from QualAR
- Health data from:
  - WHO guidelines
  - GBD study
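
As a rough illustration of how the two Open-Meteo endpoints differ, the sketch below builds request URLs for each without sending them. The coordinates, dates, and selected hourly variables are assumptions made for the example, and the helper names are hypothetical; the project's notebooks may query the APIs differently.

```python
from urllib.parse import urlencode

# Public Open-Meteo endpoints for forecast and historical weather data.
FORECAST_URL = "https://api.open-meteo.com/v1/forecast"
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"


def forecast_request(latitude, longitude, hourly):
    """Build a Weather Forecast API URL for the given hourly variables."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "hourly": ",".join(hourly),
    }
    # safe="," keeps the comma-separated variable list readable in the URL.
    return f"{FORECAST_URL}?{urlencode(params, safe=',')}"


def historical_request(latitude, longitude, start_date, end_date, hourly):
    """Build a Historical Weather API URL; dates use the YYYY-MM-DD format."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,
        "end_date": end_date,
        "hourly": ",".join(hourly),
    }
    return f"{ARCHIVE_URL}?{urlencode(params, safe=',')}"


# Example location and variables, chosen only for illustration.
variables = ["temperature_2m", "wind_speed_10m"]
forecast_url = forecast_request(38.72, -9.14, variables)
historical_url = historical_request(38.72, -9.14, "2024-01-01", "2024-01-31", variables)
```

The historical endpoint requires an explicit date range, while the forecast endpoint returns the upcoming days by default.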

Setup

Run scripts/catalog_setup.ipynb and follow the instructions in it.
Run the setup pipeline once: this ingests historical data, prepares Silver tables, and trains prediction models.
Schedule the forecast pipeline to run every hour: this ingests forecast data, generates predictions, and updates insights.
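
The split between the one-off setup pipeline and the hourly forecast pipeline can be sketched as two ordered step lists. The step names below are hypothetical labels derived from the description above, not the task names in pipeline-config/, and the no-op handlers stand in for the notebooks a real run would trigger. The schedule dictionary assumes the Quartz cron syntax that Databricks job schedules use.

```python
# Hypothetical step names mirroring the setup description above.
SETUP_STEPS = [
    "ingest_historical_data",
    "prepare_silver_tables",
    "train_prediction_models",
]
FORECAST_STEPS = [
    "ingest_forecast_data",
    "generate_predictions",
    "update_insights",
]

# Quartz cron fields: second minute hour day-of-month month day-of-week.
# This expression fires at second 0, minute 0 of every hour.
HOURLY_SCHEDULE = {
    "quartz_cron_expression": "0 0 * * * ?",
    "timezone_id": "UTC",
}


def run_pipeline(steps, handlers):
    """Run each step in order and return the names of the completed steps."""
    completed = []
    for name in steps:
        handlers[name]()
        completed.append(name)
    return completed


# No-op handlers stand in for the notebooks each step would run.
handlers = {name: (lambda: None) for name in SETUP_STEPS + FORECAST_STEPS}

setup_run = run_pipeline(SETUP_STEPS, handlers)        # run once
forecast_run = run_pipeline(FORECAST_STEPS, handlers)  # run on HOURLY_SCHEDULE
```

The forecast steps depend on the Silver tables and trained models, which is why the setup pipeline has to complete once before the schedule is enabled.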

Pipeline configuration files (.yml) are stored in the pipeline-config/ folder to ensure reproducibility. Each notebook also includes explanations that document its design choices and logic. Mermaid diagrams of each data layer and of the pipelines are available in the diagrams/ folder.

Progress

Understand the data - done
Catalog setup - done
Bronze layer (data ingestion) - done
Silver layer (joined data, training set, models, forecasts) - done
Gold layer (feature importance, pollution patterns, health risk) - done
Setup and Forecast pipelines - done (next step: DLT integration)
Dashboards for insights - done

EnvPipe delivers an end-to-end data product: its pipelines ingest, integrate, and enrich environmental data into insight-ready outputs, which are accessible in a dashboard.

