
George-Nyamao/GCP_ETL_Project


GCP_ETL_Project

First, we create a fake employee dataset in Python with the help of the Faker library and upload it to a Google Cloud Storage bucket from the same script. We then use Wrangler in Cloud Data Fusion to concatenate columns and mask Personally Identifiable Information (PII). The resulting table is written to BigQuery, and a report is built on top of it in Looker.

Screenshot of the pipeline

Finally, we automate the workflow using Apache Airflow in Cloud Composer.
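The orchestration can be sketched as an Airflow DAG deployed to Cloud Composer. This is an assumed structure, not the repo's actual DAG: the DAG id, schedule, region, and Data Fusion instance/pipeline names are placeholders, and `CloudDataFusionStartPipelineOperator` comes from the `apache-airflow-providers-google` package.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.google.cloud.operators.datafusion import (
    CloudDataFusionStartPipelineOperator,
)


def extract_and_upload():
    # Generate the Faker dataset and upload the CSV to the GCS bucket.
    pass


with DAG(
    dag_id="employee_etl",          # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(
        task_id="extract_and_upload",
        python_callable=extract_and_upload,
    )

    # Trigger the Data Fusion pipeline that wrangles the data
    # and loads the result into BigQuery.
    start_pipeline = CloudDataFusionStartPipelineOperator(
        task_id="start_datafusion_pipeline",
        location="us-central1",          # hypothetical region
        instance_name="datafusion-dev",  # hypothetical instance name
        pipeline_name="employee-etl",    # hypothetical pipeline name
    )

    extract >> start_pipeline
```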

About

An ETL pipeline that moves an uploaded flat file from GCS, masks PII, stores the result in BigQuery, and creates a report in Looker.
