I'm a data-focused engineer who builds reliable, scalable data systems. I started my career as a junior data programmer and evolved into a PostgreSQL Database Administrator and SQL Developer. Today I'm transitioning to Data Engineering — designing data pipelines, orchestration, and end-to-end data platforms while keeping reliability and performance first.
📍 Location: Warsaw, Poland 💬 Languages: English / Polish
- Implemented an Apache Cassandra solution that increased system throughput by ~428% (from ~6,014 r/s to ~31,748 r/s) compared to the prior PostgreSQL-based approach
- Automated database log reporting with pgBadger, improving time-to-detect and time-to-fix for incidents
- Tuned the PostgreSQL autovacuum policy, raising data-write throughput by ~10% (a tuning sketch follows this list)
- Built a Python-based archive processing tool that cut manual data restoration for client requests from hours to near-instant
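To give the autovacuum item some substance, here is a minimal sketch of per-table tuning via storage parameters. The table name and thresholds are illustrative assumptions, not my exact production values:

```python
# Minimal sketch of per-table autovacuum tuning. The table name ("events")
# and thresholds are illustrative, not the exact production values.
import psycopg2

TUNING_SQL = """
ALTER TABLE events SET (
    autovacuum_vacuum_scale_factor  = 0.05,  -- vacuum after ~5% of rows change
    autovacuum_vacuum_cost_delay    = 2,     -- ms pause between cost-limited batches
    autovacuum_analyze_scale_factor = 0.02   -- refresh planner stats more often
);
"""

def apply_autovacuum_tuning(dsn: str) -> None:
    """Apply per-table autovacuum storage parameters."""
    with psycopg2.connect(dsn) as conn:  # commits on clean exit
        with conn.cursor() as cur:
            cur.execute(TUNING_SQL)

if __name__ == "__main__":
    apply_autovacuum_tuning("dbname=app user=postgres")
```

Lowering the scale factors makes autovacuum fire on row churn rather than table size, which is what keeps dead tuples from throttling writes on hot tables.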
- Database Management: PostgreSQL Admin, NoSQL (Cassandra), Performance Tuning, Security, Data Modeling
- Data Programming & Automation: Advanced SQL, Python (Pandas), Bash Scripting, ETL/ELT Logic
- Infrastructure & Operations: Monitoring (Prometheus, Grafana), Performance Testing, Configuration Management
- Data Engineering Concepts: Data Pipelines, System Architecture, Data Archiving & Restoration
1️⃣ 🪙 Bitcoin Monitor - Data Engineering Project *(final touches in progress)*
A Bitcoin price monitoring tool that ingests price data from the CoinCap API, processes and models it, and exposes interactive dashboards for trend analysis and alerting. The project demonstrates building a production-ready data pipeline with infrastructure-as-code, orchestration and cloud data warehousing.
Tech Stack: Python • Pandas • Apache Airflow • dbt • Snowflake • Terraform • Docker • CoinCap API • Apache Superset
💡 Highlights: ingestion and processing of Bitcoin prices, Airflow DAGs for orchestration, dbt models for transformation and testing, Terraform-managed infrastructure, and Snowflake as the analytics warehouse; designed for extensibility (alerting, CI/CD, dashboarding).
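To make the pipeline concrete, a minimal sketch of what the ingestion step could look like with Airflow's TaskFlow API. The schedule, task names, and load step are illustrative assumptions, not the project's actual code:

```python
# Minimal sketch of a CoinCap ingestion DAG (Airflow 2.x TaskFlow API).
# Schedule, task names, and the load step are assumptions for illustration.
from datetime import datetime

import requests
from airflow.decorators import dag, task


@dag(schedule="*/5 * * * *", start_date=datetime(2024, 1, 1), catchup=False)
def bitcoin_price_ingest():
    @task
    def fetch_price() -> dict:
        # CoinCap v2 asset endpoint; "priceUsd" arrives as a string.
        resp = requests.get("https://api.coincap.io/v2/assets/bitcoin", timeout=10)
        resp.raise_for_status()
        data = resp.json()["data"]
        return {"symbol": data["symbol"], "price_usd": float(data["priceUsd"])}

    @task
    def load_price(record: dict) -> None:
        # Placeholder for the warehouse load (Snowflake in the real project);
        # printing keeps the sketch self-contained.
        print(f"Loaded {record['symbol']} @ {record['price_usd']:.2f} USD")

    load_price(fetch_price())


bitcoin_price_ingest()
```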
2️⃣ 🎵 Million Song Dataset Pipeline - Data Engineering Project
An end-to-end data engineering project built around the Million Song Dataset. It demonstrates a modern data stack for building scalable, testable, and automated data pipelines, from raw ingestion to analytics and visualization.
Tech Stack: Python • Apache Airflow • dbt • DuckDB • PostgreSQL • MinIO (S3) • Apache Superset • Docker
💡 Highlights: Showcases my ability to design and implement a complete data platform, covering ingestion, transformation, modeling, orchestration, and visualization with a focus on reliability and clean architecture.
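One pattern this stack enables, sketched minimally: querying raw Parquet files in MinIO directly with DuckDB. The bucket, path, and credentials below are placeholders, not the project's real configuration:

```python
# Minimal sketch of querying raw Parquet in MinIO (S3 API) with DuckDB.
# Bucket, path, and credentials are hypothetical placeholders.
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")
for setting in (
    "SET s3_endpoint = 'localhost:9000'",   # MinIO, not AWS
    "SET s3_use_ssl = false",
    "SET s3_url_style = 'path'",
    "SET s3_access_key_id = 'minioadmin'",
    "SET s3_secret_access_key = 'minioadmin'",
):
    con.execute(setting)

# Aggregate songs per year straight off the object store.
rows = con.execute("""
    SELECT year, count(*) AS songs
    FROM read_parquet('s3://raw/million_song/*.parquet')
    WHERE year > 0
    GROUP BY year
    ORDER BY year
""").fetchall()
```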
3️⃣ 💼 IT Job Market Analysis - Data Analytics Project
A data analytics project that scrapes, processes, and visualizes IT job advertisements to uncover trends in skills demand, salary ranges, and role distributions. It demonstrates turning raw, unstructured web data into actionable insights.
Tech Stack: Python • BeautifulSoup • Pandas • NumPy • SQL (SQLite) • Matplotlib • Seaborn • Jupyter
💡 Highlights: This project combines web scraping, data modeling, and visualization to extract value from unstructured data, reflecting skills in data engineering, analytics, and text processing.
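For illustration, a minimal sketch of the scrape-to-SQLite flow. The URL and CSS selectors are hypothetical, since every job board needs its own parsing:

```python
# Minimal sketch of the scrape -> DataFrame -> SQLite flow. The URL and
# CSS selectors are hypothetical placeholders.
import sqlite3

import pandas as pd
import requests
from bs4 import BeautifulSoup

resp = requests.get("https://example.com/it-jobs", timeout=10)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

ads = []
for card in soup.select(".job-card"):      # hypothetical card selector
    title = card.select_one("h2")
    salary = card.select_one(".salary")
    if title and salary:                   # skip malformed cards
        ads.append({
            "title": title.get_text(strip=True),
            "salary": salary.get_text(strip=True),
        })

df = pd.DataFrame(ads)
with sqlite3.connect("jobs.db") as conn:
    df.to_sql("job_ads", conn, if_exists="append", index=False)
```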
- Data Modeling With Postgres - Designed and implemented a star schema database for a music streaming startup (a minimal schema sketch follows this list)
- Data Modeling With Cassandra - Built a NoSQL data model on Apache Cassandra to handle high-throughput event data
- DE Airflow Tutorial - A practical guide and project demonstrating core concepts of ETL orchestration with Airflow
- Data engineering project: data extraction to analysis - End-to-end data engineering project simulating data extraction, transformation, loading, and analysis for an online store
- Apache Superset Tutorial - A lightweight Docker-based setup for running Apache Superset, a modern data exploration and visualization platform
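As referenced in the first item above, a minimal sketch of the star-schema idea: one fact table of song plays keyed to dimension tables. Table and column names are illustrative, not the project's exact schema:

```python
# Minimal star-schema sketch: one fact table keyed to dimensions.
# Names are illustrative, not the project's exact schema.
import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS dim_users (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE IF NOT EXISTS dim_songs (user_id INTEGER, song_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE IF NOT EXISTS dim_time  (ts TEXT PRIMARY KEY, hour INTEGER, weekday INTEGER);
CREATE TABLE IF NOT EXISTS fact_plays (
    play_id INTEGER PRIMARY KEY,
    ts      TEXT    REFERENCES dim_time(ts),
    user_id INTEGER REFERENCES dim_users(user_id),
    song_id INTEGER REFERENCES dim_songs(song_id)
);
"""

with sqlite3.connect("sparkify.db") as conn:
    conn.executescript(DDL)  # create dimensions and the fact table
```

Keeping measures and foreign keys in the fact table while descriptive attributes live in the dimensions is what makes the analytical joins simple and fast.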
See my repositories for READMEs, architecture diagrams, and runnable demos.
- 🧱 Build more projects to gain hands-on experience with large-scale and real-time data
- ⚙️ Master technologies like Spark, Kafka, Terraform, and CI/CD through practical application
- ☁️ Gain deeper expertise with Python, AWS cloud services, advanced data modeling (Kimball, Inmon), and data lake architectures
- 📘 Earn certifications such as AWS Certified Data Engineer and Databricks Certified Data Engineer
- DeepLearning.AI Data Engineering - Specialization Certificate
- **Prompt Design in Vertex AI** Skill Badge - Google Cloud
I am currently in the planning and research phase for my next major project. My goal is to deepen my skills in one of the following areas:
| Project Theme | Focus | Example Stack |
|---|---|---|
| Big Data Processing | Distributed data processing at scale | PySpark, AWS EMR/Databricks, S3, Parquet |
| Real-Time Streaming Pipeline | Ingesting and processing data in real-time | Kafka, Spark Streaming/Flink, AWS Kinesis, Cassandra |
| Cloud Data Warehouse & BI | End-to-end analytics from source to dashboard in the cloud | Snowflake/BigQuery, dbt, Fivetran/Airbyte, Power BI |
| Infrastructure as Code (IaC) for DE | Automating the deployment of a complete data platform | Terraform, AWS (S3, Glue, Lambda), Docker, GitHub Actions |
- Email: czubi1928@gmail.com
- LinkedIn: linkedin.com/in/patryk-czubinski-1928-sql
- GitHub: github.com/czubi1928