I'm a data-focused engineer who builds reliable, scalable data systems. I started my career as a junior data programmer and evolved into a PostgreSQL Database Administrator and SQL Developer. Today I'm transitioning to Data Engineering — designing data pipelines, orchestration, and end-to-end data platforms while keeping reliability and performance first.
📍 Location: Warsaw, Poland 💬 Languages: English / Polish
- Implemented an Apache Cassandra solution that increased system throughput by ~428% (from ~6,014 r/s to ~31,748 r/s) compared to the prior PostgreSQL-based approach
- Automated database log reporting with pgBadger, improving time-to-detect and time-to-fix for incidents
- Tuned the PostgreSQL autovacuum policy, raising data-write throughput by ~10% (a tuning sketch follows this list)
- Built a Python-based archive processing tool that cut manual data restoration for client requests from hours to near-instant
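To give the autovacuum item some substance, here is a minimal sketch of per-table tuning via storage parameters. The table name and thresholds are illustrative assumptions, not my exact production values:

```python
# Minimal sketch of per-table autovacuum tuning. The table name ("events")
# and thresholds are illustrative, not the exact production values.
import psycopg2

TUNING_SQL = """
ALTER TABLE events SET (
    autovacuum_vacuum_scale_factor  = 0.05,  -- vacuum after ~5% of rows change
    autovacuum_vacuum_cost_delay    = 2,     -- ms pause between cost-limited batches
    autovacuum_analyze_scale_factor = 0.02   -- refresh planner stats more often
);
"""

def apply_autovacuum_tuning(dsn: str) -> None:
    """Apply per-table autovacuum storage parameters."""
    with psycopg2.connect(dsn) as conn:  # commits on clean exit
        with conn.cursor() as cur:
            cur.execute(TUNING_SQL)

if __name__ == "__main__":
    apply_autovacuum_tuning("dbname=app user=postgres")
```

Lowering the scale factors makes autovacuum fire on row churn rather than table size, which is what keeps dead tuples from throttling writes on hot tables.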
- Database Management: PostgreSQL Admin, NoSQL (Cassandra), Performance Tuning, Security, Data Modeling
- Data Programming & Automation: Advanced SQL, Python (Pandas), Bash Scripting, ETL/ELT Logic
- Infrastructure & Operations: Monitoring (Prometheus, Grafana), Performance Testing, Configuration Management
- Data Engineering Concepts: Data Pipelines, System Architecture, Data Archiving & Restoration
1️⃣ 🪙 Bitcoin Monitor - Data Engineering Project *(final touches in progress)*
A Bitcoin price monitoring tool that ingests price data from the CoinCap API, processes and models it, and exposes interactive dashboards for trend analysis and alerting. The project demonstrates building a production-ready data pipeline with infrastructure-as-code, orchestration and cloud data warehousing.
Tech Stack: Python • Pandas • Apache Airflow • dbt • Snowflake • Terraform • Docker • CoinCap API • Apache Superset
💡 Highlights: ingestion and processing of Bitcoin prices, Airflow DAGs for orchestration, dbt models for transformation and testing, Terraform-managed infrastructure, and Snowflake as the analytics warehouse; designed for extensibility (alerting, CI/CD, dashboarding).
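To make the pipeline concrete, a minimal sketch of what the ingestion step could look like with Airflow's TaskFlow API. The schedule, task names, and load step are illustrative assumptions, not the project's actual code:

```python
# Minimal sketch of a CoinCap ingestion DAG (Airflow 2.x TaskFlow API).
# Schedule, task names, and the load step are assumptions for illustration.
from datetime import datetime

import requests
from airflow.decorators import dag, task


@dag(schedule="*/5 * * * *", start_date=datetime(2024, 1, 1), catchup=False)
def bitcoin_price_ingest():
    @task
    def fetch_price() -> dict:
        # CoinCap v2 asset endpoint; "priceUsd" arrives as a string.
        resp = requests.get("https://api.coincap.io/v2/assets/bitcoin", timeout=10)
        resp.raise_for_status()
        data = resp.json()["data"]
        return {"symbol": data["symbol"], "price_usd": float(data["priceUsd"])}

    @task
    def load_price(record: dict) -> None:
        # Placeholder for the warehouse load (Snowflake in the real project);
        # printing keeps the sketch self-contained.
        print(f"Loaded {record['symbol']} @ {record['price_usd']:.2f} USD")

    load_price(fetch_price())


bitcoin_price_ingest()
```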
2️⃣ 🎵 Million Song Dataset Pipeline - Data Engineering Project
An end-to-end data engineering project built around the Million Song Dataset. It demonstrates a modern data stack for building scalable, testable, and automated data pipelines, from raw ingestion to analytics and visualization.
Tech Stack: Python • Apache Airflow • dbt • DuckDB • PostgreSQL • MinIO (S3) • Apache Superset • Docker
💡 Highlights: Showcases my ability to design and implement a complete data platform, covering ingestion, transformation, modeling, orchestration, and visualization with a focus on reliability and clean architecture.
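One pattern this stack enables, sketched minimally: querying raw Parquet files in MinIO directly with DuckDB. The bucket, path, and credentials below are placeholders, not the project's real configuration:

```python
# Minimal sketch of querying raw Parquet in MinIO (S3 API) with DuckDB.
# Bucket, path, and credentials are hypothetical placeholders.
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")
for setting in (
    "SET s3_endpoint = 'localhost:9000'",   # MinIO, not AWS
    "SET s3_use_ssl = false",
    "SET s3_url_style = 'path'",
    "SET s3_access_key_id = 'minioadmin'",
    "SET s3_secret_access_key = 'minioadmin'",
):
    con.execute(setting)

# Aggregate songs per year straight off the object store.
rows = con.execute("""
    SELECT year, count(*) AS songs
    FROM read_parquet('s3://raw/million_song/*.parquet')
    WHERE year > 0
    GROUP BY year
    ORDER BY year
""").fetchall()
```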
3️⃣ 💼 IT Job Market Analysis - Data Analytics Project
A data analytics project that scrapes, processes, and visualizes IT job advertisements to uncover trends in skills demand, salary ranges, and role distributions. It demonstrates turning raw, unstructured web data into actionable insights.
Tech Stack: Python • BeautifulSoup • Pandas • NumPy • SQL (SQLite) • Matplotlib • Seaborn • Jupyter
💡 Highlights: This project combines web scraping, data modeling, and visualization to extract value from unstructured data, reflecting skills in data engineering, analytics, and text processing.
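For illustration, a minimal sketch of the scrape-to-SQLite flow. The URL and CSS selectors are hypothetical, since every job board needs its own parsing:

```python
# Minimal sketch of the scrape -> DataFrame -> SQLite flow. The URL and
# CSS selectors are hypothetical placeholders.
import sqlite3

import pandas as pd
import requests
from bs4 import BeautifulSoup

resp = requests.get("https://example.com/it-jobs", timeout=10)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

ads = []
for card in soup.select(".job-card"):      # hypothetical card selector
    title = card.select_one("h2")
    salary = card.select_one(".salary")
    if title and salary:                   # skip malformed cards
        ads.append({
            "title": title.get_text(strip=True),
            "salary": salary.get_text(strip=True),
        })

df = pd.DataFrame(ads)
with sqlite3.connect("jobs.db") as conn:
    df.to_sql("job_ads", conn, if_exists="append", index=False)
```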
- Data Modeling With Postgres - Designed and implemented a star schema database for a music streaming startup (a minimal schema sketch follows this list)
- Data Modeling With Cassandra - Built a NoSQL data model on Apache Cassandra to handle high-throughput event data
- DE Airflow Tutorial - A practical guide and project demonstrating core concepts of ETL orchestration with Airflow
- Data engineering project: data extraction to analysis - End-to-end data engineering project simulating data extraction, transformation, loading, and analysis for an online store
- Apache Superset Tutorial - A lightweight Docker-based setup for running Apache Superset, a modern data exploration and visualization platform
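As referenced in the first item above, a minimal sketch of the star-schema idea: one fact table of song plays keyed to dimension tables. Table and column names are illustrative, not the project's exact schema:

```python
# Minimal star-schema sketch: one fact table keyed to dimensions.
# Names are illustrative, not the project's exact schema.
import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS dim_users (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE IF NOT EXISTS dim_songs (user_id INTEGER, song_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE IF NOT EXISTS dim_time  (ts TEXT PRIMARY KEY, hour INTEGER, weekday INTEGER);
CREATE TABLE IF NOT EXISTS fact_plays (
    play_id INTEGER PRIMARY KEY,
    ts      TEXT    REFERENCES dim_time(ts),
    user_id INTEGER REFERENCES dim_users(user_id),
    song_id INTEGER REFERENCES dim_songs(song_id)
);
"""

with sqlite3.connect("sparkify.db") as conn:
    conn.executescript(DDL)  # create dimensions and the fact table
```

Keeping measures and foreign keys in the fact table while descriptive attributes live in the dimensions is what makes the analytical joins simple and fast.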
See my repositories for READMEs, architecture diagrams, and runnable demos.
- 🧱 Build more projects to gain hands-on experience with large-scale and real-time data
- ⚙️ Master technologies like Spark, Kafka, Terraform, and CI/CD through practical application
- ☁️ Gain deeper expertise with Python, AWS cloud services, advanced data modeling (Kimball, Inmon), and data lake architectures
- 📘 Earn certifications such as AWS Certified Data Engineer and Databricks Certified Data Engineer
- DeepLearning.AI Data Engineering - Specialization Certificate
- **Prompt Design in Vertex AI** Skill Badge - Google Cloud
I am currently in the planning and research phase for my next major project. My goal is to deepen my skills in one of the following areas:
| Project Theme | Focus | Example Stack |
|---|---|---|
| Big Data Processing | Distributed data processing at scale | PySpark, AWS EMR/Databricks, S3, Parquet |
| Real-Time Streaming Pipeline | Ingesting and processing data in real-time | Kafka, Spark Streaming/Flink, AWS Kinesis, Cassandra |
| Cloud Data Warehouse & BI | End-to-end analytics from source to dashboard in the cloud | Snowflake/BigQuery, dbt, Fivetran/Airbyte, Power BI |
| Infrastructure as Code (IaC) for DE | Automating the deployment of a complete data platform | Terraform, AWS (S3, Glue, Lambda), Docker, GitHub Actions |
- Email: czubi1928@gmail.com
- LinkedIn: linkedin.com/in/patryk-czubinski-1928-sql
- GitHub: github.com/czubi1928