
👋 Hi, I'm Patrick Czubinski

👨‍💻 Database Administrator | SQL Developer | Aspiring Data Engineer

I'm a data-focused engineer who builds reliable, scalable data systems. I started my career as a junior data programmer and evolved into a PostgreSQL Database Administrator and SQL Developer. Today I'm transitioning to Data Engineering — designing data pipelines, orchestration, and end-to-end data platforms while keeping reliability and performance first.

📍 Location: Warsaw, Poland 💬 Languages: English / Polish


⭐ Key Achievements & Core Competencies

  • Implemented an Apache Cassandra solution that increased system throughput by ~428% (from ~6,014 r/s to ~31,748 r/s) compared to the prior PostgreSQL-based approach

  • Automated database log reporting with pgBadger, improving time-to-detect and time-to-fix incidents

  • Tuned PostgreSQL autovacuum policy, raising data-write throughput by ~10%

  • Built a Python-based archive processing tool that reduced manual data restoration time from hours to near-instant for client requests

  • Database Management: PostgreSQL Admin, NoSQL (Cassandra), Performance Tuning, Security, Data Modeling

  • Data Programming & Automation: Advanced SQL, Python (Pandas), Bash Scripting, ETL/ELT Logic

  • Infrastructure & Operations: Monitoring (Prometheus, Grafana), Performance Testing, Configuration Management

  • Data Engineering Concepts: Data Pipelines, System Architecture, Data Archiving & Restoration
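
The archive-restoration tool mentioned above isn't public, but the core idea can be sketched in a few lines of Python (a minimal illustration with hypothetical names; the real tool handles client-specific archive formats):

```python
import tarfile
from pathlib import Path

def restore_member(archive: Path, member_name: str, dest: Path) -> Path:
    """Extract a single member from a tar.gz archive instead of unpacking
    the whole thing -- turning a manual, hours-long restore into a
    targeted, near-instant one."""
    with tarfile.open(archive, "r:gz") as tar:
        member = tar.getmember(member_name)
        tar.extract(member, path=dest)
    return dest / member_name
```

The key design point is extracting only the requested member rather than the full archive, which is what makes per-request restores fast.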


🏗️ Featured Projects

A Bitcoin price monitoring tool that ingests price data from the CoinCap API, processes and models it, and exposes interactive dashboards for trend analysis and alerting. The project demonstrates building a production-ready data pipeline with infrastructure-as-code, orchestration and cloud data warehousing.

Tech Stack: Python, Pandas, Apache Airflow, dbt, Snowflake, Terraform, Docker, CoinCap API, Apache Superset

💡 Highlights: ingestion and processing of Bitcoin prices, Airflow DAGs for orchestration, dbt models for transformations and testing, Terraform-managed infra and Snowflake as the analytics warehouse; designed for extensibility (alerting, CI/CD, dashboarding).
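
As a flavour of the processing step, smoothing ingested price points with a moving average might look like this (a stdlib-only sketch; the actual pipeline uses Pandas and dbt, and the function name is illustrative):

```python
from collections import deque
from typing import Iterable, Iterator

def rolling_average(prices: Iterable[float], window: int = 3) -> Iterator[float]:
    """Yield a simple moving average over the last `window` prices --
    the kind of smoothing a dashboarding or alerting layer consumes."""
    buf: deque[float] = deque(maxlen=window)
    for price in prices:
        buf.append(price)          # deque drops the oldest value automatically
        yield sum(buf) / len(buf)
```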

An end-to-end data engineering project built around the Million Song Dataset. It demonstrates a modern data stack for building scalable, testable, and automated data pipelines—from raw ingestion to analytics and visualization.

Tech Stack: Python, Apache Airflow, dbt, DuckDB, PostgreSQL, MinIO (S3), Apache Superset, Docker

💡 Highlights: Showcases my ability to design and implement a complete data platform, covering ingestion, transformation, modeling, orchestration, and visualization with a focus on reliability and clean architecture.
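
A toy stand-in for the transformation and modeling layer, using stdlib sqlite3 in place of DuckDB/dbt (the schema and names are hypothetical, not from the project):

```python
import sqlite3

def build_song_mart(rows: list[tuple[str, str, float]]) -> dict[str, float]:
    """Load raw (artist, title, duration) rows into a tiny in-memory mart
    and return the average track duration per artist."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE songs (artist TEXT, title TEXT, duration REAL)")
    con.executemany("INSERT INTO songs VALUES (?, ?, ?)", rows)
    # GROUP BY aggregation is the shape of a typical dbt model here
    return dict(con.execute(
        "SELECT artist, AVG(duration) FROM songs GROUP BY artist"))
```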

A data analytics project that scrapes, processes, and visualizes IT job advertisements to uncover trends in skills demand, salary ranges, and role distributions. It demonstrates turning raw, unstructured web data into actionable insights.

Tech Stack: Python, BeautifulSoup, Pandas, NumPy, SQL (SQLite), Matplotlib, Seaborn, Jupyter

💡 Highlights: This project combines web scraping, data modeling, and visualization to extract value from unstructured data, reflecting skills in data engineering, analytics, and text processing.
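
The skills-demand analysis can be approximated with a whole-word keyword count over scraped ad text (an illustrative sketch; the skill list and function name are hypothetical):

```python
import re
from collections import Counter

SKILLS = {"python", "sql", "airflow", "spark", "docker"}  # illustrative subset

def skill_frequencies(ads: list[str]) -> Counter:
    """Count in how many job ads each known skill keyword appears
    (case-insensitive; each ad counts a skill at most once)."""
    counts: Counter = Counter()
    for ad in ads:
        tokens = set(re.findall(r"[a-z+#]+", ad.lower()))  # dedupe per ad
        for token in tokens:
            if token in SKILLS:
                counts[token] += 1
    return counts
```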

#️⃣ Other Noteworthy Projects

See my repositories for READMEs, architecture diagrams, and runnable demos.


🧰 Tech Stack & Tools

Languages

Python SQL Bash

Data Processing & Orchestration

Apache Airflow dbt Pandas

Databases

PostgreSQL Apache Cassandra SQLite DuckDB Redis

Cloud & Infrastructure

AWS Amazon S3 Docker

Monitoring & Visualization

Prometheus Grafana Apache Superset


🎯 My Next Steps

  • 🧱 Build more projects to gain hands-on experience with large-scale and real-time data
  • ⚙️ Master technologies like Spark, Kafka, Terraform, and CI/CD through practical application
  • ☁️ Gain deeper expertise with Python, AWS cloud services, advanced data modeling (Kimball, Inmon), and data lake architectures
  • 📘 Earn certifications such as AWS Certified Data Engineer and Databricks Certified Data Engineer

🏅 Certifications & Credentials

**Prompt Design in Vertex AI Skill Badge** - Google Cloud


🌱 What I'm Working On Next

I am currently in the planning and research phase for my next major project. My goal is to deepen my skills in one of the following areas:

| Project Theme | Focus | Example Stack |
| --- | --- | --- |
| Big Data Processing | Distributed data processing at scale | PySpark, AWS EMR/Databricks, S3, Parquet |
| Real-Time Streaming Pipeline | Ingesting and processing data in real time | Kafka, Spark Streaming/Flink, AWS Kinesis, Cassandra |
| Cloud Data Warehouse & BI | End-to-end analytics from source to dashboard in the cloud | Snowflake/BigQuery, dbt, Fivetran/Airbyte, Power BI |
| Infrastructure as Code (IaC) for DE | Automating the deployment of a complete data platform | Terraform, AWS (S3, Glue, Lambda), Docker, GitHub Actions |

📫 Get in Touch

📌 Pinned Repositories

  1. analyzing_job_offers - program for processing and analyzing data from job advertisements (Jupyter Notebook)
  2. data_modeling_with_cassandra (Python)
  3. data_warehouse (Python)
  4. de_airflow_tutorial (Python)
  5. million_song_project (Python)