This repository contains my attempts to solve the 1 Billion Row Challenge using Python. The challenge consists of processing a large text file of temperature measurements from weather stations and calculating statistics for each station.
This project is currently under development.
The 1 Billion Row Challenge (#1BRC) is a programming challenge that involves processing a text file containing one billion rows of temperature measurements from weather stations. Each row contains a station name and a temperature value separated by a semicolon (e.g., Hamburg;12.5).
The goal is to calculate the min, mean, and max temperature for each weather station as efficiently as possible.
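As a point of reference, the task can be sketched in a few lines of plain Python. This is a minimal, unoptimized baseline, not the code in scripts/01_first_try.py. The function name `aggregate` and the sample rows are assumptions made for illustration; the only thing taken from the challenge description is the `station;temperature` row format.

```python
def aggregate(lines):
    """Return {station: (min, mean, max)} for an iterable of 'station;temp' rows.

    Hypothetical helper for illustration; single-threaded and unoptimized.
    """
    # Running totals per station: [min, max, sum, count]
    stats = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at end of file
        # Split on the last semicolon so the temperature is always the final field
        station, _, value = line.rpartition(";")
        temp = float(value)
        entry = stats.get(station)
        if entry is None:
            stats[station] = [temp, temp, temp, 1]
        else:
            if temp < entry[0]:
                entry[0] = temp
            if temp > entry[1]:
                entry[1] = temp
            entry[2] += temp
            entry[3] += 1
    # Stations are returned in alphabetical order
    return {
        name: (lo, total / count, hi)
        for name, (lo, hi, total, count) in sorted(stats.items())
    }


# Small in-memory sample (made-up values) in the challenge's row format
sample = ["Hamburg;12.5", "Bulawayo;8.9", "Hamburg;7.5"]
result = aggregate(sample)
```

Because `aggregate` accepts any iterable of lines, an open file handle for data/measurements.txt can be passed in directly, which streams the file instead of loading it into memory. Keeping only four numbers per station means memory use grows with the number of stations, not the number of rows.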
.
├── data/ # Data files
│ ├── measurements.txt # Generated measurements file
│ └── weather_stations.csv # Weather stations data
├── scripts/ # Solution attempts
│ └── 01_first_try.py # First implementation
├── create_measurements.py # Script to generate test data
├── pyproject.toml # Project dependencies
└── README.md
- Python 3.12+
Use the create_measurements.py script to generate test data:
python create_measurements.py <number_of_rows>
Example:
python create_measurements.py 1_000_000
I will be documenting different approaches and their performance metrics here as I implement them.
Stay tuned for updates!
This is a personal challenge project, but feel free to fork it and try your own solutions!
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.