📚 OpenLibrary 50 Books Project
This project is a small Python script that talks to the OpenLibrary public API, grabs book data, keeps only books published after 2000, and saves exactly 50 unique results into a CSV file.
Simple idea. A few hidden challenges. Good practice with APIs.
🎯 What This Script Does
Sends requests to the OpenLibrary API
Keeps only books published after the year 2000
Avoids duplicate results
Collects exactly 50 valid books
Sorts them by publication year (ascending)
Saves everything into books.csv
🧠 Why It’s Not Just “Fetch and Save”
At first glance this sounds easy:
“Get 50 books after 2000 and save them.”
But APIs don’t always give you exactly what you want in one shot.
Some practical issues I had to handle:
The API returns results in pages (pagination)
Some books don’t have a publication year
Some entries are duplicates (different editions of the same work)
The API doesn’t provide a “random books” endpoint
So the script had to be a bit smarter than a single request.
🔍 How It Works (High-Level)
Send a search request to OpenLibrary with a filter:
first_publish_year:[2001 TO *]
Request up to 200 results per page.
Loop through results and:
Check that a publication year is present and is after 2000
Skip duplicates using a unique book key
Keep collecting until we reach 50 valid books.
Sort them by year (ascending).
Export to CSV.
The script stops as soon as 50 valid books are collected, so it makes no unnecessary API calls.
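To make the first two steps concrete, here is a rough sketch of how a single page could be requested. It is not the actual contents of main.py: the function names, the "science" subject, the fields list, and the 10-second timeout are placeholders I picked for illustration. The session argument is expected to be a requests.Session (or anything with a compatible get method).

```python
SEARCH_URL = "https://openlibrary.org/search.json"


def build_params(page, subject="science", limit=200):
    """Build the query parameters for one page of search results.

    The subject value is a placeholder; the year filter is the one
    quoted in the steps above.
    """
    return {
        "q": f"subject:{subject} AND first_publish_year:[2001 TO *]",
        "fields": "key,title,author_name,first_publish_year",
        "limit": limit,
        "page": page,
    }


def fetch_page(session, page):
    """Return the list of result dicts ("docs") for a single page."""
    response = session.get(SEARCH_URL, params=build_params(page), timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json().get("docs", [])
```

With requests installed, calling fetch_page(requests.Session(), 1) should return the first page of matching records. Passing the session in, rather than calling requests.get directly, also makes the function easy to exercise with a fake session in tests.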
🧱 Tech Used
Python 3
requests library
Built-in csv module
Nothing fancy. Just clean logic and proper API handling.
Install dependencies:
pip install requests
Run the script:
python main.py
After execution, you’ll find:
books.csv
in the project directory.
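For reference, the sort-and-export step that produces books.csv could look roughly like this. The column names (title, author, year) and the save_books helper are assumptions for the sketch; the real file may use different headers.

```python
import csv


def save_books(books, path="books.csv"):
    """Sort books by year (ascending) and write them to a CSV file.

    Each book is assumed to be a dict with exactly the keys
    "title", "author", and "year" (hypothetical column names).
    """
    ordered = sorted(books, key=lambda book: book["year"])
    # newline="" avoids blank lines between rows on Windows.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "author", "year"])
        writer.writeheader()
        writer.writerows(ordered)
    return ordered
```

sorted is stable, so books from the same year keep the order in which they were collected.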
📂 Project Structure
openlibrary-books/
│
├── main.py
├── books.csv
└── README.md
Pagination
The API doesn’t guarantee that 50 valid results will come back in one response. So I implemented a loop that keeps requesting pages until enough books are collected.
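A minimal sketch of that kind of loop is below. It assumes a fetch_page(page) callable that returns a list of record dicts for the given page (an empty list when there are no more results). The max_pages cap is a safety limit I added for the example, not something the original script is known to have. Duplicate handling is left out here to keep the loop readable.

```python
def collect_books(fetch_page, target=50, max_pages=20):
    """Request pages until `target` valid books are collected.

    `fetch_page` is any callable taking a page number and returning
    a list of dicts; `max_pages` is a hypothetical safety cap.
    """
    collected = []
    for page in range(1, max_pages + 1):
        docs = fetch_page(page)
        if not docs:
            break  # the API ran out of results
        for doc in docs:
            year = doc.get("first_publish_year")
            if isinstance(year, int) and year > 2000:
                collected.append(doc)
                if len(collected) == target:
                    return collected  # stop early, no extra requests
    return collected
```

If the results run out before the target is reached, the function returns whatever it found, so the caller can decide whether fewer than 50 books is an error.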
Duplicate Entries
Search results sometimes include multiple editions of the same work.
To prevent duplicates, I tracked each book’s unique key.
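One common way to do this is with a set of keys already seen. The sketch below assumes the unique key is the "key" field that OpenLibrary search results carry (values such as "/works/OL45804W"); the helper name is made up for the example.

```python
def drop_duplicates(docs):
    """Keep the first record for each unique "key", preserving order."""
    seen = set()
    unique = []
    for doc in docs:
        key = doc.get("key")
        if key is None or key in seen:
            continue  # skip records without a key and repeats
        seen.add(key)
        unique.append(doc)
    return unique
```

A set gives constant-time membership checks on average, so the cost stays low even when many pages are scanned.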
Missing Data
Some records don’t include publication year or author information.
I used safe dictionary access (.get()) and validated fields before adding them.
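As an illustration, validation with .get() might look like the sketch below. The field names (title, author_name, first_publish_year) match what OpenLibrary search results return, but the to_row helper, the output column names, and the choice to fall back to "Unknown" for a missing author are assumptions; the original script may simply skip such records.

```python
def to_row(doc):
    """Convert one API record into a CSV-ready dict, or None if invalid."""
    title = doc.get("title")
    year = doc.get("first_publish_year")
    authors = doc.get("author_name") or []  # may be missing entirely

    # Require a title and a usable year after 2000.
    if not title or not isinstance(year, int) or year <= 2000:
        return None

    return {
        "title": title,
        "author": ", ".join(authors) or "Unknown",  # assumed fallback
        "year": year,
    }
```

Returning None for invalid records lets the calling loop skip them with a simple check instead of wrapping each lookup in try/except.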
📈 Possible Improvements
If this were a bigger project, I’d probably:
Add logging instead of simple print statements
Add retry logic for failed requests
Turn it into a CLI tool (e.g., choose year dynamically)
Add basic tests
Parameterize subject instead of hardcoding it