GitHub - maarefvnd/shelf-query-script

📚 OpenLibrary 50 Books Project

This project is a small Python script that queries the public OpenLibrary API, keeps only books first published after 2000, and saves exactly 50 unique results to a CSV file.

Simple idea. A few hidden challenges. Good practice with APIs.

🎯 What This Script Does

Sends requests to the OpenLibrary API

Filters books published after the year 2000

Avoids duplicate results

Collects exactly 50 valid books

Sorts them by publication year (ascending)

Saves everything into books.csv
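The last two items (sorting and saving) could look roughly like the sketch below. The column names `title`, `author`, and `year`, the `save_books` helper, and the sample rows are assumptions made for illustration; the actual script may use different fields.

```python
import csv

# Hypothetical sample rows; the real script builds these from API results.
books = [
    {"title": "Book B", "author": "Author Two", "year": 2015},
    {"title": "Book A", "author": "Author One", "year": 2003},
]


def save_books(books, path="books.csv"):
    """Sort books by year (ascending) and write them to a CSV file."""
    rows = sorted(books, key=lambda book: book["year"])
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["title", "author", "year"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

`sorted` is stable, so books from the same year keep the order in which they were collected.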

🧠 Why It’s Not Just “Fetch and Save”

At first glance this sounds easy:

“Get 50 books after 2000 and save them.”

But APIs don’t always give you exactly what you want in one shot.

Some practical issues I had to handle:

The API returns results in pages (pagination)

Some books don’t have a publication year

Some entries are duplicates (different editions of the same work)

The API doesn’t provide a “random books” endpoint

So the script had to be a bit smarter than a single request.

🔍 How It Works (High-Level)

Send a search request to OpenLibrary with a filter:

first_publish_year:[2001 TO *]

Request up to 200 results per page.

Loop through results and:

Check publication year

Skip duplicates using a unique book key

Keep collecting until we reach 50 valid books.

Sort them by year (ascending).

Export to CSV.

The script stops as soon as 50 valid books are collected, so it makes no unnecessary API calls.
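As a rough sketch of the first two steps, this is one way the request could be built. The `build_params` and `fetch_page` helpers are hypothetical names, the `science` subject is only a placeholder (this README does not say which subject the script hardcodes), and `session` is assumed to be a `requests.Session` or anything with a compatible `get` method.

```python
SEARCH_URL = "https://openlibrary.org/search.json"


def build_params(page, subject="science", limit=200):
    """Query parameters for one page of search results.

    The subject value is a placeholder chosen for this sketch.
    """
    return {
        "q": f"subject:{subject} AND first_publish_year:[2001 TO *]",
        "limit": limit,  # up to 200 results per page
        "page": page,    # pages are numbered from 1
    }


def fetch_page(session, page):
    """Fetch one page and return its list of result documents."""
    response = session.get(SEARCH_URL, params=build_params(page), timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json().get("docs", [])
```

With `requests` installed, `fetch_page(requests.Session(), 1)` would request the first page. Passing the session in keeps the function easy to exercise with a fake object, without touching the network.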

🧱 Tech Used

Python 3

requests library

Built-in csv module

Nothing fancy. Just clean logic and proper API handling.

▶️ How to Run

Install dependencies:

pip install requests

Run the script:

python openlibrary.py

After execution, you’ll find:

books.csv

in the project directory.

📂 Project Structure

openlibrary-books/
│
├── openlibrary.py
├── books.csv
└── README.md

⚠️ Things I Had to Think About

Pagination

The API doesn’t guarantee that 50 valid results will come back in one response. So I implemented a loop that keeps requesting pages until enough books are collected.
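One way to express that loop is a generator that fetches pages lazily, so the caller decides when it has enough. This is a sketch, not necessarily how the script does it: `iter_docs`, the `fetch_page` callable, and the `max_pages` safety cap are assumptions, and the fake three-page API exists only to show the behaviour.

```python
from itertools import islice


def iter_docs(fetch_page, max_pages=20):
    """Yield result documents page by page, fetching lazily.

    fetch_page is any callable that takes a 1-based page number and
    returns a list of documents (hypothetical interface for this sketch).
    """
    for page in range(1, max_pages + 1):
        docs = fetch_page(page)
        if not docs:  # an empty page means there are no more results
            return
        yield from docs


# Fake three-page API used only to demonstrate the loop.
PAGES = {1: ["a", "b"], 2: ["c", "d"], 3: ["e"]}
calls = []


def fake_fetch(page):
    calls.append(page)
    return PAGES.get(page, [])


# Taking three items only triggers requests for pages 1 and 2.
first_three = list(islice(iter_docs(fake_fetch), 3))
```

Because the generator only asks for the next page when the caller wants another item, stopping at the target count never requests a page that is not needed.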

Duplicate Entries

Search results sometimes include multiple editions of the same work. To prevent duplicates, I tracked each book’s unique key.
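A minimal sketch of that tracking, assuming the unique key is the `key` field of each search result. The `unique_by_key` helper and the sample records are made up for illustration.

```python
def unique_by_key(docs):
    """Yield each document once, based on its "key" field."""
    seen = set()
    for doc in docs:
        key = doc.get("key")
        if key is None or key in seen:
            continue  # skip records with no key or a key already collected
        seen.add(key)
        yield doc


# Made-up records: the third one repeats the key of the first.
docs = [
    {"key": "/works/OL1W", "title": "First"},
    {"key": "/works/OL2W", "title": "Second"},
    {"key": "/works/OL1W", "title": "First (another edition)"},
]
titles = [doc["title"] for doc in unique_by_key(docs)]
```

A set gives constant-time membership checks, so the cost stays low even if many pages are scanned.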

Missing Data

Some records don’t include publication year or author information. I used safe dictionary access (.get()) and validated fields before adding them.
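The validation step might look something like this. It is a sketch under a few assumptions: `parse_book` is a hypothetical helper, the records are assumed to carry the `title`, `author_name`, and `first_publish_year` fields, and using "Unknown" for a missing author is my choice for the example rather than something the script is known to do.

```python
def parse_book(doc, min_year=2001):
    """Return a cleaned row, or None if the record is unusable."""
    title = doc.get("title")
    year = doc.get("first_publish_year")
    # Reject records with no title, no usable year, or a year that is too early.
    if not title or not isinstance(year, int) or year < min_year:
        return None
    authors = doc.get("author_name") or []  # the field may be missing entirely
    return {
        "title": title,
        "author": ", ".join(authors) or "Unknown",
        "year": year,
    }
```

Returning `None` for bad records keeps the calling loop simple: it can skip anything falsy and count only real rows toward the 50.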

📈 Possible Improvements

If this were a bigger project, I’d probably:

Add logging instead of simple print statements

Add retry logic for failed requests

Turn it into a CLI tool (e.g., choose year dynamically)

Add basic tests

Parameterize subject instead of hardcoding it
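For the CLI and subject ideas, a small `argparse` wrapper would probably be enough. The sketch below is hypothetical: none of these flags exist in the current script, and the option names and defaults are just one reasonable choice.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options (all flags here are hypothetical)."""
    parser = argparse.ArgumentParser(description="Collect books from OpenLibrary.")
    parser.add_argument("--min-year", type=int, default=2001,
                        help="earliest first-publish year to keep")
    parser.add_argument("--subject", default=None,
                        help="optional subject to search for")
    parser.add_argument("--count", type=int, default=50,
                        help="number of books to collect")
    parser.add_argument("--output", default="books.csv",
                        help="path of the CSV file to write")
    return parser.parse_args(argv)
```

Called as `parse_args(["--min-year", "2010", "--subject", "history"])`, it returns a namespace with `min_year=2010` and `subject="history"`, while `count` and `output` keep their defaults.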

