
📚 OpenLibrary 50 Books Project

This project is a small Python script that talks to the OpenLibrary public API, grabs book data, filters books published after 2000, and saves exactly 50 unique results into a CSV file.

Simple idea. A few hidden challenges. Good practice with APIs.

🎯 What This Script Does

Sends requests to the OpenLibrary API

Filters books published after the year 2000

Avoids duplicate results

Collects exactly 50 valid books

Sorts them by publication year (ascending)

Saves everything into books.csv

🧠 Why It’s Not Just “Fetch and Save”

At first glance this sounds easy:

“Get 50 books after 2000 and save them.”

But APIs don’t always give you exactly what you want in one shot.

Some practical issues I had to handle:

The API returns results in pages (pagination)

Some books don’t have a publication year

Some entries are duplicates (different editions of the same work)

The API doesn’t provide a “random books” endpoint

So the script had to be a bit smarter than a single request.

🔍 How It Works (High-Level)

Send a search request to OpenLibrary with a filter:

first_publish_year:[2001 TO *]

Request up to 200 results per page.

Loop through results and:

Check publication year

Skip duplicates using a unique book key

Keep collecting until we reach 50 valid books.

Sort them by year (ascending).

Export to CSV.

The script stops immediately once 50 valid books are collected — no unnecessary API calls.
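The loop above can be sketched roughly as follows. This is an illustrative reconstruction, not the actual main.py: the endpoint and response fields (`search.json`, `docs`, `key`, `first_publish_year`, `author_name`) come from the OpenLibrary search API, but the function names and record layout are my own.

```python
import requests

API_URL = "https://openlibrary.org/search.json"  # OpenLibrary search endpoint

def extract_book(doc, seen_keys):
    """Return a cleaned record for one search result, or None if it should be skipped."""
    key = doc.get("key")                      # unique work key, e.g. "/works/OL123W"
    year = doc.get("first_publish_year")      # may be missing entirely
    if not key or key in seen_keys:
        return None                           # duplicate edition of a work we already have
    if year is None or year <= 2000:
        return None                           # missing or out-of-range publication year
    seen_keys.add(key)
    return {
        "title": doc.get("title", ""),
        "author": ", ".join(doc.get("author_name", [])),
        "year": year,
    }

def collect_books(target=50, per_page=200):
    """Page through search results until `target` unique post-2000 books are found."""
    books, seen = [], set()
    page = 1
    while len(books) < target:
        resp = requests.get(
            API_URL,
            params={"q": "first_publish_year:[2001 TO *]",
                    "page": page, "limit": per_page},
            timeout=30,
        )
        resp.raise_for_status()
        docs = resp.json().get("docs", [])
        if not docs:
            break                             # ran out of results before reaching target
        for doc in docs:
            record = extract_book(doc, seen)
            if record:
                books.append(record)
                if len(books) == target:
                    break                     # stop immediately — no extra API calls
        page += 1
    return sorted(books, key=lambda b: b["year"])  # ascending by year
```

Keeping `extract_book` separate from the paging loop makes the filter/dedup rules easy to test without touching the network.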

🧱 Tech Used

Python 3

requests library

Built-in csv module

Nothing fancy. Just clean logic and proper API handling.
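The CSV export itself needs nothing beyond the standard library. A minimal sketch (the column names here are my assumption; the actual books.csv headers may differ):

```python
import csv

def save_books(books, path="books.csv"):
    """Write collected records to CSV, sorted by publication year ascending."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "author", "year"])
        writer.writeheader()
        for book in sorted(books, key=lambda b: b["year"]):
            writer.writerow(book)
```

`newline=""` is the documented way to open a file for the csv module, and `DictWriter` keeps the column order explicit.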

▶️ How to Run

Install dependencies:

pip install requests

Run the script:

python main.py

After execution, you’ll find:

books.csv

in the project directory.

📂 Project Structure

openlibrary-books/
│
├── main.py
├── books.csv
└── README.md

⚠️ Things I Had to Think About

Pagination

The API doesn’t guarantee that 50 valid results will come back in one response. So I implemented a loop that keeps requesting pages until enough books are collected.

Duplicate Entries

Search results sometimes include multiple editions of the same work. To prevent duplicates, I tracked each book’s unique key.

Missing Data

Some records don’t include publication year or author information. I used safe dictionary access (.get()) and validated fields before adding them.
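The validation described above might look something like this (a sketch; the field names are from the OpenLibrary search response, the helper name is mine):

```python
def has_required_fields(doc):
    """Accept a search result only if its year and author data are actually present."""
    year = doc.get("first_publish_year")    # None when the field is absent
    authors = doc.get("author_name") or []  # guard against a missing or None list
    return isinstance(year, int) and year > 2000 and len(authors) > 0
```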

📈 Possible Improvements

If this were a bigger project, I’d probably:

Add logging instead of simple print statements

Add retry logic for failed requests

Turn it into a CLI tool (e.g., choose year dynamically)

Add basic tests

Parameterize the search query instead of hardcoding it
