Skip to content

DaBlower/u-crawler

Repository files navigation

U-crawler

License: GPL v3Hackatime Badge
A python-based web scraper that collects and categorises UNSW (University of New South Wales) undergraduate programs by their areas of interest and outputs it into structured JSON files

Objective

The objective of this crawler is to scrape every area of interest and find all of the undergraduate programs offered at the University of New South Wales.

Directory Structure

├── categories.py
├── main.py
├── programs.py
└── results
    ├── categories.json - Contains the areas of interest which programs.py then uses
    └── programs
        └── 2025 - Each area of interest has its own JSON file with every program inside of it
            ├── Architecture and Building.json
            ├── Business and Management.json
            ├── Creative Arts.json
            ├── Education.json
            ├── Engineering and Related Technologies.json
            ├── Environmental and Related Studies.json
            ├── Health.json
            ├── Humanities and Law.json
            ├── Information Technology.json
            ├── Natural and Physical Sciences.json
            └── programs.json - This contains all of the separate area of interest files together

Setup instructions

Prebuilt executables

Just download the latest release, unzip and run!

Manual

  1. Clone this repository
  2. Create a virtual environment
    python -m venv venv
    source venv/bin/activate # or venv\Scripts\activate on Windows
  3. Install dependencies
    pip install -r requirements.txt

Usage

Run the scraper using python main.py Note: You can only run the scraper using main.py, you cannot run categories.py or programs.py separately

Features

  • Uses requests, BeautifulSoup and Selenium with headless Chrome
  • Scrapes areas of interest (eg. Engineering and Related Technologies) and the undergraduate programs in each one
  • Respects robots.txt
  • Writes results in structured JSON files per area of interest and overall
  • Logs crawl status and errors in logs/ and debugging information if --debug or -d is passed as an argument

About

A python-based web scraper that collects and categorises UNSW (University of New South Wales) undergraduate programs

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages