NBSC Scraper

A minimal CLI tool for downloading economic data from China's National Bureau of Statistics (NBSC).

Features

  • Simple CLI - Search and download data with intuitive commands
  • Fast Search - Build a local searchable catalog of 1000+ economic indicators
  • Pandas Integration - Direct DataFrame loading for data analysis
  • Reliable - Automatic retry logic, connection pooling, and rate limiting
  • Clean Data - Properly formatted CSVs with standardized dates
  • Well-Documented - Type hints, docstrings, and comprehensive examples

Quick Start

Installation

# Clone the repository
git clone https://github.com/MaCoZu/nbsc.git
cd nbsc

# Install the package and its dependencies in editable mode
pip install -e .

Basic Usage

# Build the catalog (first time only, ~5-8 minutes)
nbsc build

# Search for indicators
nbsc search "gdp"
nbsc search "manufacturing"

# Download data
nbsc download A0101 --last 12

Commands

Command               Description
nbsc build            Build the indicator catalog from the NBSC API
nbsc tree [CODE]      Browse the indicator hierarchy
nbsc search <query>   Search indicators by name or code
nbsc download <code>  Download time-series data as CSV

Tree Navigation

Browse the indicator hierarchy:

# View root categories
nbsc tree

# View children of a category
nbsc tree A01

Indicators are marked as:

  • [DATA] - Downloadable dataset
  • [DIR] - Category with children
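
As a rough illustration of how the two markers relate to the hierarchy, the sketch below renders a small tree. The Node class, its field names, and the placeholder codes and names are hypothetical and not taken from the tool's source. It also assumes a node counts as a category exactly when it has children, which is a simplification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical catalog node; the field names are assumptions."""
    code: str
    name: str
    children: List["Node"] = field(default_factory=list)

    @property
    def marker(self) -> str:
        # Simplification: a node with children is a category, a leaf is a dataset.
        return "[DIR]" if self.children else "[DATA]"


def render(node: Node, depth: int = 0) -> List[str]:
    """Return one indented line per node, visiting the tree depth-first."""
    lines = [f"{'  ' * depth}{node.marker} {node.code} {node.name}"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Placeholder names, not real NBSC indicator titles.
root = Node("A01", "Example category", [Node("A0101", "Example dataset")])
print("\n".join(render(root)))
```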

Search Options

# Show datasets only (default)
nbsc search "gdp"

# Include categories
nbsc search "price" --all

# Show only categories
nbsc search "economic" --parents

# Limit results
nbsc search "index" --limit 20
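
The way these flags combine can be sketched in a few lines of Python. This is only an illustration of the filtering rules described above, not the tool's actual implementation. The catalog rows, the is_parent field, and the function signature are all assumptions, and the codes are placeholders rather than real NBSC codes.

```python
from typing import Dict, List, Optional

# Hypothetical catalog rows; codes and names are placeholders, not real NBSC data.
CATALOG = [
    {"code": "X01", "name": "Price indices", "is_parent": True},
    {"code": "X0101", "name": "Consumer price index", "is_parent": False},
    {"code": "X0102", "name": "Producer price index", "is_parent": False},
]


def search(query: str, include_all: bool = False, parents_only: bool = False,
           limit: Optional[int] = None) -> List[Dict]:
    """Case-insensitive substring match on name or code, then filter by kind."""
    q = query.lower()
    hits = []
    for row in CATALOG:
        if q not in row["name"].lower() and q not in row["code"].lower():
            continue
        if parents_only and not row["is_parent"]:
            continue  # --parents: keep categories only
        if not parents_only and not include_all and row["is_parent"]:
            continue  # default: keep datasets only
        hits.append(row)
    return hits[:limit]  # limit=None returns every hit


for row in search("price", include_all=True, limit=20):
    print(row["code"], row["name"])
```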

Download Options

# Download all available data
nbsc download A0101

# Download last N periods
nbsc download A0101 --last 12

# Specify output file
nbsc download A0B01 --last 60 -o pmi_data.csv

Python API

Use load_data() for programmatic access:

from nbsc import load_data

# Load data as pandas DataFrame
df = load_data("A0B01", last_n=12)

# DataFrame has DatetimeIndex and named columns
print(df.head())
print(df.columns)

# Analyze data
print(df.describe())
# Plotting requires matplotlib, which is not among the tool's listed dependencies
df["Manufacturing Purchasing Managers' Index (%)"].plot()

See the examples/ directory for more detailed usage patterns.

Data Coverage

The tool provides access to:

National:

  • hgyd - Monthly data
  • hgjd - Quarterly data
  • hgnd - Annual data

Provincial:

  • fsyd - Monthly data
  • fsjd - Quarterly data
  • fsnd - Annual data

City:

  • csyd - Monthly data
  • csjd - Quarterly data
  • csnd - Annual data

All indicators include English names.
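
The nine database codes above follow a regular pattern: a two-letter prefix for the geographic level followed by a two-letter suffix for the frequency. The helper below is a small sketch of that pattern. The function name db_code and the English keys are made up for illustration and are not part of the tool's API.

```python
# Prefix and suffix tables read off the code list above.
LEVELS = {"national": "hg", "provincial": "fs", "city": "cs"}
FREQUENCIES = {"monthly": "yd", "quarterly": "jd", "annual": "nd"}


def db_code(level: str, frequency: str) -> str:
    """Compose a database code such as 'hgyd' from a level and a frequency."""
    try:
        return LEVELS[level] + FREQUENCIES[frequency]
    except KeyError as exc:
        raise ValueError(f"unknown level or frequency: {exc}") from None


print(db_code("national", "monthly"))
print(db_code("city", "annual"))
```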

Output Format

CSV Structure

Data is saved with:

  • Date column - Standardized YYYY-MM-DD format
  • Value columns - One column per sub-indicator with full names and units
  • Metadata file - Separate .txt file with indicator details

Example output:

Date,Manufacturing Purchasing Managers' Index (%),...
2025-12-01,50.1,...
2025-11-01,49.2,...
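
A file in this layout can be read without pandas using only the standard library. The sketch below parses an in-memory sample that keeps just the first value column from the example above (the columns elided as "..." are left out) and sorts the rows oldest first. The read_series helper is hypothetical, and it assumes every cell in the chosen column holds a number.

```python
import csv
import io
from datetime import date

# In-memory stand-in for a downloaded CSV, reduced to one value column.
SAMPLE = """Date,Manufacturing Purchasing Managers' Index (%)
2025-12-01,50.1
2025-11-01,49.2
"""


def read_series(handle, column):
    """Return (date, value) pairs for one column, sorted oldest first."""
    reader = csv.DictReader(handle)
    pairs = [(date.fromisoformat(row["Date"]), float(row[column])) for row in reader]
    return sorted(pairs)


series = read_series(io.StringIO(SAMPLE), "Manufacturing Purchasing Managers' Index (%)")
for day, value in series:
    print(day.isoformat(), value)
```

With pandas installed, pd.read_csv(path, parse_dates=["Date"], index_col="Date") gives a comparable result as a DataFrame.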

Date Formats

Dates are automatically converted from NBSC format:

  • Monthly: 202501 → 2025-01-01
  • Quarterly: 2025D → 2025-10-01 (Q4)
  • Annual: 2025 → 2025-01-01
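
The conversion can be sketched as a small function that maps each period code to the first day of the period. This is an illustration, not the tool's own code. The name period_to_date is hypothetical, and the letters A to C for the first three quarters are inferred from the D = Q4 example above.

```python
from datetime import date

# Quarter letter -> first month of the quarter (A-C inferred from D = Q4).
QUARTER_START_MONTH = {"A": 1, "B": 4, "C": 7, "D": 10}


def period_to_date(code: str) -> date:
    """Map an NBSC-style period code to the first day of that period."""
    code = code.strip().upper()
    if len(code) == 6 and code.isdigit():  # monthly, e.g. 202501
        return date(int(code[:4]), int(code[4:]), 1)
    if len(code) == 5 and code[:4].isdigit() and code[4] in QUARTER_START_MONTH:
        return date(int(code[:4]), QUARTER_START_MONTH[code[4]], 1)  # quarterly
    if len(code) == 4 and code.isdigit():  # annual, e.g. 2025
        return date(int(code), 1, 1)
    raise ValueError(f"unrecognized period code: {code!r}")


for sample in ("202501", "2025D", "2025"):
    print(sample, period_to_date(sample).isoformat())
```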

Technical Details

Architecture

  • API Client (api.py) - Handles all HTTP communication with retry logic and rate limiting
  • Catalog Manager (catalog.py) - Builds and searches the indicator catalog
  • CLI (cli.py) - User interface built with Typer
  • Data Loader (data.py) - Converts API responses to pandas DataFrames

Design Principles

  • Minimalism - ~750 lines of code, only 3 dependencies
  • Efficiency - Connection pooling, local catalog caching
  • Reliability - Automatic retries with exponential backoff
  • Simplicity - Clear separation of concerns, comprehensive type hints
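
To make "automatic retries with exponential backoff" concrete, here is a generic sketch of the pattern. It is not copied from api.py. The retry function, its parameters, and the default values are assumptions chosen for illustration, and a real client would likely catch narrower exception types than Exception.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(call: Callable[[], T], attempts: int = 3, base_delay: float = 0.5,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(); on failure wait base_delay * 2**attempt and try again."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
    raise ValueError("attempts must be at least 1")


# Demo with a function that fails twice before succeeding.
state = {"calls": 0}


def flaky() -> str:
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


waits = []
print(retry(flaky, sleep=waits.append), waits)
```

Passing sleep as a parameter keeps the demo instant and makes the delay schedule easy to inspect.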

API Details

Endpoint: https://data.stats.gov.cn/english/easyquery.htm

The tool uses two API methods:

  • getTree - Fetches indicator hierarchies
  • QueryData - Fetches time-series data

Rate limiting: the client waits 0.1 s between requests to avoid overloading the server.
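
A minimum-interval limiter of this kind can be sketched as follows. This is an assumption about how such spacing might be enforced, not the code in api.py. The RateLimiter class and its parameters are hypothetical, with only the 0.1 s default taken from the text above.

```python
import time


class RateLimiter:
    """Block until at least min_interval seconds have passed since the last call."""

    def __init__(self, min_interval=0.1, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


limiter = RateLimiter()
start = time.monotonic()
for _ in range(3):
    limiter.wait()  # call this right before each HTTP request
print(f"3 calls took {time.monotonic() - start:.2f}s")
```

The first call never sleeps, and later calls sleep only for whatever part of the interval is left.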

Requirements

  • Python 3.8+
  • Internet access to https://data.stats.gov.cn
  • Dependencies: requests, pandas, typer

Note: The NBSC API may require VPN access from outside China.

Examples

See the examples/ directory:

  • example_load_data.py - Basic DataFrame loading
  • example_analysis.py - Comprehensive analysis workflow

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! This project follows clean code principles with:

  • Type hints throughout
  • Comprehensive docstrings
  • Ruff for linting and formatting
  • Clear separation of concerns

Acknowledgments

Data provided by the National Bureau of Statistics of China (NBSC).
