NBSC Scraper

A minimal CLI tool for downloading economic data from China's National Bureau of Statistics (NBSC).

Features

Simple CLI - Search and download data with intuitive commands
Fast Search - Build a local searchable catalog of 1000+ economic indicators
Pandas Integration - Direct DataFrame loading for data analysis
Reliable - Automatic retry logic, connection pooling, and rate limiting
Clean Data - Properly formatted CSVs with standardized dates
Well-Documented - Type hints, docstrings, and comprehensive examples

Quick Start

Installation

# Clone the repository
git clone https://github.com/MaCoZu/nbsc.git
cd nbsc

# Install the package and its dependencies (editable mode)
pip install -e .

Basic Usage

# Build the catalog (first time only, ~5-8 minutes)
nbsc build

# Search for indicators
nbsc search "gdp"
nbsc search "manufacturing"

# Download data
nbsc download A0101 --last 12

Commands

Command	Description
`nbsc build`	Build indicator catalog from NBSC API
`nbsc tree [CODE]`	Browse indicator hierarchy
`nbsc search <query>`	Search indicators by name or code
`nbsc download <code>`	Download time-series data as CSV

Tree Navigation

Browse the indicator hierarchy:

# View root categories
nbsc tree

# View children of a category
nbsc tree A01

Indicators are marked as:

[DATA] - Downloadable dataset
[DIR] - Category with children

Search Options

# Show datasets only (default)
nbsc search "gdp"

# Include categories
nbsc search "price" --all

# Show only categories
nbsc search "economic" --parents

# Limit results
nbsc search "index" --limit 20

Download Options

# Download all available data
nbsc download A0101

# Download last N periods
nbsc download A0101 --last 12

# Specify output file
nbsc download A0B01 --last 60 -o pmi_data.csv

Python API

Use load_data() for programmatic access:

from nbsc import load_data

# Load data as pandas DataFrame
df = load_data("A0B01", last_n=12)

# DataFrame has DatetimeIndex and named columns
print(df.head())
print(df.columns)

# Analyze data
print(df.describe())
df["Manufacturing Purchasing Managers' Index (%)"].plot()

See the examples/ directory for more detailed usage patterns.

Data Coverage

The tool provides access to the following NBSC databases, each identified by a four-letter code:

National:

hgyd - Monthly data
hgjd - Quarterly data
hgnd - Annual data

Provincial:

fsyd - Monthly data
fsjd - Quarterly data
fsnd - Annual data

City:

csyd - Monthly data
csjd - Quarterly data
csnd - Annual data
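
The codes above follow a regular pattern: a two-letter prefix for the geographic level and a two-letter suffix for the frequency. The sketch below decodes a code into those two parts. The function name `describe_dbcode` and the `Database` type are hypothetical helpers written for illustration; they are not part of the nbsc package.

```python
from typing import NamedTuple

# Prefix and suffix meanings, taken from the list above.
LEVELS = {"hg": "National", "fs": "Provincial", "cs": "City"}
FREQUENCIES = {"yd": "Monthly", "jd": "Quarterly", "nd": "Annual"}


class Database(NamedTuple):
    level: str
    frequency: str


def describe_dbcode(dbcode: str) -> Database:
    """Split a four-letter database code into level and frequency (hypothetical helper)."""
    prefix, suffix = dbcode[:2], dbcode[2:]
    if prefix not in LEVELS or suffix not in FREQUENCIES:
        raise ValueError(f"Unknown database code: {dbcode!r}")
    return Database(LEVELS[prefix], FREQUENCIES[suffix])


print(describe_dbcode("fsjd"))
```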

All indicators include English names.

Output Format

CSV Structure

Data is saved with:

Date column - Standardized YYYY-MM-DD format
Value columns - One column per sub-indicator with full names and units
Metadata file - Separate .txt file with indicator details

Example output:

Date,Manufacturing Purchasing Managers' Index (%),...
2025-12-01,50.1,...
2025-11-01,49.2,...
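
Because the file is plain CSV with ISO dates, it can be read back with the standard library alone. The following is a minimal sketch, assuming the file contains only the single value column shown in the example (the real file has more columns, elided above). `read_series` is a hypothetical helper, not a function provided by nbsc.

```python
import csv
import io
from datetime import date

# Sample truncated to the one value column visible in the example output.
SAMPLE = """Date,Manufacturing Purchasing Managers' Index (%)
2025-12-01,50.1
2025-11-01,49.2
"""


def read_series(handle, column):
    """Return (date, value) pairs for one column, oldest first (hypothetical helper)."""
    reader = csv.DictReader(handle)
    rows = [
        (date.fromisoformat(row["Date"]), float(row[column]))
        for row in reader
        if row[column]  # skip empty cells
    ]
    return sorted(rows)


series = read_series(io.StringIO(SAMPLE), "Manufacturing Purchasing Managers' Index (%)")
for day, value in series:
    print(day.isoformat(), value)
```

To read a downloaded file instead of the inline sample, pass an open file handle in place of `io.StringIO(SAMPLE)`.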

Date Formats

Dates are automatically converted from NBSC format:

Monthly: 202501 → 2025-01-01
Quarterly: 2025D → 2025-10-01 (Q4)
Annual: 2025 → 2025-01-01
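
The conversion can be sketched as follows. This is an illustration, not the tool's actual code: `parse_nbsc_period` is a hypothetical name, and the quarter letters A, B and C are assumed to map to Q1 to Q3 by analogy with the documented D → Q4 case.

```python
from datetime import date

# First month of each quarter. Only "D" (Q4) is confirmed above;
# A, B and C are assumed to follow the same pattern.
QUARTER_START_MONTH = {"A": 1, "B": 4, "C": 7, "D": 10}


def parse_nbsc_period(code: str) -> date:
    """Convert an NBSC period code to the first day of that period."""
    code = code.strip().upper()
    if len(code) == 4 and code.isdigit():  # annual, e.g. "2025"
        return date(int(code), 1, 1)
    if len(code) == 6 and code.isdigit():  # monthly, e.g. "202501"
        return date(int(code[:4]), int(code[4:]), 1)
    if len(code) == 5 and code[:4].isdigit() and code[4] in QUARTER_START_MONTH:
        return date(int(code[:4]), QUARTER_START_MONTH[code[4]], 1)  # quarterly
    raise ValueError(f"Unrecognized NBSC period code: {code!r}")


for raw in ("202501", "2025D", "2025"):
    print(raw, "->", parse_nbsc_period(raw).isoformat())
```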

Technical Details

Architecture

API Client (api.py) - Handles all HTTP communication with retry logic and rate limiting
Catalog Manager (catalog.py) - Builds and searches the indicator catalog
CLI (cli.py) - User interface built with Typer
Data Loader (data.py) - Converts API responses to pandas DataFrames

Design Principles

Minimalism - ~750 lines of code, only 3 dependencies
Efficiency - Connection pooling, local catalog caching
Reliability - Automatic retries with exponential backoff
Simplicity - Clear separation of concerns, comprehensive type hints

API Details

Endpoint: https://data.stats.gov.cn/english/easyquery.htm

The tool uses two API methods:

getTree - Fetches indicator hierarchies
QueryData - Fetches time-series data
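
For orientation, here is a rough sketch of how a QueryData request URL might be assembled. Only the endpoint and the method name come from this README. The query parameter names (`m`, `dbcode`, `rowcode`, `colcode`, `wds`, `dfwds`) and the `LAST<n>` time selector are assumptions about the public NBSC query interface, and `build_query_url` is a hypothetical helper. The sketch only builds the URL and sends no request.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "https://data.stats.gov.cn/english/easyquery.htm"


def build_query_url(dbcode: str, indicator: str, last_n: int) -> str:
    """Build a QueryData URL (parameter names are assumptions, see lead-in)."""
    params = {
        "m": "QueryData",
        "dbcode": dbcode,   # e.g. "hgyd" for national monthly data
        "rowcode": "zb",    # assumed: rows are indicators
        "colcode": "sj",    # assumed: columns are time periods
        "wds": "[]",
        "dfwds": json.dumps([
            {"wdcode": "zb", "valuecode": indicator},
            {"wdcode": "sj", "valuecode": f"LAST{last_n}"},
        ]),
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_query_url("hgyd", "A0101", 12)
print(url)
```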

Rate limiting: a 0.1 s pause between requests, to avoid overloading the server
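
The rate limiting and the retries with exponential backoff mentioned under Design Principles can be illustrated with the standard library alone. This is a simplified sketch under assumed settings (3 attempts, 0.5 s base delay), not the code in api.py; `RateLimiter` and `retry_with_backoff` are hypothetical names.

```python
import time


class RateLimiter:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval=0.1, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def retry_with_backoff(func, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func(); on failure wait base_delay, then 2x, 4x, ... and try again."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


limiter = RateLimiter(min_interval=0.1)
limiter.wait()  # call before each request
print(retry_with_backoff(lambda: "response"))
```

The clock and sleep functions are injectable so that the timing logic can be exercised without real delays.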

Requirements

Python 3.8+
Internet access to https://data.stats.gov.cn
Dependencies: requests, pandas, typer

Note: The NBSC API may require VPN access from outside China.

Examples

See the examples/ directory:

example_load_data.py - Basic DataFrame loading
example_analysis.py - Comprehensive analysis workflow

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! This project follows clean code principles with:

Type hints throughout
Comprehensive docstrings
Ruff for linting and formatting
Clear separation of concerns

Acknowledgments

Data provided by the National Bureau of Statistics of China (NBSC).
