A standalone toolkit for parsing CityGML files and converting 3D building geometry to the STEP (ISO 10303-21) CAD format. Originally extracted from Paper-CAD.
Full documentation: https://soynyuu.github.io/gml2step/
gml2step reads CityGML 2.0 files — including large-scale datasets from Japan's PLATEAU project — and produces STEP files suitable for CAD/CAM/BIM workflows.
Key capabilities:
- CityGML parsing with streaming support for files of any size
- STEP conversion via OpenCASCADE with automatic LoD fallback (LoD3 -> LoD2 -> LoD1 -> LoD0)
- 4 conversion methods: solid, sew, extrude, and auto (tries all in sequence)
- 8-phase geometry pipeline (including Phase 1.5) with progressive auto-repair
- PLATEAU data fetching via public APIs (MLIT Data Catalog + OSM Nominatim)
- Footprint extraction for 2D analysis without requiring OCCT
- CRS auto-detection with built-in support for all 19 Japan Plane Rectangular CS zones
- Overview
- Installation
- Quick Start
- CLI Reference
- Conversion Methods
- Processing Pipeline
- LoD Support
- Streaming Parser
- CRS and Coordinate Handling
- PLATEAU Integration
- Architecture
- Development
- License
pip install gml2steppip install "gml2step[plateau]"STEP conversion depends on pythonocc-core, which is not reliably pip-installable on all platforms. Use conda or Docker:
# conda
conda install -c conda-forge pythonocc-core
pip install gml2step
# Docker (recommended for full conversion)
docker build -t gml2step .
docker run --rm -v $(pwd):/data gml2step convert /data/input.gml /data/output.stepNote: Parsing, streaming, and footprint extraction work without OCCT. Only the
convertcommand requires it.
# Parse a CityGML file and print summary as JSON
gml2step parse ./input.gml
# Stream-parse buildings one at a time (constant memory)
gml2step stream-parse ./input.gml --limit 100
# Extract 2D footprints with height estimates
gml2step extract-footprints ./input.gml --output-json ./footprints.json
# Convert CityGML to STEP
gml2step convert ./input.gml ./output.step --method solidfrom gml2step import parse, stream_parse, extract_footprints, convert
# Lightweight summary (no OCCT required)
summary = parse("input.gml")
print(summary["total_buildings"])
print(summary["detected_source_crs"])
# Stream buildings with constant memory usage
for building, xlink_index in stream_parse("input.gml", limit=10):
bid = building.get("{http://www.opengis.net/gml}id")
print(bid)
# Extract 2D footprints with height
footprints = extract_footprints("input.gml", limit=100)
for fp in footprints:
print(fp.building_id, fp.height, len(fp.exterior))
# Full CityGML -> STEP conversion
ok, result = convert("input.gml", "output.step", method="auto")gml2step convert INPUT_GML OUTPUT_STEP [OPTIONS]
| Option | Default | Description |
|---|---|---|
--limit N |
None | Maximum number of buildings to convert |
--method |
solid |
Conversion method: solid, sew, extrude, auto |
--debug |
False | Enable debug logging |
--use-streaming / --no-use-streaming |
True | Use streaming parser for lower memory usage |
--building-id ID |
None | Filter by specific building ID(s). Can be used multiple times |
--filter-attribute |
gml:id |
Attribute to match for building ID filtering |
gml2step parse INPUT_GML [--limit N]
Outputs a JSON summary with detected CRS, building count, and building IDs.
gml2step stream-parse INPUT_GML [--limit N] [--building-id ID ...] [--filter-attribute gml:id]
Streams building IDs one per line using constant memory. Supports filtering by building ID.
gml2step extract-footprints INPUT_GML [--output-json PATH] [--limit N] [--default-height 10.0]
Extracts 2D footprints with estimated building heights. Height is derived from measuredHeight, Z-coordinate range, or the specified default.
| Method | Description |
|---|---|
| solid | Primary method. Extracts LoD surfaces, builds shells, validates solids, auto-repairs. Best for LoD2/LoD3. |
| sew | Collects WallSurface/RoofSurface/GroundSurface polygons, sews faces, and attempts to form a solid. |
| extrude | Extrudes LoD0 footprint to estimated height. Fallback for files with only 2D data. |
| auto | Tries solid -> sew -> extrude in sequence until one succeeds. |
The convert command processes each building through 8 phases (including Phase 1.5):
| Phase | Description |
|---|---|
| 0. Recentering | Translates coordinates near the origin for OCCT numerical stability |
| 1. LoD Selection | Selects the best available LoD (LoD3 -> LoD2 -> LoD1 fallback) |
| 1.5. CRS Detection | Auto-detects source CRS and reprojects if needed |
| 2. Geometry Extraction | Extracts faces using the selected conversion method |
| 3. Shell Construction | Builds OCCT shells from faces with multi-pass sewing |
| 4. Solid Validation | Validates geometry and constructs solids |
| 5. Auto-Repair | 4-level progressive repair: minimal -> standard -> aggressive -> ultra |
| 6. Part Merging | Fuses BuildingParts via Boolean union (with compound fallback) |
| 7. STEP Export | Writes AP214CD STEP file with millimeter units |
The precision_mode parameter controls coordinate tolerance:
| Mode | Relative tolerance | Use case |
|---|---|---|
standard |
0.01% | General use |
high |
0.001% | Detailed models |
maximum |
0.0001% | High-precision CAD |
ultra |
0.00001% | Maximum fidelity |
The shape_fix_level parameter controls auto-repair aggressiveness. When repair fails at the specified level, it automatically escalates:
- minimal — ShapeFix_Solid only
- standard — + ShapeUpgrade_UnifySameDomain
- aggressive — + Rebuild with relaxed tolerance
- ultra — + ShapeFix_Shape (full repair)
gml2step supports CityGML Level of Detail 0 through 3:
| LoD | Description | Surfaces supported |
|---|---|---|
| LoD3 | Architectural detail models | lod3Solid, lod3MultiSurface, lod3Geometry |
| LoD2 | Standard building models (PLATEAU primary) | lod2Solid, lod2MultiSurface, lod2Geometry, boundedBy |
| LoD1 | Simple block models | lod1Solid |
| LoD0 | 2D footprints | lod0FootPrint, lod0RoofEdge, GroundSurface |
All 6 CityGML 2.0 boundary surface types are recognized: WallSurface, RoofSurface, GroundSurface, OuterCeilingSurface, OuterFloorSurface, ClosureSurface.
For large CityGML files (common in PLATEAU datasets), gml2step provides a SAX-style streaming parser that processes one building at a time instead of loading the entire DOM tree into memory:
- O(1 building) memory vs O(entire file) for DOM parsing
- Two-tier XLink resolution cache (local per-building + global LRU)
- Optional NumPy-accelerated coordinate parsing
Note: The streaming parser has not been formally benchmarked. Memory savings and speedup depend heavily on file size and building complexity. The theoretical advantage is that memory usage stays roughly constant regardless of file size, while DOM parsing scales linearly with file size.
for building, xlinks in stream_parse("large_plateau_file.gml"):
process(building)- Auto-detection of source CRS from GML
srsNameattributes - All 19 Japan Plane Rectangular CS zones (EPSG:6669–6687) with automatic zone selection by latitude/longitude
- Automatic reprojection from geographic CRS (WGS84, JGD2000, JGD2011) to an appropriate projected CRS
- Coordinate recentering near the origin to prevent floating-point precision loss in OCCT
PLATEAU is a project by Japan's Ministry of Land, Infrastructure, Transport and Tourism (MLIT) that provides open 3D city models for the entire country in CityGML format.
gml2step provides optional convenience functions for fetching PLATEAU data (pip install "gml2step[plateau]"). Under the hood, this is a thin wrapper around two public APIs:
- PLATEAU Data Catalog API (operated by MLIT) — queried for CityGML file URLs by mesh code or municipality
- Nominatim (OpenStreetMap) — used for geocoding Japanese addresses to latitude/longitude
There is no custom backend server. All requests go directly to these public endpoints.
- Address search: Takes a Japanese address (e.g., "東京都千代田区霞が関3-2-1"), geocodes it via Nominatim, converts the coordinates to a JIS X 0410 mesh code, fetches CityGML files covering that area from the PLATEAU API, then parses and ranks the buildings by distance/name similarity.
- Mesh code lookup: Given a mesh code, fetches CityGML file URLs from the PLATEAU API and downloads them.
- Building ID lookup: Given a specific building ID and mesh code, fetches and parses just the relevant 1km grid area.
from gml2step.plateau.fetcher import search_buildings_by_address
result = search_buildings_by_address(
"東京都千代田区霞が関3-2-1",
search_mode="hybrid", # "distance", "name", or "hybrid"
limit=10,
)
for b in result["buildings"]:
print(b.building_id, b.name, b.height, b.has_lod2, b.has_lod3)PLATEAU data is organized by JIS X 0410 standard mesh codes. gml2step provides conversion functions for all 5 mesh levels:
from gml2step.plateau.mesh_utils import (
latlon_to_mesh_1st, # 80km grid (4-digit)
latlon_to_mesh_2nd, # 10km grid (6-digit)
latlon_to_mesh_3rd, # 1km grid (8-digit)
latlon_to_mesh_half, # 500m grid (9-digit)
latlon_to_mesh_quarter # 250m grid (10-digit)
)
mesh = latlon_to_mesh_3rd(35.6812, 139.7671) # Tokyo Stationimport asyncio
from gml2step.plateau.api_client import fetch_plateau_datasets_by_mesh
# Fetch PLATEAU dataset URLs by mesh code
result = asyncio.run(fetch_plateau_datasets_by_mesh("53394525"))- Geocoding via Nominatim (rate-limited to 1 req/sec per Nominatim policy), with Japan-specific validation and relevance scoring
- Building ranking with 3 modes: distance, name similarity (Levenshtein + token matching), hybrid
- JIS X 0410 mesh code conversion (1st through quarter mesh)
- Neighboring mesh enumeration (3x3 grid) for boundary searches
- Async batch resolution of mesh codes with concurrency control
- Local CityGML caching (opt-in via
CITYGML_CACHE_ENABLED/CITYGML_CACHE_DIRenv vars) - Offline mesh-to-municipality mapping in
plateau.api_client(nationwide JSON), with Tokyo-only fallback inplateau.mesh_mapping
src/gml2step/
├── __init__.py # Public API: convert, parse, stream_parse, extract_footprints
├── api.py # API implementation
├── cli.py # Typer CLI
├── coordinate_utils.py # CRS utilities, Japan zone definitions
├── data/
│ └── mesh2_municipality.json # Nationwide mesh-to-municipality mapping
├── citygml/
│ ├── core/ # Types, constants, CityGML namespaces
│ ├── parsers/ # Coordinate and polygon extraction
│ ├── streaming/ # SAX-style streaming parser, XLink cache, coordinate optimizer
│ ├── lod/ # LoD0–LoD3 extraction strategies, footprint extractor
│ ├── geometry/ # OCCT geometry builders, shell/solid construction, auto-repair
│ ├── transforms/ # CRS detection, reprojection, recentering
│ ├── utils/ # XLink resolver, XML parser, logging
│ └── pipeline/ # Orchestrator (8-phase conversion pipeline, incl. Phase 1.5)
└── plateau/ # PLATEAU API client, geocoding, mesh utilities, building search
git clone https://github.com/Soynyuu/gml2step.git
cd gml2step
pip install -e ".[dev,plateau]"
pytestThis project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0-or-later).
gml2step was originally developed as part of Paper-CAD and extracted as a standalone library. See NOTICE for full attribution.
- Paper-CAD — The parent project from which gml2step was extracted
- PLATEAU — Japan's national 3D city model project (MLIT)
- OpenCASCADE / pythonocc-core — 3D CAD kernel for STEP conversion
- pyproj — Coordinate reference system transformations
- Mitou Junior — A program supporting creators aged 17 and under with original ideas and outstanding technical skills