[FEATURE](Markdown) code generation by mojodna · Pull Request #451 · OvertureMaps/schema

Seth Fitzsimmons (mojodna) · 2026-02-27T00:14:28Z

Summary

Add overture-schema-codegen, a code generator that produces documentation from
Pydantic schema models.

Pydantic's model_json_schema() flattens the schema's domain vocabulary into JSON
Schema primitives. NewType names, constraint provenance, and custom constraint classes
disappear. Navigating Python's type annotation machinery -- NewType chains, nested
Annotated wrappers, union filtering, generic resolution -- is complex. The codegen
does it once. analyze_type() unwraps annotations into TypeInfo, a flat
target-independent representation that renderers consume without re-entering the type
system.

Architecture

Four layers with strict downward imports. The package layout mirrors the
architecture -- each layer is a sub-package:

markdown/    Rendering       ← Output formatting, all presentation decisions
layout/      Output Layout   ← What to generate, where it goes
extraction/  Extraction      ← TypeInfo, FieldSpec, ModelSpec, UnionSpec
             Discovery       ← discover_models() from overture-schema-core

Discovery lives in overture-schema-core, not in the codegen package.
cli.py sits at the package root and imports from all three sub-packages.

analyze_type() is the central function. A single iterative loop peels NewType,
Annotated, Union, and container wrappers in fixed order, accumulating constraints tagged
with the NewType that contributed them. The result is a TypeInfo dataclass that
downstream modules consume without re-entering the type system.

Both concrete BaseModel subclasses and discriminated union type aliases (like Segment = Annotated[Union[RoadSegment, ...], ...]) satisfy the FeatureSpec protocol and flow
through the same pipeline. Union extraction finds the common base class, partitions
fields into shared and variant-specific, and extracts the discriminator mapping.

markdown/pipeline.py orchestrates the full pipeline without I/O: tree expansion,
supplementary type collection, path assignment, reverse references, and rendering.
Returns list[RenderedPage]. The CLI writes files to disk with Docusaurus frontmatter.

Design doc: packages/overture-schema-codegen/docs/design.md

Changes outside the codegen package

Preparatory fixes and refactors in core/system/CLI packages:

Rename ModelKey.class_name to entry_point (carries module:Class path, not just the
class name)
Attach docstrings to NewTypes at runtime (so the codegen can extract them)
Add resolve_discriminator_field_name() to system feature module
Fix relative imports and f-string prefixes in core
Use dict instead of Mapping in system test util type hints

Example data added to theme pyproject.toml files (addresses, base, buildings,
divisions, places) under [examples.ModelName] sections.

What's in the package

Source (33 files, ~3,800 lines):

Module	Purpose
`extraction/type_analyzer.py`	Iterative type unwrapping into `TypeInfo`
`extraction/specs.py`	Data structures shared between extraction and rendering
`extraction/type_registry.py`	Type name → per-target display string mapping
`extraction/model_extraction.py`	Pydantic model → `ModelSpec`, tree expansion
`extraction/union_extraction.py`	Union alias → `UnionSpec`, discriminator mapping
`extraction/enum_extraction.py`	Enum → `EnumSpec`
`extraction/newtype_extraction.py`	NewType → `NewTypeSpec`
`extraction/numeric_extraction.py`	Numeric type extraction (`NumericSpec`)
`extraction/field_constraints.py`	Constraint objects → display text
`extraction/model_constraints.py`	Model-level constraints → prose
`layout/module_layout.py`	Python module paths → output directories
`layout/type_collection.py`	Supplementary type discovery from field trees
`markdown/path_assignment.py`	Type names → output file paths
`markdown/link_computation.py`	Relative links between output pages
`markdown/reverse_references.py`	"Used By" reference computation
`markdown/type_format.py`	`TypeInfo` → markdown type strings with links
`markdown/renderer.py`	Jinja2 template driver for all page types
`extraction/examples.py`	TOML example loading, validation, flattening
`markdown/pipeline.py`	Pipeline orchestration (no I/O), system type partitioning
`cli.py`	Click CLI: `generate` and `list` commands
`extraction/case_conversion.py`	PascalCase → snake_case
`extraction/docstring.py`	Custom vs. auto-generated docstring detection

Tests (34 files, ~6,600 lines): unit tests per module, golden file tests for
rendered markdown, integration tests against real schema models.

Design decisions worth reviewing

analyze_type is iterative, not recursive. The while True loop handles arbitrary
nesting depth (NewType wrapping Annotated wrapping NewType wrapping Annotated...)
without stack growth. Dict key/value types are the one exception where it recurses.

Cache insertion before recursion in expand_model_tree. The sub-model's ModelSpec
enters the cache before its fields are expanded. A back-edge encounter finds the cached
entry and marks starts_cycle=True rather than infinite-looping.

FeatureSpec is a Protocol, not a base class. ModelSpec and UnionSpec have
different field structures (flat list vs. annotated-field list with variant provenance).
A protocol lets them share a pipeline interface without forcing inheritance.

Schema root computed from all entry points, before theme filtering. Output directory
structure must remain stable regardless of which themes are selected. Computing the root
from filtered paths would shift directories when themes change.

Constraint provenance via ConstraintSource. Each constraint records which NewType
contributed it. Field-level constraints with source=None render on the field;
constraints with a named source render on the NewType's own page. This prevents
duplication.

Test plan

make check passes (pytest + doctests + ruff + mypy): 2,111 tests
make install && python -m overture.schema.codegen generate --format markdown --output-dir /tmp/schema-docs produces output
Spot-check generated markdown for a union feature (e.g., Segment) and a model
feature (e.g., Building) -- field tables, links, constraint descriptions, examples
Verify cross-page links resolve correctly (supplementary types link back to
features, features link to shared types)

packages/overture-schema-codegen/src/overture/schema/codegen/extraction/type_analyzer.py

Roel Bollens (RoelBollens-TomTom) · 2026-03-03T13:31:36Z

Some observations without having looked into the code:

There are two different representations for a list in the markdown generation e.g:

Name	Type	Description
`emails`	`list<EmailStr>` (optional)	The email addresses of the place. `minimum length: 1` Ensures all items in a collection are unique. (`UniqueItemsConstraint`)
`phones`	`PhoneNumber` (list, optional)	The phone numbers of the place. `minimum length: 1` Ensures all items in a collection are unique. (`UniqueItemsConstraint`)

In places the referenced Address type points to the Address type of the addresses theme (../addresses/address.md) which is incorrect:

Name	Type	Description
`addresses[]`	`list<Address>` (optional)	The address or addresses of the place `minimum length: 1`

And something very minor; when Pydantic types are used such as EmailStr or HttpUrl there wont be a reference:

Name	Type	Description
`emails`	`list<EmailStr>` (optional)	The email addresses of the place. `minimum length: 1` Ensures all items in a collection are unique. (`UniqueItemsConstraint`)
`websites`	`list<HttpUrl>` (optional)	The websites of the place. `minimum length: 1` Ensures all items in a collection are unique. (`UniqueItemsConstraint`)

Adam Lastowka (Rachmanin0xFF)

I'm probably missing quite a bit (there is a lot of code here), but I like the structure + overall design and the generated markdown looks solid, so I'm approving.

Commented on a few small issues in addition to the aforementioned list representation confusion.

packages/overture-schema-codegen/src/overture/schema/codegen/example_loader.py

packages/overture-schema-codegen/src/overture/schema/codegen/markdown_renderer.py

packages/overture-schema-codegen/src/overture/schema/codegen/markdown/renderer.py

packages/overture-schema-codegen/src/overture/schema/codegen/markdown_renderer.py

packages/overture-schema-codegen/src/overture/schema/codegen/layout/type_collection.py

Seth Fitzsimmons (mojodna) · 2026-03-03T20:30:04Z

Roel Bollens (@RoelBollens-TomTom) good finds. I'm working on fixes for the incorrect Address reference (which results from name collisions) and creating pages + links for the Pydantic types.

The reason for the 2 different list representations is NewTypes that wrap list (and annotate with constraints) vs. vanilla lists. If we were to display PhoneNumber as list<T>, we'd lose the ability to link to PhoneNumber (and display its docstring, constraints, and references in more detail). We could render list<EmailStr> as "EmailStr (list, optional)", but then we'd be treating PhoneNumber and EmailStr as the same thing even though one is a list, the other is scalar.

Roel Bollens (@RoelBollens-TomTom) and Adam Lastowka (@Rachmanin0xFF) suggestions for making this less confusing?

Adam Lastowka (Rachmanin0xFF) · 2026-03-03T20:36:18Z

Roel Bollens (Roel Bollens (@RoelBollens-TomTom)) and Adam Lastowka (Adam Lastowka (@Rachmanin0xFF)) suggestions for making this less confusing?

Maybe just a comment at https://github.com/OvertureMaps/schema/pull/451/changes#diff-d3543f3c56213c5ae4cf72e240b850d5cf763f9ceb7d2a9f9cf78c7602075739R110 would be fine.

Seth Fitzsimmons (mojodna) · 2026-03-04T22:13:22Z

I didn't understand the list rendering issue that both Roel and Adam flagged (because I was blind to it in the Markdown output), but after seeing it this morning, I fixed it in 48df0ab (part of this PR).

Victor Schappert (vcschapp) · 2026-03-05T01:47:34Z

One comment about the Description - I haven't got to the code yet.

The first five lines of the Architecture section with its four layers is a super nice compression of info. "Context compaction" to coin a phrase. It primed me to find an explicit hierarchical organization into those 4 layers. Then when I get to What's in the package, it's a big 'ol viewport-filling 22-row table of flag filenames, which is also in the diff.

Would it make sense to group the files into directories/modules based on the layer they belong to? It'd definitely help with the job of climbing up and down the abstraction ladder.

Seth Fitzsimmons (mojodna) · 2026-03-06T17:31:09Z

I'd sketched a reorganization that also included extracting the Markdown generator (and using, guess what, entry points to register codegen targets) into separate packages and was planning on discussing that later. I pulled the split by layer into 1132e48.

Victor Schappert (vcschapp) · 2026-03-09T20:43:07Z

packages/overture-schema-addresses-theme/pyproject.toml

 [project.entry-points."overture.models"]
 "overture:addresses:address" = "overture.schema.addresses:Address"
+
+[[examples.Address]]


Victor Schappert (vcschapp) · 2026-03-09T21:05:02Z

packages/overture-schema-addresses-theme/pyproject.toml

+id = "416ab01c-d836-4c4f-aedc-2f30941ce94d"
+geometry = "POINT (-176.5637854 -43.9471955)"
+country = "NZ"
+postcode = "null"


Couldn't we just omit these values?

Seems like that's "the TOML design choice" anyway so it'd be consistent with Tom's Opinionated Opinion to leave them out, and AFAIK it shouldn't affect either the example validation or the example display in any way...

Other benefits of leaving them out: less code, less need to explain, and no risk of a "null conflation", where an example that's trying to put in the explicit string "null" has it replaced with None.

Done in 952b59b as part of refactoring the example renderer to read from the Pydantic model version (vs. the dict loaded from the TOML).

Victor Schappert (vcschapp) · 2026-03-09T21:09:46Z

packages/overture-schema-codegen/src/overture/schema/codegen/extraction/type_analyzer.py

@@ -0,0 +1,344 @@
+"""Iterative type unwrapping for Pydantic model annotations."""


Very minor suggestion: given the central importance of TypeInfo in the docs, I kept looking for the file that contained it, and couldn't find so had to guess a bit. Would calling this file type_info.py be an improvement overall, given it seems to be the central star around which the other exports orbit?

Victor Schappert (vcschapp) · 2026-03-09T21:37:17Z

packages/overture-schema-codegen/src/overture/schema/codegen/extraction/specs.py

+
+
+@runtime_checkable
+class FeatureSpec(Protocol):


What does Feature signify in this context?

Does it mean Feature in the same sense as used in system/core packages, or is it another sense?

Is there some other name that'd fit and not conflict with Feature? Something like Struct?

It's a thing with a setuptools entry point that's either a (system/core) Feature or a union of them (ModelSpec or UnionSpec). It's a top-level concept that gets its own page with special treatment (vs. BaseModels used as fields, which are nested in the navigation hierarchy because they don't deserve to be as discoverable).

Struct erases the distinction between top-level specs and field specs.

EntryPointSpec is another option, but lines up with our specific discovery implementation. I'm inclined to keep FeatureSpec in the sense that it references what we (broadly) consider a "Feature" (in Overture, but also system).

Victor Schappert (vcschapp) · 2026-03-09T21:42:14Z

packages/overture-schema-codegen/src/overture/schema/codegen/extraction/specs.py

+    source_type: object | None = None
+
+
+@dataclass


Is this just a numeric primitive?

If so the usage seems to conflict with other usages of primitive under the same directory, e.g., primitive_extraction.py also includes geometry. IMO we should aim for consistent name usage.

Aside, in my mental model, a string is also a primitive although the system package doesn't really capture that inclusion.

It is [just a numeric primitive]. Good call, I'll adjust the nomenclature accordingly.

Victor Schappert (vcschapp) · 2026-03-09T21:47:22Z

packages/overture-schema-codegen/src/overture/schema/codegen/cli.py

+@click.option(
+    "--format",
+    "output_format",
+    required=True,
+    type=click.Choice(_OUTPUT_FORMATS),
+    help="Output format",
+)


Does click support short-forms? I'd say --format markdown is the correct canonical argument string, but -f md is going to be a lot more usable. Would be nice to have both.

I agree. I spent approximately no time on the CLI ergonomics (it doesn't even support --type!) with the intent to build a proper subcommand after #449 (and once we know what arguments we actually need).

Victor Schappert (vcschapp) · 2026-03-09T22:05:56Z

packages/overture-schema-codegen/src/overture/schema/codegen/markdown/path_assignment.py

+PRIMITIVES_PAGE = PurePosixPath("system/primitive/primitives.md")
+GEOMETRY_PAGE = PurePosixPath("system/primitive/geometry.md")


Another place where "primitives" and "geometry" are treated as being separately.

Let's figure out ASAP what we need to do to bring everyone's conceptual model into alignment before it balkanizes into a bunch of semi-overlapping usages.

My internal definition has been something like:

A primitive is a fundamental scalar type supported by the platform. Fundamental means that conceptually it is a whole, and not composed of other smaller pieces that can be used independently.

In my mind,the following meet the definition: numeric types, bool, string types, and geometry types; while the following do not: collections; and classes (including enums) apart from the classes that model numbers, strings, geometry.

We could think about alternative terms. For example builtin is often used for this kind of thing (but primitive does better in my mind at capturing the "it's not a struct item").

We can also look at moving things around in system if it helps produce an aligned model.

I think calling just the numbers primitives seems wrong. At that point it's more numerics isn't it?

I didn't think too hard about this, expecting that we'd do a review of how the types manifest in the schema reference from the perspective of the user journey and not only refactor the "list" pages ("primitives" + "geometry", which are admittedly arbitrary groupings) but also the placement and labeling of types that come from core and system. Pydantic types in use (HttpUrl, EmailStr) get their own pages as well, which is arguably weird.

Domain-specific extractors that consume analyze_type() and produce specs: - model_extraction: extract_model() for Pydantic models with MRO-aware field ordering, alias resolution, and recursive sub-model expansion via expand_model_tree() - enum_extraction: extract_enum() for DocumentedEnum classes - newtype_extraction: extract_newtype() for semantic NewTypes - primitive_extraction: extract_primitives() for numeric types with range and precision introspection - union_extraction: extract_union() with field merging across discriminated union variants Shared test fixtures in codegen_test_support.py. Signed-off-by: Seth Fitzsimmons <sethfitz@amazon.com>

Generate prose from extracted constraint data: - field_constraint_description: describe field-level constraints (ranges, patterns, unique items, hex colors) as human-readable notes with NewType source attribution - model_constraint_description: describe model-level constraints (@require_any_of, @radio_group, @min_fields_set, @require_if, @forbid_if) as prose, with consolidation of same-field conditional constraints Signed-off-by: Seth Fitzsimmons <sethfitz@amazon.com>

Determine what artifacts to generate and where they go: - module_layout: compute output directories for entry points, map Python module paths to filesystem output paths via compute_output_dir - path_assignment: build_placement_registry maps types to output file paths. Feature models get {theme}/{slug}/, shared types get types/{subsystem}/, theme-local types nest under their feature or sit flat at theme level - type_collection: discover supplementary types (enums, NewTypes, sub-models) by walking expanded feature trees - link_computation: relative_link() computes cross-page links, LinkContext holds page path + registry for resolving links during rendering Signed-off-by: Seth Fitzsimmons <sethfitz@amazon.com>

Embed JSON example features in [tool.overture-schema.examples] sections. Each example is a complete GeoJSON Feature matching the theme's Pydantic model, used by the codegen example_loader to render example tables in documentation. Signed-off-by: Seth Fitzsimmons <sethfitz@amazon.com>

Jinja2 templates and rendering logic for documentation pages: - markdown_renderer: orchestrates page rendering for features, enums, NewTypes, primitives, and geometry. Recursively expands MODEL-kind fields inline with dot-notation. - markdown_type_format: type string formatting with link-aware rendering via LinkContext - example_loader: loads examples from theme pyproject.toml, validates against Pydantic models, flattens to dot-notation - reverse_references: computes "Used By" cross-references between types and the features that reference them Templates: feature, enum, newtype, primitives, geometry pages. Golden-file snapshot tests verify rendered output stability. Adds renderer-specific fixtures to conftest.py (cli_runner, primitives_markdown, geometry_markdown). Signed-off-by: Seth Fitzsimmons <sethfitz@amazon.com>

Click-based CLI entry point (overture-codegen generate) that wires discovery → extraction → output layout → rendering: - Discovers models via discover_models() entry points - Filters themes, extracts specs, builds placement registry - Renders markdown pages with field tables, examples, cross- references, and sidebar metadata - Supports --theme filtering and --output-dir targeting Integration tests verify extraction against real Overture models (Building, Division, Segment, etc.) to catch schema drift. CLI tests verify end-to-end generation, output structure, and link integrity. Signed-off-by: Seth Fitzsimmons <sethfitz@amazon.com>

Design doc covers the four-layer architecture, analyze_type(), domain-specific extractors, and extension points for new output targets. Walkthrough traces Segment through the full pipeline module-by-module in dependency order, with FeatureVersion as a secondary example for constraint provenance in the type analyzer. README describes the problem (Pydantic flattens domain vocabulary), the "unwrap once, render many" approach, CLI usage, architecture overview, and programmatic API. Signed-off-by: Seth Fitzsimmons <sethfitz@amazon.com>

TypeInfo.literal_value discarded multi-value Literals entirely (Literal["a", "b"] got None). Renamed to literal_values as a tuple of all args so consumers decide presentation. single_literal_value() preserves its contract: returns the value for single-arg Literals, None otherwise. Callers (example_loader, union_extraction) are unchanged. Multi-value Literals render as pipe-separated quoted values in markdown tables: `"a"` \| `"b"`. Signed-off-by: Seth Fitzsimmons <sethfitz@amazon.com>

Replace TypeInfo.is_list: bool with list_depth: int so nested lists like list[NewType("Hierarchy", list[HierarchyItem])] are handled correctly. analyze_type increments list_depth for each list[...] layer instead of setting a boolean. An is_list property preserves the boolean API for depth-unaware consumers. Markdown renderer: format_type and format_underlying_type wrap list_depth times. _expandable_list_suffix returns "[]" per nesting level for dot-notation expansion. Constraint annotation matching strips all trailing "[]" suffixes instead of one. Union extraction: _type_identity uses list_depth (int) instead of is_list (bool) so fields with different nesting depths don't incorrectly deduplicate. Update design doc and walkthrough to reflect list_depth replacing the is_list boolean throughout TypeInfo, _UnwrapState, type formatting, and union deduplication. Signed-off-by: Seth Fitzsimmons <sethfitz@amazon.com>

Replace bare class name keys with TypeIdentity objects across all registries. Two types with the same __name__ from different modules (e.g., Places Address vs Addresses Address) now get separate registry entries and resolve to different output paths. TypeIdentity is a frozen dataclass pairing a unique Python object (class, NewType callable, or union annotation) with its display name. Equality and hashing delegate to object identity so lookups are collision-free regardless of display name. Changes across the pipeline: - ConstraintSource stores source_ref (NewType callable) and source_name instead of a bare name string - type_collection, path_assignment, link_computation, and reverse_references all key on TypeIdentity - primitive_extraction returns TypeIdentity instead of strings - Renderers construct TypeIdentity for link resolution - Each spec type exposes an identity property via _SourceTypeIdentityMixin (or directly for UnionSpec) Signed-off-by: Seth Fitzsimmons <sethfitz@amazon.com>

MinLen/MaxLen: render as prose ("Minimum length: 1") instead of wrapping the entire phrase in backticks. Math notation (≥, <) stays in backticks; English words don't belong there. UniqueItemsConstraint: reword docstring from class-description phrasing ("Ensures all items in a collection are unique") to validation-requirement phrasing ("All items must be unique"), matching model-level constraint tone. String constraints: normalize PhoneNumberConstraint, RegionCodeConstraint, and WikidataIdConstraint docstrings to the "Allows only..." pattern used by all other StringConstraint subclasses. Signed-off-by: Seth Fitzsimmons <sethfitz@amazon.com>

Pydantic types like HttpUrl and EmailStr appear in field annotations but previously rendered as unlinked inline code. Each referenced Pydantic type now gets its own page under pydantic/<module>/ with a description, upstream Pydantic docs link, and Used By section. Discovery is reference-driven: the type collection visitor detects PRIMITIVE-kind types from pydantic modules in expanded feature trees. PydanticTypeSpec joins the SupplementarySpec union and flows through placement, reverse references, and rendering. Linking is registry-driven for all PRIMITIVE-kind types. Any primitive with a page in the placement registry gets linked, whether it's a Pydantic type (individual page) or a registered numeric primitive (aggregate page). This also links int32/float64 to the primitives page, which they weren't before. Shared is_pydantic_sourced() predicate gates collection and reverse reference tracking to pydantic-origin types without restricting the linking mechanism. Signed-off-by: Seth Fitzsimmons <sethfitz@amazon.com>

Remove bbox from default skip keys so it renders in example output like any other field. Signed-off-by: Seth Fitzsimmons <sethfitz@amazon.com>

After resolving type name collisions across themes (101596f), two referrers from different modules can share a display name. The sort key (kind, name) produced ties, and Python's sorted() preserved set iteration order for tied elements -- which depends on id()-based hashing and varies across process invocations. Add the source module as a tiebreaker: (kind, name, module). Expose TypeIdentity.module property to encapsulate the getattr(obj, "__module__") access pattern. Signed-off-by: Seth Fitzsimmons <sethfitz@amazon.com>

Constraint annotations in table description cells ran directly into the preceding description text with only a single <br/>. Double the break so constraints read as a separate paragraph. Signed-off-by: Seth Fitzsimmons <sethfitz@amazon.com>

list[PhoneNumber] rendered as "PhoneNumber (list)" — implying PhoneNumber itself is a list type. The root cause: format_type couldn't distinguish list layers outside a NewType from list layers inside one. Add newtype_outer_list_depth to TypeInfo, snapshotted from list_depth when the type analyzer enters the first NewType. The renderer uses this to choose list<X> syntax (list wraps the NewType) vs a (list) qualifier (NewType wraps a list internally). Non-NewType identities (enums, models) continue using list<X>. Signed-off-by: Seth Fitzsimmons <sethfitz@amazon.com>

_truncate() produced strings up to 103 chars (100 + "..."). Account for the 3-char ellipsis so output stays within the 100-char limit. Signed-off-by: Seth Fitzsimmons <sethfitz@amazon.com>

str() on string list items renders as [a, b], indistinguishable from bare identifiers. repr() renders as ['a', 'b'] so strings are visually distinct from numbers. Signed-off-by: Seth Fitzsimmons <sethfitz@amazon.com>

extract_model() on union members produced ModelSpecs with model=None on MODEL-kind fields. _collect_from_fields then hit the RuntimeError guard when it encountered those unexpanded references. Call expand_model_tree() on each member before walking its fields. No current union members have sub-model fields, so this was latent. Signed-off-by: Seth Fitzsimmons <sethfitz@amazon.com>

flatten_example recursed into all dicts, splitting dict-typed fields like `tags: dict[str, str]` into dot-notation rows. Now collect_dict_paths walks the FieldSpec tree to identify dict-typed field paths, and _flatten_value checks membership before recursing. Indexed runtime paths (items[0].tags) are normalized to schema notation (items[].tags) for matching. The pipeline computes dict_paths from spec.fields and threads them through load_examples. Also: clarify mutual exclusion in type visitor elif chains (reverse_references, type_collection) and rename _TypeIdentity to _TypeShape in union_extraction to avoid shadowing specs.TypeIdentity. Signed-off-by: Seth Fitzsimmons <sethfitz@amazon.com>

Move modules into three sub-packages matching the architecture layers: - extraction/ (14 modules): type analysis, specs, extractors, constraints - layout/ (2 modules): module layout, type collection - markdown/ (6 modules + templates): pipeline, renderer, type formatting, links, paths, reverse references Three modules renamed to drop redundant prefixes: field_constraint_description → extraction/field_constraints model_constraint_description → extraction/model_constraints example_loader → extraction/examples Templates flattened from templates/markdown/ to markdown/templates/. Signed-off-by: Seth Fitzsimmons <sethfitz@amazon.com>

Replace dict-walking flatten machinery with Pydantic model-instance traversal. validate_example returns a BaseModel instance; flatten_model_instance walks it via isinstance checks to produce dot-notation key-value pairs, eliminating the need for external schema information (collect_dict_paths). augment_missing_fields adds cross-arm union fields as None. Remove "null" sentinel convention from TOML examples. Pydantic fills None defaults for omitted fields, making the _denull pipeline stage unnecessary. Fix BBox dict validation (missing return in __get_pydantic_core_schema__), BBox flattening via __slots__ property detection, datetime isoformat rendering, and non-string value truncation for Geometry objects. Signed-off-by: Seth Fitzsimmons <sethfitz@amazon.com>

Replace rST-style double backticks with single backticks across docstrings to match project convention. Preserve double backticks where the wrapped text itself contains backtick characters (literal markdown syntax examples). Fix D301 in type_format.py with a raw docstring for backslash content. Signed-off-by: Seth Fitzsimmons <sethfitz@amazon.com>

Extraction-layer spec describes numeric types exclusively (bounds, float_bits). Frees 'primitive' for the broader system-level taxonomy. Renames: - PrimitiveSpec → NumericSpec - primitive_extraction.py → numeric_extraction.py - extract_primitives → extract_numerics - partition_primitive_and_geometry_names → partition_numeric_and_geometry_types partition_numeric_and_geometry_types moved from numeric_extraction to pipeline — it discovers both numeric and geometry types, so it didn't belong in a module scoped to numeric extraction. Renderer function names and output constants unchanged — those describe rendered output, not the extraction concept. Signed-off-by: Seth Fitzsimmons <sethfitz@amazon.com>

Assert statements used for runtime validation disappear under python -O. Replace with TypeError/ValueError raises in validate_example, TypeIdentity.of, _SourceTypeIdentityMixin, _format_constraint, _linked_type_identity, type_collection. Guard _find_common_base against empty members list. Collapse duplicate datetime.datetime/date branches. Rename _format_field_list → _backtick_join (returns string, not list). Signed-off-by: Seth Fitzsimmons <sethfitz@amazon.com>

Victor Schappert (vcschapp) · 2026-03-11T19:39:02Z

Failing because of:

Run uv run python ./.github/workflows/scripts/package-versions.py compare \
Traceback (most recent call last):
  File "/home/runner/work/schema/schema/./.github/workflows/scripts/package-versions.py", line 154, in <module>
    main()
  File "/home/runner/work/schema/schema/./.github/workflows/scripts/package-versions.py", line 147, in main
    compare(sys.argv[2], sys.argv[3])
  File "/home/runner/work/schema/schema/./.github/workflows/scripts/package-versions.py", line 71, in compare
    combined_keys = sorted(list(set(before_dict.keys()) | set(after_dict.keys())), key=level)
  File "/home/runner/work/schema/schema/./.github/workflows/scripts/package-versions.py", line 69, in level
    raise ValueError(f"Unknown package for level computation: {package}")
ValueError: Unknown package for level computation: overture-schema-codegen
Error: Process completed with exit code 1.

Looks like we just need to add the codegen package into a manifest.

Seth Fitzsimmons (mojodna) added the change type - cosmetic 🌹 Cosmetic change label Feb 27, 2026

Seth Fitzsimmons (mojodna) temporarily deployed to staging February 27, 2026 00:14 — with GitHub Actions Inactive

Seth Fitzsimmons (mojodna) requested review from Adam Lastowka (Rachmanin0xFF), Roel Bollens (RoelBollens-TomTom), Jennings Anderson (jenningsanderson) and Victor Schappert (vcschapp) February 27, 2026 00:19

Seth Fitzsimmons (mojodna) mentioned this pull request Feb 27, 2026

Sync codegen output (2026-02-26) OvertureMaps/docs#278

Merged

Seth Fitzsimmons (mojodna) force-pushed the codegen branch from dafd3d7 to 4198027 Compare February 27, 2026 00:34

Seth Fitzsimmons (mojodna) temporarily deployed to staging February 27, 2026 00:34 — with GitHub Actions Inactive

Seth Fitzsimmons (mojodna) force-pushed the codegen branch from 4198027 to 23b22c7 Compare February 27, 2026 00:37

Seth Fitzsimmons (mojodna) temporarily deployed to staging February 27, 2026 00:38 — with GitHub Actions Inactive

Seth Fitzsimmons (mojodna) force-pushed the codegen branch from 23b22c7 to 8b0d396 Compare February 27, 2026 00:42

Seth Fitzsimmons (mojodna) temporarily deployed to staging February 27, 2026 00:42 — with GitHub Actions Inactive

Adam Lastowka (Rachmanin0xFF) reviewed Feb 28, 2026

View reviewed changes

packages/overture-schema-codegen/src/overture/schema/codegen/extraction/type_analyzer.py Show resolved Hide resolved

Adam Lastowka (Rachmanin0xFF) previously approved these changes Mar 3, 2026

View reviewed changes

Seth Fitzsimmons (mojodna) mentioned this pull request Mar 4, 2026

Sync codegen output (2026-03-03) OvertureMaps/docs#282

Merged

Seth Fitzsimmons (mojodna) dismissed Adam Lastowka (Rachmanin0xFF)’s stale review via 48df0ab March 4, 2026 21:58

Victor Schappert (vcschapp) reviewed Mar 9, 2026

View reviewed changes

Seth Fitzsimmons (mojodna) changed the title ~~(Markdown) code generation~~ [FEATURE](Markdown) code generation Mar 10, 2026

Seth Fitzsimmons (mojodna) force-pushed the codegen branch 3 times, most recently from 89635f2 to a5f64db Compare March 10, 2026 06:56

Victor Schappert (vcschapp) approved these changes Mar 11, 2026

View reviewed changes

Roel Bollens (RoelBollens-TomTom) approved these changes Mar 11, 2026

View reviewed changes

Seth Fitzsimmons (mojodna) added 25 commits March 11, 2026 12:06

fix(codegen): include bbox in examples

e16700e

Remove bbox from default skip keys so it renders in example output like any other field. Signed-off-by: Seth Fitzsimmons <sethfitz@amazon.com>

fix(codegen): include ellipsis in truncation limit

50f4aad

_truncate() produced strings up to 103 chars (100 + "..."). Account for the 3-char ellipsis so output stays within the 100-char limit. Signed-off-by: Seth Fitzsimmons <sethfitz@amazon.com>

fix(codegen): use repr() for list items in examples

667f9ad

str() on string list items renders as [a, b], indistinguishable from bare identifiers. repr() renders as ['a', 'b'] so strings are visually distinct from numbers. Signed-off-by: Seth Fitzsimmons <sethfitz@amazon.com>

Seth Fitzsimmons (mojodna) force-pushed the codegen branch from 038c250 to 86d864a Compare March 11, 2026 19:06

Victor Schappert (vcschapp) merged commit 0cbf205 into dev Mar 11, 2026
5 of 7 checks passed

Victor Schappert (vcschapp) deleted the codegen branch March 11, 2026 19:39

Victor Schappert (vcschapp) mentioned this pull request Mar 11, 2026

[BUG] (Fix) Add codegen package to package version level calculation #465

Merged

		@@ -0,0 +1,344 @@
		"""Iterative type unwrapping for Pydantic model annotations."""

		PRIMITIVES_PAGE = PurePosixPath("system/primitive/primitives.md")
		GEOMETRY_PAGE = PurePosixPath("system/primitive/geometry.md")

Conversation

Seth Fitzsimmons (mojodna) commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Architecture

Changes outside the codegen package

What's in the package

Design decisions worth reviewing

Test plan

Uh oh!

Uh oh!

Roel Bollens (RoelBollens-TomTom) commented Mar 3, 2026

Uh oh!

Adam Lastowka (Rachmanin0xFF) left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Seth Fitzsimmons (mojodna) commented Mar 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Adam Lastowka (Rachmanin0xFF) commented Mar 3, 2026

Uh oh!

Seth Fitzsimmons (mojodna) commented Mar 4, 2026

Uh oh!

Victor Schappert (vcschapp) commented Mar 5, 2026

Uh oh!

Seth Fitzsimmons (mojodna) commented Mar 6, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Victor Schappert (vcschapp) commented Mar 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Seth Fitzsimmons (mojodna) commented Feb 27, 2026 •

edited

Loading

Seth Fitzsimmons (mojodna) commented Mar 3, 2026 •

edited

Loading