Skip to content

genropy/genro-bag

Repository files navigation

genro-bag

PyPI version Tests codecov Documentation Python 3.10+ License

An Intermediate Representation for Structured Data

A Bag is an intermediate representation (IR) for hierarchical data in Python.

What is an Intermediate Representation?

In compiler design, an IR is a data structure that sits between source code and machine code. It captures the essential structure while abstracting away format-specific details, making it easier to analyze, transform, and generate output.

The same principle applies to data: configuration files, API responses, documents, and UI structures all share a common shape—named things containing values, organized hierarchically, with metadata attached—but we typically scatter this across dictionaries, classes, JSON, XML, and database rows.

A Bag provides a canonical representation for this common pattern:

flowchart LR
    subgraph Sources
        JSON[JSON file]
        XML[XML file]
        API[API response]
        DB[Database]
        Input[User input]
    end

    subgraph BAG[" "]
        Tree["Unified tree<br/>of named nodes"]
    end

    subgraph Outputs
        HTML[HTML]
        XMLout[XML]
        DBout[Database]
        JSONout[JSON]
        UI[UI]
    end

    JSON --> Tree
    XML --> Tree
    API --> Tree
    DB --> Tree
    Input --> Tree

    Tree --> HTML
    Tree --> XMLout
    Tree --> DBout
    Tree --> JSONout
    Tree --> UI
Loading

Why an IR?

Decoupling: Your application logic works with one structure, regardless of input/output formats. Change your data source from XML to JSON? Your code doesn't change.

Uniformity: One access pattern (bag['path.to.value']), one way to attach metadata, one subscription model—instead of learning different APIs for each library.

Transformation: Operate on the structure itself: walk the tree, filter nodes, transform values, validate structure—without knowing if it came from a file, an API, or a database.

Round-tripping: Serialize to XML, JSON, or MessagePack and back, preserving types, attributes, and structure—including lazy-loaded values (resolvers).

The Core Model

Every node in a Bag has:

Component Purpose
Label The node's name (key in the hierarchy)
Value The data it holds (any Python value, or another Bag)
Attributes Metadata attached to the node
Tag Optional semantic type (like XML elements)

Access is path-based: bag['config.database.host'] navigates the hierarchy using dot notation.

Progressive Capability

Bag provides four layers—use only what you need:

Layer Purpose Use When
Core Bag Paths, values, attributes, serialization Always
Resolvers Lazy-loaded, computed values API calls, DB queries, expensive computations
Subscriptions React to changes Validation, logging, sync, computed properties
Builders Domain-specific languages HTML, Markdown, XML with structure validation

Interactive Notebooks

Try genro-bag directly in your browser with Google Colab:

Notebook Topics
Open In Colab Core Bag — Paths, values, attributes, serialization
Open In Colab Resolvers — Lazy loading, caching, custom resolvers
Open In Colab Subscriptions — Events, validation, computed properties
Open In Colab Builders — DSL, structure validation, cardinality

Install

pip install genro-bag

Documentation

Resource Description
Full Documentation Complete guide with examples
Why Bag? Detailed comparison with alternatives
Getting Started Learn the core concepts

In This Repository

Directory Description
src/genro_bag/ Core implementation
src/genro_bag/resolvers/ Built-in resolvers (URL, Directory, OpenAPI)
src/genro_bag/builders/ Built-in builders (HTML, Markdown, XSD)
examples/ Usage examples
tests/ Test suite (1500+ tests)
docs/ Sphinx documentation source

Development

pip install -e ".[dev]"
pytest

License

Apache License 2.0 — see LICENSE for details.

Copyright 2025 Softwell S.r.l. — Genropy Team