
feat: add SQLite storage as single source of truth #321

Open
jimisola wants to merge 16 commits into main from feat/313-sqlite-storage

Conversation

@jimisola
Member

@jimisola jimisola commented Mar 13, 2026

Improvements

This PR replaces the hand-built, multi-stage intermediate data pipeline with an in-memory SQLite database as the single source of truth.

Complexity reduction

| Area | Before | After |
|---|---|---|
| Data pipeline stages | 4 (RawDataset → CombinedRawDataset → CombinedIndexedDataset → StatisticsContainer) | 2 (RawDataset → SQLite INSERT → query) |
| Hand-built index dicts | 5 (svcs_from_req, mvrs_from_svc, reqs_from_urn, etc.) | 0 (SQL FK relationships) |
| Filtering implementation | 390-line imperative IndexedDatasetFilterProcessor with recursive cleanup | SQL DELETE with ON DELETE CASCADE |
| EL transformers | 2 empty subclasses + base class evaluating per-item in Python | 1 compiler emitting SQL WHERE clauses |
| Status table columns | 13 (5 sub-columns × 2 groups + 3) with 97-line merged header renderer | 5 flat columns with inline formatted cells |
| Statistics computation | 313-line StatisticsGenerator with manual aggregation loops | SQL queries via StatisticsService |

What was removed (1,181 lines)

  • StatisticsContainer / TotalStatisticsItem (105 lines)
  • StatisticsGenerator (313 lines)
  • CombinedIndexedDatasetGenerator (302 lines)
  • IndexedDatasetFilterProcessor (390 lines)
  • CombinedIndexedDataset (53 lines)
  • Per-item EL transformers (18 lines)

What replaced it (1,567 lines)

  • Storage layer (1,059 lines): schema, database wrapper, populator, SQL filter processor, EL-to-SQL compiler, repository, authorizer, pipeline
  • Services layer (508 lines): StatisticsService, ExportService

The net increase of 386 lines trades imperative index-maintenance code for a declarative storage layer with FK constraints, CASCADE rules, ACID semantics, and a security authorizer.
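To illustrate the "SQL DELETE with ON DELETE CASCADE" row, here is a minimal sketch using Python's built-in sqlite3 module. The two-table schema and its column names are assumptions made for the example and are not the PR's actual 13-table schema.

```python
import sqlite3

# Hypothetical two-table schema, for illustration only.
SCHEMA = """
CREATE TABLE requirements (
    urn TEXT NOT NULL,
    id  TEXT NOT NULL,
    PRIMARY KEY (urn, id)
);
CREATE TABLE svcs (
    urn     TEXT NOT NULL,
    id      TEXT NOT NULL,
    req_urn TEXT NOT NULL,
    req_id  TEXT NOT NULL,
    PRIMARY KEY (urn, id),
    FOREIGN KEY (req_urn, req_id)
        REFERENCES requirements (urn, id) ON DELETE CASCADE
);
"""

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

conn.executemany(
    "INSERT INTO requirements VALUES (?, ?)",
    [("ms-001", "REQ_010"), ("ms-001", "REQ_020")],
)
conn.executemany(
    "INSERT INTO svcs VALUES (?, ?, ?, ?)",
    [
        ("ms-001", "SVC_010", "ms-001", "REQ_010"),
        ("ms-001", "SVC_020", "ms-001", "REQ_020"),
    ],
)

# Filtering out a requirement is a single DELETE; the cascade removes its SVC.
conn.execute(
    "DELETE FROM requirements WHERE urn = ? AND id = ?", ("ms-001", "REQ_020")
)
remaining_svcs = [row[0] for row in conn.execute("SELECT id FROM svcs ORDER BY id")]
print(remaining_svcs)
```

The point of the sketch is that the cleanup of dependent rows is declared once in the schema instead of being re-implemented in a recursive Python walk.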


Summary

Replace intermediate data structures (CombinedIndexedDataset, StatisticsContainer, StatisticsGenerator) with an in-memory SQLite database as the single source of truth. All commands (status, export, report) now query the database through a RequirementsRepository and service layer instead of traversing in-memory dicts.

Phases completed in this PR

  • Phase 1-2: SQLite infrastructure — schema (13 tables with CHECK constraints, FK indexes), RequirementsDatabase, DatabasePopulator, authorizer (security sandbox)
  • Phase 3: EL-to-SQL compiler and DatabaseFilterProcessor for applying requirement/SVC filters via SQL DELETE
  • Phase 4: Service layer, repository, and command rewrites
    • RequirementsRepository — data access layer with all entity queries and test result resolution
    • StatisticsService — replaces StatisticsGenerator + StatisticsContainer with frozen TestStats/RequirementStatus/TotalStats dataclasses
    • ExportService — replaces GenerateJsonCommand inline logic with --req-ids/--svc-ids filtering
    • build_database() — pipeline helper: Location → parse → populate DB → filter → validate
    • LifecycleValidator and GroupByOrganizor migrated from CombinedIndexedDataset to RequirementsRepository
    • Multi-pass DB population to handle cross-URN FK constraints
  • JSON Schemas: status_output.schema.json and export_output.schema.json for output validation
  • Status table redesign: Collapsed 13-column table (5 sub-columns per test group) into 5 columns with compact inline formatting — positionally-aligned counts with colored numbers, dim dashes for zeros, and color-coded legend
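As a rough illustration of the Phase 1-2 ideas (CHECK constraints plus an authorizer acting as a sandbox), the sketch below uses sqlite3's `set_authorizer`. The allowed significance values and the policy of denying only ATTACH are assumptions for the example; the PR's real authorizer and schema may differ.

```python
import sqlite3


def authorizer(action, arg1, arg2, db_name, source):
    # Assumed policy: deny attaching other database files, allow everything else.
    if action == sqlite3.SQLITE_ATTACH:
        return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK


# isolation_level=None keeps the connection in autocommit mode for this demo.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    """
    CREATE TABLE requirements (
        id           TEXT PRIMARY KEY,
        significance TEXT NOT NULL
            CHECK (significance IN ('shall', 'should', 'may'))
    )
    """
)
conn.set_authorizer(authorizer)

conn.execute("INSERT INTO requirements VALUES ('REQ_010', 'shall')")


def rejected(sql: str) -> bool:
    """Return True if the database refuses to run the statement."""
    try:
        conn.execute(sql)
    except sqlite3.DatabaseError:  # IntegrityError is a subclass
        return True
    return False


# Violates the CHECK constraint: 'maybe' is not an allowed value.
check_blocked = rejected("INSERT INTO requirements VALUES ('REQ_020', 'maybe')")
# Denied by the authorizer callback before the statement runs.
attach_blocked = rejected("ATTACH DATABASE ':memory:' AS other")
row_count = conn.execute("SELECT COUNT(*) FROM requirements").fetchone()[0]
```

Both kinds of rejection surface as sqlite3 exceptions, so invalid data and disallowed statements fail loudly instead of silently entering the database.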

Status table (before → after)

Before: 13 columns with complex merged header spanning T/P/F/S/M sub-columns for each test group.

After: 5 clean columns — each test cell shows inline counts with colors (green=passed, red=failed, yellow=skipped, orange=missing):

status-table
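A compact cell of this kind could be rendered roughly as follows. This is only a sketch: the `CellCounts` dataclass, the column width, and the ANSI color codes are assumptions, and the PR's own `_format_test_cell()` may be implemented differently.

```python
from dataclasses import dataclass

RESET = "\033[0m"
DIM = "\033[2m"
# Assumed ANSI codes; orange uses a 256-color escape.
COLORS = {
    "passed": "\033[32m",         # green
    "failed": "\033[31m",         # red
    "skipped": "\033[33m",        # yellow
    "missing": "\033[38;5;208m",  # orange
}
FIELDS = ("total", "passed", "failed", "skipped", "missing")  # T P F S M


@dataclass(frozen=True)
class CellCounts:  # hypothetical stand-in for the per-group test stats
    total: int = 0
    passed: int = 0
    failed: int = 0
    skipped: int = 0
    missing: int = 0


def format_test_cell(counts: CellCounts, width: int = 3, color: bool = True) -> str:
    """Render T P F S M as fixed-width, positionally aligned counts."""
    parts = []
    for name in FIELDS:
        value = getattr(counts, name)
        # Zeros become dashes so non-zero counts stand out.
        text = ("-" if value == 0 else str(value)).rjust(width)
        if color:
            style = DIM if value == 0 else COLORS.get(name, "")
            if style:
                text = f"{style}{text}{RESET}"
        parts.append(text)
    return " ".join(parts)


plain = format_test_cell(CellCounts(total=5, passed=3, failed=1, missing=1), color=False)
colored = format_test_cell(CellCounts(total=5, passed=3, failed=1, missing=1))
print(colored)
```

Because every count is right-aligned to the same width, the columns line up across rows without needing merged headers.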

Key design decisions

  • Pipeline: Location → CombinedRawDatasetsGenerator(database=db) → DatabaseFilterProcessor → RequirementsRepository → commands
  • Old code (StatisticsGenerator, StatisticsContainer, CombinedIndexedDataset, etc.) kept for Phase 5 cleanup
  • Report uses _TotalStatsTemplateAdapter to bridge new attribute names to existing Jinja2 templates
  • References in reports are now sorted for deterministic output
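The pipeline shape can be sketched as a small context manager (a later commit in this PR makes build_database() a context manager). Everything below is a simplified assumption: the real helper parses a Location and runs the populator, filter processor, and validators, whereas this sketch takes rows and excluded IDs directly.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterable, Iterator, Tuple


@contextmanager
def build_database(
    rows: Iterable[Tuple[str, str]],
    excluded_ids: Iterable[str] = (),
) -> Iterator[sqlite3.Connection]:
    """Hypothetical pipeline: create schema, populate, filter, then hand out the DB."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE requirements (urn TEXT, id TEXT, PRIMARY KEY (urn, id))"
        )
        # Populate step.
        conn.executemany("INSERT INTO requirements VALUES (?, ?)", rows)
        # Filter step: remove excluded requirements with SQL DELETE.
        conn.executemany(
            "DELETE FROM requirements WHERE id = ?", [(i,) for i in excluded_ids]
        )
        conn.commit()
        yield conn
    finally:
        # The in-memory database is discarded when the block exits.
        conn.close()


with build_database(
    [("ms-001", "REQ_010"), ("ms-001", "REQ_020")], excluded_ids=["REQ_020"]
) as db:
    ids = [row[0] for row in db.execute("SELECT id FROM requirements ORDER BY id")]
print(ids)
```

Commands then only see a fully populated and filtered database, and the connection is always closed, even if a command raises.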

Closes

Related

Test plan

  • All 300 unit tests pass
  • Lint clean (flake8)
  • Status output verified for test_standard, test_basic fixtures, and reqstool-demo
  • Report output matches main (only difference: sorted reference ordering — improvement)
  • Status JSON validates against status_output.schema.json
  • Export JSON validates against export_output.schema.json
  • JSON output unchanged by status table redesign (separate code path)
  • Verify against reqstool-demo (requires ./mvnw verify in sibling project)
  • Manual CLI testing per TEST_MATRIX.md scenarios

🤖 Generated with Claude Code

jimisola and others added 6 commits March 12, 2026 20:47
…ecurity (#313)

Add in-memory SQLite database as foundation for replacing intermediate
data structures (CombinedRawDataset, CombinedIndexedDataset). Includes
schema with 13 tables, FK indexes, authorizer for security, insert API,
and DatabasePopulator that converts RawDataset to SQL rows.

Phase 1+2 of the migration plan — pure addition with opt-in DB param
on CombinedRawDatasetsGenerator.

Add SQL CHECK constraints to enforce valid values at the database level
for significance, lifecycle_state, implementation, category,
verification_type, test status, variant, and element_kind columns.

Add ELToSQLCompiler that translates Lark expression language parse trees
into SQL WHERE clauses with parameterized queries. Add DatabaseFilterProcessor
that replicates the recursive DAG-walk filter logic using SQL DELETEs
with cascade cleanup of orphaned SVCs and MVRs.
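The idea of compiling a filter expression into a parameterized WHERE clause can be sketched as below. Note the assumptions: the tree is a plain nested tuple rather than a Lark parse tree, and the operators and column whitelist are invented for the example, so this is not the PR's ELToSQLCompiler.

```python
import sqlite3

# Column names cannot be bound as parameters, so they are checked against a whitelist.
ALLOWED_COLUMNS = {"id", "significance"}


def compile_expr(node):
    """Compile a tiny tuple-based expression tree into (where_sql, params)."""
    op = node[0]
    if op in ("and", "or"):
        left_sql, left_params = compile_expr(node[1])
        right_sql, right_params = compile_expr(node[2])
        return f"({left_sql} {op.upper()} {right_sql})", left_params + right_params
    if op == "not":
        sql, params = compile_expr(node[1])
        return f"(NOT {sql})", params
    if op in ("==", "!="):
        _, column, value = node
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {column!r}")
        sql_op = "=" if op == "==" else "<>"
        # Values always travel as bound parameters, never as SQL text.
        return f"{column} {sql_op} ?", [value]
    raise ValueError(f"unknown operator: {op!r}")


tree = ("or", ("==", "id", "REQ_010"), ("==", "significance", "may"))
where_sql, params = compile_expr(tree)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requirements (id TEXT PRIMARY KEY, significance TEXT)")
conn.executemany(
    "INSERT INTO requirements VALUES (?, ?)",
    [("REQ_010", "shall"), ("REQ_020", "should"), ("REQ_030", "may")],
)
# Keep only the rows the expression selects by deleting everything else.
conn.execute(f"DELETE FROM requirements WHERE NOT {where_sql}", params)
remaining = [row[0] for row in conn.execute("SELECT id FROM requirements ORDER BY id")]
print(where_sql, params, remaining)
```

Only whitelisted identifiers and fixed operator strings reach the SQL text, which is what keeps user-supplied filter values from being interpreted as SQL.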
…ommands (#313)

Phase 4 of the SQLite storage migration: replace CombinedIndexedDataset
with direct database queries through RequirementsRepository and service
layer (StatisticsService, ExportService).

- Add RequirementsRepository as the data access layer wrapping RequirementsDatabase
- Add StatisticsService with TestStats/RequirementStatus/TotalStats dataclasses
- Add ExportService for JSON export with --req-ids/--svc-ids filtering
- Add build_database() pipeline helper in storage/pipeline.py
- Rewrite status, export, and report commands to use DB pipeline
- Migrate LifecycleValidator from CombinedIndexedDataset to RequirementsRepository
- Migrate GroupByOrganizor from CombinedIndexedDataset to RequirementsRepository
- Fix multi-pass DB population to satisfy FK constraints across URNs
- Update all affected tests for new interfaces

Signed-off-by: Jimisola Laursen <jimisola@jimisola.com>
…file names (#313)

- Make build_database() a context manager; update all commands to use `with` blocks
- Replace Utils dict helpers with collections.defaultdict in parsing graph
- Delete unused CombinedIndexedDataset, statistics_container, statistics_generator,
  indexed_dataset_filter_processor, and 5 dead Utils methods
- Remove empty RequirementsELTransformer/SVCsELTransformer subclasses
- Rename files to match their primary class names (el_compiler → el_to_sql_compiler,
  filter_processor → database_filter_processor, generic_el → generic_el_transformer)
- Add unit tests for RequirementsRepository, StatisticsService, ExportService, pipeline
- Update CLAUDE.md architecture docs to reflect SQLite pipeline

Signed-off-by: jimisola <jimisola@jimisola.com>
…lls (#313)

Replace 13-column status table (5 sub-columns per test group) with 5
columns using compact inline formatting. Each test cell shows
positionally-aligned counts (T P F S M) with colored numbers and dim
dashes for zeros. Remove merged header complexity.

- Add _format_test_cell() for single-cell test stats rendering
- Use orange for missing, yellow for skipped (was both red)
- Empty cell for not_applicable (was ambiguous dash)
- Color-coded legend
- Delete _build_merged_headers, _parse_col_widths,
  _replace_header_with_merged, _format_cell

Signed-off-by: jimisola <jimisola@jimisola.com>
@jimisola jimisola changed the title from "feat: add SQLite storage as single source of truth (#313)" to "feat: add SQLite storage as single source of truth" on Mar 13, 2026
@jimisola
Member Author

image

Signed-off-by: jimisola <jimisola@jimisola.com>
@jimisola
Member Author

@Jonas-Werne @lfvdavid It runs, the regression tests pass, etc., so please try it locally.

Jimisola Laursen added 2 commits March 13, 2026 18:07
#313)

Fix two bugs:
- TotalStats.missing_automated_tests and missing_manual_tests were
  never aggregated from per-requirement stats, always reporting 0.
  Now accumulated from each requirement's TestStats after calculation.
- Dangling FK references (e.g. SVC referencing non-existent requirement)
  crashed with IntegrityError. Now caught gracefully with warnings,
  allowing semantic validation to report all errors.

Signed-off-by: jimisola <jimisola@jimisola.com>
Regression testing verified directly against main. Baselines were
from pre-Pydantic-v2 and are no longer needed.

Signed-off-by: jimisola <jimisola@jimisola.com>
@jimisola
Member Author

Regression Test Results

Full regression comparison of main vs feat/313-sqlite-storage across 3 datasets: test_standard/baseline/ms-001, test_basic/baseline/ms-101, and reqstool-demo.

Reports (asciidoc + markdown) — 6/6 cosmetic only

All report diffs are cosmetic:

  • Extra blank line after title
  • Trailing whitespace removed from header lines
  • Sorted reference ordering (e.g. ext-001:REQ_ext001_100, ms-001:REQ_010 instead of ms-001:REQ_010, ext-001:REQ_ext001_100) — alphabetical sort, an improvement

No data value changes in any report output.

Export JSON — intentional schema changes

All diffs are the new schema per #315:

  • metadata wrapper with renamed keys (initial_urn, import_graph)
  • urn/id split (was composite "id": "ms-001:REQ_010")
  • implementation → implementation_type
  • annotations_impls/annotations_tests → annotations.implementations/annotations.tests
  • automated_test_result → test_results
  • Internal lookup maps removed (reqs_from_urn, svcs_from_req, mvrs_from_svc, etc.)
  • Clean enum values ("effective" instead of jsonpickle format)
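For consumers of the old export format, the urn/id split could be handled with a small helper like the one below. This is an illustrative sketch only: the function name is hypothetical and it assumes the URN itself contains no colon.

```python
from typing import Dict


def split_composite_id(composite: str) -> Dict[str, str]:
    """Split an old-style composite ID such as 'ms-001:REQ_010' into urn and id."""
    # Assumption: the first colon separates the URN from the local ID.
    urn, separator, local_id = composite.partition(":")
    if not separator or not urn or not local_id:
        raise ValueError(f"not a composite ID: {composite!r}")
    return {"urn": urn, "id": local_id}


new_style = split_composite_id("ms-001:REQ_010")
print(new_style)
```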

Status JSON — intentional schema changes

Same schema redesign as export: cleaner naming, nested totals structure, metadata section.

Bugs found and fixed

  1. "SVCs missing tests/MVRs" regression: TotalStats.missing_automated_tests and missing_manual_tests were never aggregated from per-requirement stats, always reporting 0. Fixed by accumulating from each requirement's TestStats in _calculate(). Counts now match main (e.g. test_standard: 2/2).

  2. FK crash on test_errors dataset — Dangling FK references (SVC→non-existent REQ, MVR→non-existent SVC, annotations→non-existent targets) caused sqlite3.IntegrityError crash. Fixed by catching IntegrityError in insert_svc, insert_mvr, insert_annotation_impl, and insert_annotation_test with warning-level logging. Semantic validation now runs and reports all errors as before.
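The second fix follows a common pattern: catch sqlite3.IntegrityError around the insert, log a warning, and carry on so that later validation can report the problem. The sketch below assumes a simplified two-table schema and a hypothetical `insert_svc` signature; it is not the PR's actual code.

```python
import logging
import sqlite3

logger = logging.getLogger("storage_sketch")  # hypothetical logger name

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE requirements (id TEXT PRIMARY KEY);
    CREATE TABLE svcs (
        id     TEXT PRIMARY KEY,
        req_id TEXT NOT NULL REFERENCES requirements (id)
    );
    """
)


def insert_svc(svc_id: str, req_id: str) -> bool:
    """Insert an SVC; return False instead of crashing on a dangling reference."""
    try:
        # The connection context manager commits on success, rolls back on error.
        with conn:
            conn.execute("INSERT INTO svcs VALUES (?, ?)", (svc_id, req_id))
    except sqlite3.IntegrityError as exc:
        logger.warning("Skipping SVC %s -> %s: %s", svc_id, req_id, exc)
        return False
    return True


with conn:
    conn.execute("INSERT INTO requirements VALUES ('REQ_010')")

ok = insert_svc("SVC_010", "REQ_010")
dangling = insert_svc("SVC_404", "REQ_MISSING")  # references a non-existent REQ
svc_count = conn.execute("SELECT COUNT(*) FROM svcs").fetchone()[0]
```

The dangling row is skipped rather than stored, so the database stays consistent while the warning preserves the information for error reporting.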

Stale baselines removed

The baselines/ directory was captured from pre-Pydantic-v2 main (784f77b) and was outdated. Removed after verifying regression directly against current main.

Jimisola Laursen and others added 6 commits March 13, 2026 18:09
Signed-off-by: jimisola <jimisola@jimisola.com>
Signed-off-by: jimisola <jimisola@jimisola.com>
Extract helper methods from ExportService.to_export_dict (C901: 21→<10),
StatisticsService._calculate (C901: 17→<10), and
DatabasePopulator.populate_from_raw_dataset (C901: 14→<10) to satisfy
flake8 C901 complexity threshold.

Signed-off-by: jimisola <jimisola@jimisola.com>
The top-level `from reqstool.common.utils import Utils` was incorrectly
removed in the dead-code cleanup commit, causing NameError when running
as an installed package (the conditional import only covers direct exec).

Signed-off-by: jimisola <jimisola@jimisola.com>
…unts (#313)

Replace CRD→CID pipeline in all three commands (status, export, report)
with direct DB queries via RequirementsRepository and service layer.
Fix StatisticsService undercounting missing automated tests and MVRs
by aggregating from per-requirement stats instead of global annotation scan.

Signed-off-by: Jimisola Laursen <jimisola@jimisola.com>
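The aggregation fix amounts to summing the per-requirement counts into the totals. A minimal sketch with frozen dataclasses follows; the class and field layout are assumptions and are simpler than the PR's TestStats/TotalStats.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RequirementCounts:  # hypothetical per-requirement stats
    missing_automated_tests: int = 0
    missing_manual_tests: int = 0


@dataclass(frozen=True)
class Totals:  # hypothetical totals object
    missing_automated_tests: int
    missing_manual_tests: int


def aggregate(per_requirement: Iterable[RequirementCounts]) -> Totals:
    """Build totals by summing each requirement's counts."""
    items = list(per_requirement)
    return Totals(
        missing_automated_tests=sum(i.missing_automated_tests for i in items),
        missing_manual_tests=sum(i.missing_manual_tests for i in items),
    )


totals = aggregate([RequirementCounts(1, 0), RequirementCounts(1, 2)])
print(totals)
```

Deriving the totals from the per-requirement values means the two can never disagree, which is the failure mode the commit describes.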
@Jonas-Werne Jonas-Werne left a comment

Image

I removed the commented-out import from the demo project and got this, and it did not get an enumerated value.

But more importantly, with the current representation you can't see the ID of the requirement, so this will not work.

I like the enumerations, but they can't replace the ID.

We could combine the URN and ID columns into one that contains the full ID (reqstool-demo:REQ_001, ext-001:REQ_001, etc.). Then we could rename the ID column to STATUS and have it show the enumerated status.

@jimisola
Member Author

> I removed the commented-out import from the demo project and got this, and it did not get an enumerated value.

What enumerated value are you referring to?

> But more importantly, with the current representation you can't see the ID of the requirement, so this will not work.

REQ_PASS, REQ_MANUAL_FAIL etc are the IDs of those requirements as per: https://github.com/reqstool/reqstool-demo/blob/main/docs/reqstool/requirements.yml

What do you mean that you can't see the ID?

@jimisola
Member Author

We should add @requirements that are only implemented once to have that case as well:

https://github.com/reqstool/reqstool-demo/blob/5c77cbb39f442955f25d2e7023c442024743e72c/src/main/java/io/github/reqstool/example/demo/RequirementsExample.java#L5-L8

Or rather, remove the class-level annotations except for one requirement that we name accordingly, e.g. REQ_IMPLEMENTED_2_TIMES.

@Jonas-Werne

> > I removed the commented-out import from the demo project and got this, and it did not get an enumerated value.

> What enumerated value are you referring to?

> > But more importantly, with the current representation you can't see the ID of the requirement, so this will not work.

> REQ_PASS, REQ_MANUAL_FAIL etc are the IDs of those requirements as per: https://github.com/reqstool/reqstool-demo/blob/main/docs/reqstool/requirements.yml

> What do you mean that you can't see the ID?

My mistake, I mistook the ID for some kind of status of the requirement, as if REQ_PASS meant everything was done and REQ_NOT_IMPLEMENTED meant that you have not annotated the code.

@jimisola
Member Author

> > > I removed the commented-out import from the demo project and got this, and it did not get an enumerated value.

> > What enumerated value are you referring to?

> > > But more importantly, with the current representation you can't see the ID of the requirement, so this will not work.

> > REQ_PASS, REQ_MANUAL_FAIL etc are the IDs of those requirements as per: https://github.com/reqstool/reqstool-demo/blob/main/docs/reqstool/requirements.yml
> > What do you mean that you can't see the ID?

> My mistake, I mistook the ID for some kind of status of the requirement, as if REQ_PASS meant everything was done and REQ_NOT_IMPLEMENTED meant that you have not annotated the code.

No worries. I appreciate you taking a look. I thought that the REQ_nnn IDs were not as self-explanatory as the new IDs are.

@Jonas-Werne Jonas-Werne self-requested a review March 15, 2026 20:31
@Jonas-Werne

I think @lfvdavid has the best data to regression test this. Maybe we should generate more complex data in the demo project.

It should cover all levels (microservice, system, external) and include filtering of requirements and also some MVRs.

@Jonas-Werne Jonas-Werne dismissed their stale review March 15, 2026 20:36

No valid change request

@jimisola
Member Author

> I think @lfvdavid has the best data to regression test this. Maybe we should generate more complex data in the demo project.

> It should cover all levels (microservice, system, external) and include filtering of requirements and also some MVRs.

Ideally, I don't want the demo to be too complex. It should just demonstrate the basics for new users. What do you think?

I'd rather have a more complex example directly in reqstool-client then. And we still need to dogfood reqstool-client and others with reqstool.

Signed-off-by: jimisola <jimisola@users.noreply.github.com>