feat: add SQLite storage as single source of truth by jimisola · Pull Request #321 · reqstool/reqstool-client

jimisola · 2026-03-13T09:46:09Z

Improvements

This PR replaces the hand-built, multi-stage intermediate data pipeline with an in-memory SQLite database as the single source of truth.

Complexity reduction

| Area | Before | After |
|---|---|---|
| Data pipeline stages | 4 (RawDataset → CombinedRawDataset → CombinedIndexedDataset → StatisticsContainer) | 2 (RawDataset → SQLite INSERT → query) |
| Hand-built index dicts | 5 (`svcs_from_req`, `mvrs_from_svc`, `reqs_from_urn`, etc.) | 0 (replaced by SQL FK relationships) |
| Filtering implementation | 390-line imperative `IndexedDatasetFilterProcessor` with recursive cleanup | SQL DELETE with ON DELETE CASCADE |
| EL transformers | 2 empty subclasses + base class evaluating per-item in Python | 1 compiler emitting SQL WHERE clauses |
| Status table columns | 13 (5 sub-columns × 2 groups + 3) with 97-line merged header renderer | 5 flat columns with inline formatted cells |
| Statistics computation | 313-line `StatisticsGenerator` with manual aggregation loops | SQL queries via `StatisticsService` |
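As a rough illustration of the "index dicts → FK relationships" and "ON DELETE CASCADE" rows, here is a minimal sketch using Python's standard `sqlite3` module. The three tables (`requirements`, `svcs`, `svc_requirement_links`) and the helper names are hypothetical simplifications, not the PR's 13-table schema. Two SQLite details matter: foreign keys are only enforced after `PRAGMA foreign_keys = ON`, and a cascade removes link rows but not an SVC that is left without any requirement, so this sketch cleans up such orphans with a separate DELETE.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE requirements (
    urn TEXT NOT NULL,
    id  TEXT NOT NULL,
    PRIMARY KEY (urn, id)
);
CREATE TABLE svcs (
    urn TEXT NOT NULL,
    id  TEXT NOT NULL,
    PRIMARY KEY (urn, id)
);
-- Many-to-many: one SVC may verify several requirements.
CREATE TABLE svc_requirement_links (
    svc_urn TEXT NOT NULL,
    svc_id  TEXT NOT NULL,
    req_urn TEXT NOT NULL,
    req_id  TEXT NOT NULL,
    FOREIGN KEY (svc_urn, svc_id) REFERENCES svcs (urn, id) ON DELETE CASCADE,
    FOREIGN KEY (req_urn, req_id) REFERENCES requirements (urn, id) ON DELETE CASCADE
);
-- Indexes on the FK columns keep reverse lookups and cascades cheap.
CREATE INDEX idx_links_req ON svc_requirement_links (req_urn, req_id);
CREATE INDEX idx_links_svc ON svc_requirement_links (svc_urn, svc_id);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # FK enforcement is off by default
    conn.executescript(SCHEMA)
    return conn


def svcs_for_requirement(conn: sqlite3.Connection, urn: str, req_id: str) -> List[Tuple[str, str]]:
    # A join replaces a hand-maintained svcs_from_req dict.
    return conn.execute(
        "SELECT s.urn, s.id FROM svcs AS s "
        "JOIN svc_requirement_links AS l ON l.svc_urn = s.urn AND l.svc_id = s.id "
        "WHERE l.req_urn = ? AND l.req_id = ? ORDER BY s.urn, s.id",
        (urn, req_id),
    ).fetchall()


def delete_requirement(conn: sqlite3.Connection, urn: str, req_id: str) -> None:
    # ON DELETE CASCADE removes the requirement's link rows...
    conn.execute("DELETE FROM requirements WHERE urn = ? AND id = ?", (urn, req_id))
    # ...but SVCs left with no links at all are removed explicitly.
    conn.execute(
        "DELETE FROM svcs WHERE NOT EXISTS ("
        "SELECT 1 FROM svc_requirement_links AS l "
        "WHERE l.svc_urn = svcs.urn AND l.svc_id = svcs.id)"
    )
```

With FK enforcement on, a lookup that used to read a dict becomes a join, and deleting a requirement cannot leave a dangling link row behind.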

What was removed (1,181 lines)

StatisticsContainer / TotalStatisticsItem (105 lines)
StatisticsGenerator (313 lines)
CombinedIndexedDatasetGenerator (302 lines)
IndexedDatasetFilterProcessor (390 lines)
CombinedIndexedDataset (53 lines)
Per-item EL transformers (18 lines)

What replaced it (1,567 lines)

Storage layer (1,059 lines): schema, database wrapper, populator, SQL filter processor, EL-to-SQL compiler, repository, authorizer, pipeline
Services layer (508 lines): StatisticsService, ExportService

The net increase of ~386 lines trades imperative index-maintenance code for a declarative storage layer with FK constraints, CASCADE rules, ACID semantics, and a security authorizer.
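The security authorizer can be sketched with `sqlite3.Connection.set_authorizer`, which SQLite calls for each action while it prepares a statement. The deny-list below (ATTACH, DETACH, PRAGMA) is an assumption chosen for illustration; the PR's actual rules are not listed in this description.

```python
import sqlite3

# Hypothetical deny-list for illustration only.
DENIED_ACTIONS = {sqlite3.SQLITE_ATTACH, sqlite3.SQLITE_DETACH, sqlite3.SQLITE_PRAGMA}


def authorizer(action: int, arg1, arg2, db_name, trigger_or_view) -> int:
    # Invoked by SQLite for every action while a statement is being prepared.
    if action in DENIED_ACTIONS:
        return sqlite3.SQLITE_DENY  # preparing the statement then fails
    return sqlite3.SQLITE_OK


def open_sandboxed_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Configure PRAGMAs first: once the authorizer is installed they are denied.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.set_authorizer(authorizer)
    return conn
```

Ordinary DDL and DML still work; only the listed actions are rejected with an exception.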

Summary

Replace intermediate data structures (CombinedIndexedDataset, StatisticsContainer, StatisticsGenerator) with an in-memory SQLite database as the single source of truth. All commands (status, export, report) now query the database through a RequirementsRepository and service layer instead of traversing in-memory dicts.

Phases completed in this PR

Phases 1-2: SQLite infrastructure, comprising the schema (13 tables with CHECK constraints and FK indexes), RequirementsDatabase, DatabasePopulator, and the authorizer (security sandbox)
Phase 3: EL-to-SQL compiler and DatabaseFilterProcessor for applying requirement/SVC filters via SQL DELETE
Phase 4: Service layer, repository, and command rewrites
- RequirementsRepository — data access layer with all entity queries and test result resolution
- StatisticsService — replaces StatisticsGenerator + StatisticsContainer with frozen TestStats/RequirementStatus/TotalStats dataclasses
- ExportService — replaces GenerateJsonCommand inline logic with --req-ids/--svc-ids filtering
- build_database() — pipeline helper: Location → parse → populate DB → filter → validate
- LifecycleValidator and GroupByOrganizor migrated from CombinedIndexedDataset to RequirementsRepository
- Multi-pass DB population to handle cross-URN FK constraints
JSON Schemas: status_output.schema.json and export_output.schema.json for output validation
Status table redesign: Collapsed 13-column table (5 sub-columns per test group) into 5 columns with compact inline formatting — positionally-aligned counts with colored numbers, dim dashes for zeros, and color-coded legend
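The frozen dataclasses from Phase 4, combined with the later fix that accumulates `missing_automated_tests`/`missing_manual_tests` from each requirement's `TestStats`, could look roughly like this. Only the class names and the two `missing_*` fields come from this PR; every other field and the `total_stats()` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TestStats:
    # Hypothetical fields mirroring the T/P/F/S/M columns.
    total: int = 0
    passed: int = 0
    failed: int = 0
    skipped: int = 0
    missing: int = 0


@dataclass(frozen=True)
class RequirementStatus:
    urn: str
    id: str
    automated: TestStats
    manual: TestStats


@dataclass(frozen=True)
class TotalStats:
    requirements: int
    missing_automated_tests: int
    missing_manual_tests: int


def total_stats(statuses: Sequence[RequirementStatus]) -> TotalStats:
    # Totals are derived from the per-requirement stats, so they cannot
    # silently stay at 0 while individual rows report missing tests.
    return TotalStats(
        requirements=len(statuses),
        missing_automated_tests=sum(s.automated.missing for s in statuses),
        missing_manual_tests=sum(s.manual.missing for s in statuses),
    )
```

Freezing the dataclasses means commands can share results without any of them mutating the counts afterwards.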

Status table (before → after)

Before: 13 columns with complex merged header spanning T/P/F/S/M sub-columns for each test group.

After: 5 clean columns, where each test cell shows inline counts in color (green=passed, red=failed, yellow=skipped, orange=missing).
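A minimal sketch of the compact cell format, written as a hypothetical stand-in for `_format_test_cell()`: counts are right-aligned at fixed positions in T P F S M order, zeros become dim dashes, and a not-applicable group yields an empty cell. The raw ANSI escape codes (including the 256-color code used for orange) are an assumption; the PR may use a terminal library instead.

```python
from typing import Dict, Optional

# Assumed ANSI SGR codes; total is printed uncolored.
RESET = "\x1b[0m"
DIM = "\x1b[2m"
COLORS = {
    "total": "",
    "passed": "\x1b[32m",  # green
    "failed": "\x1b[31m",  # red
    "skipped": "\x1b[33m",  # yellow
    "missing": "\x1b[38;5;208m",  # orange (256-color palette)
}
ORDER = ("total", "passed", "failed", "skipped", "missing")  # T P F S M


def format_test_cell(counts: Optional[Dict[str, int]], width: int = 3) -> str:
    """Render counts in fixed positions; zeros become dim dashes."""
    if counts is None:  # not applicable: empty cell rather than an ambiguous dash
        return ""
    parts = []
    for key in ORDER:
        value = counts.get(key, 0)
        if value == 0:
            parts.append(f"{DIM}{'-':>{width}}{RESET}")
        else:
            text = f"{value:>{width}}"
            color = COLORS[key]
            parts.append(f"{color}{text}{RESET}" if color else text)
    return "".join(parts)
```

Because every slot has the same width, counts line up vertically across rows even though they share one column.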

Key design decisions

Pipeline: Location → CombinedRawDatasetsGenerator(database=db) → DatabaseFilterProcessor → RequirementsRepository → commands
Old code (StatisticsGenerator, StatisticsContainer, CombinedIndexedDataset, etc.) kept for Phase 5 cleanup
Report uses _TotalStatsTemplateAdapter to bridge new attribute names to existing Jinja2 templates
References in reports are now sorted for deterministic output
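Since a later commit in this PR turned `build_database()` into a context manager used in `with` blocks, its Location → parse → populate → filter → validate flow could be sketched as follows. Here the steps are injected as plain callables to keep the example self-contained; the real helper takes a Location and wires in `CombinedRawDatasetsGenerator`, `DatabaseFilterProcessor` and validation itself, so this signature is hypothetical.

```python
import sqlite3
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def build_database(
    parse: Callable[[], list],
    populate: Callable[[sqlite3.Connection, list], None],
    apply_filters: Callable[[sqlite3.Connection], None],
    validate: Callable[[sqlite3.Connection], None],
) -> Iterator[sqlite3.Connection]:
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("PRAGMA foreign_keys = ON")
        populate(conn, parse())  # parse sources, then INSERT
        apply_filters(conn)  # SQL DELETEs for requirement/SVC filters
        validate(conn)  # checks run against the filtered data
        yield conn  # commands query the database inside the with block
    finally:
        conn.close()  # the in-memory database disappears with the connection
```

Tying the connection's lifetime to the `with` block means no command can accidentally keep using a database after it has been closed.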

Closes

Closes #313 (investigate: replace intermediate data structures with in-memory SQLite as single source of truth): SQLite as single source of truth
Closes #312 (investigate: rename/refactor StatisticsContainer and related models for clarity): StatisticsContainer renamed/refactored into StatisticsService + clean dataclasses
Closes #315 (feat: define JSON Schema for export and status command output): JSON Schema for export and status output
Closes #317 (investigate: JSON export strategy with SQLite backend): Option 3, query → dict → validate against schema
Closes #318 (epic: separate internal data layer from user-facing output contracts): the services layer decouples all output formats from internal models

Test plan

All 300 unit tests pass
Lint clean (flake8)
Status output verified for test_standard, test_basic fixtures, and reqstool-demo
Report output matches main (only difference: sorted reference ordering — improvement)
Status JSON validates against status_output.schema.json
Export JSON validates against export_output.schema.json
JSON output unchanged by status table redesign (separate code path)
Verify against reqstool-demo (requires ./mvnw verify in sibling project)
Manual CLI testing per TEST_MATRIX.md scenarios

🤖 Generated with Claude Code

…ecurity (#313) Add in-memory SQLite database as foundation for replacing intermediate data structures (CombinedRawDataset, CombinedIndexedDataset). Includes schema with 13 tables, FK indexes, authorizer for security, insert API, and DatabasePopulator that converts RawDataset to SQL rows. Phase 1+2 of the migration plan — pure addition with opt-in DB param on CombinedRawDatasetsGenerator.

Add SQL CHECK constraints to enforce valid values at the database level for significance, lifecycle_state, implementation, category, verification_type, test status, variant, and element_kind columns.

Add ELToSQLCompiler that translates Lark expression language parse trees into SQL WHERE clauses with parameterized queries. Add DatabaseFilterProcessor that replicates the recursive DAG-walk filter logic using SQL DELETEs with cascade cleanup of orphaned SVCs and MVRs.
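To show the idea behind an EL-to-SQL compiler, the sketch below walks a small hypothetical expression tree (nested tuples standing in for a Lark parse tree) and emits a parameterized WHERE clause. Values are always bound as `?` parameters, while column names, which SQLite cannot bind, go through a whitelist. The operators, field names and `compile_where()` are illustrative, not the PR's `ELToSQLCompiler` API.

```python
from typing import List, Tuple

# Column names cannot be bound as "?" parameters, so only known fields are accepted.
ALLOWED_FIELDS = {"id": "id", "urn": "urn", "lifecycle_state": "lifecycle_state"}


def compile_where(node: tuple) -> Tuple[str, List[object]]:
    """Compile a hypothetical expression tree into (sql, params).

    Tree shapes: ("and", a, b), ("or", a, b), ("not", a),
                 ("eq", field, value), ("in", field, [values])
    """
    op = node[0]
    if op in ("and", "or"):
        left_sql, left_params = compile_where(node[1])
        right_sql, right_params = compile_where(node[2])
        return f"({left_sql} {op.upper()} {right_sql})", left_params + right_params
    if op == "not":
        inner_sql, inner_params = compile_where(node[1])
        return f"(NOT {inner_sql})", inner_params
    if op in ("eq", "in"):
        column = ALLOWED_FIELDS.get(node[1])
        if column is None:
            raise ValueError(f"unknown field: {node[1]!r}")
        if op == "eq":
            return f"{column} = ?", [node[2]]
        values = list(node[2])
        if not values:
            return "0", []  # an empty list matches nothing
        placeholders = ", ".join(["?"] * len(values))
        return f"{column} IN ({placeholders})", values
    raise ValueError(f"unknown operator: {op!r}")
```

A filter processor can then delete every row that does not match, e.g. `DELETE FROM requirements WHERE NOT (<clause>)`, and leave dependent rows to the cascade rules.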

…ommands (#313) Phase 4 of the SQLite storage migration: replace CombinedIndexedDataset with direct database queries through RequirementsRepository and service layer (StatisticsService, ExportService).
- Add RequirementsRepository as the data access layer wrapping RequirementsDatabase
- Add StatisticsService with TestStats/RequirementStatus/TotalStats dataclasses
- Add ExportService for JSON export with --req-ids/--svc-ids filtering
- Add build_database() pipeline helper in storage/pipeline.py
- Rewrite status, export, and report commands to use DB pipeline
- Migrate LifecycleValidator from CombinedIndexedDataset to RequirementsRepository
- Migrate GroupByOrganizor from CombinedIndexedDataset to RequirementsRepository
- Fix multi-pass DB population to satisfy FK constraints across URNs
- Update all affected tests for new interfaces

Signed-off-by: Jimisola Laursen <jimisola@jimisola.com>
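Why multiple passes are needed: with foreign keys enforced immediately, a link from an SVC in `ms-001` to a requirement in `ext-001` fails if `ext-001` has not been inserted yet. A minimal sketch of the ordering, which inserts the requirements of every URN before any SVC links; the schema, the input shape and `populate()` are hypothetical simplifications of `DatabasePopulator`.

```python
import sqlite3
from typing import Dict, List, Tuple

SCHEMA = """
CREATE TABLE requirements (urn TEXT, id TEXT, PRIMARY KEY (urn, id));
CREATE TABLE svcs (urn TEXT, id TEXT, PRIMARY KEY (urn, id));
CREATE TABLE svc_requirement_links (
    svc_urn TEXT, svc_id TEXT, req_urn TEXT, req_id TEXT,
    FOREIGN KEY (svc_urn, svc_id) REFERENCES svcs (urn, id),
    FOREIGN KEY (req_urn, req_id) REFERENCES requirements (urn, id)
);
"""

# Hypothetical parsed input: per URN, its requirement ids and its SVCs, each SVC
# with (req_urn, req_id) references that may point into *other* URNs.
Dataset = Dict[str, Tuple[List[str], Dict[str, List[Tuple[str, str]]]]]


def populate(conn: sqlite3.Connection, datasets: Dataset) -> None:
    # Pass 1: every requirement from every URN, so cross-URN targets exist.
    for urn, (req_ids, _) in datasets.items():
        conn.executemany("INSERT INTO requirements VALUES (?, ?)", [(urn, r) for r in req_ids])
    # Pass 2: SVCs and their links, now that all referenced requirements are present.
    for urn, (_, svcs) in datasets.items():
        for svc_id, refs in svcs.items():
            conn.execute("INSERT INTO svcs VALUES (?, ?)", (urn, svc_id))
            conn.executemany(
                "INSERT INTO svc_requirement_links VALUES (?, ?, ?, ?)",
                [(urn, svc_id, req_urn, req_id) for req_urn, req_id in refs],
            )
```

A single pass over the URNs in parse order would raise `sqlite3.IntegrityError` as soon as it reached the cross-URN link.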

…file names (#313)
- Make build_database() a context manager; update all commands to use `with` blocks
- Replace Utils dict helpers with collections.defaultdict in parsing graph
- Delete unused CombinedIndexedDataset, statistics_container, statistics_generator, indexed_dataset_filter_processor, and 5 dead Utils methods
- Remove empty RequirementsELTransformer/SVCsELTransformer subclasses
- Rename files to match their primary class names (el_compiler → el_to_sql_compiler, filter_processor → database_filter_processor, generic_el → generic_el_transformer)
- Add unit tests for RequirementsRepository, StatisticsService, ExportService, pipeline
- Update CLAUDE.md architecture docs to reflect SQLite pipeline

Signed-off-by: jimisola <jimisola@jimisola.com>

…lls (#313) Replace 13-column status table (5 sub-columns per test group) with 5 columns using compact inline formatting. Each test cell shows positionally-aligned counts (T P F S M) with colored numbers and dim dashes for zeros. Remove merged header complexity.
- Add _format_test_cell() for single-cell test stats rendering
- Use orange for missing, yellow for skipped (was both red)
- Empty cell for not_applicable (was ambiguous dash)
- Color-coded legend
- Delete _build_merged_headers, _parse_col_widths, _replace_header_with_merged, _format_cell

Signed-off-by: jimisola <jimisola@jimisola.com>

jimisola · 2026-03-13T16:18:53Z

Signed-off-by: jimisola <jimisola@jimisola.com>

jimisola · 2026-03-13T16:30:37Z

@Jonas-Werne @lfvdavid It runs and the regression tests pass, so please try it locally.

#313) Fix two bugs:
- TotalStats.missing_automated_tests and missing_manual_tests were never aggregated from per-requirement stats, always reporting 0. Now accumulated from each requirement's TestStats after calculation.
- Dangling FK references (e.g. SVC referencing non-existent requirement) crashed with IntegrityError. Now caught gracefully with warnings, allowing semantic validation to report all errors.

Signed-off-by: jimisola <jimisola@jimisola.com>

Regression testing verified directly against main. Baselines were from pre-Pydantic-v2 and are no longer needed. Signed-off-by: jimisola <jimisola@jimisola.com>

jimisola · 2026-03-13T17:08:19Z

Regression Test Results

Full regression comparison of main vs feat/313-sqlite-storage across 3 datasets: test_standard/baseline/ms-001, test_basic/baseline/ms-101, and reqstool-demo.

Reports (asciidoc + markdown) — 6/6 cosmetic only

All report diffs are cosmetic:

Extra blank line after title
Trailing whitespace removed from header lines
Sorted reference ordering (e.g. ext-001:REQ_ext001_100, ms-001:REQ_010 instead of ms-001:REQ_010, ext-001:REQ_ext001_100) — alphabetical sort, an improvement

No data value changes in any report output.

Export JSON — intentional schema changes

All diffs are the new schema per #315:

metadata wrapper with renamed keys (initial_urn, import_graph)
urn/id split (was composite "id": "ms-001:REQ_010")
implementation → implementation_type
annotations_impls/annotations_tests → annotations.implementations/annotations.tests
automated_test_result → test_results
Internal lookup maps removed (reqs_from_urn, svcs_from_req, mvrs_from_svc, etc.)
Clean enum values ("effective" instead of jsonpickle format)
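Following the "query → dict → validate against schema" strategy from #317, a fragment of the export could be assembled roughly as below. The table and column names (`annotation_impls`, `element`, etc.) and `export_dict()` are assumptions; the sketch only shows the renamed keys and the urn/id split listed above, and leaves out the schema validation step.

```python
import sqlite3
from typing import Any, Dict


def export_dict(conn: sqlite3.Connection, initial_urn: str) -> Dict[str, Any]:
    requirements = []
    rows = conn.execute(
        "SELECT urn, id, implementation, lifecycle_state FROM requirements ORDER BY urn, id"
    ).fetchall()
    for urn, req_id, implementation, lifecycle_state in rows:
        impls = conn.execute(
            "SELECT element FROM annotation_impls WHERE req_urn = ? AND req_id = ? ORDER BY element",
            (urn, req_id),
        ).fetchall()
        requirements.append(
            {
                "urn": urn,  # separate keys instead of a composite "ms-001:REQ_010" id
                "id": req_id,
                "implementation_type": implementation,  # renamed from "implementation"
                "lifecycle_state": lifecycle_state,  # plain string such as "effective"
                "annotations": {"implementations": [element for (element,) in impls]},
            }
        )
    # Only plain dicts, lists and strings, so json.dumps needs no custom encoder.
    return {"metadata": {"initial_urn": initial_urn}, "requirements": requirements}
```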

Status JSON — intentional schema changes

Same schema redesign as export: cleaner naming, nested totals structure, metadata section.

Bugs found and fixed

"SVCs missing tests/MVRs" regression — TotalStats.missing_automated_tests and missing_manual_tests were never aggregated from per-requirement stats, always reporting 0. Fixed by accumulating from each requirement's TestStats in _calculate(). Counts now match main (e.g. test_standard: 2/2).
FK crash on test_errors dataset — Dangling FK references (SVC→non-existent REQ, MVR→non-existent SVC, annotations→non-existent targets) caused sqlite3.IntegrityError crash. Fixed by catching IntegrityError in insert_svc, insert_mvr, insert_annotation_impl, and insert_annotation_test with warning-level logging. Semantic validation now runs and reports all errors as before.
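The IntegrityError fix amounts to treating a failed FK check as a warning instead of a crash. A sketch under assumed names: `insert_svc_link()` and the link table are hypothetical, while the PR applies the same pattern in `insert_svc`, `insert_mvr`, `insert_annotation_impl` and `insert_annotation_test`.

```python
import logging
import sqlite3

logger = logging.getLogger(__name__)


def insert_svc_link(conn: sqlite3.Connection, svc_urn: str, svc_id: str, req_urn: str, req_id: str) -> bool:
    """Insert one SVC -> requirement link; log and skip dangling references."""
    try:
        conn.execute(
            "INSERT INTO svc_requirement_links VALUES (?, ?, ?, ?)",
            (svc_urn, svc_id, req_urn, req_id),
        )
        return True
    except sqlite3.IntegrityError as exc:
        # Only the failing statement is rolled back; earlier inserts remain, and
        # population continues so semantic validation can report every problem.
        logger.warning("Skipping link %s:%s -> %s:%s (%s)", svc_urn, svc_id, req_urn, req_id, exc)
        return False
```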

Stale baselines removed

The baselines/ directory was captured from pre-Pydantic-v2 main (784f77b) and was outdated. Removed after verifying regression directly against current main.

Signed-off-by: jimisola <jimisola@jimisola.com>

Extract helper methods from ExportService.to_export_dict (C901: 21→<10), StatisticsService._calculate (C901: 17→<10), and DatabasePopulator.populate_from_raw_dataset (C901: 14→<10) to satisfy flake8 C901 complexity threshold. Signed-off-by: jimisola <jimisola@jimisola.com>

The top-level `from reqstool.common.utils import Utils` was incorrectly removed in the dead-code cleanup commit, causing NameError when running as an installed package (the conditional import only covers direct exec). Signed-off-by: jimisola <jimisola@jimisola.com>

…unts (#313) Replace CRD→CID pipeline in all three commands (status, export, report) with direct DB queries via RequirementsRepository and service layer. Fix StatisticsService undercounting missing automated tests and MVRs by aggregating from per-requirement stats instead of global annotation scan. Signed-off-by: Jimisola Laursen <jimisola@jimisola.com>

Jonas-Werne

I removed the commented-out import from the demo project and got this, and it did not get an enumerated value.

But more importantly, with the current representation you can't see the ID of the requirement, so this will not work.

I like the enumerations, but they can't replace the ID.

We could combine the URN and ID columns so that one column contains the full ID,

e.g. reqstool-demo:REQ_001, ext-001:REQ_001, etc. Then we could rename the ID column to STATUS and have it contain the enumerated status.

jimisola · 2026-03-15T20:18:22Z

> I removed the commented-out import from the demo project and got this, and it did not get an enumerated value.

What enumerated value are you referring to?

> But more importantly, with the current representation you can't see the ID of the requirement, so this will not work.

REQ_PASS, REQ_MANUAL_FAIL etc. are the IDs of those requirements as per: https://github.com/reqstool/reqstool-demo/blob/main/docs/reqstool/requirements.yml

What do you mean that you can't see the ID?

jimisola · 2026-03-15T20:20:59Z

We should add @requirements that are only implemented once to have that case as well:

https://github.com/reqstool/reqstool-demo/blob/5c77cbb39f442955f25d2e7023c442024743e72c/src/main/java/io/github/reqstool/example/demo/RequirementsExample.java#L5-L8

Or rather, remove the class-level annotations except for one requirement that we name accordingly, e.g. REQ_IMPLEMENTED_2_TIMES.

Jonas-Werne · 2026-03-15T20:24:41Z

> REQ_PASS, REQ_MANUAL_FAIL etc. are the IDs of those requirements as per: https://github.com/reqstool/reqstool-demo/blob/main/docs/reqstool/requirements.yml
>
> What do you mean that you can't see the ID?

My mistake, I mistook the ID for some status of the requirement, as if REQ_PASS meant everything was done and REQ_NOT_IMPLEMENTED meant that you had not annotated the code.

jimisola · 2026-03-15T20:25:47Z

> My mistake, I mistook the ID for some status of the requirement, as if REQ_PASS meant everything was done and REQ_NOT_IMPLEMENTED meant that you had not annotated the code.

No worries, I appreciate you taking a look. I thought that the REQ_nnn IDs were not as self-explanatory as the new IDs are.

Jonas-Werne · 2026-03-15T20:34:45Z

I think @lfvdavid has the best data to regression test this. Maybe we should generate more complex data in the demo project.

Cover all levels (microservice, system, external), include filtering of requirements, and also add some MVRs.

No valid change request

jimisola · 2026-03-15T21:35:43Z

> I think @lfvdavid has the best data to regression test this. Maybe we should generate more complex data in the demo project.
>
> Cover all levels (microservice, system, external), include filtering of requirements, and also add some MVRs.

Ideally, I don't want the demo to be too complex. It should just demonstrate the basics for new users. What do you think?

I'd rather have a more complex example directly in reqstool-client then. And we still need to dogfood reqstool-client and other projects with reqstool.

Signed-off-by: jimisola <jimisola@users.noreply.github.com>

jimisola and others added 6 commits March 12, 2026 20:47

feat: add CHECK constraints for all enum-backed columns

e2430de


jimisola changed the title ~~feat: add SQLite storage as single source of truth (#313)~~ feat: add SQLite storage as single source of truth Mar 13, 2026

jimisola requested review from Jonas-Werne and lfvdavid March 13, 2026 16:19

style: apply black formatting

8d04735

Signed-off-by: jimisola <jimisola@jimisola.com>

Jimisola Laursen added 2 commits March 13, 2026 18:07

chore: remove stale baselines directory

c16999c


Jimisola Laursen and others added 6 commits March 13, 2026 18:09

style: apply black formatting

090f223

Signed-off-by: jimisola <jimisola@jimisola.com>

fix: remove unused variable in test_database_filter_processor

d3c1fa5

Signed-off-by: jimisola <jimisola@jimisola.com>

Merge branch 'main' into feat/313-sqlite-storage

b9c1eaf

Jonas-Werne previously requested changes Mar 15, 2026


jimisola mentioned this pull request Mar 15, 2026 in feat: add LSP server for editor integration #323 (draft, 6 tasks)

Jonas-Werne self-requested a review March 15, 2026 20:31

chore: remove stale TestStatisticsItem pytest exclusion

a010874

Signed-off-by: jimisola <jimisola@users.noreply.github.com>

jimisola wants to merge 16 commits into main from feat/313-sqlite-storage
