Skip to content

feat: Radionuclides backfill job (#558)#573

Open
kbighorse wants to merge 18 commits intostagingfrom
558-radionuclides-backfill
Open

feat: Radionuclides backfill job (#558)#573
kbighorse wants to merge 18 commits intostagingfrom
558-radionuclides-backfill

Conversation

@kbighorse
Copy link
Contributor

Summary

  • Implements the Radionuclides backfill pipeline that maps NMA_RadionuclidesObservation records with chemistry metadata
  • Adds Alembic migration for new columns on observation (nma_pk_chemistryresults, detect_flag, uncertainty, analysis_agency) and sample (nma_pk_chemistrysample, volume, volume_unit)
  • Drops stale thing_id FK from NMA_Radionuclides (relationships now go through NMA_Chemistry_SampleInfo)
  • Adds BDD test suite (7 scenarios) covering happy path, idempotency, orphan handling, edge cases
  • Adds pCi/L unit and Chemistry Observation note type to lexicon

Robustness

  • Per-row savepoint isolation — one bad row never aborts the run
  • Rows with missing AnalysisDate are skipped (logged as errors) instead of using sentinel values
  • Migration upgrade is idempotent (checks information_schema before dropping columns/FKs)
  • Migration downgrade safely restores thing_id (nullable → backfill → NOT NULL)
  • Full tracebacks logged via logger.exception() for per-row failures

Test safety

  • Cleanup scoped to scenario sample_ids (no global deletes)
  • Analysis method tracking scoped to scenario samples

Closes #558
Refs #565, #566, #567, #568, #570, #571

Test plan

  • CI passes (BDD scenarios + existing pytest suite)
  • Run alembic upgrade head on a fresh DB — migration applies cleanly
  • Run alembic downgrade -1 — migration rolls back cleanly
  • Run backfill against staging NMA_Radionuclides data
  • Verify idempotency: re-running backfill produces 0 inserts, 0 errors

🤖 Generated with Claude Code

kbighorse and others added 4 commits February 27, 2026 19:37
Add chemistry backfill that maps legacy NMA_Radionuclides rows into
Observation records keyed on nma_pk_chemistryresults (idempotent upsert).

- Schema: add chemistry columns to Observation and Sample models/schemas
- Migration: add columns + unique constraints, drop stale NMA_Radionuclides.thing_id
- Lexicon: add pCi/L unit and Chemistry Observation note type
- Backfill: resolve Samples via nma_pk_chemistrysample, create Parameters,
  AnalysisMethods, Notes; detect_flag from legacy Symbol qualifier
- Tests: BDD step definitions for all 7 scenarios, feature file fixes
- Orchestrator: wire radionuclides step into backfill pipeline

Refs #558

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Wrap cleanup chain in try/except to prevent orphan data on failure (#566)
- Scope AnalysisMethod cleanup to test-created IDs only (#567)

Also addresses #565 (scalar_one hardening) and #568 (remove unused
parameter_ids tracker) which were applied to the new files in the
preceding feature commit.

Refs #558

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add per-row savepoint isolation so one bad row doesn't abort the run
- Skip rows with missing AnalysisDate instead of inserting epoch sentinel
- Remove unused batch_size parameter from backfill signatures and CLI
- Fix migration downgrade to backfill thing_id before setting NOT NULL
- Track analysis_method_ids in test step defs for scoped cleanup

Refs #558

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…leanup

- Migration upgrade now checks information_schema before dropping thing_id
  column, guarding against environments where it was already removed
- run_backfill.sh no longer references removed --batch-size parameter
- Test observation cleanup scoped to scenario sample_ids instead of
  blanket nma_pk_chemistryresults IS NOT NULL query

Refs #558, #570

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings February 28, 2026 04:07
kbighorse and others added 2 commits February 27, 2026 20:07
… tracking

- Migration upgrade dynamically discovers FK names via sa.inspect()
  with guard for missing name entries
- Per-row exception handler now logs full traceback via logger.exception()
  before appending summary to result.errors
- Analysis method tracking scoped to scenario sample_ids
- Removed incorrect issue reference from test comment

Refs #558

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements a radionuclides chemistry backfill pipeline that converts legacy NMA_Radionuclides rows into the Ocotillo Observation schema (plus related Sample metadata), wires it into the backfill orchestrator, and adds BDD coverage + schema/migration/lexicon updates needed to support the new fields.

Changes:

  • Added a radionuclides backfill implementation with per-row savepoints, idempotent upserts, and Notes creation.
  • Added DB/schema support for chemistry audit keys + metadata (nma_pk_chemistryresults, detect_flag, uncertainty, analysis_agency, nma_pk_chemistrysample, volume, volume_unit) via Alembic + models + API schemas.
  • Added/updated BDD steps and test cleanup logic; extended lexicon with pCi/L and Chemistry Observation.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
transfers/backfill/chemistry_backfill.py Implements the radionuclides backfill (Sample resolution, Observation upsert, volume + Notes handling).
transfers/backfill/backfill.py Registers the radionuclides backfill in the orchestrator and improves logging/output.
tests/features/steps/chemistry-backfill.py Adds Behave step definitions and helpers for radionuclides backfill scenarios.
tests/features/nma-chemistry-radionuclides-refactor.feature Minor scenario adjustments for radionuclides backfill BDD feature.
tests/features/environment.py Adds per-scenario cleanup for chemistry backfill fixtures.
db/sample.py Adds chemistry audit/sample metadata columns to the ORM model.
db/observation.py Adds chemistry audit + metadata columns to the ORM model.
schemas/sample.py Exposes new Sample chemistry fields in the API response schema.
schemas/observation.py Exposes new Observation chemistry fields in the API response schema.
alembic/versions/545a5b77e5e8_add_chemistry_backfill_columns_to_.py Migration adding columns/constraints and dropping legacy thing_id from NMA_Radionuclides.
core/lexicon.json Adds lexicon terms for pCi/L and Chemistry Observation.
run_backfill.sh Updates backfill runner invocation to match the orchestrator CLI changes.

Copilot AI review requested due to automatic review settings February 28, 2026 08:45
@kbighorse kbighorse review requested due to automatic review settings February 28, 2026 08:45
kbighorse and others added 2 commits February 28, 2026 00:46
…ow catch

- Volume/volume_unit test assertions query by scenario sample_ids
- existing_keys set lowercases DB values to match global_id_str normalization
- Per-row catch broadened to Exception with logger.exception() for tracebacks
- Analysis method tracking scoped to scenario samples
- Migration FK drop guarded against missing name entries
- Resolve stash merge conflicts in sample volume steps

Refs #558

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@kbighorse kbighorse self-assigned this Feb 28, 2026
@kbighorse kbighorse requested a review from jirhiker February 28, 2026 09:21
@jirhiker jirhiker requested a review from ksmuczynski March 2, 2026 16:19
Copy link
Member

@jirhiker jirhiker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from the description. @ksmuczynski should review re database schema

@ksmuczynski
Copy link
Contributor

@jirhiker @kbighorse Great, thanks. Will take a closer look as soon as the well inventory ingestion work is completed.

Copilot AI review requested due to automatic review settings March 2, 2026 17:23
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

kbighorse and others added 4 commits March 2, 2026 09:36
Snapshot AnalysisMethod IDs before/after backfill to track only
actually-created methods, preventing deletion of pre-existing rows.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Delete context._backfill_created in finally block to prevent
ID accumulation across scenarios causing redundant deletes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add ORDER BY id to legacy query so lowest-PK row wins when multiple
  radionuclide rows map to the same Sample (volume/volume_unit)
- Skip overwrites with warning when a conflict is detected
- Reject unknown CLI args in backfill entrypoint instead of silently
  ignoring them

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 2, 2026 18:29
@kbighorse kbighorse review requested due to automatic review settings March 2, 2026 18:29
Skip pg_cron extension creation gracefully when unavailable instead
of hard-failing, unblocking local dev and test environments.

Refs #576

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 2, 2026 19:37
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated no new comments.

kbighorse and others added 2 commits March 2, 2026 12:40
Skip pg_cron extension creation gracefully when unavailable instead
of hard-failing, unblocking local dev and test environments.

Also skip search vector trigger sync for tables that don't exist
yet (e.g. asset) to avoid NoSuchTableError during test setup.

Refs #576

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BDMS-611: Implement Radionuclides backfill job and BDD step definitions

4 participants