
BDMS-626: Improve validation, schema alignment, and well inventory handling #596

Draft
ksmuczynski wants to merge 22 commits into staging from kas-well-BDMS-626-inventory-ingestion-updates_v2

Conversation


@ksmuczynski ksmuczynski commented Mar 11, 2026

Why

This PR addresses the following problem / context:

  • Database mapping was incomplete: public_availability_acknowledgement, monitoring_status, well/water notes, and water level observations were not being persisted correctly.
  • Several schema fields were overly strict — requiring values that should be optional, using plain strings where lexicon enums were expected, and enforcing contact/water level fields more rigidly than necessary.
  • The WellInventoryRow schema did not integrate with Lexicon-based enums, allowing values inconsistent with the database lexicon.
  • BDD tests suffered from primary key conflicts and outdated test data that didn't align with new validation rules.
  • Previously, a single malformed row in a CSV aborted the entire import. Users need "best-effort" behavior, where valid rows are saved while invalid rows are flagged.

How

Implementation summary - the following was changed / added / removed:

  • Database mapping:
    • public_availability_acknowledgement now maps to Location.release_status (True → public, False → private, unset → draft).
    • monitoring_status is now written to the StatusHistory table.
    • well_notes and water_notes are now stored as polymorphic notes on the Thing.
    • Providing depth_to_water_ft now auto-creates a Sample and Observation linked to the field activity.
  • WellInventoryRow schema:
    • Made site_name, elevation_ft, elevation_method, and measuring_point_height_ft optional.
    • Replaced plain str types with lexicon-based enums for elevation_method, depth_source, well_pump_type, monitoring_status, sample_method, and data_quality.
    • Added a flexible_lexicon_validator for case-insensitive, whitespace-tolerant matching.
    • Relaxed contact validation to require only one of contact_name or contact_organization, and made water_level_date_time only required when depth_to_water_ft is present.
  • Best-effort logic:
    • Implemented row-level atomic savepoints so individual row failures no longer abort the full import.
    • Failed rows are logged with 1-based row numbers and well IDs in a validation_errors list.
    • Fixed an UnboundLocalError caused by auto-generated well IDs.
    • Updated error reporting to surface database-level failures as "Database error" fields.
    • Added commit=False support to allow services/thing_helper.py and services/contact_helper.py to participate in the outer best-effort transaction without prematurely committing.
  • BDD test suite:
    • Updated test CSVs in tests/features/data/ to use valid lexicon terms and properly quoted comma-containing values.
    • Added scenario-based unique ID suffixing in Given steps to isolate tests and prevent primary key conflicts.
    • Updated well-inventory-csv.feature to assert partial success (e.g., "1 well is imported") in negative scenarios. All 44 scenarios now pass.
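The row-level savepoint logic described above can be sketched as follows. This is a minimal illustration using the stdlib `sqlite3` module and raw `SAVEPOINT` statements; the actual service uses the ORM session's nested transactions, and names like `import_rows` and the `wells` table are hypothetical stand-ins.

```python
import sqlite3

def import_rows(conn, rows):
    """Persist valid rows; collect per-row failures instead of aborting."""
    validation_errors = []
    imported = 0
    for i, row in enumerate(rows, start=1):  # 1-based row numbers for reporting
        conn.execute("SAVEPOINT row_sp")
        try:
            if not row.get("well_id"):
                raise ValueError("well_id is required")
            conn.execute("INSERT INTO wells (well_id) VALUES (?)", (row["well_id"],))
            conn.execute("RELEASE SAVEPOINT row_sp")
            imported += 1
        except Exception as exc:
            # Roll back only this row's work; earlier rows stay persisted.
            conn.execute("ROLLBACK TO SAVEPOINT row_sp")
            conn.execute("RELEASE SAVEPOINT row_sp")
            validation_errors.append(
                {"row": i, "well_id": row.get("well_id"), "error": str(exc)}
            )
    return imported, validation_errors

# isolation_level=None gives manual transaction control, so the SAVEPOINT
# statements are not second-guessed by the driver.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE wells (well_id TEXT PRIMARY KEY)")
rows = [{"well_id": "W-1"}, {"well_id": ""}, {"well_id": "W-1"}]  # blank + duplicate
imported, errors = import_rows(conn, rows)  # 1 imported, 2 flagged
```

The same pattern covers both validation failures (raised before the INSERT) and database-level failures (the duplicate primary key), matching the "Database error" reporting described above.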

Notes

Any special considerations, workarounds, or follow-up work to note?

  • Further enhancements may be required for complete schema validation coverage; that work will be scoped in a future PR, most likely one focused on testing the ingestion of real user-entered data.
  • Lexicon matching is intentionally lenient (case-insensitive, whitespace-stripped) to reduce friction for CSV authors, but values must still resolve to a known lexicon entry or the row will be skipped.
  • Test isolation is achieved via ID suffixing at the Given step level; if the shared test database is ever replaced with per-scenario teardown, this workaround can be removed.

- Introduced `validation_alias` with `AliasChoices` for selected fields (`well_status`, `sampler`, `measurement_date_time`, `mp_height`) to allow alternate field names.
- Ensured alignment with schema validation updates.
- Introduced unit tests for `WellInventoryRow` alias mappings.
- Verified correct handling of alias fields like `well_hole_status`, `mp_height_ft`, and others.
- Ensured canonical fields take precedence when both alias and canonical values are provided.
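The alias-precedence behavior can be illustrated without pydantic. This sketch mirrors what `validation_alias` with `AliasChoices` does for the fields named above: the canonical name is tried first, so it wins when both are present. The `resolve_aliases` helper is hypothetical; the real schema relies on pydantic's built-in resolution.

```python
# Canonical field name -> accepted names, canonical listed first so it wins.
ALIASES = {
    "well_status": ["well_status", "well_hole_status"],
    "mp_height": ["mp_height", "mp_height_ft"],
}

def resolve_aliases(raw):
    """Map raw CSV columns onto canonical fields, first match wins."""
    resolved = {}
    for canonical, choices in ALIASES.items():
        for name in choices:
            if name in raw and raw[name] not in (None, ""):
                resolved[canonical] = raw[name]
                break
    return resolved

row = {"well_hole_status": "Active", "mp_height": 1.5, "mp_height_ft": 2.0}
resolved = resolve_aliases(row)  # canonical mp_height (1.5) takes precedence
```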
… and new fields

- Added `flexible_lexicon_validator` to support case-insensitive validation of enum-like fields.
- Introduced new fields: `OriginType`, `WellPumpType`, `MonitoringStatus`, among others.
- Updated existing fields to use flexible lexicon validation for improved consistency.
- Adjusted `WellInventoryRow` optional fields handling and validation rules.
- Refined contact field validation logic to require `role` and `type` when other contact details are provided.
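A minimal sketch of the flexible lexicon matching described above, assuming lexicon terms are exposed as string-valued enums (the `MonitoringStatus` members here are illustrative, not the project's actual lexicon): values are stripped and compared case-insensitively, and anything that still fails to resolve raises so the row can be skipped.

```python
from enum import Enum

class MonitoringStatus(str, Enum):
    ACTIVE = "Active"
    INACTIVE = "Inactive"

def flexible_lexicon_validator(value, lexicon_enum):
    """Match a raw CSV value against a lexicon enum, ignoring case and
    surrounding whitespace; blanks pass through as None, and unmatched
    values raise so best-effort logic can flag the row."""
    if value is None or str(value).strip() == "":
        return None
    needle = str(value).strip().casefold()
    for member in lexicon_enum:
        if member.value.casefold() == needle:
            return member
    raise ValueError(f"{value!r} is not a recognized {lexicon_enum.__name__} term")

status = flexible_lexicon_validator("  active ", MonitoringStatus)
```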
…dations

- Refined validation error handling to provide more detailed feedback in test assertions.
- Adjusted test setup to ensure accurate validation scenarios for contact and water level fields.
- Updated contact-related tests to validate new composite field error messages.
- Renamed "Water" to "Water Bearing Zone" and refined its definition.
- Added new term "Water Quality" under `note_type` category.
… to prevent cross-test collisions

- Supports BDD test suite stability
- Added hashing mechanism to append unique suffix to `well_name_point_id` for scenario isolation.
- Integrated pandas for robust CSV parsing and content modifications when applicable.
- Ensured handling preserves existing format for IDs ending with `-xxxx`.
- Maintained existing handling for empty or non-CSV files.
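The suffixing step can be sketched with stdlib hashing only (the real step definition also rewrites the CSV content, via pandas where applicable). Names are illustrative, and the `-xxxx` check here treats the suffix as the literal placeholder from the fixtures; the exact rule in the step code may differ.

```python
import hashlib

def scenario_suffix(scenario_name, length=4):
    """Derive a short, stable suffix from the scenario name, so reruns of
    the same scenario produce the same IDs."""
    return hashlib.sha1(scenario_name.encode("utf-8")).hexdigest()[:length]

def suffix_well_id(well_id, scenario_name):
    """Append a scenario-unique suffix to well_name_point_id, leaving
    blanks and already-placeholder-suffixed IDs untouched."""
    if not well_id:
        return well_id
    if well_id.endswith("-xxxx"):  # assumed literal placeholder format
        return well_id
    return f"{well_id}-{scenario_suffix(scenario_name)}"
```

Because the suffix is derived from the scenario name rather than randomized, assertions within a scenario can still predict the IDs they expect.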
…ollback side effects

- Supports transaction management
- Moved `session.refresh` calls under `commit` condition to streamline database session operations.
- Reorganized `session.rollback` logic to properly align with commit flow.
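The commit/refresh reorganization can be sketched as follows. `FakeSession` is a stand-in that only records calls; in the real helpers the object is a SQLAlchemy-style session, and `create_thing` is a hypothetical name for the helper pattern, not the actual function.

```python
class FakeSession:
    """Minimal stand-in that records which session methods were called."""
    def __init__(self):
        self.calls = []
    def add(self, obj):
        self.calls.append("add")
    def flush(self):
        self.calls.append("flush")
    def commit(self):
        self.calls.append("commit")
    def refresh(self, obj):
        self.calls.append("refresh")

def create_thing(session, data, commit=True):
    """Persist a record. With commit=False, only flush so primary keys are
    assigned while the caller's outer best-effort transaction decides when
    to commit; refresh happens only after a real commit."""
    thing = dict(data)  # stand-in for an ORM object
    session.add(thing)
    if commit:
        session.commit()
        session.refresh(thing)
    else:
        session.flush()
    return thing

inner = FakeSession()
create_thing(inner, {"name": "W-1"}, commit=False)  # calls: add, flush
```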
…ory source fields in support of schema alignment and database mapping

- Update well inventory CSV files to correct data inconsistencies and improve schema alignment.
- Added support for `Sample`, `Observation`, and `Parameter` objects within well inventory processing.
- Enhanced elevation handling with optional and default value logic.
- Introduced `release_status`, `monitoring_status`, and validation for derived fields.
- Updated notes handling with new cases and refined content categorization.
- Improved `depth_to_water` processing with associated sample and observation creation.
- Refined lexicon updates and schema field adjustments for better data consistency.
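The `depth_to_water` handling can be sketched with plain dataclasses standing in for the ORM models; `build_water_level_records`, the `parameter` string, and the units are illustrative assumptions, though the field names mirror the PR description.

```python
from dataclasses import dataclass, field

@dataclass
class Observation:
    parameter: str
    value: float
    units: str

@dataclass
class Sample:
    field_activity_id: str
    observations: list = field(default_factory=list)

def build_water_level_records(row, field_activity_id):
    """Create a Sample and linked Observation only when depth_to_water_ft
    is present; rows without a water level produce no records."""
    depth = row.get("depth_to_water_ft")
    if depth in (None, ""):
        return None
    sample = Sample(field_activity_id=field_activity_id)
    sample.observations.append(
        Observation(parameter="depth_to_water", value=float(depth), units="ft")
    )
    return sample

sample = build_water_level_records({"depth_to_water_ft": "12.5"}, "fa-1")
```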
…h 1 well

- Updated BDD tests to reflect changes in well inventory bulk upload logic, allowing the import of 1 well despite validation errors.
- Modified step definitions for more granular validation on imported well counts.
- Enhanced error message detail in responses for validation scenarios.
- Adjusted sample CSV files to match new import logic and validation schema updates.
- Refined service behavior to improve handling of validation errors and partial imports.

Copilot AI left a comment


Pull request overview

This PR updates the well-inventory CSV ingestion pipeline to better align schema validation with lexicon-backed enums, persist previously-missed fields (notes, monitoring/public availability, water level observations), and change the import behavior to “best-effort” so individual row failures don’t abort the entire upload.

Changes:

  • Added row-level savepoints and improved error reporting so invalid rows are skipped while valid rows are persisted.
  • Updated WellInventoryRow schema to relax/adjust requirements and introduce lexicon-backed enum parsing plus CSV field aliases.
  • Updated BDD/unit tests and test CSV fixtures to reflect new validation rules and partial-success behavior.

Reviewed changes

Copilot reviewed 34 out of 34 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
services/well_inventory_csv.py Best-effort import via nested savepoints; mapping updates for release status, notes, monitoring status, and water level observation persistence
schemas/well_inventory.py Schema optionality updates, lexicon enum coercion, contact + water level validation adjustments, and alias handling
services/thing_helper.py Adds monitoring status history writes; adjusts commit/rollback behavior for outer-transaction support
services/contact_helper.py Adjusts commit/rollback behavior for outer-transaction support
cli/service_adapter.py Improves exit code and stderr reporting for partial success and validation failures
tests/test_well_inventory.py Adds schema alias tests and helper row builder
tests/features/well-inventory-csv.feature Updates expected partial-success outcomes in negative scenarios
tests/features/steps/*.py Updates step definitions for partial success and loosens validation-error matching
tests/features/data/*.csv Refreshes fixture data to match new lexicon/validation expectations
core/lexicon.json Adds a new note_type term

… unique well name suffixes in well inventory scenarios

- Updated `pd.read_csv` calls with `keep_default_na=False` to retain empty values as-is.
- Refined logic for suffix addition by excluding empty and `-xxxx` suffixed IDs.
- Improved test isolation by maintaining scenario-specific unique identifiers.
…nd `DataQuality`

- Changed `SampleMethodField` to validate against `SampleMethod` instead of `OriginType`
- Changed `DataQualityField` to validate against `DataQuality` instead of `OriginType`
… import

- Make contact.role and contact.contact_type nullable in the ORM and migrations
- Update contact schemas and well inventory validation to accept missing values
- Allow contact import when name or organization is present without role/type
- Stop round-tripping CSV fixtures through pandas to avoid rewriting structural test cases
- Preserve repeated header rows and duplicate column fixtures so importer validation is exercised correctly
- Keep the blank contact name/organization scenario focused on a single invalid row for stable assertions
…n errors

- Prevent one actual validation error from satisfying multiple expected assertions (avoids false positives)
- Keep validation matching order-independent while requiring distinct matches (preserves flexibility)
- Tighten BDD error checks without relying on exact error text (improves test precision)
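The distinct-match check can be sketched as a greedy consumer: each expected substring must match a different actual error, in any order. The helper name is hypothetical; note the greedy strategy is a simplification and can reject rare inputs where a different pairing would succeed.

```python
def match_expected_errors(expected_substrings, actual_errors):
    """Return True if each expected substring matches a *distinct* actual
    error, order-independently, so one error cannot satisfy two
    expectations. Greedy first-match; a full bipartite matching would be
    stricter but is overkill for these fixtures."""
    remaining = list(actual_errors)
    for expected in expected_substrings:
        for i, actual in enumerate(remaining):
            if expected in actual:
                del remaining[i]  # consume the matched error
                break
        else:
            return False
    return True
```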
…behavior

- Update partial-success scenarios to expect valid rows to import alongside row-level validation errors
- Reflect current importer behavior for invalid lexicon, invalid date, and repeated-header cases
- Keep BDD coverage focused on user-visible import outcomes instead of outdated all-or-nothing assumptions
…sitive parsing

- Update unit expectations to accept lowercase placeholder tokens that are now supported
- Document normalization of mixed-case and spaced placeholder formats to uppercase prefixes
- Keep test coverage aligned with importer behavior and reduce confusion around valid autogen inputs
…DataQuality`

- Adjust test data to reflect updated descriptions for `sample_method` and `data_quality` fields.
…ization scenarios

- Add test to ensure contact creation returns None when both name and organization are missing
- Add test to verify contact creation with organization only, ensuring proper dict structure
- Update assertions for comprehensive validation of contact fields
@ksmuczynski
Contributor Author

@jirhiker Recent updates to the CLI command tests are failing on my branch. Since these changes don't seem to impact the well inventory ingestion, should I resolve the test failures within this PR, or would you prefer I handle them in a separate one before re-opening for review?
