BDMS-626: Improve validation, schema alignment, and well inventory handling #596
Draft
ksmuczynski wants to merge 22 commits into `staging`.
- Introduced `validation_alias` with `AliasChoices` for selected fields (`well_status`, `sampler`, `measurement_date_time`, `mp_height`) to allow alternate field names.
- Ensured alignment with schema validation updates.
- Introduced unit tests for `WellInventoryRow` alias mappings.
- Verified correct handling of alias fields such as `well_hole_status`, `mp_height_ft`, and others.
- Ensured canonical fields take precedence when both alias and canonical values are provided.
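The precedence rule above is consistent with how pydantic v2's `AliasChoices` behaves: it tries its names in the order given and uses the first one present in the input, so listing the canonical name first lets it win over an alias. The plain-Python sketch below imitates that lookup order without pydantic. Pairing `well_hole_status` with `well_status` and `mp_height_ft` with `mp_height` is an assumption inferred from the field lists above, and `FIELD_ALIASES`, `resolve_field`, and `normalize_row` are hypothetical names.

```python
from typing import Any, Mapping, Sequence

_MISSING = object()

# Hypothetical alias table (assumed pairings): canonical name first, then
# accepted CSV aliases. Order matters: the first name found in the row wins.
FIELD_ALIASES: dict[str, tuple[str, ...]] = {
    "well_status": ("well_status", "well_hole_status"),
    "mp_height": ("mp_height", "mp_height_ft"),
}


def resolve_field(row: Mapping[str, Any], choices: Sequence[str]) -> Any:
    """Return the value of the first name in `choices` present in `row`."""
    for name in choices:
        if name in row:
            return row[name]
    return _MISSING


def normalize_row(row: Mapping[str, Any]) -> dict[str, Any]:
    """Map aliased CSV keys onto canonical field names (other keys are ignored)."""
    out: dict[str, Any] = {}
    for canonical, choices in FIELD_ALIASES.items():
        value = resolve_field(row, choices)
        if value is not _MISSING:
            out[canonical] = value
    return out
```

In a pydantic model the equivalent declaration would look something like `Field(validation_alias=AliasChoices("well_status", "well_hole_status"))`.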
… and new fields
- Added `flexible_lexicon_validator` to support case-insensitive validation of enum-like fields.
- Introduced new fields: `OriginType`, `WellPumpType`, `MonitoringStatus`, among others.
- Updated existing fields to use flexible lexicon validation for improved consistency.
- Adjusted `WellInventoryRow` optional-field handling and validation rules.
- Refined contact field validation logic to require `role` and `type` when other contact details are provided.
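A minimal sketch of the case-insensitive, whitespace-tolerant matching that `flexible_lexicon_validator` is described as providing. It assumes lexicon terms are plain strings and returns the canonical spelling of the matched term; the function name and the sample pump-type terms used with it are hypothetical, not taken from `core/lexicon.json`.

```python
import re
from typing import Callable, Iterable


def make_flexible_lexicon_matcher(terms: Iterable[str]) -> Callable[[str], str]:
    """Build a matcher that accepts case/whitespace variants of lexicon terms."""

    def key(value: str) -> str:
        # Trim, collapse internal whitespace runs, and compare case-insensitively.
        return re.sub(r"\s+", " ", value.strip()).casefold()

    canonical = {key(term): term for term in terms}

    def match(value: str) -> str:
        try:
            return canonical[key(value)]
        except KeyError:
            raise ValueError(f"{value!r} is not a recognized lexicon term") from None

    return match
```

In a pydantic model, a matcher like this could be called from a `field_validator(..., mode="before")` so that the canonical term is what gets stored.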
…dations
- Refined validation error handling to provide more detailed feedback in test assertions.
- Adjusted test setup to ensure accurate validation scenarios for contact and water level fields.
- Updated contact-related tests to validate new composite field error messages.
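The PR description states that `water_level_date_time` is only required when `depth_to_water_ft` is present. A sketch of that conditional rule as a standalone check, assuming blank CSV cells arrive as empty strings or `None`; the function names and message text are illustrative only.

```python
from typing import Any, Mapping


def _is_blank(value: Any) -> bool:
    # Treat None and whitespace-only strings as "not provided".
    return value is None or (isinstance(value, str) and not value.strip())


def water_level_errors(row: Mapping[str, Any]) -> list[str]:
    """Return validation messages for the water-level columns of one CSV row."""
    errors: list[str] = []
    has_depth = not _is_blank(row.get("depth_to_water_ft"))
    if has_depth and _is_blank(row.get("water_level_date_time")):
        errors.append(
            "water_level_date_time is required when depth_to_water_ft is provided"
        )
    return errors
```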
- Renamed "Water" to "Water Bearing Zone" and refined its definition.
- Added the new term "Water Quality" under the `note_type` category.
… to prevent cross-test collisions
- Supports BDD test suite stability.
- Added a hashing mechanism to append a unique suffix to `well_name_point_id` for scenario isolation.
- Integrated pandas for robust CSV parsing and content modifications when applicable.
- Ensured handling preserves the existing format for IDs ending with `-xxxx`.
- Maintained existing handling for empty or non-CSV files.
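One way to implement the scenario-specific suffix is to hash the scenario name, as sketched below. It assumes the scenario name is available to the step and uses a 6-character SHA-256 prefix; the helper name, the suffix length, and the case-insensitive placeholder check are illustrative assumptions. Empty IDs and `-xxxx` placeholders pass through unchanged, matching the bullets above.

```python
import hashlib

PLACEHOLDER_SUFFIX = "-xxxx"


def isolate_well_id(well_id: str, scenario_name: str, length: int = 6) -> str:
    """Append a short, scenario-derived hash so IDs don't collide across scenarios."""
    # Empty cells and autogen placeholders are left untouched
    # (placeholder check is case-insensitive here, an assumption).
    if not well_id.strip() or well_id.lower().endswith(PLACEHOLDER_SUFFIX):
        return well_id
    digest = hashlib.sha256(scenario_name.encode("utf-8")).hexdigest()[:length]
    return f"{well_id}-{digest}"
```

`hashlib` is used instead of the built-in `hash()` because Python randomizes string hashing per process, so `hash()` would not produce the same suffix across test runs.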
…ollback side effects
- Supports transaction management.
- Moved `session.refresh` calls under the `commit` condition to streamline database session operations.
- Reorganized `session.rollback` logic to properly align with the commit flow.
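The commit-flag pattern described here can be sketched against a minimal session protocol. The `save` helper is hypothetical, and calling `flush()` in the non-committing branch is an assumption about how a helper might expose generated keys to an outer transaction; SQLAlchemy's `Session` does provide `add`, `flush`, `commit`, `refresh`, and `rollback`.

```python
from typing import Any, Protocol


class Session(Protocol):
    def add(self, obj: Any) -> None: ...
    def flush(self) -> None: ...
    def commit(self) -> None: ...
    def refresh(self, obj: Any) -> None: ...
    def rollback(self) -> None: ...


def save(session: Session, obj: Any, commit: bool = True) -> Any:
    """Hypothetical helper: persist `obj`, committing only when asked to."""
    session.add(obj)
    if not commit:
        # The caller owns the transaction: flush so generated keys exist,
        # and leave commit/rollback to the outer (best-effort) transaction.
        session.flush()
        return obj
    try:
        session.commit()
    except Exception:
        # Roll back only the transaction this helper tried to commit.
        session.rollback()
        raise
    # Refresh only after a successful commit.
    session.refresh(obj)
    return obj
```

By default SQLAlchemy expires loaded attributes on commit (`expire_on_commit=True`), so refreshing in the committing branch reloads them eagerly, while the non-committing branch never touches commit or rollback.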
…ory source fields in support of schema alignment and database mapping
- Updated well inventory CSV files to correct data inconsistencies and improve schema alignment.
- Added support for `Sample`, `Observation`, and `Parameter` objects within well inventory processing.
- Enhanced elevation handling with optional and default value logic.
- Introduced `release_status`, `monitoring_status`, and validation for derived fields.
- Updated notes handling with new cases and refined content categorization.
- Improved `depth_to_water` processing with associated sample and observation creation.
- Refined lexicon updates and schema field adjustments for better data consistency.
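The PR description maps `public_availability_acknowledgement` onto `Location.release_status` as True → public, False → private, and unset → draft. A sketch of that mapping, assuming the CSV value has already been parsed to `True`, `False`, or `None` and that the status terms are the lowercase strings shown (both assumptions; the function name is hypothetical):

```python
from typing import Optional


def release_status_for(acknowledged: Optional[bool]) -> str:
    """Map public_availability_acknowledgement onto a release status term."""
    if acknowledged is None:
        # Value left unset in the CSV.
        return "draft"
    return "public" if acknowledged else "private"
```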
…h 1 well
- Updated BDD tests to reflect changes in well inventory bulk upload logic, allowing the import of 1 well despite validation errors.
- Modified step definitions for more granular validation on imported well counts.
- Enhanced error message detail in responses for validation scenarios.
- Adjusted sample CSV files to match new import logic and validation schema updates.
- Refined service behavior to improve handling of validation errors and partial imports.
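The best-effort behavior (each row in its own savepoint, failed rows rolled back and reported, valid rows kept) can be illustrated with the standard-library `sqlite3` module and raw `SAVEPOINT` SQL. This is a stand-in for the service's SQLAlchemy code, where the comparable API is `Session.begin_nested()`; the `well` table, its single `name` column, and the error dict shape are hypothetical.

```python
import sqlite3
from typing import Iterable, Mapping


def import_wells(conn: sqlite3.Connection, rows: Iterable[Mapping[str, str]]):
    """Best-effort import: each row gets its own savepoint; bad rows are skipped."""
    imported = 0
    errors: list[dict] = []
    conn.execute("BEGIN")
    for row_number, row in enumerate(rows, start=1):
        conn.execute("SAVEPOINT row_import")
        try:
            name = (row.get("well_name_point_id") or "").strip()
            if not name:
                raise ValueError("well_name_point_id is required")
            conn.execute("INSERT INTO well (name) VALUES (?)", (name,))
        except (ValueError, sqlite3.Error) as exc:
            # Undo only this row's changes; earlier rows stay in the transaction.
            conn.execute("ROLLBACK TO SAVEPOINT row_import")
            errors.append({"row": row_number, "error": str(exc)})
        else:
            imported += 1
        finally:
            # ROLLBACK TO leaves the savepoint open, so release it in both paths.
            conn.execute("RELEASE SAVEPOINT row_import")
    conn.execute("COMMIT")
    return imported, errors
```

The connection should be opened with `isolation_level=None` so the `sqlite3` module does not issue its own implicit `BEGIN`; the explicit `BEGIN`/`COMMIT` then wrap all savepoints. With three rows where one is blank and one duplicates a primary key, one well is imported and two errors are reported, mirroring the "1 well is imported" scenarios.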
Pull request overview
This PR updates the well-inventory CSV ingestion pipeline to better align schema validation with lexicon-backed enums, persist previously missed fields (notes, monitoring/public availability, water level observations), and change the import behavior to “best-effort” so individual row failures don’t abort the entire upload.
Changes:
- Added row-level savepoints and improved error reporting so invalid rows are skipped while valid rows are persisted.
- Updated `WellInventoryRow` schema to relax/adjust requirements and introduce lexicon-backed enum parsing plus CSV field aliases.
- Updated BDD/unit tests and test CSV fixtures to reflect new validation rules and partial-success behavior.
Reviewed changes
Copilot reviewed 34 out of 34 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `services/well_inventory_csv.py` | Best-effort import via nested savepoints; mapping updates for release status, notes, monitoring status, and water level observation persistence |
| `schemas/well_inventory.py` | Schema optionality updates, lexicon enum coercion, contact and water level validation adjustments, and alias handling |
| `services/thing_helper.py` | Adds monitoring status history writes; adjusts commit/rollback behavior for outer-transaction support |
| `services/contact_helper.py` | Adjusts commit/rollback behavior for outer-transaction support |
| `cli/service_adapter.py` | Improves exit code and stderr reporting for partial success and validation failures |
| `tests/test_well_inventory.py` | Adds schema alias tests and a helper row builder |
| `tests/features/well-inventory-csv.feature` | Updates expected partial-success outcomes in negative scenarios |
| `tests/features/steps/*.py` | Updates step definitions for partial success and loosens validation-error matching |
| `tests/features/data/*.csv` | Refreshes fixture data to match new lexicon/validation expectations |
| `core/lexicon.json` | Adds a new `note_type` term |
… unique well name suffixes in well inventory scenarios
- Updated `pd.read_csv` calls with `keep_default_na=False` to retain empty values as-is.
- Refined suffix-addition logic to exclude empty and `-xxxx`-suffixed IDs.
- Improved test isolation by maintaining scenario-specific unique identifiers.
…nd `DataQuality`
- Changed `SampleMethodField` to validate against `SampleMethod` instead of `OriginType`.
- Changed `DataQualityField` to validate against `DataQuality` instead of `OriginType`.
… import
- Make `contact.role` and `contact.contact_type` nullable in the ORM and migrations.
- Update contact schemas and well inventory validation to accept missing values.
- Allow contact import when a name or organization is present without role/type.
- Stop round-tripping CSV fixtures through pandas to avoid rewriting structural test cases.
- Preserve repeated header rows and duplicate-column fixtures so importer validation is exercised correctly.
- Keep the blank contact name/organization scenario focused on a single invalid row for stable assertions.
…n errors
- Prevent one actual validation error from satisfying multiple expected assertions (avoids false positives).
- Keep validation matching order-independent while requiring distinct matches (preserves flexibility).
- Tighten BDD error checks without relying on exact error text (improves test precision).
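Requiring distinct matches while staying order-independent is a bipartite matching problem: a greedy first-fit pass can fail when a loose expectation claims an error that a stricter one needed. The sketch below uses a small augmenting-path search (Kuhn's algorithm); the helper names and the case-insensitive substring predicate are assumptions, not the PR's step code.

```python
from typing import Callable, Sequence


def loose_match(expected: str, actual: str) -> bool:
    # Case-insensitive substring check instead of exact error text.
    return expected.casefold() in actual.casefold()


def all_expected_matched(
    expected: Sequence[str],
    actual: Sequence[str],
    pred: Callable[[str, str], bool] = loose_match,
) -> bool:
    """True if every expected message pairs with a *distinct* actual error."""
    owner: dict[int, int] = {}  # actual index -> expected index using it

    def assign(e: int, seen: set[int]) -> bool:
        # Try each compatible actual error; if it is taken, try to move
        # its current owner to another error (augmenting path).
        for a, text in enumerate(actual):
            if a in seen or not pred(expected[e], text):
                continue
            seen.add(a)
            if a not in owner or assign(owner[a], seen):
                owner[a] = e
                return True
        return False

    return all(assign(e, set()) for e in range(len(expected)))
```

For example, with expectations `["name", "name is required"]` and errors `["name is required", "name is invalid"]`, first-fit would give both expectations' best candidate to `"name"` and fail, while the augmenting path reassigns `"name"` to `"name is invalid"` and succeeds.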
…behavior
- Update partial-success scenarios to expect valid rows to import alongside row-level validation errors.
- Reflect current importer behavior for invalid lexicon, invalid date, and repeated-header cases.
- Keep BDD coverage focused on user-visible import outcomes instead of outdated all-or-nothing assumptions.
…sitive parsing
- Update unit expectations to accept lowercase placeholder tokens that are now supported.
- Document normalization of mixed-case and spaced placeholder formats to uppercase prefixes.
- Keep test coverage aligned with importer behavior and reduce confusion around valid autogen inputs.
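A sketch of the placeholder normalization, under an assumed placeholder shape of an alphabetic prefix, a hyphen, and `xxxx` (the `-xxxx` ending appears in the fixture commits above). The regex, the function name, and the choice to keep the `xxxx` token lowercase are all assumptions; only the uppercasing of the prefix comes from the commit message.

```python
import re
from typing import Optional

# Assumed placeholder shape: letters, optional spaces around a hyphen, then
# four x's; case and surrounding whitespace may vary.
_PLACEHOLDER = re.compile(r"^\s*([A-Za-z]+)\s*-\s*x{4}\s*$", re.IGNORECASE)


def normalize_placeholder(value: str) -> Optional[str]:
    """Return the canonical placeholder (e.g. 'ABC-xxxx'), or None if not one."""
    match = _PLACEHOLDER.match(value)
    if match is None:
        return None
    return f"{match.group(1).upper()}-xxxx"
```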
…DataQuality`
- Adjust test data to reflect updated descriptions for the `sample_method` and `data_quality` fields.
…ization scenarios
- Add test to ensure contact creation returns None when both name and organization are missing.
- Add test to verify contact creation with organization only, ensuring proper dict structure.
- Update assertions for comprehensive validation of contact fields.
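The two contact scenarios above (no contact when both name and organization are missing; a contact dict when only an organization is given) can be sketched as follows. The `contact_name` and `contact_organization` columns come from the PR description; `contact_role`, `contact_type`, the output keys, and the function name are hypothetical.

```python
from typing import Any, Mapping, Optional


def build_contact(row: Mapping[str, Any]) -> Optional[dict]:
    """Return a contact dict, or None when nothing identifies a contact."""

    def clean(key: str) -> Any:
        # Strip strings and turn empty values into None.
        value = row.get(key)
        if isinstance(value, str):
            value = value.strip()
        return value or None

    name = clean("contact_name")
    organization = clean("contact_organization")
    if name is None and organization is None:
        return None
    return {
        "name": name,
        "organization": organization,
        # Role and type are nullable after this PR, so blanks stay None.
        "role": clean("contact_role"),
        "contact_type": clean("contact_type"),
    }
```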
@jirhiker The CLI command tests, which were recently updated, are failing on my branch. Since those changes don't appear to affect well inventory ingestion, should I resolve the test failures within this PR, or would you prefer I handle them in a separate one before re-opening this for review?
Why
This PR addresses the following problem / context:
`public_availability_acknowledgement`, `monitoring_status`, well/water notes, and water level observations were not being persisted correctly.
How
Implementation summary - the following was changed / added / removed:
- `public_availability_acknowledgement` now maps to `Location.release_status` (True → public, False → private, unset → draft).
- `monitoring_status` is now written to the `StatusHistory` table.
- `well_notes` and `water_notes` are now stored as polymorphic notes on the `Thing`.
- `depth_to_water_ft` now auto-creates a `Sample` and `Observation` linked to the field activity.
- `WellInventoryRow` schema: `site_name`, `elevation_ft`, `elevation_method`, and `measuring_point_height_ft` are now optional.
- Applied lexicon-backed enum validation to `elevation_method`, `depth_source`, `well_pump_type`, `monitoring_status`, `sample_method`, and `data_quality`.
- Added `flexible_lexicon_validator` for case-insensitive, whitespace-tolerant matching.
- Contacts now need only `contact_name` or `contact_organization`, and `water_level_date_time` is now required only when `depth_to_water_ft` is present.
- Invalid rows are skipped, and their errors are reported in the `validation_errors` list.
- Fixed an `UnboundLocalError` caused by auto-generated well IDs.
- … "Database error" fields.
- Added `commit=False` support so that `services/thing_helper.py` and `services/contact_helper.py` can participate in the outer best-effort transaction without prematurely committing.
- Updated fixtures in `tests/features/data/` to use valid lexicon terms and properly quoted comma-containing values.
- Updated `Given` steps to isolate tests and prevent primary key conflicts.
- Updated `well-inventory-csv.feature` to assert partial success (e.g., "1 well is imported") in negative scenarios. All 44 scenarios now pass.

Notes
Any special considerations, workarounds, or follow-up work to note?