feat: comprehensive input-side incident data validation (#305) #364
Open
dhanasai2 wants to merge 2 commits into fireform-core:main
Fixes fireform-core#305

This commit adds a validation layer that catches incomplete or malformed incident data before PDF generation begins. This prevents downstream failures and ensures data integrity in the form-filling pipeline.

## New Files

- src/validator.py: Comprehensive validation module with:
  - IncidentValidator class for configurable validation
  - validate_incident() convenience function
  - validate_incident_strict() for detailed error info
  - ValidationError and ValidationResult dataclasses
  - Support for custom required fields per agency
- tests/test_validator.py: 25+ test cases covering:
  - Valid input handling
  - Missing required fields
  - Empty/whitespace-only fields
  - LLM's "-1" not-found marker
  - Type validation (dict required)
  - List field validation (plural values)
  - Edge cases (unicode, special chars, long values)

## Modified Files

- src/filler.py:
  - Added FormValidationError exception class
  - Integrated validation before PDF filling
  - Added skip_validation option for backward compatibility
  - Added required_fields parameter
- api/errors/base.py:
  - Added ValidationError class (HTTP 422)
- api/errors/handlers.py:
  - Added handler for ValidationError
  - Added handler for FormValidationError
  - Returns detailed validation errors in response

## API Response Format

Failed validation returns HTTP 422 with:

    {
      "error": "Incident data validation failed",
      "validation_errors": ["Field 'location' is missing", ...],
      "extracted_data": {...}
    }

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This comprehensive PR implements validation at multiple stages:
1. TranscriptValidator - Validates raw user input before LLM processing
- Type checking for string input
- Length validation (10-50,000 characters)
- Content validation with incident keyword detection
- Early error feedback to prevent wasted API calls
2. IncidentValidator - Validates extracted incident data before PDF generation
- Required field presence checking
- Empty, null, and whitespace-only value detection
- LLM not-found indicator ('-1') handling
- List/array value validation for plural fields
- Optional format validation with warnings
3. Pipeline Integration
- FileManipulator.fill_form() validates inputs early
- Filler.fill_form() validates extracted data before PDF generation
- API routes handle validation errors with clear responses
- All errors include field name, message, type, and value
4. Comprehensive Testing
- 47+ test cases covering all validation scenarios
- Type validation, required fields, empty values
- Whitespace handling, unicode support
- LLM '-1' indicator handling
- Edge cases and boundary conditions
- All tests passing
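The transcript checks listed under item 1 could look roughly like the sketch below. It is not the PR's code: the function name `validate_transcript`, the keyword list, the error wording, and the choice to strip whitespace before measuring length are all assumptions for illustration. Only the 10 to 50,000 character bounds come from the description above.

```python
from typing import List

MIN_LENGTH = 10        # lower bound stated in the commit message
MAX_LENGTH = 50_000    # upper bound stated in the commit message
# Hypothetical keyword list; the PR's real keywords are not shown here.
INCIDENT_KEYWORDS = ("fire", "incident", "injury", "alarm", "emergency")


def validate_transcript(transcript: object) -> List[str]:
    """Return a list of problems; an empty list means the transcript looks usable."""
    # Type check first: nothing else makes sense on a non-string.
    if not isinstance(transcript, str):
        return [f"Transcript must be a string, got {type(transcript).__name__}"]

    errors: List[str] = []
    text = transcript.strip()  # assumption: surrounding whitespace is ignored

    # Length validation.
    if len(text) < MIN_LENGTH:
        errors.append(f"Transcript is too short ({len(text)} < {MIN_LENGTH} characters)")
    elif len(text) > MAX_LENGTH:
        errors.append(f"Transcript is too long ({len(text)} > {MAX_LENGTH} characters)")

    # Content validation: require at least one incident-related keyword.
    lowered = text.lower()
    if not any(keyword in lowered for keyword in INCIDENT_KEYWORDS):
        errors.append("Transcript does not mention any incident keyword")

    return errors
```

Running these checks before the LLM call is what gives the early error feedback: a rejected transcript never reaches the extraction step.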
Files changed:
- src/validator.py (NEW): 400+ lines of validation logic
- src/file_manipulator.py: Added transcript & template field validation
- api/routes/forms.py: Added validation error handling
- tests/test_validator.py: Added 47+ comprehensive test cases
Benefits:
- Prevents wasted API calls on invalid transcripts
- Catches data issues before expensive PDF generation
- Provides clear, actionable error messages
- Zero overhead for valid data
- Fully backward compatible
- Configurable for different agencies
Fixes fireform-core#305
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull Request: Input-Side Incident Data Validation (#305)
Overview
This PR implements comprehensive input-side validation for incident data, catching incomplete or malformed data early in the pipeline before expensive PDF generation operations.
Issue: #305
What's the Problem?
FireForm was missing critical validation at the input stage.
This resulted in wasted resources, unclear errors, and poor user experience.
The Solution
A comprehensive multi-stage validation system:
TranscriptValidator
IncidentValidator
Pipeline Integration
- FileManipulator.fill_form(): validates inputs early
- Filler.fill_form(): validates extracted data
- API routes: validation with error handling
Changes Summary
Testing
Usage
Simple interface:
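A minimal sketch of what calling the simple interface might look like, assuming `validate_incident()` takes the extracted dict and returns a list of error strings. The body shown here is a stand-in written for illustration, not the code in `src/validator.py`; the default required fields and the "is empty" wording are assumptions, while the "-1" marker and the "is missing" message follow the PR text.

```python
from typing import Any, Iterable, List

# Hypothetical defaults; the PR lets each agency supply its own required fields.
DEFAULT_REQUIRED_FIELDS = ("location", "date")
NOT_FOUND_MARKER = "-1"  # the LLM's not-found marker mentioned in the PR


def _is_blank(value: Any) -> bool:
    """True for None, empty or whitespace-only strings, '-1', and lists of only such values."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() in ("", NOT_FOUND_MARKER)
    if isinstance(value, (list, tuple)):
        return len(value) == 0 or all(_is_blank(item) for item in value)
    return False


def validate_incident(
    data: Any, required_fields: Iterable[str] = DEFAULT_REQUIRED_FIELDS
) -> List[str]:
    """Return human-readable error strings; an empty list means the data passed."""
    if not isinstance(data, dict):
        return [f"Incident data must be a dict, got {type(data).__name__}"]
    errors: List[str] = []
    for name in required_fields:
        if name not in data:
            errors.append(f"Field '{name}' is missing")
        elif _is_blank(data[name]):
            errors.append(f"Field '{name}' is empty")
    return errors


# Usage: the date came back from the LLM as the not-found marker.
print(validate_incident({"location": "12 Main St", "date": "-1"}))
# ["Field 'date' is empty"]
```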
Strict interface:
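A sketch of the strict interface under stated assumptions: `validate_incident_strict()` returns a `ValidationResult` holding `ValidationError` entries that carry the field name, message, type, and value, as the PR describes. The attribute names, the `error_type` labels, the default required field, and the `to_response_body()` helper are hypothetical; the response keys mirror the HTTP 422 body documented in the first commit.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List


@dataclass
class ValidationError:
    field: str        # name of the offending field
    message: str      # human-readable description
    error_type: str   # hypothetical labels: "type", "missing", "empty"
    value: Any = None


@dataclass
class ValidationResult:
    errors: List[ValidationError]

    @property
    def is_valid(self) -> bool:
        return not self.errors


def validate_incident_strict(
    data: Any, required_fields: Iterable[str] = ("location",)
) -> ValidationResult:
    """Like the simple interface, but returns structured error objects."""
    if not isinstance(data, dict):
        return ValidationResult(
            [ValidationError("<root>", "Incident data must be a dict", "type", data)]
        )
    errors: List[ValidationError] = []
    for name in required_fields:
        if name not in data:
            errors.append(ValidationError(name, f"Field '{name}' is missing", "missing"))
            continue
        value = data[name]
        # Treat None, blank strings, and the LLM's "-1" marker as empty.
        if value is None or (isinstance(value, str) and value.strip() in ("", "-1")):
            errors.append(ValidationError(name, f"Field '{name}' is empty", "empty", value))
    return ValidationResult(errors)


def to_response_body(result: ValidationResult, extracted_data: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: build the 422 body in the shape the PR documents."""
    return {
        "error": "Incident data validation failed",
        "validation_errors": [err.message for err in result.errors],
        "extracted_data": extracted_data,
    }
```

A caller would check `result.is_valid` and, on failure, pass the result to an error handler that returns the body above with status 422.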
Benefits
✓ Prevent wasted API calls on invalid transcripts
✓ Catch data issues before PDF generation
✓ Clear, actionable error messages
✓ Flexible & configurable per agency
✓ Zero overhead for valid data
✓ Backward compatible
Fixes #305