feat: comprehensive input-side incident data validation (#305) #364

Open
dhanasai2 wants to merge 2 commits into fireform-core:main from dhanasai2:feat/input-validation-gate
@dhanasai2
Pull Request: Input-Side Incident Data Validation (#305)

Overview

This PR implements comprehensive input-side validation for incident data, catching incomplete or malformed data early in the pipeline before expensive PDF generation operations.

Issue: #305


What's the Problem?

FireForm had no validation at three points in the pipeline:

  1. Transcript Input: No validation before LLM processing
  2. Extracted Data: No validation before PDF generation
  3. Template Configuration: No validation of field setup

As a result, invalid input consumed LLM calls and PDF generation time before failing, and the errors users eventually saw were unclear.


The Solution

A comprehensive multi-stage validation system:

TranscriptValidator

  • Type checking (must be string)
  • Length validation (10-50,000 chars)
  • Content validation (detects incident keywords)
  • Early error feedback before LLM calls
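
The checks listed above can be sketched roughly as follows. This is a minimal illustration, not the code in src/validator.py: the function name `check_transcript`, the keyword set, and the choice to measure length after stripping whitespace are all assumptions; only the 10 and 50,000 character bounds come from this description.

```python
from typing import Any, List

# The bounds are taken from the PR text; the keyword set is purely illustrative.
MIN_LEN = 10
MAX_LEN = 50_000
INCIDENT_KEYWORDS = {"fire", "incident", "injury", "alarm", "smoke", "responded"}


def check_transcript(transcript: Any) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    # Type check first: nothing else is meaningful for a non-string.
    if not isinstance(transcript, str):
        return [f"Transcript must be a string, got {type(transcript).__name__}"]

    errors: List[str] = []
    text = transcript.strip()  # assumption: surrounding whitespace is ignored

    # Length check against the documented bounds (treated as inclusive here).
    if len(text) < MIN_LEN:
        errors.append(f"Transcript is too short ({len(text)} < {MIN_LEN} chars)")
    elif len(text) > MAX_LEN:
        errors.append(f"Transcript is too long ({len(text)} > {MAX_LEN} chars)")

    # Content check: at least one keyword must appear as a whole word.
    words = {w.strip(".,;:!?").lower() for w in text.split()}
    if text and not words & INCIDENT_KEYWORDS:
        errors.append("Transcript does not mention any incident keywords")
    return errors
```

Returning a list rather than raising lets the caller report every problem at once, which matches the `errors = validate_transcript(...)` style shown under Usage.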

IncidentValidator

  • Required field presence checking
  • Empty/null/whitespace detection
  • LLM not-found indicator handling ("-1")
  • List value validation
  • Optional format validation
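
A rough sketch of the required-field checks listed above, under stated assumptions: `check_incident` and `_is_blank` are hypothetical names, the error wording only loosely follows the sample API response, and a list field is assumed to be valid when at least one entry has content. The real IncidentValidator may differ on all of these.

```python
from typing import Any, Iterable, List

NOT_FOUND = "-1"  # marker the PR says the LLM emits when a value is not found


def _is_blank(value: Any) -> bool:
    """True for None, empty or whitespace-only strings, and the "-1" marker."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() in ("", NOT_FOUND)
    return False


def check_incident(data: Any, required: Iterable[str]) -> List[str]:
    """Return one message per failing required field; empty list means valid."""
    if not isinstance(data, dict):
        return [f"Incident data must be a dict, got {type(data).__name__}"]

    errors: List[str] = []
    for field in required:
        if field not in data:
            errors.append(f"Field '{field}' is missing")
            continue
        value = data[field]
        if isinstance(value, (list, tuple)):
            # Assumption: a list is usable if any entry carries content.
            # An empty list fails because all() over nothing is True.
            if all(_is_blank(v) for v in value):
                errors.append(f"Field '{field}' has no usable values")
        elif _is_blank(value):
            errors.append(f"Field '{field}' is empty or not found")
    return errors
```

Passing `required` as an argument is one way to support different required fields per agency without changing the checker itself.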

Pipeline Integration

  • FileManipulator.fill_form() - Validates inputs early
  • Filler.fill_form() - Validates extracted data
  • API routes - Validation with error handling
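
To illustrate the ordering of the two gates, here is a small sketch with the pipeline stages passed in as callables. `run_pipeline` and `GateError` are hypothetical names made up for this example; the actual integration lives in FileManipulator.fill_form() and Filler.fill_form() and is not reproduced here.

```python
from typing import Callable, Dict, List


class GateError(Exception):
    """Hypothetical exception carrying the failing stage and its error list."""

    def __init__(self, stage: str, errors: List[str]):
        super().__init__(f"{stage} validation failed: {errors}")
        self.stage = stage
        self.errors = errors


def run_pipeline(
    transcript: str,
    validate_input: Callable[[str], List[str]],
    extract: Callable[[str], Dict[str, str]],
    validate_extracted: Callable[[Dict[str, str]], List[str]],
    fill_pdf: Callable[[Dict[str, str]], str],
) -> str:
    # Gate 1: cheap check before the expensive extraction (LLM) step.
    errors = validate_input(transcript)
    if errors:
        raise GateError("transcript", errors)

    data = extract(transcript)

    # Gate 2: check the extracted fields before PDF generation.
    errors = validate_extracted(data)
    if errors:
        raise GateError("incident", errors)

    return fill_pdf(data)
```

The point of the ordering is that a failure at gate 1 never reaches `extract`, and a failure at gate 2 never reaches `fill_pdf`.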

Changes Summary

| File | Changes |
| --- | --- |
| src/validator.py | New: 400+ lines of validation logic |
| src/file_manipulator.py | Enhanced: Transcript & field validation |
| api/routes/forms.py | Enhanced: Validation in endpoint |
| tests/test_validator.py | New: 47+ test cases |

Testing

  • 47+ comprehensive test cases
  • Type validation, required fields, empty values
  • Whitespace handling, LLM "-1" indicator
  • List/array validation, unicode support
  • Edge cases and boundary conditions
  • All tests passing
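
As an illustration of what "boundary conditions" can mean for the transcript length rule, here is a table-driven sketch. It is not taken from tests/test_validator.py: `length_ok` is a stand-in for the length rule only, and treating both bounds as inclusive is an assumption.

```python
from typing import List, Tuple

MIN_LEN, MAX_LEN = 10, 50_000  # bounds quoted in the PR description


def length_ok(text: str) -> bool:
    """Stand-in for the length rule only; not the PR's validator."""
    return MIN_LEN <= len(text) <= MAX_LEN


# (label, input, expected) triples pinned to each side of both limits.
BOUNDARY_CASES: List[Tuple[str, str, bool]] = [
    ("one below minimum", "a" * 9, False),
    ("exactly minimum", "a" * 10, True),
    ("exactly maximum", "a" * 50_000, True),
    ("one above maximum", "a" * 50_001, False),
    # len() counts code points, so ten accented letters still meet the minimum.
    ("unicode counts code points", "\u00e9" * 10, True),
]


def run_cases() -> List[str]:
    """Return the labels of failing cases; an empty list means all passed."""
    return [
        label
        for label, text, expected in BOUNDARY_CASES
        if length_ok(text) != expected
    ]
```

Keeping the cases in a table makes it easy to port them to `pytest.mark.parametrize` if the project uses pytest.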

Usage

Simple interface:

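from src.filler import FormValidationError
from src.validator import validate_incident, validate_transcript

# Inside a request handler: reject a bad transcript before any LLM call
errors = validate_transcript(user_input)
if errors:
    return {"error": "Invalid input", "details": errors}

# After extraction: reject incomplete data before PDF generation
errors = validate_incident(extracted_data)
if errors:
    raise FormValidationError(errors)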

Strict interface:

from src.validator import validate_incident_strict

result = validate_incident_strict(data)
if not result.is_valid:
    result.raise_if_invalid()  # Raises ValidationException
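
For readers unfamiliar with the result-object pattern used by the strict interface, a minimal sketch follows. The class names `Result` and `FieldError` and their fields are hypothetical; the PR's own ValidationResult dataclass may carry more information (the commit message mentions field name, message, type, and value).

```python
from dataclasses import dataclass, field
from typing import List


class ValidationException(Exception):
    """Hypothetical stand-in for the exception named in the usage comment."""


@dataclass
class FieldError:
    field_name: str
    message: str


@dataclass
class Result:
    errors: List[FieldError] = field(default_factory=list)

    @property
    def is_valid(self) -> bool:
        # Valid exactly when no errors were recorded.
        return not self.errors

    def raise_if_invalid(self) -> None:
        # No-op for a valid result, so callers may skip the is_valid guard.
        if self.errors:
            summary = "; ".join(
                f"{e.field_name}: {e.message}" for e in self.errors
            )
            raise ValidationException(summary)
```

In this sketch `raise_if_invalid()` does nothing for a valid result, which would make the explicit `if not result.is_valid` guard optional; whether the PR's implementation behaves the same way is not stated here.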

Benefits

✓ Prevent wasted API calls on invalid transcripts
✓ Catch data issues before PDF generation
✓ Clear, actionable error messages
✓ Flexible & configurable per agency
✓ Negligible overhead for valid data
✓ Backward compatible


Fixes #305

haridammu and others added 2 commits March 28, 2026 14:33
Fixes fireform-core#305

This commit adds a validation layer that catches incomplete or malformed
incident data before PDF generation begins. This prevents downstream
failures and ensures data integrity in the form-filling pipeline.

## New Files

- src/validator.py: Comprehensive validation module with:
  - IncidentValidator class for configurable validation
  - validate_incident() convenience function
  - validate_incident_strict() for detailed error info
  - ValidationError and ValidationResult dataclasses
  - Support for custom required fields per agency

- tests/test_validator.py: 25+ test cases covering:
  - Valid input handling
  - Missing required fields
  - Empty/whitespace-only fields
  - LLM's "-1" not-found marker
  - Type validation (dict required)
  - List field validation (plural values)
  - Edge cases (unicode, special chars, long values)

## Modified Files

- src/filler.py:
  - Added FormValidationError exception class
  - Integrated validation before PDF filling
  - Added skip_validation option for backward compatibility
  - Added required_fields parameter

- api/errors/base.py:
  - Added ValidationError class (HTTP 422)

- api/errors/handlers.py:
  - Added handler for ValidationError
  - Added handler for FormValidationError
  - Returns detailed validation errors in response

## API Response Format

Failed validation returns HTTP 422 with:
{
  "error": "Incident data validation failed",
  "validation_errors": ["Field 'location' is missing", ...],
  "extracted_data": {...}
}
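
A framework-agnostic sketch of assembling that payload is shown below. The helper name `validation_failure_response` is hypothetical, and how the handlers in api/errors/handlers.py actually build the response is not shown in this PR text; only the three keys and the 422 status come from the format above.

```python
from typing import Any, Dict, List, Tuple


def validation_failure_response(
    errors: List[str], extracted: Dict[str, Any]
) -> Tuple[int, Dict[str, Any]]:
    """Return (status_code, body) for a failed incident validation."""
    body = {
        "error": "Incident data validation failed",
        # Copies keep the response independent of later mutation by the caller.
        "validation_errors": list(errors),
        "extracted_data": dict(extracted),
    }
    return 422, body
```

Echoing `extracted_data` back lets the client show the user which values were extracted alongside the fields that failed.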

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

This comprehensive PR implements validation at multiple stages:

1. TranscriptValidator - Validates raw user input before LLM processing
   - Type checking for string input
   - Length validation (10-50,000 characters)
   - Content validation with incident keyword detection
   - Early error feedback to prevent wasted API calls

2. IncidentValidator - Validates extracted incident data before PDF generation
   - Required field presence checking
   - Empty, null, and whitespace-only value detection
   - LLM not-found indicator ('-1') handling
   - List/array value validation for plural fields
   - Optional format validation with warnings

3. Pipeline Integration
   - FileManipulator.fill_form() validates inputs early
   - Filler.fill_form() validates extracted data before PDF generation
   - API routes handle validation errors with clear responses
   - All errors include field name, message, type, and value

4. Comprehensive Testing
   - 47+ test cases covering all validation scenarios
   - Type validation, required fields, empty values
   - Whitespace handling, unicode support
   - LLM '-1' indicator handling
   - Edge cases and boundary conditions
   - All tests passing

Files changed:
- src/validator.py (NEW): 400+ lines of validation logic
- src/file_manipulator.py: Added transcript & template field validation
- api/routes/forms.py: Added validation error handling
- tests/test_validator.py: Added 47+ comprehensive test cases

Benefits:
- Prevents wasted API calls on invalid transcripts
- Catches data issues before expensive PDF generation
- Provides clear, actionable error messages
- Negligible overhead for valid data
- Fully backward compatible
- Configurable for different agencies

Fixes fireform-core#305

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

[FEAT]: Add input-side incident data validation gate

2 participants