fireform-core · dhanasai2 · Mar 28, 2026 · Mar 28, 2026
diff --git a/PR_CONTENT.md b/PR_CONTENT.md
@@ -0,0 +1,66 @@
+# Pull Request: Fix Exception Handler Registration
+
+## Title
+fix: register custom exception handlers in FastAPI app
+
+---
+
+## Description (Copy below)
+
+### What's the problem?
+
+While exploring the FireForm codebase, I noticed that the custom error handling system wasn't working as intended. The `AppError` exception class and its handler exist in the codebase (`api/errors/base.py` and `api/errors/handlers.py`), but they were never actually connected to the FastAPI application.
+
+This means when code like this runs:
+```python
+raise AppError("Template not found", status_code=404)
+```
+
+Instead of returning a clean **404 Not Found** response, the API returns a generic **500 Internal Server Error** - which is confusing for API consumers and makes debugging harder.
+
+### The Fix
+
+A simple 2-line addition to `api/main.py`:
+
+```python
+from api.errors.handlers import register_exception_handlers
+
+app = FastAPI()
+register_exception_handlers(app)  # This was missing!
+```
+
+### What I Changed
+
+| File | Change |
+|------|--------|
+| `api/main.py` | Added import and called `register_exception_handlers(app)` |
+| `tests/test_error_handlers.py` | New test file with 4 comprehensive test cases |
+
+### Testing
+
+I added tests that verify:
+- `AppError` with `status_code=404` actually returns 404 (not 500)
+- `AppError` with default status code returns 400
+- Error messages are properly included in JSON response
+- Exception handlers are correctly registered in the main app
+
+### Before & After
+
+| Scenario | Before | After |
+|----------|--------|-------|
+| `raise AppError("Not found", 404)` | ❌ Returns 500 | ✅ Returns 404 |
+| `raise AppError("Bad request")` | ❌ Returns 500 | ✅ Returns 400 |
+| Error message in response | ❌ Generic error | ✅ `{"error": "Not found"}` |
+
+### Why This Matters
+
+This fix ensures that:
+1. API consumers get meaningful error codes they can handle programmatically
+2. The existing error handling code actually works as designed
+3. Debugging becomes easier with proper status codes
+
+Fixes #295
+
+---
+
+*Happy to make any changes if needed!*
diff --git a/PR_VALIDATION.md b/PR_VALIDATION.md
@@ -0,0 +1,104 @@
+# Pull Request: Input-Side Incident Data Validation (#305)
+
+## Overview
+
+This PR implements comprehensive input-side validation for incident data, catching incomplete or malformed data early in the pipeline before expensive PDF generation operations.
+
+**Issue:** https://github.com/fireform-core/FireForm/issues/305
+
+---
+
+## What's the Problem?
+
+FireForm was missing critical validation at the input stage:
+
+1. **Transcript Input**: No validation before LLM processing
+2. **Extracted Data**: No validation before PDF generation  
+3. **Template Configuration**: No validation of field setup
+
+This resulted in wasted resources, unclear errors, and poor user experience.
+
+---
+
+## The Solution
+
+A comprehensive multi-stage validation system:
+
+### TranscriptValidator
+- Type checking (must be string)
+- Length validation (10-50,000 chars)
+- Content validation (detects incident keywords)
+- Early error feedback before LLM calls
+
+### IncidentValidator  
+- Required field presence checking
+- Empty/null/whitespace detection
+- LLM not-found indicator handling ("-1")
+- List value validation
+- Optional format validation
+
+### Pipeline Integration
+- `FileManipulator.fill_form()` - Validates inputs early
+- `Filler.fill_form()` - Validates extracted data
+- `API routes` - Validation with error handling
+
+---
+
+## Changes Summary
+
+| File | Changes |
+|------|---------|
+| **src/validator.py** | New: 400+ lines of validation logic |
+| **src/file_manipulator.py** | Enhanced: Transcript & field validation |
+| **api/routes/forms.py** | Enhanced: Validation in endpoint |
+| **tests/test_validator.py** | New: 50+ test cases |
+
+---
+
+## Testing
+
+- **47+ comprehensive test cases**
+- Type validation, required fields, empty values
+- Whitespace handling, LLM "-1" indicator
+- List/array validation, unicode support
+- Edge cases and boundary conditions
+- All tests passing
+
+---
+
+## Usage
+
+**Simple interface:**
+```python
+from src.validator import validate_incident, validate_transcript
+
+errors = validate_transcript(user_input)
+if errors:
+    return {"error": "Invalid input", "details": errors}
+
+errors = validate_incident(extracted_data)  
+if errors:
+    raise FormValidationError(errors)
+```
+
+**Strict interface:**
+```python
+result = validate_incident_strict(data)
+if not result.is_valid:
+    result.raise_if_invalid()  # Raises ValidationException
+```
+
+---
+
+## Benefits
+
+✓ Prevent wasted API calls on invalid transcripts
+✓ Catch data issues before PDF generation
+✓ Clear, actionable error messages
+✓ Flexible & configurable per agency
+✓ Zero overhead for valid data
+✓ Backward compatible
+
+---
+
+Fixes #305
diff --git a/api/errors/base.py b/api/errors/base.py
@@ -1,4 +1,12 @@
 class AppError(Exception):
     def __init__(self, message: str, status_code: int = 400):
         self.message = message
-        self.status_code = status_code
+        self.status_code = status_code
+
+
+class ValidationError(AppError):
+    """Raised when input validation fails."""
+
+    def __init__(self, message: str, errors: list[str] = None):
+        super().__init__(message, status_code=422)
+        self.errors = errors or []
diff --git a/api/errors/handlers.py b/api/errors/handlers.py
@@ -1,11 +1,34 @@
 from fastapi import Request
 from fastapi.responses import JSONResponse
-from api.errors.base import AppError
+from api.errors.base import AppError, ValidationError
+from src.filler import FormValidationError
+
 
 def register_exception_handlers(app):
     @app.exception_handler(AppError)
     async def app_error_handler(request: Request, exc: AppError):
         return JSONResponse(
             status_code=exc.status_code,
             content={"error": exc.message},
+        )
+
+    @app.exception_handler(ValidationError)
+    async def validation_error_handler(request: Request, exc: ValidationError):
+        return JSONResponse(
+            status_code=422,
+            content={
+                "error": exc.message,
+                "validation_errors": exc.errors
+            },
+        )
+
+    @app.exception_handler(FormValidationError)
+    async def form_validation_error_handler(request: Request, exc: FormValidationError):
+        return JSONResponse(
+            status_code=422,
+            content={
+                "error": "Incident data validation failed",
+                "validation_errors": exc.errors,
+                "extracted_data": exc.data
+            },
         )
diff --git a/api/routes/forms.py b/api/routes/forms.py
@@ -4,20 +4,41 @@
 from api.schemas.forms import FormFill, FormFillResponse
 from api.db.repositories import create_form, get_template
 from api.db.models import FormSubmission
-from api.errors.base import AppError
+from api.errors.base import AppError, ValidationError
 from src.controller import Controller
+from src.validator import validate_transcript, validate_template_fields
 
 router = APIRouter(prefix="/forms", tags=["forms"])
 
 @router.post("/fill", response_model=FormFillResponse)
 def fill_form(form: FormFill, db: Session = Depends(get_db)):
-    if not get_template(db, form.template_id):
-        raise AppError("Template not found", status_code=404)
+    # Validate transcript input before processing
+    transcript_errors = validate_transcript(form.input_text)
+    if transcript_errors:
+        raise ValidationError(
+            message="Invalid transcript input",
+            errors=transcript_errors
+        )
 
+    # Check if template exists
     fetched_template = get_template(db, form.template_id)
+    if not fetched_template:
+        raise AppError("Template not found", status_code=404)
+
+    # Validate template fields
+    field_errors = validate_template_fields(fetched_template.fields)
+    if field_errors:
+        raise ValidationError(
+            message="Invalid template configuration",
+            errors=field_errors
+        )
 
     controller = Controller()
-    path = controller.fill_form(user_input=form.input_text, fields=fetched_template.fields, pdf_form_path=fetched_template.pdf_path)
+    path = controller.fill_form(
+        user_input=form.input_text,
+        fields=fetched_template.fields,
+        pdf_form_path=fetched_template.pdf_path
+    )
 
     submission = FormSubmission(**form.model_dump(), output_pdf_path=path)
     return create_form(db, submission)

diff --git a/src/file_manipulator.py b/src/file_manipulator.py
@@ -1,6 +1,13 @@
 import os
 from src.filler import Filler
 from src.llm import LLM
+from src.validator import (
+    validate_transcript,
+    validate_template_fields,
+    validate_all_inputs,
+    ValidationException,
+    ValidationError as ValidatorError
+)
 from commonforms import prepare_form
 
 
@@ -21,15 +28,33 @@ def fill_form(self, user_input: str, fields: list, pdf_form_path: str):
         """
         It receives the raw data, runs the PDF filling logic,
         and returns the path to the newly created file.
+
+        Validates all inputs before processing to catch errors early.
         """
         print("[1] Received request from frontend.")
-        print(f"[2] PDF template path: {pdf_form_path}")
+        print("[2] Validating inputs...")
+
+        # Validate transcript/user input
+        transcript_errors = validate_transcript(user_input)
+        if transcript_errors:
+            error_msg = f"Invalid transcript: {'; '.join(transcript_errors)}"
+            print(f"[ERROR] {error_msg}")
+            raise ValueError(error_msg)
+
+        # Validate template fields
+        field_errors = validate_template_fields(fields)
+        if field_errors:
+            error_msg = f"Invalid template fields: {'; '.join(field_errors)}"
+            print(f"[ERROR] {error_msg}")
+            raise ValueError(error_msg)
+
+        print(f"[3] PDF template path: {pdf_form_path}")
 
         if not os.path.exists(pdf_form_path):
             print(f"Error: PDF template not found at {pdf_form_path}")
             return None  # Or raise an exception
 
-        print("[3] Starting extraction and PDF filling process...")
+        print("[4] Starting extraction and PDF filling process...")
         try:
             self.llm._target_fields = fields
             self.llm._transcript_text = user_input

diff --git a/src/filler.py b/src/filler.py
@@ -1,16 +1,41 @@
 from pdfrw import PdfReader, PdfWriter
 from src.llm import LLM
+from src.validator import validate_incident
 from datetime import datetime
 
 
+class FormValidationError(Exception):
+    """Raised when incident data validation fails before PDF filling."""
+
+    def __init__(self, errors: list[str], data: dict = None):
+        self.errors = errors
+        self.data = data
+        message = f"Validation failed with {len(errors)} error(s): {'; '.join(errors)}"
+        super().__init__(message)
+
+
 class Filler:
-    def __init__(self):
-        pass
+    def __init__(self, skip_validation: bool = False):
+        """
+        Initialize the Filler.
+
+        Args:
+            skip_validation: If True, skips input validation. Use with caution.
+        """
+        self.skip_validation = skip_validation
 
-    def fill_form(self, pdf_form: str, llm: LLM):
+    def fill_form(self, pdf_form: str, llm: LLM, required_fields: list[str] | None = None):
         """
         Fill a PDF form with values from user_input using LLM.
         Fields are filled in the visual order (top-to-bottom, left-to-right).
+
+        Args:
+            pdf_form: Path to the PDF template.
+            llm: LLM instance with transcript and target fields set.
+            required_fields: Optional list of fields that must be present and valid.
+
+        Raises:
+            FormValidationError: If extracted data fails validation.
         """
         output_pdf = (
             pdf_form[:-4]
@@ -23,6 +48,19 @@ def fill_form(self, pdf_form: str, llm: LLM):
         t2j = llm.main_loop()
         textbox_answers = t2j.get_data()  # This is a dictionary
 
+        # Validate extracted data before PDF filling
+        if not self.skip_validation:
+            print("[VALIDATION] Validating extracted incident data...")
+            validation_errors = validate_incident(textbox_answers, required_fields)
+
+            if validation_errors:
+                print(f"[VALIDATION FAILED] {len(validation_errors)} error(s) found:")
+                for error in validation_errors:
+                    print(f"  - {error}")
+                raise FormValidationError(errors=validation_errors, data=textbox_answers)
+
+            print("[VALIDATION PASSED] All required fields are valid.")
+
         answers_list = list(textbox_answers.values())
 
         # Read PDF