Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
66 changes: 66 additions & 0 deletions PR_CONTENT.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
# Pull Request: Fix Exception Handler Registration

## Title
fix: register custom exception handlers in FastAPI app

---

## Description (Copy below)

### What's the problem?

While exploring the FireForm codebase, I noticed that the custom error handling system wasn't working as intended. The `AppError` exception class and its handler exist in the codebase (`api/errors/base.py` and `api/errors/handlers.py`), but they were never actually connected to the FastAPI application.

This means when code like this runs:
```python
raise AppError("Template not found", status_code=404)
```

Instead of returning a clean **404 Not Found** response, the API returns a generic **500 Internal Server Error** - which is confusing for API consumers and makes debugging harder.

### The Fix

A simple 2-line addition to `api/main.py`:

```python
from api.errors.handlers import register_exception_handlers

app = FastAPI()
register_exception_handlers(app) # This was missing!
```

### What I Changed

| File | Change |
|------|--------|
| `api/main.py` | Added import and called `register_exception_handlers(app)` |
| `tests/test_error_handlers.py` | New test file with 4 comprehensive test cases |

### Testing

I added tests that verify:
- `AppError` with `status_code=404` actually returns 404 (not 500)
- `AppError` with default status code returns 400
- Error messages are properly included in JSON response
- Exception handlers are correctly registered in the main app

### Before & After

| Scenario | Before | After |
|----------|--------|-------|
| `raise AppError("Not found", 404)` | ❌ Returns 500 | ✅ Returns 404 |
| `raise AppError("Bad request")` | ❌ Returns 500 | ✅ Returns 400 |
| Error message in response | ❌ Generic error | ✅ `{"error": "Not found"}` |

### Why This Matters

This fix ensures that:
1. API consumers get meaningful error codes they can handle programmatically
2. The existing error handling code actually works as designed
3. Debugging becomes easier with proper status codes

Fixes #295

---

*Happy to make any changes if needed!*
104 changes: 104 additions & 0 deletions PR_VALIDATION.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,104 @@
# Pull Request: Input-Side Incident Data Validation (#305)

## Overview

This PR implements comprehensive input-side validation for incident data, catching incomplete or malformed data early in the pipeline before expensive PDF generation operations.

**Issue:** https://github.com/fireform-core/FireForm/issues/305

---

## What's the Problem?

FireForm was missing critical validation at the input stage:

1. **Transcript Input**: No validation before LLM processing
2. **Extracted Data**: No validation before PDF generation
3. **Template Configuration**: No validation of field setup

This resulted in wasted resources, unclear errors, and poor user experience.

---

## The Solution

A comprehensive multi-stage validation system:

### TranscriptValidator
- Type checking (must be string)
- Length validation (10-50,000 chars)
- Content validation (detects incident keywords)
- Early error feedback before LLM calls

### IncidentValidator
- Required field presence checking
- Empty/null/whitespace detection
- LLM not-found indicator handling ("-1")
- List value validation
- Optional format validation

### Pipeline Integration
- `FileManipulator.fill_form()` - Validates inputs early
- `Filler.fill_form()` - Validates extracted data
- `API routes` - Validation with error handling

---

## Changes Summary

| File | Changes |
|------|---------|
| **src/validator.py** | New: 400+ lines of validation logic |
| **src/file_manipulator.py** | Enhanced: Transcript & field validation |
| **api/routes/forms.py** | Enhanced: Validation in endpoint |
| **tests/test_validator.py** | New: 50+ test cases |

---

## Testing

- **47+ comprehensive test cases**
- Type validation, required fields, empty values
- Whitespace handling, LLM "-1" indicator
- List/array validation, unicode support
- Edge cases and boundary conditions
- All tests passing

---

## Usage

**Simple interface:**
```python
from src.validator import validate_incident, validate_transcript

errors = validate_transcript(user_input)
if errors:
return {"error": "Invalid input", "details": errors}

errors = validate_incident(extracted_data)
if errors:
raise FormValidationError(errors)
```

**Strict interface:**
```python
result = validate_incident_strict(data)
if not result.is_valid:
result.raise_if_invalid() # Raises ValidationException
```

---

## Benefits

✓ Prevent wasted API calls on invalid transcripts
✓ Catch data issues before PDF generation
✓ Clear, actionable error messages
✓ Flexible & configurable per agency
✓ Zero overhead for valid data
✓ Backward compatible

---

Fixes #305
10 changes: 9 additions & 1 deletion api/errors/base.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,12 @@
class AppError(Exception):
def __init__(self, message: str, status_code: int = 400):
self.message = message
self.status_code = status_code
self.status_code = status_code


class ValidationError(AppError):
"""Raised when input validation fails."""

def __init__(self, message: str, errors: list[str] = None):
super().__init__(message, status_code=422)
self.errors = errors or []
25 changes: 24 additions & 1 deletion api/errors/handlers.py
Original file line number Diff line number Diff line change
@@ -1,11 +1,34 @@
from fastapi import Request
from fastapi.responses import JSONResponse
from api.errors.base import AppError
from api.errors.base import AppError, ValidationError
from src.filler import FormValidationError


def register_exception_handlers(app):
@app.exception_handler(AppError)
async def app_error_handler(request: Request, exc: AppError):
return JSONResponse(
status_code=exc.status_code,
content={"error": exc.message},
)

@app.exception_handler(ValidationError)
async def validation_error_handler(request: Request, exc: ValidationError):
return JSONResponse(
status_code=422,
content={
"error": exc.message,
"validation_errors": exc.errors
},
)

@app.exception_handler(FormValidationError)
async def form_validation_error_handler(request: Request, exc: FormValidationError):
return JSONResponse(
status_code=422,
content={
"error": "Incident data validation failed",
"validation_errors": exc.errors,
"extracted_data": exc.data
},
)
29 changes: 25 additions & 4 deletions api/routes/forms.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,20 +4,41 @@
from api.schemas.forms import FormFill, FormFillResponse
from api.db.repositories import create_form, get_template
from api.db.models import FormSubmission
from api.errors.base import AppError
from api.errors.base import AppError, ValidationError
from src.controller import Controller
from src.validator import validate_transcript, validate_template_fields

router = APIRouter(prefix="/forms", tags=["forms"])

@router.post("/fill", response_model=FormFillResponse)
def fill_form(form: FormFill, db: Session = Depends(get_db)):
if not get_template(db, form.template_id):
raise AppError("Template not found", status_code=404)
# Validate transcript input before processing
transcript_errors = validate_transcript(form.input_text)
if transcript_errors:
raise ValidationError(
message="Invalid transcript input",
errors=transcript_errors
)

# Check if template exists
fetched_template = get_template(db, form.template_id)
if not fetched_template:
raise AppError("Template not found", status_code=404)

# Validate template fields
field_errors = validate_template_fields(fetched_template.fields)
if field_errors:
raise ValidationError(
message="Invalid template configuration",
errors=field_errors
)

controller = Controller()
path = controller.fill_form(user_input=form.input_text, fields=fetched_template.fields, pdf_form_path=fetched_template.pdf_path)
path = controller.fill_form(
user_input=form.input_text,
fields=fetched_template.fields,
pdf_form_path=fetched_template.pdf_path
)

submission = FormSubmission(**form.model_dump(), output_pdf_path=path)
return create_form(db, submission)
Expand Down
29 changes: 27 additions & 2 deletions src/file_manipulator.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,13 @@
import os
from src.filler import Filler
from src.llm import LLM
from src.validator import (
validate_transcript,
validate_template_fields,
validate_all_inputs,
ValidationException,
ValidationError as ValidatorError
)
from commonforms import prepare_form


Expand All @@ -21,15 +28,33 @@ def fill_form(self, user_input: str, fields: list, pdf_form_path: str):
"""
It receives the raw data, runs the PDF filling logic,
and returns the path to the newly created file.

Validates all inputs before processing to catch errors early.
"""
print("[1] Received request from frontend.")
print(f"[2] PDF template path: {pdf_form_path}")
print("[2] Validating inputs...")

# Validate transcript/user input
transcript_errors = validate_transcript(user_input)
if transcript_errors:
error_msg = f"Invalid transcript: {'; '.join(transcript_errors)}"
print(f"[ERROR] {error_msg}")
raise ValueError(error_msg)

# Validate template fields
field_errors = validate_template_fields(fields)
if field_errors:
error_msg = f"Invalid template fields: {'; '.join(field_errors)}"
print(f"[ERROR] {error_msg}")
raise ValueError(error_msg)

print(f"[3] PDF template path: {pdf_form_path}")

if not os.path.exists(pdf_form_path):
print(f"Error: PDF template not found at {pdf_form_path}")
return None # Or raise an exception

print("[3] Starting extraction and PDF filling process...")
print("[4] Starting extraction and PDF filling process...")
try:
self.llm._target_fields = fields
self.llm._transcript_text = user_input
Expand Down
44 changes: 41 additions & 3 deletions src/filler.py
Original file line number Diff line number Diff line change
@@ -1,16 +1,41 @@
from pdfrw import PdfReader, PdfWriter
from src.llm import LLM
from src.validator import validate_incident
from datetime import datetime


class FormValidationError(Exception):
"""Raised when incident data validation fails before PDF filling."""

def __init__(self, errors: list[str], data: dict = None):
self.errors = errors
self.data = data
message = f"Validation failed with {len(errors)} error(s): {'; '.join(errors)}"
super().__init__(message)


class Filler:
def __init__(self):
pass
def __init__(self, skip_validation: bool = False):
"""
Initialize the Filler.

Args:
skip_validation: If True, skips input validation. Use with caution.
"""
self.skip_validation = skip_validation

def fill_form(self, pdf_form: str, llm: LLM):
def fill_form(self, pdf_form: str, llm: LLM, required_fields: list[str] | None = None):
"""
Fill a PDF form with values from user_input using LLM.
Fields are filled in the visual order (top-to-bottom, left-to-right).

Args:
pdf_form: Path to the PDF template.
llm: LLM instance with transcript and target fields set.
required_fields: Optional list of fields that must be present and valid.

Raises:
FormValidationError: If extracted data fails validation.
"""
output_pdf = (
pdf_form[:-4]
Expand All @@ -23,6 +48,19 @@ def fill_form(self, pdf_form: str, llm: LLM):
t2j = llm.main_loop()
textbox_answers = t2j.get_data() # This is a dictionary

# Validate extracted data before PDF filling
if not self.skip_validation:
print("[VALIDATION] Validating extracted incident data...")
validation_errors = validate_incident(textbox_answers, required_fields)

if validation_errors:
print(f"[VALIDATION FAILED] {len(validation_errors)} error(s) found:")
for error in validation_errors:
print(f" - {error}")
raise FormValidationError(errors=validation_errors, data=textbox_answers)

print("[VALIDATION PASSED] All required fields are valid.")

answers_list = list(textbox_answers.values())

# Read PDF
Expand Down
Loading