[FEAT]: Batched PDF Field Extraction (main_loop_batched) by Harshitha-arch · Pull Request #400 · fireform-core/FireForm

Harshitha-arch · 2026-03-31T08:24:42Z

Currently, FireForm processes each PDF field sequentially, making one independent LLM call per field, so latency grows with the number of fields in a form.

Rationale:
Batched processing can improve performance and scalability by sending all fields in a single structured prompt and mapping JSON output to PDF fields.

Proposed Solution:

Add main_loop_batched() in src/llm.py.
Use build_batch_prompt() to create one request for all fields.
Parse JSON output and map it to PDF fields.
Keep main_loop() unchanged for backward compatibility.
Add a minimal test in tests/test_forms.py.
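
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names come from the description, but the signatures, the prompt wording, and the injected `call_llm` callable are assumptions made so the example runs without a model.

```python
import json
from typing import Any, Callable, Dict, List


def build_batch_prompt(field_names: List[str], source_text: str) -> str:
    """Build one prompt that asks for every field at once (assumed signature)."""
    keys = json.dumps(field_names)
    return (
        "Extract the following fields from the text below.\n"
        f"Return a single JSON object whose keys are exactly: {keys}\n"
        "Use null for any field that is not present.\n\n"
        f"Text:\n{source_text}"
    )


def main_loop_batched(
    field_names: List[str],
    source_text: str,
    call_llm: Callable[[str], str],  # hypothetical: takes a prompt, returns raw model text
) -> Dict[str, Any]:
    """Make a single LLM call and map its JSON reply onto the PDF field names."""
    raw = call_llm(build_batch_prompt(field_names, source_text))
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        parsed = {}
    if not isinstance(parsed, dict):
        parsed = {}
    # Keep only the requested fields; anything the model omitted becomes None.
    return {name: parsed.get(name) for name in field_names}


def fake_llm(prompt: str) -> str:
    """Stand-in for the real model call, used only for this example."""
    return '{"name": "Jane Doe", "date": "2026-03-31", "extra": "ignored"}'


result = main_loop_batched(["name", "date", "unit"], "report text", fake_llm)
```

With the stub above, `result` holds the two extracted values and `None` for `unit`, and the unexpected `extra` key is dropped. Passing the LLM call in as a parameter is a choice made for this sketch so the mapping logic can be tested without a running model.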

Acceptance Criteria:

Latency for multi-field PDFs decreases relative to the sequential main_loop().
JSON output matches the expected schema.
Works in the Docker container.
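
One way the first two criteria might be checked is sketched below. The PR does not define the schema, so this assumes a flat JSON object with one key per field and string-or-null values; `schema_problems` and `timed` are hypothetical helper names, not part of FireForm.

```python
import json
import time
from typing import Any, Callable, List, Tuple


def schema_problems(raw: str, field_names: List[str]) -> List[str]:
    """Return a list of mismatches between raw model output and the assumed schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    for name in field_names:
        if name not in data:
            problems.append(f"missing field: {name}")
        elif data[name] is not None and not isinstance(data[name], str):
            problems.append(f"field {name} is not a string or null")
    for key in data:
        if key not in field_names:
            problems.append(f"unexpected field: {key}")
    return problems


def timed(fn: Callable[[], Any]) -> Tuple[Any, float]:
    """Run fn once and return its result with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start
```

An empty list from `schema_problems` means the output fits the assumed schema. Wrapping both `main_loop()` and `main_loop_batched()` in `timed` on the same form would give a like-for-like latency comparison.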

Additional Context:
This aligns with my GSoC proposal focused on improving FireForm's pipeline efficiency.

Fixes #399

Implement batched PDF field extraction (main_loop_batched) for issue f…

45d6b49

…ireform-core#399

Harshitha-arch wants to merge 1 commit into fireform-core:main from Harshitha-arch:feature/batched-extraction
