
[FEAT]: Batched PDF Field Extraction (main_loop_batched) #400

Open
Harshitha-arch wants to merge 1 commit into fireform-core:main from Harshitha-arch:feature/batched-extraction


@Harshitha-arch

Currently, FireForm processes each PDF field sequentially with independent LLM calls, which increases latency for multi-field forms.

Rationale:
Batched processing can improve performance and scalability by sending all fields in a single structured prompt and mapping JSON output to PDF fields.

Proposed Solution:

  • Add main_loop_batched() in src/llm.py.
  • Use build_batch_prompt() to create one request for all fields.
  • Parse JSON output and map it to PDF fields.
  • Keep main_loop() unchanged for backward compatibility.
  • Add a minimal test in tests/test_forms.py.
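
The steps listed above could look roughly like the following. This is a minimal sketch, not the code in this PR: the signatures of `build_batch_prompt()` and `main_loop_batched()`, the prompt wording, and the injected `call_llm` callable are all assumptions made for illustration, and the mapping step returns a plain dict rather than writing to a real PDF.

```python
import json
from typing import Any, Callable, Dict, List


def build_batch_prompt(field_names: List[str], source_text: str) -> str:
    """Build one prompt that asks for every field at once (assumed signature)."""
    field_list = "\n".join(f"- {name}" for name in field_names)
    return (
        "Extract the following fields from the text below.\n"
        "Reply with a single JSON object whose keys are exactly the field "
        "names; use null for any field that is not present.\n\n"
        f"Fields:\n{field_list}\n\nText:\n{source_text}\n"
    )


def main_loop_batched(
    field_names: List[str],
    source_text: str,
    call_llm: Callable[[str], str],  # hypothetical: takes a prompt, returns raw text
) -> Dict[str, Any]:
    """Resolve all fields with a single LLM call (sketch under stated assumptions)."""
    prompt = build_batch_prompt(field_names, source_text)
    raw = call_llm(prompt)  # one request instead of one per field

    # Malformed or non-object output degrades to "no values found".
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}

    # Keep only the requested fields; missing ones map to None so a caller
    # could fall back to the sequential main_loop() for them.
    return {name: data.get(name) for name in field_names}
```

Passing the LLM client in as a callable keeps the sketch testable without a running model, which may also suit the minimal test in tests/test_forms.py.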

Acceptance Criteria:

  • Latency for multi-field PDFs is lower than with the sequential main_loop().
  • JSON output matches schema.
  • Works in Docker container.
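
The first two criteria could be checked with helpers along these lines. This is a sketch only: the PR does not define the schema, so it is assumed here to be a flat JSON object whose keys are exactly the requested field names and whose values are strings or null. Both `matches_schema()` and `timed()` are hypothetical names.

```python
import json
import time
from typing import Any, Callable, List, Tuple


def matches_schema(raw_json: str, field_names: List[str]) -> bool:
    """Check raw LLM output against the assumed flat schema."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    # Keys must match the requested fields exactly (no missing, no extras).
    if set(data) != set(field_names):
        return False
    # Assumption: every value is a string or null.
    return all(v is None or isinstance(v, str) for v in data.values())


def timed(fn: Callable[[], Any]) -> Tuple[Any, float]:
    """Run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start
```

Wrapping a sequential run and a batched run of the same form in `timed()` gives two durations to compare. A single run is noisy, so averaging over several forms would give a fairer picture.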

Additional Context:
This aligns with my GSoC proposal focused on improving FireForm's pipeline efficiency.

Fixes #399
