Refactor: Implement Async Semantic Routing to eliminate O(N) LLM bott…#390
Open
NEHAJAKATE wants to merge 1 commit into fireform-core:main from
Motivation
Currently, the extraction pipeline (e.g., in src/llm.py) iterates sequentially over form fields, creating an O(N) HTTP blocking bottleneck. Attempting to solve this by dumping all fields into a single monolithic prompt causes "Attention Dilution" ("Lost in the Middle" syndrome) in smaller, local SLMs, leading to hallucinated or omitted fields in the middle of the schema.

Changes Proposed
This PR introduces an asynchronous Semantic Router to handle extractions concurrently and deterministically.
Schema Chunking: Decomposed the master extraction requirement into logical, domain-specific Pydantic sub-schemas (e.g., Spatial, Medical, Tactical).
Asynchronous Concurrency: Replaced the synchronous blocking loop with aiohttp and asyncio.gather to fire focused extraction chunks concurrently without blocking the FastAPI event loop.

Non-Destructive Refactor: Implemented the new asynchronous router while maintaining the integrity of the existing project structure.
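The two steps above (schema chunking plus asyncio.gather fan-out) can be sketched together as follows. This is a minimal illustration, not the PR's code: the domain names and field lists are hypothetical stand-ins for the Pydantic sub-schemas, and the aiohttp round-trip to the local SLM is simulated with a sleep so the sketch runs without a model server.

```python
import asyncio

# Hypothetical domain sub-schemas as field-name groups; the PR models these
# as Pydantic sub-schemas (Spatial, Medical, Tactical) with real field types.
SUB_SCHEMAS = {
    "spatial": ["location", "grid_reference"],
    "medical": ["injury_count", "triage_level"],
    "tactical": ["unit", "threat_level"],
}

async def extract_chunk(domain: str, fields: list[str], report: str) -> dict:
    """Fire one focused extraction for a single domain.

    In the PR this is an aiohttp POST to the local SLM endpoint; here the
    HTTP round-trip is simulated with a sleep so the sketch is standalone.
    """
    await asyncio.sleep(0.05)  # stand-in for the HTTP request latency
    return {domain: {field: None for field in fields}}

async def route_extractions(report: str) -> dict:
    """Run all domain chunks concurrently with asyncio.gather.

    Wall-clock latency is bounded by the slowest chunk rather than the
    sum of all chunks, and the event loop is never blocked.
    """
    tasks = [
        extract_chunk(domain, fields, report)
        for domain, fields in SUB_SCHEMAS.items()
    ]
    results = await asyncio.gather(*tasks)
    merged: dict = {}
    for chunk in results:
        merged.update(chunk)
    return merged

if __name__ == "__main__":
    print(asyncio.run(route_extractions("incident report text")))
```

Because each task carries only one domain's fields, the prompt sent per request stays small and focused, which is what keeps the SLM's attention from diluting across the full schema.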
Impact
Latency Reduction: Reduces wall-clock latency from the sum of N sequential calls to roughly the latency of the single slowest chunk (effectively O(1) in the number of fields), drastically speeding up report generation.

Accuracy Maximization: Maximizes local SLM accuracy by keeping the context window hyper-focused on one specific domain per generation.
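The latency claim can be demonstrated with a self-contained timing sketch (simulated calls, no model server needed): sequential awaits total roughly the sum of the delays, while asyncio.gather totals roughly the maximum delay.

```python
import asyncio
import time

async def fake_llm_call(delay: float) -> float:
    """Stand-in for one chunked extraction request."""
    await asyncio.sleep(delay)
    return delay

async def sequential(delays: list[float]) -> list[float]:
    # Old behaviour: each call blocks the next, so total ~= sum(delays).
    return [await fake_llm_call(d) for d in delays]

async def concurrent(delays: list[float]) -> list[float]:
    # New behaviour: all chunks fire at once, so total ~= max(delays).
    return await asyncio.gather(*(fake_llm_call(d) for d in delays))

if __name__ == "__main__":
    delays = [0.03, 0.06, 0.09]
    t0 = time.perf_counter()
    asyncio.run(sequential(delays))
    t_seq = time.perf_counter() - t0
    t0 = time.perf_counter()
    asyncio.run(concurrent(delays))
    t_conc = time.perf_counter() - t0
    print(f"sequential={t_seq:.2f}s concurrent={t_conc:.2f}s")
```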
How to Test
mistral or llama3).

(Note: I am submitting this PR as part of my active contribution and exploration of the FireForm architecture for GSoC 2026. I would love any feedback from the maintainers on this approach!)