
Refactor: Implement Async Semantic Routing to eliminate O(N) LLM bott…#390

Open
NEHAJAKATE wants to merge 1 commit into fireform-core:main from NEHAJAKATE:feature/async-semantic-router

@NEHAJAKATE

Motivation

Currently, the extraction pipeline (e.g., in src/llm.py) iterates over form fields sequentially and issues one blocking HTTP request per field, so total latency grows as O(N) in the number of fields. Packing all fields into a single monolithic prompt avoids the loop but causes "attention dilution" (the "lost in the middle" effect) in smaller local SLMs, which then hallucinate or omit fields in the middle of the schema.

Changes Proposed

This PR introduces a Pareto-Optimal Semantic Router that handles extractions concurrently and deterministically.
Schema Chunking: Decomposes the master extraction requirement into logical, domain-specific Pydantic sub-schemas (e.g., Spatial, Medical, Tactical).
Asynchronous Concurrency: Replaces the synchronous blocking loop with aiohttp and asyncio.gather, so focused extraction chunks are sent concurrently without blocking the FastAPI event loop.
Non-Destructive Refactor: Adds the new asynchronous router while leaving the existing project structure intact.
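
For reviewers, the chunk-and-gather pattern described above can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: plain dataclasses stand in for the Pydantic sub-schemas so the snippet runs with the standard library only, the field names are hypothetical, and `extract_chunk` simulates the model call with `asyncio.sleep` where the real router would make an aiohttp request.

```python
import asyncio
from dataclasses import dataclass, fields

# Hypothetical sub-schemas; dataclasses stand in for the Pydantic models.
@dataclass
class Spatial:
    location: str = ""
    address: str = ""

@dataclass
class Medical:
    injuries: str = ""
    patients: str = ""

@dataclass
class Tactical:
    units_dispatched: str = ""

CHUNKS = [Spatial, Medical, Tactical]

async def extract_chunk(schema, transcript):
    """Placeholder for one focused LLM call covering a single sub-schema."""
    await asyncio.sleep(0.01)  # simulates network and generation time
    names = [f.name for f in fields(schema)]
    # A real implementation would parse the model's JSON reply here.
    return schema(**{name: f"<{name} from transcript>" for name in names})

async def route(transcript):
    """Run one extraction per sub-schema concurrently and merge the results."""
    results = await asyncio.gather(
        *(extract_chunk(schema, transcript) for schema in CHUNKS)
    )
    # asyncio.gather returns results in the same order as its inputs.
    return {type(result).__name__.lower(): result for result in results}

report = asyncio.run(route("example incident transcript"))
print(sorted(report))
```

Because each call only sees the fields of one sub-schema, the prompt for that call stays short, which is the property the router relies on.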

Impact

Latency Reduction: Reduces wall-clock latency from the sum of all per-field requests to roughly the duration of the single slowest chunk, i.e. O(1) in the number of fields for a fixed set of chunks, which substantially speeds up report generation.
Accuracy Maximization: Improves local SLM accuracy by keeping each prompt's context focused on one specific domain per generation.
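
The latency claim can be illustrated with a small simulation. The delays below are made-up values, not measurements from FireForm, and `fake_extract` is a hypothetical stand-in for a chunk request. The sketch only shows that awaiting the chunks one by one costs about the sum of their delays, while `asyncio.gather` costs about the longest one.

```python
import asyncio
import time

# Simulated per-chunk generation times in seconds (made-up values).
CHUNK_DELAYS = {"spatial": 0.05, "medical": 0.15, "tactical": 0.10}

async def fake_extract(name, delay):
    """Hypothetical stand-in for one chunk request."""
    await asyncio.sleep(delay)
    return name

async def run_sequential():
    start = time.perf_counter()
    for name, delay in CHUNK_DELAYS.items():
        await fake_extract(name, delay)  # each call waits for the previous one
    return time.perf_counter() - start

async def run_concurrent():
    start = time.perf_counter()
    await asyncio.gather(
        *(fake_extract(name, delay) for name, delay in CHUNK_DELAYS.items())
    )
    return time.perf_counter() - start

sequential = asyncio.run(run_sequential())
concurrent = asyncio.run(run_concurrent())
print(f"sequential: {sequential:.2f}s, concurrent: {concurrent:.2f}s")
```

In practice the gain depends on the model server actually handling the chunk requests in parallel; if it processes them one at a time, the concurrent path approaches the sequential time again.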

How to Test

  1. Start the local server and ensure Ollama is running a local model (e.g., mistral or llama3).
  2. Submit a mock incident transcript to the extraction endpoint.
  3. Observe the concurrent processing in the server logs and verify the structural integrity of the returned JSON against the Pydantic schemas.
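
For step 3, a quick structural check of the returned JSON might look like the sketch below. The chunk names and keys in `EXPECTED` are hypothetical placeholders; the authoritative definitions are the Pydantic sub-schemas in this PR, and validating with those models directly would be the stricter test.

```python
import json

# Hypothetical expected keys per chunk; the real ones come from the sub-schemas.
EXPECTED = {
    "spatial": {"location", "address"},
    "medical": {"injuries", "patients"},
    "tactical": {"units_dispatched"},
}

def find_schema_problems(payload: str) -> list[str]:
    """Return a list of structural problems found in the endpoint's JSON reply."""
    data = json.loads(payload)
    problems = []
    for chunk, keys in EXPECTED.items():
        section = data.get(chunk)
        if not isinstance(section, dict):
            problems.append(f"missing or malformed chunk: {chunk}")
            continue
        for key in sorted(keys - section.keys()):
            problems.append(f"{chunk}: missing field {key}")
        for key in sorted(section.keys() - keys):
            problems.append(f"{chunk}: unexpected field {key}")
    return problems

# Example reply with one missing field, to show what a failure looks like.
reply = json.dumps({
    "spatial": {"location": "Main St", "address": "12 Main St"},
    "medical": {"injuries": "none"},
    "tactical": {"units_dispatched": "2"},
})
print(find_schema_problems(reply))
```

An empty list means every expected chunk and field is present and no extra fields were returned.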

(Note: I am submitting this PR as part of my active contribution and exploration of the FireForm architecture for GSoC 2026. I would love any feedback from the maintainers on this approach!)

