Pranav Kumaar

Software Engineer · Machine Learning Research · Production ML Systems

Multi-Agent LLM Systems · Retrieval-Augmented Generation · Local & Cost-Aware Inference · Production ML Infrastructure

I am a software engineer who builds production-grade AI systems, with a focus on how models, data, and infrastructure behave under real-world constraints such as latency, cost, scale, and reliability.

Portfolio · LinkedIn · Email


Research & Publications

ICMLC 2026 · When Graph Structure Hurts: Lightweight Path Ranking for Dense KG-RAG
93.9% AUC · 13× fewer parameters than GNN baselines · Designed for dense, production-scale knowledge graphs

Selected Systems & Public Projects

Channel AI
Conversational BI Platform

Results
– Reduced enterprise reporting cycles from days to minutes
– Deployed across 4 enterprise pilots and 12 SMB environments
– Sub-20s latency on multi-million-row analytical workloads

System
Multi-agent LangGraph orchestration over Apache Iceberg

Stack
LangGraph · OpenAI Agents SDK · Iceberg · RAG · LlamaIndex · Qdrant · WhatsApp API

https://github.com/pranavkumaarofficial/newdhatu-enterprise

NLCLI Wizard
Local LLM Tooling

Results
– 83.3% accuracy translating natural language to shell commands
– Fully offline CPU inference (810 MB quantized model)
– ~1.5s latency with zero external dependencies

System
Gemma 3 1B fine-tuned via QLoRA and quantized to GGUF

Data
1,500 manually verified command mappings

https://github.com/pranavkumaarofficial/nlcli-wizard


Production Case Studies (No Public Repository)

OneSKU
Hybrid Retrieval System

Implemented within a client-facing production environment; source code not publicly releasable.

Results
– 94% precision on catalog-matching benchmarks
– Sub-15s query latency across multi-million SKU inventories
– Rolled out across 20+ vendor catalogs

Engineering Notes
– Hybrid BM25 + dense retrieval outperformed purely neural approaches on noisy catalogs
– Explicit separation of categorical (exact-match) and numerical (range-aware) attributes
– Vendor-specific schema reconciliation logic
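
To make the first two notes concrete, here is a minimal sketch of hybrid retrieval with attribute filtering. It is not the OneSKU implementation, which is not public. The catalog fields (`material`, `length_mm`), the `alpha` fusion weight, the min-max normalization, and the character-trigram vector standing in for a neural encoder are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy catalog; field names are illustrative only.
CATALOG = [
    {"sku": "A1", "title": "stainless steel hex bolt m8", "material": "steel", "length_mm": 40},
    {"sku": "A2", "title": "hex bolt m8 zinc plated", "material": "zinc", "length_mm": 40},
    {"sku": "A3", "title": "stainless steel washer m8", "material": "steel", "length_mm": 2},
]

def tokenize(text):
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 over whitespace tokens, one score per document."""
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    doc_freq = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (len(docs) - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def trigram_vector(text):
    """Stand-in for a dense encoder: sparse character-trigram counts."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def min_max(values):
    """Rescale scores to [0, 1] so the two signals are comparable."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

def hybrid_search(query, catalog, exact=None, ranges=None, alpha=0.5):
    """Filter on structured attributes first, then rank by fused text score."""
    exact = exact or {}
    ranges = ranges or {}
    candidates = [
        item for item in catalog
        # Categorical attributes must match exactly.
        if all(item.get(k) == v for k, v in exact.items())
        # Numerical attributes must fall inside the range; missing values fail.
        and all(lo <= item.get(k, float("nan")) <= hi for k, (lo, hi) in ranges.items())
    ]
    if not candidates:
        return []
    titles = [c["title"] for c in candidates]
    lexical = min_max(bm25_scores(query, titles))
    query_vec = trigram_vector(query)
    dense = min_max([cosine(query_vec, trigram_vector(t)) for t in titles])
    fused = [alpha * l + (1 - alpha) * d for l, d in zip(lexical, dense)]
    ranked = sorted(zip(candidates, fused), key=lambda pair: pair[1], reverse=True)
    return [(item["sku"], round(score, 3)) for item, score in ranked]
```

A call such as `hybrid_search("hex bolt", CATALOG, exact={"material": "steel"}, ranges={"length_mm": (10, 50)})` applies the exact-match and range filters before any text scoring, so only in-range steel items are ranked. In a real system the trigram vector would be replaced by an embedding model and the linear fusion might give way to rank-based fusion; both choices here are placeholders.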


Systems in Progress

Efficient Agent Routing
Cost-aware agent selection for tool-heavy LLM workflows under strict latency budgets

Small Language Models for Analytics
Local inference, quantization, and structured reasoning for domain-specific business intelligence

Technical Focus Areas

AI / ML Systems
Multi-agent orchestration · RAG · PEFT · Quantization · Model optimization
Data & Infrastructure
Apache Iceberg · PostgreSQL · Vector databases · Docker · Kubernetes · Cloud platforms
Production Engineering
FastAPI · Python · TypeScript · OAuth2 · PKI · HL7 / FHIR interoperability

📫 Connect

Portfolio · LinkedIn · Email

Software engineer interested in scalable AI systems, local LLM deployment, and production ML infrastructure

Pinned

  1. nlcli-wizard Public

    Natural language control for Python CLI tools using locally-trained SLMs (CPU inference)

    Python 30 3

  2. newdhatu-enterprise Public

    Documentation and design artifacts for Channel AI (formerly New Dhatu), a multi-agent conversational analytics system.

  3. python-est Public

    🔐 Enterprise EST (RFC 7030) protocol server in Python. Secure certificate enrollment, multi-CA support, TLS 1.3, production-ready PKI solution.

    Python 2

  4. venvy Public

    Fast Python virtual environment manager for all operating systems

    Python 3