I'm an AI/ML Engineer with 3+ years of experience building and shipping production-grade LLM systems at scale. I specialize in RAG architectures, Voice AI, distributed inference, and real-time conversational agents, turning cutting-edge research into systems that actually work in the real world.
- Architected distributed LLM inference platforms serving 1M+ daily queries on AWS EKS
- Built multilingual Voice AI agents using LiveKit, WebRTC, Twilio, OpenAI & Deepgram, automating 50K+ calls/month
- Reduced LLM inference latency by 50% (400 ms → 200 ms p95) and costs by 30% via GPTQ quantization & vLLM
- Cut hallucinations by 40% using hybrid Graph + Vector RAG on enterprise knowledge bases (10M+ docs)
- Published low-resource NLP research for Urdu (70M+ speakers), targeting ACL/EMNLP 2025
| Project | Stack | Highlights |
|---|---|---|
| AI Calling Agents | LiveKit, WebRTC, Twilio, OpenAI, Deepgram | 100K+ users, 95% automation, 50+ business customers |
| Call Analytics SaaS | FastAPI, LLM, Dashboards | Real-time sentiment, summaries & conversation analytics |
| Dental AI Bot | RAG, WhatsApp/Instagram/Twitter | Domain-specific omnichannel chatbot |
| Restaurant AI Bot | Multimodal RAG, Pinecone, LangGraph | Multi-platform reservations, menu image understanding |
| ICAP AI Bot | LLaMA 3.1, RAG, Ubuntu On-Prem | On-premises deployment, enhanced data privacy |
| Sports Commentary AI | OpenCV, GPT-4, TTS, Edge Computing | Live cricket commentary, sub-second latency |
| Urdu NLP Research | Gemma, PEFT/QLoRA, Custom TTS | Fine-tuned for 70M+ Urdu speakers, targeting ACL/EMNLP 2025 |
BS Software Engineering, University of Karachi (UBIT) · 2020–2024
Certifications:
- LLMOps · Agentic RAG with LlamaIndex · Pretraining LLMs · Prompt Engineering · Building Systems with ChatGPT (DeepLearning.AI)
- Introduction to Generative AI (Google Cloud)
- Machine Learning · Deep Learning · Notebook Expert (Kaggle)
- n8n Automation (Simplilearn)
- Intermediate Python (DataCamp)
- Urdu TTS Architectures: novel approaches to natural-sounding speech synthesis for low-resource languages
- Efficient LLM Adaptation for Low-Resource Languages: PEFT/QLoRA techniques for Urdu and similar languages
Targeting ACL / EMNLP 2025