aneessaheba/README.md

Hi, I'm Anees Saheba Guddi

Generative AI & LLMs | Distributed Systems | Data Engineering

MS in Applied Data Intelligence @ San José State University

Email LinkedIn Portfolio GitHub YouTube LeetCode Medium Tableau


About Me

MS in Applied Data Intelligence student at San José State University (2025–2027) specializing in Generative AI, LLM fine-tuning, and machine learning model development. I build AI systems with production-grade software engineering in mind: fine-tuning large language models, training deep learning networks, and building agentic workflows and distributed architectures.

My expertise spans the full machine learning lifecycle: designing and training neural networks from scratch, fine-tuning foundation models for domain-specific tasks, architecting LLM-based agentic systems with LangChain and LangGraph, and deploying scalable microservices. With professional experience as a Software Development Engineer at HP Inc., I bring hands-on knowledge in building production AI systems, distributed data pipelines, and intelligent automation solutions.

Background

  • Former Software Development Engineer at HP Inc. (Jul 2023 – Aug 2024)
  • B.E. in Information Science and Engineering from Visvesvaraya Technological University (2019–2023)
  • Based in San Jose, California | Originally from Bangalore, India

Currently Working On

  • Building production-ready agentic AI systems with LangChain, LangGraph, and multi-tool orchestration
  • Exploring MLOps workflows, LLM fine-tuning techniques, and distributed machine learning training
  • Deepening knowledge in scalable data platforms and real-time streaming architectures
  • Contributing to open-source AI/ML projects and sharing insights through technical writing
  • Creating educational content on Gen AI and Data Engineering on Medium and YouTube

Featured Projects

Tech Stack: Python | LangChain | ChromaDB | Elasticsearch | BM25 | PyMuPDF | sentence-transformers | LLaMA | Google Gemini

Built a RAG-based chatbot that answers U.S. tax questions for international students, grounded in 41 real IRS documents (publications, forms, tax treaties, and university guides). The documents were extracted page-by-page with PyMuPDF, split into 2,247 chunks, and embedded with all-MiniLM-L6-v2 into ChromaDB and Elasticsearch. Orchestrated the full pipeline with LangChain, implementing hybrid retrieval (vector search + BM25 merged via Reciprocal Rank Fusion) that raised hit rate from 70% to 100%, dual safety guards (keyword filter + 0.70 confidence threshold), and personalized answers conditioned on 7 student profile attributes collected at startup. Powered generation with LLaMA and Gemini 2.0 Flash with an extractive fallback, and built a 5-metric evaluation framework (Context Relevance, Hit Rate, Answer Relevance, Faithfulness, LLM-as-a-Judge) that reached a final Judge score of 0.770 across iterative versions.
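
The fusion step in the hybrid retrieval above can be sketched roughly as follows. This is a minimal illustration of Reciprocal Rank Fusion, not the project's actual code: the function name, the chunk IDs, and the constant `k=60` (a commonly used default) are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a project value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical chunk IDs standing in for vector-search and BM25 results.
vector_hits = ["pub519_p12", "form8843_p1", "treaty_in_p3"]
bm25_hits = ["form8843_p1", "pub901_p7", "pub519_p12"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

A chunk that ranks well in both lists (here `form8843_p1`) rises above chunks that appear in only one, which is the property that makes the method useful for combining lexical and semantic retrieval.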


Tech Stack: FastAPI | Kafka | MySQL | MongoDB | Redis

Built a distributed travel booking system inspired by Kayak supporting search, booking, billing, and analytics for flights, hotels, and cars. Designed backend microservices using FastAPI, Kafka, and relational + NoSQL databases. Developed an AI-powered recommendation service for personalized travel deals and real-time updates. Implemented scalable infrastructure and service communication for resilient, high-throughput operations.


Tech Stack: LangChain | FastAPI | React | MySQL

Built a full-stack Airbnb-style platform with property listings, bookings, and secure authentication. Designed an Agentic AI Concierge using LangChain to generate personalized travel plans and recommendations. Integrated LLM-driven workflows with backend APIs for context-aware, goal-oriented user interactions.


Professional Experience

Hewlett Packard (HP) | Bengaluru, India

Software Development Engineer | Jul 2023 – Aug 2024

  • Implemented rule-based chatbots for Printer Customer Support to guide users through common troubleshooting
  • Prepared and organized data from customer support transcripts and internal troubleshooting documents
  • Performed basic text cleaning and keyword extraction to map user queries to predefined intents
  • Built decision-based conversation flows using simple rules, conditional logic, and fallback responses
  • Integrated chatbot logic with backend support APIs to fetch device status and recommended actions
  • Conducted limited exploration with early LLM tools to assess potential improvements in response quality and coverage
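
The keyword-to-intent mapping and fallback behavior described in the bullets above might look something like this minimal sketch. The intent names, keyword sets, and function names are hypothetical and chosen only for illustration; they do not come from the HP system.

```python
import re

# Hypothetical intents and keywords; a real deployment would load these from config.
INTENT_KEYWORDS = {
    "paper_jam": {"jam", "jammed", "stuck"},
    "low_ink": {"ink", "cartridge", "toner"},
    "offline": {"offline", "wifi", "disconnected"},
}
FALLBACK_INTENT = "fallback"


def clean(text):
    """Basic text cleaning: lowercase and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_intent(query):
    """Return the intent whose keywords overlap most with the query tokens.

    If no keyword matches, return the fallback intent so the conversation
    flow can show a generic response or hand off to a human agent.
    """
    tokens = set(clean(query))
    best_intent, best_overlap = FALLBACK_INTENT, 0
    for intent, keywords in INTENT_KEYWORDS.items():
        overlap = len(tokens & keywords)
        if overlap > best_overlap:
            best_intent, best_overlap = intent, overlap
    return best_intent


print(match_intent("My printer is JAMMED and the paper is stuck!"))
print(match_intent("hello there"))
```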

Pheuna Technology | Bengaluru, India

Software Engineer Intern | May 2022 – Aug 2022

  • Designed RESTful APIs using Node.js and Express with Sequelize ORM for real-time event-driven systems
  • Implemented Kafka producers and consumers for distributed message processing
  • Built a cross-platform mobile dashboard using React and Ionic for real-time monitoring

Technical Skills

Generative AI & LLMs

Gen AI APIs Prompt Engineering LangChain LangGraph Tool Calling RAG Vector Databases OpenAI Google Gemini HuggingFace Fine-tuning LoRA Ollama

Machine Learning & Deep Learning

Model Training Supervised Learning Unsupervised Learning Neural Networks CNN RNN Transfer Learning Feature Engineering PyTorch TensorFlow scikit-learn Keras

Programming & Frameworks

Python SQL NumPy Pandas FastAPI Streamlit Flask Node.js Express React HTML5 CSS3 JavaScript

Data & Cloud Systems

PostgreSQL MySQL MongoDB Redis SQLite DuckDB Docker AWS AWS SageMaker Amazon S3 Amazon EC2 AWS ECS Google Cloud

Data Engineering & Big Data

Apache Kafka Apache Spark Apache Airflow Apache Hadoop ETL Snowflake AWS Glue

Data Analysis & Visualization

Matplotlib Seaborn Plotly Tableau Power BI

Tools & Development

Git GitHub VS Code Jupyter Google Colab Postman Jira


Additional Projects

Generative AI & Agentic Systems

Tech Stack: Ollama | Docker | AWS ECS | HTML/CSS/JS

Built a multi-agent workflow using Ollama LLMs (Planner, Reviewer, Finalizer) for automated blog content creation. Developed a web front-end for blog submission with HTML, CSS, and JavaScript, including JSON handling. Deployed using Docker containers and AWS ECS, integrating lightweight local LLMs ('smollm:1.7b', 'Phi3:mini'). Generated automated outputs including tags, summaries, and a publishable content package.
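
A Planner, Reviewer, Finalizer chain of this kind can be sketched as three sequential calls. The sketch below is an assumption about the general shape, not the project's code: `run_blog_pipeline`, the prompts, and the `fake_llm` stand-in are all hypothetical, and a real version would replace `fake_llm` with calls to a local model.

```python
from typing import Callable, Dict

# Any function that takes a prompt and returns text can play an agent role.
LLM = Callable[[str], str]


def run_blog_pipeline(topic: str, planner: LLM, reviewer: LLM, finalizer: LLM) -> Dict[str, str]:
    """Run the three roles in order, passing each output to the next role."""
    outline = planner(f"Draft an outline for a blog post about: {topic}")
    feedback = reviewer(f"Review this outline and suggest fixes:\n{outline}")
    final = finalizer(
        f"Outline:\n{outline}\n\nFeedback:\n{feedback}\n\nWrite the final post."
    )
    return {"outline": outline, "feedback": feedback, "final": final}


def fake_llm(role: str) -> LLM:
    """Stand-in for a model call: echoes the first prompt line, tagged by role."""
    return lambda prompt: f"[{role}] {prompt.splitlines()[0]}"


result = run_blog_pipeline(
    "vector databases", fake_llm("planner"), fake_llm("reviewer"), fake_llm("finalizer")
)
for stage, text in result.items():
    print(stage, "->", text)
```

Keeping each role behind a plain callable makes it straightforward to swap models per role or to test the orchestration without a model running.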

Tech Stack: FastAPI | MongoDB | Google Gemini | Motor

Built an intelligent chatbot with multi-tiered memory architecture using FastAPI and MongoDB. Implements short-term conversational memory, session-based summaries, lifetime user context condensation, and episodic memory retrieval with vector embeddings. Features automatic memory consolidation, importance-weighted fact extraction, and context-aware responses using Google Generative AI.
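
The interaction between short-term memory and session summaries can be illustrated with a small in-memory sketch. The class name, the window size, and the concatenating summarizer are assumptions for the example; the project itself stores data in MongoDB and would presumably summarize with an LLM rather than by joining strings.

```python
from collections import deque


class TieredMemory:
    """Keep the most recent turns verbatim and fold older turns into a summary."""

    def __init__(self, short_term_size=4, summarize=None):
        self.short_term = deque()
        self.short_term_size = short_term_size
        self.session_summary = ""
        # summarize(previous_summary, evicted_turns) -> new summary.
        # The default just concatenates; an LLM call could be plugged in instead.
        self.summarize = summarize or self._concat_summary

    @staticmethod
    def _concat_summary(previous, evicted):
        parts = ([previous] if previous else []) + [f"{r}: {t}" for r, t in evicted]
        return " | ".join(parts)

    def add_turn(self, role, text):
        """Append a turn; consolidate anything that falls out of the window."""
        self.short_term.append((role, text))
        evicted = []
        while len(self.short_term) > self.short_term_size:
            evicted.append(self.short_term.popleft())
        if evicted:
            self.session_summary = self.summarize(self.session_summary, evicted)

    def build_context(self):
        """Combine the running summary with the verbatim recent turns."""
        recent = "\n".join(f"{r}: {t}" for r, t in self.short_term)
        return f"Summary: {self.session_summary}\n{recent}"


mem = TieredMemory(short_term_size=2)
for role, text in [("user", "a"), ("bot", "b"), ("user", "c"), ("bot", "d")]:
    mem.add_turn(role, text)
print(mem.build_context())
```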

Tech Stack: ReAct | MRKL | DuckDB | Express

Built a single-agent ReAct + MRKL workflow that analyzes Divvy bike-share trip data to recommend whether riders should purchase a membership or stay on pay-per-ride pricing. Implements custom tools (CSV SQL via DuckDB, policy retrieval with web scraping, calculator) with transparent Thought → Action → Observation traces and policy citations for decision justification.
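
The calculator tool behind a membership recommendation could be as simple as a break-even comparison. The sketch below is an assumption about what such a tool computes; the function name and every price in it are placeholders and are not Divvy's actual rates.

```python
def compare_plans(rides_per_year, avg_ride_cost, annual_fee, member_ride_cost=0.0):
    """Compare yearly cost of pay-per-ride against an annual membership.

    All prices are caller-supplied; nothing here encodes real pricing policy.
    On an exact tie the function keeps the rider on pay-per-ride.
    """
    pay_per_ride_total = rides_per_year * avg_ride_cost
    membership_total = annual_fee + rides_per_year * member_ride_cost
    if membership_total < pay_per_ride_total:
        recommendation = "membership"
    else:
        recommendation = "pay-per-ride"
    return {
        "pay_per_ride_total": pay_per_ride_total,
        "membership_total": membership_total,
        "recommendation": recommendation,
    }


# Placeholder numbers for two hypothetical riders.
frequent = compare_plans(rides_per_year=40, avg_ride_cost=4.0, annual_fee=120.0)
occasional = compare_plans(rides_per_year=20, avg_ride_cost=4.0, annual_fee=120.0)
print(frequent["recommendation"], occasional["recommendation"])
```

In a ReAct-style loop, the agent would fill in the ride count from the SQL tool and the fees from the policy retrieval tool before calling a function like this one.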

Tech Stack: Streamlit | Gemini | LangChain

Developed an AI-powered career planning assistant using Gemini LLM and custom tools. Features include Skills Gap Analyzer, Resume Scorer with improvement suggestions (0-10 scale), Salary Estimator, and Interview Question Generator for personalized career guidance.


Data Engineering & Analytics

Tech Stack: PostgreSQL | Docker | ETL | Dimensional Modeling

Built an ETL pipeline for stock market data using Python, integrating multiple sources and automating data ingestion. Designed a dimensional data warehouse in PostgreSQL for structured financial analysis and reporting. Implemented Dockerized workflows for reproducible deployments and efficient environment management. Developed analytics dashboards and SQL queries for stock trends, financial KPIs, and company-level insights.
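
A dimensional (star schema) layout for this kind of data might look like the sketch below. To keep the example self-contained it uses Python's built-in `sqlite3` in place of PostgreSQL, and the table names, columns, tickers, and prices are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One fact table keyed by two dimensions: company and date.
conn.executescript("""
CREATE TABLE dim_company (
    company_id INTEGER PRIMARY KEY,
    ticker TEXT UNIQUE,
    name TEXT
);
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY,
    full_date TEXT UNIQUE,
    year INTEGER,
    month INTEGER
);
CREATE TABLE fact_daily_price (
    company_id INTEGER REFERENCES dim_company(company_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    close_price REAL,
    volume INTEGER,
    PRIMARY KEY (company_id, date_id)
);
""")

# Made-up sample rows.
conn.executemany(
    "INSERT INTO dim_company VALUES (?, ?, ?)",
    [(1, "AAA", "Example Corp A"), (2, "BBB", "Example Corp B")],
)
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?, ?)",
    [(1, "2024-01-02", 2024, 1), (2, "2024-01-03", 2024, 1)],
)
conn.executemany(
    "INSERT INTO fact_daily_price VALUES (?, ?, ?, ?)",
    [(1, 1, 10.0, 100), (1, 2, 12.0, 150), (2, 1, 20.0, 200), (2, 2, 18.0, 250)],
)

# Typical reporting query: average close per company per month.
rows = conn.execute("""
SELECT c.ticker, d.year, d.month, AVG(f.close_price) AS avg_close
FROM fact_daily_price AS f
JOIN dim_company AS c ON c.company_id = f.company_id
JOIN dim_date AS d ON d.date_id = f.date_id
GROUP BY c.ticker, d.year, d.month
ORDER BY c.ticker
""").fetchall()
print(rows)
```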

Tech Stack: AWS Glue | Snowflake | Power BI

Built an ETL pipeline with Spotify API, AWS Glue, and Snowflake. Created interactive Power BI dashboards delivering insights on peak hours, weekend listening patterns, and top artists/tracks.

Tech Stack: Python | Pandas | SQL Server

Built an end-to-end data pipeline using Python and Pandas to process a retail orders dataset. Loaded cleaned data into SQL Server and performed advanced analytics to identify top-performing products, regional sales patterns, monthly trends, and year-over-year growth metrics.
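
As a rough illustration of the year-over-year growth metric, the sketch below uses only the standard library in place of Pandas and SQL Server so that it runs anywhere. The order rows and function names are invented for the example.

```python
from collections import defaultdict

# Hypothetical (order_date, sale_amount) rows in ISO date format.
orders = [
    ("2022-03-14", 100.0),
    ("2022-11-02", 150.0),
    ("2023-01-20", 300.0),
    ("2023-07-08", 75.0),
]


def yearly_sales(rows):
    """Sum sale amounts per calendar year."""
    totals = defaultdict(float)
    for order_date, amount in rows:
        totals[int(order_date[:4])] += amount
    return dict(sorted(totals.items()))


def yoy_growth(totals):
    """Percent change versus the previous year present in the data.

    Years with no sales in the prior year are skipped to avoid dividing by zero.
    """
    years = sorted(totals)
    growth = {}
    for prev, cur in zip(years, years[1:]):
        if totals[prev]:
            growth[cur] = (totals[cur] - totals[prev]) / totals[prev] * 100
    return growth


totals = yearly_sales(orders)
print(totals)
print(yoy_growth(totals))
```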


Software Engineering & Data Structures

Tech Stack: Python | OOP | CSV | Encryption

Python console-based student grade management application using object-oriented programming principles and CSV data persistence. Supports CRUD operations, search, sort with timing analysis, data encryption, academic reports, and statistical analytics. Implements both array and linked list backends with role-based menus and comprehensive unit tests for performance validation.
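
The idea of interchangeable array and linked list backends with timing analysis can be sketched as two classes sharing the same methods. Class names, method names, and the sample data are assumptions for illustration, not the application's actual interface.

```python
import time


class ArrayStore:
    """Backend that keeps records in a Python list."""

    def __init__(self):
        self._items = []

    def add(self, student_id, grade):
        self._items.append((student_id, grade))

    def find(self, student_id):
        for sid, grade in self._items:
            if sid == student_id:
                return grade
        return None


class _Node:
    __slots__ = ("student_id", "grade", "next")

    def __init__(self, student_id, grade, next_node):
        self.student_id = student_id
        self.grade = grade
        self.next = next_node


class LinkedListStore:
    """Backend that keeps records in a singly linked list (new items at the head)."""

    def __init__(self):
        self._head = None

    def add(self, student_id, grade):
        self._head = _Node(student_id, grade, self._head)

    def find(self, student_id):
        node = self._head
        while node is not None:
            if node.student_id == student_id:
                return node.grade
            node = node.next
        return None


def time_lookup(store, student_id, repeats=100):
    """Return the found grade and total seconds spent on repeated lookups."""
    start = time.perf_counter()
    for _ in range(repeats):
        grade = store.find(student_id)
    return grade, time.perf_counter() - start


stores = {"array": ArrayStore(), "linked_list": LinkedListStore()}
for store in stores.values():
    for i in range(2000):
        store.add(i, 50 + i % 50)

for name, store in stores.items():
    grade, seconds = time_lookup(store, 1000)
    print(f"{name}: grade={grade}, {seconds:.6f}s")
```

Both lookups are linear scans, so the timing difference mostly reflects per-element overhead rather than a different complexity class.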

Tech Stack: Python | OOP | GUI | SQLite

Object-oriented stock tracking application with both console and GUI interfaces built using Python. Features embedded database management for saving and retrieving stock data, historical price tracking from web APIs and CSV imports, profit/loss report generation, and interactive chart visualization using Python libraries.


Machine Learning & Computer Vision

4DX Movie Technology Using ML

Tech Stack: TensorFlow | CNN | Python | OpenCV | Audio Processing

Developed a CNN-based system that processes synchronized audio-visual streams to detect dynamic movie events in real-time and trigger corresponding physical theater effects (water, wind, seat motion) with millisecond-level precision for immersive 4DX experiences.

Face Mask Detection Using ML

Tech Stack: MobileNetV2 | OpenCV | TensorFlow | Python

Built a real-time face mask detection system using transfer learning with MobileNetV2, achieving 95%+ accuracy at 30+ FPS with OpenCV-based face detection and multi-face classification capabilities optimized for edge deployment.

Credit Card Fraud Detection

Tech Stack: PCA | Random Forest | Isolation Forest | Python | scikit-learn

Implemented an anomaly detection pipeline for identifying fraudulent transactions in highly imbalanced datasets using PCA dimensionality reduction and ensemble methods (Isolation Forest + Random Forest) with SMOTE oversampling and precision-recall optimization.


Data Visualization

Interactive dashboards for business intelligence, trend analysis, and KPI visualization showcasing storytelling with data.


Education

San José State University | San Jose, CA
Master of Science in Applied Data Intelligence | Jan 2025 – May 2027 | GPA: 3.5/4.0

Relevant Coursework: Gen AI LLMs, Agentic AI, Machine Learning, Deep Learning, Big Data Algorithms, Distributed Systems, Scalable Data Platforms

Visvesvaraya Technological University | Karnataka, India
Bachelor of Engineering in Information Science and Engineering | Aug 2019 – Jun 2023 | GPA: 7.9/10.0

Relevant Coursework: Data Structures and Algorithms, Database Systems, Software Engineering



Connect With Me

Email LinkedIn Portfolio GitHub YouTube LeetCode Medium Tableau

Popular repositories

  1. airbnb-agentic-ai (Public)

    Full-stack Airbnb-style rental platform with a LangGraph multi-agent AI concierge powered by Google Gemini. Features SSE streaming, tool calling, async pipelines, and Kubernetes deployment on AWS E…

    JavaScript 1

  2. distributed-kayak-booking-system (Public)

    A distributed Kayak-inspired travel booking system with microservices, Kafka event streaming, Redis caching, MySQL, MongoDB, and an AI concierge agent powered by Gemini 2.5, RAG pipeline, and QLoRA…

    JavaScript 1

  3. RAG-Tax-Advisory-System-for-Students (Public)

    Python 1

  4. realtime-flight-delay-predictor (Public)

    An end-to-end big data pipeline that predicts U.S. flight delays in real time. Ingests live flight events via Apache Kafka, processes streams with Spark Structured Streaming, stores historical data…

    1

  5. hadoop-news-analytics (Public)

    Distributed word frequency analysis on 5,000 HuffPost news headlines using Apache Hadoop MapReduce and mrjob. Single-node cluster on Docker with HDFS and YARN configured from scratch. Top 50 keywor…

    Python 1

  6. realtime-market-analytics-kafka-spark-hive (Public)

    Real-time stock market analytics pipeline using Apache Kafka, Spark Structured Streaming, and Hive. Simulates live OHLC bar data, computes windowed trend signals (BULLISH/BEARISH/NEUTRAL), and visu…

    Python 1