Dr. Frankstein Data Assistant

A conversational data analysis agent that answers natural language questions about a SQLite database. The agent uses Claude Sonnet as the LLM, LangGraph for node flow orchestration, and Streamlit as the web interface.

The user types a business question, the agent decides whether a database query is needed, generates and executes SQL with automatic error recovery, and returns a natural language response along with an interactive visualization via Plotly.

Tech Stack

| Layer | Technology |
| --- | --- |
| LLM | Claude Sonnet (claude-sonnet-4-6) via langchain-anthropic |
| Orchestration | LangGraph |
| Web interface | Streamlit |
| Visualization | Plotly |
| Database | SQLite + SQLAlchemy |
| Package manager | uv |
| Runtime | Python 3.13 |

Running Instructions

Prerequisites

1. Configure environment variables

Edit the .env file and fill in the values:

DB_NAME=anexo_desafio_1.db
ANTHROPIC_API_KEY=your_key_here

# Optional: tracing via LangSmith
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=your_langsmith_key
LANGSMITH_PROJECT=franq-agent

2. Run with Docker

docker-compose up --build

Alternative: Local Execution

# Install dependencies (requires uv)
uv sync

# Web interface
uv run streamlit run app.py

Agent Flow and Architecture Explanation

General flow

[Diagram: agent flow]

Graph nodes

| Node | Role |
| --- | --- |
| resolve_context | Rewrites follow-up questions as standalone questions using the conversation history |
| classify | Classifies the question as sql (requires database query) or direct (greeting, meta-question) |
| schema | Dynamically discovers the database schema (tables and columns) on each invocation |
| planner | Creates a JSON reasoning plan before generating SQL (required tables, steps, strategy) |
| generate_sql | Generates a SQLite SELECT query based on the plan and schema |
| guardrail | Blocks destructive keywords (DROP, DELETE, UPDATE, INSERT, etc.), ensuring read-only access |
| execute | Executes the query and captures results or errors |
| repair | If execution fails, uses the error message as feedback for the LLM to fix the broken SQL (max. 3 attempts) |
| finalize | Interprets the results, formulates the natural language response, and defines the visualization type |
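To make the guardrail node concrete, here is a minimal sketch of a read-only SQL check. The function name, keyword list, and overall shape are assumptions for illustration, not the project's actual code:

```python
import re

# Conservative keyword guardrail: allow only SELECT statements and reject
# anything containing a write/DDL keyword. Note this is deliberately strict:
# even a function name like REPLACE() inside a SELECT would be blocked.
FORBIDDEN = re.compile(
    r"\b(DROP|DELETE|UPDATE|INSERT|ALTER|CREATE|TRUNCATE|REPLACE|ATTACH|PRAGMA)\b",
    re.IGNORECASE,
)

def is_safe_select(sql: str) -> bool:
    """Return True only for SELECT statements free of destructive keywords."""
    stripped = sql.strip()
    if not stripped.upper().startswith("SELECT"):
        return False
    return FORBIDDEN.search(stripped) is None
```

A query rejected here would never reach the database, matching the read-only guarantee described above.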

Architectural decisions

Planning

First, I made the following sketch in a notebook and discussed the project's feasibility with ChatGPT to refine the design:

[Image: graph sketch]

After that discussion, an automatic error-recovery node was also added to the graph, and the following foundations were defined:

  • Sequential architecture with a classification node: the classify node evaluates whether the question requires database access before triggering the entire SQL pipeline. Direct questions (greetings, questions about the agent itself) are answered immediately by the finalize node, avoiding unnecessary latency and cost.

  • guardrail node as a security layer: before any execution, the generated SQL is inspected for keywords indicating write or destructive operations. Disallowed queries are rejected before reaching the database, protecting data integrity.

  • Automatic error recovery: the repair node implements a limited retry loop (maximum of 3 attempts), using the execution error message as context for the LLM to fix the broken SQL.
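The retry loop behind the repair node can be sketched in a few lines. Only the 3-attempt cap comes from the text above; the function names and the idea of passing `execute` and `repair` as callables (the latter standing in for the LLM call) are illustrative assumptions:

```python
# Sketch of the repair loop: run the SQL, and on failure hand the error
# message to a repair callable so it can propose fixed SQL.
MAX_ATTEMPTS = 3

def execute_with_repair(sql, execute, repair, max_attempts=MAX_ATTEMPTS):
    """Try execute(sql); on exception, ask repair(sql, error) for fixed SQL.

    Returns (result, attempts_used), or raises after max_attempts failures.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return execute(sql), attempt
        except Exception as exc:
            last_error = str(exc)
            sql = repair(sql, last_error)  # LLM call in the real agent
    raise RuntimeError(
        f"SQL still failing after {max_attempts} attempts: {last_error}"
    )
```

Bounding the loop keeps a persistently broken query from burning tokens indefinitely.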

Project Execution

With the graph plan in hand, I built the first MVP using Claude Code, directing the following file organization and architecture:

File Architecture

Folder structure suggested by the official LangChain documentation: File Structure

MVP Review and Improvements

With the first MVP in place, I reviewed all the code, adding improvements, removing unnecessary complexity, and changing some architectural choices, for example:

  • Modernized typing: replaced List and Dict from the typing module with the built-in list and dict types.

  • Enums for domain types: created QuestionType and DataVizType as Enum.

  • Separation of system prompt and user prompt: refactored ChatPromptTemplate.from_template to use from_messages with distinct roles (system and human). Chat models treat system messages differently from user turns, and separating them makes prompts easier to optimize and version independently.

  • Migration from sqlite3 to SQLAlchemy: the database access logic was moved to SQLAlchemy, decoupling the code from the database type and making a future migration to PostgreSQL, MySQL, or another engine easier.

  • Nominal categorical column discovery: the schema now includes distinct values for categorical columns, providing the LLM with exact keywords to build more precise filters and WHERE clauses.

  • LLM configures Plotly axes: the finalize node delegates to the LLM the choice of axis variables (x, y, color) for the chart, adapting the visualization to the question and possible groupings (GROUP BY).
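The categorical-column discovery described above can be illustrated with a small sketch. For brevity it uses the stdlib sqlite3 module, even though the project itself goes through SQLAlchemy; the function name and the 20-distinct-values cutoff are assumptions:

```python
import sqlite3

def discover_schema(conn: sqlite3.Connection, max_distinct: int = 20) -> dict:
    """For each table, list columns and, for low-cardinality TEXT columns,
    their distinct values, so the LLM has exact keywords for WHERE filters."""
    schema = {}
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    for table in tables:
        info = {}
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _, name, col_type, *_ in conn.execute(f"PRAGMA table_info({table})"):
            entry = {"type": col_type}
            if col_type.upper() == "TEXT":
                values = [r[0] for r in conn.execute(
                    f"SELECT DISTINCT {name} FROM {table} LIMIT {max_distinct + 1}")]
                # Only treat the column as categorical if cardinality is low.
                if len(values) <= max_distinct:
                    entry["values"] = sorted(v for v in values if v is not None)
            info[name] = entry
        schema[table] = info
    return schema
```

Feeding the resulting dict into the schema prompt lets the model write filters like `WHERE channel = 'whatsapp'` with the exact stored spelling.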


Query Examples Tested

  • "List the 5 states with the highest number of customers who purchased via app in May."

    [Screenshot: question 1]

  • "How many customers interacted with WhatsApp campaigns in 2024?"

    [Screenshot: question 2]

  • "Which product categories had the highest average number of purchases per customer?"

    [Screenshot: question 3]

  • "What is the number of unresolved complaints by channel?"

    [Screenshot: question 4]

  • "What is the trend in complaints by channel over the last year?"

    [Screenshot: question 5]


Observability

The project integrates LangSmith for complete tracing of each agent execution. With it you can inspect:

  • LLM-generated plan: the reasoning steps, identified tables, and query strategy produced by the planner node before SQL generation.
  • State between nodes: all fields of the AgentState (resolved_question, schema, plan, sql_query, query_result, final_answer, etc.) are inspectable at each graph transition.
  • Token usage: input and output token consumption per LLM call, enabling cost estimation and identification of the most expensive prompts.
  • Performance: individual latency of each graph node, making it easier to identify bottlenecks in the pipeline.

[Screenshot: LangSmith trace]

To enable it, configure the variables in .env:

LANGSMITH_TRACING=true
LANGSMITH_API_KEY=your_langsmith_key
LANGSMITH_PROJECT=franq-agent

Suggested Improvements and Extensions

  • Restrict the agent's scope in the classification node: explicitly limit the LLM's action to questions about the database. If the question falls outside this scope, return a short, standardized response without triggering the SQL pipeline, saving token usage.

  • Migrate to asynchronous execution: refactor the code to use ainvoke and async calls throughout the entire graph, improving scalability and interface responsiveness.

  • Integrate with FastAPI: expose the agent as a REST API via FastAPI, allowing integration with other systems and frontends beyond Streamlit. For a scalable, well-structured base, my DDD-oriented project raphaelDuff/samizdat can serve as a starting point; the book Clean Architecture with Python: Implement scalable and maintainable applications using proven architectural principles was my main reference.

  • Migrate schema discovery to a serverless function with caching: move schema_discovery to a Lambda function (or equivalent) executed periodically. Store the schema in long-term cache (e.g., Redis) and eliminate the need to query the database on every invocation.

  • Improve data visualization: for the question "What is the trend in complaints by channel over the last year?", a stacked bar chart would read better than the chart the agent chose. Adding few-shot examples to the prompt that map question types to chart types could improve this.

  • Merge nodes: the LangSmith trace shows latency spread across several nodes; combining the logic of some of them could improve performance. The team would need to weigh this gain against the Single Responsibility Principle.

  • Test different system prompts: rewrite both the system prompts and the user prompt in the nodes to see if we can "slim down" the token count to save costs without losing quality.

  • Add metrics to the agent: structure the project to track response relevance (ResponseRelevancy()) and consistency (Faithfulness()) using the ragas library. This would require preparing a test set of questions and reference answers for agent evaluation.

  • Add an action plan RAG: create a knowledge base (RAG) with guidelines and action plans associated with patterns identified in the data. For example: if complaints by channel are increasing, the agent would query the RAG to suggest corrective actions to the user.
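The schema-caching idea above can be prototyped without any infrastructure. Below is a minimal TTL cache standing in for Redis; the class and method names are illustrative, not an existing API:

```python
import time

class TTLCache:
    """Tiny time-to-live cache: a stand-in for Redis in the suggestion above."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = time.monotonic()
        hit = self._store.get(key)
        if hit and hit[0] > now:
            return hit[1]          # cache hit: no database round trip
        value = compute()          # cache miss: run the expensive discovery
        self._store[key] = (now + self.ttl, value)
        return value
```

Wrapping schema discovery in `get_or_compute("schema", ...)` would remove the per-invocation database scan that the schema node currently performs.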

About

A LangGraph-based AI agent that answers natural language questions by dynamically retrieving and reasoning over relational database data.
