A conversational data analysis agent that answers natural language questions about a SQLite database. The agent uses Claude Sonnet as the LLM, LangGraph for node flow orchestration, and Streamlit as the web interface.
The user types a business question; the agent decides whether a database query is needed, generates and executes SQL with automatic error recovery, and returns a natural language response along with an interactive Plotly visualization.
- Tech Stack
- Running Instructions
- Agent Flow and Architecture Explanation
- Observability
- Query Examples Tested
- Suggested Improvements and Extensions
## Tech Stack
| Layer | Technology |
|---|---|
| LLM | Claude Sonnet (claude-sonnet-4-6) via langchain-anthropic |
| Orchestration | LangGraph |
| Web interface | Streamlit |
| Visualization | Plotly |
| Database | SQLite + SQLAlchemy |
| Package manager | uv |
| Runtime | Python 3.13 |
## Running Instructions
- Docker and Docker Compose installed
- Anthropic API key (`ANTHROPIC_API_KEY`)
Edit the `.env` file and fill in the values:

```env
DB_NAME=anexo_desafio_1.db
ANTHROPIC_API_KEY=your_key_here

# Optional: tracing via LangSmith
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=your_langsmith_key
LANGSMITH_PROJECT=franq-agent
```

Then build and start the container:

```bash
docker-compose up --build
```

To run locally without Docker:

```bash
# Install dependencies (requires uv)
uv sync

# Web interface
uv run streamlit run app.py
```

## Agent Flow and Architecture Explanation

| Node | Role |
|---|---|
| `resolve_context` | Rewrites follow-up questions as standalone questions using the conversation history |
| `classify` | Classifies the question as `sql` (requires a database query) or `direct` (greeting, meta-question) |
| `schema` | Dynamically discovers the database schema (tables and columns) on each invocation |
| `planner` | Creates a JSON reasoning plan before generating SQL (required tables, steps, strategy) |
| `generate_sql` | Generates a SQLite `SELECT` query based on the plan and schema |
| `guardrail` | Blocks destructive keywords (`DROP`, `DELETE`, `UPDATE`, `INSERT`, etc.), ensuring read-only access |
| `execute` | Executes the query and captures results or errors |
| `repair` | If execution fails, uses the error message as feedback for the LLM to fix the broken SQL (max. 3 attempts) |
| `finalize` | Interprets the results, formulates the natural language response, and defines the visualization type |
First, I sketched the graph in a notebook and discussed the project's feasibility with ChatGPT to refine the design. After that strategic discussion, the automatic error-recovery node was added to the graph, and the following foundations were defined:
- **Sequential architecture with a classification node:** the `classify` node evaluates whether the question requires database access before triggering the entire SQL pipeline. Direct questions (greetings, questions about the agent itself) are answered immediately by the `finalize` node, avoiding unnecessary latency and cost.
- **`guardrail` node as a security layer:** before any execution, the generated SQL is inspected for keywords indicating write or destructive operations. Disallowed queries are rejected before reaching the database, protecting data integrity.
- **Automatic error recovery:** the `repair` node implements a limited retry loop (maximum of 3 attempts), using the execution error message as context for the LLM to fix the broken SQL.
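These two safeguards can be sketched in isolation. The keyword list, function names, and retry shape below are assumptions for illustration, not the project's actual code:

```python
import re

# Hypothetical denylist; the real guardrail node may block a different set.
FORBIDDEN = {"DROP", "DELETE", "UPDATE", "INSERT", "ALTER", "CREATE",
             "REPLACE", "TRUNCATE", "ATTACH", "PRAGMA"}

def guardrail_check(sql: str) -> bool:
    """Allow only read-only SELECT statements. Deliberately conservative:
    a legitimate column literally named 'update' would also be rejected."""
    if not sql.strip().upper().startswith("SELECT"):
        return False
    tokens = set(re.findall(r"[A-Za-z_]+", sql.upper()))
    return tokens.isdisjoint(FORBIDDEN)

MAX_ATTEMPTS = 3

def run_with_repair(sql, execute, fix):
    """execute(sql) returns rows or raises; fix(sql, error) asks the LLM
    for a corrected query. Gives up after MAX_ATTEMPTS repairs."""
    for attempt in range(MAX_ATTEMPTS + 1):
        try:
            return execute(sql)
        except Exception as err:
            if attempt == MAX_ATTEMPTS:
                raise
            sql = fix(sql, str(err))

assert guardrail_check("SELECT state, COUNT(*) FROM customers GROUP BY state")
assert not guardrail_check("DROP TABLE customers")
assert not guardrail_check("SELECT 1; DELETE FROM customers")
```

Rejecting before execution (rather than relying on database permissions) keeps the failure mode cheap: a blocked query never opens a connection.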
With the graph plan in hand, the first MVP was created using Claude Code. For file organization and architecture, I directed the following setup:
- Folder structure suggested by the official LangChain documentation: *File Structure*
With the first MVP in place, I reviewed all the code, adding improvements, removing unnecessary complexity, and changing some architectural choices, for example:
- **Modernized typing:** replaced `List` and `Dict` from the `typing` module with the native built-ins `list` and `dict`.
- **Enums for domain types:** created `QuestionType` and `DataVizType` as `Enum` classes.
- **Separation of system prompt and user prompt:** refactored `ChatPromptTemplate.from_template` to use `from_messages` with distinct roles (`system` and `human`). System prompts carry different weight in the model's attention and are easier to optimize and version independently.
- **Migration from `sqlite3` to SQLAlchemy:** the database access logic was moved to SQLAlchemy, decoupling the code from the database type and making a future migration to PostgreSQL, MySQL, or another engine easier.
- **Nominal categorical column discovery:** the schema now includes distinct values for categorical columns, providing the LLM with exact keywords to build more precise filters and `WHERE` clauses.
- **LLM configures Plotly axes:** the `finalize` node delegates to the LLM the choice of axis variables (`x`, `y`, `color`) for the chart, adapting the visualization to the question and possible groupings (`GROUP BY`).
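A minimal sketch of the SQLAlchemy-based access layer and schema discovery, assuming nothing about the real schema (an in-memory database with an invented `customers` table stands in for `anexo_desafio_1.db`):

```python
from sqlalchemy import create_engine, inspect, text

# In production the URL would come from .env (sqlite:///anexo_desafio_1.db);
# an in-memory database keeps this sketch self-contained.
engine = create_engine("sqlite:///:memory:")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE customers (id INTEGER, state TEXT)"))
    conn.execute(text("INSERT INTO customers VALUES (1, 'SP'), (2, 'RJ')"))

def discover_schema(engine) -> dict[str, list[str]]:
    """Engine-agnostic schema discovery via the SQLAlchemy inspector."""
    insp = inspect(engine)
    return {table: [col["name"] for col in insp.get_columns(table)]
            for table in insp.get_table_names()}

def run_query(engine, sql: str) -> list[dict]:
    """Execute a read-only query and return rows as plain dicts."""
    with engine.connect() as conn:
        return [dict(row._mapping) for row in conn.execute(text(sql))]

print(discover_schema(engine))  # {'customers': ['id', 'state']}
print(run_query(engine, "SELECT state, COUNT(*) AS n "
                        "FROM customers GROUP BY state ORDER BY state"))
```

Because both helpers go through the `Engine` and the inspector rather than `sqlite3` directly, swapping the URL for a PostgreSQL or MySQL DSN is the only change a migration would need.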
## Query Examples Tested

- "List the 5 states with the highest number of customers who purchased via app in May."
- "How many customers interacted with WhatsApp campaigns in 2024?"
- "Which product categories had the highest average number of purchases per customer?"
- "What is the number of unresolved complaints by channel?"
- "What is the trend in complaints by channel over the last year?"
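For illustration, the kind of SQL that `generate_sql` might emit for the first question above can be exercised against a toy table. All table and column names here are invented, not the actual schema of `anexo_desafio_1.db`:

```python
import sqlite3

# Hypothetical table standing in for the real database schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer_id INTEGER, state TEXT, "
             "channel TEXT, purchase_date TEXT)")
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", [
    (1, "SP", "app", "2024-05-02"),
    (2, "SP", "app", "2024-05-10"),
    (3, "RJ", "app", "2024-05-07"),
    (4, "MG", "web", "2024-05-07"),
])

# A plausible SELECT for "List the 5 states with the highest number of
# customers who purchased via app in May."
sql = """
SELECT state, COUNT(DISTINCT customer_id) AS customers
FROM purchases
WHERE channel = 'app' AND strftime('%m', purchase_date) = '05'
GROUP BY state
ORDER BY customers DESC
LIMIT 5
"""
print(conn.execute(sql).fetchall())  # [('SP', 2), ('RJ', 1)]
```

This is also the shape the `guardrail` node expects: a single `SELECT`, no write keywords, with the `GROUP BY` column doubling as a natural Plotly axis.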
## Observability

The project integrates LangSmith for complete tracing of each agent execution. With it you can inspect:
- **LLM-generated plan:** the reasoning steps, identified tables, and query strategy produced by the `planner` node before SQL generation.
- **State between nodes:** all fields of the `AgentState` (`resolved_question`, `schema`, `plan`, `sql_query`, `query_result`, `final_answer`, etc.) are inspectable at each graph transition.
- **Token usage:** input and output token consumption per LLM call, enabling cost estimation and identification of the most expensive prompts.
- **Performance:** individual latency of each graph node, making it easier to identify bottlenecks in the pipeline.
To enable it, configure the variables in `.env`:

```env
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=your_langsmith_key
LANGSMITH_PROJECT=franq-agent
```

## Suggested Improvements and Extensions

- **Restrict the agent's scope in the classification node:** explicitly limit the LLM's action to questions about the database. If a question falls outside this scope, return a short, standardized response without triggering the SQL pipeline, saving token usage.
- **Migrate to asynchronous execution:** refactor the code to use `ainvoke` and async calls throughout the entire graph, improving scalability and interface responsiveness.
- **Integrate with FastAPI:** expose the agent as a REST API via FastAPI, allowing integration with other systems and frontends beyond Streamlit. For scalability and good architecture, my project raphaelDuff/samizdat, which follows DDD practices, can be used as a base. As a reference I used the book *Clean Architecture with Python: Implement scalable and maintainable applications using proven architectural principles*.
- **Migrate schema discovery to a serverless function with caching:** move `schema_discovery` to a Lambda function (or equivalent) executed periodically. Store the schema in a long-term cache (e.g., Redis) and eliminate the need to query the database on every invocation.
- **Improve data visualization:** for the question "What is the trend in complaints by channel over the last year?", a stacked bar chart could be a better choice. Adding few-shot examples to the prompt that map questions to chart types could improve the system.
- **Merge nodes:** the LangSmith trace shows that execution is slow across several nodes. Combining the logic of some nodes could improve performance; the team would need to weigh this gain against the Single Responsibility Principle.
- **Test different system prompts:** rewrite both the system prompts and the user prompts in the nodes to see if the token count can be "slimmed down" to save costs without losing quality.
- **Add metrics to the agent:** structure the project to have response relevance metrics (`ResponseRelevancy()`) and consistency metrics (`Faithfulness()`) using the `ragas` library. This would require preparing a database of test questions and answers for agent evaluation.
- **Add an action plan RAG:** create a knowledge base (RAG) with guidelines and action plans associated with patterns identified in the data. For example: if complaints by channel are increasing, the agent would query the RAG to suggest corrective actions to the user.