A conversational data analysis agent that answers natural language questions about a SQLite database. The agent uses Claude Sonnet as the LLM, LangGraph for node flow orchestration, and Streamlit as the web interface.
The user types a business question; the agent decides whether a database query is needed, generates and executes SQL with automatic error recovery, and returns a natural language response along with an interactive Plotly visualization.
- Tech Stack
- Running Instructions
- Agent Flow and Architecture Explanation
- Observability
- Query Examples Tested
- Suggested Improvements and Extensions
## Tech Stack
| Layer | Technology |
|---|---|
| LLM | Claude Sonnet (claude-sonnet-4-6) via langchain-anthropic |
| Orchestration | LangGraph |
| Web interface | Streamlit |
| Visualization | Plotly |
| Database | SQLite + SQLAlchemy |
| Package manager | uv |
| Runtime | Python 3.13 |
## Running Instructions
- Docker and Docker Compose installed
- Anthropic API key (`ANTHROPIC_API_KEY`)
Edit the `.env` file and fill in the values:

```env
DB_NAME=anexo_desafio_1.db
ANTHROPIC_API_KEY=your_key_here

# Optional: tracing via LangSmith
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=your_langsmith_key
LANGSMITH_PROJECT=franq-agent
```

Then build and start the container:

```bash
docker-compose up --build
```

To run locally without Docker:

```bash
# Install dependencies (requires uv)
uv sync

# Web interface
uv run streamlit run app.py
```

## Agent Flow and Architecture Explanation

| Node | Role |
|---|---|
| `resolve_context` | Rewrites follow-up questions as standalone questions using the conversation history |
| `classify` | Classifies the question as `sql` (requires a database query) or `direct` (greeting, meta-question) |
| `schema` | Dynamically discovers the database schema (tables and columns) on each invocation |
| `planner` | Creates a JSON reasoning plan before generating SQL (required tables, steps, strategy) |
| `generate_sql` | Generates a SQLite `SELECT` query based on the plan and schema |
| `guardrail` | Blocks destructive keywords (`DROP`, `DELETE`, `UPDATE`, `INSERT`, etc.), ensuring read-only access |
| `execute` | Executes the query and captures results or errors |
| `repair` | If execution fails, uses the error message as feedback for the LLM to fix the broken SQL (max. 3 attempts) |
| `finalize` | Interprets the results, formulates the natural language response, and defines the visualization type |
First, I sketched the graph in a notebook and discussed the project's feasibility with ChatGPT to refine the design. After that strategic discussion, the automatic error-recovery node was added to the graph, and the following foundations were defined:
- **Sequential architecture with a classification node:** the `classify` node evaluates whether the question requires database access before triggering the entire SQL pipeline. Direct questions (greetings, questions about the agent itself) are answered immediately by the `finalize` node, avoiding unnecessary latency and cost.
- **`guardrail` node as a security layer:** before any execution, the generated SQL is inspected for keywords indicating write or destructive operations. Disallowed queries are rejected before reaching the database, protecting data integrity.
- **Automatic error recovery:** the `repair` node implements a limited retry loop (maximum of 3 attempts), using the execution error message as context for the LLM to fix the broken SQL.
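These two safeguards can be sketched in isolation. The keyword list, function names, and retry shape below are assumptions for illustration, not the project's actual code:

```python
import re

# Hypothetical denylist; the real guardrail node may block a different set.
FORBIDDEN = {"DROP", "DELETE", "UPDATE", "INSERT", "ALTER", "CREATE",
             "REPLACE", "TRUNCATE", "ATTACH", "PRAGMA"}

def guardrail_check(sql: str) -> bool:
    """Allow only read-only SELECT statements. Deliberately conservative:
    a legitimate column literally named 'update' would also be rejected."""
    if not sql.strip().upper().startswith("SELECT"):
        return False
    tokens = set(re.findall(r"[A-Za-z_]+", sql.upper()))
    return tokens.isdisjoint(FORBIDDEN)

MAX_ATTEMPTS = 3

def run_with_repair(sql, execute, fix):
    """execute(sql) returns rows or raises; fix(sql, error) asks the LLM
    for a corrected query. Gives up after MAX_ATTEMPTS repairs."""
    for attempt in range(MAX_ATTEMPTS + 1):
        try:
            return execute(sql)
        except Exception as err:
            if attempt == MAX_ATTEMPTS:
                raise
            sql = fix(sql, str(err))

assert guardrail_check("SELECT state, COUNT(*) FROM customers GROUP BY state")
assert not guardrail_check("DROP TABLE customers")
assert not guardrail_check("SELECT 1; DELETE FROM customers")
```

Rejecting before execution (rather than relying on database permissions) keeps the failure mode cheap: a blocked query never opens a connection.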
With the graph plan in hand, the first MVP was created using Claude Code. For file organization and architecture, I directed the following setup:
- Folder structure suggested by the official LangChain documentation: *File Structure*
With the first MVP in place, I reviewed all the code, adding improvements, removing unnecessary complexity, and changing some architectural choices, for example:
- **Modernized typing:** replaced `List` and `Dict` from the `typing` module with the native built-ins `list` and `dict`.
- **Enums for domain types:** created `QuestionType` and `DataVizType` as `Enum` classes.
- **Separation of system prompt and user prompt:** refactored `ChatPromptTemplate.from_template` to use `from_messages` with distinct roles (`system` and `human`). System prompts carry different weight in the model's attention and are easier to optimize and version independently.
- **Migration from `sqlite3` to SQLAlchemy:** the database access logic was moved to SQLAlchemy, decoupling the code from the database type and making a future migration to PostgreSQL, MySQL, or another engine easier.
- **Nominal categorical column discovery:** the schema now includes distinct values for categorical columns, providing the LLM with exact keywords to build more precise filters and `WHERE` clauses.
- **LLM configures Plotly axes:** the `finalize` node delegates to the LLM the choice of axis variables (`x`, `y`, `color`) for the chart, adapting the visualization to the question and possible groupings (`GROUP BY`).
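A minimal sketch of the SQLAlchemy-based access layer and schema discovery, assuming nothing about the real schema (an in-memory database with an invented `customers` table stands in for `anexo_desafio_1.db`):

```python
from sqlalchemy import create_engine, inspect, text

# In production the URL would come from .env (sqlite:///anexo_desafio_1.db);
# an in-memory database keeps this sketch self-contained.
engine = create_engine("sqlite:///:memory:")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE customers (id INTEGER, state TEXT)"))
    conn.execute(text("INSERT INTO customers VALUES (1, 'SP'), (2, 'RJ')"))

def discover_schema(engine) -> dict[str, list[str]]:
    """Engine-agnostic schema discovery via the SQLAlchemy inspector."""
    insp = inspect(engine)
    return {table: [col["name"] for col in insp.get_columns(table)]
            for table in insp.get_table_names()}

def run_query(engine, sql: str) -> list[dict]:
    """Execute a read-only query and return rows as plain dicts."""
    with engine.connect() as conn:
        return [dict(row._mapping) for row in conn.execute(text(sql))]

print(discover_schema(engine))  # {'customers': ['id', 'state']}
print(run_query(engine, "SELECT state, COUNT(*) AS n "
                        "FROM customers GROUP BY state ORDER BY state"))
```

Because both helpers go through the `Engine` and the inspector rather than `sqlite3` directly, swapping the URL for a PostgreSQL or MySQL DSN is the only change a migration would need.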
## Query Examples Tested

- "List the 5 states with the highest number of customers who purchased via app in May."
- "How many customers interacted with WhatsApp campaigns in 2024?"
- "Which product categories had the highest average number of purchases per customer?"
- "What is the number of unresolved complaints by channel?"
- "What is the trend in complaints by channel over the last year?"
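For illustration, the kind of SQL that `generate_sql` might emit for the first question above can be exercised against a toy table. All table and column names here are invented, not the actual schema of `anexo_desafio_1.db`:

```python
import sqlite3

# Hypothetical table standing in for the real database schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer_id INTEGER, state TEXT, "
             "channel TEXT, purchase_date TEXT)")
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", [
    (1, "SP", "app", "2024-05-02"),
    (2, "SP", "app", "2024-05-10"),
    (3, "RJ", "app", "2024-05-07"),
    (4, "MG", "web", "2024-05-07"),
])

# A plausible SELECT for "List the 5 states with the highest number of
# customers who purchased via app in May."
sql = """
SELECT state, COUNT(DISTINCT customer_id) AS customers
FROM purchases
WHERE channel = 'app' AND strftime('%m', purchase_date) = '05'
GROUP BY state
ORDER BY customers DESC
LIMIT 5
"""
print(conn.execute(sql).fetchall())  # [('SP', 2), ('RJ', 1)]
```

This is also the shape the `guardrail` node expects: a single `SELECT`, no write keywords, with the `GROUP BY` column doubling as a natural Plotly axis.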
## Observability

The project integrates LangSmith for complete tracing of each agent execution. With it you can inspect:
- **LLM-generated plan:** the reasoning steps, identified tables, and query strategy produced by the `planner` node before SQL generation.
- **State between nodes:** all fields of the `AgentState` (`resolved_question`, `schema`, `plan`, `sql_query`, `query_result`, `final_answer`, etc.) are inspectable at each graph transition.
- **Token usage:** input and output token consumption per LLM call, enabling cost estimation and identification of the most expensive prompts.
- **Performance:** individual latency of each graph node, making it easier to identify bottlenecks in the pipeline.
To enable it, configure the variables in `.env`:

```env
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=your_langsmith_key
LANGSMITH_PROJECT=franq-agent
```

## Suggested Improvements and Extensions

- **Restrict the agent's scope in the classification node:** explicitly limit the LLM's action to questions about the database. If a question falls outside this scope, return a short, standardized response without triggering the SQL pipeline, saving token usage.
- **Migrate to asynchronous execution:** refactor the code to use `ainvoke` and async calls throughout the entire graph, improving scalability and interface responsiveness.
- **Integrate with FastAPI:** expose the agent as a REST API via FastAPI, allowing integration with other systems and frontends beyond Streamlit. For scalability and good architecture, my project raphaelDuff/samizdat, which follows DDD practices, can be used as a base. As a reference I used the book *Clean Architecture with Python: Implement scalable and maintainable applications using proven architectural principles*.
- **Migrate schema discovery to a serverless function with caching:** move `schema_discovery` to a Lambda function (or equivalent) executed periodically. Store the schema in a long-term cache (e.g., Redis) and eliminate the need to query the database on every invocation.
- **Improve data visualization:** for the question "What is the trend in complaints by channel over the last year?", a stacked bar chart could be a better choice. Adding few-shot examples to the prompt that map questions to chart types could improve the system.
- **Merge nodes:** the LangSmith trace shows that execution is slow across several nodes. Combining the logic of some nodes could improve performance; the team would need to weigh this gain against the Single Responsibility Principle.
- **Test different system prompts:** rewrite both the system prompts and the user prompts in the nodes to see if the token count can be "slimmed down" to save costs without losing quality.
- **Add metrics to the agent:** structure the project to have response relevance metrics (`ResponseRelevancy()`) and consistency metrics (`Faithfulness()`) using the `ragas` library. This would require preparing a database of test questions and answers for agent evaluation.
- **Add an action plan RAG:** create a knowledge base (RAG) with guidelines and action plans associated with patterns identified in the data. For example: if complaints by channel are increasing, the agent would query the RAG to suggest corrective actions to the user.