This project implements a SQL Agent built with LangGraph: it generates SQL from the user's question against a relational database, executes it, and returns a natural-language answer.
The agent's answers to sample questions can be found in [00_docs/AgentAnswers.pdf]
For this project, I used Open WebUI as the GUI after spending a significant amount of time trying to render graphs with LangChain Agent Chat UI. Unfortunately, I couldn’t pass custom UI components due to a behavior similar to that described in this issue.
A Pipe acts as a lightweight communication layer between Open WebUI and LangGraph, enabling flexible, structured server–client data exchange.
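At its core, the pipe's job is to translate the LangGraph server's streamed events into plain text chunks that Open WebUI can display. The sketch below illustrates that translation for an SSE-style stream; the `data:` payload shape (a `content` key) is an assumption for illustration, not the actual LangGraph wire format:

```python
import json

def extract_text(raw_event: str) -> str:
    """Pull displayable text out of one streamed event line.

    Assumes a hypothetical payload shape like:
        data: {"content": "partial answer text"}
    Real LangGraph stream modes differ; adapt the key lookup accordingly.
    """
    if not raw_event.startswith("data: "):
        return ""  # ignore comments, heartbeats, and blank keep-alives
    try:
        payload = json.loads(raw_event[len("data: "):])
    except json.JSONDecodeError:
        return ""
    return payload.get("content", "")

# Usage: feed each streamed line through the extractor and concatenate
chunks = [extract_text('data: {"content": "42 rows"}'),
          extract_text(": heartbeat comment")]
```

The real `langgraph_pipe.py` additionally has to forward the user's message to the server and emit the chunks through Open WebUI's streaming interface.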
Regarding the LLM provider, I chose Gemini due to its low cost-per-token while still being one of the most reliable providers available. New users can also start on a free plan with AUD 447 in credits for 90 days, which made it an easy choice.
Finally, PostgreSQL was selected as the database management system due to its strict typing (compared to SQLite or MySQL), which is critical for keeping the SQL queries generated by the agent deterministic.
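To illustrate the typing difference: SQLite's type affinity quietly stores a non-numeric string in an `INTEGER` column, whereas PostgreSQL rejects the same `INSERT` with a type error. A minimal demonstration of the lax side, using Python's built-in `sqlite3`:

```python
import sqlite3

# SQLite's type affinity keeps a non-numeric string in an INTEGER
# column as TEXT; PostgreSQL would reject this INSERT outright.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (n INTEGER)")
con.execute("INSERT INTO t VALUES ('not a number')")
row = con.execute("SELECT n, typeof(n) FROM t").fetchone()
# The declared INTEGER type was not enforced: the value survives as text.
```

For an agent, this matters because a query such as `WHERE n > 5` silently behaves differently on mixed-type columns, making results non-deterministic across runs and datasets.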
The graph is organized into three main layers:
1. SQL:
- Generates SQL from the user's question.
- Validates & executes the generated SQL.
2. Visualization:
- Produces visualizations based on the query results.
3. Natural Language:
- Generates natural-language answers from query results.
- Summarizes and explains visualizations.
- Suggests new insights derived from the dataset.
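The three layers above amount to a pipeline where each node reads and updates a shared state, which is how LangGraph nodes pass data as well. A framework-free sketch of that flow (node names, state keys, and the stubbed bodies are illustrative, not the actual graph's):

```python
from typing import Callable

# Placeholder nodes: each takes the shared state dict and enriches it,
# mirroring how LangGraph threads state through a compiled graph.
def generate_sql(state: dict) -> dict:
    state["sql"] = f"SELECT 1  -- derived from: {state['question']}"
    return state

def execute_sql(state: dict) -> dict:
    state["rows"] = [(1,)]  # stand-in for real database execution
    return state

def answer(state: dict) -> dict:
    state["answer"] = f"Query returned {len(state['rows'])} row(s)."
    return state

PIPELINE: list[Callable[[dict], dict]] = [generate_sql, execute_sql, answer]

def run(question: str) -> dict:
    state = {"question": question}
    for node in PIPELINE:
        state = node(state)
    return state

result = run("How many franchises are there?")
```

The real graph additionally validates the generated SQL before execution and branches into the visualization layer when a chart is requested.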
```
psycopg==3.3.2
pandas==2.3.3
pyyaml==6.0.3
matplotlib==3.10.8
langgraph==1.0.6
langchain==1.2.0
langchain-community==0.4.1
langchain_google_genai==4.2.0
langgraph-cli[inmem]
open-webui==0.7.2
```
> [!WARNING]
> To run LangGraph Server, ensure that:
> - Your Python version is 3.11
> - `rustc` and the C++ build tools (installed via Visual Studio) are available
```shell
git clone https://github.com/isthatm/SQL-Agent.git
cd 01_src
python -m venv .venv
.\.venv\Scripts\activate
pip install -r requirements.txt
```
Open the `.env` file and update the following variables:
- Set `LANGSMITH_API_KEY` to your API key on LangSmith.
- Set `LANGSMITH_PROJECT` to your project name on LangSmith.
- Set `GOOGLE_API_KEY` to your API key on Google AI Studio.

Open the `01_src\config.yaml` file:
- Set `RAW_DATA_DIR` to the directory containing your raw .csv files. If you use the data in `02_data`, remember to unzip `sample_data.zip` first.
```yaml
DATA:
  RAW_DATA_DIR: ../02_data
```
- Update the following parameters to match your database settings:
```yaml
DATA:
  ...
  DB_CONNECTION:
    host: localhost       # Database host (use localhost for local DB)
    port: 5432            # Database port (default for PostgreSQL)
    dbname: postgres      # Name of the database
    user: postgres        # Database username
    password: 1           # User's password
    dialect: postgresql   # Database type
    driver: psycopg       # Python driver for PostgreSQL
```
- Load data from the .csv files into the SQL database:
```shell
python .\prepare_data.py
```
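Conceptually, loading a .csv into the database boils down to deriving a table schema from the header row and emitting inserts for the data rows. A simplified, driver-free sketch of that step (the real `prepare_data.py` may differ; the column typing here is deliberately naive):

```python
import csv
import io

def csv_to_sql(table: str, csv_text: str) -> list[str]:
    """Turn CSV text into a CREATE TABLE statement plus INSERTs.

    Naive sketch: every column becomes TEXT. Real loading code would
    infer numeric/date types (PostgreSQL's strict typing depends on
    getting this right) and use a bulk-load path such as COPY for speed.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    cols = ", ".join(f"{c} TEXT" for c in header)
    stmts = [f"CREATE TABLE {table} ({cols});"]
    for r in data:
        vals = ", ".join("'" + v.replace("'", "''") + "'" for v in r)
        stmts.append(f"INSERT INTO {table} VALUES ({vals});")
    return stmts

stmts = csv_to_sql("franchise", "franchiseId,name\n1,Acme\n2,Bolt")
```

In production code, values should be passed as bound parameters through the driver (psycopg here) rather than interpolated into SQL strings.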
- Open a new terminal and run the following command:
```shell
langgraph dev
```
Your agent will be hosted at http://127.0.0.1:2024
- In another terminal:
```shell
open-webui serve
```
- Your UI will be hosted at http://localhost:8080
- Click on the profile icon at the bottom left corner > Admin Panel > Functions > New Function
- Copy the code in `01_src\open-webui\langgraph_pipe.py` and paste it there (remember to enable this pipe after saving).
Note: When running these commands, ensure that your current directory is `01_src` and the virtual environment is activated.
A test suite implemented in pytest is provided to evaluate the agent’s performance. The suite includes 20 questions categorized by difficulty level: easy, medium, hard, and expert.
The agent achieves a score of 18/20. The missed cases are primarily related to understanding the relationship between `franchiseId` and franchise names, where a single franchise name may correspond to multiple `franchiseId` values.
The evaluation consists of two main components:
- **SQL Generation Test:** Compares the SQL queries generated by the agent against predefined golden queries. Correctness is determined by comparing the execution results of the generated queries with the golden-query results.
- **Agent's Answers Test:** Evaluates the correctness of the agent's final answers using GEval, a large language model–based evaluation metric provided by DeepEval.
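Comparing execution results rather than SQL text keeps the test robust to equivalent-but-differently-written queries. A sketch of such a comparison (the helper name is illustrative; the actual suite may normalize results differently):

```python
def results_match(generated: list[tuple], golden: list[tuple]) -> bool:
    """Order-insensitive comparison of two query result sets.

    Sorting both sides tolerates row-order differences (e.g. one query
    omitting an ORDER BY); duplicate rows still have to match exactly.
    """
    return sorted(generated) == sorted(golden)

# Same rows in a different order still count as a match.
ok = results_match([(2, "Bolt"), (1, "Acme")], [(1, "Acme"), (2, "Bolt")])
```

Stricter variants might also compare column names and types, which matters when two queries return the same values under different projections.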
To run the tests, execute the following commands:
```shell
cd 01_src
pytest
```
| File(s) | Description |
|---|---|
| `config.yaml` | Config file for loading data into the database |
| `langgraph.json` | Config file for LangGraph Server |
| `langgraph_pipe.py` | Messenger between Open WebUI and LangGraph |