A LangGraph-based agent that answers medical questions using a biomedical knowledge graph (PheKnowLator) and optional LLM finetuning. MedGraph grounds user queries in KG entities via vector search, then runs Neo4j Cypher (or structured tools) and synthesizes answers with a small language model.
- Query → KG grounding: User question is embedded and matched against entity descriptions (ChromaDB + optional BM25). Retrieved labels are passed to the planner so tool parameters use exact KG entity names.
- Structured + raw Cypher: 13 predefined Neo4j tools (disease–phenotype, drug–disease, gene–disease, anatomy, molecular interactions, etc.) plus a generic `neo4j_search` tool for custom Cypher.
- Two-model pipeline: Planner LLM (e.g. Llama 3.2 3B) chooses tools and parameters; answer LLM (e.g. Llama 3.2 1B) turns tool results into a final, KG-grounded reply.
- Optional finetuning: Notebooks and scripts for QLoRA finetuning of the answer model on medical/KG data.
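In miniature, the grounding step is nearest-neighbour search over entity-description embeddings. The real pipeline uses ChromaDB with SentenceTransformer vectors (plus optional BM25); the toy vectors and `ground` helper below are purely illustrative:

```python
import math

# Toy stand-in for the ChromaDB entity-description index.
# Real embeddings are high-dimensional SentenceTransformer vectors.
ENTITY_EMBEDDINGS = {
    "asthma": [0.9, 0.1, 0.0],
    "ibuprofen": [0.1, 0.8, 0.2],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def ground(query_embedding, k=1):
    """Return the k entity labels closest to the query embedding."""
    ranked = sorted(ENTITY_EMBEDDINGS.items(),
                    key=lambda kv: cosine(query_embedding, kv[1]),
                    reverse=True)
    return [label for label, _ in ranked[:k]]

print(ground([0.85, 0.15, 0.0]))  # → ['asthma']
```

The returned labels are exactly what gets handed to the planner, so tool parameters use canonical KG entity names rather than the user's free text.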
User query
→ reason (vector grounding → planner → tool call or answer)
→ [if tool] call_tool (Neo4j per grounding entity + fallback → merge)
→ answer_node (truncate tool results → answer LLM)
→ Final message
- State: `messages` plus optional `grounding_entities` (entity labels from Chroma or the Neo4j fallback).
- Data: ChromaDB for entity-description vectors; Neo4j for the KG (`Entity` nodes, `RELATES_TO` edges).
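The reason-node routing above can be sketched in plain Python. The state keys match the list above; the dict-shaped message and the `route_after_reason` helper are hypothetical stand-ins for the LangGraph conditional edge:

```python
from typing import TypedDict

class AgentState(TypedDict, total=False):
    messages: list            # accumulated chat history
    grounding_entities: list  # entity labels from Chroma / Neo4j fallback

def route_after_reason(state: AgentState) -> str:
    """Mirror of the conditional edge: go to call_tool when the planner
    emitted a tool call, otherwise go straight to answer_node."""
    last = state["messages"][-1]
    return "call_tool" if last.get("tool_calls") else "answer_node"

state: AgentState = {
    "messages": [{"role": "assistant",
                  "tool_calls": [{"name": "neo4j_search"}]}],
}
print(route_after_reason(state))  # → call_tool
```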
See docs/ARCHITECTURE.md for a detailed pipeline and file map.
Download the demo if it doesn’t play inline.
- Python 3.10+
- Neo4j (local or remote) with the PheKnowLator-derived graph loaded
- vLLM (or OpenAI-compatible endpoints) for planner and answer models
- Optional: Docker for Neo4j, vLLM, Chroma server
- Install inference dependencies:

      pip install -r inference_requirements.txt

- Set environment variables:

      export NEO4J_URI=bolt://localhost:7687
      export NEO4J_USER=neo4j
      export NEO4J_PASSWORD=password
      export PLANNER_API_BASE=http://localhost:8001/v1  # planner (e.g. 3B)
      export ANSWER_API_BASE=http://localhost:8000/v1   # answer (e.g. 1B)
- Start services in this order:
  1. Build the knowledge graph: `./scripts/restore_db.sh`
  2. Start Chroma: `./scripts/start_chroma.sh`
  3. Start the answer (finetuned) model: `./scripts/start_finetuned_vllm.sh`
  4. Start the planner model: `./scripts/start_planner_model.sh`
- Run the chat UI:

      streamlit run streamlit_app.py

  Or run the test script (no UI, from repo root):

      python tests/test_agent.py
| Variable | Description | Default |
|---|---|---|
| `NEO4J_URI` | Neo4j bolt URL | `bolt://localhost:7687` |
| `NEO4J_USER` | Neo4j user | `neo4j` |
| `NEO4J_PASSWORD` | Neo4j password | `password` |
| `PLANNER_API_BASE` | Planner LLM API (OpenAI-compatible) | `http://localhost:8001/v1` |
| `ANSWER_API_BASE` | Answer LLM API (OpenAI-compatible) | `http://localhost:8000/v1` |
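A minimal sketch of resolving these variables with their defaults; the `CONFIG` dict is illustrative, and the repo's modules may read them individually:

```python
import os

# Fall back to the defaults from the table above when a variable is unset.
CONFIG = {
    "NEO4J_URI": os.getenv("NEO4J_URI", "bolt://localhost:7687"),
    "NEO4J_USER": os.getenv("NEO4J_USER", "neo4j"),
    "NEO4J_PASSWORD": os.getenv("NEO4J_PASSWORD", "password"),
    "PLANNER_API_BASE": os.getenv("PLANNER_API_BASE", "http://localhost:8001/v1"),
    "ANSWER_API_BASE": os.getenv("ANSWER_API_BASE", "http://localhost:8000/v1"),
}
```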
- Schema: `Entity` nodes (properties: `id`, `type`, `label`, `synonym`, `description`); `RELATES_TO` edges (property `type` carries the relation meaning).
- Load a prebuilt PheKnowLator export (see `neo4j/export_for_neo4j_simplified.py`, `neo4j/import_nodes.sh`, `neo4j/import_edges_chunks.sh`). Alternatively use `neo4j/download_prebuilt_kg.py` if you have a prebuilt artifact URL.
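Given this schema, each predefined tool boils down to a parameterized Cypher query. A hedged sketch — the `phenotype` type value and the query shape are assumptions for illustration, not the exact tool implementation:

```python
def disease_phenotype_query(disease_label: str):
    """Build an illustrative Cypher query against the schema above:
    Entity nodes with a `label`, RELATES_TO edges with a `type` property."""
    query = (
        "MATCH (d:Entity {label: $label})-[r:RELATES_TO]->(p:Entity) "
        "WHERE p.type = 'phenotype' "
        "RETURN p.label AS phenotype, r.type AS relation LIMIT 25"
    )
    return query, {"label": disease_label}

# The (query, params) pair would be passed to the Neo4j driver's run().
q, params = disease_phenotype_query("asthma")
```

Parameterizing `$label` (rather than interpolating the string) is what lets grounded entity names from Chroma be substituted safely at call time.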
- Built from the same NodeLabels used for Neo4j (`label`, `description`, `synonym`). Used at query time to resolve free text → KG entity labels.
- Build once:

      cd neo4j && pip install sentence-transformers chromadb rank_bm25
      python build_vector_store.py

  Optional: rebuild only the BM25 index from the existing Chroma collection:

      python build_vector_store.py --bm25-only

- Store path: `neo4j/vector_store/` (Chroma persistence + `bm25_index.pkl` for hybrid search). If the store is missing, the agent falls back to the Neo4j `neo4j_search_entities` tool for grounding.
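One common way to merge dense (Chroma) and sparse (BM25) rankings in hybrid search is reciprocal rank fusion. This is a sketch of the idea only; the repo's actual merge strategy may differ:

```python
def rrf_merge(vector_ranked, bm25_ranked, k=60):
    """Reciprocal rank fusion: each ranking contributes 1/(k + rank + 1)
    per label; labels ranked well by both lists float to the top."""
    scores = {}
    for ranking in (vector_ranked, bm25_ranked):
        for rank, label in enumerate(ranking):
            scores[label] = scores.get(label, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

print(rrf_merge(["asthma", "eczema"], ["asthma", "rhinitis"]))
# → ['asthma', 'eczema', 'rhinitis']
```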
| Path | Purpose |
|---|---|
| `medgraph/` | Core agent package. |
| `medgraph/agent.py` | LangGraph graph (reason → call_tool / answer_node). |
| `medgraph/helpers.py` | Grounding (Chroma + Neo4j fallback), Neo4j tools, reason/call_tool/answer_node logic. |
| `medgraph/models.py` | Planner and answer ChatOpenAI clients (vLLM bases). |
| `medgraph/prompts.py` | Planner system prompt. |
| `medgraph/neo4j_connector.py` | Neo4j connection and query execution. |
| `streamlit_app.py` | Chat UI at repo root; invokes the medgraph app. |
| `app.py` | Optional Lightning App wrapper that serves Streamlit. |
| `tests/test_agent.py` | Non-UI test with fixed queries. |
| `tests/test_agent_queries.py` | Compares runs with/without ChromaDB grounding. |
| `tests/test_queries.md` | Example queries for manual testing. |
| `neo4j/description_retriever.py` | Chroma + SentenceTransformer retrieval for grounding. |
| `neo4j/build_vector_store.py` | Builds Chroma + BM25 indexes from NodeLabels. |
| `neo4j/export_for_neo4j_simplified.py` | Exports the KG to Neo4j CSVs. |
| `docs/ARCHITECTURE.md` | Detailed pipeline and file map. |
| `scripts/` | Shell scripts for vLLM, Chroma, and DB restore. |
- Notebooks: e.g. `notebooks/medical_llm_finetuning.ipynb` for QLoRA finetuning on medical/KG data.
- Requirements: `requirements.txt` (PyTorch, transformers, peft, trl, etc.). Use a separate environment if you only need inference.
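For orientation, a typical QLoRA setup pairs 4-bit quantized base weights with low-rank adapters. The hyperparameters below are placeholders for illustration, not the notebook's actual values:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization for the frozen base model (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Low-rank adapters trained on top; target modules and rank are placeholders.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
```

These two configs would be passed to `from_pretrained(..., quantization_config=bnb_config)` and `get_peft_model(model, lora_config)` respectively before handing the model to a trainer.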
- Code: see repository.
- PheKnowLator and related KG data have their own licenses; ensure compliance when downloading or redistributing.