
MedGraph

A LangGraph-based agent that answers medical questions using a biomedical knowledge graph (PheKnowLator) and optional LLM finetuning. MedGraph grounds user queries in KG entities via vector search, then runs Neo4j Cypher (or structured tools) and synthesizes answers with a small language model.

Features

  • Query → KG grounding: User question is embedded and matched against entity descriptions (ChromaDB + optional BM25). Retrieved labels are passed to the planner so tool parameters use exact KG entity names.
  • Structured + raw Cypher: 13 predefined Neo4j tools (disease–phenotype, drug–disease, gene–disease, anatomy, molecular interactions, etc.) plus a generic neo4j_search for custom Cypher.
  • Two-model pipeline: Planner LLM (e.g. Llama 3.2 3B) chooses tools and parameters; answer LLM (e.g. Llama 3.2 1B) turns tool results into a final, KG-grounded reply.
  • Optional finetuning: Notebooks and scripts for QLoRA finetuning of the answer model on medical/KG data.

Architecture

User query
    → reason (vector grounding → planner → tool call or answer)
    → [if tool] call_tool (Neo4j per grounding entity + fallback → merge)
    → answer_node (truncate tool results → answer LLM)
    → Final message
  • State: messages + optional grounding_entities (entity labels from Chroma or Neo4j fallback).
  • Data: ChromaDB for entity-description vectors; Neo4j for the KG (Entity nodes, RELATES_TO edges).

See docs/ARCHITECTURE.md for a detailed pipeline and file map.
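The control flow above can be sketched as a plain-Python state machine (a simplified stand-in for the actual LangGraph `StateGraph` in `medgraph/agent.py`; all function names, the routing heuristic, and the fake tool result here are illustrative only):

```python
# Minimal sketch of the reason -> call_tool -> answer_node control flow.
# Names and routing logic are illustrative, not the real medgraph API.

def reason(state):
    # Planner step: decide whether to call a KG tool or answer directly.
    if "phenotype" in state["messages"][-1].lower():
        state["tool_call"] = {"name": "disease_phenotypes", "entity": "asthma"}
        state["next"] = "call_tool"
    else:
        state["next"] = "answer_node"
    return state

def call_tool(state):
    # Stand-in for running a Neo4j query per grounding entity and merging rows.
    state["tool_result"] = f"KG rows for {state['tool_call']['entity']}"
    state["next"] = "answer_node"
    return state

def answer_node(state):
    # Answer LLM step: turn (truncated) tool results into the final reply.
    context = state.get("tool_result", "")
    state["messages"].append(f"Answer (grounded on: {context})")
    state["next"] = None
    return state

NODES = {"reason": reason, "call_tool": call_tool, "answer_node": answer_node}

def run(query: str) -> str:
    state = {"messages": [query], "next": "reason"}
    while state["next"]:
        state = NODES[state["next"]](state)
    return state["messages"][-1]
```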


Prerequisites

  • Python 3.10+
  • Neo4j (local or remote) with the PheKnowLator-derived graph loaded
  • vLLM (or OpenAI-compatible endpoints) for planner and answer models
  • Optional: Docker for Neo4j, vLLM, Chroma server

Quick start

  1. Install inference dependencies

    pip install -r inference_requirements.txt
  2. Set environment variables

    export NEO4J_URI=bolt://localhost:7687
    export NEO4J_USER=neo4j
    export NEO4J_PASSWORD=password
    export PLANNER_API_BASE=http://localhost:8001/v1   # planner (e.g. 3B)
    export ANSWER_API_BASE=http://localhost:8000/v1    # answer (e.g. 1B)
  3. Start services in this order

    1. Build the knowledge graph: ./scripts/restore_db.sh
    2. Start Chroma: ./scripts/start_chroma.sh
    3. Start answer (finetuned) model: ./scripts/start_finetuned_vllm.sh
    4. Start planner model: ./scripts/start_planner_model.sh
  4. Run the chat UI

    streamlit run streamlit_app.py

    Or run the test script (no UI, from repo root):

    python tests/test_agent.py

Environment variables

| Variable | Description | Default |
| --- | --- | --- |
| `NEO4J_URI` | Neo4j bolt URL | `bolt://localhost:7687` |
| `NEO4J_USER` | Neo4j user | `neo4j` |
| `NEO4J_PASSWORD` | Neo4j password | `password` |
| `PLANNER_API_BASE` | Planner LLM API (OpenAI-compatible) | `http://localhost:8001/v1` |
| `ANSWER_API_BASE` | Answer LLM API (OpenAI-compatible) | `http://localhost:8000/v1` |
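These variables can be read with the table's defaults as fallbacks (a minimal sketch; the actual settings handling in `medgraph/models.py` and `medgraph/neo4j_connector.py` may differ):

```python
import os

# Read connection settings, falling back to the documented defaults.
NEO4J_URI = os.environ.get("NEO4J_URI", "bolt://localhost:7687")
NEO4J_USER = os.environ.get("NEO4J_USER", "neo4j")
NEO4J_PASSWORD = os.environ.get("NEO4J_PASSWORD", "password")
PLANNER_API_BASE = os.environ.get("PLANNER_API_BASE", "http://localhost:8001/v1")
ANSWER_API_BASE = os.environ.get("ANSWER_API_BASE", "http://localhost:8000/v1")
```

The two `*_API_BASE` values are what you would hand to any OpenAI-compatible client as its base URL, pointing at the planner and answer vLLM servers respectively.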

Knowledge graph and vector store

Neo4j (KG)

  • Schema: `Entity` nodes (properties: `id`, `type`, `label`, `synonym`, `description`) connected by `RELATES_TO` edges, whose `type` property carries the relation meaning.
  • Load a prebuilt PheKnowLator export (see neo4j/export_for_neo4j_simplified.py, neo4j/import_nodes.sh, neo4j/import_edges_chunks.sh). Alternatively use neo4j/download_prebuilt_kg.py if you have a prebuilt artifact URL.
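A structured tool query against this schema looks roughly like the following (a hedged sketch: the relation-type value `"has phenotype"`, the function name, and the `LIMIT` are illustrative; the real tools live in `medgraph/helpers.py`):

```python
# Sketch of one structured tool over the Entity/RELATES_TO schema.
# Returns a parameterized Cypher query, to be run via the Neo4j driver
# (e.g. session.run(cypher, params)).

def disease_phenotype_query(disease_label: str):
    cypher = (
        "MATCH (d:Entity {label: $label})-[r:RELATES_TO]->(p:Entity) "
        "WHERE r.type = $rel_type AND p.type = 'phenotype' "
        "RETURN p.label AS phenotype LIMIT 25"
    )
    params = {"label": disease_label, "rel_type": "has phenotype"}
    return cypher, params
```

Parameterizing on `label` is why grounding matters: the planner must supply the exact KG entity label, which the vector store resolves from free text.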

Vector store (entity grounding)

  • Built from the same NodeLabels used for Neo4j (label, description, synonym). Used at query time to resolve free text → KG entity labels.

  • Build once:

    cd neo4j && pip install sentence-transformers chromadb rank_bm25
    python build_vector_store.py

    Optional: rebuild only the BM25 index from the existing Chroma collection:

    python build_vector_store.py --bm25-only
  • Store path: neo4j/vector_store/ (Chroma persistence + bm25_index.pkl for hybrid search). If the store is missing, the agent falls back to Neo4j neo4j_search_entities for grounding.
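One common way to combine a dense (Chroma) ranking with a BM25 ranking is reciprocal rank fusion, sketched below. This is an assumption for illustration: the actual fusion used by `description_retriever.py` may weight the two rankings differently.

```python
# Sketch of hybrid grounding: fuse a dense (vector) ranking with a BM25
# ranking via reciprocal rank fusion (RRF). Labels appearing high in both
# rankings win; k damps the influence of any single ranking.

def fuse_rankings(dense: list[str], bm25: list[str], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in (dense, bm25):
        for rank, label in enumerate(ranking):
            scores[label] = scores.get(label, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```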

Project structure

| Path | Purpose |
| --- | --- |
| `medgraph/` | Core agent package. |
| `medgraph/agent.py` | LangGraph graph (reason → call_tool / answer_node). |
| `medgraph/helpers.py` | Grounding (Chroma + Neo4j fallback), Neo4j tools, reason/call_tool/answer_node logic. |
| `medgraph/models.py` | Planner and answer ChatOpenAI clients (vLLM bases). |
| `medgraph/prompts.py` | Planner system prompt. |
| `medgraph/neo4j_connector.py` | Neo4j connection and query execution. |
| `streamlit_app.py` | Chat UI at repo root; invokes the medgraph app. |
| `app.py` | Lightning App wrapper (optional) that serves Streamlit. |
| `tests/test_agent.py` | Non-UI test with fixed queries. |
| `tests/test_agent_queries.py` | Compare runs with/without ChromaDB grounding. |
| `tests/test_queries.md` | Example queries for manual testing. |
| `neo4j/description_retriever.py` | Chroma + SentenceTransformer retrieval for grounding. |
| `neo4j/build_vector_store.py` | Build Chroma + BM25 from NodeLabels. |
| `neo4j/export_for_neo4j_simplified.py` | Export KG to Neo4j CSVs. |
| `docs/ARCHITECTURE.md` | Detailed pipeline and file map. |
| `scripts/` | Shell scripts for vLLM, Chroma, DB restore. |

Finetuning (optional)

  • Notebooks: e.g. notebooks/medical_llm_finetuning.ipynb for QLoRA/medical data.
  • Requirements: requirements.txt (PyTorch, transformers, peft, trl, etc.). Use a separate env if you only need inference.

License and data

  • Code: see repository.
  • PheKnowLator and related KG data have their own licenses; ensure compliance when downloading or redistributing.
