This is a modular, Object-Oriented Programming (OOP) based experimental framework designed for Text-to-Graph (Text2Cypher/GQL) tasks. The system covers the entire pipeline from natural language question transformation to query execution, validation, and multi-dimensional metric evaluation.
The project supports integration with Qwen and other Large Language Models (LLMs) for prediction, connects to graph databases (TuGraph, Google Spanner) and relational databases (SQLite) for Execution Accuracy (EA) verification, and utilizes external tools for in-depth grammatical and structural analysis.
- Full Pipeline Control: Supports flexible toggling between the Prediction Phase (LLM query generation) and the Evaluation Phase (metrics calculation).
- LLM Integration: Built-in interface compatible with the OpenAI SDK format (default configuration targets AliCloud DashScope/Qwen).
- Multi-Database Support:
  - TuGraph: Connects via the Bolt protocol.
  - Google Spanner: Supports GQL query validation for cloud-native distributed graph databases.
  - SQLite: Supports traditional SQL execution for cross-paradigm comparisons.
- Multi-Dimensional Metrics:
  - Execution Accuracy (EA): Compares results returned by the database against gold standards.
  - Google BLEU: Text-level n-gram similarity.
  - External Metrics: Integration with external tools to calculate Grammar Correctness and Structural Similarity.
- Clean OOP Architecture: Uses abstract base classes (Drivers), making it easy to extend with new model interfaces or database adapters.
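To illustrate the driver abstraction, here is a minimal sketch of what an abstract base class plus one concrete adapter could look like. The class and method names below are illustrative assumptions, not the project's actual API; only the SQLite backend uses the real stdlib sqlite3 module.

```python
# Hypothetical sketch of the driver/ abstraction layer.
# DBDriver, execute, and close are assumed names, not the project's API.
from abc import ABC, abstractmethod


class DBDriver(ABC):
    """Common interface every database adapter implements."""

    @abstractmethod
    def execute(self, query: str) -> list:
        """Run a query and return the result rows."""

    @abstractmethod
    def close(self) -> None:
        """Release the underlying connection."""


class SQLiteDriver(DBDriver):
    """Minimal adapter backed by the stdlib sqlite3 module."""

    def __init__(self, db_path: str):
        import sqlite3
        self._conn = sqlite3.connect(db_path)

    def execute(self, query: str) -> list:
        return self._conn.execute(query).fetchall()

    def close(self) -> None:
        self._conn.close()
```

New backends (e.g., a TuGraph Bolt adapter) would subclass the same base, so the evaluation code never needs to know which database it is talking to.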
gql-generation-driver/
├─ driver/ # Abstract Interface Definitions
├─ example_data/ # Sample Input Data (JSON)
├─ experiment/ # Configuration Files
├─ impl/ # Core Implementation Layer
│ ├─ db_driver/ # Database Adapters (TuGraph, Spanner, SQLite)
│ ├─ evaluation/ # Metric Implementations
│ └─ text2graph_system/ # LLM System Logic
├─ tools/ # External Tools (Evaluation plugins, etc.)
├─ output/ # Prediction Results Directory
├─ evaluation_detail/ # Detailed Evaluation Reports
├─ requirements.txt # Project Dependencies
└─ run_pipeline.py # Main Entry Point
Ensure Python version is >= 3.8. Run the following in the project root:
pip install -r requirements.txt
To ensure Python correctly resolves module imports within the project structure, set the PYTHONPATH before execution.
Windows (PowerShell):
$env:PYTHONPATH="."
Windows (CMD):
set PYTHONPATH=.
Linux/macOS (Bash/Zsh):
export PYTHONPATH=.
The main configuration file is located at experiment/test_config.json. Key fields are described below (the // comments are explanatory only; JSON does not support comments, so do not include them in the actual file):
{
  "pipeline": {
    "run_prediction": true,   // Toggle LLM prediction
    "run_evaluation": true    // Toggle metrics calculation
  },
  "data": {
    "input_path": "example_data/Social_Network_Twitter/Cypher/Social_Network_Twitter_cypher.json",
    "output_path": "output/test_result.json"
  },
  "prediction": {
    "api_key": "PLACEHOLDER_FOR_API_KEY",    // Your LLM API key
    "base_url": "PLACEHOLDER_FOR_BASE_URL",  // Model API endpoint
    "model": "qwen-plus",                    // Model name
    "schema_path": "example_data/Social_Network_Twitter/Cypher/schema.txt",
    "target_lang": "cypher",                 // Options: cypher, gql, sql
    "mode": "zeroshot",
    "graph_name": "social_network_twitter",
    "max_workers": 5,                        // Concurrency for API calls
    "temperature": 0.0,
    "level_fields": [                        // Mapping of input question fields to output query fields
      ["initial_question", "initial_query"],
      ["level_1", "level_1_query"],
      ["level_2", "level_2_query"],
      ["level_3", "level_3_query"],
      ["level3_with_ext", "level_3_external_knowledge_query"]
    ]
  },
  "evaluation": {
    "tugraph": {
      "db_uri": "bolt://localhost:7687",     // TuGraph connection URI
      "db_user": "admin",
      "db_pass": "PLACEHOLDER_FOR_DB_PASS"
    },
    "dbgpt_root": "tools/eval_similarity_grammar",  // Root for external eval tools
    "spanner": {                             // Google Spanner config
      "project_id": "PLACEHOLDER_FOR_PROJECT_ID",
      "instance_id": "PLACEHOLDER_FOR_INSTANCE_ID",
      "database_id": "PLACEHOLDER_FOR_DATABASE_ID"
    },
    "sqlite": {                              // SQLite config
      "db_path": "example_data/SQLite/disney.sqlite"
    }
  }
}
Run the pipeline with the default configuration:
python run_pipeline.py
Use the --config argument to point to a specific config file:
python run_pipeline.py --config experiment/debug_config.json
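The CLI surface above can be sketched with stdlib argparse. Only --config is documented here; the default path and parser description are assumptions.

```python
# Hypothetical sketch of run_pipeline.py's argument parsing;
# only the --config flag is documented, the rest is assumed.
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Text2Cypher/GQL pipeline")
    parser.add_argument(
        "--config",
        default="experiment/test_config.json",
        help="Path to the experiment config file",
    )
    return parser.parse_args(argv)
```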
The sample file geography_5_csv_files_08051006_corpus_seeds.json uses Cypher as the default language for gold standard queries. The accompanying .csv and import_config.json files are specifically formatted for batch import into TuGraph DB.
Each entry includes the database name, the original question, multi-level reasoning questions, and optional external knowledge:
{
  "id": "unique_identifier",
  "database": "database_name",
  "initial_question": "Original natural language question",
  "gold_query": "Gold standard Cypher/GQL query",
  "level_1": "Level 1 (Coarse-grained reasoning) question",
  "level_2": "Level 2 (Structured reasoning) question",
  "level_3": "Level 3 (Sub-goal planning) question",
  "external_knowledge": "Knowledge from outside sources (Encyclopedia, etc.)",
  "difficulty": "easy / medium / hard",
  "source": "data_source"
}
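The level_fields mapping in the config pairs each question field in this format with its predicted-query output field. A minimal sketch of how that pairing could drive prediction (the generate callable stands in for the LLM call and is an assumption):

```python
# Sketch: fill output query fields from input question fields using the
# level_fields mapping from the config. predict_entry and generate are
# illustrative names, not the project's actual API.
level_fields = [
    ["initial_question", "initial_query"],
    ["level_1", "level_1_query"],
    ["level_2", "level_2_query"],
    ["level_3", "level_3_query"],
]


def predict_entry(entry: dict, generate) -> dict:
    """Populate each output field of one dataset entry, skipping
    question fields the entry does not contain."""
    for question_field, query_field in level_fields:
        if entry.get(question_field):
            entry[query_field] = generate(entry[question_field])
    return entry
```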
The model generates predicted queries for each level, saved in data.output_path:
{
  "id": "Unique Identifier",
  "database": "Database Name",
  "initial_question": "Original Natural Language Question",
  "gold_query": "Correct Cypher/GQL corresponding to the original natural language question",
  "level_1": "Level 1 (Coarse-grained Reasoning) Question",
  "level_2": "Level 2 (Structured Reasoning) Question",
  "level_3": "Level 3 (Sub-goal Planning) Question",
  "level_4": "Level 4 (Final Reasoning) Question",
  "external_knowledge": "Knowledge depending on sources outside the database (e.g., encyclopedia, common facts; can be empty)",
  "difficulty": "Question Difficulty (easy / medium / hard)",
  "source": "Data Source",
  "initial_query": "Model Predicted Query Statement",
  "level_1_query": "Model Predicted Query Statement",
  "level_2_query": "Model Predicted Query Statement",
  "level_3_query": "Model Predicted Query Statement",
  "level_3_external_knowledge_query": "Model Predicted Query Statement"
}
Evaluation produces the following metrics for each reasoning level:
- EA (Execution Accuracy)
- Grammar (Syntactic Validity)
- Similarity (Structural Matching)
- Google BLEU (Textual Similarity)
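For intuition on the Google BLEU metric, here is a simplified self-contained sentence-level GLEU (Wu et al., 2016: the minimum of n-gram precision and recall over orders 1 to 4). The framework may instead rely on a library implementation such as NLTK's; this sketch only shows the idea on whitespace-tokenized queries.

```python
# Simplified sentence-level GLEU over whitespace tokens; an illustrative
# sketch, not necessarily the project's actual implementation.
from collections import Counter


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def google_bleu(reference: str, hypothesis: str, max_n: int = 4) -> float:
    """GLEU = min(precision, recall) of matching n-grams, orders 1..max_n."""
    ref_toks, hyp_toks = reference.split(), hypothesis.split()
    ref_counts, hyp_counts = Counter(), Counter()
    for n in range(1, max_n + 1):
        ref_counts.update(ngrams(ref_toks, n))
        hyp_counts.update(ngrams(hyp_toks, n))
    overlap = sum((ref_counts & hyp_counts).values())  # clipped matches
    precision = overlap / max(sum(hyp_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    return min(precision, recall)
```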
Reports are saved in evaluation_detail/execution_results/ as level-specific JSON files (e.g., level_1_results.json):
[
  {
    "instance_id": "unique_identifier",
    "gold_query": "Standard answer query",
    "pred_query": "Model predicted query",
    "metrics": {
      "ea": 1.0,
      "grammar": 1.0,
      "similarity": 0.9212
    }
  }
]
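As a sketch of how the EA score in these reports could be computed, the snippet below compares the two result sets as order-insensitive multisets of rows. The exact matching rules the framework applies (ordering, type coercion, NULL handling) are assumptions here.

```python
# Illustrative Execution Accuracy check: 1.0 iff the gold query and the
# predicted query return the same multiset of rows. The normalization
# (stringified tuples, order-insensitive) is an assumption.
from collections import Counter


def execution_accuracy(gold_rows: list, pred_rows: list) -> float:
    """Return 1.0 when both result sets contain the same rows,
    ignoring row order, else 0.0."""
    def normalize(rows):
        return Counter(tuple(map(str, row)) for row in rows)
    return 1.0 if normalize(gold_rows) == normalize(pred_rows) else 0.0
```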