📄 Paper: CG-Bench: Can Language Models Assist Call Graph Construction in the Real World? - Published at LMPL@SPLASH2025
CG-Bench is a comprehensive benchmark dataset designed to evaluate the capabilities of Large Language Models (LLMs) in assisting with call graph construction in real-world C/C++ codebases. The benchmark focuses specifically on challenging indirect function calls through function pointers, which are notoriously difficult for traditional static analysis tools to resolve.
Call graph construction is a fundamental program analysis technique crucial for various software engineering tasks including debugging, optimization, security analysis, and program comprehension. While direct function calls are straightforward to analyze, indirect calls through function pointers present significant challenges for automated tools.
This benchmark provides:
- Real-world complexity: Examples extracted from popular open-source projects
- Diverse patterns: Multiple categories of function pointer usage patterns
- Structured format: Consistent annotation format for evaluation
- Comprehensive coverage: 104 examples across 7 major open-source projects
The benchmark includes examples from seven major C/C++ projects:
| Project | Domain | Lines of Code (KLoC) | Description |
|---|---|---|---|
| openssh | Security | ~120 | Secure Shell (SSH) protocol implementation |
| curl | Networking | ~182 | Command line tool and library for transferring data |
| redis | Database | ~200 | In-memory data structure store |
| zfs | File System | ~380 | ZFS file system implementation |
| wrk | Benchmarking | ~601 | HTTP benchmarking tool |
| ffmpeg | Multimedia | ~1,257 | Complete multimedia framework |
| gcc | Compiler | ~6,200 | GNU Compiler Collection |
| Total | All Domains | ~8,940 | 7 major open-source projects |
The benchmark categorizes function pointer usage patterns into 11 distinct types:
- `fnptr-callback` (15 examples): Function pointers used as callbacks
- `fnptr-cast` (7 examples): Function pointers with type casting
- `fnptr-dynamic-call` (5 examples): Dynamically resolved function calls
- `fnptr-global-array` (6 examples): Function pointers in global arrays
- `fnptr-global-struct-array` (12 examples): Function pointers in arrays within global structures
- `fnptr-global-struct` (11 examples): Function pointers in global structures
- `fnptr-library` (20 examples): Function pointers in library interfaces
- `fnptr-only` (12 examples): Basic function pointer calls without complex patterns
- `fnptr-struct` (14 examples): Function pointers stored in structures
- `fnptr-varargs` (1 example): Function pointers with variable arguments
- `fnptr-virtual` (1 example): Virtual function-like patterns in C
```
CG-Bench/
├── README.md                 # This file
├── LICENSE                   # MIT License
├── projects.md               # Detailed project statistics
├── extract_from_markdowns.py # Data extraction script
├── fnptr-*.md                # Category-specific examples
└── CG_Bench_Can_Language_Models_Assist_Call_Graph_Construction_in_the_Real_World.pdf
```
Each example follows a consistent structure:
```markdown
# Example N

## Callsite
*Full path and location of the function pointer call*

fnptr: *function_pointer_name*
targets: target_function1, target_function2, ...

## Related Code Snippets
```

```c
// Relevant code context showing the function pointer usage
```

To generate a structured JSON dataset from the markdown files:

```shell
python3 extract_from_markdowns.py
```

This creates `cgbench.json`, containing all examples in a structured format suitable for evaluation.
The generated JSON follows this structure:
```json
{
  "project_name": {
    "callsite_path": {
      "callsite": "function_pointer_name",
      "type": "category",
      "chain_summary": [
        {
          "source_code": ["line1", "line2", ...],
          "parent": ""
        }
      ],
      "callees": {
        "targets": {
          "target_function": ""
        }
      }
    }
  }
}
```

- Total Examples: 104
- Project Categories: 7 major open-source projects
- Function Pointer Patterns: 11 distinct categories
- Code Contexts: Multiple code snippets per example showing usage patterns
- Real-world Complexity: Examples from production codebases
This benchmark is designed for:
- LLM Evaluation: Assessing language models' ability to understand complex code patterns
- Tool Development: Benchmarking static analysis tools for call graph construction
- Research: Studying function pointer resolution in real-world codebases
- Education: Understanding various function pointer usage patterns in C/C++
This benchmark accompanies the research paper: "CG-Bench: Can Language Models Assist Call Graph Construction in the Real World?"
The paper provides detailed methodology, evaluation results, and analysis of LLM performance on call graph construction tasks.
We welcome contributions to expand the benchmark! Please consider:
- Adding examples from additional open-source projects
- Identifying new function pointer usage patterns
- Improving the extraction and annotation process
- Reporting issues or inconsistencies
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this benchmark in your research, please cite:
```bibtex
@inproceedings{cgbench2025,
  title={CG-Bench: Can Language Models Assist Call Graph Construction in the Real World?},
  author={[Authors]},
  booktitle={Proceedings of the 1st ACM SIGPLAN International Workshop on Language Models for Programming (LMPL 2025)},
  year={2025},
  publisher={ACM},
  address={New York, NY, USA},
  url={https://conf.researchr.org/home/icfp-splash-2025/lmpl-2025},
  note={Co-located with SPLASH 2025}
}
```

For questions or collaboration opportunities, please open an issue on GitHub or contact the maintainers.
Keywords: Call Graph Construction, Function Pointers, Static Analysis, Large Language Models, C/C++ Analysis, Software Engineering, Program Analysis