📄 Paper: CG-Bench: Can Language Models Assist Call Graph Construction in the Real World? - Published at LMPL@SPLASH2025
CG-Bench is a comprehensive benchmark dataset designed to evaluate the capabilities of Large Language Models (LLMs) in assisting with call graph construction in real-world C/C++ codebases. The benchmark focuses specifically on challenging indirect function calls through function pointers, which are notoriously difficult for traditional static analysis tools to resolve.
Call graph construction is a fundamental program analysis technique crucial for various software engineering tasks including debugging, optimization, security analysis, and program comprehension. While direct function calls are straightforward to analyze, indirect calls through function pointers present significant challenges for automated tools.
This benchmark provides:
- Real-world complexity: Examples extracted from popular open-source projects
- Diverse patterns: Multiple categories of function pointer usage patterns
- Structured format: Consistent annotation format for evaluation
- Comprehensive coverage: 104 examples across 7 major open-source projects
The benchmark includes examples from seven major C/C++ projects:
| Project | Domain | Lines of Code (KLoC) | Description |
|---|---|---|---|
| openssh | Security | ~120 | Secure Shell (SSH) protocol implementation |
| curl | Networking | ~182 | Command line tool and library for transferring data |
| redis | Database | ~200 | In-memory data structure store |
| zfs | File System | ~380 | ZFS file system implementation |
| wrk | Benchmarking | ~601 | HTTP benchmarking tool |
| ffmpeg | Multimedia | ~1,257 | Complete multimedia framework |
| gcc | Compiler | ~6,200 | GNU Compiler Collection |
| Total | All Domains | ~8,940 | 7 major open-source projects |
The benchmark categorizes function pointer usage patterns into 11 distinct types:
- `fnptr-callback` (15 examples): Function pointers used as callbacks
- `fnptr-cast` (7 examples): Function pointers with type casting
- `fnptr-dynamic-call` (5 examples): Dynamically resolved function calls
- `fnptr-global-array` (6 examples): Function pointers in global arrays
- `fnptr-global-struct-array` (12 examples): Function pointers in arrays within global structures
- `fnptr-global-struct` (11 examples): Function pointers in global structures
- `fnptr-library` (20 examples): Function pointers in library interfaces
- `fnptr-only` (12 examples): Basic function pointer calls without complex patterns
- `fnptr-struct` (14 examples): Function pointers stored in structures
- `fnptr-varargs` (1 example): Function pointers with variable arguments
- `fnptr-virtual` (1 example): Virtual function-like patterns in C
```
CG-Bench/
├── README.md                 # This file
├── LICENSE                   # MIT License
├── projects.md               # Detailed project statistics
├── extract_from_markdowns.py # Data extraction script
├── fnptr-*.md                # Category-specific examples
└── CG_Bench_Can_Language_Models_Assist_Call_Graph_Construction_in_the_Real_World.pdf
```
Each example follows a consistent structure:
```markdown
# Example N

## Callsite
*Full path and location of the function pointer call*

fnptr: *function_pointer_name*
targets: target_function1, target_function2, ...

## Related Code Snippets
```

```c
// Relevant code context showing the function pointer usage
```

To generate a structured JSON dataset from the markdown files:

```shell
python3 extract_from_markdowns.py
```

This creates `cgbench.json`, containing all examples in a structured format suitable for evaluation.
The generated JSON follows this structure:
```json
{
  "project_name": {
    "callsite_path": {
      "callsite": "function_pointer_name",
      "type": "category",
      "chain_summary": [
        {
          "source_code": ["line1", "line2", ...],
          "parent": ""
        }
      ],
      "callees": {
        "targets": {
          "target_function": ""
        }
      }
    }
  }
}
```

- Total Examples: 104
- Project Categories: 7 major open-source projects
- Function Pointer Patterns: 11 distinct categories
- Code Contexts: Multiple code snippets per example showing usage patterns
- Real-world Complexity: Examples from production codebases
This benchmark is designed for:
- LLM Evaluation: Assessing language models' ability to understand complex code patterns
- Tool Development: Benchmarking static analysis tools for call graph construction
- Research: Studying function pointer resolution in real-world codebases
- Education: Understanding various function pointer usage patterns in C/C++
This benchmark accompanies the research paper: "CG-Bench: Can Language Models Assist Call Graph Construction in the Real World?"
The paper provides detailed methodology, evaluation results, and analysis of LLM performance on call graph construction tasks.
We welcome contributions to expand the benchmark! Please consider:
- Adding examples from additional open-source projects
- Identifying new function pointer usage patterns
- Improving the extraction and annotation process
- Reporting issues or inconsistencies
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this benchmark in your research, please cite:
```bibtex
@inproceedings{cgbench2025,
  title={CG-Bench: Can Language Models Assist Call Graph Construction in the Real World?},
  author={[Authors]},
  booktitle={Proceedings of the 1st ACM SIGPLAN International Workshop on Language Models for Programming (LMPL 2025)},
  year={2025},
  publisher={ACM},
  address={New York, NY, USA},
  url={https://conf.researchr.org/home/icfp-splash-2025/lmpl-2025},
  note={Co-located with SPLASH 2025}
}
```

For questions or collaboration opportunities, please open an issue on GitHub or contact the maintainers.
Keywords: Call Graph Construction, Function Pointers, Static Analysis, Large Language Models, C/C++ Analysis, Software Engineering, Program Analysis