Benchmarks
Shuai Yuan edited this page May 28, 2024 · 7 revisions
- NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts
- DebugBench: Evaluating Debugging Capability of Large Language Models
- VulBench: How Far Have We Gone in Vulnerability Detection Using Large Language Models
- InstructCoder: Empowering Language Models for Code Editing
- EvalGPTFix: A Critical Review of Large Language Model on Software Engineering: An Example from ChatGPT and Automated Program Repair
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues
- CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation
- CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models
- VerilogEval: Evaluating Large Language Models for Verilog Code Generation
- G-TransEval: On the Evaluation of Neural Code Translation: Taxonomy and Benchmark
- HumanEvalPack: OctoPack: Instruction Tuning Code Large Language Models
- LogHub: A Large-scale Benchmark for Log Parsing Techniques: How Far Are We
- BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge
- ExGroFi: Delving into Commit-Issue Correlation to Enhance Commit Message Generation Models
- CoRec: Context-aware Retrieval-based Deep Commit Message Generation
- COJ2022: ErrorCLR: Semantic Error Classification, Localization and Repair for Introductory Programming Assignments
- VulnPatchPairs: Limits of Machine Learning for Automatic Vulnerability Detection
- DotPrompts: Guiding Language Models of Code with Global Context using Monitors
- LongCoder: A Long-Range Pre-trained Language Model for Code Completion
- StudentEval: A Benchmark of Student-Written Prompts for Large Language Models of Code
- EvalPlus: Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
- DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection
- RunBugRun: An Executable Dataset for Automated Program Repair
- HumanEval-X: CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X
- xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval
- ODEX: Execution-Based Evaluation for Open-Domain Code Generation
- ARCADE: Natural Language to Code Generation in Interactive Data Science Notebooks
- DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation
- ExeDS: Execution-based Evaluation for Data Science Code Generation Models
- TorchDataEval: When Language Model Meets Private Library
- MBXP, Multilingual HumanEval, MathQA-X: Multi-lingual Evaluation of Code Generation Models
- MultiPL-E: A Scalable and Polyglot Approach to Benchmarking Neural Code Generation
- AixBench: A Code Generation Benchmark Dataset
- PandasEval, NumpyEval: CERT: Continual Pre-Training on Sketches for Library-Oriented Code Generation
- BIG-Bench: Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models
- MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages
- CoST: Multilingual Code Snippets Training for Program Translation
- MTPB: CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis
- CodeContests: Competition-Level Code Generation with AlphaCode
- DSP: Training and Evaluating a Jupyter Notebook Data Science Assistant
- MBPP: Program Synthesis with Large Language Models
- PlotCoder: Hierarchical Decoding for Synthesizing Visualization Code in Programmatic Context
- CrossVul: a cross-language vulnerability dataset with commit data
- HumanEval: Evaluating Large Language Models Trained on Code
- KaggleDBQA: Realistic Evaluation of Text-to-SQL Parsers
- CommitBERT: Commit Message Generation Using Pre-Trained Programming Language Model
- CoSQA: 20,000+ Web Queries for Code Search and Question Answering
- APPS: Measuring Coding Challenge Competence With APPS
- CodeTrans: Towards Cracking the Language of Silicon's Code Through Self-Supervised Deep Learning and High Performance Computing
- CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
- squall: On the Potential of Lexico-logical Alignments for Semantic Parsing to SQL Queries
- FB-Java: Deep Graph Matching and Searching for Semantic Code Retrieval
- SO-DS: Neural Code Search Revisited: Enhancing Code Snippet Retrieval through Natural Language Intent
- GREAT: Global Relational Models of Source Code
- ATOM: Commit Message Generation Based on Abstract Syntax Tree and Hybrid Ranking
- CLCDSA: Cross Language Code Clone Detection using Syntactical Features and API Documentation
- JuICe: A Large Scale Distantly Supervised Dataset for Open Domain Context-based Code
- CodeSearchNet Challenge: Evaluating the State of Semantic Code Search
- Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks
- CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases
- CoDiSum: Commit Message Generation for Source Code Changes
- SParC: Cross-Domain Semantic Parsing in Context
- PtrGNCMsg: Generating Commit Messages from Diffs Using Pointer-Generator Network
- Bugs2Fix: An Empirical Study on Learning Bug-Fixing Patches in the Wild via Neural Machine Translation
- Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task
- SySeVR: A Framework for Using Deep Learning to Detect Software Vulnerabilities
- CONCODE: Mapping Language to Code in Programmatic Context
- TL-CodeSum: Summarizing Source Code with Transferred API Knowledge
- Draper VDISC: Automated Vulnerability Detection in Source Code Using Deep Representation Learning
- NL2Bash: A Corpus and Semantic Parser for Natural Language Interface to the Linux Operating System
- DeepCom: Deep Code Comment Generation
- CGD: VulDeePecker: A Deep Learning-Based System for Vulnerability Detection
- QuixBugs: A Multi-lingual Program Repair Benchmark Set Based on the Quixey Challenge
- WikiSQL: Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning
- CommitGen: A Neural Architecture for Generating Natural Language Descriptions from Source Code Changes
- DeepFix: Fixing Common C Language Errors by Deep Learning
- PY150: Probabilistic Model for Code with Decision Trees
- CODE-NN: Summarizing Source Code using a Neural Attention Model
- BigCloneBench: Evaluating Clone Detection Tools with BigCloneBench
- WikiTQ: Compositional Semantic Parsing on Semi-Structured Tables
- POJ-104: Convolutional Neural Networks over Tree Structures for Programming Language Processing
- Defects4J: A Database of Existing Faults to Enable Controlled Testing Studies for Java Programs
- GitHub Java Corpus: Mining Source Code Repositories at Massive Scale using Language Modeling
- ATIS: Expanding the Scope of the ATIS Task: The ATIS-3 Corpus
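Many of the code-generation benchmarks above (HumanEval, MBPP, APPS, MultiPL-E, and others) report results with the pass@k metric. A minimal sketch of the unbiased pass@k estimator described in the HumanEval paper ("Evaluating Large Language Models Trained on Code"), assuming `n` samples per problem of which `c` pass the unit tests:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples
    drawn without replacement from n generations is correct,
    given that c of the n generations pass the unit tests."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so every size-k
        # draw must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Averaging `pass_at_k` over all problems in a benchmark yields its reported score; for example, `pass_at_k(10, 3, 1)` is about 0.3, the plain fraction of correct single samples.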