Exploring and Improving Knowledge Distillation for Pre-trained Code Models (official repository)

Waylandite/KDcode2public


Knowledge Distillation for Pre-trained Code Models

This repository contains the source code used in our experiments on knowledge distillation (KD) applied to different pre-trained code model architectures across multiple downstream tasks. The project is organized hierarchically by model architecture and task type, with detailed scripts and documentation for both fine-tuning and distillation.


Project Structure

1. Model-Level Directories

  • KD_CodeBERT/ – Knowledge distillation experiments based on the CodeBERT model.
  • KD_CodeT5/ – Knowledge distillation experiments based on the CodeT5 model.
  • KD_Qwen2.5Coder/ – Knowledge distillation experiments based on the Qwen2.5-Coder model.

Each directory corresponds to a complete set of distillation experiments for one specific pre-trained model.


2. Task-Level Subdirectories

Inside each model directory, experiments are organized by downstream task:

  • 0_vulne/ – Experiments on vulnerability detection.
  • 1_clone/ – Experiments on code clone detection.
  • 2_code2nl/ – Experiments on code-to-natural language generation (documentation generation).

This ensures clear separation between different evaluation tasks.
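Putting the two levels together, the layout looks like this (shown in full for KD_CodeBERT/; the other model directories follow the same pattern):

```
KD_CodeBERT/
├── 0_vulne/      # vulnerability detection
├── 1_clone/      # code clone detection
└── 2_code2nl/    # code-to-natural-language generation
KD_CodeT5/
└── ...           # same three task subdirectories
KD_Qwen2.5Coder/
└── ...           # same three task subdirectories
```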


3. Contents of Each Task Directory

Within every task directory (e.g., KD_CodeT5/1_clone/), you will find:

  • Fine-tuning scripts: Train the original pre-trained model directly on the downstream task.
  • Distillation scripts: Implement the teacher–student knowledge distillation process.
  • Instruction file (README.md or run_instructions.md):
    A task-specific documentation file describing:
    • Required datasets and preprocessing steps.
    • How to run fine-tuning.
    • How to run knowledge distillation.
    • Expected output formats and evaluation results.
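For the classification tasks (vulnerability and clone detection), the teacher–student objective typically combines a softened KL term against the teacher's logits with a hard cross-entropy term against the ground-truth labels. The following is a minimal sketch of that standard formulation, not the repository's exact code; the function name, temperature, and alpha weighting are illustrative:

```python
# Minimal teacher–student distillation loss (illustrative sketch).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of soft-label KL loss (teacher) and hard-label CE loss."""
    # Soften both distributions with the temperature, then match them with KL.
    # kl_div expects log-probabilities as input and probabilities as target.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)  # rescale gradients, as in Hinton et al.
    # Standard cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

The actual distillation scripts in each task directory wire a loss of this shape into the training loop described in the task's instruction file.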

4. Availability of the Best Distilled Models

The best-performing distilled model checkpoints are available upon request. If you would like to obtain them for research or reproduction purposes, please contact us via email at wuruifeng@stu.cqu.edu.cn; we are happy to share them.

Environment Setup

1. Python Version

  • Python 3.8+ is recommended.

2. Required Libraries

The experiments depend on several core libraries for deep learning, model distillation, and hyperparameter optimization.

You can install them with the following commands:

```bash
# Core deep learning framework
pip install torch==1.4.0

# Hugging Face Transformers
pip install transformers==2.5.0

# Utilities
pip install filelock
pip install pandas
pip install numpy
pip install scikit-learn
pip install hyperopt
```
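After installation, a quick standard-library check (a convenience sketch, not part of the repository) can confirm which of these dependencies are present before running the scripts:

```python
# Report which of the required libraries are installed (Python 3.8+).
from importlib.metadata import version, PackageNotFoundError

REQUIRED = ["torch", "transformers", "filelock", "pandas",
            "numpy", "scikit-learn", "hyperopt"]

def installed_version(pkg):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None

if __name__ == "__main__":
    for pkg in REQUIRED:
        v = installed_version(pkg)
        print(f"{pkg}: {v if v else 'NOT INSTALLED'}")
```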

