This repository contains the source code used in our experiments on knowledge distillation (KD) applied to different pre-trained code model architectures across multiple downstream tasks. The project is organized hierarchically by model architecture and task type, with detailed scripts and documentation for both fine-tuning and distillation.
- `KD_CodeBERT/` – Knowledge distillation experiments based on the CodeBERT model.
- `KD_CodeT5/` – Knowledge distillation experiments based on the CodeT5 model.
- `KD_Qwen2.5Coder/` – Knowledge distillation experiments based on the Qwen2.5-Coder model.
Each directory corresponds to a complete set of distillation experiments for one specific pre-trained model.
Inside each model directory, experiments are organized by downstream task:
- `0_vulne/` – Experiments on vulnerability detection.
- `1_clone/` – Experiments on code clone detection.
- `2_code2nl/` – Experiments on code-to-natural-language generation (documentation generation).
This ensures clear separation between different evaluation tasks.
Within every task directory (e.g., KD_CodeT5/1_clone/), you will find:
- Fine-tuning scripts: Train the original pre-trained model directly on the downstream task.
- Distillation scripts: Implement the teacher–student knowledge distillation process.
- Instruction file (`README.md` or `run_instructions.md`): task-specific documentation describing:
  - Required datasets and preprocessing steps.
  - How to run fine-tuning.
  - How to run knowledge distillation.
  - Expected output formats and evaluation results.
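The distillation scripts in each task directory define their own teacher–student objectives. As a minimal sketch of the classic formulation (soft-target KL divergence against the teacher plus hard-label cross-entropy), assuming PyTorch and illustrative values for the temperature `T` and mixing weight `alpha` that are not necessarily the repository's settings:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Classic KD loss: weighted sum of soft-target KL and hard-label CE.

    The T*T factor rescales the soft-loss gradients, following the
    standard temperature-scaled distillation formulation.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

The scripts for generation tasks (e.g., `2_code2nl/`) apply the loss token-by-token over the output sequence rather than once per example.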
The best-performing distilled model files are available upon request. If you would like to obtain these models for research or reproduction purposes, please contact us via email at wuruifeng@stu.cqu.edu.cn; we are happy to share them.
- Python 3.8+ is recommended.
The experiments depend on several core libraries for deep learning, model distillation, and hyperparameter optimization.
You can install them with the following commands:
```shell
# Core deep learning framework
pip install torch==1.4.0

# Hugging Face Transformers
pip install transformers==2.5.0

# Utilities
pip install filelock pandas numpy scikit-learn hyperopt
```
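To confirm the environment is set up, a small version-check helper can be handy. This is a sketch for convenience, not part of the repository's scripts; note that scikit-learn imports as `sklearn`:

```python
import importlib

PACKAGES = ["torch", "transformers", "filelock", "pandas", "numpy", "sklearn", "hyperopt"]

def check_versions(packages):
    """Return {package: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            mod = importlib.import_module(name)
            versions[name] = getattr(mod, "__version__", "unknown")
        except ImportError:
            versions[name] = None  # package missing from the environment
    return versions

if __name__ == "__main__":
    for pkg, ver in check_versions(PACKAGES).items():
        print(f"{pkg}: {ver if ver else 'NOT INSTALLED'}")
```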