Fine-tune an open-source LLM to write MISRA-compliant C++ code by default. Not prompt engineering — actual weight changes so the model produces compliant code from the get-go.
First run results: 62% violation reduction across 100 test prompts. Four MISRA rules learned perfectly (100% fix rate), one regressed due to training data quality.
MISRA C:2012 is a set of 143 coding rules used in safety-critical systems (automotive, aerospace, medical). Writing MISRA-compliant code by hand is tedious. This project fine-tunes Qwen2.5-Coder-7B-Instruct using QLoRA so it generates compliant code when given a MISRA system prompt.
The entire pipeline (data collection, fix generation, training, and evaluation) costs about $3.50 to reproduce.
| Metric | Base Model | Fine-tuned |
|---|---|---|
| Total violations | 303 | 115 |
| Avg per file | 3.0 | 1.1 |
| Improved / Same / Worse | — | 54 / 20 / 26 |
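The headline 62% figure follows directly from the violation totals above; a quick sanity check:

```python
base_total, tuned_total = 303, 115

# Relative reduction in total violations across the 100 test prompts
reduction_pct = 100 * (base_total - tuned_total) / base_total
print(round(reduction_pct))  # 62
```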
| Rule | Description | Base | Fine-tuned | Change |
|---|---|---|---|---|
| 8.2 | main(void) not main() | 68 | 0 | 100% fix |
| 12.3 | One declaration per line | 14 | 0 | 100% fix |
| 21.3 | No malloc/free | 12 | 0 | 100% fix |
| 15.5 | Single return per function | 152 | 7 | 95% fix |
| 15.6 | Braces on all control structures | 32 | 101 | Regressed |
The braces regression is a training data quality issue — GPT-5-Mini's "fixed" code was inconsistent on braces. See Improvement Path.
- Base model refused to generate code on 27 of 100 prompts (responding "Understood! I'll ensure..."). Fine-tuned model: 0 refusals.
- cppcheck-only eval (limited to C-parseable output): 26 violations -> 0 on 8 comparable files.
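Counting violations boils down to parsing cppcheck's stderr, where the misra addon tags each finding with an id like `misra-c2012-15.5`. A minimal sketch (the helper name and regex are illustrative, not the repo's actual code):

```python
import re

# cppcheck --addon=misra emits lines like:
#   main.c:3:1: style: <message> [misra-c2012-15.5]
MISRA_ID = re.compile(r"misra-c2012-(\d+\.\d+)")

def parse_misra(cppcheck_stderr: str) -> list[str]:
    """Extract MISRA rule IDs from cppcheck's misra-addon output."""
    return MISRA_ID.findall(cppcheck_stderr)

sample = "main.c:3:1: style: All exit paths should be via return [misra-c2012-15.5]"
print(parse_misra(sample))  # ['15.5']
```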
```
HuggingFace dataset (1.35M C++ files)
        | Step 1: filter + cppcheck MISRA
        v
data/raw/ (2,762 files with violations)
        | Step 2: GPT-5-Mini rewrites + cppcheck validates
        v
data/pairs/ (1,025 original <-> fixed pairs)
        | Step 3: GPT-5-Mini describes tasks + format as chat JSONL
        v
data/training/ (922 train + 103 val conversations)
        | Step 4: QLoRA fine-tune on Modal (L40S GPU, 15 min)
        v
Modal Volume (LoRA adapter, ~200MB)
        | Step 5: generate code + cppcheck/pattern eval
        v
eval/results/ (violation comparison)
```
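Step 3's formatting can be sketched as follows: each (description, fixed-code) pair becomes one chat conversation, written as a JSONL line. Helper names are illustrative, not the repo's actual functions:

```python
import json

SYSTEM_PROMPT = ("You are a C++ developer that writes strictly "
                 "MISRA C:2012 compliant code...")

def to_chat_example(task_description: str, fixed_code: str) -> dict:
    """One training conversation per (description, fixed-code) pair."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": task_description},
            {"role": "assistant", "content": fixed_code},
        ]
    }

def write_jsonl(examples, path):
    # One JSON object per line, the format expected by most SFT trainers
    with open(path, "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")
```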
Each training example is a chat conversation:
```json
{
  "messages": [
    {"role": "system", "content": "You are a C++ developer that writes strictly MISRA C:2012 compliant code..."},
    {"role": "user", "content": "Sorts an array of integers using bubble sort"},
    {"role": "assistant", "content": "#include <cstdio>\n\nint main(void)\n{\n ..."}
  ]
}
```

| Component | Choice | Why |
|---|---|---|
| Base model | Qwen2.5-Coder-7B-Instruct | Best open-source code model at 7B size |
| Fine-tuning | QLoRA (rank 64, alpha 128, 4-bit) | Fits on single GPU, ~2% of params trained |
| Training infra | Modal + Unsloth | $0.50/run, 2x faster than vanilla, no setup |
| MISRA checker | cppcheck 2.19 + misra.py addon | Only free MISRA tool available |
| Fix generation | GPT-5-Mini | 12x cheaper than Claude, comparable quality |
| Data source | nguyentruong-ins/codeforces_cpp_cleaned | 1.35M C++ programs, streamable |
- Python 3.11+
- cppcheck 2.19+ (`brew install cppcheck` or `apt install cppcheck`)
- OpenAI API key (for steps 2-3)
- Modal account (free tier, $30/month credits — for steps 4-5)
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# Modal authentication (opens browser)
python -m modal setup
# Set API keys
export OPENAI_API_KEY="your-key"

# Step 1: Collect C++ files with MISRA violations (~16 min)
python 01_collect_violations.py
# Step 2: Generate MISRA-compliant fixes via LLM (~1 hour)
python 02_generate_fixes.py
# Step 3: Format into training JSONL (~6 min)
python 03_format_dataset.py
# Step 4: Fine-tune on Modal (~15 min, ~$0.50)
modal run 04_train.py
# Step 5: Evaluate base vs fine-tuned (~15 min, ~$0.50)
modal run 05_evaluate.py

All steps are resumable. If interrupted, re-run the same command.
Everything is controlled by config.yaml:
training:
base_model: "Qwen/Qwen2.5-Coder-7B-Instruct"
lora_rank: 64
lora_alpha: 128
learning_rate: 0.0002
epochs: 3 # 2 is better for <1000 examples
batch_size: 4
max_seq_length: 2048

cppcheck's MISRA addon (misra.py) only fires with --language=c. In C++ mode, it produces zero MISRA output. This means:
- Code with `std::vector`, `std::cin`, `#include <iostream>` etc. cannot be MISRA-checked
- The fine-tuned model outputs C++-style code (learned from training data), so most of its output can't be verified by cppcheck
- Pattern-based evaluation (regex checks for braces, declarations, main(void), etc.) works on all code
- No free MISRA C++ tooling exists. Commercial tools cost $3,000-$10,000+/year.
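The pattern-based checks can be sketched with simple regexes. These are illustrative approximations of two of the rules above, far looser than real conformance checking, which needs a parser:

```python
import re

# Rough regex stand-ins for a couple of MISRA C:2012 rules
PATTERNS = {
    "8.2":  re.compile(r"\bint\s+main\s*\(\s*\)"),                 # main() instead of main(void)
    "21.3": re.compile(r"\b(?:malloc|calloc|realloc|free)\s*\("),  # dynamic memory allocation
}

def pattern_violations(code: str) -> list[str]:
    """Return the rule IDs whose pattern matches the given source."""
    return [rule for rule, pat in PATTERNS.items() if pat.search(code)]

print(pattern_violations("int main() { return 0; }"))      # ['8.2']
print(pattern_violations("int main(void) { return 0; }"))  # []
```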
- Best checkpoint is epoch 1.0, not the final model. With 922 examples, overfitting starts at epoch 1.5. Eval loss: 0.415 (epoch 1) vs 0.473 (epoch 3).
- Trainable parameters: 161M of 7.7B (2.08%). LoRA rank 64 with alpha 128.
- 15 minutes on a single L40S GPU. Unsloth makes QLoRA 2x faster.
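The trainable-parameter count is a property of LoRA's low-rank factorization: each adapted weight matrix W (shape d_out x d_in) gains two small matrices, A (rank x d_in) and B (d_out x rank), and only those are trained. A sketch of the arithmetic (the 4096x4096 shape is illustrative, not Qwen2.5's actual layer config):

```python
def lora_param_count(d_in: int, d_out: int, rank: int = 64) -> int:
    # LoRA trains A: (rank, d_in) and B: (d_out, rank) per adapted weight
    return rank * (d_in + d_out)

# e.g. one 4096x4096 projection at rank 64:
print(lora_param_count(4096, 4096))  # 524288
```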
- Fix braces in training data — filter or re-generate pairs that violate Rule 15.6
- More data — resume step 2 to process remaining ~1,700 files (currently 1,025/2,762)
- C-style training data — modify step 2 prompt to use printf/scanf instead of cin/cout so cppcheck can verify the output
- Train with 2 epochs — sweet spot for <1,000 examples. With 2,000+ examples, 3 epochs may work
- Better evaluation — pattern checker covers more rules, but a real MISRA C++ tool would be definitive
| Step | Service | Cost |
|---|---|---|
| Step 1 | Free (HuggingFace streaming + local cppcheck) | $0 |
| Step 2 | OpenAI GPT-5-Mini (1,025 fixes) | ~$2.00 |
| Step 3 | OpenAI GPT-5-Mini (1,025 descriptions) | ~$0.50 |
| Step 4 | Modal L40S GPU (15 min) | ~$0.50 |
| Step 5 | Modal L40S GPU (15 min) | ~$0.50 |
| Total | | ~$3.50 |
MIT