misraCoder

Fine-tune an open-source LLM to write MISRA-compliant C++ code by default. Not prompt engineering — actual weight changes so the model produces compliant code from the get-go.

First run results: 62% violation reduction across 100 test prompts. Four MISRA rules learned perfectly (100% fix rate), one regressed due to training data quality.

What is this?

MISRA C is a set of 143 coding rules used in safety-critical systems (automotive, aerospace, medical). Writing MISRA-compliant code manually is tedious. This project fine-tunes Qwen2.5-Coder-7B-Instruct using QLoRA so it generates compliant code when given a MISRA system prompt.

The entire pipeline — data collection, fix generation, training, and evaluation — costs about $3.50 total to reproduce.

Results

Pattern-based evaluation (100 prompts, all code checked)

| | Base model | Fine-tuned |
| --- | --- | --- |
| Total violations | 303 | 115 |
| Avg. violations per file | 3.0 | 1.1 |

Per-file comparison (fine-tuned vs. base): 54 files improved, 20 unchanged, 26 worse.

Per-rule breakdown

| Rule | Description | Base | Fine-tuned | Change |
| --- | --- | --- | --- | --- |
| 8.2 | `main(void)`, not `main()` | 68 | 0 | 100% fix |
| 12.3 | One declaration per line | 14 | 0 | 100% fix |
| 21.3 | No `malloc`/`free` | 12 | 0 | 100% fix |
| 15.5 | Single return per function | 152 | 7 | 95% fix |
| 15.6 | Braces on all control structures | 32 | 101 | Regressed |

The braces regression is a training data quality issue — GPT-5-Mini's "fixed" code was inconsistent on braces. See Improvement Path.
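To make the two biggest rules concrete, here is an illustrative before/after sketch (written for this README, not taken from the training pairs) of what Rules 15.5 and 15.6 demand:

```cpp
#include <cassert>
#include <cstdint>

/* Non-compliant shape (shown as a comment):
 *
 *   static std::int32_t clamp_bad(std::int32_t v, std::int32_t hi)
 *   {
 *       if (v > hi) return hi;   // unbraced if (15.6), early return (15.5)
 *       return v;                // second exit point (15.5)
 *   }
 */

/* Compliant shape: every control structure braced (Rule 15.6),
 * exactly one return at the end of the function (Rule 15.5). */
static std::int32_t clamp_misra(std::int32_t v, std::int32_t hi)
{
    std::int32_t result = v;
    if (v > hi)
    {
        result = hi;
    }
    return result;
}
```

Collapsing early returns into a single exit point is mechanical, which is likely why Rule 15.5 trained so cleanly; braces depend on the style of every surrounding block, which is where the training data was inconsistent.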

Additional findings

  • Base model refused to generate code 27/100 times ("Understood! I'll ensure..."). Fine-tuned model: 0 refusals.
  • cppcheck-only eval (limited to C-parseable output): 26 violations -> 0 on 8 comparable files.

How it works

HuggingFace dataset (1.35M C++ files)
    | Step 1: filter + cppcheck MISRA
data/raw/ (2,762 files with violations)
    | Step 2: GPT-5-Mini rewrites + cppcheck validates
data/pairs/ (1,025 original <-> fixed pairs)
    | Step 3: GPT-5-Mini describes tasks + format as chat JSONL
data/training/ (922 train + 103 val conversations)
    | Step 4: QLoRA fine-tune on Modal (L40S GPU, 15 min)
Modal Volume (LoRA adapter, ~200MB)
    | Step 5: generate code + cppcheck/pattern eval
eval/results/ (violation comparison)

Each training example is a chat conversation:

{
  "messages": [
    {"role": "system", "content": "You are a C++ developer that writes strictly MISRA C:2012 compliant code..."},
    {"role": "user", "content": "Sorts an array of integers using bubble sort"},
    {"role": "assistant", "content": "#include <cstdio>\n\nint main(void)\n{\n    ..."}
  ]
}

Stack

| Component | Choice | Why |
| --- | --- | --- |
| Base model | Qwen2.5-Coder-7B-Instruct | Best open-source code model at the 7B size |
| Fine-tuning | QLoRA (rank 64, alpha 128, 4-bit) | Fits on a single GPU; ~2% of params trained |
| Training infra | Modal + Unsloth | ~$0.50/run, 2x faster than vanilla, no setup |
| MISRA checker | cppcheck 2.19 + misra.py addon | Only free MISRA tool available |
| Fix generation | GPT-5-Mini | 12x cheaper than Claude, comparable quality |
| Data source | nguyentruong-ins/codeforces_cpp_cleaned | 1.35M C++ programs, streamable |

Reproduce

Prerequisites

  • Python 3.11+
  • cppcheck 2.19+ (brew install cppcheck or apt install cppcheck)
  • OpenAI API key (for steps 2-3)
  • Modal account (free tier, $30/month credits — for steps 4-5)

Setup

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Modal authentication (opens browser)
python -m modal setup

# Set API keys
export OPENAI_API_KEY="your-key"

Run the pipeline

# Step 1: Collect C++ files with MISRA violations (~16 min)
python 01_collect_violations.py

# Step 2: Generate MISRA-compliant fixes via LLM (~1 hour)
python 02_generate_fixes.py

# Step 3: Format into training JSONL (~6 min)
python 03_format_dataset.py

# Step 4: Fine-tune on Modal (~15 min, ~$0.50)
modal run 04_train.py

# Step 5: Evaluate base vs fine-tuned (~15 min, ~$0.50)
modal run 05_evaluate.py

All steps are resumable. If interrupted, re-run the same command.

Configuration

Everything is controlled by config.yaml:

training:
  base_model: "Qwen/Qwen2.5-Coder-7B-Instruct"
  lora_rank: 64
  lora_alpha: 128
  learning_rate: 0.0002
  epochs: 3          # 2 is better for <1000 examples
  batch_size: 4
  max_seq_length: 2048

Key constraint: cppcheck only works in C mode

cppcheck's MISRA addon (misra.py) only fires with --language=c. In C++ mode, it produces zero MISRA output. This means:

  • Code with std::vector, std::cin, #include <iostream> etc. cannot be MISRA-checked
  • The fine-tuned model outputs C++-style code (learned from training data), so most of its output can't be verified by cppcheck
  • Pattern-based evaluation (regex checks for braces, declarations, main(void), etc.) works on all code
  • No free MISRA C++ tooling exists. Commercial tools cost $3,000-$10,000+/year.
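As an illustration of the gap (a sketch written for this README, not code from the repo), here is the same computation in the two styles. The first compiles as C, so it can go through `cppcheck --language=c --addon=misra`; the second cannot be checked at all:

```cpp
#include <array>
#include <cassert>
#include <numeric>

/* C-parseable style: no std:: facilities, so a file like this can be
 * fed to cppcheck's MISRA addon in C mode. */
static int sum3_c_style(const int values[3])
{
    int total = 0;
    for (int i = 0; i < 3; ++i)
    {
        total += values[i];
    }
    return total;
}

/* C++-only style: identical behavior, but cppcheck's MISRA addon
 * emits nothing for a file that needs C++ headers like <array>. */
static int sum3_cpp_style(const std::array<int, 3>& values)
{
    return std::accumulate(values.begin(), values.end(), 0);
}
```

This is why the improvement path below suggests steering the fix generator toward printf/scanf-style code: the model's output would then stay inside the subset cppcheck can verify.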

Training observations

  • Best checkpoint is epoch 1.0, not the final model. With 922 examples, overfitting starts at epoch 1.5. Eval loss: 0.415 (epoch 1) vs 0.473 (epoch 3).
  • Trainable parameters: 161M of 7.7B (2.08%). LoRA rank 64 with alpha 128.
  • 15 minutes on a single L40S GPU. Unsloth makes QLoRA 2x faster.

Improvement path

  1. Fix braces in training data — filter or re-generate pairs that violate Rule 15.6
  2. More data — resume step 2 to process remaining ~1,700 files (currently 1,025/2,762)
  3. C-style training data — modify step 2 prompt to use printf/scanf instead of cin/cout so cppcheck can verify the output
  4. Train with 2 epochs — sweet spot for <1,000 examples. With 2,000+ examples, 3 epochs may work
  5. Better evaluation — pattern checker covers more rules, but a real MISRA C++ tool would be definitive

Cost breakdown

| Step | Service | Cost |
| --- | --- | --- |
| Step 1 | Free (HuggingFace streaming + local cppcheck) | $0 |
| Step 2 | OpenAI GPT-5-Mini (1,025 fixes) | ~$2.00 |
| Step 3 | OpenAI GPT-5-Mini (1,025 descriptions) | ~$0.50 |
| Step 4 | Modal L40S GPU (15 min) | ~$0.50 |
| Step 5 | Modal L40S GPU (15 min) | ~$0.50 |
| **Total** | | **~$3.50** |

License

MIT
