Fine-tune an open-source LLM to write MISRA-compliant C++ code by default. Not prompt engineering — actual weight changes so the model produces compliant code from the get-go.
First run results: 62% violation reduction across 100 test prompts. Four MISRA rules learned perfectly (100% fix rate), one regressed due to training data quality.
MISRA C:2012 is a set of 143 coding rules used in safety-critical systems (automotive, aerospace, medical). Writing MISRA-compliant code by hand is tedious. This project fine-tunes Qwen2.5-Coder-7B-Instruct using QLoRA so it generates compliant code when given a MISRA system prompt.
The entire pipeline (data collection, fix generation, training, and evaluation) costs about $3.50 to reproduce.
| Metric | Base Model | Fine-tuned |
|---|---|---|
| Total violations | 303 | 115 |
| Avg per file | 3.0 | 1.1 |
| Improved / Same / Worse | — | 54 / 20 / 26 |
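The headline 62% figure follows directly from the violation totals above; a quick sanity check:

```python
base_total, tuned_total = 303, 115

# Relative reduction in total violations across the 100 test prompts
reduction_pct = 100 * (base_total - tuned_total) / base_total
print(round(reduction_pct))  # 62
```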
| Rule | Description | Base | Fine-tuned | Change |
|---|---|---|---|---|
| 8.2 | main(void) not main() | 68 | 0 | 100% fix |
| 12.3 | One declaration per line | 14 | 0 | 100% fix |
| 21.3 | No malloc/free | 12 | 0 | 100% fix |
| 15.5 | Single return per function | 152 | 7 | 95% fix |
| 15.6 | Braces on all control structures | 32 | 101 | Regressed |
The braces regression is a training data quality issue — GPT-5-Mini's "fixed" code was inconsistent on braces. See Improvement Path.
- Base model refused to generate code on 27 of 100 prompts (responding "Understood! I'll ensure..."). Fine-tuned model: 0 refusals.
- cppcheck-only eval (limited to C-parseable output): 26 violations -> 0 on 8 comparable files.
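Counting violations boils down to parsing cppcheck's stderr, where the misra addon tags each finding with an id like `misra-c2012-15.5`. A minimal sketch (the helper name and regex are illustrative, not the repo's actual code):

```python
import re

# cppcheck --addon=misra emits lines like:
#   main.c:3:1: style: <message> [misra-c2012-15.5]
MISRA_ID = re.compile(r"misra-c2012-(\d+\.\d+)")

def parse_misra(cppcheck_stderr: str) -> list[str]:
    """Extract MISRA rule IDs from cppcheck's misra-addon output."""
    return MISRA_ID.findall(cppcheck_stderr)

sample = "main.c:3:1: style: All exit paths should be via return [misra-c2012-15.5]"
print(parse_misra(sample))  # ['15.5']
```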
```
HuggingFace dataset (1.35M C++ files)
        | Step 1: filter + cppcheck MISRA
        v
data/raw/ (2,762 files with violations)
        | Step 2: GPT-5-Mini rewrites + cppcheck validates
        v
data/pairs/ (1,025 original <-> fixed pairs)
        | Step 3: GPT-5-Mini describes tasks + format as chat JSONL
        v
data/training/ (922 train + 103 val conversations)
        | Step 4: QLoRA fine-tune on Modal (L40S GPU, 15 min)
        v
Modal Volume (LoRA adapter, ~200MB)
        | Step 5: generate code + cppcheck/pattern eval
        v
eval/results/ (violation comparison)
```
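Step 3's formatting can be sketched as follows: each (description, fixed-code) pair becomes one chat conversation, written as a JSONL line. Helper names are illustrative, not the repo's actual functions:

```python
import json

SYSTEM_PROMPT = ("You are a C++ developer that writes strictly "
                 "MISRA C:2012 compliant code...")

def to_chat_example(task_description: str, fixed_code: str) -> dict:
    """One training conversation per (description, fixed-code) pair."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": task_description},
            {"role": "assistant", "content": fixed_code},
        ]
    }

def write_jsonl(examples, path):
    # One JSON object per line, the format expected by most SFT trainers
    with open(path, "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")
```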
Each training example is a chat conversation:
```json
{
  "messages": [
    {"role": "system", "content": "You are a C++ developer that writes strictly MISRA C:2012 compliant code..."},
    {"role": "user", "content": "Sorts an array of integers using bubble sort"},
    {"role": "assistant", "content": "#include <cstdio>\n\nint main(void)\n{\n ..."}
  ]
}
```

| Component | Choice | Why |
|---|---|---|
| Base model | Qwen2.5-Coder-7B-Instruct | Best open-source code model at 7B size |
| Fine-tuning | QLoRA (rank 64, alpha 128, 4-bit) | Fits on single GPU, ~2% of params trained |
| Training infra | Modal + Unsloth | $0.50/run, 2x faster than vanilla, no setup |
| MISRA checker | cppcheck 2.19 + misra.py addon | Only free MISRA tool available |
| Fix generation | GPT-5-Mini | 12x cheaper than Claude, comparable quality |
| Data source | nguyentruong-ins/codeforces_cpp_cleaned | 1.35M C++ programs, streamable |
- Python 3.11+
- cppcheck 2.19+ (`brew install cppcheck` or `apt install cppcheck`)
- OpenAI API key (for steps 2-3)
- Modal account (free tier, $30/month credits — for steps 4-5)
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# Modal authentication (opens browser)
python -m modal setup
# Set API keys
export OPENAI_API_KEY="your-key"

# Step 1: Collect C++ files with MISRA violations (~16 min)
python 01_collect_violations.py
# Step 2: Generate MISRA-compliant fixes via LLM (~1 hour)
python 02_generate_fixes.py
# Step 3: Format into training JSONL (~6 min)
python 03_format_dataset.py
# Step 4: Fine-tune on Modal (~15 min, ~$0.50)
modal run 04_train.py
# Step 5: Evaluate base vs fine-tuned (~15 min, ~$0.50)
modal run 05_evaluate.py

All steps are resumable. If interrupted, re-run the same command.
Everything is controlled by config.yaml:
training:
base_model: "Qwen/Qwen2.5-Coder-7B-Instruct"
lora_rank: 64
lora_alpha: 128
learning_rate: 0.0002
epochs: 3 # 2 is better for <1000 examples
batch_size: 4
max_seq_length: 2048

cppcheck's MISRA addon (misra.py) only fires with --language=c. In C++ mode, it produces zero MISRA output. This means:
- Code with `std::vector`, `std::cin`, `#include <iostream>` etc. cannot be MISRA-checked
- The fine-tuned model outputs C++-style code (learned from training data), so most of its output can't be verified by cppcheck
- Pattern-based evaluation (regex checks for braces, declarations, main(void), etc.) works on all code
- No free MISRA C++ tooling exists. Commercial tools cost $3,000-$10,000+/year.
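The pattern-based checks can be sketched with simple regexes. These are illustrative approximations of two of the rules above, far looser than real conformance checking, which needs a parser:

```python
import re

# Rough regex stand-ins for a couple of MISRA C:2012 rules
PATTERNS = {
    "8.2":  re.compile(r"\bint\s+main\s*\(\s*\)"),                 # main() instead of main(void)
    "21.3": re.compile(r"\b(?:malloc|calloc|realloc|free)\s*\("),  # dynamic memory allocation
}

def pattern_violations(code: str) -> list[str]:
    """Return the rule IDs whose pattern matches the given source."""
    return [rule for rule, pat in PATTERNS.items() if pat.search(code)]

print(pattern_violations("int main() { return 0; }"))      # ['8.2']
print(pattern_violations("int main(void) { return 0; }"))  # []
```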
- Best checkpoint is epoch 1.0, not the final model. With 922 examples, overfitting starts at epoch 1.5. Eval loss: 0.415 (epoch 1) vs 0.473 (epoch 3).
- Trainable parameters: 161M of 7.7B (2.08%). LoRA rank 64 with alpha 128.
- 15 minutes on a single L40S GPU. Unsloth makes QLoRA 2x faster.
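The trainable-parameter count is a property of LoRA's low-rank factorization: each adapted weight matrix W (shape d_out x d_in) gains two small matrices, A (rank x d_in) and B (d_out x rank), and only those are trained. A sketch of the arithmetic (the 4096x4096 shape is illustrative, not Qwen2.5's actual layer config):

```python
def lora_param_count(d_in: int, d_out: int, rank: int = 64) -> int:
    # LoRA trains A: (rank, d_in) and B: (d_out, rank) per adapted weight
    return rank * (d_in + d_out)

# e.g. one 4096x4096 projection at rank 64:
print(lora_param_count(4096, 4096))  # 524288
```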
- Fix braces in training data — filter or re-generate pairs that violate Rule 15.6
- More data — resume step 2 to process remaining ~1,700 files (currently 1,025/2,762)
- C-style training data — modify step 2 prompt to use printf/scanf instead of cin/cout so cppcheck can verify the output
- Train with 2 epochs — sweet spot for <1,000 examples. With 2,000+ examples, 3 epochs may work
- Better evaluation — pattern checker covers more rules, but a real MISRA C++ tool would be definitive
| Step | Service | Cost |
|---|---|---|
| Step 1 | Free (HuggingFace streaming + local cppcheck) | $0 |
| Step 2 | OpenAI GPT-5-Mini (1,025 fixes) | ~$2.00 |
| Step 3 | OpenAI GPT-5-Mini (1,025 descriptions) | ~$0.50 |
| Step 4 | Modal L40S GPU (15 min) | ~$0.50 |
| Step 5 | Modal L40S GPU (15 min) | ~$0.50 |
| Total | | ~$3.50 |
MIT