
feat: add verifiable fine-tuning step on deterministic training baseline#86

Open
ryoari wants to merge 7 commits into AOSSIE-Org:main from ryoari:feat/deterministic-baseline

Conversation

Contributor

@ryoari ryoari commented Mar 28, 2026

Proof of Concept: Verifiable Fine-tuning

Summary

Adds a minimal verifiable fine-tuning step on top of a deterministic training baseline.

What this PR does

  • Trains a tiny deterministic model (base checkpoint)
  • Applies a deterministic fine-tuning step
  • Generates hashes for both base and fine-tuned checkpoints
  • Links both stages through a simple manifest
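
The hash-and-manifest linking described above can be sketched in a few stdlib lines. This is illustrative only; `sha256_bytes` and `build_manifest` are hypothetical names, not the PR's actual helpers:

```python
import hashlib
import json

def sha256_bytes(data: bytes) -> str:
    # Hex digest of the raw checkpoint bytes
    return hashlib.sha256(data).hexdigest()

def build_manifest(base_bytes: bytes, finetuned_bytes: bytes) -> str:
    # Link both stages through a single manifest, as the PR describes
    manifest = {
        "base": {"checkpoint_hash": sha256_bytes(base_bytes)},
        "finetune": {"checkpoint_hash": sha256_bytes(finetuned_bytes)},
    }
    return json.dumps(manifest, indent=4)
```

A verifier then only has to re-hash each checkpoint file and compare against the stored digests.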

Why this matters

This builds on earlier work validating determinism and checkpoint reproducibility, and extends it into a minimal verifiable training + fine-tuning pipeline.

It shows that:

  • fine-tuning can also be reproducible
  • the full pipeline (base → fine-tune → final model) can be verified end-to-end

Screenshots/Recordings:

image

Additional Notes:

This is a minimal implementation:

  • tiny model
  • synthetic data
  • simple deterministic update

Future work:

  • real datasets
  • full manifest schema
  • audit/replay verification

Checklist

  • [x] My code follows the project's code style and conventions
  • [x] I have made corresponding changes to the documentation
  • [x] My changes generate no new warnings or errors
  • [x] I have joined the Discord server and I will share a link to this PR with the project maintainers there
  • [x] I have read the Contributing Guidelines

Summary by CodeRabbit

  • New Features

    • Verifiable Fine-Tuning PoC: deterministic base training and fine-tuning with reproducible checkpoint hashing and manifest-based verification.
  • Documentation

    • Added README with purpose and step-by-step instructions to run the verification workflow.
  • Chores

    • Added ignore rules to exclude Python artifacts and model output files.
  • Dependencies

    • Added NumPy and PyTorch runtime dependencies.
  • Bug Fixes

    • Minor formatting tweaks to verification report output.

Contributor

coderabbitai bot commented Mar 28, 2026

Walkthrough

Added a verifiable fine-tuning PoC under experiments/verifiable_finetuning/ (deterministic base training, deterministic fine-tune, checkpoint hashing/manifesting, verification utilities) and added runtime deps numpy and torch to pyproject.

Changes

Cohort / File(s) Summary
Experiment config & docs
experiments/verifiable_finetuning/.gitignore, experiments/verifiable_finetuning/README.md
Add .gitignore for Python/ML artifacts and README describing the Verifiable Fine-Tuning PoC and run steps.
Training & fine-tuning scripts
experiments/verifiable_finetuning/train_base.py, experiments/verifiable_finetuning/finetune.py
Add deterministic base training and finetuning entrypoints that produce checkpoints, compute hashes, and update manifest entries; finetune mutates parameters deterministically.
Verification & utilities
experiments/verifiable_finetuning/manifest.py, experiments/verifiable_finetuning/utils.py
Add manifest verifier that compares on-disk checkpoint hashes to manifest values and utility functions for deterministic seeding, file hashing, deterministic save, path resolution, and manifest updates.
Project config & minor formatting
pyproject.toml, openverifiablellm/verify.py
Add numpy>=2.0.2 and torch>=2.8.0 to dependencies; minor formatting adjustments in verification reporting.

Sequence Diagram(s)

sequenceDiagram
    participant User as User
    participant Base as BaseTraining (train_base.py)
    participant Fine as FineTuner (finetune.py)
    participant Manifest as Verifier (manifest.py)
    participant Utils as Utils (utils.py)
    participant Files as Filesystem

    User->>Base: run train_base.py
    Base->>Utils: set_seed(...) / build model
    Base->>Utils: save_deterministic(state_dict)
    Utils->>Files: write `base_checkpoint.pt`
    Utils-->>Base: return base_checkpoint_hash
    Base->>Utils: update_manifest("base", {...})
    Utils->>Files: write/update `manifest.json`
    Base-->>User: print base checkpoint hash

    User->>Fine: run finetune.py
    Fine->>Files: read `base_checkpoint.pt`
    Fine->>Utils: hash base checkpoint
    Fine->>Utils: mutate model deterministically
    Fine->>Utils: save_deterministic(finetuned state)
    Utils->>Files: write `finetuned_checkpoint.pt`
    Utils-->>Fine: return finetuned_checkpoint_hash
    Fine->>Utils: update_manifest("finetune", {...})
    Fine-->>User: print finetune checkpoint hash and match result

    User->>Manifest: run manifest.py
    Manifest->>Files: read `manifest.json`
    Manifest->>Files: hash `base_checkpoint.pt`
    Manifest->>Manifest: compare to manifest["base"]["checkpoint_hash"]
    Manifest-->>User: print base match result
    Manifest->>Files: hash `finetuned_checkpoint.pt`
    Manifest->>Manifest: compare to manifest["finetune"]["checkpoint_hash"]
    Manifest-->>User: print finetuned match result

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


Suggested labels

Python Lang, Documentation

Suggested reviewers

  • Archit381

Poem

🐇 I seeded, saved, and nudged each weight,

base then fine, checked hashes straight.
A manifest keeps honest score,
Repro runs hopping to the door! 🥕

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The pull request title accurately describes the main change: adding a verifiable fine-tuning step on a deterministic training baseline, which matches the changeset of training scripts, fine-tuning logic, verification utilities, and manifest tracking.



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 11

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@experiments/verifiable_finetuning/finetune.py`:
- Around line 41-42: The second print unconditionally prints "FINE MATCH" which
can mislabel mismatches; change the logic in finetune.py to compute
hash_file(ft_ckpt) and compare it to ft_hash (use the ft_hash variable and the
hash_file and ft_ckpt calls) and only print the success message when they are
equal, otherwise print a clear failure message (including both values) so
mismatches are not silently labeled as matches.
- Around line 24-27: The code is mutating param.data directly; instead, inside
the existing torch.no_grad() block update parameters with an in-place tensor op
that preserves autograd invariants (e.g., replace "param.data += 0.001" with
"param.add_(0.001)" or another in-place method) while iterating
model.parameters() so you do not bypass autograd via .data.
- Line 21: Update the checkpoint load call used by model.load_state_dict to use
safe unpickling and explicit device mapping: when loading base_ckpt with
torch.load (the argument passed into model.load_state_dict), pass
weights_only=True and map_location="cpu" (or map to the target device) so only
tensor/primitive types are unpickled and tensors are pinned to a known device;
update the call site where torch.load(base_ckpt) is used (the value fed to
model.load_state_dict) accordingly and ensure compatibility with PyTorch
>=2.8.0.

In `@experiments/verifiable_finetuning/manifest.py`:
- Line 15: The printed verification title has a typo: change the string in the
print statement that currently reads " End to End Plipeline Verification \n" to
" End to End Pipeline Verification \n" so the user-facing message correctly
spells "Pipeline" (locate and update the print call that outputs the
verification title).
- Around line 17-29: The verification currently prints results but doesn't fail
CI or handle missing files/keys; update the block that uses hash_file and
manifest (references: hash_file, manifest, "base_checkpoint.pt",
"finetuned_checkpoint.pt") to be fail-fast: validate presence of
manifest["base"]["checkpoint_hash"] and manifest["finetune"]["checkpoint_hash"],
catch file-not-found/key errors and log a clear error, compute both hashes, and
if either actual != expected call sys.exit(1) (or raise SystemExit) after
printing the mismatch so the process returns non-zero; ensure any unexpected
exceptions are surfaced (or logged) rather than swallowed so CI fails loudly.

In `@experiments/verifiable_finetuning/README.md`:
- Around line 3-10: The README headings and code fence need Markdown lint fixes:
update the "What this proves" and "How to run" headings to standard Markdown
(remove stray quotes around phrases and ensure a blank line above each heading),
add a blank line before and after the fenced code block, and specify the
code-fence language (bash) so the block around the three commands (python
train_base.py, python finetune.py, python manifest.py) is formatted correctly;
check the heading text for extra trailing/leading spaces and remove them to
satisfy MD001/MD022/MD031.

In `@experiments/verifiable_finetuning/train_base.py`:
- Around line 29-30: The second print unconditionally appends "CORRECT MATCH"
even though no comparison is made; update the verification to compute
hash_file(ckpt_path), compare it to ckpt_hash, and print a clear message
reflecting the result (e.g., "CORRECT MATCH" only if hash_file(ckpt_path) ==
ckpt_hash, otherwise "MISMATCH" with both values). Locate the prints around the
variables ckpt_hash and ckpt_path and the hash_file(…) call in train_base.py and
replace the unconditional message with this conditional comparison and concise
outcome.
- Around line 25-27: The manifest currently stores a static label under
"dataset_hash" instead of a cryptographic fingerprint; compute a real SHA-256
(or similar) digest of the generated dataset tensors/bytes (the object created
earlier in the script that holds the synthetic data) before calling
update_manifest("base", ...), convert to a stable hex string, and pass that hex
digest as the dataset_hash value to update_manifest so the manifest
cryptographically binds the exact dataset used (use the same deterministic
ordering/serialization of the tensors when hashing to ensure repeatability given
seed 99).

In `@experiments/verifiable_finetuning/utils.py`:
- Around line 1-6: The import block in utils.py mixes stdlib and third-party
imports; reorder and group them so stdlib imports (hashlib, json, os, random)
appear first, followed by a blank line, then third-party imports (numpy as np,
torch), keeping names as in the diff so Ruff's import grouping rule passes.
- Around line 8-9: The module currently calls os.chdir(SCRIPT_DIR) at import
time which mutates global process state; remove that call and instead use
SCRIPT_DIR for explicit path construction where needed (e.g., join SCRIPT_DIR
with filenames in callers). Add a small helper like get_script_dir() or expose
SCRIPT_DIR constant and update call sites to use os.path.join(SCRIPT_DIR, ...)
rather than relying on changing the working directory; ensure no other code in
this module or tests depends on cwd mutation before removing
os.chdir(SCRIPT_DIR).
- Around line 11-16: In set_seed, enforce strict deterministic behavior by
removing the warn_only=True argument from the torch.use_deterministic_algorithms
call (i.e., call torch.use_deterministic_algorithms(True) so nondeterministic
ops raise errors); update the set_seed function to call
torch.use_deterministic_algorithms(True) to ensure strict determinism for
reproducible, verifiable training.
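
The fail-fast pattern requested for manifest.py can be sketched with the stdlib only. This is a sketch under assumptions: `hash_file` is reimplemented here to keep the snippet self-contained, and `verify` takes the stage/file pairs as an argument rather than hard-coding the PR's filenames:

```python
import hashlib
import os

def hash_file(path):
    # Streamed SHA-256 over the file contents
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(manifest, checks):
    # checks: iterable of (stage, checkpoint_path); returns 0 on success, 1 on failure
    for stage, ckpt in checks:
        expected = manifest.get(stage, {}).get("checkpoint_hash")
        if expected is None:
            print(f"Missing manifest entry: {stage}.checkpoint_hash")
            return 1
        if not os.path.exists(ckpt):
            print(f"Missing checkpoint file: {ckpt}")
            return 1
        actual = hash_file(ckpt)
        if actual != expected:
            print(f"{stage}: expected {expected}, got {actual}")
            return 1
    return 0
```

Wired up as `sys.exit(verify(manifest, checks))`, the script returns non-zero on any mismatch or missing file, so CI fails loudly.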

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0ff47ffb-3963-426b-9814-e4c2ab1acd0e

📥 Commits

Reviewing files that changed from the base of the PR and between 578bc79 and b9829a6.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • experiments/verifiable_finetuning/.gitignore
  • experiments/verifiable_finetuning/README.md
  • experiments/verifiable_finetuning/finetune.py
  • experiments/verifiable_finetuning/manifest.py
  • experiments/verifiable_finetuning/train_base.py
  • experiments/verifiable_finetuning/utils.py
  • openverifiablellm/verify.py
  • pyproject.toml

Comment on lines +24 to +27
with torch.no_grad():
    for param in model.parameters():
        param.data += 0.001

Contributor


🧹 Nitpick | 🔵 Trivial

Avoid param.data mutation; use in-place ops under no_grad.

param.data bypasses autograd internals in a non-idiomatic way.

✅ Idiomatic update
     with torch.no_grad():
         for param in model.parameters():
-            param.data += 0.001
+            param.add_(0.001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/finetune.py` around lines 24 - 27, The code
is mutating param.data directly; instead, inside the existing torch.no_grad()
block update parameters with an in-place tensor op that preserves autograd
invariants (e.g., replace "param.data += 0.001" with "param.add_(0.001)" or
another in-place method) while iterating model.parameters() so you do not bypass
autograd via .data.

    with open("manifest.json", "r") as f:
        manifest = json.load(f)

    print(" End to End Plipeline Verification \n")
Contributor


⚠️ Potential issue | 🟡 Minor

Fix user-facing typo in the verification title.

“Plipeline” should be “Pipeline”.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/manifest.py` at line 15, The printed
verification title has a typo: change the string in the print statement that
currently reads " End to End Plipeline Verification \n" to " End to End Pipeline
Verification \n" so the user-facing message correctly spells "Pipeline" (locate
and update the print call that outputs the verification title).

Comment on lines +17 to +29
    base_actual = hash_file("base_checkpoint.pt")
    base_expected = manifest["base"]["checkpoint_hash"]
    base_match = "BINGO" if base_actual == base_expected else "NUH-UH"
    print(f"Base expected: {base_expected}")
    print(f"Base actual : {base_actual} {base_match}\n")

    # Verify Finetune
    ft_actual = hash_file("finetuned_checkpoint.pt")
    ft_expected = manifest["finetune"]["checkpoint_hash"]
    ft_match = "BINGO" if ft_actual == ft_expected else "NUH-UH"
    print(f"Finetune expected: {ft_expected}")
    print(f"Finetune actual : {ft_actual} {ft_match}")

Contributor

@coderabbitai coderabbitai bot Mar 28, 2026


⚠️ Potential issue | 🟠 Major

Make verification fail-fast and CI-friendly.

The current flow can raise unhandled errors on missing files/keys and still exits successfully on hash mismatches. Verification should return non-zero when checks fail.

✅ Robust verification pattern
+import sys
@@
 def verify():
@@
-    base_actual = hash_file("base_checkpoint.pt")
-    base_expected = manifest["base"]["checkpoint_hash"]
+    required = [("base", "base_checkpoint.pt"), ("finetune", "finetuned_checkpoint.pt")]
+    for stage, ckpt in required:
+        if stage not in manifest or "checkpoint_hash" not in manifest[stage]:
+            print(f"Missing manifest entry: {stage}.checkpoint_hash")
+            return 1
+        if not os.path.exists(ckpt):
+            print(f"Missing checkpoint file: {ckpt}")
+            return 1
+
+    base_actual = hash_file("base_checkpoint.pt")
+    base_expected = manifest["base"]["checkpoint_hash"]
@@
-    print(f"Finetune actual  : {ft_actual} {ft_match}")
+    print(f"Finetune actual  : {ft_actual} {ft_match}")
+    return 0 if (base_actual == base_expected and ft_actual == ft_expected) else 1
@@
 if __name__ == "__main__":
-    verify()
+    sys.exit(verify())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/manifest.py` around lines 17 - 29, The
verification currently prints results but doesn't fail CI or handle missing
files/keys; update the block that uses hash_file and manifest (references:
hash_file, manifest, "base_checkpoint.pt", "finetuned_checkpoint.pt") to be
fail-fast: validate presence of manifest["base"]["checkpoint_hash"] and
manifest["finetune"]["checkpoint_hash"], catch file-not-found/key errors and log
a clear error, compute both hashes, and if either actual != expected call
sys.exit(1) (or raise SystemExit) after printing the mismatch so the process
returns non-zero; ensure any unexpected exceptions are surfaced (or logged)
rather than swallowed so CI fails loudly.

Contributor Author


We don't need such robust verification for such a simple experiment.

Contributor


@ryoari, understood! Given this is a minimal PoC to demonstrate reproducibility rather than a production-grade pipeline, keeping the verification simple and straightforward makes sense. I'll defer to your judgment here.

(╯°▽°)╯

Comment on lines +25 to +27
    update_manifest(
        "base", {"seed": 99, "dataset_hash": "synthetic_16x10_seed99", "checkpoint_hash": ckpt_hash}
    )
Contributor


⚠️ Potential issue | 🟠 Major

dataset_hash is not actually a hash.

Line 26 stores a static label, so dataset integrity is not cryptographically bound in the manifest.

✅ Example fix (hash the generated tensors)
+    dataset_bytes = X.detach().cpu().numpy().tobytes() + y.detach().cpu().numpy().tobytes()
+    dataset_hash = __import__("hashlib").sha256(dataset_bytes).hexdigest()
     update_manifest(
-        "base", {"seed": 99, "dataset_hash": "synthetic_16x10_seed99", "checkpoint_hash": ckpt_hash}
+        "base", {"seed": 99, "dataset_hash": dataset_hash, "checkpoint_hash": ckpt_hash}
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/train_base.py` around lines 25 - 27, The
manifest currently stores a static label under "dataset_hash" instead of a
cryptographic fingerprint; compute a real SHA-256 (or similar) digest of the
generated dataset tensors/bytes (the object created earlier in the script that
holds the synthetic data) before calling update_manifest("base", ...), convert
to a stable hex string, and pass that hex digest as the dataset_hash value to
update_manifest so the manifest cryptographically binds the exact dataset used
(use the same deterministic ordering/serialization of the tensors when hashing
to ensure repeatability given seed 99).

Comment on lines +29 to +30
    print(f"Base run hash: {ckpt_hash}")
    print(f"Base run hash (again): {hash_file(ckpt_path)}   CORRECT MATCH")
Contributor


⚠️ Potential issue | 🟠 Major

Avoid unconditional “match” output.

Line 30 always prints CORRECT MATCH without comparing values, which can hide real mismatches.

✅ Verification output fix
     print(f"Base run hash: {ckpt_hash}")
-    print(f"Base run hash (again): {hash_file(ckpt_path)}   CORRECT MATCH")
+    actual_hash = hash_file(ckpt_path)
+    status = "CORRECT MATCH" if actual_hash == ckpt_hash else "MISMATCH"
+    print(f"Base run hash (again): {actual_hash}   {status}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/train_base.py` around lines 29 - 30, The
second print unconditionally appends "CORRECT MATCH" even though no comparison
is made; update the verification to compute hash_file(ckpt_path), compare it to
ckpt_hash, and print a clear message reflecting the result (e.g., "CORRECT
MATCH" only if hash_file(ckpt_path) == ckpt_hash, otherwise "MISMATCH" with both
values). Locate the prints around the variables ckpt_hash and ckpt_path and the
hash_file(…) call in train_base.py and replace the unconditional message with
this conditional comparison and concise outcome.

Comment on lines +11 to +16
def set_seed(seed=99):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.use_deterministic_algorithms(True, warn_only=True)
    torch.set_default_device("cpu")
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In PyTorch, does torch.use_deterministic_algorithms(True, warn_only=True) allow nondeterministic operations to run with warnings instead of raising an error?

💡 Result:

Yes, torch.use_deterministic_algorithms(True, warn_only=True) allows nondeterministic operations to run while issuing warnings instead of raising a RuntimeError.



🏁 Script executed:

# Find where set_seed is called
rg "set_seed" experiments/verifiable_finetuning/ -n

# Look for any nondeterministic operations or related comments
rg -i "nondeterministic|deterministic|warn" experiments/verifiable_finetuning/ -n

# Check if there are any known issues with nondeterministic operations
rg -i "warn_only|use_deterministic" -A 3 -B 3 experiments/verifiable_finetuning/

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 2104


🏁 Script executed:

# Understand the purpose and context of the verification code
fd "\.py$" experiments/verifiable_finetuning/ | head -20 | xargs wc -l | tail -1

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 82


🏁 Script executed:

# Check imports and overall file structure
head -30 experiments/verifiable_finetuning/utils.py

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 729


🏁 Script executed:

# Check for comments or git history around this parameter
git log -p --all -S "warn_only" -- experiments/verifiable_finetuning/utils.py 2>/dev/null | head -50

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 54


🏁 Script executed:

# Check for any exception handling or try-except around set_seed calls
rg "try:|except:|set_seed" experiments/verifiable_finetuning/ -B 2 -A 2

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 1623


🏁 Script executed:

# Look for any TODOs or comments in the file
grep -n "TODO\|FIXME\|NOTE\|XXX\|#" experiments/verifiable_finetuning/utils.py

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 54


Remove warn_only=True to enforce strict determinism.

The warn_only=True parameter allows nondeterministic operations to proceed with warnings instead of raising errors. Given that this project's core objective is verifiable and reproducible training with cryptographic verification of model states, nondeterministic operations silently degrade the verification guarantee.

Suggested fix
-    torch.use_deterministic_algorithms(True, warn_only=True)
+    torch.use_deterministic_algorithms(True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/utils.py` around lines 11 - 16, In
set_seed, enforce strict deterministic behavior by removing the warn_only=True
argument from the torch.use_deterministic_algorithms call (i.e., call
torch.use_deterministic_algorithms(True) so nondeterministic ops raise errors);
update the set_seed function to call torch.use_deterministic_algorithms(True) to
ensure strict determinism for reproducible, verifiable training.

@github-actions github-actions bot added size/XL and removed size/XL labels Mar 28, 2026
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@experiments/verifiable_finetuning/utils.py`:
- Around line 21-25: The hash_file function currently reads entire file into
memory via f.read(), which will OOM on large checkpoints; modify hash_file to
read the file in fixed-size chunks (e.g., 64KB or 1MB) in a loop and call
hasher.update(chunk) for each chunk until EOF, keeping the rest of the function
(hasher = hashlib.sha256(), open(file_path, "rb"), return hasher.hexdigest())
the same so it streams and hashes arbitrarily large files without loading them
entirely.
- Around line 33-42: The update_manifest function currently assumes
manifest_path contains valid JSON; if json.load raises json.JSONDecodeError the
function will crash with a confusing error. Wrap the open+json.load block in a
try/except catching json.JSONDecodeError (and optionally IOError), and on decode
error either reinitialize manifest = {} or raise a clearer exception mentioning
manifest_path and the original error; ensure you still write the new
manifest[stage] = data and that the file is overwritten when recovering.
Reference: function update_manifest, variable manifest_path, exception
json.JSONDecodeError.
- Around line 28-30: The save_deterministic function currently uses the internal
parameter _use_new_zipfile_serialization which is undocumented and may be
removed; update save_deterministic to guard or document this: check
torch.__version__ (or use packaging.version.parse) and only pass
_use_new_zipfile_serialization when running a PyTorch version known to support
it (e.g., 2.11), otherwise fall back to calling torch.save without that kwarg or
use an alternate deterministic serialization approach, and add a short
comment/docstring near save_deterministic describing the version requirement and
why the branch exists; ensure hash_file is still called on the saved path in
both branches.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 696fb768-980c-43f7-97ad-a207582bcc3c

📥 Commits

Reviewing files that changed from the base of the PR and between b9829a6 and b025e75.

📒 Files selected for processing (1)
  • experiments/verifiable_finetuning/utils.py

Comment on lines +21 to +25
def hash_file(file_path):
    hasher = hashlib.sha256()
    with open(file_path, "rb") as f:
        hasher.update(f.read())
    return hasher.hexdigest()
Contributor


🧹 Nitpick | 🔵 Trivial

Use chunked reading for large checkpoint files.

f.read() loads the entire file into memory. This works for the tiny model POC but will cause memory issues with real-world checkpoints (often GBs).

♻️ Chunked hashing for large files
 def hash_file(file_path):
     hasher = hashlib.sha256()
     with open(file_path, "rb") as f:
-        hasher.update(f.read())
+        for chunk in iter(lambda: f.read(8192), b""):
+            hasher.update(chunk)
     return hasher.hexdigest()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/utils.py` around lines 21 - 25, The
hash_file function currently reads entire file into memory via f.read(), which
will OOM on large checkpoints; modify hash_file to read the file in fixed-size
chunks (e.g., 64KB or 1MB) in a loop and call hasher.update(chunk) for each
chunk until EOF, keeping the rest of the function (hasher = hashlib.sha256(),
open(file_path, "rb"), return hasher.hexdigest()) the same so it streams and
hashes arbitrarily large files without loading them entirely.

Comment on lines +28 to +30
def save_deterministic(state_dict, path):
    torch.save(state_dict, path, _use_new_zipfile_serialization=False)
    return hash_file(path)
Contributor


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

PyTorch torch.save _use_new_zipfile_serialization parameter deprecated removed

💡 Result:

The _use_new_zipfile_serialization parameter in torch.save is an internal (underscore-prefixed) option introduced around PyTorch 1.6 to control whether to use the new zipfile-based serialization format (True, default since 1.6) or the legacy pickle-based format (False). It is not officially documented in the torch.save API reference pages across versions including 2.11. No evidence found in official documentation, source code snippets, release notes (up to 2.11.0 as of 2026-03-28), or GitHub issues/PRs indicating it has been formally deprecated or removed. The parameter still appears in serialization source code comments referencing PyTorch 1.6 behavior and is mentioned in legacy docs (e.g., 1.6.0 notes). Current serialization docs (stable/main) describe the zip format as default since 1.6 without mentioning the parameter, suggesting it remains supported for backward compatibility to produce legacy files but is unneeded for default behavior. As an undocumented internal parameter, its presence is not guaranteed across versions, but it has not been removed up to the latest PyTorch 2.11.

Address undocumented internal parameter for deterministic serialization.

The _use_new_zipfile_serialization parameter is undocumented and internal (underscore-prefixed), with no official stability guarantees. While currently supported in PyTorch 2.11, relying on undocumented parameters creates fragility—future versions may remove it or change its behavior without notice, potentially breaking hash reproducibility across environments. Consider documenting the PyTorch version constraints or adding a version check to ensure consistent serialization behavior, or evaluate alternative approaches to deterministic saving if the parameter is not available.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/utils.py` around lines 28 - 30, The
save_deterministic function currently uses the internal parameter
_use_new_zipfile_serialization which is undocumented and may be removed; update
save_deterministic to guard or document this: check torch.__version__ (or use
packaging.version.parse) and only pass _use_new_zipfile_serialization when
running a PyTorch version known to support it (e.g., 2.11), otherwise fall back
to calling torch.save without that kwarg or use an alternate deterministic
serialization approach, and add a short comment/docstring near
save_deterministic describing the version requirement and why the branch exists;
ensure hash_file is still called on the saved path in both branches.
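One way to build the guard described in the prompt is to check, via `inspect.signature`, whether a callable accepts a given keyword argument before passing it. The helper below is a generic standard-library sketch; the name `supports_kwarg` is hypothetical, and wiring it to `torch.save` and `_use_new_zipfile_serialization` follows the prompt above:

```python
import inspect

def supports_kwarg(func, name):
    """Return True if `func` appears to accept keyword argument `name`.

    Falls back to False when the signature cannot be introspected,
    which happens for some C-implemented callables.
    """
    try:
        params = inspect.signature(func).parameters
    except (TypeError, ValueError):
        return False
    if name in params:
        return True
    # A **kwargs parameter accepts any keyword name.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())
```

A `save_deterministic` built on this would pass the internal kwarg only when `supports_kwarg(torch.save, "_use_new_zipfile_serialization")` holds and otherwise fall back to a plain `torch.save(state_dict, path)`, calling `hash_file(path)` in both branches.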

Comment on lines +33 to +42
def update_manifest(stage, data, manifest_path="manifest.json"):
manifest = {}
if os.path.exists(manifest_path):
with open(manifest_path, "r") as f:
manifest = json.load(f)

manifest[stage] = data

with open(manifest_path, "w") as f:
json.dump(manifest, f, indent=4)
Contributor

🧹 Nitpick | 🔵 Trivial

Consider adding error handling for malformed JSON.

If manifest.json exists but contains invalid JSON, json.load() will raise JSONDecodeError with a potentially confusing error. For robustness, consider catching this and either re-initializing or providing a clearer error message.

♻️ Optional: Add error handling
 def update_manifest(stage, data, manifest_path="manifest.json"):
     manifest = {}
     if os.path.exists(manifest_path):
-        with open(manifest_path, "r") as f:
-            manifest = json.load(f)
+        try:
+            with open(manifest_path, "r") as f:
+                manifest = json.load(f)
+        except json.JSONDecodeError:
+            print(f"Warning: {manifest_path} is malformed, reinitializing.")
+            manifest = {}

     manifest[stage] = data

     with open(manifest_path, "w") as f:
         json.dump(manifest, f, indent=4)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/utils.py` around lines 33 - 42, The
update_manifest function currently assumes manifest_path contains valid JSON; if
json.load raises json.JSONDecodeError the function will crash with a confusing
error. Wrap the open+json.load block in a try/except catching
json.JSONDecodeError (and optionally IOError), and on decode error either
reinitialize manifest = {} or raise a clearer exception mentioning manifest_path
and the original error; ensure you still write the new manifest[stage] = data
and that the file is overwritten when recovering. Reference: function
update_manifest, variable manifest_path, exception json.JSONDecodeError.
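Putting the suggested error handling together, a self-contained robust version might look like this (a sketch assuming the recover-and-reinitialize strategy rather than re-raising):

```python
import json
import os

def update_manifest(stage, data, manifest_path="manifest.json"):
    """Merge `data` under key `stage` into the JSON manifest,
    recovering from a malformed existing file instead of crashing."""
    manifest = {}
    if os.path.exists(manifest_path):
        try:
            with open(manifest_path, "r") as f:
                manifest = json.load(f)
        except json.JSONDecodeError as e:
            # Malformed manifest: warn and start fresh rather than raise
            print(f"Warning: {manifest_path} is malformed ({e}); reinitializing.")
            manifest = {}

    manifest[stage] = data

    with open(manifest_path, "w") as f:
        json.dump(manifest, f, indent=4)
```

On recovery the corrupt file is simply overwritten with the new single-stage manifest, so subsequent stages still accumulate normally.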

@github-actions github-actions bot added size/XL and removed size/XL labels Mar 28, 2026
@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (7)
experiments/verifiable_finetuning/utils.py (3)

15-19: ⚠️ Potential issue | 🟠 Major

Enforce strict determinism in set_seed.

Line 18 uses warn_only=True, which allows nondeterministic ops to proceed and weakens reproducibility guarantees for this verification flow.

Suggested fix
 def set_seed(seed=42):
     """Locks in deterministic behavior."""
     torch.manual_seed(seed)
-    torch.use_deterministic_algorithms(True, warn_only=True)
+    torch.use_deterministic_algorithms(True)
In current PyTorch docs, does torch.use_deterministic_algorithms(True, warn_only=True) allow nondeterministic operations to continue with warnings instead of raising?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/utils.py` around lines 15 - 19, The
set_seed function currently calls torch.use_deterministic_algorithms(True,
warn_only=True) which only warns on nondeterministic ops; change this to enforce
strict determinism by calling torch.use_deterministic_algorithms(True,
warn_only=False) so PyTorch raises on nondeterministic ops, and also make the
function fully deterministic by seeding CUDA (torch.cuda.manual_seed_all) and
setting backend flags (torch.backends.cudnn.deterministic = True and
torch.backends.cudnn.benchmark = False) in the set_seed implementation.

29-32: ⚠️ Potential issue | 🟡 Minor

Guard usage of internal torch.save parameter.

Line 31 depends on _use_new_zipfile_serialization, an internal/underscore API with no compatibility guarantee across versions.

Suggested fix
 def save_deterministic(state_dict, path):
     """Saves state dict without zip metadata to ensure identical hashes."""
-    torch.save(state_dict, path, _use_new_zipfile_serialization=False)
+    try:
+        torch.save(state_dict, path, _use_new_zipfile_serialization=False)
+    except TypeError as e:
+        raise RuntimeError(
+            "This reproducibility flow requires torch.save support for "
+            "_use_new_zipfile_serialization=False."
+        ) from e
     return hash_file(path)
Is _use_new_zipfile_serialization an officially documented/stable torch.save parameter in latest PyTorch, and is its behavior guaranteed across releases?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/utils.py` around lines 29 - 32, The use of
the internal kwarg _use_new_zipfile_serialization in save_deterministic is
unsafe across PyTorch versions; update save_deterministic to detect support for
that parameter (e.g., inspect.signature(torch.save) or checking
torch.__version__ / attribute presence) and call torch.save with
_use_new_zipfile_serialization only when supported, otherwise fall back to
calling torch.save without that kwarg; ensure the function still returns
hash_file(path) and keep the function name save_deterministic unchanged.

21-26: 🧹 Nitpick | 🔵 Trivial

Hash files in chunks to avoid high memory usage.

Line 25 reads the entire checkpoint into memory; this will not scale once checkpoints grow.

Suggested fix
 def hash_file(filepath):
     """Generates a SHA-256 hash of a file."""
     hasher = hashlib.sha256()
     with open(filepath, "rb") as f:
-        hasher.update(f.read())
+        for chunk in iter(lambda: f.read(1024 * 1024), b""):
+            hasher.update(chunk)
     return hasher.hexdigest()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/utils.py` around lines 21 - 26, The
hash_file function currently reads the entire file into memory which will OOM
for large checkpoints; change hash_file to stream the file in fixed-size chunks
(e.g., 64KB) and call hashlib.sha256().update on each chunk in a loop while
reading until EOF, then return hexdigest; keep the file opened in binary mode
and preserve the existing function name hash_file to locate and replace the
current implementation.
experiments/verifiable_finetuning/train_base.py (2)

35-36: ⚠️ Potential issue | 🟠 Major

Do not print unconditional “CORRECT MATCH”.

Line 36 can report success even when hashes differ.

Suggested fix
     print(f"Base run hash: {ckpt_hash}")
-    print(f"Base run hash (again): {hash_file(ckpt_path)}   CORRECT MATCH")
+    actual_hash = hash_file(ckpt_path)
+    status = "CORRECT MATCH" if actual_hash == ckpt_hash else "MISMATCH"
+    print(f"Base run hash (again): {actual_hash}   {status}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/train_base.py` around lines 35 - 36, Remove
the hardcoded "CORRECT MATCH" message and instead compare ckpt_hash to
hash_file(ckpt_path): call hash_file(ckpt_path), compare the result to
ckpt_hash, and print a single informative message that includes both hashes and
a conditional "MATCH" or "MISMATCH" comment; update the two print statements
around ckpt_hash, hash_file, and ckpt_path (the variables referenced) so the
success text is only shown when the equality check passes.

31-33: ⚠️ Potential issue | 🟠 Major

dataset_hash should be an actual digest, not a label.

Line 32 stores a static string, so the manifest does not cryptographically bind the exact dataset used in this run.

Suggested fix
+import hashlib
 import torch
 import torch.nn as nn
 from utils import get_path, hash_file, save_deterministic, set_seed, update_manifest
@@
-    update_manifest(
-        "base", {"seed": 99, "dataset_hash": "synthetic_16x10_seed99", "checkpoint_hash": ckpt_hash}
-    )
+    dataset_bytes = X.detach().cpu().numpy().tobytes() + y.detach().cpu().numpy().tobytes()
+    dataset_hash = hashlib.sha256(dataset_bytes).hexdigest()
+    update_manifest("base", {"seed": 99, "dataset_hash": dataset_hash, "checkpoint_hash": ckpt_hash})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/train_base.py` around lines 31 - 33, The
manifest is being written with a static label instead of a cryptographic digest;
replace the hardcoded "synthetic_16x10_seed99" in the update_manifest call with
a real digest computed from the dataset used in this run (e.g., compute a
SHA-256 of the dataset file(s) or a canonical representation) and pass that
digest as the dataset_hash argument to update_manifest; locate the call to
update_manifest in train_base.py and use the computed_digest variable (or a new
helper like compute_dataset_hash(dataset_path)) so update_manifest("base",
{"seed": 99, "dataset_hash": computed_digest, "checkpoint_hash": ckpt_hash})
records the actual dataset digest.
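The digest step can be sketched without PyTorch: serialize the arrays to canonical bytes and hash the concatenation. In the actual script the blobs would come from `X.detach().cpu().numpy().tobytes()` and the same for `y`; the helper name `compute_dataset_hash` is hypothetical:

```python
import hashlib

def compute_dataset_hash(*byte_blobs):
    """SHA-256 over the concatenation of canonical byte representations
    of the dataset arrays. Order matters, so keep it fixed (e.g., X then y)."""
    hasher = hashlib.sha256()
    for blob in byte_blobs:
        hasher.update(blob)
    return hasher.hexdigest()
```

Because SHA-256 hashes the byte stream, `compute_dataset_hash(x_bytes, y_bytes)` equals hashing the concatenated bytes in one call, and any change to either array changes the digest recorded in the manifest.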
experiments/verifiable_finetuning/finetune.py (1)

24-27: 🧹 Nitpick | 🔵 Trivial

Avoid .data mutation in parameter updates.

Line 26 should use an in-place op directly on the tensor under no_grad() instead of mutating .data.

Suggested fix
     with torch.no_grad():
         for param in model.parameters():
-            param.data += 0.001
+            param.add_(0.001)
In current PyTorch guidance, is direct .data mutation discouraged compared with in-place ops (e.g., param.add_) inside torch.no_grad()?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/finetune.py` around lines 24 - 27, The code
mutates parameter tensors via param.data inside the torch.no_grad() block;
replace the .data mutation with an in-place tensor operation (e.g., call
param.add_ or param.mul_ as appropriate) so updates occur safely without using
.data — locate the loop iterating model.parameters() in the torch.no_grad()
context and change the param.data += 0.001 line to an in-place tensor op like
param.add_(0.001).
experiments/verifiable_finetuning/manifest.py (1)

15-15: ⚠️ Potential issue | 🟡 Minor

Fix typo in verification banner.

Line 15 prints “Plipeline”; this should be “Pipeline”.

Suggested fix
-    print(" End to End Plipeline Verification \n")
+    print(" End to End Pipeline Verification \n")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/manifest.py` at line 15, The printed banner
string contains a typo: replace the text " End to End Plipeline Verification \n"
in the print statement with " End to End Pipeline Verification \n" (i.e., fix
"Plipeline" → "Pipeline") so the verification banner reads correctly; locate the
print call that emits the banner and update the literal accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 429788a8-597b-4905-92ba-576b011cbba3

📥 Commits

Reviewing files that changed from the base of the PR and between b025e75 and bd1681f.

📒 Files selected for processing (4)
  • experiments/verifiable_finetuning/finetune.py
  • experiments/verifiable_finetuning/manifest.py
  • experiments/verifiable_finetuning/train_base.py
  • experiments/verifiable_finetuning/utils.py
