
feat: add verifiable fine-tuning step on deterministic training baseline#86

Open
ryoari wants to merge 7 commits into AOSSIE-Org:main from ryoari:feat/deterministic-baseline

Conversation

Contributor

@ryoari ryoari commented Mar 28, 2026

Proof of Concept: Verifiable Fine-tuning

Summary

Adds a minimal verifiable fine-tuning step on top of a deterministic training baseline.

What this PR does

  • Trains a tiny deterministic model (base checkpoint)
  • Applies a deterministic fine-tuning step
  • Generates hashes for both base and fine-tuned checkpoints
  • Links both stages through a simple manifest
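
The hash-and-manifest linking described above can be sketched in a few stdlib lines. This is illustrative only; `sha256_bytes` and `build_manifest` are hypothetical names, not the PR's actual helpers:

```python
import hashlib
import json

def sha256_bytes(data: bytes) -> str:
    # Hex digest of the raw checkpoint bytes
    return hashlib.sha256(data).hexdigest()

def build_manifest(base_bytes: bytes, finetuned_bytes: bytes) -> str:
    # Link both stages through a single manifest, as the PR describes
    manifest = {
        "base": {"checkpoint_hash": sha256_bytes(base_bytes)},
        "finetune": {"checkpoint_hash": sha256_bytes(finetuned_bytes)},
    }
    return json.dumps(manifest, indent=4)
```

A verifier then only has to re-hash each checkpoint file and compare against the stored digests.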

Why this matters

This builds on earlier work validating determinism and checkpoint reproducibility, and extends it into a minimal verifiable training + fine-tuning pipeline.

It shows that:

  • fine-tuning can also be reproducible
  • the full pipeline (base → fine-tune → final model) can be verified end-to-end

Screenshots/Recordings:

image

Additional Notes:

This is a minimal implementation:

  • tiny model
  • synthetic data
  • simple deterministic update

Future work:

  • real datasets
  • full manifest schema
  • audit/replay verification

Checklist

  • [x] My code follows the project's code style and conventions
  • [x] I have made corresponding changes to the documentation
  • [x] My changes generate no new warnings or errors
  • [x] I have joined the Discord server and I will share a link to this PR with the project maintainers there
  • [x] I have read the Contributing Guidelines

Summary by CodeRabbit

  • New Features

    • Verifiable Fine-Tuning PoC: deterministic base training and fine-tuning with reproducible checkpoint hashing and manifest-based verification.
  • Documentation

    • Added README with purpose and step-by-step instructions to run the verification workflow.
  • Chores

    • Added ignore rules to exclude Python artifacts and model output files.
  • Dependencies

    • Added NumPy and PyTorch runtime dependencies.
  • Bug Fixes

    • Minor formatting tweaks to verification report output.

Contributor

coderabbitai bot commented Mar 28, 2026

Walkthrough

Added a verifiable fine-tuning PoC under experiments/verifiable_finetuning/ (deterministic base training, deterministic fine-tune, checkpoint hashing/manifesting, verification utilities) and added runtime deps numpy and torch to pyproject.

Changes

Cohort / File(s) Summary
Experiment config & docs
experiments/verifiable_finetuning/.gitignore, experiments/verifiable_finetuning/README.md
Add .gitignore for Python/ML artifacts and README describing the Verifiable Fine-Tuning PoC and run steps.
Training & fine-tuning scripts
experiments/verifiable_finetuning/train_base.py, experiments/verifiable_finetuning/finetune.py
Add deterministic base training and finetuning entrypoints that produce checkpoints, compute hashes, and update manifest entries; finetune mutates parameters deterministically.
Verification & utilities
experiments/verifiable_finetuning/manifest.py, experiments/verifiable_finetuning/utils.py
Add manifest verifier that compares on-disk checkpoint hashes to manifest values and utility functions for deterministic seeding, file hashing, deterministic save, path resolution, and manifest updates.
Project config & minor formatting
pyproject.toml, openverifiablellm/verify.py
Add numpy>=2.0.2 and torch>=2.8.0 to dependencies; minor formatting adjustments in verification reporting.

Sequence Diagram(s)

sequenceDiagram
    participant User as User
    participant Base as BaseTraining (train_base.py)
    participant Fine as FineTuner (finetune.py)
    participant Manifest as Verifier (manifest.py)
    participant Utils as Utils (utils.py)
    participant Files as Filesystem

    User->>Base: run train_base.py
    Base->>Utils: set_seed(...) / build model
    Base->>Utils: save_deterministic(state_dict)
    Utils->>Files: write `base_checkpoint.pt`
    Utils-->>Base: return base_checkpoint_hash
    Base->>Utils: update_manifest("base", {...})
    Utils->>Files: write/update `manifest.json`
    Base-->>User: print base checkpoint hash

    User->>Fine: run finetune.py
    Fine->>Files: read `base_checkpoint.pt`
    Fine->>Utils: hash base checkpoint
    Fine->>Utils: mutate model deterministically
    Fine->>Utils: save_deterministic(finetuned state)
    Utils->>Files: write `finetuned_checkpoint.pt`
    Utils-->>Fine: return finetuned_checkpoint_hash
    Fine->>Utils: update_manifest("finetune", {...})
    Fine-->>User: print finetune checkpoint hash and match result

    User->>Manifest: run manifest.py
    Manifest->>Files: read `manifest.json`
    Manifest->>Files: hash `base_checkpoint.pt`
    Manifest->>Manifest: compare to manifest["base"]["checkpoint_hash"]
    Manifest-->>User: print base match result
    Manifest->>Files: hash `finetuned_checkpoint.pt`
    Manifest->>Manifest: compare to manifest["finetune"]["checkpoint_hash"]
    Manifest-->>User: print finetuned match result

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


Suggested labels

Python Lang, Documentation

Suggested reviewers

  • Archit381

Poem

🐇 I seeded, saved, and nudged each weight,

base then fine, checked hashes straight.
A manifest keeps honest score,
Repro runs hopping to the door! 🥕

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The pull request title accurately describes the main change: adding a verifiable fine-tuning step on a deterministic training baseline, which matches the changeset of training scripts, fine-tuning logic, verification utilities, and manifest tracking.



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 11

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@experiments/verifiable_finetuning/finetune.py`:
- Around line 41-42: The second print unconditionally prints "FINE MATCH" which
can mislabel mismatches; change the logic in finetune.py to compute
hash_file(ft_ckpt) and compare it to ft_hash (use the ft_hash variable and the
hash_file and ft_ckpt calls) and only print the success message when they are
equal, otherwise print a clear failure message (including both values) so
mismatches are not silently labeled as matches.
- Around line 24-27: The code is mutating param.data directly; instead, inside
the existing torch.no_grad() block update parameters with an in-place tensor op
that preserves autograd invariants (e.g., replace "param.data += 0.001" with
"param.add_(0.001)" or another in-place method) while iterating
model.parameters() so you do not bypass autograd via .data.
- Line 21: Update the checkpoint load call used by model.load_state_dict to use
safe unpickling and explicit device mapping: when loading base_ckpt with
torch.load (the argument passed into model.load_state_dict), pass
weights_only=True and map_location="cpu" (or map to the target device) so only
tensor/primitive types are unpickled and tensors are pinned to a known device;
update the call site where torch.load(base_ckpt) is used (the value fed to
model.load_state_dict) accordingly and ensure compatibility with PyTorch
>=2.8.0.

In `@experiments/verifiable_finetuning/manifest.py`:
- Line 15: The printed verification title has a typo: change the string in the
print statement that currently reads " End to End Plipeline Verification \n" to
" End to End Pipeline Verification \n" so the user-facing message correctly
spells "Pipeline" (locate and update the print call that outputs the
verification title).
- Around line 17-29: The verification currently prints results but doesn't fail
CI or handle missing files/keys; update the block that uses hash_file and
manifest (references: hash_file, manifest, "base_checkpoint.pt",
"finetuned_checkpoint.pt") to be fail-fast: validate presence of
manifest["base"]["checkpoint_hash"] and manifest["finetune"]["checkpoint_hash"],
catch file-not-found/key errors and log a clear error, compute both hashes, and
if either actual != expected call sys.exit(1) (or raise SystemExit) after
printing the mismatch so the process returns non-zero; ensure any unexpected
exceptions are surfaced (or logged) rather than swallowed so CI fails loudly.

In `@experiments/verifiable_finetuning/README.md`:
- Around line 3-10: The README headings and code fence need Markdown lint fixes:
update the "What this proves" and "How to run" headings to standard Markdown
(remove stray quotes around phrases and ensure a blank line above each heading),
add a blank line before and after the fenced code block, and specify the
code-fence language (bash) so the block around the three commands (python
train_base.py, python finetune.py, python manifest.py) is formatted correctly;
check the heading text for extra trailing/leading spaces and remove them to
satisfy MD001/MD022/MD031.

In `@experiments/verifiable_finetuning/train_base.py`:
- Around line 29-30: The second print unconditionally appends "CORRECT MATCH"
even though no comparison is made; update the verification to compute
hash_file(ckpt_path), compare it to ckpt_hash, and print a clear message
reflecting the result (e.g., "CORRECT MATCH" only if hash_file(ckpt_path) ==
ckpt_hash, otherwise "MISMATCH" with both values). Locate the prints around the
variables ckpt_hash and ckpt_path and the hash_file(…) call in train_base.py and
replace the unconditional message with this conditional comparison and concise
outcome.
- Around line 25-27: The manifest currently stores a static label under
"dataset_hash" instead of a cryptographic fingerprint; compute a real SHA-256
(or similar) digest of the generated dataset tensors/bytes (the object created
earlier in the script that holds the synthetic data) before calling
update_manifest("base", ...), convert to a stable hex string, and pass that hex
digest as the dataset_hash value to update_manifest so the manifest
cryptographically binds the exact dataset used (use the same deterministic
ordering/serialization of the tensors when hashing to ensure repeatability given
seed 99).

In `@experiments/verifiable_finetuning/utils.py`:
- Around line 1-6: The import block in utils.py mixes stdlib and third-party
imports; reorder and group them so stdlib imports (hashlib, json, os, random)
appear first, followed by a blank line, then third-party imports (numpy as np,
torch), keeping names as in the diff so Ruff's import grouping rule passes.
- Around line 8-9: The module currently calls os.chdir(SCRIPT_DIR) at import
time which mutates global process state; remove that call and instead use
SCRIPT_DIR for explicit path construction where needed (e.g., join SCRIPT_DIR
with filenames in callers). Add a small helper like get_script_dir() or expose
SCRIPT_DIR constant and update call sites to use os.path.join(SCRIPT_DIR, ...)
rather than relying on changing the working directory; ensure no other code in
this module or tests depends on cwd mutation before removing
os.chdir(SCRIPT_DIR).
- Around line 11-16: In set_seed, enforce strict deterministic behavior by
removing the warn_only=True argument from the torch.use_deterministic_algorithms
call (i.e., call torch.use_deterministic_algorithms(True) so nondeterministic
ops raise errors); update the set_seed function to call
torch.use_deterministic_algorithms(True) to ensure strict determinism for
reproducible, verifiable training.
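
The fail-fast pattern requested for manifest.py can be sketched with the stdlib only. This is a sketch under assumptions: `hash_file` is reimplemented here to keep the snippet self-contained, and `verify` takes the stage/file pairs as an argument rather than hard-coding the PR's filenames:

```python
import hashlib
import os

def hash_file(path):
    # Streamed SHA-256 over the file contents
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(manifest, checks):
    # checks: iterable of (stage, checkpoint_path); returns 0 on success, 1 on failure
    for stage, ckpt in checks:
        expected = manifest.get(stage, {}).get("checkpoint_hash")
        if expected is None:
            print(f"Missing manifest entry: {stage}.checkpoint_hash")
            return 1
        if not os.path.exists(ckpt):
            print(f"Missing checkpoint file: {ckpt}")
            return 1
        actual = hash_file(ckpt)
        if actual != expected:
            print(f"{stage}: expected {expected}, got {actual}")
            return 1
    return 0
```

Wired up as `sys.exit(verify(manifest, checks))`, the script returns non-zero on any mismatch or missing file, so CI fails loudly.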

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0ff47ffb-3963-426b-9814-e4c2ab1acd0e

📥 Commits

Reviewing files that changed from the base of the PR and between 578bc79 and b9829a6.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • experiments/verifiable_finetuning/.gitignore
  • experiments/verifiable_finetuning/README.md
  • experiments/verifiable_finetuning/finetune.py
  • experiments/verifiable_finetuning/manifest.py
  • experiments/verifiable_finetuning/train_base.py
  • experiments/verifiable_finetuning/utils.py
  • openverifiablellm/verify.py
  • pyproject.toml

Comment on lines +24 to +27
with torch.no_grad():
    for param in model.parameters():
        param.data += 0.001

Contributor


🧹 Nitpick | 🔵 Trivial

Avoid param.data mutation; use in-place ops under no_grad.

param.data bypasses autograd internals in a non-idiomatic way.

✅ Idiomatic update
     with torch.no_grad():
         for param in model.parameters():
-            param.data += 0.001
+            param.add_(0.001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/finetune.py` around lines 24 - 27, The code
is mutating param.data directly; instead, inside the existing torch.no_grad()
block update parameters with an in-place tensor op that preserves autograd
invariants (e.g., replace "param.data += 0.001" with "param.add_(0.001)" or
another in-place method) while iterating model.parameters() so you do not bypass
autograd via .data.

    with open("manifest.json", "r") as f:
        manifest = json.load(f)

    print(" End to End Plipeline Verification \n")
Contributor


⚠️ Potential issue | 🟡 Minor

Fix user-facing typo in the verification title.

“Plipeline” should be “Pipeline”.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/manifest.py` at line 15, The printed
verification title has a typo: change the string in the print statement that
currently reads " End to End Plipeline Verification \n" to " End to End Pipeline
Verification \n" so the user-facing message correctly spells "Pipeline" (locate
and update the print call that outputs the verification title).

Comment on lines +17 to +29
    base_actual = hash_file("base_checkpoint.pt")
    base_expected = manifest["base"]["checkpoint_hash"]
    base_match = "BINGO" if base_actual == base_expected else "NUH-UH"
    print(f"Base expected: {base_expected}")
    print(f"Base actual : {base_actual} {base_match}\n")

    # Verify Finetune
    ft_actual = hash_file("finetuned_checkpoint.pt")
    ft_expected = manifest["finetune"]["checkpoint_hash"]
    ft_match = "BINGO" if ft_actual == ft_expected else "NUH-UH"
    print(f"Finetune expected: {ft_expected}")
    print(f"Finetune actual : {ft_actual} {ft_match}")

Contributor

@coderabbitai coderabbitai bot Mar 28, 2026


⚠️ Potential issue | 🟠 Major

Make verification fail-fast and CI-friendly.

The current flow can raise unhandled errors on missing files/keys and still exits successfully on hash mismatches. Verification should return non-zero when checks fail.

✅ Robust verification pattern
+import sys
@@
 def verify():
@@
-    base_actual = hash_file("base_checkpoint.pt")
-    base_expected = manifest["base"]["checkpoint_hash"]
+    required = [("base", "base_checkpoint.pt"), ("finetune", "finetuned_checkpoint.pt")]
+    for stage, ckpt in required:
+        if stage not in manifest or "checkpoint_hash" not in manifest[stage]:
+            print(f"Missing manifest entry: {stage}.checkpoint_hash")
+            return 1
+        if not os.path.exists(ckpt):
+            print(f"Missing checkpoint file: {ckpt}")
+            return 1
+
+    base_actual = hash_file("base_checkpoint.pt")
+    base_expected = manifest["base"]["checkpoint_hash"]
@@
-    print(f"Finetune actual  : {ft_actual} {ft_match}")
+    print(f"Finetune actual  : {ft_actual} {ft_match}")
+    return 0 if (base_actual == base_expected and ft_actual == ft_expected) else 1
@@
 if __name__ == "__main__":
-    verify()
+    sys.exit(verify())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/manifest.py` around lines 17 - 29, The
verification currently prints results but doesn't fail CI or handle missing
files/keys; update the block that uses hash_file and manifest (references:
hash_file, manifest, "base_checkpoint.pt", "finetuned_checkpoint.pt") to be
fail-fast: validate presence of manifest["base"]["checkpoint_hash"] and
manifest["finetune"]["checkpoint_hash"], catch file-not-found/key errors and log
a clear error, compute both hashes, and if either actual != expected call
sys.exit(1) (or raise SystemExit) after printing the mismatch so the process
returns non-zero; ensure any unexpected exceptions are surfaced (or logged)
rather than swallowed so CI fails loudly.

Contributor Author


We don't need such robust verification for such a simple experiment.

Contributor


@ryoari, understood! Given this is a minimal PoC to demonstrate reproducibility rather than a production-grade pipeline, keeping the verification simple and straightforward makes sense. I'll defer to your judgment here.

(╯°▽°)╯

Comment on lines +25 to +27
    update_manifest(
        "base", {"seed": 99, "dataset_hash": "synthetic_16x10_seed99", "checkpoint_hash": ckpt_hash}
    )
Contributor


⚠️ Potential issue | 🟠 Major

dataset_hash is not actually a hash.

Line 26 stores a static label, so dataset integrity is not cryptographically bound in the manifest.

✅ Example fix (hash the generated tensors)
+    dataset_bytes = X.detach().cpu().numpy().tobytes() + y.detach().cpu().numpy().tobytes()
+    dataset_hash = __import__("hashlib").sha256(dataset_bytes).hexdigest()
     update_manifest(
-        "base", {"seed": 99, "dataset_hash": "synthetic_16x10_seed99", "checkpoint_hash": ckpt_hash}
+        "base", {"seed": 99, "dataset_hash": dataset_hash, "checkpoint_hash": ckpt_hash}
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/train_base.py` around lines 25 - 27, The
manifest currently stores a static label under "dataset_hash" instead of a
cryptographic fingerprint; compute a real SHA-256 (or similar) digest of the
generated dataset tensors/bytes (the object created earlier in the script that
holds the synthetic data) before calling update_manifest("base", ...), convert
to a stable hex string, and pass that hex digest as the dataset_hash value to
update_manifest so the manifest cryptographically binds the exact dataset used
(use the same deterministic ordering/serialization of the tensors when hashing
to ensure repeatability given seed 99).

Comment on lines +29 to +30
    print(f"Base run hash: {ckpt_hash}")
    print(f"Base run hash (again): {hash_file(ckpt_path)}   CORRECT MATCH")
Contributor


⚠️ Potential issue | 🟠 Major

Avoid unconditional “match” output.

Line 30 always prints CORRECT MATCH without comparing values, which can hide real mismatches.

✅ Verification output fix
     print(f"Base run hash: {ckpt_hash}")
-    print(f"Base run hash (again): {hash_file(ckpt_path)}   CORRECT MATCH")
+    actual_hash = hash_file(ckpt_path)
+    status = "CORRECT MATCH" if actual_hash == ckpt_hash else "MISMATCH"
+    print(f"Base run hash (again): {actual_hash}   {status}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/train_base.py` around lines 29 - 30, The
second print unconditionally appends "CORRECT MATCH" even though no comparison
is made; update the verification to compute hash_file(ckpt_path), compare it to
ckpt_hash, and print a clear message reflecting the result (e.g., "CORRECT
MATCH" only if hash_file(ckpt_path) == ckpt_hash, otherwise "MISMATCH" with both
values). Locate the prints around the variables ckpt_hash and ckpt_path and the
hash_file(…) call in train_base.py and replace the unconditional message with
this conditional comparison and concise outcome.

Comment on lines +11 to +16
def set_seed(seed=99):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.use_deterministic_algorithms(True, warn_only=True)
    torch.set_default_device("cpu")
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In PyTorch, does torch.use_deterministic_algorithms(True, warn_only=True) allow nondeterministic operations to run with warnings instead of raising an error?

💡 Result:

Yes, torch.use_deterministic_algorithms(True, warn_only=True) allows nondeterministic operations to run while issuing warnings instead of raising a RuntimeError.



🏁 Script executed:

# Find where set_seed is called
rg "set_seed" experiments/verifiable_finetuning/ -n

# Look for any nondeterministic operations or related comments
rg -i "nondeterministic|deterministic|warn" experiments/verifiable_finetuning/ -n

# Check if there are any known issues with nondeterministic operations
rg -i "warn_only|use_deterministic" -A 3 -B 3 experiments/verifiable_finetuning/

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 2104


🏁 Script executed:

# Understand the purpose and context of the verification code
fd "\.py$" experiments/verifiable_finetuning/ | head -20 | xargs wc -l | tail -1

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 82


🏁 Script executed:

# Check imports and overall file structure
head -30 experiments/verifiable_finetuning/utils.py

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 729


🏁 Script executed:

# Check for comments or git history around this parameter
git log -p --all -S "warn_only" -- experiments/verifiable_finetuning/utils.py 2>/dev/null | head -50

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 54


🏁 Script executed:

# Check for any exception handling or try-except around set_seed calls
rg "try:|except:|set_seed" experiments/verifiable_finetuning/ -B 2 -A 2

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 1623


🏁 Script executed:

# Look for any TODOs or comments in the file
grep -n "TODO\|FIXME\|NOTE\|XXX\|#" experiments/verifiable_finetuning/utils.py

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 54


Remove warn_only=True to enforce strict determinism.

The warn_only=True parameter allows nondeterministic operations to proceed with warnings instead of raising errors. Given that this project's core objective is verifiable and reproducible training with cryptographic verification of model states, nondeterministic operations silently degrade the verification guarantee.

Suggested fix
-    torch.use_deterministic_algorithms(True, warn_only=True)
+    torch.use_deterministic_algorithms(True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/utils.py` around lines 11 - 16, In
set_seed, enforce strict deterministic behavior by removing the warn_only=True
argument from the torch.use_deterministic_algorithms call (i.e., call
torch.use_deterministic_algorithms(True) so nondeterministic ops raise errors);
update the set_seed function to call torch.use_deterministic_algorithms(True) to
ensure strict determinism for reproducible, verifiable training.

@github-actions github-actions bot added size/XL and removed size/XL labels Mar 28, 2026
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@experiments/verifiable_finetuning/utils.py`:
- Around line 21-25: The hash_file function currently reads entire file into
memory via f.read(), which will OOM on large checkpoints; modify hash_file to
read the file in fixed-size chunks (e.g., 64KB or 1MB) in a loop and call
hasher.update(chunk) for each chunk until EOF, keeping the rest of the function
(hasher = hashlib.sha256(), open(file_path, "rb"), return hasher.hexdigest())
the same so it streams and hashes arbitrarily large files without loading them
entirely.
- Around line 33-42: The update_manifest function currently assumes
manifest_path contains valid JSON; if json.load raises json.JSONDecodeError the
function will crash with a confusing error. Wrap the open+json.load block in a
try/except catching json.JSONDecodeError (and optionally IOError), and on decode
error either reinitialize manifest = {} or raise a clearer exception mentioning
manifest_path and the original error; ensure you still write the new
manifest[stage] = data and that the file is overwritten when recovering.
Reference: function update_manifest, variable manifest_path, exception
json.JSONDecodeError.
- Around line 28-30: The save_deterministic function currently uses the internal
parameter _use_new_zipfile_serialization which is undocumented and may be
removed; update save_deterministic to guard or document this: check
torch.__version__ (or use packaging.version.parse) and only pass
_use_new_zipfile_serialization when running a PyTorch version known to support
it (e.g., 2.11), otherwise fall back to calling torch.save without that kwarg or
use an alternate deterministic serialization approach, and add a short
comment/docstring near save_deterministic describing the version requirement and
why the branch exists; ensure hash_file is still called on the saved path in
both branches.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 696fb768-980c-43f7-97ad-a207582bcc3c

📥 Commits

Reviewing files that changed from the base of the PR and between b9829a6 and b025e75.

📒 Files selected for processing (1)
  • experiments/verifiable_finetuning/utils.py

Comment on lines +21 to +25
def hash_file(file_path):
    hasher = hashlib.sha256()
    with open(file_path, "rb") as f:
        hasher.update(f.read())
    return hasher.hexdigest()
Contributor


🧹 Nitpick | 🔵 Trivial

Use chunked reading for large checkpoint files.

f.read() loads the entire file into memory. This works for the tiny model POC but will cause memory issues with real-world checkpoints (often GBs).

♻️ Chunked hashing for large files
 def hash_file(file_path):
     hasher = hashlib.sha256()
     with open(file_path, "rb") as f:
-        hasher.update(f.read())
+        for chunk in iter(lambda: f.read(8192), b""):
+            hasher.update(chunk)
     return hasher.hexdigest()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/utils.py` around lines 21 - 25, The
hash_file function currently reads entire file into memory via f.read(), which
will OOM on large checkpoints; modify hash_file to read the file in fixed-size
chunks (e.g., 64KB or 1MB) in a loop and call hasher.update(chunk) for each
chunk until EOF, keeping the rest of the function (hasher = hashlib.sha256(),
open(file_path, "rb"), return hasher.hexdigest()) the same so it streams and
hashes arbitrarily large files without loading them entirely.

Comment on lines +28 to +30
def save_deterministic(state_dict, path):
    torch.save(state_dict, path, _use_new_zipfile_serialization=False)
    return hash_file(path)
Contributor


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

PyTorch torch.save _use_new_zipfile_serialization parameter deprecated removed

💡 Result:

The _use_new_zipfile_serialization parameter in torch.save is an internal (underscore-prefixed) option introduced around PyTorch 1.6 to control whether to use the new zipfile-based serialization format (True, default since 1.6) or the legacy pickle-based format (False). It is not officially documented in the torch.save API reference pages across versions including 2.11. No evidence found in official documentation, source code snippets, release notes (up to 2.11.0 as of 2026-03-28), or GitHub issues/PRs indicating it has been formally deprecated or removed. The parameter still appears in serialization source code comments referencing PyTorch 1.6 behavior and is mentioned in legacy docs (e.g., 1.6.0 notes). Current serialization docs (stable/main) describe the zip format as default since 1.6 without mentioning the parameter, suggesting it remains supported for backward compatibility to produce legacy files but is unneeded for default behavior. As an undocumented internal parameter, its presence is not guaranteed across versions, but it has not been removed up to the latest PyTorch 2.11.

Address undocumented internal parameter for deterministic serialization.

The _use_new_zipfile_serialization parameter is undocumented and internal (underscore-prefixed), with no official stability guarantees. While currently supported in PyTorch 2.11, relying on undocumented parameters creates fragility—future versions may remove it or change its behavior without notice, potentially breaking hash reproducibility across environments. Consider documenting the PyTorch version constraints or adding a version check to ensure consistent serialization behavior, or evaluate alternative approaches to deterministic saving if the parameter is not available.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/utils.py` around lines 28 - 30, The
save_deterministic function currently uses the internal parameter
_use_new_zipfile_serialization which is undocumented and may be removed; update
save_deterministic to guard or document this: check torch.__version__ (or use
packaging.version.parse) and only pass _use_new_zipfile_serialization when
running a PyTorch version known to support it (e.g., 2.11), otherwise fall back
to calling torch.save without that kwarg or use an alternate deterministic
serialization approach, and add a short comment/docstring near
save_deterministic describing the version requirement and why the branch exists;
ensure hash_file is still called on the saved path in both branches.
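One way to build the guard described in the prompt is to check, via `inspect.signature`, whether a callable accepts a given keyword argument before passing it. The helper below is a generic standard-library sketch; the name `supports_kwarg` is hypothetical, and wiring it to `torch.save` and `_use_new_zipfile_serialization` follows the prompt above:

```python
import inspect

def supports_kwarg(func, name):
    """Return True if `func` appears to accept keyword argument `name`.

    Falls back to False when the signature cannot be introspected,
    which happens for some C-implemented callables.
    """
    try:
        params = inspect.signature(func).parameters
    except (TypeError, ValueError):
        return False
    if name in params:
        return True
    # A **kwargs parameter accepts any keyword name.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())
```

A `save_deterministic` built on this would pass the internal kwarg only when `supports_kwarg(torch.save, "_use_new_zipfile_serialization")` holds and otherwise fall back to a plain `torch.save(state_dict, path)`, calling `hash_file(path)` in both branches.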

Comment on lines +33 to +42
def update_manifest(stage, data, manifest_path="manifest.json"):
manifest = {}
if os.path.exists(manifest_path):
with open(manifest_path, "r") as f:
manifest = json.load(f)

manifest[stage] = data

with open(manifest_path, "w") as f:
json.dump(manifest, f, indent=4)
Contributor

🧹 Nitpick | 🔵 Trivial

Consider adding error handling for malformed JSON.

If manifest.json exists but contains invalid JSON, json.load() will raise JSONDecodeError with a potentially confusing error. For robustness, consider catching this and either re-initializing or providing a clearer error message.

♻️ Optional: Add error handling
 def update_manifest(stage, data, manifest_path="manifest.json"):
     manifest = {}
     if os.path.exists(manifest_path):
-        with open(manifest_path, "r") as f:
-            manifest = json.load(f)
+        try:
+            with open(manifest_path, "r") as f:
+                manifest = json.load(f)
+        except json.JSONDecodeError:
+            print(f"Warning: {manifest_path} is malformed, reinitializing.")
+            manifest = {}

     manifest[stage] = data

     with open(manifest_path, "w") as f:
         json.dump(manifest, f, indent=4)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/utils.py` around lines 33 - 42, The
update_manifest function currently assumes manifest_path contains valid JSON; if
json.load raises json.JSONDecodeError the function will crash with a confusing
error. Wrap the open+json.load block in a try/except catching
json.JSONDecodeError (and optionally IOError), and on decode error either
reinitialize manifest = {} or raise a clearer exception mentioning manifest_path
and the original error; ensure you still write the new manifest[stage] = data
and that the file is overwritten when recovering. Reference: function
update_manifest, variable manifest_path, exception json.JSONDecodeError.
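Putting the suggested error handling together, a self-contained robust version might look like this (a sketch assuming the recover-and-reinitialize strategy rather than re-raising):

```python
import json
import os

def update_manifest(stage, data, manifest_path="manifest.json"):
    """Merge `data` under key `stage` into the JSON manifest,
    recovering from a malformed existing file instead of crashing."""
    manifest = {}
    if os.path.exists(manifest_path):
        try:
            with open(manifest_path, "r") as f:
                manifest = json.load(f)
        except json.JSONDecodeError as e:
            # Malformed manifest: warn and start fresh rather than raise
            print(f"Warning: {manifest_path} is malformed ({e}); reinitializing.")
            manifest = {}

    manifest[stage] = data

    with open(manifest_path, "w") as f:
        json.dump(manifest, f, indent=4)
```

On recovery the corrupt file is simply overwritten with the new single-stage manifest, so subsequent stages still accumulate normally.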

@github-actions github-actions bot added size/XL and removed size/XL labels Mar 28, 2026
@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (7)
experiments/verifiable_finetuning/utils.py (3)

15-19: ⚠️ Potential issue | 🟠 Major

Enforce strict determinism in set_seed.

Line 18 uses warn_only=True, which allows nondeterministic ops to proceed and weakens reproducibility guarantees for this verification flow.

Suggested fix
 def set_seed(seed=42):
     """Locks in deterministic behavior."""
     torch.manual_seed(seed)
-    torch.use_deterministic_algorithms(True, warn_only=True)
+    torch.use_deterministic_algorithms(True)
In current PyTorch docs, does torch.use_deterministic_algorithms(True, warn_only=True) allow nondeterministic operations to continue with warnings instead of raising?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/utils.py` around lines 15 - 19, The
set_seed function currently calls torch.use_deterministic_algorithms(True,
warn_only=True) which only warns on nondeterministic ops; change this to enforce
strict determinism by calling torch.use_deterministic_algorithms(True,
warn_only=False) so PyTorch raises on nondeterministic ops, and also make the
function fully deterministic by seeding CUDA (torch.cuda.manual_seed_all) and
setting backend flags (torch.backends.cudnn.deterministic = True and
torch.backends.cudnn.benchmark = False) in the set_seed implementation.

29-32: ⚠️ Potential issue | 🟡 Minor

Guard usage of internal torch.save parameter.

Line 31 depends on _use_new_zipfile_serialization, an internal/underscore API with no compatibility guarantee across versions.

Suggested fix
 def save_deterministic(state_dict, path):
     """Saves state dict without zip metadata to ensure identical hashes."""
-    torch.save(state_dict, path, _use_new_zipfile_serialization=False)
+    try:
+        torch.save(state_dict, path, _use_new_zipfile_serialization=False)
+    except TypeError as e:
+        raise RuntimeError(
+            "This reproducibility flow requires torch.save support for "
+            "_use_new_zipfile_serialization=False."
+        ) from e
     return hash_file(path)
Is _use_new_zipfile_serialization an officially documented/stable torch.save parameter in latest PyTorch, and is its behavior guaranteed across releases?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/utils.py` around lines 29 - 32, The use of
the internal kwarg _use_new_zipfile_serialization in save_deterministic is
unsafe across PyTorch versions; update save_deterministic to detect support for
that parameter (e.g., inspect.signature(torch.save) or checking
torch.__version__ / attribute presence) and call torch.save with
_use_new_zipfile_serialization only when supported, otherwise fall back to
calling torch.save without that kwarg; ensure the function still returns
hash_file(path) and keep the function name save_deterministic unchanged.

21-26: 🧹 Nitpick | 🔵 Trivial

Hash files in chunks to avoid high memory usage.

Line 25 reads the entire checkpoint into memory; this will not scale once checkpoints grow.

Suggested fix
 def hash_file(filepath):
     """Generates a SHA-256 hash of a file."""
     hasher = hashlib.sha256()
     with open(filepath, "rb") as f:
-        hasher.update(f.read())
+        for chunk in iter(lambda: f.read(1024 * 1024), b""):
+            hasher.update(chunk)
     return hasher.hexdigest()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/utils.py` around lines 21 - 26, The
hash_file function currently reads the entire file into memory which will OOM
for large checkpoints; change hash_file to stream the file in fixed-size chunks
(e.g., 64KB) and call hashlib.sha256().update on each chunk in a loop while
reading until EOF, then return hexdigest; keep the file opened in binary mode
and preserve the existing function name hash_file to locate and replace the
current implementation.
experiments/verifiable_finetuning/train_base.py (2)

35-36: ⚠️ Potential issue | 🟠 Major

Do not print unconditional “CORRECT MATCH”.

Line 36 can report success even when hashes differ.

Suggested fix
     print(f"Base run hash: {ckpt_hash}")
-    print(f"Base run hash (again): {hash_file(ckpt_path)}   CORRECT MATCH")
+    actual_hash = hash_file(ckpt_path)
+    status = "CORRECT MATCH" if actual_hash == ckpt_hash else "MISMATCH"
+    print(f"Base run hash (again): {actual_hash}   {status}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/train_base.py` around lines 35 - 36, Remove
the hardcoded "CORRECT MATCH" message and instead compare ckpt_hash to
hash_file(ckpt_path): call hash_file(ckpt_path), compare the result to
ckpt_hash, and print a single informative message that includes both hashes and
a conditional "MATCH" or "MISMATCH" comment; update the two print statements
around ckpt_hash, hash_file, and ckpt_path (the variables referenced) so the
success text is only shown when the equality check passes.

31-33: ⚠️ Potential issue | 🟠 Major

dataset_hash should be an actual digest, not a label.

Line 32 stores a static string, so the manifest does not cryptographically bind the exact dataset used in this run.

Suggested fix
+import hashlib
 import torch
 import torch.nn as nn
 from utils import get_path, hash_file, save_deterministic, set_seed, update_manifest
@@
-    update_manifest(
-        "base", {"seed": 99, "dataset_hash": "synthetic_16x10_seed99", "checkpoint_hash": ckpt_hash}
-    )
+    dataset_bytes = X.detach().cpu().numpy().tobytes() + y.detach().cpu().numpy().tobytes()
+    dataset_hash = hashlib.sha256(dataset_bytes).hexdigest()
+    update_manifest("base", {"seed": 99, "dataset_hash": dataset_hash, "checkpoint_hash": ckpt_hash})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/train_base.py` around lines 31 - 33, The
manifest is being written with a static label instead of a cryptographic digest;
replace the hardcoded "synthetic_16x10_seed99" in the update_manifest call with
a real digest computed from the dataset used in this run (e.g., compute a
SHA-256 of the dataset file(s) or a canonical representation) and pass that
digest as the dataset_hash argument to update_manifest; locate the call to
update_manifest in train_base.py and use the computed_digest variable (or a new
helper like compute_dataset_hash(dataset_path)) so update_manifest("base",
{"seed": 99, "dataset_hash": computed_digest, "checkpoint_hash": ckpt_hash})
records the actual dataset digest.
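The digest step can be sketched without PyTorch: serialize the arrays to canonical bytes and hash the concatenation. In the actual script the blobs would come from `X.detach().cpu().numpy().tobytes()` and the same for `y`; the helper name `compute_dataset_hash` is hypothetical:

```python
import hashlib

def compute_dataset_hash(*byte_blobs):
    """SHA-256 over the concatenation of canonical byte representations
    of the dataset arrays. Order matters, so keep it fixed (e.g., X then y)."""
    hasher = hashlib.sha256()
    for blob in byte_blobs:
        hasher.update(blob)
    return hasher.hexdigest()
```

Because SHA-256 hashes the byte stream, `compute_dataset_hash(x_bytes, y_bytes)` equals hashing the concatenated bytes in one call, and any change to either array changes the digest recorded in the manifest.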
experiments/verifiable_finetuning/finetune.py (1)

24-27: 🧹 Nitpick | 🔵 Trivial

Avoid .data mutation in parameter updates.

Line 26 should use an in-place op directly on the tensor under no_grad() instead of mutating .data.

Suggested fix
     with torch.no_grad():
         for param in model.parameters():
-            param.data += 0.001
+            param.add_(0.001)
In current PyTorch guidance, is direct .data mutation discouraged compared with in-place ops (e.g., param.add_) inside torch.no_grad()?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/finetune.py` around lines 24 - 27, The code
mutates parameter tensors via param.data inside the torch.no_grad() block;
replace the .data mutation with an in-place tensor operation (e.g., call
param.add_ or param.mul_ as appropriate) so updates occur safely without using
.data — locate the loop iterating model.parameters() in the torch.no_grad()
context and change the param.data += 0.001 line to an in-place tensor op like
param.add_(0.001).
experiments/verifiable_finetuning/manifest.py (1)

15-15: ⚠️ Potential issue | 🟡 Minor

Fix typo in verification banner.

Line 15 prints “Plipeline”; this should be “Pipeline”.

Suggested fix
-    print(" End to End Plipeline Verification \n")
+    print(" End to End Pipeline Verification \n")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/verifiable_finetuning/manifest.py` at line 15, The printed banner
string contains a typo: replace the text " End to End Plipeline Verification \n"
in the print statement with " End to End Pipeline Verification \n" (i.e., fix
"Plipeline" → "Pipeline") so the verification banner reads correctly; locate the
print call that emits the banner and update the literal accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 429788a8-597b-4905-92ba-576b011cbba3

📥 Commits

Reviewing files that changed from the base of the PR and between b025e75 and bd1681f.

📒 Files selected for processing (4)
  • experiments/verifiable_finetuning/finetune.py
  • experiments/verifiable_finetuning/manifest.py
  • experiments/verifiable_finetuning/train_base.py
  • experiments/verifiable_finetuning/utils.py
