Add score_answer versioning and historical scorer resolution by sileod · Pull Request #3 · sileod/reasoning-core

sileod · 2026-03-22T20:41:21Z

Provide stable, auditable scorer behavior by recording scorer version, content hash, and commit with each generated example to allow reproducible scoring.
Allow loading legacy score_answer implementations by version, hash, or from a historical commit/file so that scorer changes remain backward-compatible.
Surface the change in the task authoring guide so task authors know to bump versions when changing scorers.
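To make the recorded metadata concrete, here is a minimal sketch of how a version, content hash, and commit could be attached to a generated example. The helper names `scorer_hash` and `build_score_answer_metadata`, the whitespace normalization step, and the `(answer, entry)` scorer signature are assumptions for illustration; the actual helpers in this PR may differ.

```python
import hashlib
import textwrap
from typing import Optional


def scorer_hash(source: str) -> str:
    """Hash scorer source text (hypothetical helper, not the PR's API).

    Dedenting and stripping first means that re-indenting the scorer
    does not change its hash; any other edit does.
    """
    normalized = textwrap.dedent(source).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_score_answer_metadata(
    version: int, source: str, commit: Optional[str] = None
) -> dict:
    """Bundle the three fields the PR describes: version, hash, commit."""
    return {"version": version, "hash": scorer_hash(source), "commit": commit}


# Illustrative scorer source; the signature is an assumption.
SCORER_SOURCE = '''
def score_answer(answer, entry):
    return float(answer.strip() == entry["answer"])
'''

example = {"prompt": "2 + 2 = ?", "answer": "4"}
# commit is left as None here; a real implementation would look it up
# from the repository.
example["_score_answer"] = build_score_answer_metadata(1, SCORER_SOURCE)
```

Storing the hash alongside the version gives a way to detect a scorer that changed without a version bump.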

Add a new reasoning_core/score_answer_history.py module to compute scorer hashes, locate repository commits, load historical scorer source from files or git commits, resolve callables, and return the appropriate scoring function via resolve_score_answer_fn.
Extend reasoning_core/template.py to declare Task.score_answer_version and Task.score_answer_history, provide helpers score_answer_hash, resolve_score_answer_fn, and score_answer_for_entry, and record _score_answer metadata (version, hash, commit) in generate_example().
Update reasoning_core/__init__.py to invoke the new per-entry scoring entrypoint via DATASETS[task_name].score_answer_for_entry(...) so scorers chosen per-entry are applied correctly.
Update TASK_AUTHORING_GUIDE.md to document the _score_answer metadata and the recommended workflow for bumping scorer compatibility.
Add unit tests in tests/test_score_answer_versioning.py covering default metadata recording, loading a legacy scorer from a file via score_answer_history, and error behavior when a requested legacy version is not registered.
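The per-entry resolution described above can be sketched roughly as follows. This is not the code from `score_answer_history.py`: the function names, the shape of the history mapping (version to callable or source string), and the `(answer, entry)` scorer signature are all assumptions made for illustration. Only the `_score_answer` key and the KeyError on an unregistered version come from the PR description.

```python
from typing import Callable, Dict, Union

Scorer = Callable[[str, dict], float]


def load_scorer_from_source(source: str, name: str = "score_answer") -> Scorer:
    """Execute legacy scorer source and return the named callable.

    Only suitable for trusted source, e.g. files from the same repository.
    """
    namespace: dict = {}
    exec(compile(source, "<legacy-scorer>", "exec"), namespace)
    fn = namespace.get(name)
    if not callable(fn):
        raise ValueError(f"source does not define a callable {name!r}")
    return fn


def resolve_scorer_sketch(
    entry: dict,
    current: Scorer,
    current_version: int,
    history: Dict[int, Union[Scorer, str]],
) -> Scorer:
    """Pick the scorer matching the version recorded on the entry."""
    meta = entry.get("_score_answer") or {}
    version = meta.get("version", current_version)
    if version == current_version:
        return current
    if version not in history:
        raise KeyError(f"no legacy scorer registered for version {version}")
    legacy = history[version]
    return load_scorer_from_source(legacy) if isinstance(legacy, str) else legacy


def score_v2(answer: str, entry: dict) -> float:
    # Current scorer in this sketch: exact match.
    return float(answer == entry["answer"])


# Legacy scorer kept as source text: case-insensitive match.
LEGACY_V1 = '''
def score_answer(answer, entry):
    return float(answer.lower() == entry["answer"].lower())
'''

history = {1: LEGACY_V1}
old_entry = {"answer": "Paris", "_score_answer": {"version": 1}}
new_entry = {"answer": "Paris", "_score_answer": {"version": 2}}

old_score = resolve_scorer_sketch(old_entry, score_v2, 2, history)("paris", old_entry)
new_score = resolve_scorer_sketch(new_entry, score_v2, 2, history)("paris", new_entry)
```

In this sketch an entry generated under version 1 keeps its lenient scoring, while a version 2 entry gets the stricter current scorer.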

Ran pytest tests/test_score_answer_versioning.py; all of the new tests passed.
The tests check that _score_answer metadata is recorded, that a legacy scorer file can be loaded for a historical version, and that a missing score_answer_history entry raises a KeyError.

Extract score_answer history helpers

39c007a

sileod added the codex label Mar 22, 2026 — with ChatGPT Codex Connector
