Add plan for visualizing experiment results in the debugger #233
smirnovlad wants to merge 10 commits into main from
Conversation
The original plan incorrectly assumed step_candidates and all_trajectories are saved in results.json. They are not — strategies return them but run_tts_eval.py only saves a subset of fields. Added Step 0 prerequisite to save tree-building data before the converter can work.
… experiments in visual debugger
smirnovlad left a comment
(Reposting with proper formatting — ignore the previous review)
Nice implementation — follows the plan well and the app.js changes for offline mode / sample navigation are solid. One critical bug and a couple of minor things.
Critical: import before sys.path (convert_results_to_debugger.py:23)
from service_app.core.debugger_events import ... runs before sys.path.insert on line 26, so Python can't find the service_app package → ModuleNotFoundError on every run.
Fix: move PROJECT_ROOT / sys.path.insert above the import:
PROJECT_ROOT = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(PROJECT_ROOT))
from service_app.core.debugger_events import convert_strategy_result_to_debugger_run
Minor: import yaml dependency
pyyaml is not in the base deps (only in service extras). Consider adding a note or using omegaconf (already a dep via hydra).
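One way to avoid a hard pyyaml requirement is a small loader that prefers pyyaml and falls back to omegaconf (already a transitive dependency via hydra). This is a sketch, not the code in this PR; `load_yaml_config` is a hypothetical helper:

```python
import importlib.util


def load_yaml_config(path):
    """Load a YAML file using whichever parser is installed.

    Hypothetical helper: prefers pyyaml, falls back to omegaconf
    (pulled in transitively by hydra), and fails with a clear
    message if neither is available.
    """
    if importlib.util.find_spec("yaml") is not None:
        import yaml

        with open(path) as f:
            return yaml.safe_load(f)
    if importlib.util.find_spec("omegaconf") is not None:
        from omegaconf import OmegaConf

        return OmegaConf.to_container(OmegaConf.load(path), resolve=True)
    raise ImportError("Install pyyaml or omegaconf to load YAML configs")
```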
Nit: speculative keys in run_tts_eval.py
all_traces and all_trajectory_steps — I don't see any strategy returning these keys. Not harmful (the if key in result guard skips them), but consider removing to avoid confusion.
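The guard pattern under discussion looks roughly like this; the helper name and the key list (a subset of the keys mentioned in this thread) are illustrative, not the actual `run_tts_eval.py` code:

```python
# Tree-building fields that strategies may return; only those actually
# present in a result are copied, so unknown keys are skipped silently
# instead of raising KeyError.
TREE_KEYS = [
    "step_candidates",
    "all_trajectories",
    "all_traces",
    "all_trajectory_steps",
]


def extract_tree_data(result: dict) -> dict:
    # Illustrative helper mirroring the `if key in result` guard.
    return {key: result[key] for key in TREE_KEYS if key in result}
```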
Serialization: looks good
Checked: save_results_json uses safe_serialize() which handles StepCandidate objects. And for beam search, step_candidates entries are already plain dicts. No issues here.
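For context, a recursive serializer like the `safe_serialize()` referenced here typically looks something like the sketch below; the project's real implementation may handle more cases:

```python
import dataclasses


def safe_serialize(obj):
    """Recursively convert objects (e.g. StepCandidate instances) into
    JSON-friendly structures. Minimal sketch only."""
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return safe_serialize(dataclasses.asdict(obj))
    if isinstance(obj, dict):
        return {k: safe_serialize(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [safe_serialize(v) for v in obj]
    if isinstance(obj, (str, int, float, bool)) or obj is None:
        return obj
    if hasattr(obj, "__dict__"):
        return safe_serialize(vars(obj))
    return str(obj)  # last resort: stringify unknown types
```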
app.js offline mode: looks good
Graceful fallback to cached examples when API is down, auto-enable cached mode toggle, sample navigation with prev/next and incorrect filter — all solid.
- Move sys.path.insert before service_app import to avoid ModuleNotFoundError
- Remove all_traces and all_trajectory_steps keys (no strategy returns them)
Both are used by debugger_events.py: all_traces for self-consistency visualization, all_trajectory_steps for offline BoN step-level display.
smirnovlad left a comment
I've pushed two follow-up commits on this branch:
- 1069ce5 — Fixed the critical import order bug in convert_results_to_debugger.py (moved sys.path.insert before the service_app import)
- 7d3e57e — Restored the all_traces and all_trajectory_steps keys that I incorrectly removed (they are used by debugger_events.py for self-consistency and offline BoN visualization)
Remaining suggestion: move scripts/README_debugger.md to docs/service/debugger.md (see comment above).
smirnovlad left a comment
Final review after fixes
All previously identified issues have been resolved:
- Import order — fixed (sys.path.insert now comes before service_app import)
- Linting — flake8 (noqa: E402), black, isort all pass
- Tree data keys — all 7 keys restored; verified each is used by at least one strategy + debugger_events.py
- Docs location — moved to docs/service/debugger.md
What this PR does
- run_tts_eval.py — saves tree visualization data that strategies already compute but was previously discarded
- convert_results_to_debugger.py — new converter script that transforms experiment results into debugger-compatible JSON
- app.js + index.html — file upload, sample navigation (prev/next), "incorrect only" filter, offline mode fallback
- docs/service/debugger.md — usage docs
- Test config — quick beam search config for MATH-500 with OpenRouter
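The converter's overall shape, per the description above, is a CLI that reads results.json and writes debugger-compatible JSON. A hypothetical skeleton follows; function names, flags, and the transform itself are assumptions, not the PR's actual code:

```python
import argparse
import json
from pathlib import Path


def convert(results: dict) -> dict:
    """Placeholder transform. The real script delegates to
    service_app.core.debugger_events.convert_strategy_result_to_debugger_run."""
    return {"scenarios": results.get("samples", [])}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Convert experiment results into debugger JSON (sketch)"
    )
    parser.add_argument("results", type=Path, help="path to results.json")
    parser.add_argument(
        "--output",
        type=Path,
        default=Path("cached_examples.json"),
        help="where to write the debugger-compatible JSON",
    )
    return parser


def run(argv=None) -> None:
    args = build_parser().parse_args(argv)
    payload = json.loads(args.results.read_text())
    args.output.write_text(json.dumps(convert(payload), indent=2))
```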
Non-blocking note
pyyaml dependency: the converter uses import yaml which requires pyyaml. It is available via hydra (already a dep for eval), but worth mentioning in docs if someone runs the converter in a minimal env.
This should be tested end-to-end by the author: run an experiment with the updated run_tts_eval.py, then run the converter and verify the debugger renders the tree.
LGTM from my side.
Improve experiment visualizer: add --merge flag, timestamps, and UI fixes
- Add --merge flag to converter for accumulating results in cached_examples.json
- Add run timestamps and dataset index to scenario titles
- Add requires_scorer field to fix self-consistency "Run" button
- Use unique IDs per strategy to prevent scenario collisions when combining
- Replace browser-native file input with English "Choose JSON file…" button
- Auto-run cached example when switching scenarios
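Merging by unique scenario ID, as described above, can be sketched like this; the function and field names are illustrative, not the converter's actual code:

```python
def merge_scenarios(existing: list, new: list) -> list:
    """Accumulate scenarios for cached_examples.json: a new entry whose
    'id' is already known replaces the old one, everything else is
    appended. Unique per-strategy IDs keep combined runs from colliding.
    Illustrative sketch of the --merge behavior described above."""
    by_id = {scenario["id"]: scenario for scenario in existing}
    for scenario in new:
        by_id[scenario["id"]] = scenario  # replace on ID match, append otherwise
    return list(by_id.values())
```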