treedb: add template lab tooling and class-aware rewrite observability #854

snissn wants to merge 3 commits into followon/vlog-dict-class-split-experiment
Conversation

Tracking ticket with full experiment details and exact reproduction commands is now filed at: #855. Key outcome from this run:
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0f8896fcf8
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
```go
if mode == "mixed" {
	prefixLimit = pProbe
}
```
Keep prefix probe limit independent from pointer override
In mixed mode, the prefix probe branch assigns prefixLimit = pProbe, so -pointer-probe unexpectedly changes how many prefix records are probed. The CLI help says -pointer-probe only overrides pointer probing, so this silently skews probe_attempted/probe_kept and can under/over-sample prefix results whenever -probe and -pointer-probe differ.
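A minimal sketch of the decoupling the review asks for, with illustrative names (`resolveProbeLimits`, `probe`, `pointerProbe`) since the surrounding flag variables are not shown in full:

```go
package main

import "fmt"

// resolveProbeLimits keeps the prefix probe limit tied only to -probe,
// applying the -pointer-probe override to the pointer limit alone.
// Names are hypothetical; the real CLI variables differ.
func resolveProbeLimits(mode string, probe, pointerProbe int) (prefixLimit, pointerLimit int) {
	prefixLimit = probe
	pointerLimit = probe
	if pointerProbe > 0 {
		// Per the CLI help, the override affects pointer probing only.
		pointerLimit = pointerProbe
	}
	_ = mode // mixed vs. single mode no longer changes prefixLimit
	return prefixLimit, pointerLimit
}

func main() {
	pfx, ptr := resolveProbeLimits("mixed", 1000, 250)
	fmt.Println(pfx, ptr) // prefix stays at 1000; pointer uses the 250 override
}
```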
```go
pointerWriter, err := newCorpusWriter(pointerPath, *overwrite)
if err != nil {
	log.Fatalf("open pointer corpus: %v", err)
}
```
Honor skip flags before opening corpus output files
The command creates pointer_values.bin and outer_leaf_pages.bin before checking -skip-pointer / -skip-outer-leaf. This means a skipped section can still fail due to an existing file when -overwrite=false, and with overwrite enabled it can truncate a file the user intended to keep. Partial reruns should avoid opening (and thus touching) outputs for skipped datasets.
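One way to honor the skip flags before touching outputs, sketched with a stand-in for `newCorpusWriter` (the real writer's signature is not shown here):

```go
package main

import (
	"fmt"
	"os"
)

// openCorpusIfNeeded only touches the output file when the section is not
// skipped, so -skip-pointer / -skip-outer-leaf reruns never truncate or
// collide with existing outputs. This stands in for newCorpusWriter.
func openCorpusIfNeeded(path string, skip, overwrite bool) (*os.File, error) {
	if skip {
		return nil, nil // skipped sections never open (or truncate) outputs
	}
	flags := os.O_WRONLY | os.O_CREATE
	if overwrite {
		flags |= os.O_TRUNC
	} else {
		flags |= os.O_EXCL // fail instead of clobbering an existing file
	}
	return os.OpenFile(path, flags, 0o644)
}

func main() {
	f, err := openCorpusIfNeeded("pointer_values.bin", true, false)
	fmt.Println(f == nil, err == nil) // skipped: no handle opened, no error
}
```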
Pull request overview
This PR advances the template-compression investigation by wiring template store/config into the value-log rewrite path, adding class-aware rewrite observability, and introducing lab tooling to extract corpora and run template sweeps.
Changes:
- Add template rewrite stats (attempted/kept/input/output + per-class reason counters) and expose them in `treemap vlog-rewrite`.
- Wire `TemplateStore` into rewrite options and side-store lookup plumbing.
- Add new lab commands: `template_corpus_extract`, `template_lab`, and `template_seed_train` (+ docs/benchmark notes) to support corpus extraction, sweeps, and store seeding.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.
Summary per file
| File | Description |
|---|---|
| docs/benchmarks/CELESTIA_VLOG_COMPRESSION_NOTES_2026-03-26.md | Adds reproducible commands/results for the template lab sweep. |
| TreeDB/vlog_rewrite.go | Public treedb rewrite stats extended with template/class counters and reason maps. |
| TreeDB/db/vlog_rewrite.go | Implements template-aware rewrite writer, applies template compression during offline rewrite, and plumbs stats/reason snapshots. |
| TreeDB/db/db.go | Adds ValueLogOptions.TemplateStore to support template encoding (e.g., rewrite prepass). |
| TreeDB/side_store_lookups.go | Wires templatedb lookups and conditionally sets TemplateStore when template mode is enabled. |
| TreeDB/cmd/treemap/main.go | Extends vlog-rewrite command with template flags, templatedb wiring, and class-aware output lines. |
| TreeDB/db/vlog_rewrite_test.go | Adds rewrite writer tests validating template prepass ordering and template-only dict bypass. |
| TreeDB/caching/db.go | Adds per-record dict-class range splitting and appends mixed-class batches via multiple dict IDs. |
| TreeDB/caching/vlog_dict_classifier_test.go | Adds tests for class-range splitting behavior (split outer-leaf vs single mode). |
| TreeDB/cmd/template_corpus_extract/main.go | New command to extract pointer/outer-leaf corpora + manifest. |
| TreeDB/cmd/template_corpus_extract/README.md | Documents corpus extraction output format and usage. |
| TreeDB/cmd/template_lab/main.go | New sweep harness for template configs and optional outer-leaf pretransform experiments. |
| TreeDB/cmd/template_lab/README.md | Documents lab harness input format, flags, and notes. |
| TreeDB/cmd/template_seed_train/main.go | New command to seed/warm templatedb from an existing app dir. |
| TreeDB/cmd/template_seed_train/README.md | Documents seeding usage and behavior. |
```go
func (s *gzipRecordSizer) WriteRecord(payload []byte) error {
	if s == nil || s.gw == nil {
		return errors.New("nil gzip sizer")
	}
	if len(payload) > int(^uint32(0)) {
		return fmt.Errorf("record too large: %d", len(payload))
	}
	var hdr [4]byte
	binary.LittleEndian.PutUint32(hdr[:], uint32(len(payload)))
	if _, err := s.gw.Write(hdr[:]); err != nil {
		return err
	}
	if len(payload) == 0 {
		return nil
	}
	_, err := s.gw.Write(payload)
	return err
}

func (s *gzipRecordSizer) Close() (int64, error) {
	if s == nil || s.gw == nil {
		return 0, nil
	}
	if err := s.gw.Close(); err != nil {
		return 0, err
	}
	return s.cw.n, nil
}

func readCorpusFile(path string, maxRecords int) ([][]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	r := bufio.NewReaderSize(f, 1<<20)
	out := make([][]byte, 0, 1024)
	for {
		if maxRecords > 0 && len(out) >= maxRecords {
			break
		}
		var hdr [4]byte
		if _, err := io.ReadFull(r, hdr[:]); err != nil {
			if errors.Is(err, io.EOF) {
				break
			}
			if errors.Is(err, io.ErrUnexpectedEOF) {
				return nil, fmt.Errorf("truncated record header in %s", path)
			}
			return nil, err
		}
		n := binary.LittleEndian.Uint32(hdr[:])
		buf := make([]byte, int(n))
		if n > 0 {
			if _, err := io.ReadFull(r, buf); err != nil {
				return nil, err
			}
		}
```
template_lab has the same 32-bit overflow issue in WriteRecord: len(payload) > int(^uint32(0)) will wrap on 32-bit platforms and reject all records. Also, readCorpusFile converts the on-disk uint32 length to int without checking n fits in int, which can overflow/panic on 32-bit or when reading corrupted inputs. Prefer uint64 bounds checks before converting/allocating.
```go
templateDir := filepath.Join(*appDir, "templatedb")
tmplDB, err := treedbdb.Open(treedbdb.Options{Dir: templateDir, DisableBackgroundPrune: true})
if err != nil {
	log.Fatalf("open templatedb: %v", err)
}
defer tmplDB.Close()
store := templatedb.New(templateBackendKV{db: tmplDB}, templatedb.Config{})

mainDir := *appDir
if info, statErr := os.Stat(filepath.Join(*appDir, "maindb")); statErr == nil && info.IsDir() {
	mainDir = filepath.Join(*appDir, "maindb")
}
backendOpts := treedbdb.Options{
	Dir:                    mainDir,
	ReadOnly:               true,
	DisableBackgroundPrune: true,
}
dictDir := filepath.Join(*appDir, "dictdb")
dictIndex := filepath.Join(dictDir, "index.db")
if info, statErr := os.Stat(dictIndex); statErr == nil && !info.IsDir() {
	dictOpts := treedbdb.Options{
		Dir:                        dictDir,
		ReadOnly:                   true,
		DisableBackgroundPrune:     true,
		IndexOuterLeavesInValueLog: false,
	}
	dictBackend, err := treedbdb.Open(dictOpts)
	if err != nil {
		log.Fatalf("open dictdb: %v", err)
	}
	defer dictBackend.Close()
	dictStore := dictdb.New(dictBackend)
	backendOpts.ValueLog.DictLookup = func(dictID uint64) ([]byte, error) {
		return dictStore.GetDictBytes(context.Background(), dictID)
	}
}

backend, err := treedbdb.Open(backendOpts)
```
This command opens templatedb, maindb, and dictdb via treedbdb.Open without applying the persisted format config (e.g., via treedbdb.LoadFormatConfig(...).ApplyToOptions(...)). Other tooling (e.g. treemap) does this to ensure options like chunk size / index format match on-disk settings. Without it, the tool can fail to open or misinterpret a DB created with non-default format config.
```go
pointerWriter, err := newCorpusWriter(pointerPath, *overwrite)
if err != nil {
	log.Fatalf("open pointer corpus: %v", err)
}
defer func() {
	_ = pointerWriter.Close()
}()
outerWriter, err := newCorpusWriter(outerLeafPath, *overwrite)
if err != nil {
	log.Fatalf("open outer-leaf corpus: %v", err)
}
defer func() {
	_ = outerWriter.Close()
}()

pointerScanned := 0
if !*skipPointer {
	pointerScanned, err = extractPointerCorpus(backend, pointerWriter, *pointerLimit, *pointerStride)
	if err != nil {
		log.Fatalf("extract pointer corpus: %v", err)
	}
}
if err := pointerWriter.Close(); err != nil {
	log.Fatalf("close pointer corpus: %v", err)
}

outerLeafScanned := 0
if !*skipOuterLeaf {
	outerLeafScanned, err = extractOuterLeafCorpus(backend, outerWriter, *outerLeafLimit, *outerLeafStride)
	if err != nil {
		log.Fatalf("extract outer-leaf corpus: %v", err)
	}
}
if err := outerWriter.Close(); err != nil {
	log.Fatalf("close outer-leaf corpus: %v", err)
}
```
corpusWriter instances are both deferred-closed and then explicitly closed later. Since (*corpusWriter).Close is not idempotent (it doesn’t nil out w.buf / w.file after closing), the second close will attempt to flush/close an already-closed file and may return errors (currently ignored in the deferred path). Prefer a single close path (either defer only, or explicit close only), or make Close idempotent by clearing fields after successful close.
```go
store := templatedb.New(templateBackendKV{db: templateBackend}, templatedb.Config{})
if opts.ValueLog.TemplateMode != template.TemplateOff && opts.ValueLog.TemplateStore == nil {
	opts.ValueLog.TemplateStore = store
}
```
wireSideStoreLookups may set opts.ValueLog.TemplateStore to a templatedb.Store backed by a DB handle opened read-only (see templateOpts.ReadOnly = true earlier in this block). The template engine’s training/publish pipeline calls Store.PutTemplateDef, which will fail with ErrReadOnly, preventing templates from being published and potentially wasting CPU on repeated publish attempts. Consider either opening templatedb read-write when TemplateMode != TemplateOff (and the main opts are not read-only), or wrapping the store with a read-only implementation that disables training/publishing for rewrite-only experiments.
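The wrapper idea can be sketched with a hypothetical, minimal slice of the store surface (the real `templatedb.Store` has more methods and different signatures):

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical minimal template store surface for illustration only.
type templateStore interface {
	GetTemplate(id uint64) ([]byte, error)
	PutTemplateDef(id uint64, def []byte) error
}

var errReadOnlyStore = errors.New("template store is read-only")

// readOnlyStore forwards lookups but rejects publishing, so a rewrite-only
// experiment backed by a read-only templatedb fails fast instead of
// burning CPU on publish attempts that can never succeed.
type readOnlyStore struct{ inner templateStore }

func (s readOnlyStore) GetTemplate(id uint64) ([]byte, error) { return s.inner.GetTemplate(id) }
func (s readOnlyStore) PutTemplateDef(uint64, []byte) error   { return errReadOnlyStore }

// memStore is a toy in-memory backend for the demo.
type memStore map[uint64][]byte

func (m memStore) GetTemplate(id uint64) ([]byte, error) { return m[id], nil }
func (m memStore) PutTemplateDef(id uint64, def []byte) error {
	m[id] = def
	return nil
}

func main() {
	ro := readOnlyStore{inner: memStore{1: []byte("tpl")}}
	v, _ := ro.GetTemplate(1)
	fmt.Println(string(v), ro.PutTemplateDef(2, nil) != nil) // tpl true
}
```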
```go
func (db *DB) appendValueLogForRecords(l *lane, records []valuelog.Record, durability journalDurability) ([]page.ValuePtr, error) {
	if len(records) == 0 {
		return nil, nil
	}
	ranges := db.valueLogDictClassRangesForRecords(records)
	if len(ranges) == 0 {
		return nil, nil
	}
	resolveDictID := func(class vlogDictClass, cache *[vlogDictClassCount]uint64, cached *[vlogDictClassCount]bool) (uint64, error) {
		if db == nil || db.dictStore == nil {
			return 0, nil
		}
		classIdx := int(class)
		if classIdx < 0 || classIdx >= vlogDictClassCount {
			classIdx = int(vlogDictClassSingleValue)
			class = vlogDictClassSingleValue
```
appendValueLogForRecords always splits the record batch into class ranges, even when db.dictStore == nil (in which case resolveDictID returns 0 for every class). That turns into multiple appendValueLog(..., dictID=0, ...) calls and repeated lane locking/rotation checks with no functional benefit. Consider a fast-path: if db == nil || db.dictStore == nil (or if all resolved dictIDs are 0), call appendValueLog once with the full records slice.
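The suggested fast-path can be illustrated with simplified types (`record`, `appendRanges` are stand-ins, not the real TreeDB types):

```go
package main

import "fmt"

// Sketch of the fast-path: when no dict store is configured, every class
// resolves to dictID 0, so the batch can go through a single append
// instead of per-class range splitting.
type record struct{ class int }

// appendRanges returns how many appendValueLog-style calls the batch incurs.
func appendRanges(records []record, haveDictStore bool) int {
	if len(records) == 0 {
		return 0
	}
	if !haveDictStore {
		return 1 // one call for the whole batch, no splitting overhead
	}
	// With a dict store, split into contiguous same-class ranges.
	calls := 1
	for i := 1; i < len(records); i++ {
		if records[i].class != records[i-1].class {
			calls++
		}
	}
	return calls
}

func main() {
	recs := []record{{0}, {1}, {1}, {0}}
	fmt.Println(appendRanges(recs, false), appendRanges(recs, true)) // 1 3
}
```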
```go
if w == nil || w.buf == nil {
	return errors.New("nil corpus writer")
}
if len(payload) > int(^uint32(0)) {
```
The size check len(payload) > int(^uint32(0)) can overflow on 32-bit platforms because converting ^uint32(0) to int wraps to -1, making the condition always true. Use a uint-based comparison instead (e.g., compare uint64(len(payload)) to uint64(^uint32(0))) and similarly guard any uint32→int conversions.
```diff
-	if len(payload) > int(^uint32(0)) {
+	if uint64(len(payload)) > uint64(^uint32(0)) {
```
Follow-up update pushed in
Key result (pointer corpus):
Outer-leaf remained unchanged (
💡 Codex Review
Reviewed commit: 94055dff59
```go
	decoded = payload
}
decodeNS += time.Since(decodeStart).Nanoseconds()
recovered, err := reverseOuterLeafPretransform(decoded, cfg.OuterLeafPretransform)
```
Restrict pretransform reversal to outer-leaf datasets
runTemplateLab always calls reverseOuterLeafPretransform whenever -outer-leaf-pretransform=header_v1, but transformValue only applies that transform for outer_leaf datasets. On pointer runs this can fail with pretransform: payload too short for valid small values (or wrongly reinterpret payloads that happen to start with the magic prefix), which makes pointer/both sweeps brittle under a flag that is supposed to be outer-leaf-only.
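Gating the reversal on the dataset kind could look like this sketch; `maybeReversePretransform` and the string constants are illustrative, not the harness's actual names:

```go
package main

import "fmt"

// maybeReversePretransform only reverses the outer-leaf pretransform for
// outer_leaf datasets, so pointer sweeps are unaffected by
// -outer-leaf-pretransform=header_v1.
func maybeReversePretransform(dataset, pretransform string, payload []byte,
	reverse func([]byte) ([]byte, error)) ([]byte, error) {
	if dataset != "outer_leaf" || pretransform != "header_v1" {
		return payload, nil // pointer datasets pass through untouched
	}
	return reverse(payload)
}

func main() {
	identity := func(b []byte) ([]byte, error) { return b, nil }
	// A one-byte pointer payload no longer trips "payload too short".
	out, err := maybeReversePretransform("pointer", "header_v1", []byte{0x01}, identity)
	fmt.Println(len(out), err == nil) // 1 true
}
```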
```go
if mode == "mixed" {
	l = pLimit
```
Keep prefix training limit independent from pointer override
In mixed mode, prefix training is capped by pLimit, so -pointer-limit also shrinks the prefix training pass. That contradicts the flag contract (-pointer-limit is documented as a pointer-source override) and can silently under-train prefix data, skewing seed behavior and summary counters when users tune pointer limits independently.
Follow-up pushed in

What was added

Experiments

Corpus:

Conclusion

Outer-leaf pages remain non-templateable in this path even with directory normalization + aggressive template training knobs. This further supports pivoting outer-leaf optimization toward non-template mechanisms (block/dict/codec path), while template remains high-impact for pointer-value payloads.
💡 Codex Review
Reviewed commit: 3c6fbfd238
```go
backendOpts := treedbdb.Options{
	Dir:                    mainDir,
	ReadOnly:               true,
	DisableBackgroundPrune: true,
}
```
Wire template lookup before scanning backend values
The seeding tool opens templatedb but never sets backendOpts.ValueLog.TemplateLookup for the main backend, so reads depend on raw value-log payloads only. On any database that already contains template-encoded values, iterator materialization (ValueCopy) can hit missing-template decode errors and fail the training/probe pass instead of reading real values. This path should pass a template lookup (and decode options) from the opened template store before treedbdb.Open(backendOpts).
```go
if manifest.Pointer.File != "" {
	*pointerFile = filepath.Join(*corpusDir, manifest.Pointer.File)
}
if manifest.OuterLeaf.File != "" {
	*outerLeafFile = filepath.Join(*corpusDir, manifest.OuterLeaf.File)
```
Preserve explicit dataset file flags when manifest exists
When -corpus-dir is set and manifest.json is present, the code unconditionally overwrites -pointer-file / -outer-leaf-file with manifest paths. This silently ignores explicit CLI file selections and can run experiments against the wrong corpus files, skewing reported results. Manifest-derived paths should be used only as fallbacks when the corresponding flag was not provided.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.
```go
if len(payload) < len(magic)+12 {
	return nil, false, errors.New("pretransform: payload too short")
}
if !bytes.Equal(payload[:len(magic)], magic) {
	return payload, false, nil
}
```
reverseHeaderSideTransform returns an error for untransformed payloads that are simply shorter than len(magic)+12, because it checks length before confirming the magic prefix. This makes the reverse path overly strict and can fail round-trips if shorter payloads are ever passed through (even when not transformed). Reorder the checks so you first verify whether the payload has the magic prefix (or at least has len(magic) bytes) and only enforce the +12 header requirement when the magic matches.
```diff
-	if len(payload) < len(magic)+12 {
-		return nil, false, errors.New("pretransform: payload too short")
-	}
-	if !bytes.Equal(payload[:len(magic)], magic) {
-		return payload, false, nil
-	}
+	// If the payload is shorter than the magic, it cannot be transformed; pass through.
+	if len(payload) < len(magic) {
+		return payload, false, nil
+	}
+	// If the magic prefix does not match, treat as untransformed and pass through.
+	if !bytes.Equal(payload[:len(magic)], magic) {
+		return payload, false, nil
+	}
+	// At this point the payload is transformed; enforce the full header length requirement.
+	if len(payload) < len(magic)+12 {
+		return nil, false, errors.New("pretransform: payload too short")
+	}
```
```go
templateDir := filepath.Join(rootDir, "templatedb")
templateIndexPath := filepath.Join(templateDir, "index.db")
if _, err := os.Stat(templateIndexPath); err == nil {
	templateReadOnly := templateMode == template.TemplateOff
	templateOpts := treedbdb.Options{Dir: templateDir, ReadOnly: templateReadOnly}
	applyPersistedFormatConfig(templateDir, &templateOpts)
	templateOpts.DisableBackgroundPrune = true
	templateOpts.ValueLog.Compression = treedbdb.ValueLogCompressionOff
	templateOpts.ValueLog.TemplateMode = template.TemplateOff
	templateOpts.ValueLog.TemplateLookup = nil
	templateOpts.ValueLog.TemplateStore = nil

	templateBackend, err := treedbdb.Open(templateOpts)
	if err != nil {
		fatalf("templatedb open: %v", err)
	}
```
treemap vlog-rewrite opens templatedb directly, but treedb.ValueLogRewriteOffline (TreeDB-level) also calls wireSideStoreLookups, which (per this PR) opens/wires side stores including templatedb. This duplication can cause double-open behavior and potential lock/handle contention (especially when template-mode is not off and this path opens templatedb read-write). Consider removing the explicit templatedb open/wiring in treemap and relying on wireSideStoreLookups to populate TemplateLookup/TemplateStore, or explicitly disabling side-store wiring when treemap takes ownership of templatedb wiring.
```go
templateDir := filepath.Join(*appDir, "templatedb")
tmplDB, err := treedbdb.Open(treedbdb.Options{Dir: templateDir, DisableBackgroundPrune: true})
if err != nil {
	log.Fatalf("open templatedb: %v", err)
}
```
This command opens existing on-disk DBs (templatedb, and the main backend) without first applying the persisted format config (contrast with treemap’s applyPersistedFormatConfig). For safety/consistency, load and apply the format config for each opened DB directory (or explicitly set IgnoreFormatConfig when intended) so the tool interprets on-disk data using the persisted knobs.
```go
backendOpts := treedbdb.Options{
	Dir:                    mainDir,
	ReadOnly:               true,
	DisableBackgroundPrune: true,
}
```
This command opens existing on-disk DBs (templatedb, and the main backend) without first applying the persisted format config (contrast with treemap’s applyPersistedFormatConfig). For safety/consistency, load and apply the format config for each opened DB directory (or explicitly set IgnoreFormatConfig when intended) so the tool interprets on-disk data using the persisted knobs.
```go
	return 0, errors.New("value-log reader unavailable")
}

seen := make(map[outerLeafKey]struct{}, 1<<20)
```
Preallocating seen to 1<<20 entries can cause a large up-front memory spike even for small runs (e.g., low -outer-leaf-limit). Consider sizing the initial capacity based on the effective limit (e.g., min(limit, 1<<20)), or starting smaller and letting the map grow to reduce peak memory footprint for typical invocations.
```diff
-	seen := make(map[outerLeafKey]struct{}, 1<<20)
+	initialCap := 1 << 10
+	if limit > 0 {
+		if limit < (1 << 20) {
+			initialCap = limit
+		} else {
+			initialCap = 1 << 20
+		}
+	}
+	seen := make(map[outerLeafKey]struct{}, initialCap)
```
```go
func newGzipRecordSizer() *gzipRecordSizer {
	cw := &countingWriter{}
	gw, _ := gzip.NewWriterLevel(cw, gzip.BestSpeed)
```
gzip.NewWriterLevel returns an error (even if currently unlikely with gzip.BestSpeed). Ignoring it makes failures harder to diagnose if the level ever becomes configurable or changed. Handle the error (e.g., fall back to gzip.NewWriter or return an error from newGzipRecordSizer) so the harness fails explicitly when gzip initialization fails.
```diff
-	gw, _ := gzip.NewWriterLevel(cw, gzip.BestSpeed)
+	gw, err := gzip.NewWriterLevel(cw, gzip.BestSpeed)
+	if err != nil {
+		log.Printf("failed to create gzip writer with level %d: %v; falling back to default level", gzip.BestSpeed, err)
+		gw = gzip.NewWriter(cw)
+	}
```
Summary

This PR executes the next phase of the compression investigation plan:

What changed

- Per-class rewrite stats for `pointer_value` vs `outer_leaf`: attempted/kept/input/output bytes and reason maps.
- `TemplateStore` in value-log options, side-store wiring.
- `treemap vlog-rewrite` output with class-specific template lines.
- `template_corpus_extract` command: `pointer_values.bin`, `outer_leaf_pages.bin`, `manifest.json`.
- `template_lab` command: `off|header_v1` outer-leaf pre-transform, `-disable-mask-templates` for anchor-only runs.
- `template_seed_train` command + README for seeding `templatedb` from existing app dirs.

Validation

- `go test ./TreeDB/caching -run 'TestValueLogDictClassRangesForRecords_SplitOuterLeaf|TestValueLogDictClassRangesForRecords_SingleMode' -count=1`
- `go test ./TreeDB/db -run 'TestRewriteWriter_TemplatePrepassEncodesBeforeDict|TestRewriteWriter_TemplateOnlyDisablesDictCompression' -count=1`
- `go test ./TreeDB/cmd/treemap ./TreeDB/cmd/template_corpus_extract ./TreeDB/cmd/template_lab ./TreeDB/cmd/template_seed_train -count=1`
- `go test ./TreeDB -run 'TestVacuumIndexOffline_WithTemplateFrames_WiresTemplateLookup' -count=1`

Experiment run (full corpus)

Corpus: `/tmp/template_corpus_run_20260327_120046`

Sweep artifacts: `/tmp/template_lab_run_off_20260327.json`, `/tmp/template_lab_run_header_v1_20260327.json`

Observed:

- `20092/42802`, encoded bytes improved by ~0.3904%.
- `0` keeps across tested configs.
- `header_v1` pre-transform: no measurable outer-leaf improvement in this pass.

Repro details are documented in `docs/benchmarks/CELESTIA_VLOG_COMPRESSION_NOTES_2026-03-26.md`.