
feat: BigramQuery trait + FileRecord/FileListView for zero-copy index support#341

Open
magnusmalm wants to merge 3 commits into dmtrKovalenko:main from magnusmalm:feat/zerocopy-views


@magnusmalm
Contributor

Summary

Two changes toward zero-copy index support, following up on the discussion in #330.

Commit 1: BigramQuery trait

Extracts query() and is_ready() into a BigramQuery trait that BigramFilter implements. grep_search() now accepts Option<&dyn BigramQuery>, so external consumers can provide alternative implementations (e.g. an mmap-backed view) without changing the grep pipeline.
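A minimal sketch of the trait split, assuming `query()` takes the search pattern as `&str` and returns `Option<Vec<u64>>` with one bit per file. The argument types, the toy `BigramFilter::build`, and the simplified `grep_search` signature are illustrative assumptions, not the PR's actual code. What it shows is that `grep_search` depends only on `&dyn BigramQuery`, so any type implementing the two methods can be passed in.

```rust
use std::collections::HashMap;

/// Hypothetical shape of the trait: the PR names `query()` and `is_ready()`
/// and says candidates come back as a `Vec<u64>`; the `&str` argument and
/// the `Option` wrapper are assumptions made for this sketch.
pub trait BigramQuery {
    /// Candidate bitset (bit i set = file i may match), or None if the
    /// pattern is too short to filter on.
    fn query(&self, pattern: &str) -> Option<Vec<u64>>;
    /// True once the index is built and safe to consult.
    fn is_ready(&self) -> bool;
}

/// Toy in-memory filter: one bitset per byte bigram (not the real layout).
pub struct BigramFilter {
    words: usize,
    postings: HashMap<[u8; 2], Vec<u64>>,
}

impl BigramFilter {
    pub fn build(contents: &[&str]) -> Self {
        let words = (contents.len() + 63) / 64;
        let mut postings: HashMap<[u8; 2], Vec<u64>> = HashMap::new();
        for (i, text) in contents.iter().enumerate() {
            for pair in text.as_bytes().windows(2) {
                let bits = postings
                    .entry([pair[0], pair[1]])
                    .or_insert_with(|| vec![0; words]);
                bits[i / 64] |= 1u64 << (i % 64);
            }
        }
        BigramFilter { words, postings }
    }
}

impl BigramQuery for BigramFilter {
    fn query(&self, pattern: &str) -> Option<Vec<u64>> {
        let bytes = pattern.as_bytes();
        if bytes.len() < 2 {
            return None;
        }
        // Start from "every file" and intersect the posting list of each bigram.
        let mut acc = vec![u64::MAX; self.words];
        for pair in bytes.windows(2) {
            match self.postings.get(&[pair[0], pair[1]]) {
                Some(bits) => acc.iter_mut().zip(bits).for_each(|(a, b)| *a &= *b),
                // A bigram no file contains means no file can match.
                None => return Some(vec![0; self.words]),
            }
        }
        Some(acc)
    }

    fn is_ready(&self) -> bool {
        true
    }
}

/// Hypothetical, simplified grep_search: it sees only `&dyn BigramQuery`,
/// so an mmap-backed view could be passed instead of `BigramFilter`.
pub fn grep_search(contents: &[&str], pattern: &str, index: Option<&dyn BigramQuery>) -> Vec<usize> {
    let candidates = index
        .filter(|q| q.is_ready())
        .and_then(|q| q.query(pattern));
    (0..contents.len())
        // Skip files the prefilter ruled out; no index means scan everything.
        .filter(|&i| candidates.as_ref().map_or(true, |bits| (bits[i / 64] >> (i % 64)) & 1 == 1))
        .filter(|&i| contents[i].contains(pattern))
        .collect()
}
```

Passing `None` skips the prefilter and scans every file, so the index only changes how much work is done, not which files match.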

Static helpers (is_candidate, count_candidates) stay on BigramFilter since they operate on the returned Vec<u64>, not the index itself.
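Because these helpers only read the candidate bitset, they need nothing from whichever index produced it. A sketch, assuming bit `i` of the returned `Vec<u64>` marks file `i` as a candidate; the signatures below are hypothetical and `BigramFilter` is an empty stand-in.

```rust
/// Empty stand-in; the real BigramFilter holds the index itself.
pub struct BigramFilter;

impl BigramFilter {
    /// Hypothetical signature: true if bit `file_idx` is set in `candidates`.
    /// Out-of-range indices are treated as "not a candidate".
    pub fn is_candidate(candidates: &[u64], file_idx: usize) -> bool {
        candidates
            .get(file_idx / 64)
            .map_or(false, |&word| (word >> (file_idx % 64)) & 1 == 1)
    }

    /// Hypothetical signature: number of set bits, i.e. files that survive the prefilter.
    pub fn count_candidates(candidates: &[u64]) -> usize {
        candidates.iter().map(|w| w.count_ones() as usize).sum()
    }
}
```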

Commit 2: FileRecord + FileListView

Adds building blocks for mmap-friendly file list storage:

  • FileRecord: 24-byte repr(C) struct storing the path offset, lengths, size, modified time, and an is_binary flag (packed into the high bit)
  • FileListView<'a>: borrows records + string table from an mmap, provides indexed access without heap allocation
  • build_file_records(): convert &[FileItem] to records + string table
  • to_file_items(): convert back when the search pipeline needs owned FileItems
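One way the 24 bytes could be laid out, purely as an assumption: a `u32` path offset, two `u16` lengths (full path and file name), a `u64` size whose high bit carries `is_binary`, and a `u64` modified time. The field names, widths, and accessor methods below are hypothetical. For simplicity this `FileListView` takes the string table as `&str`; a real mmap-backed view would start from raw bytes and would need to check alignment and UTF-8 validity before handing out references.

```rust
use std::mem::size_of;

/// Hypothetical layout that adds up to 24 bytes; the PR's field order may differ.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FileRecord {
    pub path_offset: u32,    // byte offset of the path in the string table
    pub path_len: u16,       // length of the full relative path
    pub name_len: u16,       // length of the file-name suffix (assumed meaning)
    pub size_and_flags: u64, // file size; high bit = is_binary
    pub modified: u64,       // modification time, e.g. Unix seconds (assumed)
}

// Compile-time check that the layout really is 24 bytes.
const _: () = assert!(size_of::<FileRecord>() == 24);

pub const BINARY_BIT: u64 = 1 << 63;

impl FileRecord {
    pub fn size(&self) -> u64 {
        self.size_and_flags & !BINARY_BIT
    }
    pub fn is_binary(&self) -> bool {
        self.size_and_flags & BINARY_BIT != 0
    }
}

/// Borrows records and string table; every accessor returns borrowed data.
pub struct FileListView<'a> {
    records: &'a [FileRecord],
    strings: &'a str,
}

impl<'a> FileListView<'a> {
    pub fn new(records: &'a [FileRecord], strings: &'a str) -> Self {
        Self { records, strings }
    }
    pub fn len(&self) -> usize {
        self.records.len()
    }
    pub fn is_empty(&self) -> bool {
        self.records.is_empty()
    }
    pub fn record(&self, i: usize) -> &'a FileRecord {
        let records: &'a [FileRecord] = self.records;
        &records[i]
    }
    /// Slices the path out of the string table without allocating.
    pub fn path(&self, i: usize) -> &'a str {
        let strings: &'a str = self.strings;
        let r = &self.records[i];
        let start = r.path_offset as usize;
        &strings[start..start + r.path_len as usize]
    }
    /// The file name is stored as a suffix of the path.
    pub fn file_name(&self, i: usize) -> &'a str {
        let path = self.path(i);
        &path[path.len() - self.records[i].name_len as usize..]
    }
}
```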

Wiring FileListView directly into match_and_score_files / grep_search would require a FileEntry trait that changes field access to method calls throughout score.rs and grep.rs. That felt too invasive for this PR. Happy to do it as a follow-up if you want to go that direction.
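For a sense of what that follow-up would involve: a hypothetical `FileEntry` trait (the name and methods are illustrative, not from the PR) turns field reads such as `item.is_binary` into method calls, after which the same code accepts owned items and borrowed view entries alike. The `FileItem` fields and `grep_targets` helper are stand-ins too.

```rust
/// Hypothetical trait abstracting over owned and borrowed file entries.
pub trait FileEntry {
    fn relative_path(&self) -> &str;
    fn size(&self) -> u64;
    fn is_binary(&self) -> bool;
}

/// Stand-in for the owned FileItem (real fields may differ).
pub struct FileItem {
    pub relative_path: String,
    pub size: u64,
    pub is_binary: bool,
}

impl FileEntry for FileItem {
    fn relative_path(&self) -> &str {
        &self.relative_path
    }
    fn size(&self) -> u64 {
        self.size
    }
    fn is_binary(&self) -> bool {
        self.is_binary
    }
}

/// Hypothetical borrowed entry, as a FileListView might hand out.
pub struct BorrowedEntry<'a> {
    pub path: &'a str,
    pub size: u64,
    pub is_binary: bool,
}

impl FileEntry for BorrowedEntry<'_> {
    fn relative_path(&self) -> &str {
        self.path
    }
    fn size(&self) -> u64 {
        self.size
    }
    fn is_binary(&self) -> bool {
        self.is_binary
    }
}

/// Call sites use `e.is_binary()` instead of `e.is_binary`, and then
/// work unchanged for both entry types.
pub fn grep_targets<E: FileEntry>(entries: &[E], max_size: u64) -> Vec<&str> {
    entries
        .iter()
        .filter(|e| !e.is_binary() && e.size() <= max_size)
        .map(|e| e.relative_path())
        .collect()
}
```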

Benchmarked with fff-cli on buildroot (13k files). The dyn BigramQuery vtable dispatch adds no measurable overhead to grep or search.

Extract query() and is_ready() into a BigramQuery trait that
BigramFilter implements. grep_search() now accepts
Option<&dyn BigramQuery>, allowing external consumers to provide
alternative implementations (e.g. a zero-copy mmap-backed view)
without changing the grep pipeline.

Static helpers (is_candidate, count_candidates) remain on
BigramFilter since they operate on the returned Vec<u64> directly.

Add building blocks for mmap-friendly file list storage:

- FileRecord: 24-byte repr(C) struct with path offset, lengths, size,
  modified time, and is_binary flag packed into the high bit
- FileListView: borrows records + string table from an mmap, provides
  indexed access to paths and metadata without heap allocation
- build_file_records(): convert &[FileItem] to records + string table
- to_file_items(): convert back to owned FileItems for the search pipeline

Tests for record layout, flags, and FileItem round-trip included.
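A round-trip sketch of `build_file_records()` and `to_file_items()`, reusing an assumed 24-byte layout (`u32` offset, two `u16` lengths, size with `is_binary` in the high bit, modified time). The `FileItem` fields, both signatures, and the `(Vec<FileRecord>, String)` return shape are assumptions made for illustration. Paths are appended back to back into one string table, and each record stores where its path starts.

```rust
use std::convert::TryFrom;

/// Stand-in for the owned FileItem (real fields may differ).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct FileItem {
    pub relative_path: String,
    pub size: u64,
    pub modified: u64,
    pub is_binary: bool,
}

/// Hypothetical 24-byte record layout.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FileRecord {
    pub path_offset: u32,
    pub path_len: u16,
    pub name_len: u16,
    pub size_and_flags: u64, // high bit = is_binary
    pub modified: u64,
}

const BINARY_BIT: u64 = 1 << 63;

/// Packs items into fixed-size records plus one shared string table.
pub fn build_file_records(items: &[FileItem]) -> (Vec<FileRecord>, String) {
    let mut records = Vec::with_capacity(items.len());
    let mut strings = String::new();
    for item in items {
        let path = item.relative_path.as_str();
        let name = path.rsplit('/').next().unwrap_or(path);
        // Limits of this assumed layout: table < 4 GiB, path < 64 KiB, size < 2^63.
        let path_offset = u32::try_from(strings.len()).expect("string table too large");
        let path_len = u16::try_from(path.len()).expect("path too long");
        assert!((item.size & BINARY_BIT) == 0, "size collides with the flag bit");
        records.push(FileRecord {
            path_offset,
            path_len,
            name_len: name.len() as u16, // suffix of path, so it fits in u16 too
            size_and_flags: item.size | (if item.is_binary { BINARY_BIT } else { 0 }),
            modified: item.modified,
        });
        strings.push_str(path);
    }
    (records, strings)
}

/// Rebuilds owned items for code paths that still need them.
pub fn to_file_items(records: &[FileRecord], strings: &str) -> Vec<FileItem> {
    records
        .iter()
        .map(|r| {
            let start = r.path_offset as usize;
            FileItem {
                relative_path: strings[start..start + r.path_len as usize].to_string(),
                size: r.size_and_flags & !BINARY_BIT,
                modified: r.modified,
                is_binary: r.size_and_flags & BINARY_BIT != 0,
            }
        })
        .collect()
}
```

The `try_from` checks make the layout's limits explicit instead of silently truncating long paths or oversized tables.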