feat: BigramQuery trait + FileRecord/FileListView for zero-copy index support #341
Open
magnusmalm wants to merge 3 commits into dmtrKovalenko:main
Extract query() and is_ready() into a BigramQuery trait that BigramFilter implements. grep_search() now accepts Option<&dyn BigramQuery>, allowing external consumers to provide alternative implementations (e.g. a zero-copy mmap-backed view) without changing the grep pipeline. Static helpers (is_candidate, count_candidates) remain on BigramFilter since they operate on the returned Vec<u64> directly.
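A minimal sketch of how such a trait could decouple the grep pipeline from the index type. Only the names `BigramQuery`, `query()`, `is_ready()`, `is_candidate` and `grep_search()` come from this PR; the signatures, the bitset interpretation of `Vec<u64>`, and the `ToyBigramIndex` type are assumptions made for illustration, not the crate's actual code.

```rust
use std::collections::HashSet;

/// Assumed shape of the trait: the PR only names `query()` and `is_ready()`.
pub trait BigramQuery {
    fn is_ready(&self) -> bool;
    /// Assumption: returns a bitset with one bit per file (bit i set means
    /// file i may match), or `None` when the pattern is too short to filter.
    fn query(&self, pattern: &[u8]) -> Option<Vec<u64>>;
}

/// Hypothetical in-memory index standing in for `BigramFilter`.
pub struct ToyBigramIndex {
    per_file: Vec<HashSet<[u8; 2]>>,
}

impl ToyBigramIndex {
    /// Collects the set of byte bigrams that occur in each file.
    pub fn build(contents: &[&str]) -> Self {
        let per_file: Vec<HashSet<[u8; 2]>> = contents
            .iter()
            .map(|c| c.as_bytes().windows(2).map(|w| [w[0], w[1]]).collect())
            .collect();
        Self { per_file }
    }
}

impl BigramQuery for ToyBigramIndex {
    fn is_ready(&self) -> bool {
        true
    }

    fn query(&self, pattern: &[u8]) -> Option<Vec<u64>> {
        if pattern.len() < 2 {
            return None;
        }
        let mut bits = vec![0u64; (self.per_file.len() + 63) / 64];
        for (i, grams) in self.per_file.iter().enumerate() {
            // A file is a candidate only if it contains every bigram of the pattern.
            if pattern.windows(2).all(|w| grams.contains(&[w[0], w[1]])) {
                bits[i / 64] |= 1u64 << (i % 64);
            }
        }
        Some(bits)
    }
}

/// Free-function stand-in for the static helper: it only needs the bitset.
pub fn is_candidate(bits: &[u64], file_idx: usize) -> bool {
    bits.get(file_idx / 64)
        .map_or(false, |word| ((*word >> (file_idx % 64)) & 1) == 1)
}

/// Simplified stand-in for `grep_search()`: returns indices of matching files.
/// Any `BigramQuery` implementation (in-memory or mmap-backed) can be passed in.
pub fn grep_search(
    contents: &[&str],
    pattern: &str,
    filter: Option<&dyn BigramQuery>,
) -> Vec<usize> {
    let candidates = filter
        .filter(|f| f.is_ready())
        .and_then(|f| f.query(pattern.as_bytes()));
    contents
        .iter()
        .enumerate()
        // Without a usable filter, every file is scanned.
        .filter(|(i, _)| candidates.as_ref().map_or(true, |bits| is_candidate(bits, *i)))
        .filter(|(_, text)| text.contains(pattern))
        .map(|(i, _)| i)
        .collect()
}
```

Because the pipeline only sees `&dyn BigramQuery`, passing `None` or a different implementation changes which files are scanned but not which files match.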
Add building blocks for mmap-friendly file list storage:
- FileRecord: 24-byte repr(C) struct with path offset, lengths, size, modified time, and is_binary flag packed into the high bit
- FileListView: borrows records + string table from an mmap, provides indexed access to paths and metadata without heap allocation
- build_file_records(): convert &[FileItem] to records + string table
- to_file_items(): convert back to owned FileItems for the search pipeline
Tests for record layout, flags, and FileItem round-trip included.
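The building blocks listed above can be sketched roughly as follows. The field split inside the 24 bytes, the choice of the size field's high bit for is_binary, and the simplified `FileItem` are all assumptions; the PR text only fixes the record size, the kinds of fields, and the function names. The sketch works on plain slices and leaves out the step that casts mmap bytes to `&[FileRecord]`, which needs alignment and length checks.

```rust
use std::mem::size_of;

/// Simplified stand-in for the crate's `FileItem` (assumed fields).
#[derive(Debug, Clone, PartialEq)]
pub struct FileItem {
    pub path: String,
    pub size: u64,
    pub modified: u64,
    pub is_binary: bool,
}

/// Assumption: the is_binary flag is packed into the high bit of the size field.
const BINARY_FLAG: u64 = 1 << 63;

/// Fixed-size record; 4 + 2 + 2 + 8 + 8 = 24 bytes with no padding.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
pub struct FileRecord {
    path_offset: u32,   // byte offset of the path in the string table
    path_len: u16,      // length of the full path in bytes
    file_name_len: u16, // length of the trailing file-name component
    size_and_flags: u64,
    modified: u64,
}

// Compile-time check of the record size.
const _: () = assert!(size_of::<FileRecord>() == 24);

impl FileRecord {
    pub fn size(&self) -> u64 { self.size_and_flags & !BINARY_FLAG }
    pub fn is_binary(&self) -> bool { (self.size_and_flags & BINARY_FLAG) != 0 }
    pub fn modified(&self) -> u64 { self.modified }
}

/// Borrowed view over records plus string table; accessors do not allocate.
#[derive(Clone, Copy)]
pub struct FileListView<'a> {
    records: &'a [FileRecord],
    strings: &'a [u8],
}

impl<'a> FileListView<'a> {
    pub fn new(records: &'a [FileRecord], strings: &'a [u8]) -> Self {
        Self { records, strings }
    }
    pub fn len(&self) -> usize { self.records.len() }
    pub fn record(&self, i: usize) -> Option<&'a FileRecord> { self.records.get(i) }
    pub fn path(&self, i: usize) -> Option<&'a str> {
        let r = self.records.get(i)?;
        let start = r.path_offset as usize;
        let bytes = self.strings.get(start..start + r.path_len as usize)?;
        std::str::from_utf8(bytes).ok()
    }
    pub fn file_name(&self, i: usize) -> Option<&'a str> {
        let (r, p) = (self.records.get(i)?, self.path(i)?);
        p.get(p.len().checked_sub(r.file_name_len as usize)?..)
    }
}

/// Converts owned items into fixed-size records plus one shared string table.
pub fn build_file_records(items: &[FileItem]) -> (Vec<FileRecord>, Vec<u8>) {
    let mut records = Vec::with_capacity(items.len());
    let mut strings: Vec<u8> = Vec::new();
    for item in items {
        assert!(item.path.len() <= u16::MAX as usize, "path too long for u16 length");
        assert!(strings.len() <= u32::MAX as usize, "string table too large for u32 offset");
        let name_len = item.path.rsplit('/').next().map_or(0, |n| n.len());
        let flag = if item.is_binary { BINARY_FLAG } else { 0 };
        records.push(FileRecord {
            path_offset: strings.len() as u32,
            path_len: item.path.len() as u16,
            file_name_len: name_len as u16,
            size_and_flags: (item.size & !BINARY_FLAG) | flag,
            modified: item.modified,
        });
        strings.extend_from_slice(item.path.as_bytes());
    }
    (records, strings)
}

/// Converts a view back into owned items; this is the step that allocates.
pub fn to_file_items(view: &FileListView) -> Vec<FileItem> {
    (0..view.len())
        .filter_map(|i| {
            let r = view.record(i)?;
            Some(FileItem {
                path: view.path(i)?.to_owned(),
                size: r.size(),
                modified: r.modified(),
                is_binary: r.is_binary(),
            })
        })
        .collect()
}
```

Storing offsets into a shared string table, rather than a `String` per file, is what lets the same bytes be read straight from a mapped file without rebuilding any heap structures.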
Summary
Two changes toward zero-copy index support, following up on the discussion in #330.
Commit 1: BigramQuery trait
Extracts query() and is_ready() into a BigramQuery trait that BigramFilter implements. grep_search() now accepts Option<&dyn BigramQuery>, so external consumers can provide alternative implementations (e.g. an mmap-backed view) without changing the grep pipeline.
Static helpers (is_candidate, count_candidates) stay on BigramFilter since they operate on the returned Vec<u64>, not the index itself.
Commit 2: FileRecord + FileListView
Adds building blocks for mmap-friendly file list storage:
- FileRecord: 24-byte repr(C) struct storing path offset, lengths, size, modified, and is_binary flag
- FileListView<'a>: borrows records + string table from an mmap, provides indexed access without heap allocation
- build_file_records(): convert &[FileItem] to records + string table
- to_file_items(): convert back when the search pipeline needs owned FileItems
Wiring FileListView directly into match_and_score_files/grep_search would require a FileEntry trait that changes field access to method calls throughout score.rs and grep.rs. That felt too invasive for this PR. Happy to do it as a follow-up if you want to go that direction.
Benchmarked with fff-cli on buildroot (13k files). The dyn BigramQuery vtable dispatch adds no measurable overhead to grep or search.
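For the possible FileEntry follow-up, here is a rough sketch of what moving from field access to method calls could look like. Everything here is hypothetical: the trait methods, the cut-down `FileItem`, the `ViewEntry` type and the toy `score` function are illustrative only and do not reflect the real code in score.rs or grep.rs.

```rust
/// Hypothetical accessor trait: scoring code calls methods instead of reading fields.
pub trait FileEntry {
    fn path(&self) -> &str;
    fn size(&self) -> u64;
}

/// Cut-down owned item (assumed fields).
pub struct FileItem {
    pub path: String,
    pub size: u64,
}

impl FileEntry for FileItem {
    fn path(&self) -> &str { &self.path }
    fn size(&self) -> u64 { self.size }
}

/// Hypothetical borrowed entry, as a zero-copy view might hand out.
pub struct ViewEntry<'a> {
    pub path: &'a str,
    pub size: u64,
}

impl<'a> FileEntry for ViewEntry<'a> {
    fn path(&self) -> &str { self.path }
    fn size(&self) -> u64 { self.size }
}

/// Toy scoring function, generic over the entry type. Code that used to read
/// `item.path` would call `entry.path()` instead.
pub fn score<E: FileEntry>(entry: &E, query: &str) -> u32 {
    if entry.path().contains(query) {
        // Shorter paths score higher in this toy metric.
        100u32.saturating_sub(entry.path().len() as u32)
    } else {
        0
    }
}
```

With static dispatch through generics, both entry types go through the same scoring code, at the cost of touching every call site that currently reads fields directly.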