sift

Indexed regex search over a codebase: build a trigram index once, then query it with a grep-like CLI or the sift-core library.

Crate	Package	Purpose
`crates/core`	`sift-core`	Index + `CompiledSearch` + `search_index` / `search_walk`
`crates/cli`	`sift-cli`	`sift` binary (ripgrep-shaped flags)
`fuzz/`	(standalone)	`cargo-fuzz` against `sift-core` only

Docs: crates/core/benches/README.md (benchmarks & profiling), plan.md (roadmap), AGENTS.md (repo / automation hints). Per-crate README.md and AGENTS.md live under each crate and under fuzz/.

Agent skills (skills.sh / npx skills): skills/README.md.

Manual rg vs sift timing demo (kernel tree, no scripts — you run both and compare): demo/kernel-video/README.md.

Quick start

Install binary (GitHub Release)

curl -fsSL https://raw.githubusercontent.com/botirk38/sift/v0.1.2/scripts/install.sh | sh

Installs to $HOME/.local/bin/sift by default (override with PREFIX).

Environment variables:

Variable	Meaning
`SIFT_REPO`	`owner/repo` on GitHub (default: `botirk38/sift`)
`SIFT_VERSION`	Release without `v`, e.g. `0.1.2` (skips GitHub API if set)
`SIFT_DEFAULT_VERSION`	Fallback if the API is rate-limited (default baked into the script)
`PREFIX`	Install prefix; binary at `$PREFIX/bin`

If curl https://api.github.com/.../releases/latest hits a rate limit, either export SIFT_VERSION=0.1.2 before running the script or rely on the script’s built-in default version.

From source

cargo build --release -p sift-cli
./target/release/sift --sift-dir .sift build /path/to/corpus
./target/release/sift --sift-dir .sift pattern

Patterns use Rust’s regex syntax unless -F (fixed string). Literal build: sift -- build or -e build.

CLI vs ripgrep (short)

Search needs a prior index (build).
Optional path arguments must lie under the indexed corpus root.
No glob -g here yet; --no-filename is used instead of -h (help).

Performance snapshot

Current Linux benchsuite snapshot against the Linux corpus.

correctness parity: 11/11
sift faster: 8/11
rg faster: 3/11

Search class	Snapshot	Takeaway
Indexed literals	`~5.8x` faster	Trigram narrowing is doing the heavy lifting
Indexed word matches	`~5.6x` faster	Whole-word literal shaping stays cheap
Indexed alternation	`~2.6x` faster	Candidate narrowing plus `build_many` helps a lot
Full-scan Unicode	`~1.0x`	Near parity overall, but Greek classes still trail
Full-scan no-literal regex	`~0.6x`	Regex-engine full scans remain the hardest cases

Fast path takeaways:

indexed literal, word, suffix-literal, and alternation searches are decisively faster with sift
full-scan Unicode class searches are the main remaining gap versus rg
see crates/core/benches/README.md for the benchmark and profiling workflow

Develop

cargo test --workspace --all-features
cargo clippy-check   # see `.cargo/config.toml`

CI (GitHub Actions): fmt, clippy with -D warnings, test on Linux, macOS, and Windows for pushes/PRs to main / master — .github/workflows/ci.yml.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

sift

Quick start

Install binary (GitHub Release)

From source

CLI vs ripgrep (short)

Performance snapshot

Develop

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 71 Commits
.cargo		.cargo
.github/workflows		.github/workflows
benchsuite		benchsuite
crates		crates
docs		docs
fff.nvim		fff.nvim
fuzz		fuzz
scripts		scripts
skills		skills
.gitignore		.gitignore
AGENTS.md		AGENTS.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

sift

Quick start

Install binary (GitHub Release)

From source

CLI vs ripgrep (short)

Performance snapshot

Develop

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages