Indexed regex search over a codebase: build a trigram index once, then query it with a grep-like CLI or the sift-core library.

| Crate | Package | Purpose |
|---|---|---|
| `crates/core` | `sift-core` | `Index` + `CompiledSearch` + `search_index` / `search_walk` |
| `crates/cli` | `sift-cli` | `sift` binary (ripgrep-shaped flags) |
| `fuzz/` | (standalone) | cargo-fuzz against `sift-core` only |

Docs: crates/core/benches/README.md (benchmarks & profiling), plan.md (roadmap), AGENTS.md (repo / automation hints). Per-crate README.md and AGENTS.md live under each crate and under fuzz/.
Agent skills (skills.sh / npx skills): skills/README.md.
Manual rg vs sift timing demo (kernel tree, no scripts — you run both and compare): demo/kernel-video/README.md.

Install with the release script:

    curl -fsSL https://raw.githubusercontent.com/botirk38/sift/v0.1.2/scripts/install.sh | sh

Installs to `$HOME/.local/bin/sift` by default (override with `PREFIX`).

Environment variables:

| Variable | Meaning |
|---|---|
| `SIFT_REPO` | `owner/repo` on GitHub (default: `botirk38/sift`) |
| `SIFT_VERSION` | Release without `v`, e.g. `0.1.2` (skips GitHub API if set) |
| `SIFT_DEFAULT_VERSION` | Fallback if the API is rate-limited (default baked into the script) |
| `PREFIX` | Install prefix; binary at `$PREFIX/bin` |

If curl https://api.github.com/.../releases/latest hits a rate limit, either export SIFT_VERSION=0.1.2 before running the script or rely on the script’s built-in default version.
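
Build from source, index a corpus, then search it:

    cargo build --release -p sift-cli
    ./target/release/sift --sift-dir .sift build /path/to/corpus
    ./target/release/sift --sift-dir .sift pattern

Patterns use Rust's regex syntax unless `-F` (fixed string) is given. To search for the literal word `build` rather than run the `build` subcommand, use `sift -- build` or `sift -e build`.
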
- Search needs a prior index (`build`).
- Optional path arguments must lie under the indexed corpus root.
- No glob `-g` here yet; `--no-filename` is used instead of `-h` (help).
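
The constraint that path arguments lie under the indexed corpus root can be pictured with a small standard-library sketch. This is only an illustration: `normalize` and `is_under_root` are hypothetical helpers, not functions from `sift-core`, and the check sift actually performs may differ (for example, it may resolve symlinks). The sketch assumes an absolute root and compares paths lexically, component by component.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: lexically normalize a path by dropping `.` and
/// resolving `..`, without touching the filesystem.
fn normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            Component::CurDir => {}
            Component::ParentDir => {
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// Hypothetical helper: true when `candidate` (absolute, or relative to
/// `root`) stays inside `root` after normalization.
fn is_under_root(root: &Path, candidate: &Path) -> bool {
    let joined = if candidate.is_absolute() {
        candidate.to_path_buf()
    } else {
        root.join(candidate)
    };
    // `Path::starts_with` compares whole components, so `/corpus-old`
    // is not treated as being under `/corpus`.
    normalize(&joined).starts_with(normalize(root))
}

fn main() {
    let root = Path::new("/corpus");
    for arg in ["src/lib.rs", "/corpus/drivers", "../etc/passwd", "/corpus-old/a"] {
        println!("{arg}: {}", is_under_root(root, Path::new(arg)));
    }
}
```

A component-wise comparison is used here instead of a string prefix test so that sibling directories sharing a name prefix are rejected.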
Current Linux benchsuite snapshot against the Linux corpus.
- correctness parity: 11/11
- `sift` faster: 8/11
- `rg` faster: 3/11

| Search class | Snapshot | Takeaway |
|---|---|---|
| Indexed literals | ~5.8x faster | Trigram narrowing is doing the heavy lifting |
| Indexed word matches | ~5.6x faster | Whole-word literal shaping stays cheap |
| Indexed alternation | ~2.6x faster | Candidate narrowing plus `build_many` helps a lot |
| Full-scan Unicode | ~1.0x | Near parity overall, but Greek classes still trail |
| Full-scan no-literal regex | ~0.6x | Regex-engine full scans remain the hardest cases |

Fast path takeaways:
- indexed literal, word, suffix-literal, and alternation searches are decisively faster with `sift`
- full-scan Unicode class searches are the main remaining gap versus `rg`
- see `crates/core/benches/README.md` for the benchmark and profiling workflow
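
To make the trigram narrowing behind the indexed results concrete, here is a minimal, self-contained sketch of the general technique. It is not the `sift-core` implementation: `TrigramIndex` and its methods are hypothetical names, the "files" are in-memory strings, and the on-disk format and regex literal extraction are left out. The idea is to record which files contain each 3-byte window, intersect the posting sets for a literal's trigrams, and run the real match only on the surviving candidates.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical toy index: each 3-byte window maps to the ids of the
/// files that contain it.
struct TrigramIndex {
    postings: HashMap<[u8; 3], BTreeSet<usize>>,
    files: Vec<String>,
}

impl TrigramIndex {
    /// Build the index once over all file contents.
    fn build(files: &[&str]) -> Self {
        let mut postings: HashMap<[u8; 3], BTreeSet<usize>> = HashMap::new();
        for (id, text) in files.iter().enumerate() {
            for w in text.as_bytes().windows(3) {
                postings.entry([w[0], w[1], w[2]]).or_default().insert(id);
            }
        }
        TrigramIndex {
            postings,
            files: files.iter().map(|s| s.to_string()).collect(),
        }
    }

    /// Ids of files containing every trigram of `literal`. Returns `None`
    /// when the literal is shorter than three bytes, in which case the
    /// index cannot narrow and the caller must scan everything.
    fn candidates(&self, literal: &str) -> Option<BTreeSet<usize>> {
        let mut result: Option<BTreeSet<usize>> = None;
        for w in literal.as_bytes().windows(3) {
            let ids = self
                .postings
                .get(&[w[0], w[1], w[2]])
                .cloned()
                .unwrap_or_default();
            result = Some(match result {
                None => ids,
                Some(acc) => acc.intersection(&ids).copied().collect(),
            });
        }
        result
    }

    /// Narrow with the index, then verify each candidate with a real
    /// substring search (trigram presence alone can give false positives).
    fn search(&self, literal: &str) -> Vec<usize> {
        let ids: Vec<usize> = match self.candidates(literal) {
            Some(set) => set.into_iter().collect(),
            None => (0..self.files.len()).collect(),
        };
        ids.into_iter()
            .filter(|&id| self.files[id].contains(literal))
            .collect()
    }
}

fn main() {
    let index = TrigramIndex::build(&["fn build_index()", "let x = 1;", "rebuild the index"]);
    println!("{:?}", index.search("build")); // prints [0, 2]
}
```

The fallback branch in `search` mirrors the table's full-scan rows: when a pattern yields no usable literal, every file has to be examined and the index gives no advantage.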

To run the tests and lints locally:

    cargo test --workspace --all-features
    cargo clippy-check   # see `.cargo/config.toml`

CI (GitHub Actions) runs fmt, clippy with `-D warnings`, and test on Linux, macOS, and Windows for pushes and PRs to `main` / `master`; see `.github/workflows/ci.yml`.
