`sux`

A pure Rust implementation of succinct and compressed data structures.

This crate started as part of the Sux project; it contains also code ported from the DSI Utilities and new structures.

Highlights

Support for 32-bit and 64-bit architectures;
bit vectors and bit-field vectors;
several structures for rank and selection with different tradeoffs;
indexed dictionaries, including an implementation of the popular Elias–Fano representation of monotone sequences, and lists of strings compressed by prefix omission.
new state-of-the-art structures for static functions and static filters, scaling to trillions of keys, and providing very fast queries;
support for signed index functions;
monotone minimal perfect hash functions based on longest common prefixes;
partial arrays, that is, “arrays with holes”, implemented using ranking or Elias–Fano;
lists (e.g., compressed lists of integers).

The focus is on performance (e.g., there are unchecked versions of all methods and support for unaligned access) and on flexible composability (e.g., you can fine-tune your EliasFano instance by choosing different types of internal indices, and whether to index zeros or ones). Whenever possible, there are mapping methods that replace an underlying structure with another one, provided it is compatible.

This crate does not provide high-level genericity on bit vectors: bit vectors operations on words of type W are based on a combination of the BitLength trait, which provides the bit length, and on AsRef<W>/AsMut<W>, which provide access to the underlying data. This approach makes it possible to use any structure that implements these traits as a bit vector, and to implement your own bit vector if you need specific features (e.g., support for unaligned access).

To make rank/select structures composable, we parameterize them with a backend that needs to implement the Backend trait, which provides only the backend word as an associated type Word. Implementations can then use BitLength and AsRef<Self::Word>/AsMut<Self::Word> to access the backend using bit operations. Bit vectors and structures delegate these traits to their backend, so you can use any structure that implements the Backend trait as a backend for (further) rank/select structures.

ε-serde Support

All structures in this crate are designed to work well with ε-serde: in particular, once you have created and serialized them, you can easily map them into memory or load them in memory regions with specific mmap() attributes. Support for ε-serde is provided by the feature epserde, and support for memory mapping in ε-serde is provided by the mmap feature.

serde Support

All structures in this crate support serialization with serde. Support is gated by the feature serde.

Slice by Value Support

Wherever possible, we support the “slice by value” traits from the value-traits crate, which make it possible to treat in a manner similar to slices structures such as bit-field vectors or succinct representations.

`MemDbg`/`MemSize` Support

All structures in this crate support the MemDbg and MemSize traits from the mem_dbg crate, which provide convenient facilities for inspecting memory usage and debugging memory-related issues. For example, this is the output of mem_dbg() on an EliasFano instance:

11.15 MB 100.00% ⏺: sux::dict::elias_fano::EliasFano<sux::rank_sel::select_zero_adapt_const::SelectZeroAdaptConst<sux::rank_sel::select_adapt_const::SelectAdaptConst<sux::bits::bit_vec::BitVec<alloc::boxed::Box<[usize]>>>>>
7.500 MB  67.24% ├╴low_bits: sux::bits::bit_field_vec::BitFieldVec<usize, alloc::boxed::Box<[usize]>>
7.500 MB  67.24% │ ├╴bits: alloc::boxed::Box<[usize]>
    8  B   0.00% │ ├╴bit_width: usize
    8  B   0.00% │ ├╴mask: usize
    8  B   0.00% │ ╰╴len: usize
3.654 MB  32.76% ├╴high_bits: sux::rank_sel::select_zero_adapt_const::SelectZeroAdaptConst<sux::rank_sel::select_adapt_const::SelectAdaptConst<sux::bits::bit_vec::BitVec<alloc::boxed::Box<[usize]>>>>
3.379 MB  30.29% │ ├╴bits: sux::rank_sel::select_adapt_const::SelectAdaptConst<sux::bits::bit_vec::BitVec<alloc::boxed::Box<[usize]>>>
3.203 MB  28.72% │ │ ├╴bits: sux::bits::bit_vec::BitVec<alloc::boxed::Box<[usize]>>
3.203 MB  28.72% │ │ │ ├╴bits: alloc::boxed::Box<[usize]>
    8  B   0.00% │ │ │ ╰╴len: usize
175.8 kB   1.58% │ │ ├╴inventory: alloc::boxed::Box<[usize]>
   16  B   0.00% │ │ ╰╴spill: alloc::boxed::Box<[usize]>
274.7 kB   2.46% │ ├╴inventory: alloc::boxed::Box<[usize]>
   16  B   0.00% │ ╰╴spill: alloc::boxed::Box<[usize]>
    8  B   0.00% ├╴n: usize
    8  B   0.00% ├╴u: usize
    8  B   0.00% ╰╴l: usize

Binaries

A few binaries make it possible to build and serialize structures with ε-serde (e.g., rcl, vfunc, and vfilter). Moreover, there are examples benchmarking the structures (e.g., bench_rear_coded_list, bench_vfunc, and bench_vfilter). You have to use the feature cli to build them.

Features

The crate has the following features:

rayon: enables support for parallel iterators using the rayon crate (default);
flate2: enables support for gzip-compressed files in the lenders module (default);
zstd: enables support for zstd-compressed files in the lenders module (default);
deko: enables support for the deko crate, which provides dynamic detection of compressed files for the lenders module;
epserde: enables support for ε-serde;
serde: enables support for serde;
clap: enables the clap crate for command-line argument parsing;
cli: builds the binaries (implies clap, epserde, deko);
mmap: enables support for memory mapping in ε-serde (implies epserde);
aarch64_prefetch: enables prefetch support on aarch64 (requires nightly).

Note: The MemDbg and MemSize traits from the mem_dbg crate are always available as mem_dbg is a required dependency.

Benchmarks

A few benchmarks are available in the benches directory. The ones starting with bench_ can be just run as usual; for example,

cargo bench --bench bench_vfunc

The sux benchmark, which tests rank and select structures, is instead a CLI command with options. Try

cargo bench --bench sux -- --help

to see the available tests. For example, with

cargo bench --bench sux -- Rank9 -d 0.5 -l 100000,1000000,10000000

you can test the Rank9 structure with a density of 0.5 on a few bit sizes. Afterwards, you can generate an SVG plot and CSV data in the plots directory with

./python/plot_benches.py --benches-path ./target/criterion/ --plot-dir plots

You can then open the plots/plot.svg with a browser to see the results, or inspect the directory csv for CSV data. Note that as you run benchmarks, the results will cumulate in the target/criterion directory, so you can generate plots for multiple runs.

By specifying multiple structures (using also substring matching), you can compare the behavior of different structures. For example,

cargo bench --bench sux -- SelectSmall SelectAdapt0 -d 0.5 -l 100000,1000000,10000000

will test all variants of SelectSmall against a SelectAdapt with one (2⁰) u64 per subinventory. The plot will highlight the differences in performance:

./python/plot_benches.py --benches-path ./target/criterion/ --plot-dir plots

Acknowledgments

This software has been partially supported by project SERICS (PE00000014) under the NRRP MUR program funded by the EU - NGEU, and by project ANR COREGRAPHIE, grant ANR-20-CE23-0002 of the French Agence Nationale de la Recherche. Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union or the Italian MUR. Neither the European Union nor the Italian MUR can be held responsible for them.

Name		Name	Last commit message	Last commit date
Latest commit History 1,593 Commits
.cargo		.cargo
.github/workflows		.github/workflows
benches		benches
examples		examples
fuzz		fuzz
python		python
src		src
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
Cargo.toml		Cargo.toml
LICENSE-Apache-2.0		LICENSE-Apache-2.0
LICENSE-LGPL-2.1-or-later		LICENSE-LGPL-2.1-or-later
README.md		README.md
rustfmt.toml		rustfmt.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

`sux`

Highlights

ε-serde Support

serde Support

Slice by Value Support

`MemDbg`/`MemSize` Support

Binaries

Features

Benchmarks

Acknowledgments

About

Licenses found

Uh oh!

Releases 27

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

sux

Highlights

ε-serde Support

serde Support

Slice by Value Support

MemDbg/MemSize Support

Binaries

Features

Benchmarks

Acknowledgments

About

Resources

License

Licenses found

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 27

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

`sux`

`MemDbg`/`MemSize` Support

Packages