Beating main-memory bandwidths in geospatial pipelines with fast in-memory compression.
The fused compressors have been upgraded to remove redundant stores, and now achieve up to 4x speedups.
src/codecs/generic/*: base codec interfaces (StatefulIntegerCodec, CompositeStatefulIntegerCodec, DirectAccessCodec)
src/codecs/int32/*: codec implementations for int32_t data
src/codecs/int32/codec_collection.h: bundled codec registry (InitCodecs)
Main programs:
bench/bench_comp.cpp: benchmark codecs (compression ratio and speed)
bench/bench_pipeline.cpp: benchmark geospatial pipelines (decode + access transformation)
tests/test_int32_codecs.cpp: test int32 codecs
tests/test_remappings.cpp: verifies Morton and zigzag remappings
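For readers unfamiliar with the remappings that tests/test_remappings.cpp checks, here is a minimal sketch of the two ideas. It is not the code in src/remappings.h. All function names are hypothetical, the Morton code is assumed to be the 2D variant over 16-bit coordinates, and "zigzag" is read as the signed-to-unsigned integer mapping. That last reading is an assumption: the repository may instead mean a zigzag traversal order.

```cpp
#include <cstdint>

// Zigzag mapping (assumed meaning): interleave negative and non-negative
// values so small magnitudes get small unsigned codes:
// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
constexpr std::uint32_t zigzag_encode(std::int32_t v) {
    // v >> 31 is an arithmetic shift: all ones for negative v, zero otherwise.
    return (static_cast<std::uint32_t>(v) << 1) ^ static_cast<std::uint32_t>(v >> 31);
}

constexpr std::int32_t zigzag_decode(std::uint32_t u) {
    // 0u - (u & 1u) is all ones when the low bit is set, zero otherwise.
    return static_cast<std::int32_t>((u >> 1) ^ (0u - (u & 1u)));
}

// Spread the low 16 bits of v so that bit i moves to bit 2*i.
constexpr std::uint32_t spread_bits16(std::uint32_t v) {
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Inverse of spread_bits16: gather the even-position bits into the low 16 bits.
constexpr std::uint32_t compact_bits16(std::uint32_t v) {
    v &= 0x55555555u;
    v = (v | (v >> 1)) & 0x33333333u;
    v = (v | (v >> 2)) & 0x0F0F0F0Fu;
    v = (v | (v >> 4)) & 0x00FF00FFu;
    v = (v | (v >> 8)) & 0x0000FFFFu;
    return v;
}

// 2D Morton (Z-order) code: x occupies the even bits, y the odd bits.
constexpr std::uint32_t morton_encode(std::uint16_t x, std::uint16_t y) {
    return spread_bits16(x) | (spread_bits16(y) << 1);
}

constexpr std::uint16_t morton_decode_x(std::uint32_t code) {
    return static_cast<std::uint16_t>(compact_bits16(code));
}

constexpr std::uint16_t morton_decode_y(std::uint32_t code) {
    return static_cast<std::uint16_t>(compact_bits16(code >> 1));
}
```

With these definitions, zigzag_encode(-1) is 1 and morton_encode(3, 5) is 39. Both mappings are invertible, which is the property a remapping test would normally check.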
Additional files:
src/util.h, src/transformations.h, src/remappings.h: C++ utilities
src/bench_utils.h: shared benchmark helpers (access transformations, RunningStats, GDAL block sampling)
bench/bench_gdal_utils.h: GDAL raster I/O helpers
py/*: Python utilities
sh/*: shell performance-monitoring utilities
We assume a Linux environment with GCC 13+ (C++23) and CMake 3.20+.
- Install packages with apt-get (you may need more; debug as needed):
    g++ libgdal-dev python3-gdal liblz4-dev libzstd-dev zlib1g-dev liblzma-dev
- Obtain the submodules in external/ and build them:
    git submodule update --init --recursive
    # build FastPFor
    cmake -S external/FastPFor -B external/FastPFor/build && cmake --build external/FastPFor/build
    # build simdcomp
    make -C external/simdcomp
    # build MaskedVByte, StreamVByte, TurboPFor, FrameOfReference similarly
- Configure and build:
    cmake -B build
    cmake --build build
Binaries are placed in build/: bench_comp, bench_pipeline, test_comp, test_remappings.
To use the fused codec variants (SimdCompFusedCodec, FastPForFusedCodec), which write a decode-time sum into the overflow buffer:
- Re-build external/FastPFor and external/simdcomp from these forks:
  - FastPFor: https://github.com/omarathon/FastPFor
  - simdcomp: https://github.com/omarathon/simdcomp
- Rebuild:
    cmake --build build
The fused variants are registered alongside the originals in src/codecs/int32/codec_collection.h and are available under the names simdcomp_fused and FastPFor_fused_<codec>.
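To illustrate why fusing an aggregate into the decode loop can remove stores, here is a minimal sketch on a toy codec. It is not the code in the forks above. The function names and the "base plus unsigned offset" format are hypothetical, and the sum is simply returned rather than written into an overflow buffer.

```cpp
#include <cstddef>
#include <cstdint>

// Toy format (assumption): each value is stored as an unsigned offset from `base`.

// Unfused: decode every value into `out`, then re-read `out` in a second pass
// to compute the sum. Each element is stored once and loaded again.
constexpr std::int64_t decode_then_sum(const std::uint32_t* packed, std::size_t n,
                                       std::int32_t base, std::int32_t* out) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = static_cast<std::int32_t>(static_cast<std::int64_t>(base) + packed[i]);
    }
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += out[i];
    }
    return sum;
}

// Fused: accumulate while decoding. The decoded values never leave registers,
// so the output stores and the second pass over memory disappear.
constexpr std::int64_t decode_sum_fused(const std::uint32_t* packed, std::size_t n,
                                        std::int32_t base) {
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += static_cast<std::int64_t>(base) + packed[i];
    }
    return sum;
}
```

Both functions return the same sum as long as every decoded value fits in int32_t. The fused version only helps when the consumer needs the aggregate and not the decoded array.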
To build on an HPC system:
- source hpc/modules.sh
- Adjust CMakeLists.txt for the HPC compiler/flags as needed (see hpc/ for reference)
- cmake -B build && cmake --build build
Licence:
- MIT for all files in src/codecs/, except the TurboPFor wrapper (turbopfor_codecs.h) and LZ4 wrapper (lz4_codecs.h), which are GPL.
- GPL for everything else.
Full repo/data/report on request
