To translate Sparse Dialect code into SCF loops, use the following command:
mlir-opt sparse_mttkrp.mlir --sparsification-and-bufferization --sparse-vectorization="vl=4" -o scf_loops.mlir
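For reference, this is a minimal sketch of the kind of sparse-dialect input such a command consumes. The CSR encoding and the SpMV-style kernel are illustrative assumptions, not the actual contents of sparse_mttkrp.mlir:

```mlir
// Illustrative only: a small sparse-dialect program of the shape the
// sparsification pipeline expects. The encoding and kernel are
// assumptions, not the real MTTKRP input.
#CSR = #sparse_tensor.encoding<{
  map = (d0, d1) -> (d0 : dense, d1 : compressed)
}>

func.func @spmv(%A: tensor<16x16xf64, #CSR>,
                %x: tensor<16xf64>,
                %y: tensor<16xf64>) -> tensor<16xf64> {
  // The sparsifier turns this dense-looking linalg op into loops that
  // walk only the stored nonzeros of %A.
  %0 = linalg.matvec
         ins(%A, %x : tensor<16x16xf64, #CSR>, tensor<16xf64>)
         outs(%y : tensor<16xf64>) -> tensor<16xf64>
  return %0 : tensor<16xf64>
}
```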
To lower the generated SCF+Vector loops into the LLVM dialect, use the following command:
mlir-opt scf_loops.mlir \
--canonicalize \
--cse \
--loop-invariant-code-motion \
--lower-vector-mask \
--convert-vector-to-scf \
--canonicalize \
--cse \
--expand-realloc \
--sparse-storage-specifier-to-llvm \
--convert-linalg-to-loops \
--lower-affine \
--canonicalize \
--cse \
--convert-scf-to-cf \
--expand-strided-metadata \
--finalize-memref-to-llvm \
--convert-vector-to-llvm="enable-x86vector=1" \
--convert-math-to-llvm \
--convert-arith-to-llvm \
--convert-func-to-llvm \
--convert-cf-to-llvm \
--reconcile-unrealized-casts \
-o llvm_dialect.mlir
To translate a sparse dialect program directly into LLVM IR use the following command:
mlir-opt sparse_mttkrp.mlir --sparsifier | mlir-translate --mlir-to-llvmir -o mttkrp.ll
clang -O3 mttkrp.ll -L"/home/kabilan/llvm-dev/lib" -lmlir_c_runner_utils -lmlir_runner_utils -Wl,-rpath,"/home/kabilan/llvm-dev/lib" -o mttkrp_benchmark
- Why not use a BlockSparse encoding?
- It's the wrong abstraction for tensors other than 2D. BSR is a 2D matrix format --
(d0 floordiv B, d1 floordiv B, d0 mod B, d1 mod B). There is no block dimension to exploit; the encoding simply doesn't apply.
- It moves the intersection problem rather than solving it. The scalar scf.while co-iteration loop reappears at the block-coordinate level. We still get the same branchy, un-vectorizable pointer chasing, just at a coarser granularity; the fundamental issue is untouched.
- It introduces phantom nonzeros. Coordinates in our .tns file are arbitrary. Forcing them into fixed-size blocks requires storing zeros that don't exist in the original data, inflating memory and bandwidth for no computational gain.
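For concreteness, this is roughly what the BSR map quoted above looks like as a sparse-dialect encoding. A sketch with an assumed block size of 2, not taken from the project:

```mlir
// Sketch of a BSR encoding with 2x2 blocks, matching the
// (d0 floordiv B, d1 floordiv B, d0 mod B, d1 mod B) map above.
// The two trailing dense dimensions are the fixed-size block body
// that every stored block must fill, zeros included.
#BSR = #sparse_tensor.encoding<{
  map = (d0, d1) -> (d0 floordiv 2 : dense,
                     d1 floordiv 2 : compressed,
                     d0 mod 2 : dense,
                     d1 mod 2 : dense)
}>
```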
- What are the conditions under which BlockSparse will be beneficial compared to this vectorization?
For this to pay off, the data must be naturally block-structured. If real nonzeros cluster into BxB tiles with >50% density inside each tile, fill-in is cheap and the dense inner dimensions give the auto-vectorizer free SIMD.
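As a hypothetical sketch of that trade-off: converting arbitrary COO data into BSR is where the fill-in zeros get materialized. The encodings, shapes, and function name below are illustrative, not part of the project:

```mlir
// Hypothetical: converting COO data to BSR materializes every zero
// needed to complete each 2x2 block. With >50% density per tile the
// storage overhead stays below 2x, and the dense inner
// (d0 mod 2, d1 mod 2) dimensions become SIMD-friendly.
#COO = #sparse_tensor.encoding<{
  map = (d0, d1) -> (d0 : compressed(nonunique), d1 : singleton)
}>
#BSR = #sparse_tensor.encoding<{
  map = (d0, d1) -> (d0 floordiv 2 : dense, d1 floordiv 2 : compressed,
                     d0 mod 2 : dense, d1 mod 2 : dense)
}>

func.func @to_bsr(%t: tensor<8x8xf64, #COO>) -> tensor<8x8xf64, #BSR> {
  %0 = sparse_tensor.convert %t
         : tensor<8x8xf64, #COO> to tensor<8x8xf64, #BSR>
  return %0 : tensor<8x8xf64, #BSR>
}
```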
- What are the conditions under which the direct sparse dialect pipeline will be beneficial compared to this vectorization?
cmake -G Ninja .. -DMLIR_DIR=/home/kabilan/llvm-dev/lib/cmake/mlir
ninja
./bin/splyce-opt --splyce="vector-width=8" --debug-only=splyce-vectorize ../test/coiter_scf.mlir -o /dev/null
This guide covers two paths:
- Build LLVM+MLIR from source (required if you don't have an install)
- Use a pre-built LLVM install (faster, skip to "Build the pass")
Tested against LLVM 23.0.0. The pass uses no deprecated APIs, so it should work on any LLVM >= 23.0.0git.
Note: clang and lld are installed to serve as a fast host compiler/linker to bootstrap the MLIR build and to compile the Splyce pass later.
sudo apt-get install -y \
cmake ninja-build clang lld \
python3 python3-pip \
git zlib1g-dev

Homebrew (macOS):

brew install cmake ninja llvm python3

Fedora:

sudo dnf install -y cmake ninja-build clang lld python3 git zlib-devel

(Skip to PATH B if you already have an MLIR install with cmake files)
git clone --depth=1 https://github.com/llvm/llvm-project.git
cd llvm-project

Key configure flags:
- -DLLVM_ENABLE_PROJECTS="mlir" — build MLIR alongside LLVM core
- -DLLVM_TARGETS_TO_BUILD="X86" — only native target; add AArch64/RISCV as needed
- -DLLVM_ENABLE_ASSERTIONS=ON — required for MLIR pattern matching debug output
- -DCMAKE_BUILD_TYPE=Release — use RelWithDebInfo if you want debuggable IR
cmake -S llvm -B build \
-G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DLLVM_ENABLE_PROJECTS="mlir" \
-DLLVM_TARGETS_TO_BUILD="X86" \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DLLVM_INSTALL_UTILS=ON \
-DCMAKE_INSTALL_PREFIX=$HOME/llvm-install

ninja -C build install

export LLVM_INSTALL=$HOME/llvm-install
export PATH=$LLVM_INSTALL/bin:$PATH
cd ..   # back to your workspace root

If your distro ships MLIR cmake files (Ubuntu 22.04+ with llvm-XX-dev):
Ubuntu / Debian:
sudo apt-get install llvm-19-dev mlir-19-tools libmlir-19-dev
export LLVM_INSTALL=/usr/lib/llvm-19
export PATH=$LLVM_INSTALL/bin:$PATH

Homebrew (macOS):
brew install llvm
export LLVM_INSTALL=$(brew --prefix llvm)
export PATH=$LLVM_INSTALL/bin:$PATH

Verify the install:

ls $LLVM_INSTALL/lib/cmake/mlir/MLIRConfig.cmake   # must exist
ls $LLVM_INSTALL/lib/cmake/llvm/LLVMConfig.cmake   # must exist

Clone or enter the project directory. (If you received the files directly, just cd into them.)
cd splyce

cmake -S . -B build \
-G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DMLIR_DIR=$LLVM_INSTALL/lib/cmake/mlir \
-DLLVM_DIR=$LLVM_INSTALL/lib/cmake/llvm

ninja -C build
# Verify the tool was built.
./build/bin/splyce-opt --help | grep splyce
# Expected output:
#   --splyce - Vectorize scf.while co-iteration loops using Vector dialect

./build/bin/splyce-opt \
"--splyce=target-function=mttkrp_kernel" \
--debug-only=splyce-vectorize \
../../playground/sparse_dialect/mttkrp_scf.mlir

Remove the --debug-only=splyce-vectorize flag to suppress debug output:
./build/bin/splyce-opt \
"--splyce=target-function=mttkrp_kernel" \
--debug-only=splyce-vectorize \
../../playground/sparse_dialect/mttkrp_scf.mlir \
-o /tmp/coiter_vec_out.mlir
cat /tmp/coiter_vec_out.mlir

All options together:

./build/bin/splyce-opt --splyce="vector-width=8 min-density=0.1 runtime-density-threshold=0.5 target-function=mttkrp_kernel" input.mlir