A high-performance C++/CUDA library for solving block-structured linear systems, specifically block tridiagonal systems. It provides functionality equivalent to the Julia package BlockStructuredSolvers.jl.
- Multiple Backends: CPU (using BLAS/LAPACK) and CUDA (using cuBLAS/cuSOLVER) support
- Precision Options: Both single (float) and double precision
- Optimized Algorithms: Block Cholesky factorization for symmetric positive definite systems
- Modern C++: Uses C++17 features and modern design patterns
- High Performance: Optimized implementations for each backend
The library implements a block Cholesky factorization for block tridiagonal matrices of the form:
```
[ A₁   B₁   0    0   ⋯ ]
[ B₁ᵀ  A₂   B₂   0   ⋯ ]
[ 0    B₂ᵀ  A₃   B₃  ⋯ ]
[ ⋮    ⋮    ⋱    ⋱   ⋱ ]
```
Where:
- Aᵢ are n×n diagonal blocks (must be symmetric positive definite)
- Bᵢ are n×n off-diagonal blocks
- The system is solved by forward and backward substitution after factorization
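The factorization and the two substitution sweeps can be sketched self-contained for the scalar case n = 1 (each "block" is a single number). This is an illustration of the recurrence only, not the library's implementation; the function name is hypothetical:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Block-Cholesky recurrence for an SPD tridiagonal system, scalar case n = 1:
//   L₁ = chol(A₁);  Cᵢ = Bᵢ · Lᵢ⁻ᵀ;  Lᵢ₊₁ = chol(Aᵢ₊₁ − Cᵢ Cᵢᵀ)
// followed by forward substitution (L y = b) and backward substitution (Lᵀ x = y).
std::vector<double> solve_tridiag_spd(const std::vector<double>& A,  // diagonal entries
                                      const std::vector<double>& B,  // off-diagonal entries
                                      std::vector<double> b) {       // right-hand side
    const std::size_t N = A.size();  // assumes N >= 1
    std::vector<double> L(N), Loff(N - 1);
    L[0] = std::sqrt(A[0]);                        // factor the first diagonal block
    for (std::size_t i = 0; i + 1 < N; ++i) {
        Loff[i] = B[i] / L[i];                     // subdiagonal factor Bᵢ · Lᵢ⁻ᵀ
        L[i + 1] = std::sqrt(A[i + 1] - Loff[i] * Loff[i]);  // Schur update, then chol
    }
    // Forward substitution: L y = b (overwrites b with y)
    b[0] /= L[0];
    for (std::size_t i = 1; i < N; ++i)
        b[i] = (b[i] - Loff[i - 1] * b[i - 1]) / L[i];
    // Backward substitution: Lᵀ x = y (overwrites y with x)
    b[N - 1] /= L[N - 1];
    for (std::size_t i = N - 1; i-- > 0;)
        b[i] = (b[i] - Loff[i] * b[i + 1]) / L[i];
    return b;
}
```

In the library each scalar operation generalizes to a block operation: the square root becomes a dense Cholesky (POTRF), the division a triangular solve (TRSM), and the multiply-subtract a matrix product (GEMM), so the whole solve remains O(N) in the number of blocks.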
- C++17 compatible compiler (GCC 7+, Clang 5+, MSVC 2017+)
- CMake 3.18+
- BLAS library (OpenBLAS, Intel MKL, or system BLAS)
- LAPACK library
For the CUDA backend:
- NVIDIA CUDA Toolkit 11.0+
- cuBLAS and cuSOLVER libraries
- CUDA-capable GPU with compute capability 6.0+
- Clone the repository:

```bash
git clone <repository-url>
cd BlockStructuredSolvers_CPP
```

- Create a build directory:

```bash
mkdir build && cd build
```

- Configure with CMake:

```bash
# For CPU-only build
cmake .. -DCMAKE_BUILD_TYPE=Release

# For CUDA build
cmake .. -DCMAKE_BUILD_TYPE=Release -DENABLE_CUDA=ON
```

- Build:

```bash
make -j$(nproc)
```

- Run tests:

```bash
ctest
```

Basic CPU usage:

```cpp
#include <vector>

#include "block_structured_solvers.hpp"

using namespace block_structured_solvers;

int main() {
    const int N = 10;  // Number of blocks
    const int n = 4;   // Size of each block

    // Create CPU solver
    auto solver = create_solver_double(Backend::CPU, N, n);

    // Set up diagonal blocks (must be symmetric positive definite)
    for (int i = 0; i < N; ++i) {
        auto block = create_random_positive_definite_matrix<double>(n, 42 + i);
        solver->set_diagonal_block(i, block);
    }

    // Set up off-diagonal blocks
    for (int i = 0; i < N - 1; ++i) {
        std::vector<std::vector<double>> block(n, std::vector<double>(n, 0.1));
        solver->set_offdiagonal_block(i, block);
    }

    // Create right-hand side
    std::vector<std::vector<double>> rhs(N, std::vector<double>(n, 1.0));

    // Factorize and solve
    solver->factorize();
    std::vector<std::vector<double>> solution;
    solver->solve(rhs, solution);

    return 0;
}
```

Basic CUDA usage:

```cpp
#include <iostream>
#include <vector>

#include "block_structured_solvers.hpp"

using namespace block_structured_solvers;

int main() {
    const int N = 20;  // Number of blocks
    const int n = 8;   // Size of each block

    try {
        // Create CUDA solver
        auto solver = create_solver_double(Backend::CUDA, N, n);

        // Set up system (same as CPU example)
        // ... setup code ...

        // Factorize and solve on GPU
        std::vector<std::vector<double>> rhs(N, std::vector<double>(n, 1.0));
        solver->factorize();
        std::vector<std::vector<double>> solution;
        solver->solve(rhs, solution);
    } catch (const std::exception& e) {
        std::cerr << "CUDA error: " << e.what() << std::endl;
    }
    return 0;
}
```

Solver creation:

```cpp
// Create solver with specific type
template<typename T>
std::unique_ptr<BlockTriDiagData<T>> create_solver(Backend backend, int N, int n);

// Convenience functions
std::unique_ptr<BlockTriDiagData<float>> create_solver_float(Backend backend, int N, int n);
std::unique_ptr<BlockTriDiagData<double>> create_solver_double(Backend backend, int N, int n);
```

Solver interface:

```cpp
template<typename T>
class BlockTriDiagData {
public:
    // Set diagonal block i to the given matrix
    virtual void set_diagonal_block(int i, const std::vector<std::vector<T>>& block) = 0;

    // Set off-diagonal block i to the given matrix
    virtual void set_offdiagonal_block(int i, const std::vector<std::vector<T>>& block) = 0;

    // Perform block Cholesky factorization
    virtual void factorize() = 0;

    // Solve the system Ax = b
    virtual void solve(const std::vector<std::vector<T>>& rhs,
                       std::vector<std::vector<T>>& solution) = 0;
};
```

Matrix utilities:

```cpp
// Create identity matrix
template<typename T>
std::vector<std::vector<T>> create_identity_matrix(int n);

// Create random positive definite matrix
template<typename T>
std::vector<std::vector<T>> create_random_positive_definite_matrix(int n, unsigned int seed = 42);

// Print matrix to console
template<typename T>
void print_matrix(const std::vector<std::vector<T>>& matrix);
```

The library is optimized for performance on both CPU and GPU:
- CPU Implementation: Uses optimized BLAS/LAPACK routines (GEMM, TRSM, POTRF)
- CUDA Implementation: Uses cuBLAS and cuSOLVER for maximum GPU performance
- Memory Layout: Optimized data layouts for each backend
- Error Handling: Comprehensive error checking with meaningful messages
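The routines named above (POTRF, TRSM, GEMM) map onto the block factorization one step at a time. The following is a hedged outline using standard LAPACK/BLAS routine names; the library's actual call sequence may differ:

```text
for i = 1 .. N-1:
    POTRF(A_i)                 # A_i := L_i, Cholesky factor of diagonal block i
    TRSM(L_i, B_i)             # C_i := B_iᵀ L_i⁻ᵀ, subdiagonal factor
    GEMM/SYRK(A_{i+1}, C_i)    # A_{i+1} := A_{i+1} − C_i C_iᵀ, Schur-complement update
POTRF(A_N)                     # factor the last diagonal block
```

On the CUDA backend the same sequence uses the cuSOLVER/cuBLAS counterparts of these routines.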
- Problem Size: GPUs typically perform better with larger block sizes (n ≥ 16)
- Precision: Single precision is ~2x faster than double precision on most GPUs
- Memory: Pre-allocate and reuse solver objects when solving multiple systems
- Batch Size: For multiple systems, consider using batched operations
The repository includes several examples and benchmarks:
- examples/example_cpu.cpp: Basic CPU usage
- examples/example_cuda.cpp: Basic CUDA usage
- benchmarks/benchmark.cpp: Performance comparison between CPU and CUDA
- tests/test_solver.cpp: Unit tests for correctness
To run the examples:
```bash
./example_cpu
./example_cuda
./benchmark
```

This C++/CUDA implementation provides equivalent functionality to the Julia BlockStructuredSolvers.jl package:
| Feature | Julia Version | C++/CUDA Version |
|---|---|---|
| CPU Backend | ✅ BLAS/LAPACK | ✅ BLAS/LAPACK |
| CUDA Backend | ✅ CUDA.jl | ✅ cuBLAS/cuSOLVER |
| Precision | Float32/Float64 | float/double |
| Algorithm | Block Cholesky | Block Cholesky |
| Performance | High | Comparable/Better |
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this library in your research, please cite:
```bibtex
@software{BlockStructuredSolvers_CPP,
  author = {David Jin and Alexis Montoison and Sungho Shin},
  title  = {BlockStructuredSolvers_CPP: A High-Performance C++/CUDA Library for Block-Structured Linear Systems},
  year   = {2024},
  url    = {https://github.com/username/BlockStructuredSolvers_CPP}
}
```