
BlockStructuredSolvers_CPP

A high-performance C++/CUDA library for solving block-structured linear systems, specifically block tridiagonal matrices. This is a C++/CUDA implementation equivalent to the Julia BlockStructuredSolvers.jl package.

Features

  • Multiple Backends: CPU (using BLAS/LAPACK) and CUDA (using cuBLAS/cuSOLVER) support
  • Precision Options: Both single (float) and double precision
  • Optimized Algorithms: Block Cholesky factorization for symmetric positive definite systems
  • Modern C++: Uses C++17 features and modern design patterns
  • High Performance: Optimized implementations for each backend

Algorithm

The library implements a block Cholesky factorization for block tridiagonal matrices of the form:

[A₁  B₁   0   0  ⋯ ]
[B₁ᵀ A₂  B₂   0  ⋯ ]
[ 0  B₂ᵀ A₃  B₃  ⋯ ]
[ ⋮   ⋮   ⋱   ⋱   ⋱ ]

Where:

  • Aᵢ are n×n diagonal blocks (must be symmetric positive definite)
  • Bᵢ are n×n off-diagonal blocks
  • The system is solved using forward and backward substitution after factorization
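The factor-and-solve sweep is easiest to see in the scalar case (n = 1), where the block recurrence Lᵢ₊₁Lᵢ₊₁ᵀ = Aᵢ₊₁ − CᵢCᵢᵀ with Cᵢ = BᵢᵀLᵢ⁻ᵀ reduces to the classic Cholesky tridiagonal solve. A minimal self-contained sketch of that specialization (illustrative only — `tridiag_cholesky_solve` is not part of the library API):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Scalar (n = 1) specialization of the block Cholesky sweep:
//   L1 = sqrt(A1);  Ci = Bi / Li;  L(i+1) = sqrt(A(i+1) - Ci^2)
// followed by forward substitution (L y = b) and backward substitution (L^T x = y).
std::vector<double> tridiag_cholesky_solve(const std::vector<double>& A,  // diagonal entries
                                           const std::vector<double>& B,  // off-diagonal entries
                                           std::vector<double> b) {       // right-hand side
    const std::size_t N = A.size();
    std::vector<double> L(N), C(N > 0 ? N - 1 : 0);
    L[0] = std::sqrt(A[0]);
    for (std::size_t i = 0; i + 1 < N; ++i) {
        C[i] = B[i] / L[i];                            // Ci = Bi * Li^{-T}
        L[i + 1] = std::sqrt(A[i + 1] - C[i] * C[i]);  // Schur-complement update
    }
    // Forward substitution: solve L y = b (y overwrites b)
    b[0] /= L[0];
    for (std::size_t i = 1; i < N; ++i)
        b[i] = (b[i] - C[i - 1] * b[i - 1]) / L[i];
    // Backward substitution: solve L^T x = y (x overwrites b)
    b[N - 1] /= L[N - 1];
    for (std::size_t i = N - 1; i-- > 0; )
        b[i] = (b[i] - C[i] * b[i + 1]) / L[i];
    return b;
}
```

In the block version, each scalar division becomes a triangular solve (TRSM) and each square root becomes a dense Cholesky factorization (POTRF) of an n×n block, which is where the BLAS/LAPACK and cuBLAS/cuSOLVER backends do their work.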

Requirements

CPU Version

  • C++17 compatible compiler (GCC 7+, Clang 5+, MSVC 2017+)
  • CMake 3.18+
  • BLAS library (OpenBLAS, Intel MKL, or system BLAS)
  • LAPACK library

CUDA Version

  • NVIDIA CUDA Toolkit 11.0+
  • cuBLAS and cuSOLVER libraries
  • CUDA-capable GPU with compute capability 6.0+

Installation

  1. Clone the repository:

     git clone <repository-url>
     cd BlockStructuredSolvers_CPP

  2. Create a build directory:

     mkdir build && cd build

  3. Configure with CMake:

     # For CPU-only build
     cmake .. -DCMAKE_BUILD_TYPE=Release

     # For CUDA build
     cmake .. -DCMAKE_BUILD_TYPE=Release -DENABLE_CUDA=ON

  4. Build:

     make -j$(nproc)

  5. Run tests:

     ctest

Quick Start

CPU Example

#include <vector>

#include "block_structured_solvers.hpp"
using namespace block_structured_solvers;

int main() {
    const int N = 10;  // Number of blocks
    const int n = 4;   // Size of each block
    
    // Create CPU solver
    auto solver = create_solver_double(Backend::CPU, N, n);
    
    // Set up diagonal blocks (must be symmetric positive definite)
    for (int i = 0; i < N; ++i) {
        auto block = create_random_positive_definite_matrix<double>(n, 42 + i);
        solver->set_diagonal_block(i, block);
    }
    
    // Set up off-diagonal blocks
    for (int i = 0; i < N - 1; ++i) {
        std::vector<std::vector<double>> block(n, std::vector<double>(n, 0.1));
        solver->set_offdiagonal_block(i, block);
    }
    
    // Create right-hand side
    std::vector<std::vector<double>> rhs(N, std::vector<double>(n, 1.0));
    
    // Factorize and solve
    solver->factorize();
    std::vector<std::vector<double>> solution;
    solver->solve(rhs, solution);
    
    return 0;
}

CUDA Example

#include <iostream>
#include <vector>

#include "block_structured_solvers.hpp"
using namespace block_structured_solvers;

int main() {
    const int N = 20;  // Number of blocks
    const int n = 8;   // Size of each block
    
    try {
        // Create CUDA solver
        auto solver = create_solver_double(Backend::CUDA, N, n);
        
        // Set up system (same as CPU example)
        // ... setup code ...
        std::vector<std::vector<double>> rhs(N, std::vector<double>(n, 1.0));
        
        // Factorize and solve on GPU
        solver->factorize();
        std::vector<std::vector<double>> solution;
        solver->solve(rhs, solution);
        
    } catch (const std::exception& e) {
        std::cerr << "CUDA error: " << e.what() << std::endl;
        return 1;
    }
    
    return 0;
}

API Reference

Factory Functions

// Create solver with specific type
template<typename T>
std::unique_ptr<BlockTriDiagData<T>> create_solver(Backend backend, int N, int n);

// Convenience functions
std::unique_ptr<BlockTriDiagData<float>> create_solver_float(Backend backend, int N, int n);
std::unique_ptr<BlockTriDiagData<double>> create_solver_double(Backend backend, int N, int n);

Core Interface

template<typename T>
class BlockTriDiagData {
public:
    // Set diagonal block i to the given matrix
    virtual void set_diagonal_block(int i, const std::vector<std::vector<T>>& block) = 0;
    
    // Set off-diagonal block i to the given matrix  
    virtual void set_offdiagonal_block(int i, const std::vector<std::vector<T>>& block) = 0;
    
    // Perform block Cholesky factorization
    virtual void factorize() = 0;
    
    // Solve the system Ax = b
    virtual void solve(const std::vector<std::vector<T>>& rhs, 
                      std::vector<std::vector<T>>& solution) = 0;
};

Utility Functions

// Create identity matrix
template<typename T>
std::vector<std::vector<T>> create_identity_matrix(int n);

// Create random positive definite matrix
template<typename T>
std::vector<std::vector<T>> create_random_positive_definite_matrix(int n, unsigned int seed = 42);

// Print matrix to console
template<typename T>
void print_matrix(const std::vector<std::vector<T>>& matrix);
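A common way to build a random symmetric positive definite matrix — an assumption for illustration, not necessarily the library's internal construction — is S = MᵀM + nI: MᵀM is symmetric positive semidefinite, and the diagonal shift makes it strictly positive definite and well-conditioned for Cholesky. A self-contained sketch (`make_random_spd` is a hypothetical stand-in for `create_random_positive_definite_matrix`):

```cpp
#include <random>
#include <vector>

// Build S = M^T M + n*I from a random matrix M with entries in [-1, 1].
// M^T M is symmetric positive semidefinite; the n*I shift guarantees
// strict positive definiteness, so Cholesky factorization cannot fail.
template <typename T>
std::vector<std::vector<T>> make_random_spd(int n, unsigned int seed = 42) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<T> dist(T(-1), T(1));
    std::vector<std::vector<T>> M(n, std::vector<T>(n));
    for (auto& row : M)
        for (auto& x : row) x = dist(gen);
    std::vector<std::vector<T>> S(n, std::vector<T>(n, T(0)));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            for (int k = 0; k < n; ++k)
                S[i][j] += M[k][i] * M[k][j];  // (M^T M)_ij
            if (i == j) S[i][j] += T(n);       // diagonal shift
        }
    return S;
}
```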

Performance

The library is optimized for performance on both CPU and GPU:

  • CPU Implementation: Uses optimized BLAS/LAPACK routines (GEMM, TRSM, POTRF)
  • CUDA Implementation: Uses cuBLAS and cuSOLVER for maximum GPU performance
  • Memory Layout: Optimized data layouts for each backend
  • Error Handling: Comprehensive error checking with meaningful messages

Performance Tips

  1. Problem Size: GPUs typically perform better with larger block sizes (n ≥ 16)
  2. Precision: Single precision is ~2x faster than double precision on most GPUs
  3. Memory: Pre-allocate and reuse solver objects when solving multiple systems
  4. Batch Size: For multiple systems, consider using batched operations

Examples and Benchmarks

The repository includes several examples and benchmarks:

  • examples/example_cpu.cpp: Basic CPU usage
  • examples/example_cuda.cpp: Basic CUDA usage
  • benchmarks/benchmark.cpp: Performance comparison between CPU and CUDA
  • tests/test_solver.cpp: Unit tests for correctness

To run the examples:

./example_cpu
./example_cuda
./benchmark

Comparison with Julia Version

This C++/CUDA implementation provides equivalent functionality to the Julia BlockStructuredSolvers.jl package:

| Feature      | Julia Version   | C++/CUDA Version   |
|--------------|-----------------|--------------------|
| CPU Backend  | ✅ BLAS/LAPACK  | ✅ BLAS/LAPACK     |
| CUDA Backend | ✅ CUDA.jl      | ✅ cuBLAS/cuSOLVER |
| Precision    | Float32/Float64 | float/double       |
| Algorithm    | Block Cholesky  | Block Cholesky     |
| Performance  | High            | Comparable/Better  |

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this library in your research, please cite:

@software{BlockStructuredSolvers_CPP,
  author = {David Jin and Alexis Montoison and Sungho Shin},
  title = {BlockStructuredSolvers_CPP: A High-Performance C++/CUDA Library for Block-Structured Linear Systems},
  year = {2024},
  url = {https://github.com/username/BlockStructuredSolvers_CPP}
}
