Added planar types to speed up complex half precision GEMMs #1142
cliffburdick wants to merge 5 commits into main
Greptile Summary
This PR introduces
Key observations:
Confidence Score: 3/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant User
participant SetOp
participant PlanarTensor as tensor_t PlanarType
participant PlanarProxy as PlanarComplexProxy
participant MatMulCUDA
participant cuBLAS
Note over User,cuBLAS: Element-wise assign to planar output
User->>SetOp: run(exec)
SetOp->>SetOp: _internal_mapply(i,j) const with mutable out_
SetOp->>PlanarTensor: out_.operator() non-const via mutable
PlanarTensor-->>PlanarProxy: PlanarComplexProxy{this, offset}
SetOp->>PlanarProxy: proxy = get_value(op_, i, j)
PlanarProxy->>PlanarTensor: StorePlanarComplex(offset, val)
PlanarTensor->>PlanarTensor: base[offset]=real, base[offset+N]=imag
Note over User,cuBLAS: GEMM with pre-planar inputs
User->>MatMulCUDA: Execute(a_planar, b_planar, c_planar, stream)
MatMulCUDA->>MatMulCUDA: a_is_planar=true, skip conversion
MatMulCUDA->>MatMulCUDA: b_is_planar=true, skip conversion
MatMulCUDA->>MatMulCUDA: c_is_planar=true, c_adj.Reset(c.Data())
MatMulCUDA->>MatMulCUDA: params.ldc = c.Size(RANK-1)
MatMulCUDA->>cuBLAS: cublasGemmEx CUDA_C_16F planar pointers
cuBLAS-->>MatMulCUDA: writes planar result to c.Data() directly
MatMulCUDA->>MatMulCUDA: c_is_planar=true, skip interleaved conversion
MatMulCUDA-->>User: done
Note over User,cuBLAS: Read back planar to interleaved
User->>SetOp: run(exec) for c_interleaved = c_planar
SetOp->>PlanarTensor: get_value(c_planar, i, j) const
PlanarTensor->>PlanarTensor: LoadPlanarComplex(offset)
PlanarTensor-->>SetOp: T with real=base[off], imag=base[off+N]
SetOp->>SetOp: store to c_interleaved(i,j)
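The store and load steps in the diagram (`base[offset]=real, base[offset+N]=imag` on write, the reverse on read) can be sketched in plain host C++. This is only an illustration of the planar layout and the proxy pattern, not the PR's implementation: `PlanarBuffer`, `Proxy`, `Store`, `Load`, and `ToInterleaved` are hypothetical names, the buffer is 1-D, and `float` stands in for half precision because standard C++ has no portable half type.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a planar complex tensor:
// N real parts stored first, followed by N imaginary parts.
class PlanarBuffer {
public:
  explicit PlanarBuffer(std::size_t n) : n_(n), base_(2 * n, 0.0f) {}

  // Proxy handed out by non-const element access. Assigning a complex
  // value to it scatters the real and imaginary parts into the two planes.
  class Proxy {
  public:
    Proxy(PlanarBuffer* owner, std::size_t offset)
        : owner_(owner), offset_(offset) {}

    Proxy& operator=(const std::complex<float>& v) {
      owner_->Store(offset_, v);
      return *this;
    }

    // Reading through the proxy gathers both planes into one value.
    operator std::complex<float>() const { return owner_->Load(offset_); }

  private:
    PlanarBuffer* owner_;
    std::size_t offset_;
  };

  // Non-const access returns a proxy; const access returns a value.
  Proxy operator()(std::size_t i) { return Proxy(this, i); }
  std::complex<float> operator()(std::size_t i) const { return Load(i); }

  // base[offset] = real, base[offset + N] = imag
  void Store(std::size_t offset, const std::complex<float>& v) {
    base_[offset] = v.real();
    base_[offset + n_] = v.imag();
  }

  // real = base[offset], imag = base[offset + N]
  std::complex<float> Load(std::size_t offset) const {
    return {base_[offset], base_[offset + n_]};
  }

  std::size_t size() const { return n_; }
  const std::vector<float>& raw() const { return base_; }

private:
  std::size_t n_;
  std::vector<float> base_;
};

// Read-back step: copy a planar buffer into ordinary interleaved complex values.
std::vector<std::complex<float>> ToInterleaved(const PlanarBuffer& p) {
  std::vector<std::complex<float>> out(p.size());
  for (std::size_t i = 0; i < p.size(); ++i) {
    out[i] = p(i);  // const access, so this goes through Load
  }
  return out;
}
```

With a buffer of size 3, `buf(2) = std::complex<float>(-3.0f, 4.0f)` writes `-3` to `raw()[2]` and `4` to `raw()[5]`. The proxy is needed because the two halves of one element are not adjacent in memory, so element access cannot return a plain reference to a complex value.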
/build
No description provided.