Skip to content

[SYCL][InstCombine] Skip GEP canonicalization for JointMatrix types#21676

Draft
zhaomaosu wants to merge 2 commits intointel:syclfrom
zhaomaosu:fix-joint-matrix-gep
Draft

[SYCL][InstCombine] Skip GEP canonicalization for JointMatrix types#21676
zhaomaosu wants to merge 2 commits intointel:syclfrom
zhaomaosu:fix-joint-matrix-gep

Conversation

@zhaomaosu
Copy link
Copy Markdown
Contributor

GEP canonicalization in visitGetElementPtrInst rewrites single-index GEPs to use an [N x i8] stride based on DL.getTypeAllocSize(). For SPIR-V cooperative matrix types (spirv.CooperativeMatrixKHR), this allocation size is not meaningful — the type is opaque to the data layout — so the canonicalized stride is incorrect and produces invalid IR.

Add an IsMatrixType predicate that recognizes GEP element types that are, and skip the i8-stride canonicalization when this predicate matches.

GEP canonicalization in visitGetElementPtrInst rewrites single-index GEPs
to use an [N x i8] stride based on DL.getTypeAllocSize(). For SPIR-V
cooperative matrix types (spirv.CooperativeMatrixKHR), this allocation
size is not meaningful — the type is opaque to the data layout — so the
canonicalized stride is incorrect and produces invalid IR.

Add an IsMatrixType predicate that recognizes GEP element types that are,
and skip the i8-stride canonicalization when this predicate matches.
@zhaomaosu zhaomaosu requested a review from a team as a code owner April 2, 2026 07:03
@zhaomaosu zhaomaosu requested a review from a team as a code owner April 2, 2026 07:57
// Skip canonicalization for JointMatrix type. DL.getTypeAllocSize() will not
// return the true allocation size, so the canonicalized [N x i8] stride would
// be incorrect.
auto IsMatrixType = [](Type *Ty) -> bool {
Copy link
Copy Markdown
Contributor

@YuriPlyakhin YuriPlyakhin Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we skip canonicalization for any TargetExtType? Would getTypeAllocSize return true allocation size for other target extension types?
I think we might need to submit this fix to upstream LLVM to avoid unnecessary customizations in intel/llvm.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's ok for me to upstream the fix to community, I open this PR mainly for testing purpose.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's ok for me to upstream the fix to community, I open this PR mainly for testing purpose.

In this case you should open draft PR.

Copy link
Copy Markdown
Contributor

@YuriPlyakhin YuriPlyakhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems fix breaks IGC compiler:

Failed Tests (3):
SYCL :: Matrix/SG32/joint_matrix_prefetch.cpp
SYCL :: Matrix/joint_matrix_bf16_fill_k_cache_prefetch.cpp
SYCL :: Matrix/joint_matrix_prefetch.cpp

Copy link
Copy Markdown
Contributor

@YuriPlyakhin YuriPlyakhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another concern is there are other places, where GEP is canonialized, for example:
https://github.com/zhaomaosu/llvm/blob/6fb20d2f5bcc1610959b4747a2ecd1ef58ef7293/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp#L3404

If we prevent canonicalization for TargetExtType in one place, I think we need to prevent it in all places.

Previously we decided to handle this in backend compilers instead.
If we want to review this decision, it might be good idea to coordinate this change with IGC engineers as well and maybe to submit change like that to upstream LLVM.

@zhaomaosu
Copy link
Copy Markdown
Contributor Author

For failed prefetch tests, I dumped the IR before translated to spirv, looks like the problem mainly lies in these instructions:

  %add.ptr.i = getelementptr inbounds nuw [2 x i8], ptr addrspace(4) %0, i64 %mul7.i
  call @CooperativeMatrixPrefetchINTEL(..., ptr addrspace(4) %add.ptr.i, ...)

And I also dumped the IR from good compiler without community change 6ecbc0c96, these instructions should like:

  %add.ptr.i = getelementptr inbounds %"class.sycl::_V1::ext::oneapi::bfloat16", ptr addrspace(1) %_arg_A, i64 %mul7.i, !intel-tbaa !16
  %3 = addrspacecast ptr addrspace(1) %add.ptr.i to ptr addrspace(4)
  call @CooperativeMatrixPrefetchINTEL(..., ptr addrspace(4) %3, ...)

The bfloat16 type is also canonicalized to byte array type [2 x i8], but unlike joint matrix type, bfloat16 has determined size, so the IR should be legal. I'm not familiar with joint matrix spec and how IGC compiler handle this case. @dkhaldi, may I get your comments/suggestions.

BTW, IGC compiler reported error messages like:

error: Failed to resolve matrix prefetch pointer type
in kernel: 'typeinfo name for mult_kernel'
error: backend compiler failed build.

error: undefined reference to `__builtin_spriv_OpJointMatrixPrefetchINTEL_SG16_8x16_i0'
in function: '__builtin_spriv_OpJointMatrixPrefetchINTEL_SG16_8x16_i0' called by kernel: 'typeinfo name for mult_kernel'

@dkhaldi
Copy link
Copy Markdown
Contributor

dkhaldi commented Apr 3, 2026

For failed prefetch tests, I dumped the IR before translated to spirv, looks like the problem mainly lies in these instructions:

  %add.ptr.i = getelementptr inbounds nuw [2 x i8], ptr addrspace(4) %0, i64 %mul7.i
  call @CooperativeMatrixPrefetchINTEL(..., ptr addrspace(4) %add.ptr.i, ...)

And I also dumped the IR from good compiler without community change 6ecbc0c96, these instructions should like:

  %add.ptr.i = getelementptr inbounds %"class.sycl::_V1::ext::oneapi::bfloat16", ptr addrspace(1) %_arg_A, i64 %mul7.i, !intel-tbaa !16
  %3 = addrspacecast ptr addrspace(1) %add.ptr.i to ptr addrspace(4)
  call @CooperativeMatrixPrefetchINTEL(..., ptr addrspace(4) %3, ...)

The bfloat16 type is also canonicalized to byte array type [2 x i8], but unlike joint matrix type, bfloat16 has determined size, so the IR should be legal. I'm not familiar with joint matrix spec and how IGC compiler handle this case. @dkhaldi, may I get your comments/suggestions.

BTW, IGC compiler reported error messages like:

error: Failed to resolve matrix prefetch pointer type
in kernel: 'typeinfo name for mult_kernel'
error: backend compiler failed build.

error: undefined reference to `__builtin_spriv_OpJointMatrixPrefetchINTEL_SG16_8x16_i0'
in function: '__builtin_spriv_OpJointMatrixPrefetchINTEL_SG16_8x16_i0' called by kernel: 'typeinfo name for mult_kernel'

Since you disable GEP canonicalization, why we still see the array of type [2 x i8]?
yes IGC is also looking at this. Basically, IGC feedback is okay if we can disable this in syclos and avoid modifications in both backends.

@zhaomaosu zhaomaosu marked this pull request as draft April 6, 2026 13:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants