
[AIMIGRAPHX-885] Add Relaxed Check for Concat fusions #4724

Draft
TedThemistokleous wants to merge 1 commit into develop from large_embedding_concats

Conversation

@TedThemistokleous
Collaborator

Motivation

Modified fuse_concats to allow larger inputs to be fused by relaxing the overhead check, which will result in a concat with many noop inputs. Since we're doing a read in most of these cases, these operations should be zero copy.

Technical Details

In some cases for prediction models we concat many inputs. Our concat fusion misses these because there is a large number of noop inputs.

Adding the additional short circuit here allows us to still perform a concat fusion for very large concats (100+ inputs). This is relevant when many inputs come from slice operations reading from gather table embeddings.

Changelog Category

Add a CHANGELOG.md entry for any option other than Not Applicable

    • Added: New functionality.
    • Changed: Changes to existing functionality.
    • Removed: Functionality or support that has been removed. (Compared to a previous release)
    • Optimized: Component performance that has been optimized or improved.
    • Resolved Issues: Known issues from a previous version that have been resolved.
    • Not Applicable: This PR is not to be included in the changelog.

…copies to allow for point wise ops to be fused along with the noops and concats
@TedThemistokleous TedThemistokleous self-assigned this Mar 31, 2026
@TedThemistokleous TedThemistokleous added simple small or simple changes Perf Improve labels Mar 31, 2026
@TedThemistokleous TedThemistokleous changed the title AIMIGRAPHX-885 Add Relaxed Check for Concat fusions [AIMIGRAPHX-885] Add Relaxed Check for Concat fusions Mar 31, 2026
@pfultz2
Collaborator

pfultz2 commented Mar 31, 2026

this is relevant when many inputs are from slice operations when reading from gather table embeddings.

It seems like there is a better operator to be used here. The concat kernel is better optimized for reading different tensors, not slices from the same tensor.

If this comes from gather, then why can't we combine this into a larger gather operator?

```diff
 }
 }
-if(num_noops > std::max(size_t{1}, concat_ins->inputs().size() / 4))
+if((num_noops > std::max(size_t{1}, concat_ins->inputs().size() / 4) and (concat_ins->inputs().size() < 100)))
```
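The relaxed condition can be exercised in isolation. The following is a minimal sketch, assuming fusion is skipped when the predicate returns true; `skip_fusion_old`, `skip_fusion_relaxed`, and the input-count threshold of 100 mirror the condition in the diff, but the function names are hypothetical and not part of the MIGraphX API:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Original check: bail out of fusion when more than a quarter of the
// concat inputs (and more than one input) are noops.
bool skip_fusion_old(std::size_t num_noops, std::size_t num_inputs)
{
    return num_noops > std::max(std::size_t{1}, num_inputs / 4);
}

// Relaxed check: only apply the bail-out to concats with fewer than
// 100 inputs, so very wide concats (e.g. many slices of an embedding
// table) are still fused despite having many noop inputs.
bool skip_fusion_relaxed(std::size_t num_noops, std::size_t num_inputs)
{
    return skip_fusion_old(num_noops, num_inputs) and num_inputs < 100;
}
```

With this change, a 120-input concat where 50 inputs are noops would previously have been rejected by the heuristic but is now eligible for fusion, while the behavior for narrow concats is unchanged.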
Collaborator

I don't think this is a good heuristic in general. The noop copy can be eliminated completely in eliminate_concat. It doesn't work for your case because the inputs are slices.

Collaborator Author

Okay, so it sounds like a better approach is to do this entire thing in eliminate_concat.

@codecov

codecov bot commented Mar 31, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #4724   +/-   ##
========================================
  Coverage    92.29%   92.29%           
========================================
  Files          580      580           
  Lines        28687    28688    +1     
========================================
+ Hits         26474    26475    +1     
  Misses        2213     2213           
Files with missing lines | Coverage Δ
src/fuse_concat.cpp | 96.67% <100.00%> (ø)

... and 1 file with indirect coverage changes


@TedThemistokleous
Collaborator Author

this is relevant when many inputs are from slice operations when reading from gather table embeddings.

It seems like there is a better operator to be used here. The concat kernel is better optimized for reading different tensors, not slices from the same tensor.

If this comes from gather, then why can't we combine this into a larger gather operator?

Cost/overhead. Gathers get expensive and tend to scale poorly in this model. Using this eliminated a bunch of overhead by fusing the preceding ops into the concat, even if they do appear to be noops.


Labels

Perf Improve · simple (small or simple changes)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants