Enable NextStepDiffusion and support multi-device tuning for diffusion#1640
Conversation
Signed-off-by: Xin He <xin3.he@intel.com>
Pull request overview
Fixes model loading for the “nextstep” model type by selecting an appropriate AutoModel loader, and adjusts multimodal key detection to recognize “image”-named components.
Changes:
- Force `AutoModel` for `model_type == "nextstep"` during MLLM model loading.
- Add `"image"` to `MM_KEYS` to broaden multimodal component detection.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `auto_round/utils/model.py` | Adds a NextStep-specific loader class override to resolve loading failures. |
| `auto_round/utils/common.py` | Extends multimodal key matching to include `"image"` for downstream detection/mapping. |
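The key-matching change can be illustrated with a small sketch. The exact contents of `MM_KEYS` and the helper name are assumptions; only the addition of `"image"` comes from this PR.

```python
# Hypothetical sketch of multimodal component detection.
# The real MM_KEYS list lives in auto_round/utils/common.py;
# "image" is the entry added by this PR.
MM_KEYS = ("vision", "visual", "image")

def is_multimodal_component(module_name: str) -> bool:
    """True if any multimodal key appears in the module name."""
    name = module_name.lower()
    return any(key in name for key in MM_KEYS)
```

With `"image"` in the list, components such as an `image_encoder` module are now recognized as multimodal, while plain language-model modules are not.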
Signed-off-by: Xin He <xin3.he@intel.com>
better add next_step to mllm support matrix
I need to upstream a model before updating the support matrix (requires model link). |
If the model’s license allows upstreaming, we can upload it. Otherwise, we can leave the link blank. |
The status has been reverted to "Draft": only RTN is currently supported, and upstream adaptation and optimization work is underway.
…model loading for NextStep Signed-off-by: Xin He <xin3.he@intel.com>
… gptqmodel fix Signed-off-by: Xin He <xin3.he@intel.com>
for more information, see https://pre-commit.ci
…imports Signed-off-by: Xin He <xin3.he@intel.com>
for more information, see https://pre-commit.ci
/azp run Unit-Test-CUDA-AutoRound

Azure Pipelines successfully started running 1 pipeline(s).
Description
Fix the NextStep model loading issue.
example_prompt = "A REALISTIC PHOTOGRAPH OF A WALL WITH \"TOWARD AUTOREGRESSIVE IMAGE GENERATION WITH CONTINUOUS TOKENS AT SCALE\" PROMINENTLY DISPLAYED"Raw model output:
W4A16 model output with torch backend on CPU:
W4A16 model output with `gptqmodel:marlin` backend on CUDA:

Type of Change
Related Issues
Fixes or relates to #
Checklist Before Submitting