Skip to content

[megatron] fix vit_attn_impl megatron (compat mcore-bridge)#9019

Merged
Jintao-Huang merged 1 commit intomodelscope:mainfrom
Jintao-Huang:fix_vit_attn_impl_megatron_2
Apr 5, 2026
Merged

[megatron] fix vit_attn_impl megatron (compat mcore-bridge)#9019
Jintao-Huang merged 1 commit intomodelscope:mainfrom
Jintao-Huang:fix_vit_attn_impl_megatron_2

Conversation

@Jintao-Huang
Copy link
Copy Markdown
Collaborator

No description provided.

@Jintao-Huang Jintao-Huang changed the title [megatron] fix vit_attn_impl megatron [megatron] fix vit_attn_impl megatron (compat mcore-bridge) Apr 5, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request renames attn_impl to vit_attn_impl and gradient_checkpointing_kwargs to vit_gradient_checkpointing_kwargs in the documentation, argument definitions, and trainer logic to specify their application to the ViT component. Feedback was provided to align the default value of vit_attn_impl in the code with the documentation's stated default of 'flash_attn'.

# visual
vit_gradient_checkpointing: Optional[bool] = None
vit_gradient_checkpointing_kwargs: Optional[Union[dict, str]] = None
vit_attn_impl: Optional[str] = None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The documentation (both Chinese and English versions) specifies that the default value for vit_attn_impl is 'flash_attn', but the code currently initializes it to None. To ensure consistency between the documentation and the actual behavior, consider setting the default value to 'flash_attn' directly in the dataclass definition.

Suggested change
vit_attn_impl: Optional[str] = None
vit_attn_impl: Optional[str] = 'flash_attn'

@Jintao-Huang Jintao-Huang merged commit 03b5297 into modelscope:main Apr 5, 2026
2 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant