[megatron] fix vit_attn_impl megatron (compat mcore-bridge) #9019
Code Review
This pull request renames attn_impl to vit_attn_impl and gradient_checkpointing_kwargs to vit_gradient_checkpointing_kwargs in the documentation, argument definitions, and trainer logic to specify their application to the ViT component. Feedback was provided to align the default value of vit_attn_impl in the code with the documentation's stated default of 'flash_attn'.
    # visual
    vit_gradient_checkpointing: Optional[bool] = None
    vit_gradient_checkpointing_kwargs: Optional[Union[dict, str]] = None
    vit_attn_impl: Optional[str] = None
The documentation (both Chinese and English versions) specifies that the default value for vit_attn_impl is 'flash_attn', but the code currently initializes it to None. To ensure consistency between the documentation and the actual behavior, consider setting the default value to 'flash_attn' directly in the dataclass definition.
Suggested change:
    - vit_attn_impl: Optional[str] = None
    + vit_attn_impl: Optional[str] = 'flash_attn'
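To make the review comment concrete, here is a minimal sketch of two ways the documented default could be made to match runtime behavior. The class names `VisualArguments` and `VisualArgumentsLateDefault` are hypothetical stand-ins, not the actual argument class touched by this PR, and the `__post_init__` variant is an assumption about one possible alternative rather than something the PR implements.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class VisualArguments:  # hypothetical name, not the class from the PR
    vit_gradient_checkpointing: Optional[bool] = None
    vit_gradient_checkpointing_kwargs: Optional[Union[dict, str]] = None
    # Option A (the reviewer's suggestion): put the documented default
    # directly in the field definition.
    vit_attn_impl: Optional[str] = 'flash_attn'


@dataclass
class VisualArgumentsLateDefault:  # hypothetical alternative
    # Option B: keep None as "not set by the user" and resolve it once
    # after initialization.
    vit_attn_impl: Optional[str] = None

    def __post_init__(self):
        if self.vit_attn_impl is None:
            self.vit_attn_impl = 'flash_attn'


if __name__ == '__main__':
    print(VisualArguments().vit_attn_impl)
    print(VisualArgumentsLateDefault().vit_attn_impl)
```

With Option A, an explicitly passed `None` stays `None`; with Option B, `None` is always replaced by the documented default. Which behavior is wanted depends on whether downstream code treats `None` as meaningful.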
No description provided.