
[Bug]: hadamard dtype and inplace transform #1631

@wenhuach21

Description


Problem Description

1. Prior work performs the Hadamard transform in float64, whereas our approach uses bfloat16.
2. The weight transform can be done in-place, eliminating the need to reapply it at every iteration of AR tuning.
3. Shared layers, such as MoE and fused QKV, should be supported.
4. A truly random matrix should be used for each layer.
5. The transform should be fused with block-wise AR tuning to significantly reduce RAM usage; otherwise the memory overhead is high.
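To make points 1 and 2 concrete, here is a minimal sketch of an in-place Hadamard weight transform that upcasts to float64 for the transform itself and writes the result back in the weight's original dtype. This is an illustrative NumPy example, not the project's actual code: `transform_weight_inplace` and the Sylvester-construction helper are hypothetical names, and a real implementation would operate on framework tensors.

```python
import numpy as np

def sylvester_hadamard(n):
    # Build an n x n Hadamard matrix by Sylvester's recursion.
    # Requires n to be a power of two.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def transform_weight_inplace(w, compute_dtype=np.float64):
    # Apply an orthonormal Hadamard transform along the last axis,
    # computing in high precision (float64 here) to avoid the
    # accuracy loss of doing the transform directly in bfloat16.
    n = w.shape[-1]
    H = sylvester_hadamard(n) / np.sqrt(n)  # orthonormal: H @ H == I
    out = (w.astype(compute_dtype) @ H).astype(w.dtype)
    w[...] = out  # write back in place, so tuning need not reapply it
    return w
```

Because the normalized Sylvester matrix is symmetric and orthogonal, applying the transform twice recovers the original weight, which also makes the in-place update easy to undo if needed.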

Reproduction Steps

~

Environment Information

~

Error Logs

~

Additional Context

No response

Metadata

Labels

bug (Something isn't working)
