Problem Description
1. Prior work performs the Hadamard transform in float64, whereas our approach uses bfloat16.
2. The weight transform can be done in place, eliminating the need to reapply it at every iteration of AR tuning.
3. Supports shared layers, such as MoE and fused QKV.
4. Uses a truly random matrix for each layer.
5. Fused with block-wise AR tuning to significantly reduce RAM usage (otherwise the memory overhead is high).
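The transform described in items 1, 2, and 4 can be sketched roughly as below. This is an illustrative sketch, not the actual implementation: the function names are made up, NumPy float32 stands in for bfloat16 (NumPy has no bfloat16 dtype), and the per-layer random sign diagonal is one common way to realize a randomized Hadamard transform.

```python
import numpy as np

def fwht_inplace(x):
    """Fast Walsh-Hadamard transform of a 1-D array, done in place.
    Length must be a power of two. Orthonormal scaling makes it self-inverse."""
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    x /= np.sqrt(n)
    return x

def transform_weight(w, signs):
    """Apply a randomized Hadamard transform (per-layer random +/-1 signs,
    then FWHT) to each row of a weight matrix, in place. In the setting
    above this would run once per layer, in bfloat16, before AR tuning."""
    w *= signs
    for row in w:
        fwht_inplace(row)
    return w

rng = np.random.default_rng(0)
n = 8
signs = rng.choice([-1.0, 1.0], size=n).astype(np.float32)
w = rng.standard_normal((4, n)).astype(np.float32)
w_ref = w.copy()

transform_weight(w, signs)  # done once; AR tuning then operates on w directly

# Invert: FWHT again (self-inverse), then undo the sign flips.
for row in w:
    fwht_inplace(row)
w *= signs
assert np.allclose(w, w_ref, atol=1e-5)
```

Because the transform is applied in place once up front, the AR tuning loop never has to re-materialize the rotated weight, which is where the memory saving in item 5 comes from.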
Reproduction Steps
~
Environment Information
~
Error Logs
~
Additional Context
No response