fix(a5): normalize risky vec col-major TMOV to row-major via treshape #440
TaoTao-real wants to merge 13 commits into hw-native-sys:main
Code Review
This pull request introduces new test samples and regression guards for TMOV alignment on the A5 architecture, specifically covering 16x1 column-major and 1x16 row-major tile configurations. The changes include new .pto and .py test files and updates to the runop.sh test runner to validate the emitted C++ code. A critical issue was identified in the test runner where an undefined variable target_arch_lc would cause the new tests to be incorrectly skipped; a suggestion was provided to use a consistent inline transformation for the architecture check.
    fi
    if [[ ( "$base" == "test_tmov_col_major_16x1_align_a5" || \
            "$base" == "test_tmov_row_major_1x16_control_a5" ) && \
          "${target_arch_lc}" != "a5" ]]; then
The variable `target_arch_lc` is not defined in this script, so it expands to an empty string and the `!= "a5"` comparison is always true. As a result, these two tests are skipped even when the target architecture is correctly set to `a5`. Use the inline lowercase transformation that the rest of the file uses to check the architecture.
Suggested change:

    - "${target_arch_lc}" != "a5" ]]; then
    + "$(printf '%s' "$target_arch" | tr '[:upper:]' '[:lower:]')" != "a5" ]]; then
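The failure mode described above can be reproduced in isolation. The sketch below assumes a `target_arch` variable holding the requested architecture, as the suggested change implies. The `should_skip_buggy` and `should_skip_fixed` helpers are hypothetical names used only for illustration and are not part of `runop.sh`.

```shell
#!/usr/bin/env bash
# Illustrative only: the should_skip_* helpers are hypothetical names,
# not functions taken from runop.sh.
target_arch="A5"
unset target_arch_lc   # mirrors the reviewed script: never assigned

# Buggy check: the unset variable expands to "", so "" != "a5" is always true.
should_skip_buggy() {
  [[ "${target_arch_lc}" != "a5" ]]
}

# Fixed check: lowercase target_arch inline before comparing.
should_skip_fixed() {
  [[ "$(printf '%s' "$target_arch" | tr '[:upper:]' '[:lower:]')" != "a5" ]]
}

if should_skip_buggy; then buggy_result="skip"; else buggy_result="run"; fi
if should_skip_fixed; then fixed_result="skip"; else fixed_result="run"; fi

echo "buggy=${buggy_result} fixed=${fixed_result}"
```

With `target_arch` set to `A5`, the buggy form reports `skip` and the fixed form reports `run`. If the script ran under `set -u`, the buggy form would instead abort with an unbound-variable error.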

/run a5 test_tmov_col_major_16x1_align_a5 test_tmov_row_major_1x16_control_a5 --pto-level=level3
A5 board test failed
Log tail

/run a5 test_tmov_col_major_16x1_align_a5 test_tmov_row_major_1x16_control_a5 --pto-level=level3
A5 board test failed
Log tail

/run a5 test_tmov_col_major_16x1_align_a5 --pto-level=level3
A5 board test failed
Log tail

/run a5 test_tmov_col_major_16x1_align_a5 --pto-level=level3
A5 board test succeeded

/run a5 test_tmov_row_major_1x16_control_a5 --pto-level=level3
A5 board test succeeded

/run a5 rmsnorm_incore_0 --pto-level=level3
A5 board test failed
Failed cases
A5 board test failure details: PR #440, rmsnorm_incore_0

/run a5 rmsnorm_incore_0 --pto-level=level3
A5 board test failed
Failed cases
A5 board test failure details: PR #440, rmsnorm_incore_0

/run a5 rmsnorm_incore_0 --pto-level=level3
A5 board test failed
Failed cases
A5 board test failure details: PR #440, rmsnorm_incore_0

/run a5 rmsnorm_incore_0 --pto-level=level3
A5 board test succeeded

/run a5 rmsnorm_incore_0 --pto-level=level3
A5 board test succeeded

/run a5 rmsnorm_incore_0 --pto-level=level3
A5 board test succeeded

/run a5 rmsnorm_incore_0 --pto-level=level3
A5 board test failed
Failed cases
A5 board test failure details: PR #440, rmsnorm_incore_0

/run a5 test_tmov_col_major_16x1_align_a5 --pto-level=level3
A5 board test failed
Failed cases
A5 board test failure details: PR #440, test_tmov_col_major_16x1_align_a5

Summary

- Add `PTOA5NormalizeTMovPass` to normalize risky A5 `pto.tmov` patterns:
  - Match: `vec -> vec` and both src/dst tiles are `col_major + none_box`.
  - Rewrite to: `treshape(row_major src) + treshape(row_major dst) + tmov(row_major -> row_major)`.
- Run the pass in the `ptoas` pipeline before `PTOViewToMemref`.
- Add regression guards in `test/samples/runop.sh` for:
  - `test_tmov_col_major_16x1_align_a5`
  - `test_tmov_row_major_1x16_control_a5`
  - `decode_projection_incore_0`
  - `rmsnorm_incore_0`
- Add the `decode_projection_incore_0` sample into `test/samples/Sync` for A5 regression coverage.
- Update `test/npu_validation/scripts/generate_testcase.py` to generate board-friendly params for `decode_projection_incore_0` / `rmsnorm_incore_0`:
  - […] `[16, hidden]` windows
  - […] `8192`
  - […] (`arg3`) forced to `0` in single-block validation

Motivation

On A5, `vec->vec` `tmov` with `col_major` tiles can enter an alignment-sensitive backend path and trigger UB alignment exceptions in real kernels (observed in `rmsnorm_incore_0` / `decode_projection_incore_0`).

The pass avoids this unsafe lowering path by normalizing to a row-major reinterpret route while preserving tile alias semantics (no real data movement is introduced by `treshape`).

Design Notes

- […] (`pto.target_arch == a5`).
- […] `tmov` remains after rewrite, the pass emits an error and fails.
- […] `tmov` are preserved on the rewritten op.

Validation

- […] `ninja -C build ptoas`
- […] (`--pto-arch=a5 --pto-level=level3 --enable-insert-sync`):
  - `test_tmov_col_major_16x1_align_a5`: `TRESHAPE` present
  - `test_tmov_row_major_1x16_control_a5`: no `TRESHAPE`
  - `decode_projection_incore_0` / `rmsnorm_incore_0`: `TRESHAPE` present
- `runop.sh` targeted guard run for the 4 samples: pass

Risk / Rollback

- […] `tmov` pattern.
- […] `PTOA5NormalizeTMovPass` in pipeline.
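
The `TRESHAPE` presence checks listed under Validation could be expressed as a small grep-based guard. This is a minimal sketch, not the actual `runop.sh` logic: the `expect_treshape` helper is a hypothetical name, and the file contents are placeholder text standing in for the C++ that `ptoas` emits.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not the actual runop.sh guard.
# Usage: expect_treshape FILE MODE, where MODE is "present" or "absent".
expect_treshape() {
  local cpp_file="$1" mode="$2"
  if grep -q 'TRESHAPE' "$cpp_file"; then
    [[ "$mode" == "present" ]]
  else
    [[ "$mode" == "absent" ]]
  fi
}

# Stand-in files with placeholder text; a real guard would inspect
# the C++ emitted by ptoas for each sample.
tmp_dir="$(mktemp -d)"
printf 'TRESHAPE(dst, src);\nTMOV(dst, src);\n' > "$tmp_dir/col_major.cpp"
printf 'TMOV(dst, src);\n' > "$tmp_dir/row_major.cpp"

# Col-major sample: the rewrite should have introduced TRESHAPE.
if expect_treshape "$tmp_dir/col_major.cpp" present; then col_major_ok=yes; else col_major_ok=no; fi
# Row-major control: no TRESHAPE should appear.
if expect_treshape "$tmp_dir/row_major.cpp" absent; then row_major_ok=yes; else row_major_ok=no; fi

echo "col_major_ok=${col_major_ok} row_major_ok=${row_major_ok}"
rm -rf "$tmp_dir"
```

Checking both presence and absence matters here: the row-major control confirms that the pass leaves already-safe `tmov` ops untouched, so a guard that only looked for `TRESHAPE` would miss an over-eager rewrite.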