Add PTOAS support for TAXPY, THISTOGRAM, TGET_SCALE_ADDR, TROWARGMAX and TROWARGMIN#461
HecreReed wants to merge 4 commits into hw-native-sys:main from
Code Review
This pull request adds several new operations to the PTO dialect, including THistogramOp, TGetScaleAddrOp, TAxpyOp, TRowArgMaxOp, and TRowArgMinOp, along with their associated TableGen definitions, IR verifiers, memory effects, and EmitC lowering patterns. Feedback focuses on improving the IR verifiers: specifically, the TAxpyOp verifier should account for bf16 to f32 widening on the A5 architecture, and both TRowArgMaxOp and TRowArgMinOp should be extended to support bf16 source elements on A5 for consistency with other operations.
bool widenF16ToF32 = srcElem.isF16() && dstElem.isF32();
if (!(sameType || widenF16ToF32))
  return emitOpError(
      "expects dst/src element types to match, or dst=f32 and src=f16");
The verifyA5 lambda for TAxpyOp does not account for bf16 to f32 widening. Since bf16 is explicitly allowed for both src and dst in the subsequent checks (lines 2641-2644), it should also be allowed to widen to f32. The error message should also be updated to reflect this.
- bool widenF16ToF32 = srcElem.isF16() && dstElem.isF32();
- if (!(sameType || widenF16ToF32))
-   return emitOpError(
-       "expects dst/src element types to match, or dst=f32 and src=f16");
+ bool widenF16ToF32 = srcElem.isF16() && dstElem.isF32();
+ bool widenBF16ToF32 = srcElem.isBF16() && dstElem.isF32();
+ if (!(sameType || widenF16ToF32 || widenBF16ToF32))
+   return emitOpError(
+       "expects dst/src element types to match, or dst=f32 and src=f16/bf16");
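The widening rule in this suggestion can be checked in isolation. The following is a minimal standalone sketch, not the PTOAS verifier itself: the `Elem` enum and `checkAxpyElemTypes` are hypothetical stand-ins for the MLIR element types and the `verifyA5` lambda, and only the boolean logic is taken from the suggested change.

```cpp
#include <string>

// Hypothetical stand-in for the MLIR float element types used by the verifier.
enum class Elem { F16, BF16, F32 };

// Mirrors the suggested check: accept identical types, or f16/bf16 widened to f32.
// Returns an empty string on success, otherwise the diagnostic text.
std::string checkAxpyElemTypes(Elem src, Elem dst) {
  bool sameType = src == dst;
  bool widenF16ToF32 = src == Elem::F16 && dst == Elem::F32;
  bool widenBF16ToF32 = src == Elem::BF16 && dst == Elem::F32;
  if (!(sameType || widenF16ToF32 || widenBF16ToF32))
    return "expects dst/src element types to match, or dst=f32 and src=f16/bf16";
  return "";
}
```

Under this model, `checkAxpyElemTypes(Elem::BF16, Elem::F32)` passes, while narrowing pairs such as f32 to f16 are still rejected.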
auto srcElem = getElemTy(srcTy).dyn_cast<mlir::FloatType>();
if (!srcElem || (!srcElem.isF16() && !srcElem.isF32()))
  return emitOpError("expects src element type to be f16 or f32");
The TRowArgMaxOp verifier currently only allows f16 and f32 for the source element type. Given that TAxpyOp supports bf16 on A5, it is likely that TRowArgMaxOp should also support bf16 on A5. Consider updating the verifier to allow bf16 when the architecture is A5 by splitting the verifyByArch lambda or adding an architecture check.
auto srcElem = getElemTy(srcTy).dyn_cast<mlir::FloatType>();
if (!srcElem || (!srcElem.isF16() && !srcElem.isF32()))
  return emitOpError("expects src element type to be f16 or f32");
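One way to read the "architecture check" option is sketched below as standalone C++. It assumes, as the reviewer does, that bf16 sources are valid on A5; that is not confirmed anywhere in this thread. The `Arch` and `Elem` enums and the function name `checkRowArgSrcElem` are hypothetical and do not come from the PTOAS sources.

```cpp
#include <string>

// Hypothetical stand-ins for the target architecture and element types.
enum class Arch { A3, A5 };
enum class Elem { F16, BF16, F32, I32 };

// f16 and f32 are accepted everywhere; bf16 only on A5 (an assumption).
// Returns an empty string on success, otherwise an arch-specific diagnostic.
std::string checkRowArgSrcElem(Arch arch, Elem src) {
  bool baseOk = src == Elem::F16 || src == Elem::F32;
  bool a5Bf16 = arch == Arch::A5 && src == Elem::BF16;
  if (baseOk || a5Bf16)
    return "";
  return arch == Arch::A5
             ? "expects src element type to be f16, bf16, or f32"
             : "expects src element type to be f16 or f32";
}
```

Keeping the diagnostic text per architecture avoids advertising bf16 on targets where the check would still reject it.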
/run a3
A3 board test failed
Failed cases
A3 board test failure details: PR #461
- rsqrt
- rems
- rem
Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: db7b6d8d1a
PTO_ADD_READ(getSrcMutable());
PTO_ADD_READ(getScalarMutable());
PTO_ADD_WRITE(getDstMutable());
Mark TAXPY destination as read in memory effects
pto.taxpy is defined as an in-place accumulate (dst += src * scalar), so it depends on the previous contents of dst. In TAxpyOp::getEffects the destination is marked only as Write, which can let transformations that consume MemoryEffectsOpInterface treat earlier updates to the same tile as dead/overwritable. This can change results when multiple taxpy operations target the same destination.
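The hazard can be illustrated with a toy dead-store pass. This is a simplified sketch and not how MLIR transformations are implemented: the `Op` record and `eliminateDeadStores` are hypothetical, every op is assumed to write its `dst`, and the last write to each buffer is assumed to be live on exit.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified op record: every op writes `dst`;
// `readsDst` says whether it also reads the previous contents.
struct Op {
  std::string name;
  std::string dst;
  bool readsDst;
};

// Toy dead-store elimination: drop an op when the next op touching the same
// destination overwrites it without reading it first.
std::vector<Op> eliminateDeadStores(const std::vector<Op>& ops) {
  std::vector<Op> kept;
  for (std::size_t i = 0; i < ops.size(); ++i) {
    bool dead = false;
    for (std::size_t j = i + 1; j < ops.size(); ++j) {
      if (ops[j].dst != ops[i].dst)
        continue;
      dead = !ops[j].readsDst;  // overwritten before any read
      break;
    }
    if (!dead)
      kept.push_back(ops[i]);
  }
  return kept;
}
```

With two `taxpy` records targeting the same tile, the first is dropped when `readsDst` is false and kept when it is true, which is the difference between marking `dst` as write-only and marking it as read plus write.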
PTO_ADD_READ(getSrcMutable());
PTO_ADD_READ(getIdxMutable());
PTO_ADD_WRITE(getDstMutable());
Mark THISTOGRAM destination as read in memory effects
pto.thistogram performs histogram accumulation into dst, so it semantically reads existing bin values before writing updated ones. THistogramOp::getEffects currently models dst as write-only, which can permit incorrect scheduling or dead-store style rewrites for sequences that repeatedly update the same histogram tile.
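The read-modify-write nature of accumulation is easy to see in a scalar model. The sketch below is an assumption about the general shape of a histogram update, not the actual `pto.thistogram` semantics: `accumulateHistogram` is a hypothetical name, each `src` value is treated directly as a bin index, and the op's `idx` operand is not modeled.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical scalar model: each src value selects a bin, and dst holds
// running counts that persist across calls.
void accumulateHistogram(std::vector<int>& dst,
                         const std::vector<std::size_t>& src) {
  for (std::size_t bin : src) {
    if (bin < dst.size())
      dst[bin] += 1;  // reads the existing count, then writes the update
  }
}
```

Calling this twice on the same `dst` gives counts that depend on the first call, so a pass that treats `dst` as write-only could wrongly discard the earlier update.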
Summary
- Add PTOAS support for pto.taxpy, pto.thistogram, pto.tget_scale_addr, pto.trowargmax, and pto.trowargmin
- Update the ptobc opcode mapping and add basic smoke/verifier coverage

Validation
- ninja -C build ptoas
- build/tools/ptoas/ptoas on the 7 new test/basic/*.pto cases
- ctest --output-on-failure -R ptobc_opcode_coverage_check