
Rewrote flash attention to use BF16, transposed K and V, rewrote the task distribution, increased parallelism on decode, and used double the registers for the core of flash attention. #835

Closed
copybara-service[bot] wants to merge 0 commits into dev from test_868146247

Conversation

@copybara-service

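The PR description mentions rewriting the "core of flash attention" to take BF16 inputs, but no code is shown in this conversation. As an illustration only, here is a minimal NumPy sketch of the online-softmax accumulation loop that a flash-attention core typically implements; all names are hypothetical, and BF16 is emulated by truncating float32 mantissa bits since NumPy has no native bfloat16:

```python
import numpy as np

def to_bf16(x):
    # Emulate BF16 by zeroing the low 16 mantissa bits of float32.
    u = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (u & np.uint32(0xFFFF0000)).view(np.float32)

def flash_attention_core(q, k, v, tile=32):
    # q: (d,), k and v: (n, d). Streams over K/V tiles keeping a running
    # max m, running denominator l, and unnormalized accumulator o --
    # the online-softmax trick at the heart of flash attention.
    m = -np.inf
    l = 0.0
    o = np.zeros(v.shape[1], dtype=np.float32)
    for s in range(0, k.shape[0], tile):
        kt, vt = k[s:s + tile], v[s:s + tile]
        scores = to_bf16(kt) @ to_bf16(q)          # BF16 inputs, FP32 accumulate
        m_new = max(m, float(scores.max()))
        p = np.exp(scores.astype(np.float32) - m_new)
        correction = np.exp(m - m_new)             # rescale previous partials
        l = l * correction + p.sum()
        o = o * correction + p @ vt.astype(np.float32)
        m = m_new
    return o / l

# Sanity check against plain (non-tiled) softmax attention.
rng = np.random.default_rng(0)
d, n = 8, 128
q = rng.normal(size=d).astype(np.float32)
k = rng.normal(size=(n, d)).astype(np.float32)
v = rng.normal(size=(n, d)).astype(np.float32)
scores = k @ q
w = np.exp(scores - scores.max())
ref = (w / w.sum()) @ v
out = flash_attention_core(q, k, v)
```

The BF16 truncation introduces small per-element error in the scores, so `out` matches the float32 reference only approximately; a real kernel would also fuse this loop per query block and keep `o`, `l`, and `m` in registers, which is presumably where the doubled register budget in this PR comes in.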

copybara-service bot force-pushed the test_868146247 branch 2 times, most recently from 7b55d41 to a814aa4, on February 16, 2026 at 11:55
copybara-service bot force-pushed the test_868146247 branch 7 times, most recently from a6ce8e7 to f198565, on March 2, 2026 at 10:53
copybara-service bot closed this on March 2, 2026
copybara-service bot deleted the test_868146247 branch on March 2, 2026 at 11:11
