Homework 4 extra

Public repository and stub/testing code for Homework 4 extra of 10-714.

- Transformer
- Penn Treebank dataset

Reference:

- How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog
- An Efficient Matrix Transpose in CUDA C/C++