Thank you for your excellent work on HLS.
I've noticed a potential issue in the compute_q_matmul_k function in the attention.cpp file. During the initial stages of computation, many elements of q_blocks appear to be used in calculations before they have been fully read in. This could lead to incorrect results. Is this intentional, and if so, could you please explain the rationale behind this approach?
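
To make the concern concrete, here is a simplified sketch of the pattern I think I am seeing. It is not the actual code from attention.cpp: the function name, the buffer size N, and the one-element-per-iteration load are all my own assumptions, chosen only to show how a compute stage can touch slots that the read stage has not filled yet.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t N = 4;  // hypothetical buffer size, not taken from attention.cpp

struct OverlapResult {
    int premature_reads;  // reads of slots that had not been loaded yet
    int final_sum;        // sum over the buffer in the last iteration
};

// Hypothetical model: each iteration loads ONE element into the buffer,
// then the compute stage reads EVERY slot in that same iteration.
OverlapResult model_overlapped_read_compute() {
    const std::array<int, N> q_in = {1, 2, 3, 4};  // made-up input values
    std::array<int, N> q_buf{};    // stand-in for q_blocks, zero-initialized
    std::array<bool, N> loaded{};  // tracks which slots have been written
    OverlapResult r{0, 0};

    for (std::size_t i = 0; i < N; ++i) {
        q_buf[i] = q_in[i];  // read stage: one element arrives
        loaded[i] = true;

        int sum = 0;
        for (std::size_t j = 0; j < N; ++j) {  // compute stage: touches all slots
            if (!loaded[j]) {
                ++r.premature_reads;  // slot j is used before it was read in
            }
            sum += q_buf[j];
        }
        r.final_sum = sum;  // only the last iteration sees a fully loaded buffer
    }
    return r;
}
```

In this toy model only the final iteration computes on a fully loaded buffer. My question is whether the early partial results in compute_q_matmul_k are discarded or otherwise masked, or whether they can reach the output.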