matrix_mul_fp32_simt_256x64x8_64x32x8_tn_splitk_parallel.cu 1.6 KB