Improve qkv transpose performance (#23919)
Use vectorized 128-bit loads (LDG.128) to speed up the QKV transpose. This gives a 1.4x speedup at the same GPU base frequency. test=develop
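For illustration only, here is a minimal sketch of how a transpose kernel can use 128-bit loads. It is not the code from this commit. The kernel name `TransposeQkvVec4`, the host helper `TransposeQkv`, the `float` data type, the absence of a bias term, and the assumed layouts (input `[B, S, 3, N, H]`, output `[3, B, N, S, H]`) are all hypothetical. Reading through `float4` pointers lets each thread move 16 bytes at once, which the compiler typically lowers to a 128-bit global load when the address is 16-byte aligned.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Hypothetical kernel: transposes [B, S, 3, N, H] -> [3, B, N, S, H].
// H4 = H / 4 is the head size counted in float4 elements, so each
// thread reads and writes one 16-byte vector instead of one float.
__global__ void TransposeQkvVec4(const float4* __restrict__ in,
                                 float4* __restrict__ out,
                                 int S, int N, int H4) {
  int s = blockIdx.x;   // sequence position
  int b = blockIdx.y;   // batch index
  int m = blockIdx.z;   // 0 = Q, 1 = K, 2 = V
  int B = gridDim.y;    // batch size
  int n = threadIdx.y;  // head index
  int h = threadIdx.x;  // float4 index inside one head

  int in_idx = (((b * S + s) * 3 + m) * N + n) * H4 + h;
  int out_idx = (((m * B + b) * N + n) * S + s) * H4 + h;
  out[out_idx] = in[in_idx];  // one 128-bit load, one 128-bit store
}

// Hypothetical host helper: copies data to the GPU, runs the kernel,
// and returns the transposed result. Assumes H is a multiple of 4 and
// that one thread block (H / 4 * N threads) fits in 1024 threads.
std::vector<float> TransposeQkv(const std::vector<float>& in,
                                int B, int S, int N, int H) {
  assert(H % 4 == 0 && (H / 4) * N <= 1024);
  assert(in.size() == static_cast<size_t>(B) * S * 3 * N * H);
  size_t bytes = in.size() * sizeof(float);

  float* d_in = nullptr;
  float* d_out = nullptr;
  cudaMalloc(&d_in, bytes);   // cudaMalloc returns memory aligned well
  cudaMalloc(&d_out, bytes);  // beyond the 16 bytes float4 requires
  cudaMemcpy(d_in, in.data(), bytes, cudaMemcpyHostToDevice);

  dim3 grid(S, B, 3);
  dim3 block(H / 4, N);
  TransposeQkvVec4<<<grid, block>>>(reinterpret_cast<const float4*>(d_in),
                                    reinterpret_cast<float4*>(d_out),
                                    S, N, H / 4);

  std::vector<float> out(in.size());
  cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  return out;
}
```

Whether the load is actually emitted as a 128-bit instruction depends on the compiler and target architecture; inspecting the generated SASS (for example with `cuobjdump -sass`) is one way to check.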
Forked from PaddlePaddle / Paddle