Improve transpose performance with tiled shared-memory copy, test=develop (#22311)
* Refine code, fix tile selection error, test=develop
* Refine element type and some comments, test=develop
* Refine comments and gpu utils, test=develop
* Remove some useless conditions
* Refine floor and ceil, test=develop
* Refine for loop, test=develop
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>