[cherry-pick] Improve topk performance. (#21087) (#21441)
* Improve topk performance.
Computing topk over 200000 input elements:
before optimization: 1 s
after optimization: 0.0028 s
* Refine return value.
* Add CUDA util functions.
* Fix ComputeBlockSize bug & refine comments.
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
