-
由 zhaoyuchen2018 提交于
* Improve topk performance. give 200000 data to compute topk, before opt: cost 1s after opt: cost 0.0028s. * Refine return value. * Add cuda util funtions. * Fix ComputeBlockSize bug & refine comments. Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
5dbe9e59