Unverified commit 0f4b6247, authored by wangchaochaohu, committed by GitHub

refine the gpu config for performance optimization (#28291)

Parent acc11c2a
......@@ -53,10 +53,8 @@ inline GpuLaunchConfig GetGpuLaunchConfig1D(
   // Need get from device
   const int thread_per_block = std::min(1024, context.GetMaxThreadsPerBlock());
-  // Suppose block count small than factor * sm, factor is a experiments value.
-  int factor = 4;
   const int block_count =
-      std::min(DivUp(physical_thread_count, thread_per_block), factor * sm);
+      std::min(DivUp(physical_thread_count, thread_per_block), sm);
   GpuLaunchConfig config;
   config.theory_thread_count.x = theory_thread_count;
......
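To make the effect of the hunk above concrete, here is a minimal standalone sketch of the block-count calculation before and after the change. It is not the project's code: `BlockCountBefore` and `BlockCountAfter` are hypothetical names introduced for illustration, `DivUp` is assumed to be ceiling division, and `sm` is assumed to be the device's streaming multiprocessor count.

```cpp
#include <algorithm>
#include <cassert>

// Assumed behavior of the DivUp helper referenced in the diff:
// integer ceiling division of a by b.
inline int DivUp(int a, int b) {
  assert(b > 0);
  return (a + b - 1) / b;
}

// Hypothetical helper: block count as computed before the change,
// capped at factor * sm with the experimental factor of 4.
inline int BlockCountBefore(int physical_thread_count, int thread_per_block,
                            int sm) {
  const int factor = 4;
  return std::min(DivUp(physical_thread_count, thread_per_block), factor * sm);
}

// Hypothetical helper: block count as computed after the change,
// capped at sm.
inline int BlockCountAfter(int physical_thread_count, int thread_per_block,
                           int sm) {
  return std::min(DivUp(physical_thread_count, thread_per_block), sm);
}
```

With purely illustrative inputs (`physical_thread_count = 163840`, `thread_per_block = 1024`, `sm = 80`), `BlockCountBefore` yields 160 blocks and `BlockCountAfter` yields 80. For small workloads that need fewer blocks than `sm`, both versions return the same value.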