Unverified commit 0f4b6247, authored by wangchaochaohu, committed by GitHub

refine the gpu config for performance optimization (#28291)

Parent: acc11c2a
@@ -53,10 +53,8 @@ inline GpuLaunchConfig GetGpuLaunchConfig1D(
   // Need get from device
   const int thread_per_block = std::min(1024, context.GetMaxThreadsPerBlock());
-  // Suppose block count small than factor * sm, factor is a experiments value.
-  int factor = 4;
   const int block_count =
-      std::min(DivUp(physical_thread_count, thread_per_block), factor * sm);
+      std::min(DivUp(physical_thread_count, thread_per_block), sm);
   GpuLaunchConfig config;
   config.theory_thread_count.x = theory_thread_count;
......
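The effect of the change on the 1D launch configuration can be reproduced on the host side. The sketch below is a minimal illustration, not Paddle code: it assumes `DivUp` is positive-integer ceiling division, that `sm` is the device's streaming-multiprocessor count as in the diff, and that kernels launched with this config cover the remaining elements with a grid-stride loop. `BlockCountBefore`, `BlockCountAfter` and `ElementsPerThread` are hypothetical names introduced only for this comparison.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the DivUp used in the diff (assumed semantics):
// ceiling division for positive integers.
inline int DivUp(int a, int b) {
  assert(a >= 0 && b > 0);
  return (a + b - 1) / b;
}

// Block count before this commit: capped at factor * sm with factor = 4,
// the "experiments value" named in the removed comment.
inline int BlockCountBefore(int physical_thread_count, int max_threads_per_block,
                            int sm) {
  const int thread_per_block = std::min(1024, max_threads_per_block);
  const int factor = 4;
  return std::min(DivUp(physical_thread_count, thread_per_block), factor * sm);
}

// Block count after this commit: capped at one block per SM.
inline int BlockCountAfter(int physical_thread_count, int max_threads_per_block,
                           int sm) {
  const int thread_per_block = std::min(1024, max_threads_per_block);
  return std::min(DivUp(physical_thread_count, thread_per_block), sm);
}

// Assuming a grid-stride loop, the maximum number of elements any single
// thread processes for n elements with the given launch shape.
inline int ElementsPerThread(int n, int block_count, int thread_per_block) {
  return DivUp(n, block_count * thread_per_block);
}
```

With a hypothetical device of 80 SMs, 1024 threads per block and 2^20 elements, the old cap gives 320 blocks and the new one 80, so under the grid-stride assumption each thread handles up to 13 elements instead of 4. For small inputs such as 5000 elements, both formulas give 5 blocks, because the `DivUp` term is already below either cap.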