Unverified commit 85a11c47, author: Zhang Zheng, committer: GitHub

Modify the implementation of BlockXReduce to fit more scenes (#39554)

* Modify the implementation of BlockYReduce to fit more scenes

* fix

* fix
Parent 7fa29a6b
......
@@ -110,7 +110,11 @@ __device__ __forceinline__ T BlockXReduce(T val, ReduceOp reducer) {
     T temp = paddle::platform::CudaShuffleDownSync(mask, val, stride);
     val = reducer(val, temp);
   }
-  return val;
+  if (threadIdx.x == 0) {
+    shared[threadIdx.y] = val;
+  }
+  __syncthreads();
+  return shared[threadIdx.y];
 }
 /**
......
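The change above makes every thread in a row return the reduced value, rather than only the thread with threadIdx.x == 0 holding a meaningful result. The following is a minimal standalone sketch of that pattern, not Paddle's actual code. It assumes blockDim.x is a power of two no larger than 32, that the block size is a multiple of 32, and that blockDim.y is at most kMaxRows. The names RowReduceBroadcast, AddOp, RowSumKernel, RunRowSumCheck and kMaxRows are hypothetical, and the plain __shfl_down_sync intrinsic stands in for paddle::platform::CudaShuffleDownSync.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical upper bound on blockDim.y for this sketch.
constexpr int kMaxRows = 32;

// Hypothetical reducer functor: plain addition.
struct AddOp {
  __device__ float operator()(float a, float b) const { return a + b; }
};

// Reduce along threadIdx.x, then broadcast the row result to every thread
// in that row through shared memory. Calling this twice in one kernel would
// need an extra __syncthreads() before the second call reuses `shared`.
template <typename T, typename ReduceOp>
__device__ __forceinline__ T RowReduceBroadcast(T val, ReduceOp reducer) {
  __shared__ T shared[kMaxRows];
  // width = blockDim.x keeps each shuffle inside one row's lane segment.
  for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
    T temp = __shfl_down_sync(0xffffffffu, val, stride, blockDim.x);
    val = reducer(val, temp);
  }
  // Only lane 0 of each row holds the complete result at this point.
  if (threadIdx.x == 0) {
    shared[threadIdx.y] = val;
  }
  __syncthreads();
  return shared[threadIdx.y];
}

__global__ void RowSumKernel(const float* in, float* out) {
  int tid = threadIdx.y * blockDim.x + threadIdx.x;
  // Every thread writes its row's total, not just the first thread.
  out[tid] = RowReduceBroadcast(in[tid], AddOp());
}

// Launches one 8x4 block and checks that all threads saw their row sum.
bool RunRowSumCheck() {
  constexpr int kCols = 8, kRows = 4, kN = kCols * kRows;
  float h_in[kN], h_out[kN];
  for (int i = 0; i < kN; ++i) h_in[i] = static_cast<float>(i);

  float *d_in = nullptr, *d_out = nullptr;
  if (cudaMalloc(&d_in, sizeof(h_in)) != cudaSuccess) return false;
  if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) {
    cudaFree(d_in);
    return false;
  }
  cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
  RowSumKernel<<<1, dim3(kCols, kRows)>>>(d_in, d_out);
  bool ok = cudaMemcpy(h_out, d_out, sizeof(h_out),
                       cudaMemcpyDeviceToHost) == cudaSuccess;
  cudaFree(d_in);
  cudaFree(d_out);

  for (int r = 0; ok && r < kRows; ++r) {
    float expected = 0.0f;
    for (int c = 0; c < kCols; ++c) expected += h_in[r * kCols + c];
    for (int c = 0; c < kCols; ++c) {
      if (h_out[r * kCols + c] != expected) ok = false;
    }
  }
  return ok;
}

int main() {
  bool ok = RunRowSumCheck();
  std::printf("%s\n", ok ? "all threads hold their row sum" : "mismatch");
  return ok ? 0 : 1;
}
```

Without the shared-memory write and __syncthreads(), threads with threadIdx.x != 0 would return partial values left over from the shuffle loop, so callers would have to read the result only from the first thread of each row.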