Commit 67c0625e, authored by Son Tuan Vu, committed by TensorFlower Gardener

[XLA:GPU] Fix theoretical bug where shmem_usage > shmem_budget

PiperOrigin-RevId: 565152217
Parent 6ed6cf08
......@@ -872,19 +872,14 @@ ReductionCodegenInfo HloFusionAnalysis::ComputeReductionCodegenInfo(
   // (Indeed I *think* "num_partial_results" is a misnomer for column
   // reductions; I think it's the number of *complete*, i.e. not partial,
   // results per warp.)
-  // TODO(vuson): two things are wrong here:
-  // (1) If num_partial_results == 1, and shmem_usage is big enough (or
-  // shmem_budget small enough), we will execute the loop once, turning
-  // num_partial_results to 0.
-  // (2) The loop was originally applied to both row and column reductions, we
+  // TODO(vuson): something is wrong here:
+  // The loop was originally applied to both row and column reductions, we
   // would need to verify that we could indeed exceed the memory usage for
   // column reductions, in which case the outer if needs to be removed.
   if (reduction_dimensions.is_row_reduction) {
-    while (shmem_usage * num_partial_results > shmem_budget) {
+    while (num_partial_results != 1 &&
+           shmem_usage * num_partial_results > shmem_budget) {
       num_partial_results /= 2;
-      if (num_partial_results == 1) {
-        break;
-      }
     }
   }
......