[XLA:GPU] Fix theoretical bug where shmem_usage > shmem_budget

PiperOrigin-RevId: 565152217

67c0625e · Son Tuan Vu · TensorFlower Gardener · 6ed6cf08 · 67c0625e

Showing 1 changed file with 4 additions and 9 deletions

third_party/xla/xla/service/gpu/hlo_fusion_analysis.cc +4 -9

--- a/third_party/xla/xla/service/gpu/hlo_fusion_analysis.cc
+++ b/third_party/xla/xla/service/gpu/hlo_fusion_analysis.cc
@@ -872,19 +872,14 @@ ReductionCodegenInfo HloFusionAnalysis::ComputeReductionCodegenInfo(
  // (Indeed I *think* "num_partial_results" is a misnomer for column
  // reductions; I think it's the number of *complete*, i.e. not partial,
  // results per warp.)
-  // TODO(vuson): two things are wrong here:
-  // (1) If num_partial_results == 1, and shmem_usage is big enough (or
-  // shmem_budget small enough), we will execute the loop once, turning
-  // num_partial_results to 0.
-  // (2) The loop was originally applied to both row and column reductions, we
+  // TODO(vuson): something is wrong here:
+  // The loop was originally applied to both row and column reductions, we
  // would need to verify that we could indeed exceed the memory usage for
  // column reductions, in which case the outer if needs to be removed.
  if (reduction_dimensions.is_row_reduction) {
-    while (shmem_usage * num_partial_results > shmem_budget) {
+    while (num_partial_results != 1 &&
+           shmem_usage * num_partial_results > shmem_budget) {
      num_partial_results /= 2;
-      if (num_partial_results == 1) {
-        break;
-      }
    }
  }
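
The effect of the change can be reproduced outside XLA with a minimal sketch. The two helpers below, `HalveOld` and `HalveNew`, are hypothetical names introduced only for illustration; they copy the loop before and after this commit into free functions, assuming `num_partial_results` starts at 1 or more and that the three values are plain 64-bit integers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical standalone copy of the loop BEFORE this commit.
// The break only fires after a halving, so an input of 1 is halved to 0.
int64_t HalveOld(int64_t num_partial_results, int64_t shmem_usage,
                 int64_t shmem_budget) {
  assert(num_partial_results >= 1);  // assumption of this sketch
  while (shmem_usage * num_partial_results > shmem_budget) {
    num_partial_results /= 2;
    if (num_partial_results == 1) {
      break;
    }
  }
  return num_partial_results;
}

// Hypothetical standalone copy of the loop AFTER this commit.
// The check for 1 happens before any halving, so the result never drops to 0.
int64_t HalveNew(int64_t num_partial_results, int64_t shmem_usage,
                 int64_t shmem_budget) {
  assert(num_partial_results >= 1);  // assumption of this sketch
  while (num_partial_results != 1 &&
         shmem_usage * num_partial_results > shmem_budget) {
    num_partial_results /= 2;
  }
  return num_partial_results;
}
```

With `num_partial_results = 1`, `shmem_usage = 100`, and `shmem_budget = 50`, `HalveOld` returns 0 while `HalveNew` returns 1. For inputs where halving brings the usage under budget before reaching 1 (for example 4, 100, 250), both variants return the same value, 2.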