PhiloxRandom: Fix race in GPU fill function (#10298)

* PhiloxRandom: Fix race in GPU fill function

  The PhiloxRandom fill kernel for the GPU had a race condition that made its output non-deterministic. The kernel runs with N GPU threads (# thread contexts per GPU), but each step advanced the fill offset by a stride of only N-1. With this incorrect stride, threads 0 and N-1 wrote to the same memory locations, and the result depended on which thread wrote last. Make the stride equal to the number of threads to eliminate the race.

  BONUS: With the race fixed, PhiloxRandom constant-sized GPU initializers now match CPU initializers.

* Update random_ops_test.py to find race conditions

  Increase the array sizes in random_ops_test.py so that the tests expose the race condition fixed above.
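
The stride problem described in the commit message can be reproduced with a small host-side model. The sketch below is not the kernel itself: it assumes each thread starts at the group equal to its thread id, draws one group of samples per loop iteration, and then skips N-1 generator outputs, which is how the diff reads. The function name `simulate_fill` and the tiny sizes (4 threads, 10 groups) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def simulate_fill(num_groups, total_thread_count, stride):
    """Model the per-thread fill loop on the host (illustrative only).

    Returns {group_index: [(thread_id, generator_index), ...]}, listing
    every thread that writes a given output group and which generator
    output it writes there.
    """
    writers = defaultdict(list)
    for thread_id in range(total_thread_count):
        group = thread_id      # assumed start: offset = thread_id * kGroupSize
        gen_index = thread_id  # assumed start: gen.Skip(thread_id)
        while group < num_groups:
            writers[group].append((thread_id, gen_index))
            group += stride
            # One output is consumed by the draw, then Skip(N - 1).
            gen_index += total_thread_count
    return writers


N = 4  # hypothetical thread count, far smaller than a real launch
old = simulate_fill(num_groups=10, total_thread_count=N, stride=N - 1)
new = simulate_fill(num_groups=10, total_thread_count=N, stride=N)

# Groups written by more than one thread under the old stride.
raced = sorted(g for g, w in old.items() if len(w) > 1)
print(raced)   # [3, 6, 9]
print(old[3])  # [(0, 4), (3, 3)]: threads 0 and 3 write different samples

# With stride N, every group has one writer and receives the generator
# output with the same index.
print(all(w == [(g % N, g)] for g, w in new.items()))  # True
```

In this model the corrected stride also makes the generator index equal to the output group index, which is consistent with the commit's note that GPU results now match the CPU.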

58747e35 · Joel Hestness · Jonathan Hseu · 2cbcda08
Showing 2 changed files with 5 additions and 4 deletions

tensorflow/core/kernels/random_op_gpu.cu.cc +1 -1

tensorflow/python/kernel_tests/random_ops_test.py +4 -3

--- a/tensorflow/core/kernels/random_op_gpu.cu.cc
+++ b/tensorflow/core/kernels/random_op_gpu.cu.cc
@@ -141,7 +141,7 @@ struct FillPhiloxRandomKernel<Distribution, false> {
      const typename Distribution::ResultType samples = dist(&gen);
      copier(&data[offset], samples);

-      offset += (total_thread_count - 1) * kGroupSize;
+      offset += total_thread_count * kGroupSize;
      gen.Skip(total_thread_count - 1);
    }


--- a/tensorflow/python/kernel_tests/random_ops_test.py
+++ b/tensorflow/python/kernel_tests/random_ops_test.py
@@ -66,7 +66,8 @@ class RandomNormalTest(test.TestCase):
    for dt in dtypes.float16, dtypes.float32, dtypes.float64:
      results = {}
      for use_gpu in [False, True]:
-        sampler = self._Sampler(1000, 0.0, 1.0, dt, use_gpu=use_gpu, seed=12345)
+        sampler = self._Sampler(
+            1000000, 0.0, 1.0, dt, use_gpu=use_gpu, seed=12345)
        results[use_gpu] = sampler()
      if dt == dtypes.float16:
        self.assertAllClose(results[False], results[True], rtol=1e-3, atol=1e-3)
@@ -135,7 +136,7 @@ class TruncatedNormalTest(test.TestCase):
        # We need a particular larger number of samples to test multiple rounds
        # on GPU
        sampler = self._Sampler(
-            200000, 0.0, 1.0, dt, use_gpu=use_gpu, seed=12345)
+            1000000, 0.0, 1.0, dt, use_gpu=use_gpu, seed=12345)
        results[use_gpu] = sampler()
      if dt == dtypes.float16:
        self.assertAllClose(results[False], results[True], rtol=1e-3, atol=1e-3)
@@ -243,7 +244,7 @@ class RandomUniformTest(test.TestCase):
      results = {}
      for use_gpu in False, True:
        sampler = self._Sampler(
-            1000, minv=0, maxv=maxv, dtype=dt, use_gpu=use_gpu, seed=12345)
+            1000000, minv=0, maxv=maxv, dtype=dt, use_gpu=use_gpu, seed=12345)
        results[use_gpu] = sampler()
      self.assertAllEqual(results[False], results[True])
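
Why the tests need larger arrays can be estimated with a short calculation. Under the old stride, thread 0 only collides with thread N-1 once the output has at least N groups, so a small tensor never triggers the race. The sketch below counts the doubly written groups for a given sample count. The group size of 4 and the thread count of 65536 are assumptions made for illustration only; the real values depend on the distribution type and the GPU launch configuration, and `old_stride_collisions` is a hypothetical helper, not part of TensorFlow.

```python
def old_stride_collisions(num_samples, total_thread_count, group_size=4):
    """Count output groups written by two threads under the old N-1 stride.

    Assumes thread t writes groups t, t + (N-1), t + 2*(N-1), ..., so the
    only shared groups are the positive multiples of N-1, each written by
    thread 0 and thread N-1. Requires total_thread_count >= 2.
    """
    num_groups = num_samples // group_size  # full groups only
    return (num_groups - 1) // (total_thread_count - 1)


THREADS = 65536  # hypothetical total thread count

for n in (1000, 200000, 1000000):
    print(n, old_stride_collisions(n, THREADS))
# 1000 0
# 200000 0
# 1000000 3
```

With these assumed numbers, the previous test sizes of 1000 and 200000 samples would produce no collisions, while 1000000 samples would, which matches the intent stated in the commit message.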