Unverified commit 273f58a3. Author: Huihuang Zheng. Committer: GitHub

Decrease Random Failure Probability for test_parallel_executor_mnist, test=develop (#27498)

As the title says, this change lowers the probability of random failures in test_parallel_executor_mnist.

The old code passed a larger delta to only some of the comparisons between reduce and all-reduce. This change passes it to all of them.

On my Linux machine, I ran the test 100 times and saw no failures. In addition, we have only seen this random failure on CI twice since I started working, so I considered it rare and simply increased the delta.
Parent d7f422c9
@@ -124,8 +124,10 @@ class TestMNIST(TestParallelExecutorBase):
     def test_simple_fc_with_new_strategy(self):
         # use_cuda, use_reduce
-        self._compare_reduce_and_allreduce(simple_fc_net, True)
-        self._compare_reduce_and_allreduce(simple_fc_net, False)
+        # NOTE: the computation result of nccl_reduce is non-deterministic,
+        # related issue: https://github.com/NVIDIA/nccl/issues/157
+        self._compare_reduce_and_allreduce(simple_fc_net, True, 1e-5, 1e-2)
+        self._compare_reduce_and_allreduce(simple_fc_net, False, 1e-5, 1e-2)
     def check_simple_fc_parallel_accuracy(self, use_cuda):
         if use_cuda and not core.is_compiled_with_cuda():
@@ -179,7 +181,7 @@ class TestMNIST(TestParallelExecutorBase):
         # NOTE: the computation result of nccl_reduce is non-deterministic,
         # related issue: https://github.com/NVIDIA/nccl/issues/157
         self._compare_reduce_and_allreduce(fc_with_batchnorm, True, 1e-5, 1e-2)
-        self._compare_reduce_and_allreduce(fc_with_batchnorm, False)
+        self._compare_reduce_and_allreduce(fc_with_batchnorm, False, 1e-5, 1e-2)
 if __name__ == '__main__':
......
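To illustrate why widening the delta makes the comparison less flaky, here is a minimal sketch of a tolerance-based check. It is not the body of `_compare_reduce_and_allreduce`, which the diff does not show. The function name `losses_match`, the reading of the two trailing arguments as absolute tolerances for the first and last loss, and the loss values are all assumptions made for illustration.

```python
def losses_match(reduce_losses, allreduce_losses, delta1, delta2):
    """Return True when both (first, last) loss pairs agree within tolerance.

    Hypothetical helper: delta1 bounds the first-loss difference and
    delta2 bounds the last-loss difference (an assumed interpretation).
    """
    first_reduce, last_reduce = reduce_losses
    first_allreduce, last_allreduce = allreduce_losses
    first_ok = abs(first_reduce - first_allreduce) <= delta1
    last_ok = abs(last_reduce - last_allreduce) <= delta2
    return first_ok and last_ok


# Made-up (first_loss, last_loss) pairs; the last losses drift apart by 0.005,
# standing in for the non-deterministic summation order mentioned in the NOTE.
reduce_run = (2.302, 0.500)
allreduce_run = (2.302, 0.505)

# Tight tolerances (illustrative values) reject the small drift.
strict = losses_match(reduce_run, allreduce_run, delta1=1e-6, delta2=1e-4)

# The tolerances passed in the diff (1e-5, 1e-2) accept it.
relaxed = losses_match(reduce_run, allreduce_run, delta1=1e-5, delta2=1e-2)

print(strict, relaxed)
```

With these made-up numbers, `strict` is `False` and `relaxed` is `True`: the same small drift fails a tight check and passes a looser one. The trade-off is that a looser delta can also hide a real regression of similar size.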