test_dist_ctr failed in CI task
Created by: wanghaoshuang
Error log:
[12:18:59]197/493 Test #192: test_dist_ctr ...................................***Failed 5.42 sec
[12:18:59]local_stderr:
[12:18:59]W0319 20:20:25.035552 2543620992 graph.h:204] WARN: After a series of passes, the current graph can be quite different from OriginProgram. So, please avoid using the `OriginProgram()` method!
[12:18:59]
[12:18:59]
[12:18:59]local_stderr:
[12:18:59]W0319 20:20:26.976441 2543620992 graph.h:204] WARN: After a series of passes, the current graph can be quite different from OriginProgram. So, please avoid using the `OriginProgram()` method!
[12:18:59]
[12:18:59]
[12:18:59]test_dist_ctr failed
[12:18:59] EE
[12:18:59]======================================================================
[12:18:59]ERROR: test_dist_ctr (test_dist_ctr.TestDistCTR2x2)
[12:18:59]----------------------------------------------------------------------
[12:18:59]Traceback (most recent call last):
[12:18:59] File "/home/teamcity/work/e84e6e698a3f913d/build/python/paddle/fluid/tests/unittests/test_dist_ctr.py", line 27, in test_dist_ctr
[12:18:59] self.check_with_place("dist_ctr.py", delta=1e-7, check_error_log=False)
[12:18:59] File "/home/teamcity/work/e84e6e698a3f913d/build/python/paddle/fluid/tests/unittests/test_dist_base.py", line 531, in check_with_place
[12:18:59] check_error_log)
[12:18:59] File "/home/teamcity/work/e84e6e698a3f913d/build/python/paddle/fluid/tests/unittests/test_dist_base.py", line 340, in _run_local
[12:18:59] sys.stderr.write('local_stdout: %s\n' % pickle.loads(local_out))
[12:18:59] File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1388, in loads
[12:18:59] return Unpickler(file).load()
[12:18:59] File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 864, in load
[12:18:59] dispatch[key](self)
[12:18:59] File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1153, in load_dup
[12:18:59] self.append(self.stack[-1])
[12:18:59]IndexError: list index out of range
Related code: test_dist_base: https://github.com/PaddlePaddle/Paddle/blame/develop/python/paddle/fluid/tests/unittests/test_dist_base.py#L365
PR: https://github.com/PaddlePaddle/Paddle/pull/16226
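The traceback ends inside `pickle.loads(local_out)`, which would be consistent with `local_out` not being a complete, clean pickle stream (for example, truncated or mixed with other output). That is one reading of the log, not something it confirms. The sketch below only illustrates how an incomplete byte string makes `pickle.loads` raise; `safe_loads` is a hypothetical helper and not part of test_dist_base.py.

```python
import pickle


def safe_loads(raw):
    """Hypothetical helper: return the unpickled object, or None if
    `raw` cannot be unpickled."""
    try:
        return pickle.loads(raw)
    except Exception:
        # The exception type depends on the Python version and on how the
        # bytes are damaged (the log above shows IndexError on Python 2.7).
        return None


good = pickle.dumps([0.1, 0.2])
truncated = good[:-3]  # simulate subprocess output that was cut short

print(safe_loads(good))
print(safe_loads(truncated))
```

Logging the raw bytes when unpickling fails might make it easier to tell truncated output from output polluted by other text.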
Possible causes:
- Contention for compute resources between unit tests?
- Unit tests using the same local path for checkpoints?
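If the shared checkpoint path turns out to be the cause, one way to rule it out is to give each test run its own directory. This is a minimal sketch using only the standard library; `make_checkpoint_dir` and the `model.ckpt` file name are hypothetical and do not come from the Paddle test code.

```python
import os
import shutil
import tempfile


def make_checkpoint_dir(test_name):
    """Hypothetical helper: create a fresh directory private to one test run."""
    return tempfile.mkdtemp(prefix=test_name + "_ckpt_")


# Two runs of the same test get distinct directories, so they cannot
# overwrite each other's checkpoint files even when run in parallel.
dir_a = make_checkpoint_dir("test_dist_ctr")
dir_b = make_checkpoint_dir("test_dist_ctr")

for d in (dir_a, dir_b):
    with open(os.path.join(d, "model.ckpt"), "w") as f:
        f.write("placeholder checkpoint data")

print(dir_a != dir_b)

# Clean up once the test is finished.
for d in (dir_a, dir_b):
    shutil.rmtree(d)
```

This does not address the first possibility: contention for CPU or ports between concurrently running tests would need a different fix, such as running the distributed tests serially.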