`test_parallel_executor.ParallelExecutorTestingDuringTraining` test fails frequently.
Created by: Superjomn
The ParallelExecutor unit test fails too frequently, even when the PR doesn't change anything that affects it.
Can we have a quick fix for this issue, such as disabling the test?
The failing test is `test_parallel_executor.ParallelExecutorTestingDuringTraining`. A sample CI log follows.
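As one possible form of the quick fix, the test class could be skipped with the standard `unittest.skip` decorator until the root cause is found. This is a minimal sketch, not the actual test file: the class body is a hypothetical placeholder, and `run_suite` is a helper introduced here only to show the effect.

```python
import unittest


# Hypothetical stand-in for the real test class; only the decorator matters here.
@unittest.skip("Flaky in CI: intermittent 'illegal memory access'; disabled until fixed")
class ParallelExecutorTestingDuringTraining(unittest.TestCase):
    def test_parallel_testing(self):
        # Placeholder body; it is never executed while the class is skipped.
        raise AssertionError("should not run while skipped")


def run_suite():
    """Load and run the test class, returning the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(ParallelExecutorTestingDuringTraining)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite()
```

A skipped test is reported as skipped rather than failed, so unrelated PRs would no longer be blocked, while the skip reason keeps the flakiness visible in the test output.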
[18:34:40] [Step 1/1] 95/125 Test #91: test_parallel_executor ..........................***Exception: Other 23.63 sec
[18:34:40] [Step 1/1] test_parallel_testing (test_parallel_executor.ParallelExecutorTestingDuringTraining) ... terminate called after throwing an instance of 'paddle::platform::EnforceNotMet'
[18:34:40] [Step 1/1] what(): an illegal memory access was encountered at [/paddle/paddle/fluid/framework/details/op_handle_base.cc:37]
[18:34:40] [Step 1/1] PaddlePaddle Call Stacks:
[18:34:40] [Step 1/1] 0 0x7f7ba516bb8cp paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 572
[18:34:40] [Step 1/1] 1 0x7f7ba5d58e72p paddle::framework::details::OpHandleBase::~OpHandleBase() + 626
[18:34:40] [Step 1/1] 2 0x7f7ba5d542d1p paddle::framework::details::FetchOpHandle::~FetchOpHandle() + 17
[18:34:40] [Step 1/1] 3 0x7f7ba5d4fdfep paddle::framework::details::ThreadedSSAGraphExecutor::Run(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 3326
[18:34:40] [Step 1/1] 4 0x7f7ba5232f35p paddle::framework::ParallelExecutor::Run(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, paddle::framework::LoDTensor, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, paddle::framework::LoDTensor> > > const&) + 469
[18:34:40] [Step 1/1] 5 0x7f7ba51c0503p void pybind11::cpp_function::initialize<pybind11::cpp_function::initialize<void, paddle::framework::ParallelExecutor, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, paddle::framework::LoDTensor, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, paddle::framework::LoDTensor> > > const&, pybind11::name, pybind11::is_method, pybind11::sibling>(void (paddle::framework::ParallelExecutor::*)(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, paddle::framework::LoDTensor, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, paddle::framework::LoDTensor> > > const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(paddle::framework::ParallelExecutor*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> 
>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, paddle::framework::LoDTensor, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, paddle::framework::LoDTensor> > > const&)#1}, void, paddle::framework::ParallelExecutor*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, paddle::framework::LoDTensor, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, paddle::framework::LoDTensor> > > const&, pybind11::name, pybind11::is_method, pybind11::sibling>(pybind11::cpp_function::initialize<void, paddle::framework::ParallelExecutor, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> >, paddle::framework::LoDTensor, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, paddle::framework::LoDTensor> > > const&, pybind11::name, pybind11::is_method, pybind11::sibling>(void (paddle::framework::ParallelExecutor::*)(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, paddle::framework::LoDTensor, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, paddle::framework::LoDTensor> > > const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(paddle::framework::ParallelExecutor*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, paddle::framework::LoDTensor, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, 
std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, paddle::framework::LoDTensor> > > const&)#1}&&, void (*)(paddle::framework::ParallelExecutor*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, paddle::framework::LoDTensor, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, paddle::framework::LoDTensor> > > const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call) + 451
[18:34:40] [Step 1/1] 6 0x7f7ba5186084p pybind11::cpp_function::dispatcher(_object*, _object*, _object*) + 1236
[18:34:40] [Step 1/1] 7 0x4c37edp PyEval_EvalFrameEx + 31165
[18:34:40] [Step 1/1] 8 0x4b9ab6p PyEval_EvalCodeEx + 774
[18:34:40] [Step 1/1] 9 0x4c16e7p PyEval_EvalFrameEx + 22711
[18:34:40] [Step 1/1] 10 0x4c136fp PyEval_EvalFrameEx + 21823
[18:34:40] [Step 1/1] 11 0x4b9ab6p PyEval_EvalCodeEx + 774
[18:34:40] [Step 1/1] 12 0x4d55f3p
[18:34:40] [Step 1/1] 13 0x4a577ep PyObject_Call + 62
[18:34:40] [Step 1/1] 14 0x4bed3dp PyEval_EvalFrameEx + 12045
[18:34:40] [Step 1/1] 15 0x4b9ab6p PyEval_EvalCodeEx + 774
[18:34:40] [Step 1/1] 16 0x4d54b9p
[18:34:40] [Step 1/1] 17 0x4eebeep
[18:34:40] [Step 1/1] 18 0x4a577ep PyObject_Call + 62
[18:34:40] [Step 1/1] 19 0x548253p
[18:34:40] [Step 1/1] 20 0x4c15bfp PyEval_EvalFrameEx + 22415
[18:34:40] [Step 1/1] 21 0x4b9ab6p PyEval_EvalCodeEx + 774
[18:34:40] [Step 1/1] 22 0x4d55f3p
[18:34:40] [Step 1/1] 23 0x4a577ep PyObject_Call + 62
[18:34:40] [Step 1/1] 24 0x4bed3dp PyEval_EvalFrameEx + 12045
[18:34:40] [Step 1/1] 25 0x4b9ab6p PyEval_EvalCodeEx + 774
[18:34:40] [Step 1/1] 26 0x4d54b9p
[18:34:40] [Step 1/1] 27 0x4eebeep
[18:34:40] [Step 1/1] 28 0x4a577ep PyObject_Call + 62
[18:34:40] [Step 1/1] 29 0x548253p
[18:34:40] [Step 1/1] 30 0x4c15bfp PyEval_EvalFrameEx + 22415
[18:34:40] [Step 1/1] 31 0x4b9ab6p PyEval_EvalCodeEx + 774
[18:34:40] [Step 1/1] 32 0x4d55f3p
[18:34:40] [Step 1/1] 33 0x4a577ep PyObject_Call + 62
[18:34:40] [Step 1/1] 34 0x4bed3dp PyEval_EvalFrameEx + 12045
[18:34:40] [Step 1/1] 35 0x4b9ab6p PyEval_EvalCodeEx + 774
[18:34:40] [Step 1/1] 36 0x4d54b9p
[18:34:40] [Step 1/1] 37 0x4eebeep
[18:34:40] [Step 1/1] 38 0x4a577ep PyObject_Call + 62
[18:34:40] [Step 1/1] 39 0x548253p
[18:34:40] [Step 1/1] 40 0x4c15bfp PyEval_EvalFrameEx + 22415
[18:34:40] [Step 1/1] 41 0x4b9ab6p PyEval_EvalCodeEx + 774
[18:34:40] [Step 1/1] 42 0x4d55f3p
[18:34:40] [Step 1/1] 43 0x4a577ep PyObject_Call + 62
[18:34:40] [Step 1/1] 44 0x4bed3dp PyEval_EvalFrameEx + 12045
[18:34:40] [Step 1/1] 45 0x4b9ab6p PyEval_EvalCodeEx + 774
[18:34:40] [Step 1/1] 46 0x4d54b9p
[18:34:40] [Step 1/1] 47 0x4eebeep
[18:34:40] [Step 1/1] 48 0x4a577ep PyObject_Call + 62
[18:34:40] [Step 1/1] 49 0x548253p
[18:34:40] [Step 1/1] 50 0x4c15bfp PyEval_EvalFrameEx + 22415
[18:34:40] [Step 1/1] 51 0x4c136fp PyEval_EvalFrameEx + 21823
[18:34:40] [Step 1/1] 52 0x4c136fp PyEval_EvalFrameEx + 21823
[18:34:40] [Step 1/1] 53 0x4b9ab6p PyEval_EvalCodeEx + 774
[18:34:40] [Step 1/1] 54 0x4d55f3p
[18:34:40] [Step 1/1] 55 0x4eebeep
[18:34:40] [Step 1/1] 56 0x4ee7f6p
[18:34:40] [Step 1/1] 57 0x4aa9abp
[18:34:40] [Step 1/1] 58 0x4c15bfp PyEval_EvalFrameEx + 22415
[18:34:40] [Step 1/1] 59 0x4b9ab6p PyEval_EvalCodeEx + 774
[18:34:40] [Step 1/1] 60 0x4bfa8dp PyEval_EvalFrameEx + 15453
[18:34:40] [Step 1/1] 61 0x4b9ab6p PyEval_EvalCodeEx + 774
[18:34:40] [Step 1/1] 62 0x4c16e7p PyEval_EvalFrameEx + 22711
[18:34:40] [Step 1/1] 63 0x4b9ab6p PyEval_EvalCodeEx + 774
[18:34:40] [Step 1/1] 64 0x4d54b9p
[18:34:40] [Step 1/1] 65 0x4a577ep PyObject_Call + 62
[18:34:40] [Step 1/1] 66 0x519a46p
[18:34:40] [Step 1/1] 67 0x493b06p Py_Main + 1590
[18:34:40] [Step 1/1] 68 0x7f7bda326830p __libc_start_main + 240
[18:34:40] [Step 1/1] 69 0x4933e9p _start + 41
[18:34:40] [Step 1/1]
[18:34:40] [Step 1/1] *** Aborted at 1523874860 (unix time) try "date -d @1523874860" if you are using GNU date ***
[18:34:40] [Step 1/1] PC: @ 0x0 (unknown)
[18:34:40] [Step 1/1] *** SIGABRT (@0x6124) received by PID 24868 (TID 0x7f7bdab06700) from PID 24868; stack trace: ***
[18:34:40] [Step 1/1] @ 0x7f7bda6e1390 (unknown)
[18:34:40] [Step 1/1] @ 0x7f7bda33b428 gsignal
[18:34:40] [Step 1/1] @ 0x7f7bda33d02a abort
[18:34:40] [Step 1/1] @ 0x7f7bd0e1284d __gnu_cxx::__verbose_terminate_handler()
[18:34:40] [Step 1/1] @ 0x7f7bd0e106b6 (unknown)
[18:34:40] [Step 1/1] @ 0x7f7bd0e0f6a9 (unknown)
[18:34:40] [Step 1/1] @ 0x7f7bd0e10005 __gxx_personality_v0
[18:34:40] [Step 1/1] @ 0x7f7bd335bf83 (unknown)
[18:34:40] [Step 1/1] @ 0x7f7bd335c2eb _Unwind_RaiseException
[18:34:40] [Step 1/1] @ 0x7f7bd0e1090c __cxa_throw
[18:34:40] [Step 1/1] @ 0x7f7ba5d58e90 paddle::framework::details::OpHandleBase::~OpHandleBase()
[18:34:40] [Step 1/1] @ 0x7f7ba5d542d1 paddle::framework::details::FetchOpHandle::~FetchOpHandle()
[18:34:40] [Step 1/1] @ 0x7f7ba5d4fdfe paddle::framework::details::ThreadedSSAGraphExecutor::Run()
[18:34:40] [Step 1/1] @ 0x7f7ba5232f35 paddle::framework::ParallelExecutor::Run()
[18:34:40] [Step 1/1] @ 0x7f7ba51c0503 _ZZN8pybind1112cpp_function10initializeIZNS0_C4IvN6paddle9framework16ParallelExecutorEJRKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaISC_EERKSC_RKSt13unordered_mapISC_NS4_9LoDTensorESt4hashISC_ESt8equal_toISC_ESaISt4pairISH_SK_EEEEJNS_4nameENS_9is_methodENS_7siblingEEEEMT0_FT_DpT1_EDpRKT2_EUlPS5_SG_SI_SU_E_vJS18_SG_SI_SU_EJSV_SW_SX_EEEvOSZ_PFSY_S11_ES17_ENUlRNS_6detail13function_callEE1_4_FUNES1F_
[18:34:40] [Step 1/1] @ 0x7f7ba5186084 pybind11::cpp_function::dispatcher()
[18:34:40] [Step 1/1] @ 0x4c37ed PyEval_EvalFrameEx
[18:34:40] [Step 1/1] @ 0x4b9ab6 PyEval_EvalCodeEx
[18:34:40] [Step 1/1] @ 0x4c16e7 PyEval_EvalFrameEx
[18:34:40] [Step 1/1] @ 0x4c136f PyEval_EvalFrameEx
[18:34:40] [Step 1/1] @ 0x4b9ab6 PyEval_EvalCodeEx
[18:34:40] [Step 1/1] @ 0x4d55f3 (unknown)
[18:34:40] [Step 1/1] @ 0x4a577e PyObject_Call
[18:34:40] [Step 1/1] @ 0x4bed3d PyEval_EvalFrameEx
[18:34:40] [Step 1/1] @ 0x4b9ab6 PyEval_EvalCodeEx
[18:34:40] [Step 1/1] @ 0x4d54b9 (unknown)
[18:34:40] [Step 1/1] @ 0x4eebee (unknown)
[18:34:40] [Step 1/1] @ 0x4a577e PyObject_Call
[18:34:40] [Step 1/1] @ 0x548253 (unknown)
[18:34:40] [Step 1/1] @ 0x4c15bf PyEval_EvalFrameEx
[18:34:40] [Step 1/1] @ 0x4b9ab6 PyEval_EvalCodeEx
[18:34:40] [Step 1/1] @ 0x4d55f3 (unknown)