parallel_executor使用报错
Created by: jshower
在python/paddle/fluid/tests/unittests/test_parallel_executor.py 中,我尝试对TestCRFModel(unittest.TestCase)的batch轮数增大,发现轮数增多后会出现下述问题。具体出现轮数差别较大,有时候几十轮后出现,有时候几千轮后出现(不是数据耗尽)。
*** Aborted at 1523950391 (unix time) try "date -d @1523950391" if you are using GNU date ***
PC: @ 0x0 (unknown)
*** SIGSEGV (@0x0) received by PID 12445 (TID 0x7f88035ee700) from PID 0; stack trace: ***
@ 0x7f8a90581390 (unknown)
@ 0x7f8a44af4a0f _ZNSt17_Function_handlerIFvvEZN6paddle9framework7details19ComputationOpHandle7RunImplEvEUlvE_E9_M_invokeERKSt9_Any_data
@ 0x7f8a44b0a94d _ZNSt17_Function_handlerIFvvEZN6paddle9framework7details12OpHandleBase17RunAndRecordEventERKSt8functionIS0_EEUlvE_E9_M_invokeERKSt9_Any_data
@ 0x7f8a44b0999d paddle::framework::details::OpHandleBase::RunAndRecordEvent()
@ 0x7f8a44af4f59 paddle::framework::details::ComputationOpHandle::RunImpl()
@ 0x7f8a44b0ad61 paddle::framework::details::OpHandleBase::Run()
@ 0x7f8a44aff04b _ZZN6paddle9framework7details24ThreadedSSAGraphExecutor5RunOpEPNS1_13BlockingQueueIPNS1_13VarHandleBaseEEEPNS1_12OpHandleBaseEENKUlvE_clEv
@ 0x7f8a44aff640 _ZNSt17_Function_handlerIFSt10unique_ptrINSt13__future_base12_Result_baseENS2_8_DeleterEEvENS1_12_Task_setterIS0_INS1_7_ResultIvEES3_ESt12_Bind_simpleIFSt17reference_wrapperISt5_BindIFZN6paddle9framework7details24ThreadedSSAGraphExecutor5RunOpEPNSF_13BlockingQueueIPNSF_13VarHandleBaseEEEPNSF_12OpHandleBaseEEUlvE_vEEEvEEvEEE9_M_invokeERKSt9_Any_data
@ 0x7f8a449a0a4e std::__future_base::_State_baseV2::_M_do_set()
@ 0x7f8a9057ea99 __pthread_once_slow
@ 0x7f8a44afe20d _ZNSt17_Function_handlerIFvvEZN10ThreadPool7enqueueIRZN6paddle9framework7details24ThreadedSSAGraphExecutor5RunOpEPNS5_13BlockingQueueIPNS5_13VarHandleBaseEEEPNS5_12OpHandleBaseEEUlvE_JEEESt6futureINSt9result_ofIFT_DpT0_EE4typeEEOSI_DpOSJ_EUlvE_E9_M_invokeERKSt9_Any_data
@ 0x7f8a44b03c74 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN10ThreadPoolC4EmEUlvE_vEEE6_M_runEv
@ 0x7f8a86e5bc80 (unknown)
@ 0x7f8a905776ba start_thread
@ 0x7f8a902ad41d clone
@ 0x0 (unknown)
Segmentation fault
辛苦追查下原因。