Training examples/librispeech - Segmentation fault
Created by: karlkao
While training with examples/librispeech, the run crashed right after the test for epoch 2 finished, as epoch 3 was starting. Training failed with the following message:
----------Begin test...
--------Time: 8828.566067 sec, epoch: 2, train loss: 55.043900, test loss: 823.349414
save parameters at ./checkpoints/libri/epoch_2
W0123 04:51:52.655895 1282 init.cc:206] *** Aborted at 1579755112 (unix time) try "date -d @1579755112" if you are using GNU date ***
W0123 04:51:52.658562 1282 init.cc:206] PC: @ 0x0 (unknown)
W0123 04:51:52.658725 1282 init.cc:206] *** SIGSEGV (@0x64) received by PID 1156 (TID 0x7f2532fed700) from PID 100; stack trace: ***
W0123 04:51:52.660856 1282 init.cc:206] @ 0x7f4343178390 (unknown)
W0123 04:51:52.663314 1282 init.cc:206] @ 0x7f4280d79d56 std::_Hashtable<>::_M_find_before_node()
W0123 04:51:52.666824 1282 init.cc:206] @ 0x7f4280d77fb7 paddle::framework::Scope::FindVarLocally()
W0123 04:51:52.668771 1282 init.cc:206] @ 0x7f4280d785be paddle::framework::Scope::VarInternal()
W0123 04:51:52.670828 1282 init.cc:206] @ 0x7f4280d7873d paddle::framework::Scope::Var()
W0123 04:51:52.673468 1282 init.cc:206] @ 0x7f427ee0dc7d paddle::operators::RecurrentGradOp::RunImpl()
W0123 04:51:52.675822 1282 init.cc:206] @ 0x7f4280d0269c paddle::framework::OperatorBase::Run()
W0123 04:51:52.677320 1282 init.cc:206] @ 0x7f4280ae4c8d _ZNSt17_Function_handlerIFvvEZN6paddle9framework7details12OpHandleBase17RunAndRecordEventERKSt8functionIS0_EEUlvE_E9_M_invokeERKSt9_Any_data
W0123 04:51:52.680279 1282 init.cc:206] @ 0x7f4280ae44b5 paddle::framework::details::OpHandleBase::RunAndRecordEvent()
W0123 04:51:52.683532 1282 init.cc:206] @ 0x7f4280ae78cb paddle::framework::details::ComputationOpHandle::RunImpl()
W0123 04:51:52.686373 1282 init.cc:206] @ 0x7f4280a9eba6 paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync()
W0123 04:51:52.689316 1282 init.cc:206] @ 0x7f4280a9d8ef paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp()
W0123 04:51:52.690891 1282 init.cc:206] @ 0x7f4280a9dbb4 _ZNSt17_Function_handlerIFvvESt17reference_wrapperISt12_Bind_simpleIFS1_ISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS6_12OpHandleBaseESt6atomicIiESt4hashISA_ESt8equal_toISA_ESaISt4pairIKSA_SC_EEESA_RKSt10shared_ptrINS5_13BlockingQueueImEEEEUlvE_vEEEvEEEE9_M_invokeERKSt9_Any_data
W0123 04:51:52.694702 1282 init.cc:206] @ 0x7f427e5a0d13 std::_Function_handler<>::_M_invoke()
W0123 04:51:52.698467 1282 init.cc:206] @ 0x7f427e3ece37 std::__future_base::_State_base::_M_do_set()
W0123 04:51:52.700204 1282 init.cc:206] @ 0x7f4343175a99 __pthread_once_slow
W0123 04:51:52.701709 1282 init.cc:206] @ 0x7f4280a99702 _ZNSt13__future_base11_Task_stateISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS4_12OpHandleBaseESt6atomicIiESt4hashIS8_ESt8equal_toIS8_ESaISt4pairIKS8_SA_EEES8_RKSt10shared_ptrINS3_13BlockingQueueImEEEEUlvE_vEESaIiEFvvEE6_M_runEv
W0123 04:51:52.704826 1282 init.cc:206] @ 0x7f427e3ee5f4 _ZZN10ThreadPoolC1EmENKUlvE_clEv
W0123 04:51:52.706565 1282 init.cc:206] @ 0x7f4334da0c80 (unknown)
W0123 04:51:52.708508 1282 init.cc:206] @ 0x7f434316e6ba start_thread
W0123 04:51:52.710675 1282 init.cc:206] @ 0x7f4342ea441d clone
W0123 04:51:52.712718 1282 init.cc:206] @ 0x0 (unknown)
Segmentation fault (core dumped)
Failed in training!