Training fails after several epochs with the error "an illegal memory access was encountered"
Created by: junior-talk
paddlecloud job-0bb5eedc1f205017
terminate called after throwing an instance of 'paddle::platform::EnforceNotMet'
what():
--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0 std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2 paddle::framework::details::OpHandleBase::~OpHandleBase()
3 paddle::framework::details::FetchOpHandle::~FetchOpHandle()
4 paddle::framework::ir::Node::~Node()
5 paddle::framework::ir::Node::~Node()
6 paddle::framework::details::ClearFetchOp(paddle::framework::ir::Graph*, std::vector<paddle::framework::details::OpHandleBase*, std::allocator<paddle::framework::details::OpHandleBase*> >*)
7 paddle::framework::details::FastThreadedSSAGraphExecutor::ExecutionFinal(std::vector<paddle::framework::details::OpHandleBase*, std::allocator<paddle::framework::details::OpHandleBase*> >*)
8 paddle::framework::details::FastThreadedSSAGraphExecutor::Run(std::vector<std::string, std::allocator<std::string> > const&)
9 paddle::framework::details::ScopeBufferedMonitor::Apply(std::function<void ()()> const&, bool)
10 paddle::framework::details::ScopeBufferedSSAGraphExecutor::Run(std::vector<std::string, std::allocator<std::string> > const&)
11 paddle::framework::ParallelExecutor::Run(std::vector<std::string, std::allocator<std::string> > const&)
----------------------
Error Message Summary:
----------------------
Error: An error occurred here. There is no accurate error hint for this error yet. We are continuously in the process of increasing hint for this kind of error check. It would be helpful if you could inform us of how this conversion went by opening a github issue. And we will resolve it with high priority.
- New issue link: https://github.com/PaddlePaddle/Paddle/issues/new
- Recommended issue content: all error stack information: an illegal memory access was encountered at (/paddle/paddle/fluid/framework/details/op_handle_base.cc:39)
W0620 16:58:14.349057 878 init.cc:209] Warning: PaddlePaddle catches a failure signal, it may not work properly
W0620 16:58:14.349076 878 init.cc:211] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W0620 16:58:14.349079 878 init.cc:214] The detail failure signal is:
W0620 16:58:14.349082 878 init.cc:217] *** Aborted at 1592643494 (unix time) try "date -d @1592643494" if you are using GNU date ***
W0620 16:58:14.351125 878 init.cc:217] PC: @ 0x0 (unknown)
W0620 16:58:14.351239 878 init.cc:217] *** SIGABRT (@0x36e) received by PID 878 (TID 0x7fbe93e04700) from PID 878; stack trace: ***
W0620 16:58:14.353410 878 init.cc:217] @ 0x7fbe933d7bb0 (unknown)
W0620 16:58:14.355901 878 init.cc:217] @ 0x7fbe92951f29 __GI_raise
W0620 16:58:14.357455 878 init.cc:217] @ 0x7fbe9295334a __GI_abort
W0620 16:58:14.358215 878 init.cc:217] @ 0x7fbddfb0ca8d __gnu_cxx::__verbose_terminate_handler()
W0620 16:58:14.358928 878 init.cc:217] @ 0x7fbddfb0abe6 (unknown)
W0620 16:58:14.359678 878 init.cc:217] @ 0x7fbddfb09b69 (unknown)
W0620 16:58:14.360306 878 init.cc:217] @ 0x7fbddfb0a5c1 __gxx_personality_v0
W0620 16:58:14.360981 878 init.cc:217] @ 0x7fbe1a5a8383 (unknown)
W0620 16:58:14.361657 878 init.cc:217] @ 0x7fbe1a5a8457 _Unwind_Resume
W0620 16:58:14.367496 878 init.cc:217] @ 0x7fbd9418ab9c paddle::framework::details::OpHandleBase::~OpHandleBase()
W0620 16:58:14.369899 878 init.cc:217] @ 0x7fbd94145011 paddle::framework::details::FetchOpHandle::~FetchOpHandle()
W0620 16:58:14.372714 878 init.cc:217] @ 0x7fbd91a72a89 paddle::framework::ir::Node::~Node()
W0620 16:58:14.377493 878 init.cc:217] @ 0x7fbd91a72c31 paddle::framework::ir::Node::~Node()
W0620 16:58:14.396554 878 init.cc:217] @ 0x7fbd94147956 paddle::framework::details::ClearFetchOp()
W0620 16:58:14.398361 878 init.cc:217] @ 0x7fbd9414394a paddle::framework::details::FastThreadedSSAGraphExecutor::ExecutionFinal()
W0620 16:58:14.403327 878 init.cc:217] @ 0x7fbd9414252a paddle::framework::details::FastThreadedSSAGraphExecutor::Run()
W0620 16:58:14.404551 878 init.cc:217] @ 0x7fbd9408afe7 _ZNSt17_Function_handlerIFvvEZN6paddle9framework7details29ScopeBufferedSSAGraphExecutor3RunERKSt6vectorISsSaISsEEEUlvE_E9_M_invokeERKSt9_Any_data
W0620 16:58:14.408218 878 init.cc:217] @ 0x7fbd9408ffdf paddle::framework::details::ScopeBufferedMonitor::Apply()
W0620 16:58:14.409902 878 init.cc:217] @ 0x7fbd9408bd86 paddle::framework::details::ScopeBufferedSSAGraphExecutor::Run()
W0620 16:58:14.412961 878 init.cc:217] @ 0x7fbd91b739a8 paddle::framework::ParallelExecutor::Run()
W0620 16:58:14.413442 878 init.cc:217] @ 0x7fbd91789a18 _ZZN8pybind1112cpp_function10initializeIZN6paddle6pybindL22pybind11_init_core_avxERNS_6moduleEEUlRNS2_9framework16ParallelExecutorERKSt6vectorISsSaISsEEE199_S9_INS6_9LoDTensorESaISF_EEIS8_SD_EINS_4nameENS_9is_methodENS_7siblingEEEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE1_4_FUNESY_
W0620 16:58:14.414839 878 init.cc:217] @ 0x7fbd917debb1 pybind11::cpp_function::dispatcher()
W0620 16:58:14.416537 878 init.cc:217] @ 0x7fbe936efce8 PyEval_EvalFrameEx
W0620 16:58:14.418002 878 init.cc:217] @ 0x7fbe936f237d PyEval_EvalCodeEx
W0620 16:58:14.419442 878 init.cc:217] @ 0x7fbe936efd70 PyEval_EvalFrameEx
W0620 16:58:14.420891 878 init.cc:217] @ 0x7fbe936f237d PyEval_EvalCodeEx
W0620 16:58:14.422331 878 init.cc:217] @ 0x7fbe936efd70 PyEval_EvalFrameEx
W0620 16:58:14.423780 878 init.cc:217] @ 0x7fbe936f237d PyEval_EvalCodeEx
W0620 16:58:14.425217 878 init.cc:217] @ 0x7fbe936efd70 PyEval_EvalFrameEx
W0620 16:58:14.426668 878 init.cc:217] @ 0x7fbe936f237d PyEval_EvalCodeEx
W0620 16:58:14.428104 878 init.cc:217] @ 0x7fbe936f24b2 PyEval_EvalCode
W0620 16:58:14.429548 878 init.cc:217] @ 0x7fbe9371c1c2 PyRun_FileExFlags