Paddle 1.7.1 core dumps during training
Created by: parap1uie-s
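Environment: PaddlePaddle 1.7.1.post107, GPU. The model is an EfficientNet B3 architecture with a custom loss function and the SGDM optimizer. Fine-tuning from ImageNet pretrained weights works fine, but resuming training from a checkpoint triggers the error below. The same code runs normally under Paddle 1.6.3, which can also load a checkpoint saved by 1.7.1 and continue training without problems.
Error message: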
I0413 11:32:42.475473 91815 build_strategy.cc:365] SeqOnlyAllReduceOps:0, num_trainers:1
W0413 11:32:56.840953 97958 operator.cc:181] elementwise_mul raises an exception thrust::system::system_error, parallel_for failed: an illegal memory access was encountered
W0413 11:32:56.840953 97963 operator.cc:181] elementwise_mul raises an exception thrust::system::system_error, parallel_for failed: an illegal memory access was encountered
F0413 11:32:56.841661 97958 exception_holder.h:37] std::exception caught, parallel_for failed: an illegal memory access was encountered
*** Check failure stack trace: ***
F0413 11:32:56.841711 97963 exception_holder.h:37] std::exception caught, parallel_for failed: an illegal memory access was encountered
*** Check failure stack trace: ***
W0413 11:32:56.842131 97951 operator.cc:181] elementwise_mul raises an exception thrust::system::system_error, parallel_for failed: an illegal memory access was encountered
F0413 11:32:56.841711 97963 exception_holder.h:37] std::exception caught, parallel_for failed: an illegal memory access was encountered
F0413 11:32:56.842221 97951 exception_holder.h:37] std::exception caught, parallel_for failed: an illegal memory access was encountered
*** Check failure stack trace: ***
@ 0x7fbcd560e9fd google::LogMessage::Fail()
@ 0x7fbcd560e9fd google::LogMessage::Fail()
@ 0x7fbcd56124ac google::LogMessage::SendToLog()
@ 0x7fbcd56124ac google::LogMessage::SendToLog()
@ 0x7fbcd560e9fd google::LogMessage::Fail()
@ 0x7fbcd56124ac google::LogMessage::SendToLog()
@ 0x7fbcd560e523 google::LogMessage::Flush()
@ 0x7fbcd560e523 google::LogMessage::Flush()
@ 0x7fbcd560e523 google::LogMessage::Flush()
@ 0x7fbcd56139be google::LogMessageFatal::~LogMessageFatal()
@ 0x7fbcd56139be google::LogMessageFatal::~LogMessageFatal()
@ 0x7fbcd56139be google::LogMessageFatal::~LogMessageFatal()
@ 0x7fbcd7be8788 paddle::framework::details::ExceptionHolder::Catch()
@ 0x7fbcd7be8788 paddle::framework::details::ExceptionHolder::Catch()
@ 0x7fbcd7c9445e paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync()
@ 0x7fbcd7c9445e paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync()
@ 0x7fbcd7be8788 paddle::framework::details::ExceptionHolder::Catch()
@ 0x7fbcd7c9306f paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp()
@ 0x7fbcd7c9306f paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp()
@ 0x7fbcd7c93334 _ZNSt17_Function_handlerIFvvESt17reference_wrapperISt12_Bind_simpleIFS1_ISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS6_12OpHandleBaseESt6atomicIiESt4hashISA_ESt8equal_toISA_ESaISt4pairIKSA_SC_EEESA_RKSt10shared_ptrINS5_13BlockingQueueImEEEEUlvE_vEEEvEEEE9_M_invokeERKSt9_Any_data
@ 0x7fbcd7c9445e paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync()
@ 0x7fbcd7c93334 _ZNSt17_Function_handlerIFvvESt17reference_wrapperISt12_Bind_simpleIFS1_ISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS6_12OpHandleBaseESt6atomicIiESt4hashISA_ESt8equal_toISA_ESaISt4pairIKSA_SC_EEESA_RKSt10shared_ptrINS5_13BlockingQueueImEEEEUlvE_vEEEvEEEE9_M_invokeERKSt9_Any_data
@ 0x7fbcd7c9306f paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp()
@ 0x7fbcd5667753 std::_Function_handler<>::_M_invoke()
@ 0x7fbcd7c93334 _ZNSt17_Function_handlerIFvvESt17reference_wrapperISt12_Bind_simpleIFS1_ISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS6_12OpHandleBaseESt6atomicIiESt4hashISA_ESt8equal_toISA_ESaISt4pairIKSA_SC_EEESA_RKSt10shared_ptrINS5_13BlockingQueueImEEEEUlvE_vEEEvEEEE9_M_invokeERKSt9_Any_data
@ 0x7fbcd53f5b07 std::__future_base::_State_base::_M_do_set()
@ 0x7fbcd5667753 std::_Function_handler<>::_M_invoke()
@ 0x7fbd25f0bbe0 __GI___pthread_once
@ 0x7fbcd5667753 std::_Function_handler<>::_M_invoke()
@ 0x7fbcd7c8e822 _ZNSt13__future_base11_Task_stateISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS4_12OpHandleBaseESt6atomicIiESt4hashIS8_ESt8equal_toIS8_ESaISt4pairIKS8_SA_EEES8_RKSt10shared_ptrINS3_13BlockingQueueImEEEEUlvE_vEESaIiEFvvEE6_M_runEv
@ 0x7fbcd53f5b07 std::__future_base::_State_base::_M_do_set()
@ 0x7fbcd53f7d34 _ZZN10ThreadPoolC1EmENKUlvE_clEv
@ 0x7fbcd53f5b07 std::__future_base::_State_base::_M_do_set()
@ 0x7fbd25f0bbe0 __GI___pthread_once
@ 0x7fbd0aed3c5c execute_native_thread_routine_compat
@ 0x7fbcd7c8e822 _ZNSt13__future_base11_Task_stateISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS4_12OpHandleBaseESt6atomicIiESt4hashIS8_ESt8equal_toIS8_ESaISt4pairIKS8_SA_EEES8_RKSt10shared_ptrINS3_13BlockingQueueImEEEEUlvE_vEESaIiEFvvEE6_M_runEv
@ 0x7fbd25f0bbe0 __GI___pthread_once
@ 0x7fbcd7c8e822 _ZNSt13__future_base11_Task_stateISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS4_12OpHandleBaseESt6atomicIiESt4hashIS8_ESt8equal_toIS8_ESaISt4pairIKS8_SA_EEES8_RKSt10shared_ptrINS3_13BlockingQueueImEEEEUlvE_vEESaIiEFvvEE6_M_runEv
@ 0x7fbd25f06df3 start_thread
@ 0x7fbcd53f7d34 _ZZN10ThreadPoolC1EmENKUlvE_clEv
@ 0x7fbd0aed3c5c execute_native_thread_routine_compat
@ 0x7fbd25c342cd __clone
@ 0x7fbcd53f7d34 _ZZN10ThreadPoolC1EmENKUlvE_clEv
@ (nil) (unknown)
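
The log above interleaves output from several worker threads (IDs 97951, 97958 and 97963), which makes it hard to see what each thread reported. The following is a minimal sketch, not part of Paddle, that groups the glog-style lines by thread ID. It assumes the usual glog prefix layout (severity letter, MMDD, time, thread ID, `file:line]`); the names `GLOG_LINE` and `group_by_thread` are hypothetical helpers introduced only for this illustration, and the sample text is copied from the log above.

```python
import re
from collections import defaultdict

# Assumed glog prefix: severity letter, MMDD, HH:MM:SS.micros, thread id, file:line]
GLOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<src>[^\s\]]+:\d+)\] (?P<msg>.*)$"
)


def group_by_thread(log_text):
    """Return {thread_id: [(severity, source, message), ...]} for glog lines."""
    groups = defaultdict(list)
    for line in log_text.splitlines():
        m = GLOG_LINE.match(line.strip())
        if m is None:
            # Stack frames and "*** Check failure" banners carry no thread id.
            continue
        groups[int(m["tid"])].append((m["sev"], m["src"], m["msg"]))
    return dict(groups)


# A few lines copied from the log in this report.
sample = """\
I0413 11:32:42.475473 91815 build_strategy.cc:365] SeqOnlyAllReduceOps:0, num_trainers:1
W0413 11:32:56.840953 97958 operator.cc:181] elementwise_mul raises an exception thrust::system::system_error, parallel_for failed: an illegal memory access was encountered
F0413 11:32:56.841661 97958 exception_holder.h:37] std::exception caught, parallel_for failed: an illegal memory access was encountered
*** Check failure stack trace: ***
W0413 11:32:56.842131 97951 operator.cc:181] elementwise_mul raises an exception thrust::system::system_error, parallel_for failed: an illegal memory access was encountered
"""

by_thread = group_by_thread(sample)
for tid, entries in sorted(by_thread.items()):
    # Print each thread id with the severities it logged, in order.
    print(tid, [sev for sev, _, _ in entries])
```

Grouping this way only untangles the prefixed log lines. The `@ 0x...` stack frames have no thread ID, so they cannot be attributed to a thread by this method.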