Failed to run cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.yml
Created by: whlin-pku
Environment: 4 × V100 GPUs, 32 GB each.
--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0   std::string paddle::platform::GetTraceBackString<std::string>(std::string&&, char const*, int)
1   paddle::memory::allocation::CUDAAllocator::AllocateImpl(unsigned long)
2   paddle::memory::allocation::AlignedAllocator::AllocateImpl(unsigned long)
3   paddle::memory::allocation::AutoGrowthBestFitAllocator::AllocateImpl(unsigned long)
4   paddle::memory::allocation::Allocator::Allocate(unsigned long)
5   paddle::memory::allocation::RetryAllocator::AllocateImpl(unsigned long)
6   paddle::memory::allocation::AllocatorFacade::Alloc(paddle::platform::Place const&, unsigned long)
7   paddle::memory::allocation::AllocatorFacade::AllocShared(paddle::platform::Place const&, unsigned long)
8   paddle::memory::AllocShared(paddle::platform::Place const&, unsigned long)
9   paddle::framework::Tensor::mutable_data(paddle::platform::Place const&, paddle::framework::proto::VarType_Type, unsigned long)
10  paddle::operators::CUDNNConvGradOpKernel::Compute(paddle::framework::ExecutionContext const&) const
11  std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::CUDNNConvGradOpKernel, paddle::operators::CUDNNConvGradOpKernel, paddle::operators::CUDNNConvGradOpKernel<paddle::platform::float16> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
12  paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
13  paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
14  paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
15  paddle::framework::details::ComputationOpHandle::RunImpl()
16  paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync(paddle::framework::details::OpHandleBase*)
17  paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp(paddle::framework::details::OpHandleBase*, std::shared_ptr<paddle::framework::BlockingQueue > const&, unsigned long*)
18  std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result, std::__future_base::_Result_base::_Deleter>, void> >::_M_invoke(std::_Any_data const&)
19  std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
20  ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const

----------------------
Error Message Summary:
----------------------
ResourceExhaustedError:

Out of memory error on GPU 1. Cannot allocate 128.125244MB memory on GPU 1, available memory is only 55.875000MB.

Please check whether there is any other process using GPU 1.
1. If yes, please stop them, or start PaddlePaddle on another GPU.
2. If no, please decrease the batch size of your model.

  at (/paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:69)

F0812 11:44:22.577244 233 exception_holder.h:37] std::exception caught,
[the same C++ call stack and ResourceExhaustedError summary are printed a second time here]
*** Check failure stack trace: ***
    @     0x7fb325efc96d  google::LogMessage::Fail()
    @     0x7fb325f0041c  google::LogMessage::SendToLog()
    @     0x7fb325efc493  google::LogMessage::Flush()
    @     0x7fb325f0192e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fb3290cd448  paddle::framework::details::ExceptionHolder::Catch()
    @     0x7fb32916853e  paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync()
    @     0x7fb329165edf  paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp()
    @     0x7fb3291661a4  _ZNSt17_Function_handlerIFvvESt17reference_wrapperISt12_Bind_simpleIFS1_ISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS6_12OpHandleBaseESt6atomicIiESt4hashISA_ESt8equal_toISA_ESaISt4pairIKSA_SC_EEESA_RKSt10shared_ptrINS5_13BlockingQueueImEEEEUlvE_vEEEvEEEE9_M_invokeERKSt9_Any_data
    @     0x7fb325f59e43  std::_Function_handler<>::_M_invoke()
    @     0x7fb325d555e7  std::__future_base::_State_base::_M_do_set()
    @     0x7fb46b6f3a99  __pthread_once_slow
    @     0x7fb329162372  _ZNSt13__future_base11_Task_stateISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS4_12OpHandleBaseESt6atomicIiESt4hashIS8_ESt8equal_toIS8_ESaISt4pairIKS8_SA_EEES8_RKSt10shared_ptrINS3_13BlockingQueueImEEEEUlvE_vEESaIiEFvvEE6_M_runEv
    @     0x7fb325d57a44  _ZZN10ThreadPoolC1EmENKUlvE_clEv
    @     0x7fb16e9dbc80  (unknown)
    @     0x7fb46b6ec6ba  start_thread
    @     0x7fb46b42241d  clone
    @              (nil)  (unknown)
[2020/08/12 11:44:30] Aborted (core dumped)
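Following the hint in the error summary ("decrease the batch size of your model"), one possible workaround for this heavy CBNet-ResNet200 cascade model is to lower the per-GPU training batch size. A minimal sketch, assuming a PaddleDetection-style reader section in the config; the field name and its current value may differ in your copy of cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.yml:

    TrainReader:
      # assumed field name; fewer images per GPU reduces activation memory
      batch_size: 1

If another process is actually occupying GPU 1, the other remedy suggested by the log is to free that GPU or restrict training to idle GPUs via CUDA_VISIBLE_DEVICES before launching.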