PaddlePaddle / PaddleDetection · Issue #446

Opened April 6, 2020 by saxon_zh (Guest)

Sufficient GPU memory, but an out of memory error is reported

Created by: yinggo

I am training on my own dataset with cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.yml. GPU memory is sufficient, but the run fails with an out of memory error. How can I resolve this?

```bash
python3 -u tools/train.py \
    -c configs/dcn/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.yml \
    -o pretrain_weights=models/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms \
    --use_tb=True --tb_log_dir=tb_log_caltech/scalar --eval
```

P.S. batch_size is already set to 1. I also tried the multi-process launch `python -m paddle.distributed.launch --selected_gpus 0,1,2,3 tools/train.py ...` and hit the same problem. I really don't know what else to try; any advice would be appreciated.
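For reference, this is the kind of launch setup I would try next: pin the visible GPUs explicitly and tune Paddle's allocator flags before running the same command. The flag values below are only guesses on my side, not a confirmed fix; `CUDA_VISIBLE_DEVICES`, `FLAGS_eager_delete_tensor_gb` and `FLAGS_fraction_of_gpu_memory_to_use` are the standard CUDA / PaddlePaddle environment variables.

```bash
# Sketch only: pin the GPUs and tune Paddle's allocator before the same run.
# The exact values are guesses, not a verified fix.
export CUDA_VISIBLE_DEVICES=0,1,2,3              # limit which GPUs Paddle can see
export FLAGS_eager_delete_tensor_gb=0.0          # free temporary tensors eagerly (the log shows this is already 0)
export FLAGS_fraction_of_gpu_memory_to_use=0.9   # fraction of each GPU's memory Paddle may reserve up front

python3 -u tools/train.py \
    -c configs/dcn/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.yml \
    -o pretrain_weights=models/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms \
    --use_tb=True --tb_log_dir=tb_log_caltech/scalar --eval
```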

Training log:

```
2020-04-06 17:36:49,707-INFO: 6707 samples in file dataset/coco/annotations/instances_val2007.json
2020-04-06 17:36:49,712-INFO: places would be ommited when DataLoader is not iterable
W0406 17:36:50.419684 27808 device_context.cc:237] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.0, Runtime API Version: 10.0
W0406 17:36:50.422271 27808 device_context.cc:245] device: 0, cuDNN Version: 7.6.
2020-04-06 17:36:51,523-INFO: Loading parameters from models/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms...
2020-04-06 17:36:51,524-WARNING: models/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
2020-04-06 17:36:51,524-WARNING: models/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
loading annotations into memory...
Done (t=0.20s)
creating index...
index created!
2020-04-06 17:36:53,468-WARNING: Found an invalid bbox in annotations: im_id: 5387, area: -10.0 x1: 348, y1: 176, x2: 348, y2: 196.
2020-04-06 17:36:53,481-WARNING: Found an invalid bbox in annotations: im_id: 5765, area: -10.0 x1: 71, y1: 174, x2: 71, y2: 197.
2020-04-06 17:36:53,686-INFO: 15649 samples in file dataset/coco/annotations/instances_train2007.json
2020-04-06 17:36:53,699-INFO: places would be ommited when DataLoader is not iterable
I0406 17:36:54.286912 27808 parallel_executor.cc:440] The Program will be executed on CUDA using ParallelExecutor, 4 cards are used, so 4 programs are executed in parallel.
W0406 17:37:00.890586 27808 fuse_all_reduce_op_pass.cc:74] Find all_reduce operators: 730. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 604.
I0406 17:37:01.025799 27808 build_strategy.cc:365] SeqOnlyAllReduceOps:0, num_trainers:1
I0406 17:37:28.992170 27808 parallel_executor.cc:307] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0406 17:37:29.978662 27808 parallel_executor.cc:375] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
W0406 17:37:42.174067   508 operator.cc:181] deformable_conv raises an exception paddle::memory::allocation::BadAlloc,
```


C++ Call Stacks (More useful to developers):

```
0   std::string paddle::platform::GetTraceBackString<std::string>(std::string&&, char const*, int)
1   paddle::memory::allocation::CUDAAllocator::AllocateImpl(unsigned long)
2   paddle::memory::allocation::AlignedAllocator::AllocateImpl(unsigned long)
3   paddle::memory::allocation::AutoGrowthBestFitAllocator::AllocateImpl(unsigned long)
4   paddle::memory::allocation::Allocator::Allocate(unsigned long)
5   paddle::memory::allocation::RetryAllocator::AllocateImpl(unsigned long)
6   paddle::memory::allocation::AllocatorFacade::Alloc(paddle::platform::Place const&, unsigned long)
7   paddle::memory::Alloc(paddle::platform::Place const&, unsigned long)
8   paddle::memory::Alloc(paddle::platform::DeviceContext const&, unsigned long)
9   paddle::framework::Tensor paddle::framework::ExecutionContext::AllocateTmpTensor<float, paddle::platform::CUDADeviceContext>(paddle::framework::DDim const&, paddle::platform::CUDADeviceContext const&) const
10  paddle::operators::DeformableConvCUDAKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const
11  std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::DeformableConvCUDAKernel<paddle::platform::CUDADeviceContext, float> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
12  paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
13  paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
14  paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
15  paddle::framework::details::ComputationOpHandle::RunImpl()
16  paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync(paddle::framework::details::OpHandleBase*)
17  paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp(paddle::framework::details::OpHandleBase*, std::shared_ptr<paddle::framework::BlockingQueue > const&, unsigned long*)
18  std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result, std::__future_base::_Result_base::_Deleter>, void> >::_M_invoke(std::_Any_data const&)
19  std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
20  ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const
```


Error Message Summary:

ResourceExhaustedError:

Out of memory error on GPU 2. Cannot allocate 66.797119MB memory on GPU 2, available memory is only 31.062500MB.

Please check whether there is any other process using GPU 2.

  1. If yes, please stop them, or start PaddlePaddle on another GPU.
  2. If no, please decrease the batch size of your model.

at (/paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:69)

F0406 17:37:42.174147   508 exception_holder.h:37] std::exception caught,


C++ Call Stacks (More useful to developers):

```
0   std::string paddle::platform::GetTraceBackString<std::string>(std::string&&, char const*, int)
1   paddle::memory::allocation::CUDAAllocator::AllocateImpl(unsigned long)
2   paddle::memory::allocation::AlignedAllocator::AllocateImpl(unsigned long)
3   paddle::memory::allocation::AutoGrowthBestFitAllocator::AllocateImpl(unsigned long)
4   paddle::memory::allocation::Allocator::Allocate(unsigned long)
5   paddle::memory::allocation::RetryAllocator::AllocateImpl(unsigned long)
6   paddle::memory::allocation::AllocatorFacade::Alloc(paddle::platform::Place const&, unsigned long)
7   paddle::memory::Alloc(paddle::platform::Place const&, unsigned long)
8   paddle::memory::Alloc(paddle::platform::DeviceContext const&, unsigned long)
9   paddle::framework::Tensor paddle::framework::ExecutionContext::AllocateTmpTensor<float, paddle::platform::CUDADeviceContext>(paddle::framework::DDim const&, paddle::platform::CUDADeviceContext const&) const
10  paddle::operators::DeformableConvCUDAKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const
11  std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::DeformableConvCUDAKernel<paddle::platform::CUDADeviceContext, float> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
12  paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
13  paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
14  paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
15  paddle::framework::details::ComputationOpHandle::RunImpl()
16  paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync(paddle::framework::details::OpHandleBase*)
17  paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp(paddle::framework::details::OpHandleBase*, std::shared_ptr<paddle::framework::BlockingQueue > const&, unsigned long*)
18  std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result, std::__future_base::_Result_base::_Deleter>, void> >::_M_invoke(std::_Any_data const&)
19  std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
20  ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const
```


Error Message Summary:

ResourceExhaustedError:

Out of memory error on GPU 2. Cannot allocate 66.797119MB memory on GPU 2, available memory is only 31.062500MB.

Please check whether there is any other process using GPU 2.

  1. If yes, please stop them, or start PaddlePaddle on another GPU.
  2. If no, please decrease the batch size of your model.

at (/paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:69)

*** Check failure stack trace: ***

```
    @     0x7fdb53276c2d  google::LogMessage::Fail()
    @     0x7fdb5327a6dc  google::LogMessage::SendToLog()
    @     0x7fdb53276753  google::LogMessage::Flush()
    @     0x7fdb5327bbee  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fdb558509b8  paddle::framework::details::ExceptionHolder::Catch()
    @     0x7fdb558fc68e  paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync()
    @     0x7fdb558fb29f  paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp()
    @     0x7fdb558fb564  _ZNSt17_Function_handlerIFvvESt17reference_wrapperISt12_Bind_simpleIFS1_ISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS6_12OpHandleBaseESt6atomicIiESt4hashISA_ESt8equal_toISA_ESaISt4pairIKSA_SC_EEESA_RKSt10shared_ptrINS5_13BlockingQueueImEEEEUlvE_vEEEvEEEE9_M_invokeERKSt9_Any_data
    @     0x7fdb532cf983  std::_Function_handler<>::_M_invoke()
    @     0x7fdb5305dc37  std::__future_base::_State_base::_M_do_set()
    @     0x7fdb8dfe5a99  __pthread_once_slow
    @     0x7fdb558f6a52  _ZNSt13__future_base11_Task_stateISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS4_12OpHandleBaseESt6atomicIiESt4hashIS8_ESt8equal_toIS8_ESaISt4pairIKS8_SA_EEES8_RKSt10shared_ptrINS3_13BlockingQueueImEEEEUlvE_vEESaIiEFvvEE6_M_runEv
    @     0x7fdb5305fe64  _ZZN10ThreadPoolC1EmENKUlvE_clEv
    @     0x7fdb7f27d3e7  execute_native_thread_routine_compat
    @     0x7fdb8dfde6ba  start_thread
    @     0x7fdb8dd1441d  clone
    @              (nil)  (unknown)
Aborted (core dumped)
```

Output of `nvidia-smi`:

```
Mon Apr  6 17:51:08 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.78       Driver Version: 410.78       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 00000000:05:00.0  On |                  N/A |
| 26%   46C    P8    19W / 250W |    380MiB / 12188MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN X (Pascal)    Off  | 00000000:06:00.0 Off |                  N/A |
| 26%   46C    P8    18W / 250W |      2MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN X (Pascal)    Off  | 00000000:09:00.0 Off |                  N/A |
| 25%   44C    P8    19W / 250W |      2MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  TITAN X (Pascal)    Off  | 00000000:0A:00.0 Off |                  N/A |
| 23%   40C    P8    17W / 250W |      2MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1523      G   /usr/lib/xorg/Xorg                           215MiB |
|    0      3783      G   /opt/teamviewer/tv_bin/TeamViewer             41MiB |
|    0     12214      G   /usr/bin/nvidia-settings                       1MiB |
|    0     20941      G   compiz                                       109MiB |
|    0     26423      G   .../local/MATLAB/R2018a/bin/glnxa64/MATLAB     6MiB |
+-----------------------------------------------------------------------------+
```
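The snapshot above is from 17:51, after the training process had already aborted at 17:37, so it does not show memory usage at the moment of the failure. To see whether usage on GPU 2 really spikes before the OOM, I would watch per-GPU memory while the job runs, using the standard `nvidia-smi` query options:

```bash
# Poll per-GPU memory once per second while training is running.
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv -l 1
```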
