GPU memory is sufficient, but training fails with out of memory
Created by: yinggo
I am training my own dataset with cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.yml. GPU memory looks sufficient, but the run aborts with an out-of-memory error. How can I resolve this?
```bash
python3 -u tools/train.py \
    -c configs/dcn/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.yml \
    -o pretrain_weights=models/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms \
    --use_tb=True --tb_log_dir=tb_log_caltech/scalar --eval
```
P.S. batch_size is already set to 1. I also tried a multi-process launch with `python -m paddle.distributed.launch --selected_gpus 0,1,2,3 tools/train.py ...` and hit the same error. I am out of ideas, any guidance would be appreciated.
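For reference, before going back to 4-card training I am thinking of first checking whether the model fits on a single idle card at all. A minimal sketch of what I have in mind (`CUDA_VISIBLE_DEVICES` and `FLAGS_eager_delete_tensor_gb` come straight from the log below; `FLAGS_fraction_of_gpu_memory_to_use` is my assumption about the flag name, please correct me if it is wrong):

```bash
# Hypothetical single-card run: pin one idle GPU and let Paddle use nearly all of it,
# with eager garbage collection of temporaries (threshold 0 GB = free as soon as possible).
export CUDA_VISIBLE_DEVICES=2                    # any idle card (1/2/3 per nvidia-smi below)
export FLAGS_eager_delete_tensor_gb=0.0          # already the default in my run, see the log
export FLAGS_fraction_of_gpu_memory_to_use=0.98  # assumed flag name; allow almost the full 12 GB

python3 -u tools/train.py \
    -c configs/dcn/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.yml \
    -o pretrain_weights=models/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms
```

Or is this cbr200 cascade model with DCN and non-local simply too large for a 12 GB TITAN X at batch_size=1, so that I would also need to lower the input image scale in the reader section of the yml?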
Full log:

```
2020-04-06 17:36:49,707-INFO: 6707 samples in file dataset/coco/annotations/instances_val2007.json
2020-04-06 17:36:49,712-INFO: places would be ommited when DataLoader is not iterable
W0406 17:36:50.419684 27808 device_context.cc:237] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.0, Runtime API Version: 10.0
W0406 17:36:50.422271 27808 device_context.cc:245] device: 0, cuDNN Version: 7.6.
2020-04-06 17:36:51,523-INFO: Loading parameters from models/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms...
2020-04-06 17:36:51,524-WARNING: models/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
2020-04-06 17:36:51,524-WARNING: models/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
loading annotations into memory...
Done (t=0.20s)
creating index...
index created!
2020-04-06 17:36:53,468-WARNING: Found an invalid bbox in annotations: im_id: 5387, area: -10.0 x1: 348, y1: 176, x2: 348, y2: 196.
2020-04-06 17:36:53,481-WARNING: Found an invalid bbox in annotations: im_id: 5765, area: -10.0 x1: 71, y1: 174, x2: 71, y2: 197.
2020-04-06 17:36:53,686-INFO: 15649 samples in file dataset/coco/annotations/instances_train2007.json
2020-04-06 17:36:53,699-INFO: places would be ommited when DataLoader is not iterable
I0406 17:36:54.286912 27808 parallel_executor.cc:440] The Program will be executed on CUDA using ParallelExecutor, 4 cards are used, so 4 programs are executed in parallel.
W0406 17:37:00.890586 27808 fuse_all_reduce_op_pass.cc:74] Find all_reduce operators: 730. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 604.
I0406 17:37:01.025799 27808 build_strategy.cc:365] SeqOnlyAllReduceOps:0, num_trainers:1
I0406 17:37:28.992170 27808 parallel_executor.cc:307] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0406 17:37:29.978662 27808 parallel_executor.cc:375] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
W0406 17:37:42.174067   508 operator.cc:181] deformable_conv raises an exception paddle::memory::allocation::BadAlloc,
C++ Call Stacks (More useful to developers):
0   std::string paddle::platform::GetTraceBackString<std::string>(std::string&&, char const*, int)
1   paddle::memory::allocation::CUDAAllocator::AllocateImpl(unsigned long)
2   paddle::memory::allocation::AlignedAllocator::AllocateImpl(unsigned long)
3   paddle::memory::allocation::AutoGrowthBestFitAllocator::AllocateImpl(unsigned long)
4   paddle::memory::allocation::Allocator::Allocate(unsigned long)
5   paddle::memory::allocation::RetryAllocator::AllocateImpl(unsigned long)
6   paddle::memory::allocation::AllocatorFacade::Alloc(paddle::platform::Place const&, unsigned long)
7   paddle::memory::Alloc(paddle::platform::Place const&, unsigned long)
8   paddle::memory::Alloc(paddle::platform::DeviceContext const&, unsigned long)
9   paddle::framework::Tensor paddle::framework::ExecutionContext::AllocateTmpTensor<float, paddle::platform::CUDADeviceContext>(paddle::framework::DDim const&, paddle::platform::CUDADeviceContext const&) const
10  paddle::operators::DeformableConvCUDAKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const
11  std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::DeformableConvCUDAKernel<paddle::platform::CUDADeviceContext, float> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
12  paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
13  paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
14  paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
15  paddle::framework::details::ComputationOpHandle::RunImpl()
16  paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync(paddle::framework::details::OpHandleBase*)
17  paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp(paddle::framework::details::OpHandleBase*, std::shared_ptr<paddle::framework::BlockingQueue > const&, unsigned long*)
18  std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result, std::__future_base::_Result_base::_Deleter>, void> >::_M_invoke(std::_Any_data const&)
19  std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
20  ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const
Error Message Summary:
ResourceExhaustedError:
Out of memory error on GPU 2. Cannot allocate 66.797119MB memory on GPU 2, available memory is only 31.062500MB.
Please check whether there is any other process using GPU 2.
- If yes, please stop them, or start PaddlePaddle on another GPU.
- If no, please decrease the batch size of your model.
at (/paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:69)
F0406 17:37:42.174147   508 exception_holder.h:37] std::exception caught,
C++ Call Stacks (More useful to developers):
[... same 21-frame C++ call stack as above ...]
Error Message Summary:
ResourceExhaustedError:
Out of memory error on GPU 2. Cannot allocate 66.797119MB memory on GPU 2, available memory is only 31.062500MB.
Please check whether there is any other process using GPU 2.
- If yes, please stop them, or start PaddlePaddle on another GPU.
- If no, please decrease the batch size of your model.
at (/paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:69)
*** Check failure stack trace: ***
    @     0x7fdb53276c2d  google::LogMessage::Fail()
    @     0x7fdb5327a6dc  google::LogMessage::SendToLog()
    @     0x7fdb53276753  google::LogMessage::Flush()
    @     0x7fdb5327bbee  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fdb558509b8  paddle::framework::details::ExceptionHolder::Catch()
    @     0x7fdb558fc68e  paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync()
    @     0x7fdb558fb29f  paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp()
    @     0x7fdb558fb564  _ZNSt17_Function_handlerIFvvESt17reference_wrapperISt12_Bind_simpleIFS1_ISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS6_12OpHandleBaseESt6atomicIiESt4hashISA_ESt8equal_toISA_ESaISt4pairIKSA_SC_EEESA_RKSt10shared_ptrINS5_13BlockingQueueImEEEEUlvE_vEEEvEEEE9_M_invokeERKSt9_Any_data
    @     0x7fdb532cf983  std::_Function_handler<>::_M_invoke()
    @     0x7fdb5305dc37  std::__future_base::_State_base::_M_do_set()
    @     0x7fdb8dfe5a99  __pthread_once_slow
    @     0x7fdb558f6a52  _ZNSt13__future_base11_Task_stateISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS4_12OpHandleBaseESt6atomicIiESt4hashIS8_ESt8equal_toIS8_ESaISt4pairIKS8_SA_EEES8_RKSt10shared_ptrINS3_13BlockingQueueImEEEEUlvE_vEESaIiEFvvEE6_M_runEv
    @     0x7fdb5305fe64  _ZZN10ThreadPoolC1EmENKUlvE_clEv
    @     0x7fdb7f27d3e7  execute_native_thread_routine_compat
    @     0x7fdb8dfde6ba  start_thread
    @     0x7fdb8dd1441d  clone
    @              (nil)  (unknown)
Aborted (core dumped)
```
`nvidia-smi` output:

```
Mon Apr  6 17:51:08 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.78       Driver Version: 410.78       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 00000000:05:00.0  On |                  N/A |
| 26%   46C    P8    19W / 250W |    380MiB / 12188MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN X (Pascal)    Off  | 00000000:06:00.0 Off |                  N/A |
| 26%   46C    P8    18W / 250W |      2MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN X (Pascal)    Off  | 00000000:09:00.0 Off |                  N/A |
| 25%   44C    P8    19W / 250W |      2MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  TITAN X (Pascal)    Off  | 00000000:0A:00.0 Off |                  N/A |
| 23%   40C    P8    17W / 250W |      2MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1523      G   /usr/lib/xorg/Xorg                           215MiB |
|    0      3783      G   /opt/teamviewer/tv_bin/TeamViewer             41MiB |
|    0     12214      G   /usr/bin/nvidia-settings                       1MiB |
|    0     20941      G   compiz                                       109MiB |
|    0     26423      G   .../local/MATLAB/R2018a/bin/glnxa64/MATLAB     6MiB |
+-----------------------------------------------------------------------------+
```
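To see which card actually fills up while the graph is being built, I will keep a live per-GPU memory view in a second terminal (plain nvidia-smi query options, nothing Paddle-specific):

```bash
# Poll once per second: index, used and total memory of every GPU while train.py starts.
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv -l 1
```

What puzzles me is that the error reports only ~31 MB free on GPU 2, yet nvidia-smi shows that card as idle, so it looks like the training process itself had already consumed nearly the whole 12 GB on GPU 2 before the failing deformable_conv allocation, rather than some other process occupying it.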