GPU selection problem during model training
Created by: weiaoliu
When I train obj365/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.yml, selecting GPUs with export CUDA_VISIBLE_DEVICES=3,5 has no effect. Adding os.environ['CUDA_VISIBLE_DEVICES'] = '3,5' to train.py does not work either: training still ends up on other GPUs. I am using paddle 0.3.
export CUDA_VISIBLE_DEVICES=3,5
(wei) wangrunqi@irecog:/data/wei/paddle0.3/PaddleDetection$ python -u tools/train.py -c configs/obj365/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.yml
2020-07-30 09:42:57,638-INFO: If regularizer of a Parameter has been set by 'fluid.ParamAttr' or 'fluid.WeightNormParamAttr' already. The Regularization[L2Decay, regularization_coeff=0.000100] in Optimizer will not take effect, and it will only be applied to other Parameters!
W0730 09:42:58.374402 48135 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 75, Driver API Version: 10.0, Runtime API Version: 10.0
W0730 09:42:58.379374 48135 device_context.cc:260] device: 0, cuDNN Version: 7.6.
2020-07-30 09:43:01,022-WARNING: /home/wangrunqi/.cache/paddle/weights/ResNet200_vd_pretrained.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
/opt/Anaconda3/envs/wei/lib/python3.7/site-packages/paddle/fluid/io.py:1998: UserWarning: This list is not set, Because of Paramerter not found in program. There are: fc_0.b_0 fc_0.w_0
  format(" ".join(unused_para_list)))
2020-07-30 09:43:13,167-INFO: places would be ommited when DataLoader is not iterable
W0730 09:43:16.844976 48135 fuse_all_reduce_op_pass.cc:74] Find all_reduce operators: 380. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 317.
W0730 09:43:31.182279 48566 operator.cc:187] concat raises an exception paddle::memory::allocation::BadAlloc,
C++ Call Stacks (More useful to developers):
0 std::string paddle::platform::GetTraceBackString<std::string>(std::string&&, char const*, int)
1 paddle::memory::allocation::CUDAAllocator::AllocateImpl(unsigned long)
2 paddle::memory::allocation::AlignedAllocator::AllocateImpl(unsigned long)
3 paddle::memory::allocation::AutoGrowthBestFitAllocator::AllocateImpl(unsigned long)
4 paddle::memory::allocation::Allocator::Allocate(unsigned long)
5 paddle::memory::allocation::RetryAllocator::AllocateImpl(unsigned long)
6 paddle::memory::allocation::AllocatorFacade::Alloc(paddle::platform::Place const&, unsigned long)
7 paddle::memory::allocation::AllocatorFacade::AllocShared(paddle::platform::Place const&, unsigned long)
8 paddle::memory::AllocShared(paddle::platform::Place const&, unsigned long)
9 paddle::framework::Tensor::mutable_data(paddle::platform::Place const&, paddle::framework::proto::VarType_Type, unsigned long)
10 paddle::operators::ConcatKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const
11 std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 1ul, paddle::operators::ConcatKernel<paddle::platform::CUDADeviceContext, double>, paddle::operators::ConcatKernel<paddle::platform::CUDADeviceContext, float>, paddle::operators::ConcatKernel<paddle::platform::CUDADeviceContext, paddle::platform::float16>, paddle::operators::ConcatKernel<paddle::platform::CUDADeviceContext, long>, paddle::operators::ConcatKernel<paddle::platform::CUDADeviceContext, int> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
12 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
13 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
14 paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
15 paddle::framework::details::ComputationOpHandle::RunImpl()
16 paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync(paddle::framework::details::OpHandleBase*)
17 paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp(paddle::framework::details::OpHandleBase*, std::shared_ptr<paddle::framework::BlockingQueue > const&, unsigned long*)
18 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result, std::__future_base::_Result_base::_Deleter>, void> >::_M_invoke(std::_Any_data const&)
19 std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
20 ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const
Error Message Summary:
ResourceExhaustedError:
Out of memory error on GPU 0. Cannot allocate 98.000244MB memory on GPU 0, available memory is only 43.625000MB.
Please check whether there is any other process using GPU 0.
- If yes, please stop them, or start PaddlePaddle on another GPU.
- If no, please decrease the batch size of your model.
at (/paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:69)
F0730 09:43:31.183924 48566 exception_holder.h:37] std::exception caught,
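
A possible reading of the log, offered as a sketch rather than a confirmed diagnosis: once CUDA_VISIBLE_DEVICES=3,5 is in effect, CUDA renumbers the visible devices starting from 0, so "device: 0" and "GPU 0" in the messages above would refer to the first entry in the list (physical GPU 3), not to physical GPU 0. The helper below, visible_gpu_map, is a hypothetical name and not part of Paddle. It only parses the variable, assuming plain integer ids, to illustrate the renumbering.

```python
import os


def visible_gpu_map(env=None):
    """Return {logical CUDA index: physical GPU id} from CUDA_VISIBLE_DEVICES.

    Hypothetical helper for illustration only; assumes comma-separated
    integer ids (GPU UUIDs and malformed entries are not handled).
    Returns None when the variable is unset, meaning every GPU is visible
    and the logical index equals the physical one.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    ids = [tok.strip() for tok in raw.split(",") if tok.strip()]
    # Position in the list becomes the index the framework reports.
    return dict(enumerate(ids))


# Mirrors the export used in the post: logical 0 -> GPU 3, logical 1 -> GPU 5.
mapping = visible_gpu_map({"CUDA_VISIBLE_DEVICES": "3,5"})
print(mapping)  # {0: '3', 1: '5'}
```

If nvidia-smi still shows the job on unexpected cards, one thing to check is CUDA_DEVICE_ORDER: CUDA's default enumeration order (FASTEST_FIRST) can differ from the PCI bus order that nvidia-smi displays, and exporting CUDA_DEVICE_ORDER=PCI_BUS_ID alongside CUDA_VISIBLE_DEVICES makes the two agree. When the variable is set from Python instead of the shell, it has to be assigned before the framework initializes CUDA, so the safest place is the very top of the script, before paddle is imported.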