YOLOv3 trains slowly and uses a lot of memory
Created by: ellinyang
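Version and environment
1) Docker image: paddlepaddle/paddle:1.4.0-gpu-cuda9.0-cudnn7
2) GPU: Tesla V100 (16 GB GPU memory)

Training setup
1) Single machine, single GPU
2) batchsize=1
3) Image size: 800*800

Background
I modified the YOLOv3 data-loading interface and the evaluation code, and added a test step to the training loop.

Problem
If FLAGS_fraction_of_gpu_memory_to_use is not set, the model with batch_size=1 occupies 15 GB of GPU memory. With FLAGS_fraction_of_gpu_memory_to_use=0, the model's GPU memory usage varies dynamically between 1 GB and 7 GB. When BatchSize>2, the following error is reported: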
W0521 08:04:10.744120 44911 device_context.cc:261] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.0, Runtime API Version: 9.0
W0521 08:04:10.749207 44911 device_context.cc:269] device: 0, cuDNN Version: 7.0.
W0521 08:04:10.845206 44911 graph.h:204] WARN: After a series of passes, the current graph can be quite different from OriginProgram. So, please avoid using the `OriginProgram()` method!
W0521 08:04:10.855835 44911 graph.h:204] WARN: After a series of passes, the current graph can be quite different from OriginProgram. So, please avoid using the `OriginProgram()` method!
W0521 08:04:34.876225 44911 system_allocator.cc:121] Cannot malloc 39.0627 MB GPU memory. Please shrink FLAGS_fraction_of_gpu_memory_to_use or FLAGS_initial_gpu_memory_in_mb or FLAGS_reallocate_gpu_memory_in_mbenvironment variable to a lower value. Current FLAGS_fraction_of_gpu_memory_to_use value is 0. Current FLAGS_initial_gpu_memory_in_mb value is 0. Current FLAGS_reallocate_gpu_memory_in_mb value is 0
F0521 08:04:34.879753 44911 legacy_allocator.cc:201] Cannot allocate 39.062500MB in GPU 0, available 34.312500MBtotal 16936927232GpuMinChunkSize 256.000000BGpuMaxChunkSize 0.000000BGPU memory used: 0.000000B
*** Check failure stack trace: ***
@ 0x7f5afbd40ffd google::LogMessage::Fail()
@ 0x7f5afbd44aac google::LogMessage::SendToLog()
@ 0x7f5afbd40b23 google::LogMessage::Flush()
@ 0x7f5afbd45fbe google::LogMessageFatal::~LogMessageFatal()
@ 0x7f5afd9c5ca7 paddle::memory::legacy::Alloc<>()
@ 0x7f5afd9c5ee5 paddle::memory::allocation::LegacyAllocator::AllocateImpl()
@ 0x7f5afd9eb21b paddle::memory::allocation::Allocator::Allocate()
@ 0x7f5afd9b9aa3 paddle::memory::allocation::AllocatorFacade::Alloc()
@ 0x7f5afd9b9bc1 paddle::memory::allocation::AllocatorFacade::AllocShared()
@ 0x7f5afd5e6270 paddle::memory::AllocShared()
@ 0x7f5afd98b1ca paddle::framework::Tensor::mutable_data()
@ 0x7f5afc004606 paddle::operators::CUDNNConvOpKernel<>::Compute()
@ 0x7f5afc005c63 _ZNSt17_Function_handlerIFvRKN6paddle9framework16ExecutionContextEEZNKS1_24OpKernelRegistrarFunctorINS0_8platform9CUDAPlaceELb0ELm0EJNS0_9operators17CUDNNConvOpKernelIfEENSA_IdEENSA_INS7_7float16EEEEEclEPKcSH_iEUlS4_E_E9_M_invokeERKSt9_Any_dataS4_
@ 0x7f5afd938b90 paddle::framework::OperatorWithKernel::RunImpl()
@ 0x7f5afd938f71 paddle::framework::OperatorWithKernel::RunImpl()
@ 0x7f5afd9365ac paddle::framework::OperatorBase::Run()
@ 0x7f5afbd872ee paddle::framework::Executor::RunPreparedContext()
@ 0x7f5afbd8812f paddle::framework::Executor::Run()
@ 0x7f5afbbf987e _ZZN8pybind1112cpp_function10initializeIZN6paddle6pybindL18pybind11_init_coreERNS_6moduleEEUlRNS2_9framework8ExecutorERKNS6_11ProgramDescEPNS6_5ScopeEibbRKSt6vectorISsSaISsEEE98_vIS8_SB_SD_ibbSI_EINS_4nameENS_9is_methodENS_7siblingEEEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE1_4_FUNES10_
@ 0x7f5afbc3efa6 pybind11::cpp_function::dispatcher()
@ 0x4c5326 PyEval_EvalFrameEx
@ 0x4b9b66 PyEval_EvalCodeEx
@ 0x4c1f56 PyEval_EvalFrameEx
@ 0x4b9b66 PyEval_EvalCodeEx
@ 0x4c17c6 PyEval_EvalFrameEx
@ 0x4b9b66 PyEval_EvalCodeEx
@ 0x4c1f56 PyEval_EvalFrameEx
@ 0x4b9b66 PyEval_EvalCodeEx
@ 0x4eb69f (unknown)
@ 0x4e58f2 PyRun_FileExFlags
@ 0x4e41a6 PyRun_SimpleFileExFlags
@ 0x4938ce Py_Main
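
To make the fatal allocator line above easier to read, here is a rough Python sketch that pulls the numbers out of it. The helper name `parse_alloc_failure` is hypothetical and not part of Paddle. The sketch assumes that the `total` field is in bytes and that `MB` in the log means 2^20 bytes; neither is confirmed here.

```python
import re

# The fatal line from the log above, copied verbatim (missing spaces included).
FATAL_LINE = (
    "F0521 08:04:34.879753 44911 legacy_allocator.cc:201] "
    "Cannot allocate 39.062500MB in GPU 0, available 34.312500MB"
    "total 16936927232GpuMinChunkSize 256.000000B"
    "GpuMaxChunkSize 0.000000BGPU memory used: 0.000000B"
)


def parse_alloc_failure(line):
    """Hypothetical helper: extract the memory figures from the fatal line."""
    m = re.search(
        r"Cannot allocate ([\d.]+)MB in GPU (\d+), "
        r"available ([\d.]+)MB\s*total (\d+)",
        line,
    )
    if m is None:
        raise ValueError("not a recognised allocator failure line")
    requested_mb = float(m.group(1))
    available_mb = float(m.group(3))
    # Assumption: `total` is a byte count, converted here with 1 MB = 2**20 bytes.
    total_mb = int(m.group(4)) / (1024 * 1024)
    return {
        "gpu_id": int(m.group(2)),
        "requested_mb": requested_mb,
        "available_mb": available_mb,
        "total_mb": total_mb,
        # How much more memory the failed request needed than was free.
        "shortfall_mb": requested_mb - available_mb,
        # Share of the device already taken when the request failed.
        "used_fraction": 1.0 - available_mb / total_mb,
    }


if __name__ == "__main__":
    for key, value in parse_alloc_failure(FATAL_LINE).items():
        print(key, value)
```

Under those assumptions, the line says the device was almost entirely occupied when a 39 MB request from the cuDNN convolution kernel failed, so the failed request itself is small compared with what was already allocated.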
----------- Configuration Arguments -----------