1.5.1 GPU build: training core dumps with a GPU memory allocation problem; adjusting FLAGS_fraction_of_gpu_memory_to_use has no effect
Created by: sshilei
Adjusting FLAGS_fraction_of_gpu_memory_to_use has no effect.
When FLAGS_fraction_of_gpu_memory_to_use is not set, the error is as follows:
```
W0725 17:02:17.091418 60643 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 9.2, Runtime API Version: 9.0
W0725 17:02:17.096510 60643 device_context.cc:267] device: 0, cuDNN Version: 7.3.
Traceback (most recent call last):
  File "/home/map/rd/wushilei/code/paddle-frame/baidu/mapsearch/paddle-frame/frame/scheduler/../..//frame/core/gpu_dataset_trainer.py", line 159, in <module>
    ret = trainer.start(sys.argv)
  File "/home/map/rd/wushilei/code/paddle-frame/baidu/mapsearch/paddle-frame/frame/scheduler/../..//frame/core/gpu_dataset_trainer.py", line 150, in start
    self.set_post_paddle_env(FLAGS, factory)
  File "/home/map/rd/wushilei/code/paddle-frame/baidu/mapsearch/paddle-frame/frame/core/base_trainer.py", line 359, in set_post_paddle_env
    exe = self.create_executor(FLAGS)
  File "/home/map/rd/wushilei/code/paddle-frame/baidu/mapsearch/paddle-frame/frame/core/base_trainer.py", line 344, in create_executor
    exe.run(program)
  File "/home/map/rd/wushilei/program/paddle/bin/gpu/1.5.1/cuda9/paddle_release_home/python/lib/python2.7/site-packages/paddle/fluid/executor.py", line 651, in run
    use_program_cache=use_program_cache)
  File "/home/map/rd/wushilei/program/paddle/bin/gpu/1.5.1/cuda9/paddle_release_home/python/lib/python2.7/site-packages/paddle/fluid/executor.py", line 749, in run
    exe.run(program.desc, scope, 0, True, True, fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet: Invoke operator fill_constant error.
Python Callstacks:
  File "/home/map/rd/wushilei/program/paddle/bin/gpu/1.5.1/cuda9/paddle_release_home/python/lib/python2.7/site-packages/paddle/fluid/framework.py", line 1842, in prepend_op
    attrs=kwargs.get("attrs", None))
  File "/home/map/rd/wushilei/program/paddle/bin/gpu/1.5.1/cuda9/paddle_release_home/python/lib/python2.7/site-packages/paddle/fluid/initializer.py", line 189, in __call__
    stop_gradient=True)
  File "/home/map/rd/wushilei/program/paddle/bin/gpu/1.5.1/cuda9/paddle_release_home/python/lib/python2.7/site-packages/paddle/fluid/framework.py", line 1625, in create_var
    kwargs['initializer'](var, self)
  File "/home/map/rd/wushilei/program/paddle/bin/gpu/1.5.1/cuda9/paddle_release_home/python/lib/python2.7/site-packages/paddle/fluid/layer_helper_base.py", line 383, in set_variable_initializer
    initializer=initializer)
  File "/home/map/rd/wushilei/program/paddle/bin/gpu/1.5.1/cuda9/paddle_release_home/python/lib/python2.7/site-packages/paddle/fluid/layers/tensor.py", line 142, in create_global_var
    value=float(value), force_cpu=force_cpu))
  File "/home/map/rd/wushilei/program/paddle/bin/gpu/1.5.1/cuda9/paddle_release_home/python/lib/python2.7/site-packages/paddle/fluid/optimizer.py", line 226, in _create_global_learning_rate
    persistable=True)
  File "/home/map/rd/wushilei/program/paddle/bin/gpu/1.5.1/cuda9/paddle_release_home/python/lib/python2.7/site-packages/paddle/fluid/optimizer.py", line 365, in _create_optimization_pass
    self._create_global_learning_rate()
  File "/home/map/rd/wushilei/program/paddle/bin/gpu/1.5.1/cuda9/paddle_release_home/python/lib/python2.7/site-packages/paddle/fluid/optimizer.py", line 532, in apply_gradients
    optimize_ops = self._create_optimization_pass(params_grads)
  File "/home/map/rd/wushilei/program/paddle/bin/gpu/1.5.1/cuda9/paddle_release_home/python/lib/python2.7/site-packages/paddle/fluid/optimizer.py", line 562, in apply_optimize
    optimize_ops = self.apply_gradients(params_grads)
  File "/home/map/rd/wushilei/program/paddle/bin/gpu/1.5.1/cuda9/paddle_release_home/python/lib/python2.7/site-packages/paddle/fluid/optimizer.py", line 601, in minimize
    loss, startup_program=startup_program, params_grads=params_grads)
  File "/home/map/rd/wushilei/program/paddle/bin/gpu/1.5.1/cuda9/paddle_release_home/python/lib/python2.7/site-packages/paddle/fluid/dygraph/base.py", line 87, in __impl__
    return func(*args, **kwargs)
  File "/home/map/rd/wushilei/program/paddle/bin/gpu/1.5.1/cuda9/paddle_release_home/python/lib/python2.7/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in __impl__
    return wrapped_func(*args, **kwargs)
  File "</home/map/rd/wushilei/program/paddle/bin/gpu/1.5.1/cuda9/paddle_release_home/python/lib/python2.7/site-packages/decorator.pyc:decorator-gen-20>", line 2, in minimize
  File "/home/map/rd/wushilei/code/paddle-frame/baidu/mapsearch/paddle-frame/frame/scheduler/../..//frame/core/gpu_dataset_trainer.py", line 123, in set_optimizer
    sgd_optimizer.minimize(net_output['loss'])
  File "/home/map/rd/wushilei/code/paddle-frame/baidu/mapsearch/paddle-frame/frame/scheduler/../..//frame/core/gpu_dataset_trainer.py", line 147, in start
    self.set_optimizer(FLAGS, net_output)
  File "/home/map/rd/wushilei/code/paddle-frame/baidu/mapsearch/paddle-frame/frame/scheduler/../..//frame/core/gpu_dataset_trainer.py", line 159, in <module>
    ret = trainer.start(sys.argv)
C++ Callstacks:
Enforce failed. Expected allocating <= available, but received allocating:6976001517 > available:29294336.
Insufficient GPU memory to allocation. at [/paddle/paddle/fluid/platform/gpu_info.cc:262]
PaddlePaddle Call Stacks:
0   0x7f57d95efff8p void paddle::platform::EnforceNotMet::Initstd::string(std::string, char const, int) + 360
1   0x7f57d95f0347p paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const, int) + 87
2   0x7f57db612e86p paddle::platform::GpuMaxChunkSize() + 630
3   0x7f57db5e717ap
4   0x7f5894e8e973p pthread_once + 83
5   0x7f57db5e681dp paddle::memory::legacy::GetGPUBuddyAllocator(int) + 109
6   0x7f57db5e7665p void paddle::memory::legacy::Allocpaddle::platform::CUDAPlace(paddle::platform::CUDAPlace const&, unsigned long) + 37
7   0x7f57db5e7ba5p paddle::memory::allocation::LegacyAllocator::AllocateImpl(unsigned long) + 421
8   0x7f57db5dbcc5p paddle::memory::allocation::AllocatorFacade::Alloc(boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, unsigned long) + 181
9   0x7f57db5dbe4ap paddle::memory::allocation::AllocatorFacade::AllocShared(boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, unsigned long) + 26
10  0x7f57db1e672cp paddle::memory::AllocShared(boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, unsigned long) + 44
11  0x7f57db5ae6f4p paddle::framework::Tensor::mutable_data(boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>, paddle::framework::proto::VarType_Type, unsigned long) + 148
12  0x7f57da3a2b0ep paddle::operators::FillConstantKernel::Compute(paddle::framework::ExecutionContext const&) const + 494
13  0x7f57da3a5c23p std::Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::FillConstantKernel, paddle::operators::FillConstantKernel, paddle::operators::FillConstantKernel, paddle::operators::FillConstantKernel, paddle::operators::FillConstantKernelpaddle::platform::float16 >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::M_invoke(std::Any_data const&, paddle::framework::ExecutionContext const&) + 35
14  0x7f57db558037p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::framework::RuntimeContext*) const + 375
15  0x7f57db558411p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 529
16  0x7f57db555a0cp paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 332
17  0x7f57d977c46ep paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool) + 382
18  0x7f57d977f50fp paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocatorstd::string > const&, bool) + 143
19  0x7f57d95e0f8dp
20  0x7f57d9622936p
21  0x7f58951aabb8p PyEval_EvalFrameEx + 25016
22  0x7f58951ae0bdp PyEval_EvalCodeEx + 2061
23  0x7f58951ab345p PyEval_EvalFrameEx + 26949
24  0x7f58951ae0bdp PyEval_EvalCodeEx + 2061
25  0x7f58951ab345p PyEval_EvalFrameEx + 26949
26  0x7f58951ae0bdp PyEval_EvalCodeEx + 2061
27  0x7f58951ab345p PyEval_EvalFrameEx + 26949
28  0x7f58951ae0bdp PyEval_EvalCodeEx + 2061
29  0x7f58951ab345p PyEval_EvalFrameEx + 26949
30  0x7f58951ae0bdp PyEval_EvalCodeEx + 2061
31  0x7f58951ab345p PyEval_EvalFrameEx + 26949
32  0x7f58951ae0bdp PyEval_EvalCodeEx + 2061
33  0x7f58951ae1f2p PyEval_EvalCode + 50
34  0x7f58951d6f42p PyRun_FileExFlags + 146
35  0x7f58951d82d9p PyRun_SimpleFileExFlags + 217
36  0x7f58951ee00dp Py_Main + 3149
37  0x7f58943ebbd5p __libc_start_main + 245
38  0x4007a1p
```
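For reference, FLAGS_fraction_of_gpu_memory_to_use (and the related FLAGS_initial_gpu_memory_in_mb / FLAGS_reallocate_gpu_memory_in_mb mentioned in the warning further below) are flags that PaddlePaddle reads from the environment when `paddle.fluid` is first imported, so they must be exported before the framework touches the GPU. A minimal sketch of how they are typically set; the concrete values and the standalone launch flow here are illustrative assumptions, not the paddle-frame code from the traceback:

```python
# Sketch only: export the GPU-memory flags before paddle.fluid initializes
# its CUDA allocator. The values below are examples.
import os

# Pre-allocate only a fraction of the currently free GPU memory.
os.environ["FLAGS_fraction_of_gpu_memory_to_use"] = "0.1"
# Alternatively, the fixed-size flags named in the warning below can be used
# instead of the fraction-based strategy (left commented out here):
# os.environ["FLAGS_initial_gpu_memory_in_mb"] = "500"
# os.environ["FLAGS_reallocate_gpu_memory_in_mb"] = "100"

import paddle.fluid as fluid  # import only after the flags are exported

place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
```

Exporting the same variables in the shell before launching the trainer works equally well; either way, they only take effect if set before the first GPU allocation.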
Lowering FLAGS_fraction_of_gpu_memory_to_use to a very small value (0.003) still fails, now with the following error:

```
W0725 17:13:25.508072 175955 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 9.2, Runtime API Version: 9.0
W0725 17:13:25.513144 175955 device_context.cc:267] device: 0, cuDNN Version: 7.3.
W0725 17:13:27.698426 175955 system_allocator.cc:121] Cannot malloc 195.801 MB GPU memory. Please shrink FLAGS_fraction_of_gpu_memory_to_use or FLAGS_initial_gpu_memory_in_mb or FLAGS_reallocate_gpu_memory_in_mb environment variable to a lower value. Current FLAGS_fraction_of_gpu_memory_to_use value is 0.003. Current FLAGS_initial_gpu_memory_in_mb value is 0. Current FLAGS_reallocate_gpu_memory_in_mb value is 0
F0725 17:13:27.698644 175955 legacy_allocator.cc:201] Cannot allocate 195.800781MB in GPU 0, available 5.937500MB total 7981694976 GpuMinChunkSize 256.000000B GpuMaxChunkSize 21.694021MB GPU memory used: 324.750000kB
*** Check failure stack trace: ***
    @     0x7f77dfffa27d  google::LogMessage::Fail()
    @     0x7f77dfffdd2c  google::LogMessage::SendToLog()
    @     0x7f77dfff9da3  google::LogMessage::Flush()
    @     0x7f77dffff23e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f77e1eaa8c4  paddle::memory::legacy::Alloc<>()
    @     0x7f77e1eaaba5  paddle::memory::allocation::LegacyAllocator::AllocateImpl()
    @     0x7f77e1e9ecc5  paddle::memory::allocation::AllocatorFacade::Alloc()
    @     0x7f77e1e9ee4a  paddle::memory::allocation::AllocatorFacade::AllocShared()
    @     0x7f77e1aa972c  paddle::memory::AllocShared()
    @     0x7f77e1e716f4  paddle::framework::Tensor::mutable_data()
    @     0x7f77e05ca875  paddle::operators::GPUUniformRandomKernel<>::Compute()
    @     0x7f77e05cad73  ZNSt17_Function_handlerIFvRKN6paddle9framework16ExecutionContextEEZNKS1_24OpKernelRegistrarFunctorINS0_8platform9CUDAPlaceELb0ELm0EINS0_9operators22GPUUniformRandomKernelIfEENSA_IdEEEEclEPKcSF_iEUlS4_E_E9_M_invokeERKSt9_Any_dataS4
    @     0x7f77e1e1b037  paddle::framework::OperatorWithKernel::RunImpl()
    @     0x7f77e1e1b411  paddle::framework::OperatorWithKernel::RunImpl()
    @     0x7f77e1e18a0c  paddle::framework::OperatorBase::Run()
    @     0x7f77e003f46e  paddle::framework::Executor::RunPreparedContext()
    @     0x7f77e004250f  paddle::framework::Executor::Run()
    @     0x7f77dfea3f8d  ZZN8pybind1112cpp_function10initializeIZN6paddle6pybindL22pybind11_init_core_avxERNS_6moduleEEUlRNS2_9framework8ExecutorERKNS6_11ProgramDescEPNS6_5ScopeEibbRKSt6vectorISsSaISsEEE85_vIS8_SB_SD_ibbSI_EINS_4nameENS_9is_methodENS_7siblingEEEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE1_4_FUNES10
    @     0x7f77dfee5936  pybind11::cpp_function::dispatcher()
    @     0x7f789ba6dbb8  PyEval_EvalFrameEx
    @     0x7f789ba710bd  PyEval_EvalCodeEx
    @     0x7f789ba6e345  PyEval_EvalFrameEx
    @     0x7f789ba710bd  PyEval_EvalCodeEx
    @     0x7f789ba6e345  PyEval_EvalFrameEx
    @     0x7f789ba710bd  PyEval_EvalCodeEx
    @     0x7f789ba6e345  PyEval_EvalFrameEx
    @     0x7f789ba710bd  PyEval_EvalCodeEx
    @     0x7f789ba6e345  PyEval_EvalFrameEx
    @     0x7f789ba710bd  PyEval_EvalCodeEx
    @     0x7f789ba6e345  PyEval_EvalFrameEx
    @     0x7f789ba710bd  PyEval_EvalCodeEx
    @     0x7f789ba711f2  PyEval_EvalCode
```
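The byte counts printed by the two failures are easier to compare once converted to the same units; a quick arithmetic sketch using only the numbers copied from the logs above:

```python
# Convert the raw byte counts reported by the two errors above.
GIB = 1024.0 ** 3
MIB = 1024.0 ** 2

requested = 6976001517   # "allocating:6976001517" in the first error
available = 29294336     # "available:29294336" in the first error
card_total = 7981694976  # "total 7981694976" in the second error

print("requested : %.2f GiB" % (requested / GIB))   # ~6.50 GiB
print("available : %.2f MiB" % (available / MIB))   # ~27.94 MiB
print("card total: %.2f GiB" % (card_total / GIB))  # ~7.43 GiB
```

In other words, the startup program asks for roughly 6.5 GiB while the card (about 7.4 GiB in total) reports only a few MiB to a few tens of MiB free at that moment, which is what the "Insufficient GPU memory" enforce is complaining about.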