Single-machine GPU training fails with a GpuMemcpySync error
Created by: shiyazhou121
Paddle version: gpu_1.4.1. Training fails with the error below when place is set to CUDAPlace(0); training with CPUPlace works normally. The error output is:
W0711 10:59:06.681474 47540 device_context.cc:261] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 8.0, Runtime API Version: 8.0
W0711 10:59:06.684422 47540 device_context.cc:269] device: 0, cuDNN Version: 5.0.
W0711 10:59:06.684448 47540 device_context.cc:293] WARNING: device: 0. The installed Paddle is compiled with CUDNN 5.1, but CUDNN version in your machine is 5.0, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDNN version.
Traceback (most recent call last):
  File "simNet.py", line 297, in <module>
    op.train()
  File "simNet.py", line 291, in train
    train_loop(fluid.default_main_program())
  File "simNet.py", line 217, in train_loop
    feed=dt, fetch_list=[model_layers['loss'], model_layers['acc'], model_layers['pred'], model_layers['label']])
  File "/home/shiyazhou/fluid_gpu_1.4_Env/lib/python2.7/site-packages/paddle/fluid/executor.py", line 565, in run
    use_program_cache=use_program_cache)
  File "/home/shiyazhou/fluid_gpu_1.4_Env/lib/python2.7/site-packages/paddle/fluid/executor.py", line 642, in _run
    exe.run(program.desc, scope, 0, True, True, fetch_var_name)
paddle.fluid.core.EnforceNotMet: Invoke operator fetch error.
Python Callstacks:
  File "/home/shiyazhou/fluid_gpu_1.4_Env/lib/python2.7/site-packages/paddle/fluid/framework.py", line 1654, in append_op
    attrs=kwargs.get("attrs", None))
  File "/home/shiyazhou/fluid_gpu_1.4_Env/lib/python2.7/site-packages/paddle/fluid/executor.py", line 360, in _add_feed_fetch_ops
    attrs={'col': i})
  File "/home/shiyazhou/fluid_gpu_1.4_Env/lib/python2.7/site-packages/paddle/fluid/executor.py", line 639, in _run
    fetch_var_name=fetch_var_name)
  File "/home/shiyazhou/fluid_gpu_1.4_Env/lib/python2.7/site-packages/paddle/fluid/executor.py", line 565, in run
    use_program_cache=use_program_cache)
  File "simNet.py", line 217, in train_loop
    feed=dt, fetch_list=[model_layers['loss'], model_layers['acc'], model_layers['pred'], model_layers['label']])
  File "simNet.py", line 291, in train
    train_loop(fluid.default_main_program())
  File "simNet.py", line 297, in <module>
    op.train()
C++ Callstacks:
cudaMemcpy failed in paddle::platform::GpuMemcpySync (0x102182ee640 -> 0x7f37a10b4040, length: 4): unspecified launch failure at [/paddle/paddle/fluid/platform/gpu_info.cc:280]
PaddlePaddle Call Stacks:
0 0x7f37e7d1f840p void paddle::platform::EnforceNotMet::Init<char const*>(char const*, char const*, int) + 352
1 0x7f37e7d1fbb9p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 137
2 0x7f37e9b9dfecp paddle::platform::GpuMemcpySync(void*, void const*, unsigned long, cudaMemcpyKind) + 188
3 0x7f37e7e91601p void paddle::memory::Copy<paddle::platform::CPUPlace, paddle::platform::CUDAPlace>(paddle::platform::CPUPlace, void*, paddle::platform::CUDAPlace, void const*, unsigned long, CUstream_st*) + 241
4 0x7f37e9b3dcf4p paddle::framework::TensorCopySync(paddle::framework::Tensor const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::framework::Tensor*) + 916
5 0x7f37e95799b2p paddle::operators::FetchOp::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 626
6 0x7f37e9aaad5cp paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 332
7 0x7f37e7e94abep paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool) + 382
8 0x7f37e7e958ffp paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocator<std::string> > const&, bool) + 143
9 0x7f37e7d0f55ep
10 0x7f37e7d5292ep
11 0x7f38255fc599p PyEval_EvalFrameEx + 31177
12 0x7f38255fe3bdp PyEval_EvalCodeEx + 2061
13 0x7f38255fca92p PyEval_EvalFrameEx + 32450
14 0x7f38255fe3bdp PyEval_EvalCodeEx + 2061
15 0x7f38255fca92p PyEval_EvalFrameEx + 32450
16 0x7f38255fe3bdp PyEval_EvalCodeEx + 2061
17 0x7f38255fca92p PyEval_EvalFrameEx + 32450
18 0x7f38255fe3bdp PyEval_EvalCodeEx + 2061
19 0x7f38255fca92p PyEval_EvalFrameEx + 32450
20 0x7f38255fe3bdp PyEval_EvalCodeEx + 2061
21 0x7f38255fe4f2p PyEval_EvalCode + 50
22 0x7f3825629062p PyRun_FileExFlags + 146
23 0x7f382562a3e9p PyRun_SimpleFileExFlags + 217
24 0x7f38256403bfp Py_Main + 3199
25 0x7f3824f2cec5p __libc_start_main + 245
26 0x4006fep
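
The log above also carries a cuDNN version warning (Paddle compiled with 5.1, machine has 5.0). Whether that is related to the cudaMemcpy failure is not established here, but the mismatch can be picked out of a log programmatically. The following is only a minimal sketch, assuming the warning keeps the wording shown in the log; `cudnn_mismatch` is a hypothetical helper name, not part of Paddle:

```python
import re

# Hypothetical helper (not a Paddle API). It assumes the warning text has the
# same wording as the device_context.cc message in the log above.
_PATTERN = re.compile(
    r"compiled with CUDNN (\d+(?:\.\d+)*), "
    r"but CUDNN version in your machine is (\d+(?:\.\d+)*)"
)


def cudnn_mismatch(log_text):
    """Return (compiled, installed) if the log reports differing versions, else None."""
    match = _PATTERN.search(log_text)
    if match is None:
        return None
    compiled, installed = match.groups()
    return (compiled, installed) if compiled != installed else None


# Warning line copied from the log in this report.
warning = (
    "WARNING: device: 0. The installed Paddle is compiled with CUDNN 5.1, "
    "but CUDNN version in your machine is 5.0, which may cause serious "
    "incompatible bug."
)
print(cudnn_mismatch(warning))
```

Running this on the warning line from this report yields the pair of version strings; a log without the warning yields None.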