gru4rec fails with an error when run on 4 GPUs
Created by: ccmeteorljh
('use_cuda:', True, 'parallel:', True)
start constuct word dict
epoch_1 start
Traceback (most recent call last):
File "train.py", line 125, in <module>
train()
File "train.py", line 105, in train
fetch_list=fetch_list)
File "/usr/local/lib/python2.7/site-packages/paddle/fluid/parallel_executor.py", line 277, in run
self.executor.run(fetch_list, fetch_var_name)
paddle.fluid.core.EnforceNotMet: an illegal memory access was encountered at [/paddle/paddle/fluid/platform/device_context.cc:250]
PaddlePaddle Call Stacks:
0 0x7f50aa2abda6p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 486
1 0x7f50ab9f5e32p paddle::platform::CUDADeviceContext::Wait() const + 258
2 0x7f50ab67e3e2p paddle::framework::details::ScopeBufferedSSAGraphExecutor::Run(std::vector<std::string, std::allocator<std::string> > const&) + 850
3 0x7f50aa38c0b9p paddle::framework::ParallelExecutor::Run(std::vector<std::string, std::allocator<std::string> > const&, std::string const&) + 489
4 0x7f50aa2a05e0p
5 0x7f50aa2c24b4p pybind11::cpp_function::dispatcher(_object*, _object*, _object*) + 2596
6 0x7f510bba83f9p PyEval_EvalFrameEx + 22841
7 0x7f510bba948cp PyEval_EvalCodeEx + 2076
8 0x7f510bba828ap PyEval_EvalFrameEx + 22474
9 0x7f510bba83acp PyEval_EvalFrameEx + 22764
10 0x7f510bba948cp PyEval_EvalCodeEx + 2076
11 0x7f510bba95a9p PyEval_EvalCode + 25
12 0x7f510bbcd32ap PyRun_FileExFlags + 138
13 0x7f510bbce827p PyRun_SimpleFileExFlags + 231
14 0x7f510bbe4f31p Py_Main + 3265
15 0x7f510b4d6830p __libc_start_main + 240
16 0x4006e9p _start + 41
terminate called after throwing an instance of 'paddle::platform::EnforceNotMet'
what(): an illegal memory access was encountered at [/paddle/paddle/fluid/platform/device_context.cc:250]
PaddlePaddle Call Stacks:
0 0x7f50aa2abda6p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 486
1 0x7f50ab9f5e32p paddle::platform::CUDADeviceContext::Wait() const + 258
2 0x7f50aa388065p paddle::framework::ParallelExecutor::~ParallelExecutor() + 69
3 0x7f50aa2a9023p pybind11::class_<paddle::framework::ParallelExecutor>::dealloc(_object*) + 35
4 0x7f50aa2c0c81p pybind11_object_dealloc + 49
5 0x7f510bb35cdbp
6 0x7f510bb5779ep
7 0x7f510bb20bc9p
8 0x7f510bbd6d7bp
9 0x7f510bbd6d8bp
10 0x7f510bb36b07p
11 0x7f510bb38577p PyDict_SetItem + 103
12 0x7f510bb39d14p PyDict_SetItemString + 68
13 0x7f510bbbc0ffp PyImport_Cleanup + 335
14 0x7f510bbce1bep Py_Finalize + 254
15 0x7f510bbe4864p Py_Main + 1524
16 0x7f510b4d6830p __libc_start_main + 240
17 0x4006e9p _start + 41
*** Aborted at 1543478033 (unix time) try "date -d @1543478033" if you are using GNU date ***
PC: @ 0x0 (unknown)
*** SIGABRT (@0x795) received by PID 1941 (TID 0x7f510c0b6700) from PID 1941; stack trace: ***
@ 0x7f510b891390 (unknown)
@ 0x7f510b4eb428 gsignal
@ 0x7f510b4ed02a abort
@ 0x7f510150792d __gnu_cxx::__verbose_terminate_handler()
@ 0x7f5101505996 __cxxabiv1::__terminate()
@ 0x7f5101504a49 __cxa_call_terminate
@ 0x7f5101505335 __gxx_personality_v0
@ 0x7f5101a5ff83 (unknown)
@ 0x7f5101a60487 _Unwind_Resume
@ 0x7f50ab9f5fd6 paddle::platform::CUDADeviceContext::Wait()
@ 0x7f50aa388065 paddle::framework::ParallelExecutor::~ParallelExecutor()
@ 0x7f50aa2a9023 pybind11::class_<>::dealloc()
@ 0x7f50aa2c0c81 pybind11_object_dealloc
@ 0x7f510bb35cdb dict_dealloc
@ 0x7f510bb5779e subtype_dealloc
@ 0x7f510bb20bc9 frame_dealloc
@ 0x7f510bbd6d7b tb_dealloc
@ 0x7f510bbd6d8b tb_dealloc
@ 0x7f510bb36b07 insertdict
@ 0x7f510bb38577 PyDict_SetItem
@ 0x7f510bb39d14 PyDict_SetItemString
@ 0x7f510bbbc0ff PyImport_Cleanup
@ 0x7f510bbce1be Py_Finalize
@ 0x7f510bbe4864 Py_Main
@ 0x7f510b4d6830 __libc_start_main
@ 0x4006e9 _start
@ 0x0 (unknown)
Aborted
paddle_version: 1.1. Training runs normally on a single GPU.
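
One possible way to narrow this down (a sketch, not a confirmed fix): rerun the script with a varying number of visible GPUs and with synchronous kernel launches, so the illegal memory access is reported closer to the kernel that caused it instead of at the later `CUDADeviceContext::Wait()`. `CUDA_VISIBLE_DEVICES` and `CUDA_LAUNCH_BLOCKING` are standard CUDA environment variables. The helper names `debug_env` and `rerun` are hypothetical, and the sketch assumes `train.py` takes no extra arguments and uses whichever GPUs are visible.

```python
import os
import subprocess
import sys


def debug_env(num_gpus, base_env=None):
    """Return a copy of the environment limited to the first `num_gpus` GPUs,
    with synchronous CUDA kernel launches enabled.

    Hypothetical helper for illustration; not part of PaddlePaddle.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Expose only GPUs 0..num_gpus-1 to the process.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in range(num_gpus))
    # Make kernel launches synchronous so errors surface at the failing launch.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def rerun(script="train.py", gpu_counts=(1, 2, 4)):
    """Run `script` once per GPU count and return {gpu_count: exit_code}.

    Assumes the script needs no command-line arguments.
    """
    results = {}
    for n in gpu_counts:
        results[n] = subprocess.call([sys.executable, script], env=debug_env(n))
    return results
```

Calling `rerun()` from the directory containing `train.py` would show the smallest GPU count at which the crash appears. Synchronous launches slow training down, so this is only meant for debugging.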