Job fails with paddle.fluid.core.EnforceNotMet: an illegal memory access was encountered
Created by: alexqdh
Based on code at commit ID: 43b6d4f8
The job fails when run on the paddle platform, but runs without problems locally. Job ID: job-e6c5b3228bb4b5af
INFO:thirdparty.datareader.corelib.reader.reader:connect to filesystem[afs://baihua.afs.baidu.com:9902]
Traceback (most recent call last):
  File "train.py", line 334, in <module>
    log_per_batch=10)
  File "train.py", line 301, in train
    train_loop(fluid.default_main_program())
  File "train.py", line 254, in train_loop
    exe.run(fluid.default_startup_program())
  File "/usr/local/lib/python2.7/site-packages/paddle/fluid/executor.py", line 340, in run
    self.executor.run(program.desc, scope, 0, True, True)
paddle.fluid.core.EnforceNotMet: an illegal memory access was encountered at [/paddle/new_paddle/Paddle/paddle/fluid/platform/device_context.cc:179]
PaddlePaddle Call Stacks:
0 0x7f5f83c6ca56p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 486
1 0x7f5f85f2e515p paddle::platform::CUDADeviceContext::Wait() const + 549
2 0x7f5f83cfb81cp paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool) + 956
3 0x7f5f83cfbd14p paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool) + 100
4 0x7f5f83c840fbp ZZN8pybind1112cpp_function10initializeIZNS0_C1IvN6paddle9framework8ExecutorEIRKNS4_11ProgramDescEPNS4_5ScopeEibbEINS_4nameENS_9is_methodENS_7siblingEEEEMT0_FT_DpT1_EDpRKT2_EUlPS5_S8_SA_ibbE_vISO_S8_SA_ibbEISB_SC_SD_EEEvOSF_PFSE_SH_ESN_ENUlRNS_6detail13function_callEE1_4_FUNESV + 555
5 0x7f5f83c7db54p pybind11::cpp_function::dispatcher(_object*, _object*, _object*) + 2596
6 0x7f5fe1be6631p PyEval_EvalFrameEx + 24497
7 0x7f5fe1be7bcep PyEval_EvalCodeEx + 2190
8 0x7f5fe1be620ap PyEval_EvalFrameEx + 23434
9 0x7f5fe1be7bcep PyEval_EvalCodeEx + 2190
10 0x7f5fe1be620ap PyEval_EvalFrameEx + 23434
11 0x7f5fe1be7bcep PyEval_EvalCodeEx + 2190
12 0x7f5fe1be620ap PyEval_EvalFrameEx + 23434
13 0x7f5fe1be7bcep PyEval_EvalCodeEx + 2190
14 0x7f5fe1be7ce2p PyEval_EvalCode + 50
15 0x7f5fe1c079e0p PyRun_FileExFlags + 176
16 0x7f5fe1c07bbfp PyRun_SimpleFileExFlags + 239
17 0x7f5fe1c1d454p Py_Main + 3188
18 0x7f5fe0ed1cddp __libc_start_main + 253
19 0x400649p
[/root/paddlejob/paddle_k8s : 180] [start_trainer]
[FATAL]: execute user cmd failed