Check failed: cudaSuccess == cudaStat (0 vs. 77) Cuda Error: an illegal memory access was encountered
Created by: jamestang0219
Hello, this error sometimes occurs while I train my models. Strangely, training on the same samples sometimes triggers the error and sometimes does not. I would like to know why this error occurs and how to avoid it so that I can train my models successfully. Here is the log:
I1104 11:18:44.281013 87409 TrainerInternal.cpp:165] Batch=3000 samples=192000 AvgCost=0.136428 CurrentCost=0.100355 Eval: classification_error_evaluator=0.0541406 CurrentEval: classification_error_evaluator=0.0395312
I1104 11:18:45.290464 87409 Tester.cpp:127] Test samples=1000 cost=0.347865 Eval: classification_error_evaluator=0.148
...................................................................................................
I1104 11:19:09.124543 87409 TrainerInternal.cpp:165] Batch=3100 samples=198400 AvgCost=0.135145 CurrentCost=0.0966748 Eval: classification_error_evaluator=0.0535938 CurrentEval: classification_error_evaluator=0.0371875
..........
F1104 11:19:11.596421 87418 hl_cuda_cublas.cc:220] Check failed: stat == CUBLAS_STATUS_SUCCESS (13 vs. 0) [cublas status]: execution failed
*** Check failure stack trace: ***
F1104 11:19:11.596427 87426 hl_cuda_device.cc:661] Check failed: cudaSuccess == cudaStat (0 vs. 77) Cuda Error: an illegal memory access was encountered
F1104 11:19:11.596457 87427 hl_cuda_device.cc:661] Check failed: cudaSuccess == cudaStat (0 vs. 77) Cuda Error: an illegal memory access was encountered
F1104 11:19:11.596423 87430 hl_cuda_device.cc:661] Check failed: cudaSuccess == cudaStat (0 vs. 77) Cuda Error: an illegal memory access was encountered
F1104 11:19:11.596422 87422 hl_cuda_device.cc:661] Check failed: cudaSuccess == cudaStat (0 vs. 77) Cuda Error: an illegal memory access was encountered
*** Check failure stack trace: ***
/usr/local/paddle/bin//paddle: line 81: 87409 Aborted (core dumped) ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}
I look forward to your reply. Thank you.