Cudnn Error: CUDNN_STATUS_BAD_PARAM
Created by: CrossLee1
I am training on a gpubox with 12 GPUs, and partway through the run it fails with: Check failed: CUDNN_STATUS_SUCCESS == cudnnStat (0 vs. 3) Cudnn Error: CUDNN_STATUS_BAD_PARAM. What could be causing this?
I1211 18:35:43.264616 1309 TrainerInternal.cpp:165] Batch=10 samples=6000 AvgCost=2.68723 CurrentCost=2.68723 Eval: classification_error_evaluator=0.757333 CurrentEval: classification_error_evaluator=0.757333
.F1211 18:35:51.846307 1661 hl_cuda_cudnn.cc:933] Check failed: CUDNN_STATUS_SUCCESS == cudnnStat (0 vs. 3) Cudnn Error: CUDNN_STATUS_BAD_PARAM
*** Check failure stack trace: ***
@ 0xafe2ed google::LogMessage::Fail()
@ 0xb01d9c google::LogMessage::SendToLog()
@ 0xafdde3 google::LogMessage::Flush()
@ 0xb032ae google::LogMessageFatal::~LogMessageFatal()
@ 0x10b2d51 hl_softmax_backward()
@ 0xde68db paddle::GpuMatrix::softmaxBackward()
@ 0xbc7c1b paddle::softmaxActivation::backward()
@ 0xbc8275 paddle::sequence_softmaxActivation::backward()
@ 0xc24a69 paddle::Layer::backwardActivation()
@ 0xca89cd paddle::FullyConnectedLayer::backward()
@ 0xb26db9 paddle::NeuralNetwork::backward()
@ 0xb47896 paddle::TrainerThread::backward()
@ 0xb4749a paddle::TrainerThread::computeThread()
@ 0xb46f9d _ZZN6paddle13TrainerThread5startEvENKUlvE_clEv
@ 0xb4d1fa _ZNSt12_Bind_simpleIFZN6paddle13TrainerThread5startEvEUlvE_vEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
@ 0xb4cf59 _ZNSt12_Bind_simpleIFZN6paddle13TrainerThread5startEvEUlvE_vEEclEv
@ 0xb4cdba _ZNSt6thread5_ImplISt12_Bind_simpleIFZN6paddle13TrainerThread5startEvEUlvE_vEEE6_M_runEv
@ 0x7fd85b9a62d0 execute_native_thread_routine
@ 0x318b207851 (unknown)
@ 0x318aee767d (unknown)
@ (nil) (unknown)