GPU training on very long sequences fails with [hl_gpu_apply_unary_op failed]
Created by: comeonfox
When training on a GPU, the following error occurs:
[INFO 2017-10-15 18:36:14,705 trainer.py:81] Pass=0 Batch=840 Cost=7.03243 AvgCost=6.70295
F1015 18:38:03.304704 12823 hl_gpu_matrix_kernel.cuh:181] Check failed: cudaSuccess == err (0 vs. 9) [hl_gpu_apply_unary_op failed] CUDA error: invalid configuration argument
*** Check failure stack trace: ***
@ 0x7f4a1642a8fd google::LogMessage::Fail()
@ 0x7f4a1642e3ac google::LogMessage::SendToLog()
@ 0x7f4a1642a423 google::LogMessage::Flush()
@ 0x7f4a1642f8be google::LogMessageFatal::~LogMessageFatal()
@ 0x7f4a162b5264 hl_gpu_apply_unary_op<>()
@ 0x7f4a162b55a5 paddle::BaseMatrixT<>::applyUnary<>()
@ 0x7f4a162b57d3 paddle::BaseMatrixT<>::zero()
@ 0x7f4a1624a655 paddle::GpuMatrix::zeroMem()
@ 0x7f4a160bf4fb paddle::Layer::resetSpecifyOutput()
@ 0x7f4a160bf6e4 paddle::Layer::resetOutput()
@ 0x7f4a160a50fd paddle::MixedLayer::forward()
@ 0x7f4a1614f719 paddle::NeuralNetwork::forward()
@ 0x7f4a16171ed4 paddle::TrainerThread::forward()
@ 0x7f4a16173155 paddle::TrainerThread::computeThread()
@ 0x7f4a459c48a0 execute_native_thread_routine
@ 0x7f4a56aa41c3 start_thread
@ 0x7f4a560cc12d __clone
@ (nil) (unknown)
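For context, CUDA generally reports "invalid configuration argument" when a kernel launch asks for grid or block dimensions that are zero or beyond the device limits. The sketch below is a rough sanity check against the limits documented for compute capability 3.5 devices such as the Tesla K40. The function name `launch_config_problems` is hypothetical, and how Paddle actually derives its launch dimensions inside `hl_gpu_apply_unary_op` is not shown in this report, so this only illustrates the class of error, not the confirmed cause.

```python
# Launch limits for compute capability 3.5 (e.g. Tesla K40),
# as listed in the CUDA programming guide.
MAX_THREADS_PER_BLOCK = 1024
MAX_BLOCK_DIM = (1024, 1024, 64)
MAX_GRID_DIM = (2**31 - 1, 65535, 65535)


def launch_config_problems(grid, block):
    """Return a list of reasons a (grid, block) launch would be rejected.

    `grid` and `block` are 3-tuples (x, y, z). This is a hypothetical
    helper for illustration; it is not part of Paddle or CUDA.
    """
    problems = []
    for axis, value, limit in zip("xyz", grid, MAX_GRID_DIM):
        if value < 1:
            problems.append(f"grid.{axis} is {value}, must be at least 1")
        elif value > limit:
            problems.append(f"grid.{axis} is {value}, limit is {limit}")
    for axis, value, limit in zip("xyz", block, MAX_BLOCK_DIM):
        if value < 1:
            problems.append(f"block.{axis} is {value}, must be at least 1")
        elif value > limit:
            problems.append(f"block.{axis} is {value}, limit is {limit}")
    threads = block[0] * block[1] * block[2]
    if threads > MAX_THREADS_PER_BLOCK:
        problems.append(
            f"{threads} threads per block, limit is {MAX_THREADS_PER_BLOCK}"
        )
    return problems
```

A grid dimension that grows with the number of rows in a matrix could cross the 65535 limit on the y or z axis only for the largest inputs, which would be consistent with shorter sequences training fine. That link is an assumption here, not something the log proves.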
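Unlike #4439 (closed) and #4440 (closed), my model trains successfully when the sequences are shorter. To rule out a problem with the data, I also trained on the pass=0 batch=850 data in isolation, and that ran without any error.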
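To narrow down which batches are the heaviest, a small script can report per-batch sequence statistics. This is a minimal sketch that assumes each batch is available as a plain Python list of token-id sequences. The helper names `batch_length_stats` and `find_heaviest_batches` are hypothetical and do not come from Paddle.

```python
def batch_length_stats(batch):
    """Summarize one batch, given as a list of sequences (lists of token ids)."""
    lengths = [len(seq) for seq in batch]
    return {
        "num_sequences": len(lengths),
        "max_len": max(lengths, default=0),
        "total_tokens": sum(lengths),
    }


def find_heaviest_batches(batches, top_k=3):
    """Return (batch_index, stats) pairs sorted by total token count, largest first."""
    stats = [(i, batch_length_stats(b)) for i, b in enumerate(batches)]
    stats.sort(key=lambda item: item[1]["total_tokens"], reverse=True)
    return stats[:top_k]


if __name__ == "__main__":
    # Toy data standing in for real training batches.
    toy_batches = [
        [[1, 2], [3]],
        [[1] * 10, [2] * 5],
        [[7]],
    ]
    for index, info in find_heaviest_batches(toy_batches, top_k=2):
        print(index, info)
```

Comparing the statistics of the failing batch with those of batches that train successfully would show whether the crash tracks the longest single sequence or the total number of tokens in the batch.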
My environment: CUDA 7.5, GPU: Tesla K40, Paddle version: https://paddleci.ngrok.io/viewLog.html?buildId=13074&buildTypeId=CentOS6u3gcc482_LegacyBuild&tab=artifacts