cudaSuccess == cudaStat (0 vs. 77) Cuda Error: an illegal memory access was encountered
Created by: Z-TAO
When training with multiple GPUs, the following error is reported:
F0221 14:00:17.320631 17204 hl_cuda_device.cc:566] Check failed: cudaSuccess == cudaStat (0 vs. 77) Cuda Error: an illegal memory access was encountered
F0221 14:00:17.320631 17206 hl_cuda_device.cc:566] Check failed: cudaSuccess == cudaStat (0 vs. 77) Cuda Error: an illegal memory access was encountered
*** Check failure stack trace: ***
@ 0x9ea800 google::LogMessage::Fail()
@ 0x9ea800 google::LogMessage::Fail()
@ 0x9ea75c google::LogMessage::SendToLog()
@ 0x9ea75c google::LogMessage::SendToLog()
@ 0x9ea0e0 google::LogMessage::Flush()
@ 0x9ea0e0 google::LogMessage::Flush()
@ 0x9ed187 google::LogMessageFatal::~LogMessageFatal()
@ 0x9ed187 google::LogMessageFatal::~LogMessageFatal()
@ 0x9b4437 hl_stream_synchronize()
@ 0x9b4437 hl_stream_synchronize()
@ 0x9c1007 hl_matrix_csr_mul_dense()
@ 0x61c79b paddle::TrainerThread::valueDispatchThread()
@ 0x7c6cc7 paddle::GpuMatrix::mul()
@ 0x7ca5af paddle::GpuMatrix::mul()
@ 0x6fa234 paddle::FullyConnectedLayer::forward()
@ 0x6411a4 paddle::NeuralNetwork::forward()
@ 0x7f60239508a0 execute_native_thread_routine
@ 0x61d161 paddle::TrainerThread::forward()
@ 0x61e35c paddle::TrainerThread::computeThread()
@ 0x7f60248371c3 start_thread
@ 0x7f60239508a0 execute_native_thread_routine
@ 0x7f60230c112d __clone
@ 0x7f60248371c3 start_thread
@ (nil) (unknown)
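Note that the fatal check fires inside hl_stream_synchronize(), so the illegal access is detected at a stream synchronization point, not necessarily in the frame that launched the offending kernel. A standard first step for localizing such errors (generic CUDA debugging, not Paddle-specific; CUDA_LAUNCH_BLOCKING is the stock CUDA runtime environment variable) is to force synchronous kernel launches. A minimal sketch for a Python-driven launch; the shell equivalent is setting CUDA_LAUNCH_BLOCKING=1 in the environment of the paddle train command:

import os
# Force synchronous kernel launches so the failing kernel is reported
# at its actual call site instead of at the next stream synchronize.
# Must be set before any CUDA context is created.
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'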
The specific symptoms of the problem are:
- With multi-GPU training, the error above rarely appears when a large batch size is specified, but appears very easily with a small batch size
- The problem does not occur on a single GPU
The neural network is very simple: essentially a single 600000 * 256 fc layer (see the size estimate just below).
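For scale, a back-of-the-envelope estimate of that parameter (assuming 32-bit floats, Paddle's default parameter type):

# A dense 600000 x 256 weight matrix at 4 bytes per float is ~586 MiB
# per copy, which is presumably why sparse_update=True is set in the
# config below.
rows, cols, bytes_per_float = 600000, 256, 4
print(rows * cols * bytes_per_float / 2**20)  # -> 585.9375 (MiB)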
trainer_config:

unit_word = data_layer(name='unit_words', size=word_dict_len)
rec_word = data_layer(name='recword', size=word_dict_len)
labels = data_layer(name='label', size=2)

unit_word_embedding = fc_layer(
    input=unit_word,
    size=256,
    act=IdentityActivation(),
    bias_attr=False,
    param_attr=ParamAttr(name='_source_language_embedding',
                         initial_mean=0, initial_std=0.01,
                         sparse_update=True))

rec_word_embedding = fc_layer(
    input=rec_word,
    size=256,
    act=IdentityActivation(),
    bias_attr=False,
    param_attr=ParamAttr(name='_source_language_embedding',
                         initial_mean=0, initial_std=0.01,
                         sparse_update=True))

output_embedding = fc_layer(
    input=[unit_word_embedding, rec_word_embedding],
    size=256,
    act=TanhActivation(),
    bias_attr=True)

output_embedding2 = fc_layer(
    input=output_embedding,
    size=64,
    act=TanhActivation(),
    bias_attr=True)

final_output = fc_layer(
    input=output_embedding2,
    size=2,
    act=SoftmaxActivation(),
    bias_attr=True)

cost = cross_entropy(input=final_output, label=labels)
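Since the fatal frame is hl_matrix_csr_mul_dense (the CSR sparse-input multiply inside the fc layer's forward pass), one diagnostic is to rerun the multi-GPU job with sparse updates disabled on the shared embedding parameter. This is only a sketch to isolate the failing path, not a confirmed fix; whether it avoids the error is an assumption:

# Diagnostic variant of the embedding layer: identical except that
# sparse_update is turned off, forcing the dense parameter-update path.
# If the illegal memory access disappears, the bug is likely in the
# multi-GPU sparse-update machinery rather than in the model itself.
unit_word_embedding = fc_layer(
    input=unit_word,
    size=256,
    act=IdentityActivation(),
    bias_attr=False,
    param_attr=ParamAttr(name='_source_language_embedding',
                         initial_mean=0, initial_std=0.01,
                         sparse_update=False))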
dataprovider:
def hook(settings, word_dict, **kwargs):
    settings.word_dict = word_dict
    settings.line_idx = 0
    # all inputs are integral and sequential type
    settings.slots = [
        sparse_vector(len(word_dict)),
        sparse_binary_vector(len(word_dict)),
        integer_value(2),
    ]
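The matching process function is not shown in the post. For reference, a hypothetical sketch of what PyDataProvider2 expects for these three slots (the file format, parsing, and placeholder values are assumptions; only the yielded types follow from the slot declarations):

from paddle.trainer.PyDataProvider2 import *

@provider(init_hook=hook)
def process(settings, file_name):
    # Hypothetical reader: one sample per line; the real format is unknown.
    with open(file_name) as f:
        for line in f:
            # sparse_vector        -> list of (index, value) pairs
            # sparse_binary_vector -> list of word indices
            # integer_value(2)     -> an int in {0, 1}
            unit_words = [(3, 1.0), (17, 0.5)]  # placeholder
            rec_word = [42]                     # placeholder
            label = 0                           # placeholder
            yield unit_words, rec_word, label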