Training fails with my own training set
Created by: huxiaoman7
As the title says: I built manifest data from my own dataset and started training. My run_train is as follows:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 \
python -u train.py \
--batch_size=4 \
--trainer_count=6 \
--num_passes=5 \
--num_proc_data=16 \
--num_conv_layers=2 \
--num_rnn_layers=3 \
--rnn_layer_size=2048 \
--num_iter_print=1 \
--learning_rate=5e-4 \
--max_duration=1000.0 \
--min_duration=0.0 \
--test_off=False \
--use_sortagrad=True \
--use_gru=False \
--use_gpu=True \
--is_local=True \
--share_rnn_weights=True \
--train_manifest='/data/Paddle/DeepSpeech/demo/manifest.train' \
--dev_manifest='/data/Paddle/DeepSpeech/demo/manifest.dev' \
--mean_std_path='/data/Paddle/DeepSpeech/demo/mean_std.npz' \
--vocab_path='/data/Paddle/DeepSpeech/demo/vocab.txt' \
--output_model_dir='/data/Paddle/DeepSpeech/demo/log/' \
--augment_conf_path='conf/augmentation.config' \
--specgram_type='linear' \
--shuffle_method='batch_shuffle_clipped'
I have trained several times and it fails with the error below every time. I reduced batch_size, reduced the number of GPUs, and manually killed the processes after each failed run, but the error persists. How can I resolve this?
F0328 08:04:45.916014 5449 hl_warpctc_wrap.cc:131] Check failed: CTC_STATUS_SUCCESS == dynload::compute_ctc_loss(batchInput, batchGrad, cpuLabels, cpuLabelLengths, cpuInputLengths, numClasses, numSequences, cpuCosts, workspace, *options) (0 vs. 4) warp-ctc [version 2] Error: unknown error
*** Check failure stack trace: ***
@ 0x7febcd740bcd google::LogMessage::Fail()
@ 0x7febcd74467c google::LogMessage::SendToLog()
@ 0x7febcd7406f3 google::LogMessage::Flush()
@ 0x7febcd745b8e google::LogMessageFatal::~LogMessageFatal()
@ 0x7febcd6f0e41 hl_warpctc_compute_loss()
@ 0x7febcd3307f5 paddle::WarpCTCLayer::forward()
@ 0x7febcd44af4d paddle::NeuralNetwork::forward()
@ 0x7febcd46d014 paddle::TrainerThread::forward()
@ 0x7febcd46e315 paddle::TrainerThread::computeThread()
@ 0x7febe8a93c80 (unknown)
@ 0x7febeadfe6ba start_thread
@ 0x7febeab3482d clone
@ (nil) (unknown)
Aborted (core dumped)
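A common cause of a warp-ctc "unknown error" at compute_ctc_loss is a transcript character missing from vocab.txt, which puts a label index outside the CTC class range. Below is a minimal sketch for sanity-checking the manifest before training. It assumes the JSON-lines manifest format used by DeepSpeech (one JSON object per line with "audio_filepath", "duration", and "text" fields); the helper name check_manifest is my own, not part of the repo:

```python
import json

def check_manifest(manifest_path, vocab_path):
    """Return (line_number, issues) pairs for manifest entries whose
    transcript contains characters missing from the vocabulary, or
    whose duration field is non-positive."""
    # Each line of vocab.txt is assumed to hold one vocabulary token.
    with open(vocab_path, encoding='utf-8') as f:
        vocab = set(line.rstrip('\n') for line in f if line.rstrip('\n'))

    problems = []
    with open(manifest_path, encoding='utf-8') as f:
        for lineno, line in enumerate(f, 1):
            entry = json.loads(line)
            # Characters in the transcript that have no vocab entry
            # would map to out-of-range CTC labels.
            missing = set(entry['text']) - vocab
            if missing:
                problems.append((lineno, sorted(missing)))
            if entry.get('duration', 0) <= 0:
                problems.append((lineno, ['non-positive duration']))
    return problems
```

Running this over both manifest.train and manifest.dev before training can rule out label problems as the cause; if it reports nothing, the failure is more likely a GPU/CUDA-side issue (e.g. out-of-memory or a warp-ctc/CUDA version mismatch).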