Training fails with my own training set
Created by: huxiaoman7
As the title says: I built manifest data from my own dataset and started training. My run_train is as follows:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 \
python -u train.py \
--batch_size=4 \
--trainer_count=6 \
--num_passes=5 \
--num_proc_data=16 \
--num_conv_layers=2 \
--num_rnn_layers=3 \
--rnn_layer_size=2048 \
--num_iter_print=1 \
--learning_rate=5e-4 \
--max_duration=1000.0 \
--min_duration=0.0 \
--test_off=False \
--use_sortagrad=True \
--use_gru=False \
--use_gpu=True \
--is_local=True \
--share_rnn_weights=True \
--train_manifest='/data/Paddle/DeepSpeech/demo/manifest.train' \
--dev_manifest='/data/Paddle/DeepSpeech/demo/manifest.dev' \
--mean_std_path='/data/Paddle/DeepSpeech/demo/mean_std.npz' \
--vocab_path='/data/Paddle/DeepSpeech/demo/vocab.txt' \
--output_model_dir='/data/Paddle/DeepSpeech/demo/log/' \
--augment_conf_path='conf/augmentation.config' \
--specgram_type='linear' \
--shuffle_method='batch_shuffle_clipped'
I have trained several times and it fails with the error below every time. I reduced batch_size, reduced the number of GPUs, and manually killed the processes after each failed run, but the error persists. How can I resolve this?
F0328 08:04:45.916014 5449 hl_warpctc_wrap.cc:131] Check failed: CTC_STATUS_SUCCESS == dynload::compute_ctc_loss(batchInput, batchGrad, cpuLabels, cpuLabelLengths, cpuInputLengths, numClasses, numSequences, cpuCosts, workspace, *options) (0 vs. 4) warp-ctc [version 2] Error: unknown error
*** Check failure stack trace: ***
@ 0x7febcd740bcd google::LogMessage::Fail()
@ 0x7febcd74467c google::LogMessage::SendToLog()
@ 0x7febcd7406f3 google::LogMessage::Flush()
@ 0x7febcd745b8e google::LogMessageFatal::~LogMessageFatal()
@ 0x7febcd6f0e41 hl_warpctc_compute_loss()
@ 0x7febcd3307f5 paddle::WarpCTCLayer::forward()
@ 0x7febcd44af4d paddle::NeuralNetwork::forward()
@ 0x7febcd46d014 paddle::TrainerThread::forward()
@ 0x7febcd46e315 paddle::TrainerThread::computeThread()
@ 0x7febe8a93c80 (unknown)
@ 0x7febeadfe6ba start_thread
@ 0x7febeab3482d clone
@ (nil) (unknown)
Aborted (core dumped)
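A common cause of a warp-ctc "unknown error" at compute_ctc_loss is a transcript character missing from vocab.txt, which puts a label index outside the CTC class range. Below is a minimal sketch for sanity-checking the manifest before training. It assumes the JSON-lines manifest format used by DeepSpeech (one JSON object per line with "audio_filepath", "duration", and "text" fields); the helper name check_manifest is my own, not part of the repo:

```python
import json

def check_manifest(manifest_path, vocab_path):
    """Return (line_number, issues) pairs for manifest entries whose
    transcript contains characters missing from the vocabulary, or
    whose duration field is non-positive."""
    # Each line of vocab.txt is assumed to hold one vocabulary token.
    with open(vocab_path, encoding='utf-8') as f:
        vocab = set(line.rstrip('\n') for line in f if line.rstrip('\n'))

    problems = []
    with open(manifest_path, encoding='utf-8') as f:
        for lineno, line in enumerate(f, 1):
            entry = json.loads(line)
            # Characters in the transcript that have no vocab entry
            # would map to out-of-range CTC labels.
            missing = set(entry['text']) - vocab
            if missing:
                problems.append((lineno, sorted(missing)))
            if entry.get('duration', 0) <= 0:
                problems.append((lineno, ['non-positive duration']))
    return problems
```

Running this over both manifest.train and manifest.dev before training can rule out label problems as the cause; if it reports nothing, the failure is more likely a GPU/CUDA-side issue (e.g. out-of-memory or a warp-ctc/CUDA version mismatch).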