PaddlePaddle / DeepSpeech · Issue #114
Open
Opened January 10, 2018 by saxon_zh (Guest)

Failed to train LibriSpeech using the example script.

Created by: misbullah

Hi, I tried to train a model on the full LibriSpeech dataset (960h) with data augmentation, using 2 GTX 1080 GPUs. After Pass 10, I got the following error.

```
I0110 17:18:34.314939 47267 FirstOrderOptimizer.cpp:321] parameter=_batch_norm_4.w0 need clipping by local threshold=400, max grad=1.86242e+09, avg grad=7.7372e+07
I0110 17:18:34.315071 47267 FirstOrderOptimizer.cpp:321] parameter=_batch_norm_4.wbias need clipping by local threshold=400, max grad=2.21661e+09, avg grad=7.25256e+07
I0110 17:18:34.319563 47267 FirstOrderOptimizer.cpp:321] parameter=_recurrent_layer_4.w0 need clipping by local threshold=400, max grad=4.5849e+09, avg grad=7.94203e+06
I0110 17:18:34.319880 47267 FirstOrderOptimizer.cpp:321] parameter=_recurrent_layer_4.wbias need clipping by local threshold=400, max grad=2.21661e+09, avg grad=7.25256e+07
I0110 17:18:34.418779 47267 FirstOrderOptimizer.cpp:321] parameter=_batch_norm_0.w0 need clipping by local threshold=400, max grad=942.53, avg grad=156.387
I0110 17:18:34.418969 47267 FirstOrderOptimizer.cpp:321] parameter=_batch_norm_0.wbias need clipping by local threshold=400, max grad=1613.23, avg grad=205.539
I0110 17:18:34.419972 47267 FirstOrderOptimizer.cpp:321] parameter=_batch_norm_1.w0 need clipping by local threshold=400, max grad=1028.06, avg grad=339.691
I0110 17:18:34.420138 47267 FirstOrderOptimizer.cpp:321] parameter=_batch_norm_1.wbias need clipping by local threshold=400, max grad=1713.61, avg grad=505.019
I0110 17:18:34.426007 47267 FirstOrderOptimizer.cpp:321] parameter=_batch_norm_2.w0 need clipping by local threshold=400, max grad=905.969, avg grad=26.9515
I0110 17:18:34.426167 47267 FirstOrderOptimizer.cpp:321] parameter=_batch_norm_2.wbias need clipping by local threshold=400, max grad=1077.66, avg grad=32.965
I0110 17:18:34.430650 47267 FirstOrderOptimizer.cpp:321] parameter=_recurrent_layer_0.w0 need clipping by local threshold=400, max grad=2006.8, avg grad=1.30195
I0110 17:18:34.430976 47267 FirstOrderOptimizer.cpp:321] parameter=_recurrent_layer_0.wbias need clipping by local threshold=400, max grad=1071.72, avg grad=31.856
*** Aborted at 1515575915 (unix time) try "date -d @1515575915" if you are using GNU date ***
PC: @ 0x0 (unknown)
*** SIGFPE (@0x7fc03283ad89) received by PID 47267 (TID 0x7fc0b121c740) from PID 847490441; stack trace: ***
    @ 0x7fc0b0e11330 (unknown)
    @ 0x7fc03283ad89 paddle::GpuVectorT<>::getAbsMax()
    @ 0x7fc032afbef6 paddle::OptimizerWithGradientClipping::update()
    @ 0x7fc032ae1ddd paddle::SgdThreadUpdater::updateImpl()
    @ 0x7fc03299ed51 ParameterUpdater::update()
    @ 0x7fc03257a336 _wrap_ParameterUpdater_update
    @ 0x52714b PyEval_EvalFrameEx
    @ 0x555551 PyEval_EvalCodeEx
    @ 0x525560 PyEval_EvalFrameEx
    @ 0x555551 PyEval_EvalCodeEx
    @ 0x524338 PyEval_EvalFrameEx
    @ 0x555551 PyEval_EvalCodeEx
    @ 0x524338 PyEval_EvalFrameEx
    @ 0x555551 PyEval_EvalCodeEx
    @ 0x525560 PyEval_EvalFrameEx
    @ 0x555551 PyEval_EvalCodeEx
    @ 0x525560 PyEval_EvalFrameEx
    @ 0x567d14 (unknown)
    @ 0x465bf4 PyRun_FileExFlags
    @ 0x46612d PyRun_SimpleFileExFlags
    @ 0x466d92 Py_Main
    @ 0x7fc0b0a59f45 __libc_start_main
    @ 0x577c2e (unknown)
    @ 0x0 (unknown)
run_train.sh: line 35: 47267 Floating point exception (core dumped)
```

The training command was:

```
CUDA_VISIBLE_DEVICES=0,2 python -u train.py \
    --init_model_path='/var/nlp/alim/paddle-deepspeech/checkpoints/libri/params.latest.tar.gz' \
    --batch_size=16 --trainer_count=2 --num_passes=20 --num_proc_data=16 \
    --num_conv_layers=2 --num_rnn_layers=3 --rnn_layer_size=1024 \
    --num_iter_print=100 --learning_rate=5e-4 \
    --max_duration=27.0 --min_duration=0.0 \
    --test_off=False --use_sortagrad=True --use_gru=False --use_gpu=True \
    --is_local=True --share_rnn_weights=True \
    --train_manifest='data/librispeech/manifest.train' \
    --dev_manifest='data/librispeech/manifest.dev-clean' \
    --mean_std_path='data/librispeech/mean_std.npz' \
    --vocab_path='data/librispeech/vocab.txt' \
    --output_model_dir='./checkpoints/libri' \
    --augment_conf_path='conf/augmentation.config' \
    --specgram_type='linear' --shuffle_method='batch_shuffle_clipped'
```

It ended with: `Failed in training!`
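For context, the "need clipping by local threshold" messages indicate per-parameter gradient clipping: when a gradient's largest absolute entry exceeds the threshold (400 here), the whole gradient is scaled down. The sketch below is only an illustration of that idea in NumPy, using magnitudes from the log above; it is not Paddle's actual `OptimizerWithGradientClipping` implementation, and `clip_gradient` is a hypothetical helper name.

```python
import numpy as np

def clip_gradient(grad, threshold=400.0):
    """Illustrative per-parameter clipping (not Paddle's real code):
    if the largest absolute gradient entry exceeds `threshold`,
    rescale the whole gradient so its max magnitude equals it."""
    max_abs = np.max(np.abs(grad))
    if max_abs > threshold:
        grad = grad * (threshold / max_abs)
    return grad

# Magnitudes like the exploding ones in the log (max grad ~1.86e+09)
g = np.array([1.86242e9, -7.7372e7, 3.0])
clipped = clip_gradient(g)
print(np.max(np.abs(clipped)))  # capped near the threshold of 400
```

With gradients several orders of magnitude above the threshold, as in the log, the updates are dominated by the clipping rescale; the subsequent SIGFPE in `paddle::GpuVectorT<>::getAbsMax()` suggests the gradients eventually became non-finite.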

Any suggestions?

Thanks, Alim
