PaddlePaddle / Paddle
Opened Aug 07, 2018 by saxon_zh (@saxon_zh, Guest)

[Blocker] CRNN-CTC crashes when training is run

Created by: Sand3r-

Description

CRNN-CTC from the latest models develop branch crashes during training when run with the latest Paddle build from the develop branch.

Models commit hash: 4c6882ab55fbf9b03c405d2fce2b65a3ba90970f
Paddle commit hash: 3300a532

Steps to reproduce

  1. Download the latest Paddle source and build it (optionally with MKLDNN support)
  2. Download the models repository
  3. In the main models directory, run: source fluid/ocr_recognition/scripts/train.sh CPU

Actual behavior

Training terminates with the following output:

/home/mgallus/src/paddle/build/python/paddle/fluid/evaluator.py:69: Warning: The EditDistance is deprecated, because maintain a modified program inside evaluator cause bug easily, please use fluid.metrics.EditDistance instead.
  % (self.__class__.__name__, self.__class__.__name__), Warning)
*** Aborted at 1533633833 (unix time) try "date -d @1533633833" if you are using GNU date ***
PC: @                0x0 (unknown)
*** SIGSEGV (@0x0) received by PID 45492 (TID 0x7fa4bf86d700) from PID 0; stack trace: ***
    @     0x7fa4bf460390 (unknown)
    @     0x7fa4a42ff6ac paddle::framework::make_dim<>()
    @     0x7fa4a42ff74f paddle::framework::make_ddim()
    @     0x7fa4a42ffb98 paddle::framework::make_ddim()
    @     0x7fa4a42ffc99 paddle::framework::make_ddim()
    @     0x7fa4a3933731 paddle::operators::trim_trailing_singular_dims()
    @     0x7fa4a3e9a1cc paddle::operators::ElementwiseComputeEx<>()
    @     0x7fa4a3e90a16 paddle::operators::CompareOpKernel<>::Compute()
    @     0x7fa4a3e89fe5 _ZZNK6paddle9framework24OpKernelRegistrarFunctorINS_8platform8CPUPlaceELb0ELm2EINS_9operators15CompareOpKernelINS2_16CPUDeviceContextENS4_12EqualFunctorIiEEEENS5_IS6_NS7_IlEEEENS5_IS6_NS7_IfEEEENS5_IS6_NS7_IdEEEEEEclEPKcSI_ENKUlRKNS0_16ExecutionContextEE_clESL_
    @     0x7fa4a3ea4fe3 _ZNSt17_Function_handlerIFvRKN6paddle9framework16ExecutionContextEEZNKS1_24OpKernelRegistrarFunctorINS0_8platform8CPUPlaceELb0ELm2EINS0_9operators15CompareOpKernelINS7_16CPUDeviceContextENS9_12EqualFunctorIiEEEENSA_ISB_NSC_IlEEEENSA_ISB_NSC_IfEEEENSA_ISB_NSC_IdEEEEEEclEPKcSN_EUlS4_E_E9_M_invokeERKSt9_Any_dataS4_
    @     0x7fa4a4293087 std::function<>::operator()()
    @     0x7fa4a428e8fc paddle::framework::OperatorWithKernel::RunImpl()
    @     0x7fa4a428b040 paddle::framework::OperatorBase::Run()
    @     0x7fa4a35c2871 paddle::framework::Executor::RunPreparedContext()
    @     0x7fa4a35bf27e paddle::framework::Executor::Run()
    @     0x7fa4a3401ac1 _ZZN6paddle6pybindL13pybind11_initEvENKUlRNS_9framework8ExecutorERKNS1_11ProgramDescEPNS1_5ScopeEibbE63_clES3_S6_S8_ibb
    @     0x7fa4a3427f04 _ZN8pybind116detail15argument_loaderIJRN6paddle9framework8ExecutorERKNS3_11ProgramDescEPNS3_5ScopeEibbEE9call_implIvRZNS2_6pybindL13pybind11_initEvEUlS5_S8_SA_ibbE63_JLm0ELm1ELm2ELm3ELm4ELm5EEEET_OT0_NS0_14index_sequenceIJXspT1_EEEE
    @     0x7fa4a342553f _ZN8pybind116detail15argument_loaderIJRN6paddle9framework8ExecutorERKNS3_11ProgramDescEPNS3_5ScopeEibbEE4callIvRZNS2_6pybindL13pybind11_initEvEUlS5_S8_SA_ibbE63_EENSt9enable_ifIXsrSt7is_voidIT_E5valueENS0_9void_typeEE4typeEOT0_
    @     0x7fa4a341f37d _ZZN8pybind1112cpp_function10initializeIZN6paddle6pybindL13pybind11_initEvEUlRNS2_9framework8ExecutorERKNS4_11ProgramDescEPNS4_5ScopeEibbE63_vJS6_S9_SB_ibbEJNS_4nameENS_9is_methodENS_7siblingEEEEvOT_PFT0_DpT1_EDpRKT2_ENKUlRNS_6detail13function_callEE1_clEST_
    @     0x7fa4a341f42b _ZZN8pybind1112cpp_function10initializeIZN6paddle6pybindL13pybind11_initEvEUlRNS2_9framework8ExecutorERKNS4_11ProgramDescEPNS4_5ScopeEibbE63_vJS6_S9_SB_ibbEJNS_4nameENS_9is_methodENS_7siblingEEEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE1_4_FUNEST_
    @     0x7fa4a3439e32 pybind11::cpp_function::dispatcher()
    @           0x4c37ed PyEval_EvalFrameEx
    @           0x4b9ab6 PyEval_EvalCodeEx
    @           0x4c16e7 PyEval_EvalFrameEx
    @           0x4b9ab6 PyEval_EvalCodeEx
    @           0x4c1e6f PyEval_EvalFrameEx
    @           0x4b9ab6 PyEval_EvalCodeEx
    @           0x4c16e7 PyEval_EvalFrameEx
    @           0x4c136f PyEval_EvalFrameEx
    @           0x4b9ab6 PyEval_EvalCodeEx
    @           0x4eb30f (unknown)
    @           0x4e5422 PyRun_FileExFlags
./train.sh: line 54: 45492 Segmentation fault      (core dumped) python ../ctc_train.py --use_gpu $use_gpu --parallel $parallel --batch_size $batch_size --save_model_period 1 --total_step 1 --save_model_dir $save_model_dir

Expected behavior

Training should run to completion without crashing.

Additional notes

In the MKLDNN case, I found that the error occurs when ElementwiseOp is called for the equal operation on the fill_constant and editdistance ops. The fill_constant output had its dimensions set to 0, which caused trim_trailing_singular_dims() to fail when creating a dim with make_ddim().
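
To make the suspected failure mode concrete, here is a minimal Python sketch. It is not Paddle's code. It only models the idea that trimming trailing size-1 dimensions can leave an empty shape, and that a dim constructor may not accept an empty shape. The name trim_trailing_singular_dims mirrors the function in the stack trace, but its body here is an assumption; make_ddim_checked and safe_trimmed_dim are hypothetical helpers, and reading "dimensions set to 0" as "no dimensions left" is also an assumption.

```python
from typing import List, Sequence


def trim_trailing_singular_dims(dims: Sequence[int]) -> List[int]:
    """Drop trailing dimensions of size 1, e.g. [2, 3, 1, 1] -> [2, 3]."""
    end = len(dims)
    while end > 0 and dims[end - 1] == 1:
        end -= 1
    return list(dims[:end])


def make_ddim_checked(dims: Sequence[int]) -> tuple:
    """Hypothetical stand-in for a dim constructor that requires rank >= 1."""
    if len(dims) == 0:
        raise ValueError("cannot build a dim from an empty shape")
    return tuple(dims)


def safe_trimmed_dim(dims: Sequence[int]) -> tuple:
    """Trim, then return an empty tuple instead of failing when nothing is left."""
    trimmed = trim_trailing_singular_dims(dims)
    return make_ddim_checked(trimmed) if trimmed else ()
```

In this model, a shape such as [1] trims down to an empty list, so calling make_ddim_checked on the result raises, while safe_trimmed_dim handles the empty case explicitly. A guard of that kind is one possible direction to check, not a confirmed fix.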

However, since the errors differ depending on whether the CPU or MKLDNN version is used, I suspect the problem lies somewhere deeper.

Reference: paddlepaddle/Paddle#12567