Opened Aug 22, 2018 by saxon_zh (@saxon_zh, Guest)

"Unknown exception caught" in multi-machine training

Created by: ccmeteorljh

Paddle version: 0.15
Model: se-resnext
Environment: multi-machine, single GPU per machine, asynchronous
PaddleCloud URL: http://paddlecloud.baidu-int.com:8088/paddle/jobRunInfo?jobId=job-e6c5b7cd20745dde&flag=jobs&groupName=k8s_gpu_demo&groupId=c0a1f165-6279-5320-b9e7-e0218c7a87f5&currentPage=1&currentKey=1

F0822 11:08:56.576814  3256 exception_holder.h:34] Unknown exception caught
*** Check failure stack trace: ***
    @     0x7f5fa75d853d  google::LogMessage::Fail()
    @     0x7f5fa75dbfec  google::LogMessage::SendToLog()
    @     0x7f5fa75d8063  google::LogMessage::Flush()
    @     0x7f5fa75dd4fe  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f5fa839c002  paddle::framework::details::ExceptionHolder::Catch()
    @     0x7f5fa839894d  _ZZN6paddle9framework7details24ThreadedSSAGraphExecutor5RunOpEPNS0_13BlockingQueueIPNS1_13VarHandleBaseEEEPNS1_12OpHandleBaseEENKUlvE_clEv
    @     0x7f5fa8398ea5  paddle::framework::details::ThreadedSSAGraphExecutor::RunOp()
    @     0x7f5fa839aa58  paddle::framework::details::ThreadedSSAGraphExecutor::Run()
    @     0x7f5fa83a0db7  paddle::framework::details::ScopeBufferedSSAGraphExecutor::Run()
    @     0x7f5fa763675c  paddle::framework::ParallelExecutor::Run()
    @     0x7f5fa7549780  _ZZN8pybind1112cpp_function10initializeIZN6paddle6pybindL13pybind11_initEvEUlRNS2_9framework16ParallelExecutorERKSt6vectorISsSaISsEERKSsE91_vIS6_SB_SD_EINS_4nameENS_9is_methodENS_7siblingEEEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE1_4_FUNESV_
    @     0x7f5fa7565484  pybind11::cpp_function::dispatcher()
    @     0x7f60108df631  PyEval_EvalFrameEx
    @     0x7f60108e0bce  PyEval_EvalCodeEx
    @     0x7f60108df20a  PyEval_EvalFrameEx
    @     0x7f60108e0bce  PyEval_EvalCodeEx
    @     0x7f60108df20a  PyEval_EvalFrameEx
    @     0x7f60108df560  PyEval_EvalFrameEx
    @     0x7f60108df560  PyEval_EvalFrameEx
    @     0x7f60108e0bce  PyEval_EvalCodeEx
    @     0x7f60108e0ce2  PyEval_EvalCode
    @     0x7f60109009e0  PyRun_FileExFlags
    @     0x7f6010900bbf  PyRun_SimpleFileExFlags
    @     0x7f6010916454  Py_Main
    @     0x7f600fbcacdd  __libc_start_main
*** Aborted at 1534907336 (unix time) try "date -d @1534907336" if you are using GNU date ***
PC: @                0x0 (unknown)
*** SIGABRT (@0xcb8) received by PID 3256 (TID 0x7f6010fd9700) from PID 3256; stack trace: ***
    @     0x7f60105d9500 (unknown)
    @     0x7f600fbde8a5 __GI_raise
    @     0x7f600fbe0085 __GI_abort
    @     0x7f5fa75e2fbb google::FindSymbol()
    @     0x7f5fa75e397a google::GetSymbolFromObjectFile()
    @     0x7f5fa75e4042 google::SymbolizeAndDemangle()
    @     0x7f5fa75e1848 google::DumpStackTrace()
    @     0x7f5fa75e1906 google::DumpStackTraceAndExit()
    @     0x7f5fa75d853d google::LogMessage::Fail()
    @     0x7f5fa75dbfec google::LogMessage::SendToLog()
    @     0x7f5fa75d8063 google::LogMessage::Flush()
    @     0x7f5fa75dd4fe google::LogMessageFatal::~LogMessageFatal()
    @     0x7f5fa839c002 paddle::framework::details::ExceptionHolder::Catch()
    @     0x7f5fa839894d _ZZN6paddle9framework7details24ThreadedSSAGraphExecutor5RunOpEPNS0_13BlockingQueueIPNS1_13VarHandleBaseEEEPNS1_12OpHandleBaseEENKUlvE_clEv
    @     0x7f5fa8398ea5 paddle::framework::details::ThreadedSSAGraphExecutor::RunOp()
    @     0x7f5fa839aa58 paddle::framework::details::ThreadedSSAGraphExecutor::Run()
    @     0x7f5fa83a0db7 paddle::framework::details::ScopeBufferedSSAGraphExecutor::Run()
    @     0x7f5fa763675c paddle::framework::ParallelExecutor::Run()
    @     0x7f5fa7549780 _ZZN8pybind1112cpp_function10initializeIZN6paddle6pybindL13pybind11_initEvEUlRNS2_9framework16ParallelExecutorERKSt6vectorISsSaISsEERKSsE91_vIS6_SB_SD_EINS_4nameENS_9is_methodENS_7siblingEEEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE1_4_FUNESV_
    @     0x7f5fa7565484 pybind11::cpp_function::dispatcher()
    @     0x7f60108df631 PyEval_EvalFrameEx
    @     0x7f60108e0bce PyEval_EvalCodeEx
    @     0x7f60108df20a PyEval_EvalFrameEx
    @     0x7f60108e0bce PyEval_EvalCodeEx
    @     0x7f60108df20a PyEval_EvalFrameEx
    @     0x7f60108df560 PyEval_EvalFrameEx
    @     0x7f60108df560 PyEval_EvalFrameEx
    @     0x7f60108e0bce PyEval_EvalCodeEx
    @     0x7f60108e0ce2 PyEval_EvalCode
    @     0x7f60109009e0 PyRun_FileExFlags
    @     0x7f6010900bbf PyRun_SimpleFileExFlags
    @     0x7f6010916454 Py_Main
/root/paddlejob/run.sh: line 239:  3256 Aborted                 (core dumped) python train.py
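
When lining this abort up against logs from the other trainers or parameter servers, it can help to convert the Unix time in the "Aborted at" line into a wall-clock time. The following is a minimal sketch, not part of the original report. The helper name `to_offset_time` is hypothetical, and the UTC+8 offset is an assumption about the cluster's clock, chosen because it reproduces the `F0822 11:08:56` prefix of the fatal log line.

```python
from datetime import datetime, timedelta, timezone

# Value copied from the "*** Aborted at 1534907336 (unix time)" line above.
ABORT_UNIX_TIME = 1534907336


def to_offset_time(ts, utc_offset_hours):
    """Convert a Unix timestamp to an aware datetime at a fixed UTC offset.

    Hypothetical helper for illustration only.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(ts, tz)


utc_time = to_offset_time(ABORT_UNIX_TIME, 0)
# Assumption: the machine's local clock runs on UTC+8.
local_time = to_offset_time(ABORT_UNIX_TIME, 8)

print(utc_time.isoformat())                  # 2018-08-22T03:08:56+00:00
print(local_time.strftime("%m%d %H:%M:%S"))  # 0822 11:08:56, same as the glog prefix
```

Under that assumption, the abort and the fatal "Unknown exception caught" line fall in the same second, so the SIGABRT trace appears to be the direct result of the failed check rather than a separate crash.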
Reference: paddlepaddle/Paddle#12854