PaddlePaddle / Paddle
Opened May 5, 2019 by saxon_zh (@saxon_zh, Guest)

MPI training: has no optimize block

Created by: ninnkotora

  • Version and environment: 1) PaddlePaddle version: 1.3; 2) CPU; 3) Python 2.7

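  • Problem description: app/pserver.log keeps printing the following warning: pserver [xxxxx:yyyy] has no optimize block!! The symptom is that the program stops making progress, as if it were stuck somewhere. According to the log, execution stops at the part marked with a red line in the screenshot. [image]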

Sun May 5 18:54:25 2019[1,29]:2019-05-05 18:54:25,830 - WARNING - pserver [10.182.14.28:62005] has no optimize block!!
Sun May 5 18:54:25 2019[1,29]:I0505 18:54:25.845669 138469 grpc_server.cc:430] Server listening on 10.182.14.28:62005 selected port: 62005
Sun May 5 18:54:25 2019[1,45]:2019-05-05 18:54:25 dist train
Sun May 5 18:54:25 2019[1,45]:2019-05-05 18:54:25 run pserver
Sun May 5 18:54:25 2019[1,45]:get_pserver_program() is deprecated, call get_pserver_programs() to get pserver main and startup in a single call.
Sun May 5 18:54:25 2019[1,45]:2019-05-05 18:54:25,872 - WARNING - pserver [10.182.109.38:62005] has no optimize block!!
Sun May 5 18:54:25 2019[1,45]:I0505 18:54:25.883363 41628 grpc_server.cc:430] Server listening on 10.182.109.38:62005 selected port: 62005
Sun May 5 18:54:26 2019[1,3]:2019-05-05 18:54:26 dist train
Sun May 5 18:54:26 2019[1,3]:2019-05-05 18:54:26 run pserver
Sun May 5 18:54:26 2019[1,3]:get_pserver_program() is deprecated, call get_pserver_programs() to get pserver main and startup in a single call.
Sun May 5 18:54:26 2019[1,3]:2019-05-05 18:54:26,054 - WARNING - pserver [10.182.10.152:62005] has no optimize block!!
Sun May 5 18:54:26 2019[1,3]:I0505 18:54:26.068373 1449 grpc_server.cc:430] Server listening on 10.182.10.152:62005 selected port: 62005
Sun May 5 18:54:27 2019[1,36]:2019-05-05 18:54:27 dist train
Sun May 5 18:54:27 2019[1,36]:2019-05-05 18:54:27 run pserver
Sun May 5 18:54:27 2019[1,36]:get_pserver_program() is deprecated, call get_pserver_programs() to get pserver main and startup in a single call.
Sun May 5 18:54:27 2019[1,36]:2019-05-05 18:54:27,057 - WARNING - pserver [10.182.13.18:62005] has no optimize block!!
Sun May 5 18:54:27 2019[1,36]:I0505 18:54:27.068897 50413 grpc_server.cc:430] Server listening on 10.182.13.18:62005 selected port: 62005
Sun May 5 18:54:27 2019[1,19]:2019-05-05 18:54:27 dist train
Sun May 5 18:54:27 2019[1,19]:2019-05-05 18:54:27 run pserver
Sun May 5 18:54:27 2019[1,19]:get_pserver_program() is deprecated, call get_pserver_programs() to get pserver main and startup in a single call.
Sun May 5 18:54:27 2019[1,19]:2019-05-05 18:54:27,295 - WARNING - pserver [10.182.15.142:62005] has no optimize block!!
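
To see at a glance which pservers emit the warning, the pserver.log excerpt above can be scanned with a small script. This is only a sketch: the regular expression assumes the exact line shape shown in the excerpt, the bracketed pair such as [1,29] is presumed to be an MPI job/rank tag, and the names WARNING_RE, pservers_without_optimize_block and SAMPLE_LOG are made up for illustration and are not part of PaddlePaddle.

```python
import re

# Assumed line shape, copied from the pserver.log excerpt:
#   <timestamp>[<job>,<rank>]:<date> <time> - WARNING - pserver [<ip>:<port>] has no optimize block!!
# The "[job,rank]" reading of the bracketed pair is an assumption.
WARNING_RE = re.compile(
    r"\[(?P<job>\d+),(?P<rank>\d+)\]:"
    r"[\d-]+ [\d:,]+ - WARNING - "
    r"pserver \[(?P<endpoint>[\d.]+:\d+)\] has no optimize block"
)


def pservers_without_optimize_block(log_text):
    """Map each rank to the pserver endpoint that logged the warning."""
    return {
        int(match.group("rank")): match.group("endpoint")
        for match in WARNING_RE.finditer(log_text)
    }


# Two warning lines and one unrelated line taken from the excerpt.
SAMPLE_LOG = (
    "Sun May 5 18:54:25 2019[1,29]:2019-05-05 18:54:25,830 - WARNING - "
    "pserver [10.182.14.28:62005] has no optimize block!!\n"
    "Sun May 5 18:54:25 2019[1,45]:2019-05-05 18:54:25 run pserver\n"
    "Sun May 5 18:54:25 2019[1,45]:2019-05-05 18:54:25,872 - WARNING - "
    "pserver [10.182.109.38:62005] has no optimize block!!\n"
)

if __name__ == "__main__":
    found = pservers_without_optimize_block(SAMPLE_LOG)
    for rank, endpoint in sorted(found.items()):
        print("rank %d: %s" % (rank, endpoint))
```

Because the pattern is searched anywhere in the text, it should also work on a log where several entries have been joined onto one line.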

Eventually train.log reports the following errors:
Sun May 5 01:18:43 2019[1,12]:F0505 01:18:43.651557 33534 grpc_client.cc:418] BatchBarrierRPC name:[BATCH_BARRIER@RECV], ep:[10.182.65.144:62004], status:[-1] meets grpc error, error_code:14 error_message:Socket closed error_details:
Sun May 5 01:18:43 2019[1,15]:F0505 01:18:43.666119 45154 grpc_client.cc:418] BatchBarrierRPC name:[BATCH_BARRIER@RECV], ep:[10.182.65.144:62004], status:[-1] meets grpc error, error_code:14 error_message:Socket closed error_details:
Sun May 5 01:18:43 2019[1,13]:F0505 01:18:43.658960 30178 grpc_client.cc:418] BatchBarrierRPC name:[BATCH_BARRIER@RECV], ep:[10.182.65.144:62004], status:[-1] meets grpc error, error_code:14 error_message:Socket closed error_details:
Sun May 5 01:18:43 2019[1,18]:F0505 01:18:43.684461 47413 grpc_client.cc:418] BatchBarrierRPC name:[BATCH_BARRIER@RECV], ep:[10.182.65.144:62004], status:[-1] meets grpc error, error_code:14 error_message:Socket closed error_details:
Sun May 5 01:18:43 2019[1,37]:F0505 01:18:43.684893 21399 grpc_client.cc:418] BatchBarrierRPC name:[BATCH_BARRIER@RECV], ep:[10.182.65.144:62004], status:[-1] meets grpc error, error_code:14 error_message:Socket closed error_details:
Sun May 5 01:18:43 2019[1,35]:F0505 01:18:43.672283 42122 grpc_client.cc:418] BatchBarrierRPC name:[BATCH_BARRIER@RECV], ep:[10.182.65.144:62004], status:[-1] meets grpc error, error_code:14 error_message:Socket closed error_details:
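
Grouping the fatal lines in train.log by endpoint shows whether the trainers all lost the same pserver. The sketch below does that under the assumption that every line has the shape shown in the excerpt. GRPC_FATAL_RE, failures_by_endpoint and SAMPLE_TRAIN_LOG are illustrative names, not PaddlePaddle APIs. In gRPC's status code table, code 14 is UNAVAILABLE.

```python
import re
from collections import defaultdict

# Assumed line shape, copied from the train.log excerpt:
#   ...[<job>,<rank>]:F<mmdd> <time> <pid> grpc_client.cc:<line>] <rpc> name:[...], ep:[<ip>:<port>],
#   status:[<n>] meets grpc error, error_code:<code> error_message:<text> error_details:
GRPC_FATAL_RE = re.compile(
    r"\[\d+,(?P<rank>\d+)\]:F\d{4} [\d:.]+ +\d+ grpc_client\.cc:\d+\] "
    r"(?P<rpc>\w+) name:\[(?P<name>[^\]]*)\], ep:\[(?P<endpoint>[^\]]*)\], "
    r"status:\[-?\d+\] meets grpc error, error_code:(?P<code>\d+) "
    r"error_message:(?P<message>.*?) error_details:"
)


def failures_by_endpoint(log_text):
    """Group failing ranks by (endpoint, gRPC error code, error message)."""
    grouped = defaultdict(list)
    for match in GRPC_FATAL_RE.finditer(log_text):
        key = (
            match.group("endpoint"),
            int(match.group("code")),
            match.group("message"),
        )
        grouped[key].append(int(match.group("rank")))
    return {key: sorted(ranks) for key, ranks in grouped.items()}


# Two of the fatal lines from the excerpt.
SAMPLE_TRAIN_LOG = (
    "Sun May 5 01:18:43 2019[1,12]:F0505 01:18:43.651557 33534 grpc_client.cc:418] "
    "BatchBarrierRPC name:[BATCH_BARRIER@RECV], ep:[10.182.65.144:62004], status:[-1] "
    "meets grpc error, error_code:14 error_message:Socket closed error_details:\n"
    "Sun May 5 01:18:43 2019[1,15]:F0505 01:18:43.666119 45154 grpc_client.cc:418] "
    "BatchBarrierRPC name:[BATCH_BARRIER@RECV], ep:[10.182.65.144:62004], status:[-1] "
    "meets grpc error, error_code:14 error_message:Socket closed error_details:\n"
)

if __name__ == "__main__":
    for (endpoint, code, message), ranks in failures_by_endpoint(SAMPLE_TRAIN_LOG).items():
        print("%s code=%d (%s) ranks=%s" % (endpoint, code, message, ranks))
```

In the excerpt, every fatal line names the same endpoint, 10.182.65.144:62004, so checking the log of the pserver at that address may be a reasonable next step.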

Reference: paddlepaddle/Paddle#17209