Opened Jul 31, 2017 by saxon_zh (@saxon_zh, Guest)

Training fails after configuring dropout in the network

Created by: copytang

Network structure: a two-tower architecture. Each tower is laid out as follows: [256 relu + dropout=0.2] -> [128 relu + dropout=0.5] -> [64 relu + dropout=0.5]
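To make the layout of one tower concrete, here is a minimal, framework-free sketch. It is not the configuration used in this issue: the input width, the weight initialization, and all function names are assumptions made for illustration, and the dropout shown is the common "inverted" formulation, which may differ from how PaddlePaddle applies its drop rate. How the two towers are combined is not stated in the issue, so it is left out.

```python
import random

# (units, dropout rate) per layer, taken from the issue text.
LAYERS = [(256, 0.2), (128, 0.5), (64, 0.5)]


def dense_relu(x, weights, biases):
    """Fully connected layer followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def dropout(x, rate, training, rng):
    """Inverted dropout: zero each unit with probability `rate` while
    training and rescale the survivors; pass through unchanged otherwise."""
    if not training or rate == 0.0:
        return list(x)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def init_tower(input_dim, rng):
    """Create small random weights for one tower (hypothetical init)."""
    params = []
    fan_in = input_dim
    for units, _ in LAYERS:
        w = [[rng.uniform(-0.1, 0.1) for _ in range(fan_in)]
             for _ in range(units)]
        params.append((w, [0.0] * units))
        fan_in = units
    return params


def tower_forward(x, params, training, rng):
    """Run one input vector through the three relu + dropout layers."""
    for (w, b), (_, rate) in zip(params, LAYERS):
        x = dropout(dense_relu(x, w, b), rate, training, rng)
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    params = init_tower(input_dim=8, rng=rng)  # input width is an assumption
    out = tower_forward([1.0] * 8, params, training=True, rng=rng)
    print(len(out))
```

With `training=False` the dropout step is a no-op, so repeated calls on the same input return the same 64 values.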

The standalone version runs normally, but the cluster version fails with the following error output:

Mon Jul 31 18:03:13 2017[1,9]:101 1098 SocketChannel.cpp:: ck failed: len > 0 pe] Cheken piped: len > 0 peer=70.87.87.2731 018:vCnt=4 iovs[curIov].base=0x7ffa73ffec40:13.844094 1070 SocketChannel.cpp:: 101Broken pipe [Check failed: len > 0 peer=10.87.138.24 curIov=0 iovCnt=4
Mon Jul 31 18:03:13 2017[1,9]:iovs[curIov].base=0x7ffae3ffe iovs[curIov].iov_len= curIov=0 iovCnt=4 iovs[curIov].base=0x7ffbc2bfcc40 iovs[curIov].iov_len=16: Broken pipe [32]: Broken pipe [32]
F0731 18:03:13.844498 1067 SocketChannel.cpp:101] Check failed: len > 0 peer=10.87.138.25 curIov=0 iovCnt=4 iovs[curIov].base=0x7ffafd7fac40 iovs[curIov].iov_len=16: Broken pipe [32]
F0731 18:03:13.844630 1065 SocketChannel.cpp:101] Check failed: len > 0 peer=10.87.138.26 curIov=0 iovCnt=4 iovs[curIov].base=0x7ffafebfcc40 iovs[curIov].iov_len=16: Broken pipe [32]
F0731 18:03:13.844714 1076 SocketChannel.cpp:101] Check failed: len > 0 peer=10.87.138.19 curIov=0 iovCnt=4 iovs[curIov].base=0x7ffac7ffec40 iovs[curIov].iov_len=16: Broken pipe [32]
F0731 18:03:13.844947 1061 SocketChannel.cpp:101] Check failed: len > 0 peer=10.87.138.28 curIov=0 iovCnt=4 iovs[curIov].base=0x7ffb197fac40 iovs[curIov].iov_len=16: Broken pipe [32]
F0731 18:03:13.842777 1028 SocketChannel.cpp:101] Check failed: len > 0 peer=10.87.87.13 curIov=0 iovCnt=4 iovs[curIov].base=0x7ffba6bfcc40 iovs[curIov].iov_len=16: Broken pipe [32]
F0731 18:03:13.845197 1019 SocketChannel.cpp:101] Check failed: len > 0 peer=10.87.87.17 curIov=0 iovCnt=4 iovs[curIov].base=0x7ffbc3ffec40 iovs[curIov].iov_len=16: Broken pipe [32]
F0731 18:03:13.845266 1063 SocketChannel.cpp:101] Check failed: len > 0 peer=10.87.138.27 curIov=0 iovCnt=4 iovs[curIov].base=0x7ffaffffec40 iovs[curIov].iov_len=16: Broken pipe [32]
F0731 18:03:13.845346 1030 SocketChannel.cpp:101] Check failed: len > 0 peer=10.87.87.14 curIov=0 iovCnt=4 iovs[curIov].base=0x7ffba57fac40 iovs[curIov].iov_len=16: Broken pipe [32]
F0731 18:03:13.845851 1102 SocketChannel.cpp:101] Check failed: len > 0 peer=10.87.87.11 curIov=0 iovCnt=4 iovs[curIov].base=0x7ffa717fac40 iovs[curIov].iov_len=16: Broken pipe [32]
F0731 18:03:13
Mon Jul 31 18:03:13 2017[1,9]:.845903 1057 SocketChannel.cpp:101] Check failed: len > 0 peer=10.87.138.20 curIov=0 iovCnt=4 iovs[curIov].base=0x7ffb1bffec40 iovs[curIov].iov_len=16: Broken pipe [32]
F0731 18:03:13.846002 1072 SocketChannel.cpp:101] Check failed: len > 0 peer=10.87.138.23 curIov=0 iovCnt=4 iovs[curIov].base=0x7ffae2bfcc40 iovs[curIov].iov_len=16: Broken pipe [32]
F0731 18:03:13.843533 1034 SocketChannel.cpp:101] Check failed: len > 0 peer=10.87.87.45 curIov=0 iovCnt=5 iovs[curIov].base=0x7ffb82bfcc40 iovs[curIov].iov_len=16: Broken pipe [32]
Mon Jul 31 18:03:13 2017[1,9]: [32]
F0731 18:03:13.848150 1055 SocketChannel.cpp:101] Check failed: len > 0 peer=10.87.138.21 curIov=0 iovCnt=4 iovs[curIov].base=0x7ffb357fac40 iovs[curIov].iov_len=16: Broken pipe [32]
F0731 18:03:13.848290 1087 SocketChannel.cpp:101] Check failed: len > 0 peer=10.87.87.25 curIov=0 iovCnt=4 iovs[curIov].base=0x7ffaa97fac40 iovs[curIov].iov_len=16F: 31 18:03 [32.848299 1081 SocketChannel.cpp:101] Check failed: len > 0 peer=10.87.138.32 curIov=0 iovCnt=4 iovs[curIov].base=0x7ffac57fac40 iovs[curIov].iov_len=16: Broken pipe [32]
F0731 18:03:13.848317 1094 SocketChannel.cpp:101] Check failed: len > 0 peer=10.87.87.24 curIov=0 iovCnt=4 iovs[curIov].base=0x7ffa8d7fac40 iovs[curIov].iov_len=16: Broken pipe [32]
F0731 18:03:13.848337 1089 SocketChannel.cpp:101] Check failed: len > 0 peer=10.87.87.26 curIov=0 iovCnt=4 iovs[curIov].base=0x7ffa8fffec40 iovs[curIov].iov_len=16: Broken pipe [32]
F0731 18:03:13.848343 1100 SocketChannel.cpp:101] Check failed: len > 0 peer=10.87.87.28 curIov=0 iovCnt=4 iovs[curIov].base=0x7ffa72bfcc40 iovs[curIov].iov_len=16: Broken pipe [32]
F0731 18:03:13.848388 1083 SocketChannel.cpp:101] Check failed: len > 0 peer=10.87.87.21 curIov=0 iovCnt=4 iovs[curIov].base=0x7ffaabffec40 iovs[curIov].iov_len=16: Broken pipe [32]
F0731 18:03:13.848433 1078 SocketChannel.cpp:101] Check failed: len > 0 peer=10.87.138.30 curIov=0 iovCnt=4 iovs[curIov].base=0x7ffac6bfcc40 iovs[curIov].iov_len=16F0731 18:03:: Broken pipe84843832 1085 SocketChannel.cpp
Mon Jul 31 18:03:13 2017[1,9]:101] Check failed: len > 0 peer=10.87.87.20 curIov=0 iovCnt=4 iovs[curIov].base=0x7ffaaabfcc40 iovs[curIov].iov_len=16: Broken pipe [32]
Mon Jul 31 18:03:13 2017[1,9]:*** Check failure stack trace: ***
Mon Jul 31 18:03:13 2017[1,9]: @ 0x91316d google::LogMessage::Fail()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x91316d google::LogMessage::Fail()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x91316d google::LogMessage::Fail()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x91316d google::LogMessage::Fail()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x912c93 google::LogMessage::Flush()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x916c1c google::LogMessage::SendToLog()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x916c1c google::LogMessage::SendToLog()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x76b112 paddle::ProtoClient::recv()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x91316d google::LogMessage::Fail()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x91316d google::LogMessage::Fail()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x91316d google::LogMessage::Fail()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x91316d google::LogMessage::Fail()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x91316d google::LogMessage::Fail()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x91316d google::LogMessage::Fail()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x91316d google::LogMessage::Fail()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x91316d google::LogMessage::Fail()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x91316d google::LogMessage::Fail()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x916c1c google::LogMessage::SendToLog()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x916c1c google::LogMessage::SendToLog()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x912c93 google::LogMessage::Flush()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x916c1c google::LogMessage::SendToLog()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x916c1c google::LogMessage::SendToLog()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x916c1c google::LogMessage::SendToLog()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x91316d google::LogMessage::Fail()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x916c1c google::LogMessage::SendToLog()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x916c1c google::LogMessage::SendToLog()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x912e99 google::LogMessage::~LogMessage()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x912c93 google::LogMessage::Flush()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x916c1c google::LogMessage::SendToLog()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x912c93 google::LogMessage::Flush()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x916c1c google::LogMessage::SendToLog()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x916c1c google::LogMessage::SendToLog()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x91316d google::LogMessage::Fail()
Mon Jul 31 18:03:13 2017[1,9]: @ 0xf0b668 paddle::ParameterClient2::recv()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x91316d google::LogMessage::Fail()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x91316d google::LogMessage::Fail()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x912c93 google::LogMessage::Flush()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x912e99 google::LogMessage::~LogMessage()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x916147 google::ErrnoLogMessage::~ErrnoLogMessage()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x769b9c paddle::readwritev<>()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x912c93 google::LogMessage::Flush()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x912c93 google::LogMessage::Flush()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x912e99 google::LogMessage::~LogMessage()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x916c1c google::LogMessage::SendToLog()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x916147 google::ErrnoLogMessage::~ErrnoLogMessage()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x769b9c paddle::readwritev<>()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x7ffe0aa7b8a0 execute_native_thread_routine
Mon Jul 31 18:03:13 2017[1,9]: @ 0x912e99 google::LogMessage::~LogMessage()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x7ffe0aeda1c3 start_thread
Mon Jul 31 18:03:13 2017[1,9]: @ 0x916c1c google::LogMessage::SendToLog()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x916c1c google::LogMessage::SendToLog()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x912c93 google::LogMessage::Flush()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x7ffe0a1ec12d __clone
Mon Jul 31 18:03:13 2017[1,9]: @ 0x912e99 google::LogMessage::~LogMessage()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x0 (unknown)
Mon Jul 31 18:03:13 2017[1,9]: @ 0x76abad paddle::SocketChannel::writeMessage()
Mon Jul 31 18:03:13 2017[1,9]: @ 0x916c1c google::LogMessage::SendToLog()
Mon Jul 31 18:03:14 2017[1,29]:./train.sh: line 207: 9019 Segmentation fault PYTHONPATH=./paddle:$PYTHONPATH GLOG_logtostderr=0 GLOG_log_dir="./log" ./paddle_trainer --num_gradient_servers=${OMPI_COMM_WORLD_SIZE} --trainer_id=${OMPI_COMM_WORLD_RANK} --pservers=$ipstring --rdma_tcp=${rdma_tcp} --nics=${nics} ${train_arg} --config=conf/trainer_config.conf --save_dir=./${save_dir} ${extern_arg}
Mon Jul 31 18:03:14 2017[1,29]:+ '[' 139 -ne 0 ']'

Job link: http://10.87.87.21:8920/fileview.html?path=/home/disk1/normandy/maybach/41383/
train.log link: http://10.87.87.21:8920/filetree?action=cat&path=/home/disk1/normandy/maybach/41383/workspace/log/train.log
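Because the records above are interleaved, it can be hard to see at a glance which peers the trainer lost its connection to. The following is a small sketch of one way to list the peer addresses mentioned in such a log. The function names are hypothetical and the script is only an aid for reading the output, not a diagnosis of the failure. The sample text consists of two records and one garbled fragment copied from the log above.

```python
import re
from collections import Counter

# An IPv4-looking address after "peer=", not followed by another digit, so
# garbled fragments such as "peer=70.87.87.2731" are skipped.
PEER_RE = re.compile(r"peer=(\d{1,3}(?:\.\d{1,3}){3})(?!\d)")


def count_peers(log_text):
    """Count how often each well-formed 'peer=<ip>' appears in the text."""
    return Counter(PEER_RE.findall(log_text))


def by_subnet(peer_counts):
    """Group peer addresses by their first three octets."""
    groups = {}
    for ip in sorted(peer_counts):
        groups.setdefault(ip.rsplit(".", 1)[0], []).append(ip)
    return groups


if __name__ == "__main__":
    sample = (
        "F0731 18:03:13.844498 1067 SocketChannel.cpp:101] Check failed: "
        "len > 0 peer=10.87.138.25 curIov=0 iovCnt=4 "
        "iovs[curIov].base=0x7ffafd7fac40 iovs[curIov].iov_len=16: "
        "Broken pipe [32]\n"
        "F0731 18:03:13.842777 1028 SocketChannel.cpp:101] Check failed: "
        "len > 0 peer=10.87.87.13 curIov=0 iovCnt=4 "
        "iovs[curIov].base=0x7ffba6bfcc40 iovs[curIov].iov_len=16: "
        "Broken pipe [32]\n"
        "ck failed: len > 0 pe] Cheken piped: len > 0 peer=70.87.87.2731 018\n"
    )
    for subnet, ips in by_subnet(count_peers(sample)).items():
        print(subnet, ips)
```

Feeding the full train.log to `count_peers` instead of the sample would show whether every failed write targets a distinct peer or whether a few peers account for most of the failures.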

Reference: paddlepaddle/Paddle#3123