Training a Paddle v2 model on Paddle Cloud fails partway through an otherwise normal run with SocketChannel.cpp:101] Check failed: len > 0 peer=10.73.70.41 curIov=54 iovCnt=76 iovs[curIov].base=0x7fd5185d7519 iovs[curIov].iov_len=273191
Created by: huashaosmile
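The model runs without problems on a single machine and as a single-node job. With multiple nodes, the following error has appeared several times in the middle of otherwise normal training: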
[INFO 2018-12-10 15:57:54,447 reader.py:111] start train reader from ./train_data
F1210 17:08:10.563592 20691 SocketChannel.cpp:101] Check failed: len > 0 peer=10.73.70.41 curIov=54 iovCnt=76 iovs[curIov].base=0x7fd5185d7519 iovs[curIov].iov_len=273191
*** Check failure stack trace: ***
@ 0x7fd68d46e1bd google::LogMessage::Fail()
@ 0x7fd68d471c6c google::LogMessage::SendToLog()
@ 0x7fd68d46dce3 google::LogMessage::Flush()
@ 0x7fd68d47317e google::LogMessageFatal::~LogMessageFatal()
@ 0x7fd68d2d4851 paddle::readwritev<>()
@ 0x7fd68d2d573d paddle::SocketChannel::writeMessage()
@ 0x7fd68d2d657c paddle::ProtoClient::send()
@ 0x7fd68d56c5ff paddle::ParameterClient2::sendParallel()
@ 0x7fd68d3daaac _ZNSt6thread5_ImplISt12_Bind_simpleIFZN6paddle14SyncThreadPool5startEvEUliE_mEEE6_M_runEv
@ 0x7fd67a2688a0 execute_native_thread_routine
@ 0x7fd6d62ed1c3 start_thread
@ 0x7fd6d591512d __clone
@ (nil) (unknown)
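For readers unfamiliar with this kind of check: the trace shows the failure inside paddle::readwritev<>() while ProtoClient::send() was writing to peer 10.73.70.41, and the failed condition is `len > 0` on the length returned by the write. The sketch below is not Paddle code and not a confirmed diagnosis. It only illustrates, under the assumption of a Unix-like system, one common way a socket write stops returning a positive length: the peer has already closed its end. The names `write_all` and `demo` are hypothetical.

```python
import socket


def write_all(sock, chunks):
    """Send every chunk in order, checking each write the way a
    'len > 0' assertion would. Hypothetical helper, not Paddle's API."""
    sent_total = 0
    for chunk in chunks:
        view = memoryview(chunk)
        while view:
            n = sock.send(view)  # may send fewer bytes than requested
            if n <= 0:
                raise ConnectionError("write returned %d" % n)
            view = view[n:]
            sent_total += n
    return sent_total


def demo():
    # A connected pair of local sockets stands in for trainer and peer.
    a, b = socket.socketpair()

    # While the peer is alive, all bytes are written.
    ok_bytes = write_all(a, [b"hello", b"world"])
    b.recv(64)

    # Assumption for illustration: the peer goes away mid-training.
    b.close()
    try:
        write_all(a, [b"x" * 1024])
        failed = False
    except OSError:
        # On Linux this is typically BrokenPipeError (EPIPE).
        failed = True
    finally:
        a.close()
    return ok_bytes, failed


if __name__ == "__main__":
    print(demo())
```

If something like this is what happened here, the log of the process at 10.73.70.41 around 17:08:10 would be the place to look for the reason it stopped accepting data.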