Distributed training fails with gRPC error: Socket closed
Created by: jacquesqiao
F0710 01:54:37.113919 3853 grpc_client.cc:248] var: name:[embedding_9.w_0.block0] ep:[10.88.130.33:30006] grpc error:Socket closed
*** Check failure stack trace: ***
@ 0x7fd55cf26ebd google::LogMessage::Fail()
@ 0x7fd55cf2a96c google::LogMessage::SendToLog()
@ 0x7fd55cf269e3 google::LogMessage::Flush()
@ 0x7fd55cf2be7e google::LogMessageFatal::~LogMessageFatal()
@ 0x7fd55b6be988 paddle::operators::distributed::GRPCClient::Proceed()
@ 0x7fd5701f3470 (unknown)
@ 0x7fd5f6f44851 start_thread
@ 0x7fd5f660790d clone
@ (nil) (unknown)
/root/paddlejob/paddle_k8s: line 117: 3736 Aborted (core dumped) stdbuf -oL ${START_CMD}
[/root/paddlejob/paddle_k8s : 180] [start_trainer]
[FATAL]: execute user cmd failed
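
A first diagnostic step for a log like the one above is to check whether the endpoint named in the fatal line still accepts connections from the trainer pod. The following is a minimal sketch, not part of Paddle: `parse_endpoint` and `is_reachable` are hypothetical helper names, and the regular expression assumes the `ep:[host:port]` format seen in this log. A successful TCP connect only shows that something is listening on the port; it does not prove the parameter server process is healthy.

```python
import re
import socket

# Assumption: the endpoint appears as "ep:[host:port]", as in the log above.
ENDPOINT_RE = re.compile(r"ep:\[([^\]:]+):(\d+)\]")


def parse_endpoint(log_line):
    """Return (host, port) from a log line, or None if no endpoint is found."""
    match = ENDPOINT_RE.search(log_line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def is_reachable(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    line = ("F0710 01:54:37.113919 3853 grpc_client.cc:248] var: "
            "name:[embedding_9.w_0.block0] ep:[10.88.130.33:30006] "
            "grpc error:Socket closed")
    endpoint = parse_endpoint(line)
    if endpoint is not None:
        host, port = endpoint
        print(host, port, "reachable" if is_reachable(host, port) else "unreachable")
```

If the endpoint turns out to be unreachable, one possible explanation is that the process on that side exited or was restarted, so the logs of the pod behind `10.88.130.33:30006` would be the next place to look.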