Distributed text_classification run fails with: proc param error:name:[fc_0.w_0@GRAD.block0.trainer_3] ep:[192.168.16.28:30256] grpc error:Connect Failed
Created by: typhoonzero
Background: running distributed vgg16 works fine, but the model at https://github.com/typhoonzero/fluid_gpu_benchmark/blob/master/text_fluid.py produces the following errors the first time the trainer tries to send variables:
E0328 11:59:35.739938 211 grpc_client.cc:189] proc param error:name:[fc_0.w_0@GRAD.block3.trainer_3] ep:[192.168.16.27:30256] grpc error:Connect Failed
E0328 11:59:35.739984 208 grpc_client.cc:189] proc param error:name:[fc_0.b_0@GRAD.trainer_3] ep:[192.168.16.27:30256] grpc error:Connect Failed
E0328 11:59:35.740000 203 grpc_client.cc:189] proc param error:name:[sequence_conv_0.w_0@GRAD.block0.trainer_3] ep:[192.168.16.27:30256] grpc error:Connect Failed
E0328 11:59:35.740012 204 grpc_client.cc:189] proc param error:name:[embedding_0.w_0@GRAD.block0.trainer_3] ep:[192.168.16.27:30256] grpc error:Connect Failed
E0328 11:59:35.740399 202 grpc_client.cc:189] proc param error:name:[fc_1.b_0@GRAD.trainer_3] ep:[192.168.16.28:30256] grpc error:Connect Failed
E0328 11:59:35.740417 198 grpc_client.cc:189] proc param error:name:[sequence_conv_0.w_0@GRAD.block1.trainer_3] ep:[192.168.16.28:30256] grpc error:Connect Failed
E0328 11:59:35.740432 209 grpc_client.cc:189] proc param error:name:[embedding_0.w_0@GRAD.block1.trainer_3] ep:[192.168.16.28:30256] grpc error:Connect Failed
E0328 11:59:35.740423 207 grpc_client.cc:189] proc param error:name:[fc_0.w_0@GRAD.block0.trainer_3] ep:[192.168.16.28:30256] grpc error:Connect Failed
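
To narrow this down, it may help to group the failing variables by endpoint and then check whether each endpoint accepts a plain TCP connection from the trainer host. The following is a minimal sketch, not part of the benchmark repo: the helper names `summarize_failures` and `can_connect` are hypothetical, and the regex assumes the log lines have the same shape as the ones quoted above.

```python
import re
import socket
from collections import defaultdict

# Matches the grpc_client.cc error lines quoted above (assumed format).
LINE_RE = re.compile(
    r"proc param error:name:\[(?P<name>[^\]]+)\]\s+"
    r"ep:\[(?P<ep>[^\]]+)\]\s+grpc error:(?P<err>.+)$"
)


def summarize_failures(log_text):
    """Return {endpoint: [variable names that failed to send]}."""
    failures = defaultdict(list)
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m:
            failures[m.group("ep")].append(m.group("name"))
    return dict(failures)


def can_connect(endpoint, timeout=3.0):
    """Try a plain TCP connect to 'host:port'; True if it succeeds.

    This only checks that the port is reachable, not that a gRPC
    server is actually serving on it.
    """
    host, _, port = endpoint.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `summarize_failures` on the trainer log and then `can_connect(ep)` for each key, from inside the trainer container, would show whether both 192.168.16.27:30256 and 192.168.16.28:30256 are unreachable at the TCP level or whether the failure is specific to gRPC.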