Model initialization problem in distributed training
Created by: zhaoyang1708
I am submitting training jobs through paddle cloud. Training runs normally when no initial model is used, but every run fails when I initialize the model from the path below:
init_model_path=/app/ecom/native-ad/zhaoyang29/reward_video_nnq/xcxxs/fluidmodel/output/ab28ea5e-cb8b-50b6-9503-b447b99aabed/job-0bb5d31881d04ac9/output/rank-00000/pass-9/
Error:
F0731 03:18:58.043956 37734 grpc_client.cc:418] GetRPC name:[sequence_conv_4.b_0], ep:[10.182.76.151:62004], status:[-1] meets grpc error, error_code:14 error_message:Socket closed error_details:
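For anyone triaging this, here is a minimal sketch that pulls the variable name, endpoint, and gRPC error code out of a failure line like the one quoted above. The regular expression is inferred from that single log line only, and `parse_grpc_failure` is a hypothetical helper, not part of Paddle. In gRPC's standard status codes, 14 is UNAVAILABLE, which together with "Socket closed" suggests the process at that endpoint dropped the connection, so the logs of the node at the reported endpoint are probably the next place to look.

```python
import re

# Field layout inferred from the one GetRPC failure line in this report;
# other Paddle versions may format the message differently.
GRPC_FAILURE = re.compile(
    r"GetRPC name:\[(?P<var>[^\]]*)\], "
    r"ep:\[(?P<host>[^:\]]+):(?P<port>\d+)\], "
    r"status:\[(?P<status>-?\d+)\] meets grpc error, "
    r"error_code:(?P<code>\d+) "
    r"error_message:(?P<message>.*?) error_details:"
)

# Only the code seen in this report is mapped (standard gRPC status code).
GRPC_CODE_NAMES = {14: "UNAVAILABLE"}


def parse_grpc_failure(line):
    """Return the fields of a GetRPC failure line, or None if it does not match."""
    match = GRPC_FAILURE.search(line)
    if match is None:
        return None
    code = int(match.group("code"))
    return {
        "variable": match.group("var"),
        "host": match.group("host"),
        "port": int(match.group("port")),
        "status": int(match.group("status")),
        "code": code,
        "code_name": GRPC_CODE_NAMES.get(code, "UNKNOWN_TO_THIS_SCRIPT"),
        "message": match.group("message"),
    }
```

Running it over the full trainer log and grouping the results by `host` and `port` would show whether all failures point at one endpoint or at several.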