Model training reports a gRPC error after running for a while
Created by: AWSWXY
After training for a while, the model reports a gRPC error, and the problem reproduces reliably.
2019-06-16 16:36:14,803 - INFO - TRAIN --> pass: 0 batch: 100 loss: 0.61961340332 avg_loss :0.646084533691 avg_loss_100 :0.646084533691 auc: 0.525004341001, auc_100: 0.519851564739 ,batch_auc: 0.565066064668
2019-06-16 17:28:30,594 - INFO - TRAIN --> pass: 0 batch: 200 loss: 0.643115722656 avg_loss :0.633750671387 avg_loss_100 :0.621293518066 auc: 0.564058411824, auc_100: 0.595861241349 ,batch_auc: 0.610522291422
2019-06-16 18:22:15,130 - INFO - TRAIN --> pass: 0 batch: 300 loss: 0.586580200195 avg_loss :0.627978393555 avg_loss_100 :0.616376159668 auc: 0.580829420973, auc_100: 0.609009897329 ,batch_auc: 0.623971018562
2019-06-16 19:13:52,029 - INFO - TRAIN --> pass: 0 batch: 400 loss: 0.614757080078 avg_loss :0.625462036133 avg_loss_100 :0.617887817383 auc: 0.588551356393, auc_100: 0.61169839673 ,batch_auc: 0.62135427562
F0616 19:36:56.932013 26878 grpc_client.cc:357] GetRPC name:[embedding_0.w_0.block39], ep:[10.87.137.17:62004], status:[-1] meets grpc error, error_code:14 error_message:Socket closed error_details:
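
For anyone triaging this: in gRPC's standard status code list, error_code 14 is UNAVAILABLE. Together with "Socket closed", this usually means the remote side of the connection went away. The endpoint in the log is presumably the parameter server holding that embedding block, so its own log is worth checking around 19:36:56. The snippet below is only a rough sketch for pulling the variable name, endpoint, and status name out of the fatal line. The helper `parse_grpc_failure`, the regex, and the `GRPC_STATUS_NAMES` table are illustrative and written for this example, not part of the framework; the regex assumes the exact line layout shown above.

```python
import re

# Standard gRPC status codes (numeric value -> name).
GRPC_STATUS_NAMES = {
    0: "OK", 1: "CANCELLED", 2: "UNKNOWN", 3: "INVALID_ARGUMENT",
    4: "DEADLINE_EXCEEDED", 5: "NOT_FOUND", 6: "ALREADY_EXISTS",
    7: "PERMISSION_DENIED", 8: "RESOURCE_EXHAUSTED", 9: "FAILED_PRECONDITION",
    10: "ABORTED", 11: "OUT_OF_RANGE", 12: "UNIMPLEMENTED", 13: "INTERNAL",
    14: "UNAVAILABLE", 15: "DATA_LOSS", 16: "UNAUTHENTICATED",
}

# Hypothetical pattern matching the fatal line format seen in this report.
FATAL_RE = re.compile(
    r"name:\[(?P<name>[^\]]*)\], ep:\[(?P<ep>[^\]]*)\], status:\[(?P<status>-?\d+)\]"
    r".*?error_code:(?P<code>\d+) error_message:(?P<message>.*?) error_details:"
)


def parse_grpc_failure(line):
    """Return the fields of a grpc_client fatal line, or None if it does not match."""
    match = FATAL_RE.search(line)
    if match is None:
        return None
    code = int(match.group("code"))
    return {
        "variable": match.group("name"),
        "endpoint": match.group("ep"),
        "code": code,
        "code_name": GRPC_STATUS_NAMES.get(code, "UNRECOGNIZED"),
        "message": match.group("message"),
    }


# The fatal line copied from the log in this report.
fatal_line = (
    "F0616 19:36:56.932013 26878 grpc_client.cc:357] GetRPC "
    "name:[embedding_0.w_0.block39], ep:[10.87.137.17:62004], status:[-1] "
    "meets grpc error, error_code:14 error_message:Socket closed error_details:"
)
info = parse_grpc_failure(fatal_line)
print(info)
```

Running the same function over the trainer logs from several failed runs would show whether the failure always points at the same endpoint or at different ones, which helps narrow down whether a single node is at fault.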