Fault tolerant distributed training, just work version, with etcd (#2849)
* using etcd as fault tolerant training
* update
* workable version, ft not tested
* small fix
* update
* remove TODO
Forked from PaddlePaddle / Paddle