Created by: danleifeng
PR types
Bug fixes
PR changes
APIs
Describe
Users can optionally specify worker ports when using the fleetrun command. Each entry in --workers may be either a bare IP or an ip:port pair.
How to use:
CPU cluster training:
# 2 servers 4 workers
fleetrun --servers="xx.xx.xx.xx:6170,yy.yy.yy.yy:6171" --workers="xx.xx.xx.xx,xx.xx.xx.xx,yy.yy.yy.yy,yy.yy.yy.yy" train.py
CPU cluster training with the gloo backend:
# 2 servers 4 workers
fleetrun --servers="xx.xx.xx.xx:6170,yy.yy.yy.yy:6171" --workers="xx.xx.xx.xx:6172,xx.xx.xx.xx:6173,yy.yy.yy.yy:6174,yy.yy.yy.yy:6175" train.py
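The two --workers forms above differ only in whether each entry carries a port. As a rough illustration of how such a list could be interpreted, here is a minimal sketch. The helper name `parse_workers` is hypothetical and is not taken from the fleetrun source; it assumes IPv4 addresses or hostnames only, and it raises ValueError when a port is not an integer.

```python
from typing import List, Optional, Tuple


def parse_workers(workers: str) -> List[Tuple[str, Optional[int]]]:
    """Split a comma-separated worker list into (host, port) pairs.

    Hypothetical helper for illustration only, not fleetrun's own parser.
    The port is None when an entry is a bare IP without ":port".
    """
    parsed = []
    for entry in workers.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        host, sep, port = entry.partition(":")
        parsed.append((host, int(port) if sep else None))
    return parsed


# Bare IPs: ports are left unspecified (None).
print(parse_workers("xx.xx.xx.xx,yy.yy.yy.yy"))
# Explicit ports, as in the gloo example.
print(parse_workers("xx.xx.xx.xx:6172,yy.yy.yy.yy:6174"))
```

Returning None for a missing port keeps the two cases distinguishable, so a caller can decide separately how to handle entries without a port.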