Created by: Yancey1989
Fixed #9161 (closed) Fixed #10969 (closed)
Experiment: VGG16 on the flowers dataset, P40 GPUs, 2 pservers + 2 trainers.
| branch | GPUs per trainer | executor | performance (imgs/s) |
|---|---|---|---|
| develop | 1 | default executor | 12.946086 |
| overlap | 1 | default executor | 14.587641 |
| develop | 8 | parallel executor | 144.8 |
| overlap | 8 | parallel executor | 175 |
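Compared with develop, throughput on the overlap branch improves by about 12.7% with 1 GPU per trainer (default executor) and about 20.9% with 8 GPUs per trainer (parallel executor).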
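For reference, the relative improvement can be recomputed from the throughput numbers in the table. This is a minimal sketch; the helper name `speedup_pct` and the `results` layout are made up for illustration and are not part of this PR.

```python
# Recompute the relative throughput gain of the overlap branch over develop.
# `speedup_pct` is a hypothetical helper, not part of the PaddlePaddle code base.

def speedup_pct(baseline: float, improved: float) -> float:
    """Return the relative improvement of `improved` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return (improved / baseline - 1.0) * 100.0


# Throughput in imgs/s, copied from the table: (develop, overlap).
results = {
    "1 GPU per trainer, default executor": (12.946086, 14.587641),
    "8 GPUs per trainer, parallel executor": (144.8, 175.0),
}

if __name__ == "__main__":
    for setup, (develop, overlap) in results.items():
        print(f"{setup}: {speedup_pct(develop, overlap):.1f}% faster")
```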