# Performance for Distributed VGG16
## Test Results
### Hardware Information
- CPU: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
- CPU MHz: 2101.000
- Cache size: 20480 KB
### BLAS Settings
Set the environment variable `MKL_NUM_THREADS=1`.
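A minimal way to apply this, assuming the benchmark process is launched from a shell, is to export the variable beforehand:

```bash
# Restrict MKL BLAS to a single thread for this benchmark.
export MKL_NUM_THREADS=1
```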
### Single Node Single Thread
- Metrics: samples / sec
| Batch Size | 32 | 64 | 128 | 256 |
| -- | -- | -- | -- | -- |
| PaddlePaddle Fluid | 15.44 | 16.32 | 16.74 | 16.79 |
| PaddlePaddle v2 | 15.97 | 17.04 | 17.60 | 17.83 |
| TensorFlow | 9.09 | 9.10 | 9.24 | 8.66 |
### Different Batch Size
- PServer Count: 10
- Trainer Count: 20
- Metrics: samples / sec
| Batch Size | 32 | 64 | 128 | 256 |
| -- | -- | -- | -- | -- |
| PaddlePaddle Fluid | 190.20 | 222.15 | 247.40 | 258.18 |
| PaddlePaddle v2 | 170.96 | 233.71 | 256.14 | 329.23 |
| TensorFlow | - | - | - | - |
### Acceleration Rate
- PServer Count: 20
- Batch Size: 128
- Metrics: samples / sec
| Trainer Count | 20 | 40 | 80 | 100 |
| -- | -- | -- | -- | -- |
| PaddlePaddle Fluid | 263.29 (78.64%) | 518.80 (77.47%) | 836.26 (62.44%) | 1019.29 (60.89%) |
| PaddlePaddle v2 (need more tests) | 326.85 (92.85%) | 534.58 (75.93%) | 853.30 (60.60%) | 1041.99 (59.20%) |
| TensorFlow | - | - | - | - |
### Different PServer Count
- Trainer Count: 60
- Batch Size: 128
- Metrics: samples / sec
| PServer Count | 3 | 6 | 10 | 20 |
| -- | -- | -- | -- | -- |
| PaddlePaddle Fluid (should fix in next PR) | 589.1 | 592.6 | 656.4 | 655.8 |
| PaddlePaddle v2 (need more tests) | 593.4 | 791.3 | 729.7 | 821.7 |
| TensorFlow | - | - | - | - |
*The performance gap between Fluid and v2 comes from network interference.*
## Steps to Run the Performance Test
1. Re-compile PaddlePaddle with the `-DWITH_DISTRIBUTE` option enabled to build it with distributed support (a command sketch covering these steps follows the list).
1. When the build finishes, copy the output `whl` package located under `build/python/dist` to the current directory.
1. Run `docker build -t [image:tag] .` to build the Docker image, then run `docker push [image:tag]` to push the image to the repository so Kubernetes can find it.
1. Run `kubectl create -f pserver.yaml && kubectl create -f trainer.yaml` to start the job on your Kubernetes cluster (you must configure the `kubectl` client before this step).
1. Run `kubectl get po` to list the running pods, and run `kubectl logs [podID]` to fetch the logs of the pserver and trainer pods.
Check the logs for the distributed training progress and analyze the performance.
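The following is a rough command sketch of the steps above, assuming an out-of-source CMake build directory named `build` and that the distributed build is enabled via `-DWITH_DISTRIBUTE=ON`; `[image:tag]` and `[podID]` are placeholders, and the exact CMake options depend on your environment:

```bash
# 1. Re-compile PaddlePaddle with distributed support (other CMake options omitted).
mkdir -p build && cd build
cmake .. -DWITH_DISTRIBUTE=ON
make -j"$(nproc)"
cd ..

# 2. Copy the built wheel next to the Dockerfile used for the benchmark image.
cp build/python/dist/*.whl .

# 3. Build and push the Docker image so Kubernetes can pull it.
docker build -t [image:tag] .
docker push [image:tag]

# 4. Start the parameter servers and trainers on the configured cluster.
kubectl create -f pserver.yaml
kubectl create -f trainer.yaml

# 5. List the running pods and fetch logs from a pserver or trainer pod.
kubectl get po
kubectl logs [podID]
```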
## Enable Verbose Logs
Edit `pserver.yaml` and `trainer.yaml` and add the environment variables `GLOG_v=3` and `GLOG_logtostderr=1` to see what happened in detail.
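For example, the container spec in each manifest might gain an `env` section like the sketch below (standard Kubernetes syntax; where exactly it goes depends on how the existing YAML files are structured):

```yaml
# Inside the container spec of pserver.yaml / trainer.yaml (placement is illustrative).
env:
  - name: GLOG_v            # glog verbosity level
    value: "3"
  - name: GLOG_logtostderr  # send logs to stderr so `kubectl logs` can show them
    value: "1"
```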