# Performance for Distributed vgg16

## Test Result

### Hardware Information

- CPU: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
- cpu MHz: 2101.000
- cache size: 20480 KB

### BLAS Settings

Setting environment variable: `MKL_NUM_THREADS=1` (a shell sketch follows the first table below).

### Single Node Single Thread

- Metrics: samples / sec
| Batch Size | 32 | 64 | 128 | 256 |
| --- | --- | --- | --- | --- |
| PaddlePaddle Fluid | 15.44 | 16.32 | 16.74 | 16.79 |
| PaddlePaddle v2 | 15.97 | 17.04 | 17.60 | 17.83 |
| TensorFlow | 9.09 | 9.10 | 9.24 | 8.66 |
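For reference, a minimal sketch of how the BLAS setting above can be applied in the shell that launches the trainer process; only the environment variable itself comes from this document:

```bash
# Restrict MKL to a single thread, matching the BLAS settings used for
# the numbers reported here.
export MKL_NUM_THREADS=1
```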
### Different Batch Size

- PServer Count: 10
- Trainer Count: 20
- Metrics: samples / sec
| Batch Size | 32 | 64 | 128 | 256 |
| --- | --- | --- | --- | --- |
| PaddlePaddle Fluid | 190.20 | 222.15 | 247.40 | 258.18 |
| PaddlePaddle v2 | 170.96 | 233.71 | 256.14 | 329.23 |
| TensorFlow | - | - | - | - |
### Acceleration Rate

- PServer Count: 20
- Batch Size: 128
- Metrics: samples / sec (values in parentheses are the scaling efficiency relative to linear scaling of the single-node single-thread throughput)
| Trainer Count | 20 | 40 | 80 | 100 |
| --- | --- | --- | --- | --- |
| PaddlePaddle Fluid | 263.29 (78.64%) | 518.80 (77.47%) | 836.26 (62.44%) | 1019.29 (60.89%) |
| PaddlePaddle v2 (need more tests) | 326.85 (92.85%) | 534.58 (75.93%) | 853.30 (60.60%) | 1041.99 (59.20%) |
| TensorFlow | - | - | - | - |
### Different PServer Count

- Trainer Count: 60
- Batch Size: 128
- Metrics: samples / sec
| PServer Count | 3 | 6 | 10 | 20 |
| --- | --- | --- | --- | --- |
| PaddlePaddle Fluid (should fix in next PR) | 589.1 | 592.6 | 656.4 | 655.8 |
| PaddlePaddle v2 (need more tests) | 593.4 | 791.3 | 729.7 | 821.7 |
| TensorFlow | - | - | - | - |
*The performance gap between Fluid and v2 comes from network interference.*

## Steps to Run the Performance Test

1. Re-compile PaddlePaddle with `-DWITH_DISTRIBUTE` enabled to build it with distributed support.
1. When the build finishes, copy the output `whl` package located under `build/python/dist` to the current directory.
1. Run `docker build -t [image:tag] .` to build the Docker image, and run `docker push [image:tag]` to push the image to a repository so Kubernetes can find it.
1. Run `kubectl create -f pserver.yaml && kubectl create -f trainer.yaml` to start the job on your Kubernetes cluster (you must configure the `kubectl` client before this step).
1. Run `kubectl get po` to list the running pods, and run `kubectl logs [podID]` to fetch the logs of the pserver and trainer pods. Check the logs for the distributed training progress and analyze the performance.

A consolidated shell sketch of these steps is given at the end of this document.

## Enable Verbose Logs

Edit `pserver.yaml` and `trainer.yaml` and add the environment variables `GLOG_v=3` and `GLOG_logtostderr=1` to see what happened in detail.
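For convenience, here is a hedged shell sketch consolidating the steps above. The CMake invocation and build commands other than `-DWITH_DISTRIBUTE`, as well as the `*.whl` wildcard, are assumptions about a typical source build; `[image:tag]` and `[podID]` are the same placeholders used in the steps.

```bash
# 1. Re-compile PaddlePaddle with distributed support
#    (other CMake flags and the build layout are assumptions).
mkdir -p build && cd build
cmake .. -DWITH_DISTRIBUTE=ON
make -j"$(nproc)"

# 2. Copy the built whl package from build/python/dist to the benchmark directory.
cp python/dist/*.whl ../
cd ..

# 3. Build the Docker image and push it so Kubernetes can pull it.
docker build -t [image:tag] .
docker push [image:tag]

# Optional: for verbose logs, add GLOG_v=3 and GLOG_logtostderr=1 to the
# container environment in pserver.yaml and trainer.yaml before this step.

# 4. Start the pservers and trainers on the Kubernetes cluster
#    (kubectl must already be configured for the cluster).
kubectl create -f pserver.yaml && kubectl create -f trainer.yaml

# 5. List the running pods and fetch pserver / trainer logs.
kubectl get po
kubectl logs [podID]
```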