# Performance for Distributed VGG16

## Test Results

### Hardware Information

- CPU: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
- CPU MHz: 2101.000
- Cache size: 20480 KB

### Single Node Single Thread

- PServer Count: 10
- Trainer Count: 20
- Metrics: samples / sec

| Batch Size | 32 | 64 | 128 | 256 |
| -- | -- | -- | -- | -- |
| PaddlePaddle Fluid | 15.44 | 16.32 | 16.74 | 16.79 |
| PaddlePaddle v2 | 15.97 | 17.04 | 17.60 | 17.83 |
| TensorFlow | - | - | - | - |
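
The Fluid and v2 rows above are easier to compare as a ratio. The sketch below divides each Fluid throughput by the v2 throughput at the same batch size, using the numbers copied from the table; the helper name `relative_throughput` is illustrative only and is not part of PaddlePaddle.

```python
# Single-node, single-thread throughput (samples/sec) copied from the table above.
BATCH_SIZES = [32, 64, 128, 256]
FLUID = [15.44, 16.32, 16.74, 16.79]
V2 = [15.97, 17.04, 17.60, 17.83]


def relative_throughput(numerators, denominators, ndigits=3):
    """Element-wise ratio numerators[i] / denominators[i], rounded for display."""
    return [round(n / d, ndigits) for n, d in zip(numerators, denominators)]


if __name__ == "__main__":
    for bs, ratio in zip(BATCH_SIZES, relative_throughput(FLUID, V2)):
        print(f"batch {bs:>3}: Fluid runs at {ratio:.1%} of v2")
```

On these numbers, Fluid reaches roughly 94% to 97% of the v2 single-thread throughput.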

### Different Batch Sizes

- PServer Count: 10
- Trainer Count: 20
- Per trainer CPU Core: 1
- Metrics: samples / sec

| Batch Size | 32 | 64 | 128 | 256 |
| -- | -- | -- | -- | -- |
| PaddlePaddle Fluid | 190.20 | 222.15 | 247.40 | 258.18 |
| PaddlePaddle v2 | 170.96 | 233.71 | 256.14 | 329.23 |
| TensorFlow | - | - | - | - |
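
One way to read the table above is to normalize each row to its batch-size-32 value, which shows how much throughput grows as the batch size increases with the trainer and pserver counts held fixed. A minimal sketch with the numbers copied from the table; `gain_over_smallest_batch` is a hypothetical helper name.

```python
# 20 trainers, 10 pservers, 1 CPU core per trainer; samples/sec from the table above.
BATCH_SIZES = [32, 64, 128, 256]
THROUGHPUT = {
    "PaddlePaddle Fluid": [190.20, 222.15, 247.40, 258.18],
    "PaddlePaddle v2": [170.96, 233.71, 256.14, 329.23],
}


def gain_over_smallest_batch(row, ndigits=3):
    """Throughput at each batch size divided by the throughput at the smallest batch size."""
    base = row[0]
    return [round(x / base, ndigits) for x in row]


if __name__ == "__main__":
    for name, row in THROUGHPUT.items():
        print(name, dict(zip(BATCH_SIZES, gain_over_smallest_batch(row))))
```

With these numbers, going from batch size 32 to 256 raises Fluid throughput by about 36% and v2 throughput by about 93%.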

### Acceleration Rate

- PServer Count: 20
- Batch Size: 128
- Metrics: samples / sec

| Trainer Count | 20 | 40 | 80 | 100 |
| -- | -- | -- | -- | -- |
| PaddlePaddle Fluid | 263.29 | 518.80 | 836.26 | 1019.29 |
| PaddlePaddle v2 (needs more tests) | 326.85 | 534.58 | 853.30 | 1041.99 |
| TensorFlow | - | - | - | - |
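
The acceleration rate can be summarized as speedup and scaling efficiency. Because the table has no single-trainer measurement, the sketch below assumes the 20-trainer run as the baseline: speedup is throughput divided by the 20-trainer throughput, and efficiency is that speedup divided by the ideal linear speedup (trainer count / 20). Numbers are copied from the table; the function names are illustrative.

```python
# PServer count 20, batch size 128; samples/sec copied from the table above.
TRAINERS = [20, 40, 80, 100]
THROUGHPUT = {
    "PaddlePaddle Fluid": [263.29, 518.80, 836.26, 1019.29],
    "PaddlePaddle v2": [326.85, 534.58, 853.30, 1041.99],
}


def speedup(row):
    """Throughput relative to the first column (the 20-trainer baseline)."""
    return [x / row[0] for x in row]


def scaling_efficiency(row, trainers, ndigits=3):
    """Speedup divided by the ideal linear speedup trainers[i] / trainers[0]."""
    return [
        round(s / (n / trainers[0]), ndigits)
        for s, n in zip(speedup(row), trainers)
    ]


if __name__ == "__main__":
    for name, row in THROUGHPUT.items():
        print(name, dict(zip(TRAINERS, scaling_efficiency(row, TRAINERS))))
```

By this measure, Fluid keeps about 77% of linear scaling at 100 trainers and v2 about 64%. The v2 row is marked as needing more tests, and its higher 20-trainer baseline lowers its efficiency figures, so the comparison is provisional.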

### Different PServer Counts

- Trainer Count: 100
- Batch Size: 128
- Metrics: mini-batch / sec

| PServer Count | 10 | 20 | 40 | 60 |
| -- | -- | -- | -- | -- |
| PaddlePaddle Fluid | - | - | - | - |
| PaddlePaddle v2 | - | - | - | - |
| TensorFlow | - | - | - | - |


## Steps to run the performance test

1. Recompile PaddlePaddle with the CMake option `-DWITH_DISTRIBUTE=ON` to build it with distributed support.
1. When the build finishes, copy the generated `whl` package from `build/python/dist` to the current directory.
1. Run `docker build -t [image:tag] .` to build the Docker image, then run `docker push [image:tag]` to push the image to a registry from which Kubernetes can pull it.
1. Run `kubectl create -f pserver.yaml && kubectl create -f trainer.yaml` to start the job on your Kubernetes cluster (the `kubectl` client must be configured before this step).
1. Run `kubectl get po` to list the running pods, and run `kubectl logs [podID]` to fetch the logs of the pserver and trainer pods.

Check the logs for the distributed training progress and analyze the performance.
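
The build-and-deploy steps above can be scripted. The following is a minimal Python sketch, assuming `docker` and a configured `kubectl` are on the `PATH` and that steps 1 and 2 (building and copying the `whl` package) are already done. The image name is a hypothetical placeholder, the helper names are illustrative, and by default the script only prints the commands (dry run).

```python
import shlex
import subprocess


def deploy_commands(image):
    """Commands for steps 3 and 4 above, in the order they must run."""
    return [
        ["docker", "build", "-t", image, "."],
        ["docker", "push", image],
        ["kubectl", "create", "-f", "pserver.yaml"],
        ["kubectl", "create", "-f", "trainer.yaml"],
    ]


def log_command(pod_id):
    """Command for step 5: fetch the log of one pserver or trainer pod."""
    return ["kubectl", "logs", pod_id]


def run(commands, dry_run=True):
    """Print each command; when dry_run is False, also execute it and stop on the first failure."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # which mirrors the `&&` chaining in step 4.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical image name; replace it with your own registry/image:tag.
    run(deploy_commands("registry.example.com/paddle-vgg16:latest"))
```

Pass `dry_run=False` to actually execute the commands, then use `kubectl get po` to find the pod IDs to pass to `log_command`.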

## Enable Verbose Logs

Edit `pserver.yaml` and `trainer.yaml` to add the environment variable `GLOG_v=3`, then check the pod logs to see in detail what happened.