IntelOptimizedPaddle.md 3.5 KB
Newer Older
1 2 3 4
# Benchmark

Machine:

L
Luo Tao 已提交
5 6
- Server: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, 2 Sockets, 20 Cores per socket
- Laptop: TBD
7 8 9

System: CentOS release 6.3 (Final), Docker 1.12.1.

10 11
PaddlePaddle:
- paddlepaddle/paddle:0.11.0 (for MKLML and MKL-DNN)
L
Luo Tao 已提交
12 13
  - MKL-DNN tag v0.11
  - MKLML 2018.0.1.20171007
14
- paddlepaddle/paddle:0.11.0-openblas (for OpenBLAS)
L
Luo Tao 已提交
15
  - OpenBLAS v0.2.20
16 17 18 19 20 21
	 
On each machine, we will test and compare the performance of training on single node using MKL-DNN / MKLML / OpenBLAS respectively.

## Benchmark Model

### Server
T
tensor-tang 已提交
22 23

#### Training
T
tensor-tang 已提交
24
Test on batch size 64, 128, 256 on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
T
tensor-tang 已提交
25
Pay attetion that the speed below includes forward, backward and parameter update time. So we can not directly compare the data with the benchmark of caffe `time` [command](https://github.com/PaddlePaddle/Paddle/blob/develop/benchmark/caffe/image/run.sh#L9), which only contain forward and backward. The updating time of parameter would become very heavy when the weight size are large, especially on alexnet.
26 27 28 29 30 31 32

Input image size - 3 * 224 * 224, Time: images/second

- VGG-19

| BatchSize    | 64    | 128  | 256     |
|--------------|-------| -----| --------|
33 34 35
| OpenBLAS     | 7.80  | 9.00  | 10.80  | 
| MKLML        | 12.12 | 13.70 | 16.18  |
| MKL-DNN      | 28.46 | 29.83 | 30.44  |
36

L
Luo Tao 已提交
37
<img src="figs/vgg-cpu-train.png" width="500">
38

T
tensor-tang 已提交
39 40 41 42
 - ResNet-50

| BatchSize    | 64    | 128   | 256    |
|--------------|-------| ------| -------|
43 44 45
| OpenBLAS     | 25.22 | 25.68 | 27.12  | 
| MKLML        | 32.52 | 31.89 | 33.12  |
| MKL-DNN      | 81.69 | 82.35 | 84.08  |
T
tensor-tang 已提交
46

L
Luo Tao 已提交
47
<img src="figs/resnet-cpu-train.png" width="500">
T
tensor-tang 已提交
48

49 50
 - GoogLeNet

T
tensor-tang 已提交
51 52
| BatchSize    | 64    | 128   | 256    |
|--------------|-------| ------| -------|
T
Tao Luo 已提交
53 54 55
| OpenBLAS     | 89.52 | 96.97 | 108.25 | 
| MKLML        | 128.46| 137.89| 158.63 |
| MKL-DNN      | 250.46| 264.83| 269.50 |
T
tensor-tang 已提交
56

L
Luo Tao 已提交
57
<img src="figs/googlenet-cpu-train.png" width="500">
T
tensor-tang 已提交
58

59
- AlexNet
T
tensor-tang 已提交
60 61 62

| BatchSize    | 64     | 128    | 256    |
|--------------|--------| ------ | -------|
63
| OpenBLAS     | 45.62  | 72.79  | 107.22 | 
T
Tao Luo 已提交
64 65
| MKLML        | 66.37  | 105.60 | 144.04 |
| MKL-DNN      | 399.00 | 498.94 | 626.53 | 
T
tensor-tang 已提交
66 67 68

chart TBD

T
tensor-tang 已提交
69 70 71 72 73 74
#### Inference
Test on batch size 1, 2, 4, 8, 16 on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
- VGG-19

| BatchSize | 1     | 2     | 4     | 8     | 16    |
|-----------|-------|-------|-------|-------|-------|
75
| OpenBLAS  | 1.10  | 1.96  | 3.62  | 3.63  | 2.25  |
T
Tao Luo 已提交
76 77
| MKLML     | 5.58  | 9.80  | 15.15 | 21.21 | 28.67 |
| MKL-DNN   | 75.07 | 88.64 | 82.58 | 92.29 | 96.75 |
T
tensor-tang 已提交
78 79 80 81 82

- ResNet-50

| BatchSize | 1     | 2      | 4      | 8      | 16     |
|-----------|-------|--------|--------|--------|--------|
83
| OpenBLAS  | 3.31  | 6.72   | 11.59  | 13.17  | 9.27   |
T
Tao Luo 已提交
84 85
| MKLML     | 6.33  | 12.02  | 22.88  | 40.53  | 63.09  |
| MKL-DNN   | 107.83| 148.84 | 177.78 | 189.35 | 217.69 |
T
tensor-tang 已提交
86 87


T
Tao Luo 已提交
88
- GoogLeNet
T
tensor-tang 已提交
89 90 91

| BatchSize | 1      | 2      | 4      | 8      | 16     |
|-----------|--------|--------|--------|--------|--------|
92
| OpenBLAS  | 12.06  | 23.56  | 34.48  | 36.45  | 23.12  |
T
Tao Luo 已提交
93 94
| MKLML     | 22.74  | 41.56  | 81.22  | 133.47 | 210.53 |
| MKL-DNN   | 175.10 | 272.92 | 450.70 | 512.00 | 600.94 |
T
tensor-tang 已提交
95

96
- AlexNet
97 98 99

| BatchSize | 1      | 2      | 4      | 8      | 16     |
|-----------|--------|--------|--------|--------|--------|
100
| OpenBLAS  | 3.53   | 6.23   | 15.04  | 26.06  | 31.62  |
101 102 103 104
| MKLML     | 21.32  | 36.55  | 73.06  | 131.15 | 192.77 |
| MKL-DNN   | 442.91 | 656.41 | 719.10 | 847.68 | 850.51 |

chart TBD
T
tensor-tang 已提交
105

106 107
### Laptop
TBD