# Benchmark

Machine:

- Server: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, 2 Sockets, 20 Cores per socket
- Laptop: TBD

System: CentOS release 6.3 (Final), Docker 1.12.1.

PaddlePaddle:
- paddlepaddle/paddle:0.11.0 (for MKLML and MKL-DNN)
  - MKL-DNN tag v0.11
  - MKLML 2018.0.1.20171007
- paddlepaddle/paddle:0.11.0-openblas (for OpenBLAS)
  - OpenBLAS v0.2.20
	 
On each machine, we test and compare single-node training and inference performance using MKL-DNN, MKLML, and OpenBLAS respectively.

## Benchmark Model

### Server

#### Training
Tested with batch sizes 64, 128, and 256 on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz.
Note that the speeds below include forward, backward, and parameter update time, so they cannot be compared directly with the results of the Caffe `time` [command](https://github.com/PaddlePaddle/Paddle/blob/develop/benchmark/caffe/image/run.sh#L9), which covers only the forward and backward passes. The parameter update becomes very expensive when the weights are large, especially for AlexNet.

Input image size: 3 * 224 * 224. Throughput unit: images/second.
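
The source does not show how these numbers were collected. As a rough illustration only, an images/second figure that covers forward, backward, and parameter update can be obtained by timing whole training iterations. In the sketch below, `train_step` and `fake_train_step` are hypothetical placeholders, not PaddlePaddle APIs, and the warm-up and batch counts are arbitrary assumptions.

```python
import time


def measure_throughput(train_step, batch_size, num_batches=10, warmup=2):
    """Return images/second for full training iterations.

    `train_step` is a hypothetical callable that runs forward, backward,
    and the parameter update for one batch (an assumption, not a real API).
    """
    for _ in range(warmup):
        train_step(batch_size)  # warm-up iterations are excluded from timing

    start = time.perf_counter()
    for _ in range(num_batches):
        train_step(batch_size)
    elapsed = time.perf_counter() - start

    # Total images processed divided by wall-clock seconds.
    return batch_size * num_batches / elapsed


def fake_train_step(batch_size):
    # Stand-in workload so the sketch runs without any framework installed.
    time.sleep(0.001)


if __name__ == "__main__":
    ips = measure_throughput(fake_train_step, batch_size=64)
    print(f"{ips:.2f} images/second")
```

Because the timer wraps the entire iteration, the update cost is part of the result, which is why such numbers are not comparable with a forward-plus-backward-only timing.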

- VGG-19

| BatchSize    | 64    | 128  | 256     |
|--------------|-------| -----| --------|
| OpenBLAS     | 7.80  | 9.00  | 10.80  | 
| MKLML        | 12.12 | 13.70 | 16.18  |
| MKL-DNN      | 28.46 | 29.83 | 30.44  |

<img src="figs/vgg-cpu-train.png" width="500">

- ResNet-50

| BatchSize    | 64    | 128   | 256    |
|--------------|-------| ------| -------|
| OpenBLAS     | 25.22 | 25.68 | 27.12  | 
| MKLML        | 32.52 | 31.89 | 33.12  |
| MKL-DNN      | 81.69 | 82.35 | 84.08  |

<img src="figs/resnet-cpu-train.png" width="500">

- GoogLeNet

| BatchSize    | 64    | 128   | 256    |
|--------------|-------| ------| -------|
| OpenBLAS     | 89.52 | 96.97 | 108.25 | 
| MKLML        | 128.46| 137.89| 158.63 |
| MKL-DNN      | 250.46| 264.83| 269.50 |

<img src="figs/googlenet-cpu-train.png" width="500">

- AlexNet

| BatchSize    | 64     | 128    | 256    |
|--------------|--------| ------ | -------|
| OpenBLAS     | 45.62  | 72.79  | 107.22 | 
| MKLML        | 66.37  | 105.60 | 144.04 |
| MKL-DNN      | 399.00 | 498.94 | 626.53 | 

<img src="figs/alexnet-cpu-train.png" width="500">
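
To make the training tables easier to compare, the relative speedup of one library over another can be computed directly from the reported values. The following is a minimal sketch; the numbers are copied from the tables above, while the `speedup` helper and the choice of OpenBLAS as the default baseline are illustrative assumptions.

```python
# Training throughput (images/second) copied from the tables above.
BATCH_SIZES = (64, 128, 256)
TRAINING = {
    "VGG-19": {
        "OpenBLAS": (7.80, 9.00, 10.80),
        "MKLML": (12.12, 13.70, 16.18),
        "MKL-DNN": (28.46, 29.83, 30.44),
    },
    "ResNet-50": {
        "OpenBLAS": (25.22, 25.68, 27.12),
        "MKLML": (32.52, 31.89, 33.12),
        "MKL-DNN": (81.69, 82.35, 84.08),
    },
    "GoogLeNet": {
        "OpenBLAS": (89.52, 96.97, 108.25),
        "MKLML": (128.46, 137.89, 158.63),
        "MKL-DNN": (250.46, 264.83, 269.50),
    },
    "AlexNet": {
        "OpenBLAS": (45.62, 72.79, 107.22),
        "MKLML": (66.37, 105.60, 144.04),
        "MKL-DNN": (399.00, 498.94, 626.53),
    },
}


def speedup(model, library, baseline="OpenBLAS"):
    """Return the per-batch-size throughput ratio library / baseline."""
    fast = TRAINING[model][library]
    base = TRAINING[model][baseline]
    return [round(f / b, 2) for f, b in zip(fast, base)]


if __name__ == "__main__":
    for model in TRAINING:
        ratios = speedup(model, "MKL-DNN")
        print(model, dict(zip(BATCH_SIZES, ratios)))
```

For VGG-19, for example, MKL-DNN over OpenBLAS works out to about 3.65x at batch size 64 and 2.82x at batch size 256.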

#### Inference
Tested with batch sizes 1, 2, 4, 8, and 16 on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz.
- VGG-19

| BatchSize | 1     | 2     | 4     | 8     | 16    |
|-----------|-------|-------|-------|-------|-------|
| OpenBLAS  | 1.10  | 1.96  | 3.62  | 3.63  | 2.25  |
| MKLML     | 5.58  | 9.80  | 15.15 | 21.21 | 28.67 |
| MKL-DNN   | 75.07 | 88.64 | 82.58 | 92.29 | 96.75 |

<img src="figs/vgg-cpu-infer.png" width="500">

- ResNet-50

| BatchSize | 1     | 2      | 4      | 8      | 16     |
|-----------|-------|--------|--------|--------|--------|
| OpenBLAS  | 3.31  | 6.72   | 11.59  | 13.17  | 9.27   |
| MKLML     | 6.33  | 12.02  | 22.88  | 40.53  | 63.09  |
| MKL-DNN   | 107.83| 148.84 | 177.78 | 189.35 | 217.69 |

<img src="figs/resnet-cpu-infer.png" width="500">

- GoogLeNet

| BatchSize | 1      | 2      | 4      | 8      | 16     |
|-----------|--------|--------|--------|--------|--------|
| OpenBLAS  | 12.06  | 23.56  | 34.48  | 36.45  | 23.12  |
| MKLML     | 22.74  | 41.56  | 81.22  | 133.47 | 210.53 |
| MKL-DNN   | 175.10 | 272.92 | 450.70 | 512.00 | 600.94 |

<img src="figs/googlenet-cpu-infer.png" width="500">

- AlexNet

| BatchSize | 1      | 2      | 4      | 8      | 16     |
|-----------|--------|--------|--------|--------|--------|
| OpenBLAS  | 3.53   | 6.23   | 15.04  | 26.06  | 31.62  |
| MKLML     | 21.32  | 36.55  | 73.06  | 131.15 | 192.77 |
| MKL-DNN   | 442.91 | 656.41 | 719.10 | 847.68 | 850.51 |

<img src="figs/alexnet-cpu-infer.png" width="500">
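
The inference tables also show how each library scales with batch size. One way to read them is to look up the batch size at which each library reports its highest value. The sketch below copies the values from the tables above; the `best_batch_size` helper is an illustrative assumption, and since the source does not state a unit for the inference tables, values are only compared within a single row.

```python
# Inference results copied from the tables above (unit not stated in the source).
BATCH_SIZES = (1, 2, 4, 8, 16)
INFERENCE = {
    "VGG-19": {
        "OpenBLAS": (1.10, 1.96, 3.62, 3.63, 2.25),
        "MKLML": (5.58, 9.80, 15.15, 21.21, 28.67),
        "MKL-DNN": (75.07, 88.64, 82.58, 92.29, 96.75),
    },
    "ResNet-50": {
        "OpenBLAS": (3.31, 6.72, 11.59, 13.17, 9.27),
        "MKLML": (6.33, 12.02, 22.88, 40.53, 63.09),
        "MKL-DNN": (107.83, 148.84, 177.78, 189.35, 217.69),
    },
    "GoogLeNet": {
        "OpenBLAS": (12.06, 23.56, 34.48, 36.45, 23.12),
        "MKLML": (22.74, 41.56, 81.22, 133.47, 210.53),
        "MKL-DNN": (175.10, 272.92, 450.70, 512.00, 600.94),
    },
    "AlexNet": {
        "OpenBLAS": (3.53, 6.23, 15.04, 26.06, 31.62),
        "MKLML": (21.32, 36.55, 73.06, 131.15, 192.77),
        "MKL-DNN": (442.91, 656.41, 719.10, 847.68, 850.51),
    },
}


def best_batch_size(model, library):
    """Return (batch_size, value) for the highest reported value in a row."""
    values = INFERENCE[model][library]
    best = max(range(len(values)), key=values.__getitem__)
    return BATCH_SIZES[best], values[best]


if __name__ == "__main__":
    for model, rows in INFERENCE.items():
        for library in rows:
            print(model, library, best_batch_size(model, library))
```

With these numbers, OpenBLAS peaks at batch size 8 for VGG-19, ResNet-50, and GoogLeNet and drops at batch size 16, while MKLML and MKL-DNN report their highest values at batch size 16 for every model.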

### Laptop
TBD