# Benchmark

Machine:

- Server: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, 2 Sockets, 20 Cores per socket
- Laptop: TBD

System: CentOS release 6.3 (Final), Docker 1.12.1.

PaddlePaddle: (TODO: will rerun after 0.11.0)
- paddlepaddle/paddle:latest (for MKLML and MKL-DNN)
  - MKL-DNN tag v0.11
  - MKLML 2018.0.1.20171007
- paddlepaddle/paddle:latest-openblas (for OpenBLAS)
  - OpenBLAS v0.2.20
	 
On each machine, we test and compare single-node training performance using MKL-DNN, MKLML, and OpenBLAS.

## Benchmark Model

### Server
Tested with batch sizes 64, 128, and 256 on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz.

Input image size: 3 * 224 * 224. Throughput is reported in images/second.
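
To make the images/second metric concrete, here is a minimal sketch of how such a throughput figure could be computed. It assumes throughput is simply images processed divided by wall-clock time after a short warm-up; `train_step` and `images_per_second` are hypothetical names for illustration, not PaddlePaddle APIs, and the actual benchmark script may measure differently.

```python
import time


def images_per_second(train_step, batch_size, num_batches, warmup=2):
    """Return throughput in images/second.

    `train_step` is a hypothetical callable that trains on one batch.
    """
    # Run a few untimed iterations so one-time setup cost is excluded.
    for _ in range(warmup):
        train_step()
    start = time.perf_counter()
    for _ in range(num_batches):
        train_step()
    elapsed = time.perf_counter() - start
    # Total images processed divided by elapsed wall-clock seconds.
    return batch_size * num_batches / elapsed


if __name__ == "__main__":
    # Dummy step that only sleeps; replace it with a real training iteration.
    rate = images_per_second(lambda: time.sleep(0.01), batch_size=64, num_batches=10)
    print(f"{rate:.2f} images/second")
```

Excluding warm-up iterations is a design choice here: the first few batches often include initialization work that would otherwise lower the reported rate.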

- VGG-19

| BatchSize    | 64    | 128  | 256     |
|--------------|-------| -----| --------|
| OpenBLAS     | 7.80  | 9.00  | 10.80  | 
| MKLML        | 12.12 | 13.70 | 16.18  |
| MKL-DNN      | 28.46 | 29.83 | 30.44  |

<img src="figs/vgg-cpu-train.png" width="500">

- ResNet-50

| BatchSize    | 64    | 128   | 256    |
|--------------|-------| ------| -------|
| OpenBLAS     | 25.22 | 25.68 | 27.12  | 
| MKLML        | 32.52 | 31.89 | 33.12  |
| MKL-DNN      | 81.69 | 82.35 | 84.08  |

<img src="figs/resnet-cpu-train.png" width="500">

- GoogLeNet

| BatchSize    | 64    | 128   | 256    |
|--------------|-------| ------| -------|
| OpenBLAS     | 89.52 | 96.97 | 108.25 | 
| MKLML        | 128.46| 137.89| 158.63 |
| MKL-DNN      | 250.46| 264.83| 269.50 |

<img src="figs/googlenet-cpu-train.png" width="500">
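
The three tables can be compared more directly as speedups over the OpenBLAS baseline. The sketch below copies the throughput values from the tables above; the names `RESULTS`, `BATCH_SIZES`, and `speedup` are introduced here for illustration only.

```python
# Throughput in images/second, copied from the Server tables above.
BATCH_SIZES = [64, 128, 256]
RESULTS = {
    "VGG-19": {
        "OpenBLAS": [7.80, 9.00, 10.80],
        "MKLML": [12.12, 13.70, 16.18],
        "MKL-DNN": [28.46, 29.83, 30.44],
    },
    "ResNet-50": {
        "OpenBLAS": [25.22, 25.68, 27.12],
        "MKLML": [32.52, 31.89, 33.12],
        "MKL-DNN": [81.69, 82.35, 84.08],
    },
    "GoogLeNet": {
        "OpenBLAS": [89.52, 96.97, 108.25],
        "MKLML": [128.46, 137.89, 158.63],
        "MKL-DNN": [250.46, 264.83, 269.50],
    },
}


def speedup(model, library, baseline="OpenBLAS"):
    """Return per-batch-size throughput ratios of `library` over `baseline`."""
    base = RESULTS[model][baseline]
    return [round(x / b, 2) for x, b in zip(RESULTS[model][library], base)]


if __name__ == "__main__":
    for model in RESULTS:
        for library in ("MKLML", "MKL-DNN"):
            ratios = speedup(model, library)
            row = ", ".join(f"bs={bs}: {r}x" for bs, r in zip(BATCH_SIZES, ratios))
            print(f"{model:10s} {library:8s} {row}")
```

For example, `speedup("VGG-19", "MKL-DNN")` divides 28.46 by 7.80 for batch size 64, which is roughly a 3.65x gain over OpenBLAS.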

### Laptop
TBD