# Benchmark

Machine:

- Server
 	- Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, 2 Sockets, 20 Cores per socket
- Laptop
 	- DELL XPS15-9560-R1745: i7-7700HQ 8G 256GSSD
 	- i5 MacBook Pro (Retina, 13-inch, Early 2015)
- Desktop
 	- i7-6700k

System: CentOS release 6.3 (Final), Docker 1.12.1.

PaddlePaddle: paddlepaddle/paddle:latest (TODO: will rerun after 0.11.0)

- MKL-DNN tag v0.10
- MKLML 2018.0.20170720
- OpenBLAS v0.2.20
	 
On each machine, we will test and compare single-node training performance using MKL-DNN, MKLML, and OpenBLAS in turn.

## Benchmark Model

### Server
Tested with batch sizes 64, 128, and 256 on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz.

Input image size: 3 * 224 * 224. Metric: throughput in images/second (higher is better).
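
This document does not say how the images/second figures were collected. As a minimal sketch of one common approach, assuming a hypothetical `run_batch` callable that trains exactly one batch, throughput can be estimated by timing several batches after a short warm-up. The names `images_per_second`, `run_batch`, `num_batches`, and `warmup` are illustrative and not part of PaddlePaddle.

```python
import time


def images_per_second(run_batch, batch_size, num_batches=10, warmup=2):
    """Estimate training throughput in images/second.

    run_batch is a hypothetical zero-argument callable that trains one batch
    of `batch_size` images; it stands in for the real training step.
    """
    # Warm-up iterations are excluded from timing (caches, lazy initialization).
    for _ in range(warmup):
        run_batch()

    start = time.perf_counter()
    for _ in range(num_batches):
        run_batch()
    elapsed = time.perf_counter() - start

    # Total images processed divided by wall-clock seconds.
    return batch_size * num_batches / elapsed
```

Averaging over several batches smooths out per-iteration jitter; the warm-up count is a judgment call and may need to be larger on real workloads.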

- VGG-19

| BatchSize | 64    | 128   | 256   |
|-----------|-------|-------|-------|
| OpenBLAS  | 7.82  | 8.62  | 10.34 |
| MKLML     | 11.02 | 12.86 | 15.33 |
| MKL-DNN   | 27.69 | 28.8  | 29.27 |


chart on batch size 128
TBD

- ResNet-50

| BatchSize    | 64    | 128   | 256    |
|--------------|-------| ------| -------|
| OpenBLAS     | 22.90 | 23.10 | 25.59  | 
| MKLML        | 29.81 | 30.18 | 32.77  |
| MKL-DNN      | 80.49 | 82.89 | 83.13  |
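
To make the comparison easier to read, the relative speedup over OpenBLAS can be computed directly from the two Server tables. The sketch below copies the throughput values from the VGG-19 and ResNet-50 tables; the `RESULTS` layout and the `speedup` helper are illustrative names, not part of any benchmark tooling.

```python
# Throughput in images/second, copied from the Server tables in this document.
RESULTS = {
    "VGG-19": {
        "OpenBLAS": {64: 7.82, 128: 8.62, 256: 10.34},
        "MKLML": {64: 11.02, 128: 12.86, 256: 15.33},
        "MKL-DNN": {64: 27.69, 128: 28.8, 256: 29.27},
    },
    "ResNet-50": {
        "OpenBLAS": {64: 22.90, 128: 23.10, 256: 25.59},
        "MKLML": {64: 29.81, 128: 30.18, 256: 32.77},
        "MKL-DNN": {64: 80.49, 128: 82.89, 256: 83.13},
    },
}


def speedup(model, backend, batch_size, baseline="OpenBLAS"):
    """Return backend throughput divided by baseline throughput."""
    rows = RESULTS[model]
    return rows[backend][batch_size] / rows[baseline][batch_size]


if __name__ == "__main__":
    for model in RESULTS:
        for bs in (64, 128, 256):
            ratio = speedup(model, "MKL-DNN", bs)
            print(f"{model} batch={bs}: MKL-DNN is {ratio:.2f}x OpenBLAS")
```

On these numbers, MKL-DNN comes out between roughly 2.8x and 3.6x the OpenBLAS throughput, with the gap narrowing slightly as batch size grows.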


chart on batch size 128
TBD

- GoogLeNet

### Laptop
TBD
### Desktop
TBD