Merge pull request #6878 from tensor-tang/benchmark

update alexnet training data

52965458 · Tao Luo · GitHub · 7777c811 · e9ee6fe8 · 52965458

Showing 11 additions and 0 deletions

benchmark/IntelOptimizedPaddle.md +11 -0

--- a/benchmark/IntelOptimizedPaddle.md
+++ b/benchmark/IntelOptimizedPaddle.md
@@ -22,6 +22,7 @@ On each machine, we will test and compare the performance of training on single
 #### Training
 Test on batch size 64, 128, 256 on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
+Note that the speeds below include forward, backward, and parameter update time, so they cannot be compared directly with the Caffe `time` [command](https://github.com/PaddlePaddle/Paddle/blob/develop/benchmark/caffe/image/run.sh#L9) benchmark, which only covers the forward and backward passes. The parameter update time becomes significant when the weights are large, especially on AlexNet.
 Input image size - 3 * 224 * 224, Time: images/second
@@ -55,6 +56,16 @@ Input image size - 3 * 224 * 224, Time: images/second
 <img src="figs/googlenet-cpu-train.png" width="500">
+- Alexnet
+| BatchSize    | 64     | 128    | 256    |
+|--------------|--------| ------ | -------|
+| OpenBLAS     | 2.13   | 2.45   | 2.68   | 
+| MKLML        | 66.37  | 105.60 | 144.04 |
+| MKL-DNN      | 399.00 | 498.94 | 626.53 | 
+chart TBD
 #### Inference
 Test on batch size 1, 2, 4, 8, 16 on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
 - VGG-19
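
The Alexnet training table added in this diff reports throughput in images/second for a full training step (forward, backward, and parameter update). The following is a minimal sketch, not part of the PR, of how those figures could be converted into seconds per iteration and into speedups over the OpenBLAS baseline. The numbers are copied from the table; the names `ALEXNET_TRAIN`, `seconds_per_iteration`, and `speedup_over` are hypothetical helpers introduced only for illustration.

```python
# Throughput (images/second) copied from the Alexnet training table in this diff.
# Keys of the inner dicts are batch sizes.
ALEXNET_TRAIN = {
    "OpenBLAS": {64: 2.13, 128: 2.45, 256: 2.68},
    "MKLML": {64: 66.37, 128: 105.60, 256: 144.04},
    "MKL-DNN": {64: 399.00, 128: 498.94, 256: 626.53},
}


def seconds_per_iteration(batch_size, images_per_second):
    """Time for one full training step (forward + backward + update)."""
    return batch_size / images_per_second


def speedup_over(baseline, backend, table=ALEXNET_TRAIN):
    """Throughput ratio backend / baseline for each batch size."""
    return {
        batch_size: table[backend][batch_size] / table[baseline][batch_size]
        for batch_size in table[backend]
    }


if __name__ == "__main__":
    for backend, row in ALEXNET_TRAIN.items():
        for batch_size, ips in row.items():
            step = seconds_per_iteration(batch_size, ips)
            print(f"{backend:8s} batch={batch_size:3d} {step:8.3f} s/iteration")
    for batch_size, ratio in speedup_over("OpenBLAS", "MKL-DNN").items():
        print(f"MKL-DNN vs OpenBLAS, batch={batch_size}: {ratio:.1f}x")
```

For example, 399.00 images/second at batch size 64 corresponds to roughly 0.16 s per training step. Because that step includes the parameter update, it is expected to be longer than a forward-plus-backward time reported by Caffe `time` for the same network, so the two should not be compared directly.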