GoogLeNet on MKLDNN
Created by: luotao1
Model URL: http://paddle-inference-dist.bj.bcebos.com/googlenet.tar.gz
Command: the same as test_analyzer_resnet50, with only the value of --infer_model changed.
Diff
The outputs differ when MKLDNN is enabled on GoogLeNet. Several values exceed the 1e-3 tolerance of the comparison:
I1109 13:37:42.628175 50355 helper.h:161] ====== batch_size: 1, repeat: 1, threads: 1, thread id: 0, latency: 268.931ms, fps: 3.71843 ======
/Paddle/paddle/fluid/inference/tests/api/tester_helper.h:70: Failure
The difference between pdata_ref[j] and pdata[j] is 0.015455007553100586, which exceeds 1e-3, where
pdata_ref[j] evaluates to 0.67508238554000854,
pdata[j] evaluates to 0.65962737798690796, and
1e-3 evaluates to 0.001.
/Paddle/paddle/fluid/inference/tests/api/tester_helper.h:70: Failure
The difference between pdata_ref[j] and pdata[j] is 0.029862642288208008, which exceeds 1e-3, where
pdata_ref[j] evaluates to 0.64170312881469727,
pdata[j] evaluates to 0.67156577110290527, and
1e-3 evaluates to 0.001.
/Paddle/paddle/fluid/inference/tests/api/tester_helper.h:70: Failure
The difference between pdata_ref[j] and pdata[j] is 2.03973388671875, which exceeds 1e-3, where
pdata_ref[j] evaluates to 226.21131896972656,
pdata[j] evaluates to 224.17158508300781, and
1e-3 evaluates to 0.001.
/Paddle/paddle/fluid/inference/tests/api/tester_helper.h:70: Failure
The difference between pdata_ref[j] and pdata[j] is 2.217803955078125, which exceeds 1e-3, where
pdata_ref[j] evaluates to 227.33116149902344,
pdata[j] evaluates to 225.11335754394531, and
1e-3 evaluates to 0.001.
[ FAILED ] Analyzer_resnet50.compare_mkldnn (2188 ms)
[----------] 1 test from Analyzer_resnet50 (2188 ms total)
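The failure messages above have the format of GoogleTest's EXPECT_NEAR, which suggests an absolute tolerance of 1e-3. As a quick sanity check, the minimal sketch below recomputes the absolute and relative differences from the four value pairs copied from the log. It assumes pdata_ref comes from the non-MKLDNN reference run; the helper name diff_report is hypothetical. The values near 226 differ by under 1% relative (about 0.90% and 0.98%), while the two values near 0.65 differ by roughly 2.3% and 4.7%, so the mismatch does not appear to be only an artifact of an absolute tolerance applied to large magnitudes.

```python
import math

# (pdata_ref, pdata) pairs copied from the four tester_helper.h:70 failures.
# Assumption: pdata_ref is the non-MKLDNN reference, pdata the MKLDNN output.
FAILURES = [
    (0.67508238554000854, 0.65962737798690796),
    (0.64170312881469727, 0.67156577110290527),
    (226.21131896972656, 224.17158508300781),
    (227.33116149902344, 225.11335754394531),
]
ABS_TOL = 1e-3  # absolute tolerance reported in the failure messages


def diff_report(pairs, abs_tol=ABS_TOL):
    """Hypothetical helper: return (abs_diff, rel_diff, exceeds_abs_tol) per pair."""
    rows = []
    for ref, out in pairs:
        abs_diff = math.fabs(ref - out)
        rel_diff = abs_diff / math.fabs(ref)  # relative to the reference value
        rows.append((abs_diff, rel_diff, abs_diff > abs_tol))
    return rows


for abs_diff, rel_diff, exceeds in diff_report(FAILURES):
    print(f"abs={abs_diff:.6f} rel={rel_diff:.2%} exceeds_abs_tol={exceeds}")
```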
Performance
Latency with MKLDNN (193ms) is almost the same as with MKL (206ms). Machine: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
- Profile on MKL
I1109 11:53:54.818035 51680 helper.h:161] ====== batch_size: 1, repeat: 100, threads: 1, thread id: 0, latency: 206.139ms, fps: 4.8511 ======
-------------------------> Profiling Report <-------------------------
Place: CPU
Time unit: ms
Sorted by total time in descending order in the same thread
Event Calls Total Min. Max. Ave. Ratio.
thread0::lrn 200 9851.59 25.3704 77.0484 49.258 0.478162
thread0::conv2d 5700 7756.49 0.09196 25.0828 1.36079 0.376473
thread0::pool2d 1400 1550.66 0.076101 2.85543 1.10761 0.0752636
thread0::elementwise_add 5700 1165.1 0.017176 3.08475 0.204403 0.0565497
thread0::relu 5700 119.171 0.004205 0.685152 0.0209072 0.00578414
thread0::concat 900 116.127 0.031237 0.37544 0.12903 0.0056364
thread0::load_combine 2 37.106 12.8365 24.2695 18.553 0.001801
thread0::fc 100 5.55069 0.049486 0.13439 0.0555069 0.000269411
thread0::fetch 100 0.715028 0.005518 0.026451 0.00715028 3.4705e-05
thread0::feed 100 0.529316 0.004019 0.014951 0.00529316 2.56912e-05
- Profile on MKLDNN: pool2d appears to cost much more time (11195.9 ms total vs 1550.66 ms on MKL), and concat is also slower (1025.27 ms vs 116.127 ms)
I1109 12:06:08.191417 17514 helper.h:161] ====== batch_size: 1, repeat: 100, threads: 1, thread id: 0, latency: 198.451ms, fps: 5.03903 ======
-------------------------> Profiling Report <-------------------------
Place: CPU
Time unit: ms
Sorted by total time in descending order in the same thread
Event Calls Total Min. Max. Ave. Ratio.
thread0::pool2d 1400 11195.9 0.136173 22.0287 7.99705 0.56419
thread0::conv2d 5700 6709.73 0.201956 15.2978 1.17714 0.338121
thread0::concat 900 1025.27 0.747875 3.38313 1.13919 0.051666
thread0::lrn 200 863.445 2.65105 9.83194 4.31723 0.0435113
thread0::load_combine 2 37.6152 12.6916 24.9237 18.8076 0.00189553
thread0::fc 100 10.8304 0.091616 0.84363 0.108304 0.000545772
thread0::fetch 100 0.811231 0.006754 0.027217 0.00811231 4.08801e-05
thread0::feed 100 0.600731 0.004528 0.035933 0.00600731 3.02724e-05
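To compare the two reports op by op, the minimal sketch below uses the Total column copied from both profiles. It assumes the Ratio. column is each op's Total divided by the sum of all Totals in the same report; this reading matches the printed values for lrn (MKL) and pool2d (MKLDNN), but it is an inference, not documented behavior. The helpers ratios and slowdown are hypothetical names. Note that elementwise_add and relu are not listed in the MKLDNN report, so the per-report totals are not directly comparable op for op.

```python
# Total time (ms) per op, copied from the two profiling reports above
# (batch_size 1, repeat 100, 1 thread).
MKL = {
    "lrn": 9851.59, "conv2d": 7756.49, "pool2d": 1550.66,
    "elementwise_add": 1165.1, "relu": 119.171, "concat": 116.127,
    "load_combine": 37.106, "fc": 5.55069, "fetch": 0.715028, "feed": 0.529316,
}
MKLDNN = {
    "pool2d": 11195.9, "conv2d": 6709.73, "concat": 1025.27, "lrn": 863.445,
    "load_combine": 37.6152, "fc": 10.8304, "fetch": 0.811231, "feed": 0.600731,
}


def ratios(totals):
    """Share of each op in the summed Total (assumed meaning of Ratio.)."""
    grand_total = sum(totals.values())
    return {op: t / grand_total for op, t in totals.items()}


def slowdown(base, other):
    """other/base Total for ops in both reports; > 1 means slower in other."""
    return {op: other[op] / base[op] for op in base if op in other}


for op, factor in sorted(slowdown(MKL, MKLDNN).items(), key=lambda kv: -kv[1]):
    print(f"{op:>12}: MKLDNN/MKL total time = {factor:.2f}x")

# Ops that appear only in the MKL report.
missing = sorted(set(MKL) - set(MKLDNN))
print("ops not listed in the MKLDNN profile:", missing)
```

With these numbers, pool2d and concat come out roughly 7x and 9x slower under MKLDNN, while lrn is roughly 11x faster, which is consistent with the note on the MKLDNN profile.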