Compare Inference Perf between CPU and MKLDNN [OCR CRNN_CTC model]
Created by: luotao1
Model
Original Model: https://github.com/PaddlePaddle/models/blob/develop/fluid/ocr_recognition/crnn_ctc_model.py
- Since the inference model fuses batch norm into the convolution, we can directly modify https://github.com/PaddlePaddle/models/blob/develop/fluid/ocr_recognition/crnn_ctc_model.py#L14-L28 to profile inference performance; the modified model has the same performance as the one with batch norm fused.
```python
tmp = fluid.layers.conv2d(
    input=tmp,
    num_filters=out_ch[i],
    filter_size=3,
    padding=1,
    param_attr=param if param_0 is None else param_0,
    bias_attr=bias,
    act=act,
    use_cudnn=True)
```
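The fuse-batch-norm step mentioned above folds the BN scale and shift into the convolution weights and bias, so the fused model computes the same function with one op fewer. A minimal NumPy sketch of the folding (function and variable names are illustrative, not Paddle's API):

```python
import numpy as np

def fold_batch_norm(conv_w, conv_b, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta
    into the conv weights/bias.  conv_w: (out_ch, in_ch, kh, kw)."""
    scale = gamma / np.sqrt(var + eps)        # per-output-channel scale
    w = conv_w * scale.reshape(-1, 1, 1, 1)   # scale each output filter
    b = (conv_b - mean) * scale + beta        # fold mean/shift into bias
    return w, b

# Tiny check: fold-then-conv == conv-then-batch-norm.
rng = np.random.RandomState(0)
w = rng.randn(4, 3, 3, 3); b = rng.randn(4)
gamma, beta = rng.randn(4), rng.randn(4)
mean, var = rng.randn(4), rng.rand(4) + 0.5
wf, bf = fold_batch_norm(w, b, gamma, beta, mean, var)
# A single 3x3 input patch makes the conv a dot product, easy to verify:
x = rng.randn(3, 3, 3)
conv = lambda weight, bias: np.array(
    [(weight[o] * x).sum() + bias[o] for o in range(4)])
bn = lambda y: gamma * (y - mean) / np.sqrt(var + 1e-5) + beta
assert np.allclose(conv(wf, bf), bn(conv(w, b)))
```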
- We can add `use_mkldnn=True` directly to obtain an MKLDNN ProgramDesc, like https://github.com/PaddlePaddle/Paddle/compare/develop...tensor-tang:compare. And after #10682 (closed) is solved, we can automatically convert a CPU ProgramDesc into an MKLDNN ProgramDesc.
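Until that automatic conversion lands, the idea can be sketched generically: walk the ops of a program and set `use_mkldnn` on the op types that have MKLDNN kernels. The op representation and supported-op set below are illustrative stand-ins, not Paddle's actual interface:

```python
# Hypothetical in-memory view of a ProgramDesc: a list of ops, each with a
# type and an attribute dict (illustrative only, not Paddle's API).
MKLDNN_SUPPORTED = {"conv2d", "pool2d", "batch_norm", "fc", "relu"}

def to_mkldnn(ops):
    """Return a copy of `ops` with use_mkldnn=True on supported op types."""
    converted = []
    for op in ops:
        op = dict(op, attrs=dict(op["attrs"]))   # copy, don't mutate input
        if op["type"] in MKLDNN_SUPPORTED:
            op["attrs"]["use_mkldnn"] = True
        converted.append(op)
    return converted

cpu_prog = [
    {"type": "conv2d", "attrs": {"use_cudnn": False}},
    {"type": "gru",    "attrs": {}},   # no MKLDNN kernel: left untouched
]
mkldnn_prog = to_mkldnn(cpu_prog)
assert mkldnn_prog[0]["attrs"]["use_mkldnn"] is True
assert "use_mkldnn" not in mkldnn_prog[1]["attrs"]
```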
The final model looks like:
```
4.0K conv2d_0.b_0
4.0K conv2d_0.w_0
4.0K conv2d_1.b_0
12K  conv2d_1.w_0
4.0K conv2d_2.b_0
20K  conv2d_2.w_0
4.0K conv2d_3.b_0
40K  conv2d_3.w_0
4.0K conv2d_4.b_0
76K  conv2d_4.w_0
4.0K conv2d_5.b_0
148K conv2d_5.w_0
4.0K conv2d_6.b_0
292K conv2d_6.w_0
4.0K conv2d_7.b_0
580K conv2d_7.w_0
4.0K fc_0.b_0
904K fc_0.w_0
4.0K fc_1.b_0
904K fc_1.w_0
44K  fc_2.b_0
8.3M fc_2.w_0
8.3M fc_2.w_1
4.0K gru_0.b_0
472K gru_0.w_0
4.0K gru_1.b_0
472K gru_1.w_0
12K  __model__
```
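The conv weight files are consistent with 3x3 float32 filters over a channel progression of 1 → 16 → 16 → 32 → 32 → 64 → 64 → 128 → 128 (an assumption inferred from the sizes in the listing, not read from the model). The on-disk numbers are slightly larger than the raw tensor bytes because `du` rounds up to 4K blocks and each file carries serialization overhead. A quick sanity check:

```python
# Assumed channel progression (inferred from the file sizes above):
channels = [1, 16, 16, 32, 32, 64, 64, 128, 128]
# Raw bytes per conv weight: out_ch * in_ch * 3 * 3 filters * 4 bytes (fp32).
raw_bytes = [channels[i + 1] * channels[i] * 3 * 3 * 4 for i in range(8)]
for i, n in enumerate(raw_bytes):
    print(f"conv2d_{i}.w_0: {n} bytes (~{n / 1024:.0f}K raw)")
```

For example, `conv2d_5.w_0` works out to 64 * 64 * 9 * 4 = 147456 bytes ≈ 144K raw, matching the listed 148K once rounded up to 4K blocks.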
Test
A patch to test the crnn_ctc model on the C++ side: https://github.com/PaddlePaddle/Paddle/compare/develop...luotao1:ocr_test?expand=1
```shell
# build test
cd build
make test ARGS="-R test_crnn_ctc -V"
# run test
cd paddle/fluid/inference/tests/book
./test_crnn_ctc --dirname=DIR_PATH --batch_size=1 --repeat=10
```
Note that this gives the MKLDNN result with multiple threads; for a single thread, please try:
```shell
taskset -c 0 ./test_crnn_ctc --dirname=DIR_PATH --batch_size=1 --repeat=10
```
refer: #10651 (closed)