PR 22961 caused significant performance regression on BERT inference
Created by: ddokupil
- PaddlePaddle version: https://github.com/PaddlePaddle/Paddle/pull/22961 (the offending PR)
- CPU: Cascade Lake (CLX) 6248, DNNL 1.2
- OS Platform: Ubuntu 16.04
- Python version: 2.7.12
- CMake command (see the build sketch below): cmake .. -DCMAKE_BUILD_TYPE=Release -DWITH_GPU=OFF -DWITH_MKLDNN=ON -DWITH_TESTING=ON -DWITH_PROFILER=ON -DWITH_STYLE_CHECK=OFF -DON_INFER=ON -DWITH_INFERENCE_API_TEST=ON
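For reference, a minimal end-to-end build sketch based on the flags above; the checkout path and parallel job count are assumptions, not taken from this report:

```bash
# Assumed checkout location; adjust to your environment.
git clone https://github.com/PaddlePaddle/Paddle.git /repos/paddle_paddle
cd /repos/paddle_paddle
mkdir -p build && cd build

# CMake configuration as reported above: CPU-only, MKL-DNN (DNNL) enabled,
# inference API tests built so test_analyzer_bert is available.
cmake .. -DCMAKE_BUILD_TYPE=Release -DWITH_GPU=OFF -DWITH_MKLDNN=ON \
         -DWITH_TESTING=ON -DWITH_PROFILER=ON -DWITH_STYLE_CHECK=OFF \
         -DON_INFER=ON -DWITH_INFERENCE_API_TEST=ON

make -j"$(nproc)"
```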
Steps to reproduce the behavior, execute:

$ /repos/paddle_paddle/build/paddle/fluid/inference/tests/api/test_analyzer_bert --gtest_filter=Analyzer_bert.profile --paddle_num_threads=1 --repeat=1000 -use_mkldnn=true --infer_data=/repos/paddle_paddle/build/third_party/inference_demo/bert_emb128/data.txt --batch_size=1 --infer_model=/repos/paddle_paddle/build/third_party/inference_demo/bert_emb128/model
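To rule out environment noise, a before/after comparison can be run at the PR's merge commit and its parent. The sketch below is only illustrative: the <MERGE_COMMIT> placeholder and the run_at helper are hypothetical and not part of the original report.

```bash
#!/usr/bin/env bash
# Hypothetical A/B helper: rebuild and re-run the benchmark at a given commit.
# Replace <MERGE_COMMIT> with the actual merge commit of PR #22961.
set -e

BENCH=/repos/paddle_paddle/build/paddle/fluid/inference/tests/api/test_analyzer_bert
DATA=/repos/paddle_paddle/build/third_party/inference_demo/bert_emb128/data.txt
MODEL=/repos/paddle_paddle/build/third_party/inference_demo/bert_emb128/model

run_at() {
  git -C /repos/paddle_paddle checkout "$1"
  (cd /repos/paddle_paddle/build && make -j"$(nproc)")
  "$BENCH" --gtest_filter=Analyzer_bert.profile --paddle_num_threads=1 \
           --repeat=1000 -use_mkldnn=true --infer_data="$DATA" \
           --batch_size=1 --infer_model="$MODEL"
}

run_at "<MERGE_COMMIT>^"   # parent commit: ~132 FPS expected
run_at "<MERGE_COMMIT>"    # with PR 22961: ~106 FPS observed
```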
Before this PR we consistently measured around 132 FPS on this benchmark; after this commit it drops to about 106 FPS, roughly a 20% throughput regression.