Created by: yihuaxu
Based on the profiling results of the Bert/Transformer model, the matmul, reshape, and transpose operators were fused to reduce memory copies.
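To make the idea concrete, the sketch below (plain C++, not Paddle's actual fusion pass or kernel code; names, shapes, and the naive scalar loops are illustrative assumptions) contrasts a matmul that first materializes the transposed operand with a fused matmul that reads the operand in its original layout, skipping the intermediate buffer and the copy that fills it.

```cpp
// Minimal sketch: fusing a transpose into the matmul removes the
// intermediate buffer and its memory copy. Not Paddle's real kernels.
#include <cassert>
#include <cstdio>
#include <vector>

// Unfused: materialize trans(B) first (extra allocation + copy), then GEMM.
std::vector<float> matmul_with_explicit_transpose(
    const std::vector<float>& A, const std::vector<float>& B,
    int M, int K, int N) {
  std::vector<float> Bt(K * N);                // intermediate buffer
  for (int k = 0; k < K; ++k)                  // transpose copy: B is N x K
    for (int n = 0; n < N; ++n) Bt[k * N + n] = B[n * K + k];
  std::vector<float> C(M * N, 0.f);
  for (int m = 0; m < M; ++m)
    for (int k = 0; k < K; ++k)
      for (int n = 0; n < N; ++n)
        C[m * N + n] += A[m * K + k] * Bt[k * N + n];
  return C;
}

// Fused: read B with swapped indices inside the GEMM loop; no copy, no buffer.
std::vector<float> fused_matmul_transpose(
    const std::vector<float>& A, const std::vector<float>& B,
    int M, int K, int N) {
  std::vector<float> C(M * N, 0.f);
  for (int m = 0; m < M; ++m)
    for (int k = 0; k < K; ++k)
      for (int n = 0; n < N; ++n)
        C[m * N + n] += A[m * K + k] * B[n * K + k];
  return C;
}

int main() {
  const int M = 2, K = 3, N = 4;
  std::vector<float> A(M * K), B(N * K);       // B stored as N x K (pre-transpose)
  for (int i = 0; i < M * K; ++i) A[i] = static_cast<float>(i + 1);
  for (int i = 0; i < N * K; ++i) B[i] = 0.5f * i;
  auto C1 = matmul_with_explicit_transpose(A, B, M, K, N);
  auto C2 = fused_matmul_transpose(A, B, M, K, N);
  assert(C1 == C2);                            // same result, one fewer copy
  std::printf("C[0][0] = %.1f\n", C1[0]);
  return 0;
}
```

The same reasoning applies to the reshape steps in multi-head attention: when the consumer indexes the source layout directly, the reshaped/transposed tensor never needs to be written out.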
Platform: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
Model Path: third_party/inference_demo/bert_emb128/model
Batch Size: 1
Command: ./paddle/fluid/inference/tests/api/test_analyzer_bert --infer_model=third_party/inference_demo/bert_emb128/model/ --infer_data=third_party/inference_demo/bert_emb128/data.txt --gtest_filter=Analyzer_bert.profile --paddle_num_threads=1 --repeat=10 --batch_size=1 --test_all_data
Data Source: third_party/inference_demo/bert_emb128/data.txt
The following is a comparison of the different scenarios.