Created by: yihuaxu
Based on the profiling results of the Bert/Transformer model, the matmul, reshape, and transpose operators were fused to reduce memory copies.
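To make the idea concrete, the sketch below (plain C++, not Paddle's actual fusion pass or kernel code; names, shapes, and the naive scalar loops are illustrative assumptions) contrasts a matmul that first materializes the transposed operand with a fused matmul that reads the operand in its original layout, skipping the intermediate buffer and the copy that fills it.

```cpp
// Minimal sketch: fusing a transpose into the matmul removes the
// intermediate buffer and its memory copy. Not Paddle's real kernels.
#include <cassert>
#include <cstdio>
#include <vector>

// Unfused: materialize trans(B) first (extra allocation + copy), then GEMM.
std::vector<float> matmul_with_explicit_transpose(
    const std::vector<float>& A, const std::vector<float>& B,
    int M, int K, int N) {
  std::vector<float> Bt(K * N);                // intermediate buffer
  for (int k = 0; k < K; ++k)                  // transpose copy: B is N x K
    for (int n = 0; n < N; ++n) Bt[k * N + n] = B[n * K + k];
  std::vector<float> C(M * N, 0.f);
  for (int m = 0; m < M; ++m)
    for (int k = 0; k < K; ++k)
      for (int n = 0; n < N; ++n)
        C[m * N + n] += A[m * K + k] * Bt[k * N + n];
  return C;
}

// Fused: read B with swapped indices inside the GEMM loop; no copy, no buffer.
std::vector<float> fused_matmul_transpose(
    const std::vector<float>& A, const std::vector<float>& B,
    int M, int K, int N) {
  std::vector<float> C(M * N, 0.f);
  for (int m = 0; m < M; ++m)
    for (int k = 0; k < K; ++k)
      for (int n = 0; n < N; ++n)
        C[m * N + n] += A[m * K + k] * B[n * K + k];
  return C;
}

int main() {
  const int M = 2, K = 3, N = 4;
  std::vector<float> A(M * K), B(N * K);       // B stored as N x K (pre-transpose)
  for (int i = 0; i < M * K; ++i) A[i] = static_cast<float>(i + 1);
  for (int i = 0; i < N * K; ++i) B[i] = 0.5f * i;
  auto C1 = matmul_with_explicit_transpose(A, B, M, K, N);
  auto C2 = fused_matmul_transpose(A, B, M, K, N);
  assert(C1 == C2);                            // same result, one fewer copy
  std::printf("C[0][0] = %.1f\n", C1[0]);
  return 0;
}
```

The same reasoning applies to the reshape steps in multi-head attention: when the consumer indexes the source layout directly, the reshaped/transposed tensor never needs to be written out.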
Platform: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
Model Path: third_party/inference_demo/bert_emb128/model
Batch Size: 1
Command: ./paddle/fluid/inference/tests/api/test_analyzer_bert --infer_model=third_party/inference_demo/bert_emb128/model/ --infer_data=third_party/inference_demo/bert_emb128/data.txt --gtest_filter=Analyzer_bert.profile --paddle_num_threads=1 --repeat=10 --batch_size=1 --test_all_data
Data Source: third_party/inference_demo/bert_emb128/data.txt
The following is a comparison of the different scenarios.