Optimize the layer_norm operator with AVX intrinsic function (!14417) · Merge request · PaddlePaddle / Paddle


Created by: yihuaxu

Given the current performance of Layer Norm, this change implements the operator with AVX intrinsic functions to accelerate its data processing.
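For readers unfamiliar with the approach, here is a minimal sketch of what an AVX version of layer normalization over a single row might look like. It is not the code from this merge request: the function names (`layer_norm_row`, `layer_norm_row_avx`, `layer_norm_row_scalar`, `hsum256`) are hypothetical, the scale and bias parameters of the real operator are left out, and it assumes GCC or Clang (for `__attribute__((target("avx")))` and `__builtin_cpu_supports`). A scalar path is kept as a fallback and as a reference for checking results.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define SKETCH_HAS_X86 1
#endif

// Scalar reference: normalize one row to zero mean and unit variance.
void layer_norm_row_scalar(const float* x, float* out, std::size_t n, float eps) {
  float sum = 0.f;
  for (std::size_t i = 0; i < n; ++i) sum += x[i];
  const float mean = sum / static_cast<float>(n);
  float sq = 0.f;
  for (std::size_t i = 0; i < n; ++i) {
    const float d = x[i] - mean;
    sq += d * d;
  }
  const float inv_std = 1.f / std::sqrt(sq / static_cast<float>(n) + eps);
  for (std::size_t i = 0; i < n; ++i) out[i] = (x[i] - mean) * inv_std;
}

#ifdef SKETCH_HAS_X86
// Hypothetical helper: add up the 8 float lanes of an AVX register.
__attribute__((target("avx")))
static float hsum256(__m256 v) {
  float lanes[8];
  _mm256_storeu_ps(lanes, v);
  float s = 0.f;
  for (int i = 0; i < 8; ++i) s += lanes[i];
  return s;
}

// AVX path: process 8 floats per iteration, then finish the tail in scalar code.
__attribute__((target("avx")))
void layer_norm_row_avx(const float* x, float* out, std::size_t n, float eps) {
  const std::size_t vec_end = n - n % 8;
  std::size_t i;

  // Pass 1: mean.
  __m256 acc = _mm256_setzero_ps();
  for (i = 0; i < vec_end; i += 8) acc = _mm256_add_ps(acc, _mm256_loadu_ps(x + i));
  float sum = hsum256(acc);
  for (i = vec_end; i < n; ++i) sum += x[i];
  const float mean = sum / static_cast<float>(n);
  const __m256 mean_v = _mm256_set1_ps(mean);

  // Pass 2: variance.
  acc = _mm256_setzero_ps();
  for (i = 0; i < vec_end; i += 8) {
    const __m256 d = _mm256_sub_ps(_mm256_loadu_ps(x + i), mean_v);
    acc = _mm256_add_ps(acc, _mm256_mul_ps(d, d));
  }
  float sq = hsum256(acc);
  for (i = vec_end; i < n; ++i) {
    const float d = x[i] - mean;
    sq += d * d;
  }
  const float inv_std = 1.f / std::sqrt(sq / static_cast<float>(n) + eps);
  const __m256 inv_v = _mm256_set1_ps(inv_std);

  // Pass 3: normalize.
  for (i = 0; i < vec_end; i += 8) {
    const __m256 d = _mm256_sub_ps(_mm256_loadu_ps(x + i), mean_v);
    _mm256_storeu_ps(out + i, _mm256_mul_ps(d, inv_v));
  }
  for (i = vec_end; i < n; ++i) out[i] = (x[i] - mean) * inv_std;
}
#endif

// Dispatch: use the AVX path when the CPU reports support, else the scalar one.
void layer_norm_row(const float* x, float* out, std::size_t n, float eps = 1e-5f) {
#ifdef SKETCH_HAS_X86
  if (__builtin_cpu_supports("avx")) {
    layer_norm_row_avx(x, out, n, eps);
    return;
  }
#endif
  layer_norm_row_scalar(x, out, n, eps);
}
```

Calling `layer_norm_row(x.data(), out.data(), x.size())` on a `std::vector<float>` normalizes one row. The unaligned load and store intrinsics are used so the sketch makes no assumption about buffer alignment.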

Platform: Intel(R) Xeon(R) Gold 6151 CPU @ 3.00GHz / Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
Batch Size: 1
Command: build/paddle/fluid/inference/tests/api/test_analyzer_dam --infer_model=PaddlePaddle/pretrained_models/dam --infer_data=third_party/inference_demo/dam/data.txt --gtest_filter=Analyzer_dam.profile --batch_size=1 --repeat=1 --test_all_data=true --num_threads=1 --use_analysis=false

The following compares the results across the different scenarios.
