Optimize VMUL/VADD part with AVX intrinsic function (!13496) · Merge Request · PaddlePaddle / Paddle

Created by: yihuaxu

Use AVX registers to hold intermediate results, avoiding the latency of frequently reading from and writing to memory.
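The idea can be illustrated with a minimal sketch. It is not the code from this merge request: the function name `vmul_vadd` and its signature are hypothetical, and it assumes the goal is to compute `z[i] = x[i] * y[i] + b[i]` without storing the product to a temporary buffer between the multiply and the add. When the compiler does not target AVX, the sketch falls back to a plain scalar loop.

```cpp
#include <cstddef>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper (not Paddle's API): z[i] = x[i] * y[i] + b[i].
// The product of the multiply step stays in a 256-bit AVX register and is
// consumed directly by the add step, so no intermediate array is written.
void vmul_vadd(const float* x, const float* y, const float* b, float* z,
               std::size_t n) {
  std::size_t i = 0;
#if defined(__AVX__)
  // Process 8 floats per iteration; unaligned loads keep the sketch general.
  for (; i + 8 <= n; i += 8) {
    __m256 vx = _mm256_loadu_ps(x + i);
    __m256 vy = _mm256_loadu_ps(y + i);
    __m256 vb = _mm256_loadu_ps(b + i);
    __m256 prod = _mm256_mul_ps(vx, vy);  // intermediate lives in a register
    _mm256_storeu_ps(z + i, _mm256_add_ps(prod, vb));
  }
#endif
  // Scalar tail (and full fallback when AVX is not enabled at compile time).
  for (; i < n; ++i) {
    z[i] = x[i] * y[i] + b[i];
  }
}
```

Compared with calling a separate VMUL followed by a separate VADD, this form reads each input once and writes the output once per element. Build with an AVX-enabling flag such as `-mavx` on GCC or Clang to take the vector path.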

Platform: Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz / Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz

Model Path:
- Text Classification: third_party/inference_demo/text_classification/model
- RNN1: third_party/inference_demo/rnn1/model

Batch Size: 1

Frame Size: Text Classification 128 / RNN1 15

Command:
- Text Classification: cd build && ./paddle/fluid/inference/tests/api/test_analyzer_text_classification --infer_model=third_party/inference_demo/text_classification/model --infer_data=third_party/inference_demo/text_classification/data.txt --paddle_num_threads=1 --repeat=1 --batch_size=1 --test_all_data --num_threads=1
- RNN1: cd build && ./paddle/fluid/inference/tests/api/test_analyzer_rnn1 --infer_model=third_party/inference_demo/rnn1/model --infer_data=third_party/inference_demo/rnn1/data.txt --paddle_num_threads=1 --repeat=1000 --batch_size=1 --test_all_data --num_threads=1 --profiler=1

Data Source:
- Text Classification: build/third_party/inference_demo/text_classification/data.txt
- RNN1: build/third_party/inference_demo/rnn1/data.txt

The following compares the results across the different scenarios.
