arm64/lib: improve CRC32 performance for deep pipelines
mainline inclusion
from mainline-5.0
commit: efdb25efc7645b326cd5eb82be5feeabe167c24e
category: perf
bugzilla: 20886
CVE: NA
lib/crc32test result:
[root@localhost build]# rmmod crc32test && insmod lib/crc32test.ko && dmesg | grep cycles
[83170.153209] CPU7: use cycles 26243990
[83183.122137] CPU7: use cycles 26151290
[83309.691628] CPU7: use cycles 26122830
[83312.415559] CPU7: use cycles 26232600
[83313.191479] CPU8: use cycles 26082350
rmmod crc32test && insmod lib/crc32test.ko && dmesg | grep cycles
[ 1023.539931] CPU25: use cycles 12256730
[ 1024.850360] CPU24: use cycles 12249680
[ 1025.463622] CPU25: use cycles 12253330
[ 1025.862925] CPU25: use cycles 12269720
[ 1026.376038] CPU26: use cycles 12222480
Based on 13702:
arm64/lib: improve CRC32 performance for deep pipelines
crypto: arm64/crc32 - remove PMULL based CRC32 driver
arm64/lib: add accelerated crc32 routines
arm64: cpufeature: add feature for CRC32 instructions
lib/crc32: make core crc32() routines weak so they can be overridden
----------------------------------------------
Improve the performance of the crc32() asm routines by getting rid of
most of the branches and small sized loads on the common path.
Instead, use a branchless code path involving overlapping 16 byte
loads to process the first (length % 32) bytes, and process the
remainder using a loop that processes 32 bytes at a time.
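
The length split described above can be sketched in portable C. This is
only an illustration of the control flow, not the arm64 assembly from this
patch: it uses a bitwise software CRC in place of the CRC32 instructions and
does not reproduce the branchless overlapping 16 byte loads. The names
crc32_le_bytes(), crc32_le_split() and check_split_matches_reference() are
hypothetical helpers introduced for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Reference: bitwise reflected CRC-32 (polynomial 0xEDB88320), with no
 * initial or final inversion applied. Hypothetical helper for this sketch.
 */
static uint32_t crc32_le_bytes(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
    }
    return crc;
}

/*
 * Same CRC, but structured like the description above: consume the first
 * (len % 32) bytes up front, then run a loop that always handles exactly
 * 32 bytes per iteration. The real routine does the head without branches;
 * this sketch simply calls the reference helper for it.
 */
static uint32_t crc32_le_split(uint32_t crc, const unsigned char *p, size_t len)
{
    size_t head = len % 32;

    crc = crc32_le_bytes(crc, p, head);
    p += head;
    len -= head;

    while (len) {                      /* len is now a multiple of 32 */
        crc = crc32_le_bytes(crc, p, 32);
        p += 32;
        len -= 32;
    }
    return crc;
}

/* Check that the split version agrees with the reference for 0..1024 bytes. */
static void check_split_matches_reference(void)
{
    static unsigned char buf[1024];

    for (size_t i = 0; i < sizeof(buf); i++)
        buf[i] = (unsigned char)(i * 31u + 7u);

    for (size_t len = 0; len <= sizeof(buf); len++)
        assert(crc32_le_split(0, buf, len) == crc32_le_bytes(0, buf, len));
}
```

Because the head absorbs the odd-sized part of the buffer, the main loop
needs no tail handling, which is what lets the common path avoid the small
loads and their associated branches.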
Tested using the following test program:
  #include <stdlib.h>

  extern void crc32_le(unsigned short, char const*, int);

  int main(void)
  {
    static const char buf[4096];

    srand(20181126);

    for (int i = 0; i < 100 * 1000 * 1000; i++)
      crc32_le(0, buf, rand() % 1024);

    return 0;
  }
On Cortex-A53 and Cortex-A57, the performance regresses but only very
slightly. On Cortex-A72 however, the performance improves from
$ time ./crc32
real 0m10.149s
user 0m10.149s
sys 0m0.000s
to
$ time ./crc32
real 0m7.915s
user 0m7.915s
sys 0m0.000s
Cc: Rui Sun <sunrui26@huawei.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>