arch/arm64/lib/csum.S · efa29e0a3f1be9336cdaaf5cb91eb7ce9ce1da64 · openeuler / raspberrypi-kernel

arm64: do_csum: implement accelerated scalar version · efa29e0a

Committed by Ard Biesheuvel on Apr 24, 2019

hulk inclusion
category: feature
feature: checksum performance
bugzilla: 13700
CVE: NA

--------------------------------------------------

It turns out that the IP checksumming code is still exercised often,
even though one might expect that modern NICs with checksum offload
have no use for it. However, as Lingyan points out, there are
combinations of features where the network stack may still fall back
to software checksumming, and so it makes sense to provide an
optimized implementation in software as well.

So provide an implementation of do_csum() in scalar assembler, which,
unlike C, gives direct access to the carry flag, making the code run
substantially faster. The routine uses overlapping 64-byte loads for
all input sizes > 64 bytes, in order to reduce the number of branches
and improve performance on cores with deep pipelines.
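
To illustrate why carry access matters, here is a minimal C sketch of a
ones' complement sum accumulated 64 bits at a time. It is not the assembly
from this patch: the names csum_sketch, add_carry64 and fold64 are
hypothetical, and the overlapping 64-byte loads are not modelled. In C the
carry has to be recovered with a comparison after every addition, whereas
assembly can chain add-with-carry instructions directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add b into a with end-around carry (hypothetical helper). */
static uint64_t add_carry64(uint64_t a, uint64_t b)
{
    uint64_t s = a + b;
    return s + (s < b);     /* s < b means the addition wrapped */
}

/* Fold a 64-bit ones' complement sum down to 16 bits. */
static uint16_t fold64(uint64_t sum)
{
    sum = (sum & 0xffffffffu) + (sum >> 32);   /* at most 33 bits */
    sum = (sum & 0xffffffffu) + (sum >> 32);   /* fits in 32 bits */
    sum = (sum & 0xffffu) + (sum >> 16);       /* at most 17 bits */
    sum = (sum & 0xffffu) + (sum >> 16);       /* fits in 16 bits */
    return (uint16_t)sum;
}

/*
 * Sketch only: sums buf as native-endian words, so the two bytes of the
 * result, read in memory order, give the big-endian 16-bit sum.
 */
static uint16_t csum_sketch(const unsigned char *buf, size_t len)
{
    uint64_t sum = 0, word;

    assert(buf != NULL || len == 0);

    while (len >= 8) {
        memcpy(&word, buf, 8);      /* unaligned-safe 64-bit load */
        sum = add_carry64(sum, word);
        buf += 8;
        len -= 8;
    }
    if (len) {
        word = 0;                   /* zero-pad the trailing bytes */
        memcpy(&word, buf, len);
        sum = add_carry64(sum, word);
    }
    return fold64(sum);
}
```

The (s < b) test in add_carry64 is the extra work that a chain of
add-with-carry instructions avoids.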

On Cortex-A57, this implementation is on par with Lingyan's NEON
implementation, and roughly 7x as fast as the generic C code.

Difference from Ard's original patch: add a validation check for len.

Cc: "huanglingyan (A)" <huanglingyan2@huawei.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>


csum.S 2.4 KB
