Commits · 112cbae26d18e75098d95cc234cfa5059de8d479 · gsplhtlxg / clone-Linux

07 Aug, 2018 · 1 commit

crypto: arm64 - revert NEON yield for fast AEAD implementations · f10dc56c

Authored by Ard Biesheuvel on Jul 29, 2018

As it turns out, checking the TIF_NEED_RESCHED flag after each
iteration results in a significant performance regression (~10%)
when running fast algorithms (i.e., ones that use special instructions
and operate in the < 4 cycles per byte range) on in-order cores with
comparatively slow memory accesses such as the Cortex-A53.

Given the speed of these ciphers, and the fact that the page based
nature of the AEAD scatterwalk API guarantees that the core NEON
transform is never invoked with more than a single page's worth of
input, we can estimate the worst case duration of any resulting
scheduling blackout: on a 1 GHz Cortex-A53 running with 64k pages,
processing a page's worth of input at 4 cycles per byte results in
a delay of ~250 us, which is a reasonable upper bound.

So let's remove the yield checks from the fused AES-CCM and AES-GCM
routines entirely.

This reverts commit 7b67ae4d and
partially reverts commit 7c50136a.

Fixes: 7c50136a ("crypto: arm64/aes-ghash - yield NEON after every ...")
Fixes: 7b67ae4d ("crypto: arm64/aes-ccm - yield NEON after every ...")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


12 May, 2018 · 1 commit

crypto: arm64/aes-ghash - yield NEON after every block of input · 7c50136a

Authored by Ard Biesheuvel on Apr 30, 2018

Avoid excessive scheduling delays under a preemptible kernel by
yielding the NEON after every block of input.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

04 Aug, 2017 · 2 commits

crypto: arm64/ghash - add NEON accelerated fallback for 64-bit PMULL · 03c9a333

Authored by Ard Biesheuvel on Jul 24, 2017

Implement a NEON fallback for systems that do support NEON but have
no support for the optional 64x64->128 polynomial multiplication
instruction that is part of the ARMv8 Crypto Extensions. It is based
on the paper "Fast Software Polynomial Multiplication on ARM Processors
Using the NEON Engine" by Danilo Camara, Conrado Gouvea, Julio Lopez and
Ricardo Dahab (https://hal.inria.fr/hal-01506572), but has been reworked
extensively for the AArch64 ISA.

On a low-end core such as the Cortex-A53 found in the Raspberry Pi3, the
NEON based implementation is 4x faster than the table based one, and
is time invariant as well, making it less vulnerable to timing attacks.
When combined with the bit-sliced NEON implementation of AES-CTR, the
AES-GCM performance increases by 2x (from 58 to 29 cycles per byte).

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


crypto: arm64/gcm - implement native driver using v8 Crypto Extensions · 537c1445

Authored by Ard Biesheuvel on Jul 24, 2017

Currently, the AES-GCM implementation for arm64 systems that support the
ARMv8 Crypto Extensions is based on the generic GCM module, which combines
the AES-CTR implementation using AES instructions with the PMULL based
GHASH driver. This is suboptimal, given the fact that the input data needs
to be loaded twice, once for the encryption and again for the MAC
calculation.

On Cortex-A57 (r1p2) and other recent cores that implement micro-op fusing
for the AES instructions, AES executes at less than 1 cycle per byte, which
means that any cycles wasted on loading the data twice hurt even more.

So implement a new GCM driver that combines the AES and PMULL instructions
at the block level. This improves performance on Cortex-A57 by ~37% (from
3.5 cpb to 2.6 cpb).

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

21 Oct, 2016 · 1 commit

crypto: arm64/ghash-ce - fix for big endian · 9c433ad5

Authored by Ard Biesheuvel on Oct 11, 2016

The GHASH key and digest are both pairs of 64-bit quantities, but the
GHASH code does not always refer to them as such, causing failures when
built for big endian. So replace the 16x1 loads and stores with 2x8 ones.

Fixes: b913a640 ("arm64/crypto: improve performance of GHASH algorithm")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

18 Jun, 2014 · 1 commit

arm64/crypto: improve performance of GHASH algorithm · b913a640

Authored by Ard Biesheuvel on Jun 16, 2014

This patch modifies the GHASH secure hash implementation to switch to a
faster, polynomial multiplication based reduction instead of one that uses
shifts and rotates.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


15 May, 2014 · 1 commit

arm64/crypto: GHASH secure hash using ARMv8 Crypto Extensions · fdd23894

Authored by Ard Biesheuvel on Mar 26, 2014

This is a port to ARMv8 (Crypto Extensions) of the Intel implementation of the
GHASH Secure Hash (used in the Galois/Counter chaining mode). It relies on the
optional PMULL/PMULL2 instruction (polynomial multiply long, what Intel call
carry-less multiply).

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
