Commits · b4df50de6ab66e41b3b8d8acf3ce45c632084163 · openanolis / cloud-kernel

25 Aug, 2018 1 commit

crypto: arm64/aes-gcm-ce - fix scatterwalk API violation · c2b24c36

Committed by Ard Biesheuvel on Aug 20, 2018

Commit 71e52c27 ("crypto: arm64/aes-ce-gcm - operate on
two input blocks at a time") modified the granularity at which
the AES/GCM code processes its input to allow subsequent changes
to be applied that improve performance by using aggregation to
process multiple input blocks at once.

For this reason, it doubled the algorithm's 'chunksize' property
to 2 x AES_BLOCK_SIZE, but retained the non-SIMD fallback path that
processes a single block at a time. In some cases, this violates the
skcipher scatterwalk API, by calling skcipher_walk_done() with a
non-zero residue value for a chunk that is expected to be handled
in its entirety. This causes the TLS self test code to hit a WARN_ON(),
and is likely to break other use cases as well.
Unfortunately, none of the current test cases exercises this exact
code path at the moment.

Fixes: 71e52c27 ("crypto: arm64/aes-ce-gcm - operate on two ...")
Reported-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
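The mismatch described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel's scatterwalk code: `must_consume_fully()`, `reported_residue()` and `violates_contract()` are hypothetical helpers, and the rule that a span shorter than the chunk size must be finished in one go is a simplification of "a chunk that is expected to be handled in its entirety".

```c
#include <stdbool.h>
#include <stddef.h>

#define AES_BLOCK_SIZE 16u
#define CHUNK_SIZE     (2u * AES_BLOCK_SIZE)   /* the doubled 'chunksize' */

/*
 * Toy rule (an assumption, not the kernel's code): a span shorter than
 * CHUNK_SIZE can only be the tail of the request, so the consumer must
 * finish it and report a residue of zero.
 */
static bool must_consume_fully(size_t span)
{
    return span < CHUNK_SIZE;
}

/*
 * Residue reported by a hypothetical consumer loop that works in units of
 * `granule` bytes: it processes as many whole granules as fit and hands the
 * remainder back. Spans shorter than one granule never enter that loop and
 * are finished by separate tail handling, so they report zero.
 */
static size_t reported_residue(size_t span, size_t granule)
{
    return span >= granule ? span % granule : 0;
}

/* True when the consumer hands back a residue for a span it had to finish. */
static bool violates_contract(size_t span, size_t granule)
{
    return must_consume_fully(span) && reported_residue(span, granule) != 0;
}
```

In this model, a 20-byte final span handled at single-block granularity reports a residue of 4 and so `violates_contract(20, AES_BLOCK_SIZE)` is true, whereas a loop that works at `CHUNK_SIZE` granularity never reports a residue for such a span.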

07 Aug, 2018 5 commits

crypto: arm64/ghash-ce - implement 4-way aggregation · 22240df7

Committed by Ard Biesheuvel on Aug 04, 2018

Enhance the GHASH implementation that uses 64-bit polynomial
multiplication by adding support for 4-way aggregation. This
more than doubles the performance, from 2.4 cycles per byte
to 1.1 cpb on Cortex-A53.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: arm64/ghash-ce - replace NEON yield check with block limit · 8e492eff

Committed by Ard Biesheuvel on Aug 04, 2018

Checking the TIF_NEED_RESCHED flag is disproportionately costly on cores
with fast crypto instructions and comparatively slow memory accesses.

On algorithms such as GHASH, which executes at ~1 cycle per byte on
cores that implement support for 64 bit polynomial multiplication,
there is really no need to check the TIF_NEED_RESCHED flag particularly
often, and so we can remove the NEON yield check from the assembler
routines.

However, unlike the AEAD or skcipher APIs, the shash/ahash APIs take
arbitrary input lengths, and so there needs to be some sanity check
to ensure that we don't hog the CPU for excessive amounts of time.

So let's simply cap the maximum input size that is processed in one go
to 64 KB.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
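The capping idea can be sketched in plain C as a loop that never hands more than 64 KB to the worker in one call. This is a minimal userspace illustration, not the driver's code: `update_capped()`, `process_fn` and `tally_len()` are hypothetical names, and the comment about the SIMD region reflects an assumption about where the kernel would bracket each call.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_BYTES_PER_CALL (64u * 1024u)   /* 64 KB cap from the commit message */

/* Hypothetical worker signature standing in for the assembler routine. */
typedef void (*process_fn)(void *state, const uint8_t *data, size_t len);

/*
 * Feed `len` bytes to `fn` in slices of at most MAX_BYTES_PER_CALL.
 * Returns the number of calls made, so the caller can see the slicing.
 */
static size_t update_capped(void *state, const uint8_t *data, size_t len,
                            process_fn fn)
{
    size_t calls = 0;

    while (len > 0) {
        size_t n = len < MAX_BYTES_PER_CALL ? len : MAX_BYTES_PER_CALL;

        /* Assumption: in the kernel, the SIMD region would be entered and
         * left around each call, giving the scheduler a chance to run. */
        fn(state, data, n);
        data += n;
        len -= n;
        calls++;
    }
    return calls;
}

/* Example worker: only adds up how many bytes it was given. */
static void tally_len(void *state, const uint8_t *data, size_t len)
{
    (void)data;
    *(size_t *)state += len;
}
```

With a 150000-byte input, `update_capped()` calls the worker three times (two full 64 KB slices and one shorter tail), so the time spent in any single call is bounded regardless of the total input length.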

crypto: arm64/aes-ce-gcm - don't reload key schedule if avoidable · 30f1a9f5

Committed by Ard Biesheuvel on Jul 30, 2018

Squeeze out another 5% of performance by minimizing the number
of invocations of kernel_neon_begin()/kernel_neon_end() on the
common path, which also allows some reloads of the key schedule
to be optimized away.

The resulting code runs at 2.3 cycles per byte on a Cortex-A53.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: arm64/aes-ce-gcm - implement 2-way aggregation · e0bd888d

Committed by Ard Biesheuvel on Jul 30, 2018

Implement a faster version of the GHASH transform which amortizes
the reduction modulo the characteristic polynomial across two
input blocks at a time.

On a Cortex-A53, the gcm(aes) performance increases 24%, from
3.0 cycles per byte to 2.4 cpb for large input sizes.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
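Why one reduction can serve two blocks is easiest to see in a toy field. The sketch below uses GF(2^8) with the AES polynomial as a stand-in for the 128-bit GHASH field, which is an assumption made purely to keep the example short; the function names are hypothetical and nothing here is taken from the driver. It relies on reduction being linear over XOR, so the two unreduced products for (Y ^ X1) * H^2 and X2 * H can be summed first and reduced once.

```c
#include <stdint.h>

#define POLY 0x11bu   /* x^8 + x^4 + x^3 + x + 1: toy stand-in for the GHASH polynomial */

/* Carry-less multiply of two 8-bit polynomials; the result is NOT reduced. */
static uint16_t clmul8(uint8_t a, uint8_t b)
{
    uint16_t r = 0;

    for (int i = 0; i < 8; i++)
        if (b & (1u << i))
            r ^= (uint16_t)((uint16_t)a << i);
    return r;
}

/* Reduce a polynomial of degree <= 14 modulo POLY. */
static uint8_t reduce(uint16_t v)
{
    for (int bit = 14; bit >= 8; bit--)
        if (v & (1u << bit))
            v ^= (uint16_t)(POLY << (bit - 8));
    return (uint8_t)v;
}

static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    return reduce(clmul8(a, b));
}

/* One block at a time: Y = (Y ^ X) * H, costing two reductions for two blocks. */
static uint8_t hash2_serial(uint8_t y, uint8_t x1, uint8_t x2, uint8_t h)
{
    y = gf_mul((uint8_t)(y ^ x1), h);
    y = gf_mul((uint8_t)(y ^ x2), h);
    return y;
}

/* 2-way aggregation: Y = (Y ^ X1) * H^2 ^ X2 * H, with a single reduction.
 * h2 must be H squared in the field, i.e. gf_mul(h, h). */
static uint8_t hash2_aggregated(uint8_t y, uint8_t x1, uint8_t x2,
                                uint8_t h, uint8_t h2)
{
    uint16_t acc = (uint16_t)(clmul8((uint8_t)(y ^ x1), h2) ^ clmul8(x2, h));

    return reduce(acc);
}
```

Both functions return the same value for every input as long as `h2` is precomputed as `gf_mul(h, h)`; the aggregated form trades an extra stored power of H for one fewer reduction per pair of blocks.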

crypto: arm64/aes-ce-gcm - operate on two input blocks at a time · 71e52c27

Committed by Ard Biesheuvel on Jul 30, 2018

Update the core AES/GCM transform and the associated plumbing to operate
on 2 AES/GHASH blocks at a time. By itself, this is not expected to
result in a noticeable speedup, but it paves the way for reimplementing
the GHASH component using 2-way aggregation.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

31 Jul, 2018 1 commit

crypto/arm64: aes-ce-gcm - add missing kernel_neon_begin/end pair · c7513c2a

Committed by Ard Biesheuvel on Jul 27, 2018

Calling pmull_gcm_encrypt_block() requires kernel_neon_begin() and
kernel_neon_end() to be used since the routine touches the NEON
register file. Add the missing calls.

Also, since NEON register contents are not preserved outside of
a kernel mode NEON region, pass the key schedule array again.

Fixes: 7c50136a ("crypto: arm64/aes-ghash - yield NEON after every ...")
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>

09 Jul, 2018 1 commit

crypto: shash - remove useless setting of type flags · e50944e2

Committed by Eric Biggers on Jun 30, 2018

Many shash algorithms set .cra_flags = CRYPTO_ALG_TYPE_SHASH.  But this
is redundant with the C structure type ('struct shash_alg'), and
crypto_register_shash() already sets the type flag automatically,
clearing any type flag that was already there.  Apparently the useless
assignment has just been copy+pasted around.

So, remove the useless assignment from all the shash algorithms.

This patch shouldn't change any actual behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
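The redundancy is easy to see in a userspace model of the registration step described above. This is a sketch only: `struct toy_shash_alg`, `toy_register_shash()` and the `TOY_*` constants are hypothetical stand-ins with illustrative values, not the kernel's definitions.

```c
/* Illustrative flag layout; the values are assumptions for this sketch. */
#define TOY_ALG_TYPE_MASK   0x0000000fu   /* bits that hold the algorithm type */
#define TOY_ALG_TYPE_SHASH  0x0000000eu   /* type value for shash */
#define TOY_ALG_OTHER_FLAG  0x00000100u   /* some unrelated, non-type flag */

struct toy_shash_alg {
    unsigned int cra_flags;
};

/*
 * Model of what the commit message says registration does: clear whatever
 * type bits the driver set, then set the shash type unconditionally.
 * Non-type flags are left untouched.
 */
static void toy_register_shash(struct toy_shash_alg *alg)
{
    alg->cra_flags &= ~TOY_ALG_TYPE_MASK;
    alg->cra_flags |= TOY_ALG_TYPE_SHASH;
}
```

Whether a driver initializes `cra_flags` with the shash type, with no type, or even with a different type value, the flags after `toy_register_shash()` are identical, which is why dropping the assignment changes nothing in this model.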

12 May, 2018 1 commit

crypto: arm64/aes-ghash - yield NEON after every block of input · 7c50136a

Committed by Ard Biesheuvel on Apr 30, 2018

Avoid excessive scheduling delays under a preemptible kernel by
yielding the NEON after every block of input.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

04 Aug, 2017 3 commits

crypto: arm64/ghash - add NEON accelerated fallback for 64-bit PMULL · 03c9a333

Committed by Ard Biesheuvel on Jul 24, 2017

Implement a NEON fallback for systems that do support NEON but have
no support for the optional 64x64->128 polynomial multiplication
instruction that is part of the ARMv8 Crypto Extensions. It is based
on the paper "Fast Software Polynomial Multiplication on ARM Processors
Using the NEON Engine" by Danilo Camara, Conrado Gouvea, Julio Lopez and
Ricardo Dahab (https://hal.inria.fr/hal-01506572), but has been reworked
extensively for the AArch64 ISA.

On a low-end core such as the Cortex-A53 found in the Raspberry Pi3, the
NEON based implementation is 4x faster than the table based one, and
is time invariant as well, making it less vulnerable to timing attacks.
When combined with the bit-sliced NEON implementation of AES-CTR, the
AES-GCM performance increases by 2x (from 58 to 29 cycles per byte).

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: arm64/gcm - implement native driver using v8 Crypto Extensions · 537c1445

Committed by Ard Biesheuvel on Jul 24, 2017

Currently, the AES-GCM implementation for arm64 systems that support the
ARMv8 Crypto Extensions is based on the generic GCM module, which combines
the AES-CTR implementation using AES instructions with the PMULL based
GHASH driver. This is suboptimal, given the fact that the input data needs
to be loaded twice, once for the encryption and again for the MAC
calculation.

On Cortex-A57 (r1p2) and other recent cores that implement micro-op fusing
for the AES instructions, AES executes at less than 1 cycle per byte, which
means that any cycles wasted on loading the data twice hurt even more.

So implement a new GCM driver that combines the AES and PMULL instructions
at the block level. This improves performance on Cortex-A57 by ~37% (from
3.5 cpb to 2.6 cpb).

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: arm64/ghash-ce - add non-SIMD scalar fallback · 6d6254d7

Committed by Ard Biesheuvel on Jul 24, 2017

The arm64 kernel will shortly disallow nested kernel mode NEON, so
add a fallback to scalar C code that can be invoked in that case.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

18 Jun, 2014 2 commits

arm64/crypto: improve performance of GHASH algorithm · b913a640

Committed by Ard Biesheuvel on Jun 16, 2014

This patch modifies the GHASH secure hash implementation to switch to a
faster, polynomial multiplication based reduction instead of one that uses
shifts and rotates.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

arm64/crypto: fix data corruption bug in GHASH algorithm · 6aa8b209

Committed by Ard Biesheuvel on Jun 16, 2014

This fixes a bug in the GHASH algorithm that causes the calculated hash to be
incorrect if the input is presented in chunks whose size is not a multiple of
16 bytes.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Fixes: fdd23894 ("arm64/crypto: GHASH secure hash using ARMv8 Crypto Extensions")
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

15 May, 2014 1 commit

arm64/crypto: GHASH secure hash using ARMv8 Crypto Extensions · fdd23894

Committed by Ard Biesheuvel on Mar 26, 2014

This is a port to ARMv8 (Crypto Extensions) of the Intel implementation of the
GHASH Secure Hash (used in the Galois/Counter chaining mode). It relies on the
optional PMULL/PMULL2 instruction (polynomial multiply long, what Intel call
carry-less multiply).

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

openanolis / cloud-kernel synced successfully almost 2 years ago
