1. 07 Aug, 2018 1 commit
    • crypto: arm64 - revert NEON yield for fast AEAD implementations · f10dc56c
      Committed by Ard Biesheuvel
      As it turns out, checking the TIF_NEED_RESCHED flag after each
      iteration results in a significant performance regression (~10%)
      when running fast algorithms (i.e., ones that use special instructions
      and operate in the < 4 cycles per byte range) on in-order cores with
      comparatively slow memory accesses such as the Cortex-A53.
      
      Given the speed of these ciphers, and the fact that the page-based
      nature of the AEAD scatterwalk API guarantees that the core NEON
      transform is never invoked with more than a single page's worth of
      input, we can estimate the worst case duration of any resulting
      scheduling blackout: on a 1 GHz Cortex-A53 running with 64k pages,
      processing a page's worth of input at 4 cycles per byte results in
      a delay of ~250 us, which is a reasonable upper bound.
      
      So let's remove the yield checks from the fused AES-CCM and AES-GCM
      routines entirely.
      
      This reverts commit 7b67ae4d and
      partially reverts commit 7c50136a.
      
      Fixes: 7c50136a ("crypto: arm64/aes-ghash - yield NEON after every ...")
      Fixes: 7b67ae4d ("crypto: arm64/aes-ccm - yield NEON after every ...")
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  2. 12 May, 2018 1 commit
  3. 04 Aug, 2017 2 commits
    • crypto: arm64/ghash - add NEON accelerated fallback for 64-bit PMULL · 03c9a333
      Committed by Ard Biesheuvel
      Implement a NEON fallback for systems that do support NEON but have
      no support for the optional 64x64->128 polynomial multiplication
      instruction that is part of the ARMv8 Crypto Extensions. It is based
      on the paper "Fast Software Polynomial Multiplication on ARM Processors
      Using the NEON Engine" by Danilo Camara, Conrado Gouvea, Julio Lopez and
      Ricardo Dahab (https://hal.inria.fr/hal-01506572), but has been reworked
      extensively for the AArch64 ISA.
      
      On a low-end core such as the Cortex-A53 found in the Raspberry Pi 3, the
      NEON-based implementation is 4x faster than the table-based one, and
      is time invariant as well, making it less vulnerable to timing attacks.
      When combined with the bit-sliced NEON implementation of AES-CTR, the
      AES-GCM performance increases by 2x (from 58 to 29 cycles per byte).
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: arm64/gcm - implement native driver using v8 Crypto Extensions · 537c1445
      Committed by Ard Biesheuvel
      Currently, the AES-GCM implementation for arm64 systems that support the
      ARMv8 Crypto Extensions is based on the generic GCM module, which combines
      the AES-CTR implementation using AES instructions with the PMULL based
      GHASH driver. This is suboptimal, given that the input data needs
      to be loaded twice: once for the encryption and again for the MAC
      calculation.
      
      On Cortex-A57 (r1p2) and other recent cores that implement micro-op fusing
      for the AES instructions, AES executes at less than 1 cycle per byte, which
      means that any cycles wasted on loading the data twice hurt even more.
      
      So implement a new GCM driver that combines the AES and PMULL instructions
      at the block level. This improves performance on Cortex-A57 by ~37% (from
      3.5 cpb to 2.6 cpb).
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  4. 21 Oct, 2016 1 commit
  5. 18 Jun, 2014 1 commit
  6. 15 May, 2014 1 commit