1. 07 Dec, 2016 1 commit
  2. 11 May, 2015 1 commit
    • crypto: arm/sha512 - accelerated SHA-512 using ARM generic ASM and NEON · c80ae7ca
      Committed by Ard Biesheuvel
      This replaces the SHA-512 NEON module with the faster and more
      versatile implementation from the OpenSSL project. It consists
      of both a NEON and a generic ASM version of the core SHA-512
      transform, where the NEON version reverts to the ASM version
      when invoked in non-process context.
      
      This patch is based on the OpenSSL upstream version b1a5d1c65208
      of sha512-armv4.pl, which can be found here:
      
        https://git.openssl.org/gitweb/?p=openssl.git;h=b1a5d1c65208
      
      Performance relative to the generic implementation (measured
      using tcrypt.ko mode=306 sec=1 running on a Cortex-A57 under
      KVM):
      
        input size	block size	asm	neon	old neon
      
        16		16		1.39	2.54	2.21
        64		16		1.32	2.33	2.09
        64		64		1.38	2.53	2.19
        256		16		1.31	2.28	2.06
        256		64		1.38	2.54	2.25
        256		256		1.40	2.77	2.39
        1024		16		1.29	2.22	2.01
        1024		256		1.40	2.82	2.45
        1024		1024		1.41	2.93	2.53
        2048		16		1.33	2.21	2.00
        2048		256		1.40	2.84	2.46
        2048		1024		1.41	2.96	2.55
        2048		2048		1.41	2.98	2.56
        4096		16		1.34	2.20	1.99
        4096		256		1.40	2.84	2.46
        4096		1024		1.41	2.97	2.56
        4096		4096		1.41	3.01	2.58
        8192		16		1.34	2.19	1.99
        8192		256		1.40	2.85	2.47
        8192		1024		1.41	2.98	2.56
        8192		4096		1.41	2.71	2.59
        8192		8192		1.51	3.51	2.69
      Acked-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  3. 13 Apr, 2015 1 commit
    • crypto: arm - workaround for building with old binutils · 3abafaf2
      Committed by Ard Biesheuvel
      Old versions of binutils (before 2.23) do not yet understand the
      crypto-neon-fp-armv8 fpu instructions, and an attempt to build these
      files results in a build failure:
      
      arch/arm/crypto/aes-ce-core.S:133: Error: selected processor does not support ARM mode `vld1.8 {q10-q11},[ip]!'
      arch/arm/crypto/aes-ce-core.S:133: Error: bad instruction `aese.8 q0,q8'
      arch/arm/crypto/aes-ce-core.S:133: Error: bad instruction `aesmc.8 q0,q0'
      arch/arm/crypto/aes-ce-core.S:133: Error: bad instruction `aese.8 q0,q9'
      arch/arm/crypto/aes-ce-core.S:133: Error: bad instruction `aesmc.8 q0,q0'
      
      Since the affected versions are still in widespread use, and this breaks
      'allmodconfig' builds, we should try to at least get a successful kernel
      build. Unfortunately, I could not come up with a way to make the Kconfig
      symbol depend on the binutils version, which would be the nicest solution.
      
      Instead, this patch uses the 'as-instr' Kbuild macro to find out whether
      the support is present in the assembler, and otherwise emits a non-fatal
      warning indicating which selected modules could not be built.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Link: http://storage.kernelci.org/next/next-20150410/arm-allmodconfig/build.log
      Fixes: 864cbeed ("crypto: arm - add support for SHA1 using ARMv8 Crypto Instructions")
      [ard.biesheuvel:
       - omit modules entirely instead of building empty ones if binutils is too old
       - update commit log accordingly]
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
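
The probing approach described in the commit message above (ask the assembler whether it accepts an instruction, then choose one of two outcomes) can be sketched outside of Kbuild. This is only an illustration in Python, not the Kbuild macro itself: the function name `as_instr`, its parameters, and the default `cc -c -x assembler` command line are assumptions made for the example.

```python
import os
import subprocess


def as_instr(instruction, if_ok, if_not,
             assembler=("cc", "-c", "-x", "assembler")):
    """Return if_ok when the toolchain assembles `instruction`, else if_not.

    The default `assembler` command is an assumption for this sketch; a
    cross build would pass its own compiler driver and flags instead.
    """
    try:
        result = subprocess.run(
            [*assembler, "-o", os.devnull, "-"],  # read the source from stdin
            input=instruction + "\n",
            text=True,
            capture_output=True,
        )
    except OSError:
        # The toolchain could not be started at all: treat as "unsupported".
        return if_not
    return if_ok if result.returncode == 0 else if_not


if __name__ == "__main__":
    # Hypothetical use: decide whether a module list should include the
    # Crypto Extensions code, and warn (non-fatally) when it cannot.
    verdict = as_instr(".fpu crypto-neon-fp-armv8", "build", "skip")
    if verdict == "skip":
        print("warning: assembler lacks crypto-neon-fp-armv8 support")
```

A failed probe selects the fallback value rather than aborting, which matches the non-fatal warning the commit describes.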
  4. 03 Apr, 2015 1 commit
    • crypto: arm/sha256 - Add optimized SHA-256/224 · f2f770d7
      Committed by Sami Tolvanen
      Add Andy Polyakov's optimized assembly and NEON implementations for
      SHA-256/224.
      
      The sha256-armv4.pl script for generating the assembly code is from
      OpenSSL commit 51f8d095562f36cdaa6893597b5c609e943b0565.
      
      Compared to sha256-generic these implementations have the following
      tcrypt speed improvements on Motorola Nexus 6 (Snapdragon 805):
      
        bs    b/u      sha256-neon  sha256-asm
        16    16       x1.32        x1.19
        64    16       x1.27        x1.15
        64    64       x1.36        x1.20
        256   16       x1.22        x1.11
        256   64       x1.36        x1.19
        256   256      x1.59        x1.23
        1024  16       x1.21        x1.10
        1024  256      x1.65        x1.23
        1024  1024     x1.76        x1.25
        2048  16       x1.21        x1.10
        2048  256      x1.66        x1.23
        2048  1024     x1.78        x1.25
        2048  2048     x1.79        x1.25
        4096  16       x1.20        x1.09
        4096  256      x1.66        x1.23
        4096  1024     x1.79        x1.26
        4096  4096     x1.82        x1.26
        8192  16       x1.20        x1.09
        8192  256      x1.67        x1.23
        8192  1024     x1.80        x1.26
        8192  4096     x1.85        x1.28
        8192  8192     x1.85        x1.27
      
      Where bs refers to block size and b/u to bytes per update.
      Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
      Cc: Andy Polyakov <appro@openssl.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
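
To make the "bytes per update" column in the table above concrete: the same input can be fed to a hash in pieces of different sizes, and only the number of update calls changes, not the digest. The sketch below uses Python's `hashlib` in userspace purely as an illustration; it is not the kernel's tcrypt harness, and the helper name `digest_in_updates` is made up for this example.

```python
import hashlib


def digest_in_updates(data: bytes, bytes_per_update: int) -> str:
    """Hash `data` with SHA-256, feeding `bytes_per_update` bytes per call."""
    h = hashlib.sha256()
    for offset in range(0, len(data), bytes_per_update):
        h.update(data[offset:offset + bytes_per_update])
    return h.hexdigest()


if __name__ == "__main__":
    data = bytes(8192)                      # an 8192-byte input, as in the last rows
    small = digest_in_updates(data, 16)     # 512 update calls
    large = digest_in_updates(data, 8192)   # a single update call
    print(small == large)
```

Smaller updates mean more per-call overhead for the same amount of hashing, which is consistent with the lower speedups the table reports for 16-byte updates.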
  5. 12 Mar, 2015 4 commits
  6. 02 Aug, 2014 2 commits
  7. 05 Oct, 2013 1 commit
    • ARM: add support for bit sliced AES using NEON instructions · e4e7f10b
      Committed by Ard Biesheuvel
      Bit sliced AES gives around 45% speedup on Cortex-A15 for encryption
      and around 25% for decryption. This implementation of the AES algorithm
      does not rely on any lookup tables so it is believed to be invulnerable
      to cache timing attacks.
      
      This algorithm processes up to 8 blocks in parallel in constant time. This
      means that it is not usable by chaining modes that are strictly sequential
      in nature, such as CBC encryption. CBC decryption, however, can benefit from
      this implementation and runs about 25% faster. The other chaining modes
      implemented in this module, XTS and CTR, can execute fully in parallel in
      both directions.
      
      The core code has been adopted from the OpenSSL project (in collaboration
      with the original author, on cc). For ease of maintenance, this version is
      identical to the upstream OpenSSL code, i.e., all modifications that were
      required to make it suitable for inclusion into the kernel have been made
      upstream. The original can be found here:
      
          http://git.openssl.org/gitweb/?p=openssl.git;a=commit;h=6f6a6130
      
      Note to integrators:
      While this implementation is significantly faster than the existing table
      based ones (generic or ARM asm), especially in CTR mode, the effects on
      power efficiency are unclear as of yet. This code does fundamentally more
      work, by calculating values that the table based code obtains by a simple
      lookup; only by doing all of that work in a SIMD fashion, it manages to
      perform better.
      
      Cc: Andy Polyakov <appro@openssl.org>
      Acked-by: Nicolas Pitre <nico@linaro.org>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
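
The remark in the commit message above, that CBC encryption is strictly sequential while CBC decryption can process blocks in parallel, follows from the data dependencies of the mode. The Python sketch below shows only those dependencies. The block function is a deliberately trivial placeholder (XOR with the key, then byte reversal), not AES and not secure, and all names are invented for the example.

```python
BLOCK = 16


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Placeholder block function, NOT AES: XOR with the key, then reverse."""
    return xor(block, key)[::-1]


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    """Inverse of toy_encrypt_block."""
    return xor(block[::-1], key)


def cbc_encrypt(key, iv, blocks):
    # Each ciphertext block feeds into the next one, so the loop cannot be
    # reordered: block i needs the finished output of block i - 1.
    out, prev = [], iv
    for plain in blocks:
        prev = toy_encrypt_block(key, xor(plain, prev))
        out.append(prev)
    return out


def cbc_decrypt(key, iv, blocks):
    # Every plaintext block depends only on ciphertext that is already
    # known, so all block decryptions are independent of each other and
    # could be done several at a time.
    prevs = [iv] + blocks[:-1]
    return [xor(toy_decrypt_block(key, c), p) for c, p in zip(blocks, prevs)]


if __name__ == "__main__":
    key, iv = bytes(range(BLOCK)), bytes(BLOCK)
    plain = [bytes([i]) * BLOCK for i in range(8)]   # 8 blocks
    cipher = cbc_encrypt(key, iv, plain)
    print(cbc_decrypt(key, iv, cipher) == plain)
```

In `cbc_decrypt` the list comprehension could be evaluated in any order, which is the property an 8-blocks-at-a-time implementation can exploit; `cbc_encrypt` has no such freedom.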
  8. 07 Sep, 2012 1 commit