1. 20 6月, 2014 2 次提交
  2. 21 3月, 2014 1 次提交
    • C
      crypto: sha - SHA1 transform x86_64 AVX2 · 7c1da8d0
      chandramouli narayanan 提交于
      This git patch adds x86_64 AVX2 optimization of SHA1
      transform to crypto support. The patch has been tested with 3.14.0-rc1
      kernel.
      
      On a Haswell desktop, with turbo disabled and all cpus running
      at maximum frequency, tcrypt shows AVX2 performance improvement
      from 3% for 256 bytes update to 16% for 1024 bytes update over
      AVX implementation.
      
      This patch adds sha1_avx2_transform(), the glue, build and
      configuration changes needed for AVX2 optimization of
      SHA1 transform to crypto support.
      
      sha1-ssse3 is one module which adds the necessary optimization
      support (SSSE3/AVX/AVX2) for the low-level SHA1 transform function.
      With better optimization support, transform function is overridden
      as the case may be. In the case of AVX2, due to performance reasons
      across datablock sizes, the AVX or AVX2 transform function is used
      at run-time as it suits best. The Makefile change therefore appends
      the necessary objects to the linkage. Due to this, the patch merely
      appends AVX2 transform to the existing build mix and Kconfig support
      and leaves the configuration build support as is.
      Signed-off-by: NChandramouli Narayanan <mouli@linux.intel.com>
      Reviewed-by: NMarek Vasut <marex@denx.de>
      Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      7c1da8d0
  3. 26 10月, 2013 1 次提交
  4. 05 10月, 2013 1 次提交
    • A
      ARM: add support for bit sliced AES using NEON instructions · e4e7f10b
      Ard Biesheuvel 提交于
      Bit sliced AES gives around 45% speedup on Cortex-A15 for encryption
      and around 25% for decryption. This implementation of the AES algorithm
      does not rely on any lookup tables so it is believed to be invulnerable
      to cache timing attacks.
      
      This algorithm processes up to 8 blocks in parallel in constant time. This
      means that it is not usable by chaining modes that are strictly sequential
      in nature, such as CBC encryption. CBC decryption, however, can benefit from
      this implementation and runs about 25% faster. The other chaining modes
      implemented in this module, XTS and CTR, can execute fully in parallel in
      both directions.
      
      The core code has been adopted from the OpenSSL project (in collaboration
      with the original author, on cc). For ease of maintenance, this version is
      identical to the upstream OpenSSL code, i.e., all modifications that were
      required to make it suitable for inclusion into the kernel have been made
      upstream. The original can be found here:
      
          http://git.openssl.org/gitweb/?p=openssl.git;a=commit;h=6f6a6130
      
      Note to integrators:
      While this implementation is significantly faster than the existing table
      based ones (generic or ARM asm), especially in CTR mode, the effects on
      power efficiency are unclear as of yet. This code does fundamentally more
      work, by calculating values that the table based code obtains by a simple
      lookup; only by doing all of that work in a SIMD fashion, it manages to
      perform better.
      
      Cc: Andy Polyakov <appro@openssl.org>
      Acked-by: NNicolas Pitre <nico@linaro.org>
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      e4e7f10b
  5. 24 9月, 2013 2 次提交
  6. 07 9月, 2013 1 次提交
  7. 24 7月, 2013 1 次提交
  8. 10 7月, 2013 1 次提交
  9. 21 6月, 2013 2 次提交
  10. 05 6月, 2013 2 次提交
  11. 24 5月, 2013 1 次提交
  12. 20 5月, 2013 1 次提交
  13. 25 4月, 2013 10 次提交
  14. 26 2月, 2013 1 次提交
  15. 20 1月, 2013 1 次提交
  16. 12 1月, 2013 1 次提交
  17. 10 1月, 2013 1 次提交
  18. 06 12月, 2012 1 次提交
  19. 09 11月, 2012 1 次提交
    • J
      crypto: camellia - add AES-NI/AVX/x86_64 assembler implementation of camellia cipher · d9b1d2e7
      Jussi Kivilinna 提交于
      This patch adds AES-NI/AVX/x86_64 assembler implementation of Camellia block
      cipher. Implementation process data in sixteen block chunks, which are
      byte-sliced and AES SubBytes is reused for Camellia s-box with help of pre-
      and post-filtering.
      
      Patch has been tested with tcrypt and automated filesystem tests.
      
      tcrypt test results:
      
      Intel Core i5-2450M:
      
      camellia-aesni-avx vs camellia-asm-x86_64-2way:
      128bit key:                                             (lrw:256bit)    (xts:256bit)
      size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
      16B     0.98x   0.96x   0.99x   0.96x   0.96x   0.95x   0.95x   0.94x   0.97x   0.98x
      64B     0.99x   0.98x   1.00x   0.98x   0.98x   0.99x   0.98x   0.93x   0.99x   0.98x
      256B    2.28x   2.28x   1.01x   2.29x   2.25x   2.24x   1.96x   1.97x   1.91x   1.90x
      1024B   2.57x   2.56x   1.00x   2.57x   2.51x   2.53x   2.19x   2.17x   2.19x   2.22x
      8192B   2.49x   2.49x   1.00x   2.53x   2.48x   2.49x   2.17x   2.17x   2.22x   2.22x
      
      256bit key:                                             (lrw:384bit)    (xts:512bit)
      size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
      16B     0.97x   0.98x   0.99x   0.97x   0.97x   0.96x   0.97x   0.98x   0.98x   0.99x
      64B     1.00x   1.00x   1.01x   0.99x   0.98x   0.99x   0.99x   0.99x   0.99x   0.99x
      256B    2.37x   2.37x   1.01x   2.39x   2.35x   2.33x   2.10x   2.11x   1.99x   2.02x
      1024B   2.58x   2.60x   1.00x   2.58x   2.56x   2.56x   2.28x   2.29x   2.28x   2.29x
      8192B   2.50x   2.52x   1.00x   2.56x   2.51x   2.51x   2.24x   2.25x   2.26x   2.29x
      Signed-off-by: NJussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      d9b1d2e7
  20. 15 10月, 2012 1 次提交
    • T
      crypto: crc32c - Optimize CRC32C calculation with PCLMULQDQ instruction · 6a8ce1ef
      Tim Chen 提交于
      This patch adds the crc_pcl function that calculates CRC32C checksum using the
      PCLMULQDQ instruction on processors that support this feature. This will
      provide speedup over using CRC32 instruction only.
      The usage of PCLMULQDQ necessitate the invocation of kernel_fpu_begin and
      kernel_fpu_end and incur some overhead.  So the new crc_pcl function is only
      invoked for buffer size of 512 bytes or more.  Larger sized
      buffers will expect to see greater speedup.  This feature is best used coupled
      with eager_fpu which reduces the kernel_fpu_begin/end overhead.  For
      buffer size of 1K the speedup is around 1.6x and for buffer size greater than
      4K, the speedup is around 3x compared to original implementation in crc32c-intel
      module. Test was performed on Sandy Bridge based platform with constant frequency
      set for cpu.
      
      A white paper detailing the algorithm can be found here:
      http://download.intel.com/design/intarch/papers/323405.pdfSigned-off-by: NTim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      6a8ce1ef
  21. 08 10月, 2012 1 次提交
    • D
      KEYS: Implement asymmetric key type · 964f3b3b
      David Howells 提交于
      Create a key type that can be used to represent an asymmetric key type for use
      in appropriate cryptographic operations, such as encryption, decryption,
      signature generation and signature verification.
      
      The key type is "asymmetric" and can provide access to a variety of
      cryptographic algorithms.
      
      Possibly, this would be better as "public_key" - but that has the disadvantage
      that "public key" is an overloaded term.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      964f3b3b
  22. 03 10月, 2012 1 次提交
  23. 07 9月, 2012 1 次提交
  24. 29 8月, 2012 1 次提交
  25. 26 8月, 2012 1 次提交
  26. 23 8月, 2012 2 次提交