1. 20 Jan, 2013 14 commits
  2. 08 Jan, 2013 1 commit
  3. 06 Dec, 2012 1 commit
  4. 09 Nov, 2012 2 commits
    • crypto: camellia - add AES-NI/AVX/x86_64 assembler implementation of camellia cipher · d9b1d2e7
      Jussi Kivilinna committed
      This patch adds an AES-NI/AVX/x86_64 assembler implementation of the
      Camellia block cipher. The implementation processes data in sixteen-block
      chunks, which are byte-sliced, and reuses AES SubBytes for the Camellia
      s-box with the help of pre- and post-filtering.
      
      Patch has been tested with tcrypt and automated filesystem tests.
      
      tcrypt test results:
      
      Intel Core i5-2450M:
      
      camellia-aesni-avx vs camellia-asm-x86_64-2way:
      128bit key:                                             (lrw:256bit)    (xts:256bit)
      size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
      16B     0.98x   0.96x   0.99x   0.96x   0.96x   0.95x   0.95x   0.94x   0.97x   0.98x
      64B     0.99x   0.98x   1.00x   0.98x   0.98x   0.99x   0.98x   0.93x   0.99x   0.98x
      256B    2.28x   2.28x   1.01x   2.29x   2.25x   2.24x   1.96x   1.97x   1.91x   1.90x
      1024B   2.57x   2.56x   1.00x   2.57x   2.51x   2.53x   2.19x   2.17x   2.19x   2.22x
      8192B   2.49x   2.49x   1.00x   2.53x   2.48x   2.49x   2.17x   2.17x   2.22x   2.22x
      
      256bit key:                                             (lrw:384bit)    (xts:512bit)
      size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
      16B     0.97x   0.98x   0.99x   0.97x   0.97x   0.96x   0.97x   0.98x   0.98x   0.99x
      64B     1.00x   1.00x   1.01x   0.99x   0.98x   0.99x   0.99x   0.99x   0.99x   0.99x
      256B    2.37x   2.37x   1.01x   2.39x   2.35x   2.33x   2.10x   2.11x   1.99x   2.02x
      1024B   2.58x   2.60x   1.00x   2.58x   2.56x   2.56x   2.28x   2.29x   2.28x   2.29x
      8192B   2.50x   2.52x   1.00x   2.56x   2.51x   2.51x   2.24x   2.25x   2.26x   2.29x
      Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
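Byte-slicing, as mentioned in the commit message above, can be pictured as a transpose: instead of each register holding one 16-byte block, row i holds byte i of all sixteen blocks, so one byte-wise operation (such as an s-box lookup) touches the same position in every block at once. The following is a minimal Python sketch of that layout change only, not the kernel's assembler; `byte_slice` and `byte_unslice` are illustrative names.

```python
BLOCK_SIZE = 16        # Camellia block size in bytes
PARALLEL_BLOCKS = 16   # blocks per chunk, as stated in the commit message


def byte_slice(blocks):
    """Transpose 16 blocks of 16 bytes: row i holds byte i of every block."""
    assert len(blocks) == PARALLEL_BLOCKS
    assert all(len(block) == BLOCK_SIZE for block in blocks)
    return [bytes(block[i] for block in blocks) for i in range(BLOCK_SIZE)]


def byte_unslice(rows):
    """Inverse transpose: rebuild the original blocks from the sliced rows."""
    assert len(rows) == BLOCK_SIZE
    return [bytes(row[j] for row in rows) for j in range(PARALLEL_BLOCKS)]


# Sixteen distinct example blocks (arbitrary test data, not cipher input).
blocks = [
    bytes((16 * j + i) % 256 for i in range(BLOCK_SIZE))
    for j in range(PARALLEL_BLOCKS)
]
rows = byte_slice(blocks)
```

After slicing, `rows[0]` contains the first byte of each of the sixteen blocks, and `byte_unslice(rows)` restores the original layout.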
    • crypto: camellia-x86_64 - share common functions and move structures and function definitions to header file · cf582cce
      Jussi Kivilinna committed
      
      Prepare camellia-x86_64 functions to be reused from AVX/AESNI implementation
      module.
      Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  5. 24 Oct, 2012 5 commits
  6. 19 Oct, 2012 1 commit
  7. 15 Oct, 2012 2 commits
  8. 04 Oct, 2012 1 commit
  9. 27 Sep, 2012 1 commit
  10. 07 Sep, 2012 4 commits
    • crypto: camellia-x86_64 - fix sparse warnings (constant is so big) · 1ffb72a3
      Jussi Kivilinna committed
      Fix "constant 0xXXXXXXXXXXXXXXXX is so big it's unsigned long" sparse warnings.
      Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: cast6-avx - tune assembler code for more performance · c09220e1
      Jussi Kivilinna committed
      Patch replaces 'movb' instructions with 'movzbl' to break false register
      dependencies, interleaves instructions better for out-of-order scheduling
      and merges constant 16-bit rotation with round-key variable rotation.
      
      tcrypt ECB results:
      
      Intel Core i5-2450M:
      
      size    old-vs-new      new-vs-generic  old-vs-generic
              enc     dec     enc     dec     enc     dec
      256     1.13x   1.19x   2.05x   2.17x   1.82x   1.82x
      1k      1.18x   1.21x   2.26x   2.33x   1.93x   1.93x
      8k      1.19x   1.19x   2.32x   2.33x   1.95x   1.95x
      
      [v2]
       - Do instruction interleaving another way to avoid adding new FPU<=>CPU
         register moves as these cause performance drop on Bulldozer.
       - Improvements to round-key variable rotation handling.
       - Further interleaving improvements for better out-of-order scheduling.
      
      Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
      Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
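The rotation merge mentioned in the commit message above relies on the fact that two left rotations of a 32-bit word compose by adding their counts modulo 32, so a fixed 16-bit rotation can be folded into the variable round-key rotation ahead of time. Below is a small sketch of that identity, assuming 32-bit words; `rol32` and `merged_rotation` are illustrative names, not kernel symbols, and the actual assembler is not reproduced here.

```python
MASK32 = 0xFFFFFFFF


def rol32(x, n):
    """Rotate the 32-bit word x left by n bits (n is taken modulo 32)."""
    n &= 31
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def merged_rotation(kr):
    """Fold a constant 16-bit rotation into a variable rotation count kr."""
    return (kr + 16) & 31


# Two separate rotations versus one rotation with the merged count.
x = 0x12345678
two_step = rol32(rol32(x, 5), 16)
one_step = rol32(x, merged_rotation(5))
```

Precomputing the merged count once per round key replaces two rotations per round with one, which is one plausible reading of the saving the commit describes.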
    • crypto: cast5-avx - tune assembler code for more performance · ddaea786
      Jussi Kivilinna committed
      Patch replaces 'movb' instructions with 'movzbl' to break false register
      dependencies, interleaves instructions better for out-of-order scheduling
      and merges constant 16-bit rotation with round-key variable rotation.
      
      tcrypt ECB results (128bit key):
      
      Intel Core i5-2450M:
      
      size    old-vs-new      new-vs-generic  old-vs-generic
              enc     dec     enc     dec     enc     dec
      256     1.18x   1.18x   2.45x   2.47x   2.08x   2.10x
      1k      1.20x   1.20x   2.73x   2.73x   2.28x   2.28x
      8k      1.20x   1.19x   2.73x   2.73x   2.28x   2.29x
      
      [v2]
       - Do instruction interleaving another way to avoid adding new FPU<=>CPU
         register moves as these cause performance drop on Bulldozer.
       - Improvements to round-key variable rotation handling.
       - Further interleaving improvements for better out-of-order scheduling.
      
      Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
      Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: twofish-avx - tune assembler code for more performance · f94a73f8
      Jussi Kivilinna committed
      Patch replaces 'movb' instructions with 'movzbl' to break false register
      dependencies and interleaves instructions better for out-of-order scheduling.
      
      Tested on Intel Core i5-2450M and AMD FX-8100.
      
      tcrypt ECB results:
      
      Intel Core i5-2450M:
      
      size    old-vs-new      new-vs-3way     old-vs-3way
              enc     dec     enc     dec     enc     dec
      256     1.12x   1.13x   1.36x   1.37x   1.21x   1.22x
      1k      1.14x   1.14x   1.48x   1.49x   1.29x   1.31x
      8k      1.14x   1.14x   1.50x   1.52x   1.32x   1.33x
      
      AMD FX-8100:
      
      size    old-vs-new      new-vs-3way     old-vs-3way
              enc     dec     enc     dec     enc     dec
      256     1.10x   1.11x   1.01x   1.01x   0.92x   0.91x
      1k      1.11x   1.12x   1.08x   1.07x   0.97x   0.96x
      8k      1.11x   1.13x   1.10x   1.08x   0.99x   0.97x
      
      [v2]
       - Do instruction interleaving another way to avoid adding new FPU<=>CPU
         register moves as these cause performance drop on Bulldozer.
       - Further interleaving improvements for better out-of-order scheduling.
      Tested-by: Borislav Petkov <bp@alien8.de>
      Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
      Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  11. 20 Aug, 2012 1 commit
    • crypto: aesni_intel - improve lrw and xts performance by utilizing parallel AES-NI hardware pipelines · 023af608
      Jussi Kivilinna committed
      
      Use parallel LRW and XTS encryption facilities to better utilize AES-NI
      hardware pipelines and gain extra performance.
      
      Tcrypt benchmark results (async), old vs new ratios:
      
      Intel Core i5-2450M CPU (fam: 6, model: 42, step: 7)
      
      aes:128bit
              lrw:256bit      xts:256bit
      size    lrw-enc lrw-dec xts-enc xts-dec
      16B     0.99x   1.00x   1.22x   1.19x
      64B     1.38x   1.50x   1.58x   1.61x
      256B    2.04x   2.02x   2.27x   2.29x
      1024B   2.56x   2.54x   2.89x   2.92x
      8192B   2.85x   2.99x   3.40x   3.23x
      
      aes:192bit
              lrw:320bit      xts:384bit
      size    lrw-enc lrw-dec xts-enc xts-dec
      16B     1.08x   1.08x   1.16x   1.17x
      64B     1.48x   1.54x   1.59x   1.65x
      256B    2.18x   2.17x   2.29x   2.28x
      1024B   2.67x   2.67x   2.87x   3.05x
      8192B   2.93x   2.84x   3.28x   3.33x
      
      aes:256bit
              lrw:384bit      xts:512bit
      size    lrw-enc lrw-dec xts-enc xts-dec
      16B     1.07x   1.07x   1.18x   1.19x
      64B     1.56x   1.56x   1.70x   1.71x
      256B    2.22x   2.24x   2.46x   2.46x
      1024B   2.76x   2.77x   3.13x   3.05x
      8192B   2.99x   3.05x   3.40x   3.30x
      
      Cc: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Reviewed-by: Kim Phillips <kim.phillips@freescale.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
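The key sizes shown next to each mode in the tables above follow from how the two modes build their keys: an LRW key is the cipher key followed by a tweak key of one block (128 bits for AES), and an XTS key is two cipher keys of equal size. A quick sketch of that arithmetic follows; the function names are illustrative only.

```python
AES_BLOCK_BITS = 128  # AES block size; LRW appends a tweak key of this size


def lrw_key_bits(aes_key_bits):
    """Total LRW key length: cipher key plus one block of tweak key."""
    return aes_key_bits + AES_BLOCK_BITS


def xts_key_bits(aes_key_bits):
    """Total XTS key length: two cipher keys of the same size."""
    return 2 * aes_key_bits


# Total key lengths for the three AES key sizes benchmarked above.
key_sizes = {
    bits: (lrw_key_bits(bits), xts_key_bits(bits))
    for bits in (128, 192, 256)
}
```

For a 192-bit AES key, for instance, this gives a 320-bit LRW key and a 384-bit XTS key.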
  12. 01 Aug, 2012 3 commits
    • crypto: cast6 - add x86_64/avx assembler implementation · 4ea1277d
      Johannes Goetzfried committed
      This patch adds an x86_64/avx assembler implementation of the Cast6 block
      cipher. The implementation processes eight blocks in parallel (two 4-block
      chunk AVX operations). The table lookups are done in general-purpose
      registers. For small block sizes the functions from the generic module are
      called. A good performance increase is provided for block sizes greater
      than or equal to 128B.
      
      Patch has been tested with tcrypt and automated filesystem tests.
      
      Tcrypt benchmark results:
      
      Intel Core i5-2500 CPU (fam:6, model:42, step:7)
      
      cast6-avx-x86_64 vs. cast6-generic
      128bit key:                                             (lrw:256bit)    (xts:256bit)
      size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
      16B     0.97x   1.00x   1.01x   1.01x   0.99x   0.97x   0.98x   1.01x   0.96x   0.98x
      64B     0.98x   0.99x   1.02x   1.01x   0.99x   1.00x   1.01x   0.99x   1.00x   0.99x
      256B    1.77x   1.84x   0.99x   1.85x   1.77x   1.77x   1.70x   1.74x   1.69x   1.72x
      1024B   1.93x   1.95x   0.99x   1.96x   1.93x   1.93x   1.84x   1.85x   1.89x   1.87x
      8192B   1.91x   1.95x   0.99x   1.97x   1.95x   1.91x   1.86x   1.87x   1.93x   1.90x
      
      256bit key:                                             (lrw:384bit)    (xts:512bit)
      size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
      16B     0.97x   0.99x   1.02x   1.01x   0.98x   0.99x   1.00x   1.00x   0.98x   0.98x
      64B     0.98x   0.99x   1.01x   1.00x   1.00x   1.00x   1.01x   1.01x   0.97x   1.00x
      256B    1.77x   1.83x   1.00x   1.86x   1.79x   1.78x   1.70x   1.76x   1.71x   1.69x
      1024B   1.92x   1.95x   0.99x   1.96x   1.93x   1.93x   1.83x   1.86x   1.89x   1.87x
      8192B   1.94x   1.95x   0.99x   1.97x   1.95x   1.95x   1.87x   1.87x   1.93x   1.91x
      Signed-off-by: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
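The 128B threshold quoted in the commit message above matches one full parallel chunk: eight 16-byte Cast6 blocks. The following is a hypothetical sketch of the dispatch the message describes, where full chunks go to the parallel path and the remainder falls back to the generic one-block path. `split_work` is an illustrative name; the real glue code is not reproduced here.

```python
CAST6_BLOCK_SIZE = 16       # bytes per Cast6 block
CAST6_PARALLEL_BLOCKS = 8   # blocks per AVX call, per the commit message


def split_work(nbytes, block_size=CAST6_BLOCK_SIZE,
               parallel_blocks=CAST6_PARALLEL_BLOCKS):
    """Return (bytes for the parallel path, bytes for the generic path)."""
    chunk = block_size * parallel_blocks
    parallel_bytes = (nbytes // chunk) * chunk
    generic_bytes = nbytes - parallel_bytes
    return parallel_bytes, generic_bytes


# A 64-byte request never reaches the parallel path; 256 bytes fill two chunks.
small = split_work(64)
large = split_work(256)
```

This is consistent with the tables above, where the 16B and 64B rows stay near 1.00x and the gain only appears from 256B upward.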
    • crypto: cast5 - add x86_64/avx assembler implementation · 4d6d6a2c
      Johannes Goetzfried committed
      This patch adds an x86_64/avx assembler implementation of the Cast5 block
      cipher. The implementation processes sixteen blocks in parallel (four
      4-block chunk AVX operations). The table lookups are done in
      general-purpose registers. For small block sizes the functions from the
      generic module are called. A good performance increase is provided for
      block sizes greater than or equal to 128B.
      
      Patch has been tested with tcrypt and automated filesystem tests.
      
      Tcrypt benchmark results:
      
      Intel Core i5-2500 CPU (fam:6, model:42, step:7)
      
      cast5-avx-x86_64 vs. cast5-generic
      64bit key:
      size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec
      16B     0.99x   0.99x   1.00x   1.00x   1.02x   1.01x
      64B     1.00x   1.00x   0.98x   1.00x   1.01x   1.02x
      256B    2.03x   2.01x   0.95x   2.11x   2.12x   2.13x
      1024B   2.30x   2.24x   0.95x   2.29x   2.35x   2.35x
      8192B   2.31x   2.27x   0.95x   2.31x   2.39x   2.39x
      
      128bit key:
      size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec
      16B     0.99x   0.99x   1.00x   1.00x   1.01x   1.01x
      64B     1.00x   1.00x   0.98x   1.01x   1.02x   1.01x
      256B    2.17x   2.13x   0.96x   2.19x   2.19x   2.19x
      1024B   2.29x   2.32x   0.95x   2.34x   2.37x   2.38x
      8192B   2.35x   2.32x   0.95x   2.35x   2.39x   2.39x
      Signed-off-by: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
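The cbc-enc column stays close to 1.00x in the tables above. That is consistent with CBC encryption being inherently serial: each ciphertext block feeds into the next block's input, so a multi-block implementation has nothing to run in parallel. CBC decryption only needs ciphertext blocks that are already available, so every block can be processed independently. The sketch below illustrates that dependency with a toy stand-in cipher (byte-wise addition of the key, not Cast5 and not secure); all names are illustrative.

```python
BLOCK = 8  # Cast5 block size in bytes


def toy_encrypt_block(block, key):
    """Stand-in for a block cipher (NOT secure): add key bytes modulo 256."""
    return bytes((b + k) % 256 for b, k in zip(block, key))


def toy_decrypt_block(block, key):
    """Inverse of toy_encrypt_block."""
    return bytes((b - k) % 256 for b, k in zip(block, key))


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def cbc_encrypt(blocks, key, iv):
    """Serial by construction: block i needs the ciphertext of block i-1."""
    out, prev = [], iv
    for plain in blocks:
        prev = toy_encrypt_block(xor(plain, prev), key)
        out.append(prev)
    return out


def cbc_decrypt(blocks, key, iv):
    """Each output depends only on two known ciphertext blocks, so the
    per-block work is independent and could run in parallel."""
    prevs = [iv] + blocks[:-1]
    return [xor(toy_decrypt_block(c, key), p) for c, p in zip(blocks, prevs)]


key = bytes(range(1, BLOCK + 1))
iv = bytes(BLOCK)
plaintext = [bytes([i] * BLOCK) for i in range(4)]
ciphertext = cbc_encrypt(plaintext, key, iv)
```

In `cbc_decrypt` the list comprehension has no loop-carried state, which is the property a sixteen-block AVX routine can exploit; `cbc_encrypt` carries `prev` from one iteration to the next.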
    • crypto: arch/x86 - cleanup - remove unneeded crypto_alg.cra_list initializations · 7af6c245
      Jussi Kivilinna committed
      Initialization of cra_list is currently inconsistent: most ciphers
      initialize this field and most shashes do not. The initialization is not
      needed at all, however, since cra_list is initialized/overwritten in
      __crypto_register_alg() with list_add(). Therefore clean up by removing
      all unneeded initializations of this field in 'arch/x86/crypto/'.
      Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  13. 11 Jul, 2012 2 commits
  14. 27 Jun, 2012 2 commits