1. 25 Apr, 2013 1 commit
  2. 20 Jan, 2013 1 commit
  3. 06 Dec, 2012 1 commit
  4. 24 Oct, 2012 1 commit
  5. 07 Sep, 2012 1 commit
    • crypto: cast6-avx - tune assembler code for more performance · c09220e1
      Authored by Jussi Kivilinna
      Patch replaces 'movb' instructions with 'movzbl' to break false register
      dependencies, interleaves instructions better for out-of-order scheduling
      and merges constant 16-bit rotation with round-key variable rotation.
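
The rotation merge mentioned above relies on a simple identity: rotating a 32-bit word left by a variable amount and then by a constant 16 bits gives the same result as one rotation by the combined amount modulo 32. The following is a minimal C sketch of that identity only, not the patch's assembler code; the function names `rotl32`, `rotate_two_steps`, `rotate_merged`, and `self_check` are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits; n is reduced modulo 32. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

/* Two-step form: variable rotation, then a constant 16-bit rotation. */
static uint32_t rotate_two_steps(uint32_t x, unsigned kr)
{
    return rotl32(rotl32(x, kr), 16);
}

/* Merged form: one rotation by the combined amount.
 * For a 5-bit amount, (kr + 16) & 31 is the same value as kr ^ 16. */
static uint32_t rotate_merged(uint32_t x, unsigned kr)
{
    return rotl32(x, (kr + 16) & 31);
}

/* Check that both forms agree for every 5-bit rotation amount. */
static void self_check(uint32_t x)
{
    for (unsigned kr = 0; kr < 32; kr++) {
        assert(rotate_two_steps(x, kr) == rotate_merged(x, kr));
        assert(((kr + 16) & 31) == (kr ^ 16));
    }
}
```

Calling `self_check` with any 32-bit value passes for all 32 rotation amounts. Folding the constant into the key-dependent amount saves one rotation per use, which is the kind of saving the commit message describes.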
      
      tcrypt ECB results:
      
      Intel Core i5-2450M:
      
      size    old-vs-new      new-vs-generic  old-vs-generic
              enc     dec     enc     dec     enc     dec
      256     1.13x   1.19x   2.05x   2.17x   1.82x   1.82x
      1k      1.18x   1.21x   2.26x   2.33x   1.93x   1.93x
      8k      1.19x   1.19x   2.32x   2.33x   1.95x   1.95x
      
      [v2]
       - Do instruction interleaving another way to avoid adding new FPU<=>CPU
         register moves as these cause performance drop on Bulldozer.
       - Improvements to round-key variable rotation handling.
       - Further interleaving improvements for better out-of-order scheduling.
      
      Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
      Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  6. 01 Aug, 2012 1 commit
    • crypto: cast6 - add x86_64/avx assembler implementation · 4ea1277d
      Authored by Johannes Goetzfried
      This patch adds an x86_64/avx assembler implementation of the Cast6 block
      cipher. The implementation processes eight blocks in parallel (two 4-block
      chunk AVX operations). The table lookups are done in general-purpose registers.
      For small block sizes the functions from the generic module are called. A good
      performance increase is provided for block sizes greater than or equal to 128B.
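
The dispatch described above (an eight-block parallel path with a fallback to single-block routines for smaller inputs) can be sketched in C roughly as follows. This is an illustrative outline under stated assumptions, not the kernel code: the names `toy_encrypt_block`, `toy_encrypt_8_blocks`, `ecb_encrypt`, and `struct dispatch_stats` are hypothetical, and the "cipher" is a placeholder XOR standing in for the real Cast6 routines.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE      16  /* Cast6 block size in bytes */
#define PARALLEL_BLOCKS 8   /* blocks handled per parallel call */

/* Counts how often each path ran, so the dispatch can be observed. */
struct dispatch_stats {
    size_t parallel_calls;
    size_t single_calls;
};

/* Placeholder for the generic one-block routine (NOT real Cast6). */
static void toy_encrypt_block(uint8_t *blk)
{
    for (size_t i = 0; i < BLOCK_SIZE; i++)
        blk[i] ^= 0x5a;
}

/* Placeholder for the eight-block routine; the real one would be the
 * AVX assembler function working on all eight blocks at once. */
static void toy_encrypt_8_blocks(uint8_t *blks)
{
    for (size_t b = 0; b < PARALLEL_BLOCKS; b++)
        toy_encrypt_block(blks + b * BLOCK_SIZE);
}

/* ECB-style dispatch: use the parallel path while at least eight blocks
 * (128 bytes) remain, then finish block by block. Trailing bytes that do
 * not fill a whole block are left untouched. */
static struct dispatch_stats ecb_encrypt(uint8_t *buf, size_t len)
{
    struct dispatch_stats st = { 0, 0 };

    while (len >= PARALLEL_BLOCKS * BLOCK_SIZE) {
        toy_encrypt_8_blocks(buf);
        buf += PARALLEL_BLOCKS * BLOCK_SIZE;
        len -= PARALLEL_BLOCKS * BLOCK_SIZE;
        st.parallel_calls++;
    }
    while (len >= BLOCK_SIZE) {
        toy_encrypt_block(buf);
        buf += BLOCK_SIZE;
        len -= BLOCK_SIZE;
        st.single_calls++;
    }
    return st;
}
```

With this structure, a 64-byte input never reaches the parallel path, which is consistent with the roughly 1.00x results for the 16B and 64B rows in the benchmark tables, while inputs of 128 bytes or more use it for each full group of eight blocks.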
      
      Patch has been tested with tcrypt and automated filesystem tests.
      
      Tcrypt benchmark results:
      
      Intel Core i5-2500 CPU (fam:6, model:42, step:7)
      
      cast6-avx-x86_64 vs. cast6-generic
      128bit key:                                             (lrw:256bit)    (xts:256bit)
      size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
      16B     0.97x   1.00x   1.01x   1.01x   0.99x   0.97x   0.98x   1.01x   0.96x   0.98x
      64B     0.98x   0.99x   1.02x   1.01x   0.99x   1.00x   1.01x   0.99x   1.00x   0.99x
      256B    1.77x   1.84x   0.99x   1.85x   1.77x   1.77x   1.70x   1.74x   1.69x   1.72x
      1024B   1.93x   1.95x   0.99x   1.96x   1.93x   1.93x   1.84x   1.85x   1.89x   1.87x
      8192B   1.91x   1.95x   0.99x   1.97x   1.95x   1.91x   1.86x   1.87x   1.93x   1.90x
      
      256bit key:                                             (lrw:384bit)    (xts:512bit)
      size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
      16B     0.97x   0.99x   1.02x   1.01x   0.98x   0.99x   1.00x   1.00x   0.98x   0.98x
      64B     0.98x   0.99x   1.01x   1.00x   1.00x   1.00x   1.01x   1.01x   0.97x   1.00x
      256B    1.77x   1.83x   1.00x   1.86x   1.79x   1.78x   1.70x   1.76x   1.71x   1.69x
      1024B   1.92x   1.95x   0.99x   1.96x   1.93x   1.93x   1.83x   1.86x   1.89x   1.87x
      8192B   1.94x   1.95x   0.99x   1.97x   1.95x   1.95x   1.87x   1.87x   1.93x   1.91x
      Signed-off-by: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>