1. 12 6月, 2012 4 次提交
    • J
      crypto: serpent - add x86_64/avx assembler implementation · 7efe4076
      Johannes Goetzfried 提交于
      This patch adds a x86_64/avx assembler implementation of the Serpent block
      cipher. The implementation is very similar to the sse2 implementation and
      processes eight blocks in parallel. Because of the new non-destructive three
      operand syntax all move-instructions can be removed and therefore a little
      performance increase is provided.
      
      Patch has been tested with tcrypt and automated filesystem tests.
      
      Tcrypt benchmark results:
      
      Intel Core i5-2500 CPU (fam:6, model:42, step:7)
      
      serpent-avx-x86_64 vs. serpent-sse2-x86_64
      128bit key:                                             (lrw:256bit)    (xts:256bit)
      size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
      16B     1.03x   1.01x   1.01x   1.01x   1.00x   1.00x   1.00x   1.00x   1.00x   1.01x
      64B     1.00x   1.00x   1.00x   1.00x   1.00x   0.99x   1.00x   1.01x   1.00x   1.00x
      256B    1.05x   1.03x   1.00x   1.02x   1.05x   1.06x   1.05x   1.02x   1.05x   1.02x
      1024B   1.05x   1.02x   1.00x   1.02x   1.05x   1.06x   1.05x   1.03x   1.05x   1.02x
      8192B   1.05x   1.02x   1.00x   1.02x   1.06x   1.06x   1.04x   1.03x   1.04x   1.02x
      
      256bit key:                                             (lrw:384bit)    (xts:512bit)
      size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
      16B     1.01x   1.00x   1.01x   1.01x   1.00x   1.00x   0.99x   1.03x   1.01x   1.01x
      64B     1.00x   1.00x   1.00x   1.00x   1.00x   1.00x   1.00x   1.01x   1.00x   1.02x
      256B    1.05x   1.02x   1.00x   1.02x   1.05x   1.02x   1.04x   1.05x   1.05x   1.02x
      1024B   1.06x   1.02x   1.00x   1.02x   1.07x   1.06x   1.05x   1.04x   1.05x   1.02x
      8192B   1.05x   1.02x   1.00x   1.02x   1.06x   1.06x   1.04x   1.05x   1.05x   1.02x
      
      serpent-avx-x86_64 vs aes-asm (8kB block):
               128bit  256bit
      ecb-enc  1.26x   1.73x
      ecb-dec  1.20x   1.64x
      cbc-enc  0.33x   0.45x
      cbc-dec  1.24x   1.67x
      ctr-enc  1.32x   1.76x
      ctr-dec  1.32x   1.76x
      lrw-enc  1.20x   1.60x
      lrw-dec  1.15x   1.54x
      xts-enc  1.22x   1.64x
      xts-dec  1.17x   1.57x
      Signed-off-by: NJohannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      7efe4076
    • J
      crypto: testmgr - expand twofish test vectors · 4da7de4d
      Johannes Goetzfried 提交于
      The AVX implementation of the twofish cipher processes 8 blocks parallel, so we
      need to make test vectors larger to check parallel code paths. Test vectors are
      also large enough to deal with 16 block parallel implementations which may occur
      in the future.
      Signed-off-by: NJohannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
      Signed-off-by: NJussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      4da7de4d
    • J
      crypto: twofish - add x86_64/avx assembler implementation · 107778b5
      Johannes Goetzfried 提交于
      This patch adds a x86_64/avx assembler implementation of the Twofish block
      cipher. The implementation processes eight blocks in parallel (two 4 block
      chunk AVX operations). The table-lookups are done in general-purpose registers.
      For small blocksizes the 3way-parallel functions from the twofish-x86_64-3way
      module are called. A good performance increase is provided for blocksizes
      greater or equal to 128B.
      
      Patch has been tested with tcrypt and automated filesystem tests.
      
      Tcrypt benchmark results:
      
      Intel Core i5-2500 CPU (fam:6, model:42, step:7)
      
      twofish-avx-x86_64 vs. twofish-x86_64-3way
      128bit key:                                             (lrw:256bit)    (xts:256bit)
      size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
      16B     0.96x   0.97x   1.00x   0.95x   0.97x   0.97x   0.96x   0.95x   0.95x   0.98x
      64B     0.99x   0.99x   1.00x   0.99x   0.98x   0.98x   0.99x   0.98x   0.99x   0.98x
      256B    1.20x   1.21x   1.00x   1.19x   1.15x   1.14x   1.19x   1.20x   1.18x   1.19x
      1024B   1.29x   1.30x   1.00x   1.28x   1.23x   1.24x   1.26x   1.28x   1.26x   1.27x
      8192B   1.31x   1.32x   1.00x   1.31x   1.25x   1.25x   1.28x   1.29x   1.28x   1.30x
      
      256bit key:                                             (lrw:384bit)    (xts:512bit)
      size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
      16B     0.96x   0.96x   1.00x   0.96x   0.97x   0.98x   0.95x   0.95x   0.95x   0.96x
      64B     1.00x   0.99x   1.00x   0.98x   0.98x   1.01x   0.98x   0.98x   0.98x   0.98x
      256B    1.20x   1.21x   1.00x   1.21x   1.15x   1.15x   1.19x   1.20x   1.18x   1.19x
      1024B   1.29x   1.30x   1.00x   1.28x   1.23x   1.23x   1.26x   1.27x   1.26x   1.27x
      8192B   1.31x   1.33x   1.00x   1.31x   1.26x   1.26x   1.29x   1.29x   1.28x   1.30x
      
      twofish-avx-x86_64 vs aes-asm (8kB block):
               128bit  256bit
      ecb-enc  1.19x   1.63x
      ecb-dec  1.18x   1.62x
      cbc-enc  0.75x   1.03x
      cbc-dec  1.23x   1.67x
      ctr-enc  1.24x   1.65x
      ctr-dec  1.24x   1.65x
      lrw-enc  1.15x   1.53x
      lrw-dec  1.14x   1.52x
      xts-enc  1.16x   1.56x
      xts-dec  1.16x   1.56x
      Signed-off-by: NJohannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      107778b5
    • S
  2. 22 5月, 2012 2 次提交
  3. 10 4月, 2012 1 次提交
  4. 09 4月, 2012 1 次提交
  5. 05 4月, 2012 1 次提交
  6. 02 4月, 2012 1 次提交
  7. 29 3月, 2012 3 次提交
  8. 24 3月, 2012 1 次提交
  9. 20 3月, 2012 1 次提交
  10. 14 3月, 2012 7 次提交
    • J
      crypto: camellia - add assembler implementation for x86_64 · 0b95ec56
      Jussi Kivilinna 提交于
      Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of
      functions are provided. First set is regular 'one-block at time' encrypt/decrypt
      functions. Second is 'two-block at time' functions that gain performance increase
      on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way
      functions with in-order CPUs.
      
      Patch has been tested with tcrypt and automated filesystem tests.
      
      Tcrypt benchmark results:
      
      AMD Phenom II 1055T (fam:16, model:10):
      
      camellia-asm vs camellia_generic:
      128bit key:                                             (lrw:256bit)    (xts:256bit)
      size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
      16B     1.27x   1.22x   1.30x   1.42x   1.30x   1.34x   1.19x   1.05x   1.23x   1.24x
      64B     1.74x   1.79x   1.43x   1.87x   1.81x   1.87x   1.48x   1.38x   1.55x   1.62x
      256B    1.90x   1.87x   1.43x   1.94x   1.94x   1.95x   1.63x   1.62x   1.67x   1.70x
      1024B   1.96x   1.93x   1.43x   1.95x   1.98x   2.01x   1.67x   1.69x   1.74x   1.80x
      8192B   1.96x   1.96x   1.39x   1.93x   2.01x   2.03x   1.72x   1.64x   1.71x   1.76x
      
      256bit key:                                             (lrw:384bit)    (xts:512bit)
      size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
      16B     1.23x   1.23x   1.33x   1.39x   1.34x   1.38x   1.04x   1.18x   1.21x   1.29x
      64B     1.72x   1.69x   1.42x   1.78x   1.81x   1.89x   1.57x   1.52x   1.56x   1.65x
      256B    1.85x   1.88x   1.42x   1.86x   1.93x   1.96x   1.69x   1.65x   1.70x   1.75x
      1024B   1.88x   1.86x   1.45x   1.95x   1.96x   1.95x   1.77x   1.71x   1.77x   1.78x
      8192B   1.91x   1.86x   1.42x   1.91x   2.03x   1.98x   1.73x   1.71x   1.78x   1.76x
      
      camellia-asm vs aes-asm (8kB block):
               128bit  256bit
      ecb-enc  1.15x   1.22x
      ecb-dec  1.16x   1.16x
      cbc-enc  0.85x   0.90x
      cbc-dec  1.20x   1.23x
      ctr-enc  1.28x   1.30x
      ctr-dec  1.27x   1.28x
      lrw-enc  1.12x   1.16x
      lrw-dec  1.08x   1.10x
      xts-enc  1.11x   1.15x
      xts-dec  1.14x   1.15x
      
      Intel Core2 T8100 (fam:6, model:23, step:6):
      
      camellia-asm vs camellia_generic:
      128bit key:                                             (lrw:256bit)    (xts:256bit)
      size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
      16B     1.10x   1.12x   1.14x   1.16x   1.16x   1.15x   1.02x   1.02x   1.08x   1.08x
      64B     1.61x   1.60x   1.17x   1.68x   1.67x   1.66x   1.43x   1.42x   1.44x   1.42x
      256B    1.65x   1.73x   1.17x   1.77x   1.81x   1.80x   1.54x   1.53x   1.58x   1.54x
      1024B   1.76x   1.74x   1.18x   1.80x   1.85x   1.85x   1.60x   1.59x   1.65x   1.60x
      8192B   1.77x   1.75x   1.19x   1.81x   1.85x   1.86x   1.63x   1.61x   1.66x   1.62x
      
      256bit key:                                             (lrw:384bit)    (xts:512bit)
      size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
      16B     1.10x   1.07x   1.13x   1.16x   1.11x   1.16x   1.03x   1.02x   1.08x   1.07x
      64B     1.61x   1.62x   1.15x   1.66x   1.63x   1.68x   1.47x   1.46x   1.47x   1.44x
      256B    1.71x   1.70x   1.16x   1.75x   1.69x   1.79x   1.58x   1.57x   1.59x   1.55x
      1024B   1.78x   1.72x   1.17x   1.75x   1.80x   1.80x   1.63x   1.62x   1.65x   1.62x
      8192B   1.76x   1.73x   1.17x   1.78x   1.80x   1.81x   1.64x   1.62x   1.68x   1.64x
      
      camellia-asm vs aes-asm (8kB block):
               128bit  256bit
      ecb-enc  1.17x   1.21x
      ecb-dec  1.17x   1.20x
      cbc-enc  0.80x   0.82x
      cbc-dec  1.22x   1.24x
      ctr-enc  1.25x   1.26x
      ctr-dec  1.25x   1.26x
      lrw-enc  1.14x   1.18x
      lrw-dec  1.13x   1.17x
      xts-enc  1.14x   1.18x
      xts-dec  1.14x   1.17x
      Signed-off-by: NJussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      0b95ec56
    • J
    • J
      crypto: camellia - fix checkpatch warnings · e2861a71
      Jussi Kivilinna 提交于
      Fix checkpatch warnings before renaming file.
      Signed-off-by: NJussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      e2861a71
    • J
      crypto: camellia - rename camellia module to camellia_generic · 075e39df
      Jussi Kivilinna 提交于
      Rename camellia module to camellia_generic to allow optimized assembler
      implementations to autoload with module-alias.
      Signed-off-by: NJussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      075e39df
    • J
      crypto: tcrypt - add more camellia tests · 4de59337
      Jussi Kivilinna 提交于
      Add tests for CTR, LRW and XTS modes.
      Signed-off-by: NJussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      4de59337
    • J
      crypto: testmgr - add more camellia test vectors · 0840605e
      Jussi Kivilinna 提交于
      New ECB, CBC, CTR, LRW and XTS test vectors for camellia. Larger ECB/CBC test
      vectors needed for parallel 2-way camellia implementation.
      Signed-off-by: NJussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      0840605e
    • J
      crypto: camellia - simplify key setup and CAMELLIA_ROUNDSM macro · c9b56d33
      Jussi Kivilinna 提交于
      camellia_setup_tail() applies 'inverse of the last half of P-function' to
      subkeys, which is unneeded if keys are applied directly to yl/yr in
      CAMELLIA_ROUNDSM.
      
      Patch speeds up key setup and should speed up CAMELLIA_ROUNDSM as applying
      key to yl/yr early has less register dependencies.
      
      Quick tcrypt camellia results:
       x86_64, AMD Phenom II, ~5% faster
       x86_64, Intel Core 2, ~0.5% faster
       i386, Intel Atom N270, ~1% faster
      Signed-off-by: NJussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      c9b56d33
  11. 27 2月, 2012 1 次提交
  12. 16 2月, 2012 1 次提交
  13. 05 2月, 2012 2 次提交
  14. 26 1月, 2012 2 次提交
  15. 15 1月, 2012 3 次提交
    • A
      crypto: sha512 - use standard ror64() · b85a088f
      Alexey Dobriyan 提交于
      Use standard ror64() instead of hand-written.
      There is no standard ror64, so create it.
      
      The difference is shift value being "unsigned int" instead of uint64_t
      (for which there is no reason). gcc starts to emit native ROR instructions
      which it doesn't do for some reason currently. This should make the code
      faster.
      
      Patch survives in-tree crypto test and ping flood with hmac(sha512) on.
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      b85a088f
    • A
      crypto: sha512 - reduce stack usage to safe number · 51fc6dc8
      Alexey Dobriyan 提交于
      For rounds 16--79, W[i] only depends on W[i - 2], W[i - 7], W[i - 15] and W[i - 16].
      Consequently, keeping all W[80] array on stack is unnecessary,
      only 16 values are really needed.
      
      Using W[16] instead of W[80] greatly reduces stack usage
      (~750 bytes to ~340 bytes on x86_64).
      
      Line by line explanation:
      * BLEND_OP
        array is "circular" now, all indexes have to be modulo 16.
        Round number is positive, so remainder operation should be
        without surprises.
      
      * initial full message scheduling is trimmed to first 16 values which
        come from data block, the rest is calculated before it's needed.
      
      * original loop body is unrolled version of new SHA512_0_15 and
        SHA512_16_79 macros, unrolling was done to not do explicit variable
        renaming. Otherwise it's the very same code after preprocessing.
        See sha1_transform() code which does the same trick.
      
      Patch survives in-tree crypto test and original bugreport test
      (ping flood with hmac(sha512).
      
      See FIPS 180-2 for SHA-512 definition
      http://csrc.nist.gov/publications/fips/fips180-2/fips180-2withchangenotice.pdfSigned-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      51fc6dc8
    • A
      crypto: sha512 - make it work, undo percpu message schedule · 84e31fdb
      Alexey Dobriyan 提交于
      commit f9e2bca6
      aka "crypto: sha512 - Move message schedule W[80] to static percpu area"
      created global message schedule area.
      
      If sha512_update will ever be entered twice, hash will be silently
      calculated incorrectly.
      
      Probably the easiest way to notice incorrect hashes being calculated is
      to run 2 ping floods over AH with hmac(sha512):
      
      	#!/usr/sbin/setkey -f
      	flush;
      	spdflush;
      	add IP1 IP2 ah 25 -A hmac-sha512 0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000025;
      	add IP2 IP1 ah 52 -A hmac-sha512 0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000052;
      	spdadd IP1 IP2 any -P out ipsec ah/transport//require;
      	spdadd IP2 IP1 any -P in  ipsec ah/transport//require;
      
      XfrmInStateProtoError will start ticking with -EBADMSG being returned
      from ah_input(). This never happens with, say, hmac(sha1).
      
      With patch applied (on BOTH sides), XfrmInStateProtoError does not tick
      with multiple bidirectional ping flood streams like it doesn't tick
      with SHA-1.
      
      After this patch sha512_transform() will start using ~750 bytes of stack on x86_64.
      This is OK for simple loads, for something more heavy, stack reduction will be done
      separatedly.
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      84e31fdb
  16. 20 12月, 2011 5 次提交
  17. 30 11月, 2011 3 次提交
  18. 21 11月, 2011 1 次提交
    • J
      crypto: serpent-sse2 - add lrw support · 18482053
      Jussi Kivilinna 提交于
      Patch adds LRW support for serpent-sse2 by using lrw_crypt(). Patch has been
      tested with tcrypt and automated filesystem tests.
      
      Tcrypt benchmarks results (serpent-sse2/serpent_generic speed ratios):
      
      Benchmark results with tcrypt:
      
      Intel Celeron T1600 (x86_64) (fam:6, model:15, step:13):
      size    lrw-enc lrw-dec
      16B     1.00x   0.96x
      64B     1.01x   1.01x
      256B    3.01x   2.97x
      1024B   3.39x   3.33x
      8192B   3.35x   3.33x
      
      AMD Phenom II 1055T (x86_64) (fam:16, model:10):
      size    lrw-enc lrw-dec
      16B     0.98x   1.03x
      64B     1.01x   1.04x
      256B    2.10x   2.14x
      1024B   2.28x   2.33x
      8192B   2.30x   2.33x
      
      Intel Atom N270 (i586):
      size    lrw-enc lrw-dec
      16B     0.97x   0.97x
      64B     1.47x   1.50x
      256B    1.72x   1.69x
      1024B   1.88x   1.81x
      8192B   1.84x   1.79x
      Signed-off-by: NJussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      18482053