1. 12 10月, 2017 1 次提交
  2. 23 1月, 2017 1 次提交
    • D
      crypto: x86 - make constants readonly, allow linker to merge them · e183914a
      Denys Vlasenko 提交于
      A lot of asm-optimized routines in arch/x86/crypto/ keep its
      constants in .data. This is wrong, they should be on .rodata.
      
      Mnay of these constants are the same in different modules.
      For example, 128-bit shuffle mask 0x000102030405060708090A0B0C0D0E0F
      exists in at least half a dozen places.
      
      There is a way to let linker merge them and use just one copy.
      The rules are as follows: mergeable objects of different sizes
      should not share sections. You can't put them all in one .rodata
      section, they will lose "mergeability".
      
      GCC puts its mergeable constants in ".rodata.cstSIZE" sections,
      or ".rodata.cstSIZE.<object_name>" if -fdata-sections is used.
      This patch does the same:
      
      	.section .rodata.cst16.SHUF_MASK, "aM", @progbits, 16
      
      It is important that all data in such section consists of
      16-byte elements, not larger ones, and there are no implicit
      use of one element from another.
      
      When this is not the case, use non-mergeable section:
      
      	.section .rodata[.VAR_NAME], "a", @progbits
      
      This reduces .data by ~15 kbytes:
      
          text    data     bss     dec      hex filename
      11097415 2705840 2630712 16433967  fac32f vmlinux-prev.o
      11112095 2690672 2630712 16433479  fac147 vmlinux.o
      
      Merged objects are visible in System.map:
      
      ffffffff81a28810 r POLY
      ffffffff81a28810 r POLY
      ffffffff81a28820 r TWOONE
      ffffffff81a28820 r TWOONE
      ffffffff81a28830 r PSHUFFLE_BYTE_FLIP_MASK <- merged regardless of
      ffffffff81a28830 r SHUF_MASK   <------------- the name difference
      ffffffff81a28830 r SHUF_MASK
      ffffffff81a28830 r SHUF_MASK
      ..
      ffffffff81a28d00 r K512 <- merged three identical 640-byte tables
      ffffffff81a28d00 r K512
      ffffffff81a28d00 r K512
      
      Use of object names in section name suffixes is not strictly necessary,
      but might help if someday link stage will use garbage collection
      to eliminate unused sections (ld --gc-sections).
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      CC: Josh Poimboeuf <jpoimboe@redhat.com>
      CC: Xiaodong Liu <xiaodong.liu@intel.com>
      CC: Megha Dey <megha.dey@intel.com>
      CC: linux-crypto@vger.kernel.org
      CC: x86@kernel.org
      CC: linux-kernel@vger.kernel.org
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      e183914a
  3. 25 1月, 2016 1 次提交
  4. 17 7月, 2015 2 次提交
    • M
      crypto: chacha20 - Add a four block SSSE3 variant for x86_64 · 274f938e
      Martin Willi 提交于
      Extends the x86_64 SSSE3 ChaCha20 implementation by a function processing
      four ChaCha20 blocks in parallel. This avoids the word shuffling needed
      in the single block variant, further increasing throughput.
      
      For large messages, throughput increases by ~110% compared to single block
      SSSE3:
      
      testing speed of chacha20 (chacha20-simd) encryption
      test 0 (256 bit key, 16 byte blocks): 43141886 operations in 10 seconds (690270176 bytes)
      test 1 (256 bit key, 64 byte blocks): 46845874 operations in 10 seconds (2998135936 bytes)
      test 2 (256 bit key, 256 byte blocks): 18458512 operations in 10 seconds (4725379072 bytes)
      test 3 (256 bit key, 1024 byte blocks): 5360533 operations in 10 seconds (5489185792 bytes)
      test 4 (256 bit key, 8192 byte blocks): 692846 operations in 10 seconds (5675794432 bytes)
      
      testing speed of chacha20 (chacha20-simd) encryption
      test 0 (256 bit key, 16 byte blocks): 42249230 operations in 10 seconds (675987680 bytes)
      test 1 (256 bit key, 64 byte blocks): 46441641 operations in 10 seconds (2972265024 bytes)
      test 2 (256 bit key, 256 byte blocks): 33028112 operations in 10 seconds (8455196672 bytes)
      test 3 (256 bit key, 1024 byte blocks): 11568759 operations in 10 seconds (11846409216 bytes)
      test 4 (256 bit key, 8192 byte blocks): 1448761 operations in 10 seconds (11868250112 bytes)
      
      Benchmark results from a Core i5-4670T.
      Signed-off-by: NMartin Willi <martin@strongswan.org>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      274f938e
    • M
      crypto: chacha20 - Add a SSSE3 SIMD variant for x86_64 · c9320b6d
      Martin Willi 提交于
      Implements an x86_64 assembler driver for the ChaCha20 stream cipher. This
      single block variant works on a single state matrix using SSE instructions.
      It requires SSSE3 due the use of pshufb for efficient 8/16-bit rotate
      operations.
      
      For large messages, throughput increases by ~65% compared to
      chacha20-generic:
      
      testing speed of chacha20 (chacha20-generic) encryption
      test 0 (256 bit key, 16 byte blocks): 45089207 operations in 10 seconds (721427312 bytes)
      test 1 (256 bit key, 64 byte blocks): 43839521 operations in 10 seconds (2805729344 bytes)
      test 2 (256 bit key, 256 byte blocks): 12702056 operations in 10 seconds (3251726336 bytes)
      test 3 (256 bit key, 1024 byte blocks): 3371173 operations in 10 seconds (3452081152 bytes)
      test 4 (256 bit key, 8192 byte blocks): 422468 operations in 10 seconds (3460857856 bytes)
      
      testing speed of chacha20 (chacha20-simd) encryption
      test 0 (256 bit key, 16 byte blocks): 43141886 operations in 10 seconds (690270176 bytes)
      test 1 (256 bit key, 64 byte blocks): 46845874 operations in 10 seconds (2998135936 bytes)
      test 2 (256 bit key, 256 byte blocks): 18458512 operations in 10 seconds (4725379072 bytes)
      test 3 (256 bit key, 1024 byte blocks): 5360533 operations in 10 seconds (5489185792 bytes)
      test 4 (256 bit key, 8192 byte blocks): 692846 operations in 10 seconds (5675794432 bytes)
      
      Benchmark results from a Core i5-4670T.
      Signed-off-by: NMartin Willi <martin@strongswan.org>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      c9320b6d