  1. 25 Aug 2018, 1 commit
    • crypto: aesni - Use unaligned loads from gcm_context_data · e5b954e8
      Committed by Dave Watson
      A regression was reported bisecting to 1476db2d
      "Move HashKey computation from stack to gcm_context".  That diff
      moved HashKey computation from the stack, which was explicitly aligned
      in the asm, to a struct provided by the C code, relying on
      AESNI_ALIGN_ATTR for alignment.  It appears some compilers may not
      align this struct correctly, resulting in a crash on the movdqa
      instruction when attempting to encrypt or decrypt data.
      
      Fix by using unaligned loads for the HashKeys.  On modern
      hardware there is no perf difference between the unaligned and
      aligned loads.  All other accesses to gcm_context_data already use
      unaligned loads.
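
      A minimal sketch of the kind of change involved (the HashKey offset and
      the %arg2/%xmm13 operands are illustrative, not the exact kernel diff):

      	# before: aligned load; faults if gcm_context_data is not 16-byte aligned
      	movdqa	HashKey(%arg2), %xmm13
      	# after: unaligned load; correct for any alignment and just as fast
      	# on modern hardware when the data happens to be aligned
      	movdqu	HashKey(%arg2), %xmm13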
      Reported-by: Mauro Rossi <issor.oruam@gmail.com>
      Fixes: 1476db2d ("Move HashKey computation from stack to gcm_context")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  2. 03 Jul 2018, 1 commit
    • x86/asm/64: Use 32-bit XOR to zero registers · a7bea830
      Committed by Jan Beulich
      Some Intel CPUs don't recognize 64-bit XORs as zeroing idioms. Zeroing
      idioms don't require execution bandwidth, as they are handled in the
      frontend (through register renaming). Use 32-bit XORs instead.
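
      A minimal sketch of the difference (register choice is illustrative):

      	# 64-bit form: not recognized as a zeroing idiom by some Intel CPUs,
      	# so it may still consume execution bandwidth
      	xorq	%rax, %rax
      	# 32-bit form: handled by register renaming in the frontend; writing
      	# %eax zero-extends into the full %rax anyway
      	xorl	%eax, %eax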
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: herbert@gondor.apana.org.au
      Cc: pavel@ucw.cz
      Cc: rjw@rjwysocki.net
      Link: http://lkml.kernel.org/r/5B39FF1A02000078001CFB54@prv1-mh.provo.novell.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 22 Feb 2018, 13 commits
  4. 12 Jan 2018, 1 commit
  5. 28 Dec 2017, 2 commits
  6. 18 May 2017, 2 commits
  7. 23 Jan 2017, 1 commit
    • crypto: x86 - make constants readonly, allow linker to merge them · e183914a
      Committed by Denys Vlasenko
      A lot of asm-optimized routines in arch/x86/crypto/ keep their
      constants in .data. This is wrong; they should be in .rodata.
      
      Many of these constants are the same in different modules.
      For example, 128-bit shuffle mask 0x000102030405060708090A0B0C0D0E0F
      exists in at least half a dozen places.
      
      There is a way to let the linker merge them and use just one copy.
      The rules are as follows: mergeable objects of different sizes
      should not share sections. You can't put them all in one .rodata
      section; they would lose "mergeability".
      
      GCC puts its mergeable constants in ".rodata.cstSIZE" sections,
      or ".rodata.cstSIZE.<object_name>" if -fdata-sections is used.
      This patch does the same:
      
      	.section .rodata.cst16.SHUF_MASK, "aM", @progbits, 16
      
      It is important that all data in such a section consists of
      16-byte elements, not larger ones, and that there is no implicit
      use of one element from another.
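
      For example, the 128-bit shuffle mask mentioned above would end up
      declared roughly like this (a sketch; the .align and .octa spelling
      are illustrative, only the .section line is quoted from the patch):

      	.section .rodata.cst16.SHUF_MASK, "aM", @progbits, 16
      	.align 16
      	SHUF_MASK: .octa 0x000102030405060708090A0B0C0D0E0F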
      
      When this is not the case, use non-mergeable section:
      
      	.section .rodata[.VAR_NAME], "a", @progbits
      
      This reduces .data by ~15 kbytes:
      
          text    data     bss     dec      hex filename
      11097415 2705840 2630712 16433967  fac32f vmlinux-prev.o
      11112095 2690672 2630712 16433479  fac147 vmlinux.o
      
      Merged objects are visible in System.map:
      
      ffffffff81a28810 r POLY
      ffffffff81a28810 r POLY
      ffffffff81a28820 r TWOONE
      ffffffff81a28820 r TWOONE
      ffffffff81a28830 r PSHUFFLE_BYTE_FLIP_MASK <- merged regardless of
      ffffffff81a28830 r SHUF_MASK   <------------- the name difference
      ffffffff81a28830 r SHUF_MASK
      ffffffff81a28830 r SHUF_MASK
      ..
      ffffffff81a28d00 r K512 <- merged three identical 640-byte tables
      ffffffff81a28d00 r K512
      ffffffff81a28d00 r K512
      
      Use of object names in section name suffixes is not strictly necessary,
      but it might help if the link stage someday uses garbage collection
      to eliminate unused sections (ld --gc-sections).
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      CC: Josh Poimboeuf <jpoimboe@redhat.com>
      CC: Xiaodong Liu <xiaodong.liu@intel.com>
      CC: Megha Dey <megha.dey@intel.com>
      CC: linux-crypto@vger.kernel.org
      CC: x86@kernel.org
      CC: linux-kernel@vger.kernel.org
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  8. 24 Feb 2016, 2 commits
    • x86/asm/crypto: Create stack frames in crypto functions · 8691ccd7
      Committed by Josh Poimboeuf
      The crypto code has several callable non-leaf functions which don't
      honor CONFIG_FRAME_POINTER, which can result in bad stack traces.
      
      Create stack frames for them when CONFIG_FRAME_POINTER is enabled.
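
      A sketch of the resulting pattern, assuming the FRAME_BEGIN/FRAME_END
      helpers from <asm/frame.h> and an illustrative function name:

      	#include <linux/linkage.h>
      	#include <asm/frame.h>

      	ENTRY(aesni_gcm_enc)
      		FRAME_BEGIN	# sets up %rbp when CONFIG_FRAME_POINTER=y, no-op otherwise
      		# ... existing function body, unchanged ...
      		FRAME_END
      		ret
      	ENDPROC(aesni_gcm_enc)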
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chris J Arges <chris.j.arges@canonical.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: live-patching@vger.kernel.org
      Link: http://lkml.kernel.org/r/6c20192bcf1102ae18ae5a242cabf30ce9b29895.1453405861.git.jpoimboe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/asm/crypto: Move .Lbswap_mask data to .rodata section · 1253cab8
      Committed by Josh Poimboeuf
      stacktool reports the following warning:
      
        stacktool: arch/x86/crypto/aesni-intel_asm.o: _aesni_inc_init(): can't find starting instruction
      
      stacktool gets confused when it tries to disassemble the following data
      in the .text section:
      
        .Lbswap_mask:
                .byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
      
      Move it to .rodata, which is a more appropriate section for
      read-only data.
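
      Roughly, the data just moves under a read-only section directive
      (a sketch, not the verbatim diff; only the .byte line is quoted above):

      	.section .rodata
      	.align 16
      	.Lbswap_mask:
      		.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
      	.text	# switch back to the code section for what follows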
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chris J Arges <chris.j.arges@canonical.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: live-patching@vger.kernel.org
      Link: http://lkml.kernel.org/r/b6a2f3f8bda705143e127c025edb2b53c86e6eb4.1453405861.git.jpoimboe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  9. 14 Jan 2015, 1 commit
    • crypto: aesni - Add support for 192 & 256 bit keys to AESNI RFC4106 · e31ac32d
      Committed by Timothy McCaffrey
      These patches fix the RFC4106 implementation in the aesni-intel
      module so it supports 192 & 256 bit keys.
      
      Since the AVX support that was added to this module also only
      supports 128 bit keys, and this patch only affects the SSE
      implementation, changes were also made to use the SSE version
      whenever a key size other than 128 bits is specified.
      
      RFC4106 specifies that 192 & 256 bit keys must be supported (section
      8.4).
      
      Also, this should fix Strongswan issue 341 where the aesni module
      needs to be unloaded if 256 bit keys are used:
      
      http://wiki.strongswan.org/issues/341
      
      This patch has been tested with Sandy Bridge and Haswell processors.
      With 128 bit keys and input buffers > 512 bytes a slight performance
      degradation was noticed (~1%).  For input buffers of less than 512
      bytes there was no performance impact.  Compared to 128 bit keys,
      256 bit key performance is approximately 0.5 cycles per byte slower
      on Sandy Bridge, and 0.37 cycles per byte slower on Haswell (vs.
      the SSE code).
      
      This patch has also been tested with StrongSwan IPSec connections
      where it worked correctly.
      
      I created this diff from a git clone of crypto-2.6.git.
      
      If you have any questions, please feel free to contact me.
      Signed-off-by: Timothy McCaffrey <timothy.mccaffrey@unisys.com>
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  10. 13 Jun 2013, 1 commit
  11. 25 Apr 2013, 1 commit
    • crypto: aesni_intel - add more optimized XTS mode for x86-64 · c456a9cd
      Committed by Jussi Kivilinna
      Add more optimized XTS code for aesni_intel in 64-bit mode, for smaller
      stack usage and a boost in speed.
      
      tcrypt results, with Intel i5-2450M:
      256-bit key
              enc     dec
      16B     0.98x   0.99x
      64B     0.64x   0.63x
      256B    1.29x   1.32x
      1024B   1.54x   1.58x
      8192B   1.57x   1.60x
      
      512-bit key
              enc     dec
      16B     0.98x   0.99x
      64B     0.60x   0.59x
      256B    1.24x   1.25x
      1024B   1.39x   1.42x
      8192B   1.38x   1.42x
      
      I chose not to optimize for block sizes smaller than 256 bytes, since
      XTS is practically always used with data blocks of 512 bytes. This is
      why performance is reduced in tcrypt for 64-byte blocks.
      
      Cc: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  12. 20 Jan 2013, 1 commit
  13. 31 May 2012, 1 commit
  14. 27 Mar 2011, 1 commit
  15. 18 Mar 2011, 1 commit
  16. 13 Dec 2010, 1 commit
  17. 29 Nov 2010, 1 commit
  18. 27 Nov 2010, 1 commit
    • crypto: aesni-intel - Ported implementation to x86-32 · 0d258efb
      Committed by Mathias Krause
      The AES-NI instructions are also available in legacy mode, so the
      32-bit architecture may profit from them, too.
      
      To illustrate the performance gain here's a short summary of a dm-crypt
      speed test on a Core i7 M620 running at 2.67GHz comparing both assembler
      implementations:
      
      x86:                   i586       aes-ni    delta
      ECB, 256 bit:     93.8 MB/s   123.3 MB/s   +31.4%
      CBC, 256 bit:     84.8 MB/s   262.3 MB/s  +209.3%
      LRW, 256 bit:    108.6 MB/s   222.1 MB/s  +104.5%
      XTS, 256 bit:    105.0 MB/s   205.5 MB/s   +95.7%
      
      Additionally, due to some minor optimizations, the 64-bit version also
      got a minor performance gain as seen below:
      
      x86-64:           old impl.    new impl.    delta
      ECB, 256 bit:    121.1 MB/s   123.0 MB/s    +1.5%
      CBC, 256 bit:    285.3 MB/s   290.8 MB/s    +1.9%
      LRW, 256 bit:    263.7 MB/s   265.3 MB/s    +0.6%
      XTS, 256 bit:    251.1 MB/s   255.3 MB/s    +1.7%
      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Reviewed-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  19. 13 Nov 2010, 1 commit
  20. 13 Mar 2010, 1 commit
  21. 10 Mar 2010, 1 commit
  22. 23 Nov 2009, 1 commit
  23. 18 Jun 2009, 1 commit
  24. 18 Feb 2009, 1 commit