1. 19 5月, 2015 3 次提交
    • I
      x86/fpu: Rename fpu/xsave.h to fpu/xstate.h · 669ebabb
      Ingo Molnar 提交于
      'xsave' is an x86 instruction name to most people - but xsave.h is
      about a lot more than just the XSAVE instruction: it includes
      definitions and support, both internal and external, related to
      xstate and xfeatures support.
      
      As a first step in cleaning up the various xstate uses rename this
      header to 'fpu/xstate.h' to better reflect what this header file
      is about.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      669ebabb
    • I
      x86/fpu: Move xsave.h to fpu/xsave.h · a137fb6b
      Ingo Molnar 提交于
      Move the xsave.h header file to the FPU directory as well.
      Reviewed-by: NBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      a137fb6b
    • I
      x86/fpu: Rename i387.h to fpu/api.h · df6b35f4
      Ingo Molnar 提交于
      We already have fpu/types.h, move i387.h to fpu/api.h.
      
      The file name has become a misnomer anyway: it offers generic FPU APIs,
      but is not limited to i387 functionality.
      Reviewed-by: NBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      df6b35f4
  2. 10 4月, 2015 1 次提交
  3. 24 11月, 2014 1 次提交
  4. 25 3月, 2014 1 次提交
  5. 21 3月, 2014 1 次提交
    • C
      crypto: sha - SHA1 transform x86_64 AVX2 · 7c1da8d0
      chandramouli narayanan 提交于
      This git patch adds x86_64 AVX2 optimization of SHA1
      transform to crypto support. The patch has been tested with 3.14.0-rc1
      kernel.
      
      On a Haswell desktop, with turbo disabled and all cpus running
      at maximum frequency, tcrypt shows AVX2 performance improvement
      from 3% for 256 bytes update to 16% for 1024 bytes update over
      AVX implementation.
      
      This patch adds sha1_avx2_transform(), the glue, build and
      configuration changes needed for AVX2 optimization of
      SHA1 transform to crypto support.
      
      sha1-ssse3 is one module which adds the necessary optimization
      support (SSSE3/AVX/AVX2) for the low-level SHA1 transform function.
      With better optimization support, transform function is overridden
      as the case may be. In the case of AVX2, due to performance reasons
      across datablock sizes, the AVX or AVX2 transform function is used
      at run-time as it suits best. The Makefile change therefore appends
      the necessary objects to the linkage. Due to this, the patch merely
      appends AVX2 transform to the existing build mix and Kconfig support
      and leaves the configuration build support as is.
      Signed-off-by: NChandramouli Narayanan <mouli@linux.intel.com>
      Reviewed-by: NMarek Vasut <marex@denx.de>
      Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      7c1da8d0
  6. 12 6月, 2012 1 次提交
  7. 10 8月, 2011 1 次提交
    • M
      crypto: sha1 - SSSE3 based SHA1 implementation for x86-64 · 66be8951
      Mathias Krause 提交于
      This is an assembler implementation of the SHA1 algorithm using the
      Supplemental SSE3 (SSSE3) instructions or, when available, the
      Advanced Vector Extensions (AVX).
      
      Testing with the tcrypt module shows the raw hash performance is up to
      2.3 times faster than the C implementation, using 8k data blocks on a
      Core 2 Duo T5500. For the smalest data set (16 byte) it is still 25%
      faster.
      
      Since this implementation uses SSE/YMM registers it cannot safely be
      used in every situation, e.g. while an IRQ interrupts a kernel thread.
      The implementation falls back to the generic SHA1 variant, if using
      the SSE/YMM registers is not possible.
      
      With this algorithm I was able to increase the throughput of a single
      IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using
      the SSSE3 variant -- a speedup of +34.8%.
      
      Saving and restoring SSE/YMM state might make the actual throughput
      fluctuate when there are FPU intensive userland applications running.
      For example, meassuring the performance using iperf2 directly on the
      machine under test gives wobbling numbers because iperf2 uses the FPU
      for each packet to check if the reporting interval has expired (in the
      above test I got min/max/avg: 402/484/464 MBit/s).
      
      Using this algorithm on a IPsec gateway gives much more reasonable and
      stable numbers, albeit not as high as in the directly connected case.
      Here is the result from an RFC 2544 test run with a EXFO Packet Blazer
      FTB-8510:
      
       frame size    sha1-generic     sha1-ssse3    delta
          64 byte     37.5 MBit/s    37.5 MBit/s     0.0%
         128 byte     56.3 MBit/s    62.5 MBit/s   +11.0%
         256 byte     87.5 MBit/s   100.0 MBit/s   +14.3%
         512 byte    131.3 MBit/s   150.0 MBit/s   +14.2%
        1024 byte    162.5 MBit/s   193.8 MBit/s   +19.3%
        1280 byte    175.0 MBit/s   212.5 MBit/s   +21.4%
        1420 byte    175.0 MBit/s   218.7 MBit/s   +25.0%
        1518 byte    150.0 MBit/s   181.2 MBit/s   +20.8%
      
      The throughput for the largest frame size is lower than for the
      previous size because the IP packets need to be fragmented in this
      case to make there way through the IPsec tunnel.
      Signed-off-by: NMathias Krause <minipli@googlemail.com>
      Cc: Maxim Locktyukhin <maxim.locktyukhin@intel.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      66be8951