提交 · 669ebabb79906302ba6e6922a683893788a134e8 · openeuler / Kernel

19 5月, 2015 3 次提交

x86/fpu: Rename fpu/xsave.h to fpu/xstate.h · 669ebabb

由 Ingo Molnar 提交于 4月 28, 2015

'xsave' is an x86 instruction name to most people - but xsave.h is
about a lot more than just the XSAVE instruction: it includes
definitions and support, both internal and external, related to
xstate and xfeatures support.

As a first step in cleaning up the various xstate uses rename this
header to 'fpu/xstate.h' to better reflect what this header file
is about.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

669ebabb

x86/fpu: Move xsave.h to fpu/xsave.h · a137fb6b

由 Ingo Molnar 提交于 4月 27, 2015

Move the xsave.h header file to the FPU directory as well.
Reviewed-by: NBorislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

a137fb6b

x86/fpu: Rename i387.h to fpu/api.h · df6b35f4

由 Ingo Molnar 提交于 4月 24, 2015

We already have fpu/types.h, move i387.h to fpu/api.h.

The file name has become a misnomer anyway: it offers generic FPU APIs,
but is not limited to i387 functionality.
Reviewed-by: NBorislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

df6b35f4

10 4月, 2015 1 次提交

crypto: x86/sha1_ssse3 - move SHA-1 SSSE3 implementation to base layer · 824b4376

由 Ard Biesheuvel 提交于 4月 09, 2015

This removes all the boilerplate from the existing implementation,
and replaces it with calls into the base layer.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

824b4376

24 11月, 2014 1 次提交

crypto: prefix module autoloading with "crypto-" · 5d26a105

由 Kees Cook 提交于 11月 20, 2014

This prefixes all crypto module loading with "crypto-" so we never run
the risk of exposing module auto-loading to userspace via a crypto API,
as demonstrated by Mathias Krause:

https://lkml.org/lkml/2013/3/4/70Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

5d26a105

25 3月, 2014 1 次提交

crypto: x86/sha1 - re-enable the AVX variant · 6ca5afb8

由 Mathias Krause 提交于 3月 24, 2014

Commit 7c1da8d0 "crypto: sha - SHA1 transform x86_64 AVX2"
accidentally disabled the AVX variant by making the avx_usable() test
not only fail in case the CPU doesn't support AVX or OSXSAVE but also
if it doesn't support AVX2.

Fix that regression by splitting up the AVX/AVX2 test into two
functions. Also test for the BMI1 extension in the avx2_usable() test
as the AVX2 implementation not only makes use of BMI2 but also BMI1
instructions.

Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: NMathias Krause <minipli@googlemail.com>
Reviewed-by: NH. Peter Anvin <hpa@linux.intel.com>
Reviewed-by: NMarek Vasut <marex@denx.de>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

6ca5afb8

21 3月, 2014 1 次提交

crypto: sha - SHA1 transform x86_64 AVX2 · 7c1da8d0

由 chandramouli narayanan 提交于 3月 20, 2014

This git patch adds x86_64 AVX2 optimization of SHA1
transform to crypto support. The patch has been tested with 3.14.0-rc1
kernel.

On a Haswell desktop, with turbo disabled and all cpus running
at maximum frequency, tcrypt shows AVX2 performance improvement
from 3% for 256 bytes update to 16% for 1024 bytes update over
AVX implementation.

This patch adds sha1_avx2_transform(), the glue, build and
configuration changes needed for AVX2 optimization of
SHA1 transform to crypto support.

sha1-ssse3 is one module which adds the necessary optimization
support (SSSE3/AVX/AVX2) for the low-level SHA1 transform function.
With better optimization support, transform function is overridden
as the case may be. In the case of AVX2, due to performance reasons
across datablock sizes, the AVX or AVX2 transform function is used
at run-time as it suits best. The Makefile change therefore appends
the necessary objects to the linkage. Due to this, the patch merely
appends AVX2 transform to the existing build mix and Kconfig support
and leaves the configuration build support as is.
Signed-off-by: NChandramouli Narayanan <mouli@linux.intel.com>
Reviewed-by: NMarek Vasut <marex@denx.de>
Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

7c1da8d0

12 6月, 2012 1 次提交

crypto: sha1 - use Kbuild supplied flags for AVX test · 65df5774

由 Mathias Krause 提交于 5月 24, 2012

Commit ea4d26ae ("raid5: add AVX optimized RAID5 checksumming")
introduced x86/ arch wide defines for AFLAGS and CFLAGS indicating AVX
support in binutils based on the same test we have in x86/crypto/ right
now. To minimize duplication drop our implementation in favour to the
one in x86/.
Signed-off-by: NMathias Krause <minipli@googlemail.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

65df5774

10 8月, 2011 1 次提交

crypto: sha1 - SSSE3 based SHA1 implementation for x86-64 · 66be8951

由 Mathias Krause 提交于 8月 04, 2011

This is an assembler implementation of the SHA1 algorithm using the
Supplemental SSE3 (SSSE3) instructions or, when available, the
Advanced Vector Extensions (AVX).

Testing with the tcrypt module shows the raw hash performance is up to
2.3 times faster than the C implementation, using 8k data blocks on a
Core 2 Duo T5500. For the smalest data set (16 byte) it is still 25%
faster.

Since this implementation uses SSE/YMM registers it cannot safely be
used in every situation, e.g. while an IRQ interrupts a kernel thread.
The implementation falls back to the generic SHA1 variant, if using
the SSE/YMM registers is not possible.

With this algorithm I was able to increase the throughput of a single
IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using
the SSSE3 variant -- a speedup of +34.8%.

Saving and restoring SSE/YMM state might make the actual throughput
fluctuate when there are FPU intensive userland applications running.
For example, meassuring the performance using iperf2 directly on the
machine under test gives wobbling numbers because iperf2 uses the FPU
for each packet to check if the reporting interval has expired (in the
above test I got min/max/avg: 402/484/464 MBit/s).

Using this algorithm on a IPsec gateway gives much more reasonable and
stable numbers, albeit not as high as in the directly connected case.
Here is the result from an RFC 2544 test run with a EXFO Packet Blazer
FTB-8510:

 frame size    sha1-generic     sha1-ssse3    delta
    64 byte     37.5 MBit/s    37.5 MBit/s     0.0%
   128 byte     56.3 MBit/s    62.5 MBit/s   +11.0%
   256 byte     87.5 MBit/s   100.0 MBit/s   +14.3%
   512 byte    131.3 MBit/s   150.0 MBit/s   +14.2%
  1024 byte    162.5 MBit/s   193.8 MBit/s   +19.3%
  1280 byte    175.0 MBit/s   212.5 MBit/s   +21.4%
  1420 byte    175.0 MBit/s   218.7 MBit/s   +25.0%
  1518 byte    150.0 MBit/s   181.2 MBit/s   +20.8%

The throughput for the largest frame size is lower than for the
previous size because the IP packets need to be fragmented in this
case to make there way through the IPsec tunnel.
Signed-off-by: NMathias Krause <minipli@googlemail.com>
Cc: Maxim Locktyukhin <maxim.locktyukhin@intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

66be8951

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功