提交 · 0b8c960cf6defc56b3aa1a71b5af95872b6dff2b · openanolis / cloud-kernel

05 1月, 2015 2 次提交

crypto: sha-mb - Add avx2_supported check. · 0b8c960c

由 Vinson Lee 提交于 12月 29, 2014

This patch fixes this allyesconfig target build error with older
binutils.

  LD      arch/x86/crypto/built-in.o
ld: arch/x86/crypto/sha-mb/built-in.o: No such file: No such file or directory

Cc: stable@vger.kernel.org # 3.18+
Signed-off-by: NVinson Lee <vlee@twitter.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

0b8c960c

crypto: aesni - fix "by8" variant for 128 bit keys · 0b1e95b2

由 Mathias Krause 提交于 12月 30, 2014

The "by8" counter mode optimization is broken for 128 bit keys with
input data longer than 128 bytes. It uses the wrong key material for
en- and decryption.

The key registers xkey0, xkey4, xkey8 and xkey12 need to be preserved
in case we're handling more than 128 bytes of input data -- they won't
get reloaded after the initial load. They must therefore be (a) loaded
on the first iteration and (b) be preserved for the latter ones. The
implementation for 128 bit keys does not comply with (a) nor (b).

Fix this by bringing the implementation back to its original source
and correctly load the key registers and preserve their values by
*not* re-using the registers for other purposes.

Kudos to James for reporting the issue and providing a test case
showing the discrepancies.
Reported-by: NJames Yonan <james@openvpn.net>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Cc: <stable@vger.kernel.org> # v3.18
Signed-off-by: NMathias Krause <minipli@googlemail.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

0b1e95b2

02 12月, 2014 1 次提交

crypto: sha - replace memset by memzero_explicit · a6326ba0

由 Julia Lawall 提交于 11月 30, 2014

Memset on a local variable may be removed when it is called just before the
variable goes out of scope.  Using memzero_explicit defeats this
optimization.  A simplified version of the semantic patch that makes this
change is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier x;
type T;
@@

{
... when any
T x[...];
... when any
    when exists
- memset
+ memzero_explicit
  (x,
-0,
  ...)
... when != x
    when strict
}
// </smpl>

This change was suggested by Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

a6326ba0

26 11月, 2014 1 次提交

crypto: include crypto- module prefix in template · 4943ba16

由 Kees Cook 提交于 11月 24, 2014

This adds the module loading prefix "crypto-" to the template lookup
as well.

For example, attempting to load 'vfat(blowfish)' via AF_ALG now correctly
includes the "crypto-" prefix at every level, correctly rejecting "vfat":

	net-pf-38
	algif-hash
	crypto-vfat(blowfish)
	crypto-vfat(blowfish)-all
	crypto-vfat
Reported-by: NMathias Krause <minipli@googlemail.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Acked-by: NMathias Krause <minipli@googlemail.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

4943ba16

25 11月, 2014 1 次提交

crypto: sha-mb - remove a bogus NULL check · 5d1b3c98

由 Dan Carpenter 提交于 11月 22, 2014

This can't be NULL and we dereferenced it earlier.  Smatch used to
ignore these things where the pointer was obviously non-NULL but I've
found that sometimes the intention was to check something else so we
were maybe missing bugs.
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Acked-by: NTim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

5d1b3c98

24 11月, 2014 1 次提交

crypto: prefix module autoloading with "crypto-" · 5d26a105

由 Kees Cook 提交于 11月 20, 2014

This prefixes all crypto module loading with "crypto-" so we never run
the risk of exposing module auto-loading to userspace via a crypto API,
as demonstrated by Mathias Krause:

https://lkml.org/lkml/2013/3/4/70Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

5d26a105

06 11月, 2014 1 次提交

crypto: aesni - remove unnecessary #define · 304576a7

由 Valentin Rothberg 提交于 10月 21, 2014

The CPP identifier 'HAS_PCBC' is defined when the Kconfig
option CRYPTO_PCBC is set as 'y' or 'm', and is further
used in two ifdef blocks to conditionally compile source
code. This indirection hides the actual Kconfig dependency
and complicates readability. Moreover, it's inconsistent
with the rest of the ifdef blocks in the file, which
directly reference Kconfig options.

This patch removes 'HAS_PCBC' and replaces its occurrences
with the actual dependency on 'CRYPTO_PCBC' being set as
'y' or 'm'.
Signed-off-by: NValentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

304576a7

02 10月, 2014 3 次提交

Revert "crypto: aesni - disable "by8" AVX CTR optimization" · 5cfed7b3

由 Mathias Krause 提交于 9月 28, 2014

This reverts commit 7da4b29d.

Now, that the issue is fixed, we can re-enable the code.
Signed-off-by: NMathias Krause <minipli@googlemail.com>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

5cfed7b3

crypto: aesni - remove unused defines in "by8" variant · e3b3bb5a

由 Mathias Krause 提交于 9月 28, 2014

The defines for xkey3, xkey6 and xkey9 are not used in the code. They're
probably left overs from merging the three source files for 128, 192 and
256 bit AES. They can safely be removed.
Signed-off-by: NMathias Krause <minipli@googlemail.com>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

e3b3bb5a

crypto: aesni - fix counter overflow handling in "by8" variant · 80dca473

由 Mathias Krause 提交于 9月 28, 2014

The "by8" CTR AVX implementation fails to propperly handle counter
overflows. That was the reason it got disabled in commit 7da4b29d
("crypto: aesni - disable "by8" AVX CTR optimization").

Fix the overflow handling by incrementing the counter block as a double
quad word, i.e. a 128 bit, and testing for overflows afterwards. We need
to use VPTEST to do so as VPADD* does not set the flags itself and
silently drops the carry bit.

As this change adds branches to the hot path, minor performance
regressions  might be a side effect. But, OTOH, we now have a conforming
implementation -- the preferable goal.

A tcrypt test on a SandyBridge system (i7-2620M) showed almost identical
numbers for the old and this version with differences within the noise
range. A dm-crypt test with the fixed version gave even slightly better
results for this version. So the performance impact might not be as big
as expected.
Tested-by: NRomain Francoise <romain@orebokech.com>
Signed-off-by: NMathias Krause <minipli@googlemail.com>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

80dca473

24 9月, 2014 1 次提交

crypto: aesni - disable "by8" AVX CTR optimization · 7da4b29d

由 Mathias Krause 提交于 9月 23, 2014

The "by8" implementation introduced in commit 22cddcc7 ("crypto: aes
- AES CTR x86_64 "by8" AVX optimization") is failing crypto tests as it
handles counter block overflows differently. It only accounts the right
most 32 bit as a counter -- not the whole block as all other
implementations do. This makes it fail the cryptomgr test #4 that
specifically tests this corner case.

As we're quite late in the release cycle, just disable the "by8" variant
for now.
Reported-by: NRomain Francoise <romain@orebokech.com>
Signed-off-by: NMathias Krause <minipli@googlemail.com>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

7da4b29d

26 8月, 2014 1 次提交

crypto: sha-mb - sha1_mb_alg_state can be static · 4c1948fc

由 Fengguang Wu 提交于 8月 26, 2014

CC: Tim Chen <tim.c.chen@linux.intel.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

4c1948fc

25 8月, 2014 4 次提交

crypto: sha-mb - SHA1 multibuffer job manager and glue code · ad61e042

由 Tim Chen 提交于 7月 31, 2014

This patch introduces the multi-buffer job manager which is responsible
for submitting scatter-gather buffers from several SHA1 jobs to the
multi-buffer algorithm.  It also contains the flush routine to that's
called by the crypto daemon to complete the job when no new jobs arrive
before the deadline of maximum latency of a SHA1 crypto job.

The SHA1 multi-buffer crypto algorithm is defined and initialized in
this patch.
Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

ad61e042

crypto: sha-mb - SHA1 multibuffer crypto computation (x8 AVX2) · 12d2513d

由 Tim Chen 提交于 7月 31, 2014

This patch introduces the assembly routines to do SHA1 computation on
buffers belonging to serveral jobs at once.  The assembly routines are
optimized with AVX2 instructions that have 8 data lanes and using AVX2
registers.
Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

12d2513d

crypto: sha-mb - SHA1 multibuffer submit and flush routines for AVX2 · 2249cbb5

由 Tim Chen 提交于 7月 31, 2014

This patch introduces the routines used to submit and flush buffers
belonging to SHA1 crypto jobs to the SHA1 multibuffer algorithm. It is
implemented mostly in assembly optimized with AVX2 instructions.
Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

2249cbb5

crypto: sha-mb - SHA1 multibuffer algorithm data structures · 11617778

由 Tim Chen 提交于 7月 31, 2014

This patch introduces the data structures and prototypes of functions
needed for computing SHA1 hash using multi-buffer.  Included are the
structures of the multi-buffer SHA1 job, job scheduler in C and x86
assembly.
Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

11617778

25 6月, 2014 2 次提交

crypto: sha512_ssse3 - fix byte count to bit count conversion · cfe82d4f

由 Jussi Kivilinna 提交于 6月 23, 2014

Byte-to-bit-count computation is only partly converted to big-endian and is
mixing in CPU-endian values. Problem was noticed by sparce with warning:

CHECK arch/x86/crypto/sha512_ssse3_glue.c
arch/x86/crypto/sha512_ssse3_glue.c:144:19: warning: restricted __be64 degrades to integer
arch/x86/crypto/sha512_ssse3_glue.c:144:17: warning: incorrect type in assignment (different base types)
arch/x86/crypto/sha512_ssse3_glue.c:144:17: expected restricted __be64 <noident>
arch/x86/crypto/sha512_ssse3_glue.c:144:17: got unsigned long long

Cc: <stable@vger.kernel.org>
Signed-off-by: NJussi Kivilinna <jussi.kivilinna@iki.fi>
Acked-by: NTim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

cfe82d4f

crypto: des3_ede-x86_64 - fix parse warning · 5e50d43d

由 Jussi Kivilinna 提交于 6月 23, 2014

Patch fixes following sparse warning:

CHECK arch/x86/crypto/des3_ede_glue.c
arch/x86/crypto/des3_ede_glue.c:308:52: warning: restricted __be64 degrades to integer
arch/x86/crypto/des3_ede_glue.c:309:52: warning: restricted __be64 degrades to integer
arch/x86/crypto/des3_ede_glue.c:310:52: warning: restricted __be64 degrades to integer
arch/x86/crypto/des3_ede_glue.c:326:44: warning: restricted __be64 degrades to integer
Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
Signed-off-by: NJussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

5e50d43d

20 6月, 2014 3 次提交

crypto: aes - AES CTR x86_64 "by8" AVX optimization · 22cddcc7

由 chandramouli narayanan 提交于 6月 10, 2014

This patch introduces "by8" AES CTR mode AVX optimization inspired by
Intel Optimized IPSEC Cryptograhpic library. For additional information,
please see:
http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=22972

The functions aes_ctr_enc_128_avx_by8(), aes_ctr_enc_192_avx_by8() and
aes_ctr_enc_256_avx_by8() are adapted from
Intel Optimized IPSEC Cryptographic library. When both AES and AVX features
are enabled in a platform, the glue code in AESNI module overrieds the
existing "by4" CTR mode en/decryption with the "by8"
AES CTR mode en/decryption.

On a Haswell desktop, with turbo disabled and all cpus running
at maximum frequency, the "by8" CTR mode optimization
shows better performance results across data & key sizes
as measured by tcrypt.

The average performance improvement of the "by8" version over the "by4"
version is as follows:

For 128 bit key and data sizes >= 256 bytes, there is a 10-16% improvement.
For 192 bit key and data sizes >= 256 bytes, there is a 20-22% improvement.
For 256 bit key and data sizes >= 256 bytes, there is a 20-25% improvement.

A typical run of tcrypt with AES CTR mode encryption of the "by4" and "by8"
optimization shows the following results:

tcrypt with "by4" AES CTR mode encryption optimization on a Haswell Desktop:
---------------------------------------------------------------------------

testing speed of __ctr-aes-aesni encryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 343 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 336 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 491 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1130 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 7309 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 346 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 361 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 543 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1321 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 9649 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 369 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 366 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 595 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 1531 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 10522 cycles (8192 bytes)

testing speed of __ctr-aes-aesni decryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 336 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 350 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 487 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1129 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 7287 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 350 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 359 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 635 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1324 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 9595 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 364 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 377 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 604 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 1527 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 10549 cycles (8192 bytes)

tcrypt with "by8" AES CTR mode encryption optimization on a Haswell Desktop:
---------------------------------------------------------------------------

testing speed of __ctr-aes-aesni encryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 340 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 330 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 450 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1043 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 6597 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 339 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 352 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 539 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1153 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 8458 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 353 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 360 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 512 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 1277 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 8745 cycles (8192 bytes)

testing speed of __ctr-aes-aesni decryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 348 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 335 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 451 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1030 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 6611 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 354 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 346 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 488 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1154 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 8390 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 357 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 362 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 515 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 1284 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 8681 cycles (8192 bytes)

crypto: Incorporate feed back to AES CTR mode optimization patch

Specifically, the following:
a) alignment around main loop in aes_ctrby8_avx_x86_64.S
b) .rodata around data constants used in the assembely code.
c) the use of CONFIG_AVX in the glue code.
d) fix up white space.
e) informational message for "by8" AES CTR mode optimization
f) "by8" AES CTR mode optimization can be simply enabled
if the platform supports both AES and AVX features. The
optimization works superbly on Sandybridge as well.

Testing on Haswell shows no performance change since the last.

Testing on Sandybridge shows that the "by8" AES CTR mode optimization
greatly improves performance.

tcrypt log with "by4" AES CTR mode optimization on Sandybridge
--------------------------------------------------------------

testing speed of __ctr-aes-aesni encryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 383 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 408 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 707 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1864 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 12813 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 395 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 432 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 780 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 2132 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 15765 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 416 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 438 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 842 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 2383 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 16945 cycles (8192 bytes)

testing speed of __ctr-aes-aesni decryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 389 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 409 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 704 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1865 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 12783 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 409 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 434 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 792 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 2151 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 15804 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 421 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 444 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 840 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 2394 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 16928 cycles (8192 bytes)

tcrypt log with "by8" AES CTR mode optimization on Sandybridge
--------------------------------------------------------------

testing speed of __ctr-aes-aesni encryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 383 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 401 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 522 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1136 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 7046 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 394 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 418 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 559 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1263 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 9072 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 408 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 428 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 595 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 1385 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 9224 cycles (8192 bytes)

testing speed of __ctr-aes-aesni decryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 390 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 402 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 530 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1135 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 7079 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 414 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 417 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 572 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1312 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 9073 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 415 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 454 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 598 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 1407 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 9288 cycles (8192 bytes)

crypto: Fix redundant checks

a) Fix the redundant check for cpu_has_aes
b) Fix the key length check when invoking the CTR mode "by8"
encryptor/decryptor.

crypto: fix typo in AES ctr mode transform
Signed-off-by: NChandramouli Narayanan <mouli@linux.intel.com>
Reviewed-by: NMathias Krause <minipli@googlemail.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

22cddcc7

crypto: des_3des - add x86-64 assembly implementation · 6574e6c6

由 Jussi Kivilinna 提交于 6月 09, 2014

Patch adds x86_64 assembly implementation of Triple DES EDE cipher algorithm.
Two assembly implementations are provided. First is regular 'one-block at
time' encrypt/decrypt function. Second is 'three-blocks at time' function that
gains performance increase on out-of-order CPUs.

tcrypt test results:

Intel Core i5-4570:

des3_ede-asm vs des3_ede-generic:
size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec
16B     1.21x   1.22x   1.27x   1.36x   1.25x   1.25x
64B     1.98x   1.96x   1.23x   2.04x   2.01x   2.00x
256B    2.34x   2.37x   1.21x   2.40x   2.38x   2.39x
1024B   2.50x   2.47x   1.22x   2.51x   2.52x   2.51x
8192B   2.51x   2.53x   1.21x   2.56x   2.54x   2.55x
Signed-off-by: NJussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

6574e6c6

crypto: crc32c-pclmul - Shrink K_table to 32-bit words · 473946e6

由 George Spelvin 提交于 6月 06, 2014

There's no need for the K_table to be made of 64-bit words.  For some
reason, the original authors didn't fully reduce the values modulo the
CRC32C polynomial, and so had some 33-bit values in there.  They can
all be reduced to 32 bits.

Doing that cuts the table size in half.  Since the code depends on both
pclmulq and crc32, SSE 4.1 is obviously present, so we can use pmovzxdq
to fetch it in the correct format.

This adds (measured on Ivy Bridge) 1 cycle per main loop iteration
(CRC of up to 3K bytes), less than 0.2%.  The hope is that the reduced
D-cache footprint will make up the loss in other code.

Two other related fixes:
* K_table is read-only, so belongs in .rodata, and
* There's no need for more than 8-byte alignment
Acked-by: NTim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: NGeorge Spelvin <linux@horizon.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

473946e6

04 4月, 2014 1 次提交

crypto: ghash-clmulni-intel - Use u128 instead of be128 for internal key · 0ea48146

由 Herbert Xu 提交于 4月 04, 2014

The internal key isn't actually in big-endian format so let's switch
to u128 which also happens to allow us to remove a sparse warning.

Based on suggestion by Ard Biesheuvel.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Acked-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>

0ea48146

01 4月, 2014 1 次提交

crypto: ghash-clmulni-intel - use C implementation for setkey() · 8ceee728

由 Ard Biesheuvel 提交于 3月 27, 2014

The GHASH setkey() function uses SSE registers but fails to call
kernel_fpu_begin()/kernel_fpu_end(). Instead of adding these calls, and
then having to deal with the restriction that they cannot be called from
interrupt context, move the setkey() implementation to the C domain.

Note that setkey() does not use any particular SSE features and is not
expected to become a performance bottleneck.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
Fixes: 0e1227d3 (crypto: ghash - Add PCLMULQDQ accelerated implementation)
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

8ceee728

25 3月, 2014 3 次提交

crypto: x86/sha1 - reduce size of the AVX2 asm implementation · 37b28947

由 Mathias Krause 提交于 3月 24, 2014

There is really no need to page align sha1_transform_avx2. The default
alignment is just fine. This is not the hot code but only the entry
point, after all.

Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: NMathias Krause <minipli@googlemail.com>
Reviewed-by: NH. Peter Anvin <hpa@linux.intel.com>
Reviewed-by: NMarek Vasut <marex@denx.de>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

37b28947

crypto: x86/sha1 - fix stack alignment of AVX2 variant · 6c8c17cc

由 Mathias Krause 提交于 3月 24, 2014

The AVX2 implementation might waste up to a page of stack memory because
of a wrong alignment calculation. This will, in the worst case, increase
the stack usage of sha1_transform_avx2() alone to 5.4 kB -- way to big
for a kernel function. Even worse, it might also allocate *less* bytes
than needed if the stack pointer is already aligned bacause in that case
the 'sub %rbx, %rsp' is effectively moving the stack pointer upwards,
not downwards.

Fix those issues by changing and simplifying the alignment calculation
to use a 32 byte alignment, the alignment really needed.

Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: NMathias Krause <minipli@googlemail.com>
Reviewed-by: NH. Peter Anvin <hpa@linux.intel.com>
Reviewed-by: NMarek Vasut <marex@denx.de>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

6c8c17cc

crypto: x86/sha1 - re-enable the AVX variant · 6ca5afb8

由 Mathias Krause 提交于 3月 24, 2014

Commit 7c1da8d0 "crypto: sha - SHA1 transform x86_64 AVX2"
accidentally disabled the AVX variant by making the avx_usable() test
not only fail in case the CPU doesn't support AVX or OSXSAVE but also
if it doesn't support AVX2.

Fix that regression by splitting up the AVX/AVX2 test into two
functions. Also test for the BMI1 extension in the avx2_usable() test
as the AVX2 implementation not only makes use of BMI2 but also BMI1
instructions.

Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: NMathias Krause <minipli@googlemail.com>
Reviewed-by: NH. Peter Anvin <hpa@linux.intel.com>
Reviewed-by: NMarek Vasut <marex@denx.de>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

6ca5afb8

21 3月, 2014 1 次提交

crypto: sha - SHA1 transform x86_64 AVX2 · 7c1da8d0

由 chandramouli narayanan 提交于 3月 20, 2014

This git patch adds x86_64 AVX2 optimization of SHA1
transform to crypto support. The patch has been tested with 3.14.0-rc1
kernel.

On a Haswell desktop, with turbo disabled and all cpus running
at maximum frequency, tcrypt shows AVX2 performance improvement
from 3% for 256 bytes update to 16% for 1024 bytes update over
AVX implementation.

This patch adds sha1_avx2_transform(), the glue, build and
configuration changes needed for AVX2 optimization of
SHA1 transform to crypto support.

sha1-ssse3 is one module which adds the necessary optimization
support (SSSE3/AVX/AVX2) for the low-level SHA1 transform function.
With better optimization support, transform function is overridden
as the case may be. In the case of AVX2, due to performance reasons
across datablock sizes, the AVX or AVX2 transform function is used
at run-time as it suits best. The Makefile change therefore appends
the necessary objects to the linkage. Due to this, the patch merely
appends AVX2 transform to the existing build mix and Kconfig support
and leaves the configuration build support as is.
Signed-off-by: NChandramouli Narayanan <mouli@linux.intel.com>
Reviewed-by: NMarek Vasut <marex@denx.de>
Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

7c1da8d0

27 2月, 2014 1 次提交

crypto: remove a duplicate checks in __cbc_decrypt() · b3bd5869

由 Dan Carpenter 提交于 2月 13, 2014

We checked "nbytes < bsize" before so it can't happen here.
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Acked-by: NJussi Kivilinna <jussi.kivilinna@iki.fi>
Acked-by: NJohannes Götzfried <johannes.goetzfried@cs.fau.de>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

b3bd5869

15 1月, 2014 1 次提交

crypto: aesni - fix build on x86 (32bit) · 79ba451d

由 Tim Chen 提交于 1月 09, 2014

We rename aesni-intel_avx.S to aesni-intel_avx-x86_64.S to indicate
that it is only used by x86_64 architecture.
Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

79ba451d

31 12月, 2013 1 次提交

crypto: aesni - fix build on x86 (32bit) · 8610d7bf

由 Andy Shevchenko 提交于 12月 30, 2013

It seems commit d764593a "crypto: aesni - AVX and AVX2 version of AESNI-GCM
encode and decode" breaks a build on x86_32 since it's designed only for
x86_64. This patch makes a compilation unit conditional to CONFIG_64BIT and
functions usage to CONFIG_X86_64.
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

8610d7bf

20 12月, 2013 2 次提交

crypto: aesni - AVX and AVX2 version of AESNI-GCM encode and decode · d764593a

由 Tim Chen 提交于 12月 11, 2013

We have added AVX and AVX2 routines that optimize AESNI-GCM encode/decode.
These routines are optimized for encrypt and decrypt of large buffers.
In tests we have seen up to 6% speedup for 1K, 11% speedup for 2K and
18% speedup for 8K buffer over the existing SSE version. These routines
should provide even better speedup for future Intel x86_64 cpus.
Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

d764593a

crypto: arch - use crypto_memneq instead of memcmp · fed28611

由 Daniel Borkmann 提交于 12月 11, 2013

Replace remaining occurences (just as we did in crypto/) under arch/*/crypto/
that make use of memcmp() for comparing keys or authentication tags for
usage with crypto_memneq(). It can simply be used as a drop-in replacement
for the normal memcmp().
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Cc: James Yonan <james@openvpn.net>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

fed28611

07 10月, 2013 1 次提交

crypto: sha256_ssse3 - also test for BMI2 · 16c0c4e1

由 Oliver Neukum 提交于 10月 01, 2013

The AVX2 implementation also uses BMI2 instructions,
but doesn't test for their availability. The assumption
that AVX2 and BMI2 always go together is false. Some
Haswells have AVX2 but not BMI2.
Signed-off-by: NOliver Neukum <oneukum@suse.de>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

16c0c4e1

24 9月, 2013 1 次提交

crypto: move x86 to the generic version of ablk_helper · 801201aa

由 Ard Biesheuvel 提交于 9月 20, 2013

Move all users of ablk_helper under x86/ to the generic version
and delete the x86 specific version.
Acked-by: NJussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

801201aa

13 9月, 2013 2 次提交

crypto: x86 - restore avx2_supported check · 58497204

由 Jussi Kivilinna 提交于 9月 03, 2013

Commit 3d387ef0 (Revert "crypto: blowfish - add AVX2/x86_64 implementation
of blowfish cipher") reverted too much as it removed the 'assembler supports
AVX2' check and therefore disabled remaining AVX2 implementations of Camellia
and Serpent. Patch restores the check and enables these implementations.
Signed-off-by: NJussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

58497204

crypto: sha256_ssse3 - use correct module alias for sha224 · 7d444909

由 Jussi Kivilinna 提交于 9月 03, 2013

Commit a710f761 (crypto: sha256_ssse3 - add sha224 support) attempted to add
MODULE_ALIAS for SHA-224, but it ended up being "sha384", probably because
mix-up with previous commit 340991e3 (crypto: sha512_ssse3 - add sha384
support). Patch corrects module alias to "sha224".
Reported-by: NPierre-Mayeul Badaire <pierre-mayeul.badaire@m4x.org>
Signed-off-by: NJussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

7d444909

07 9月, 2013 1 次提交

Reinstate "crypto: crct10dif - Wrap crc_t10dif function all to use crypto transform framework" · 68411521

由 Herbert Xu 提交于 9月 07, 2013

This patch reinstates commits
	67822649
	39761214
	0b95a7f8
	31d93962
	2d31e518

Now that module softdeps are in the kernel we can use that to resolve
the boot issue which cause the revert.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

68411521

21 8月, 2013 1 次提交

crypto: camellia-x86-64 - replace commas by semicolons and adjust code alignment · 2a128b4b

由 Julia Lawall 提交于 8月 14, 2013

Adjust alignment and replace commas by semicolons in automatically
generated code.
Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

2a128b4b

14 8月, 2013 1 次提交

crypto: make tables used from assembler __visible · f22d0811

由 Andi Kleen 提交于 8月 10, 2013

Tables used from assembler should be marked __visible to let
the compiler know.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

f22d0811

24 7月, 2013 1 次提交

Revert "crypto: crct10dif - Wrap crc_t10dif function all to use crypto transform framework" · e70308ec

由 Herbert Xu 提交于 7月 24, 2013

This reverts commits
    67822649
    39761214
    0b95a7f8
    31d93962
    2d31e518

Unfortunately this change broke boot on some systems that used an
initrd which does not include the newly created crct10dif modules.
As these modules are required by sd_mod under certain configurations
this is a serious problem.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

e70308ec

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功