1. 27 April 2019 (1 commit)
    • crypto: x86/poly1305 - fix overflow during partial reduction · fbe5cff9
      Committed by Eric Biggers
      commit 678cce4019d746da6c680c48ba9e6d417803e127 upstream.
      
      The x86_64 implementation of Poly1305 produces the wrong result on some
      inputs because poly1305_4block_avx2() incorrectly assumes that when
      partially reducing the accumulator, the bits carried from limb 'd4' to
      limb 'h0' fit in a 32-bit integer.  This is true for poly1305-generic
      which processes only one block at a time.  However, it's not true for
      the AVX2 implementation, which processes 4 blocks at a time and
      therefore can produce intermediate limbs about 4x larger.
      
      Fix it by making the relevant calculations use 64-bit arithmetic rather
      than 32-bit.  Note that most of the carries already used 64-bit
      arithmetic, but the d4 -> h0 carry was different for some reason.
      
      To be safe I also made the same change to the corresponding SSE2 code,
      though that only operates on 1 or 2 blocks at a time.  I don't think
      it's really needed for poly1305_block_sse2(), but it doesn't hurt
      because it's already x86_64 code.  It *might* be needed for
      poly1305_2block_sse2(), but overflows aren't easy to reproduce there.
      
      This bug was originally detected by my patches that improve testmgr to
      fuzz algorithms against their generic implementation.  But also add a
      test vector which reproduces it directly (in the AVX2 case).
      
      Fixes: b1ccc8f4 ("crypto: poly1305 - Add a four block AVX2 variant for x86_64")
      Fixes: c70f4abe ("crypto: poly1305 - Add a SSE2 SIMD variant for x86_64")
      Cc: <stable@vger.kernel.org> # v4.3+
      Cc: Martin Willi <martin@strongswan.org>
      Cc: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Martin Willi <martin@strongswan.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
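
      For illustration, a minimal C sketch of the partial reduction in
      question (scalar code with five 26-bit limbs, not the kernel's AVX2
      assembly; the function name and limb layout are simplified).  The
      point is that the d4 -> h0 carry has to stay in 64-bit arithmetic:

          #include <stdint.h>

          /* Partially reduce 64-bit product limbs d0..d4 into 26-bit limbs
           * h0..h4.  With 4 blocks absorbed at once the d limbs, and hence
           * the carries, can be ~4x larger than in the one-block generic
           * code, so truncating the final carry to 32 bits can overflow. */
          static void poly1305_partial_reduce(const uint64_t d[5], uint32_t h[5])
          {
              uint64_t d1 = d[1], d2 = d[2], d3 = d[3], d4 = d[4];
              uint64_t c, h0;

              c = d[0] >> 26; h[0] = d[0] & 0x3ffffff;
              d1 += c; c = d1 >> 26; h[1] = d1 & 0x3ffffff;
              d2 += c; c = d2 >> 26; h[2] = d2 & 0x3ffffff;
              d3 += c; c = d3 >> 26; h[3] = d3 & 0x3ffffff;
              d4 += c; c = d4 >> 26; h[4] = d4 & 0x3ffffff;

              /* d4 -> h0: keep c * 5 (and the sum) in 64 bits; this is the
               * step that overflowed when done in 32-bit arithmetic. */
              h0 = (uint64_t)h[0] + c * 5;
              h[0] = h0 & 0x3ffffff;
              h[1] += (uint32_t)(h0 >> 26);
          }
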
  2. 24 March 2019 (3 commits)
  3. 14 November 2018 (1 commit)
  4. 14 September 2018 (1 commit)
  5. 25 August 2018 (1 commit)
    • crypto: aesni - Use unaligned loads from gcm_context_data · e5b954e8
      Committed by Dave Watson
      A regression was reported bisecting to 1476db2d
      "Move HashKey computation from stack to gcm_context".  That diff
      moved HashKey computation from the stack, which was explicitly aligned
      in the asm, to a struct provided from the C code, depending on
      AESNI_ALIGN_ATTR for alignment.  It appears some compilers may not
      align this struct correctly, resulting in a crash on the movdqa
      instruction when attempting to encrypt or decrypt data.
      
      Fix by using unaligned loads for the HashKeys.  On modern
      hardware there is no perf difference between the unaligned and
      aligned loads.  All other accesses to gcm_context_data already use
      unaligned loads.
      Reported-by: Mauro Rossi <issor.oruam@gmail.com>
      Fixes: 1476db2d ("Move HashKey computation from stack to gcm_context")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
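
      For illustration, a user-space intrinsics sketch of the aligned vs.
      unaligned load distinction behind the crash (the kernel code is
      hand-written assembly; the helper name here is hypothetical):

          #include <immintrin.h>

          /* movdqa (_mm_load_si128) faults if its address is not 16-byte
           * aligned; movdqu (_mm_loadu_si128) accepts any alignment.  Since
           * the compiler-laid-out context struct is not guaranteed to be
           * 16-byte aligned, the HashKeys are read with unaligned loads,
           * which cost the same on modern CPUs when the data happens to be
           * aligned anyway. */
          static __m128i load_hashkey(const void *hashkey_field)
          {
              return _mm_loadu_si128((const __m128i *)hashkey_field);
          }
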
  6. 07 August 2018 (1 commit)
    • crypto: x86/aegis,morus - Fix and simplify CPUID checks · 877ccce7
      Committed by Ondrej Mosnacek
      It turns out I had misunderstood how the x86_match_cpu() function works.
      It evaluates a logical OR of the matching conditions, not logical AND.
      This caused the CPU feature checks for AEGIS to pass even if only SSE2
      (but not AES-NI) was supported (or vice versa), leading to potential
      crashes if something tried to use the registered algs.
      
      This patch switches the checks to a simpler method that is used e.g. in
      the Camellia x86 code.
      
      The patch also removes the MODULE_DEVICE_TABLE declarations which
      actually seem to cause the modules to be auto-loaded at boot, which is
      not desired. The crypto API on-demand module loading is sufficient.
      
      Fixes: 1d373d4e ("crypto: x86 - Add optimized AEGIS implementations")
      Fixes: 6ecc9d9f ("crypto: x86 - Add optimized MORUS implementations")
      Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
      Tested-by: Milan Broz <gmazyland@gmail.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
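
      For illustration, a hedged sketch of the kind of explicit feature test
      the commit switches to (modeled on the Camellia-style checks it cites;
      function and header names are approximate, and the real init also
      verifies the available XSAVE state):

          #include <linux/init.h>
          #include <linux/errno.h>
          #include <asm/cpufeature.h>     /* boot_cpu_has(), X86_FEATURE_* */

          /* Every required feature is tested explicitly and the results are
           * ANDed, so registration is refused unless all of them are present,
           * unlike an x86_match_cpu() table whose entries are ORed. */
          static int __init example_aegis_aesni_init(void)
          {
              if (!boot_cpu_has(X86_FEATURE_XMM2) ||   /* SSE2 */
                  !boot_cpu_has(X86_FEATURE_AES))      /* AES-NI */
                  return -ENODEV;

              /* ... register the AEAD algorithms here ... */
              return 0;
          }
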
  7. 09 July 2018 (5 commits)
    • crypto: ahash - remove useless setting of cra_type · c87a405e
      Committed by Eric Biggers
      Some ahash algorithms set .cra_type = &crypto_ahash_type.  But this is
      redundant with the C structure type ('struct ahash_alg'), and
      crypto_register_ahash() already sets the .cra_type automatically.
      Apparently the useless assignment has just been copy+pasted around.
      
      So, remove the useless assignment from all the ahash algorithms.
      
      This patch shouldn't change any actual behavior.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Acked-by: Gilad Ben-Yossef <gilad@benyossef.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: ahash - remove useless setting of type flags · 6a38f622
      Committed by Eric Biggers
      Many ahash algorithms set .cra_flags = CRYPTO_ALG_TYPE_AHASH.  But this
      is redundant with the C structure type ('struct ahash_alg'), and
      crypto_register_ahash() already sets the type flag automatically,
      clearing any type flag that was already there.  Apparently the useless
      assignment has just been copy+pasted around.
      
      So, remove the useless assignment from all the ahash algorithms.
      
      This patch shouldn't change any actual behavior.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Acked-by: Gilad Ben-Yossef <gilad@benyossef.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: shash - remove useless setting of type flags · e50944e2
      Committed by Eric Biggers
      Many shash algorithms set .cra_flags = CRYPTO_ALG_TYPE_SHASH.  But this
      is redundant with the C structure type ('struct shash_alg'), and
      crypto_register_shash() already sets the type flag automatically,
      clearing any type flag that was already there.  Apparently the useless
      assignment has just been copy+pasted around.
      
      So, remove the useless assignment from all the shash algorithms.
      
      This patch shouldn't change any actual behavior.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
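
      For illustration of all three cleanups above, a hedged sketch of a
      shash algorithm definition with the redundant fields simply left out
      (names, sizes, and priority here are made up for the example):

          #include <linux/module.h>
          #include <crypto/internal/hash.h>

          /* Hypothetical driver callbacks, stubbed for the sketch. */
          static int example_init(struct shash_desc *desc) { return 0; }
          static int example_update(struct shash_desc *desc, const u8 *data,
                                    unsigned int len) { return 0; }
          static int example_final(struct shash_desc *desc, u8 *out) { return 0; }

          static struct shash_alg example_alg = {
              .digestsize = 32,
              .init       = example_init,
              .update     = example_update,
              .final      = example_final,
              .descsize   = 16,
              .base       = {
                  .cra_name        = "example",
                  .cra_driver_name = "example-driver",
                  .cra_priority    = 100,
                  .cra_blocksize   = 64,
                  .cra_module      = THIS_MODULE,
                  /* No .cra_flags = CRYPTO_ALG_TYPE_SHASH and no .cra_type:
                   * crypto_register_shash() (and crypto_register_ahash() in
                   * the ahash cases) fills these in. */
              },
          };
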
    • crypto: x86/sha-mb - decrease priority of multibuffer algorithms · 8aeef492
      Committed by Eric Biggers
      With all the crypto modules enabled on x86, and with a CPU that supports
      AVX-2 but not SHA-NI instructions (e.g. Haswell, Broadwell, Skylake),
      the "multibuffer" implementations of SHA-1, SHA-256, and SHA-512 are the
      highest priority.  However, these implementations only perform well when
      many hash requests are being submitted concurrently, filling all 8 AVX-2
      lanes.  Otherwise, they are incredibly slow, as they waste time waiting
      for more requests to arrive before proceeding to execute each request.
      
      For example, here are the speeds I see hashing 4096-byte buffers with a
      single thread on a Haswell-based processor:
      
                  generic            avx2          mb (multibuffer)
                  -------            --------      ----------------
      sha1        602 MB/s           997 MB/s      0.61 MB/s
      sha256      228 MB/s           412 MB/s      0.61 MB/s
      sha512      312 MB/s           559 MB/s      0.61 MB/s
      
      So, the multibuffer implementation is 500 to 1000 times slower than the
      other implementations.  Note that with smaller buffers or more update()s
      per digest, the difference would be even greater.
      
      I believe the vast majority of people are in the boat where the
      multibuffer code is much slower, and only a small minority are doing the
      highly parallel, hashing-intensive, latency-flexible workloads (maybe
      IPsec on servers?) where the multibuffer code may be beneficial.  Yet,
      people often aren't familiar with all the crypto config options and so
      the multibuffer code may inadvertently be built into the kernel.
      
      Also the multibuffer code apparently hasn't been very well tested,
      seeing as it was sometimes computing the wrong SHA-256 digest.
      
      So, let's make the multibuffer algorithms low priority.  Users who want
      to use them can either request them explicitly by driver name, or use
      NETLINK_CRYPTO (crypto_user) to increase their priority at runtime.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
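
      As a usage-level illustration of the effect (a hedged sketch with
      error handling trimmed): a generic request for "sha256" resolves to
      the highest-priority registered driver, so after this change the
      multibuffer implementation has to be requested by driver name:

          #include <linux/err.h>
          #include <linux/errno.h>
          #include <crypto/hash.h>

          static int pick_sha256_impls(void)
          {
              /* Resolves to the best-priority driver, e.g. sha256-avx2. */
              struct crypto_ahash *fast = crypto_alloc_ahash("sha256", 0, 0);
              /* The multibuffer code must now be asked for explicitly. */
              struct crypto_ahash *mb = crypto_alloc_ahash("sha256_mb", 0, 0);

              if (IS_ERR(fast) || IS_ERR(mb))
                  return -ENODEV;
              crypto_free_ahash(fast);
              crypto_free_ahash(mb);
              return 0;
          }
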
    • crypto: x86/sha256-mb - fix digest copy in sha256_mb_mgr_get_comp_job_avx2() · af839b4e
      Committed by Eric Biggers
      There is a copy-paste error where sha256_mb_mgr_get_comp_job_avx2()
      copies the SHA-256 digest state from sha256_mb_mgr::args::digest to
      job_sha256::result_digest.  Consequently, the sha256_mb algorithm
      sometimes calculates the wrong digest.  Fix it.
      
      Reproducer using AF_ALG:
      
          #include <assert.h>
          #include <linux/if_alg.h>
          #include <stdio.h>
          #include <string.h>
          #include <sys/socket.h>
          #include <unistd.h>
      
          static const __u8 expected[32] =
              "\xad\x7f\xac\xb2\x58\x6f\xc6\xe9\x66\xc0\x04\xd7\xd1\xd1\x6b\x02"
              "\x4f\x58\x05\xff\x7c\xb4\x7c\x7a\x85\xda\xbd\x8b\x48\x89\x2c\xa7";
      
          int main()
          {
              int fd;
              struct sockaddr_alg addr = {
                  .salg_type = "hash",
                  .salg_name = "sha256_mb",
              };
              __u8 data[4096] = { 0 };
              __u8 digest[32];
              int ret;
              int i;
      
              fd = socket(AF_ALG, SOCK_SEQPACKET, 0);
              bind(fd, (void *)&addr, sizeof(addr));
              fork();
              fd = accept(fd, 0, 0);
              do {
                  ret = write(fd, data, 4096);
                  assert(ret == 4096);
                  ret = read(fd, digest, 32);
                  assert(ret == 32);
              } while (memcmp(digest, expected, 32) == 0);
      
              printf("wrong digest: ");
              for (i = 0; i < 32; i++)
                  printf("%02x", digest[i]);
              printf("\n");
          }
      
      Output was:
      
          wrong digest: ad7facb2000000000000000000000000ffffffef7cb47c7a85dabd8b48892ca7
      
      Fixes: 172b1d6b ("crypto: sha256-mb - fix ctx pointer and digest copy")
      Cc: <stable@vger.kernel.org> # v4.8+
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  8. 03 July 2018 (1 commit)
    • x86/asm/64: Use 32-bit XOR to zero registers · a7bea830
      Committed by Jan Beulich
      Some Intel CPUs don't recognize 64-bit XORs as zeroing idioms. Zeroing
      idioms don't require execution bandwidth, as they're being taken care
      of in the frontend (through register renaming). Use 32-bit XORs instead.
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: herbert@gondor.apana.org.au
      Cc: pavel@ucw.cz
      Cc: rjw@rjwysocki.net
      Link: http://lkml.kernel.org/r/5B39FF1A02000078001CFB54@prv1-mh.provo.novell.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
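
      For illustration, a minimal sketch in C inline assembly (the commit
      itself touches hand-written assembly): writing a 32-bit register
      zero-extends into the full 64-bit register, so the 32-bit XOR clears
      %rax just as well, is one byte shorter (no REX prefix), and is the
      form handled as a zeroing idiom during register renaming:

          /* The "k" operand modifier prints the 32-bit name of whatever
           * register the compiler picks, e.g. "xorl %eax, %eax" instead of
           * "xorq %rax, %rax". */
          static inline unsigned long zero_gpr(void)
          {
              unsigned long v;

              asm("xorl %k0, %k0" : "=r" (v));
              return v;
          }
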
  9. 01 July 2018 (1 commit)
  10. 31 May 2018 (2 commits)
    • crypto: x86/salsa20 - remove x86 salsa20 implementations · b7b73cd5
      Committed by Eric Biggers
      The x86 assembly implementations of Salsa20 use the frame base pointer
      register (%ebp or %rbp), which breaks frame pointer convention and
      breaks stack traces when unwinding from an interrupt in the crypto code.
      Recent (v4.10+) kernels will warn about this, e.g.
      
      WARNING: kernel stack regs at 00000000a8291e69 in syzkaller047086:4677 has bad 'bp' value 000000001077994c
      [...]
      
      But after looking into it, I believe there's very little reason to still
      retain the x86 Salsa20 code.  First, these are *not* vectorized
      (SSE2/SSSE3/AVX2) implementations, which would be needed to get anywhere
      close to the best Salsa20 performance on any remotely modern x86
      processor; they're just regular x86 assembly.  Second, it's still
      unclear that anyone is actually using the kernel's Salsa20 at all,
      especially given that now ChaCha20 is supported too, and with much more
      efficient SSSE3 and AVX2 implementations.  Finally, in benchmarks I did
      on both Intel and AMD processors with both gcc 8.1.0 and gcc 4.9.4, the
      x86_64 salsa20-asm is actually slightly *slower* than salsa20-generic
      (~3% slower on Skylake, ~10% slower on Zen), while the i686 salsa20-asm
      is only slightly faster than salsa20-generic (~15% faster on Skylake,
      ~20% faster on Zen).  The gcc version made little difference.
      
      So, the x86_64 salsa20-asm is pretty clearly useless.  That leaves just
      the i686 salsa20-asm, which based on my tests provides a 15-20% speed
      boost.  But that's without updating the code to not use %ebp.  And given
      the maintenance cost, the small speed difference vs. salsa20-generic,
      the fact that few people still use i686 kernels, the doubt that anyone
      is even using the kernel's Salsa20 at all, and the fact that a SSE2
      implementation would almost certainly be much faster on any remotely
      modern x86 processor yet no one has cared enough to add one yet, I don't
      think it's worthwhile to keep.
      
      Thus, just remove both the x86_64 and i686 salsa20-asm implementations.
      
      Reported-by: syzbot+ffa3a158337bbc01ff09@syzkaller.appspotmail.com
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: morus - Mark MORUS SIMD glue as x86-specific · 2808f173
      Committed by Ondrej Mosnacek
      Commit 56e8e57f ("crypto: morus - Add common SIMD glue code for
      MORUS") accidetally consiedered the glue code to be usable by different
      architectures, but it seems to be only usable on x86.
      
      This patch moves it under arch/x86/crypto and adds 'depends on X86' to
      the Kconfig options and also removes the prompt to hide these internal
      options from the user.
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  11. 27 May 2018 (1 commit)
  12. 19 May 2018 (2 commits)
  13. 05 May 2018 (1 commit)
  14. 09 March 2018 (1 commit)
  15. 03 March 2018 (18 commits)