1. 27 April 2019 (1 commit)
    • crypto: x86/poly1305 - fix overflow during partial reduction · fbe5cff9
      Committed by Eric Biggers
      commit 678cce4019d746da6c680c48ba9e6d417803e127 upstream.
      
      The x86_64 implementation of Poly1305 produces the wrong result on some
      inputs because poly1305_4block_avx2() incorrectly assumes that when
      partially reducing the accumulator, the bits carried from limb 'd4' to
      limb 'h0' fit in a 32-bit integer.  This is true for poly1305-generic
      which processes only one block at a time.  However, it's not true for
      the AVX2 implementation, which processes 4 blocks at a time and
      therefore can produce intermediate limbs about 4x larger.
      
      Fix it by making the relevant calculations use 64-bit arithmetic rather
      than 32-bit.  Note that most of the carries already used 64-bit
      arithmetic, but the d4 -> h0 carry was different for some reason.
      
      To be safe I also made the same change to the corresponding SSE2 code,
      though that only operates on 1 or 2 blocks at a time.  I don't think
      it's really needed for poly1305_block_sse2(), but it doesn't hurt
      because it's already x86_64 code.  It *might* be needed for
      poly1305_2block_sse2(), but overflows aren't easy to reproduce there.
      
      This bug was originally detected by my patches that improve testmgr to
      fuzz algorithms against their generic implementation.  But also add a
      test vector which reproduces it directly (in the AVX2 case).
      
      Fixes: b1ccc8f4 ("crypto: poly1305 - Add a four block AVX2 variant for x86_64")
      Fixes: c70f4abe ("crypto: poly1305 - Add a SSE2 SIMD variant for x86_64")
      Cc: <stable@vger.kernel.org> # v4.3+
      Cc: Martin Willi <martin@strongswan.org>
      Cc: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Martin Willi <martin@strongswan.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
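
      For illustration, a minimal C sketch of the partial reduction in
      question (scalar code with five 26-bit limbs, not the kernel's AVX2
      assembly; the function name and limb layout are simplified).  The
      point is that the d4 -> h0 carry has to stay in 64-bit arithmetic:

          #include <stdint.h>

          /* Partially reduce 64-bit product limbs d0..d4 into 26-bit limbs
           * h0..h4.  With 4 blocks absorbed at once the d limbs, and hence
           * the carries, can be ~4x larger than in the one-block generic
           * code, so truncating the final carry to 32 bits can overflow. */
          static void poly1305_partial_reduce(const uint64_t d[5], uint32_t h[5])
          {
              uint64_t d1 = d[1], d2 = d[2], d3 = d[3], d4 = d[4];
              uint64_t c, h0;

              c = d[0] >> 26; h[0] = d[0] & 0x3ffffff;
              d1 += c; c = d1 >> 26; h[1] = d1 & 0x3ffffff;
              d2 += c; c = d2 >> 26; h[2] = d2 & 0x3ffffff;
              d3 += c; c = d3 >> 26; h[3] = d3 & 0x3ffffff;
              d4 += c; c = d4 >> 26; h[4] = d4 & 0x3ffffff;

              /* d4 -> h0: keep c * 5 (and the sum) in 64 bits; this is the
               * step that overflowed when done in 32-bit arithmetic. */
              h0 = (uint64_t)h[0] + c * 5;
              h[0] = h0 & 0x3ffffff;
              h[1] += (uint32_t)(h0 >> 26);
          }
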
  2. 24 March 2019 (3 commits)
  3. 14 November 2018 (1 commit)
  4. 14 September 2018 (1 commit)
  5. 25 August 2018 (1 commit)
    • crypto: aesni - Use unaligned loads from gcm_context_data · e5b954e8
      Committed by Dave Watson
      A regression was reported bisecting to 1476db2d
      "Move HashKey computation from stack to gcm_context".  That diff
      moved HashKey computation from the stack, which was explicitly aligned
      in the asm, to a struct provided from the C code, depending on
      AESNI_ALIGN_ATTR for alignment.  It appears some compilers may not
      align this struct correctly, resulting in a crash on the movdqa
      instruction when attempting to encrypt or decrypt data.
      
      Fix by using unaligned loads for the HashKeys.  On modern
      hardware there is no perf difference between the unaligned and
      aligned loads.  All other accesses to gcm_context_data already use
      unaligned loads.
      Reported-by: Mauro Rossi <issor.oruam@gmail.com>
      Fixes: 1476db2d ("Move HashKey computation from stack to gcm_context")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
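
      For illustration, a user-space intrinsics sketch of the aligned vs.
      unaligned load distinction behind the crash (the kernel code is
      hand-written assembly; the helper name here is hypothetical):

          #include <immintrin.h>

          /* movdqa (_mm_load_si128) faults if its address is not 16-byte
           * aligned; movdqu (_mm_loadu_si128) accepts any alignment.  Since
           * the compiler-laid-out context struct is not guaranteed to be
           * 16-byte aligned, the HashKeys are read with unaligned loads,
           * which cost the same on modern CPUs when the data happens to be
           * aligned anyway. */
          static __m128i load_hashkey(const void *hashkey_field)
          {
              return _mm_loadu_si128((const __m128i *)hashkey_field);
          }
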
  6. 07 August 2018 (1 commit)
    • crypto: x86/aegis,morus - Fix and simplify CPUID checks · 877ccce7
      Committed by Ondrej Mosnacek
      It turns out I had misunderstood how the x86_match_cpu() function works.
      It evaluates a logical OR of the matching conditions, not logical AND.
      This caused the CPU feature checks for AEGIS to pass even if only SSE2
      (but not AES-NI) was supported (or vice versa), leading to potential
      crashes if something tried to use the registered algs.
      
      This patch switches the checks to a simpler method that is used e.g. in
      the Camellia x86 code.
      
      The patch also removes the MODULE_DEVICE_TABLE declarations which
      actually seem to cause the modules to be auto-loaded at boot, which is
      not desired. The crypto API on-demand module loading is sufficient.
      
      Fixes: 1d373d4e ("crypto: x86 - Add optimized AEGIS implementations")
      Fixes: 6ecc9d9f ("crypto: x86 - Add optimized MORUS implementations")
      Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
      Tested-by: Milan Broz <gmazyland@gmail.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
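
      For illustration, a hedged sketch of the kind of explicit feature test
      the commit switches to (modeled on the Camellia-style checks it cites;
      function and header names are approximate, and the real init also
      verifies the available XSAVE state):

          #include <linux/init.h>
          #include <linux/errno.h>
          #include <asm/cpufeature.h>     /* boot_cpu_has(), X86_FEATURE_* */

          /* Every required feature is tested explicitly and the results are
           * ANDed, so registration is refused unless all of them are present,
           * unlike an x86_match_cpu() table whose entries are ORed. */
          static int __init example_aegis_aesni_init(void)
          {
              if (!boot_cpu_has(X86_FEATURE_XMM2) ||   /* SSE2 */
                  !boot_cpu_has(X86_FEATURE_AES))      /* AES-NI */
                  return -ENODEV;

              /* ... register the AEAD algorithms here ... */
              return 0;
          }
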
  7. 09 July 2018 (5 commits)
    • crypto: ahash - remove useless setting of cra_type · c87a405e
      Committed by Eric Biggers
      Some ahash algorithms set .cra_type = &crypto_ahash_type.  But this is
      redundant with the C structure type ('struct ahash_alg'), and
      crypto_register_ahash() already sets the .cra_type automatically.
      Apparently the useless assignment has just been copy+pasted around.
      
      So, remove the useless assignment from all the ahash algorithms.
      
      This patch shouldn't change any actual behavior.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Acked-by: Gilad Ben-Yossef <gilad@benyossef.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: ahash - remove useless setting of type flags · 6a38f622
      Committed by Eric Biggers
      Many ahash algorithms set .cra_flags = CRYPTO_ALG_TYPE_AHASH.  But this
      is redundant with the C structure type ('struct ahash_alg'), and
      crypto_register_ahash() already sets the type flag automatically,
      clearing any type flag that was already there.  Apparently the useless
      assignment has just been copy+pasted around.
      
      So, remove the useless assignment from all the ahash algorithms.
      
      This patch shouldn't change any actual behavior.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Acked-by: Gilad Ben-Yossef <gilad@benyossef.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: shash - remove useless setting of type flags · e50944e2
      Committed by Eric Biggers
      Many shash algorithms set .cra_flags = CRYPTO_ALG_TYPE_SHASH.  But this
      is redundant with the C structure type ('struct shash_alg'), and
      crypto_register_shash() already sets the type flag automatically,
      clearing any type flag that was already there.  Apparently the useless
      assignment has just been copy+pasted around.
      
      So, remove the useless assignment from all the shash algorithms.
      
      This patch shouldn't change any actual behavior.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
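
      For illustration of all three cleanups above, a hedged sketch of a
      shash algorithm definition with the redundant fields simply left out
      (names, sizes, and priority here are made up for the example):

          #include <linux/module.h>
          #include <crypto/internal/hash.h>

          /* Hypothetical driver callbacks, stubbed for the sketch. */
          static int example_init(struct shash_desc *desc) { return 0; }
          static int example_update(struct shash_desc *desc, const u8 *data,
                                    unsigned int len) { return 0; }
          static int example_final(struct shash_desc *desc, u8 *out) { return 0; }

          static struct shash_alg example_alg = {
              .digestsize = 32,
              .init       = example_init,
              .update     = example_update,
              .final      = example_final,
              .descsize   = 16,
              .base       = {
                  .cra_name        = "example",
                  .cra_driver_name = "example-driver",
                  .cra_priority    = 100,
                  .cra_blocksize   = 64,
                  .cra_module      = THIS_MODULE,
                  /* No .cra_flags = CRYPTO_ALG_TYPE_SHASH and no .cra_type:
                   * crypto_register_shash() (and crypto_register_ahash() in
                   * the ahash cases) fills these in. */
              },
          };
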
    • crypto: x86/sha-mb - decrease priority of multibuffer algorithms · 8aeef492
      Committed by Eric Biggers
      With all the crypto modules enabled on x86, and with a CPU that supports
      AVX-2 but not SHA-NI instructions (e.g. Haswell, Broadwell, Skylake),
      the "multibuffer" implementations of SHA-1, SHA-256, and SHA-512 are the
      highest priority.  However, these implementations only perform well when
      many hash requests are being submitted concurrently, filling all 8 AVX-2
      lanes.  Otherwise, they are incredibly slow, as they waste time waiting
      for more requests to arrive before proceeding to execute each request.
      
      For example, here are the speeds I see hashing 4096-byte buffers with a
      single thread on a Haswell-based processor:
      
                  generic            avx2          mb (multibuffer)
                  -------            --------      ----------------
      sha1        602 MB/s           997 MB/s      0.61 MB/s
      sha256      228 MB/s           412 MB/s      0.61 MB/s
      sha512      312 MB/s           559 MB/s      0.61 MB/s
      
      So, the multibuffer implementation is 500 to 1000 times slower than the
      other implementations.  Note that with smaller buffers or more update()s
      per digest, the difference would be even greater.
      
      I believe the vast majority of people are in the boat where the
      multibuffer code is much slower, and only a small minority are doing the
      highly parallel, hashing-intensive, latency-flexible workloads (maybe
      IPsec on servers?) where the multibuffer code may be beneficial.  Yet,
      people often aren't familiar with all the crypto config options and so
      the multibuffer code may inadvertently be built into the kernel.
      
      Also the multibuffer code apparently hasn't been very well tested,
      seeing as it was sometimes computing the wrong SHA-256 digest.
      
      So, let's make the multibuffer algorithms low priority.  Users who want
      to use them can either request them explicitly by driver name, or use
      NETLINK_CRYPTO (crypto_user) to increase their priority at runtime.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
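
      As a usage-level illustration of the effect (a hedged sketch with
      error handling trimmed): a generic request for "sha256" resolves to
      the highest-priority registered driver, so after this change the
      multibuffer implementation has to be requested by driver name:

          #include <linux/err.h>
          #include <linux/errno.h>
          #include <crypto/hash.h>

          static int pick_sha256_impls(void)
          {
              /* Resolves to the best-priority driver, e.g. sha256-avx2. */
              struct crypto_ahash *fast = crypto_alloc_ahash("sha256", 0, 0);
              /* The multibuffer code must now be asked for explicitly. */
              struct crypto_ahash *mb = crypto_alloc_ahash("sha256_mb", 0, 0);

              if (IS_ERR(fast) || IS_ERR(mb))
                  return -ENODEV;
              crypto_free_ahash(fast);
              crypto_free_ahash(mb);
              return 0;
          }
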
    • crypto: x86/sha256-mb - fix digest copy in sha256_mb_mgr_get_comp_job_avx2() · af839b4e
      Committed by Eric Biggers
      There is a copy-paste error where sha256_mb_mgr_get_comp_job_avx2()
      copies the SHA-256 digest state from sha256_mb_mgr::args::digest to
      job_sha256::result_digest.  Consequently, the sha256_mb algorithm
      sometimes calculates the wrong digest.  Fix it.
      
      Reproducer using AF_ALG:
      
          #include <assert.h>
          #include <linux/if_alg.h>
          #include <stdio.h>
          #include <string.h>
          #include <sys/socket.h>
          #include <unistd.h>
      
          static const __u8 expected[32] =
              "\xad\x7f\xac\xb2\x58\x6f\xc6\xe9\x66\xc0\x04\xd7\xd1\xd1\x6b\x02"
              "\x4f\x58\x05\xff\x7c\xb4\x7c\x7a\x85\xda\xbd\x8b\x48\x89\x2c\xa7";
      
          int main()
          {
              int fd;
              struct sockaddr_alg addr = {
                  .salg_type = "hash",
                  .salg_name = "sha256_mb",
              };
              __u8 data[4096] = { 0 };
              __u8 digest[32];
              int ret;
              int i;
      
              fd = socket(AF_ALG, SOCK_SEQPACKET, 0);
              bind(fd, (void *)&addr, sizeof(addr));
              fork();
              fd = accept(fd, 0, 0);
              do {
                  ret = write(fd, data, 4096);
                  assert(ret == 4096);
                  ret = read(fd, digest, 32);
                  assert(ret == 32);
              } while (memcmp(digest, expected, 32) == 0);
      
              printf("wrong digest: ");
              for (i = 0; i < 32; i++)
                  printf("%02x", digest[i]);
              printf("\n");
          }
      
      Output was:
      
          wrong digest: ad7facb2000000000000000000000000ffffffef7cb47c7a85dabd8b48892ca7
      
      Fixes: 172b1d6b ("crypto: sha256-mb - fix ctx pointer and digest copy")
      Cc: <stable@vger.kernel.org> # v4.8+
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  8. 03 July 2018 (1 commit)
    • x86/asm/64: Use 32-bit XOR to zero registers · a7bea830
      Committed by Jan Beulich
      Some Intel CPUs don't recognize 64-bit XORs as zeroing idioms. Zeroing
      idioms don't require execution bandwidth, as they're being taken care
      of in the frontend (through register renaming). Use 32-bit XORs instead.
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: herbert@gondor.apana.org.au
      Cc: pavel@ucw.cz
      Cc: rjw@rjwysocki.net
      Link: http://lkml.kernel.org/r/5B39FF1A02000078001CFB54@prv1-mh.provo.novell.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
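
      For illustration, a minimal sketch in C inline assembly (the commit
      itself touches hand-written assembly): writing a 32-bit register
      zero-extends into the full 64-bit register, so the 32-bit XOR clears
      %rax just as well, is one byte shorter (no REX prefix), and is the
      form handled as a zeroing idiom during register renaming:

          /* The "k" operand modifier prints the 32-bit name of whatever
           * register the compiler picks, e.g. "xorl %eax, %eax" instead of
           * "xorq %rax, %rax". */
          static inline unsigned long zero_gpr(void)
          {
              unsigned long v;

              asm("xorl %k0, %k0" : "=r" (v));
              return v;
          }
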
  9. 01 July 2018 (1 commit)
  10. 31 May 2018 (2 commits)
    • crypto: x86/salsa20 - remove x86 salsa20 implementations · b7b73cd5
      Committed by Eric Biggers
      The x86 assembly implementations of Salsa20 use the frame base pointer
      register (%ebp or %rbp), which breaks frame pointer convention and
      breaks stack traces when unwinding from an interrupt in the crypto code.
      Recent (v4.10+) kernels will warn about this, e.g.
      
      WARNING: kernel stack regs at 00000000a8291e69 in syzkaller047086:4677 has bad 'bp' value 000000001077994c
      [...]
      
      But after looking into it, I believe there's very little reason to still
      retain the x86 Salsa20 code.  First, these are *not* vectorized
      (SSE2/SSSE3/AVX2) implementations, which would be needed to get anywhere
      close to the best Salsa20 performance on any remotely modern x86
      processor; they're just regular x86 assembly.  Second, it's still
      unclear that anyone is actually using the kernel's Salsa20 at all,
      especially given that now ChaCha20 is supported too, and with much more
      efficient SSSE3 and AVX2 implementations.  Finally, in benchmarks I did
      on both Intel and AMD processors with both gcc 8.1.0 and gcc 4.9.4, the
      x86_64 salsa20-asm is actually slightly *slower* than salsa20-generic
      (~3% slower on Skylake, ~10% slower on Zen), while the i686 salsa20-asm
      is only slightly faster than salsa20-generic (~15% faster on Skylake,
      ~20% faster on Zen).  The gcc version made little difference.
      
      So, the x86_64 salsa20-asm is pretty clearly useless.  That leaves just
      the i686 salsa20-asm, which based on my tests provides a 15-20% speed
      boost.  But that's without updating the code to not use %ebp.  And given
      the maintenance cost, the small speed difference vs. salsa20-generic,
      the fact that few people still use i686 kernels, the doubt that anyone
      is even using the kernel's Salsa20 at all, and the fact that a SSE2
      implementation would almost certainly be much faster on any remotely
      modern x86 processor yet no one has cared enough to add one yet, I don't
      think it's worthwhile to keep.
      
      Thus, just remove both the x86_64 and i686 salsa20-asm implementations.
      
      Reported-by: syzbot+ffa3a158337bbc01ff09@syzkaller.appspotmail.com
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: morus - Mark MORUS SIMD glue as x86-specific · 2808f173
      Committed by Ondrej Mosnacek
      Commit 56e8e57f ("crypto: morus - Add common SIMD glue code for
      MORUS") accidetally consiedered the glue code to be usable by different
      architectures, but it seems to be only usable on x86.
      
      This patch moves it under arch/x86/crypto and adds 'depends on X86' to
      the Kconfig options and also removes the prompt to hide these internal
      options from the user.
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  11. 27 May 2018 (1 commit)
  12. 19 May 2018 (2 commits)
  13. 05 May 2018 (1 commit)
  14. 09 March 2018 (1 commit)
  15. 03 March 2018 (18 commits)