提交 · ad1064cd612e11b807eb764140deb5c1875ca5dc · openeuler / Kernel

05 4月, 2017 3 次提交

crypto: xts - drop gf128mul dependency · ad1064cd

由 Ondrej Mosnáček 提交于 4月 02, 2017

Since the gf128mul_x_ble function used by xts.c is now defined inline
in the header file, the XTS module no longer depends on gf128mul.
Therefore, the 'select CRYPTO_GF128MUL' line can be safely removed.
Signed-off-by: NOndrej Mosnacek <omosnacek@gmail.com>
Reviewd-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

ad1064cd

crypto: gf128mul - switch gf128mul_x_ble to le128 · e55318c8

由 Ondrej Mosnáček 提交于 4月 02, 2017

Currently, gf128mul_x_ble works with pointers to be128, even though it
actually interprets the words as little-endian. Consequently, it uses
cpu_to_le64/le64_to_cpu on fields of type __be64, which is incorrect.

This patch fixes that by changing the function to accept pointers to
le128 and updating all users accordingly.
Signed-off-by: NOndrej Mosnacek <omosnacek@gmail.com>
Reviewd-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

e55318c8

crypto: gf128mul - define gf128mul_x_* in gf128mul.h · acb9b159

由 Ondrej Mosnáček 提交于 4月 02, 2017

The gf128mul_x_ble function is currently defined in gf128mul.c, because
it depends on the gf128mul_table_be multiplication table.

However, since the function is very small and only uses two values from
the table, it is better for it to be defined as inline function in
gf128mul.h. That way, the function can be inlined by the compiler for
better performance.

For consistency, the other gf128mul_x_* functions are also moved to the
header file. In addition, the code is rewritten to be constant-time.

After this change, the speed of the generic 'xts(aes)' implementation
increased from ~225 MiB/s to ~235 MiB/s (measured using 'cryptsetup
benchmark -c aes-xts-plain64' on an Intel system with CRYPTO_AES_X86_64
and CRYPTO_AES_NI_INTEL disabled).
Signed-off-by: NOndrej Mosnacek <omosnacek@gmail.com>
Reviewd-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

acb9b159

24 3月, 2017 6 次提交

crypto: DRBG - initialize SGL only once · 44068d59

由 Stephan Mueller 提交于 3月 22, 2017

An SGL to be initialized only once even when its buffers are written
to several times.
Signed-off-by: NStephan Mueller <smueller@chronox.de>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

44068d59

crypto: testmgr - mark ctr(des3_ede) as fips_allowed · 0d8da104

由 Marcelo Cerri 提交于 3月 20, 2017

3DES is missing the fips_allowed flag for CTR mode.
Signed-off-by: NMarcelo Henrique Cerri <marcelo.cerri@canonical.com>
Acked-by: NStephan Mueller <smueller@chronox.de>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

0d8da104

md5: remove from lib and only live in crypto · 3c7eb3cc

由 Jason A. Donenfeld 提交于 3月 16, 2017

The md5_transform function is no longer used any where in the tree,
except for the crypto api's actual implementation of md5, so we can drop
the function from lib and put it as a static function of the crypto
file, where it belongs. There should be no new users of md5_transform,
anyway, since there are more modern ways of doing what it once achieved.
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

3c7eb3cc

crypto: powerpc - Stress test for vpmsum implementations · 146c8688

由 Daniel Axtens 提交于 3月 15, 2017

vpmsum implementations often don't kick in for short test vectors.
This is a simple test module that does a configurable number of
random tests, each up to 64kB and each with random offsets.

Both CRC-T10DIF and CRC32C are tested.

Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: NDaniel Axtens <dja@axtens.net>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

146c8688

crypto: powerpc - Add CRC-T10DIF acceleration · b01df1c1

由 Daniel Axtens 提交于 3月 15, 2017

T10DIF is a CRC16 used heavily in NVMe.

It turns out we can accelerate it with a CRC32 library and a few
little tricks.

Provide the accelerator based the refactored CRC32 code.

Cc: Anton Blanchard <anton@samba.org>
Thanks-to: Hong Bo Peng <penghb@cn.ibm.com>
Signed-off-by: NDaniel Axtens <dja@axtens.net>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

b01df1c1

crypto: xts,lrw - fix out-of-bounds write after kmalloc failure · 9df0eb18

由 Eric Biggers 提交于 3月 23, 2017

In the generic XTS and LRW algorithms, for input data > 128 bytes, a
temporary buffer is allocated to hold the values to be XOR'ed with the
data before and after encryption or decryption.  If the allocation
fails, the fixed-size buffer embedded in the request buffer is meant to
be used as a fallback --- resulting in more calls to the ECB algorithm,
but still producing the correct result.  However, we weren't correctly
limiting subreq->cryptlen in this case, resulting in pre_crypt()
overrunning the embedded buffer.  Fix this by setting subreq->cryptlen
correctly.

Fixes: f1c131b4 ("crypto: xts - Convert to skcipher")
Fixes: 700cb3f5 ("crypto: lrw - Convert to skcipher")
Cc: stable@vger.kernel.org # v4.10+
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Signed-off-by: NEric Biggers <ebiggers@google.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

9df0eb18

10 3月, 2017 1 次提交

net: Work around lockdep limitation in sockets that use sockets · cdfbabfb

由 David Howells 提交于 3月 09, 2017

Lockdep issues a circular dependency warning when AFS issues an operation
through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.

The theory lockdep comes up with is as follows:

 (1) If the pagefault handler decides it needs to read pages from AFS, it
     calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
     creating a call requires the socket lock:

	mmap_sem must be taken before sk_lock-AF_RXRPC

 (2) afs_open_socket() opens an AF_RXRPC socket and binds it.  rxrpc_bind()
     binds the underlying UDP socket whilst holding its socket lock.
     inet_bind() takes its own socket lock:

	sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET

 (3) Reading from a TCP socket into a userspace buffer might cause a fault
     and thus cause the kernel to take the mmap_sem, but the TCP socket is
     locked whilst doing this:

	sk_lock-AF_INET must be taken before mmap_sem

However, lockdep's theory is wrong in this instance because it deals only
with lock classes and not individual locks.  The AF_INET lock in (2) isn't
really equivalent to the AF_INET lock in (3) as the former deals with a
socket entirely internal to the kernel that never sees userspace.  This is
a limitation in the design of lockdep.

Fix the general case by:

 (1) Double up all the locking keys used in sockets so that one set are
     used if the socket is created by userspace and the other set is used
     if the socket is created by the kernel.

 (2) Store the kern parameter passed to sk_alloc() in a variable in the
     sock struct (sk_kern_sock).  This informs sock_lock_init(),
     sock_init_data() and sk_clone_lock() as to the lock keys to be used.

     Note that the child created by sk_clone_lock() inherits the parent's
     kern setting.

 (3) Add a 'kern' parameter to ->accept() that is analogous to the one
     passed in to ->create() that distinguishes whether kernel_accept() or
     sys_accept4() was the caller and can be passed to sk_alloc().

     Note that a lot of accept functions merely dequeue an already
     allocated socket.  I haven't touched these as the new socket already
     exists before we get the parameter.

     Note also that there are a couple of places where I've made the accepted
     socket unconditionally kernel-based:

	irda_accept()
	rds_rcp_accept_one()
	tcp_accept_from_sock()

     because they follow a sock_create_kern() and accept off of that.

Whilst creating this, I noticed that lustre and ocfs don't create sockets
through sock_create_kern() and thus they aren't marked as for-kernel,
though they appear to be internal.  I wonder if these should do that so
that they use the new set of lock keys.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cdfbabfb

09 3月, 2017 9 次提交

crypto: ctr - Propagate NEED_FALLBACK bit · d2c2a85c

由 Marcelo Cerri 提交于 2月 27, 2017

When requesting a fallback algorithm, we should propagate the
NEED_FALLBACK bit when search for the underlying algorithm.

This will prevents drivers from allocating unnecessary fallbacks that
are never called. For instance, currently the vmx-crypto driver will use
the following chain of calls when calling the fallback implementation:

p8_aes_ctr -> ctr(p8_aes) -> aes-generic

However p8_aes will always delegate its calls to aes-generic. With this
patch, p8_aes_ctr will be able to use ctr(aes-generic) directly as its
fallback. The same applies to aes_s390.
Signed-off-by: NMarcelo Henrique Cerri <marcelo.cerri@canonical.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

d2c2a85c

crypto: cbc - Propagate NEED_FALLBACK bit · e6c2e65c

由 Marcelo Cerri 提交于 2月 27, 2017

When requesting a fallback algorithm, we should propagate the
NEED_FALLBACK bit when search for the underlying algorithm.

p8_aes_cbc -> cbc(p8_aes) -> aes-generic

However p8_aes will always delegate its calls to aes-generic. With this
patch, p8_aes_cbc will be able to use cbc(aes-generic) directly as its
fallback. The same applies to aes_s390.
Signed-off-by: NMarcelo Henrique Cerri <marcelo.cerri@canonical.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

e6c2e65c

crypto: testmgr - constify all test vectors · b13b1e0c

由 Eric Biggers 提交于 2月 24, 2017

Cryptographic test vectors should never be modified, so constify them to
enforce this at both compile-time and run-time. This moves a significant
amount of data from .data to .rodata when the crypto tests are enabled.
Signed-off-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

b13b1e0c

crypto: kpp - constify buffer passed to crypto_kpp_set_secret() · 5527dfb6

由 Eric Biggers 提交于 2月 24, 2017

Constify the buffer passed to crypto_kpp_set_secret() and
kpp_alg.set_secret, since it is never modified.
Signed-off-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

5527dfb6

crypto: algapi - annotate expected branch behavior in crypto_inc() · 27c539ae

由 Ard Biesheuvel 提交于 2月 14, 2017

To prevent unnecessary branching, mark the exit condition of the
primary loop as likely(), given that a carry in a 32-bit counter
occurs very rarely.

On arm64, the resulting code is emitted by GCC as

     9a8:   cmp     w1, #0x3
     9ac:   add     x3, x0, w1, uxtw
     9b0:   b.ls    9e0 <crypto_inc+0x38>
     9b4:   ldr     w2, [x3,#-4]!
     9b8:   rev     w2, w2
     9bc:   add     w2, w2, #0x1
     9c0:   rev     w4, w2
     9c4:   str     w4, [x3]
     9c8:   cbz     w2, 9d0 <crypto_inc+0x28>
     9cc:   ret

where the two remaining branch conditions (one for size < 4 and one for
the carry) are statically predicted as non-taken, resulting in optimal
execution in the vast majority of cases.

Also, replace the open coded alignment test with IS_ALIGNED().

Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

27c539ae

crypto: gf128mul - constify 4k and 64k multiplication tables · 3ea996dd

由 Eric Biggers 提交于 2月 14, 2017

Constify the multiplication tables passed to the 4k and 64k
multiplication functions, as they are not modified by these functions.

Cc: Alex Cope <alexcope@google.com>
Signed-off-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

3ea996dd

crypto: gf128mul - rename the byte overflow tables · f33fd647

由 Eric Biggers 提交于 2月 14, 2017

Though the GF(2^128) byte overflow tables were named the "lle" and "bbe"
tables, they are not actually tied to these element formats
specifically, but rather to particular a "bit endianness".  For example,
the bbe table is actually used for both bbe and ble multiplication.
Therefore, rename the tables to "le" and "be" and update the comment to
explain this.

Cc: Alex Cope <alexcope@google.com>
Signed-off-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

f33fd647

crypto: gf128mul - remove xx() macro · 2416e4fa

由 Eric Biggers 提交于 2月 14, 2017

The xx() macro serves no purpose and can be removed.

Cc: Alex Cope <alexcope@google.com>
Signed-off-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

2416e4fa

crypto: gf128mul - fix some comments · 63be5b53

由 Eric Biggers 提交于 2月 14, 2017

Fix incorrect references to GF(128) instead of GF(2^128), as these are
two entirely different fields, and fix a few other incorrect comments.

Cc: Alex Cope <alexcope@google.com>
Signed-off-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

63be5b53

02 3月, 2017 3 次提交

sched/headers: Prepare for new header dependencies before moving code to <linux/sched/stat.h> · 03441a34

由 Ingo Molnar 提交于 2月 08, 2017

We are going to split <linux/sched/stat.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/stat.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

03441a34

sched/headers: Prepare to move signal wakeup & sigpending methods from... · 174cd4b1

由 Ingo Molnar 提交于 2月 02, 2017

sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>

Fix up affected files that include this signal functionality via sched.h.
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

174cd4b1

sched/headers: Prepare for new header dependencies before moving code to <uapi/linux/sched/types.h> · ae7e81c0

由 Ingo Molnar 提交于 2月 01, 2017

We are going to move scheduler ABI details to <uapi/linux/sched/types.h>,
which will be used from a number of .c files.

Create empty placeholder header that maps to <linux/types.h>.

Include the new header in the files that are going to need it.
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

ae7e81c0

01 3月, 2017 1 次提交

crypto: testmgr - Pad aes_ccm_enc_tv_template vector · 1c68bb0f

由 Laura Abbott 提交于 2月 28, 2017

Running with KASAN and crypto tests currently gives

 BUG: KASAN: global-out-of-bounds in __test_aead+0x9d9/0x2200 at addr ffffffff8212fca0
 Read of size 16 by task cryptomgr_test/1107
 Address belongs to variable 0xffffffff8212fca0
 CPU: 0 PID: 1107 Comm: cryptomgr_test Not tainted 4.10.0+ #45
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
 Call Trace:
  dump_stack+0x63/0x8a
  kasan_report.part.1+0x4a7/0x4e0
  ? __test_aead+0x9d9/0x2200
  ? crypto_ccm_init_crypt+0x218/0x3c0 [ccm]
  kasan_report+0x20/0x30
  check_memory_region+0x13c/0x1a0
  memcpy+0x23/0x50
  __test_aead+0x9d9/0x2200
  ? kasan_unpoison_shadow+0x35/0x50
  ? alg_test_akcipher+0xf0/0xf0
  ? crypto_skcipher_init_tfm+0x2e3/0x310
  ? crypto_spawn_tfm2+0x37/0x60
  ? crypto_ccm_init_tfm+0xa9/0xd0 [ccm]
  ? crypto_aead_init_tfm+0x7b/0x90
  ? crypto_alloc_tfm+0xc4/0x190
  test_aead+0x28/0xc0
  alg_test_aead+0x54/0xd0
  alg_test+0x1eb/0x3d0
  ? alg_find_test+0x90/0x90
  ? __sched_text_start+0x8/0x8
  ? __wake_up_common+0x70/0xb0
  cryptomgr_test+0x4d/0x60
  kthread+0x173/0x1c0
  ? crypto_acomp_scomp_free_ctx+0x60/0x60
  ? kthread_create_on_node+0xa0/0xa0
  ret_from_fork+0x2c/0x40
 Memory state around the buggy address:
  ffffffff8212fb80: 00 00 00 00 01 fa fa fa fa fa fa fa 00 00 00 00
  ffffffff8212fc00: 00 01 fa fa fa fa fa fa 00 00 00 00 01 fa fa fa
 >ffffffff8212fc80: fa fa fa fa 00 05 fa fa fa fa fa fa 00 00 00 00
                                   ^
  ffffffff8212fd00: 01 fa fa fa fa fa fa fa 00 00 00 00 01 fa fa fa
  ffffffff8212fd80: fa fa fa fa 00 00 00 00 00 05 fa fa fa fa fa fa

This always happens on the same IV which is less than 16 bytes.

Per Ard,

"CCM IVs are 16 bytes, but due to the way they are constructed
internally, the final couple of bytes of input IV are dont-cares.

Apparently, we do read all 16 bytes, which triggers the KASAN errors."

Fix this by padding the IV with null bytes to be at least 16 bytes.

Cc: stable@vger.kernel.org
Fixes: 0bc5a6c5 ("crypto: testmgr - Disable rfc4309 test and convert
test vectors")
Acked-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NLaura Abbott <labbott@redhat.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

1c68bb0f

28 2月, 2017 1 次提交

crypto: ccm - move cbcmac input off the stack · 3b30460c

由 Ard Biesheuvel 提交于 2月 27, 2017

Commit f15f05b0 ("crypto: ccm - switch to separate cbcmac driver")
refactored the CCM driver to allow separate implementations of the
underlying MAC to be provided by a platform. However, in doing so, it
moved some data from the linear region to the stack, which violates the
SG constraints when the stack is virtually mapped.

So move idata/odata back to the request ctx struct, of which we can
reasonably expect that it has been allocated using kmalloc() et al.
Reported-by: NJohannes Berg <johannes@sipsolutions.net>
Fixes: f15f05b0 ("crypto: ccm - switch to separate cbcmac driver")
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: NJohannes Berg <johannes@sipsolutions.net>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

3b30460c

27 2月, 2017 1 次提交

crypto: xts - Propagate NEED_FALLBACK bit · 89027579

由 Herbert Xu 提交于 2月 26, 2017

When we're used as a fallback algorithm, we should propagate
the NEED_FALLBACK bit when searching for the underlying ECB mode.

This just happens to fix a hang too because otherwise the search
may end up loading the same module that triggered this XTS creation.

Cc: stable@vger.kernel.org #4.10
Fixes: f1c131b4 ("crypto: xts - Convert to skcipher")
Reported-by: NHarald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

89027579

25 2月, 2017 1 次提交

crypto: change LZ4 modules to work with new LZ4 module version · 73a15ac6

由 Sven Schmidt 提交于 2月 24, 2017

Update the crypto modules using LZ4 compression as well as the test
cases in testmgr.h to work with the new LZ4 module version.

Link: http://lkml.kernel.org/r/1486321748-19085-4-git-send-email-4sschmid@informatik.uni-hamburg.deSigned-off-by: NSven Schmidt <4sschmid@informatik.uni-hamburg.de>
Cc: Bongkyu Kim <bongkyu.kim@lge.com>
Cc: Rui Salvaterra <rsalvaterra@gmail.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David S. Miller <davem@davemloft.net>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

73a15ac6

23 2月, 2017 1 次提交

crypto: xts - Add ECB dependency · 12cb3a1c

由 Milan Broz 提交于 2月 23, 2017

Since the
   commit f1c131b4
   crypto: xts - Convert to skcipher
the XTS mode is based on ECB, so the mode must select
ECB otherwise it can fail to initialize.
Signed-off-by: NMilan Broz <gmazyland@gmail.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

12cb3a1c

15 2月, 2017 2 次提交

crypto: ccm - drop unnecessary minimum 32-bit alignment · 5ba8e2a0

由 Ard Biesheuvel 提交于 2月 11, 2017

The CCM driver forces 32-bit alignment even if the underlying ciphers
don't care about alignment. This is because crypto_xor() used to require
this, but since this is no longer the case, drop the hardcoded minimum
of 32 bits.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

5ba8e2a0

crypto: ccm - honour alignmask of subordinate MAC cipher · 5338ad70

由 Ard Biesheuvel 提交于 2月 11, 2017

The CCM driver was recently updated to defer the MAC part of the algorithm
to a dedicated crypto transform, and a template for instantiating such
transforms was added at the same time.

However, this new cbcmac template fails to take the alignmask of the
encapsulated cipher into account, which may result in buffer addresses
being passed down that are not sufficiently aligned.

So update the code to ensure that the digest buffer in the desc ctx
appears at a sufficiently aligned offset, and tweak the code so that all
calls to crypto_cipher_encrypt_one() operate on this buffer exclusively.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

5338ad70

11 2月, 2017 6 次提交

crypto: algapi - make crypto_xor() and crypto_inc() alignment agnostic · db91af0f

由 Ard Biesheuvel 提交于 2月 05, 2017

Instead of unconditionally forcing 4 byte alignment for all generic
chaining modes that rely on crypto_xor() or crypto_inc() (which may
result in unnecessary copying of data when the underlying hardware
can perform unaligned accesses efficiently), make those functions
deal with unaligned input explicitly, but only if the Kconfig symbol
HAVE_EFFICIENT_UNALIGNED_ACCESS is set. This will allow us to drop
the alignmasks from the CBC, CMAC, CTR, CTS, PCBC and SEQIV drivers.

For crypto_inc(), this simply involves making the 4-byte stride
conditional on HAVE_EFFICIENT_UNALIGNED_ACCESS being set, given that
it typically operates on 16 byte buffers.

For crypto_xor(), an algorithm is implemented that simply runs through
the input using the largest strides possible if unaligned accesses are
allowed. If they are not, an optimal sequence of memory accesses is
emitted that takes the relative alignment of the input buffers into
account, e.g., if the relative misalignment of dst and src is 4 bytes,
the entire xor operation will be completed using 4 byte loads and stores
(modulo unaligned bits at the start and end). Note that all expressions
involving misalign are simply eliminated by the compiler when
HAVE_EFFICIENT_UNALIGNED_ACCESS is defined.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

db91af0f

crypto: improve gcc optimization flags for serpent and wp512 · 7d6e9105

由 Arnd Bergmann 提交于 2月 03, 2017

An ancient gcc bug (first reported in 2003) has apparently resurfaced
on MIPS, where kernelci.org reports an overly large stack frame in the
whirlpool hash algorithm:

crypto/wp512.c:987:1: warning: the frame size of 1112 bytes is larger than 1024 bytes [-Wframe-larger-than=]

With some testing in different configurations, I'm seeing large
variations in stack frames size up to 1500 bytes for what should have
around 300 bytes at most. I also checked the reference implementation,
which is essentially the same code but also comes with some test and
benchmarking infrastructure.

It seems that recent compiler versions on at least arm, arm64 and powerpc
have a partial fix for this problem, but enabling "-fsched-pressure", but
even with that fix they suffer from the issue to a certain degree. Some
testing on arm64 shows that the time needed to hash a given amount of
data is roughly proportional to the stack frame size here, which makes
sense given that the wp512 implementation is doing lots of loads for
table lookups, and the problem with the overly large stack is a result
of doing a lot more loads and stores for spilled registers (as seen from
inspecting the object code).

Disabling -fschedule-insns consistently fixes the problem for wp512,
in my collection of cross-compilers, the results are consistently better
or identical when comparing the stack sizes in this function, though
some architectures (notable x86) have schedule-insns disabled by
default.

The four columns are:
default: -O2
press:	 -O2 -fsched-pressure
nopress: -O2 -fschedule-insns -fno-sched-pressure
nosched: -O2 -no-schedule-insns (disables sched-pressure)

				default	press	nopress	nosched
alpha-linux-gcc-4.9.3		1136	848	1136	176
am33_2.0-linux-gcc-4.9.3	2100	2076	2100	2104
arm-linux-gnueabi-gcc-4.9.3	848	848	1048	352
cris-linux-gcc-4.9.3		272	272	272	272
frv-linux-gcc-4.9.3		1128	1000	1128	280
hppa64-linux-gcc-4.9.3		1128	336	1128	184
hppa-linux-gcc-4.9.3		644	308	644	276
i386-linux-gcc-4.9.3		352	352	352	352
m32r-linux-gcc-4.9.3		720	656	720	268
microblaze-linux-gcc-4.9.3	1108	604	1108	256
mips64-linux-gcc-4.9.3		1328	592	1328	208
mips-linux-gcc-4.9.3		1096	624	1096	240
powerpc64-linux-gcc-4.9.3	1088	432	1088	160
powerpc-linux-gcc-4.9.3		1080	584	1080	224
s390-linux-gcc-4.9.3		456	456	624	360
sh3-linux-gcc-4.9.3		292	292	292	292
sparc64-linux-gcc-4.9.3		992	240	992	208
sparc-linux-gcc-4.9.3		680	592	680	312
x86_64-linux-gcc-4.9.3		224	240	272	224
xtensa-linux-gcc-4.9.3		1152	704	1152	304

aarch64-linux-gcc-7.0.0		224	224	1104	208
arm-linux-gnueabi-gcc-7.0.1	824	824	1048	352
mips-linux-gcc-7.0.0		1120	648	1120	272
x86_64-linux-gcc-7.0.1		240	240	304	240

arm-linux-gnueabi-gcc-4.4.7	840			392
arm-linux-gnueabi-gcc-4.5.4	784	728	784	320
arm-linux-gnueabi-gcc-4.6.4	736	728	736	304
arm-linux-gnueabi-gcc-4.7.4	944	784	944	352
arm-linux-gnueabi-gcc-4.8.5	464	464	760	352
arm-linux-gnueabi-gcc-4.9.3	848	848	1048	352
arm-linux-gnueabi-gcc-5.3.1	824	824	1064	336
arm-linux-gnueabi-gcc-6.1.1	808	808	1056	344
arm-linux-gnueabi-gcc-7.0.1	824	824	1048	352

Trying the same test for serpent-generic, the picture is a bit different,
and while -fno-schedule-insns is generally better here than the default,
-fsched-pressure wins overall, so I picked that instead.

				default	press	nopress	nosched
alpha-linux-gcc-4.9.3		1392	864	1392	960
am33_2.0-linux-gcc-4.9.3	536	524	536	528
arm-linux-gnueabi-gcc-4.9.3	552	552	776	536
cris-linux-gcc-4.9.3		528	528	528	528
frv-linux-gcc-4.9.3		536	400	536	504
hppa64-linux-gcc-4.9.3		524	208	524	480
hppa-linux-gcc-4.9.3		768	472	768	508
i386-linux-gcc-4.9.3		564	564	564	564
m32r-linux-gcc-4.9.3		712	576	712	532
microblaze-linux-gcc-4.9.3	724	392	724	512
mips64-linux-gcc-4.9.3		720	384	720	496
mips-linux-gcc-4.9.3		728	384	728	496
powerpc64-linux-gcc-4.9.3	704	304	704	480
powerpc-linux-gcc-4.9.3		704	296	704	480
s390-linux-gcc-4.9.3		560	560	592	536
sh3-linux-gcc-4.9.3		540	540	540	540
sparc64-linux-gcc-4.9.3		544	352	544	496
sparc-linux-gcc-4.9.3		544	344	544	496
x86_64-linux-gcc-4.9.3		528	536	576	528
xtensa-linux-gcc-4.9.3		752	544	752	544

aarch64-linux-gcc-7.0.0		432	432	656	480
arm-linux-gnueabi-gcc-7.0.1	616	616	808	536
mips-linux-gcc-7.0.0		720	464	720	488
x86_64-linux-gcc-7.0.1		536	528	600	536

arm-linux-gnueabi-gcc-4.4.7	592			440
arm-linux-gnueabi-gcc-4.5.4	776	448	776	544
arm-linux-gnueabi-gcc-4.6.4	776	448	776	544
arm-linux-gnueabi-gcc-4.7.4	768	448	768	544
arm-linux-gnueabi-gcc-4.8.5	488	488	776	544
arm-linux-gnueabi-gcc-4.9.3	552	552	776	536
arm-linux-gnueabi-gcc-5.3.1	552	552	776	536
arm-linux-gnueabi-gcc-6.1.1	560	560	776	536
arm-linux-gnueabi-gcc-7.0.1	616	616	808	536

I did not do any runtime tests with serpent, so it is possible that stack
frame size does not directly correlate with runtime performance here and
it actually makes things worse, but it's more likely to help here, and
the reduced stack frame size is probably enough reason to apply the patch,
especially given that the crypto code is often used in deep call chains.

Link: https://kernelci.org/build/id/58797d7559b5149efdf6c3a9/logs/
Link: http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11488
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

7d6e9105

crypto: ccm - switch to separate cbcmac driver · f15f05b0

由 Ard Biesheuvel 提交于 2月 03, 2017

Update the generic CCM driver to defer CBC-MAC processing to a
dedicated CBC-MAC ahash transform rather than open coding this
transform (and much of the associated scatterwalk plumbing) in
the CCM driver itself.

This cleans up the code considerably, but more importantly, it allows
the use of alternative CBC-MAC implementations that don't suffer from
performance degradation due to significant setup time (e.g., the NEON
based AES code needs to enable/disable the NEON, and load the S-box
into 16 SIMD registers, which cannot be amortized over the entire input
when using the cipher interface)
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

f15f05b0

crypto: testmgr - add test cases for cbcmac(aes) · 092acf06

由 Ard Biesheuvel 提交于 2月 03, 2017

In preparation of splitting off the CBC-MAC transform in the CCM
driver into a separate algorithm, define some test cases for the
AES incarnation of cbcmac.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

092acf06

crypto: aes - add generic time invariant AES cipher · b5e0b032

由 Ard Biesheuvel 提交于 2月 02, 2017

Lookup table based AES is sensitive to timing attacks, which is due to
the fact that such table lookups are data dependent, and the fact that
8 KB worth of tables covers a significant number of cachelines on any
architecture, resulting in an exploitable correlation between the key
and the processing time for known plaintexts.

For network facing algorithms such as CTR, CCM or GCM, this presents a
security risk, which is why arch specific AES ports are typically time
invariant, either through the use of special instructions, or by using
SIMD algorithms that don't rely on table lookups.

For generic code, this is difficult to achieve without losing too much
performance, but we can improve the situation significantly by switching
to an implementation that only needs 256 bytes of table data (the actual
S-box itself), which can be prefetched at the start of each block to
eliminate data dependent latencies.

This code encrypts at ~25 cycles per byte on ARM Cortex-A57 (while the
ordinary generic AES driver manages 18 cycles per byte on this
hardware). Decryption is substantially slower.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

b5e0b032

crypto: aes-generic - drop alignment requirement · ec38a937

由 Ard Biesheuvel 提交于 2月 02, 2017

The generic AES code exposes a 32-bit align mask, which forces all
users of the code to use temporary buffers or take other measures to
ensure the alignment requirement is adhered to, even on architectures
that don't care about alignment for software algorithms such as this
one.

So drop the align mask, and fix the code to use get_unaligned_le32()
where appropriate, which will resolve to whatever is optimal for the
architecture.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

ec38a937

03 2月, 2017 1 次提交

crypto: algif_aead - Fix kernel panic on list_del · 0b529f14

由 Harsh Jain 提交于 2月 01, 2017

Kernel panics when userspace program try to access AEAD interface.
Remove node from Linked List before freeing its memory.

Cc: <stable@vger.kernel.org>
Signed-off-by: NHarsh Jain <harsh@chelsio.com>
Reviewed-by: NStephan Müller <smueller@chronox.de>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

0b529f14

23 1月, 2017 2 次提交

crypto: tcrypt - Add debug prints · 76512f2d

由 Rabin Vincent 提交于 1月 18, 2017

tcrypt is very tight-lipped when it succeeds, but a bit more feedback
would be useful when developing or debugging crypto drivers, especially
since even a successful run ends with the module failing to insert. Add
a couple of debug prints, which can be enabled with dynamic debug:

Before:

 # insmod tcrypt.ko mode=10
 insmod: can't insert 'tcrypt.ko': Resource temporarily unavailable

After:

 # insmod tcrypt.ko mode=10 dyndbg
 tcrypt: testing ecb(aes)
 tcrypt: testing cbc(aes)
 tcrypt: testing lrw(aes)
 tcrypt: testing xts(aes)
 tcrypt: testing ctr(aes)
 tcrypt: testing rfc3686(ctr(aes))
 tcrypt: all tests passed
 insmod: can't insert 'tcrypt.ko': Resource temporarily unavailable
Signed-off-by: NRabin Vincent <rabinv@axis.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

76512f2d

crypto: api - Clear CRYPTO_ALG_DEAD bit before registering an alg · d6040764

由 Salvatore Benedetto 提交于 1月 13, 2017

Make sure CRYPTO_ALG_DEAD bit is cleared before proceeding with
the algorithm registration. This fixes qat-dh registration when
driver is restarted

Cc: <stable@vger.kernel.org>
Signed-off-by: NSalvatore Benedetto <salvatore.benedetto@intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

d6040764

13 1月, 2017 2 次提交

crypto: testmgr - use calculated count for number of test vectors · 21c8e720

由 Ard Biesheuvel 提交于 1月 12, 2017

When working on AES in CCM mode for ARM, my code passed the internal
tcrypt test before I had even bothered to implement the AES-192 and
AES-256 code paths, which is strange because the tcrypt does contain
AES-192 and AES-256 test vectors for CCM.

As it turned out, the define AES_CCM_ENC_TEST_VECTORS was out of sync
with the actual number of test vectors, causing only the AES-128 ones
to be executed.

So get rid of the defines, and wrap the test vector references in a
macro that calculates the number of vectors automatically.

The following test vector counts were out of sync with the respective
defines:

    BF_CTR_ENC_TEST_VECTORS          2 ->  3
    BF_CTR_DEC_TEST_VECTORS          2 ->  3
    TF_CTR_ENC_TEST_VECTORS          2 ->  3
    TF_CTR_DEC_TEST_VECTORS          2 ->  3
    SERPENT_CTR_ENC_TEST_VECTORS     2 ->  3
    SERPENT_CTR_DEC_TEST_VECTORS     2 ->  3
    AES_CCM_ENC_TEST_VECTORS         8 -> 14
    AES_CCM_DEC_TEST_VECTORS         7 -> 17
    AES_CCM_4309_ENC_TEST_VECTORS    7 -> 23
    AES_CCM_4309_DEC_TEST_VECTORS   10 -> 23
    CAMELLIA_CTR_ENC_TEST_VECTORS    2 ->  3
    CAMELLIA_CTR_DEC_TEST_VECTORS    2 ->  3
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

21c8e720

crypto: testmgr - Allocate only the required output size for hash tests · e93acd6f

由 Andrew Lutomirski 提交于 1月 10, 2017

There are some hashes (e.g. sha224) that have some internal trickery
to make sure that only the correct number of output bytes are
generated.  If something goes wrong, they could potentially overrun
the output buffer.

Make the test more robust by allocating only enough space for the
correct output size so that memory debugging will catch the error if
the output is overrun.

Tested by intentionally breaking sha224 to output all 256
internally-generated bits while running on KASAN.

Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

e93acd6f

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功