提交 · 944c3d4dca34403e802287a1e7e9d02c06dce0d5 · openeuler / Kernel

15 2月, 2017 18 次提交

crypto: caam - fix state buffer DMA (un)mapping · 944c3d4d

由 Horia Geantă 提交于 2月 10, 2017

If we register the DMA API debug notification chain to
receive platform bus events:
    dma_debug_add_bus(&platform_bus_type);
we start receiving warnings after a simple test like "modprobe caam_jr &&
modprobe caamhash && modprobe -r caamhash && modprobe -r caam_jr":
platform ffe301000.jr: DMA-API: device driver has pending DMA allocations while released from device [count=1938]
One of leaked entries details: [device address=0x0000000173fda090] [size=63 bytes] [mapped with DMA_TO_DEVICE] [mapped as single]

It turns out there are several issues with handling buf_dma (mapping of buffer
holding the previous chunk smaller than hash block size):
-detection of buf_dma mapping failure occurs too late, after a job descriptor
using that value has been submitted for execution
-dma mapping leak - unmapping is not performed in all places: for e.g.
in ahash_export or in most ahash_fin* callbacks (due to current back-to-back
implementation of buf_dma unmapping/mapping)

Fix these by:
-calling dma_mapping_error() on buf_dma right after the mapping and providing
an error code if needed
-unmapping buf_dma during the "job done" (ahash_done_*) callbacks
Signed-off-by: NHoria Geantă <horia.geanta@nxp.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

944c3d4d

crypto: caam - abstract ahash request double buffering · 0355d23d

由 Horia Geantă 提交于 2月 10, 2017

caamhash uses double buffering for holding previous/current
and next chunks (data smaller than block size) to be hashed.

Add (inline) functions to abstract this mechanism.
Signed-off-by: NHoria Geantă <horia.geanta@nxp.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

0355d23d

crypto: caam - fix error path for ctx_dma mapping failure · 87ec02e7

由 Horia Geantă 提交于 2月 10, 2017

In case ctx_dma dma mapping fails, ahash_unmap_ctx() tries to
dma unmap an invalid address:
map_seq_out_ptr_ctx() / ctx_map_to_sec4_sg() -> goto unmap_ctx ->
-> ahash_unmap_ctx() -> dma unmap ctx_dma

There is also possible to reach ahash_unmap_ctx() with ctx_dma
uninitialzed or to try to unmap the same address twice.

Fix these by setting ctx_dma = 0 where needed:
-initialize ctx_dma in ahash_init()
-clear ctx_dma in case of mapping error (instead of holding
the error code returned by the dma map function)
-clear ctx_dma after each unmapping

Fixes: 32686d34 ("crypto: caam - ensure that we clean up after an error")
Signed-off-by: NHoria Geantă <horia.geanta@nxp.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

87ec02e7

crypto: caam - fix DMA API leaks for multiple setkey() calls · bbf22344

由 Horia Geantă 提交于 2月 10, 2017

setkey() callback may be invoked multiple times for the same tfm.
In this case, DMA API leaks are caused by shared descriptors
(and key for caamalg) being mapped several times and unmapped only once.
Fix this by performing mapping / unmapping only in crypto algorithm's
cra_init() / cra_exit() callbacks and sync_for_device in the setkey()
tfm callback.
Signed-off-by: NHoria Geantă <horia.geanta@nxp.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

bbf22344

crypto: caam - don't dma_map key for hash algorithms · cfb725f6

由 Horia Geantă 提交于 2月 10, 2017

Shared descriptors for hash algorithms are small enough
for (split) keys to be inlined in all cases.
Since driver already does this, all what's left is to remove
unused ctx->key_dma.

Fixes: 045e3678 ("crypto: caam - ahash hmac support")
Signed-off-by: NHoria Geantă <horia.geanta@nxp.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

cfb725f6

crypto: caam - use dma_map_sg() return code · 838e0a89

由 Horia Geantă 提交于 2月 10, 2017

dma_map_sg() might coalesce S/G entries, so use the number of S/G
entries returned by it instead of what sg_nents_for_len() initially
returns.
Signed-off-by: NHoria Geantă <horia.geanta@nxp.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

838e0a89

crypto: caam - replace sg_count() with sg_nents_for_len() · fa0c92db

由 Horia Geantă 提交于 2月 10, 2017

Replace internal sg_count() function and the convoluted logic
around it with the standard sg_nents_for_len() function.
src_nents, dst_nents now hold the number of SW S/G entries,
instead of the HW S/G table entries.

With this change, null (zero length) input data for AEAD case
needs to be handled in a visible way. req->src is no longer
(un)mapped, pointer address is set to 0 in SEQ IN PTR command.
Signed-off-by: NHoria Geantă <horia.geanta@nxp.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

fa0c92db

crypto: caam - check sg_count() return value · fd144d83

由 Horia Geantă 提交于 2月 10, 2017

sg_count() internally calls sg_nents_for_len(), which could fail
in case the required number of bytes is larger than the total
bytes in the S/G.

Thus, add checks to validate the input.
Signed-off-by: NHoria Geantă <horia.geanta@nxp.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

fd144d83

crypto: caam - fix HW S/G in ablkcipher_giv_edesc_alloc() · fd88aac9

由 Horia Geantă 提交于 2月 10, 2017

HW S/G generation does not work properly when the following conditions
are met:
-src == dst
-src/dst is S/G
-IV is right before (contiguous with) the first src/dst S/G entry
since "iv_contig" is set to true (iv_contig is a misnomer here and
it actually refers to the whole output being contiguous)

Fix this by setting dst S/G nents equal to src S/G nents, instead of
leaving it set to init value (0).
Signed-off-by: NHoria Geantă <horia.geanta@nxp.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

fd88aac9

crypto: caam - fix JR IO mapping if one fails · 4d8348d8

由 Tudor Ambarus 提交于 2月 10, 2017

If one of the JRs failed at init, the next JR used
the failed JR's IO space. The patch fixes this bug.
Signed-off-by: NTudor Ambarus <tudor-dan.ambarus@nxp.com>
Signed-off-by: NHoria Geantă <horia.geanta@nxp.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

4d8348d8

crypto: caam - check return code of dma_set_mask_and_coherent() · b3b5fce7

由 Horia Geantă 提交于 2月 10, 2017

Setting the dma mask could fail, thus make sure it succeeds
before going further.
Signed-off-by: NHoria Geantă <horia.geanta@nxp.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

b3b5fce7

crypto: caam - don't include unneeded headers · 78fd0fff

由 Horia Geantă 提交于 2月 10, 2017

intern.h, jr.h are not needed in error.c
error.h is not needed in ctrl.c
Signed-off-by: NHoria Geantă <horia.geanta@nxp.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

78fd0fff

crypto: ccp - Simplify some buffer management routines · 83d650ab

由 Gary R Hook 提交于 2月 09, 2017

The reverse-get/set functions can be simplified by
eliminating unused code.
Signed-off-by: NGary R Hook <gary.hook@amd.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

83d650ab

crypto: ccp - Update the command queue on errors · 4cdf101e

由 Gary R Hook 提交于 2月 09, 2017

Move the command queue tail pointer when an error is
detected. Always return the error.
Signed-off-by: NGary R Hook <gary.hook@amd.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

4cdf101e

crypto: ccp - Change mode for detailed CCP init messages · a60496a0

由 Gary R Hook 提交于 2月 09, 2017

The CCP initialization messages only need to be sent to
syslog in debug mode.
Signed-off-by: NGary R Hook <gary.hook@amd.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

a60496a0

crypto: atmel-sha - fix error management in atmel_sha_start() · 19998acb

由 Cyrille Pitchen 提交于 2月 09, 2017

This patch clarifies and fixes how errors should be handled by
atmel_sha_start().

For update operations, the previous code wrongly assumed that
(err != -EINPROGRESS) implies (err == 0). It's wrong because that doesn't
take the error cases (err < 0) into account.

This patch also adds many comments to detail all the possible returned
values and what should be done in each case.

Especially, when an error occurs, since atmel_sha_complete() has already
been called, hence releasing the hardware, atmel_sha_start() must not call
atmel_sha_finish_req() later otherwise atmel_sha_complete() would be
called a second time.
Signed-off-by: NCyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

19998acb

crypto: atmel-sha - fix missing "return" instructions · dd3f9f40

由 Cyrille Pitchen 提交于 2月 09, 2017

This patch fixes a previous patch: "crypto: atmel-sha - update request
queue management to make it more generic".

Indeed the patch above should have replaced the "return -EINVAL;" lines by
"return atmel_sha_complete(dd, -EINVAL);" but instead replaced them by a
simple call of "atmel_sha_complete(dd, -EINVAL);".
Hence all "return" instructions were missing.
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NCyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

dd3f9f40

crypto: ccp - Set the AES size field for all modes · f7cc02b3

由 Gary R Hook 提交于 2月 08, 2017

Ensure that the size field is correctly populated for
all AES modes.
Signed-off-by: NGary R Hook <gary.hook@amd.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

f7cc02b3

11 2月, 2017 17 次提交

crypto: brcm - Add Broadcom SPU driver · 9d12ba86

由 Rob Rice 提交于 2月 03, 2017

Add Broadcom Secure Processing Unit (SPU) crypto driver for SPU
hardware crypto offload. The driver supports ablkcipher, ahash,
and aead symmetric crypto operations.
Signed-off-by: NSteve Lin <steven.lin1@broadcom.com>
Signed-off-by: NRob Rice <rob.rice@broadcom.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

9d12ba86

crypto: brcm - DT documentation for Broadcom SPU hardware · 206dc4fc

由 Rob Rice 提交于 2月 03, 2017

Device tree documentation for Broadcom Secure Processing Unit
(SPU) crypto hardware.
Signed-off-by: NSteve Lin <steven.lin1@broadcom.com>
Signed-off-by: NRob Rice <rob.rice@broadcom.com>
Acked-by: NRob Herring <robh@kernel.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

206dc4fc

crypto: cavium - Enable CPT options crypto for build · 62ad8b5c

由 George Cherian 提交于 2月 07, 2017

Add the CPT options in crypto Kconfig and update the
crypto Makefile

Update the MAINTAINERS file too.
Signed-off-by: NGeorge Cherian <george.cherian@cavium.com>
Reviewed-by: NDavid Daney <david.daney@cavium.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

62ad8b5c

crypto: cavium - Add the Virtual Function driver for CPT · c694b233

由 George Cherian 提交于 2月 07, 2017

Enable the CPT VF driver. CPT is the cryptographic Acceleration Unit
in Octeon-tx series of processors.
Signed-off-by: NGeorge Cherian <george.cherian@cavium.com>
Reviewed-by: NDavid Daney <david.daney@cavium.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

c694b233

crypto: cavium - Add Support for Octeon-tx CPT Engine · 9e2c7d99

由 George Cherian 提交于 2月 07, 2017

Enable the Physical Function driver for the Cavium Crypto Engine (CPT)
found in Octeon-tx series of SoC's. CPT is the Cryptographic Accelaration
Unit. CPT includes microcoded GigaCypher symmetric engines (SEs) and
asymmetric engines (AEs).
Signed-off-by: NGeorge Cherian <george.cherian@cavium.com>
Reviewed-by: NDavid Daney <david.daney@cavium.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

9e2c7d99

hwrng: cavium - Use per device name to allow for multiple devices. · 87f3d088

由 David Daney 提交于 2月 06, 2017

Systems containing the Cavium HW RNG may have one device per NUMA
node.  A typical configuration is a 2-node NUMA system, which results
in 2 RNG devices.  The hwrng subsystem refuses (and rightly so) to
register more than one device with he same name, so we get failure
messages on these systems.

Make the hwrng name unique by including the underlying device name.
Also remove spaces from the name to make it possible to switch devices
via the sysfs knobs.
Signed-off-by: NDavid Daney <david.daney@cavium.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

87f3d088

crypto: atmel - fix 64-bit build warnings · 4c147bcf

由 Arnd Bergmann 提交于 2月 06, 2017

When we enable COMPILE_TEST building for the Atmel sha and tdes implementations,
we run into a couple of warnings about incorrect format strings, e.g.

In file included from include/linux/platform_device.h:14:0,
from drivers/crypto/atmel-sha.c:24:
drivers/crypto/atmel-sha.c: In function 'atmel_sha_xmit_cpu':
drivers/crypto/atmel-sha.c:571:19: error: format '%d' expects argument of type 'int', but argument 6 has type 'size_t {aka long unsigned int}' [-Werror=format=]
In file included from include/linux/printk.h:6:0,
from include/linux/kernel.h:13,
from drivers/crypto/atmel-tdes.c:17:
drivers/crypto/atmel-tdes.c: In function 'atmel_tdes_crypt_dma_stop':
include/linux/kern_levels.h:4:18: error: format '%u' expects argument of type 'unsigned int', but argument 2 has type 'size_t {aka long unsigned int}' [-Werror=format=]

These are all fixed by using the "%z" modifier for size_t data.

There are also a few uses of min()/max() with incompatible types:

drivers/crypto/atmel-tdes.c: In function 'atmel_tdes_crypt_start':
drivers/crypto/atmel-tdes.c:528:181: error: comparison of distinct pointer types lacks a cast [-Werror]

Where possible, we should use consistent types here, otherwise we can use
min_t()/max_t() to get well-defined behavior without a warning.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

4c147bcf

crypto: atmel - refine Kconfig dependencies · ceb4afb3

由 Arnd Bergmann 提交于 2月 06, 2017

With the new authenc support, we get a harmless Kconfig warning:

warning: (CRYPTO_DEV_ATMEL_AUTHENC) selects CRYPTO_DEV_ATMEL_SHA which has unmet direct dependencies (CRYPTO && CRYPTO_HW && ARCH_AT91)

The problem is that each of the options has slightly different dependencies,
although they all seem to want the same thing: allow building for real AT91
targets that actually have the hardware, and possibly for compile testing.

This makes all four options consistent: instead of depending on a particular
dmaengine implementation, we depend on the ARM platform, CONFIG_COMPILE_TEST
as an alternative when that is turned off. This makes the 'select' statements
work correctly.

Fixes: 89a82ef8 ("crypto: atmel-authenc - add support to authenc(hmac(shaX), Y(aes)) modes")
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

ceb4afb3

crypto: algapi - make crypto_xor() and crypto_inc() alignment agnostic · db91af0f

由 Ard Biesheuvel 提交于 2月 05, 2017

Instead of unconditionally forcing 4 byte alignment for all generic
chaining modes that rely on crypto_xor() or crypto_inc() (which may
result in unnecessary copying of data when the underlying hardware
can perform unaligned accesses efficiently), make those functions
deal with unaligned input explicitly, but only if the Kconfig symbol
HAVE_EFFICIENT_UNALIGNED_ACCESS is set. This will allow us to drop
the alignmasks from the CBC, CMAC, CTR, CTS, PCBC and SEQIV drivers.

For crypto_inc(), this simply involves making the 4-byte stride
conditional on HAVE_EFFICIENT_UNALIGNED_ACCESS being set, given that
it typically operates on 16 byte buffers.

For crypto_xor(), an algorithm is implemented that simply runs through
the input using the largest strides possible if unaligned accesses are
allowed. If they are not, an optimal sequence of memory accesses is
emitted that takes the relative alignment of the input buffers into
account, e.g., if the relative misalignment of dst and src is 4 bytes,
the entire xor operation will be completed using 4 byte loads and stores
(modulo unaligned bits at the start and end). Note that all expressions
involving misalign are simply eliminated by the compiler when
HAVE_EFFICIENT_UNALIGNED_ACCESS is defined.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

db91af0f

crypto: improve gcc optimization flags for serpent and wp512 · 7d6e9105

由 Arnd Bergmann 提交于 2月 03, 2017

An ancient gcc bug (first reported in 2003) has apparently resurfaced
on MIPS, where kernelci.org reports an overly large stack frame in the
whirlpool hash algorithm:

crypto/wp512.c:987:1: warning: the frame size of 1112 bytes is larger than 1024 bytes [-Wframe-larger-than=]

With some testing in different configurations, I'm seeing large
variations in stack frames size up to 1500 bytes for what should have
around 300 bytes at most. I also checked the reference implementation,
which is essentially the same code but also comes with some test and
benchmarking infrastructure.

It seems that recent compiler versions on at least arm, arm64 and powerpc
have a partial fix for this problem, but enabling "-fsched-pressure", but
even with that fix they suffer from the issue to a certain degree. Some
testing on arm64 shows that the time needed to hash a given amount of
data is roughly proportional to the stack frame size here, which makes
sense given that the wp512 implementation is doing lots of loads for
table lookups, and the problem with the overly large stack is a result
of doing a lot more loads and stores for spilled registers (as seen from
inspecting the object code).

Disabling -fschedule-insns consistently fixes the problem for wp512,
in my collection of cross-compilers, the results are consistently better
or identical when comparing the stack sizes in this function, though
some architectures (notable x86) have schedule-insns disabled by
default.

The four columns are:
default: -O2
press:	 -O2 -fsched-pressure
nopress: -O2 -fschedule-insns -fno-sched-pressure
nosched: -O2 -no-schedule-insns (disables sched-pressure)

				default	press	nopress	nosched
alpha-linux-gcc-4.9.3		1136	848	1136	176
am33_2.0-linux-gcc-4.9.3	2100	2076	2100	2104
arm-linux-gnueabi-gcc-4.9.3	848	848	1048	352
cris-linux-gcc-4.9.3		272	272	272	272
frv-linux-gcc-4.9.3		1128	1000	1128	280
hppa64-linux-gcc-4.9.3		1128	336	1128	184
hppa-linux-gcc-4.9.3		644	308	644	276
i386-linux-gcc-4.9.3		352	352	352	352
m32r-linux-gcc-4.9.3		720	656	720	268
microblaze-linux-gcc-4.9.3	1108	604	1108	256
mips64-linux-gcc-4.9.3		1328	592	1328	208
mips-linux-gcc-4.9.3		1096	624	1096	240
powerpc64-linux-gcc-4.9.3	1088	432	1088	160
powerpc-linux-gcc-4.9.3		1080	584	1080	224
s390-linux-gcc-4.9.3		456	456	624	360
sh3-linux-gcc-4.9.3		292	292	292	292
sparc64-linux-gcc-4.9.3		992	240	992	208
sparc-linux-gcc-4.9.3		680	592	680	312
x86_64-linux-gcc-4.9.3		224	240	272	224
xtensa-linux-gcc-4.9.3		1152	704	1152	304

aarch64-linux-gcc-7.0.0		224	224	1104	208
arm-linux-gnueabi-gcc-7.0.1	824	824	1048	352
mips-linux-gcc-7.0.0		1120	648	1120	272
x86_64-linux-gcc-7.0.1		240	240	304	240

arm-linux-gnueabi-gcc-4.4.7	840			392
arm-linux-gnueabi-gcc-4.5.4	784	728	784	320
arm-linux-gnueabi-gcc-4.6.4	736	728	736	304
arm-linux-gnueabi-gcc-4.7.4	944	784	944	352
arm-linux-gnueabi-gcc-4.8.5	464	464	760	352
arm-linux-gnueabi-gcc-4.9.3	848	848	1048	352
arm-linux-gnueabi-gcc-5.3.1	824	824	1064	336
arm-linux-gnueabi-gcc-6.1.1	808	808	1056	344
arm-linux-gnueabi-gcc-7.0.1	824	824	1048	352

Trying the same test for serpent-generic, the picture is a bit different,
and while -fno-schedule-insns is generally better here than the default,
-fsched-pressure wins overall, so I picked that instead.

				default	press	nopress	nosched
alpha-linux-gcc-4.9.3		1392	864	1392	960
am33_2.0-linux-gcc-4.9.3	536	524	536	528
arm-linux-gnueabi-gcc-4.9.3	552	552	776	536
cris-linux-gcc-4.9.3		528	528	528	528
frv-linux-gcc-4.9.3		536	400	536	504
hppa64-linux-gcc-4.9.3		524	208	524	480
hppa-linux-gcc-4.9.3		768	472	768	508
i386-linux-gcc-4.9.3		564	564	564	564
m32r-linux-gcc-4.9.3		712	576	712	532
microblaze-linux-gcc-4.9.3	724	392	724	512
mips64-linux-gcc-4.9.3		720	384	720	496
mips-linux-gcc-4.9.3		728	384	728	496
powerpc64-linux-gcc-4.9.3	704	304	704	480
powerpc-linux-gcc-4.9.3		704	296	704	480
s390-linux-gcc-4.9.3		560	560	592	536
sh3-linux-gcc-4.9.3		540	540	540	540
sparc64-linux-gcc-4.9.3		544	352	544	496
sparc-linux-gcc-4.9.3		544	344	544	496
x86_64-linux-gcc-4.9.3		528	536	576	528
xtensa-linux-gcc-4.9.3		752	544	752	544

aarch64-linux-gcc-7.0.0		432	432	656	480
arm-linux-gnueabi-gcc-7.0.1	616	616	808	536
mips-linux-gcc-7.0.0		720	464	720	488
x86_64-linux-gcc-7.0.1		536	528	600	536

arm-linux-gnueabi-gcc-4.4.7	592			440
arm-linux-gnueabi-gcc-4.5.4	776	448	776	544
arm-linux-gnueabi-gcc-4.6.4	776	448	776	544
arm-linux-gnueabi-gcc-4.7.4	768	448	768	544
arm-linux-gnueabi-gcc-4.8.5	488	488	776	544
arm-linux-gnueabi-gcc-4.9.3	552	552	776	536
arm-linux-gnueabi-gcc-5.3.1	552	552	776	536
arm-linux-gnueabi-gcc-6.1.1	560	560	776	536
arm-linux-gnueabi-gcc-7.0.1	616	616	808	536

I did not do any runtime tests with serpent, so it is possible that stack
frame size does not directly correlate with runtime performance here and
it actually makes things worse, but it's more likely to help here, and
the reduced stack frame size is probably enough reason to apply the patch,
especially given that the crypto code is often used in deep call chains.

Link: https://kernelci.org/build/id/58797d7559b5149efdf6c3a9/logs/
Link: http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11488
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

7d6e9105

crypto: arm64/aes - add NEON/Crypto Extensions CBCMAC/CMAC/XCBC driver · 4860620d

由 Ard Biesheuvel 提交于 2月 03, 2017

On ARMv8 implementations that do not support the Crypto Extensions,
such as the Raspberry Pi 3, the CCM driver falls back to the generic
table based AES implementation to perform the MAC part of the
algorithm, which is slow and not time invariant. So add a CBCMAC
implementation to the shared glue code between NEON AES and Crypto
Extensions AES, so that it can be used instead now that the CCM
driver has been updated to look for CBCMAC implementations other
than the one it supplies itself.

Also, given how these algorithms mostly only differ in the way the key
handling and the final encryption are implemented, expose CMAC and XCBC
algorithms as well based on the same core update code.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

4860620d

crypto: ccm - switch to separate cbcmac driver · f15f05b0

由 Ard Biesheuvel 提交于 2月 03, 2017

Update the generic CCM driver to defer CBC-MAC processing to a
dedicated CBC-MAC ahash transform rather than open coding this
transform (and much of the associated scatterwalk plumbing) in
the CCM driver itself.

This cleans up the code considerably, but more importantly, it allows
the use of alternative CBC-MAC implementations that don't suffer from
performance degradation due to significant setup time (e.g., the NEON
based AES code needs to enable/disable the NEON, and load the S-box
into 16 SIMD registers, which cannot be amortized over the entire input
when using the cipher interface)
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

f15f05b0

crypto: testmgr - add test cases for cbcmac(aes) · 092acf06

由 Ard Biesheuvel 提交于 2月 03, 2017

In preparation of splitting off the CBC-MAC transform in the CCM
driver into a separate algorithm, define some test cases for the
AES incarnation of cbcmac.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

092acf06

crypto: aes - add generic time invariant AES cipher · b5e0b032

由 Ard Biesheuvel 提交于 2月 02, 2017

Lookup table based AES is sensitive to timing attacks, which is due to
the fact that such table lookups are data dependent, and the fact that
8 KB worth of tables covers a significant number of cachelines on any
architecture, resulting in an exploitable correlation between the key
and the processing time for known plaintexts.

For network facing algorithms such as CTR, CCM or GCM, this presents a
security risk, which is why arch specific AES ports are typically time
invariant, either through the use of special instructions, or by using
SIMD algorithms that don't rely on table lookups.

For generic code, this is difficult to achieve without losing too much
performance, but we can improve the situation significantly by switching
to an implementation that only needs 256 bytes of table data (the actual
S-box itself), which can be prefetched at the start of each block to
eliminate data dependent latencies.

This code encrypts at ~25 cycles per byte on ARM Cortex-A57 (while the
ordinary generic AES driver manages 18 cycles per byte on this
hardware). Decryption is substantially slower.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

b5e0b032

crypto: aes-generic - drop alignment requirement · ec38a937

由 Ard Biesheuvel 提交于 2月 02, 2017

The generic AES code exposes a 32-bit align mask, which forces all
users of the code to use temporary buffers or take other measures to
ensure the alignment requirement is adhered to, even on architectures
that don't care about alignment for software algorithms such as this
one.

So drop the align mask, and fix the code to use get_unaligned_le32()
where appropriate, which will resolve to whatever is optimal for the
architecture.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

ec38a937

crypto: sha512-mb - Protect sha512 mb ctx mgr access · c459bd7b

由 Tim Chen 提交于 2月 01, 2017

The flusher and regular multi-buffer computation via mcryptd may race with another.
Add here a lock and turn off interrupt to to access multi-buffer
computation state cstate->mgr before a round of computation. This should
prevent the flusher code jumping in.
Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

c459bd7b

crypto: arm64/crc32 - merge CRC32 and PMULL instruction based drivers · 5d3d9c8b

由 Ard Biesheuvel 提交于 2月 01, 2017

The PMULL based CRC32 implementation already contains code based on the
separate, optional CRC32 instructions to fallback to when operating on
small quantities of data. We can expose these routines directly on systems
that lack the 64x64 PMULL instructions but do implement the CRC32 ones,
which makes the driver that is based solely on those CRC32 instructions
redundant. So remove it.

Note that this aligns arm64 with ARM, whose accelerated CRC32 driver
also combines the CRC32 extension based and the PMULL based versions.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: NMatthias Brugger <mbrugger@suse.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

5d3d9c8b

03 2月, 2017 5 次提交

crypto: arm/aes - don't use IV buffer to return final keystream block · 1a20b966

由 Ard Biesheuvel 提交于 2月 02, 2017

The ARM bit sliced AES core code uses the IV buffer to pass the final
keystream block back to the glue code if the input is not a multiple of
the block size, so that the asm code does not have to deal with anything
except 16 byte blocks. This is done under the assumption that the outgoing
IV is meaningless anyway in this case, given that chaining is no longer
possible under these circumstances.

However, as it turns out, the CCM driver does expect the IV to retain
a value that is equal to the original IV except for the counter value,
and even interprets byte zero as a length indicator, which may result
in memory corruption if the IV is overwritten with something else.

So use a separate buffer to return the final keystream block.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

1a20b966

crypto: arm64/aes - don't use IV buffer to return final keystream block · 88a3f582

由 Ard Biesheuvel 提交于 2月 02, 2017

The arm64 bit sliced AES core code uses the IV buffer to pass the final
keystream block back to the glue code if the input is not a multiple of
the block size, so that the asm code does not have to deal with anything
except 16 byte blocks. This is done under the assumption that the outgoing
IV is meaningless anyway in this case, given that chaining is no longer
possible under these circumstances.

So use a separate buffer to return the final keystream block.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

88a3f582

crypto: arm64/aes - replace scalar fallback with plain NEON fallback · 12fcd923

由 Ard Biesheuvel 提交于 1月 28, 2017

The new bitsliced NEON implementation of AES uses a fallback in two
places: CBC encryption (which is strictly sequential, whereas this
driver can only operate efficiently on 8 blocks at a time), and the
XTS tweak generation, which involves encrypting a single AES block
with a different key schedule.

The plain (i.e., non-bitsliced) NEON code is more suitable as a fallback,
given that it is faster than scalar on low end cores (which is what
the NEON implementations target, since high end cores have dedicated
instructions for AES), and shows similar behavior in terms of D-cache
footprint and sensitivity to cache timing attacks. So switch the fallback
handling to the plain NEON driver.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

12fcd923

crypto: arm64/aes-neon-blk - tweak performance for low end cores · 4edd7d01

由 Ard Biesheuvel 提交于 1月 28, 2017

The non-bitsliced AES implementation using the NEON is highly sensitive
to micro-architectural details, and, as it turns out, the Cortex-A53 on
the Raspberry Pi 3 is a core that can benefit from this code, given that
its scalar AES performance is abysmal (32.9 cycles per byte).

The new bitsliced AES code manages 19.8 cycles per byte on this core,
but can only operate on 8 blocks at a time, which is not supported by
all chaining modes. With a bit of tweaking, we can get the plain NEON
code to run at 22.0 cycles per byte, making it useful for sequential
modes like CBC encryption. (Like bitsliced NEON, the plain NEON
implementation does not use any lookup tables, which makes it easy on
the D-cache, and invulnerable to cache timing attacks)

So tweak the plain NEON AES code to use tbl instructions rather than
shl/sri pairs, and to avoid the need to reload permutation vectors or
other constants from memory in every round. Also, improve the decryption
performance by switching to 16x8 pmul instructions for the performing
the multiplications in GF(2^8).

To allow the ECB and CBC encrypt routines to be reused by the bitsliced
NEON code in a subsequent patch, export them from the module.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

4edd7d01

crypto: arm64/aes - performance tweak · c458c4ad

由 Ard Biesheuvel 提交于 1月 28, 2017

Shuffle some instructions around in the __hround macro to shave off
0.1 cycles per byte on Cortex-A57.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

c458c4ad

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功