Commit 9eb31227 authored by Linus Torvalds

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto updates from Herbert Xu:
 "API:

   - add AEAD support to crypto engine

   - allow batch registration in simd

  Algorithms:

   - add CFB mode

   - add speck block cipher

   - add sm4 block cipher

   - new test case for crct10dif

   - improve scheduling latency on ARM

   - scatter/gather support to gcm in aesni

   - convert x86 crypto algorithms to skcipher

  Drivers:

   - hmac(sha224/sha256) support in inside-secure

   - aes gcm/ccm support in stm32

   - stm32mp1 support in stm32

   - ccree driver from staging tree

   - gcm support over QI in caam

   - add ks-sa hwrng driver"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (212 commits)
  crypto: ccree - remove unused enums
  crypto: ahash - Fix early termination in hash walk
  crypto: brcm - explicitly cast cipher to hash type
  crypto: talitos - don't leak pointers to authenc keys
  crypto: qat - don't leak pointers to authenc keys
  crypto: picoxcell - don't leak pointers to authenc keys
  crypto: ixp4xx - don't leak pointers to authenc keys
  crypto: chelsio - don't leak pointers to authenc keys
  crypto: caam/qi - don't leak pointers to authenc keys
  crypto: caam - don't leak pointers to authenc keys
  crypto: lrw - Free rctx->ext with kzfree
  crypto: talitos - fix IPsec cipher in length
  crypto: Deduplicate le32_to_cpu_array() and cpu_to_le32_array()
  crypto: doc - clarify hash callbacks state machine
  crypto: api - Keep failed instances alive
  crypto: api - Make crypto_alg_lookup static
  crypto: api - Remove unused crypto_type lookup function
  crypto: chelsio - Remove declaration of static function from header
  crypto: inside-secure - hmac(sha224) support
  crypto: inside-secure - hmac(sha256) support
  ..
=============
CRYPTO ENGINE
=============
Overview
--------
The crypto engine (CE) API is a crypto queue manager.
Requirement
-----------
You must put the struct crypto_engine_ctx at the start of your tfm_ctx:
struct your_tfm_ctx {
struct crypto_engine_ctx enginectx;
...
};
Why: since the CE manages only crypto_async_request, it cannot know the underlying
request type and so has access only to the TFM.
Using container_of to access __ctx is therefore impossible.
Furthermore, the crypto engine cannot know the layout of "struct your_tfm_ctx",
so it must assume that crypto_engine_ctx is at the start of it.
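The layout requirement can be illustrated outside the kernel with a minimal C sketch. The types and helpers below (`engine_ops`, `crypto_engine_ctx_sketch`, `engine_view`, `demo_do_one_request`, `run_through_engine`) are simplified, hypothetical stand-ins, not the kernel definitions. The sketch only shows that, when the engine context is the first member, a pointer to the driver context can be treated as a pointer to the engine context without knowing the driver's type.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the callbacks held by the engine context. */
struct engine_ops {
	int (*do_one_request)(void *areq);
};

/* Simplified stand-in for the kernel's struct crypto_engine_ctx. */
struct crypto_engine_ctx_sketch {
	struct engine_ops op;
};

/* Hypothetical driver context: the engine context must come first. */
struct your_tfm_ctx {
	struct crypto_engine_ctx_sketch enginectx;
	int driver_private;
};

/* What the engine effectively does: reinterpret the opaque tfm context. */
static struct crypto_engine_ctx_sketch *engine_view(void *tfm_ctx)
{
	return (struct crypto_engine_ctx_sketch *)tfm_ctx;
}

/* Hypothetical callback used only for this demonstration. */
static int demo_do_one_request(void *areq)
{
	(void)areq;
	return 42;
}

/* Calls the callback through the engine's view of the driver context. */
static int run_through_engine(struct your_tfm_ctx *ctx)
{
	/* The cast in engine_view() is only valid because of this layout. */
	assert(offsetof(struct your_tfm_ctx, enginectx) == 0);
	return engine_view(ctx)->op.do_one_request(NULL);
}
```

If `enginectx` were moved after `driver_private`, the assertion would fail and the cast would read the wrong bytes, which is the situation the requirement rules out.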
Order of operations
-------------------
You have to obtain a struct crypto_engine via crypto_engine_alloc_init()
and start it via crypto_engine_start().
Before transferring any request, you have to fill the enginectx:
- prepare_request: (function pointer) Any processing needed before handling the request
- unprepare_request: (function pointer) Undoes what was done in prepare_request
- do_one_request: (function pointer) Performs the encryption for the current request
Note that these three functions receive the crypto_async_request associated with the received request,
so you need to get the original request via container_of(areq, struct yourrequesttype_request, base);
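The container_of() step mentioned above can be sketched in plain C. This is a userspace illustration only: the macro is a simplified equivalent of the kernel's, and `async_request_sketch`, the `cryptlen` field and `cryptlen_from_base` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-in for struct crypto_async_request. */
struct async_request_sketch {
	int flags;
};

/* Hypothetical typed request that embeds the async request as "base". */
struct yourrequesttype_request {
	unsigned int cryptlen;
	struct async_request_sketch base;
};

/* A do_one_request-style callback only receives the embedded base request. */
static unsigned int cryptlen_from_base(struct async_request_sketch *areq)
{
	struct yourrequesttype_request *req;

	assert(areq != NULL);
	/* Step back from the embedded member to the enclosing request. */
	req = container_of(areq, struct yourrequesttype_request, base);
	return req->cryptlen;
}
```

Unlike the engine context, `base` does not need to be the first member here, because the callback knows the enclosing request type and can subtract the member offset.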
When your driver receives a crypto_request, you have to transfer it to
the crypto engine via one of:
- crypto_transfer_ablkcipher_request_to_engine()
- crypto_transfer_aead_request_to_engine()
- crypto_transfer_akcipher_request_to_engine()
- crypto_transfer_hash_request_to_engine()
- crypto_transfer_skcipher_request_to_engine()
At the end of the request process, a call to one of the following functions is needed:
- crypto_finalize_ablkcipher_request
- crypto_finalize_aead_request
- crypto_finalize_akcipher_request
- crypto_finalize_hash_request
- crypto_finalize_skcipher_request
......@@ -236,6 +236,14 @@ when used from another part of the kernel.
|
'---------------> HASH2
Note that it is perfectly legal to "abandon" a request object:
- call .init() and then .update() (as many times as needed)
- _not_ call any of .final(), .finup() or .export() at any point in the future
In other words, implementations should mind resource allocation and clean-up.
No resources related to request objects should remain allocated after a call
to .init() or .update(), since there might be no chance to free them.
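One way to satisfy this rule is to keep all running state inside the request object itself, so that an abandoned request has nothing left to free. The sketch below is a userspace toy, not kernel code: it uses the 32-bit FNV-1a checksum in place of a real hash, and the `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy request: all running state lives inside the request object. */
struct toy_hash_request {
	uint32_t state;
	size_t total_len;
};

/* .init() analogue: no allocation, only resets the inline state. */
static void toy_init(struct toy_hash_request *req)
{
	req->state = 2166136261u;	/* FNV-1a 32-bit offset basis */
	req->total_len = 0;
}

/* .update() analogue: folds data into the inline state, allocates nothing. */
static void toy_update(struct toy_hash_request *req, const uint8_t *data,
		       size_t len)
{
	assert(data != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		req->state ^= data[i];
		req->state *= 16777619u;	/* FNV-1a 32-bit prime */
	}
	req->total_len += len;
}

/* .final() analogue: may never be called if the request is abandoned. */
static uint32_t toy_final(const struct toy_hash_request *req)
{
	return req->state;
}
```

A caller can stop after any `toy_update()` and simply drop the request: nothing was allocated, so nothing leaks.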
Specifics Of Asynchronous HASH Transformation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
......
Arm TrustZone CryptoCell cryptographic engine
Required properties:
- compatible: Should be "arm,cryptocell-712-ree".
- compatible: Should be one of: "arm,cryptocell-712-ree",
"arm,cryptocell-710-ree" or "arm,cryptocell-630p-ree".
- reg: Base physical address of the engine and length of memory mapped region.
- interrupts: Interrupt number for the device.
......
......@@ -8,7 +8,11 @@ Required properties:
- interrupt-names: Should be "ring0", "ring1", "ring2", "ring3", "eip", "mem".
Optional properties:
- clocks: Reference to the crypto engine clock.
- clocks: Reference to the crypto engine clocks, the second clock is
needed for the Armada 7K/8K SoCs.
- clock-names: mandatory if there is a second clock, in this case the
name must be "core" for the first clock and "reg" for
the second one.
Example:
......
Freescale RNGC (Random Number Generator Version C)
The driver also supports version B, which is mostly compatible
to version C.
Freescale RNGA/RNGB/RNGC (Random Number Generator Versions A, B and C)
Required properties:
- compatible : should be one of
"fsl,imx21-rnga"
"fsl,imx31-rnga" (backward compatible with "fsl,imx21-rnga")
"fsl,imx25-rngb"
"fsl,imx35-rngc"
- reg : offset and length of the register set of this block
- interrupts : the interrupt number for the RNGC block
- clocks : the RNGC clk source
- interrupts : the interrupt number for the RNG block
- clocks : the RNG clk source
Example:
......
Keystone SoC Hardware Random Number Generator (HWRNG) Module
On Keystone SoCs, the HWRNG module is a submodule of the Security Accelerator.
- compatible: should be "ti,keystone-rng"
- ti,syscon-sa-cfg: phandle to syscon node of the SA configuration registers.
These registers are shared between hwrng and crypto drivers.
- clocks: phandle to the reference clocks for the subsystem
- clock-names: functional clock name. Should be set to "fck"
- reg: HWRNG module register space
Example:
/* K2HK */
rng@24000 {
compatible = "ti,keystone-rng";
ti,syscon-sa-cfg = <&sa_config>;
clocks = <&clksa>;
clock-names = "fck";
reg = <0x24000 0x1000>;
};
......@@ -13,7 +13,12 @@ Required properties:
- interrupts : the interrupt number for the RNG module.
Used for "ti,omap4-rng" and "inside-secure,safexcel-eip76"
- clocks: the trng clock source. Only mandatory for the
"inside-secure,safexcel-eip76" compatible.
"inside-secure,safexcel-eip76" compatible, the second clock is
needed for the Armada 7K/8K SoCs
- clock-names: mandatory if there is a second clock, in this case the
name must be "core" for the first clock and "reg" for the second
one
Example:
/* AM335x */
......
......@@ -11,6 +11,10 @@ Required properties:
- interrupts : The designated IRQ line for the RNG
- clocks : The clock needed to enable the RNG
Optional properties:
- resets : The reset to properly start RNG
- clock-error-detect : Enable the clock detection management
Example:
rng: rng@50060800 {
......
......@@ -3252,12 +3252,11 @@ F: drivers/net/ieee802154/cc2520.c
F: include/linux/spi/cc2520.h
F: Documentation/devicetree/bindings/net/ieee802154/cc2520.txt
CCREE ARM TRUSTZONE CRYPTOCELL 700 REE DRIVER
CCREE ARM TRUSTZONE CRYPTOCELL REE DRIVER
M: Gilad Ben-Yossef <gilad@benyossef.com>
L: linux-crypto@vger.kernel.org
L: driverdev-devel@linuxdriverproject.org
S: Supported
F: drivers/staging/ccree/
F: drivers/crypto/ccree/
W: https://developer.arm.com/products/system-ip/trustzone-cryptocell/cryptocell-700-family
CEC FRAMEWORK
......@@ -6962,7 +6961,7 @@ F: drivers/input/input-mt.c
K: \b(ABS|SYN)_MT_
INSIDE SECURE CRYPTO DRIVER
M: Antoine Tenart <antoine.tenart@free-electrons.com>
M: Antoine Tenart <antoine.tenart@bootlin.com>
F: drivers/crypto/inside-secure/
S: Maintained
L: linux-crypto@vger.kernel.org
......@@ -7200,6 +7199,14 @@ L: linux-rdma@vger.kernel.org
S: Supported
F: drivers/infiniband/hw/i40iw/
INTEL SHA MULTIBUFFER DRIVER
M: Megha Dey <megha.dey@linux.intel.com>
R: Tim Chen <tim.c.chen@linux.intel.com>
L: linux-crypto@vger.kernel.org
S: Supported
F: arch/x86/crypto/sha*-mb
F: crypto/mcryptd.c
INTEL TELEMETRY DRIVER
M: Souvik Kumar Chakravarty <souvik.k.chakravarty@intel.com>
L: platform-driver-x86@vger.kernel.org
......
......@@ -121,4 +121,10 @@ config CRYPTO_CHACHA20_NEON
select CRYPTO_BLKCIPHER
select CRYPTO_CHACHA20
config CRYPTO_SPECK_NEON
tristate "NEON accelerated Speck cipher algorithms"
depends on KERNEL_MODE_NEON
select CRYPTO_BLKCIPHER
select CRYPTO_SPECK
endif
......@@ -10,6 +10,7 @@ obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o
obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o
obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
obj-$(CONFIG_CRYPTO_SPECK_NEON) += speck-neon.o
ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
ce-obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o
......@@ -53,7 +54,9 @@ ghash-arm-ce-y := ghash-ce-core.o ghash-ce-glue.o
crct10dif-arm-ce-y := crct10dif-ce-core.o crct10dif-ce-glue.o
crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o
chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
speck-neon-y := speck-neon-core.o speck-neon-glue.o
ifdef REGENERATE_ARM_CRYPTO
quiet_cmd_perl = PERL $@
cmd_perl = $(PERL) $(<) > $(@)
......@@ -62,5 +65,6 @@ $(src)/sha256-core.S_shipped: $(src)/sha256-armv4.pl
$(src)/sha512-core.S_shipped: $(src)/sha512-armv4.pl
$(call cmd,perl)
endif
.PRECIOUS: $(obj)/sha256-core.S $(obj)/sha512-core.S
......@@ -174,6 +174,16 @@
.ltorg
.endm
ENTRY(__aes_arm_encrypt)
do_crypt fround, crypto_ft_tab, crypto_ft_tab + 1, 2
ENDPROC(__aes_arm_encrypt)
.align 5
ENTRY(__aes_arm_decrypt)
do_crypt iround, crypto_it_tab, __aes_arm_inverse_sbox, 0
ENDPROC(__aes_arm_decrypt)
.section ".rodata", "a"
.align L1_CACHE_SHIFT
.type __aes_arm_inverse_sbox, %object
__aes_arm_inverse_sbox:
......@@ -210,12 +220,3 @@ __aes_arm_inverse_sbox:
.byte 0x17, 0x2b, 0x04, 0x7e, 0xba, 0x77, 0xd6, 0x26
.byte 0xe1, 0x69, 0x14, 0x63, 0x55, 0x21, 0x0c, 0x7d
.size __aes_arm_inverse_sbox, . - __aes_arm_inverse_sbox
ENTRY(__aes_arm_encrypt)
do_crypt fround, crypto_ft_tab, crypto_ft_tab + 1, 2
ENDPROC(__aes_arm_encrypt)
.align 5
ENTRY(__aes_arm_decrypt)
do_crypt iround, crypto_it_tab, __aes_arm_inverse_sbox, 0
ENDPROC(__aes_arm_decrypt)
// SPDX-License-Identifier: GPL-2.0
/*
* NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
*
* Copyright (c) 2018 Google, Inc
*
* Author: Eric Biggers <ebiggers@google.com>
*/
#include <linux/linkage.h>
.text
.fpu neon
// arguments
ROUND_KEYS .req r0 // const {u64,u32} *round_keys
NROUNDS .req r1 // int nrounds
DST .req r2 // void *dst
SRC .req r3 // const void *src
NBYTES .req r4 // unsigned int nbytes
TWEAK .req r5 // void *tweak
// registers which hold the data being encrypted/decrypted
X0 .req q0
X0_L .req d0
X0_H .req d1
Y0 .req q1
Y0_H .req d3
X1 .req q2
X1_L .req d4
X1_H .req d5
Y1 .req q3
Y1_H .req d7
X2 .req q4
X2_L .req d8
X2_H .req d9
Y2 .req q5
Y2_H .req d11
X3 .req q6
X3_L .req d12
X3_H .req d13
Y3 .req q7
Y3_H .req d15
// the round key, duplicated in all lanes
ROUND_KEY .req q8
ROUND_KEY_L .req d16
ROUND_KEY_H .req d17
// index vector for vtbl-based 8-bit rotates
ROTATE_TABLE .req d18
// multiplication table for updating XTS tweaks
GF128MUL_TABLE .req d19
GF64MUL_TABLE .req d19
// current XTS tweak value(s)
TWEAKV .req q10
TWEAKV_L .req d20
TWEAKV_H .req d21
TMP0 .req q12
TMP0_L .req d24
TMP0_H .req d25
TMP1 .req q13
TMP2 .req q14
TMP3 .req q15
.align 4
.Lror64_8_table:
.byte 1, 2, 3, 4, 5, 6, 7, 0
.Lror32_8_table:
.byte 1, 2, 3, 0, 5, 6, 7, 4
.Lrol64_8_table:
.byte 7, 0, 1, 2, 3, 4, 5, 6
.Lrol32_8_table:
.byte 3, 0, 1, 2, 7, 4, 5, 6
.Lgf128mul_table:
.byte 0, 0x87
.fill 14
.Lgf64mul_table:
.byte 0, 0x1b, (0x1b << 1), (0x1b << 1) ^ 0x1b
.fill 12
/*
* _speck_round_128bytes() - Speck encryption round on 128 bytes at a time
*
* Do one Speck encryption round on the 128 bytes (8 blocks for Speck128, 16 for
* Speck64) stored in X0-X3 and Y0-Y3, using the round key stored in all lanes
* of ROUND_KEY. 'n' is the lane size: 64 for Speck128, or 32 for Speck64.
*
* The 8-bit rotates are implemented using vtbl instead of vshr + vsli because
* the vtbl approach is faster on some processors and the same speed on others.
*/
.macro _speck_round_128bytes n
// x = ror(x, 8)
vtbl.8 X0_L, {X0_L}, ROTATE_TABLE
vtbl.8 X0_H, {X0_H}, ROTATE_TABLE
vtbl.8 X1_L, {X1_L}, ROTATE_TABLE
vtbl.8 X1_H, {X1_H}, ROTATE_TABLE
vtbl.8 X2_L, {X2_L}, ROTATE_TABLE
vtbl.8 X2_H, {X2_H}, ROTATE_TABLE
vtbl.8 X3_L, {X3_L}, ROTATE_TABLE
vtbl.8 X3_H, {X3_H}, ROTATE_TABLE
// x += y
vadd.u\n X0, Y0
vadd.u\n X1, Y1
vadd.u\n X2, Y2
vadd.u\n X3, Y3
// x ^= k
veor X0, ROUND_KEY
veor X1, ROUND_KEY
veor X2, ROUND_KEY
veor X3, ROUND_KEY
// y = rol(y, 3)
vshl.u\n TMP0, Y0, #3
vshl.u\n TMP1, Y1, #3
vshl.u\n TMP2, Y2, #3
vshl.u\n TMP3, Y3, #3
vsri.u\n TMP0, Y0, #(\n - 3)
vsri.u\n TMP1, Y1, #(\n - 3)
vsri.u\n TMP2, Y2, #(\n - 3)
vsri.u\n TMP3, Y3, #(\n - 3)
// y ^= x
veor Y0, TMP0, X0
veor Y1, TMP1, X1
veor Y2, TMP2, X2
veor Y3, TMP3, X3
.endm
/*
* _speck_unround_128bytes() - Speck decryption round on 128 bytes at a time
*
* This is the inverse of _speck_round_128bytes().
*/
.macro _speck_unround_128bytes n
// y ^= x
veor TMP0, Y0, X0
veor TMP1, Y1, X1
veor TMP2, Y2, X2
veor TMP3, Y3, X3
// y = ror(y, 3)
vshr.u\n Y0, TMP0, #3
vshr.u\n Y1, TMP1, #3
vshr.u\n Y2, TMP2, #3
vshr.u\n Y3, TMP3, #3
vsli.u\n Y0, TMP0, #(\n - 3)
vsli.u\n Y1, TMP1, #(\n - 3)
vsli.u\n Y2, TMP2, #(\n - 3)
vsli.u\n Y3, TMP3, #(\n - 3)
// x ^= k
veor X0, ROUND_KEY
veor X1, ROUND_KEY
veor X2, ROUND_KEY
veor X3, ROUND_KEY
// x -= y
vsub.u\n X0, Y0
vsub.u\n X1, Y1
vsub.u\n X2, Y2
vsub.u\n X3, Y3
// x = rol(x, 8);
vtbl.8 X0_L, {X0_L}, ROTATE_TABLE
vtbl.8 X0_H, {X0_H}, ROTATE_TABLE
vtbl.8 X1_L, {X1_L}, ROTATE_TABLE
vtbl.8 X1_H, {X1_H}, ROTATE_TABLE
vtbl.8 X2_L, {X2_L}, ROTATE_TABLE
vtbl.8 X2_H, {X2_H}, ROTATE_TABLE
vtbl.8 X3_L, {X3_L}, ROTATE_TABLE
vtbl.8 X3_H, {X3_H}, ROTATE_TABLE
.endm
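For reference, the two macros above apply the following per-block operations to many blocks at once. This is a minimal scalar sketch in C for the Speck128 case (64-bit lanes), following the step comments in the macros; the function names are hypothetical and it is not the kernel's generic implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left/right by n bits, valid for 0 < n < 64. */
static uint64_t rol64(uint64_t v, unsigned int n)
{
	assert(n > 0 && n < 64);
	return (v << n) | (v >> (64 - n));
}

static uint64_t ror64(uint64_t v, unsigned int n)
{
	assert(n > 0 && n < 64);
	return (v >> n) | (v << (64 - n));
}

/* One Speck128 encryption round on a single block (x, y) with round key k. */
static void speck128_round_scalar(uint64_t *x, uint64_t *y, uint64_t k)
{
	*x = ror64(*x, 8);	/* x = ror(x, 8) */
	*x += *y;		/* x += y */
	*x ^= k;		/* x ^= k */
	*y = rol64(*y, 3);	/* y = rol(y, 3) */
	*y ^= *x;		/* y ^= x */
}

/* The inverse round: the same steps undone in reverse order. */
static void speck128_unround_scalar(uint64_t *x, uint64_t *y, uint64_t k)
{
	*y ^= *x;		/* y ^= x */
	*y = ror64(*y, 3);	/* y = ror(y, 3) */
	*x ^= k;		/* x ^= k */
	*x -= *y;		/* x -= y */
	*x = rol64(*x, 8);	/* x = rol(x, 8) */
}
```

Applying the round and then the unround with the same key returns the original (x, y), which is why decryption can walk the round keys in reverse order.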
.macro _xts128_precrypt_one dst_reg, tweak_buf, tmp
// Load the next source block
vld1.8 {\dst_reg}, [SRC]!
// Save the current tweak in the tweak buffer
vst1.8 {TWEAKV}, [\tweak_buf:128]!
// XOR the next source block with the current tweak
veor \dst_reg, TWEAKV
/*
* Calculate the next tweak by multiplying the current one by x,
* modulo p(x) = x^128 + x^7 + x^2 + x + 1.
*/
vshr.u64 \tmp, TWEAKV, #63
vshl.u64 TWEAKV, #1
veor TWEAKV_H, \tmp\()_L
vtbl.8 \tmp\()_H, {GF128MUL_TABLE}, \tmp\()_H
veor TWEAKV_L, \tmp\()_H
.endm
.macro _xts64_precrypt_two dst_reg, tweak_buf, tmp
// Load the next two source blocks
vld1.8 {\dst_reg}, [SRC]!
// Save the current two tweaks in the tweak buffer
vst1.8 {TWEAKV}, [\tweak_buf:128]!
// XOR the next two source blocks with the current two tweaks
veor \dst_reg, TWEAKV
/*
* Calculate the next two tweaks by multiplying the current ones by x^2,
* modulo p(x) = x^64 + x^4 + x^3 + x + 1.
*/
vshr.u64 \tmp, TWEAKV, #62
vshl.u64 TWEAKV, #2
vtbl.8 \tmp\()_L, {GF64MUL_TABLE}, \tmp\()_L
vtbl.8 \tmp\()_H, {GF64MUL_TABLE}, \tmp\()_H
veor TWEAKV, \tmp
.endm
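The tweak updates performed by the two macros above can be written in scalar C as follows. This is a sketch under stated assumptions: the 128-bit tweak is held as two 64-bit halves (`lo`, `hi`) in the little-endian convention, the function names are hypothetical, and the 4-entry table mirrors the first four bytes of .Lgf64mul_table.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply a 64-bit tweak by x modulo x^64 + x^4 + x^3 + x + 1 (0x1B). */
static uint64_t gf64_mul_x(uint64_t t)
{
	return (t << 1) ^ ((t >> 63) ? 0x1b : 0);
}

/*
 * Multiply by x^2 in one step. The two bits shifted out select the
 * reduction value from a table, as the vtbl lookup does in the macro.
 */
static uint64_t gf64_mul_x2(uint64_t t)
{
	static const uint8_t table[4] = {
		0, 0x1b, (0x1b << 1), (0x1b << 1) ^ 0x1b
	};

	return (t << 2) ^ table[t >> 62];
}

/* Multiply a 128-bit tweak by x modulo x^128 + x^7 + x^2 + x + 1 (0x87). */
static void gf128_mul_x(uint64_t *lo, uint64_t *hi)
{
	uint64_t carry = *hi >> 63;	/* bit shifted out of the top half */

	assert(lo != hi);
	*hi = (*hi << 1) | (*lo >> 63);	/* carry from low half into high */
	*lo = (*lo << 1) ^ (carry ? 0x87 : 0);
}
```

The x^2 variant gives the same result as two single multiplications, which is what lets the Speck64 path advance two packed tweaks per step.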
/*
* _speck_xts_crypt() - Speck-XTS encryption/decryption
*
* Encrypt or decrypt NBYTES bytes of data from the SRC buffer to the DST buffer
* using Speck-XTS, specifically the variant with a block size of '2n' and round
* count given by NROUNDS. The expanded round keys are given in ROUND_KEYS, and
* the current XTS tweak value is given in TWEAK. It's assumed that NBYTES is a
* nonzero multiple of 128.
*/
.macro _speck_xts_crypt n, decrypting
push {r4-r7}
mov r7, sp
/*
* The first four parameters were passed in registers r0-r3. Load the
* additional parameters, which were passed on the stack.
*/
ldr NBYTES, [sp, #16]
ldr TWEAK, [sp, #20]
/*
* If decrypting, modify the ROUND_KEYS parameter to point to the last
* round key rather than the first, since for decryption the round keys
* are used in reverse order.
*/
.if \decrypting
.if \n == 64
add ROUND_KEYS, ROUND_KEYS, NROUNDS, lsl #3
sub ROUND_KEYS, #8
.else
add ROUND_KEYS, ROUND_KEYS, NROUNDS, lsl #2
sub ROUND_KEYS, #4
.endif
.endif
// Load the index vector for vtbl-based 8-bit rotates
.if \decrypting
ldr r12, =.Lrol\n\()_8_table
.else
ldr r12, =.Lror\n\()_8_table
.endif
vld1.8 {ROTATE_TABLE}, [r12:64]
// One-time XTS preparation
/*
* Allocate stack space to store 128 bytes worth of tweaks. For
* performance, this space is aligned to a 16-byte boundary so that we
* can use the load/store instructions that declare 16-byte alignment.
*/
sub sp, #128
bic sp, #0xf
.if \n == 64
// Load first tweak
vld1.8 {TWEAKV}, [TWEAK]
// Load GF(2^128) multiplication table
ldr r12, =.Lgf128mul_table
vld1.8 {GF128MUL_TABLE}, [r12:64]
.else
// Load first tweak
vld1.8 {TWEAKV_L}, [TWEAK]
// Load GF(2^64) multiplication table
ldr r12, =.Lgf64mul_table
vld1.8 {GF64MUL_TABLE}, [r12:64]
// Calculate second tweak, packing it together with the first
vshr.u64 TMP0_L, TWEAKV_L, #63
vtbl.u8 TMP0_L, {GF64MUL_TABLE}, TMP0_L
vshl.u64 TWEAKV_H, TWEAKV_L, #1
veor TWEAKV_H, TMP0_L
.endif
.Lnext_128bytes_\@:
/*
* Load the source blocks into {X,Y}[0-3], XOR them with their XTS tweak
* values, and save the tweaks on the stack for later. Then
* de-interleave the 'x' and 'y' elements of each block, i.e. make it so
* that the X[0-3] registers contain only the second halves of blocks,
* and the Y[0-3] registers contain only the first halves of blocks.
* (Speck uses the order (y, x) rather than the more intuitive (x, y).)
*/
mov r12, sp
.if \n == 64
_xts128_precrypt_one X0, r12, TMP0
_xts128_precrypt_one Y0, r12, TMP0
_xts128_precrypt_one X1, r12, TMP0
_xts128_precrypt_one Y1, r12, TMP0
_xts128_precrypt_one X2, r12, TMP0
_xts128_precrypt_one Y2, r12, TMP0
_xts128_precrypt_one X3, r12, TMP0
_xts128_precrypt_one Y3, r12, TMP0
vswp X0_L, Y0_H
vswp X1_L, Y1_H
vswp X2_L, Y2_H
vswp X3_L, Y3_H
.else
_xts64_precrypt_two X0, r12, TMP0
_xts64_precrypt_two Y0, r12, TMP0
_xts64_precrypt_two X1, r12, TMP0
_xts64_precrypt_two Y1, r12, TMP0
_xts64_precrypt_two X2, r12, TMP0
_xts64_precrypt_two Y2, r12, TMP0
_xts64_precrypt_two X3, r12, TMP0
_xts64_precrypt_two Y3, r12, TMP0
vuzp.32 Y0, X0
vuzp.32 Y1, X1
vuzp.32 Y2, X2
vuzp.32 Y3, X3
.endif
// Do the cipher rounds
mov r12, ROUND_KEYS
mov r6, NROUNDS
.Lnext_round_\@:
.if \decrypting
.if \n == 64
vld1.64 ROUND_KEY_L, [r12]
sub r12, #8
vmov ROUND_KEY_H, ROUND_KEY_L
.else
vld1.32 {ROUND_KEY_L[],ROUND_KEY_H[]}, [r12]
sub r12, #4
.endif
_speck_unround_128bytes \n
.else
.if \n == 64
vld1.64 ROUND_KEY_L, [r12]!
vmov ROUND_KEY_H, ROUND_KEY_L
.else
vld1.32 {ROUND_KEY_L[],ROUND_KEY_H[]}, [r12]!
.endif
_speck_round_128bytes \n
.endif
subs r6, r6, #1
bne .Lnext_round_\@
// Re-interleave the 'x' and 'y' elements of each block
.if \n == 64
vswp X0_L, Y0_H
vswp X1_L, Y1_H
vswp X2_L, Y2_H
vswp X3_L, Y3_H
.else
vzip.32 Y0, X0
vzip.32 Y1, X1
vzip.32 Y2, X2
vzip.32 Y3, X3
.endif
// XOR the encrypted/decrypted blocks with the tweaks we saved earlier
mov r12, sp
vld1.8 {TMP0, TMP1}, [r12:128]!
vld1.8 {TMP2, TMP3}, [r12:128]!
veor X0, TMP0
veor Y0, TMP1
veor X1, TMP2
veor Y1, TMP3
vld1.8 {TMP0, TMP1}, [r12:128]!
vld1.8 {TMP2, TMP3}, [r12:128]!
veor X2, TMP0
veor Y2, TMP1
veor X3, TMP2
veor Y3, TMP3
// Store the ciphertext in the destination buffer
vst1.8 {X0, Y0}, [DST]!
vst1.8 {X1, Y1}, [DST]!
vst1.8 {X2, Y2}, [DST]!
vst1.8 {X3, Y3}, [DST]!
// Continue if there are more 128-byte chunks remaining, else return
subs NBYTES, #128
bne .Lnext_128bytes_\@
// Store the next tweak
.if \n == 64
vst1.8 {TWEAKV}, [TWEAK]
.else
vst1.8 {TWEAKV_L}, [TWEAK]
.endif
mov sp, r7
pop {r4-r7}
bx lr
.endm
ENTRY(speck128_xts_encrypt_neon)
_speck_xts_crypt n=64, decrypting=0
ENDPROC(speck128_xts_encrypt_neon)
ENTRY(speck128_xts_decrypt_neon)
_speck_xts_crypt n=64, decrypting=1
ENDPROC(speck128_xts_decrypt_neon)
ENTRY(speck64_xts_encrypt_neon)
_speck_xts_crypt n=32, decrypting=0
ENDPROC(speck64_xts_encrypt_neon)
ENTRY(speck64_xts_decrypt_neon)
_speck_xts_crypt n=32, decrypting=1
ENDPROC(speck64_xts_decrypt_neon)
// SPDX-License-Identifier: GPL-2.0
/*
* NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
*
* Copyright (c) 2018 Google, Inc
*
* Note: the NIST recommendation for XTS only specifies a 128-bit block size,
* but a 64-bit version (needed for Speck64) is fairly straightforward; the math
* is just done in GF(2^64) instead of GF(2^128), with the reducing polynomial
* x^64 + x^4 + x^3 + x + 1 from the original XEX paper (Rogaway, 2004:
* "Efficient Instantiations of Tweakable Blockciphers and Refinements to Modes
* OCB and PMAC"), represented as 0x1B.
*/
#include <asm/hwcap.h>
#include <asm/neon.h>
#include <asm/simd.h>
#include <crypto/algapi.h>
#include <crypto/gf128mul.h>
#include <crypto/internal/skcipher.h>
#include <crypto/speck.h>
#include <crypto/xts.h>
#include <linux/kernel.h>
#include <linux/module.h>
/* The assembly functions only handle multiples of 128 bytes */
#define SPECK_NEON_CHUNK_SIZE 128
/* Speck128 */
struct speck128_xts_tfm_ctx {
struct speck128_tfm_ctx main_key;
struct speck128_tfm_ctx tweak_key;
};
asmlinkage void speck128_xts_encrypt_neon(const u64 *round_keys, int nrounds,
void *dst, const void *src,
unsigned int nbytes, void *tweak);
asmlinkage void speck128_xts_decrypt_neon(const u64 *round_keys, int nrounds,
void *dst, const void *src,
unsigned int nbytes, void *tweak);
typedef void (*speck128_crypt_one_t)(const struct speck128_tfm_ctx *,
u8 *, const u8 *);
typedef void (*speck128_xts_crypt_many_t)(const u64 *, int, void *,
const void *, unsigned int, void *);
static __always_inline int
__speck128_xts_crypt(struct skcipher_request *req,
speck128_crypt_one_t crypt_one,
speck128_xts_crypt_many_t crypt_many)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
const struct speck128_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
struct skcipher_walk walk;
le128 tweak;
int err;
err = skcipher_walk_virt(&walk, req, true);
crypto_speck128_encrypt(&ctx->tweak_key, (u8 *)&tweak, walk.iv);
while (walk.nbytes > 0) {
unsigned int nbytes = walk.nbytes;
u8 *dst = walk.dst.virt.addr;
const u8 *src = walk.src.virt.addr;
if (nbytes >= SPECK_NEON_CHUNK_SIZE && may_use_simd()) {
unsigned int count;
count = round_down(nbytes, SPECK_NEON_CHUNK_SIZE);
kernel_neon_begin();
(*crypt_many)(ctx->main_key.round_keys,
ctx->main_key.nrounds,
dst, src, count, &tweak);
kernel_neon_end();
dst += count;
src += count;
nbytes -= count;
}
/* Handle any remainder with generic code */
while (nbytes >= sizeof(tweak)) {
le128_xor((le128 *)dst, (const le128 *)src, &tweak);
(*crypt_one)(&ctx->main_key, dst, dst);
le128_xor((le128 *)dst, (const le128 *)dst, &tweak);
gf128mul_x_ble(&tweak, &tweak);
dst += sizeof(tweak);
src += sizeof(tweak);
nbytes -= sizeof(tweak);
}
err = skcipher_walk_done(&walk, nbytes);
}
return err;
}
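The way each walk step is divided between the NEON routine and the generic loop can be sketched as plain arithmetic. This is an illustration only: `walk_split` and `split_walk_step` are hypothetical names, `simd_usable` stands in for the result of may_use_simd(), and the 16-byte block size corresponds to sizeof(tweak) in the function above.

```c
#include <assert.h>

#define SPECK_NEON_CHUNK_SIZE 128u	/* the assembly handles multiples of this */
#define SPECK128_BLOCK_SIZE 16u		/* sizeof(le128) */

/* How one walk step of nbytes is divided up. */
struct walk_split {
	unsigned int neon_bytes;	/* handled by the NEON routine */
	unsigned int generic_bytes;	/* handled block by block in C */
	unsigned int leftover;		/* passed back to skcipher_walk_done() */
};

static struct walk_split split_walk_step(unsigned int nbytes, int simd_usable)
{
	struct walk_split s = { 0, 0, 0 };

	assert(SPECK_NEON_CHUNK_SIZE % SPECK128_BLOCK_SIZE == 0);
	if (nbytes >= SPECK_NEON_CHUNK_SIZE && simd_usable) {
		/* Same effect as round_down(nbytes, SPECK_NEON_CHUNK_SIZE). */
		s.neon_bytes = nbytes - (nbytes % SPECK_NEON_CHUNK_SIZE);
		nbytes -= s.neon_bytes;
	}
	/* Whole blocks that remain go through the generic loop. */
	s.generic_bytes = nbytes - (nbytes % SPECK128_BLOCK_SIZE);
	s.leftover = nbytes - s.generic_bytes;
	return s;
}
```

For example, a 300-byte step with SIMD available splits into 256 bytes for NEON, 32 bytes for the generic loop and 12 bytes left over; without SIMD, 288 bytes go through the generic loop.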
static int speck128_xts_encrypt(struct skcipher_request *req)
{
return __speck128_xts_crypt(req, crypto_speck128_encrypt,
speck128_xts_encrypt_neon);
}
static int speck128_xts_decrypt(struct skcipher_request *req)
{
return __speck128_xts_crypt(req, crypto_speck128_decrypt,
speck128_xts_decrypt_neon);
}
static int speck128_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
unsigned int keylen)
{
struct speck128_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
int err;
err = xts_verify_key(tfm, key, keylen);
if (err)
return err;
keylen /= 2;
err = crypto_speck128_setkey(&ctx->main_key, key, keylen);
if (err)
return err;
return crypto_speck128_setkey(&ctx->tweak_key, key + keylen, keylen);
}
/* Speck64 */
struct speck64_xts_tfm_ctx {
struct speck64_tfm_ctx main_key;
struct speck64_tfm_ctx tweak_key;
};
asmlinkage void speck64_xts_encrypt_neon(const u32 *round_keys, int nrounds,
void *dst, const void *src,
unsigned int nbytes, void *tweak);
asmlinkage void speck64_xts_decrypt_neon(const u32 *round_keys, int nrounds,
void *dst, const void *src,
unsigned int nbytes, void *tweak);
typedef void (*speck64_crypt_one_t)(const struct speck64_tfm_ctx *,
u8 *, const u8 *);
typedef void (*speck64_xts_crypt_many_t)(const u32 *, int, void *,
const void *, unsigned int, void *);
static __always_inline int
__speck64_xts_crypt(struct skcipher_request *req, speck64_crypt_one_t crypt_one,
speck64_xts_crypt_many_t crypt_many)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
const struct speck64_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
struct skcipher_walk walk;
__le64 tweak;
int err;
err = skcipher_walk_virt(&walk, req, true);
crypto_speck64_encrypt(&ctx->tweak_key, (u8 *)&tweak, walk.iv);
while (walk.nbytes > 0) {
unsigned int nbytes = walk.nbytes;
u8 *dst = walk.dst.virt.addr;
const u8 *src = walk.src.virt.addr;
if (nbytes >= SPECK_NEON_CHUNK_SIZE && may_use_simd()) {
unsigned int count;
count = round_down(nbytes, SPECK_NEON_CHUNK_SIZE);
kernel_neon_begin();
(*crypt_many)(ctx->main_key.round_keys,
ctx->main_key.nrounds,
dst, src, count, &tweak);
kernel_neon_end();
dst += count;
src += count;
nbytes -= count;
}
/* Handle any remainder with generic code */
while (nbytes >= sizeof(tweak)) {
*(__le64 *)dst = *(__le64 *)src ^ tweak;
(*crypt_one)(&ctx->main_key, dst, dst);
*(__le64 *)dst ^= tweak;
tweak = cpu_to_le64((le64_to_cpu(tweak) << 1) ^
((tweak & cpu_to_le64(1ULL << 63)) ?
0x1B : 0));
dst += sizeof(tweak);
src += sizeof(tweak);
nbytes -= sizeof(tweak);
}
err = skcipher_walk_done(&walk, nbytes);
}
return err;
}
static int speck64_xts_encrypt(struct skcipher_request *req)
{
return __speck64_xts_crypt(req, crypto_speck64_encrypt,
speck64_xts_encrypt_neon);
}
static int speck64_xts_decrypt(struct skcipher_request *req)
{
return __speck64_xts_crypt(req, crypto_speck64_decrypt,
speck64_xts_decrypt_neon);
}
static int speck64_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
unsigned int keylen)
{
struct speck64_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
int err;
err = xts_verify_key(tfm, key, keylen);
if (err)
return err;
keylen /= 2;
err = crypto_speck64_setkey(&ctx->main_key, key, keylen);
if (err)
return err;
return crypto_speck64_setkey(&ctx->tweak_key, key + keylen, keylen);
}
static struct skcipher_alg speck_algs[] = {
{
.base.cra_name = "xts(speck128)",
.base.cra_driver_name = "xts-speck128-neon",
.base.cra_priority = 300,
.base.cra_blocksize = SPECK128_BLOCK_SIZE,
.base.cra_ctxsize = sizeof(struct speck128_xts_tfm_ctx),
.base.cra_alignmask = 7,
.base.cra_module = THIS_MODULE,
.min_keysize = 2 * SPECK128_128_KEY_SIZE,
.max_keysize = 2 * SPECK128_256_KEY_SIZE,
.ivsize = SPECK128_BLOCK_SIZE,
.walksize = SPECK_NEON_CHUNK_SIZE,
.setkey = speck128_xts_setkey,
.encrypt = speck128_xts_encrypt,
.decrypt = speck128_xts_decrypt,
}, {
.base.cra_name = "xts(speck64)",
.base.cra_driver_name = "xts-speck64-neon",
.base.cra_priority = 300,
.base.cra_blocksize = SPECK64_BLOCK_SIZE,
.base.cra_ctxsize = sizeof(struct speck64_xts_tfm_ctx),
.base.cra_alignmask = 7,
.base.cra_module = THIS_MODULE,
.min_keysize = 2 * SPECK64_96_KEY_SIZE,
.max_keysize = 2 * SPECK64_128_KEY_SIZE,
.ivsize = SPECK64_BLOCK_SIZE,
.walksize = SPECK_NEON_CHUNK_SIZE,
.setkey = speck64_xts_setkey,
.encrypt = speck64_xts_encrypt,
.decrypt = speck64_xts_decrypt,
}
};
static int __init speck_neon_module_init(void)
{
if (!(elf_hwcap & HWCAP_NEON))
return -ENODEV;
return crypto_register_skciphers(speck_algs, ARRAY_SIZE(speck_algs));
}
static void __exit speck_neon_module_exit(void)
{
crypto_unregister_skciphers(speck_algs, ARRAY_SIZE(speck_algs));
}
module_init(speck_neon_module_init);
module_exit(speck_neon_module_exit);
MODULE_DESCRIPTION("Speck block cipher (NEON-accelerated)");
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>");
MODULE_ALIAS_CRYPTO("xts(speck128)");
MODULE_ALIAS_CRYPTO("xts-speck128-neon");
MODULE_ALIAS_CRYPTO("xts(speck64)");
MODULE_ALIAS_CRYPTO("xts-speck64-neon");
......@@ -113,4 +113,10 @@ config CRYPTO_AES_ARM64_BS
select CRYPTO_AES_ARM64
select CRYPTO_SIMD
config CRYPTO_SPECK_NEON
tristate "NEON accelerated Speck cipher algorithms"
depends on KERNEL_MODE_NEON
select CRYPTO_BLKCIPHER
select CRYPTO_SPECK
endif
......@@ -53,20 +53,21 @@ sha512-arm64-y := sha512-glue.o sha512-core.o
obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
obj-$(CONFIG_CRYPTO_SPECK_NEON) += speck-neon.o
speck-neon-y := speck-neon-core.o speck-neon-glue.o
obj-$(CONFIG_CRYPTO_AES_ARM64) += aes-arm64.o
aes-arm64-y := aes-cipher-core.o aes-cipher-glue.o
obj-$(CONFIG_CRYPTO_AES_ARM64_BS) += aes-neon-bs.o
aes-neon-bs-y := aes-neonbs-core.o aes-neonbs-glue.o
AFLAGS_aes-ce.o := -DINTERLEAVE=4
AFLAGS_aes-neon.o := -DINTERLEAVE=4
CFLAGS_aes-glue-ce.o := -DUSE_V8_CRYPTO_EXTENSIONS
$(obj)/aes-glue-%.o: $(src)/aes-glue.c FORCE
$(call if_changed_rule,cc_o_c)
ifdef REGENERATE_ARM64_CRYPTO
quiet_cmd_perlasm = PERLASM $@
cmd_perlasm = $(PERL) $(<) void $(@)
......@@ -75,5 +76,6 @@ $(src)/sha256-core.S_shipped: $(src)/sha512-armv8.pl
$(src)/sha512-core.S_shipped: $(src)/sha512-armv8.pl
$(call cmd,perlasm)
endif
.PRECIOUS: $(obj)/sha256-core.S $(obj)/sha512-core.S
......@@ -107,11 +107,13 @@ static int ccm_init_mac(struct aead_request *req, u8 maciv[], u32 msglen)
}
static void ccm_update_mac(struct crypto_aes_ctx *key, u8 mac[], u8 const in[],
u32 abytes, u32 *macp, bool use_neon)
u32 abytes, u32 *macp)
{
if (likely(use_neon)) {
if (may_use_simd()) {
kernel_neon_begin();
ce_aes_ccm_auth_data(mac, in, abytes, macp, key->key_enc,
num_rounds(key));
kernel_neon_end();
} else {
if (*macp > 0 && *macp < AES_BLOCK_SIZE) {
int added = min(abytes, AES_BLOCK_SIZE - *macp);
......@@ -143,8 +145,7 @@ static void ccm_update_mac(struct crypto_aes_ctx *key, u8 mac[], u8 const in[],
}
}
static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[],
bool use_neon)
static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[])
{
struct crypto_aead *aead = crypto_aead_reqtfm(req);
struct crypto_aes_ctx *ctx = crypto_aead_ctx(aead);
......@@ -163,7 +164,7 @@ static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[],
ltag.len = 6;
}
ccm_update_mac(ctx, mac, (u8 *)&ltag, ltag.len, &macp, use_neon);
ccm_update_mac(ctx, mac, (u8 *)&ltag, ltag.len, &macp);
scatterwalk_start(&walk, req->src);
do {
......@@ -175,7 +176,7 @@ static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[],
n = scatterwalk_clamp(&walk, len);
}
p = scatterwalk_map(&walk);
ccm_update_mac(ctx, mac, p, n, &macp, use_neon);
ccm_update_mac(ctx, mac, p, n, &macp);
len -= n;
scatterwalk_unmap(p);
......@@ -242,43 +243,42 @@ static int ccm_encrypt(struct aead_request *req)
u8 __aligned(8) mac[AES_BLOCK_SIZE];
u8 buf[AES_BLOCK_SIZE];
u32 len = req->cryptlen;
bool use_neon = may_use_simd();
int err;
err = ccm_init_mac(req, mac, len);
if (err)
return err;
if (likely(use_neon))
kernel_neon_begin();
if (req->assoclen)
ccm_calculate_auth_mac(req, mac, use_neon);
ccm_calculate_auth_mac(req, mac);
/* preserve the original iv for the final round */
memcpy(buf, req->iv, AES_BLOCK_SIZE);
err = skcipher_walk_aead_encrypt(&walk, req, true);
if (likely(use_neon)) {
if (may_use_simd()) {
while (walk.nbytes) {
u32 tail = walk.nbytes % AES_BLOCK_SIZE;
if (walk.nbytes == walk.total)
tail = 0;
kernel_neon_begin();
ce_aes_ccm_encrypt(walk.dst.virt.addr,
walk.src.virt.addr,
walk.nbytes - tail, ctx->key_enc,
num_rounds(ctx), mac, walk.iv);
kernel_neon_end();
err = skcipher_walk_done(&walk, tail);
}
if (!err)
if (!err) {
kernel_neon_begin();
ce_aes_ccm_final(mac, buf, ctx->key_enc,
num_rounds(ctx));
kernel_neon_end();
}
} else {
err = ccm_crypt_fallback(&walk, mac, buf, ctx, true);
}
......@@ -301,43 +301,42 @@ static int ccm_decrypt(struct aead_request *req)
u8 __aligned(8) mac[AES_BLOCK_SIZE];
u8 buf[AES_BLOCK_SIZE];
u32 len = req->cryptlen - authsize;
bool use_neon = may_use_simd();
int err;
err = ccm_init_mac(req, mac, len);
if (err)
return err;
if (likely(use_neon))
kernel_neon_begin();
if (req->assoclen)
ccm_calculate_auth_mac(req, mac, use_neon);
ccm_calculate_auth_mac(req, mac);
/* preserve the original iv for the final round */
memcpy(buf, req->iv, AES_BLOCK_SIZE);
err = skcipher_walk_aead_decrypt(&walk, req, true);
if (likely(use_neon)) {
if (may_use_simd()) {
while (walk.nbytes) {
u32 tail = walk.nbytes % AES_BLOCK_SIZE;
if (walk.nbytes == walk.total)
tail = 0;
kernel_neon_begin();
ce_aes_ccm_decrypt(walk.dst.virt.addr,
walk.src.virt.addr,
walk.nbytes - tail, ctx->key_enc,
num_rounds(ctx), mac, walk.iv);
kernel_neon_end();
err = skcipher_walk_done(&walk, tail);
}
if (!err)
if (!err) {
kernel_neon_begin();
ce_aes_ccm_final(mac, buf, ctx->key_enc,
num_rounds(ctx));
kernel_neon_end();
}
} else {
err = ccm_crypt_fallback(&walk, mac, buf, ctx, false);
}
......
......@@ -64,17 +64,17 @@ MODULE_LICENSE("GPL v2");
/* defined in aes-modes.S */
asmlinkage void aes_ecb_encrypt(u8 out[], u8 const in[], u8 const rk[],
int rounds, int blocks, int first);
int rounds, int blocks);
asmlinkage void aes_ecb_decrypt(u8 out[], u8 const in[], u8 const rk[],
int rounds, int blocks, int first);
int rounds, int blocks);
asmlinkage void aes_cbc_encrypt(u8 out[], u8 const in[], u8 const rk[],
int rounds, int blocks, u8 iv[], int first);
int rounds, int blocks, u8 iv[]);
asmlinkage void aes_cbc_decrypt(u8 out[], u8 const in[], u8 const rk[],
int rounds, int blocks, u8 iv[], int first);
int rounds, int blocks, u8 iv[]);
asmlinkage void aes_ctr_encrypt(u8 out[], u8 const in[], u8 const rk[],
int rounds, int blocks, u8 ctr[], int first);
int rounds, int blocks, u8 ctr[]);
asmlinkage void aes_xts_encrypt(u8 out[], u8 const in[], u8 const rk1[],
int rounds, int blocks, u8 const rk2[], u8 iv[],
......@@ -133,19 +133,19 @@ static int ecb_encrypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
int err, first, rounds = 6 + ctx->key_length / 4;
int err, rounds = 6 + ctx->key_length / 4;
struct skcipher_walk walk;
unsigned int blocks;
err = skcipher_walk_virt(&walk, req, true);
err = skcipher_walk_virt(&walk, req, false);
while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) {
kernel_neon_begin();
for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) {
aes_ecb_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
(u8 *)ctx->key_enc, rounds, blocks, first);
(u8 *)ctx->key_enc, rounds, blocks);
kernel_neon_end();
err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
}
kernel_neon_end();
return err;
}
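The `rounds = 6 + ctx->key_length / 4` expression used throughout these walkers maps the AES key size in bytes to the standard round count. A minimal user-space sketch of that arithmetic follows; `aes_num_rounds` is a hypothetical helper name for illustration, not a function taken from this patch.

```c
#include <stddef.h>

/*
 * Hypothetical helper mirroring "6 + ctx->key_length / 4".
 * key_length is the AES key size in bytes (16, 24 or 32).
 */
static int aes_num_rounds(unsigned int key_length)
{
	return 6 + key_length / 4;
}
```

For the three valid AES key sizes this gives 10, 12 and 14 rounds respectively.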
......@@ -153,19 +153,19 @@ static int ecb_decrypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
int err, first, rounds = 6 + ctx->key_length / 4;
int err, rounds = 6 + ctx->key_length / 4;
struct skcipher_walk walk;
unsigned int blocks;
err = skcipher_walk_virt(&walk, req, true);
err = skcipher_walk_virt(&walk, req, false);
while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) {
kernel_neon_begin();
for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) {
aes_ecb_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
(u8 *)ctx->key_dec, rounds, blocks, first);
(u8 *)ctx->key_dec, rounds, blocks);
kernel_neon_end();
err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
}
kernel_neon_end();
return err;
}
......@@ -173,20 +173,19 @@ static int cbc_encrypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
int err, first, rounds = 6 + ctx->key_length / 4;
int err, rounds = 6 + ctx->key_length / 4;
struct skcipher_walk walk;
unsigned int blocks;
err = skcipher_walk_virt(&walk, req, true);
err = skcipher_walk_virt(&walk, req, false);
while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) {
kernel_neon_begin();
for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) {
aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
(u8 *)ctx->key_enc, rounds, blocks, walk.iv,
first);
(u8 *)ctx->key_enc, rounds, blocks, walk.iv);
kernel_neon_end();
err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
}
kernel_neon_end();
return err;
}
......@@ -194,20 +193,19 @@ static int cbc_decrypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
int err, first, rounds = 6 + ctx->key_length / 4;
int err, rounds = 6 + ctx->key_length / 4;
struct skcipher_walk walk;
unsigned int blocks;
err = skcipher_walk_virt(&walk, req, true);
err = skcipher_walk_virt(&walk, req, false);
while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) {
kernel_neon_begin();
for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) {
aes_cbc_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
(u8 *)ctx->key_dec, rounds, blocks, walk.iv,
first);
(u8 *)ctx->key_dec, rounds, blocks, walk.iv);
kernel_neon_end();
err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
}
kernel_neon_end();
return err;
}
......@@ -215,20 +213,18 @@ static int ctr_encrypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
int err, first, rounds = 6 + ctx->key_length / 4;
int err, rounds = 6 + ctx->key_length / 4;
struct skcipher_walk walk;
int blocks;
err = skcipher_walk_virt(&walk, req, true);
err = skcipher_walk_virt(&walk, req, false);
first = 1;
kernel_neon_begin();
while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) {
kernel_neon_begin();
aes_ctr_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
(u8 *)ctx->key_enc, rounds, blocks, walk.iv,
first);
(u8 *)ctx->key_enc, rounds, blocks, walk.iv);
err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
first = 0;
kernel_neon_end();
}
if (walk.nbytes) {
u8 __aligned(8) tail[AES_BLOCK_SIZE];
......@@ -241,12 +237,13 @@ static int ctr_encrypt(struct skcipher_request *req)
*/
blocks = -1;
kernel_neon_begin();
aes_ctr_encrypt(tail, NULL, (u8 *)ctx->key_enc, rounds,
blocks, walk.iv, first);
blocks, walk.iv);
kernel_neon_end();
crypto_xor_cpy(tdst, tsrc, tail, nbytes);
err = skcipher_walk_done(&walk, 0);
}
kernel_neon_end();
return err;
}
......@@ -270,16 +267,16 @@ static int xts_encrypt(struct skcipher_request *req)
struct skcipher_walk walk;
unsigned int blocks;
err = skcipher_walk_virt(&walk, req, true);
err = skcipher_walk_virt(&walk, req, false);
kernel_neon_begin();
for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) {
kernel_neon_begin();
aes_xts_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
(u8 *)ctx->key1.key_enc, rounds, blocks,
(u8 *)ctx->key2.key_enc, walk.iv, first);
kernel_neon_end();
err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
}
kernel_neon_end();
return err;
}
......@@ -292,16 +289,16 @@ static int xts_decrypt(struct skcipher_request *req)
struct skcipher_walk walk;
unsigned int blocks;
err = skcipher_walk_virt(&walk, req, true);
err = skcipher_walk_virt(&walk, req, false);
kernel_neon_begin();
for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) {
kernel_neon_begin();
aes_xts_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
(u8 *)ctx->key1.key_dec, rounds, blocks,
(u8 *)ctx->key2.key_enc, walk.iv, first);
kernel_neon_end();
err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
}
kernel_neon_end();
return err;
}
......@@ -425,7 +422,7 @@ static int cmac_setkey(struct crypto_shash *tfm, const u8 *in_key,
/* encrypt the zero vector */
kernel_neon_begin();
aes_ecb_encrypt(ctx->consts, (u8[AES_BLOCK_SIZE]){}, rk, rounds, 1, 1);
aes_ecb_encrypt(ctx->consts, (u8[AES_BLOCK_SIZE]){}, rk, rounds, 1);
kernel_neon_end();
cmac_gf128_mul_by_x(consts, consts);
......@@ -454,8 +451,8 @@ static int xcbc_setkey(struct crypto_shash *tfm, const u8 *in_key,
return err;
kernel_neon_begin();
aes_ecb_encrypt(key, ks[0], rk, rounds, 1, 1);
aes_ecb_encrypt(ctx->consts, ks[1], rk, rounds, 2, 0);
aes_ecb_encrypt(key, ks[0], rk, rounds, 1);
aes_ecb_encrypt(ctx->consts, ks[1], rk, rounds, 2);
kernel_neon_end();
return cbcmac_setkey(tfm, key, sizeof(key));
......
......@@ -13,127 +13,39 @@
.text
.align 4
/*
* There are several ways to instantiate this code:
* - no interleave, all inline
* - 2-way interleave, 2x calls out of line (-DINTERLEAVE=2)
* - 2-way interleave, all inline (-DINTERLEAVE=2 -DINTERLEAVE_INLINE)
* - 4-way interleave, 4x calls out of line (-DINTERLEAVE=4)
* - 4-way interleave, all inline (-DINTERLEAVE=4 -DINTERLEAVE_INLINE)
*
* Macros imported by this code:
* - enc_prepare - setup NEON registers for encryption
* - dec_prepare - setup NEON registers for decryption
* - enc_switch_key - change to new key after having prepared for encryption
* - encrypt_block - encrypt a single block
* - decrypt_block - decrypt a single block
* - encrypt_block2x - encrypt 2 blocks in parallel (if INTERLEAVE == 2)
* - decrypt_block2x - decrypt 2 blocks in parallel (if INTERLEAVE == 2)
* - encrypt_block4x - encrypt 4 blocks in parallel (if INTERLEAVE == 4)
* - decrypt_block4x - decrypt 4 blocks in parallel (if INTERLEAVE == 4)
*/
#if defined(INTERLEAVE) && !defined(INTERLEAVE_INLINE)
#define FRAME_PUSH stp x29, x30, [sp,#-16]! ; mov x29, sp
#define FRAME_POP ldp x29, x30, [sp],#16
#if INTERLEAVE == 2
aes_encrypt_block2x:
encrypt_block2x v0, v1, w3, x2, x6, w7
ret
ENDPROC(aes_encrypt_block2x)
aes_decrypt_block2x:
decrypt_block2x v0, v1, w3, x2, x6, w7
ret
ENDPROC(aes_decrypt_block2x)
#elif INTERLEAVE == 4
aes_encrypt_block4x:
encrypt_block4x v0, v1, v2, v3, w3, x2, x6, w7
encrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7
ret
ENDPROC(aes_encrypt_block4x)
aes_decrypt_block4x:
decrypt_block4x v0, v1, v2, v3, w3, x2, x6, w7
decrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7
ret
ENDPROC(aes_decrypt_block4x)
#else
#error INTERLEAVE should equal 2 or 4
#endif
.macro do_encrypt_block2x
bl aes_encrypt_block2x
.endm
.macro do_decrypt_block2x
bl aes_decrypt_block2x
.endm
.macro do_encrypt_block4x
bl aes_encrypt_block4x
.endm
.macro do_decrypt_block4x
bl aes_decrypt_block4x
.endm
#else
#define FRAME_PUSH
#define FRAME_POP
.macro do_encrypt_block2x
encrypt_block2x v0, v1, w3, x2, x6, w7
.endm
.macro do_decrypt_block2x
decrypt_block2x v0, v1, w3, x2, x6, w7
.endm
.macro do_encrypt_block4x
encrypt_block4x v0, v1, v2, v3, w3, x2, x6, w7
.endm
.macro do_decrypt_block4x
decrypt_block4x v0, v1, v2, v3, w3, x2, x6, w7
.endm
#endif
/*
* aes_ecb_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds,
* int blocks, int first)
* int blocks)
* aes_ecb_decrypt(u8 out[], u8 const in[], u8 const rk[], int rounds,
* int blocks, int first)
* int blocks)
*/
AES_ENTRY(aes_ecb_encrypt)
FRAME_PUSH
cbz w5, .LecbencloopNx
stp x29, x30, [sp, #-16]!
mov x29, sp
enc_prepare w3, x2, x5
.LecbencloopNx:
#if INTERLEAVE >= 2
subs w4, w4, #INTERLEAVE
subs w4, w4, #4
bmi .Lecbenc1x
#if INTERLEAVE == 2
ld1 {v0.16b-v1.16b}, [x1], #32 /* get 2 pt blocks */
do_encrypt_block2x
st1 {v0.16b-v1.16b}, [x0], #32
#else
ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */
do_encrypt_block4x
bl aes_encrypt_block4x
st1 {v0.16b-v3.16b}, [x0], #64
#endif
b .LecbencloopNx
.Lecbenc1x:
adds w4, w4, #INTERLEAVE
adds w4, w4, #4
beq .Lecbencout
#endif
.Lecbencloop:
ld1 {v0.16b}, [x1], #16 /* get next pt block */
encrypt_block v0, w3, x2, x5, w6
......@@ -141,35 +53,27 @@ AES_ENTRY(aes_ecb_encrypt)
subs w4, w4, #1
bne .Lecbencloop
.Lecbencout:
FRAME_POP
ldp x29, x30, [sp], #16
ret
AES_ENDPROC(aes_ecb_encrypt)
AES_ENTRY(aes_ecb_decrypt)
FRAME_PUSH
cbz w5, .LecbdecloopNx
stp x29, x30, [sp, #-16]!
mov x29, sp
dec_prepare w3, x2, x5
.LecbdecloopNx:
#if INTERLEAVE >= 2
subs w4, w4, #INTERLEAVE
subs w4, w4, #4
bmi .Lecbdec1x
#if INTERLEAVE == 2
ld1 {v0.16b-v1.16b}, [x1], #32 /* get 2 ct blocks */
do_decrypt_block2x
st1 {v0.16b-v1.16b}, [x0], #32
#else
ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */
do_decrypt_block4x
bl aes_decrypt_block4x
st1 {v0.16b-v3.16b}, [x0], #64
#endif
b .LecbdecloopNx
.Lecbdec1x:
adds w4, w4, #INTERLEAVE
adds w4, w4, #4
beq .Lecbdecout
#endif
.Lecbdecloop:
ld1 {v0.16b}, [x1], #16 /* get next ct block */
decrypt_block v0, w3, x2, x5, w6
......@@ -177,62 +81,68 @@ AES_ENTRY(aes_ecb_decrypt)
subs w4, w4, #1
bne .Lecbdecloop
.Lecbdecout:
FRAME_POP
ldp x29, x30, [sp], #16
ret
AES_ENDPROC(aes_ecb_decrypt)
/*
* aes_cbc_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds,
* int blocks, u8 iv[], int first)
* int blocks, u8 iv[])
* aes_cbc_decrypt(u8 out[], u8 const in[], u8 const rk[], int rounds,
* int blocks, u8 iv[], int first)
* int blocks, u8 iv[])
*/
AES_ENTRY(aes_cbc_encrypt)
cbz w6, .Lcbcencloop
ld1 {v0.16b}, [x5] /* get iv */
ld1 {v4.16b}, [x5] /* get iv */
enc_prepare w3, x2, x6
.Lcbcencloop:
ld1 {v1.16b}, [x1], #16 /* get next pt block */
eor v0.16b, v0.16b, v1.16b /* ..and xor with iv */
.Lcbcencloop4x:
subs w4, w4, #4
bmi .Lcbcenc1x
ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */
eor v0.16b, v0.16b, v4.16b /* ..and xor with iv */
encrypt_block v0, w3, x2, x6, w7
st1 {v0.16b}, [x0], #16
eor v1.16b, v1.16b, v0.16b
encrypt_block v1, w3, x2, x6, w7
eor v2.16b, v2.16b, v1.16b
encrypt_block v2, w3, x2, x6, w7
eor v3.16b, v3.16b, v2.16b
encrypt_block v3, w3, x2, x6, w7
st1 {v0.16b-v3.16b}, [x0], #64
mov v4.16b, v3.16b
b .Lcbcencloop4x
.Lcbcenc1x:
adds w4, w4, #4
beq .Lcbcencout
.Lcbcencloop:
ld1 {v0.16b}, [x1], #16 /* get next pt block */
eor v4.16b, v4.16b, v0.16b /* ..and xor with iv */
encrypt_block v4, w3, x2, x6, w7
st1 {v4.16b}, [x0], #16
subs w4, w4, #1
bne .Lcbcencloop
st1 {v0.16b}, [x5] /* return iv */
.Lcbcencout:
st1 {v4.16b}, [x5] /* return iv */
ret
AES_ENDPROC(aes_cbc_encrypt)
AES_ENTRY(aes_cbc_decrypt)
FRAME_PUSH
cbz w6, .LcbcdecloopNx
stp x29, x30, [sp, #-16]!
mov x29, sp
ld1 {v7.16b}, [x5] /* get iv */
dec_prepare w3, x2, x6
.LcbcdecloopNx:
#if INTERLEAVE >= 2
subs w4, w4, #INTERLEAVE
subs w4, w4, #4
bmi .Lcbcdec1x
#if INTERLEAVE == 2
ld1 {v0.16b-v1.16b}, [x1], #32 /* get 2 ct blocks */
mov v2.16b, v0.16b
mov v3.16b, v1.16b
do_decrypt_block2x
eor v0.16b, v0.16b, v7.16b
eor v1.16b, v1.16b, v2.16b
mov v7.16b, v3.16b
st1 {v0.16b-v1.16b}, [x0], #32
#else
ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */
mov v4.16b, v0.16b
mov v5.16b, v1.16b
mov v6.16b, v2.16b
do_decrypt_block4x
bl aes_decrypt_block4x
sub x1, x1, #16
eor v0.16b, v0.16b, v7.16b
eor v1.16b, v1.16b, v4.16b
......@@ -240,12 +150,10 @@ AES_ENTRY(aes_cbc_decrypt)
eor v2.16b, v2.16b, v5.16b
eor v3.16b, v3.16b, v6.16b
st1 {v0.16b-v3.16b}, [x0], #64
#endif
b .LcbcdecloopNx
.Lcbcdec1x:
adds w4, w4, #INTERLEAVE
adds w4, w4, #4
beq .Lcbcdecout
#endif
.Lcbcdecloop:
ld1 {v1.16b}, [x1], #16 /* get next ct block */
mov v0.16b, v1.16b /* ...and copy to v0 */
......@@ -256,49 +164,33 @@ AES_ENTRY(aes_cbc_decrypt)
subs w4, w4, #1
bne .Lcbcdecloop
.Lcbcdecout:
FRAME_POP
st1 {v7.16b}, [x5] /* return iv */
ldp x29, x30, [sp], #16
ret
AES_ENDPROC(aes_cbc_decrypt)
/*
* aes_ctr_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds,
* int blocks, u8 ctr[], int first)
* int blocks, u8 ctr[])
*/
AES_ENTRY(aes_ctr_encrypt)
FRAME_PUSH
cbz w6, .Lctrnotfirst /* 1st time around? */
stp x29, x30, [sp, #-16]!
mov x29, sp
enc_prepare w3, x2, x6
ld1 {v4.16b}, [x5]
.Lctrnotfirst:
umov x8, v4.d[1] /* keep swabbed ctr in reg */
rev x8, x8
#if INTERLEAVE >= 2
cmn w8, w4 /* 32 bit overflow? */
umov x6, v4.d[1] /* keep swabbed ctr in reg */
rev x6, x6
cmn w6, w4 /* 32 bit overflow? */
bcs .Lctrloop
.LctrloopNx:
subs w4, w4, #INTERLEAVE
subs w4, w4, #4
bmi .Lctr1x
#if INTERLEAVE == 2
mov v0.8b, v4.8b
mov v1.8b, v4.8b
rev x7, x8
add x8, x8, #1
ins v0.d[1], x7
rev x7, x8
add x8, x8, #1
ins v1.d[1], x7
ld1 {v2.16b-v3.16b}, [x1], #32 /* get 2 input blocks */
do_encrypt_block2x
eor v0.16b, v0.16b, v2.16b
eor v1.16b, v1.16b, v3.16b
st1 {v0.16b-v1.16b}, [x0], #32
#else
ldr q8, =0x30000000200000001 /* addends 1,2,3[,0] */
dup v7.4s, w8
dup v7.4s, w6
mov v0.16b, v4.16b
add v7.4s, v7.4s, v8.4s
mov v1.16b, v4.16b
......@@ -309,29 +201,27 @@ AES_ENTRY(aes_ctr_encrypt)
mov v2.s[3], v8.s[1]
mov v3.s[3], v8.s[2]
ld1 {v5.16b-v7.16b}, [x1], #48 /* get 3 input blocks */
do_encrypt_block4x
bl aes_encrypt_block4x
eor v0.16b, v5.16b, v0.16b
ld1 {v5.16b}, [x1], #16 /* get 1 input block */
eor v1.16b, v6.16b, v1.16b
eor v2.16b, v7.16b, v2.16b
eor v3.16b, v5.16b, v3.16b
st1 {v0.16b-v3.16b}, [x0], #64
add x8, x8, #INTERLEAVE
#endif
rev x7, x8
add x6, x6, #4
rev x7, x6
ins v4.d[1], x7
cbz w4, .Lctrout
b .LctrloopNx
.Lctr1x:
adds w4, w4, #INTERLEAVE
adds w4, w4, #4
beq .Lctrout
#endif
.Lctrloop:
mov v0.16b, v4.16b
encrypt_block v0, w3, x2, x6, w7
encrypt_block v0, w3, x2, x8, w7
adds x8, x8, #1 /* increment BE ctr */
rev x7, x8
adds x6, x6, #1 /* increment BE ctr */
rev x7, x6
ins v4.d[1], x7
bcs .Lctrcarry /* overflow? */
......@@ -345,12 +235,12 @@ AES_ENTRY(aes_ctr_encrypt)
.Lctrout:
st1 {v4.16b}, [x5] /* return next CTR value */
FRAME_POP
ldp x29, x30, [sp], #16
ret
.Lctrtailblock:
st1 {v0.16b}, [x0]
FRAME_POP
ldp x29, x30, [sp], #16
ret
.Lctrcarry:
......@@ -384,39 +274,26 @@ CPU_LE( .quad 1, 0x87 )
CPU_BE( .quad 0x87, 1 )
AES_ENTRY(aes_xts_encrypt)
FRAME_PUSH
cbz w7, .LxtsencloopNx
stp x29, x30, [sp, #-16]!
mov x29, sp
ld1 {v4.16b}, [x6]
enc_prepare w3, x5, x6
encrypt_block v4, w3, x5, x6, w7 /* first tweak */
enc_switch_key w3, x2, x6
cbz w7, .Lxtsencnotfirst
enc_prepare w3, x5, x8
encrypt_block v4, w3, x5, x8, w7 /* first tweak */
enc_switch_key w3, x2, x8
ldr q7, .Lxts_mul_x
b .LxtsencNx
.Lxtsencnotfirst:
enc_prepare w3, x2, x8
.LxtsencloopNx:
ldr q7, .Lxts_mul_x
next_tweak v4, v4, v7, v8
.LxtsencNx:
#if INTERLEAVE >= 2
subs w4, w4, #INTERLEAVE
subs w4, w4, #4
bmi .Lxtsenc1x
#if INTERLEAVE == 2
ld1 {v0.16b-v1.16b}, [x1], #32 /* get 2 pt blocks */
next_tweak v5, v4, v7, v8
eor v0.16b, v0.16b, v4.16b
eor v1.16b, v1.16b, v5.16b
do_encrypt_block2x
eor v0.16b, v0.16b, v4.16b
eor v1.16b, v1.16b, v5.16b
st1 {v0.16b-v1.16b}, [x0], #32
cbz w4, .LxtsencoutNx
next_tweak v4, v5, v7, v8
b .LxtsencNx
.LxtsencoutNx:
mov v4.16b, v5.16b
b .Lxtsencout
#else
ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */
next_tweak v5, v4, v7, v8
eor v0.16b, v0.16b, v4.16b
......@@ -425,7 +302,7 @@ AES_ENTRY(aes_xts_encrypt)
eor v2.16b, v2.16b, v6.16b
next_tweak v7, v6, v7, v8
eor v3.16b, v3.16b, v7.16b
do_encrypt_block4x
bl aes_encrypt_block4x
eor v3.16b, v3.16b, v7.16b
eor v0.16b, v0.16b, v4.16b
eor v1.16b, v1.16b, v5.16b
......@@ -434,15 +311,13 @@ AES_ENTRY(aes_xts_encrypt)
mov v4.16b, v7.16b
cbz w4, .Lxtsencout
b .LxtsencloopNx
#endif
.Lxtsenc1x:
adds w4, w4, #INTERLEAVE
adds w4, w4, #4
beq .Lxtsencout
#endif
.Lxtsencloop:
ld1 {v1.16b}, [x1], #16
eor v0.16b, v1.16b, v4.16b
encrypt_block v0, w3, x2, x6, w7
encrypt_block v0, w3, x2, x8, w7
eor v0.16b, v0.16b, v4.16b
st1 {v0.16b}, [x0], #16
subs w4, w4, #1
......@@ -450,45 +325,33 @@ AES_ENTRY(aes_xts_encrypt)
next_tweak v4, v4, v7, v8
b .Lxtsencloop
.Lxtsencout:
FRAME_POP
st1 {v4.16b}, [x6]
ldp x29, x30, [sp], #16
ret
AES_ENDPROC(aes_xts_encrypt)
AES_ENTRY(aes_xts_decrypt)
FRAME_PUSH
cbz w7, .LxtsdecloopNx
stp x29, x30, [sp, #-16]!
mov x29, sp
ld1 {v4.16b}, [x6]
enc_prepare w3, x5, x6
encrypt_block v4, w3, x5, x6, w7 /* first tweak */
dec_prepare w3, x2, x6
cbz w7, .Lxtsdecnotfirst
enc_prepare w3, x5, x8
encrypt_block v4, w3, x5, x8, w7 /* first tweak */
dec_prepare w3, x2, x8
ldr q7, .Lxts_mul_x
b .LxtsdecNx
.Lxtsdecnotfirst:
dec_prepare w3, x2, x8
.LxtsdecloopNx:
ldr q7, .Lxts_mul_x
next_tweak v4, v4, v7, v8
.LxtsdecNx:
#if INTERLEAVE >= 2
subs w4, w4, #INTERLEAVE
subs w4, w4, #4
bmi .Lxtsdec1x
#if INTERLEAVE == 2
ld1 {v0.16b-v1.16b}, [x1], #32 /* get 2 ct blocks */
next_tweak v5, v4, v7, v8
eor v0.16b, v0.16b, v4.16b
eor v1.16b, v1.16b, v5.16b
do_decrypt_block2x
eor v0.16b, v0.16b, v4.16b
eor v1.16b, v1.16b, v5.16b
st1 {v0.16b-v1.16b}, [x0], #32
cbz w4, .LxtsdecoutNx
next_tweak v4, v5, v7, v8
b .LxtsdecNx
.LxtsdecoutNx:
mov v4.16b, v5.16b
b .Lxtsdecout
#else
ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */
next_tweak v5, v4, v7, v8
eor v0.16b, v0.16b, v4.16b
......@@ -497,7 +360,7 @@ AES_ENTRY(aes_xts_decrypt)
eor v2.16b, v2.16b, v6.16b
next_tweak v7, v6, v7, v8
eor v3.16b, v3.16b, v7.16b
do_decrypt_block4x
bl aes_decrypt_block4x
eor v3.16b, v3.16b, v7.16b
eor v0.16b, v0.16b, v4.16b
eor v1.16b, v1.16b, v5.16b
......@@ -506,15 +369,13 @@ AES_ENTRY(aes_xts_decrypt)
mov v4.16b, v7.16b
cbz w4, .Lxtsdecout
b .LxtsdecloopNx
#endif
.Lxtsdec1x:
adds w4, w4, #INTERLEAVE
adds w4, w4, #4
beq .Lxtsdecout
#endif
.Lxtsdecloop:
ld1 {v1.16b}, [x1], #16
eor v0.16b, v1.16b, v4.16b
decrypt_block v0, w3, x2, x6, w7
decrypt_block v0, w3, x2, x8, w7
eor v0.16b, v0.16b, v4.16b
st1 {v0.16b}, [x0], #16
subs w4, w4, #1
......@@ -522,7 +383,8 @@ AES_ENTRY(aes_xts_decrypt)
next_tweak v4, v4, v7, v8
b .Lxtsdecloop
.Lxtsdecout:
FRAME_POP
st1 {v4.16b}, [x6]
ldp x29, x30, [sp], #16
ret
AES_ENDPROC(aes_xts_decrypt)
......@@ -533,8 +395,28 @@ AES_ENDPROC(aes_xts_decrypt)
AES_ENTRY(aes_mac_update)
ld1 {v0.16b}, [x4] /* get dg */
enc_prepare w2, x1, x7
cbnz w5, .Lmacenc
cbz w5, .Lmacloop4x
encrypt_block v0, w2, x1, x7, w8
.Lmacloop4x:
subs w3, w3, #4
bmi .Lmac1x
ld1 {v1.16b-v4.16b}, [x0], #64 /* get next 4 pt blocks */
eor v0.16b, v0.16b, v1.16b /* ..and xor with dg */
encrypt_block v0, w2, x1, x7, w8
eor v0.16b, v0.16b, v2.16b
encrypt_block v0, w2, x1, x7, w8
eor v0.16b, v0.16b, v3.16b
encrypt_block v0, w2, x1, x7, w8
eor v0.16b, v0.16b, v4.16b
cmp w3, wzr
csinv x5, x6, xzr, eq
cbz w5, .Lmacout
encrypt_block v0, w2, x1, x7, w8
b .Lmacloop4x
.Lmac1x:
add w3, w3, #4
.Lmacloop:
cbz w3, .Lmacout
ld1 {v1.16b}, [x0], #16 /* get next pt block */
......@@ -544,7 +426,6 @@ AES_ENTRY(aes_mac_update)
csinv x5, x6, xzr, eq
cbz w5, .Lmacout
.Lmacenc:
encrypt_block v0, w2, x1, x7, w8
b .Lmacloop
......
......@@ -46,10 +46,9 @@ asmlinkage void aesbs_xts_decrypt(u8 out[], u8 const in[], u8 const rk[],
/* borrowed from aes-neon-blk.ko */
asmlinkage void neon_aes_ecb_encrypt(u8 out[], u8 const in[], u32 const rk[],
int rounds, int blocks, int first);
int rounds, int blocks);
asmlinkage void neon_aes_cbc_encrypt(u8 out[], u8 const in[], u32 const rk[],
int rounds, int blocks, u8 iv[],
int first);
int rounds, int blocks, u8 iv[]);
struct aesbs_ctx {
u8 rk[13 * (8 * AES_BLOCK_SIZE) + 32];
......@@ -100,9 +99,8 @@ static int __ecb_crypt(struct skcipher_request *req,
struct skcipher_walk walk;
int err;
err = skcipher_walk_virt(&walk, req, true);
err = skcipher_walk_virt(&walk, req, false);
kernel_neon_begin();
while (walk.nbytes >= AES_BLOCK_SIZE) {
unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE;
......@@ -110,12 +108,13 @@ static int __ecb_crypt(struct skcipher_request *req,
blocks = round_down(blocks,
walk.stride / AES_BLOCK_SIZE);
kernel_neon_begin();
fn(walk.dst.virt.addr, walk.src.virt.addr, ctx->rk,
ctx->rounds, blocks);
kernel_neon_end();
err = skcipher_walk_done(&walk,
walk.nbytes - blocks * AES_BLOCK_SIZE);
}
kernel_neon_end();
return err;
}
......@@ -157,22 +156,21 @@ static int cbc_encrypt(struct skcipher_request *req)
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct aesbs_cbc_ctx *ctx = crypto_skcipher_ctx(tfm);
struct skcipher_walk walk;
int err, first = 1;
int err;
err = skcipher_walk_virt(&walk, req, true);
err = skcipher_walk_virt(&walk, req, false);
kernel_neon_begin();
while (walk.nbytes >= AES_BLOCK_SIZE) {
unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE;
/* fall back to the non-bitsliced NEON implementation */
kernel_neon_begin();
neon_aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
ctx->enc, ctx->key.rounds, blocks, walk.iv,
first);
ctx->enc, ctx->key.rounds, blocks,
walk.iv);
kernel_neon_end();
err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
first = 0;
}
kernel_neon_end();
return err;
}
......@@ -183,9 +181,8 @@ static int cbc_decrypt(struct skcipher_request *req)
struct skcipher_walk walk;
int err;
err = skcipher_walk_virt(&walk, req, true);
err = skcipher_walk_virt(&walk, req, false);
kernel_neon_begin();
while (walk.nbytes >= AES_BLOCK_SIZE) {
unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE;
......@@ -193,13 +190,14 @@ static int cbc_decrypt(struct skcipher_request *req)
blocks = round_down(blocks,
walk.stride / AES_BLOCK_SIZE);
kernel_neon_begin();
aesbs_cbc_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
ctx->key.rk, ctx->key.rounds, blocks,
walk.iv);
kernel_neon_end();
err = skcipher_walk_done(&walk,
walk.nbytes - blocks * AES_BLOCK_SIZE);
}
kernel_neon_end();
return err;
}
......@@ -231,9 +229,8 @@ static int ctr_encrypt(struct skcipher_request *req)
u8 buf[AES_BLOCK_SIZE];
int err;
err = skcipher_walk_virt(&walk, req, true);
err = skcipher_walk_virt(&walk, req, false);
kernel_neon_begin();
while (walk.nbytes > 0) {
unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE;
u8 *final = (walk.total % AES_BLOCK_SIZE) ? buf : NULL;
......@@ -244,8 +241,10 @@ static int ctr_encrypt(struct skcipher_request *req)
final = NULL;
}
kernel_neon_begin();
aesbs_ctr_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
ctx->rk, ctx->rounds, blocks, walk.iv, final);
kernel_neon_end();
if (final) {
u8 *dst = walk.dst.virt.addr + blocks * AES_BLOCK_SIZE;
......@@ -260,8 +259,6 @@ static int ctr_encrypt(struct skcipher_request *req)
err = skcipher_walk_done(&walk,
walk.nbytes - blocks * AES_BLOCK_SIZE);
}
kernel_neon_end();
return err;
}
......@@ -306,12 +303,11 @@ static int __xts_crypt(struct skcipher_request *req,
struct skcipher_walk walk;
int err;
err = skcipher_walk_virt(&walk, req, true);
err = skcipher_walk_virt(&walk, req, false);
kernel_neon_begin();
neon_aes_ecb_encrypt(walk.iv, walk.iv, ctx->twkey,
ctx->key.rounds, 1, 1);
neon_aes_ecb_encrypt(walk.iv, walk.iv, ctx->twkey, ctx->key.rounds, 1);
kernel_neon_end();
while (walk.nbytes >= AES_BLOCK_SIZE) {
unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE;
......@@ -320,13 +316,13 @@ static int __xts_crypt(struct skcipher_request *req,
blocks = round_down(blocks,
walk.stride / AES_BLOCK_SIZE);
kernel_neon_begin();
fn(walk.dst.virt.addr, walk.src.virt.addr, ctx->key.rk,
ctx->key.rounds, blocks, walk.iv);
kernel_neon_end();
err = skcipher_walk_done(&walk,
walk.nbytes - blocks * AES_BLOCK_SIZE);
}
kernel_neon_end();
return err;
}
......
......@@ -37,12 +37,19 @@ static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src,
u8 buf[CHACHA20_BLOCK_SIZE];
while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
kernel_neon_begin();
chacha20_4block_xor_neon(state, dst, src);
kernel_neon_end();
bytes -= CHACHA20_BLOCK_SIZE * 4;
src += CHACHA20_BLOCK_SIZE * 4;
dst += CHACHA20_BLOCK_SIZE * 4;
state[12] += 4;
}
if (!bytes)
return;
kernel_neon_begin();
while (bytes >= CHACHA20_BLOCK_SIZE) {
chacha20_block_xor_neon(state, dst, src);
bytes -= CHACHA20_BLOCK_SIZE;
......@@ -55,6 +62,7 @@ static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src,
chacha20_block_xor_neon(state, buf, buf);
memcpy(dst, buf, bytes);
}
kernel_neon_end();
}
static int chacha20_neon(struct skcipher_request *req)
......@@ -68,11 +76,10 @@ static int chacha20_neon(struct skcipher_request *req)
if (!may_use_simd() || req->cryptlen <= CHACHA20_BLOCK_SIZE)
return crypto_chacha20_crypt(req);
err = skcipher_walk_virt(&walk, req, true);
err = skcipher_walk_virt(&walk, req, false);
crypto_chacha20_init(state, ctx, walk.iv);
kernel_neon_begin();
while (walk.nbytes > 0) {
unsigned int nbytes = walk.nbytes;
......@@ -83,7 +90,6 @@ static int chacha20_neon(struct skcipher_request *req)
nbytes);
err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
}
kernel_neon_end();
return err;
}
......
......@@ -89,21 +89,32 @@ static struct shash_alg algs[] = { {
static int sha256_update_neon(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
/*
* Stacking and unstacking a substantial slice of the NEON register
* file may significantly affect performance for small updates when
* executing in interrupt context, so fall back to the scalar code
* in that case.
*/
struct sha256_state *sctx = shash_desc_ctx(desc);
if (!may_use_simd())
return sha256_base_do_update(desc, data, len,
(sha256_block_fn *)sha256_block_data_order);
while (len > 0) {
unsigned int chunk = len;
/*
* Don't hog the CPU for the entire time it takes to process all
* input when running on a preemptible kernel, but process the
* data block by block instead.
*/
if (IS_ENABLED(CONFIG_PREEMPT) &&
chunk + sctx->count % SHA256_BLOCK_SIZE > SHA256_BLOCK_SIZE)
chunk = SHA256_BLOCK_SIZE -
sctx->count % SHA256_BLOCK_SIZE;
kernel_neon_begin();
sha256_base_do_update(desc, data, len,
sha256_base_do_update(desc, data, chunk,
(sha256_block_fn *)sha256_block_neon);
kernel_neon_end();
data += chunk;
len -= chunk;
}
return 0;
}
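The chunking rule in `sha256_update_neon()` above can be checked in isolation. The sketch below is a user-space illustration only: `next_chunk` and `count_sections` are hypothetical names, `preemptible` stands in for `IS_ENABLED(CONFIG_PREEMPT)`, and `count` stands in for `sctx->count`, which the real update routine advances as it consumes data.

```c
#include <stddef.h>

#define SHA256_BLOCK_SIZE 64

/*
 * Hypothetical helper: number of bytes to process in one
 * kernel_neon_begin()/kernel_neon_end() section. On a preemptible
 * kernel the chunk is capped so that it completes at most one block.
 */
static size_t next_chunk(size_t count, size_t len, int preemptible)
{
	size_t buffered = count % SHA256_BLOCK_SIZE;
	size_t chunk = len;

	if (preemptible && chunk + buffered > SHA256_BLOCK_SIZE)
		chunk = SHA256_BLOCK_SIZE - buffered;
	return chunk;
}

/* Hypothetical helper: how many NEON sections a whole update would use. */
static unsigned int count_sections(size_t count, size_t len, int preemptible)
{
	unsigned int sections = 0;

	while (len > 0) {
		size_t chunk = next_chunk(count, len, preemptible);

		count += chunk;	/* the real code updates sctx->count */
		len -= chunk;
		sections++;
	}
	return sections;
}
```

With 10 bytes already buffered and 200 bytes of input, the preemptible path splits the work into chunks of 54, 64, 64 and 18 bytes, while the non-preemptible path handles all 200 bytes in a single section.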
......@@ -117,10 +128,9 @@ static int sha256_finup_neon(struct shash_desc *desc, const u8 *data,
sha256_base_do_finalize(desc,
(sha256_block_fn *)sha256_block_data_order);
} else {
kernel_neon_begin();
if (len)
sha256_base_do_update(desc, data, len,
(sha256_block_fn *)sha256_block_neon);
sha256_update_neon(desc, data, len);
kernel_neon_begin();
sha256_base_do_finalize(desc,
(sha256_block_fn *)sha256_block_neon);
kernel_neon_end();
......
// SPDX-License-Identifier: GPL-2.0
/*
* ARM64 NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
*
* Copyright (c) 2018 Google, Inc
*
* Author: Eric Biggers <ebiggers@google.com>
*/
#include <linux/linkage.h>
.text
// arguments
ROUND_KEYS .req x0 // const {u64,u32} *round_keys
NROUNDS .req w1 // int nrounds
NROUNDS_X .req x1
DST .req x2 // void *dst
SRC .req x3 // const void *src
NBYTES .req w4 // unsigned int nbytes
TWEAK .req x5 // void *tweak
// registers which hold the data being encrypted/decrypted
// (underscores avoid a naming collision with ARM64 registers x0-x3)
X_0 .req v0
Y_0 .req v1
X_1 .req v2
Y_1 .req v3
X_2 .req v4
Y_2 .req v5
X_3 .req v6
Y_3 .req v7
// the round key, duplicated in all lanes
ROUND_KEY .req v8
// index vector for tbl-based 8-bit rotates
ROTATE_TABLE .req v9
ROTATE_TABLE_Q .req q9
// temporary registers
TMP0 .req v10
TMP1 .req v11
TMP2 .req v12
TMP3 .req v13
// multiplication table for updating XTS tweaks
GFMUL_TABLE .req v14
GFMUL_TABLE_Q .req q14
// next XTS tweak value(s)
TWEAKV_NEXT .req v15
// XTS tweaks for the blocks currently being encrypted/decrypted
TWEAKV0 .req v16
TWEAKV1 .req v17
TWEAKV2 .req v18
TWEAKV3 .req v19
TWEAKV4 .req v20
TWEAKV5 .req v21
TWEAKV6 .req v22
TWEAKV7 .req v23
.align 4
.Lror64_8_table:
.octa 0x080f0e0d0c0b0a090007060504030201
.Lror32_8_table:
.octa 0x0c0f0e0d080b0a090407060500030201
.Lrol64_8_table:
.octa 0x0e0d0c0b0a09080f0605040302010007
.Lrol32_8_table:
.octa 0x0e0d0c0f0a09080b0605040702010003
.Lgf128mul_table:
.octa 0x00000000000000870000000000000001
.Lgf64mul_table:
.octa 0x0000000000000000000000002d361b00
/*
* _speck_round_128bytes() - Speck encryption round on 128 bytes at a time
*
* Do one Speck encryption round on the 128 bytes (8 blocks for Speck128, 16 for
* Speck64) stored in X_0-X_3 and Y_0-Y_3, using the round key stored in all lanes
* of ROUND_KEY. 'n' is the lane size: 64 for Speck128, or 32 for Speck64.
* 'lanes' is the lane specifier: "2d" for Speck128 or "4s" for Speck64.
*/
.macro _speck_round_128bytes n, lanes
// x = ror(x, 8)
tbl X_0.16b, {X_0.16b}, ROTATE_TABLE.16b
tbl X_1.16b, {X_1.16b}, ROTATE_TABLE.16b
tbl X_2.16b, {X_2.16b}, ROTATE_TABLE.16b
tbl X_3.16b, {X_3.16b}, ROTATE_TABLE.16b
// x += y
add X_0.\lanes, X_0.\lanes, Y_0.\lanes
add X_1.\lanes, X_1.\lanes, Y_1.\lanes
add X_2.\lanes, X_2.\lanes, Y_2.\lanes
add X_3.\lanes, X_3.\lanes, Y_3.\lanes
// x ^= k
eor X_0.16b, X_0.16b, ROUND_KEY.16b
eor X_1.16b, X_1.16b, ROUND_KEY.16b
eor X_2.16b, X_2.16b, ROUND_KEY.16b
eor X_3.16b, X_3.16b, ROUND_KEY.16b
// y = rol(y, 3)
shl TMP0.\lanes, Y_0.\lanes, #3
shl TMP1.\lanes, Y_1.\lanes, #3
shl TMP2.\lanes, Y_2.\lanes, #3
shl TMP3.\lanes, Y_3.\lanes, #3
sri TMP0.\lanes, Y_0.\lanes, #(\n - 3)
sri TMP1.\lanes, Y_1.\lanes, #(\n - 3)
sri TMP2.\lanes, Y_2.\lanes, #(\n - 3)
sri TMP3.\lanes, Y_3.\lanes, #(\n - 3)
// y ^= x
eor Y_0.16b, TMP0.16b, X_0.16b
eor Y_1.16b, TMP1.16b, X_1.16b
eor Y_2.16b, TMP2.16b, X_2.16b
eor Y_3.16b, TMP3.16b, X_3.16b
.endm
/*
* _speck_unround_128bytes() - Speck decryption round on 128 bytes at a time
*
* This is the inverse of _speck_round_128bytes().
*/
.macro _speck_unround_128bytes n, lanes
// y ^= x
eor TMP0.16b, Y_0.16b, X_0.16b
eor TMP1.16b, Y_1.16b, X_1.16b
eor TMP2.16b, Y_2.16b, X_2.16b
eor TMP3.16b, Y_3.16b, X_3.16b
// y = ror(y, 3)
ushr Y_0.\lanes, TMP0.\lanes, #3
ushr Y_1.\lanes, TMP1.\lanes, #3
ushr Y_2.\lanes, TMP2.\lanes, #3
ushr Y_3.\lanes, TMP3.\lanes, #3
sli Y_0.\lanes, TMP0.\lanes, #(\n - 3)
sli Y_1.\lanes, TMP1.\lanes, #(\n - 3)
sli Y_2.\lanes, TMP2.\lanes, #(\n - 3)
sli Y_3.\lanes, TMP3.\lanes, #(\n - 3)
// x ^= k
eor X_0.16b, X_0.16b, ROUND_KEY.16b
eor X_1.16b, X_1.16b, ROUND_KEY.16b
eor X_2.16b, X_2.16b, ROUND_KEY.16b
eor X_3.16b, X_3.16b, ROUND_KEY.16b
// x -= y
sub X_0.\lanes, X_0.\lanes, Y_0.\lanes
sub X_1.\lanes, X_1.\lanes, Y_1.\lanes
sub X_2.\lanes, X_2.\lanes, Y_2.\lanes
sub X_3.\lanes, X_3.\lanes, Y_3.\lanes
// x = rol(x, 8)
tbl X_0.16b, {X_0.16b}, ROTATE_TABLE.16b
tbl X_1.16b, {X_1.16b}, ROTATE_TABLE.16b
tbl X_2.16b, {X_2.16b}, ROTATE_TABLE.16b
tbl X_3.16b, {X_3.16b}, ROTATE_TABLE.16b
.endm
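For reference, the per-lane arithmetic that the two macros above apply to every 64-bit lane can be written as scalar C. This is only a sketch of the Speck128 case (n = 64) for a single (x, y) pair; the function names are hypothetical and the code is not taken from the kernel sources.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t rol64(uint64_t v, unsigned int r)
{
	return (v << r) | (v >> (64 - r));
}

static uint64_t ror64(uint64_t v, unsigned int r)
{
	return (v >> r) | (v << (64 - r));
}

/* One Speck128 encryption round, in the order used by the macro comments. */
static void speck128_round_one(uint64_t *x, uint64_t *y, uint64_t k)
{
	*x = ror64(*x, 8);	/* x = ror(x, 8) */
	*x += *y;		/* x += y */
	*x ^= k;		/* x ^= k */
	*y = rol64(*y, 3);	/* y = rol(y, 3) */
	*y ^= *x;		/* y ^= x */
}

/* Inverse round: the same steps undone in reverse order. */
static void speck128_unround_one(uint64_t *x, uint64_t *y, uint64_t k)
{
	*y ^= *x;		/* y ^= x */
	*y = ror64(*y, 3);	/* y = ror(y, 3) */
	*x ^= k;		/* x ^= k */
	*x -= *y;		/* x -= y */
	*x = rol64(*x, 8);	/* x = rol(x, 8) */
}

static void speck_round_self_check(void)
{
	uint64_t x = 0x0123456789abcdefULL, y = 0x0f1e2d3c4b5a6978ULL;
	const uint64_t k = 0xdeadbeefcafef00dULL;

	speck128_round_one(&x, &y, k);
	speck128_unround_one(&x, &y, k);
	assert(x == 0x0123456789abcdefULL);
	assert(y == 0x0f1e2d3c4b5a6978ULL);
}
```

The NEON macros perform the same steps on four X and four Y registers at once; the 3-bit rotate is assembled there from a shift plus a shift-and-insert (`shl`/`sri`, or `ushr`/`sli` for the inverse).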
.macro _next_xts_tweak next, cur, tmp, n
.if \n == 64
/*
* Calculate the next tweak by multiplying the current one by x,
* modulo p(x) = x^128 + x^7 + x^2 + x + 1.
*/
sshr \tmp\().2d, \cur\().2d, #63
and \tmp\().16b, \tmp\().16b, GFMUL_TABLE.16b
shl \next\().2d, \cur\().2d, #1
ext \tmp\().16b, \tmp\().16b, \tmp\().16b, #8
eor \next\().16b, \next\().16b, \tmp\().16b
.else
/*
* Calculate the next two tweaks by multiplying the current ones by x^2,
* modulo p(x) = x^64 + x^4 + x^3 + x + 1.
*/
ushr \tmp\().2d, \cur\().2d, #62
shl \next\().2d, \cur\().2d, #2
tbl \tmp\().16b, {GFMUL_TABLE.16b}, \tmp\().16b
eor \next\().16b, \next\().16b, \tmp\().16b
.endif
.endm
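The tweak update in `_next_xts_tweak` can be cross-checked with scalar C. The sketch below assumes the tweak is held as little-endian 64-bit words, so the constants 0x87 and 0x1B are the reductions for the two polynomials named in the comments. It also shows why the four-entry table 0x00, 0x1b, 0x36, 0x2d (the low bytes of `.Lgf64mul_table`) gives a multiply by x^2 in one step. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct tweak128 {
	uint64_t lo;	/* bits 0..63 */
	uint64_t hi;	/* bits 64..127 */
};

/* Multiply by x modulo x^128 + x^7 + x^2 + x + 1. */
static struct tweak128 gf128_mul_x(struct tweak128 t)
{
	struct tweak128 r;

	r.hi = (t.hi << 1) | (t.lo >> 63);
	r.lo = (t.lo << 1) ^ ((t.hi >> 63) ? 0x87 : 0);
	return r;
}

/* Multiply by x modulo x^64 + x^4 + x^3 + x + 1. */
static uint64_t gf64_mul_x(uint64_t t)
{
	return (t << 1) ^ ((t >> 63) ? 0x1B : 0);
}

/* Multiply by x^2 in one step, indexing a table with the top two bits. */
static uint64_t gf64_mul_x2(uint64_t t)
{
	static const uint8_t table[4] = { 0x00, 0x1b, 0x36, 0x2d };

	return (t << 2) ^ table[t >> 62];
}

static void xts_tweak_self_check(void)
{
	const uint64_t samples[] = {
		0, 1, 0x4000000000000000ULL, 0x8000000000000000ULL,
		0xC000000000000001ULL, 0xffffffffffffffffULL,
	};
	struct tweak128 t = { 0, 0x8000000000000000ULL };

	/* Bit 127 falls off the top and is folded back in as 0x87. */
	t = gf128_mul_x(t);
	assert(t.lo == 0x87 && t.hi == 0);

	/* Two single-step multiplies must agree with the table-driven x^2. */
	for (unsigned int i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		assert(gf64_mul_x2(samples[i]) ==
		       gf64_mul_x(gf64_mul_x(samples[i])));
}
```

For Speck64 the assembly keeps two consecutive tweaks packed in one 128-bit register, so stepping both by x^2 yields the next pair.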
/*
* _speck_xts_crypt() - Speck-XTS encryption/decryption
*
* Encrypt or decrypt NBYTES bytes of data from the SRC buffer to the DST buffer
* using Speck-XTS, specifically the variant with a block size of '2n' and round
* count given by NROUNDS. The expanded round keys are given in ROUND_KEYS, and
* the current XTS tweak value is given in TWEAK. It's assumed that NBYTES is a
* nonzero multiple of 128.
*/
.macro _speck_xts_crypt n, lanes, decrypting
/*
* If decrypting, modify the ROUND_KEYS parameter to point to the last
* round key rather than the first, since for decryption the round keys
* are used in reverse order.
*/
.if \decrypting
mov NROUNDS, NROUNDS /* zero the high 32 bits */
.if \n == 64
add ROUND_KEYS, ROUND_KEYS, NROUNDS_X, lsl #3
sub ROUND_KEYS, ROUND_KEYS, #8
.else
add ROUND_KEYS, ROUND_KEYS, NROUNDS_X, lsl #2
sub ROUND_KEYS, ROUND_KEYS, #4
.endif
.endif
// Load the index vector for tbl-based 8-bit rotates
.if \decrypting
ldr ROTATE_TABLE_Q, .Lrol\n\()_8_table
.else
ldr ROTATE_TABLE_Q, .Lror\n\()_8_table
.endif
// One-time XTS preparation
.if \n == 64
// Load first tweak
ld1 {TWEAKV0.16b}, [TWEAK]
// Load GF(2^128) multiplication table
ldr GFMUL_TABLE_Q, .Lgf128mul_table
.else
// Load first tweak
ld1 {TWEAKV0.8b}, [TWEAK]
// Load GF(2^64) multiplication table
ldr GFMUL_TABLE_Q, .Lgf64mul_table
// Calculate second tweak, packing it together with the first
ushr TMP0.2d, TWEAKV0.2d, #63
shl TMP1.2d, TWEAKV0.2d, #1
tbl TMP0.8b, {GFMUL_TABLE.16b}, TMP0.8b
eor TMP0.8b, TMP0.8b, TMP1.8b
mov TWEAKV0.d[1], TMP0.d[0]
.endif
.Lnext_128bytes_\@:
// Calculate XTS tweaks for next 128 bytes
_next_xts_tweak TWEAKV1, TWEAKV0, TMP0, \n
_next_xts_tweak TWEAKV2, TWEAKV1, TMP0, \n
_next_xts_tweak TWEAKV3, TWEAKV2, TMP0, \n
_next_xts_tweak TWEAKV4, TWEAKV3, TMP0, \n
_next_xts_tweak TWEAKV5, TWEAKV4, TMP0, \n
_next_xts_tweak TWEAKV6, TWEAKV5, TMP0, \n
_next_xts_tweak TWEAKV7, TWEAKV6, TMP0, \n
_next_xts_tweak TWEAKV_NEXT, TWEAKV7, TMP0, \n
// Load the next source blocks into {X,Y}[0-3]
ld1 {X_0.16b-Y_1.16b}, [SRC], #64
ld1 {X_2.16b-Y_3.16b}, [SRC], #64
// XOR the source blocks with their XTS tweaks
eor TMP0.16b, X_0.16b, TWEAKV0.16b
eor Y_0.16b, Y_0.16b, TWEAKV1.16b
eor TMP1.16b, X_1.16b, TWEAKV2.16b
eor Y_1.16b, Y_1.16b, TWEAKV3.16b
eor TMP2.16b, X_2.16b, TWEAKV4.16b
eor Y_2.16b, Y_2.16b, TWEAKV5.16b
eor TMP3.16b, X_3.16b, TWEAKV6.16b
eor Y_3.16b, Y_3.16b, TWEAKV7.16b
/*
* De-interleave the 'x' and 'y' elements of each block, i.e. make it so
* that the X[0-3] registers contain only the second halves of blocks,
* and the Y[0-3] registers contain only the first halves of blocks.
* (Speck uses the order (y, x) rather than the more intuitive (x, y).)
*/
uzp2 X_0.\lanes, TMP0.\lanes, Y_0.\lanes
uzp1 Y_0.\lanes, TMP0.\lanes, Y_0.\lanes
uzp2 X_1.\lanes, TMP1.\lanes, Y_1.\lanes
uzp1 Y_1.\lanes, TMP1.\lanes, Y_1.\lanes
uzp2 X_2.\lanes, TMP2.\lanes, Y_2.\lanes
uzp1 Y_2.\lanes, TMP2.\lanes, Y_2.\lanes
uzp2 X_3.\lanes, TMP3.\lanes, Y_3.\lanes
uzp1 Y_3.\lanes, TMP3.\lanes, Y_3.\lanes
// Do the cipher rounds
mov x6, ROUND_KEYS
mov w7, NROUNDS
.Lnext_round_\@:
.if \decrypting
ld1r {ROUND_KEY.\lanes}, [x6]
sub x6, x6, #( \n / 8 )
_speck_unround_128bytes \n, \lanes
.else
ld1r {ROUND_KEY.\lanes}, [x6], #( \n / 8 )
_speck_round_128bytes \n, \lanes
.endif
subs w7, w7, #1
bne .Lnext_round_\@
// Re-interleave the 'x' and 'y' elements of each block
zip1 TMP0.\lanes, Y_0.\lanes, X_0.\lanes
zip2 Y_0.\lanes, Y_0.\lanes, X_0.\lanes
zip1 TMP1.\lanes, Y_1.\lanes, X_1.\lanes
zip2 Y_1.\lanes, Y_1.\lanes, X_1.\lanes
zip1 TMP2.\lanes, Y_2.\lanes, X_2.\lanes
zip2 Y_2.\lanes, Y_2.\lanes, X_2.\lanes
zip1 TMP3.\lanes, Y_3.\lanes, X_3.\lanes
zip2 Y_3.\lanes, Y_3.\lanes, X_3.\lanes
// XOR the encrypted/decrypted blocks with the tweaks calculated earlier
eor X_0.16b, TMP0.16b, TWEAKV0.16b
eor Y_0.16b, Y_0.16b, TWEAKV1.16b
eor X_1.16b, TMP1.16b, TWEAKV2.16b
eor Y_1.16b, Y_1.16b, TWEAKV3.16b
eor X_2.16b, TMP2.16b, TWEAKV4.16b
eor Y_2.16b, Y_2.16b, TWEAKV5.16b
eor X_3.16b, TMP3.16b, TWEAKV6.16b
eor Y_3.16b, Y_3.16b, TWEAKV7.16b
mov TWEAKV0.16b, TWEAKV_NEXT.16b
// Store the ciphertext in the destination buffer
st1 {X_0.16b-Y_1.16b}, [DST], #64
st1 {X_2.16b-Y_3.16b}, [DST], #64
// Continue if there are more 128-byte chunks remaining
subs NBYTES, NBYTES, #128
bne .Lnext_128bytes_\@
// Store the next tweak and return
.if \n == 64
st1 {TWEAKV_NEXT.16b}, [TWEAK]
.else
st1 {TWEAKV_NEXT.8b}, [TWEAK]
.endif
ret
.endm
ENTRY(speck128_xts_encrypt_neon)
_speck_xts_crypt n=64, lanes=2d, decrypting=0
ENDPROC(speck128_xts_encrypt_neon)
ENTRY(speck128_xts_decrypt_neon)
_speck_xts_crypt n=64, lanes=2d, decrypting=1
ENDPROC(speck128_xts_decrypt_neon)
ENTRY(speck64_xts_encrypt_neon)
_speck_xts_crypt n=32, lanes=4s, decrypting=0
ENDPROC(speck64_xts_encrypt_neon)
ENTRY(speck64_xts_decrypt_neon)
_speck_xts_crypt n=32, lanes=4s, decrypting=1
ENDPROC(speck64_xts_decrypt_neon)
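The de-interleave and re-interleave steps in `_speck_xts_crypt` can be modelled on plain arrays. The sketch below assumes the Speck64 case (four 32-bit lanes per register) and uses hypothetical helpers named after the instructions. It shows that `uzp1`/`uzp2` split two registers of (y, x) pairs into an all-y and an all-x register, and that `zip1`/`zip2` restore the original layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct v4s {
	uint32_t lane[4];	/* a 128-bit register viewed as .4s */
};

/* Even-numbered lanes of a, then of b. */
static struct v4s uzp1_4s(struct v4s a, struct v4s b)
{
	struct v4s r = { { a.lane[0], a.lane[2], b.lane[0], b.lane[2] } };
	return r;
}

/* Odd-numbered lanes of a, then of b. */
static struct v4s uzp2_4s(struct v4s a, struct v4s b)
{
	struct v4s r = { { a.lane[1], a.lane[3], b.lane[1], b.lane[3] } };
	return r;
}

/* Interleave the low halves of a and b. */
static struct v4s zip1_4s(struct v4s a, struct v4s b)
{
	struct v4s r = { { a.lane[0], b.lane[0], a.lane[1], b.lane[1] } };
	return r;
}

/* Interleave the high halves of a and b. */
static struct v4s zip2_4s(struct v4s a, struct v4s b)
{
	struct v4s r = { { a.lane[2], b.lane[2], a.lane[3], b.lane[3] } };
	return r;
}

static void interleave_self_check(void)
{
	/* Blocks 0..3, each stored as (y, x): y values 10..13, x values 20..23. */
	struct v4s r0 = { { 10, 20, 11, 21 } };
	struct v4s r1 = { { 12, 22, 13, 23 } };
	struct v4s x = uzp2_4s(r0, r1);
	struct v4s y = uzp1_4s(r0, r1);
	struct v4s back0 = zip1_4s(y, x);
	struct v4s back1 = zip2_4s(y, x);

	assert(x.lane[0] == 20 && x.lane[3] == 23);
	assert(y.lane[0] == 10 && y.lane[3] == 13);
	assert(memcmp(&back0, &r0, sizeof(r0)) == 0);
	assert(memcmp(&back1, &r1, sizeof(r1)) == 0);
}
```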
// SPDX-License-Identifier: GPL-2.0
/*
* NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
* (64-bit version; based on the 32-bit version)
*
* Copyright (c) 2018 Google, Inc
*/
#include <asm/hwcap.h>
#include <asm/neon.h>
#include <asm/simd.h>
#include <crypto/algapi.h>
#include <crypto/gf128mul.h>
#include <crypto/internal/skcipher.h>
#include <crypto/speck.h>
#include <crypto/xts.h>
#include <linux/kernel.h>
#include <linux/module.h>
/* The assembly functions only handle multiples of 128 bytes */
#define SPECK_NEON_CHUNK_SIZE 128
/* Speck128 */
struct speck128_xts_tfm_ctx {
struct speck128_tfm_ctx main_key;
struct speck128_tfm_ctx tweak_key;
};
asmlinkage void speck128_xts_encrypt_neon(const u64 *round_keys, int nrounds,
void *dst, const void *src,
unsigned int nbytes, void *tweak);
asmlinkage void speck128_xts_decrypt_neon(const u64 *round_keys, int nrounds,
void *dst, const void *src,
unsigned int nbytes, void *tweak);
typedef void (*speck128_crypt_one_t)(const struct speck128_tfm_ctx *,
u8 *, const u8 *);
typedef void (*speck128_xts_crypt_many_t)(const u64 *, int, void *,
const void *, unsigned int, void *);
static __always_inline int
__speck128_xts_crypt(struct skcipher_request *req,
speck128_crypt_one_t crypt_one,
speck128_xts_crypt_many_t crypt_many)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
const struct speck128_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
struct skcipher_walk walk;
le128 tweak;
int err;
err = skcipher_walk_virt(&walk, req, true);
crypto_speck128_encrypt(&ctx->tweak_key, (u8 *)&tweak, walk.iv);
while (walk.nbytes > 0) {
unsigned int nbytes = walk.nbytes;
u8 *dst = walk.dst.virt.addr;
const u8 *src = walk.src.virt.addr;
if (nbytes >= SPECK_NEON_CHUNK_SIZE && may_use_simd()) {
unsigned int count;
count = round_down(nbytes, SPECK_NEON_CHUNK_SIZE);
kernel_neon_begin();
(*crypt_many)(ctx->main_key.round_keys,
ctx->main_key.nrounds,
dst, src, count, &tweak);
kernel_neon_end();
dst += count;
src += count;
nbytes -= count;
}
/* Handle any remainder with generic code */
while (nbytes >= sizeof(tweak)) {
le128_xor((le128 *)dst, (const le128 *)src, &tweak);
(*crypt_one)(&ctx->main_key, dst, dst);
le128_xor((le128 *)dst, (const le128 *)dst, &tweak);
gf128mul_x_ble(&tweak, &tweak);
dst += sizeof(tweak);
src += sizeof(tweak);
nbytes -= sizeof(tweak);
}
err = skcipher_walk_done(&walk, nbytes);
}
return err;
}
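The way one walk step is divided between the NEON routine and the generic per-block loop can be illustrated with a small user-space sketch. The constants mirror the 128-byte chunk size and the 16-byte Speck128 block, but `split_walk_step` and `struct walk_split` are hypothetical names, and `simd_usable` stands in for whatever `may_use_simd()` would return.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_SIZE 128	/* bytes handled per NEON call, as in the glue code */
#define BLOCK_SIZE 16	/* one Speck128 block */

struct walk_split {
	size_t simd_bytes;	/* handed to the assembly routine */
	size_t generic_bytes;	/* handled block by block in C */
	size_t leftover;	/* reported back to the walk as unprocessed */
};

static struct walk_split split_walk_step(size_t nbytes, int simd_usable)
{
	struct walk_split s = { 0, 0, 0 };

	if (nbytes >= CHUNK_SIZE && simd_usable) {
		/* same effect as round_down(nbytes, CHUNK_SIZE) */
		s.simd_bytes = nbytes - (nbytes % CHUNK_SIZE);
		nbytes -= s.simd_bytes;
	}
	s.generic_bytes = nbytes - (nbytes % BLOCK_SIZE);
	s.leftover = nbytes - s.generic_bytes;
	return s;
}

static void split_self_check(void)
{
	struct walk_split s = split_walk_step(300, 1);

	assert(s.simd_bytes == 256);
	assert(s.generic_bytes == 32);
	assert(s.leftover == 12);
}
```

When SIMD is not usable the whole step falls through to the generic loop, which is why the function above still works with `simd_usable` set to 0.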
static int speck128_xts_encrypt(struct skcipher_request *req)
{
return __speck128_xts_crypt(req, crypto_speck128_encrypt,
speck128_xts_encrypt_neon);
}
static int speck128_xts_decrypt(struct skcipher_request *req)
{
return __speck128_xts_crypt(req, crypto_speck128_decrypt,
speck128_xts_decrypt_neon);
}
static int speck128_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
unsigned int keylen)
{
struct speck128_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
int err;
err = xts_verify_key(tfm, key, keylen);
if (err)
return err;
keylen /= 2;
err = crypto_speck128_setkey(&ctx->main_key, key, keylen);
if (err)
return err;
return crypto_speck128_setkey(&ctx->tweak_key, key + keylen, keylen);
}
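The setkey function above treats the supplied key as two equal halves: the first half keys the data cipher and the second half keys the tweak cipher. A minimal sketch of just that split, with hypothetical names and a plain -1 in place of the kernel's error codes, might look like this. Rejecting odd lengths is an assumption about what the verification step does; other checks are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct xts_key_halves {
	const uint8_t *main_key;	/* first half: encrypts the data blocks */
	const uint8_t *tweak_key;	/* second half: encrypts the IV into the first tweak */
	size_t half_len;
};

/* Returns 0 on success, -1 if the combined key cannot be split evenly. */
static int split_xts_key(const uint8_t *key, size_t keylen,
			 struct xts_key_halves *out)
{
	if (keylen == 0 || keylen % 2 != 0)
		return -1;

	out->half_len = keylen / 2;
	out->main_key = key;
	out->tweak_key = key + out->half_len;
	return 0;
}

static void split_key_self_check(void)
{
	static const uint8_t key[64] = { 0 };
	struct xts_key_halves h;

	assert(split_xts_key(key, sizeof(key), &h) == 0);
	assert(h.half_len == 32);
	assert(h.main_key == key && h.tweak_key == key + 32);
	assert(split_xts_key(key, 33, &h) == -1);
}
```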
/* Speck64 */
struct speck64_xts_tfm_ctx {
struct speck64_tfm_ctx main_key;
struct speck64_tfm_ctx tweak_key;
};
asmlinkage void speck64_xts_encrypt_neon(const u32 *round_keys, int nrounds,
void *dst, const void *src,
unsigned int nbytes, void *tweak);
asmlinkage void speck64_xts_decrypt_neon(const u32 *round_keys, int nrounds,
void *dst, const void *src,
unsigned int nbytes, void *tweak);
typedef void (*speck64_crypt_one_t)(const struct speck64_tfm_ctx *,
u8 *, const u8 *);
typedef void (*speck64_xts_crypt_many_t)(const u32 *, int, void *,
const void *, unsigned int, void *);
static __always_inline int
__speck64_xts_crypt(struct skcipher_request *req, speck64_crypt_one_t crypt_one,
speck64_xts_crypt_many_t crypt_many)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
const struct speck64_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
struct skcipher_walk walk;
__le64 tweak;
int err;
err = skcipher_walk_virt(&walk, req, true);
crypto_speck64_encrypt(&ctx->tweak_key, (u8 *)&tweak, walk.iv);
while (walk.nbytes > 0) {
unsigned int nbytes = walk.nbytes;
u8 *dst = walk.dst.virt.addr;
const u8 *src = walk.src.virt.addr;
if (nbytes >= SPECK_NEON_CHUNK_SIZE && may_use_simd()) {
unsigned int count;
count = round_down(nbytes, SPECK_NEON_CHUNK_SIZE);
kernel_neon_begin();
(*crypt_many)(ctx->main_key.round_keys,
ctx->main_key.nrounds,
dst, src, count, &tweak);
kernel_neon_end();
dst += count;
src += count;
nbytes -= count;
}
/* Handle any remainder with generic code */
while (nbytes >= sizeof(tweak)) {
*(__le64 *)dst = *(__le64 *)src ^ tweak;
(*crypt_one)(&ctx->main_key, dst, dst);
*(__le64 *)dst ^= tweak;
tweak = cpu_to_le64((le64_to_cpu(tweak) << 1) ^
((tweak & cpu_to_le64(1ULL << 63)) ?
0x1B : 0));
dst += sizeof(tweak);
src += sizeof(tweak);
nbytes -= sizeof(tweak);
}
err = skcipher_walk_done(&walk, nbytes);
}
return err;
}
static int speck64_xts_encrypt(struct skcipher_request *req)
{
return __speck64_xts_crypt(req, crypto_speck64_encrypt,
speck64_xts_encrypt_neon);
}
static int speck64_xts_decrypt(struct skcipher_request *req)
{
return __speck64_xts_crypt(req, crypto_speck64_decrypt,
speck64_xts_decrypt_neon);
}
static int speck64_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
unsigned int keylen)
{
struct speck64_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
int err;
err = xts_verify_key(tfm, key, keylen);
if (err)
return err;
keylen /= 2;
err = crypto_speck64_setkey(&ctx->main_key, key, keylen);
if (err)
return err;
return crypto_speck64_setkey(&ctx->tweak_key, key + keylen, keylen);
}
static struct skcipher_alg speck_algs[] = {
{
.base.cra_name = "xts(speck128)",
.base.cra_driver_name = "xts-speck128-neon",
.base.cra_priority = 300,
.base.cra_blocksize = SPECK128_BLOCK_SIZE,
.base.cra_ctxsize = sizeof(struct speck128_xts_tfm_ctx),
.base.cra_alignmask = 7,
.base.cra_module = THIS_MODULE,
.min_keysize = 2 * SPECK128_128_KEY_SIZE,
.max_keysize = 2 * SPECK128_256_KEY_SIZE,
.ivsize = SPECK128_BLOCK_SIZE,
.walksize = SPECK_NEON_CHUNK_SIZE,
.setkey = speck128_xts_setkey,
.encrypt = speck128_xts_encrypt,
.decrypt = speck128_xts_decrypt,
}, {
.base.cra_name = "xts(speck64)",
.base.cra_driver_name = "xts-speck64-neon",
.base.cra_priority = 300,
.base.cra_blocksize = SPECK64_BLOCK_SIZE,
.base.cra_ctxsize = sizeof(struct speck64_xts_tfm_ctx),
.base.cra_alignmask = 7,
.base.cra_module = THIS_MODULE,
.min_keysize = 2 * SPECK64_96_KEY_SIZE,
.max_keysize = 2 * SPECK64_128_KEY_SIZE,
.ivsize = SPECK64_BLOCK_SIZE,
.walksize = SPECK_NEON_CHUNK_SIZE,
.setkey = speck64_xts_setkey,
.encrypt = speck64_xts_encrypt,
.decrypt = speck64_xts_decrypt,
}
};
static int __init speck_neon_module_init(void)
{
if (!(elf_hwcap & HWCAP_ASIMD))
return -ENODEV;
return crypto_register_skciphers(speck_algs, ARRAY_SIZE(speck_algs));
}
static void __exit speck_neon_module_exit(void)
{
crypto_unregister_skciphers(speck_algs, ARRAY_SIZE(speck_algs));
}
module_init(speck_neon_module_init);
module_exit(speck_neon_module_exit);
MODULE_DESCRIPTION("Speck block cipher (NEON-accelerated)");
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>");
MODULE_ALIAS_CRYPTO("xts(speck128)");
MODULE_ALIAS_CRYPTO("xts-speck128-neon");
MODULE_ALIAS_CRYPTO("xts(speck64)");
MODULE_ALIAS_CRYPTO("xts-speck64-neon");
This diff is collapsed.
......@@ -72,6 +72,21 @@ struct aesni_xts_ctx {
u8 raw_crypt_ctx[sizeof(struct crypto_aes_ctx)] AESNI_ALIGN_ATTR;
};
#define GCM_BLOCK_LEN 16
struct gcm_context_data {
/* init, update and finalize context data */
u8 aad_hash[GCM_BLOCK_LEN];
u64 aad_length;
u64 in_length;
u8 partial_block_enc_key[GCM_BLOCK_LEN];
u8 orig_IV[GCM_BLOCK_LEN];
u8 current_counter[GCM_BLOCK_LEN];
u64 partial_block_len;
u64 unused;
u8 hash_keys[GCM_BLOCK_LEN * 8];
};
asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
unsigned int key_len);
asmlinkage void aesni_enc(struct crypto_aes_ctx *ctx, u8 *out,
......@@ -105,6 +120,7 @@ asmlinkage void aesni_xts_crypt8(struct crypto_aes_ctx *ctx, u8 *out,
/* asmlinkage void aesni_gcm_enc()
* void *ctx, AES Key schedule. Starts on a 16 byte boundary.
* struct gcm_context_data. May be uninitialized.
* u8 *out, Ciphertext output. Encrypt in-place is allowed.
* const u8 *in, Plaintext input
* unsigned long plaintext_len, Length of data in bytes for encryption.
......@@ -117,13 +133,15 @@ asmlinkage void aesni_xts_crypt8(struct crypto_aes_ctx *ctx, u8 *out,
* unsigned long auth_tag_len), Authenticated Tag Length in bytes.
* Valid values are 16 (most likely), 12 or 8.
*/
-asmlinkage void aesni_gcm_enc(void *ctx, u8 *out,
+asmlinkage void aesni_gcm_enc(void *ctx,
+struct gcm_context_data *gdata, u8 *out,
const u8 *in, unsigned long plaintext_len, u8 *iv,
u8 *hash_subkey, const u8 *aad, unsigned long aad_len,
u8 *auth_tag, unsigned long auth_tag_len);
/* asmlinkage void aesni_gcm_dec()
* void *ctx, AES Key schedule. Starts on a 16 byte boundary.
* struct gcm_context_data. May be uninitialized.
* u8 *out, Plaintext output. Decrypt in-place is allowed.
* const u8 *in, Ciphertext input
* unsigned long ciphertext_len, Length of data in bytes for decryption.
......@@ -137,11 +155,28 @@ asmlinkage void aesni_gcm_enc(void *ctx, u8 *out,
* unsigned long auth_tag_len) Authenticated Tag Length in bytes.
* Valid values are 16 (most likely), 12 or 8.
*/
-asmlinkage void aesni_gcm_dec(void *ctx, u8 *out,
+asmlinkage void aesni_gcm_dec(void *ctx,
+struct gcm_context_data *gdata, u8 *out,
const u8 *in, unsigned long ciphertext_len, u8 *iv,
u8 *hash_subkey, const u8 *aad, unsigned long aad_len,
u8 *auth_tag, unsigned long auth_tag_len);
/* Scatter / Gather routines, with args similar to above */
asmlinkage void aesni_gcm_init(void *ctx,
struct gcm_context_data *gdata,
u8 *iv,
u8 *hash_subkey, const u8 *aad,
unsigned long aad_len);
asmlinkage void aesni_gcm_enc_update(void *ctx,
struct gcm_context_data *gdata, u8 *out,
const u8 *in, unsigned long plaintext_len);
asmlinkage void aesni_gcm_dec_update(void *ctx,
struct gcm_context_data *gdata, u8 *out,
const u8 *in,
unsigned long ciphertext_len);
asmlinkage void aesni_gcm_finalize(void *ctx,
struct gcm_context_data *gdata,
u8 *auth_tag, unsigned long auth_tag_len);
#ifdef CONFIG_AS_AVX
asmlinkage void aes_ctr_enc_128_avx_by8(const u8 *in, u8 *iv,
......@@ -167,14 +202,16 @@ asmlinkage void aesni_gcm_dec_avx_gen2(void *ctx, u8 *out,
const u8 *aad, unsigned long aad_len,
u8 *auth_tag, unsigned long auth_tag_len);
-static void aesni_gcm_enc_avx(void *ctx, u8 *out,
+static void aesni_gcm_enc_avx(void *ctx,
+struct gcm_context_data *data, u8 *out,
const u8 *in, unsigned long plaintext_len, u8 *iv,
u8 *hash_subkey, const u8 *aad, unsigned long aad_len,
u8 *auth_tag, unsigned long auth_tag_len)
{
struct crypto_aes_ctx *aes_ctx = (struct crypto_aes_ctx*)ctx;
if ((plaintext_len < AVX_GEN2_OPTSIZE) || (aes_ctx-> key_length != AES_KEYSIZE_128)){
-aesni_gcm_enc(ctx, out, in, plaintext_len, iv, hash_subkey, aad,
+aesni_gcm_enc(ctx, data, out, in,
+plaintext_len, iv, hash_subkey, aad,
aad_len, auth_tag, auth_tag_len);
} else {
aesni_gcm_precomp_avx_gen2(ctx, hash_subkey);
......@@ -183,14 +220,16 @@ static void aesni_gcm_enc_avx(void *ctx, u8 *out,
}
}
-static void aesni_gcm_dec_avx(void *ctx, u8 *out,
+static void aesni_gcm_dec_avx(void *ctx,
+struct gcm_context_data *data, u8 *out,
const u8 *in, unsigned long ciphertext_len, u8 *iv,
u8 *hash_subkey, const u8 *aad, unsigned long aad_len,
u8 *auth_tag, unsigned long auth_tag_len)
{
struct crypto_aes_ctx *aes_ctx = (struct crypto_aes_ctx*)ctx;
if ((ciphertext_len < AVX_GEN2_OPTSIZE) || (aes_ctx-> key_length != AES_KEYSIZE_128)) {
-aesni_gcm_dec(ctx, out, in, ciphertext_len, iv, hash_subkey, aad,
+aesni_gcm_dec(ctx, data, out, in,
+ciphertext_len, iv, hash_subkey, aad,
aad_len, auth_tag, auth_tag_len);
} else {
aesni_gcm_precomp_avx_gen2(ctx, hash_subkey);
......@@ -218,14 +257,16 @@ asmlinkage void aesni_gcm_dec_avx_gen4(void *ctx, u8 *out,
const u8 *aad, unsigned long aad_len,
u8 *auth_tag, unsigned long auth_tag_len);
-static void aesni_gcm_enc_avx2(void *ctx, u8 *out,
+static void aesni_gcm_enc_avx2(void *ctx,
+struct gcm_context_data *data, u8 *out,
const u8 *in, unsigned long plaintext_len, u8 *iv,
u8 *hash_subkey, const u8 *aad, unsigned long aad_len,
u8 *auth_tag, unsigned long auth_tag_len)
{
struct crypto_aes_ctx *aes_ctx = (struct crypto_aes_ctx*)ctx;
if ((plaintext_len < AVX_GEN2_OPTSIZE) || (aes_ctx-> key_length != AES_KEYSIZE_128)) {
-aesni_gcm_enc(ctx, out, in, plaintext_len, iv, hash_subkey, aad,
+aesni_gcm_enc(ctx, data, out, in,
+plaintext_len, iv, hash_subkey, aad,
aad_len, auth_tag, auth_tag_len);
} else if (plaintext_len < AVX_GEN4_OPTSIZE) {
aesni_gcm_precomp_avx_gen2(ctx, hash_subkey);
......@@ -238,14 +279,16 @@ static void aesni_gcm_enc_avx2(void *ctx, u8 *out,
}
}
-static void aesni_gcm_dec_avx2(void *ctx, u8 *out,
+static void aesni_gcm_dec_avx2(void *ctx,
+struct gcm_context_data *data, u8 *out,
const u8 *in, unsigned long ciphertext_len, u8 *iv,
u8 *hash_subkey, const u8 *aad, unsigned long aad_len,
u8 *auth_tag, unsigned long auth_tag_len)
{
struct crypto_aes_ctx *aes_ctx = (struct crypto_aes_ctx*)ctx;
if ((ciphertext_len < AVX_GEN2_OPTSIZE) || (aes_ctx-> key_length != AES_KEYSIZE_128)) {
-aesni_gcm_dec(ctx, out, in, ciphertext_len, iv, hash_subkey,
+aesni_gcm_dec(ctx, data, out, in,
+ciphertext_len, iv, hash_subkey,
aad, aad_len, auth_tag, auth_tag_len);
} else if (ciphertext_len < AVX_GEN4_OPTSIZE) {
aesni_gcm_precomp_avx_gen2(ctx, hash_subkey);
......@@ -259,15 +302,19 @@ static void aesni_gcm_dec_avx2(void *ctx, u8 *out,
}
#endif
-static void (*aesni_gcm_enc_tfm)(void *ctx, u8 *out,
-const u8 *in, unsigned long plaintext_len, u8 *iv,
-u8 *hash_subkey, const u8 *aad, unsigned long aad_len,
-u8 *auth_tag, unsigned long auth_tag_len);
+static void (*aesni_gcm_enc_tfm)(void *ctx,
+struct gcm_context_data *data, u8 *out,
+const u8 *in, unsigned long plaintext_len,
+u8 *iv, u8 *hash_subkey, const u8 *aad,
+unsigned long aad_len, u8 *auth_tag,
+unsigned long auth_tag_len);
-static void (*aesni_gcm_dec_tfm)(void *ctx, u8 *out,
-const u8 *in, unsigned long ciphertext_len, u8 *iv,
-u8 *hash_subkey, const u8 *aad, unsigned long aad_len,
-u8 *auth_tag, unsigned long auth_tag_len);
+static void (*aesni_gcm_dec_tfm)(void *ctx,
+struct gcm_context_data *data, u8 *out,
+const u8 *in, unsigned long ciphertext_len,
+u8 *iv, u8 *hash_subkey, const u8 *aad,
+unsigned long aad_len, u8 *auth_tag,
+unsigned long auth_tag_len);
static inline struct
aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
......@@ -744,6 +791,127 @@ static int generic_gcmaes_set_authsize(struct crypto_aead *tfm,
return 0;
}
static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
unsigned int assoclen, u8 *hash_subkey,
u8 *iv, void *aes_ctx)
{
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
unsigned long auth_tag_len = crypto_aead_authsize(tfm);
struct gcm_context_data data AESNI_ALIGN_ATTR;
struct scatter_walk dst_sg_walk = {};
unsigned long left = req->cryptlen;
unsigned long len, srclen, dstlen;
struct scatter_walk assoc_sg_walk;
struct scatter_walk src_sg_walk;
struct scatterlist src_start[2];
struct scatterlist dst_start[2];
struct scatterlist *src_sg;
struct scatterlist *dst_sg;
u8 *src, *dst, *assoc;
u8 *assocmem = NULL;
u8 authTag[16];
if (!enc)
left -= auth_tag_len;
/* Linearize assoc, if not already linear */
if (req->src->length >= assoclen && req->src->length &&
(!PageHighMem(sg_page(req->src)) ||
req->src->offset + req->src->length < PAGE_SIZE)) {
scatterwalk_start(&assoc_sg_walk, req->src);
assoc = scatterwalk_map(&assoc_sg_walk);
} else {
/* assoc can be any length, so must be on heap */
assocmem = kmalloc(assoclen, GFP_ATOMIC);
if (unlikely(!assocmem))
return -ENOMEM;
assoc = assocmem;
scatterwalk_map_and_copy(assoc, req->src, 0, assoclen, 0);
}
src_sg = scatterwalk_ffwd(src_start, req->src, req->assoclen);
scatterwalk_start(&src_sg_walk, src_sg);
if (req->src != req->dst) {
dst_sg = scatterwalk_ffwd(dst_start, req->dst, req->assoclen);
scatterwalk_start(&dst_sg_walk, dst_sg);
}
kernel_fpu_begin();
aesni_gcm_init(aes_ctx, &data, iv,
hash_subkey, assoc, assoclen);
if (req->src != req->dst) {
while (left) {
src = scatterwalk_map(&src_sg_walk);
dst = scatterwalk_map(&dst_sg_walk);
srclen = scatterwalk_clamp(&src_sg_walk, left);
dstlen = scatterwalk_clamp(&dst_sg_walk, left);
len = min(srclen, dstlen);
if (len) {
if (enc)
aesni_gcm_enc_update(aes_ctx, &data,
dst, src, len);
else
aesni_gcm_dec_update(aes_ctx, &data,
dst, src, len);
}
left -= len;
scatterwalk_unmap(src);
scatterwalk_unmap(dst);
scatterwalk_advance(&src_sg_walk, len);
scatterwalk_advance(&dst_sg_walk, len);
scatterwalk_done(&src_sg_walk, 0, left);
scatterwalk_done(&dst_sg_walk, 1, left);
}
} else {
while (left) {
dst = src = scatterwalk_map(&src_sg_walk);
len = scatterwalk_clamp(&src_sg_walk, left);
if (len) {
if (enc)
aesni_gcm_enc_update(aes_ctx, &data,
src, src, len);
else
aesni_gcm_dec_update(aes_ctx, &data,
src, src, len);
}
left -= len;
scatterwalk_unmap(src);
scatterwalk_advance(&src_sg_walk, len);
scatterwalk_done(&src_sg_walk, 1, left);
}
}
aesni_gcm_finalize(aes_ctx, &data, authTag, auth_tag_len);
kernel_fpu_end();
if (!assocmem)
scatterwalk_unmap(assoc);
else
kfree(assocmem);
if (!enc) {
u8 authTagMsg[16];
/* Copy out original authTag */
scatterwalk_map_and_copy(authTagMsg, req->src,
req->assoclen + req->cryptlen -
auth_tag_len,
auth_tag_len, 0);
/* Compare generated tag with passed in tag. */
return crypto_memneq(authTagMsg, authTag, auth_tag_len) ?
-EBADMSG : 0;
}
/* Copy in the authTag */
scatterwalk_map_and_copy(authTag, req->dst,
req->assoclen + req->cryptlen,
auth_tag_len, 1);
return 0;
}
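The core of the scatter/gather path above is a loop that walks the source and destination lists together and, at each step, processes the largest span that is contiguous in both. The following user-space sketch models only that bookkeeping with arrays of segment lengths; `struct seg_walk`, `walk_clamp`, `walk_advance` and `plan_updates` are hypothetical stand-ins, not the kernel scatterwalk API, and segments are assumed to be non-empty.

```c
#include <assert.h>
#include <stddef.h>

struct seg_walk {
	const size_t *seg_len;	/* length of each segment, all assumed > 0 */
	size_t nsegs;
	size_t idx;		/* current segment */
	size_t off;		/* bytes already consumed in the current segment */
};

/* Bytes available in the current segment, limited to what is still needed. */
static size_t walk_clamp(const struct seg_walk *w, size_t left)
{
	size_t avail;

	if (w->idx >= w->nsegs)
		return 0;
	avail = w->seg_len[w->idx] - w->off;
	return avail < left ? avail : left;
}

static void walk_advance(struct seg_walk *w, size_t len)
{
	w->off += len;
	if (w->idx < w->nsegs && w->off == w->seg_len[w->idx]) {
		w->idx++;
		w->off = 0;
	}
}

/*
 * Record the length of each "update" call needed to cover `total` bytes.
 * Returns the number of calls; at most max_out lengths are stored.
 */
static size_t plan_updates(const size_t *src_segs, size_t nsrc,
			   const size_t *dst_segs, size_t ndst,
			   size_t total, size_t *out_lens, size_t max_out)
{
	struct seg_walk src = { src_segs, nsrc, 0, 0 };
	struct seg_walk dst = { dst_segs, ndst, 0, 0 };
	size_t left = total, n = 0;

	while (left) {
		size_t srclen = walk_clamp(&src, left);
		size_t dstlen = walk_clamp(&dst, left);
		size_t len = srclen < dstlen ? srclen : dstlen;

		if (len == 0)
			break;	/* lists too short: stop rather than spin */
		if (n < max_out)
			out_lens[n] = len;
		n++;
		left -= len;
		walk_advance(&src, len);
		walk_advance(&dst, len);
	}
	return n;
}

static void plan_self_check(void)
{
	const size_t src_segs[] = { 5, 11 };
	const size_t dst_segs[] = { 8, 8 };
	size_t lens[4] = { 0 };

	assert(plan_updates(src_segs, 2, dst_segs, 2, 16, lens, 4) == 3);
	assert(lens[0] == 5 && lens[1] == 3 && lens[2] == 8);
}
```

Because the GCM state is carried in a context between calls, splitting the data at these boundaries should not change the result, which is what makes the init/update/finalize interface suitable for scattered buffers.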
static int gcmaes_encrypt(struct aead_request *req, unsigned int assoclen,
u8 *hash_subkey, u8 *iv, void *aes_ctx)
{
......@@ -753,7 +921,14 @@ static int gcmaes_encrypt(struct aead_request *req, unsigned int assoclen,
unsigned long auth_tag_len = crypto_aead_authsize(tfm);
struct scatter_walk src_sg_walk;
struct scatter_walk dst_sg_walk = {};
struct gcm_context_data data AESNI_ALIGN_ATTR;
if (((struct crypto_aes_ctx *)aes_ctx)->key_length != AES_KEYSIZE_128 ||
aesni_gcm_enc_tfm == aesni_gcm_enc ||
req->cryptlen < AVX_GEN2_OPTSIZE) {
return gcmaes_crypt_by_sg(true, req, assoclen, hash_subkey, iv,
aes_ctx);
}
if (sg_is_last(req->src) &&
(!PageHighMem(sg_page(req->src)) ||
req->src->offset + req->src->length <= PAGE_SIZE) &&
......@@ -782,7 +957,7 @@ static int gcmaes_encrypt(struct aead_request *req, unsigned int assoclen,
}
kernel_fpu_begin();
-aesni_gcm_enc_tfm(aes_ctx, dst, src, req->cryptlen, iv,
+aesni_gcm_enc_tfm(aes_ctx, &data, dst, src, req->cryptlen, iv,
hash_subkey, assoc, assoclen,
dst + req->cryptlen, auth_tag_len);
kernel_fpu_end();
......@@ -817,8 +992,15 @@ static int gcmaes_decrypt(struct aead_request *req, unsigned int assoclen,
u8 authTag[16];
struct scatter_walk src_sg_walk;
struct scatter_walk dst_sg_walk = {};
struct gcm_context_data data AESNI_ALIGN_ATTR;
int retval = 0;
if (((struct crypto_aes_ctx *)aes_ctx)->key_length != AES_KEYSIZE_128 ||
aesni_gcm_enc_tfm == aesni_gcm_enc ||
req->cryptlen < AVX_GEN2_OPTSIZE) {
return gcmaes_crypt_by_sg(false, req, assoclen, hash_subkey, iv,
aes_ctx);
}
tempCipherLen = (unsigned long)(req->cryptlen - auth_tag_len);
if (sg_is_last(req->src) &&
......@@ -849,7 +1031,7 @@ static int gcmaes_decrypt(struct aead_request *req, unsigned int assoclen,
kernel_fpu_begin();
-aesni_gcm_dec_tfm(aes_ctx, dst, src, tempCipherLen, iv,
+aesni_gcm_dec_tfm(aes_ctx, &data, dst, src, tempCipherLen, iv,
hash_subkey, assoc, assoclen,
authTag, auth_tag_len);
kernel_fpu_end();
......
This diff is collapsed. (9 entries)
......@@ -106,13 +106,6 @@ static asmlinkage struct job_sha1* (*sha1_job_mgr_flush)
static asmlinkage struct job_sha1* (*sha1_job_mgr_get_comp_job)
(struct sha1_mb_mgr *state);
-static inline void sha1_init_digest(uint32_t *digest)
-{
-static const uint32_t initial_digest[SHA1_DIGEST_LENGTH] = {SHA1_H0,
-SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4 };
-memcpy(digest, initial_digest, sizeof(initial_digest));
-}
static inline uint32_t sha1_pad(uint8_t padblock[SHA1_BLOCK_SIZE * 2],
uint64_t total_len)
{
......@@ -244,11 +237,8 @@ static struct sha1_hash_ctx *sha1_ctx_mgr_submit(struct sha1_ctx_mgr *mgr,
uint32_t len,
int flags)
{
-if (flags & (~HASH_ENTIRE)) {
-/*
-* User should not pass anything other than FIRST, UPDATE, or
-* LAST
-*/
+if (flags & ~(HASH_UPDATE | HASH_LAST)) {
+/* User should not pass anything other than UPDATE or LAST */
ctx->error = HASH_CTX_ERROR_INVALID_FLAGS;
return ctx;
}
......@@ -259,24 +249,12 @@ static struct sha1_hash_ctx *sha1_ctx_mgr_submit(struct sha1_ctx_mgr *mgr,
return ctx;
}
-if ((ctx->status & HASH_CTX_STS_COMPLETE) && !(flags & HASH_FIRST)) {
+if (ctx->status & HASH_CTX_STS_COMPLETE) {
/* Cannot update a finished job. */
ctx->error = HASH_CTX_ERROR_ALREADY_COMPLETED;
return ctx;
}
-if (flags & HASH_FIRST) {
-/* Init digest */
-sha1_init_digest(ctx->job.result_digest);
-/* Reset byte counter */
-ctx->total_length = 0;
-/* Clear extra blocks */
-ctx->partial_block_buffer_length = 0;
-}
/*
* If we made it here, there were no errors during this call to
* submit
......
......@@ -57,11 +57,9 @@
#include "sha1_mb_mgr.h"
#define HASH_UPDATE 0x00
-#define HASH_FIRST 0x01
-#define HASH_LAST 0x02
-#define HASH_ENTIRE 0x03
-#define HASH_DONE 0x04
-#define HASH_FINAL 0x08
+#define HASH_LAST 0x01
+#define HASH_DONE 0x02
+#define HASH_FINAL 0x04
#define HASH_CTX_STS_IDLE 0x00
#define HASH_CTX_STS_PROCESSING 0x01
......
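With the renumbered flags, the submit-time check reduces to a mask test: any bit other than `HASH_LAST` is rejected, and `HASH_UPDATE` contributes nothing to the mask because its value is zero. A minimal sketch using the new values from the hunk above (the function name `submit_flags_valid` is hypothetical):

```c
#include <assert.h>

#define HASH_UPDATE 0x00
#define HASH_LAST   0x01
#define HASH_DONE   0x02
#define HASH_FINAL  0x04

/* 1 if flags contains only bits a caller may pass at submit time. */
static int submit_flags_valid(int flags)
{
	return (flags & ~(HASH_UPDATE | HASH_LAST)) == 0;
}

static void flags_self_check(void)
{
	assert(submit_flags_valid(HASH_UPDATE));
	assert(submit_flags_valid(HASH_LAST));
	assert(!submit_flags_valid(HASH_DONE));
	assert(!submit_flags_valid(HASH_LAST | HASH_FINAL));
}
```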
This diff is collapsed. (10 entries)
......@@ -271,7 +271,7 @@ static int crypto_report(struct sk_buff *in_skb, struct nlmsghdr *in_nlh,
return -ENOENT;
err = -ENOMEM;
-skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
+skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
if (!skb)
goto drop_alg;
......
This diff is collapsed. (108 entries)