提交 62606c22 编写于 作者: L Linus Torvalds

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto updates from Herbert Xu:
 "API:
   - Remove VLA usage
   - Add cryptostat user-space interface
   - Add notifier for new crypto algorithms

  Algorithms:
   - Add OFB mode
   - Remove speck

  Drivers:
   - Remove x86/sha*-mb as they are buggy
   - Remove pcbc(aes) from x86/aesni
   - Improve performance of arm/ghash-ce by up to 85%
   - Implement CTS-CBC in arm64/aes-blk, faster by up to 50%
   - Remove PMULL based arm64/crc32 driver
   - Use PMULL in arm64/crct10dif
   - Add aes-ctr support in s5p-sss
   - Add caam/qi2 driver

  Others:
   - Pick better transform if one becomes available in crc-t10dif"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (124 commits)
  crypto: chelsio - Update ntx queue received from cxgb4
  crypto: ccree - avoid implicit enum conversion
  crypto: caam - add SPDX license identifier to all files
  crypto: caam/qi - simplify CGR allocation, freeing
  crypto: mxs-dcp - make symbols 'sha1_null_hash' and 'sha256_null_hash' static
  crypto: arm64/aes-blk - ensure XTS mask is always loaded
  crypto: testmgr - fix sizeof() on COMP_BUF_SIZE
  crypto: chtls - remove set but not used variable 'csk'
  crypto: axis - fix platform_no_drv_owner.cocci warnings
  crypto: x86/aes-ni - fix build error following fpu template removal
  crypto: arm64/aes - fix handling sub-block CTS-CBC inputs
  crypto: caam/qi2 - avoid double export
  crypto: mxs-dcp - Fix AES issues
  crypto: mxs-dcp - Fix SHA null hashes and output length
  crypto: mxs-dcp - Implement sha import/export
  crypto: aegis/generic - fix for big endian systems
  crypto: morus/generic - fix for big endian systems
  crypto: lrw - fix rebase error after out of bounds fix
  crypto: cavium/nitrox - use pci_alloc_irq_vectors() while enabling MSI-X.
  crypto: cavium/nitrox - NITROX command queue changes.
  ...
无相关合并请求
......@@ -191,21 +191,11 @@ Currently, the following pairs of encryption modes are supported:
- AES-256-XTS for contents and AES-256-CTS-CBC for filenames
- AES-128-CBC for contents and AES-128-CTS-CBC for filenames
- Speck128/256-XTS for contents and Speck128/256-CTS-CBC for filenames
It is strongly recommended to use AES-256-XTS for contents encryption.
AES-128-CBC was added only for low-powered embedded devices with
crypto accelerators such as CAAM or CESA that do not support XTS.
Similarly, Speck128/256 support was only added for older or low-end
CPUs which cannot do AES fast enough -- especially ARM CPUs which have
NEON instructions but not the Cryptography Extensions -- and for which
it would not otherwise be feasible to use encryption at all. It is
not recommended to use Speck on CPUs that have AES instructions.
Speck support is only available if it has been enabled in the crypto
API via CONFIG_CRYPTO_SPECK. Also, on ARM platforms, to get
acceptable performance CONFIG_CRYPTO_SPECK_NEON must be enabled.
New encryption modes can be added relatively easily, without changes
to individual filesystems. However, authenticated encryption (AE)
modes are not currently supported because of the difficulty of dealing
......
......@@ -7578,14 +7578,6 @@ S: Supported
F: drivers/infiniband/hw/i40iw/
F: include/uapi/rdma/i40iw-abi.h
INTEL SHA MULTIBUFFER DRIVER
M: Megha Dey <megha.dey@linux.intel.com>
R: Tim Chen <tim.c.chen@linux.intel.com>
L: linux-crypto@vger.kernel.org
S: Supported
F: arch/x86/crypto/sha*-mb/
F: crypto/mcryptd.c
INTEL TELEMETRY DRIVER
M: Souvik Kumar Chakravarty <souvik.k.chakravarty@intel.com>
L: platform-driver-x86@vger.kernel.org
......
......@@ -99,6 +99,7 @@ config CRYPTO_GHASH_ARM_CE
depends on KERNEL_MODE_NEON
select CRYPTO_HASH
select CRYPTO_CRYPTD
select CRYPTO_GF128MUL
help
Use an implementation of GHASH (used by the GCM AEAD chaining mode)
that uses the 64x64 to 128 bit polynomial multiplication (vmull.p64)
......@@ -121,10 +122,4 @@ config CRYPTO_CHACHA20_NEON
select CRYPTO_BLKCIPHER
select CRYPTO_CHACHA20
config CRYPTO_SPECK_NEON
tristate "NEON accelerated Speck cipher algorithms"
depends on KERNEL_MODE_NEON
select CRYPTO_BLKCIPHER
select CRYPTO_SPECK
endif
......@@ -10,7 +10,6 @@ obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o
obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o
obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
obj-$(CONFIG_CRYPTO_SPECK_NEON) += speck-neon.o
ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
ce-obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o
......@@ -54,7 +53,6 @@ ghash-arm-ce-y := ghash-ce-core.o ghash-ce-glue.o
crct10dif-arm-ce-y := crct10dif-ce-core.o crct10dif-ce-glue.o
crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o
chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
speck-neon-y := speck-neon-core.o speck-neon-glue.o
ifdef REGENERATE_ARM_CRYPTO
quiet_cmd_perl = PERL $@
......
......@@ -18,6 +18,34 @@
* (at your option) any later version.
*/
/*
* NEON doesn't have a rotate instruction. The alternatives are, more or less:
*
* (a) vshl.u32 + vsri.u32 (needs temporary register)
* (b) vshl.u32 + vshr.u32 + vorr (needs temporary register)
* (c) vrev32.16 (16-bit rotations only)
* (d) vtbl.8 + vtbl.8 (multiple of 8 bits rotations only,
* needs index vector)
*
* ChaCha20 has 16, 12, 8, and 7-bit rotations. For the 12 and 7-bit
* rotations, the only choices are (a) and (b). We use (a) since it takes
* two-thirds the cycles of (b) on both Cortex-A7 and Cortex-A53.
*
* For the 16-bit rotation, we use vrev32.16 since it's consistently fastest
* and doesn't need a temporary register.
*
* For the 8-bit rotation, we use vtbl.8 + vtbl.8. On Cortex-A7, this sequence
* is twice as fast as (a), even when doing (a) on multiple registers
* simultaneously to eliminate the stall between vshl and vsri. Also, it
* parallelizes better when temporary registers are scarce.
*
* A disadvantage is that on Cortex-A53, the vtbl sequence is the same speed as
* (a), so the need to load the rotation table actually makes the vtbl method
* slightly slower overall on that CPU (~1.3% slower ChaCha20). Still, it
* seems to be a good compromise to get a more significant speed boost on some
* CPUs, e.g. ~4.8% faster ChaCha20 on Cortex-A7.
*/
#include <linux/linkage.h>
.text
......@@ -46,7 +74,9 @@ ENTRY(chacha20_block_xor_neon)
vmov q10, q2
vmov q11, q3
adr ip, .Lrol8_table
mov r3, #10
vld1.8 {d10}, [ip, :64]
.Ldoubleround:
// x0 += x1, x3 = rotl32(x3 ^ x0, 16)
......@@ -62,9 +92,9 @@ ENTRY(chacha20_block_xor_neon)
// x0 += x1, x3 = rotl32(x3 ^ x0, 8)
vadd.i32 q0, q0, q1
veor q4, q3, q0
vshl.u32 q3, q4, #8
vsri.u32 q3, q4, #24
veor q3, q3, q0
vtbl.8 d6, {d6}, d10
vtbl.8 d7, {d7}, d10
// x2 += x3, x1 = rotl32(x1 ^ x2, 7)
vadd.i32 q2, q2, q3
......@@ -92,9 +122,9 @@ ENTRY(chacha20_block_xor_neon)
// x0 += x1, x3 = rotl32(x3 ^ x0, 8)
vadd.i32 q0, q0, q1
veor q4, q3, q0
vshl.u32 q3, q4, #8
vsri.u32 q3, q4, #24
veor q3, q3, q0
vtbl.8 d6, {d6}, d10
vtbl.8 d7, {d7}, d10
// x2 += x3, x1 = rotl32(x1 ^ x2, 7)
vadd.i32 q2, q2, q3
......@@ -139,13 +169,17 @@ ENTRY(chacha20_block_xor_neon)
bx lr
ENDPROC(chacha20_block_xor_neon)
.align 4
.Lctrinc: .word 0, 1, 2, 3
.Lrol8_table: .byte 3, 0, 1, 2, 7, 4, 5, 6
.align 5
ENTRY(chacha20_4block_xor_neon)
push {r4-r6, lr}
mov ip, sp // preserve the stack pointer
sub r3, sp, #0x20 // allocate a 32 byte buffer
bic r3, r3, #0x1f // aligned to 32 bytes
mov sp, r3
push {r4-r5}
mov r4, sp // preserve the stack pointer
sub ip, sp, #0x20 // allocate a 32 byte buffer
bic ip, ip, #0x1f // aligned to 32 bytes
mov sp, ip
// r0: Input state matrix, s
// r1: 4 data blocks output, o
......@@ -155,25 +189,24 @@ ENTRY(chacha20_4block_xor_neon)
// This function encrypts four consecutive ChaCha20 blocks by loading
// the state matrix in NEON registers four times. The algorithm performs
// each operation on the corresponding word of each state matrix, hence
// requires no word shuffling. For final XORing step we transpose the
// matrix by interleaving 32- and then 64-bit words, which allows us to
// do XOR in NEON registers.
// requires no word shuffling. The words are re-interleaved before the
// final addition of the original state and the XORing step.
//
// x0..15[0-3] = s0..3[0..3]
add r3, r0, #0x20
// x0..15[0-3] = s0..15[0-3]
add ip, r0, #0x20
vld1.32 {q0-q1}, [r0]
vld1.32 {q2-q3}, [r3]
vld1.32 {q2-q3}, [ip]
adr r3, CTRINC
adr r5, .Lctrinc
vdup.32 q15, d7[1]
vdup.32 q14, d7[0]
vld1.32 {q11}, [r3, :128]
vld1.32 {q4}, [r5, :128]
vdup.32 q13, d6[1]
vdup.32 q12, d6[0]
vadd.i32 q12, q12, q11 // x12 += counter values 0-3
vdup.32 q11, d5[1]
vdup.32 q10, d5[0]
vadd.u32 q12, q12, q4 // x12 += counter values 0-3
vdup.32 q9, d4[1]
vdup.32 q8, d4[0]
vdup.32 q7, d3[1]
......@@ -185,9 +218,13 @@ ENTRY(chacha20_4block_xor_neon)
vdup.32 q1, d0[1]
vdup.32 q0, d0[0]
adr ip, .Lrol8_table
mov r3, #10
b 1f
.Ldoubleround4:
vld1.32 {q8-q9}, [sp, :256]
1:
// x0 += x4, x12 = rotl32(x12 ^ x0, 16)
// x1 += x5, x13 = rotl32(x13 ^ x1, 16)
// x2 += x6, x14 = rotl32(x14 ^ x2, 16)
......@@ -236,24 +273,25 @@ ENTRY(chacha20_4block_xor_neon)
// x1 += x5, x13 = rotl32(x13 ^ x1, 8)
// x2 += x6, x14 = rotl32(x14 ^ x2, 8)
// x3 += x7, x15 = rotl32(x15 ^ x3, 8)
vld1.8 {d16}, [ip, :64]
vadd.i32 q0, q0, q4
vadd.i32 q1, q1, q5
vadd.i32 q2, q2, q6
vadd.i32 q3, q3, q7
veor q8, q12, q0
veor q9, q13, q1
vshl.u32 q12, q8, #8
vshl.u32 q13, q9, #8
vsri.u32 q12, q8, #24
vsri.u32 q13, q9, #24
veor q12, q12, q0
veor q13, q13, q1
veor q14, q14, q2
veor q15, q15, q3
veor q8, q14, q2
veor q9, q15, q3
vshl.u32 q14, q8, #8
vshl.u32 q15, q9, #8
vsri.u32 q14, q8, #24
vsri.u32 q15, q9, #24
vtbl.8 d24, {d24}, d16
vtbl.8 d25, {d25}, d16
vtbl.8 d26, {d26}, d16
vtbl.8 d27, {d27}, d16
vtbl.8 d28, {d28}, d16
vtbl.8 d29, {d29}, d16
vtbl.8 d30, {d30}, d16
vtbl.8 d31, {d31}, d16
vld1.32 {q8-q9}, [sp, :256]
......@@ -332,24 +370,25 @@ ENTRY(chacha20_4block_xor_neon)
// x1 += x6, x12 = rotl32(x12 ^ x1, 8)
// x2 += x7, x13 = rotl32(x13 ^ x2, 8)
// x3 += x4, x14 = rotl32(x14 ^ x3, 8)
vld1.8 {d16}, [ip, :64]
vadd.i32 q0, q0, q5
vadd.i32 q1, q1, q6
vadd.i32 q2, q2, q7
vadd.i32 q3, q3, q4
veor q8, q15, q0
veor q9, q12, q1
vshl.u32 q15, q8, #8
vshl.u32 q12, q9, #8
vsri.u32 q15, q8, #24
vsri.u32 q12, q9, #24
veor q15, q15, q0
veor q12, q12, q1
veor q13, q13, q2
veor q14, q14, q3
veor q8, q13, q2
veor q9, q14, q3
vshl.u32 q13, q8, #8
vshl.u32 q14, q9, #8
vsri.u32 q13, q8, #24
vsri.u32 q14, q9, #24
vtbl.8 d30, {d30}, d16
vtbl.8 d31, {d31}, d16
vtbl.8 d24, {d24}, d16
vtbl.8 d25, {d25}, d16
vtbl.8 d26, {d26}, d16
vtbl.8 d27, {d27}, d16
vtbl.8 d28, {d28}, d16
vtbl.8 d29, {d29}, d16
vld1.32 {q8-q9}, [sp, :256]
......@@ -379,104 +418,76 @@ ENTRY(chacha20_4block_xor_neon)
vsri.u32 q6, q9, #25
subs r3, r3, #1
beq 0f
vld1.32 {q8-q9}, [sp, :256]
b .Ldoubleround4
// x0[0-3] += s0[0]
// x1[0-3] += s0[1]
// x2[0-3] += s0[2]
// x3[0-3] += s0[3]
0: ldmia r0!, {r3-r6}
vdup.32 q8, r3
vdup.32 q9, r4
vadd.i32 q0, q0, q8
vadd.i32 q1, q1, q9
vdup.32 q8, r5
vdup.32 q9, r6
vadd.i32 q2, q2, q8
vadd.i32 q3, q3, q9
// x4[0-3] += s1[0]
// x5[0-3] += s1[1]
// x6[0-3] += s1[2]
// x7[0-3] += s1[3]
ldmia r0!, {r3-r6}
vdup.32 q8, r3
vdup.32 q9, r4
vadd.i32 q4, q4, q8
vadd.i32 q5, q5, q9
vdup.32 q8, r5
vdup.32 q9, r6
vadd.i32 q6, q6, q8
vadd.i32 q7, q7, q9
// interleave 32-bit words in state n, n+1
vzip.32 q0, q1
vzip.32 q2, q3
vzip.32 q4, q5
vzip.32 q6, q7
// interleave 64-bit words in state n, n+2
bne .Ldoubleround4
// x0..7[0-3] are in q0-q7, x10..15[0-3] are in q10-q15.
// x8..9[0-3] are on the stack.
// Re-interleave the words in the first two rows of each block (x0..7).
// Also add the counter values 0-3 to x12[0-3].
vld1.32 {q8}, [r5, :128] // load counter values 0-3
vzip.32 q0, q1 // => (0 1 0 1) (0 1 0 1)
vzip.32 q2, q3 // => (2 3 2 3) (2 3 2 3)
vzip.32 q4, q5 // => (4 5 4 5) (4 5 4 5)
vzip.32 q6, q7 // => (6 7 6 7) (6 7 6 7)
vadd.u32 q12, q8 // x12 += counter values 0-3
vswp d1, d4
vswp d3, d6
vld1.32 {q8-q9}, [r0]! // load s0..7
vswp d9, d12
vswp d11, d14
// xor with corresponding input, write to output
// Swap q1 and q4 so that we'll free up consecutive registers (q0-q1)
// after XORing the first 32 bytes.
vswp q1, q4
// First two rows of each block are (q0 q1) (q2 q6) (q4 q5) (q3 q7)
// x0..3[0-3] += s0..3[0-3] (add orig state to 1st row of each block)
vadd.u32 q0, q0, q8
vadd.u32 q2, q2, q8
vadd.u32 q4, q4, q8
vadd.u32 q3, q3, q8
// x4..7[0-3] += s4..7[0-3] (add orig state to 2nd row of each block)
vadd.u32 q1, q1, q9
vadd.u32 q6, q6, q9
vadd.u32 q5, q5, q9
vadd.u32 q7, q7, q9
// XOR first 32 bytes using keystream from first two rows of first block
vld1.8 {q8-q9}, [r2]!
veor q8, q8, q0
veor q9, q9, q4
veor q9, q9, q1
vst1.8 {q8-q9}, [r1]!
// Re-interleave the words in the last two rows of each block (x8..15).
vld1.32 {q8-q9}, [sp, :256]
// x8[0-3] += s2[0]
// x9[0-3] += s2[1]
// x10[0-3] += s2[2]
// x11[0-3] += s2[3]
ldmia r0!, {r3-r6}
vdup.32 q0, r3
vdup.32 q4, r4
vadd.i32 q8, q8, q0
vadd.i32 q9, q9, q4
vdup.32 q0, r5
vdup.32 q4, r6
vadd.i32 q10, q10, q0
vadd.i32 q11, q11, q4
// x12[0-3] += s3[0]
// x13[0-3] += s3[1]
// x14[0-3] += s3[2]
// x15[0-3] += s3[3]
ldmia r0!, {r3-r6}
vdup.32 q0, r3
vdup.32 q4, r4
adr r3, CTRINC
vadd.i32 q12, q12, q0
vld1.32 {q0}, [r3, :128]
vadd.i32 q13, q13, q4
vadd.i32 q12, q12, q0 // x12 += counter values 0-3
vdup.32 q0, r5
vdup.32 q4, r6
vadd.i32 q14, q14, q0
vadd.i32 q15, q15, q4
// interleave 32-bit words in state n, n+1
vzip.32 q8, q9
vzip.32 q10, q11
vzip.32 q12, q13
vzip.32 q14, q15
// interleave 64-bit words in state n, n+2
vswp d17, d20
vswp d19, d22
vzip.32 q12, q13 // => (12 13 12 13) (12 13 12 13)
vzip.32 q14, q15 // => (14 15 14 15) (14 15 14 15)
vzip.32 q8, q9 // => (8 9 8 9) (8 9 8 9)
vzip.32 q10, q11 // => (10 11 10 11) (10 11 10 11)
vld1.32 {q0-q1}, [r0] // load s8..15
vswp d25, d28
vswp d27, d30
vswp d17, d20
vswp d19, d22
// Last two rows of each block are (q8 q12) (q10 q14) (q9 q13) (q11 q15)
// x8..11[0-3] += s8..11[0-3] (add orig state to 3rd row of each block)
vadd.u32 q8, q8, q0
vadd.u32 q10, q10, q0
vadd.u32 q9, q9, q0
vadd.u32 q11, q11, q0
// x12..15[0-3] += s12..15[0-3] (add orig state to 4th row of each block)
vadd.u32 q12, q12, q1
vadd.u32 q14, q14, q1
vadd.u32 q13, q13, q1
vadd.u32 q15, q15, q1
vmov q4, q1
// XOR the rest of the data with the keystream
vld1.8 {q0-q1}, [r2]!
veor q0, q0, q8
......@@ -509,13 +520,11 @@ ENTRY(chacha20_4block_xor_neon)
vst1.8 {q0-q1}, [r1]!
vld1.8 {q0-q1}, [r2]
mov sp, r4 // restore original stack pointer
veor q0, q0, q11
veor q1, q1, q15
vst1.8 {q0-q1}, [r1]
mov sp, ip
pop {r4-r6, pc}
pop {r4-r5}
bx lr
ENDPROC(chacha20_4block_xor_neon)
.align 4
CTRINC: .word 0, 1, 2, 3
......@@ -236,7 +236,7 @@ static void __exit crc32_pmull_mod_exit(void)
ARRAY_SIZE(crc32_pmull_algs));
}
static const struct cpu_feature crc32_cpu_feature[] = {
static const struct cpu_feature __maybe_unused crc32_cpu_feature[] = {
{ cpu_feature(CRC32) }, { cpu_feature(PMULL) }, { }
};
MODULE_DEVICE_TABLE(cpu, crc32_cpu_feature);
......
......@@ -63,6 +63,33 @@
k48 .req d31
SHASH2_p64 .req d31
HH .req q10
HH3 .req q11
HH4 .req q12
HH34 .req q13
HH_L .req d20
HH_H .req d21
HH3_L .req d22
HH3_H .req d23
HH4_L .req d24
HH4_H .req d25
HH34_L .req d26
HH34_H .req d27
SHASH2_H .req d29
XL2 .req q5
XM2 .req q6
XH2 .req q7
T3 .req q8
XL2_L .req d10
XL2_H .req d11
XM2_L .req d12
XM2_H .req d13
T3_L .req d16
T3_H .req d17
.text
.fpu crypto-neon-fp-armv8
......@@ -175,12 +202,77 @@
beq 0f
vld1.64 {T1}, [ip]
teq r0, #0
b 1f
b 3f
0: .ifc \pn, p64
tst r0, #3 // skip until #blocks is a
bne 2f // round multiple of 4
vld1.8 {XL2-XM2}, [r2]!
1: vld1.8 {T3-T2}, [r2]!
vrev64.8 XL2, XL2
vrev64.8 XM2, XM2
subs r0, r0, #4
vext.8 T1, XL2, XL2, #8
veor XL2_H, XL2_H, XL_L
veor XL, XL, T1
vrev64.8 T3, T3
vrev64.8 T1, T2
vmull.p64 XH, HH4_H, XL_H // a1 * b1
veor XL2_H, XL2_H, XL_H
vmull.p64 XL, HH4_L, XL_L // a0 * b0
vmull.p64 XM, HH34_H, XL2_H // (a1 + a0)(b1 + b0)
vmull.p64 XH2, HH3_H, XM2_L // a1 * b1
veor XM2_L, XM2_L, XM2_H
vmull.p64 XL2, HH3_L, XM2_H // a0 * b0
vmull.p64 XM2, HH34_L, XM2_L // (a1 + a0)(b1 + b0)
veor XH, XH, XH2
veor XL, XL, XL2
veor XM, XM, XM2
vmull.p64 XH2, HH_H, T3_L // a1 * b1
veor T3_L, T3_L, T3_H
vmull.p64 XL2, HH_L, T3_H // a0 * b0
vmull.p64 XM2, SHASH2_H, T3_L // (a1 + a0)(b1 + b0)
veor XH, XH, XH2
veor XL, XL, XL2
veor XM, XM, XM2
vmull.p64 XH2, SHASH_H, T1_L // a1 * b1
veor T1_L, T1_L, T1_H
vmull.p64 XL2, SHASH_L, T1_H // a0 * b0
vmull.p64 XM2, SHASH2_p64, T1_L // (a1 + a0)(b1 + b0)
veor XH, XH, XH2
veor XL, XL, XL2
veor XM, XM, XM2
0: vld1.64 {T1}, [r2]!
beq 4f
vld1.8 {XL2-XM2}, [r2]!
veor T1, XL, XH
veor XM, XM, T1
__pmull_reduce_p64
veor T1, T1, XH
veor XL, XL, T1
b 1b
.endif
2: vld1.64 {T1}, [r2]!
subs r0, r0, #1
1: /* multiply XL by SHASH in GF(2^128) */
3: /* multiply XL by SHASH in GF(2^128) */
#ifndef CONFIG_CPU_BIG_ENDIAN
vrev64.8 T1, T1
#endif
......@@ -193,7 +285,7 @@
__pmull_\pn XL, XL_L, SHASH_L, s1l, s2l, s3l, s4l @ a0 * b0
__pmull_\pn XM, T1_L, SHASH2_\pn @ (a1+a0)(b1+b0)
veor T1, XL, XH
4: veor T1, XL, XH
veor XM, XM, T1
__pmull_reduce_\pn
......@@ -212,8 +304,14 @@
* struct ghash_key const *k, const char *head)
*/
ENTRY(pmull_ghash_update_p64)
vld1.64 {SHASH}, [r3]
vld1.64 {SHASH}, [r3]!
vld1.64 {HH}, [r3]!
vld1.64 {HH3-HH4}, [r3]
veor SHASH2_p64, SHASH_L, SHASH_H
veor SHASH2_H, HH_L, HH_H
veor HH34_L, HH3_L, HH3_H
veor HH34_H, HH4_L, HH4_H
vmov.i8 MASK, #0xe1
vshl.u64 MASK, MASK, #57
......
/*
* Accelerated GHASH implementation with ARMv8 vmull.p64 instructions.
*
* Copyright (C) 2015 Linaro Ltd. <ard.biesheuvel@linaro.org>
* Copyright (C) 2015 - 2018 Linaro Ltd. <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 as published
......@@ -28,8 +28,10 @@ MODULE_ALIAS_CRYPTO("ghash");
#define GHASH_DIGEST_SIZE 16
struct ghash_key {
u64 a;
u64 b;
u64 h[2];
u64 h2[2];
u64 h3[2];
u64 h4[2];
};
struct ghash_desc_ctx {
......@@ -117,26 +119,40 @@ static int ghash_final(struct shash_desc *desc, u8 *dst)
return 0;
}
static void ghash_reflect(u64 h[], const be128 *k)
{
u64 carry = be64_to_cpu(k->a) >> 63;
h[0] = (be64_to_cpu(k->b) << 1) | carry;
h[1] = (be64_to_cpu(k->a) << 1) | (be64_to_cpu(k->b) >> 63);
if (carry)
h[1] ^= 0xc200000000000000UL;
}
static int ghash_setkey(struct crypto_shash *tfm,
const u8 *inkey, unsigned int keylen)
{
struct ghash_key *key = crypto_shash_ctx(tfm);
u64 a, b;
be128 h, k;
if (keylen != GHASH_BLOCK_SIZE) {
crypto_shash_set_flags(tfm, CRYPTO_TFM_RES_BAD_KEY_LEN);
return -EINVAL;
}
/* perform multiplication by 'x' in GF(2^128) */
b = get_unaligned_be64(inkey);
a = get_unaligned_be64(inkey + 8);
memcpy(&k, inkey, GHASH_BLOCK_SIZE);
ghash_reflect(key->h, &k);
h = k;
gf128mul_lle(&h, &k);
ghash_reflect(key->h2, &h);
key->a = (a << 1) | (b >> 63);
key->b = (b << 1) | (a >> 63);
gf128mul_lle(&h, &k);
ghash_reflect(key->h3, &h);
if (b >> 63)
key->b ^= 0xc200000000000000UL;
gf128mul_lle(&h, &k);
ghash_reflect(key->h4, &h);
return 0;
}
......
// SPDX-License-Identifier: GPL-2.0
/*
* NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
*
* Copyright (c) 2018 Google, Inc
*
* Author: Eric Biggers <ebiggers@google.com>
*/
#include <linux/linkage.h>
.text
.fpu neon
// arguments
ROUND_KEYS .req r0 // const {u64,u32} *round_keys
NROUNDS .req r1 // int nrounds
DST .req r2 // void *dst
SRC .req r3 // const void *src
NBYTES .req r4 // unsigned int nbytes
TWEAK .req r5 // void *tweak
// registers which hold the data being encrypted/decrypted
X0 .req q0
X0_L .req d0
X0_H .req d1
Y0 .req q1
Y0_H .req d3
X1 .req q2
X1_L .req d4
X1_H .req d5
Y1 .req q3
Y1_H .req d7
X2 .req q4
X2_L .req d8
X2_H .req d9
Y2 .req q5
Y2_H .req d11
X3 .req q6
X3_L .req d12
X3_H .req d13
Y3 .req q7
Y3_H .req d15
// the round key, duplicated in all lanes
ROUND_KEY .req q8
ROUND_KEY_L .req d16
ROUND_KEY_H .req d17
// index vector for vtbl-based 8-bit rotates
ROTATE_TABLE .req d18
// multiplication table for updating XTS tweaks
GF128MUL_TABLE .req d19
GF64MUL_TABLE .req d19
// current XTS tweak value(s)
TWEAKV .req q10
TWEAKV_L .req d20
TWEAKV_H .req d21
TMP0 .req q12
TMP0_L .req d24
TMP0_H .req d25
TMP1 .req q13
TMP2 .req q14
TMP3 .req q15
.align 4
.Lror64_8_table:
.byte 1, 2, 3, 4, 5, 6, 7, 0
.Lror32_8_table:
.byte 1, 2, 3, 0, 5, 6, 7, 4
.Lrol64_8_table:
.byte 7, 0, 1, 2, 3, 4, 5, 6
.Lrol32_8_table:
.byte 3, 0, 1, 2, 7, 4, 5, 6
.Lgf128mul_table:
.byte 0, 0x87
.fill 14
.Lgf64mul_table:
.byte 0, 0x1b, (0x1b << 1), (0x1b << 1) ^ 0x1b
.fill 12
/*
* _speck_round_128bytes() - Speck encryption round on 128 bytes at a time
*
* Do one Speck encryption round on the 128 bytes (8 blocks for Speck128, 16 for
* Speck64) stored in X0-X3 and Y0-Y3, using the round key stored in all lanes
* of ROUND_KEY. 'n' is the lane size: 64 for Speck128, or 32 for Speck64.
*
* The 8-bit rotates are implemented using vtbl instead of vshr + vsli because
* the vtbl approach is faster on some processors and the same speed on others.
*/
.macro _speck_round_128bytes n
// x = ror(x, 8)
vtbl.8 X0_L, {X0_L}, ROTATE_TABLE
vtbl.8 X0_H, {X0_H}, ROTATE_TABLE
vtbl.8 X1_L, {X1_L}, ROTATE_TABLE
vtbl.8 X1_H, {X1_H}, ROTATE_TABLE
vtbl.8 X2_L, {X2_L}, ROTATE_TABLE
vtbl.8 X2_H, {X2_H}, ROTATE_TABLE
vtbl.8 X3_L, {X3_L}, ROTATE_TABLE
vtbl.8 X3_H, {X3_H}, ROTATE_TABLE
// x += y
vadd.u\n X0, Y0
vadd.u\n X1, Y1
vadd.u\n X2, Y2
vadd.u\n X3, Y3
// x ^= k
veor X0, ROUND_KEY
veor X1, ROUND_KEY
veor X2, ROUND_KEY
veor X3, ROUND_KEY
// y = rol(y, 3)
vshl.u\n TMP0, Y0, #3
vshl.u\n TMP1, Y1, #3
vshl.u\n TMP2, Y2, #3
vshl.u\n TMP3, Y3, #3
vsri.u\n TMP0, Y0, #(\n - 3)
vsri.u\n TMP1, Y1, #(\n - 3)
vsri.u\n TMP2, Y2, #(\n - 3)
vsri.u\n TMP3, Y3, #(\n - 3)
// y ^= x
veor Y0, TMP0, X0
veor Y1, TMP1, X1
veor Y2, TMP2, X2
veor Y3, TMP3, X3
.endm
/*
* _speck_unround_128bytes() - Speck decryption round on 128 bytes at a time
*
* This is the inverse of _speck_round_128bytes().
*/
.macro _speck_unround_128bytes n
// y ^= x
veor TMP0, Y0, X0
veor TMP1, Y1, X1
veor TMP2, Y2, X2
veor TMP3, Y3, X3
// y = ror(y, 3)
vshr.u\n Y0, TMP0, #3
vshr.u\n Y1, TMP1, #3
vshr.u\n Y2, TMP2, #3
vshr.u\n Y3, TMP3, #3
vsli.u\n Y0, TMP0, #(\n - 3)
vsli.u\n Y1, TMP1, #(\n - 3)
vsli.u\n Y2, TMP2, #(\n - 3)
vsli.u\n Y3, TMP3, #(\n - 3)
// x ^= k
veor X0, ROUND_KEY
veor X1, ROUND_KEY
veor X2, ROUND_KEY
veor X3, ROUND_KEY
// x -= y
vsub.u\n X0, Y0
vsub.u\n X1, Y1
vsub.u\n X2, Y2
vsub.u\n X3, Y3
// x = rol(x, 8);
vtbl.8 X0_L, {X0_L}, ROTATE_TABLE
vtbl.8 X0_H, {X0_H}, ROTATE_TABLE
vtbl.8 X1_L, {X1_L}, ROTATE_TABLE
vtbl.8 X1_H, {X1_H}, ROTATE_TABLE
vtbl.8 X2_L, {X2_L}, ROTATE_TABLE
vtbl.8 X2_H, {X2_H}, ROTATE_TABLE
vtbl.8 X3_L, {X3_L}, ROTATE_TABLE
vtbl.8 X3_H, {X3_H}, ROTATE_TABLE
.endm
.macro _xts128_precrypt_one dst_reg, tweak_buf, tmp
// Load the next source block
vld1.8 {\dst_reg}, [SRC]!
// Save the current tweak in the tweak buffer
vst1.8 {TWEAKV}, [\tweak_buf:128]!
// XOR the next source block with the current tweak
veor \dst_reg, TWEAKV
/*
* Calculate the next tweak by multiplying the current one by x,
* modulo p(x) = x^128 + x^7 + x^2 + x + 1.
*/
vshr.u64 \tmp, TWEAKV, #63
vshl.u64 TWEAKV, #1
veor TWEAKV_H, \tmp\()_L
vtbl.8 \tmp\()_H, {GF128MUL_TABLE}, \tmp\()_H
veor TWEAKV_L, \tmp\()_H
.endm
.macro _xts64_precrypt_two dst_reg, tweak_buf, tmp
// Load the next two source blocks
vld1.8 {\dst_reg}, [SRC]!
// Save the current two tweaks in the tweak buffer
vst1.8 {TWEAKV}, [\tweak_buf:128]!
// XOR the next two source blocks with the current two tweaks
veor \dst_reg, TWEAKV
/*
* Calculate the next two tweaks by multiplying the current ones by x^2,
* modulo p(x) = x^64 + x^4 + x^3 + x + 1.
*/
vshr.u64 \tmp, TWEAKV, #62
vshl.u64 TWEAKV, #2
vtbl.8 \tmp\()_L, {GF64MUL_TABLE}, \tmp\()_L
vtbl.8 \tmp\()_H, {GF64MUL_TABLE}, \tmp\()_H
veor TWEAKV, \tmp
.endm
/*
* _speck_xts_crypt() - Speck-XTS encryption/decryption
*
* Encrypt or decrypt NBYTES bytes of data from the SRC buffer to the DST buffer
* using Speck-XTS, specifically the variant with a block size of '2n' and round
* count given by NROUNDS. The expanded round keys are given in ROUND_KEYS, and
* the current XTS tweak value is given in TWEAK. It's assumed that NBYTES is a
* nonzero multiple of 128.
*/
.macro _speck_xts_crypt n, decrypting
push {r4-r7}
mov r7, sp
/*
* The first four parameters were passed in registers r0-r3. Load the
* additional parameters, which were passed on the stack.
*/
ldr NBYTES, [sp, #16]
ldr TWEAK, [sp, #20]
/*
* If decrypting, modify the ROUND_KEYS parameter to point to the last
* round key rather than the first, since for decryption the round keys
* are used in reverse order.
*/
.if \decrypting
.if \n == 64
add ROUND_KEYS, ROUND_KEYS, NROUNDS, lsl #3
sub ROUND_KEYS, #8
.else
add ROUND_KEYS, ROUND_KEYS, NROUNDS, lsl #2
sub ROUND_KEYS, #4
.endif
.endif
// Load the index vector for vtbl-based 8-bit rotates
.if \decrypting
ldr r12, =.Lrol\n\()_8_table
.else
ldr r12, =.Lror\n\()_8_table
.endif
vld1.8 {ROTATE_TABLE}, [r12:64]
// One-time XTS preparation
/*
* Allocate stack space to store 128 bytes worth of tweaks. For
* performance, this space is aligned to a 16-byte boundary so that we
* can use the load/store instructions that declare 16-byte alignment.
* For Thumb2 compatibility, don't do the 'bic' directly on 'sp'.
*/
sub r12, sp, #128
bic r12, #0xf
mov sp, r12
.if \n == 64
// Load first tweak
vld1.8 {TWEAKV}, [TWEAK]
// Load GF(2^128) multiplication table
ldr r12, =.Lgf128mul_table
vld1.8 {GF128MUL_TABLE}, [r12:64]
.else
// Load first tweak
vld1.8 {TWEAKV_L}, [TWEAK]
// Load GF(2^64) multiplication table
ldr r12, =.Lgf64mul_table
vld1.8 {GF64MUL_TABLE}, [r12:64]
// Calculate second tweak, packing it together with the first
vshr.u64 TMP0_L, TWEAKV_L, #63
vtbl.u8 TMP0_L, {GF64MUL_TABLE}, TMP0_L
vshl.u64 TWEAKV_H, TWEAKV_L, #1
veor TWEAKV_H, TMP0_L
.endif
.Lnext_128bytes_\@:
/*
* Load the source blocks into {X,Y}[0-3], XOR them with their XTS tweak
* values, and save the tweaks on the stack for later. Then
* de-interleave the 'x' and 'y' elements of each block, i.e. make it so
* that the X[0-3] registers contain only the second halves of blocks,
* and the Y[0-3] registers contain only the first halves of blocks.
* (Speck uses the order (y, x) rather than the more intuitive (x, y).)
*/
mov r12, sp
.if \n == 64
_xts128_precrypt_one X0, r12, TMP0
_xts128_precrypt_one Y0, r12, TMP0
_xts128_precrypt_one X1, r12, TMP0
_xts128_precrypt_one Y1, r12, TMP0
_xts128_precrypt_one X2, r12, TMP0
_xts128_precrypt_one Y2, r12, TMP0
_xts128_precrypt_one X3, r12, TMP0
_xts128_precrypt_one Y3, r12, TMP0
vswp X0_L, Y0_H
vswp X1_L, Y1_H
vswp X2_L, Y2_H
vswp X3_L, Y3_H
.else
_xts64_precrypt_two X0, r12, TMP0
_xts64_precrypt_two Y0, r12, TMP0
_xts64_precrypt_two X1, r12, TMP0
_xts64_precrypt_two Y1, r12, TMP0
_xts64_precrypt_two X2, r12, TMP0
_xts64_precrypt_two Y2, r12, TMP0
_xts64_precrypt_two X3, r12, TMP0
_xts64_precrypt_two Y3, r12, TMP0
vuzp.32 Y0, X0
vuzp.32 Y1, X1
vuzp.32 Y2, X2
vuzp.32 Y3, X3
.endif
// Do the cipher rounds
mov r12, ROUND_KEYS
mov r6, NROUNDS
.Lnext_round_\@:
.if \decrypting
.if \n == 64
vld1.64 ROUND_KEY_L, [r12]
sub r12, #8
vmov ROUND_KEY_H, ROUND_KEY_L
.else
vld1.32 {ROUND_KEY_L[],ROUND_KEY_H[]}, [r12]
sub r12, #4
.endif
_speck_unround_128bytes \n
.else
.if \n == 64
vld1.64 ROUND_KEY_L, [r12]!
vmov ROUND_KEY_H, ROUND_KEY_L
.else
vld1.32 {ROUND_KEY_L[],ROUND_KEY_H[]}, [r12]!
.endif
_speck_round_128bytes \n
.endif
subs r6, r6, #1
bne .Lnext_round_\@
// Re-interleave the 'x' and 'y' elements of each block
.if \n == 64
vswp X0_L, Y0_H
vswp X1_L, Y1_H
vswp X2_L, Y2_H
vswp X3_L, Y3_H
.else
vzip.32 Y0, X0
vzip.32 Y1, X1
vzip.32 Y2, X2
vzip.32 Y3, X3
.endif
// XOR the encrypted/decrypted blocks with the tweaks we saved earlier
mov r12, sp
vld1.8 {TMP0, TMP1}, [r12:128]!
vld1.8 {TMP2, TMP3}, [r12:128]!
veor X0, TMP0
veor Y0, TMP1
veor X1, TMP2
veor Y1, TMP3
vld1.8 {TMP0, TMP1}, [r12:128]!
vld1.8 {TMP2, TMP3}, [r12:128]!
veor X2, TMP0
veor Y2, TMP1
veor X3, TMP2
veor Y3, TMP3
// Store the ciphertext in the destination buffer
vst1.8 {X0, Y0}, [DST]!
vst1.8 {X1, Y1}, [DST]!
vst1.8 {X2, Y2}, [DST]!
vst1.8 {X3, Y3}, [DST]!
// Continue if there are more 128-byte chunks remaining, else return
subs NBYTES, #128
bne .Lnext_128bytes_\@
// Store the next tweak
.if \n == 64
vst1.8 {TWEAKV}, [TWEAK]
.else
vst1.8 {TWEAKV_L}, [TWEAK]
.endif
mov sp, r7
pop {r4-r7}
bx lr
.endm
ENTRY(speck128_xts_encrypt_neon)
_speck_xts_crypt n=64, decrypting=0
ENDPROC(speck128_xts_encrypt_neon)
ENTRY(speck128_xts_decrypt_neon)
_speck_xts_crypt n=64, decrypting=1
ENDPROC(speck128_xts_decrypt_neon)
ENTRY(speck64_xts_encrypt_neon)
_speck_xts_crypt n=32, decrypting=0
ENDPROC(speck64_xts_encrypt_neon)
ENTRY(speck64_xts_decrypt_neon)
_speck_xts_crypt n=32, decrypting=1
ENDPROC(speck64_xts_decrypt_neon)
// SPDX-License-Identifier: GPL-2.0
/*
* NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
*
* Copyright (c) 2018 Google, Inc
*
* Note: the NIST recommendation for XTS only specifies a 128-bit block size,
* but a 64-bit version (needed for Speck64) is fairly straightforward; the math
* is just done in GF(2^64) instead of GF(2^128), with the reducing polynomial
* x^64 + x^4 + x^3 + x + 1 from the original XEX paper (Rogaway, 2004:
* "Efficient Instantiations of Tweakable Blockciphers and Refinements to Modes
* OCB and PMAC"), represented as 0x1B.
*/
#include <asm/hwcap.h>
#include <asm/neon.h>
#include <asm/simd.h>
#include <crypto/algapi.h>
#include <crypto/gf128mul.h>
#include <crypto/internal/skcipher.h>
#include <crypto/speck.h>
#include <crypto/xts.h>
#include <linux/kernel.h>
#include <linux/module.h>
/* The assembly functions only handle multiples of 128 bytes */
#define SPECK_NEON_CHUNK_SIZE 128
/* Speck128 */
struct speck128_xts_tfm_ctx {
struct speck128_tfm_ctx main_key;
struct speck128_tfm_ctx tweak_key;
};
asmlinkage void speck128_xts_encrypt_neon(const u64 *round_keys, int nrounds,
void *dst, const void *src,
unsigned int nbytes, void *tweak);
asmlinkage void speck128_xts_decrypt_neon(const u64 *round_keys, int nrounds,
void *dst, const void *src,
unsigned int nbytes, void *tweak);
typedef void (*speck128_crypt_one_t)(const struct speck128_tfm_ctx *,
u8 *, const u8 *);
typedef void (*speck128_xts_crypt_many_t)(const u64 *, int, void *,
const void *, unsigned int, void *);
static __always_inline int
__speck128_xts_crypt(struct skcipher_request *req,
speck128_crypt_one_t crypt_one,
speck128_xts_crypt_many_t crypt_many)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
const struct speck128_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
struct skcipher_walk walk;
le128 tweak;
int err;
err = skcipher_walk_virt(&walk, req, true);
crypto_speck128_encrypt(&ctx->tweak_key, (u8 *)&tweak, walk.iv);
while (walk.nbytes > 0) {
unsigned int nbytes = walk.nbytes;
u8 *dst = walk.dst.virt.addr;
const u8 *src = walk.src.virt.addr;
if (nbytes >= SPECK_NEON_CHUNK_SIZE && may_use_simd()) {
unsigned int count;
count = round_down(nbytes, SPECK_NEON_CHUNK_SIZE);
kernel_neon_begin();
(*crypt_many)(ctx->main_key.round_keys,
ctx->main_key.nrounds,
dst, src, count, &tweak);
kernel_neon_end();
dst += count;
src += count;
nbytes -= count;
}
/* Handle any remainder with generic code */
while (nbytes >= sizeof(tweak)) {
le128_xor((le128 *)dst, (const le128 *)src, &tweak);
(*crypt_one)(&ctx->main_key, dst, dst);
le128_xor((le128 *)dst, (const le128 *)dst, &tweak);
gf128mul_x_ble(&tweak, &tweak);
dst += sizeof(tweak);
src += sizeof(tweak);
nbytes -= sizeof(tweak);
}
err = skcipher_walk_done(&walk, nbytes);
}
return err;
}
static int speck128_xts_encrypt(struct skcipher_request *req)
{
return __speck128_xts_crypt(req, crypto_speck128_encrypt,
speck128_xts_encrypt_neon);
}
static int speck128_xts_decrypt(struct skcipher_request *req)
{
return __speck128_xts_crypt(req, crypto_speck128_decrypt,
speck128_xts_decrypt_neon);
}
static int speck128_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
unsigned int keylen)
{
struct speck128_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
int err;
err = xts_verify_key(tfm, key, keylen);
if (err)
return err;
keylen /= 2;
err = crypto_speck128_setkey(&ctx->main_key, key, keylen);
if (err)
return err;
return crypto_speck128_setkey(&ctx->tweak_key, key + keylen, keylen);
}
/* Speck64 */
struct speck64_xts_tfm_ctx {
struct speck64_tfm_ctx main_key;
struct speck64_tfm_ctx tweak_key;
};
asmlinkage void speck64_xts_encrypt_neon(const u32 *round_keys, int nrounds,
void *dst, const void *src,
unsigned int nbytes, void *tweak);
asmlinkage void speck64_xts_decrypt_neon(const u32 *round_keys, int nrounds,
void *dst, const void *src,
unsigned int nbytes, void *tweak);
typedef void (*speck64_crypt_one_t)(const struct speck64_tfm_ctx *,
u8 *, const u8 *);
typedef void (*speck64_xts_crypt_many_t)(const u32 *, int, void *,
const void *, unsigned int, void *);
static __always_inline int
__speck64_xts_crypt(struct skcipher_request *req, speck64_crypt_one_t crypt_one,
speck64_xts_crypt_many_t crypt_many)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
const struct speck64_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
struct skcipher_walk walk;
__le64 tweak;
int err;
err = skcipher_walk_virt(&walk, req, true);
crypto_speck64_encrypt(&ctx->tweak_key, (u8 *)&tweak, walk.iv);
while (walk.nbytes > 0) {
unsigned int nbytes = walk.nbytes;
u8 *dst = walk.dst.virt.addr;
const u8 *src = walk.src.virt.addr;
if (nbytes >= SPECK_NEON_CHUNK_SIZE && may_use_simd()) {
unsigned int count;
count = round_down(nbytes, SPECK_NEON_CHUNK_SIZE);
kernel_neon_begin();
(*crypt_many)(ctx->main_key.round_keys,
ctx->main_key.nrounds,
dst, src, count, &tweak);
kernel_neon_end();
dst += count;
src += count;
nbytes -= count;
}
/* Handle any remainder with generic code */
while (nbytes >= sizeof(tweak)) {
*(__le64 *)dst = *(__le64 *)src ^ tweak;
(*crypt_one)(&ctx->main_key, dst, dst);
*(__le64 *)dst ^= tweak;
tweak = cpu_to_le64((le64_to_cpu(tweak) << 1) ^
((tweak & cpu_to_le64(1ULL << 63)) ?
0x1B : 0));
dst += sizeof(tweak);
src += sizeof(tweak);
nbytes -= sizeof(tweak);
}
err = skcipher_walk_done(&walk, nbytes);
}
return err;
}
static int speck64_xts_encrypt(struct skcipher_request *req)
{
return __speck64_xts_crypt(req, crypto_speck64_encrypt,
speck64_xts_encrypt_neon);
}
static int speck64_xts_decrypt(struct skcipher_request *req)
{
return __speck64_xts_crypt(req, crypto_speck64_decrypt,
speck64_xts_decrypt_neon);
}
static int speck64_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
unsigned int keylen)
{
struct speck64_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
int err;
err = xts_verify_key(tfm, key, keylen);
if (err)
return err;
keylen /= 2;
err = crypto_speck64_setkey(&ctx->main_key, key, keylen);
if (err)
return err;
return crypto_speck64_setkey(&ctx->tweak_key, key + keylen, keylen);
}
static struct skcipher_alg speck_algs[] = {
{
.base.cra_name = "xts(speck128)",
.base.cra_driver_name = "xts-speck128-neon",
.base.cra_priority = 300,
.base.cra_blocksize = SPECK128_BLOCK_SIZE,
.base.cra_ctxsize = sizeof(struct speck128_xts_tfm_ctx),
.base.cra_alignmask = 7,
.base.cra_module = THIS_MODULE,
.min_keysize = 2 * SPECK128_128_KEY_SIZE,
.max_keysize = 2 * SPECK128_256_KEY_SIZE,
.ivsize = SPECK128_BLOCK_SIZE,
.walksize = SPECK_NEON_CHUNK_SIZE,
.setkey = speck128_xts_setkey,
.encrypt = speck128_xts_encrypt,
.decrypt = speck128_xts_decrypt,
}, {
.base.cra_name = "xts(speck64)",
.base.cra_driver_name = "xts-speck64-neon",
.base.cra_priority = 300,
.base.cra_blocksize = SPECK64_BLOCK_SIZE,
.base.cra_ctxsize = sizeof(struct speck64_xts_tfm_ctx),
.base.cra_alignmask = 7,
.base.cra_module = THIS_MODULE,
.min_keysize = 2 * SPECK64_96_KEY_SIZE,
.max_keysize = 2 * SPECK64_128_KEY_SIZE,
.ivsize = SPECK64_BLOCK_SIZE,
.walksize = SPECK_NEON_CHUNK_SIZE,
.setkey = speck64_xts_setkey,
.encrypt = speck64_xts_encrypt,
.decrypt = speck64_xts_decrypt,
}
};
static int __init speck_neon_module_init(void)
{
if (!(elf_hwcap & HWCAP_NEON))
return -ENODEV;
return crypto_register_skciphers(speck_algs, ARRAY_SIZE(speck_algs));
}
static void __exit speck_neon_module_exit(void)
{
crypto_unregister_skciphers(speck_algs, ARRAY_SIZE(speck_algs));
}
module_init(speck_neon_module_init);
module_exit(speck_neon_module_exit);
MODULE_DESCRIPTION("Speck block cipher (NEON-accelerated)");
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>");
MODULE_ALIAS_CRYPTO("xts(speck128)");
MODULE_ALIAS_CRYPTO("xts-speck128-neon");
MODULE_ALIAS_CRYPTO("xts(speck64)");
MODULE_ALIAS_CRYPTO("xts-speck64-neon");
......@@ -698,6 +698,7 @@ CONFIG_MEMTEST=y
CONFIG_SECURITY=y
CONFIG_CRYPTO_ECHAINIV=y
CONFIG_CRYPTO_ANSI_CPRNG=y
CONFIG_CRYPTO_DEV_FSL_DPAA2_CAAM=y
CONFIG_ARM64_CRYPTO=y
CONFIG_CRYPTO_SHA1_ARM64_CE=y
CONFIG_CRYPTO_SHA2_ARM64_CE=y
......@@ -706,7 +707,6 @@ CONFIG_CRYPTO_SHA3_ARM64=m
CONFIG_CRYPTO_SM3_ARM64_CE=m
CONFIG_CRYPTO_GHASH_ARM64_CE=y
CONFIG_CRYPTO_CRCT10DIF_ARM64_CE=m
CONFIG_CRYPTO_CRC32_ARM64_CE=m
CONFIG_CRYPTO_AES_ARM64_CE_CCM=y
CONFIG_CRYPTO_AES_ARM64_CE_BLK=y
CONFIG_CRYPTO_CHACHA20_NEON=m
......
......@@ -66,11 +66,6 @@ config CRYPTO_CRCT10DIF_ARM64_CE
depends on KERNEL_MODE_NEON && CRC_T10DIF
select CRYPTO_HASH
config CRYPTO_CRC32_ARM64_CE
tristate "CRC32 and CRC32C digest algorithms using ARMv8 extensions"
depends on CRC32
select CRYPTO_HASH
config CRYPTO_AES_ARM64
tristate "AES core cipher using scalar instructions"
select CRYPTO_AES
......@@ -119,10 +114,4 @@ config CRYPTO_AES_ARM64_BS
select CRYPTO_AES_ARM64
select CRYPTO_SIMD
config CRYPTO_SPECK_NEON
tristate "NEON accelerated Speck cipher algorithms"
depends on KERNEL_MODE_NEON
select CRYPTO_BLKCIPHER
select CRYPTO_SPECK
endif
......@@ -32,9 +32,6 @@ ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o
obj-$(CONFIG_CRYPTO_CRCT10DIF_ARM64_CE) += crct10dif-ce.o
crct10dif-ce-y := crct10dif-ce-core.o crct10dif-ce-glue.o
obj-$(CONFIG_CRYPTO_CRC32_ARM64_CE) += crc32-ce.o
crc32-ce-y:= crc32-ce-core.o crc32-ce-glue.o
obj-$(CONFIG_CRYPTO_AES_ARM64_CE) += aes-ce-cipher.o
aes-ce-cipher-y := aes-ce-core.o aes-ce-glue.o
......@@ -56,9 +53,6 @@ sha512-arm64-y := sha512-glue.o sha512-core.o
obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
obj-$(CONFIG_CRYPTO_SPECK_NEON) += speck-neon.o
speck-neon-y := speck-neon-core.o speck-neon-glue.o
obj-$(CONFIG_CRYPTO_AES_ARM64) += aes-arm64.o
aes-arm64-y := aes-cipher-core.o aes-cipher-glue.o
......
......@@ -17,6 +17,11 @@
.arch armv8-a+crypto
xtsmask .req v16
.macro xts_reload_mask, tmp
.endm
/* preload all round keys */
.macro load_round_keys, rounds, rk
cmp \rounds, #12
......
......@@ -15,6 +15,7 @@
#include <crypto/internal/hash.h>
#include <crypto/internal/simd.h>
#include <crypto/internal/skcipher.h>
#include <crypto/scatterwalk.h>
#include <linux/module.h>
#include <linux/cpufeature.h>
#include <crypto/xts.h>
......@@ -31,6 +32,8 @@
#define aes_ecb_decrypt ce_aes_ecb_decrypt
#define aes_cbc_encrypt ce_aes_cbc_encrypt
#define aes_cbc_decrypt ce_aes_cbc_decrypt
#define aes_cbc_cts_encrypt ce_aes_cbc_cts_encrypt
#define aes_cbc_cts_decrypt ce_aes_cbc_cts_decrypt
#define aes_ctr_encrypt ce_aes_ctr_encrypt
#define aes_xts_encrypt ce_aes_xts_encrypt
#define aes_xts_decrypt ce_aes_xts_decrypt
......@@ -45,6 +48,8 @@ MODULE_DESCRIPTION("AES-ECB/CBC/CTR/XTS using ARMv8 Crypto Extensions");
#define aes_ecb_decrypt neon_aes_ecb_decrypt
#define aes_cbc_encrypt neon_aes_cbc_encrypt
#define aes_cbc_decrypt neon_aes_cbc_decrypt
#define aes_cbc_cts_encrypt neon_aes_cbc_cts_encrypt
#define aes_cbc_cts_decrypt neon_aes_cbc_cts_decrypt
#define aes_ctr_encrypt neon_aes_ctr_encrypt
#define aes_xts_encrypt neon_aes_xts_encrypt
#define aes_xts_decrypt neon_aes_xts_decrypt
......@@ -63,30 +68,41 @@ MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
MODULE_LICENSE("GPL v2");
/* defined in aes-modes.S */
asmlinkage void aes_ecb_encrypt(u8 out[], u8 const in[], u8 const rk[],
asmlinkage void aes_ecb_encrypt(u8 out[], u8 const in[], u32 const rk[],
int rounds, int blocks);
asmlinkage void aes_ecb_decrypt(u8 out[], u8 const in[], u8 const rk[],
asmlinkage void aes_ecb_decrypt(u8 out[], u8 const in[], u32 const rk[],
int rounds, int blocks);
asmlinkage void aes_cbc_encrypt(u8 out[], u8 const in[], u8 const rk[],
asmlinkage void aes_cbc_encrypt(u8 out[], u8 const in[], u32 const rk[],
int rounds, int blocks, u8 iv[]);
asmlinkage void aes_cbc_decrypt(u8 out[], u8 const in[], u8 const rk[],
asmlinkage void aes_cbc_decrypt(u8 out[], u8 const in[], u32 const rk[],
int rounds, int blocks, u8 iv[]);
asmlinkage void aes_ctr_encrypt(u8 out[], u8 const in[], u8 const rk[],
asmlinkage void aes_cbc_cts_encrypt(u8 out[], u8 const in[], u32 const rk[],
int rounds, int bytes, u8 const iv[]);
asmlinkage void aes_cbc_cts_decrypt(u8 out[], u8 const in[], u32 const rk[],
int rounds, int bytes, u8 const iv[]);
asmlinkage void aes_ctr_encrypt(u8 out[], u8 const in[], u32 const rk[],
int rounds, int blocks, u8 ctr[]);
asmlinkage void aes_xts_encrypt(u8 out[], u8 const in[], u8 const rk1[],
int rounds, int blocks, u8 const rk2[], u8 iv[],
asmlinkage void aes_xts_encrypt(u8 out[], u8 const in[], u32 const rk1[],
int rounds, int blocks, u32 const rk2[], u8 iv[],
int first);
asmlinkage void aes_xts_decrypt(u8 out[], u8 const in[], u8 const rk1[],
int rounds, int blocks, u8 const rk2[], u8 iv[],
asmlinkage void aes_xts_decrypt(u8 out[], u8 const in[], u32 const rk1[],
int rounds, int blocks, u32 const rk2[], u8 iv[],
int first);
asmlinkage void aes_mac_update(u8 const in[], u32 const rk[], int rounds,
int blocks, u8 dg[], int enc_before,
int enc_after);
struct cts_cbc_req_ctx {
struct scatterlist sg_src[2];
struct scatterlist sg_dst[2];
struct skcipher_request subreq;
};
struct crypto_aes_xts_ctx {
struct crypto_aes_ctx key1;
struct crypto_aes_ctx __aligned(8) key2;
......@@ -142,7 +158,7 @@ static int ecb_encrypt(struct skcipher_request *req)
while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) {
kernel_neon_begin();
aes_ecb_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
(u8 *)ctx->key_enc, rounds, blocks);
ctx->key_enc, rounds, blocks);
kernel_neon_end();
err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
}
......@@ -162,7 +178,7 @@ static int ecb_decrypt(struct skcipher_request *req)
while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) {
kernel_neon_begin();
aes_ecb_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
(u8 *)ctx->key_dec, rounds, blocks);
ctx->key_dec, rounds, blocks);
kernel_neon_end();
err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
}
......@@ -182,7 +198,7 @@ static int cbc_encrypt(struct skcipher_request *req)
while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) {
kernel_neon_begin();
aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
(u8 *)ctx->key_enc, rounds, blocks, walk.iv);
ctx->key_enc, rounds, blocks, walk.iv);
kernel_neon_end();
err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
}
......@@ -202,13 +218,149 @@ static int cbc_decrypt(struct skcipher_request *req)
while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) {
kernel_neon_begin();
aes_cbc_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
(u8 *)ctx->key_dec, rounds, blocks, walk.iv);
ctx->key_dec, rounds, blocks, walk.iv);
kernel_neon_end();
err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
}
return err;
}
static int cts_cbc_init_tfm(struct crypto_skcipher *tfm)
{
crypto_skcipher_set_reqsize(tfm, sizeof(struct cts_cbc_req_ctx));
return 0;
}
static int cts_cbc_encrypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
struct cts_cbc_req_ctx *rctx = skcipher_request_ctx(req);
int err, rounds = 6 + ctx->key_length / 4;
int cbc_blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
struct scatterlist *src = req->src, *dst = req->dst;
struct skcipher_walk walk;
skcipher_request_set_tfm(&rctx->subreq, tfm);
if (req->cryptlen <= AES_BLOCK_SIZE) {
if (req->cryptlen < AES_BLOCK_SIZE)
return -EINVAL;
cbc_blocks = 1;
}
if (cbc_blocks > 0) {
unsigned int blocks;
skcipher_request_set_crypt(&rctx->subreq, req->src, req->dst,
cbc_blocks * AES_BLOCK_SIZE,
req->iv);
err = skcipher_walk_virt(&walk, &rctx->subreq, false);
while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) {
kernel_neon_begin();
aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
ctx->key_enc, rounds, blocks, walk.iv);
kernel_neon_end();
err = skcipher_walk_done(&walk,
walk.nbytes % AES_BLOCK_SIZE);
}
if (err)
return err;
if (req->cryptlen == AES_BLOCK_SIZE)
return 0;
dst = src = scatterwalk_ffwd(rctx->sg_src, req->src,
rctx->subreq.cryptlen);
if (req->dst != req->src)
dst = scatterwalk_ffwd(rctx->sg_dst, req->dst,
rctx->subreq.cryptlen);
}
/* handle ciphertext stealing */
skcipher_request_set_crypt(&rctx->subreq, src, dst,
req->cryptlen - cbc_blocks * AES_BLOCK_SIZE,
req->iv);
err = skcipher_walk_virt(&walk, &rctx->subreq, false);
if (err)
return err;
kernel_neon_begin();
aes_cbc_cts_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
ctx->key_enc, rounds, walk.nbytes, walk.iv);
kernel_neon_end();
return skcipher_walk_done(&walk, 0);
}
static int cts_cbc_decrypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
struct cts_cbc_req_ctx *rctx = skcipher_request_ctx(req);
int err, rounds = 6 + ctx->key_length / 4;
int cbc_blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
struct scatterlist *src = req->src, *dst = req->dst;
struct skcipher_walk walk;
skcipher_request_set_tfm(&rctx->subreq, tfm);
if (req->cryptlen <= AES_BLOCK_SIZE) {
if (req->cryptlen < AES_BLOCK_SIZE)
return -EINVAL;
cbc_blocks = 1;
}
if (cbc_blocks > 0) {
unsigned int blocks;
skcipher_request_set_crypt(&rctx->subreq, req->src, req->dst,
cbc_blocks * AES_BLOCK_SIZE,
req->iv);
err = skcipher_walk_virt(&walk, &rctx->subreq, false);
while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) {
kernel_neon_begin();
aes_cbc_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
ctx->key_dec, rounds, blocks, walk.iv);
kernel_neon_end();
err = skcipher_walk_done(&walk,
walk.nbytes % AES_BLOCK_SIZE);
}
if (err)
return err;
if (req->cryptlen == AES_BLOCK_SIZE)
return 0;
dst = src = scatterwalk_ffwd(rctx->sg_src, req->src,
rctx->subreq.cryptlen);
if (req->dst != req->src)
dst = scatterwalk_ffwd(rctx->sg_dst, req->dst,
rctx->subreq.cryptlen);
}
/* handle ciphertext stealing */
skcipher_request_set_crypt(&rctx->subreq, src, dst,
req->cryptlen - cbc_blocks * AES_BLOCK_SIZE,
req->iv);
err = skcipher_walk_virt(&walk, &rctx->subreq, false);
if (err)
return err;
kernel_neon_begin();
aes_cbc_cts_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
ctx->key_dec, rounds, walk.nbytes, walk.iv);
kernel_neon_end();
return skcipher_walk_done(&walk, 0);
}
static int ctr_encrypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
......@@ -222,7 +374,7 @@ static int ctr_encrypt(struct skcipher_request *req)
while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) {
kernel_neon_begin();
aes_ctr_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
(u8 *)ctx->key_enc, rounds, blocks, walk.iv);
ctx->key_enc, rounds, blocks, walk.iv);
kernel_neon_end();
err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
}
......@@ -238,7 +390,7 @@ static int ctr_encrypt(struct skcipher_request *req)
blocks = -1;
kernel_neon_begin();
aes_ctr_encrypt(tail, NULL, (u8 *)ctx->key_enc, rounds,
aes_ctr_encrypt(tail, NULL, ctx->key_enc, rounds,
blocks, walk.iv);
kernel_neon_end();
crypto_xor_cpy(tdst, tsrc, tail, nbytes);
......@@ -272,8 +424,8 @@ static int xts_encrypt(struct skcipher_request *req)
for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) {
kernel_neon_begin();
aes_xts_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
(u8 *)ctx->key1.key_enc, rounds, blocks,
(u8 *)ctx->key2.key_enc, walk.iv, first);
ctx->key1.key_enc, rounds, blocks,
ctx->key2.key_enc, walk.iv, first);
kernel_neon_end();
err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
}
......@@ -294,8 +446,8 @@ static int xts_decrypt(struct skcipher_request *req)
for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) {
kernel_neon_begin();
aes_xts_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
(u8 *)ctx->key1.key_dec, rounds, blocks,
(u8 *)ctx->key2.key_enc, walk.iv, first);
ctx->key1.key_dec, rounds, blocks,
ctx->key2.key_enc, walk.iv, first);
kernel_neon_end();
err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
}
......@@ -334,6 +486,24 @@ static struct skcipher_alg aes_algs[] = { {
.setkey = skcipher_aes_setkey,
.encrypt = cbc_encrypt,
.decrypt = cbc_decrypt,
}, {
.base = {
.cra_name = "__cts(cbc(aes))",
.cra_driver_name = "__cts-cbc-aes-" MODE,
.cra_priority = PRIO,
.cra_flags = CRYPTO_ALG_INTERNAL,
.cra_blocksize = AES_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct crypto_aes_ctx),
.cra_module = THIS_MODULE,
},
.min_keysize = AES_MIN_KEY_SIZE,
.max_keysize = AES_MAX_KEY_SIZE,
.ivsize = AES_BLOCK_SIZE,
.walksize = 2 * AES_BLOCK_SIZE,
.setkey = skcipher_aes_setkey,
.encrypt = cts_cbc_encrypt,
.decrypt = cts_cbc_decrypt,
.init = cts_cbc_init_tfm,
}, {
.base = {
.cra_name = "__ctr(aes)",
......@@ -412,7 +582,6 @@ static int cmac_setkey(struct crypto_shash *tfm, const u8 *in_key,
{
struct mac_tfm_ctx *ctx = crypto_shash_ctx(tfm);
be128 *consts = (be128 *)ctx->consts;
u8 *rk = (u8 *)ctx->key.key_enc;
int rounds = 6 + key_len / 4;
int err;
......@@ -422,7 +591,8 @@ static int cmac_setkey(struct crypto_shash *tfm, const u8 *in_key,
/* encrypt the zero vector */
kernel_neon_begin();
aes_ecb_encrypt(ctx->consts, (u8[AES_BLOCK_SIZE]){}, rk, rounds, 1);
aes_ecb_encrypt(ctx->consts, (u8[AES_BLOCK_SIZE]){}, ctx->key.key_enc,
rounds, 1);
kernel_neon_end();
cmac_gf128_mul_by_x(consts, consts);
......@@ -441,7 +611,6 @@ static int xcbc_setkey(struct crypto_shash *tfm, const u8 *in_key,
};
struct mac_tfm_ctx *ctx = crypto_shash_ctx(tfm);
u8 *rk = (u8 *)ctx->key.key_enc;
int rounds = 6 + key_len / 4;
u8 key[AES_BLOCK_SIZE];
int err;
......@@ -451,8 +620,8 @@ static int xcbc_setkey(struct crypto_shash *tfm, const u8 *in_key,
return err;
kernel_neon_begin();
aes_ecb_encrypt(key, ks[0], rk, rounds, 1);
aes_ecb_encrypt(ctx->consts, ks[1], rk, rounds, 2);
aes_ecb_encrypt(key, ks[0], ctx->key.key_enc, rounds, 1);
aes_ecb_encrypt(ctx->consts, ks[1], ctx->key.key_enc, rounds, 2);
kernel_neon_end();
return cbcmac_setkey(tfm, key, sizeof(key));
......
......@@ -14,12 +14,12 @@
.align 4
aes_encrypt_block4x:
encrypt_block4x v0, v1, v2, v3, w22, x21, x8, w7
encrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7
ret
ENDPROC(aes_encrypt_block4x)
aes_decrypt_block4x:
decrypt_block4x v0, v1, v2, v3, w22, x21, x8, w7
decrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7
ret
ENDPROC(aes_decrypt_block4x)
......@@ -31,71 +31,57 @@ ENDPROC(aes_decrypt_block4x)
*/
AES_ENTRY(aes_ecb_encrypt)
frame_push 5
stp x29, x30, [sp, #-16]!
mov x29, sp
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
.Lecbencrestart:
enc_prepare w22, x21, x5
enc_prepare w3, x2, x5
.LecbencloopNx:
subs w23, w23, #4
subs w4, w4, #4
bmi .Lecbenc1x
ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 pt blocks */
ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */
bl aes_encrypt_block4x
st1 {v0.16b-v3.16b}, [x19], #64
cond_yield_neon .Lecbencrestart
st1 {v0.16b-v3.16b}, [x0], #64
b .LecbencloopNx
.Lecbenc1x:
adds w23, w23, #4
adds w4, w4, #4
beq .Lecbencout
.Lecbencloop:
ld1 {v0.16b}, [x20], #16 /* get next pt block */
encrypt_block v0, w22, x21, x5, w6
st1 {v0.16b}, [x19], #16
subs w23, w23, #1
ld1 {v0.16b}, [x1], #16 /* get next pt block */
encrypt_block v0, w3, x2, x5, w6
st1 {v0.16b}, [x0], #16
subs w4, w4, #1
bne .Lecbencloop
.Lecbencout:
frame_pop
ldp x29, x30, [sp], #16
ret
AES_ENDPROC(aes_ecb_encrypt)
AES_ENTRY(aes_ecb_decrypt)
frame_push 5
stp x29, x30, [sp, #-16]!
mov x29, sp
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
.Lecbdecrestart:
dec_prepare w22, x21, x5
dec_prepare w3, x2, x5
.LecbdecloopNx:
subs w23, w23, #4
subs w4, w4, #4
bmi .Lecbdec1x
ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 ct blocks */
ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */
bl aes_decrypt_block4x
st1 {v0.16b-v3.16b}, [x19], #64
cond_yield_neon .Lecbdecrestart
st1 {v0.16b-v3.16b}, [x0], #64
b .LecbdecloopNx
.Lecbdec1x:
adds w23, w23, #4
adds w4, w4, #4
beq .Lecbdecout
.Lecbdecloop:
ld1 {v0.16b}, [x20], #16 /* get next ct block */
decrypt_block v0, w22, x21, x5, w6
st1 {v0.16b}, [x19], #16
subs w23, w23, #1
ld1 {v0.16b}, [x1], #16 /* get next ct block */
decrypt_block v0, w3, x2, x5, w6
st1 {v0.16b}, [x0], #16
subs w4, w4, #1
bne .Lecbdecloop
.Lecbdecout:
frame_pop
ldp x29, x30, [sp], #16
ret
AES_ENDPROC(aes_ecb_decrypt)
......@@ -108,162 +94,211 @@ AES_ENDPROC(aes_ecb_decrypt)
*/
AES_ENTRY(aes_cbc_encrypt)
frame_push 6
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x5
.Lcbcencrestart:
ld1 {v4.16b}, [x24] /* get iv */
enc_prepare w22, x21, x6
ld1 {v4.16b}, [x5] /* get iv */
enc_prepare w3, x2, x6
.Lcbcencloop4x:
subs w23, w23, #4
subs w4, w4, #4
bmi .Lcbcenc1x
ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 pt blocks */
ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */
eor v0.16b, v0.16b, v4.16b /* ..and xor with iv */
encrypt_block v0, w22, x21, x6, w7
encrypt_block v0, w3, x2, x6, w7
eor v1.16b, v1.16b, v0.16b
encrypt_block v1, w22, x21, x6, w7
encrypt_block v1, w3, x2, x6, w7
eor v2.16b, v2.16b, v1.16b
encrypt_block v2, w22, x21, x6, w7
encrypt_block v2, w3, x2, x6, w7
eor v3.16b, v3.16b, v2.16b
encrypt_block v3, w22, x21, x6, w7
st1 {v0.16b-v3.16b}, [x19], #64
encrypt_block v3, w3, x2, x6, w7
st1 {v0.16b-v3.16b}, [x0], #64
mov v4.16b, v3.16b
st1 {v4.16b}, [x24] /* return iv */
cond_yield_neon .Lcbcencrestart
b .Lcbcencloop4x
.Lcbcenc1x:
adds w23, w23, #4
adds w4, w4, #4
beq .Lcbcencout
.Lcbcencloop:
ld1 {v0.16b}, [x20], #16 /* get next pt block */
ld1 {v0.16b}, [x1], #16 /* get next pt block */
eor v4.16b, v4.16b, v0.16b /* ..and xor with iv */
encrypt_block v4, w22, x21, x6, w7
st1 {v4.16b}, [x19], #16
subs w23, w23, #1
encrypt_block v4, w3, x2, x6, w7
st1 {v4.16b}, [x0], #16
subs w4, w4, #1
bne .Lcbcencloop
.Lcbcencout:
st1 {v4.16b}, [x24] /* return iv */
frame_pop
st1 {v4.16b}, [x5] /* return iv */
ret
AES_ENDPROC(aes_cbc_encrypt)
AES_ENTRY(aes_cbc_decrypt)
frame_push 6
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x5
stp x29, x30, [sp, #-16]!
mov x29, sp
.Lcbcdecrestart:
ld1 {v7.16b}, [x24] /* get iv */
dec_prepare w22, x21, x6
ld1 {v7.16b}, [x5] /* get iv */
dec_prepare w3, x2, x6
.LcbcdecloopNx:
subs w23, w23, #4
subs w4, w4, #4
bmi .Lcbcdec1x
ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 ct blocks */
ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */
mov v4.16b, v0.16b
mov v5.16b, v1.16b
mov v6.16b, v2.16b
bl aes_decrypt_block4x
sub x20, x20, #16
sub x1, x1, #16
eor v0.16b, v0.16b, v7.16b
eor v1.16b, v1.16b, v4.16b
ld1 {v7.16b}, [x20], #16 /* reload 1 ct block */
ld1 {v7.16b}, [x1], #16 /* reload 1 ct block */
eor v2.16b, v2.16b, v5.16b
eor v3.16b, v3.16b, v6.16b
st1 {v0.16b-v3.16b}, [x19], #64
st1 {v7.16b}, [x24] /* return iv */
cond_yield_neon .Lcbcdecrestart
st1 {v0.16b-v3.16b}, [x0], #64
b .LcbcdecloopNx
.Lcbcdec1x:
adds w23, w23, #4
adds w4, w4, #4
beq .Lcbcdecout
.Lcbcdecloop:
ld1 {v1.16b}, [x20], #16 /* get next ct block */
ld1 {v1.16b}, [x1], #16 /* get next ct block */
mov v0.16b, v1.16b /* ...and copy to v0 */
decrypt_block v0, w22, x21, x6, w7
decrypt_block v0, w3, x2, x6, w7
eor v0.16b, v0.16b, v7.16b /* xor with iv => pt */
mov v7.16b, v1.16b /* ct is next iv */
st1 {v0.16b}, [x19], #16
subs w23, w23, #1
st1 {v0.16b}, [x0], #16
subs w4, w4, #1
bne .Lcbcdecloop
.Lcbcdecout:
st1 {v7.16b}, [x24] /* return iv */
frame_pop
st1 {v7.16b}, [x5] /* return iv */
ldp x29, x30, [sp], #16
ret
AES_ENDPROC(aes_cbc_decrypt)
/*
* aes_cbc_cts_encrypt(u8 out[], u8 const in[], u32 const rk[],
* int rounds, int bytes, u8 const iv[])
* aes_cbc_cts_decrypt(u8 out[], u8 const in[], u32 const rk[],
* int rounds, int bytes, u8 const iv[])
*/
AES_ENTRY(aes_cbc_cts_encrypt)
adr_l x8, .Lcts_permute_table
sub x4, x4, #16
add x9, x8, #32
add x8, x8, x4
sub x9, x9, x4
ld1 {v3.16b}, [x8]
ld1 {v4.16b}, [x9]
ld1 {v0.16b}, [x1], x4 /* overlapping loads */
ld1 {v1.16b}, [x1]
ld1 {v5.16b}, [x5] /* get iv */
enc_prepare w3, x2, x6
eor v0.16b, v0.16b, v5.16b /* xor with iv */
tbl v1.16b, {v1.16b}, v4.16b
encrypt_block v0, w3, x2, x6, w7
eor v1.16b, v1.16b, v0.16b
tbl v0.16b, {v0.16b}, v3.16b
encrypt_block v1, w3, x2, x6, w7
add x4, x0, x4
st1 {v0.16b}, [x4] /* overlapping stores */
st1 {v1.16b}, [x0]
ret
AES_ENDPROC(aes_cbc_cts_encrypt)
AES_ENTRY(aes_cbc_cts_decrypt)
adr_l x8, .Lcts_permute_table
sub x4, x4, #16
add x9, x8, #32
add x8, x8, x4
sub x9, x9, x4
ld1 {v3.16b}, [x8]
ld1 {v4.16b}, [x9]
ld1 {v0.16b}, [x1], x4 /* overlapping loads */
ld1 {v1.16b}, [x1]
ld1 {v5.16b}, [x5] /* get iv */
dec_prepare w3, x2, x6
tbl v2.16b, {v1.16b}, v4.16b
decrypt_block v0, w3, x2, x6, w7
eor v2.16b, v2.16b, v0.16b
tbx v0.16b, {v1.16b}, v4.16b
tbl v2.16b, {v2.16b}, v3.16b
decrypt_block v0, w3, x2, x6, w7
eor v0.16b, v0.16b, v5.16b /* xor with iv */
add x4, x0, x4
st1 {v2.16b}, [x4] /* overlapping stores */
st1 {v0.16b}, [x0]
ret
AES_ENDPROC(aes_cbc_cts_decrypt)
.section ".rodata", "a"
.align 6
.Lcts_permute_table:
.byte 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
.byte 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
.byte 0x0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7
.byte 0x8, 0x9, 0xa, 0xb, 0xc, 0xd, 0xe, 0xf
.byte 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
.byte 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
.previous
/*
* aes_ctr_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds,
* int blocks, u8 ctr[])
*/
AES_ENTRY(aes_ctr_encrypt)
frame_push 6
stp x29, x30, [sp, #-16]!
mov x29, sp
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x5
.Lctrrestart:
enc_prepare w22, x21, x6
ld1 {v4.16b}, [x24]
enc_prepare w3, x2, x6
ld1 {v4.16b}, [x5]
umov x6, v4.d[1] /* keep swabbed ctr in reg */
rev x6, x6
cmn w6, w4 /* 32 bit overflow? */
bcs .Lctrloop
.LctrloopNx:
subs w23, w23, #4
subs w4, w4, #4
bmi .Lctr1x
cmn w6, #4 /* 32 bit overflow? */
bcs .Lctr1x
ldr q8, =0x30000000200000001 /* addends 1,2,3[,0] */
dup v7.4s, w6
add w7, w6, #1
mov v0.16b, v4.16b
add v7.4s, v7.4s, v8.4s
add w8, w6, #2
mov v1.16b, v4.16b
rev32 v8.16b, v7.16b
add w9, w6, #3
mov v2.16b, v4.16b
rev w7, w7
mov v3.16b, v4.16b
mov v1.s[3], v8.s[0]
mov v2.s[3], v8.s[1]
mov v3.s[3], v8.s[2]
ld1 {v5.16b-v7.16b}, [x20], #48 /* get 3 input blocks */
rev w8, w8
mov v1.s[3], w7
rev w9, w9
mov v2.s[3], w8
mov v3.s[3], w9
ld1 {v5.16b-v7.16b}, [x1], #48 /* get 3 input blocks */
bl aes_encrypt_block4x
eor v0.16b, v5.16b, v0.16b
ld1 {v5.16b}, [x20], #16 /* get 1 input block */
ld1 {v5.16b}, [x1], #16 /* get 1 input block */
eor v1.16b, v6.16b, v1.16b
eor v2.16b, v7.16b, v2.16b
eor v3.16b, v5.16b, v3.16b
st1 {v0.16b-v3.16b}, [x19], #64
st1 {v0.16b-v3.16b}, [x0], #64
add x6, x6, #4
rev x7, x6
ins v4.d[1], x7
cbz w23, .Lctrout
st1 {v4.16b}, [x24] /* return next CTR value */
cond_yield_neon .Lctrrestart
cbz w4, .Lctrout
b .LctrloopNx
.Lctr1x:
adds w23, w23, #4
adds w4, w4, #4
beq .Lctrout
.Lctrloop:
mov v0.16b, v4.16b
encrypt_block v0, w22, x21, x8, w7
encrypt_block v0, w3, x2, x8, w7
adds x6, x6, #1 /* increment BE ctr */
rev x7, x6
......@@ -271,22 +306,22 @@ AES_ENTRY(aes_ctr_encrypt)
bcs .Lctrcarry /* overflow? */
.Lctrcarrydone:
subs w23, w23, #1
subs w4, w4, #1
bmi .Lctrtailblock /* blocks <0 means tail block */
ld1 {v3.16b}, [x20], #16
ld1 {v3.16b}, [x1], #16
eor v3.16b, v0.16b, v3.16b
st1 {v3.16b}, [x19], #16
st1 {v3.16b}, [x0], #16
bne .Lctrloop
.Lctrout:
st1 {v4.16b}, [x24] /* return next CTR value */
.Lctrret:
frame_pop
st1 {v4.16b}, [x5] /* return next CTR value */
ldp x29, x30, [sp], #16
ret
.Lctrtailblock:
st1 {v0.16b}, [x19]
b .Lctrret
st1 {v0.16b}, [x0]
ldp x29, x30, [sp], #16
ret
.Lctrcarry:
umov x7, v4.d[0] /* load upper word of ctr */
......@@ -296,7 +331,6 @@ AES_ENTRY(aes_ctr_encrypt)
ins v4.d[0], x7
b .Lctrcarrydone
AES_ENDPROC(aes_ctr_encrypt)
.ltorg
/*
......@@ -306,150 +340,132 @@ AES_ENDPROC(aes_ctr_encrypt)
* int blocks, u8 const rk2[], u8 iv[], int first)
*/
.macro next_tweak, out, in, const, tmp
.macro next_tweak, out, in, tmp
sshr \tmp\().2d, \in\().2d, #63
and \tmp\().16b, \tmp\().16b, \const\().16b
and \tmp\().16b, \tmp\().16b, xtsmask.16b
add \out\().2d, \in\().2d, \in\().2d
ext \tmp\().16b, \tmp\().16b, \tmp\().16b, #8
eor \out\().16b, \out\().16b, \tmp\().16b
.endm
.Lxts_mul_x:
CPU_LE( .quad 1, 0x87 )
CPU_BE( .quad 0x87, 1 )
.macro xts_load_mask, tmp
movi xtsmask.2s, #0x1
movi \tmp\().2s, #0x87
uzp1 xtsmask.4s, xtsmask.4s, \tmp\().4s
.endm
AES_ENTRY(aes_xts_encrypt)
frame_push 6
stp x29, x30, [sp, #-16]!
mov x29, sp
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x6
ld1 {v4.16b}, [x24]
ld1 {v4.16b}, [x6]
xts_load_mask v8
cbz w7, .Lxtsencnotfirst
enc_prepare w3, x5, x8
encrypt_block v4, w3, x5, x8, w7 /* first tweak */
enc_switch_key w3, x2, x8
ldr q7, .Lxts_mul_x
b .LxtsencNx
.Lxtsencrestart:
ld1 {v4.16b}, [x24]
.Lxtsencnotfirst:
enc_prepare w22, x21, x8
enc_prepare w3, x2, x8
.LxtsencloopNx:
ldr q7, .Lxts_mul_x
next_tweak v4, v4, v7, v8
next_tweak v4, v4, v8
.LxtsencNx:
subs w23, w23, #4
subs w4, w4, #4
bmi .Lxtsenc1x
ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 pt blocks */
next_tweak v5, v4, v7, v8
ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */
next_tweak v5, v4, v8
eor v0.16b, v0.16b, v4.16b
next_tweak v6, v5, v7, v8
next_tweak v6, v5, v8
eor v1.16b, v1.16b, v5.16b
eor v2.16b, v2.16b, v6.16b
next_tweak v7, v6, v7, v8
next_tweak v7, v6, v8
eor v3.16b, v3.16b, v7.16b
bl aes_encrypt_block4x
eor v3.16b, v3.16b, v7.16b
eor v0.16b, v0.16b, v4.16b
eor v1.16b, v1.16b, v5.16b
eor v2.16b, v2.16b, v6.16b
st1 {v0.16b-v3.16b}, [x19], #64
st1 {v0.16b-v3.16b}, [x0], #64
mov v4.16b, v7.16b
cbz w23, .Lxtsencout
st1 {v4.16b}, [x24]
cond_yield_neon .Lxtsencrestart
cbz w4, .Lxtsencout
xts_reload_mask v8
b .LxtsencloopNx
.Lxtsenc1x:
adds w23, w23, #4
adds w4, w4, #4
beq .Lxtsencout
.Lxtsencloop:
ld1 {v1.16b}, [x20], #16
ld1 {v1.16b}, [x1], #16
eor v0.16b, v1.16b, v4.16b
encrypt_block v0, w22, x21, x8, w7
encrypt_block v0, w3, x2, x8, w7
eor v0.16b, v0.16b, v4.16b
st1 {v0.16b}, [x19], #16
subs w23, w23, #1
st1 {v0.16b}, [x0], #16
subs w4, w4, #1
beq .Lxtsencout
next_tweak v4, v4, v7, v8
next_tweak v4, v4, v8
b .Lxtsencloop
.Lxtsencout:
st1 {v4.16b}, [x24]
frame_pop
st1 {v4.16b}, [x6]
ldp x29, x30, [sp], #16
ret
AES_ENDPROC(aes_xts_encrypt)
AES_ENTRY(aes_xts_decrypt)
frame_push 6
stp x29, x30, [sp, #-16]!
mov x29, sp
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x6
ld1 {v4.16b}, [x24]
ld1 {v4.16b}, [x6]
xts_load_mask v8
cbz w7, .Lxtsdecnotfirst
enc_prepare w3, x5, x8
encrypt_block v4, w3, x5, x8, w7 /* first tweak */
dec_prepare w3, x2, x8
ldr q7, .Lxts_mul_x
b .LxtsdecNx
.Lxtsdecrestart:
ld1 {v4.16b}, [x24]
.Lxtsdecnotfirst:
dec_prepare w22, x21, x8
dec_prepare w3, x2, x8
.LxtsdecloopNx:
ldr q7, .Lxts_mul_x
next_tweak v4, v4, v7, v8
next_tweak v4, v4, v8
.LxtsdecNx:
subs w23, w23, #4
subs w4, w4, #4
bmi .Lxtsdec1x
ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 ct blocks */
next_tweak v5, v4, v7, v8
ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */
next_tweak v5, v4, v8
eor v0.16b, v0.16b, v4.16b
next_tweak v6, v5, v7, v8
next_tweak v6, v5, v8
eor v1.16b, v1.16b, v5.16b
eor v2.16b, v2.16b, v6.16b
next_tweak v7, v6, v7, v8
next_tweak v7, v6, v8
eor v3.16b, v3.16b, v7.16b
bl aes_decrypt_block4x
eor v3.16b, v3.16b, v7.16b
eor v0.16b, v0.16b, v4.16b
eor v1.16b, v1.16b, v5.16b
eor v2.16b, v2.16b, v6.16b
st1 {v0.16b-v3.16b}, [x19], #64
st1 {v0.16b-v3.16b}, [x0], #64
mov v4.16b, v7.16b
cbz w23, .Lxtsdecout
st1 {v4.16b}, [x24]
cond_yield_neon .Lxtsdecrestart
cbz w4, .Lxtsdecout
xts_reload_mask v8
b .LxtsdecloopNx
.Lxtsdec1x:
adds w23, w23, #4
adds w4, w4, #4
beq .Lxtsdecout
.Lxtsdecloop:
ld1 {v1.16b}, [x20], #16
ld1 {v1.16b}, [x1], #16
eor v0.16b, v1.16b, v4.16b
decrypt_block v0, w22, x21, x8, w7
decrypt_block v0, w3, x2, x8, w7
eor v0.16b, v0.16b, v4.16b
st1 {v0.16b}, [x19], #16
subs w23, w23, #1
st1 {v0.16b}, [x0], #16
subs w4, w4, #1
beq .Lxtsdecout
next_tweak v4, v4, v7, v8
next_tweak v4, v4, v8
b .Lxtsdecloop
.Lxtsdecout:
st1 {v4.16b}, [x24]
frame_pop
st1 {v4.16b}, [x6]
ldp x29, x30, [sp], #16
ret
AES_ENDPROC(aes_xts_decrypt)
......
......@@ -14,6 +14,12 @@
#define AES_ENTRY(func) ENTRY(neon_ ## func)
#define AES_ENDPROC(func) ENDPROC(neon_ ## func)
xtsmask .req v7
.macro xts_reload_mask, tmp
xts_load_mask \tmp
.endm
/* multiply by polynomial 'x' in GF(2^8) */
.macro mul_by_x, out, in, temp, const
sshr \temp, \in, #7
......
/*
* Accelerated CRC32(C) using arm64 CRC, NEON and Crypto Extensions instructions
*
* Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
/* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see http://www.gnu.org/licenses
*
* Please visit http://www.xyratex.com/contact if you need additional
* information or have any questions.
*
* GPL HEADER END
*/
/*
* Copyright 2012 Xyratex Technology Limited
*
* Using hardware provided PCLMULQDQ instruction to accelerate the CRC32
* calculation.
* CRC32 polynomial:0x04c11db7(BE)/0xEDB88320(LE)
* PCLMULQDQ is a new instruction in Intel SSE4.2, the reference can be found
* at:
* http://www.intel.com/products/processor/manuals/
* Intel(R) 64 and IA-32 Architectures Software Developer's Manual
* Volume 2B: Instruction Set Reference, N-Z
*
* Authors: Gregory Prestas <Gregory_Prestas@us.xyratex.com>
* Alexander Boyko <Alexander_Boyko@xyratex.com>
*/
#include <linux/linkage.h>
#include <asm/assembler.h>
.section ".rodata", "a"
.align 6
.cpu generic+crypto+crc
.Lcrc32_constants:
/*
* [x4*128+32 mod P(x) << 32)]' << 1 = 0x154442bd4
* #define CONSTANT_R1 0x154442bd4LL
*
* [(x4*128-32 mod P(x) << 32)]' << 1 = 0x1c6e41596
* #define CONSTANT_R2 0x1c6e41596LL
*/
.octa 0x00000001c6e415960000000154442bd4
/*
* [(x128+32 mod P(x) << 32)]' << 1 = 0x1751997d0
* #define CONSTANT_R3 0x1751997d0LL
*
* [(x128-32 mod P(x) << 32)]' << 1 = 0x0ccaa009e
* #define CONSTANT_R4 0x0ccaa009eLL
*/
.octa 0x00000000ccaa009e00000001751997d0
/*
* [(x64 mod P(x) << 32)]' << 1 = 0x163cd6124
* #define CONSTANT_R5 0x163cd6124LL
*/
.quad 0x0000000163cd6124
.quad 0x00000000FFFFFFFF
/*
* #define CRCPOLY_TRUE_LE_FULL 0x1DB710641LL
*
* Barrett Reduction constant (u64`) = u` = (x**64 / P(x))`
* = 0x1F7011641LL
* #define CONSTANT_RU 0x1F7011641LL
*/
.octa 0x00000001F701164100000001DB710641
.Lcrc32c_constants:
.octa 0x000000009e4addf800000000740eef02
.octa 0x000000014cd00bd600000000f20c0dfe
.quad 0x00000000dd45aab8
.quad 0x00000000FFFFFFFF
.octa 0x00000000dea713f10000000105ec76f0
vCONSTANT .req v0
dCONSTANT .req d0
qCONSTANT .req q0
BUF .req x19
LEN .req x20
CRC .req x21
CONST .req x22
vzr .req v9
/**
* Calculate crc32
* BUF - buffer
* LEN - sizeof buffer (multiple of 16 bytes), LEN should be > 63
* CRC - initial crc32
* return %eax crc32
* uint crc32_pmull_le(unsigned char const *buffer,
* size_t len, uint crc32)
*/
.text
ENTRY(crc32_pmull_le)
adr_l x3, .Lcrc32_constants
b 0f
ENTRY(crc32c_pmull_le)
adr_l x3, .Lcrc32c_constants
0: frame_push 4, 64
mov BUF, x0
mov LEN, x1
mov CRC, x2
mov CONST, x3
bic LEN, LEN, #15
ld1 {v1.16b-v4.16b}, [BUF], #0x40
movi vzr.16b, #0
fmov dCONSTANT, CRC
eor v1.16b, v1.16b, vCONSTANT.16b
sub LEN, LEN, #0x40
cmp LEN, #0x40
b.lt less_64
ldr qCONSTANT, [CONST]
loop_64: /* 64 bytes Full cache line folding */
sub LEN, LEN, #0x40
pmull2 v5.1q, v1.2d, vCONSTANT.2d
pmull2 v6.1q, v2.2d, vCONSTANT.2d
pmull2 v7.1q, v3.2d, vCONSTANT.2d
pmull2 v8.1q, v4.2d, vCONSTANT.2d
pmull v1.1q, v1.1d, vCONSTANT.1d
pmull v2.1q, v2.1d, vCONSTANT.1d
pmull v3.1q, v3.1d, vCONSTANT.1d
pmull v4.1q, v4.1d, vCONSTANT.1d
eor v1.16b, v1.16b, v5.16b
ld1 {v5.16b}, [BUF], #0x10
eor v2.16b, v2.16b, v6.16b
ld1 {v6.16b}, [BUF], #0x10
eor v3.16b, v3.16b, v7.16b
ld1 {v7.16b}, [BUF], #0x10
eor v4.16b, v4.16b, v8.16b
ld1 {v8.16b}, [BUF], #0x10
eor v1.16b, v1.16b, v5.16b
eor v2.16b, v2.16b, v6.16b
eor v3.16b, v3.16b, v7.16b
eor v4.16b, v4.16b, v8.16b
cmp LEN, #0x40
b.lt less_64
if_will_cond_yield_neon
stp q1, q2, [sp, #.Lframe_local_offset]
stp q3, q4, [sp, #.Lframe_local_offset + 32]
do_cond_yield_neon
ldp q1, q2, [sp, #.Lframe_local_offset]
ldp q3, q4, [sp, #.Lframe_local_offset + 32]
ldr qCONSTANT, [CONST]
movi vzr.16b, #0
endif_yield_neon
b loop_64
less_64: /* Folding cache line into 128bit */
ldr qCONSTANT, [CONST, #16]
pmull2 v5.1q, v1.2d, vCONSTANT.2d
pmull v1.1q, v1.1d, vCONSTANT.1d
eor v1.16b, v1.16b, v5.16b
eor v1.16b, v1.16b, v2.16b
pmull2 v5.1q, v1.2d, vCONSTANT.2d
pmull v1.1q, v1.1d, vCONSTANT.1d
eor v1.16b, v1.16b, v5.16b
eor v1.16b, v1.16b, v3.16b
pmull2 v5.1q, v1.2d, vCONSTANT.2d
pmull v1.1q, v1.1d, vCONSTANT.1d
eor v1.16b, v1.16b, v5.16b
eor v1.16b, v1.16b, v4.16b
cbz LEN, fold_64
loop_16: /* Folding rest buffer into 128bit */
subs LEN, LEN, #0x10
ld1 {v2.16b}, [BUF], #0x10
pmull2 v5.1q, v1.2d, vCONSTANT.2d
pmull v1.1q, v1.1d, vCONSTANT.1d
eor v1.16b, v1.16b, v5.16b
eor v1.16b, v1.16b, v2.16b
b.ne loop_16
fold_64:
/* perform the last 64 bit fold, also adds 32 zeroes
* to the input stream */
ext v2.16b, v1.16b, v1.16b, #8
pmull2 v2.1q, v2.2d, vCONSTANT.2d
ext v1.16b, v1.16b, vzr.16b, #8
eor v1.16b, v1.16b, v2.16b
/* final 32-bit fold */
ldr dCONSTANT, [CONST, #32]
ldr d3, [CONST, #40]
ext v2.16b, v1.16b, vzr.16b, #4
and v1.16b, v1.16b, v3.16b
pmull v1.1q, v1.1d, vCONSTANT.1d
eor v1.16b, v1.16b, v2.16b
/* Finish up with the bit-reversed barrett reduction 64 ==> 32 bits */
ldr qCONSTANT, [CONST, #48]
and v2.16b, v1.16b, v3.16b
ext v2.16b, vzr.16b, v2.16b, #8
pmull2 v2.1q, v2.2d, vCONSTANT.2d
and v2.16b, v2.16b, v3.16b
pmull v2.1q, v2.1d, vCONSTANT.1d
eor v1.16b, v1.16b, v2.16b
mov w0, v1.s[1]
frame_pop
ret
ENDPROC(crc32_pmull_le)
ENDPROC(crc32c_pmull_le)
.macro __crc32, c
0: subs x2, x2, #16
b.mi 8f
ldp x3, x4, [x1], #16
CPU_BE( rev x3, x3 )
CPU_BE( rev x4, x4 )
crc32\c\()x w0, w0, x3
crc32\c\()x w0, w0, x4
b.ne 0b
ret
8: tbz x2, #3, 4f
ldr x3, [x1], #8
CPU_BE( rev x3, x3 )
crc32\c\()x w0, w0, x3
4: tbz x2, #2, 2f
ldr w3, [x1], #4
CPU_BE( rev w3, w3 )
crc32\c\()w w0, w0, w3
2: tbz x2, #1, 1f
ldrh w3, [x1], #2
CPU_BE( rev16 w3, w3 )
crc32\c\()h w0, w0, w3
1: tbz x2, #0, 0f
ldrb w3, [x1]
crc32\c\()b w0, w0, w3
0: ret
.endm
.align 5
ENTRY(crc32_armv8_le)
__crc32
ENDPROC(crc32_armv8_le)
.align 5
ENTRY(crc32c_armv8_le)
__crc32 c
ENDPROC(crc32c_armv8_le)
/*
* Accelerated CRC32(C) using arm64 NEON and Crypto Extensions instructions
*
* Copyright (C) 2016 - 2017 Linaro Ltd <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#include <linux/cpufeature.h>
#include <linux/crc32.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/string.h>
#include <crypto/internal/hash.h>
#include <asm/hwcap.h>
#include <asm/neon.h>
#include <asm/simd.h>
#include <asm/unaligned.h>
#define PMULL_MIN_LEN 64L /* minimum size of buffer
* for crc32_pmull_le_16 */
#define SCALE_F 16L /* size of NEON register */
asmlinkage u32 crc32_pmull_le(const u8 buf[], u64 len, u32 init_crc);
asmlinkage u32 crc32_armv8_le(u32 init_crc, const u8 buf[], size_t len);
asmlinkage u32 crc32c_pmull_le(const u8 buf[], u64 len, u32 init_crc);
asmlinkage u32 crc32c_armv8_le(u32 init_crc, const u8 buf[], size_t len);
static u32 (*fallback_crc32)(u32 init_crc, const u8 buf[], size_t len);
static u32 (*fallback_crc32c)(u32 init_crc, const u8 buf[], size_t len);
static int crc32_pmull_cra_init(struct crypto_tfm *tfm)
{
u32 *key = crypto_tfm_ctx(tfm);
*key = 0;
return 0;
}
static int crc32c_pmull_cra_init(struct crypto_tfm *tfm)
{
u32 *key = crypto_tfm_ctx(tfm);
*key = ~0;
return 0;
}
static int crc32_pmull_setkey(struct crypto_shash *hash, const u8 *key,
unsigned int keylen)
{
u32 *mctx = crypto_shash_ctx(hash);
if (keylen != sizeof(u32)) {
crypto_shash_set_flags(hash, CRYPTO_TFM_RES_BAD_KEY_LEN);
return -EINVAL;
}
*mctx = le32_to_cpup((__le32 *)key);
return 0;
}
static int crc32_pmull_init(struct shash_desc *desc)
{
u32 *mctx = crypto_shash_ctx(desc->tfm);
u32 *crc = shash_desc_ctx(desc);
*crc = *mctx;
return 0;
}
static int crc32_update(struct shash_desc *desc, const u8 *data,
unsigned int length)
{
u32 *crc = shash_desc_ctx(desc);
*crc = crc32_armv8_le(*crc, data, length);
return 0;
}
static int crc32c_update(struct shash_desc *desc, const u8 *data,
unsigned int length)
{
u32 *crc = shash_desc_ctx(desc);
*crc = crc32c_armv8_le(*crc, data, length);
return 0;
}
static int crc32_pmull_update(struct shash_desc *desc, const u8 *data,
unsigned int length)
{
u32 *crc = shash_desc_ctx(desc);
unsigned int l;
if ((u64)data % SCALE_F) {
l = min_t(u32, length, SCALE_F - ((u64)data % SCALE_F));
*crc = fallback_crc32(*crc, data, l);
data += l;
length -= l;
}
if (length >= PMULL_MIN_LEN && may_use_simd()) {
l = round_down(length, SCALE_F);
kernel_neon_begin();
*crc = crc32_pmull_le(data, l, *crc);
kernel_neon_end();
data += l;
length -= l;
}
if (length > 0)
*crc = fallback_crc32(*crc, data, length);
return 0;
}
static int crc32c_pmull_update(struct shash_desc *desc, const u8 *data,
unsigned int length)
{
u32 *crc = shash_desc_ctx(desc);
unsigned int l;
if ((u64)data % SCALE_F) {
l = min_t(u32, length, SCALE_F - ((u64)data % SCALE_F));
*crc = fallback_crc32c(*crc, data, l);
data += l;
length -= l;
}
if (length >= PMULL_MIN_LEN && may_use_simd()) {
l = round_down(length, SCALE_F);
kernel_neon_begin();
*crc = crc32c_pmull_le(data, l, *crc);
kernel_neon_end();
data += l;
length -= l;
}
if (length > 0) {
*crc = fallback_crc32c(*crc, data, length);
}
return 0;
}
static int crc32_pmull_final(struct shash_desc *desc, u8 *out)
{
u32 *crc = shash_desc_ctx(desc);
put_unaligned_le32(*crc, out);
return 0;
}
static int crc32c_pmull_final(struct shash_desc *desc, u8 *out)
{
u32 *crc = shash_desc_ctx(desc);
put_unaligned_le32(~*crc, out);
return 0;
}
static struct shash_alg crc32_pmull_algs[] = { {
.setkey = crc32_pmull_setkey,
.init = crc32_pmull_init,
.update = crc32_update,
.final = crc32_pmull_final,
.descsize = sizeof(u32),
.digestsize = sizeof(u32),
.base.cra_ctxsize = sizeof(u32),
.base.cra_init = crc32_pmull_cra_init,
.base.cra_name = "crc32",
.base.cra_driver_name = "crc32-arm64-ce",
.base.cra_priority = 200,
.base.cra_flags = CRYPTO_ALG_OPTIONAL_KEY,
.base.cra_blocksize = 1,
.base.cra_module = THIS_MODULE,
}, {
.setkey = crc32_pmull_setkey,
.init = crc32_pmull_init,
.update = crc32c_update,
.final = crc32c_pmull_final,
.descsize = sizeof(u32),
.digestsize = sizeof(u32),
.base.cra_ctxsize = sizeof(u32),
.base.cra_init = crc32c_pmull_cra_init,
.base.cra_name = "crc32c",
.base.cra_driver_name = "crc32c-arm64-ce",
.base.cra_priority = 200,
.base.cra_flags = CRYPTO_ALG_OPTIONAL_KEY,
.base.cra_blocksize = 1,
.base.cra_module = THIS_MODULE,
} };
static int __init crc32_pmull_mod_init(void)
{
if (IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && (elf_hwcap & HWCAP_PMULL)) {
crc32_pmull_algs[0].update = crc32_pmull_update;
crc32_pmull_algs[1].update = crc32c_pmull_update;
if (elf_hwcap & HWCAP_CRC32) {
fallback_crc32 = crc32_armv8_le;
fallback_crc32c = crc32c_armv8_le;
} else {
fallback_crc32 = crc32_le;
fallback_crc32c = __crc32c_le;
}
} else if (!(elf_hwcap & HWCAP_CRC32)) {
return -ENODEV;
}
return crypto_register_shashes(crc32_pmull_algs,
ARRAY_SIZE(crc32_pmull_algs));
}
static void __exit crc32_pmull_mod_exit(void)
{
crypto_unregister_shashes(crc32_pmull_algs,
ARRAY_SIZE(crc32_pmull_algs));
}
static const struct cpu_feature crc32_cpu_feature[] = {
{ cpu_feature(CRC32) }, { cpu_feature(PMULL) }, { }
};
MODULE_DEVICE_TABLE(cpu, crc32_cpu_feature);
module_init(crc32_pmull_mod_init);
module_exit(crc32_pmull_mod_exit);
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
MODULE_LICENSE("GPL v2");
......@@ -80,7 +80,186 @@
vzr .req v13
ENTRY(crc_t10dif_pmull)
ad .req v14
bd .req v10
k00_16 .req v15
k32_48 .req v16
t3 .req v17
t4 .req v18
t5 .req v19
t6 .req v20
t7 .req v21
t8 .req v22
t9 .req v23
perm1 .req v24
perm2 .req v25
perm3 .req v26
perm4 .req v27
bd1 .req v28
bd2 .req v29
bd3 .req v30
bd4 .req v31
.macro __pmull_init_p64
.endm
.macro __pmull_pre_p64, bd
.endm
.macro __pmull_init_p8
// k00_16 := 0x0000000000000000_000000000000ffff
// k32_48 := 0x00000000ffffffff_0000ffffffffffff
movi k32_48.2d, #0xffffffff
mov k32_48.h[2], k32_48.h[0]
ushr k00_16.2d, k32_48.2d, #32
// prepare the permutation vectors
mov_q x5, 0x080f0e0d0c0b0a09
movi perm4.8b, #8
dup perm1.2d, x5
eor perm1.16b, perm1.16b, perm4.16b
ushr perm2.2d, perm1.2d, #8
ushr perm3.2d, perm1.2d, #16
ushr perm4.2d, perm1.2d, #24
sli perm2.2d, perm1.2d, #56
sli perm3.2d, perm1.2d, #48
sli perm4.2d, perm1.2d, #40
.endm
.macro __pmull_pre_p8, bd
tbl bd1.16b, {\bd\().16b}, perm1.16b
tbl bd2.16b, {\bd\().16b}, perm2.16b
tbl bd3.16b, {\bd\().16b}, perm3.16b
tbl bd4.16b, {\bd\().16b}, perm4.16b
.endm
__pmull_p8_core:
.L__pmull_p8_core:
ext t4.8b, ad.8b, ad.8b, #1 // A1
ext t5.8b, ad.8b, ad.8b, #2 // A2
ext t6.8b, ad.8b, ad.8b, #3 // A3
pmull t4.8h, t4.8b, bd.8b // F = A1*B
pmull t8.8h, ad.8b, bd1.8b // E = A*B1
pmull t5.8h, t5.8b, bd.8b // H = A2*B
pmull t7.8h, ad.8b, bd2.8b // G = A*B2
pmull t6.8h, t6.8b, bd.8b // J = A3*B
pmull t9.8h, ad.8b, bd3.8b // I = A*B3
pmull t3.8h, ad.8b, bd4.8b // K = A*B4
b 0f
.L__pmull_p8_core2:
tbl t4.16b, {ad.16b}, perm1.16b // A1
tbl t5.16b, {ad.16b}, perm2.16b // A2
tbl t6.16b, {ad.16b}, perm3.16b // A3
pmull2 t4.8h, t4.16b, bd.16b // F = A1*B
pmull2 t8.8h, ad.16b, bd1.16b // E = A*B1
pmull2 t5.8h, t5.16b, bd.16b // H = A2*B
pmull2 t7.8h, ad.16b, bd2.16b // G = A*B2
pmull2 t6.8h, t6.16b, bd.16b // J = A3*B
pmull2 t9.8h, ad.16b, bd3.16b // I = A*B3
pmull2 t3.8h, ad.16b, bd4.16b // K = A*B4
0: eor t4.16b, t4.16b, t8.16b // L = E + F
eor t5.16b, t5.16b, t7.16b // M = G + H
eor t6.16b, t6.16b, t9.16b // N = I + J
uzp1 t8.2d, t4.2d, t5.2d
uzp2 t4.2d, t4.2d, t5.2d
uzp1 t7.2d, t6.2d, t3.2d
uzp2 t6.2d, t6.2d, t3.2d
// t4 = (L) (P0 + P1) << 8
// t5 = (M) (P2 + P3) << 16
eor t8.16b, t8.16b, t4.16b
and t4.16b, t4.16b, k32_48.16b
// t6 = (N) (P4 + P5) << 24
// t7 = (K) (P6 + P7) << 32
eor t7.16b, t7.16b, t6.16b
and t6.16b, t6.16b, k00_16.16b
eor t8.16b, t8.16b, t4.16b
eor t7.16b, t7.16b, t6.16b
zip2 t5.2d, t8.2d, t4.2d
zip1 t4.2d, t8.2d, t4.2d
zip2 t3.2d, t7.2d, t6.2d
zip1 t6.2d, t7.2d, t6.2d
ext t4.16b, t4.16b, t4.16b, #15
ext t5.16b, t5.16b, t5.16b, #14
ext t6.16b, t6.16b, t6.16b, #13
ext t3.16b, t3.16b, t3.16b, #12
eor t4.16b, t4.16b, t5.16b
eor t6.16b, t6.16b, t3.16b
ret
ENDPROC(__pmull_p8_core)
.macro __pmull_p8, rq, ad, bd, i
.ifnc \bd, v10
.err
.endif
mov ad.16b, \ad\().16b
.ifb \i
pmull \rq\().8h, \ad\().8b, bd.8b // D = A*B
.else
pmull2 \rq\().8h, \ad\().16b, bd.16b // D = A*B
.endif
bl .L__pmull_p8_core\i
eor \rq\().16b, \rq\().16b, t4.16b
eor \rq\().16b, \rq\().16b, t6.16b
.endm
.macro fold64, p, reg1, reg2
ldp q11, q12, [arg2], #0x20
__pmull_\p v8, \reg1, v10, 2
__pmull_\p \reg1, \reg1, v10
CPU_LE( rev64 v11.16b, v11.16b )
CPU_LE( rev64 v12.16b, v12.16b )
__pmull_\p v9, \reg2, v10, 2
__pmull_\p \reg2, \reg2, v10
CPU_LE( ext v11.16b, v11.16b, v11.16b, #8 )
CPU_LE( ext v12.16b, v12.16b, v12.16b, #8 )
eor \reg1\().16b, \reg1\().16b, v8.16b
eor \reg2\().16b, \reg2\().16b, v9.16b
eor \reg1\().16b, \reg1\().16b, v11.16b
eor \reg2\().16b, \reg2\().16b, v12.16b
.endm
.macro fold16, p, reg, rk
__pmull_\p v8, \reg, v10
__pmull_\p \reg, \reg, v10, 2
.ifnb \rk
ldr_l q10, \rk, x8
__pmull_pre_\p v10
.endif
eor v7.16b, v7.16b, v8.16b
eor v7.16b, v7.16b, \reg\().16b
.endm
.macro __pmull_p64, rd, rn, rm, n
.ifb \n
pmull \rd\().1q, \rn\().1d, \rm\().1d
.else
pmull2 \rd\().1q, \rn\().2d, \rm\().2d
.endif
.endm
.macro crc_t10dif_pmull, p
frame_push 3, 128
mov arg1_low32, w0
......@@ -89,6 +268,8 @@ ENTRY(crc_t10dif_pmull)
movi vzr.16b, #0 // init zero register
__pmull_init_\p
// adjust the 16-bit initial_crc value, scale it to 32 bits
lsl arg1_low32, arg1_low32, #16
......@@ -96,7 +277,7 @@ ENTRY(crc_t10dif_pmull)
cmp arg3, #256
// for sizes less than 128, we can't fold 64B at a time...
b.lt _less_than_128
b.lt .L_less_than_128_\@
// load the initial crc value
// crc value does not need to be byte-reflected, but it needs
......@@ -137,6 +318,7 @@ CPU_LE( ext v7.16b, v7.16b, v7.16b, #8 )
ldr_l q10, rk3, x8 // xmm10 has rk3 and rk4
// type of pmull instruction
// will determine which constant to use
__pmull_pre_\p v10
//
// we subtract 256 instead of 128 to save one instruction from the loop
......@@ -147,41 +329,19 @@ CPU_LE( ext v7.16b, v7.16b, v7.16b, #8 )
// buffer. The _fold_64_B_loop will fold 64B at a time
// until we have 64+y Bytes of buffer
// fold 64B at a time. This section of the code folds 4 vector
// registers in parallel
_fold_64_B_loop:
.L_fold_64_B_loop_\@:
.macro fold64, reg1, reg2
ldp q11, q12, [arg2], #0x20
pmull2 v8.1q, \reg1\().2d, v10.2d
pmull \reg1\().1q, \reg1\().1d, v10.1d
CPU_LE( rev64 v11.16b, v11.16b )
CPU_LE( rev64 v12.16b, v12.16b )
pmull2 v9.1q, \reg2\().2d, v10.2d
pmull \reg2\().1q, \reg2\().1d, v10.1d
CPU_LE( ext v11.16b, v11.16b, v11.16b, #8 )
CPU_LE( ext v12.16b, v12.16b, v12.16b, #8 )
eor \reg1\().16b, \reg1\().16b, v8.16b
eor \reg2\().16b, \reg2\().16b, v9.16b
eor \reg1\().16b, \reg1\().16b, v11.16b
eor \reg2\().16b, \reg2\().16b, v12.16b
.endm
fold64 v0, v1
fold64 v2, v3
fold64 v4, v5
fold64 v6, v7
fold64 \p, v0, v1
fold64 \p, v2, v3
fold64 \p, v4, v5
fold64 \p, v6, v7
subs arg3, arg3, #128
// check if there is another 64B in the buffer to be able to fold
b.lt _fold_64_B_end
b.lt .L_fold_64_B_end_\@
if_will_cond_yield_neon
stp q0, q1, [sp, #.Lframe_local_offset]
......@@ -195,11 +355,13 @@ CPU_LE( ext v12.16b, v12.16b, v12.16b, #8 )
ldp q6, q7, [sp, #.Lframe_local_offset + 96]
ldr_l q10, rk3, x8
movi vzr.16b, #0 // init zero register
__pmull_init_\p
__pmull_pre_\p v10
endif_yield_neon
b _fold_64_B_loop
b .L_fold_64_B_loop_\@
_fold_64_B_end:
.L_fold_64_B_end_\@:
// at this point, the buffer pointer is pointing at the last y Bytes
// of the buffer the 64B of folded data is in 4 of the vector
// registers: v0, v1, v2, v3
......@@ -208,38 +370,29 @@ _fold_64_B_end:
// constants
ldr_l q10, rk9, x8
__pmull_pre_\p v10
.macro fold16, reg, rk
pmull v8.1q, \reg\().1d, v10.1d
pmull2 \reg\().1q, \reg\().2d, v10.2d
.ifnb \rk
ldr_l q10, \rk, x8
.endif
eor v7.16b, v7.16b, v8.16b
eor v7.16b, v7.16b, \reg\().16b
.endm
fold16 v0, rk11
fold16 v1, rk13
fold16 v2, rk15
fold16 v3, rk17
fold16 v4, rk19
fold16 v5, rk1
fold16 v6
fold16 \p, v0, rk11
fold16 \p, v1, rk13
fold16 \p, v2, rk15
fold16 \p, v3, rk17
fold16 \p, v4, rk19
fold16 \p, v5, rk1
fold16 \p, v6
// instead of 64, we add 48 to the loop counter to save 1 instruction
// from the loop instead of a cmp instruction, we use the negative
// flag with the jl instruction
adds arg3, arg3, #(128-16)
b.lt _final_reduction_for_128
b.lt .L_final_reduction_for_128_\@
// now we have 16+y bytes left to reduce. 16 Bytes is in register v7
// and the rest is in memory. We can fold 16 bytes at a time if y>=16
// continue folding 16B at a time
_16B_reduction_loop:
pmull v8.1q, v7.1d, v10.1d
pmull2 v7.1q, v7.2d, v10.2d
.L_16B_reduction_loop_\@:
__pmull_\p v8, v7, v10
__pmull_\p v7, v7, v10, 2
eor v7.16b, v7.16b, v8.16b
ldr q0, [arg2], #16
......@@ -251,22 +404,22 @@ CPU_LE( ext v0.16b, v0.16b, v0.16b, #8 )
// instead of a cmp instruction, we utilize the flags with the
// jge instruction equivalent of: cmp arg3, 16-16
// check if there is any more 16B in the buffer to be able to fold
b.ge _16B_reduction_loop
b.ge .L_16B_reduction_loop_\@
// now we have 16+z bytes left to reduce, where 0<= z < 16.
// first, we reduce the data in the xmm7 register
_final_reduction_for_128:
.L_final_reduction_for_128_\@:
// check if any more data to fold. If not, compute the CRC of
// the final 128 bits
adds arg3, arg3, #16
b.eq _128_done
b.eq .L_128_done_\@
// here we are getting data that is less than 16 bytes.
// since we know that there was data before the pointer, we can
// offset the input pointer before the actual point, to receive
// exactly 16 bytes. after that the registers need to be adjusted.
_get_last_two_regs:
.L_get_last_two_regs_\@:
add arg2, arg2, arg3
ldr q1, [arg2, #-16]
CPU_LE( rev64 v1.16b, v1.16b )
......@@ -291,47 +444,48 @@ CPU_LE( ext v1.16b, v1.16b, v1.16b, #8 )
bsl v0.16b, v2.16b, v1.16b
// fold 16 Bytes
pmull v8.1q, v7.1d, v10.1d
pmull2 v7.1q, v7.2d, v10.2d
__pmull_\p v8, v7, v10
__pmull_\p v7, v7, v10, 2
eor v7.16b, v7.16b, v8.16b
eor v7.16b, v7.16b, v0.16b
_128_done:
.L_128_done_\@:
// compute crc of a 128-bit value
ldr_l q10, rk5, x8 // rk5 and rk6 in xmm10
__pmull_pre_\p v10
// 64b fold
ext v0.16b, vzr.16b, v7.16b, #8
mov v7.d[0], v7.d[1]
pmull v7.1q, v7.1d, v10.1d
__pmull_\p v7, v7, v10
eor v7.16b, v7.16b, v0.16b
// 32b fold
ext v0.16b, v7.16b, vzr.16b, #4
mov v7.s[3], vzr.s[0]
pmull2 v0.1q, v0.2d, v10.2d
__pmull_\p v0, v0, v10, 2
eor v7.16b, v7.16b, v0.16b
// barrett reduction
_barrett:
ldr_l q10, rk7, x8
__pmull_pre_\p v10
mov v0.d[0], v7.d[1]
pmull v0.1q, v0.1d, v10.1d
__pmull_\p v0, v0, v10
ext v0.16b, vzr.16b, v0.16b, #12
pmull2 v0.1q, v0.2d, v10.2d
__pmull_\p v0, v0, v10, 2
ext v0.16b, vzr.16b, v0.16b, #12
eor v7.16b, v7.16b, v0.16b
mov w0, v7.s[1]
_cleanup:
.L_cleanup_\@:
// scale the result back to 16 bits
lsr x0, x0, #16
frame_pop
ret
_less_than_128:
cbz arg3, _cleanup
.L_less_than_128_\@:
cbz arg3, .L_cleanup_\@
movi v0.16b, #0
mov v0.s[3], arg1_low32 // get the initial crc value
......@@ -342,20 +496,21 @@ CPU_LE( ext v7.16b, v7.16b, v7.16b, #8 )
eor v7.16b, v7.16b, v0.16b // xor the initial crc value
cmp arg3, #16
b.eq _128_done // exactly 16 left
b.lt _less_than_16_left
b.eq .L_128_done_\@ // exactly 16 left
b.lt .L_less_than_16_left_\@
ldr_l q10, rk1, x8 // rk1 and rk2 in xmm10
__pmull_pre_\p v10
// update the counter. subtract 32 instead of 16 to save one
// instruction from the loop
subs arg3, arg3, #32
b.ge _16B_reduction_loop
b.ge .L_16B_reduction_loop_\@
add arg3, arg3, #16
b _get_last_two_regs
b .L_get_last_two_regs_\@
_less_than_16_left:
.L_less_than_16_left_\@:
// shl r9, 4
adr_l x0, tbl_shf_table + 16
sub x0, x0, arg3
......@@ -363,8 +518,17 @@ _less_than_16_left:
movi v9.16b, #0x80
eor v0.16b, v0.16b, v9.16b
tbl v7.16b, {v7.16b}, v0.16b
b _128_done
ENDPROC(crc_t10dif_pmull)
b .L_128_done_\@
.endm
ENTRY(crc_t10dif_pmull_p8)
crc_t10dif_pmull p8
ENDPROC(crc_t10dif_pmull_p8)
.align 5
ENTRY(crc_t10dif_pmull_p64)
crc_t10dif_pmull p64
ENDPROC(crc_t10dif_pmull_p64)
// precomputed constants
// these constants are precomputed from the poly:
......
......@@ -22,7 +22,10 @@
#define CRC_T10DIF_PMULL_CHUNK_SIZE 16U
asmlinkage u16 crc_t10dif_pmull(u16 init_crc, const u8 buf[], u64 len);
asmlinkage u16 crc_t10dif_pmull_p64(u16 init_crc, const u8 buf[], u64 len);
asmlinkage u16 crc_t10dif_pmull_p8(u16 init_crc, const u8 buf[], u64 len);
static u16 (*crc_t10dif_pmull)(u16 init_crc, const u8 buf[], u64 len);
static int crct10dif_init(struct shash_desc *desc)
{
......@@ -85,6 +88,11 @@ static struct shash_alg crc_t10dif_alg = {
static int __init crc_t10dif_mod_init(void)
{
if (elf_hwcap & HWCAP_PMULL)
crc_t10dif_pmull = crc_t10dif_pmull_p64;
else
crc_t10dif_pmull = crc_t10dif_pmull_p8;
return crypto_register_shash(&crc_t10dif_alg);
}
......@@ -93,8 +101,10 @@ static void __exit crc_t10dif_mod_exit(void)
crypto_unregister_shash(&crc_t10dif_alg);
}
module_cpu_feature_match(PMULL, crc_t10dif_mod_init);
module_cpu_feature_match(ASIMD, crc_t10dif_mod_init);
module_exit(crc_t10dif_mod_exit);
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
MODULE_LICENSE("GPL v2");
MODULE_ALIAS_CRYPTO("crct10dif");
MODULE_ALIAS_CRYPTO("crct10dif-arm64-ce");
// SPDX-License-Identifier: GPL-2.0
/*
* ARM64 NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
*
* Copyright (c) 2018 Google, Inc
*
* Author: Eric Biggers <ebiggers@google.com>
*/
#include <linux/linkage.h>
.text
// arguments
ROUND_KEYS .req x0 // const {u64,u32} *round_keys
NROUNDS .req w1 // int nrounds
NROUNDS_X .req x1
DST .req x2 // void *dst
SRC .req x3 // const void *src
NBYTES .req w4 // unsigned int nbytes
TWEAK .req x5 // void *tweak
// registers which hold the data being encrypted/decrypted
// (underscores avoid a naming collision with ARM64 registers x0-x3)
X_0 .req v0
Y_0 .req v1
X_1 .req v2
Y_1 .req v3
X_2 .req v4
Y_2 .req v5
X_3 .req v6
Y_3 .req v7
// the round key, duplicated in all lanes
ROUND_KEY .req v8
// index vector for tbl-based 8-bit rotates
ROTATE_TABLE .req v9
ROTATE_TABLE_Q .req q9
// temporary registers
TMP0 .req v10
TMP1 .req v11
TMP2 .req v12
TMP3 .req v13
// multiplication table for updating XTS tweaks
GFMUL_TABLE .req v14
GFMUL_TABLE_Q .req q14
// next XTS tweak value(s)
TWEAKV_NEXT .req v15
// XTS tweaks for the blocks currently being encrypted/decrypted
TWEAKV0 .req v16
TWEAKV1 .req v17
TWEAKV2 .req v18
TWEAKV3 .req v19
TWEAKV4 .req v20
TWEAKV5 .req v21
TWEAKV6 .req v22
TWEAKV7 .req v23
.align 4
.Lror64_8_table:
.octa 0x080f0e0d0c0b0a090007060504030201
.Lror32_8_table:
.octa 0x0c0f0e0d080b0a090407060500030201
.Lrol64_8_table:
.octa 0x0e0d0c0b0a09080f0605040302010007
.Lrol32_8_table:
.octa 0x0e0d0c0f0a09080b0605040702010003
.Lgf128mul_table:
.octa 0x00000000000000870000000000000001
.Lgf64mul_table:
.octa 0x0000000000000000000000002d361b00
/*
* _speck_round_128bytes() - Speck encryption round on 128 bytes at a time
*
* Do one Speck encryption round on the 128 bytes (8 blocks for Speck128, 16 for
* Speck64) stored in X0-X3 and Y0-Y3, using the round key stored in all lanes
* of ROUND_KEY. 'n' is the lane size: 64 for Speck128, or 32 for Speck64.
* 'lanes' is the lane specifier: "2d" for Speck128 or "4s" for Speck64.
*/
.macro _speck_round_128bytes n, lanes
// x = ror(x, 8)
tbl X_0.16b, {X_0.16b}, ROTATE_TABLE.16b
tbl X_1.16b, {X_1.16b}, ROTATE_TABLE.16b
tbl X_2.16b, {X_2.16b}, ROTATE_TABLE.16b
tbl X_3.16b, {X_3.16b}, ROTATE_TABLE.16b
// x += y
add X_0.\lanes, X_0.\lanes, Y_0.\lanes
add X_1.\lanes, X_1.\lanes, Y_1.\lanes
add X_2.\lanes, X_2.\lanes, Y_2.\lanes
add X_3.\lanes, X_3.\lanes, Y_3.\lanes
// x ^= k
eor X_0.16b, X_0.16b, ROUND_KEY.16b
eor X_1.16b, X_1.16b, ROUND_KEY.16b
eor X_2.16b, X_2.16b, ROUND_KEY.16b
eor X_3.16b, X_3.16b, ROUND_KEY.16b
// y = rol(y, 3)
shl TMP0.\lanes, Y_0.\lanes, #3
shl TMP1.\lanes, Y_1.\lanes, #3
shl TMP2.\lanes, Y_2.\lanes, #3
shl TMP3.\lanes, Y_3.\lanes, #3
sri TMP0.\lanes, Y_0.\lanes, #(\n - 3)
sri TMP1.\lanes, Y_1.\lanes, #(\n - 3)
sri TMP2.\lanes, Y_2.\lanes, #(\n - 3)
sri TMP3.\lanes, Y_3.\lanes, #(\n - 3)
// y ^= x
eor Y_0.16b, TMP0.16b, X_0.16b
eor Y_1.16b, TMP1.16b, X_1.16b
eor Y_2.16b, TMP2.16b, X_2.16b
eor Y_3.16b, TMP3.16b, X_3.16b
.endm
/*
* _speck_unround_128bytes() - Speck decryption round on 128 bytes at a time
*
* This is the inverse of _speck_round_128bytes().
*/
.macro _speck_unround_128bytes n, lanes
// y ^= x
eor TMP0.16b, Y_0.16b, X_0.16b
eor TMP1.16b, Y_1.16b, X_1.16b
eor TMP2.16b, Y_2.16b, X_2.16b
eor TMP3.16b, Y_3.16b, X_3.16b
// y = ror(y, 3)
ushr Y_0.\lanes, TMP0.\lanes, #3
ushr Y_1.\lanes, TMP1.\lanes, #3
ushr Y_2.\lanes, TMP2.\lanes, #3
ushr Y_3.\lanes, TMP3.\lanes, #3
sli Y_0.\lanes, TMP0.\lanes, #(\n - 3)
sli Y_1.\lanes, TMP1.\lanes, #(\n - 3)
sli Y_2.\lanes, TMP2.\lanes, #(\n - 3)
sli Y_3.\lanes, TMP3.\lanes, #(\n - 3)
// x ^= k
eor X_0.16b, X_0.16b, ROUND_KEY.16b
eor X_1.16b, X_1.16b, ROUND_KEY.16b
eor X_2.16b, X_2.16b, ROUND_KEY.16b
eor X_3.16b, X_3.16b, ROUND_KEY.16b
// x -= y
sub X_0.\lanes, X_0.\lanes, Y_0.\lanes
sub X_1.\lanes, X_1.\lanes, Y_1.\lanes
sub X_2.\lanes, X_2.\lanes, Y_2.\lanes
sub X_3.\lanes, X_3.\lanes, Y_3.\lanes
// x = rol(x, 8)
tbl X_0.16b, {X_0.16b}, ROTATE_TABLE.16b
tbl X_1.16b, {X_1.16b}, ROTATE_TABLE.16b
tbl X_2.16b, {X_2.16b}, ROTATE_TABLE.16b
tbl X_3.16b, {X_3.16b}, ROTATE_TABLE.16b
.endm
.macro _next_xts_tweak next, cur, tmp, n
.if \n == 64
/*
* Calculate the next tweak by multiplying the current one by x,
* modulo p(x) = x^128 + x^7 + x^2 + x + 1.
*/
sshr \tmp\().2d, \cur\().2d, #63
and \tmp\().16b, \tmp\().16b, GFMUL_TABLE.16b
shl \next\().2d, \cur\().2d, #1
ext \tmp\().16b, \tmp\().16b, \tmp\().16b, #8
eor \next\().16b, \next\().16b, \tmp\().16b
.else
/*
* Calculate the next two tweaks by multiplying the current ones by x^2,
* modulo p(x) = x^64 + x^4 + x^3 + x + 1.
*/
ushr \tmp\().2d, \cur\().2d, #62
shl \next\().2d, \cur\().2d, #2
tbl \tmp\().16b, {GFMUL_TABLE.16b}, \tmp\().16b
eor \next\().16b, \next\().16b, \tmp\().16b
.endif
.endm
/*
* _speck_xts_crypt() - Speck-XTS encryption/decryption
*
* Encrypt or decrypt NBYTES bytes of data from the SRC buffer to the DST buffer
* using Speck-XTS, specifically the variant with a block size of '2n' and round
* count given by NROUNDS. The expanded round keys are given in ROUND_KEYS, and
* the current XTS tweak value is given in TWEAK. It's assumed that NBYTES is a
* nonzero multiple of 128.
*/
.macro _speck_xts_crypt n, lanes, decrypting
/*
* If decrypting, modify the ROUND_KEYS parameter to point to the last
* round key rather than the first, since for decryption the round keys
* are used in reverse order.
*/
.if \decrypting
mov NROUNDS, NROUNDS /* zero the high 32 bits */
.if \n == 64
add ROUND_KEYS, ROUND_KEYS, NROUNDS_X, lsl #3
sub ROUND_KEYS, ROUND_KEYS, #8
.else
add ROUND_KEYS, ROUND_KEYS, NROUNDS_X, lsl #2
sub ROUND_KEYS, ROUND_KEYS, #4
.endif
.endif
// Load the index vector for tbl-based 8-bit rotates
.if \decrypting
ldr ROTATE_TABLE_Q, .Lrol\n\()_8_table
.else
ldr ROTATE_TABLE_Q, .Lror\n\()_8_table
.endif
// One-time XTS preparation
.if \n == 64
// Load first tweak
ld1 {TWEAKV0.16b}, [TWEAK]
// Load GF(2^128) multiplication table
ldr GFMUL_TABLE_Q, .Lgf128mul_table
.else
// Load first tweak
ld1 {TWEAKV0.8b}, [TWEAK]
// Load GF(2^64) multiplication table
ldr GFMUL_TABLE_Q, .Lgf64mul_table
// Calculate second tweak, packing it together with the first
ushr TMP0.2d, TWEAKV0.2d, #63
shl TMP1.2d, TWEAKV0.2d, #1
tbl TMP0.8b, {GFMUL_TABLE.16b}, TMP0.8b
eor TMP0.8b, TMP0.8b, TMP1.8b
mov TWEAKV0.d[1], TMP0.d[0]
.endif
.Lnext_128bytes_\@:
// Calculate XTS tweaks for next 128 bytes
_next_xts_tweak TWEAKV1, TWEAKV0, TMP0, \n
_next_xts_tweak TWEAKV2, TWEAKV1, TMP0, \n
_next_xts_tweak TWEAKV3, TWEAKV2, TMP0, \n
_next_xts_tweak TWEAKV4, TWEAKV3, TMP0, \n
_next_xts_tweak TWEAKV5, TWEAKV4, TMP0, \n
_next_xts_tweak TWEAKV6, TWEAKV5, TMP0, \n
_next_xts_tweak TWEAKV7, TWEAKV6, TMP0, \n
_next_xts_tweak TWEAKV_NEXT, TWEAKV7, TMP0, \n
// Load the next source blocks into {X,Y}[0-3]
ld1 {X_0.16b-Y_1.16b}, [SRC], #64
ld1 {X_2.16b-Y_3.16b}, [SRC], #64
// XOR the source blocks with their XTS tweaks
eor TMP0.16b, X_0.16b, TWEAKV0.16b
eor Y_0.16b, Y_0.16b, TWEAKV1.16b
eor TMP1.16b, X_1.16b, TWEAKV2.16b
eor Y_1.16b, Y_1.16b, TWEAKV3.16b
eor TMP2.16b, X_2.16b, TWEAKV4.16b
eor Y_2.16b, Y_2.16b, TWEAKV5.16b
eor TMP3.16b, X_3.16b, TWEAKV6.16b
eor Y_3.16b, Y_3.16b, TWEAKV7.16b
/*
* De-interleave the 'x' and 'y' elements of each block, i.e. make it so
* that the X[0-3] registers contain only the second halves of blocks,
* and the Y[0-3] registers contain only the first halves of blocks.
* (Speck uses the order (y, x) rather than the more intuitive (x, y).)
*/
uzp2 X_0.\lanes, TMP0.\lanes, Y_0.\lanes
uzp1 Y_0.\lanes, TMP0.\lanes, Y_0.\lanes
uzp2 X_1.\lanes, TMP1.\lanes, Y_1.\lanes
uzp1 Y_1.\lanes, TMP1.\lanes, Y_1.\lanes
uzp2 X_2.\lanes, TMP2.\lanes, Y_2.\lanes
uzp1 Y_2.\lanes, TMP2.\lanes, Y_2.\lanes
uzp2 X_3.\lanes, TMP3.\lanes, Y_3.\lanes
uzp1 Y_3.\lanes, TMP3.\lanes, Y_3.\lanes
// Do the cipher rounds
mov x6, ROUND_KEYS
mov w7, NROUNDS
.Lnext_round_\@:
.if \decrypting
ld1r {ROUND_KEY.\lanes}, [x6]
sub x6, x6, #( \n / 8 )
_speck_unround_128bytes \n, \lanes
.else
ld1r {ROUND_KEY.\lanes}, [x6], #( \n / 8 )
_speck_round_128bytes \n, \lanes
.endif
subs w7, w7, #1
bne .Lnext_round_\@
// Re-interleave the 'x' and 'y' elements of each block
zip1 TMP0.\lanes, Y_0.\lanes, X_0.\lanes
zip2 Y_0.\lanes, Y_0.\lanes, X_0.\lanes
zip1 TMP1.\lanes, Y_1.\lanes, X_1.\lanes
zip2 Y_1.\lanes, Y_1.\lanes, X_1.\lanes
zip1 TMP2.\lanes, Y_2.\lanes, X_2.\lanes
zip2 Y_2.\lanes, Y_2.\lanes, X_2.\lanes
zip1 TMP3.\lanes, Y_3.\lanes, X_3.\lanes
zip2 Y_3.\lanes, Y_3.\lanes, X_3.\lanes
// XOR the encrypted/decrypted blocks with the tweaks calculated earlier
eor X_0.16b, TMP0.16b, TWEAKV0.16b
eor Y_0.16b, Y_0.16b, TWEAKV1.16b
eor X_1.16b, TMP1.16b, TWEAKV2.16b
eor Y_1.16b, Y_1.16b, TWEAKV3.16b
eor X_2.16b, TMP2.16b, TWEAKV4.16b
eor Y_2.16b, Y_2.16b, TWEAKV5.16b
eor X_3.16b, TMP3.16b, TWEAKV6.16b
eor Y_3.16b, Y_3.16b, TWEAKV7.16b
mov TWEAKV0.16b, TWEAKV_NEXT.16b
// Store the ciphertext in the destination buffer
st1 {X_0.16b-Y_1.16b}, [DST], #64
st1 {X_2.16b-Y_3.16b}, [DST], #64
// Continue if there are more 128-byte chunks remaining
subs NBYTES, NBYTES, #128
bne .Lnext_128bytes_\@
// Store the next tweak and return
.if \n == 64
st1 {TWEAKV_NEXT.16b}, [TWEAK]
.else
st1 {TWEAKV_NEXT.8b}, [TWEAK]
.endif
ret
.endm
ENTRY(speck128_xts_encrypt_neon)
_speck_xts_crypt n=64, lanes=2d, decrypting=0
ENDPROC(speck128_xts_encrypt_neon)
ENTRY(speck128_xts_decrypt_neon)
_speck_xts_crypt n=64, lanes=2d, decrypting=1
ENDPROC(speck128_xts_decrypt_neon)
ENTRY(speck64_xts_encrypt_neon)
_speck_xts_crypt n=32, lanes=4s, decrypting=0
ENDPROC(speck64_xts_encrypt_neon)
ENTRY(speck64_xts_decrypt_neon)
_speck_xts_crypt n=32, lanes=4s, decrypting=1
ENDPROC(speck64_xts_decrypt_neon)
// SPDX-License-Identifier: GPL-2.0
/*
* NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
* (64-bit version; based on the 32-bit version)
*
* Copyright (c) 2018 Google, Inc
*/
#include <asm/hwcap.h>
#include <asm/neon.h>
#include <asm/simd.h>
#include <crypto/algapi.h>
#include <crypto/gf128mul.h>
#include <crypto/internal/skcipher.h>
#include <crypto/speck.h>
#include <crypto/xts.h>
#include <linux/kernel.h>
#include <linux/module.h>
/* The assembly functions only handle multiples of 128 bytes */
#define SPECK_NEON_CHUNK_SIZE 128
/* Speck128 */
struct speck128_xts_tfm_ctx {
struct speck128_tfm_ctx main_key;
struct speck128_tfm_ctx tweak_key;
};
asmlinkage void speck128_xts_encrypt_neon(const u64 *round_keys, int nrounds,
void *dst, const void *src,
unsigned int nbytes, void *tweak);
asmlinkage void speck128_xts_decrypt_neon(const u64 *round_keys, int nrounds,
void *dst, const void *src,
unsigned int nbytes, void *tweak);
typedef void (*speck128_crypt_one_t)(const struct speck128_tfm_ctx *,
u8 *, const u8 *);
typedef void (*speck128_xts_crypt_many_t)(const u64 *, int, void *,
const void *, unsigned int, void *);
static __always_inline int
__speck128_xts_crypt(struct skcipher_request *req,
speck128_crypt_one_t crypt_one,
speck128_xts_crypt_many_t crypt_many)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
const struct speck128_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
struct skcipher_walk walk;
le128 tweak;
int err;
err = skcipher_walk_virt(&walk, req, true);
crypto_speck128_encrypt(&ctx->tweak_key, (u8 *)&tweak, walk.iv);
while (walk.nbytes > 0) {
unsigned int nbytes = walk.nbytes;
u8 *dst = walk.dst.virt.addr;
const u8 *src = walk.src.virt.addr;
if (nbytes >= SPECK_NEON_CHUNK_SIZE && may_use_simd()) {
unsigned int count;
count = round_down(nbytes, SPECK_NEON_CHUNK_SIZE);
kernel_neon_begin();
(*crypt_many)(ctx->main_key.round_keys,
ctx->main_key.nrounds,
dst, src, count, &tweak);
kernel_neon_end();
dst += count;
src += count;
nbytes -= count;
}
/* Handle any remainder with generic code */
while (nbytes >= sizeof(tweak)) {
le128_xor((le128 *)dst, (const le128 *)src, &tweak);
(*crypt_one)(&ctx->main_key, dst, dst);
le128_xor((le128 *)dst, (const le128 *)dst, &tweak);
gf128mul_x_ble(&tweak, &tweak);
dst += sizeof(tweak);
src += sizeof(tweak);
nbytes -= sizeof(tweak);
}
err = skcipher_walk_done(&walk, nbytes);
}
return err;
}
static int speck128_xts_encrypt(struct skcipher_request *req)
{
return __speck128_xts_crypt(req, crypto_speck128_encrypt,
speck128_xts_encrypt_neon);
}
static int speck128_xts_decrypt(struct skcipher_request *req)
{
return __speck128_xts_crypt(req, crypto_speck128_decrypt,
speck128_xts_decrypt_neon);
}
static int speck128_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
unsigned int keylen)
{
struct speck128_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
int err;
err = xts_verify_key(tfm, key, keylen);
if (err)
return err;
keylen /= 2;
err = crypto_speck128_setkey(&ctx->main_key, key, keylen);
if (err)
return err;
return crypto_speck128_setkey(&ctx->tweak_key, key + keylen, keylen);
}
/* Speck64 */
struct speck64_xts_tfm_ctx {
struct speck64_tfm_ctx main_key;
struct speck64_tfm_ctx tweak_key;
};
asmlinkage void speck64_xts_encrypt_neon(const u32 *round_keys, int nrounds,
void *dst, const void *src,
unsigned int nbytes, void *tweak);
asmlinkage void speck64_xts_decrypt_neon(const u32 *round_keys, int nrounds,
void *dst, const void *src,
unsigned int nbytes, void *tweak);
typedef void (*speck64_crypt_one_t)(const struct speck64_tfm_ctx *,
u8 *, const u8 *);
typedef void (*speck64_xts_crypt_many_t)(const u32 *, int, void *,
const void *, unsigned int, void *);
static __always_inline int
__speck64_xts_crypt(struct skcipher_request *req, speck64_crypt_one_t crypt_one,
speck64_xts_crypt_many_t crypt_many)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
const struct speck64_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
struct skcipher_walk walk;
__le64 tweak;
int err;
err = skcipher_walk_virt(&walk, req, true);
crypto_speck64_encrypt(&ctx->tweak_key, (u8 *)&tweak, walk.iv);
while (walk.nbytes > 0) {
unsigned int nbytes = walk.nbytes;
u8 *dst = walk.dst.virt.addr;
const u8 *src = walk.src.virt.addr;
if (nbytes >= SPECK_NEON_CHUNK_SIZE && may_use_simd()) {
unsigned int count;
count = round_down(nbytes, SPECK_NEON_CHUNK_SIZE);
kernel_neon_begin();
(*crypt_many)(ctx->main_key.round_keys,
ctx->main_key.nrounds,
dst, src, count, &tweak);
kernel_neon_end();
dst += count;
src += count;
nbytes -= count;
}
/* Handle any remainder with generic code */
while (nbytes >= sizeof(tweak)) {
*(__le64 *)dst = *(__le64 *)src ^ tweak;
(*crypt_one)(&ctx->main_key, dst, dst);
*(__le64 *)dst ^= tweak;
tweak = cpu_to_le64((le64_to_cpu(tweak) << 1) ^
((tweak & cpu_to_le64(1ULL << 63)) ?
0x1B : 0));
dst += sizeof(tweak);
src += sizeof(tweak);
nbytes -= sizeof(tweak);
}
err = skcipher_walk_done(&walk, nbytes);
}
return err;
}
static int speck64_xts_encrypt(struct skcipher_request *req)
{
return __speck64_xts_crypt(req, crypto_speck64_encrypt,
speck64_xts_encrypt_neon);
}
static int speck64_xts_decrypt(struct skcipher_request *req)
{
return __speck64_xts_crypt(req, crypto_speck64_decrypt,
speck64_xts_decrypt_neon);
}
static int speck64_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
unsigned int keylen)
{
struct speck64_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
int err;
err = xts_verify_key(tfm, key, keylen);
if (err)
return err;
keylen /= 2;
err = crypto_speck64_setkey(&ctx->main_key, key, keylen);
if (err)
return err;
return crypto_speck64_setkey(&ctx->tweak_key, key + keylen, keylen);
}
static struct skcipher_alg speck_algs[] = {
{
.base.cra_name = "xts(speck128)",
.base.cra_driver_name = "xts-speck128-neon",
.base.cra_priority = 300,
.base.cra_blocksize = SPECK128_BLOCK_SIZE,
.base.cra_ctxsize = sizeof(struct speck128_xts_tfm_ctx),
.base.cra_alignmask = 7,
.base.cra_module = THIS_MODULE,
.min_keysize = 2 * SPECK128_128_KEY_SIZE,
.max_keysize = 2 * SPECK128_256_KEY_SIZE,
.ivsize = SPECK128_BLOCK_SIZE,
.walksize = SPECK_NEON_CHUNK_SIZE,
.setkey = speck128_xts_setkey,
.encrypt = speck128_xts_encrypt,
.decrypt = speck128_xts_decrypt,
}, {
.base.cra_name = "xts(speck64)",
.base.cra_driver_name = "xts-speck64-neon",
.base.cra_priority = 300,
.base.cra_blocksize = SPECK64_BLOCK_SIZE,
.base.cra_ctxsize = sizeof(struct speck64_xts_tfm_ctx),
.base.cra_alignmask = 7,
.base.cra_module = THIS_MODULE,
.min_keysize = 2 * SPECK64_96_KEY_SIZE,
.max_keysize = 2 * SPECK64_128_KEY_SIZE,
.ivsize = SPECK64_BLOCK_SIZE,
.walksize = SPECK_NEON_CHUNK_SIZE,
.setkey = speck64_xts_setkey,
.encrypt = speck64_xts_encrypt,
.decrypt = speck64_xts_decrypt,
}
};
static int __init speck_neon_module_init(void)
{
if (!(elf_hwcap & HWCAP_ASIMD))
return -ENODEV;
return crypto_register_skciphers(speck_algs, ARRAY_SIZE(speck_algs));
}
static void __exit speck_neon_module_exit(void)
{
crypto_unregister_skciphers(speck_algs, ARRAY_SIZE(speck_algs));
}
module_init(speck_neon_module_init);
module_exit(speck_neon_module_exit);
MODULE_DESCRIPTION("Speck block cipher (NEON-accelerated)");
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>");
MODULE_ALIAS_CRYPTO("xts(speck128)");
MODULE_ALIAS_CRYPTO("xts-speck128-neon");
MODULE_ALIAS_CRYPTO("xts(speck64)");
MODULE_ALIAS_CRYPTO("xts-speck64-neon");
......@@ -621,7 +621,6 @@ CONFIG_CRYPTO_ECDH=m
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_USER=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_MCRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
CONFIG_CRYPTO_AEGIS128=m
......@@ -657,7 +656,6 @@ CONFIG_CRYPTO_SALSA20=m
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_SM4=m
CONFIG_CRYPTO_SPECK=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_LZO=m
......
......@@ -578,7 +578,6 @@ CONFIG_CRYPTO_ECDH=m
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_USER=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_MCRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
CONFIG_CRYPTO_AEGIS128=m
......@@ -614,7 +613,6 @@ CONFIG_CRYPTO_SALSA20=m
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_SM4=m
CONFIG_CRYPTO_SPECK=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_LZO=m
......
......@@ -599,7 +599,6 @@ CONFIG_CRYPTO_ECDH=m
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_USER=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_MCRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
CONFIG_CRYPTO_AEGIS128=m
......@@ -635,7 +634,6 @@ CONFIG_CRYPTO_SALSA20=m
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_SM4=m
CONFIG_CRYPTO_SPECK=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_LZO=m
......
......@@ -570,7 +570,6 @@ CONFIG_CRYPTO_ECDH=m
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_USER=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_MCRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
CONFIG_CRYPTO_AEGIS128=m
......@@ -606,7 +605,6 @@ CONFIG_CRYPTO_SALSA20=m
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_SM4=m
CONFIG_CRYPTO_SPECK=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_LZO=m
......
......@@ -580,7 +580,6 @@ CONFIG_CRYPTO_ECDH=m
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_USER=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_MCRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
CONFIG_CRYPTO_AEGIS128=m
......@@ -616,7 +615,6 @@ CONFIG_CRYPTO_SALSA20=m
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_SM4=m
CONFIG_CRYPTO_SPECK=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_LZO=m
......
......@@ -602,7 +602,6 @@ CONFIG_CRYPTO_ECDH=m
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_USER=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_MCRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
CONFIG_CRYPTO_AEGIS128=m
......@@ -638,7 +637,6 @@ CONFIG_CRYPTO_SALSA20=m
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_SM4=m
CONFIG_CRYPTO_SPECK=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_LZO=m
......
......@@ -684,7 +684,6 @@ CONFIG_CRYPTO_ECDH=m
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_USER=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_MCRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
CONFIG_CRYPTO_AEGIS128=m
......@@ -720,7 +719,6 @@ CONFIG_CRYPTO_SALSA20=m
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_SM4=m
CONFIG_CRYPTO_SPECK=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_LZO=m
......
......@@ -570,7 +570,6 @@ CONFIG_CRYPTO_ECDH=m
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_USER=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_MCRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
CONFIG_CRYPTO_AEGIS128=m
......@@ -606,7 +605,6 @@ CONFIG_CRYPTO_SALSA20=m
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_SM4=m
CONFIG_CRYPTO_SPECK=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_LZO=m
......
......@@ -570,7 +570,6 @@ CONFIG_CRYPTO_ECDH=m
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_USER=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_MCRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
CONFIG_CRYPTO_AEGIS128=m
......@@ -606,7 +605,6 @@ CONFIG_CRYPTO_SALSA20=m
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_SM4=m
CONFIG_CRYPTO_SPECK=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_LZO=m
......
......@@ -593,7 +593,6 @@ CONFIG_CRYPTO_ECDH=m
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_USER=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_MCRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
CONFIG_CRYPTO_AEGIS128=m
......@@ -629,7 +628,6 @@ CONFIG_CRYPTO_SALSA20=m
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_SM4=m
CONFIG_CRYPTO_SPECK=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_LZO=m
......
......@@ -571,7 +571,6 @@ CONFIG_CRYPTO_ECDH=m
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_USER=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_MCRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
CONFIG_CRYPTO_AEGIS128=m
......@@ -607,7 +606,6 @@ CONFIG_CRYPTO_SALSA20=m
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_SM4=m
CONFIG_CRYPTO_SPECK=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_LZO=m
......
......@@ -572,7 +572,6 @@ CONFIG_CRYPTO_ECDH=m
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_USER=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_MCRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
CONFIG_CRYPTO_AEGIS128=m
......@@ -608,7 +607,6 @@ CONFIG_CRYPTO_SALSA20=m
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_SM4=m
CONFIG_CRYPTO_SPECK=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_LZO=m
......
......@@ -668,7 +668,6 @@ CONFIG_CRYPTO_USER=m
# CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
CONFIG_CRYPTO_PCRYPT=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_MCRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
CONFIG_CRYPTO_LRW=m
......
......@@ -610,7 +610,6 @@ CONFIG_CRYPTO_USER=m
# CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
CONFIG_CRYPTO_PCRYPT=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_MCRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
CONFIG_CRYPTO_LRW=m
......
......@@ -44,7 +44,7 @@ struct s390_aes_ctx {
int key_len;
unsigned long fc;
union {
struct crypto_skcipher *blk;
struct crypto_sync_skcipher *blk;
struct crypto_cipher *cip;
} fallback;
};
......@@ -54,7 +54,7 @@ struct s390_xts_ctx {
u8 pcc_key[32];
int key_len;
unsigned long fc;
struct crypto_skcipher *fallback;
struct crypto_sync_skcipher *fallback;
};
struct gcm_sg_walk {
......@@ -184,14 +184,15 @@ static int setkey_fallback_blk(struct crypto_tfm *tfm, const u8 *key,
struct s390_aes_ctx *sctx = crypto_tfm_ctx(tfm);
unsigned int ret;
crypto_skcipher_clear_flags(sctx->fallback.blk, CRYPTO_TFM_REQ_MASK);
crypto_skcipher_set_flags(sctx->fallback.blk, tfm->crt_flags &
crypto_sync_skcipher_clear_flags(sctx->fallback.blk,
CRYPTO_TFM_REQ_MASK);
crypto_sync_skcipher_set_flags(sctx->fallback.blk, tfm->crt_flags &
CRYPTO_TFM_REQ_MASK);
ret = crypto_skcipher_setkey(sctx->fallback.blk, key, len);
ret = crypto_sync_skcipher_setkey(sctx->fallback.blk, key, len);
tfm->crt_flags &= ~CRYPTO_TFM_RES_MASK;
tfm->crt_flags |= crypto_skcipher_get_flags(sctx->fallback.blk) &
tfm->crt_flags |= crypto_sync_skcipher_get_flags(sctx->fallback.blk) &
CRYPTO_TFM_RES_MASK;
return ret;
......@@ -204,9 +205,9 @@ static int fallback_blk_dec(struct blkcipher_desc *desc,
unsigned int ret;
struct crypto_blkcipher *tfm = desc->tfm;
struct s390_aes_ctx *sctx = crypto_blkcipher_ctx(tfm);
SKCIPHER_REQUEST_ON_STACK(req, sctx->fallback.blk);
SYNC_SKCIPHER_REQUEST_ON_STACK(req, sctx->fallback.blk);
skcipher_request_set_tfm(req, sctx->fallback.blk);
skcipher_request_set_sync_tfm(req, sctx->fallback.blk);
skcipher_request_set_callback(req, desc->flags, NULL, NULL);
skcipher_request_set_crypt(req, src, dst, nbytes, desc->info);
......@@ -223,9 +224,9 @@ static int fallback_blk_enc(struct blkcipher_desc *desc,
unsigned int ret;
struct crypto_blkcipher *tfm = desc->tfm;
struct s390_aes_ctx *sctx = crypto_blkcipher_ctx(tfm);
SKCIPHER_REQUEST_ON_STACK(req, sctx->fallback.blk);
SYNC_SKCIPHER_REQUEST_ON_STACK(req, sctx->fallback.blk);
skcipher_request_set_tfm(req, sctx->fallback.blk);
skcipher_request_set_sync_tfm(req, sctx->fallback.blk);
skcipher_request_set_callback(req, desc->flags, NULL, NULL);
skcipher_request_set_crypt(req, src, dst, nbytes, desc->info);
......@@ -306,8 +307,7 @@ static int fallback_init_blk(struct crypto_tfm *tfm)
const char *name = tfm->__crt_alg->cra_name;
struct s390_aes_ctx *sctx = crypto_tfm_ctx(tfm);
sctx->fallback.blk = crypto_alloc_skcipher(name, 0,
CRYPTO_ALG_ASYNC |
sctx->fallback.blk = crypto_alloc_sync_skcipher(name, 0,
CRYPTO_ALG_NEED_FALLBACK);
if (IS_ERR(sctx->fallback.blk)) {
......@@ -323,7 +323,7 @@ static void fallback_exit_blk(struct crypto_tfm *tfm)
{
struct s390_aes_ctx *sctx = crypto_tfm_ctx(tfm);
crypto_free_skcipher(sctx->fallback.blk);
crypto_free_sync_skcipher(sctx->fallback.blk);
}
static struct crypto_alg ecb_aes_alg = {
......@@ -453,14 +453,15 @@ static int xts_fallback_setkey(struct crypto_tfm *tfm, const u8 *key,
struct s390_xts_ctx *xts_ctx = crypto_tfm_ctx(tfm);
unsigned int ret;
crypto_skcipher_clear_flags(xts_ctx->fallback, CRYPTO_TFM_REQ_MASK);
crypto_skcipher_set_flags(xts_ctx->fallback, tfm->crt_flags &
crypto_sync_skcipher_clear_flags(xts_ctx->fallback,
CRYPTO_TFM_REQ_MASK);
crypto_sync_skcipher_set_flags(xts_ctx->fallback, tfm->crt_flags &
CRYPTO_TFM_REQ_MASK);
ret = crypto_skcipher_setkey(xts_ctx->fallback, key, len);
ret = crypto_sync_skcipher_setkey(xts_ctx->fallback, key, len);
tfm->crt_flags &= ~CRYPTO_TFM_RES_MASK;
tfm->crt_flags |= crypto_skcipher_get_flags(xts_ctx->fallback) &
tfm->crt_flags |= crypto_sync_skcipher_get_flags(xts_ctx->fallback) &
CRYPTO_TFM_RES_MASK;
return ret;
......@@ -472,10 +473,10 @@ static int xts_fallback_decrypt(struct blkcipher_desc *desc,
{
struct crypto_blkcipher *tfm = desc->tfm;
struct s390_xts_ctx *xts_ctx = crypto_blkcipher_ctx(tfm);
SKCIPHER_REQUEST_ON_STACK(req, xts_ctx->fallback);
SYNC_SKCIPHER_REQUEST_ON_STACK(req, xts_ctx->fallback);
unsigned int ret;
skcipher_request_set_tfm(req, xts_ctx->fallback);
skcipher_request_set_sync_tfm(req, xts_ctx->fallback);
skcipher_request_set_callback(req, desc->flags, NULL, NULL);
skcipher_request_set_crypt(req, src, dst, nbytes, desc->info);
......@@ -491,10 +492,10 @@ static int xts_fallback_encrypt(struct blkcipher_desc *desc,
{
struct crypto_blkcipher *tfm = desc->tfm;
struct s390_xts_ctx *xts_ctx = crypto_blkcipher_ctx(tfm);
SKCIPHER_REQUEST_ON_STACK(req, xts_ctx->fallback);
SYNC_SKCIPHER_REQUEST_ON_STACK(req, xts_ctx->fallback);
unsigned int ret;
skcipher_request_set_tfm(req, xts_ctx->fallback);
skcipher_request_set_sync_tfm(req, xts_ctx->fallback);
skcipher_request_set_callback(req, desc->flags, NULL, NULL);
skcipher_request_set_crypt(req, src, dst, nbytes, desc->info);
......@@ -611,8 +612,7 @@ static int xts_fallback_init(struct crypto_tfm *tfm)
const char *name = tfm->__crt_alg->cra_name;
struct s390_xts_ctx *xts_ctx = crypto_tfm_ctx(tfm);
xts_ctx->fallback = crypto_alloc_skcipher(name, 0,
CRYPTO_ALG_ASYNC |
xts_ctx->fallback = crypto_alloc_sync_skcipher(name, 0,
CRYPTO_ALG_NEED_FALLBACK);
if (IS_ERR(xts_ctx->fallback)) {
......@@ -627,7 +627,7 @@ static void xts_fallback_exit(struct crypto_tfm *tfm)
{
struct s390_xts_ctx *xts_ctx = crypto_tfm_ctx(tfm);
crypto_free_skcipher(xts_ctx->fallback);
crypto_free_sync_skcipher(xts_ctx->fallback);
}
static struct crypto_alg xts_aes_alg = {
......
......@@ -221,7 +221,6 @@ CONFIG_CRYPTO_SALSA20=m
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_SM4=m
CONFIG_CRYPTO_SPECK=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_DEFLATE=m
......
......@@ -60,9 +60,6 @@ endif
ifeq ($(avx2_supported),yes)
obj-$(CONFIG_CRYPTO_CAMELLIA_AESNI_AVX2_X86_64) += camellia-aesni-avx2.o
obj-$(CONFIG_CRYPTO_SERPENT_AVX2_X86_64) += serpent-avx2.o
obj-$(CONFIG_CRYPTO_SHA1_MB) += sha1-mb/
obj-$(CONFIG_CRYPTO_SHA256_MB) += sha256-mb/
obj-$(CONFIG_CRYPTO_SHA512_MB) += sha512-mb/
obj-$(CONFIG_CRYPTO_MORUS1280_AVX2) += morus1280-avx2.o
endif
......@@ -106,7 +103,7 @@ ifeq ($(avx2_supported),yes)
morus1280-avx2-y := morus1280-avx2-asm.o morus1280-avx2-glue.o
endif
aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o
aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o
aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o aes_ctrby8_avx-x86_64.o
ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
sha1-ssse3-y := sha1_ssse3_asm.o sha1_ssse3_glue.o
......
......@@ -102,9 +102,6 @@ asmlinkage void aesni_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out,
asmlinkage void aesni_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
const u8 *in, unsigned int len, u8 *iv);
int crypto_fpu_init(void);
void crypto_fpu_exit(void);
#define AVX_GEN2_OPTSIZE 640
#define AVX_GEN4_OPTSIZE 4096
......@@ -817,7 +814,7 @@ static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
/* Linearize assoc, if not already linear */
if (req->src->length >= assoclen && req->src->length &&
(!PageHighMem(sg_page(req->src)) ||
req->src->offset + req->src->length < PAGE_SIZE)) {
req->src->offset + req->src->length <= PAGE_SIZE)) {
scatterwalk_start(&assoc_sg_walk, req->src);
assoc = scatterwalk_map(&assoc_sg_walk);
} else {
......@@ -1253,22 +1250,6 @@ static struct skcipher_alg aesni_skciphers[] = {
static
struct simd_skcipher_alg *aesni_simd_skciphers[ARRAY_SIZE(aesni_skciphers)];
static struct {
const char *algname;
const char *drvname;
const char *basename;
struct simd_skcipher_alg *simd;
} aesni_simd_skciphers2[] = {
#if (defined(MODULE) && IS_ENABLED(CONFIG_CRYPTO_PCBC)) || \
IS_BUILTIN(CONFIG_CRYPTO_PCBC)
{
.algname = "pcbc(aes)",
.drvname = "pcbc-aes-aesni",
.basename = "fpu(pcbc(__aes-aesni))",
},
#endif
};
#ifdef CONFIG_X86_64
static int generic_gcmaes_set_key(struct crypto_aead *aead, const u8 *key,
unsigned int key_len)
......@@ -1422,10 +1403,6 @@ static void aesni_free_simds(void)
for (i = 0; i < ARRAY_SIZE(aesni_simd_skciphers) &&
aesni_simd_skciphers[i]; i++)
simd_skcipher_free(aesni_simd_skciphers[i]);
for (i = 0; i < ARRAY_SIZE(aesni_simd_skciphers2); i++)
if (aesni_simd_skciphers2[i].simd)
simd_skcipher_free(aesni_simd_skciphers2[i].simd);
}
static int __init aesni_init(void)
......@@ -1469,13 +1446,9 @@ static int __init aesni_init(void)
#endif
#endif
err = crypto_fpu_init();
if (err)
return err;
err = crypto_register_algs(aesni_algs, ARRAY_SIZE(aesni_algs));
if (err)
goto fpu_exit;
return err;
err = crypto_register_skciphers(aesni_skciphers,
ARRAY_SIZE(aesni_skciphers));
......@@ -1499,18 +1472,6 @@ static int __init aesni_init(void)
aesni_simd_skciphers[i] = simd;
}
for (i = 0; i < ARRAY_SIZE(aesni_simd_skciphers2); i++) {
algname = aesni_simd_skciphers2[i].algname;
drvname = aesni_simd_skciphers2[i].drvname;
basename = aesni_simd_skciphers2[i].basename;
simd = simd_skcipher_create_compat(algname, drvname, basename);
err = PTR_ERR(simd);
if (IS_ERR(simd))
continue;
aesni_simd_skciphers2[i].simd = simd;
}
return 0;
unregister_simds:
......@@ -1521,8 +1482,6 @@ static int __init aesni_init(void)
ARRAY_SIZE(aesni_skciphers));
unregister_algs:
crypto_unregister_algs(aesni_algs, ARRAY_SIZE(aesni_algs));
fpu_exit:
crypto_fpu_exit();
return err;
}
......@@ -1533,8 +1492,6 @@ static void __exit aesni_exit(void)
crypto_unregister_skciphers(aesni_skciphers,
ARRAY_SIZE(aesni_skciphers));
crypto_unregister_algs(aesni_algs, ARRAY_SIZE(aesni_algs));
crypto_fpu_exit();
}
late_initcall(aesni_init);
......
/*
* FPU: Wrapper for blkcipher touching fpu
*
* Copyright (c) Intel Corp.
* Author: Huang Ying <ying.huang@intel.com>
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License as published by the Free
* Software Foundation; either version 2 of the License, or (at your option)
* any later version.
*
*/
#include <crypto/internal/skcipher.h>
#include <linux/err.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <asm/fpu/api.h>
struct crypto_fpu_ctx {
struct crypto_skcipher *child;
};
static int crypto_fpu_setkey(struct crypto_skcipher *parent, const u8 *key,
unsigned int keylen)
{
struct crypto_fpu_ctx *ctx = crypto_skcipher_ctx(parent);
struct crypto_skcipher *child = ctx->child;
int err;
crypto_skcipher_clear_flags(child, CRYPTO_TFM_REQ_MASK);
crypto_skcipher_set_flags(child, crypto_skcipher_get_flags(parent) &
CRYPTO_TFM_REQ_MASK);
err = crypto_skcipher_setkey(child, key, keylen);
crypto_skcipher_set_flags(parent, crypto_skcipher_get_flags(child) &
CRYPTO_TFM_RES_MASK);
return err;
}
static int crypto_fpu_encrypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct crypto_fpu_ctx *ctx = crypto_skcipher_ctx(tfm);
struct crypto_skcipher *child = ctx->child;
SKCIPHER_REQUEST_ON_STACK(subreq, child);
int err;
skcipher_request_set_tfm(subreq, child);
skcipher_request_set_callback(subreq, 0, NULL, NULL);
skcipher_request_set_crypt(subreq, req->src, req->dst, req->cryptlen,
req->iv);
kernel_fpu_begin();
err = crypto_skcipher_encrypt(subreq);
kernel_fpu_end();
skcipher_request_zero(subreq);
return err;
}
static int crypto_fpu_decrypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct crypto_fpu_ctx *ctx = crypto_skcipher_ctx(tfm);
struct crypto_skcipher *child = ctx->child;
SKCIPHER_REQUEST_ON_STACK(subreq, child);
int err;
skcipher_request_set_tfm(subreq, child);
skcipher_request_set_callback(subreq, 0, NULL, NULL);
skcipher_request_set_crypt(subreq, req->src, req->dst, req->cryptlen,
req->iv);
kernel_fpu_begin();
err = crypto_skcipher_decrypt(subreq);
kernel_fpu_end();
skcipher_request_zero(subreq);
return err;
}
static int crypto_fpu_init_tfm(struct crypto_skcipher *tfm)
{
struct skcipher_instance *inst = skcipher_alg_instance(tfm);
struct crypto_fpu_ctx *ctx = crypto_skcipher_ctx(tfm);
struct crypto_skcipher_spawn *spawn;
struct crypto_skcipher *cipher;
spawn = skcipher_instance_ctx(inst);
cipher = crypto_spawn_skcipher(spawn);
if (IS_ERR(cipher))
return PTR_ERR(cipher);
ctx->child = cipher;
return 0;
}
static void crypto_fpu_exit_tfm(struct crypto_skcipher *tfm)
{
struct crypto_fpu_ctx *ctx = crypto_skcipher_ctx(tfm);
crypto_free_skcipher(ctx->child);
}
static void crypto_fpu_free(struct skcipher_instance *inst)
{
crypto_drop_skcipher(skcipher_instance_ctx(inst));
kfree(inst);
}
static int crypto_fpu_create(struct crypto_template *tmpl, struct rtattr **tb)
{
struct crypto_skcipher_spawn *spawn;
struct skcipher_instance *inst;
struct crypto_attr_type *algt;
struct skcipher_alg *alg;
const char *cipher_name;
int err;
algt = crypto_get_attr_type(tb);
if (IS_ERR(algt))
return PTR_ERR(algt);
if ((algt->type ^ (CRYPTO_ALG_INTERNAL | CRYPTO_ALG_TYPE_SKCIPHER)) &
algt->mask)
return -EINVAL;
if (!(algt->mask & CRYPTO_ALG_INTERNAL))
return -EINVAL;
cipher_name = crypto_attr_alg_name(tb[1]);
if (IS_ERR(cipher_name))
return PTR_ERR(cipher_name);
inst = kzalloc(sizeof(*inst) + sizeof(*spawn), GFP_KERNEL);
if (!inst)
return -ENOMEM;
spawn = skcipher_instance_ctx(inst);
crypto_set_skcipher_spawn(spawn, skcipher_crypto_instance(inst));
err = crypto_grab_skcipher(spawn, cipher_name, CRYPTO_ALG_INTERNAL,
CRYPTO_ALG_INTERNAL | CRYPTO_ALG_ASYNC);
if (err)
goto out_free_inst;
alg = crypto_skcipher_spawn_alg(spawn);
err = crypto_inst_setname(skcipher_crypto_instance(inst), "fpu",
&alg->base);
if (err)
goto out_drop_skcipher;
inst->alg.base.cra_flags = CRYPTO_ALG_INTERNAL;
inst->alg.base.cra_priority = alg->base.cra_priority;
inst->alg.base.cra_blocksize = alg->base.cra_blocksize;
inst->alg.base.cra_alignmask = alg->base.cra_alignmask;
inst->alg.ivsize = crypto_skcipher_alg_ivsize(alg);
inst->alg.min_keysize = crypto_skcipher_alg_min_keysize(alg);
inst->alg.max_keysize = crypto_skcipher_alg_max_keysize(alg);
inst->alg.base.cra_ctxsize = sizeof(struct crypto_fpu_ctx);
inst->alg.init = crypto_fpu_init_tfm;
inst->alg.exit = crypto_fpu_exit_tfm;
inst->alg.setkey = crypto_fpu_setkey;
inst->alg.encrypt = crypto_fpu_encrypt;
inst->alg.decrypt = crypto_fpu_decrypt;
inst->free = crypto_fpu_free;
err = skcipher_register_instance(tmpl, inst);
if (err)
goto out_drop_skcipher;
out:
return err;
out_drop_skcipher:
crypto_drop_skcipher(spawn);
out_free_inst:
kfree(inst);
goto out;
}
static struct crypto_template crypto_fpu_tmpl = {
.name = "fpu",
.create = crypto_fpu_create,
.module = THIS_MODULE,
};
int __init crypto_fpu_init(void)
{
return crypto_register_template(&crypto_fpu_tmpl);
}
void crypto_fpu_exit(void)
{
crypto_unregister_template(&crypto_fpu_tmpl);
}
MODULE_ALIAS_CRYPTO("fpu");
# SPDX-License-Identifier: GPL-2.0
#
# Arch-specific CryptoAPI modules.
#
OBJECT_FILES_NON_STANDARD := y
avx2_supported := $(call as-instr,vpgatherdd %ymm0$(comma)(%eax$(comma)%ymm1\
$(comma)4)$(comma)%ymm2,yes,no)
ifeq ($(avx2_supported),yes)
obj-$(CONFIG_CRYPTO_SHA1_MB) += sha1-mb.o
sha1-mb-y := sha1_mb.o sha1_mb_mgr_flush_avx2.o \
sha1_mb_mgr_init_avx2.o sha1_mb_mgr_submit_avx2.o sha1_x8_avx2.o
endif
此差异已折叠。
/*
* Header file for multi buffer SHA context
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
*
* GPL LICENSE SUMMARY
*
* Copyright(c) 2014 Intel Corporation.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* Contact Information:
* Tim Chen <tim.c.chen@linux.intel.com>
*
* BSD LICENSE
*
* Copyright(c) 2014 Intel Corporation.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef _SHA_MB_CTX_INTERNAL_H
#define _SHA_MB_CTX_INTERNAL_H
#include "sha1_mb_mgr.h"
#define HASH_UPDATE 0x00
#define HASH_LAST 0x01
#define HASH_DONE 0x02
#define HASH_FINAL 0x04
#define HASH_CTX_STS_IDLE 0x00
#define HASH_CTX_STS_PROCESSING 0x01
#define HASH_CTX_STS_LAST 0x02
#define HASH_CTX_STS_COMPLETE 0x04
enum hash_ctx_error {
HASH_CTX_ERROR_NONE = 0,
HASH_CTX_ERROR_INVALID_FLAGS = -1,
HASH_CTX_ERROR_ALREADY_PROCESSING = -2,
HASH_CTX_ERROR_ALREADY_COMPLETED = -3,
#ifdef HASH_CTX_DEBUG
HASH_CTX_ERROR_DEBUG_DIGEST_MISMATCH = -4,
#endif
};
#define hash_ctx_user_data(ctx) ((ctx)->user_data)
#define hash_ctx_digest(ctx) ((ctx)->job.result_digest)
#define hash_ctx_processing(ctx) ((ctx)->status & HASH_CTX_STS_PROCESSING)
#define hash_ctx_complete(ctx) ((ctx)->status == HASH_CTX_STS_COMPLETE)
#define hash_ctx_status(ctx) ((ctx)->status)
#define hash_ctx_error(ctx) ((ctx)->error)
#define hash_ctx_init(ctx) \
do { \
(ctx)->error = HASH_CTX_ERROR_NONE; \
(ctx)->status = HASH_CTX_STS_COMPLETE; \
} while (0)
/* Hash Constants and Typedefs */
#define SHA1_DIGEST_LENGTH 5
#define SHA1_LOG2_BLOCK_SIZE 6
#define SHA1_PADLENGTHFIELD_SIZE 8
#ifdef SHA_MB_DEBUG
#define assert(expr) \
do { \
if (unlikely(!(expr))) { \
printk(KERN_ERR "Assertion failed! %s,%s,%s,line=%d\n", \
#expr, __FILE__, __func__, __LINE__); \
} \
} while (0)
#else
#define assert(expr) do {} while (0)
#endif
struct sha1_ctx_mgr {
struct sha1_mb_mgr mgr;
};
/* typedef struct sha1_ctx_mgr sha1_ctx_mgr; */
struct sha1_hash_ctx {
/* Must be at struct offset 0 */
struct job_sha1 job;
/* status flag */
int status;
/* error flag */
int error;
uint64_t total_length;
const void *incoming_buffer;
uint32_t incoming_buffer_length;
uint8_t partial_block_buffer[SHA1_BLOCK_SIZE * 2];
uint32_t partial_block_buffer_length;
void *user_data;
};
#endif
/*
* Header file for multi buffer SHA1 algorithm manager
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
*
* GPL LICENSE SUMMARY
*
* Copyright(c) 2014 Intel Corporation.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* Contact Information:
* James Guilford <james.guilford@intel.com>
* Tim Chen <tim.c.chen@linux.intel.com>
*
* BSD LICENSE
*
* Copyright(c) 2014 Intel Corporation.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef __SHA_MB_MGR_H
#define __SHA_MB_MGR_H
#include <linux/types.h>
#define NUM_SHA1_DIGEST_WORDS 5
enum job_sts { STS_UNKNOWN = 0,
STS_BEING_PROCESSED = 1,
STS_COMPLETED = 2,
STS_INTERNAL_ERROR = 3,
STS_ERROR = 4
};
struct job_sha1 {
u8 *buffer;
u32 len;
u32 result_digest[NUM_SHA1_DIGEST_WORDS] __aligned(32);
enum job_sts status;
void *user_data;
};
/* SHA1 out-of-order scheduler */
/* typedef uint32_t sha1_digest_array[5][8]; */
struct sha1_args_x8 {
uint32_t digest[5][8];
uint8_t *data_ptr[8];
};
struct sha1_lane_data {
struct job_sha1 *job_in_lane;
};
struct sha1_mb_mgr {
struct sha1_args_x8 args;
uint32_t lens[8];
/* each byte is index (0...7) of unused lanes */
uint64_t unused_lanes;
/* byte 4 is set to FF as a flag */
struct sha1_lane_data ldata[8];
};
#define SHA1_MB_MGR_NUM_LANES_AVX2 8
void sha1_mb_mgr_init_avx2(struct sha1_mb_mgr *state);
struct job_sha1 *sha1_mb_mgr_submit_avx2(struct sha1_mb_mgr *state,
struct job_sha1 *job);
struct job_sha1 *sha1_mb_mgr_flush_avx2(struct sha1_mb_mgr *state);
struct job_sha1 *sha1_mb_mgr_get_comp_job_avx2(struct sha1_mb_mgr *state);
#endif
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册
反馈
建议
客服 返回
顶部