Commit 3e1a29b3, Author: Linus Torvalds

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto updates from Herbert Xu:
 "API:

   - Decryption test vectors are now automatically generated from
     encryption test vectors.

  Algorithms:

   - Fix unaligned access issues in crc32/crc32c.

   - Add zstd compression algorithm.

   - Add the AEGIS family of AEAD algorithms (aegis128, aegis128l,
     aegis256).

   - Add the MORUS family of AEAD algorithms (morus640, morus1280); a
     minimal usage sketch for the new AEAD ciphers follows the shortlog
     below.

  Drivers:

   - Add accelerated AEGIS/MORUS on x86.

   - Add accelerated SM4 on arm64.

   - Remove the x86 salsa20 assembly implementations, as they are slower
     than the generic C code.

   - Add authenc(hmac(sha*), cbc(aes)) support in inside-secure.

   - Add ctr(aes) support in crypto4xx.

   - Add hardware key support in ccree.

   - Add support for new Centaur CPU in via-rng"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (112 commits)
  crypto: chtls - free beyond end rspq_skb_cache
  crypto: chtls - kbuild warnings
  crypto: chtls - dereference null variable
  crypto: chtls - wait for memory sendmsg, sendpage
  crypto: chtls - key len correction
  crypto: salsa20 - Revert "crypto: salsa20 - export generic helpers"
  crypto: x86/salsa20 - remove x86 salsa20 implementations
  crypto: ccp - Add GET_ID SEV command
  crypto: ccp - Add DOWNLOAD_FIRMWARE SEV command
  crypto: qat - Add MODULE_FIRMWARE for all qat drivers
  crypto: ccree - silence debug prints
  crypto: ccree - better clock handling
  crypto: ccree - correct host regs offset
  crypto: chelsio - Remove separate buffer used for DMA map B0 block in CCM
  crypt: chelsio - Send IV as Immediate for cipher algo
  crypto: chelsio - Return -ENOSPC for transient busy indication.
  crypto: caam/qi - fix warning in init_cgr()
  crypto: caam - fix rfc4543 descriptors
  crypto: caam - fix MC firmware detection
  crypto: clarify licensing of OpenSSL asm code
  ...
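
The AEGIS and MORUS entries above land as ordinary AEAD algorithms, registered under the generic names aegis128/aegis128l/aegis256 and morus640/morus1280, so in-kernel users drive them through the usual AEAD API. Below is a minimal, hypothetical sketch of a single in-place encryption; the helper name, the 16-byte key/IV/tag sizes and the trimmed error handling are illustrative assumptions, not code from this merge.

    #include <crypto/aead.h>
    #include <linux/err.h>
    #include <linux/scatterlist.h>

    /* buf layout: [ assoc data | plaintext | room for 16-byte tag ] */
    static int aegis128_encrypt_demo(u8 *buf, unsigned int assoclen,
                                     unsigned int cryptlen,
                                     const u8 key[16], u8 iv[16])
    {
            struct crypto_aead *tfm;
            struct aead_request *req;
            struct scatterlist sg;
            int err;

            tfm = crypto_alloc_aead("aegis128", 0, 0); /* or "morus640", ... */
            if (IS_ERR(tfm))
                    return PTR_ERR(tfm);

            err = crypto_aead_setkey(tfm, key, 16);
            if (!err)
                    err = crypto_aead_setauthsize(tfm, 16);

            req = aead_request_alloc(tfm, GFP_KERNEL);
            if (!req) {
                    crypto_free_aead(tfm);
                    return -ENOMEM;
            }

            sg_init_one(&sg, buf, assoclen + cryptlen + 16);
            aead_request_set_callback(req, 0, NULL, NULL);
            aead_request_set_ad(req, assoclen);
            aead_request_set_crypt(req, &sg, &sg, cryptlen, iv);

            if (!err)
                    err = crypto_aead_encrypt(req);

            aead_request_free(req);
            crypto_free_aead(tfm);
            return err;
    }

Note that crypto_aead_encrypt() can return -EINPROGRESS or -EBUSY for asynchronous implementations (for example the SIMD wrappers used by the x86 drivers); a real caller supplies a completion callback and waits before touching the buffer.
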
#define __ARM_ARCH__ __LINUX_ARM_ARCH__
@ SPDX-License-Identifier: GPL-2.0
@ This code is taken from the OpenSSL project but the author (Andy Polyakov)
@ has relicensed it under the GPLv2. Therefore this program is free software;
@ you can redistribute it and/or modify it under the terms of the GNU General
@ Public License version 2 as published by the Free Software Foundation.
@
@ The original headers, including the original license headers, are
@ included below for completeness.
@ ====================================================================
@ Written by Andy Polyakov <appro@fy.chalmers.se> for the OpenSSL
@ project. The module is, however, dual licensed under OpenSSL and
......
#!/usr/bin/env perl
# SPDX-License-Identifier: GPL-2.0
# This code is taken from the OpenSSL project but the author (Andy Polyakov)
# has relicensed it under the GPLv2. Therefore this program is free software;
# you can redistribute it and/or modify it under the terms of the GNU General
# Public License version 2 as published by the Free Software Foundation.
#
# The original headers, including the original license headers, are
# included below for completeness.
# ====================================================================
# Written by Andy Polyakov <appro@openssl.org> for the OpenSSL
# project. The module is, however, dual licensed under OpenSSL and
# CRYPTOGAMS licenses depending on where you obtain it. For further
# details see http://www.openssl.org/~appro/cryptogams/.
#
# Permission to use under GPL terms is granted.
# ====================================================================
# SHA256 block procedure for ARMv4. May 2007.
......
@ SPDX-License-Identifier: GPL-2.0
@ This code is taken from the OpenSSL project but the author (Andy Polyakov)
@ has relicensed it under the GPLv2. Therefore this program is free software;
@ you can redistribute it and/or modify it under the terms of the GNU General
@ Public License version 2 as published by the Free Software Foundation.
@
@ The original headers, including the original license headers, are
@ included below for completeness.
@ ====================================================================
@ Written by Andy Polyakov <appro@openssl.org> for the OpenSSL
@ project. The module is, however, dual licensed under OpenSSL and
@ CRYPTOGAMS licenses depending on where you obtain it. For further
@ details see http://www.openssl.org/~appro/cryptogams/.
@
@ Permission to use under GPL terms is granted.
@ ====================================================================
@ SHA256 block procedure for ARMv4. May 2007.
......
#!/usr/bin/env perl
# SPDX-License-Identifier: GPL-2.0
# This code is taken from the OpenSSL project but the author (Andy Polyakov)
# has relicensed it under the GPLv2. Therefore this program is free software;
# you can redistribute it and/or modify it under the terms of the GNU General
# Public License version 2 as published by the Free Software Foundation.
#
# The original headers, including the original license headers, are
# included below for completeness.
# ====================================================================
# Written by Andy Polyakov <appro@openssl.org> for the OpenSSL
# project. The module is, however, dual licensed under OpenSSL and
# CRYPTOGAMS licenses depending on where you obtain it. For further
# details see http://www.openssl.org/~appro/cryptogams/.
#
# Permission to use under GPL terms is granted.
# ====================================================================
# SHA512 block procedure for ARMv4. September 2007.
......
@ SPDX-License-Identifier: GPL-2.0
@ This code is taken from the OpenSSL project but the author (Andy Polyakov)
@ has relicensed it under the GPLv2. Therefore this program is free software;
@ you can redistribute it and/or modify it under the terms of the GNU General
@ Public License version 2 as published by the Free Software Foundation.
@
@ The original headers, including the original license headers, are
@ included below for completeness.
@ ====================================================================
@ Written by Andy Polyakov <appro@openssl.org> for the OpenSSL
@ project. The module is, however, dual licensed under OpenSSL and
@ CRYPTOGAMS licenses depending on where you obtain it. For further
@ details see http://www.openssl.org/~appro/cryptogams/.
@
@ Permission to use under GPL terms is granted.
@ ====================================================================
@ SHA512 block procedure for ARMv4. September 2007.
......
......@@ -47,6 +47,12 @@ config CRYPTO_SM3_ARM64_CE
select CRYPTO_HASH
select CRYPTO_SM3
config CRYPTO_SM4_ARM64_CE
tristate "SM4 symmetric cipher (ARMv8.2 Crypto Extensions)"
depends on KERNEL_MODE_NEON
select CRYPTO_ALGAPI
select CRYPTO_SM4
config CRYPTO_GHASH_ARM64_CE
tristate "GHASH/AES-GCM using ARMv8 Crypto Extensions"
depends on KERNEL_MODE_NEON
......
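For completeness, the new option is enabled like any other arm64 crypto entry; a defconfig fragment might carry (module vs built-in is the builder's choice, shown here as an assumption):

    CONFIG_CRYPTO_SM4_ARM64_CE=m

which, per the Kconfig entry above, also pulls in CRYPTO_ALGAPI and the generic CRYPTO_SM4 fallback.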
......@@ -23,6 +23,9 @@ sha3-ce-y := sha3-ce-glue.o sha3-ce-core.o
obj-$(CONFIG_CRYPTO_SM3_ARM64_CE) += sm3-ce.o
sm3-ce-y := sm3-ce-glue.o sm3-ce-core.o
obj-$(CONFIG_CRYPTO_SM4_ARM64_CE) += sm4-ce.o
sm4-ce-y := sm4-ce-glue.o sm4-ce-core.o
obj-$(CONFIG_CRYPTO_GHASH_ARM64_CE) += ghash-ce.o
ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o
......
......@@ -19,24 +19,33 @@
* u32 *macp, u8 const rk[], u32 rounds);
*/
ENTRY(ce_aes_ccm_auth_data)
ldr w8, [x3] /* leftover from prev round? */
frame_push 7
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x5
ldr w25, [x22] /* leftover from prev round? */
ld1 {v0.16b}, [x0] /* load mac */
cbz w8, 1f
sub w8, w8, #16
cbz w25, 1f
sub w25, w25, #16
eor v1.16b, v1.16b, v1.16b
0: ldrb w7, [x1], #1 /* get 1 byte of input */
subs w2, w2, #1
add w8, w8, #1
0: ldrb w7, [x20], #1 /* get 1 byte of input */
subs w21, w21, #1
add w25, w25, #1
ins v1.b[0], w7
ext v1.16b, v1.16b, v1.16b, #1 /* rotate in the input bytes */
beq 8f /* out of input? */
cbnz w8, 0b
cbnz w25, 0b
eor v0.16b, v0.16b, v1.16b
1: ld1 {v3.4s}, [x4] /* load first round key */
prfm pldl1strm, [x1]
cmp w5, #12 /* which key size? */
add x6, x4, #16
sub w7, w5, #2 /* modified # of rounds */
1: ld1 {v3.4s}, [x23] /* load first round key */
prfm pldl1strm, [x20]
cmp w24, #12 /* which key size? */
add x6, x23, #16
sub w7, w24, #2 /* modified # of rounds */
bmi 2f
bne 5f
mov v5.16b, v3.16b
......@@ -55,33 +64,43 @@ ENTRY(ce_aes_ccm_auth_data)
ld1 {v5.4s}, [x6], #16 /* load next round key */
bpl 3b
aese v0.16b, v4.16b
subs w2, w2, #16 /* last data? */
subs w21, w21, #16 /* last data? */
eor v0.16b, v0.16b, v5.16b /* final round */
bmi 6f
ld1 {v1.16b}, [x1], #16 /* load next input block */
ld1 {v1.16b}, [x20], #16 /* load next input block */
eor v0.16b, v0.16b, v1.16b /* xor with mac */
bne 1b
6: st1 {v0.16b}, [x0] /* store mac */
beq 6f
if_will_cond_yield_neon
st1 {v0.16b}, [x19] /* store mac */
do_cond_yield_neon
ld1 {v0.16b}, [x19] /* reload mac */
endif_yield_neon
b 1b
6: st1 {v0.16b}, [x19] /* store mac */
beq 10f
adds w2, w2, #16
adds w21, w21, #16
beq 10f
mov w8, w2
7: ldrb w7, [x1], #1
mov w25, w21
7: ldrb w7, [x20], #1
umov w6, v0.b[0]
eor w6, w6, w7
strb w6, [x0], #1
subs w2, w2, #1
strb w6, [x19], #1
subs w21, w21, #1
beq 10f
ext v0.16b, v0.16b, v0.16b, #1 /* rotate out the mac bytes */
b 7b
8: mov w7, w8
add w8, w8, #16
8: mov w7, w25
add w25, w25, #16
9: ext v1.16b, v1.16b, v1.16b, #1
adds w7, w7, #1
bne 9b
eor v0.16b, v0.16b, v1.16b
st1 {v0.16b}, [x0]
10: str w8, [x3]
st1 {v0.16b}, [x19]
10: str w25, [x22]
frame_pop
ret
ENDPROC(ce_aes_ccm_auth_data)
......@@ -126,19 +145,29 @@ ENTRY(ce_aes_ccm_final)
ENDPROC(ce_aes_ccm_final)
.macro aes_ccm_do_crypt,enc
ldr x8, [x6, #8] /* load lower ctr */
ld1 {v0.16b}, [x5] /* load mac */
CPU_LE( rev x8, x8 ) /* keep swabbed ctr in reg */
frame_push 8
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x5
mov x25, x6
ldr x26, [x25, #8] /* load lower ctr */
ld1 {v0.16b}, [x24] /* load mac */
CPU_LE( rev x26, x26 ) /* keep swabbed ctr in reg */
0: /* outer loop */
ld1 {v1.8b}, [x6] /* load upper ctr */
prfm pldl1strm, [x1]
add x8, x8, #1
rev x9, x8
cmp w4, #12 /* which key size? */
sub w7, w4, #2 /* get modified # of rounds */
ld1 {v1.8b}, [x25] /* load upper ctr */
prfm pldl1strm, [x20]
add x26, x26, #1
rev x9, x26
cmp w23, #12 /* which key size? */
sub w7, w23, #2 /* get modified # of rounds */
ins v1.d[1], x9 /* no carry in lower ctr */
ld1 {v3.4s}, [x3] /* load first round key */
add x10, x3, #16
ld1 {v3.4s}, [x22] /* load first round key */
add x10, x22, #16
bmi 1f
bne 4f
mov v5.16b, v3.16b
......@@ -165,9 +194,9 @@ CPU_LE( rev x8, x8 ) /* keep swabbed ctr in reg */
bpl 2b
aese v0.16b, v4.16b
aese v1.16b, v4.16b
subs w2, w2, #16
bmi 6f /* partial block? */
ld1 {v2.16b}, [x1], #16 /* load next input block */
subs w21, w21, #16
bmi 7f /* partial block? */
ld1 {v2.16b}, [x20], #16 /* load next input block */
.if \enc == 1
eor v2.16b, v2.16b, v5.16b /* final round enc+mac */
eor v1.16b, v1.16b, v2.16b /* xor with crypted ctr */
......@@ -176,18 +205,29 @@ CPU_LE( rev x8, x8 ) /* keep swabbed ctr in reg */
eor v1.16b, v2.16b, v5.16b /* final round enc */
.endif
eor v0.16b, v0.16b, v2.16b /* xor mac with pt ^ rk[last] */
st1 {v1.16b}, [x0], #16 /* write output block */
bne 0b
CPU_LE( rev x8, x8 )
st1 {v0.16b}, [x5] /* store mac */
str x8, [x6, #8] /* store lsb end of ctr (BE) */
5: ret
6: eor v0.16b, v0.16b, v5.16b /* final round mac */
st1 {v1.16b}, [x19], #16 /* write output block */
beq 5f
if_will_cond_yield_neon
st1 {v0.16b}, [x24] /* store mac */
do_cond_yield_neon
ld1 {v0.16b}, [x24] /* reload mac */
endif_yield_neon
b 0b
5:
CPU_LE( rev x26, x26 )
st1 {v0.16b}, [x24] /* store mac */
str x26, [x25, #8] /* store lsb end of ctr (BE) */
6: frame_pop
ret
7: eor v0.16b, v0.16b, v5.16b /* final round mac */
eor v1.16b, v1.16b, v5.16b /* final round enc */
st1 {v0.16b}, [x5] /* store mac */
add w2, w2, #16 /* process partial tail block */
7: ldrb w9, [x1], #1 /* get 1 byte of input */
st1 {v0.16b}, [x24] /* store mac */
add w21, w21, #16 /* process partial tail block */
8: ldrb w9, [x20], #1 /* get 1 byte of input */
umov w6, v1.b[0] /* get top crypted ctr byte */
umov w7, v0.b[0] /* get top mac byte */
.if \enc == 1
......@@ -197,13 +237,13 @@ CPU_LE( rev x8, x8 )
eor w9, w9, w6
eor w7, w7, w9
.endif
strb w9, [x0], #1 /* store out byte */
strb w7, [x5], #1 /* store mac byte */
subs w2, w2, #1
beq 5b
strb w9, [x19], #1 /* store out byte */
strb w7, [x24], #1 /* store mac byte */
subs w21, w21, #1
beq 6b
ext v0.16b, v0.16b, v0.16b, #1 /* shift out mac byte */
ext v1.16b, v1.16b, v1.16b, #1 /* shift out ctr byte */
b 7b
b 8b
.endm
/*
......
......@@ -30,18 +30,21 @@
.endm
/* prepare for encryption with key in rk[] */
.macro enc_prepare, rounds, rk, ignore
load_round_keys \rounds, \rk
.macro enc_prepare, rounds, rk, temp
mov \temp, \rk
load_round_keys \rounds, \temp
.endm
/* prepare for encryption (again) but with new key in rk[] */
.macro enc_switch_key, rounds, rk, ignore
load_round_keys \rounds, \rk
.macro enc_switch_key, rounds, rk, temp
mov \temp, \rk
load_round_keys \rounds, \temp
.endm
/* prepare for decryption with key in rk[] */
.macro dec_prepare, rounds, rk, ignore
load_round_keys \rounds, \rk
.macro dec_prepare, rounds, rk, temp
mov \temp, \rk
load_round_keys \rounds, \temp
.endm
.macro do_enc_Nx, de, mc, k, i0, i1, i2, i3
......
......@@ -14,12 +14,12 @@
.align 4
aes_encrypt_block4x:
encrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7
encrypt_block4x v0, v1, v2, v3, w22, x21, x8, w7
ret
ENDPROC(aes_encrypt_block4x)
aes_decrypt_block4x:
decrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7
decrypt_block4x v0, v1, v2, v3, w22, x21, x8, w7
ret
ENDPROC(aes_decrypt_block4x)
......@@ -31,57 +31,71 @@ ENDPROC(aes_decrypt_block4x)
*/
AES_ENTRY(aes_ecb_encrypt)
stp x29, x30, [sp, #-16]!
mov x29, sp
frame_push 5
enc_prepare w3, x2, x5
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
.Lecbencrestart:
enc_prepare w22, x21, x5
.LecbencloopNx:
subs w4, w4, #4
subs w23, w23, #4
bmi .Lecbenc1x
ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */
ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 pt blocks */
bl aes_encrypt_block4x
st1 {v0.16b-v3.16b}, [x0], #64
st1 {v0.16b-v3.16b}, [x19], #64
cond_yield_neon .Lecbencrestart
b .LecbencloopNx
.Lecbenc1x:
adds w4, w4, #4
adds w23, w23, #4
beq .Lecbencout
.Lecbencloop:
ld1 {v0.16b}, [x1], #16 /* get next pt block */
encrypt_block v0, w3, x2, x5, w6
st1 {v0.16b}, [x0], #16
subs w4, w4, #1
ld1 {v0.16b}, [x20], #16 /* get next pt block */
encrypt_block v0, w22, x21, x5, w6
st1 {v0.16b}, [x19], #16
subs w23, w23, #1
bne .Lecbencloop
.Lecbencout:
ldp x29, x30, [sp], #16
frame_pop
ret
AES_ENDPROC(aes_ecb_encrypt)
AES_ENTRY(aes_ecb_decrypt)
stp x29, x30, [sp, #-16]!
mov x29, sp
frame_push 5
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
dec_prepare w3, x2, x5
.Lecbdecrestart:
dec_prepare w22, x21, x5
.LecbdecloopNx:
subs w4, w4, #4
subs w23, w23, #4
bmi .Lecbdec1x
ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */
ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 ct blocks */
bl aes_decrypt_block4x
st1 {v0.16b-v3.16b}, [x0], #64
st1 {v0.16b-v3.16b}, [x19], #64
cond_yield_neon .Lecbdecrestart
b .LecbdecloopNx
.Lecbdec1x:
adds w4, w4, #4
adds w23, w23, #4
beq .Lecbdecout
.Lecbdecloop:
ld1 {v0.16b}, [x1], #16 /* get next ct block */
decrypt_block v0, w3, x2, x5, w6
st1 {v0.16b}, [x0], #16
subs w4, w4, #1
ld1 {v0.16b}, [x20], #16 /* get next ct block */
decrypt_block v0, w22, x21, x5, w6
st1 {v0.16b}, [x19], #16
subs w23, w23, #1
bne .Lecbdecloop
.Lecbdecout:
ldp x29, x30, [sp], #16
frame_pop
ret
AES_ENDPROC(aes_ecb_decrypt)
......@@ -94,78 +108,100 @@ AES_ENDPROC(aes_ecb_decrypt)
*/
AES_ENTRY(aes_cbc_encrypt)
ld1 {v4.16b}, [x5] /* get iv */
enc_prepare w3, x2, x6
frame_push 6
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x5
.Lcbcencrestart:
ld1 {v4.16b}, [x24] /* get iv */
enc_prepare w22, x21, x6
.Lcbcencloop4x:
subs w4, w4, #4
subs w23, w23, #4
bmi .Lcbcenc1x
ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */
ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 pt blocks */
eor v0.16b, v0.16b, v4.16b /* ..and xor with iv */
encrypt_block v0, w3, x2, x6, w7
encrypt_block v0, w22, x21, x6, w7
eor v1.16b, v1.16b, v0.16b
encrypt_block v1, w3, x2, x6, w7
encrypt_block v1, w22, x21, x6, w7
eor v2.16b, v2.16b, v1.16b
encrypt_block v2, w3, x2, x6, w7
encrypt_block v2, w22, x21, x6, w7
eor v3.16b, v3.16b, v2.16b
encrypt_block v3, w3, x2, x6, w7
st1 {v0.16b-v3.16b}, [x0], #64
encrypt_block v3, w22, x21, x6, w7
st1 {v0.16b-v3.16b}, [x19], #64
mov v4.16b, v3.16b
st1 {v4.16b}, [x24] /* return iv */
cond_yield_neon .Lcbcencrestart
b .Lcbcencloop4x
.Lcbcenc1x:
adds w4, w4, #4
adds w23, w23, #4
beq .Lcbcencout
.Lcbcencloop:
ld1 {v0.16b}, [x1], #16 /* get next pt block */
ld1 {v0.16b}, [x20], #16 /* get next pt block */
eor v4.16b, v4.16b, v0.16b /* ..and xor with iv */
encrypt_block v4, w3, x2, x6, w7
st1 {v4.16b}, [x0], #16
subs w4, w4, #1
encrypt_block v4, w22, x21, x6, w7
st1 {v4.16b}, [x19], #16
subs w23, w23, #1
bne .Lcbcencloop
.Lcbcencout:
st1 {v4.16b}, [x5] /* return iv */
st1 {v4.16b}, [x24] /* return iv */
frame_pop
ret
AES_ENDPROC(aes_cbc_encrypt)
AES_ENTRY(aes_cbc_decrypt)
stp x29, x30, [sp, #-16]!
mov x29, sp
frame_push 6
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x5
ld1 {v7.16b}, [x5] /* get iv */
dec_prepare w3, x2, x6
.Lcbcdecrestart:
ld1 {v7.16b}, [x24] /* get iv */
dec_prepare w22, x21, x6
.LcbcdecloopNx:
subs w4, w4, #4
subs w23, w23, #4
bmi .Lcbcdec1x
ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */
ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 ct blocks */
mov v4.16b, v0.16b
mov v5.16b, v1.16b
mov v6.16b, v2.16b
bl aes_decrypt_block4x
sub x1, x1, #16
sub x20, x20, #16
eor v0.16b, v0.16b, v7.16b
eor v1.16b, v1.16b, v4.16b
ld1 {v7.16b}, [x1], #16 /* reload 1 ct block */
ld1 {v7.16b}, [x20], #16 /* reload 1 ct block */
eor v2.16b, v2.16b, v5.16b
eor v3.16b, v3.16b, v6.16b
st1 {v0.16b-v3.16b}, [x0], #64
st1 {v0.16b-v3.16b}, [x19], #64
st1 {v7.16b}, [x24] /* return iv */
cond_yield_neon .Lcbcdecrestart
b .LcbcdecloopNx
.Lcbcdec1x:
adds w4, w4, #4
adds w23, w23, #4
beq .Lcbcdecout
.Lcbcdecloop:
ld1 {v1.16b}, [x1], #16 /* get next ct block */
ld1 {v1.16b}, [x20], #16 /* get next ct block */
mov v0.16b, v1.16b /* ...and copy to v0 */
decrypt_block v0, w3, x2, x6, w7
decrypt_block v0, w22, x21, x6, w7
eor v0.16b, v0.16b, v7.16b /* xor with iv => pt */
mov v7.16b, v1.16b /* ct is next iv */
st1 {v0.16b}, [x0], #16
subs w4, w4, #1
st1 {v0.16b}, [x19], #16
subs w23, w23, #1
bne .Lcbcdecloop
.Lcbcdecout:
st1 {v7.16b}, [x5] /* return iv */
ldp x29, x30, [sp], #16
st1 {v7.16b}, [x24] /* return iv */
frame_pop
ret
AES_ENDPROC(aes_cbc_decrypt)
......@@ -176,19 +212,26 @@ AES_ENDPROC(aes_cbc_decrypt)
*/
AES_ENTRY(aes_ctr_encrypt)
stp x29, x30, [sp, #-16]!
mov x29, sp
frame_push 6
enc_prepare w3, x2, x6
ld1 {v4.16b}, [x5]
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x5
.Lctrrestart:
enc_prepare w22, x21, x6
ld1 {v4.16b}, [x24]
umov x6, v4.d[1] /* keep swabbed ctr in reg */
rev x6, x6
cmn w6, w4 /* 32 bit overflow? */
bcs .Lctrloop
.LctrloopNx:
subs w4, w4, #4
subs w23, w23, #4
bmi .Lctr1x
cmn w6, #4 /* 32 bit overflow? */
bcs .Lctr1x
ldr q8, =0x30000000200000001 /* addends 1,2,3[,0] */
dup v7.4s, w6
mov v0.16b, v4.16b
......@@ -200,25 +243,27 @@ AES_ENTRY(aes_ctr_encrypt)
mov v1.s[3], v8.s[0]
mov v2.s[3], v8.s[1]
mov v3.s[3], v8.s[2]
ld1 {v5.16b-v7.16b}, [x1], #48 /* get 3 input blocks */
ld1 {v5.16b-v7.16b}, [x20], #48 /* get 3 input blocks */
bl aes_encrypt_block4x
eor v0.16b, v5.16b, v0.16b
ld1 {v5.16b}, [x1], #16 /* get 1 input block */
ld1 {v5.16b}, [x20], #16 /* get 1 input block */
eor v1.16b, v6.16b, v1.16b
eor v2.16b, v7.16b, v2.16b
eor v3.16b, v5.16b, v3.16b
st1 {v0.16b-v3.16b}, [x0], #64
st1 {v0.16b-v3.16b}, [x19], #64
add x6, x6, #4
rev x7, x6
ins v4.d[1], x7
cbz w4, .Lctrout
cbz w23, .Lctrout
st1 {v4.16b}, [x24] /* return next CTR value */
cond_yield_neon .Lctrrestart
b .LctrloopNx
.Lctr1x:
adds w4, w4, #4
adds w23, w23, #4
beq .Lctrout
.Lctrloop:
mov v0.16b, v4.16b
encrypt_block v0, w3, x2, x8, w7
encrypt_block v0, w22, x21, x8, w7
adds x6, x6, #1 /* increment BE ctr */
rev x7, x6
......@@ -226,22 +271,22 @@ AES_ENTRY(aes_ctr_encrypt)
bcs .Lctrcarry /* overflow? */
.Lctrcarrydone:
subs w4, w4, #1
subs w23, w23, #1
bmi .Lctrtailblock /* blocks <0 means tail block */
ld1 {v3.16b}, [x1], #16
ld1 {v3.16b}, [x20], #16
eor v3.16b, v0.16b, v3.16b
st1 {v3.16b}, [x0], #16
st1 {v3.16b}, [x19], #16
bne .Lctrloop
.Lctrout:
st1 {v4.16b}, [x5] /* return next CTR value */
ldp x29, x30, [sp], #16
st1 {v4.16b}, [x24] /* return next CTR value */
.Lctrret:
frame_pop
ret
.Lctrtailblock:
st1 {v0.16b}, [x0]
ldp x29, x30, [sp], #16
ret
st1 {v0.16b}, [x19]
b .Lctrret
.Lctrcarry:
umov x7, v4.d[0] /* load upper word of ctr */
......@@ -274,10 +319,16 @@ CPU_LE( .quad 1, 0x87 )
CPU_BE( .quad 0x87, 1 )
AES_ENTRY(aes_xts_encrypt)
stp x29, x30, [sp, #-16]!
mov x29, sp
frame_push 6
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x6
ld1 {v4.16b}, [x6]
ld1 {v4.16b}, [x24]
cbz w7, .Lxtsencnotfirst
enc_prepare w3, x5, x8
......@@ -286,15 +337,17 @@ AES_ENTRY(aes_xts_encrypt)
ldr q7, .Lxts_mul_x
b .LxtsencNx
.Lxtsencrestart:
ld1 {v4.16b}, [x24]
.Lxtsencnotfirst:
enc_prepare w3, x2, x8
enc_prepare w22, x21, x8
.LxtsencloopNx:
ldr q7, .Lxts_mul_x
next_tweak v4, v4, v7, v8
.LxtsencNx:
subs w4, w4, #4
subs w23, w23, #4
bmi .Lxtsenc1x
ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */
ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 pt blocks */
next_tweak v5, v4, v7, v8
eor v0.16b, v0.16b, v4.16b
next_tweak v6, v5, v7, v8
......@@ -307,35 +360,43 @@ AES_ENTRY(aes_xts_encrypt)
eor v0.16b, v0.16b, v4.16b
eor v1.16b, v1.16b, v5.16b
eor v2.16b, v2.16b, v6.16b
st1 {v0.16b-v3.16b}, [x0], #64
st1 {v0.16b-v3.16b}, [x19], #64
mov v4.16b, v7.16b
cbz w4, .Lxtsencout
cbz w23, .Lxtsencout
st1 {v4.16b}, [x24]
cond_yield_neon .Lxtsencrestart
b .LxtsencloopNx
.Lxtsenc1x:
adds w4, w4, #4
adds w23, w23, #4
beq .Lxtsencout
.Lxtsencloop:
ld1 {v1.16b}, [x1], #16
ld1 {v1.16b}, [x20], #16
eor v0.16b, v1.16b, v4.16b
encrypt_block v0, w3, x2, x8, w7
encrypt_block v0, w22, x21, x8, w7
eor v0.16b, v0.16b, v4.16b
st1 {v0.16b}, [x0], #16
subs w4, w4, #1
st1 {v0.16b}, [x19], #16
subs w23, w23, #1
beq .Lxtsencout
next_tweak v4, v4, v7, v8
b .Lxtsencloop
.Lxtsencout:
st1 {v4.16b}, [x6]
ldp x29, x30, [sp], #16
st1 {v4.16b}, [x24]
frame_pop
ret
AES_ENDPROC(aes_xts_encrypt)
AES_ENTRY(aes_xts_decrypt)
stp x29, x30, [sp, #-16]!
mov x29, sp
frame_push 6
ld1 {v4.16b}, [x6]
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x6
ld1 {v4.16b}, [x24]
cbz w7, .Lxtsdecnotfirst
enc_prepare w3, x5, x8
......@@ -344,15 +405,17 @@ AES_ENTRY(aes_xts_decrypt)
ldr q7, .Lxts_mul_x
b .LxtsdecNx
.Lxtsdecrestart:
ld1 {v4.16b}, [x24]
.Lxtsdecnotfirst:
dec_prepare w3, x2, x8
dec_prepare w22, x21, x8
.LxtsdecloopNx:
ldr q7, .Lxts_mul_x
next_tweak v4, v4, v7, v8
.LxtsdecNx:
subs w4, w4, #4
subs w23, w23, #4
bmi .Lxtsdec1x
ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */
ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 ct blocks */
next_tweak v5, v4, v7, v8
eor v0.16b, v0.16b, v4.16b
next_tweak v6, v5, v7, v8
......@@ -365,26 +428,28 @@ AES_ENTRY(aes_xts_decrypt)
eor v0.16b, v0.16b, v4.16b
eor v1.16b, v1.16b, v5.16b
eor v2.16b, v2.16b, v6.16b
st1 {v0.16b-v3.16b}, [x0], #64
st1 {v0.16b-v3.16b}, [x19], #64
mov v4.16b, v7.16b
cbz w4, .Lxtsdecout
cbz w23, .Lxtsdecout
st1 {v4.16b}, [x24]
cond_yield_neon .Lxtsdecrestart
b .LxtsdecloopNx
.Lxtsdec1x:
adds w4, w4, #4
adds w23, w23, #4
beq .Lxtsdecout
.Lxtsdecloop:
ld1 {v1.16b}, [x1], #16
ld1 {v1.16b}, [x20], #16
eor v0.16b, v1.16b, v4.16b
decrypt_block v0, w3, x2, x8, w7
decrypt_block v0, w22, x21, x8, w7
eor v0.16b, v0.16b, v4.16b
st1 {v0.16b}, [x0], #16
subs w4, w4, #1
st1 {v0.16b}, [x19], #16
subs w23, w23, #1
beq .Lxtsdecout
next_tweak v4, v4, v7, v8
b .Lxtsdecloop
.Lxtsdecout:
st1 {v4.16b}, [x6]
ldp x29, x30, [sp], #16
st1 {v4.16b}, [x24]
frame_pop
ret
AES_ENDPROC(aes_xts_decrypt)
......@@ -393,43 +458,61 @@ AES_ENDPROC(aes_xts_decrypt)
* int blocks, u8 dg[], int enc_before, int enc_after)
*/
AES_ENTRY(aes_mac_update)
ld1 {v0.16b}, [x4] /* get dg */
frame_push 6
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x6
ld1 {v0.16b}, [x23] /* get dg */
enc_prepare w2, x1, x7
cbz w5, .Lmacloop4x
encrypt_block v0, w2, x1, x7, w8
.Lmacloop4x:
subs w3, w3, #4
subs w22, w22, #4
bmi .Lmac1x
ld1 {v1.16b-v4.16b}, [x0], #64 /* get next pt block */
ld1 {v1.16b-v4.16b}, [x19], #64 /* get next pt block */
eor v0.16b, v0.16b, v1.16b /* ..and xor with dg */
encrypt_block v0, w2, x1, x7, w8
encrypt_block v0, w21, x20, x7, w8
eor v0.16b, v0.16b, v2.16b
encrypt_block v0, w2, x1, x7, w8
encrypt_block v0, w21, x20, x7, w8
eor v0.16b, v0.16b, v3.16b
encrypt_block v0, w2, x1, x7, w8
encrypt_block v0, w21, x20, x7, w8
eor v0.16b, v0.16b, v4.16b
cmp w3, wzr
csinv x5, x6, xzr, eq
cmp w22, wzr
csinv x5, x24, xzr, eq
cbz w5, .Lmacout
encrypt_block v0, w2, x1, x7, w8
encrypt_block v0, w21, x20, x7, w8
st1 {v0.16b}, [x23] /* return dg */
cond_yield_neon .Lmacrestart
b .Lmacloop4x
.Lmac1x:
add w3, w3, #4
add w22, w22, #4
.Lmacloop:
cbz w3, .Lmacout
ld1 {v1.16b}, [x0], #16 /* get next pt block */
cbz w22, .Lmacout
ld1 {v1.16b}, [x19], #16 /* get next pt block */
eor v0.16b, v0.16b, v1.16b /* ..and xor with dg */
subs w3, w3, #1
csinv x5, x6, xzr, eq
subs w22, w22, #1
csinv x5, x24, xzr, eq
cbz w5, .Lmacout
encrypt_block v0, w2, x1, x7, w8
.Lmacenc:
encrypt_block v0, w21, x20, x7, w8
b .Lmacloop
.Lmacout:
st1 {v0.16b}, [x4] /* return dg */
st1 {v0.16b}, [x23] /* return dg */
frame_pop
ret
.Lmacrestart:
ld1 {v0.16b}, [x23] /* get dg */
enc_prepare w21, x20, x0
b .Lmacloop4x
AES_ENDPROC(aes_mac_update)
......@@ -565,54 +565,61 @@ ENDPROC(aesbs_decrypt8)
* int blocks)
*/
.macro __ecb_crypt, do8, o0, o1, o2, o3, o4, o5, o6, o7
stp x29, x30, [sp, #-16]!
mov x29, sp
frame_push 5
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
99: mov x5, #1
lsl x5, x5, x4
subs w4, w4, #8
csel x4, x4, xzr, pl
lsl x5, x5, x23
subs w23, w23, #8
csel x23, x23, xzr, pl
csel x5, x5, xzr, mi
ld1 {v0.16b}, [x1], #16
ld1 {v0.16b}, [x20], #16
tbnz x5, #1, 0f
ld1 {v1.16b}, [x1], #16
ld1 {v1.16b}, [x20], #16
tbnz x5, #2, 0f
ld1 {v2.16b}, [x1], #16
ld1 {v2.16b}, [x20], #16
tbnz x5, #3, 0f
ld1 {v3.16b}, [x1], #16
ld1 {v3.16b}, [x20], #16
tbnz x5, #4, 0f
ld1 {v4.16b}, [x1], #16
ld1 {v4.16b}, [x20], #16
tbnz x5, #5, 0f
ld1 {v5.16b}, [x1], #16
ld1 {v5.16b}, [x20], #16
tbnz x5, #6, 0f
ld1 {v6.16b}, [x1], #16
ld1 {v6.16b}, [x20], #16
tbnz x5, #7, 0f
ld1 {v7.16b}, [x1], #16
ld1 {v7.16b}, [x20], #16
0: mov bskey, x2
mov rounds, x3
0: mov bskey, x21
mov rounds, x22
bl \do8
st1 {\o0\().16b}, [x0], #16
st1 {\o0\().16b}, [x19], #16
tbnz x5, #1, 1f
st1 {\o1\().16b}, [x0], #16
st1 {\o1\().16b}, [x19], #16
tbnz x5, #2, 1f
st1 {\o2\().16b}, [x0], #16
st1 {\o2\().16b}, [x19], #16
tbnz x5, #3, 1f
st1 {\o3\().16b}, [x0], #16
st1 {\o3\().16b}, [x19], #16
tbnz x5, #4, 1f
st1 {\o4\().16b}, [x0], #16
st1 {\o4\().16b}, [x19], #16
tbnz x5, #5, 1f
st1 {\o5\().16b}, [x0], #16
st1 {\o5\().16b}, [x19], #16
tbnz x5, #6, 1f
st1 {\o6\().16b}, [x0], #16
st1 {\o6\().16b}, [x19], #16
tbnz x5, #7, 1f
st1 {\o7\().16b}, [x0], #16
st1 {\o7\().16b}, [x19], #16
cbnz x4, 99b
cbz x23, 1f
cond_yield_neon
b 99b
1: ldp x29, x30, [sp], #16
1: frame_pop
ret
.endm
......@@ -632,43 +639,49 @@ ENDPROC(aesbs_ecb_decrypt)
*/
.align 4
ENTRY(aesbs_cbc_decrypt)
stp x29, x30, [sp, #-16]!
mov x29, sp
frame_push 6
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x5
99: mov x6, #1
lsl x6, x6, x4
subs w4, w4, #8
csel x4, x4, xzr, pl
lsl x6, x6, x23
subs w23, w23, #8
csel x23, x23, xzr, pl
csel x6, x6, xzr, mi
ld1 {v0.16b}, [x1], #16
ld1 {v0.16b}, [x20], #16
mov v25.16b, v0.16b
tbnz x6, #1, 0f
ld1 {v1.16b}, [x1], #16
ld1 {v1.16b}, [x20], #16
mov v26.16b, v1.16b
tbnz x6, #2, 0f
ld1 {v2.16b}, [x1], #16
ld1 {v2.16b}, [x20], #16
mov v27.16b, v2.16b
tbnz x6, #3, 0f
ld1 {v3.16b}, [x1], #16
ld1 {v3.16b}, [x20], #16
mov v28.16b, v3.16b
tbnz x6, #4, 0f
ld1 {v4.16b}, [x1], #16
ld1 {v4.16b}, [x20], #16
mov v29.16b, v4.16b
tbnz x6, #5, 0f
ld1 {v5.16b}, [x1], #16
ld1 {v5.16b}, [x20], #16
mov v30.16b, v5.16b
tbnz x6, #6, 0f
ld1 {v6.16b}, [x1], #16
ld1 {v6.16b}, [x20], #16
mov v31.16b, v6.16b
tbnz x6, #7, 0f
ld1 {v7.16b}, [x1]
ld1 {v7.16b}, [x20]
0: mov bskey, x2
mov rounds, x3
0: mov bskey, x21
mov rounds, x22
bl aesbs_decrypt8
ld1 {v24.16b}, [x5] // load IV
ld1 {v24.16b}, [x24] // load IV
eor v1.16b, v1.16b, v25.16b
eor v6.16b, v6.16b, v26.16b
......@@ -679,34 +692,36 @@ ENTRY(aesbs_cbc_decrypt)
eor v3.16b, v3.16b, v30.16b
eor v5.16b, v5.16b, v31.16b
st1 {v0.16b}, [x0], #16
st1 {v0.16b}, [x19], #16
mov v24.16b, v25.16b
tbnz x6, #1, 1f
st1 {v1.16b}, [x0], #16
st1 {v1.16b}, [x19], #16
mov v24.16b, v26.16b
tbnz x6, #2, 1f
st1 {v6.16b}, [x0], #16
st1 {v6.16b}, [x19], #16
mov v24.16b, v27.16b
tbnz x6, #3, 1f
st1 {v4.16b}, [x0], #16
st1 {v4.16b}, [x19], #16
mov v24.16b, v28.16b
tbnz x6, #4, 1f
st1 {v2.16b}, [x0], #16
st1 {v2.16b}, [x19], #16
mov v24.16b, v29.16b
tbnz x6, #5, 1f
st1 {v7.16b}, [x0], #16
st1 {v7.16b}, [x19], #16
mov v24.16b, v30.16b
tbnz x6, #6, 1f
st1 {v3.16b}, [x0], #16
st1 {v3.16b}, [x19], #16
mov v24.16b, v31.16b
tbnz x6, #7, 1f
ld1 {v24.16b}, [x1], #16
st1 {v5.16b}, [x0], #16
1: st1 {v24.16b}, [x5] // store IV
ld1 {v24.16b}, [x20], #16
st1 {v5.16b}, [x19], #16
1: st1 {v24.16b}, [x24] // store IV
cbnz x4, 99b
cbz x23, 2f
cond_yield_neon
b 99b
ldp x29, x30, [sp], #16
2: frame_pop
ret
ENDPROC(aesbs_cbc_decrypt)
......@@ -731,87 +746,93 @@ CPU_BE( .quad 0x87, 1 )
*/
__xts_crypt8:
mov x6, #1
lsl x6, x6, x4
subs w4, w4, #8
csel x4, x4, xzr, pl
lsl x6, x6, x23
subs w23, w23, #8
csel x23, x23, xzr, pl
csel x6, x6, xzr, mi
ld1 {v0.16b}, [x1], #16
ld1 {v0.16b}, [x20], #16
next_tweak v26, v25, v30, v31
eor v0.16b, v0.16b, v25.16b
tbnz x6, #1, 0f
ld1 {v1.16b}, [x1], #16
ld1 {v1.16b}, [x20], #16
next_tweak v27, v26, v30, v31
eor v1.16b, v1.16b, v26.16b
tbnz x6, #2, 0f
ld1 {v2.16b}, [x1], #16
ld1 {v2.16b}, [x20], #16
next_tweak v28, v27, v30, v31
eor v2.16b, v2.16b, v27.16b
tbnz x6, #3, 0f
ld1 {v3.16b}, [x1], #16
ld1 {v3.16b}, [x20], #16
next_tweak v29, v28, v30, v31
eor v3.16b, v3.16b, v28.16b
tbnz x6, #4, 0f
ld1 {v4.16b}, [x1], #16
str q29, [sp, #16]
ld1 {v4.16b}, [x20], #16
str q29, [sp, #.Lframe_local_offset]
eor v4.16b, v4.16b, v29.16b
next_tweak v29, v29, v30, v31
tbnz x6, #5, 0f
ld1 {v5.16b}, [x1], #16
str q29, [sp, #32]
ld1 {v5.16b}, [x20], #16
str q29, [sp, #.Lframe_local_offset + 16]
eor v5.16b, v5.16b, v29.16b
next_tweak v29, v29, v30, v31
tbnz x6, #6, 0f
ld1 {v6.16b}, [x1], #16
str q29, [sp, #48]
ld1 {v6.16b}, [x20], #16
str q29, [sp, #.Lframe_local_offset + 32]
eor v6.16b, v6.16b, v29.16b
next_tweak v29, v29, v30, v31
tbnz x6, #7, 0f
ld1 {v7.16b}, [x1], #16
str q29, [sp, #64]
ld1 {v7.16b}, [x20], #16
str q29, [sp, #.Lframe_local_offset + 48]
eor v7.16b, v7.16b, v29.16b
next_tweak v29, v29, v30, v31
0: mov bskey, x2
mov rounds, x3
0: mov bskey, x21
mov rounds, x22
br x7
ENDPROC(__xts_crypt8)
.macro __xts_crypt, do8, o0, o1, o2, o3, o4, o5, o6, o7
stp x29, x30, [sp, #-80]!
mov x29, sp
frame_push 6, 64
ldr q30, .Lxts_mul_x
ld1 {v25.16b}, [x5]
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x5
0: ldr q30, .Lxts_mul_x
ld1 {v25.16b}, [x24]
99: adr x7, \do8
bl __xts_crypt8
ldp q16, q17, [sp, #16]
ldp q18, q19, [sp, #48]
ldp q16, q17, [sp, #.Lframe_local_offset]
ldp q18, q19, [sp, #.Lframe_local_offset + 32]
eor \o0\().16b, \o0\().16b, v25.16b
eor \o1\().16b, \o1\().16b, v26.16b
eor \o2\().16b, \o2\().16b, v27.16b
eor \o3\().16b, \o3\().16b, v28.16b
st1 {\o0\().16b}, [x0], #16
st1 {\o0\().16b}, [x19], #16
mov v25.16b, v26.16b
tbnz x6, #1, 1f
st1 {\o1\().16b}, [x0], #16
st1 {\o1\().16b}, [x19], #16
mov v25.16b, v27.16b
tbnz x6, #2, 1f
st1 {\o2\().16b}, [x0], #16
st1 {\o2\().16b}, [x19], #16
mov v25.16b, v28.16b
tbnz x6, #3, 1f
st1 {\o3\().16b}, [x0], #16
st1 {\o3\().16b}, [x19], #16
mov v25.16b, v29.16b
tbnz x6, #4, 1f
......@@ -820,18 +841,22 @@ ENDPROC(__xts_crypt8)
eor \o6\().16b, \o6\().16b, v18.16b
eor \o7\().16b, \o7\().16b, v19.16b
st1 {\o4\().16b}, [x0], #16
st1 {\o4\().16b}, [x19], #16
tbnz x6, #5, 1f
st1 {\o5\().16b}, [x0], #16
st1 {\o5\().16b}, [x19], #16
tbnz x6, #6, 1f
st1 {\o6\().16b}, [x0], #16
st1 {\o6\().16b}, [x19], #16
tbnz x6, #7, 1f
st1 {\o7\().16b}, [x0], #16
st1 {\o7\().16b}, [x19], #16
cbz x23, 1f
st1 {v25.16b}, [x24]
cbnz x4, 99b
cond_yield_neon 0b
b 99b
1: st1 {v25.16b}, [x5]
ldp x29, x30, [sp], #80
1: st1 {v25.16b}, [x24]
frame_pop
ret
.endm
......@@ -856,24 +881,31 @@ ENDPROC(aesbs_xts_decrypt)
* int rounds, int blocks, u8 iv[], u8 final[])
*/
ENTRY(aesbs_ctr_encrypt)
stp x29, x30, [sp, #-16]!
mov x29, sp
cmp x6, #0
cset x10, ne
add x4, x4, x10 // do one extra block if final
ldp x7, x8, [x5]
ld1 {v0.16b}, [x5]
frame_push 8
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x5
mov x25, x6
cmp x25, #0
cset x26, ne
add x23, x23, x26 // do one extra block if final
98: ldp x7, x8, [x24]
ld1 {v0.16b}, [x24]
CPU_LE( rev x7, x7 )
CPU_LE( rev x8, x8 )
adds x8, x8, #1
adc x7, x7, xzr
99: mov x9, #1
lsl x9, x9, x4
subs w4, w4, #8
csel x4, x4, xzr, pl
lsl x9, x9, x23
subs w23, w23, #8
csel x23, x23, xzr, pl
csel x9, x9, xzr, le
tbnz x9, #1, 0f
......@@ -891,82 +923,85 @@ CPU_LE( rev x8, x8 )
tbnz x9, #7, 0f
next_ctr v7
0: mov bskey, x2
mov rounds, x3
0: mov bskey, x21
mov rounds, x22
bl aesbs_encrypt8
lsr x9, x9, x10 // disregard the extra block
lsr x9, x9, x26 // disregard the extra block
tbnz x9, #0, 0f
ld1 {v8.16b}, [x1], #16
ld1 {v8.16b}, [x20], #16
eor v0.16b, v0.16b, v8.16b
st1 {v0.16b}, [x0], #16
st1 {v0.16b}, [x19], #16
tbnz x9, #1, 1f
ld1 {v9.16b}, [x1], #16
ld1 {v9.16b}, [x20], #16
eor v1.16b, v1.16b, v9.16b
st1 {v1.16b}, [x0], #16
st1 {v1.16b}, [x19], #16
tbnz x9, #2, 2f
ld1 {v10.16b}, [x1], #16
ld1 {v10.16b}, [x20], #16
eor v4.16b, v4.16b, v10.16b
st1 {v4.16b}, [x0], #16
st1 {v4.16b}, [x19], #16
tbnz x9, #3, 3f
ld1 {v11.16b}, [x1], #16
ld1 {v11.16b}, [x20], #16
eor v6.16b, v6.16b, v11.16b
st1 {v6.16b}, [x0], #16
st1 {v6.16b}, [x19], #16
tbnz x9, #4, 4f
ld1 {v12.16b}, [x1], #16
ld1 {v12.16b}, [x20], #16
eor v3.16b, v3.16b, v12.16b
st1 {v3.16b}, [x0], #16
st1 {v3.16b}, [x19], #16
tbnz x9, #5, 5f
ld1 {v13.16b}, [x1], #16
ld1 {v13.16b}, [x20], #16
eor v7.16b, v7.16b, v13.16b
st1 {v7.16b}, [x0], #16
st1 {v7.16b}, [x19], #16
tbnz x9, #6, 6f
ld1 {v14.16b}, [x1], #16
ld1 {v14.16b}, [x20], #16
eor v2.16b, v2.16b, v14.16b
st1 {v2.16b}, [x0], #16
st1 {v2.16b}, [x19], #16
tbnz x9, #7, 7f
ld1 {v15.16b}, [x1], #16
ld1 {v15.16b}, [x20], #16
eor v5.16b, v5.16b, v15.16b
st1 {v5.16b}, [x0], #16
st1 {v5.16b}, [x19], #16
8: next_ctr v0
cbnz x4, 99b
st1 {v0.16b}, [x24]
cbz x23, 0f
cond_yield_neon 98b
b 99b
0: st1 {v0.16b}, [x5]
ldp x29, x30, [sp], #16
0: frame_pop
ret
/*
* If we are handling the tail of the input (x6 != NULL), return the
* final keystream block back to the caller.
*/
1: cbz x6, 8b
st1 {v1.16b}, [x6]
1: cbz x25, 8b
st1 {v1.16b}, [x25]
b 8b
2: cbz x6, 8b
st1 {v4.16b}, [x6]
2: cbz x25, 8b
st1 {v4.16b}, [x25]
b 8b
3: cbz x6, 8b
st1 {v6.16b}, [x6]
3: cbz x25, 8b
st1 {v6.16b}, [x25]
b 8b
4: cbz x6, 8b
st1 {v3.16b}, [x6]
4: cbz x25, 8b
st1 {v3.16b}, [x25]
b 8b
5: cbz x6, 8b
st1 {v7.16b}, [x6]
5: cbz x25, 8b
st1 {v7.16b}, [x25]
b 8b
6: cbz x6, 8b
st1 {v2.16b}, [x6]
6: cbz x25, 8b
st1 {v2.16b}, [x25]
b 8b
7: cbz x6, 8b
st1 {v5.16b}, [x6]
7: cbz x25, 8b
st1 {v5.16b}, [x25]
b 8b
ENDPROC(aesbs_ctr_encrypt)
......@@ -100,9 +100,10 @@
dCONSTANT .req d0
qCONSTANT .req q0
BUF .req x0
LEN .req x1
CRC .req x2
BUF .req x19
LEN .req x20
CRC .req x21
CONST .req x22
vzr .req v9
......@@ -123,7 +124,14 @@ ENTRY(crc32_pmull_le)
ENTRY(crc32c_pmull_le)
adr_l x3, .Lcrc32c_constants
0: bic LEN, LEN, #15
0: frame_push 4, 64
mov BUF, x0
mov LEN, x1
mov CRC, x2
mov CONST, x3
bic LEN, LEN, #15
ld1 {v1.16b-v4.16b}, [BUF], #0x40
movi vzr.16b, #0
fmov dCONSTANT, CRC
......@@ -132,7 +140,7 @@ ENTRY(crc32c_pmull_le)
cmp LEN, #0x40
b.lt less_64
ldr qCONSTANT, [x3]
ldr qCONSTANT, [CONST]
loop_64: /* 64 bytes Full cache line folding */
sub LEN, LEN, #0x40
......@@ -162,10 +170,21 @@ loop_64: /* 64 bytes Full cache line folding */
eor v4.16b, v4.16b, v8.16b
cmp LEN, #0x40
b.ge loop_64
b.lt less_64
if_will_cond_yield_neon
stp q1, q2, [sp, #.Lframe_local_offset]
stp q3, q4, [sp, #.Lframe_local_offset + 32]
do_cond_yield_neon
ldp q1, q2, [sp, #.Lframe_local_offset]
ldp q3, q4, [sp, #.Lframe_local_offset + 32]
ldr qCONSTANT, [CONST]
movi vzr.16b, #0
endif_yield_neon
b loop_64
less_64: /* Folding cache line into 128bit */
ldr qCONSTANT, [x3, #16]
ldr qCONSTANT, [CONST, #16]
pmull2 v5.1q, v1.2d, vCONSTANT.2d
pmull v1.1q, v1.1d, vCONSTANT.1d
......@@ -204,8 +223,8 @@ fold_64:
eor v1.16b, v1.16b, v2.16b
/* final 32-bit fold */
ldr dCONSTANT, [x3, #32]
ldr d3, [x3, #40]
ldr dCONSTANT, [CONST, #32]
ldr d3, [CONST, #40]
ext v2.16b, v1.16b, vzr.16b, #4
and v1.16b, v1.16b, v3.16b
......@@ -213,7 +232,7 @@ fold_64:
eor v1.16b, v1.16b, v2.16b
/* Finish up with the bit-reversed barrett reduction 64 ==> 32 bits */
ldr qCONSTANT, [x3, #48]
ldr qCONSTANT, [CONST, #48]
and v2.16b, v1.16b, v3.16b
ext v2.16b, vzr.16b, v2.16b, #8
......@@ -223,6 +242,7 @@ fold_64:
eor v1.16b, v1.16b, v2.16b
mov w0, v1.s[1]
frame_pop
ret
ENDPROC(crc32_pmull_le)
ENDPROC(crc32c_pmull_le)
......
......@@ -74,13 +74,19 @@
.text
.cpu generic+crypto
arg1_low32 .req w0
arg2 .req x1
arg3 .req x2
arg1_low32 .req w19
arg2 .req x20
arg3 .req x21
vzr .req v13
ENTRY(crc_t10dif_pmull)
frame_push 3, 128
mov arg1_low32, w0
mov arg2, x1
mov arg3, x2
movi vzr.16b, #0 // init zero register
// adjust the 16-bit initial_crc value, scale it to 32 bits
......@@ -175,8 +181,25 @@ CPU_LE( ext v12.16b, v12.16b, v12.16b, #8 )
subs arg3, arg3, #128
// check if there is another 64B in the buffer to be able to fold
b.ge _fold_64_B_loop
b.lt _fold_64_B_end
if_will_cond_yield_neon
stp q0, q1, [sp, #.Lframe_local_offset]
stp q2, q3, [sp, #.Lframe_local_offset + 32]
stp q4, q5, [sp, #.Lframe_local_offset + 64]
stp q6, q7, [sp, #.Lframe_local_offset + 96]
do_cond_yield_neon
ldp q0, q1, [sp, #.Lframe_local_offset]
ldp q2, q3, [sp, #.Lframe_local_offset + 32]
ldp q4, q5, [sp, #.Lframe_local_offset + 64]
ldp q6, q7, [sp, #.Lframe_local_offset + 96]
ldr_l q10, rk3, x8
movi vzr.16b, #0 // init zero register
endif_yield_neon
b _fold_64_B_loop
_fold_64_B_end:
// at this point, the buffer pointer is pointing at the last y Bytes
// of the buffer the 64B of folded data is in 4 of the vector
// registers: v0, v1, v2, v3
......@@ -304,6 +327,7 @@ _barrett:
_cleanup:
// scale the result back to 16 bits
lsr x0, x0, #16
frame_pop
ret
_less_than_128:
......
......@@ -213,22 +213,31 @@
.endm
.macro __pmull_ghash, pn
ld1 {SHASH.2d}, [x3]
ld1 {XL.2d}, [x1]
frame_push 5
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
0: ld1 {SHASH.2d}, [x22]
ld1 {XL.2d}, [x20]
ext SHASH2.16b, SHASH.16b, SHASH.16b, #8
eor SHASH2.16b, SHASH2.16b, SHASH.16b
__pmull_pre_\pn
/* do the head block first, if supplied */
cbz x4, 0f
ld1 {T1.2d}, [x4]
b 1f
cbz x23, 1f
ld1 {T1.2d}, [x23]
mov x23, xzr
b 2f
0: ld1 {T1.2d}, [x2], #16
sub w0, w0, #1
1: ld1 {T1.2d}, [x21], #16
sub w19, w19, #1
1: /* multiply XL by SHASH in GF(2^128) */
2: /* multiply XL by SHASH in GF(2^128) */
CPU_LE( rev64 T1.16b, T1.16b )
ext T2.16b, XL.16b, XL.16b, #8
......@@ -250,9 +259,18 @@ CPU_LE( rev64 T1.16b, T1.16b )
eor T2.16b, T2.16b, XH.16b
eor XL.16b, XL.16b, T2.16b
cbnz w0, 0b
cbz w19, 3f
if_will_cond_yield_neon
st1 {XL.2d}, [x20]
do_cond_yield_neon
b 0b
endif_yield_neon
b 1b
st1 {XL.2d}, [x1]
3: st1 {XL.2d}, [x20]
frame_pop
ret
.endm
......@@ -304,38 +322,55 @@ ENDPROC(pmull_ghash_update_p8)
.endm
.macro pmull_gcm_do_crypt, enc
ld1 {SHASH.2d}, [x4]
ld1 {XL.2d}, [x1]
ldr x8, [x5, #8] // load lower counter
frame_push 10
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x5
mov x25, x6
mov x26, x7
.if \enc == 1
ldr x27, [sp, #96] // first stacked arg
.endif
ldr x28, [x24, #8] // load lower counter
CPU_LE( rev x28, x28 )
0: mov x0, x25
load_round_keys w26, x0
ld1 {SHASH.2d}, [x23]
ld1 {XL.2d}, [x20]
movi MASK.16b, #0xe1
ext SHASH2.16b, SHASH.16b, SHASH.16b, #8
CPU_LE( rev x8, x8 )
shl MASK.2d, MASK.2d, #57
eor SHASH2.16b, SHASH2.16b, SHASH.16b
.if \enc == 1
ld1 {KS.16b}, [x7]
ld1 {KS.16b}, [x27]
.endif
0: ld1 {CTR.8b}, [x5] // load upper counter
ld1 {INP.16b}, [x3], #16
rev x9, x8
add x8, x8, #1
sub w0, w0, #1
1: ld1 {CTR.8b}, [x24] // load upper counter
ld1 {INP.16b}, [x22], #16
rev x9, x28
add x28, x28, #1
sub w19, w19, #1
ins CTR.d[1], x9 // set lower counter
.if \enc == 1
eor INP.16b, INP.16b, KS.16b // encrypt input
st1 {INP.16b}, [x2], #16
st1 {INP.16b}, [x21], #16
.endif
rev64 T1.16b, INP.16b
cmp w6, #12
b.ge 2f // AES-192/256?
cmp w26, #12
b.ge 4f // AES-192/256?
1: enc_round CTR, v21
2: enc_round CTR, v21
ext T2.16b, XL.16b, XL.16b, #8
ext IN1.16b, T1.16b, T1.16b, #8
......@@ -390,27 +425,39 @@ CPU_LE( rev x8, x8 )
.if \enc == 0
eor INP.16b, INP.16b, KS.16b
st1 {INP.16b}, [x2], #16
st1 {INP.16b}, [x21], #16
.endif
cbnz w0, 0b
cbz w19, 3f
CPU_LE( rev x8, x8 )
st1 {XL.2d}, [x1]
str x8, [x5, #8] // store lower counter
if_will_cond_yield_neon
st1 {XL.2d}, [x20]
.if \enc == 1
st1 {KS.16b}, [x27]
.endif
do_cond_yield_neon
b 0b
endif_yield_neon
b 1b
3: st1 {XL.2d}, [x20]
.if \enc == 1
st1 {KS.16b}, [x7]
st1 {KS.16b}, [x27]
.endif
CPU_LE( rev x28, x28 )
str x28, [x24, #8] // store lower counter
frame_pop
ret
2: b.eq 3f // AES-192?
4: b.eq 5f // AES-192?
enc_round CTR, v17
enc_round CTR, v18
3: enc_round CTR, v19
5: enc_round CTR, v19
enc_round CTR, v20
b 1b
b 2b
.endm
/*
......
......@@ -63,11 +63,12 @@ static void (*pmull_ghash_update)(int blocks, u64 dg[], const char *src,
asmlinkage void pmull_gcm_encrypt(int blocks, u64 dg[], u8 dst[],
const u8 src[], struct ghash_key const *k,
u8 ctr[], int rounds, u8 ks[]);
u8 ctr[], u32 const rk[], int rounds,
u8 ks[]);
asmlinkage void pmull_gcm_decrypt(int blocks, u64 dg[], u8 dst[],
const u8 src[], struct ghash_key const *k,
u8 ctr[], int rounds);
u8 ctr[], u32 const rk[], int rounds);
asmlinkage void pmull_gcm_encrypt_block(u8 dst[], u8 const src[],
u32 const rk[], int rounds);
......@@ -368,26 +369,29 @@ static int gcm_encrypt(struct aead_request *req)
pmull_gcm_encrypt_block(ks, iv, NULL,
num_rounds(&ctx->aes_key));
put_unaligned_be32(3, iv + GCM_IV_SIZE);
kernel_neon_end();
err = skcipher_walk_aead_encrypt(&walk, req, true);
err = skcipher_walk_aead_encrypt(&walk, req, false);
while (walk.nbytes >= AES_BLOCK_SIZE) {
int blocks = walk.nbytes / AES_BLOCK_SIZE;
kernel_neon_begin();
pmull_gcm_encrypt(blocks, dg, walk.dst.virt.addr,
walk.src.virt.addr, &ctx->ghash_key,
iv, num_rounds(&ctx->aes_key), ks);
iv, ctx->aes_key.key_enc,
num_rounds(&ctx->aes_key), ks);
kernel_neon_end();
err = skcipher_walk_done(&walk,
walk.nbytes % AES_BLOCK_SIZE);
}
kernel_neon_end();
} else {
__aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv,
num_rounds(&ctx->aes_key));
put_unaligned_be32(2, iv + GCM_IV_SIZE);
err = skcipher_walk_aead_encrypt(&walk, req, true);
err = skcipher_walk_aead_encrypt(&walk, req, false);
while (walk.nbytes >= AES_BLOCK_SIZE) {
int blocks = walk.nbytes / AES_BLOCK_SIZE;
......@@ -467,15 +471,19 @@ static int gcm_decrypt(struct aead_request *req)
pmull_gcm_encrypt_block(tag, iv, ctx->aes_key.key_enc,
num_rounds(&ctx->aes_key));
put_unaligned_be32(2, iv + GCM_IV_SIZE);
kernel_neon_end();
err = skcipher_walk_aead_decrypt(&walk, req, true);
err = skcipher_walk_aead_decrypt(&walk, req, false);
while (walk.nbytes >= AES_BLOCK_SIZE) {
int blocks = walk.nbytes / AES_BLOCK_SIZE;
kernel_neon_begin();
pmull_gcm_decrypt(blocks, dg, walk.dst.virt.addr,
walk.src.virt.addr, &ctx->ghash_key,
iv, num_rounds(&ctx->aes_key));
iv, ctx->aes_key.key_enc,
num_rounds(&ctx->aes_key));
kernel_neon_end();
err = skcipher_walk_done(&walk,
walk.nbytes % AES_BLOCK_SIZE);
......@@ -483,14 +491,12 @@ static int gcm_decrypt(struct aead_request *req)
if (walk.nbytes)
pmull_gcm_encrypt_block(iv, iv, NULL,
num_rounds(&ctx->aes_key));
kernel_neon_end();
} else {
__aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv,
num_rounds(&ctx->aes_key));
put_unaligned_be32(2, iv + GCM_IV_SIZE);
err = skcipher_walk_aead_decrypt(&walk, req, true);
err = skcipher_walk_aead_decrypt(&walk, req, false);
while (walk.nbytes >= AES_BLOCK_SIZE) {
int blocks = walk.nbytes / AES_BLOCK_SIZE;
......
......@@ -69,30 +69,36 @@
* int blocks)
*/
ENTRY(sha1_ce_transform)
frame_push 3
mov x19, x0
mov x20, x1
mov x21, x2
/* load round constants */
loadrc k0.4s, 0x5a827999, w6
0: loadrc k0.4s, 0x5a827999, w6
loadrc k1.4s, 0x6ed9eba1, w6
loadrc k2.4s, 0x8f1bbcdc, w6
loadrc k3.4s, 0xca62c1d6, w6
/* load state */
ld1 {dgav.4s}, [x0]
ldr dgb, [x0, #16]
ld1 {dgav.4s}, [x19]
ldr dgb, [x19, #16]
/* load sha1_ce_state::finalize */
ldr_l w4, sha1_ce_offsetof_finalize, x4
ldr w4, [x0, x4]
ldr w4, [x19, x4]
/* load input */
0: ld1 {v8.4s-v11.4s}, [x1], #64
sub w2, w2, #1
1: ld1 {v8.4s-v11.4s}, [x20], #64
sub w21, w21, #1
CPU_LE( rev32 v8.16b, v8.16b )
CPU_LE( rev32 v9.16b, v9.16b )
CPU_LE( rev32 v10.16b, v10.16b )
CPU_LE( rev32 v11.16b, v11.16b )
1: add t0.4s, v8.4s, k0.4s
2: add t0.4s, v8.4s, k0.4s
mov dg0v.16b, dgav.16b
add_update c, ev, k0, 8, 9, 10, 11, dgb
......@@ -123,16 +129,25 @@ CPU_LE( rev32 v11.16b, v11.16b )
add dgbv.2s, dgbv.2s, dg1v.2s
add dgav.4s, dgav.4s, dg0v.4s
cbnz w2, 0b
cbz w21, 3f
if_will_cond_yield_neon
st1 {dgav.4s}, [x19]
str dgb, [x19, #16]
do_cond_yield_neon
b 0b
endif_yield_neon
b 1b
/*
* Final block: add padding and total bit count.
* Skip if the input size was not a round multiple of the block size,
* the padding is handled by the C code in that case.
*/
cbz x4, 3f
3: cbz x4, 4f
ldr_l w4, sha1_ce_offsetof_count, x4
ldr x4, [x0, x4]
ldr x4, [x19, x4]
movi v9.2d, #0
mov x8, #0x80000000
movi v10.2d, #0
......@@ -141,10 +156,11 @@ CPU_LE( rev32 v11.16b, v11.16b )
mov x4, #0
mov v11.d[0], xzr
mov v11.d[1], x7
b 1b
b 2b
/* store new state */
3: st1 {dgav.4s}, [x0]
str dgb, [x0, #16]
4: st1 {dgav.4s}, [x19]
str dgb, [x19, #16]
frame_pop
ret
ENDPROC(sha1_ce_transform)
......@@ -79,30 +79,36 @@
*/
.text
ENTRY(sha2_ce_transform)
frame_push 3
mov x19, x0
mov x20, x1
mov x21, x2
/* load round constants */
adr_l x8, .Lsha2_rcon
0: adr_l x8, .Lsha2_rcon
ld1 { v0.4s- v3.4s}, [x8], #64
ld1 { v4.4s- v7.4s}, [x8], #64
ld1 { v8.4s-v11.4s}, [x8], #64
ld1 {v12.4s-v15.4s}, [x8]
/* load state */
ld1 {dgav.4s, dgbv.4s}, [x0]
ld1 {dgav.4s, dgbv.4s}, [x19]
/* load sha256_ce_state::finalize */
ldr_l w4, sha256_ce_offsetof_finalize, x4
ldr w4, [x0, x4]
ldr w4, [x19, x4]
/* load input */
0: ld1 {v16.4s-v19.4s}, [x1], #64
sub w2, w2, #1
1: ld1 {v16.4s-v19.4s}, [x20], #64
sub w21, w21, #1
CPU_LE( rev32 v16.16b, v16.16b )
CPU_LE( rev32 v17.16b, v17.16b )
CPU_LE( rev32 v18.16b, v18.16b )
CPU_LE( rev32 v19.16b, v19.16b )
1: add t0.4s, v16.4s, v0.4s
2: add t0.4s, v16.4s, v0.4s
mov dg0v.16b, dgav.16b
mov dg1v.16b, dgbv.16b
......@@ -131,16 +137,24 @@ CPU_LE( rev32 v19.16b, v19.16b )
add dgbv.4s, dgbv.4s, dg1v.4s
/* handled all input blocks? */
cbnz w2, 0b
cbz w21, 3f
if_will_cond_yield_neon
st1 {dgav.4s, dgbv.4s}, [x19]
do_cond_yield_neon
b 0b
endif_yield_neon
b 1b
/*
* Final block: add padding and total bit count.
* Skip if the input size was not a round multiple of the block size,
* the padding is handled by the C code in that case.
*/
cbz x4, 3f
3: cbz x4, 4f
ldr_l w4, sha256_ce_offsetof_count, x4
ldr x4, [x0, x4]
ldr x4, [x19, x4]
movi v17.2d, #0
mov x8, #0x80000000
movi v18.2d, #0
......@@ -149,9 +163,10 @@ CPU_LE( rev32 v19.16b, v19.16b )
mov x4, #0
mov v19.d[0], xzr
mov v19.d[1], x7
b 1b
b 2b
/* store new state */
3: st1 {dgav.4s, dgbv.4s}, [x0]
4: st1 {dgav.4s, dgbv.4s}, [x19]
frame_pop
ret
ENDPROC(sha2_ce_transform)
// SPDX-License-Identifier: GPL-2.0
// This code is taken from the OpenSSL project but the author (Andy Polyakov)
// has relicensed it under the GPLv2. Therefore this program is free software;
// you can redistribute it and/or modify it under the terms of the GNU General
// Public License version 2 as published by the Free Software Foundation.
//
// The original headers, including the original license headers, are
// included below for completeness.
// Copyright 2014-2016 The OpenSSL Project Authors. All Rights Reserved.
//
// Licensed under the OpenSSL license (the "License"). You may not use
......@@ -10,8 +20,6 @@
// project. The module is, however, dual licensed under OpenSSL and
// CRYPTOGAMS licenses depending on where you obtain it. For further
// details see http://www.openssl.org/~appro/cryptogams/.
//
// Permission to use under GPLv2 terms is granted.
// ====================================================================
//
// SHA256/512 for ARMv8.
......
......@@ -41,9 +41,16 @@
*/
.text
ENTRY(sha3_ce_transform)
/* load state */
add x8, x0, #32
ld1 { v0.1d- v3.1d}, [x0]
frame_push 4
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
0: /* load state */
add x8, x19, #32
ld1 { v0.1d- v3.1d}, [x19]
ld1 { v4.1d- v7.1d}, [x8], #32
ld1 { v8.1d-v11.1d}, [x8], #32
ld1 {v12.1d-v15.1d}, [x8], #32
......@@ -51,13 +58,13 @@ ENTRY(sha3_ce_transform)
ld1 {v20.1d-v23.1d}, [x8], #32
ld1 {v24.1d}, [x8]
0: sub w2, w2, #1
1: sub w21, w21, #1
mov w8, #24
adr_l x9, .Lsha3_rcon
/* load input */
ld1 {v25.8b-v28.8b}, [x1], #32
ld1 {v29.8b-v31.8b}, [x1], #24
ld1 {v25.8b-v28.8b}, [x20], #32
ld1 {v29.8b-v31.8b}, [x20], #24
eor v0.8b, v0.8b, v25.8b
eor v1.8b, v1.8b, v26.8b
eor v2.8b, v2.8b, v27.8b
......@@ -66,10 +73,10 @@ ENTRY(sha3_ce_transform)
eor v5.8b, v5.8b, v30.8b
eor v6.8b, v6.8b, v31.8b
tbnz x3, #6, 2f // SHA3-512
tbnz x22, #6, 3f // SHA3-512
ld1 {v25.8b-v28.8b}, [x1], #32
ld1 {v29.8b-v30.8b}, [x1], #16
ld1 {v25.8b-v28.8b}, [x20], #32
ld1 {v29.8b-v30.8b}, [x20], #16
eor v7.8b, v7.8b, v25.8b
eor v8.8b, v8.8b, v26.8b
eor v9.8b, v9.8b, v27.8b
......@@ -77,34 +84,34 @@ ENTRY(sha3_ce_transform)
eor v11.8b, v11.8b, v29.8b
eor v12.8b, v12.8b, v30.8b
tbnz x3, #4, 1f // SHA3-384 or SHA3-224
tbnz x22, #4, 2f // SHA3-384 or SHA3-224
// SHA3-256
ld1 {v25.8b-v28.8b}, [x1], #32
ld1 {v25.8b-v28.8b}, [x20], #32
eor v13.8b, v13.8b, v25.8b
eor v14.8b, v14.8b, v26.8b
eor v15.8b, v15.8b, v27.8b
eor v16.8b, v16.8b, v28.8b
b 3f
b 4f
1: tbz x3, #2, 3f // bit 2 cleared? SHA-384
2: tbz x22, #2, 4f // bit 2 cleared? SHA-384
// SHA3-224
ld1 {v25.8b-v28.8b}, [x1], #32
ld1 {v29.8b}, [x1], #8
ld1 {v25.8b-v28.8b}, [x20], #32
ld1 {v29.8b}, [x20], #8
eor v13.8b, v13.8b, v25.8b
eor v14.8b, v14.8b, v26.8b
eor v15.8b, v15.8b, v27.8b
eor v16.8b, v16.8b, v28.8b
eor v17.8b, v17.8b, v29.8b
b 3f
b 4f
// SHA3-512
2: ld1 {v25.8b-v26.8b}, [x1], #16
3: ld1 {v25.8b-v26.8b}, [x20], #16
eor v7.8b, v7.8b, v25.8b
eor v8.8b, v8.8b, v26.8b
3: sub w8, w8, #1
4: sub w8, w8, #1
eor3 v29.16b, v4.16b, v9.16b, v14.16b
eor3 v26.16b, v1.16b, v6.16b, v11.16b
......@@ -183,17 +190,33 @@ ENTRY(sha3_ce_transform)
eor v0.16b, v0.16b, v31.16b
cbnz w8, 3b
cbnz w2, 0b
cbnz w8, 4b
cbz w21, 5f
if_will_cond_yield_neon
add x8, x19, #32
st1 { v0.1d- v3.1d}, [x19]
st1 { v4.1d- v7.1d}, [x8], #32
st1 { v8.1d-v11.1d}, [x8], #32
st1 {v12.1d-v15.1d}, [x8], #32
st1 {v16.1d-v19.1d}, [x8], #32
st1 {v20.1d-v23.1d}, [x8], #32
st1 {v24.1d}, [x8]
do_cond_yield_neon
b 0b
endif_yield_neon
b 1b
/* save state */
st1 { v0.1d- v3.1d}, [x0], #32
st1 { v4.1d- v7.1d}, [x0], #32
st1 { v8.1d-v11.1d}, [x0], #32
st1 {v12.1d-v15.1d}, [x0], #32
st1 {v16.1d-v19.1d}, [x0], #32
st1 {v20.1d-v23.1d}, [x0], #32
st1 {v24.1d}, [x0]
5: st1 { v0.1d- v3.1d}, [x19], #32
st1 { v4.1d- v7.1d}, [x19], #32
st1 { v8.1d-v11.1d}, [x19], #32
st1 {v12.1d-v15.1d}, [x19], #32
st1 {v16.1d-v19.1d}, [x19], #32
st1 {v20.1d-v23.1d}, [x19], #32
st1 {v24.1d}, [x19]
frame_pop
ret
ENDPROC(sha3_ce_transform)
......
#! /usr/bin/env perl
# SPDX-License-Identifier: GPL-2.0
# This code is taken from the OpenSSL project but the author (Andy Polyakov)
# has relicensed it under the GPLv2. Therefore this program is free software;
# you can redistribute it and/or modify it under the terms of the GNU General
# Public License version 2 as published by the Free Software Foundation.
#
# The original headers, including the original license headers, are
# included below for completeness.
# Copyright 2014-2016 The OpenSSL Project Authors. All Rights Reserved.
#
# Licensed under the OpenSSL license (the "License"). You may not use
......@@ -11,8 +21,6 @@
# project. The module is, however, dual licensed under OpenSSL and
# CRYPTOGAMS licenses depending on where you obtain it. For further
# details see http://www.openssl.org/~appro/cryptogams/.
#
# Permission to use under GPLv2 terms is granted.
# ====================================================================
#
# SHA256/512 for ARMv8.
......
......@@ -107,17 +107,23 @@
*/
.text
ENTRY(sha512_ce_transform)
frame_push 3
mov x19, x0
mov x20, x1
mov x21, x2
/* load state */
ld1 {v8.2d-v11.2d}, [x0]
0: ld1 {v8.2d-v11.2d}, [x19]
/* load first 4 round constants */
adr_l x3, .Lsha512_rcon
ld1 {v20.2d-v23.2d}, [x3], #64
/* load input */
0: ld1 {v12.2d-v15.2d}, [x1], #64
ld1 {v16.2d-v19.2d}, [x1], #64
sub w2, w2, #1
1: ld1 {v12.2d-v15.2d}, [x20], #64
ld1 {v16.2d-v19.2d}, [x20], #64
sub w21, w21, #1
CPU_LE( rev64 v12.16b, v12.16b )
CPU_LE( rev64 v13.16b, v13.16b )
......@@ -196,9 +202,18 @@ CPU_LE( rev64 v19.16b, v19.16b )
add v11.2d, v11.2d, v3.2d
/* handled all input blocks? */
cbnz w2, 0b
cbz w21, 3f
if_will_cond_yield_neon
st1 {v8.2d-v11.2d}, [x19]
do_cond_yield_neon
b 0b
endif_yield_neon
b 1b
/* store new state */
3: st1 {v8.2d-v11.2d}, [x0]
3: st1 {v8.2d-v11.2d}, [x19]
frame_pop
ret
ENDPROC(sha512_ce_transform)
// SPDX-License-Identifier: GPL-2.0
// This code is taken from the OpenSSL project but the author (Andy Polyakov)
// has relicensed it under the GPLv2. Therefore this program is free software;
// you can redistribute it and/or modify it under the terms of the GNU General
// Public License version 2 as published by the Free Software Foundation.
//
// The original headers, including the original license headers, are
// included below for completeness.
// Copyright 2014-2016 The OpenSSL Project Authors. All Rights Reserved.
//
// Licensed under the OpenSSL license (the "License"). You may not use
......@@ -10,8 +20,6 @@
// project. The module is, however, dual licensed under OpenSSL and
// CRYPTOGAMS licenses depending on where you obtain it. For further
// details see http://www.openssl.org/~appro/cryptogams/.
//
// Permission to use under GPLv2 terms is granted.
// ====================================================================
//
// SHA256/512 for ARMv8.
......
// SPDX-License-Identifier: GPL-2.0
#include <linux/linkage.h>
#include <asm/assembler.h>
.irp b, 0, 1, 2, 3, 4, 5, 6, 7, 8
.set .Lv\b\().4s, \b
.endr
.macro sm4e, rd, rn
.inst 0xcec08400 | .L\rd | (.L\rn << 5)
.endm
/*
* void sm4_ce_do_crypt(const u32 *rk, u32 *out, const u32 *in);
*/
.text
ENTRY(sm4_ce_do_crypt)
ld1 {v8.4s}, [x2]
ld1 {v0.4s-v3.4s}, [x0], #64
CPU_LE( rev32 v8.16b, v8.16b )
ld1 {v4.4s-v7.4s}, [x0]
sm4e v8.4s, v0.4s
sm4e v8.4s, v1.4s
sm4e v8.4s, v2.4s
sm4e v8.4s, v3.4s
sm4e v8.4s, v4.4s
sm4e v8.4s, v5.4s
sm4e v8.4s, v6.4s
sm4e v8.4s, v7.4s
rev64 v8.4s, v8.4s
ext v8.16b, v8.16b, v8.16b, #8
CPU_LE( rev32 v8.16b, v8.16b )
st1 {v8.4s}, [x1]
ret
ENDPROC(sm4_ce_do_crypt)
// SPDX-License-Identifier: GPL-2.0
#include <asm/neon.h>
#include <asm/simd.h>
#include <crypto/sm4.h>
#include <linux/module.h>
#include <linux/cpufeature.h>
#include <linux/crypto.h>
#include <linux/types.h>
MODULE_ALIAS_CRYPTO("sm4");
MODULE_ALIAS_CRYPTO("sm4-ce");
MODULE_DESCRIPTION("SM4 symmetric cipher using ARMv8 Crypto Extensions");
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
MODULE_LICENSE("GPL v2");
asmlinkage void sm4_ce_do_crypt(const u32 *rk, void *out, const void *in);
static void sm4_ce_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
{
const struct crypto_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
if (!may_use_simd()) {
crypto_sm4_encrypt(tfm, out, in);
} else {
kernel_neon_begin();
sm4_ce_do_crypt(ctx->rkey_enc, out, in);
kernel_neon_end();
}
}
static void sm4_ce_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
{
const struct crypto_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
if (!may_use_simd()) {
crypto_sm4_decrypt(tfm, out, in);
} else {
kernel_neon_begin();
sm4_ce_do_crypt(ctx->rkey_dec, out, in);
kernel_neon_end();
}
}
static struct crypto_alg sm4_ce_alg = {
.cra_name = "sm4",
.cra_driver_name = "sm4-ce",
.cra_priority = 200,
.cra_flags = CRYPTO_ALG_TYPE_CIPHER,
.cra_blocksize = SM4_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct crypto_sm4_ctx),
.cra_module = THIS_MODULE,
.cra_u.cipher = {
.cia_min_keysize = SM4_KEY_SIZE,
.cia_max_keysize = SM4_KEY_SIZE,
.cia_setkey = crypto_sm4_set_key,
.cia_encrypt = sm4_ce_encrypt,
.cia_decrypt = sm4_ce_decrypt
}
};
static int __init sm4_ce_mod_init(void)
{
return crypto_register_alg(&sm4_ce_alg);
}
static void __exit sm4_ce_mod_fini(void)
{
crypto_unregister_alg(&sm4_ce_alg);
}
module_cpu_feature_match(SM3, sm4_ce_mod_init);
module_exit(sm4_ce_mod_fini);
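
The glue code above registers the accelerated cipher under the generic "sm4" name (driver "sm4-ce", priority 200), so existing users of the single-block cipher API pick it up transparently. A minimal, hypothetical caller might look as follows; the helper name and trimmed error handling are assumptions for illustration.

    #include <crypto/sm4.h>
    #include <linux/crypto.h>
    #include <linux/err.h>

    static int sm4_one_block_demo(const u8 key[SM4_KEY_SIZE],
                                  const u8 in[SM4_BLOCK_SIZE],
                                  u8 out[SM4_BLOCK_SIZE])
    {
            struct crypto_cipher *tfm;
            int err;

            /* resolves to "sm4-ce" when the CPU advertises the SM4 instructions */
            tfm = crypto_alloc_cipher("sm4", 0, 0);
            if (IS_ERR(tfm))
                    return PTR_ERR(tfm);

            err = crypto_cipher_setkey(tfm, key, SM4_KEY_SIZE);
            if (!err)
                    crypto_cipher_encrypt_one(tfm, out, in);

            crypto_free_cipher(tfm);
            return err;
    }

Because the glue checks may_use_simd() and falls back to the generic crypto_sm4_encrypt()/crypto_sm4_decrypt() helpers, the call is safe even from contexts where the NEON register file is unavailable.
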
......@@ -15,7 +15,6 @@ obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o
obj-$(CONFIG_CRYPTO_AES_586) += aes-i586.o
obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
obj-$(CONFIG_CRYPTO_SALSA20_586) += salsa20-i586.o
obj-$(CONFIG_CRYPTO_SERPENT_SSE2_586) += serpent-sse2-i586.o
obj-$(CONFIG_CRYPTO_AES_X86_64) += aes-x86_64.o
......@@ -24,7 +23,6 @@ obj-$(CONFIG_CRYPTO_CAMELLIA_X86_64) += camellia-x86_64.o
obj-$(CONFIG_CRYPTO_BLOWFISH_X86_64) += blowfish-x86_64.o
obj-$(CONFIG_CRYPTO_TWOFISH_X86_64) += twofish-x86_64.o
obj-$(CONFIG_CRYPTO_TWOFISH_X86_64_3WAY) += twofish-x86_64-3way.o
obj-$(CONFIG_CRYPTO_SALSA20_X86_64) += salsa20-x86_64.o
obj-$(CONFIG_CRYPTO_CHACHA20_X86_64) += chacha20-x86_64.o
obj-$(CONFIG_CRYPTO_SERPENT_SSE2_X86_64) += serpent-sse2-x86_64.o
obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
......@@ -38,6 +36,16 @@ obj-$(CONFIG_CRYPTO_SHA512_SSSE3) += sha512-ssse3.o
obj-$(CONFIG_CRYPTO_CRCT10DIF_PCLMUL) += crct10dif-pclmul.o
obj-$(CONFIG_CRYPTO_POLY1305_X86_64) += poly1305-x86_64.o
obj-$(CONFIG_CRYPTO_AEGIS128_AESNI_SSE2) += aegis128-aesni.o
obj-$(CONFIG_CRYPTO_AEGIS128L_AESNI_SSE2) += aegis128l-aesni.o
obj-$(CONFIG_CRYPTO_AEGIS256_AESNI_SSE2) += aegis256-aesni.o
obj-$(CONFIG_CRYPTO_MORUS640_GLUE) += morus640_glue.o
obj-$(CONFIG_CRYPTO_MORUS1280_GLUE) += morus1280_glue.o
obj-$(CONFIG_CRYPTO_MORUS640_SSE2) += morus640-sse2.o
obj-$(CONFIG_CRYPTO_MORUS1280_SSE2) += morus1280-sse2.o
# These modules require assembler to support AVX.
ifeq ($(avx_supported),yes)
obj-$(CONFIG_CRYPTO_CAMELLIA_AESNI_AVX_X86_64) += \
......@@ -55,11 +63,12 @@ ifeq ($(avx2_supported),yes)
obj-$(CONFIG_CRYPTO_SHA1_MB) += sha1-mb/
obj-$(CONFIG_CRYPTO_SHA256_MB) += sha256-mb/
obj-$(CONFIG_CRYPTO_SHA512_MB) += sha512-mb/
obj-$(CONFIG_CRYPTO_MORUS1280_AVX2) += morus1280-avx2.o
endif
aes-i586-y := aes-i586-asm_32.o aes_glue.o
twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o
salsa20-i586-y := salsa20-i586-asm_32.o salsa20_glue.o
serpent-sse2-i586-y := serpent-sse2-i586-asm_32.o serpent_sse2_glue.o
aes-x86_64-y := aes-x86_64-asm_64.o aes_glue.o
......@@ -68,10 +77,16 @@ camellia-x86_64-y := camellia-x86_64-asm_64.o camellia_glue.o
blowfish-x86_64-y := blowfish-x86_64-asm_64.o blowfish_glue.o
twofish-x86_64-y := twofish-x86_64-asm_64.o twofish_glue.o
twofish-x86_64-3way-y := twofish-x86_64-asm_64-3way.o twofish_glue_3way.o
salsa20-x86_64-y := salsa20-x86_64-asm_64.o salsa20_glue.o
chacha20-x86_64-y := chacha20-ssse3-x86_64.o chacha20_glue.o
serpent-sse2-x86_64-y := serpent-sse2-x86_64-asm_64.o serpent_sse2_glue.o
aegis128-aesni-y := aegis128-aesni-asm.o aegis128-aesni-glue.o
aegis128l-aesni-y := aegis128l-aesni-asm.o aegis128l-aesni-glue.o
aegis256-aesni-y := aegis256-aesni-asm.o aegis256-aesni-glue.o
morus640-sse2-y := morus640-sse2-asm.o morus640-sse2-glue.o
morus1280-sse2-y := morus1280-sse2-asm.o morus1280-sse2-glue.o
ifeq ($(avx_supported),yes)
camellia-aesni-avx-x86_64-y := camellia-aesni-avx-asm_64.o \
camellia_aesni_avx_glue.o
......@@ -87,6 +102,8 @@ ifeq ($(avx2_supported),yes)
camellia-aesni-avx2-y := camellia-aesni-avx2-asm_64.o camellia_aesni_avx2_glue.o
chacha20-x86_64-y += chacha20-avx2-x86_64.o
serpent-avx2-y := serpent-avx2-asm_64.o serpent_avx2_glue.o
morus1280-avx2-y := morus1280-avx2-asm.o morus1280-avx2-glue.o
endif
aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o
......
/*
* AES-NI + SSE2 implementation of AEGIS-128
*
* Copyright (c) 2017-2018 Ondrej Mosnacek <omosnacek@gmail.com>
* Copyright (C) 2017-2018 Red Hat, Inc. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 as published
* by the Free Software Foundation.
*/
#include <linux/linkage.h>
#include <asm/frame.h>
#define STATE0 %xmm0
#define STATE1 %xmm1
#define STATE2 %xmm2
#define STATE3 %xmm3
#define STATE4 %xmm4
#define KEY %xmm5
#define MSG %xmm5
#define T0 %xmm6
#define T1 %xmm7
#define STATEP %rdi
#define LEN %rsi
#define SRC %rdx
#define DST %rcx
.section .rodata.cst16.aegis128_const, "aM", @progbits, 32
.align 16
.Laegis128_const_0:
.byte 0x00, 0x01, 0x01, 0x02, 0x03, 0x05, 0x08, 0x0d
.byte 0x15, 0x22, 0x37, 0x59, 0x90, 0xe9, 0x79, 0x62
.Laegis128_const_1:
.byte 0xdb, 0x3d, 0x18, 0x55, 0x6d, 0xc2, 0x2f, 0xf1
.byte 0x20, 0x11, 0x31, 0x42, 0x73, 0xb5, 0x28, 0xdd
.section .rodata.cst16.aegis128_counter, "aM", @progbits, 16
.align 16
.Laegis128_counter:
.byte 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
.byte 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
.text
/*
* aegis128_update
* input:
* STATE[0-4] - input state
* output:
* STATE[0-4] - output state (shifted positions)
* changed:
* T0
*/
.macro aegis128_update
movdqa STATE4, T0
aesenc STATE0, STATE4
aesenc STATE1, STATE0
aesenc STATE2, STATE1
aesenc STATE3, STATE2
aesenc T0, STATE3
.endm
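For readers following the round structure, here is a rough userspace sketch of the same state update using AES-NI intrinsics. It is not part of the patch, and aegis128_update_c is a name invented for illustration; it matches the macro above up to the register-role rotation performed by its callers.
#include <emmintrin.h>
#include <wmmintrin.h>		/* _mm_aesenc_si128 */

/* One AEGIS-128 state update: S'[i] = AESRound(S[i-1], S[i]).  The message
 * block is folded into S'[0] afterwards, which is equivalent to XORing it
 * into the round key, since one aesenc round computes
 * MixColumns(ShiftRows(SubBytes(x))) XOR k. */
static inline void aegis128_update_c(__m128i s[5], __m128i msg)
{
	__m128i t = s[4];

	s[4] = _mm_aesenc_si128(s[3], s[4]);
	s[3] = _mm_aesenc_si128(s[2], s[3]);
	s[2] = _mm_aesenc_si128(s[1], s[2]);
	s[1] = _mm_aesenc_si128(s[0], s[1]);
	s[0] = _mm_xor_si128(_mm_aesenc_si128(t, s[0]), msg);
}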
/*
* __load_partial: internal ABI
* input:
* LEN - bytes
* SRC - src
* output:
* MSG - message block
* changed:
* T0
* %r8
* %r9
*/
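/*
* Note (added for clarity, not in the original patch): the routine gathers
* the at most 15 trailing bytes into MSG without reading past SRC + LEN.
* The 1-, 2- and 4-byte tail pieces are collected in %r9, highest-addressed
* piece first, then moved into MSG, and the leading 8-byte piece, if
* present, is folded into the low half.
*/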
__load_partial:
xor %r9, %r9
pxor MSG, MSG
mov LEN, %r8
and $0x1, %r8
jz .Lld_partial_1
mov LEN, %r8
and $0x1E, %r8
add SRC, %r8
mov (%r8), %r9b
.Lld_partial_1:
mov LEN, %r8
and $0x2, %r8
jz .Lld_partial_2
mov LEN, %r8
and $0x1C, %r8
add SRC, %r8
shl $0x10, %r9
mov (%r8), %r9w
.Lld_partial_2:
mov LEN, %r8
and $0x4, %r8
jz .Lld_partial_4
mov LEN, %r8
and $0x18, %r8
add SRC, %r8
shl $32, %r9
mov (%r8), %r8d
xor %r8, %r9
.Lld_partial_4:
movq %r9, MSG
mov LEN, %r8
and $0x8, %r8
jz .Lld_partial_8
mov LEN, %r8
and $0x10, %r8
add SRC, %r8
pslldq $8, MSG
movq (%r8), T0
pxor T0, MSG
.Lld_partial_8:
ret
ENDPROC(__load_partial)
/*
* __store_partial: internal ABI
* input:
* LEN - bytes
* DST - dst
* output:
* T0 - message block
* changed:
* %r8
* %r9
* %r10
*/
__store_partial:
mov LEN, %r8
mov DST, %r9
movq T0, %r10
cmp $8, %r8
jl .Lst_partial_8
mov %r10, (%r9)
psrldq $8, T0
movq T0, %r10
sub $8, %r8
add $8, %r9
.Lst_partial_8:
cmp $4, %r8
jl .Lst_partial_4
mov %r10d, (%r9)
shr $32, %r10
sub $4, %r8
add $4, %r9
.Lst_partial_4:
cmp $2, %r8
jl .Lst_partial_2
mov %r10w, (%r9)
shr $0x10, %r10
sub $2, %r8
add $2, %r9
.Lst_partial_2:
cmp $1, %r8
jl .Lst_partial_1
mov %r10b, (%r9)
.Lst_partial_1:
ret
ENDPROC(__store_partial)
/*
* void crypto_aegis128_aesni_init(void *state, const void *key, const void *iv);
*/
ENTRY(crypto_aegis128_aesni_init)
FRAME_BEGIN
/* load IV: */
movdqu (%rdx), T1
/* load key: */
movdqa (%rsi), KEY
pxor KEY, T1
movdqa T1, STATE0
movdqa KEY, STATE3
movdqa KEY, STATE4
/* load the constants: */
movdqa .Laegis128_const_0, STATE2
movdqa .Laegis128_const_1, STATE1
pxor STATE2, STATE3
pxor STATE1, STATE4
/* update 10 times with KEY / KEY xor IV: */
aegis128_update; pxor KEY, STATE4
aegis128_update; pxor T1, STATE3
aegis128_update; pxor KEY, STATE2
aegis128_update; pxor T1, STATE1
aegis128_update; pxor KEY, STATE0
aegis128_update; pxor T1, STATE4
aegis128_update; pxor KEY, STATE3
aegis128_update; pxor T1, STATE2
aegis128_update; pxor KEY, STATE1
aegis128_update; pxor T1, STATE0
/* store the state: */
movdqu STATE0, 0x00(STATEP)
movdqu STATE1, 0x10(STATEP)
movdqu STATE2, 0x20(STATEP)
movdqu STATE3, 0x30(STATEP)
movdqu STATE4, 0x40(STATEP)
FRAME_END
ret
ENDPROC(crypto_aegis128_aesni_init)
/*
* void crypto_aegis128_aesni_ad(void *state, unsigned int length,
* const void *data);
*/
ENTRY(crypto_aegis128_aesni_ad)
FRAME_BEGIN
cmp $0x10, LEN
jb .Lad_out
/* load the state: */
movdqu 0x00(STATEP), STATE0
movdqu 0x10(STATEP), STATE1
movdqu 0x20(STATEP), STATE2
movdqu 0x30(STATEP), STATE3
movdqu 0x40(STATEP), STATE4
mov SRC, %r8
and $0xF, %r8
jnz .Lad_u_loop
.align 8
.Lad_a_loop:
movdqa 0x00(SRC), MSG
aegis128_update
pxor MSG, STATE4
sub $0x10, LEN
cmp $0x10, LEN
jl .Lad_out_1
movdqa 0x10(SRC), MSG
aegis128_update
pxor MSG, STATE3
sub $0x10, LEN
cmp $0x10, LEN
jl .Lad_out_2
movdqa 0x20(SRC), MSG
aegis128_update
pxor MSG, STATE2
sub $0x10, LEN
cmp $0x10, LEN
jl .Lad_out_3
movdqa 0x30(SRC), MSG
aegis128_update
pxor MSG, STATE1
sub $0x10, LEN
cmp $0x10, LEN
jl .Lad_out_4
movdqa 0x40(SRC), MSG
aegis128_update
pxor MSG, STATE0
sub $0x10, LEN
cmp $0x10, LEN
jl .Lad_out_0
add $0x50, SRC
jmp .Lad_a_loop
.align 8
.Lad_u_loop:
movdqu 0x00(SRC), MSG
aegis128_update
pxor MSG, STATE4
sub $0x10, LEN
cmp $0x10, LEN
jl .Lad_out_1
movdqu 0x10(SRC), MSG
aegis128_update
pxor MSG, STATE3
sub $0x10, LEN
cmp $0x10, LEN
jl .Lad_out_2
movdqu 0x20(SRC), MSG
aegis128_update
pxor MSG, STATE2
sub $0x10, LEN
cmp $0x10, LEN
jl .Lad_out_3
movdqu 0x30(SRC), MSG
aegis128_update
pxor MSG, STATE1
sub $0x10, LEN
cmp $0x10, LEN
jl .Lad_out_4
movdqu 0x40(SRC), MSG
aegis128_update
pxor MSG, STATE0
sub $0x10, LEN
cmp $0x10, LEN
jl .Lad_out_0
add $0x50, SRC
jmp .Lad_u_loop
/* store the state: */
.Lad_out_0:
movdqu STATE0, 0x00(STATEP)
movdqu STATE1, 0x10(STATEP)
movdqu STATE2, 0x20(STATEP)
movdqu STATE3, 0x30(STATEP)
movdqu STATE4, 0x40(STATEP)
FRAME_END
ret
.Lad_out_1:
movdqu STATE4, 0x00(STATEP)
movdqu STATE0, 0x10(STATEP)
movdqu STATE1, 0x20(STATEP)
movdqu STATE2, 0x30(STATEP)
movdqu STATE3, 0x40(STATEP)
FRAME_END
ret
.Lad_out_2:
movdqu STATE3, 0x00(STATEP)
movdqu STATE4, 0x10(STATEP)
movdqu STATE0, 0x20(STATEP)
movdqu STATE1, 0x30(STATEP)
movdqu STATE2, 0x40(STATEP)
FRAME_END
ret
.Lad_out_3:
movdqu STATE2, 0x00(STATEP)
movdqu STATE3, 0x10(STATEP)
movdqu STATE4, 0x20(STATEP)
movdqu STATE0, 0x30(STATEP)
movdqu STATE1, 0x40(STATEP)
FRAME_END
ret
.Lad_out_4:
movdqu STATE1, 0x00(STATEP)
movdqu STATE2, 0x10(STATEP)
movdqu STATE3, 0x20(STATEP)
movdqu STATE4, 0x30(STATEP)
movdqu STATE0, 0x40(STATEP)
FRAME_END
ret
.Lad_out:
FRAME_END
ret
ENDPROC(crypto_aegis128_aesni_ad)
.macro encrypt_block a s0 s1 s2 s3 s4 i
movdq\a (\i * 0x10)(SRC), MSG
movdqa MSG, T0
pxor \s1, T0
pxor \s4, T0
movdqa \s2, T1
pand \s3, T1
pxor T1, T0
movdq\a T0, (\i * 0x10)(DST)
aegis128_update
pxor MSG, \s4
sub $0x10, LEN
cmp $0x10, LEN
jl .Lenc_out_\i
.endm
/*
* void crypto_aegis128_aesni_enc(void *state, unsigned int length,
* const void *src, void *dst);
*/
ENTRY(crypto_aegis128_aesni_enc)
FRAME_BEGIN
cmp $0x10, LEN
jb .Lenc_out
/* load the state: */
movdqu 0x00(STATEP), STATE0
movdqu 0x10(STATEP), STATE1
movdqu 0x20(STATEP), STATE2
movdqu 0x30(STATEP), STATE3
movdqu 0x40(STATEP), STATE4
mov SRC, %r8
or DST, %r8
and $0xF, %r8
jnz .Lenc_u_loop
.align 8
.Lenc_a_loop:
encrypt_block a STATE0 STATE1 STATE2 STATE3 STATE4 0
encrypt_block a STATE4 STATE0 STATE1 STATE2 STATE3 1
encrypt_block a STATE3 STATE4 STATE0 STATE1 STATE2 2
encrypt_block a STATE2 STATE3 STATE4 STATE0 STATE1 3
encrypt_block a STATE1 STATE2 STATE3 STATE4 STATE0 4
add $0x50, SRC
add $0x50, DST
jmp .Lenc_a_loop
.align 8
.Lenc_u_loop:
encrypt_block u STATE0 STATE1 STATE2 STATE3 STATE4 0
encrypt_block u STATE4 STATE0 STATE1 STATE2 STATE3 1
encrypt_block u STATE3 STATE4 STATE0 STATE1 STATE2 2
encrypt_block u STATE2 STATE3 STATE4 STATE0 STATE1 3
encrypt_block u STATE1 STATE2 STATE3 STATE4 STATE0 4
add $0x50, SRC
add $0x50, DST
jmp .Lenc_u_loop
/* store the state: */
.Lenc_out_0:
movdqu STATE4, 0x00(STATEP)
movdqu STATE0, 0x10(STATEP)
movdqu STATE1, 0x20(STATEP)
movdqu STATE2, 0x30(STATEP)
movdqu STATE3, 0x40(STATEP)
FRAME_END
ret
.Lenc_out_1:
movdqu STATE3, 0x00(STATEP)
movdqu STATE4, 0x10(STATEP)
movdqu STATE0, 0x20(STATEP)
movdqu STATE1, 0x30(STATEP)
movdqu STATE2, 0x40(STATEP)
FRAME_END
ret
.Lenc_out_2:
movdqu STATE2, 0x00(STATEP)
movdqu STATE3, 0x10(STATEP)
movdqu STATE4, 0x20(STATEP)
movdqu STATE0, 0x30(STATEP)
movdqu STATE1, 0x40(STATEP)
FRAME_END
ret
.Lenc_out_3:
movdqu STATE1, 0x00(STATEP)
movdqu STATE2, 0x10(STATEP)
movdqu STATE3, 0x20(STATEP)
movdqu STATE4, 0x30(STATEP)
movdqu STATE0, 0x40(STATEP)
FRAME_END
ret
.Lenc_out_4:
movdqu STATE0, 0x00(STATEP)
movdqu STATE1, 0x10(STATEP)
movdqu STATE2, 0x20(STATEP)
movdqu STATE3, 0x30(STATEP)
movdqu STATE4, 0x40(STATEP)
FRAME_END
ret
.Lenc_out:
FRAME_END
ret
ENDPROC(crypto_aegis128_aesni_enc)
/*
* void crypto_aegis128_aesni_enc_tail(void *state, unsigned int length,
* const void *src, void *dst);
*/
ENTRY(crypto_aegis128_aesni_enc_tail)
FRAME_BEGIN
/* load the state: */
movdqu 0x00(STATEP), STATE0
movdqu 0x10(STATEP), STATE1
movdqu 0x20(STATEP), STATE2
movdqu 0x30(STATEP), STATE3
movdqu 0x40(STATEP), STATE4
/* encrypt message: */
call __load_partial
movdqa MSG, T0
pxor STATE1, T0
pxor STATE4, T0
movdqa STATE2, T1
pand STATE3, T1
pxor T1, T0
call __store_partial
aegis128_update
pxor MSG, STATE4
/* store the state: */
movdqu STATE4, 0x00(STATEP)
movdqu STATE0, 0x10(STATEP)
movdqu STATE1, 0x20(STATEP)
movdqu STATE2, 0x30(STATEP)
movdqu STATE3, 0x40(STATEP)
FRAME_END
ret
ENDPROC(crypto_aegis128_aesni_enc_tail)
.macro decrypt_block a s0 s1 s2 s3 s4 i
movdq\a (\i * 0x10)(SRC), MSG
pxor \s1, MSG
pxor \s4, MSG
movdqa \s2, T1
pand \s3, T1
pxor T1, MSG
movdq\a MSG, (\i * 0x10)(DST)
aegis128_update
pxor MSG, \s4
sub $0x10, LEN
cmp $0x10, LEN
jl .Ldec_out_\i
.endm
/*
* void crypto_aegis128_aesni_dec(void *state, unsigned int length,
* const void *src, void *dst);
*/
ENTRY(crypto_aegis128_aesni_dec)
FRAME_BEGIN
cmp $0x10, LEN
jb .Ldec_out
/* load the state: */
movdqu 0x00(STATEP), STATE0
movdqu 0x10(STATEP), STATE1
movdqu 0x20(STATEP), STATE2
movdqu 0x30(STATEP), STATE3
movdqu 0x40(STATEP), STATE4
mov SRC, %r8
or DST, %r8
and $0xF, %r8
jnz .Ldec_u_loop
.align 8
.Ldec_a_loop:
decrypt_block a STATE0 STATE1 STATE2 STATE3 STATE4 0
decrypt_block a STATE4 STATE0 STATE1 STATE2 STATE3 1
decrypt_block a STATE3 STATE4 STATE0 STATE1 STATE2 2
decrypt_block a STATE2 STATE3 STATE4 STATE0 STATE1 3
decrypt_block a STATE1 STATE2 STATE3 STATE4 STATE0 4
add $0x50, SRC
add $0x50, DST
jmp .Ldec_a_loop
.align 8
.Ldec_u_loop:
decrypt_block u STATE0 STATE1 STATE2 STATE3 STATE4 0
decrypt_block u STATE4 STATE0 STATE1 STATE2 STATE3 1
decrypt_block u STATE3 STATE4 STATE0 STATE1 STATE2 2
decrypt_block u STATE2 STATE3 STATE4 STATE0 STATE1 3
decrypt_block u STATE1 STATE2 STATE3 STATE4 STATE0 4
add $0x50, SRC
add $0x50, DST
jmp .Ldec_u_loop
/* store the state: */
.Ldec_out_0:
movdqu STATE4, 0x00(STATEP)
movdqu STATE0, 0x10(STATEP)
movdqu STATE1, 0x20(STATEP)
movdqu STATE2, 0x30(STATEP)
movdqu STATE3, 0x40(STATEP)
FRAME_END
ret
.Ldec_out_1:
movdqu STATE3, 0x00(STATEP)
movdqu STATE4, 0x10(STATEP)
movdqu STATE0, 0x20(STATEP)
movdqu STATE1, 0x30(STATEP)
movdqu STATE2, 0x40(STATEP)
FRAME_END
ret
.Ldec_out_2:
movdqu STATE2, 0x00(STATEP)
movdqu STATE3, 0x10(STATEP)
movdqu STATE4, 0x20(STATEP)
movdqu STATE0, 0x30(STATEP)
movdqu STATE1, 0x40(STATEP)
FRAME_END
ret
.Ldec_out_3:
movdqu STATE1, 0x00(STATEP)
movdqu STATE2, 0x10(STATEP)
movdqu STATE3, 0x20(STATEP)
movdqu STATE4, 0x30(STATEP)
movdqu STATE0, 0x40(STATEP)
FRAME_END
ret
.Ldec_out_4:
movdqu STATE0, 0x00(STATEP)
movdqu STATE1, 0x10(STATEP)
movdqu STATE2, 0x20(STATEP)
movdqu STATE3, 0x30(STATEP)
movdqu STATE4, 0x40(STATEP)
FRAME_END
ret
.Ldec_out:
FRAME_END
ret
ENDPROC(crypto_aegis128_aesni_dec)
/*
* void crypto_aegis128_aesni_dec_tail(void *state, unsigned int length,
* const void *src, void *dst);
*/
ENTRY(crypto_aegis128_aesni_dec_tail)
FRAME_BEGIN
/* load the state: */
movdqu 0x00(STATEP), STATE0
movdqu 0x10(STATEP), STATE1
movdqu 0x20(STATEP), STATE2
movdqu 0x30(STATEP), STATE3
movdqu 0x40(STATEP), STATE4
/* decrypt message: */
call __load_partial
pxor STATE1, MSG
pxor STATE4, MSG
movdqa STATE2, T1
pand STATE3, T1
pxor T1, MSG
movdqa MSG, T0
call __store_partial
/* mask with byte count: */
movq LEN, T0
punpcklbw T0, T0
punpcklbw T0, T0
punpcklbw T0, T0
punpcklbw T0, T0
movdqa .Laegis128_counter, T1
pcmpgtb T1, T0
pand T0, MSG
aegis128_update
pxor MSG, STATE4
/* store the state: */
movdqu STATE4, 0x00(STATEP)
movdqu STATE0, 0x10(STATEP)
movdqu STATE1, 0x20(STATEP)
movdqu STATE2, 0x30(STATEP)
movdqu STATE3, 0x40(STATEP)
FRAME_END
ret
ENDPROC(crypto_aegis128_aesni_dec_tail)
/*
* void crypto_aegis128_aesni_final(void *state, void *tag_xor,
* u64 assoclen, u64 cryptlen);
*/
ENTRY(crypto_aegis128_aesni_final)
FRAME_BEGIN
/* load the state: */
movdqu 0x00(STATEP), STATE0
movdqu 0x10(STATEP), STATE1
movdqu 0x20(STATEP), STATE2
movdqu 0x30(STATEP), STATE3
movdqu 0x40(STATEP), STATE4
/* prepare length block: */
movq %rdx, MSG
movq %rcx, T0
pslldq $8, T0
pxor T0, MSG
psllq $3, MSG /* multiply by 8 (to get bit count) */
pxor STATE3, MSG
/* update state: */
aegis128_update; pxor MSG, STATE4
aegis128_update; pxor MSG, STATE3
aegis128_update; pxor MSG, STATE2
aegis128_update; pxor MSG, STATE1
aegis128_update; pxor MSG, STATE0
aegis128_update; pxor MSG, STATE4
aegis128_update; pxor MSG, STATE3
/* xor tag: */
movdqu (%rsi), MSG
pxor STATE0, MSG
pxor STATE1, MSG
pxor STATE2, MSG
pxor STATE3, MSG
pxor STATE4, MSG
movdqu MSG, (%rsi)
FRAME_END
ret
ENDPROC(crypto_aegis128_aesni_final)
/*
* The AEGIS-128 Authenticated-Encryption Algorithm
* Glue for AES-NI + SSE2 implementation
*
* Copyright (c) 2017-2018 Ondrej Mosnacek <omosnacek@gmail.com>
* Copyright (C) 2017-2018 Red Hat, Inc. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License as published by the Free
* Software Foundation; either version 2 of the License, or (at your option)
* any later version.
*/
#include <crypto/cryptd.h>
#include <crypto/internal/aead.h>
#include <crypto/internal/skcipher.h>
#include <crypto/scatterwalk.h>
#include <linux/module.h>
#include <asm/fpu/api.h>
#include <asm/cpu_device_id.h>
#define AEGIS128_BLOCK_ALIGN 16
#define AEGIS128_BLOCK_SIZE 16
#define AEGIS128_NONCE_SIZE 16
#define AEGIS128_STATE_BLOCKS 5
#define AEGIS128_KEY_SIZE 16
#define AEGIS128_MIN_AUTH_SIZE 8
#define AEGIS128_MAX_AUTH_SIZE 16
asmlinkage void crypto_aegis128_aesni_init(void *state, void *key, void *iv);
asmlinkage void crypto_aegis128_aesni_ad(
void *state, unsigned int length, const void *data);
asmlinkage void crypto_aegis128_aesni_enc(
void *state, unsigned int length, const void *src, void *dst);
asmlinkage void crypto_aegis128_aesni_dec(
void *state, unsigned int length, const void *src, void *dst);
asmlinkage void crypto_aegis128_aesni_enc_tail(
void *state, unsigned int length, const void *src, void *dst);
asmlinkage void crypto_aegis128_aesni_dec_tail(
void *state, unsigned int length, const void *src, void *dst);
asmlinkage void crypto_aegis128_aesni_final(
void *state, void *tag_xor, unsigned int assoclen,
unsigned int cryptlen);
struct aegis_block {
u8 bytes[AEGIS128_BLOCK_SIZE] __aligned(AEGIS128_BLOCK_ALIGN);
};
struct aegis_state {
struct aegis_block blocks[AEGIS128_STATE_BLOCKS];
};
struct aegis_ctx {
struct aegis_block key;
};
struct aegis_crypt_ops {
int (*skcipher_walk_init)(struct skcipher_walk *walk,
struct aead_request *req, bool atomic);
void (*crypt_blocks)(void *state, unsigned int length, const void *src,
void *dst);
void (*crypt_tail)(void *state, unsigned int length, const void *src,
void *dst);
};
static void crypto_aegis128_aesni_process_ad(
struct aegis_state *state, struct scatterlist *sg_src,
unsigned int assoclen)
{
struct scatter_walk walk;
struct aegis_block buf;
unsigned int pos = 0;
scatterwalk_start(&walk, sg_src);
while (assoclen != 0) {
unsigned int size = scatterwalk_clamp(&walk, assoclen);
unsigned int left = size;
void *mapped = scatterwalk_map(&walk);
const u8 *src = (const u8 *)mapped;
if (pos + size >= AEGIS128_BLOCK_SIZE) {
if (pos > 0) {
unsigned int fill = AEGIS128_BLOCK_SIZE - pos;
memcpy(buf.bytes + pos, src, fill);
crypto_aegis128_aesni_ad(state,
AEGIS128_BLOCK_SIZE,
buf.bytes);
pos = 0;
left -= fill;
src += fill;
}
crypto_aegis128_aesni_ad(state, left, src);
src += left & ~(AEGIS128_BLOCK_SIZE - 1);
left &= AEGIS128_BLOCK_SIZE - 1;
}
memcpy(buf.bytes + pos, src, left);
pos += left;
assoclen -= size;
scatterwalk_unmap(mapped);
scatterwalk_advance(&walk, size);
scatterwalk_done(&walk, 0, assoclen);
}
if (pos > 0) {
memset(buf.bytes + pos, 0, AEGIS128_BLOCK_SIZE - pos);
crypto_aegis128_aesni_ad(state, AEGIS128_BLOCK_SIZE, buf.bytes);
}
}
static void crypto_aegis128_aesni_process_crypt(
struct aegis_state *state, struct aead_request *req,
const struct aegis_crypt_ops *ops)
{
struct skcipher_walk walk;
u8 *src, *dst;
unsigned int chunksize, base;
ops->skcipher_walk_init(&walk, req, false);
while (walk.nbytes) {
src = walk.src.virt.addr;
dst = walk.dst.virt.addr;
chunksize = walk.nbytes;
ops->crypt_blocks(state, chunksize, src, dst);
base = chunksize & ~(AEGIS128_BLOCK_SIZE - 1);
src += base;
dst += base;
chunksize &= AEGIS128_BLOCK_SIZE - 1;
if (chunksize > 0)
ops->crypt_tail(state, chunksize, src, dst);
skcipher_walk_done(&walk, 0);
}
}
static struct aegis_ctx *crypto_aegis128_aesni_ctx(struct crypto_aead *aead)
{
u8 *ctx = crypto_aead_ctx(aead);
ctx = PTR_ALIGN(ctx, __alignof__(struct aegis_ctx));
return (void *)ctx;
}
static int crypto_aegis128_aesni_setkey(struct crypto_aead *aead, const u8 *key,
unsigned int keylen)
{
struct aegis_ctx *ctx = crypto_aegis128_aesni_ctx(aead);
if (keylen != AEGIS128_KEY_SIZE) {
crypto_aead_set_flags(aead, CRYPTO_TFM_RES_BAD_KEY_LEN);
return -EINVAL;
}
memcpy(ctx->key.bytes, key, AEGIS128_KEY_SIZE);
return 0;
}
static int crypto_aegis128_aesni_setauthsize(struct crypto_aead *tfm,
unsigned int authsize)
{
if (authsize > AEGIS128_MAX_AUTH_SIZE)
return -EINVAL;
if (authsize < AEGIS128_MIN_AUTH_SIZE)
return -EINVAL;
return 0;
}
static void crypto_aegis128_aesni_crypt(struct aead_request *req,
struct aegis_block *tag_xor,
unsigned int cryptlen,
const struct aegis_crypt_ops *ops)
{
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
struct aegis_ctx *ctx = crypto_aegis128_aesni_ctx(tfm);
struct aegis_state state;
kernel_fpu_begin();
crypto_aegis128_aesni_init(&state, ctx->key.bytes, req->iv);
crypto_aegis128_aesni_process_ad(&state, req->src, req->assoclen);
crypto_aegis128_aesni_process_crypt(&state, req, ops);
crypto_aegis128_aesni_final(&state, tag_xor, req->assoclen, cryptlen);
kernel_fpu_end();
}
static int crypto_aegis128_aesni_encrypt(struct aead_request *req)
{
static const struct aegis_crypt_ops OPS = {
.skcipher_walk_init = skcipher_walk_aead_encrypt,
.crypt_blocks = crypto_aegis128_aesni_enc,
.crypt_tail = crypto_aegis128_aesni_enc_tail,
};
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
struct aegis_block tag = {};
unsigned int authsize = crypto_aead_authsize(tfm);
unsigned int cryptlen = req->cryptlen;
crypto_aegis128_aesni_crypt(req, &tag, cryptlen, &OPS);
scatterwalk_map_and_copy(tag.bytes, req->dst,
req->assoclen + cryptlen, authsize, 1);
return 0;
}
static int crypto_aegis128_aesni_decrypt(struct aead_request *req)
{
static const struct aegis_block zeros = {};
static const struct aegis_crypt_ops OPS = {
.skcipher_walk_init = skcipher_walk_aead_decrypt,
.crypt_blocks = crypto_aegis128_aesni_dec,
.crypt_tail = crypto_aegis128_aesni_dec_tail,
};
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
struct aegis_block tag;
unsigned int authsize = crypto_aead_authsize(tfm);
unsigned int cryptlen = req->cryptlen - authsize;
scatterwalk_map_and_copy(tag.bytes, req->src,
req->assoclen + cryptlen, authsize, 0);
crypto_aegis128_aesni_crypt(req, &tag, cryptlen, &OPS);
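/* tag_xor was primed with the received tag, so a valid message leaves all zeroes behind */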
return crypto_memneq(tag.bytes, zeros.bytes, authsize) ? -EBADMSG : 0;
}
static int crypto_aegis128_aesni_init_tfm(struct crypto_aead *aead)
{
return 0;
}
static void crypto_aegis128_aesni_exit_tfm(struct crypto_aead *aead)
{
}
static int cryptd_aegis128_aesni_setkey(struct crypto_aead *aead,
const u8 *key, unsigned int keylen)
{
struct cryptd_aead **ctx = crypto_aead_ctx(aead);
struct cryptd_aead *cryptd_tfm = *ctx;
return crypto_aead_setkey(&cryptd_tfm->base, key, keylen);
}
static int cryptd_aegis128_aesni_setauthsize(struct crypto_aead *aead,
unsigned int authsize)
{
struct cryptd_aead **ctx = crypto_aead_ctx(aead);
struct cryptd_aead *cryptd_tfm = *ctx;
return crypto_aead_setauthsize(&cryptd_tfm->base, authsize);
}
static int cryptd_aegis128_aesni_encrypt(struct aead_request *req)
{
struct crypto_aead *aead = crypto_aead_reqtfm(req);
struct cryptd_aead **ctx = crypto_aead_ctx(aead);
struct cryptd_aead *cryptd_tfm = *ctx;
aead = &cryptd_tfm->base;
if (irq_fpu_usable() && (!in_atomic() ||
!cryptd_aead_queued(cryptd_tfm)))
aead = cryptd_aead_child(cryptd_tfm);
aead_request_set_tfm(req, aead);
return crypto_aead_encrypt(req);
}
static int cryptd_aegis128_aesni_decrypt(struct aead_request *req)
{
struct crypto_aead *aead = crypto_aead_reqtfm(req);
struct cryptd_aead **ctx = crypto_aead_ctx(aead);
struct cryptd_aead *cryptd_tfm = *ctx;
aead = &cryptd_tfm->base;
if (irq_fpu_usable() && (!in_atomic() ||
!cryptd_aead_queued(cryptd_tfm)))
aead = cryptd_aead_child(cryptd_tfm);
aead_request_set_tfm(req, aead);
return crypto_aead_decrypt(req);
}
static int cryptd_aegis128_aesni_init_tfm(struct crypto_aead *aead)
{
struct cryptd_aead *cryptd_tfm;
struct cryptd_aead **ctx = crypto_aead_ctx(aead);
cryptd_tfm = cryptd_alloc_aead("__aegis128-aesni", CRYPTO_ALG_INTERNAL,
CRYPTO_ALG_INTERNAL);
if (IS_ERR(cryptd_tfm))
return PTR_ERR(cryptd_tfm);
*ctx = cryptd_tfm;
crypto_aead_set_reqsize(aead, crypto_aead_reqsize(&cryptd_tfm->base));
return 0;
}
static void cryptd_aegis128_aesni_exit_tfm(struct crypto_aead *aead)
{
struct cryptd_aead **ctx = crypto_aead_ctx(aead);
cryptd_free_aead(*ctx);
}
static struct aead_alg crypto_aegis128_aesni_alg[] = {
{
.setkey = crypto_aegis128_aesni_setkey,
.setauthsize = crypto_aegis128_aesni_setauthsize,
.encrypt = crypto_aegis128_aesni_encrypt,
.decrypt = crypto_aegis128_aesni_decrypt,
.init = crypto_aegis128_aesni_init_tfm,
.exit = crypto_aegis128_aesni_exit_tfm,
.ivsize = AEGIS128_NONCE_SIZE,
.maxauthsize = AEGIS128_MAX_AUTH_SIZE,
.chunksize = AEGIS128_BLOCK_SIZE,
.base = {
.cra_flags = CRYPTO_ALG_INTERNAL,
.cra_blocksize = 1,
.cra_ctxsize = sizeof(struct aegis_ctx) +
__alignof__(struct aegis_ctx),
.cra_alignmask = 0,
.cra_name = "__aegis128",
.cra_driver_name = "__aegis128-aesni",
.cra_module = THIS_MODULE,
}
}, {
.setkey = cryptd_aegis128_aesni_setkey,
.setauthsize = cryptd_aegis128_aesni_setauthsize,
.encrypt = cryptd_aegis128_aesni_encrypt,
.decrypt = cryptd_aegis128_aesni_decrypt,
.init = cryptd_aegis128_aesni_init_tfm,
.exit = cryptd_aegis128_aesni_exit_tfm,
.ivsize = AEGIS128_NONCE_SIZE,
.maxauthsize = AEGIS128_MAX_AUTH_SIZE,
.chunksize = AEGIS128_BLOCK_SIZE,
.base = {
.cra_flags = CRYPTO_ALG_ASYNC,
.cra_blocksize = 1,
.cra_ctxsize = sizeof(struct cryptd_aead *),
.cra_alignmask = 0,
.cra_priority = 400,
.cra_name = "aegis128",
.cra_driver_name = "aegis128-aesni",
.cra_module = THIS_MODULE,
}
}
};
static const struct x86_cpu_id aesni_cpu_id[] = {
X86_FEATURE_MATCH(X86_FEATURE_AES),
X86_FEATURE_MATCH(X86_FEATURE_XMM2),
{}
};
MODULE_DEVICE_TABLE(x86cpu, aesni_cpu_id);
static int __init crypto_aegis128_aesni_module_init(void)
{
if (!x86_match_cpu(aesni_cpu_id))
return -ENODEV;
return crypto_register_aeads(crypto_aegis128_aesni_alg,
ARRAY_SIZE(crypto_aegis128_aesni_alg));
}
static void __exit crypto_aegis128_aesni_module_exit(void)
{
crypto_unregister_aeads(crypto_aegis128_aesni_alg,
ARRAY_SIZE(crypto_aegis128_aesni_alg));
}
module_init(crypto_aegis128_aesni_module_init);
module_exit(crypto_aegis128_aesni_module_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Ondrej Mosnacek <omosnacek@gmail.com>");
MODULE_DESCRIPTION("AEGIS-128 AEAD algorithm -- AESNI+SSE2 implementation");
MODULE_ALIAS_CRYPTO("aegis128");
MODULE_ALIAS_CRYPTO("aegis128-aesni");
/*
* AES-NI + SSE2 implementation of AEGIS-128L
*
* Copyright (c) 2017-2018 Ondrej Mosnacek <omosnacek@gmail.com>
* Copyright (C) 2017-2018 Red Hat, Inc. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 as published
* by the Free Software Foundation.
*/
#include <linux/linkage.h>
#include <asm/frame.h>
#define STATE0 %xmm0
#define STATE1 %xmm1
#define STATE2 %xmm2
#define STATE3 %xmm3
#define STATE4 %xmm4
#define STATE5 %xmm5
#define STATE6 %xmm6
#define STATE7 %xmm7
#define MSG0 %xmm8
#define MSG1 %xmm9
#define T0 %xmm10
#define T1 %xmm11
#define T2 %xmm12
#define T3 %xmm13
#define STATEP %rdi
#define LEN %rsi
#define SRC %rdx
#define DST %rcx
.section .rodata.cst16.aegis128l_const, "aM", @progbits, 32
.align 16
.Laegis128l_const_0:
.byte 0x00, 0x01, 0x01, 0x02, 0x03, 0x05, 0x08, 0x0d
.byte 0x15, 0x22, 0x37, 0x59, 0x90, 0xe9, 0x79, 0x62
.Laegis128l_const_1:
.byte 0xdb, 0x3d, 0x18, 0x55, 0x6d, 0xc2, 0x2f, 0xf1
.byte 0x20, 0x11, 0x31, 0x42, 0x73, 0xb5, 0x28, 0xdd
.section .rodata.cst16.aegis128l_counter, "aM", @progbits, 16
.align 16
.Laegis128l_counter0:
.byte 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
.byte 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
.Laegis128l_counter1:
.byte 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17
.byte 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f
.text
/*
* __load_partial: internal ABI
* input:
* LEN - bytes
* SRC - src
* output:
* MSG0 - first message block
* MSG1 - second message block
* changed:
* T0
* %r8
* %r9
*/
__load_partial:
xor %r9, %r9
pxor MSG0, MSG0
pxor MSG1, MSG1
mov LEN, %r8
and $0x1, %r8
jz .Lld_partial_1
mov LEN, %r8
and $0x1E, %r8
add SRC, %r8
mov (%r8), %r9b
.Lld_partial_1:
mov LEN, %r8
and $0x2, %r8
jz .Lld_partial_2
mov LEN, %r8
and $0x1C, %r8
add SRC, %r8
shl $0x10, %r9
mov (%r8), %r9w
.Lld_partial_2:
mov LEN, %r8
and $0x4, %r8
jz .Lld_partial_4
mov LEN, %r8
and $0x18, %r8
add SRC, %r8
shl $32, %r9
mov (%r8), %r8d
xor %r8, %r9
.Lld_partial_4:
movq %r9, MSG0
mov LEN, %r8
and $0x8, %r8
jz .Lld_partial_8
mov LEN, %r8
and $0x10, %r8
add SRC, %r8
pslldq $8, MSG0
movq (%r8), T0
pxor T0, MSG0
.Lld_partial_8:
mov LEN, %r8
and $0x10, %r8
jz .Lld_partial_16
movdqa MSG0, MSG1
movdqu (SRC), MSG0
.Lld_partial_16:
ret
ENDPROC(__load_partial)
/*
* __store_partial: internal ABI
* input:
* LEN - bytes
* DST - dst
* output:
* T0 - first message block
* T1 - second message block
* changed:
* %r8
* %r9
* %r10
*/
__store_partial:
mov LEN, %r8
mov DST, %r9
cmp $16, %r8
jl .Lst_partial_16
movdqu T0, (%r9)
movdqa T1, T0
sub $16, %r8
add $16, %r9
.Lst_partial_16:
movq T0, %r10
cmp $8, %r8
jl .Lst_partial_8
mov %r10, (%r9)
psrldq $8, T0
movq T0, %r10
sub $8, %r8
add $8, %r9
.Lst_partial_8:
cmp $4, %r8
jl .Lst_partial_4
mov %r10d, (%r9)
shr $32, %r10
sub $4, %r8
add $4, %r9
.Lst_partial_4:
cmp $2, %r8
jl .Lst_partial_2
mov %r10w, (%r9)
shr $0x10, %r10
sub $2, %r8
add $2, %r9
.Lst_partial_2:
cmp $1, %r8
jl .Lst_partial_1
mov %r10b, (%r9)
.Lst_partial_1:
ret
ENDPROC(__store_partial)
.macro update
movdqa STATE7, T0
aesenc STATE0, STATE7
aesenc STATE1, STATE0
aesenc STATE2, STATE1
aesenc STATE3, STATE2
aesenc STATE4, STATE3
aesenc STATE5, STATE4
aesenc STATE6, STATE5
aesenc T0, STATE6
.endm
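/*
* Note (added for clarity, not in the original patch): AEGIS-128L absorbs
* two 16-byte message blocks per state update.  The update0..update7
* variants below differ only in which registers hold the freshly shifted
* S0 and S4 (the targets of the MSG0/MSG1 xor) at that step of the
* rotation, so callers cycle through them instead of moving register
* contents around.
*/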
.macro update0
update
pxor MSG0, STATE7
pxor MSG1, STATE3
.endm
.macro update1
update
pxor MSG0, STATE6
pxor MSG1, STATE2
.endm
.macro update2
update
pxor MSG0, STATE5
pxor MSG1, STATE1
.endm
.macro update3
update
pxor MSG0, STATE4
pxor MSG1, STATE0
.endm
.macro update4
update
pxor MSG0, STATE3
pxor MSG1, STATE7
.endm
.macro update5
update
pxor MSG0, STATE2
pxor MSG1, STATE6
.endm
.macro update6
update
pxor MSG0, STATE1
pxor MSG1, STATE5
.endm
.macro update7
update
pxor MSG0, STATE0
pxor MSG1, STATE4
.endm
.macro state_load
movdqu 0x00(STATEP), STATE0
movdqu 0x10(STATEP), STATE1
movdqu 0x20(STATEP), STATE2
movdqu 0x30(STATEP), STATE3
movdqu 0x40(STATEP), STATE4
movdqu 0x50(STATEP), STATE5
movdqu 0x60(STATEP), STATE6
movdqu 0x70(STATEP), STATE7
.endm
.macro state_store s0 s1 s2 s3 s4 s5 s6 s7
movdqu \s7, 0x00(STATEP)
movdqu \s0, 0x10(STATEP)
movdqu \s1, 0x20(STATEP)
movdqu \s2, 0x30(STATEP)
movdqu \s3, 0x40(STATEP)
movdqu \s4, 0x50(STATEP)
movdqu \s5, 0x60(STATEP)
movdqu \s6, 0x70(STATEP)
.endm
.macro state_store0
state_store STATE0 STATE1 STATE2 STATE3 STATE4 STATE5 STATE6 STATE7
.endm
.macro state_store1
state_store STATE7 STATE0 STATE1 STATE2 STATE3 STATE4 STATE5 STATE6
.endm
.macro state_store2
state_store STATE6 STATE7 STATE0 STATE1 STATE2 STATE3 STATE4 STATE5
.endm
.macro state_store3
state_store STATE5 STATE6 STATE7 STATE0 STATE1 STATE2 STATE3 STATE4
.endm
.macro state_store4
state_store STATE4 STATE5 STATE6 STATE7 STATE0 STATE1 STATE2 STATE3
.endm
.macro state_store5
state_store STATE3 STATE4 STATE5 STATE6 STATE7 STATE0 STATE1 STATE2
.endm
.macro state_store6
state_store STATE2 STATE3 STATE4 STATE5 STATE6 STATE7 STATE0 STATE1
.endm
.macro state_store7
state_store STATE1 STATE2 STATE3 STATE4 STATE5 STATE6 STATE7 STATE0
.endm
/*
* void crypto_aegis128l_aesni_init(void *state, const void *key, const void *iv);
*/
ENTRY(crypto_aegis128l_aesni_init)
FRAME_BEGIN
/* load key: */
movdqa (%rsi), MSG1
movdqa MSG1, STATE0
movdqa MSG1, STATE4
movdqa MSG1, STATE5
movdqa MSG1, STATE6
movdqa MSG1, STATE7
/* load IV: */
movdqu (%rdx), MSG0
pxor MSG0, STATE0
pxor MSG0, STATE4
/* load the constants: */
movdqa .Laegis128l_const_0, STATE2
movdqa .Laegis128l_const_1, STATE1
movdqa STATE1, STATE3
pxor STATE2, STATE5
pxor STATE1, STATE6
pxor STATE2, STATE7
/* update 10 times with IV and KEY: */
update0
update1
update2
update3
update4
update5
update6
update7
update0
update1
state_store1
FRAME_END
ret
ENDPROC(crypto_aegis128l_aesni_init)
.macro ad_block a i
movdq\a (\i * 0x20 + 0x00)(SRC), MSG0
movdq\a (\i * 0x20 + 0x10)(SRC), MSG1
update\i
sub $0x20, LEN
cmp $0x20, LEN
jl .Lad_out_\i
.endm
/*
* void crypto_aegis128l_aesni_ad(void *state, unsigned int length,
* const void *data);
*/
ENTRY(crypto_aegis128l_aesni_ad)
FRAME_BEGIN
cmp $0x20, LEN
jb .Lad_out
state_load
mov SRC, %r8
and $0xf, %r8
jnz .Lad_u_loop
.align 8
.Lad_a_loop:
ad_block a 0
ad_block a 1
ad_block a 2
ad_block a 3
ad_block a 4
ad_block a 5
ad_block a 6
ad_block a 7
add $0x100, SRC
jmp .Lad_a_loop
.align 8
.Lad_u_loop:
ad_block u 0
ad_block u 1
ad_block u 2
ad_block u 3
ad_block u 4
ad_block u 5
ad_block u 6
ad_block u 7
add $0x100, SRC
jmp .Lad_u_loop
.Lad_out_0:
state_store0
FRAME_END
ret
.Lad_out_1:
state_store1
FRAME_END
ret
.Lad_out_2:
state_store2
FRAME_END
ret
.Lad_out_3:
state_store3
FRAME_END
ret
.Lad_out_4:
state_store4
FRAME_END
ret
.Lad_out_5:
state_store5
FRAME_END
ret
.Lad_out_6:
state_store6
FRAME_END
ret
.Lad_out_7:
state_store7
FRAME_END
ret
.Lad_out:
FRAME_END
ret
ENDPROC(crypto_aegis128l_aesni_ad)
.macro crypt m0 m1 s0 s1 s2 s3 s4 s5 s6 s7
pxor \s1, \m0
pxor \s6, \m0
movdqa \s2, T3
pand \s3, T3
pxor T3, \m0
pxor \s2, \m1
pxor \s5, \m1
movdqa \s6, T3
pand \s7, T3
pxor T3, \m1
.endm
.macro crypt0 m0 m1
crypt \m0 \m1 STATE0 STATE1 STATE2 STATE3 STATE4 STATE5 STATE6 STATE7
.endm
.macro crypt1 m0 m1
crypt \m0 \m1 STATE7 STATE0 STATE1 STATE2 STATE3 STATE4 STATE5 STATE6
.endm
.macro crypt2 m0 m1
crypt \m0 \m1 STATE6 STATE7 STATE0 STATE1 STATE2 STATE3 STATE4 STATE5
.endm
.macro crypt3 m0 m1
crypt \m0 \m1 STATE5 STATE6 STATE7 STATE0 STATE1 STATE2 STATE3 STATE4
.endm
.macro crypt4 m0 m1
crypt \m0 \m1 STATE4 STATE5 STATE6 STATE7 STATE0 STATE1 STATE2 STATE3
.endm
.macro crypt5 m0 m1
crypt \m0 \m1 STATE3 STATE4 STATE5 STATE6 STATE7 STATE0 STATE1 STATE2
.endm
.macro crypt6 m0 m1
crypt \m0 \m1 STATE2 STATE3 STATE4 STATE5 STATE6 STATE7 STATE0 STATE1
.endm
.macro crypt7 m0 m1
crypt \m0 \m1 STATE1 STATE2 STATE3 STATE4 STATE5 STATE6 STATE7 STATE0
.endm
.macro encrypt_block a i
movdq\a (\i * 0x20 + 0x00)(SRC), MSG0
movdq\a (\i * 0x20 + 0x10)(SRC), MSG1
movdqa MSG0, T0
movdqa MSG1, T1
crypt\i T0, T1
movdq\a T0, (\i * 0x20 + 0x00)(DST)
movdq\a T1, (\i * 0x20 + 0x10)(DST)
update\i
sub $0x20, LEN
cmp $0x20, LEN
jl .Lenc_out_\i
.endm
.macro decrypt_block a i
movdq\a (\i * 0x20 + 0x00)(SRC), MSG0
movdq\a (\i * 0x20 + 0x10)(SRC), MSG1
crypt\i MSG0, MSG1
movdq\a MSG0, (\i * 0x20 + 0x00)(DST)
movdq\a MSG1, (\i * 0x20 + 0x10)(DST)
update\i
sub $0x20, LEN
cmp $0x20, LEN
jl .Ldec_out_\i
.endm
/*
* void crypto_aegis128l_aesni_enc(void *state, unsigned int length,
* const void *src, void *dst);
*/
ENTRY(crypto_aegis128l_aesni_enc)
FRAME_BEGIN
cmp $0x20, LEN
jb .Lenc_out
state_load
mov SRC, %r8
or DST, %r8
and $0xf, %r8
jnz .Lenc_u_loop
.align 8
.Lenc_a_loop:
encrypt_block a 0
encrypt_block a 1
encrypt_block a 2
encrypt_block a 3
encrypt_block a 4
encrypt_block a 5
encrypt_block a 6
encrypt_block a 7
add $0x100, SRC
add $0x100, DST
jmp .Lenc_a_loop
.align 8
.Lenc_u_loop:
encrypt_block u 0
encrypt_block u 1
encrypt_block u 2
encrypt_block u 3
encrypt_block u 4
encrypt_block u 5
encrypt_block u 6
encrypt_block u 7
add $0x100, SRC
add $0x100, DST
jmp .Lenc_u_loop
.Lenc_out_0:
state_store0
FRAME_END
ret
.Lenc_out_1:
state_store1
FRAME_END
ret
.Lenc_out_2:
state_store2
FRAME_END
ret
.Lenc_out_3:
state_store3
FRAME_END
ret
.Lenc_out_4:
state_store4
FRAME_END
ret
.Lenc_out_5:
state_store5
FRAME_END
ret
.Lenc_out_6:
state_store6
FRAME_END
ret
.Lenc_out_7:
state_store7
FRAME_END
ret
.Lenc_out:
FRAME_END
ret
ENDPROC(crypto_aegis128l_aesni_enc)
/*
* void crypto_aegis128l_aesni_enc_tail(void *state, unsigned int length,
* const void *src, void *dst);
*/
ENTRY(crypto_aegis128l_aesni_enc_tail)
FRAME_BEGIN
state_load
/* encrypt message: */
call __load_partial
movdqa MSG0, T0
movdqa MSG1, T1
crypt0 T0, T1
call __store_partial
update0
state_store0
FRAME_END
ret
ENDPROC(crypto_aegis128l_aesni_enc_tail)
/*
* void crypto_aegis128l_aesni_dec(void *state, unsigned int length,
* const void *src, void *dst);
*/
ENTRY(crypto_aegis128l_aesni_dec)
FRAME_BEGIN
cmp $0x20, LEN
jb .Ldec_out
state_load
mov SRC, %r8
or DST, %r8
and $0xF, %r8
jnz .Ldec_u_loop
.align 8
.Ldec_a_loop:
decrypt_block a 0
decrypt_block a 1
decrypt_block a 2
decrypt_block a 3
decrypt_block a 4
decrypt_block a 5
decrypt_block a 6
decrypt_block a 7
add $0x100, SRC
add $0x100, DST
jmp .Ldec_a_loop
.align 8
.Ldec_u_loop:
decrypt_block u 0
decrypt_block u 1
decrypt_block u 2
decrypt_block u 3
decrypt_block u 4
decrypt_block u 5
decrypt_block u 6
decrypt_block u 7
add $0x100, SRC
add $0x100, DST
jmp .Ldec_u_loop
.Ldec_out_0:
state_store0
FRAME_END
ret
.Ldec_out_1:
state_store1
FRAME_END
ret
.Ldec_out_2:
state_store2
FRAME_END
ret
.Ldec_out_3:
state_store3
FRAME_END
ret
.Ldec_out_4:
state_store4
FRAME_END
ret
.Ldec_out_5:
state_store5
FRAME_END
ret
.Ldec_out_6:
state_store6
FRAME_END
ret
.Ldec_out_7:
state_store7
FRAME_END
ret
.Ldec_out:
FRAME_END
ret
ENDPROC(crypto_aegis128l_aesni_dec)
/*
* void crypto_aegis128l_aesni_dec_tail(void *state, unsigned int length,
* const void *src, void *dst);
*/
ENTRY(crypto_aegis128l_aesni_dec_tail)
FRAME_BEGIN
state_load
/* decrypt message: */
call __load_partial
crypt0 MSG0, MSG1
movdqa MSG0, T0
movdqa MSG1, T1
call __store_partial
/* mask with byte count: */
movq LEN, T0
punpcklbw T0, T0
punpcklbw T0, T0
punpcklbw T0, T0
punpcklbw T0, T0
movdqa T0, T1
movdqa .Laegis128l_counter0, T2
movdqa .Laegis128l_counter1, T3
pcmpgtb T2, T0
pcmpgtb T3, T1
pand T0, MSG0
pand T1, MSG1
update0
state_store0
FRAME_END
ret
ENDPROC(crypto_aegis128l_aesni_dec_tail)
/*
* void crypto_aegis128l_aesni_final(void *state, void *tag_xor,
* u64 assoclen, u64 cryptlen);
*/
ENTRY(crypto_aegis128l_aesni_final)
FRAME_BEGIN
state_load
/* prepare length block: */
movq %rdx, MSG0
movq %rcx, T0
pslldq $8, T0
pxor T0, MSG0
psllq $3, MSG0 /* multiply by 8 (to get bit count) */
pxor STATE2, MSG0
movdqa MSG0, MSG1
/* update state: */
update0
update1
update2
update3
update4
update5
update6
/* xor tag: */
movdqu (%rsi), T0
pxor STATE1, T0
pxor STATE2, T0
pxor STATE3, T0
pxor STATE4, T0
pxor STATE5, T0
pxor STATE6, T0
pxor STATE7, T0
movdqu T0, (%rsi)
FRAME_END
ret
ENDPROC(crypto_aegis128l_aesni_final)
/*
* The AEGIS-128L Authenticated-Encryption Algorithm
* Glue for AES-NI + SSE2 implementation
*
* Copyright (c) 2017-2018 Ondrej Mosnacek <omosnacek@gmail.com>
* Copyright (C) 2017-2018 Red Hat, Inc. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License as published by the Free
* Software Foundation; either version 2 of the License, or (at your option)
* any later version.
*/
#include <crypto/cryptd.h>
#include <crypto/internal/aead.h>
#include <crypto/internal/skcipher.h>
#include <crypto/scatterwalk.h>
#include <linux/module.h>
#include <asm/fpu/api.h>
#include <asm/cpu_device_id.h>
#define AEGIS128L_BLOCK_ALIGN 16
#define AEGIS128L_BLOCK_SIZE 32
#define AEGIS128L_NONCE_SIZE 16
#define AEGIS128L_STATE_BLOCKS 8
#define AEGIS128L_KEY_SIZE 16
#define AEGIS128L_MIN_AUTH_SIZE 8
#define AEGIS128L_MAX_AUTH_SIZE 16
asmlinkage void crypto_aegis128l_aesni_init(void *state, void *key, void *iv);
asmlinkage void crypto_aegis128l_aesni_ad(
void *state, unsigned int length, const void *data);
asmlinkage void crypto_aegis128l_aesni_enc(
void *state, unsigned int length, const void *src, void *dst);
asmlinkage void crypto_aegis128l_aesni_dec(
void *state, unsigned int length, const void *src, void *dst);
asmlinkage void crypto_aegis128l_aesni_enc_tail(
void *state, unsigned int length, const void *src, void *dst);
asmlinkage void crypto_aegis128l_aesni_dec_tail(
void *state, unsigned int length, const void *src, void *dst);
asmlinkage void crypto_aegis128l_aesni_final(
void *state, void *tag_xor, unsigned int assoclen,
unsigned int cryptlen);
struct aegis_block {
u8 bytes[AEGIS128L_BLOCK_SIZE] __aligned(AEGIS128L_BLOCK_ALIGN);
};
struct aegis_state {
struct aegis_block blocks[AEGIS128L_STATE_BLOCKS];
};
struct aegis_ctx {
struct aegis_block key;
};
struct aegis_crypt_ops {
int (*skcipher_walk_init)(struct skcipher_walk *walk,
struct aead_request *req, bool atomic);
void (*crypt_blocks)(void *state, unsigned int length, const void *src,
void *dst);
void (*crypt_tail)(void *state, unsigned int length, const void *src,
void *dst);
};
static void crypto_aegis128l_aesni_process_ad(
struct aegis_state *state, struct scatterlist *sg_src,
unsigned int assoclen)
{
struct scatter_walk walk;
struct aegis_block buf;
unsigned int pos = 0;
scatterwalk_start(&walk, sg_src);
while (assoclen != 0) {
unsigned int size = scatterwalk_clamp(&walk, assoclen);
unsigned int left = size;
void *mapped = scatterwalk_map(&walk);
const u8 *src = (const u8 *)mapped;
if (pos + size >= AEGIS128L_BLOCK_SIZE) {
if (pos > 0) {
unsigned int fill = AEGIS128L_BLOCK_SIZE - pos;
memcpy(buf.bytes + pos, src, fill);
crypto_aegis128l_aesni_ad(state,
AEGIS128L_BLOCK_SIZE,
buf.bytes);
pos = 0;
left -= fill;
src += fill;
}
crypto_aegis128l_aesni_ad(state, left, src);
src += left & ~(AEGIS128L_BLOCK_SIZE - 1);
left &= AEGIS128L_BLOCK_SIZE - 1;
}
memcpy(buf.bytes + pos, src, left);
pos += left;
assoclen -= size;
scatterwalk_unmap(mapped);
scatterwalk_advance(&walk, size);
scatterwalk_done(&walk, 0, assoclen);
}
if (pos > 0) {
memset(buf.bytes + pos, 0, AEGIS128L_BLOCK_SIZE - pos);
crypto_aegis128l_aesni_ad(state, AEGIS128L_BLOCK_SIZE, buf.bytes);
}
}
static void crypto_aegis128l_aesni_process_crypt(
struct aegis_state *state, struct aead_request *req,
const struct aegis_crypt_ops *ops)
{
struct skcipher_walk walk;
u8 *src, *dst;
unsigned int chunksize, base;
ops->skcipher_walk_init(&walk, req, false);
while (walk.nbytes) {
src = walk.src.virt.addr;
dst = walk.dst.virt.addr;
chunksize = walk.nbytes;
ops->crypt_blocks(state, chunksize, src, dst);
base = chunksize & ~(AEGIS128L_BLOCK_SIZE - 1);
src += base;
dst += base;
chunksize &= AEGIS128L_BLOCK_SIZE - 1;
if (chunksize > 0)
ops->crypt_tail(state, chunksize, src, dst);
skcipher_walk_done(&walk, 0);
}
}
static struct aegis_ctx *crypto_aegis128l_aesni_ctx(struct crypto_aead *aead)
{
u8 *ctx = crypto_aead_ctx(aead);
ctx = PTR_ALIGN(ctx, __alignof__(struct aegis_ctx));
return (void *)ctx;
}
static int crypto_aegis128l_aesni_setkey(struct crypto_aead *aead,
const u8 *key, unsigned int keylen)
{
struct aegis_ctx *ctx = crypto_aegis128l_aesni_ctx(aead);
if (keylen != AEGIS128L_KEY_SIZE) {
crypto_aead_set_flags(aead, CRYPTO_TFM_RES_BAD_KEY_LEN);
return -EINVAL;
}
memcpy(ctx->key.bytes, key, AEGIS128L_KEY_SIZE);
return 0;
}
static int crypto_aegis128l_aesni_setauthsize(struct crypto_aead *tfm,
unsigned int authsize)
{
if (authsize > AEGIS128L_MAX_AUTH_SIZE)
return -EINVAL;
if (authsize < AEGIS128L_MIN_AUTH_SIZE)
return -EINVAL;
return 0;
}
static void crypto_aegis128l_aesni_crypt(struct aead_request *req,
struct aegis_block *tag_xor,
unsigned int cryptlen,
const struct aegis_crypt_ops *ops)
{
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
struct aegis_ctx *ctx = crypto_aegis128l_aesni_ctx(tfm);
struct aegis_state state;
kernel_fpu_begin();
crypto_aegis128l_aesni_init(&state, ctx->key.bytes, req->iv);
crypto_aegis128l_aesni_process_ad(&state, req->src, req->assoclen);
crypto_aegis128l_aesni_process_crypt(&state, req, ops);
crypto_aegis128l_aesni_final(&state, tag_xor, req->assoclen, cryptlen);
kernel_fpu_end();
}
static int crypto_aegis128l_aesni_encrypt(struct aead_request *req)
{
static const struct aegis_crypt_ops OPS = {
.skcipher_walk_init = skcipher_walk_aead_encrypt,
.crypt_blocks = crypto_aegis128l_aesni_enc,
.crypt_tail = crypto_aegis128l_aesni_enc_tail,
};
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
struct aegis_block tag = {};
unsigned int authsize = crypto_aead_authsize(tfm);
unsigned int cryptlen = req->cryptlen;
crypto_aegis128l_aesni_crypt(req, &tag, cryptlen, &OPS);
scatterwalk_map_and_copy(tag.bytes, req->dst,
req->assoclen + cryptlen, authsize, 1);
return 0;
}
static int crypto_aegis128l_aesni_decrypt(struct aead_request *req)
{
static const struct aegis_block zeros = {};
static const struct aegis_crypt_ops OPS = {
.skcipher_walk_init = skcipher_walk_aead_decrypt,
.crypt_blocks = crypto_aegis128l_aesni_dec,
.crypt_tail = crypto_aegis128l_aesni_dec_tail,
};
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
struct aegis_block tag;
unsigned int authsize = crypto_aead_authsize(tfm);
unsigned int cryptlen = req->cryptlen - authsize;
scatterwalk_map_and_copy(tag.bytes, req->src,
req->assoclen + cryptlen, authsize, 0);
crypto_aegis128l_aesni_crypt(req, &tag, cryptlen, &OPS);
return crypto_memneq(tag.bytes, zeros.bytes, authsize) ? -EBADMSG : 0;
}
static int crypto_aegis128l_aesni_init_tfm(struct crypto_aead *aead)
{
return 0;
}
static void crypto_aegis128l_aesni_exit_tfm(struct crypto_aead *aead)
{
}
static int cryptd_aegis128l_aesni_setkey(struct crypto_aead *aead,
const u8 *key, unsigned int keylen)
{
struct cryptd_aead **ctx = crypto_aead_ctx(aead);
struct cryptd_aead *cryptd_tfm = *ctx;
return crypto_aead_setkey(&cryptd_tfm->base, key, keylen);
}
static int cryptd_aegis128l_aesni_setauthsize(struct crypto_aead *aead,
unsigned int authsize)
{
struct cryptd_aead **ctx = crypto_aead_ctx(aead);
struct cryptd_aead *cryptd_tfm = *ctx;
return crypto_aead_setauthsize(&cryptd_tfm->base, authsize);
}
static int cryptd_aegis128l_aesni_encrypt(struct aead_request *req)
{
struct crypto_aead *aead = crypto_aead_reqtfm(req);
struct cryptd_aead **ctx = crypto_aead_ctx(aead);
struct cryptd_aead *cryptd_tfm = *ctx;
aead = &cryptd_tfm->base;
if (irq_fpu_usable() && (!in_atomic() ||
!cryptd_aead_queued(cryptd_tfm)))
aead = cryptd_aead_child(cryptd_tfm);
aead_request_set_tfm(req, aead);
return crypto_aead_encrypt(req);
}
static int cryptd_aegis128l_aesni_decrypt(struct aead_request *req)
{
struct crypto_aead *aead = crypto_aead_reqtfm(req);
struct cryptd_aead **ctx = crypto_aead_ctx(aead);
struct cryptd_aead *cryptd_tfm = *ctx;
aead = &cryptd_tfm->base;
if (irq_fpu_usable() && (!in_atomic() ||
!cryptd_aead_queued(cryptd_tfm)))
aead = cryptd_aead_child(cryptd_tfm);
aead_request_set_tfm(req, aead);
return crypto_aead_decrypt(req);
}
static int cryptd_aegis128l_aesni_init_tfm(struct crypto_aead *aead)
{
struct cryptd_aead *cryptd_tfm;
struct cryptd_aead **ctx = crypto_aead_ctx(aead);
cryptd_tfm = cryptd_alloc_aead("__aegis128l-aesni", CRYPTO_ALG_INTERNAL,
CRYPTO_ALG_INTERNAL);
if (IS_ERR(cryptd_tfm))
return PTR_ERR(cryptd_tfm);
*ctx = cryptd_tfm;
crypto_aead_set_reqsize(aead, crypto_aead_reqsize(&cryptd_tfm->base));
return 0;
}
static void cryptd_aegis128l_aesni_exit_tfm(struct crypto_aead *aead)
{
struct cryptd_aead **ctx = crypto_aead_ctx(aead);
cryptd_free_aead(*ctx);
}
static struct aead_alg crypto_aegis128l_aesni_alg[] = {
{
.setkey = crypto_aegis128l_aesni_setkey,
.setauthsize = crypto_aegis128l_aesni_setauthsize,
.encrypt = crypto_aegis128l_aesni_encrypt,
.decrypt = crypto_aegis128l_aesni_decrypt,
.init = crypto_aegis128l_aesni_init_tfm,
.exit = crypto_aegis128l_aesni_exit_tfm,
.ivsize = AEGIS128L_NONCE_SIZE,
.maxauthsize = AEGIS128L_MAX_AUTH_SIZE,
.chunksize = AEGIS128L_BLOCK_SIZE,
.base = {
.cra_flags = CRYPTO_ALG_INTERNAL,
.cra_blocksize = 1,
.cra_ctxsize = sizeof(struct aegis_ctx) +
__alignof__(struct aegis_ctx),
.cra_alignmask = 0,
.cra_name = "__aegis128l",
.cra_driver_name = "__aegis128l-aesni",
.cra_module = THIS_MODULE,
}
}, {
.setkey = cryptd_aegis128l_aesni_setkey,
.setauthsize = cryptd_aegis128l_aesni_setauthsize,
.encrypt = cryptd_aegis128l_aesni_encrypt,
.decrypt = cryptd_aegis128l_aesni_decrypt,
.init = cryptd_aegis128l_aesni_init_tfm,
.exit = cryptd_aegis128l_aesni_exit_tfm,
.ivsize = AEGIS128L_NONCE_SIZE,
.maxauthsize = AEGIS128L_MAX_AUTH_SIZE,
.chunksize = AEGIS128L_BLOCK_SIZE,
.base = {
.cra_flags = CRYPTO_ALG_ASYNC,
.cra_blocksize = 1,
.cra_ctxsize = sizeof(struct cryptd_aead *),
.cra_alignmask = 0,
.cra_priority = 400,
.cra_name = "aegis128l",
.cra_driver_name = "aegis128l-aesni",
.cra_module = THIS_MODULE,
}
}
};
static const struct x86_cpu_id aesni_cpu_id[] = {
X86_FEATURE_MATCH(X86_FEATURE_AES),
X86_FEATURE_MATCH(X86_FEATURE_XMM2),
{}
};
MODULE_DEVICE_TABLE(x86cpu, aesni_cpu_id);
static int __init crypto_aegis128l_aesni_module_init(void)
{
if (!x86_match_cpu(aesni_cpu_id))
return -ENODEV;
return crypto_register_aeads(crypto_aegis128l_aesni_alg,
ARRAY_SIZE(crypto_aegis128l_aesni_alg));
}
static void __exit crypto_aegis128l_aesni_module_exit(void)
{
crypto_unregister_aeads(crypto_aegis128l_aesni_alg,
ARRAY_SIZE(crypto_aegis128l_aesni_alg));
}
module_init(crypto_aegis128l_aesni_module_init);
module_exit(crypto_aegis128l_aesni_module_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Ondrej Mosnacek <omosnacek@gmail.com>");
MODULE_DESCRIPTION("AEGIS-128L AEAD algorithm -- AESNI+SSE2 implementation");
MODULE_ALIAS_CRYPTO("aegis128l");
MODULE_ALIAS_CRYPTO("aegis128l-aesni");
......@@ -364,5 +364,5 @@ module_exit(ghash_pclmulqdqni_mod_exit);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("GHASH Message Digest Algorithm, "
"acclerated by PCLMULQDQ-NI");
"accelerated by PCLMULQDQ-NI");
MODULE_ALIAS_CRYPTO("ghash");
/*
* The MORUS-1280 Authenticated-Encryption Algorithm
* Glue for AVX2 implementation
*
* Copyright (c) 2016-2018 Ondrej Mosnacek <omosnacek@gmail.com>
* Copyright (C) 2017-2018 Red Hat, Inc. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License as published by the Free
* Software Foundation; either version 2 of the License, or (at your option)
* any later version.
*/
#include <crypto/internal/aead.h>
#include <crypto/morus1280_glue.h>
#include <linux/module.h>
#include <asm/fpu/api.h>
#include <asm/cpu_device_id.h>
asmlinkage void crypto_morus1280_avx2_init(void *state, const void *key,
const void *iv);
asmlinkage void crypto_morus1280_avx2_ad(void *state, const void *data,
unsigned int length);
asmlinkage void crypto_morus1280_avx2_enc(void *state, const void *src,
void *dst, unsigned int length);
asmlinkage void crypto_morus1280_avx2_dec(void *state, const void *src,
void *dst, unsigned int length);
asmlinkage void crypto_morus1280_avx2_enc_tail(void *state, const void *src,
void *dst, unsigned int length);
asmlinkage void crypto_morus1280_avx2_dec_tail(void *state, const void *src,
void *dst, unsigned int length);
asmlinkage void crypto_morus1280_avx2_final(void *state, void *tag_xor,
u64 assoclen, u64 cryptlen);
MORUS1280_DECLARE_ALGS(avx2, "morus1280-avx2", 400);
static const struct x86_cpu_id avx2_cpu_id[] = {
X86_FEATURE_MATCH(X86_FEATURE_AVX2),
{}
};
MODULE_DEVICE_TABLE(x86cpu, avx2_cpu_id);
static int __init crypto_morus1280_avx2_module_init(void)
{
if (!x86_match_cpu(avx2_cpu_id))
return -ENODEV;
return crypto_register_aeads(crypto_morus1280_avx2_algs,
ARRAY_SIZE(crypto_morus1280_avx2_algs));
}
static void __exit crypto_morus1280_avx2_module_exit(void)
{
crypto_unregister_aeads(crypto_morus1280_avx2_algs,
ARRAY_SIZE(crypto_morus1280_avx2_algs));
}
module_init(crypto_morus1280_avx2_module_init);
module_exit(crypto_morus1280_avx2_module_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Ondrej Mosnacek <omosnacek@gmail.com>");
MODULE_DESCRIPTION("MORUS-1280 AEAD algorithm -- AVX2 implementation");
MODULE_ALIAS_CRYPTO("morus1280");
MODULE_ALIAS_CRYPTO("morus1280-avx2");
/*
* The MORUS-1280 Authenticated-Encryption Algorithm
* Glue for SSE2 implementation
*
* Copyright (c) 2016-2018 Ondrej Mosnacek <omosnacek@gmail.com>
* Copyright (C) 2017-2018 Red Hat, Inc. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License as published by the Free
* Software Foundation; either version 2 of the License, or (at your option)
* any later version.
*/
#include <crypto/internal/aead.h>
#include <crypto/morus1280_glue.h>
#include <linux/module.h>
#include <asm/fpu/api.h>
#include <asm/cpu_device_id.h>
asmlinkage void crypto_morus1280_sse2_init(void *state, const void *key,
const void *iv);
asmlinkage void crypto_morus1280_sse2_ad(void *state, const void *data,
unsigned int length);
asmlinkage void crypto_morus1280_sse2_enc(void *state, const void *src,
void *dst, unsigned int length);
asmlinkage void crypto_morus1280_sse2_dec(void *state, const void *src,
void *dst, unsigned int length);
asmlinkage void crypto_morus1280_sse2_enc_tail(void *state, const void *src,
void *dst, unsigned int length);
asmlinkage void crypto_morus1280_sse2_dec_tail(void *state, const void *src,
void *dst, unsigned int length);
asmlinkage void crypto_morus1280_sse2_final(void *state, void *tag_xor,
u64 assoclen, u64 cryptlen);
MORUS1280_DECLARE_ALGS(sse2, "morus1280-sse2", 350);
static const struct x86_cpu_id sse2_cpu_id[] = {
X86_FEATURE_MATCH(X86_FEATURE_XMM2),
{}
};
MODULE_DEVICE_TABLE(x86cpu, sse2_cpu_id);
static int __init crypto_morus1280_sse2_module_init(void)
{
if (!x86_match_cpu(sse2_cpu_id))
return -ENODEV;
return crypto_register_aeads(crypto_morus1280_sse2_algs,
ARRAY_SIZE(crypto_morus1280_sse2_algs));
}
static void __exit crypto_morus1280_sse2_module_exit(void)
{
crypto_unregister_aeads(crypto_morus1280_sse2_algs,
ARRAY_SIZE(crypto_morus1280_sse2_algs));
}
module_init(crypto_morus1280_sse2_module_init);
module_exit(crypto_morus1280_sse2_module_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Ondrej Mosnacek <omosnacek@gmail.com>");
MODULE_DESCRIPTION("MORUS-1280 AEAD algorithm -- SSE2 implementation");
MODULE_ALIAS_CRYPTO("morus1280");
MODULE_ALIAS_CRYPTO("morus1280-sse2");
......@@ -86,6 +86,11 @@ obj-$(CONFIG_CRYPTO_KEYWRAP) += keywrap.o
obj-$(CONFIG_CRYPTO_GCM) += gcm.o
obj-$(CONFIG_CRYPTO_CCM) += ccm.o
obj-$(CONFIG_CRYPTO_CHACHA20POLY1305) += chacha20poly1305.o
obj-$(CONFIG_CRYPTO_AEGIS128) += aegis128.o
obj-$(CONFIG_CRYPTO_AEGIS128L) += aegis128l.o
obj-$(CONFIG_CRYPTO_AEGIS256) += aegis256.o
obj-$(CONFIG_CRYPTO_MORUS640) += morus640.o
obj-$(CONFIG_CRYPTO_MORUS1280) += morus1280.o
obj-$(CONFIG_CRYPTO_PCRYPT) += pcrypt.o
obj-$(CONFIG_CRYPTO_CRYPTD) += cryptd.o
obj-$(CONFIG_CRYPTO_MCRYPTD) += mcryptd.o
......@@ -137,6 +142,7 @@ obj-$(CONFIG_CRYPTO_USER_API_HASH) += algif_hash.o
obj-$(CONFIG_CRYPTO_USER_API_SKCIPHER) += algif_skcipher.o
obj-$(CONFIG_CRYPTO_USER_API_RNG) += algif_rng.o
obj-$(CONFIG_CRYPTO_USER_API_AEAD) += algif_aead.o
obj-$(CONFIG_CRYPTO_ZSTD) += zstd.o
ecdh_generic-y := ecc.o
ecdh_generic-y += ecdh.o
......
......@@ -10,6 +10,7 @@
*
*/
#include <crypto/algapi.h>
#include <linux/err.h>
#include <linux/errno.h>
#include <linux/fips.h>
......@@ -59,6 +60,15 @@ static int crypto_check_alg(struct crypto_alg *alg)
if (alg->cra_blocksize > PAGE_SIZE / 8)
return -EINVAL;
if (!alg->cra_type && (alg->cra_flags & CRYPTO_ALG_TYPE_MASK) ==
CRYPTO_ALG_TYPE_CIPHER) {
if (alg->cra_alignmask > MAX_CIPHER_ALIGNMASK)
return -EINVAL;
if (alg->cra_blocksize > MAX_CIPHER_BLOCKSIZE)
return -EINVAL;
}
if (alg->cra_priority < 0)
return -EINVAL;
......
......@@ -108,6 +108,7 @@ static int crypto_authenc_setkey(struct crypto_aead *authenc, const u8 *key,
CRYPTO_TFM_RES_MASK);
out:
memzero_explicit(&keys, sizeof(keys));
return err;
badkey:
......
......@@ -90,6 +90,7 @@ static int crypto_authenc_esn_setkey(struct crypto_aead *authenc_esn, const u8 *
CRYPTO_TFM_RES_MASK);
out:
memzero_explicit(&keys, sizeof(keys));
return err;
badkey:
......