Commit 3e1a29b3 authored by Linus Torvalds

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto updates from Herbert Xu:
 "API:

   - Decryption test vectors are now automatically generated from
     encryption test vectors.

  Algorithms:

   - Fix unaligned access issues in crc32/crc32c.

   - Add zstd compression algorithm.

   - Add AEGIS.

   - Add MORUS.

  Drivers:

   - Add accelerated AEGIS/MORUS on x86.

   - Add accelerated SM4 on arm64.

   - Remove the x86 salsa20 assembly implementations, as they are slower
     than the generic C version.

   - Add authenc(hmac(sha*), cbc(aes)) support in inside-secure.

   - Add ctr(aes) support in crypto4xx.

   - Add hardware key support in ccree.

   - Add support for new Centaur CPU in via-rng"
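
For reference, the new AEGIS/MORUS AEADs are reached through the standard AEAD API, the same path used for gcm(aes). Below is a minimal sketch of what an in-kernel caller of the newly added "aegis128" algorithm could look like, assuming CRYPTO_AEGIS128 (or one of the accelerated x86 variants) is enabled; the function name, buffer sizes and the all-zero key/nonce are illustrative only and are not part of this merge.

#include <crypto/aead.h>
#include <linux/crypto.h>
#include <linux/err.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>

#define DEMO_PTLEN	64	/* plaintext bytes (made up for the example) */
#define DEMO_TAGLEN	16	/* AEGIS-128 authentication tag */

static int aegis128_demo_encrypt(void)
{
	u8 key[16] = { 0 };	/* all-zero key/nonce for illustration only */
	u8 iv[16] = { 0 };
	struct crypto_aead *tfm;
	struct aead_request *req;
	struct scatterlist sg;
	DECLARE_CRYPTO_WAIT(wait);
	u8 *buf;
	int err;

	tfm = crypto_alloc_aead("aegis128", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	buf = kzalloc(DEMO_PTLEN + DEMO_TAGLEN, GFP_KERNEL);
	if (!buf) {
		err = -ENOMEM;
		goto out_free_tfm;
	}

	err = crypto_aead_setkey(tfm, key, sizeof(key));
	if (err)
		goto out_free_buf;
	err = crypto_aead_setauthsize(tfm, DEMO_TAGLEN);
	if (err)
		goto out_free_buf;

	req = aead_request_alloc(tfm, GFP_KERNEL);
	if (!req) {
		err = -ENOMEM;
		goto out_free_buf;
	}

	/* in-place encryption: src and dst share one scatterlist entry */
	sg_init_one(&sg, buf, DEMO_PTLEN + DEMO_TAGLEN);
	aead_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG |
				       CRYPTO_TFM_REQ_MAY_SLEEP,
				  crypto_req_done, &wait);
	aead_request_set_ad(req, 0);			/* no associated data */
	aead_request_set_crypt(req, &sg, &sg, DEMO_PTLEN, iv);

	/* wait synchronously; async backends complete via crypto_req_done */
	err = crypto_wait_req(crypto_aead_encrypt(req), &wait);

	aead_request_free(req);
out_free_buf:
	kfree(buf);
out_free_tfm:
	crypto_free_aead(tfm);
	return err;
}

On encryption the 16-byte tag is appended right after the ciphertext in the destination scatterlist, which is why the buffer reserves DEMO_TAGLEN extra bytes.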

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (112 commits)
  crypto: chtls - free beyond end rspq_skb_cache
  crypto: chtls - kbuild warnings
  crypto: chtls - dereference null variable
  crypto: chtls - wait for memory sendmsg, sendpage
  crypto: chtls - key len correction
  crypto: salsa20 - Revert "crypto: salsa20 - export generic helpers"
  crypto: x86/salsa20 - remove x86 salsa20 implementations
  crypto: ccp - Add GET_ID SEV command
  crypto: ccp - Add DOWNLOAD_FIRMWARE SEV command
  crypto: qat - Add MODULE_FIRMWARE for all qat drivers
  crypto: ccree - silence debug prints
  crypto: ccree - better clock handling
  crypto: ccree - correct host regs offset
  crypto: chelsio - Remove separate buffer used for DMA map B0 block in CCM
  crypt: chelsio - Send IV as Immediate for cipher algo
  crypto: chelsio - Return -ENOSPC for transient busy indication.
  crypto: caam/qi - fix warning in init_cgr()
  crypto: caam - fix rfc4543 descriptors
  crypto: caam - fix MC firmware detection
  crypto: clarify licensing of OpenSSL asm code
  ...
#define __ARM_ARCH__ __LINUX_ARM_ARCH__
@ SPDX-License-Identifier: GPL-2.0
@ This code is taken from the OpenSSL project but the author (Andy Polyakov)
@ has relicensed it under the GPLv2. Therefore this program is free software;
@ you can redistribute it and/or modify it under the terms of the GNU General
@ Public License version 2 as published by the Free Software Foundation.
@
@ The original headers, including the original license headers, are
@ included below for completeness.
@ ====================================================================
@ Written by Andy Polyakov <appro@fy.chalmers.se> for the OpenSSL
@ project. The module is, however, dual licensed under OpenSSL and
......
#!/usr/bin/env perl
# SPDX-License-Identifier: GPL-2.0
# This code is taken from the OpenSSL project but the author (Andy Polyakov)
# has relicensed it under the GPLv2. Therefore this program is free software;
# you can redistribute it and/or modify it under the terms of the GNU General
# Public License version 2 as published by the Free Software Foundation.
#
# The original headers, including the original license headers, are
# included below for completeness.
# ====================================================================
# Written by Andy Polyakov <appro@openssl.org> for the OpenSSL
# project. The module is, however, dual licensed under OpenSSL and
# CRYPTOGAMS licenses depending on where you obtain it. For further
# details see http://www.openssl.org/~appro/cryptogams/.
#
# Permission to use under GPL terms is granted.
# ====================================================================
# SHA256 block procedure for ARMv4. May 2007.
......
@ SPDX-License-Identifier: GPL-2.0
@ This code is taken from the OpenSSL project but the author (Andy Polyakov)
@ has relicensed it under the GPLv2. Therefore this program is free software;
@ you can redistribute it and/or modify it under the terms of the GNU General
@ Public License version 2 as published by the Free Software Foundation.
@
@ The original headers, including the original license headers, are
@ included below for completeness.
@ ====================================================================
@ Written by Andy Polyakov <appro@openssl.org> for the OpenSSL
@ project. The module is, however, dual licensed under OpenSSL and
@ CRYPTOGAMS licenses depending on where you obtain it. For further
@ details see http://www.openssl.org/~appro/cryptogams/.
@
@ Permission to use under GPL terms is granted.
@ ====================================================================
@ SHA256 block procedure for ARMv4. May 2007.
......
#!/usr/bin/env perl
# SPDX-License-Identifier: GPL-2.0
# This code is taken from the OpenSSL project but the author (Andy Polyakov)
# has relicensed it under the GPLv2. Therefore this program is free software;
# you can redistribute it and/or modify it under the terms of the GNU General
# Public License version 2 as published by the Free Software Foundation.
#
# The original headers, including the original license headers, are
# included below for completeness.
# ====================================================================
# Written by Andy Polyakov <appro@openssl.org> for the OpenSSL
# project. The module is, however, dual licensed under OpenSSL and
# CRYPTOGAMS licenses depending on where you obtain it. For further
# details see http://www.openssl.org/~appro/cryptogams/.
#
# Permission to use under GPL terms is granted.
# ====================================================================
# SHA512 block procedure for ARMv4. September 2007.
......
@ SPDX-License-Identifier: GPL-2.0
@ This code is taken from the OpenSSL project but the author (Andy Polyakov)
@ has relicensed it under the GPLv2. Therefore this program is free software;
@ you can redistribute it and/or modify it under the terms of the GNU General
@ Public License version 2 as published by the Free Software Foundation.
@
@ The original headers, including the original license headers, are
@ included below for completeness.
@ ====================================================================
@ Written by Andy Polyakov <appro@openssl.org> for the OpenSSL
@ project. The module is, however, dual licensed under OpenSSL and
@ CRYPTOGAMS licenses depending on where you obtain it. For further
@ details see http://www.openssl.org/~appro/cryptogams/.
@
@ Permission to use under GPL terms is granted.
@ ====================================================================
@ SHA512 block procedure for ARMv4. September 2007.
......
@@ -47,6 +47,12 @@ config CRYPTO_SM3_ARM64_CE
	select CRYPTO_HASH
	select CRYPTO_SM3
config CRYPTO_SM4_ARM64_CE
tristate "SM4 symmetric cipher (ARMv8.2 Crypto Extensions)"
depends on KERNEL_MODE_NEON
select CRYPTO_ALGAPI
select CRYPTO_SM4
config CRYPTO_GHASH_ARM64_CE
	tristate "GHASH/AES-GCM using ARMv8 Crypto Extensions"
	depends on KERNEL_MODE_NEON
......
@@ -23,6 +23,9 @@ sha3-ce-y := sha3-ce-glue.o sha3-ce-core.o
obj-$(CONFIG_CRYPTO_SM3_ARM64_CE) += sm3-ce.o
sm3-ce-y := sm3-ce-glue.o sm3-ce-core.o
obj-$(CONFIG_CRYPTO_SM4_ARM64_CE) += sm4-ce.o
sm4-ce-y := sm4-ce-glue.o sm4-ce-core.o
obj-$(CONFIG_CRYPTO_GHASH_ARM64_CE) += ghash-ce.o
ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o
......
@@ -19,24 +19,33 @@
* u32 *macp, u8 const rk[], u32 rounds); * u32 *macp, u8 const rk[], u32 rounds);
*/ */
ENTRY(ce_aes_ccm_auth_data) ENTRY(ce_aes_ccm_auth_data)
ldr w8, [x3] /* leftover from prev round? */ frame_push 7
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x5
ldr w25, [x22] /* leftover from prev round? */
ld1 {v0.16b}, [x0] /* load mac */ ld1 {v0.16b}, [x0] /* load mac */
cbz w8, 1f cbz w25, 1f
sub w8, w8, #16 sub w25, w25, #16
eor v1.16b, v1.16b, v1.16b eor v1.16b, v1.16b, v1.16b
0: ldrb w7, [x1], #1 /* get 1 byte of input */ 0: ldrb w7, [x20], #1 /* get 1 byte of input */
subs w2, w2, #1 subs w21, w21, #1
add w8, w8, #1 add w25, w25, #1
ins v1.b[0], w7 ins v1.b[0], w7
ext v1.16b, v1.16b, v1.16b, #1 /* rotate in the input bytes */ ext v1.16b, v1.16b, v1.16b, #1 /* rotate in the input bytes */
beq 8f /* out of input? */ beq 8f /* out of input? */
cbnz w8, 0b cbnz w25, 0b
eor v0.16b, v0.16b, v1.16b eor v0.16b, v0.16b, v1.16b
1: ld1 {v3.4s}, [x4] /* load first round key */ 1: ld1 {v3.4s}, [x23] /* load first round key */
prfm pldl1strm, [x1] prfm pldl1strm, [x20]
cmp w5, #12 /* which key size? */ cmp w24, #12 /* which key size? */
add x6, x4, #16 add x6, x23, #16
sub w7, w5, #2 /* modified # of rounds */ sub w7, w24, #2 /* modified # of rounds */
bmi 2f bmi 2f
bne 5f bne 5f
mov v5.16b, v3.16b mov v5.16b, v3.16b
@@ -55,33 +64,43 @@ ENTRY(ce_aes_ccm_auth_data)
ld1 {v5.4s}, [x6], #16 /* load next round key */ ld1 {v5.4s}, [x6], #16 /* load next round key */
bpl 3b bpl 3b
aese v0.16b, v4.16b aese v0.16b, v4.16b
subs w2, w2, #16 /* last data? */ subs w21, w21, #16 /* last data? */
eor v0.16b, v0.16b, v5.16b /* final round */ eor v0.16b, v0.16b, v5.16b /* final round */
bmi 6f bmi 6f
ld1 {v1.16b}, [x1], #16 /* load next input block */ ld1 {v1.16b}, [x20], #16 /* load next input block */
eor v0.16b, v0.16b, v1.16b /* xor with mac */ eor v0.16b, v0.16b, v1.16b /* xor with mac */
bne 1b beq 6f
6: st1 {v0.16b}, [x0] /* store mac */
if_will_cond_yield_neon
st1 {v0.16b}, [x19] /* store mac */
do_cond_yield_neon
ld1 {v0.16b}, [x19] /* reload mac */
endif_yield_neon
b 1b
6: st1 {v0.16b}, [x19] /* store mac */
beq 10f beq 10f
adds w2, w2, #16 adds w21, w21, #16
beq 10f beq 10f
mov w8, w2 mov w25, w21
7: ldrb w7, [x1], #1 7: ldrb w7, [x20], #1
umov w6, v0.b[0] umov w6, v0.b[0]
eor w6, w6, w7 eor w6, w6, w7
strb w6, [x0], #1 strb w6, [x19], #1
subs w2, w2, #1 subs w21, w21, #1
beq 10f beq 10f
ext v0.16b, v0.16b, v0.16b, #1 /* rotate out the mac bytes */ ext v0.16b, v0.16b, v0.16b, #1 /* rotate out the mac bytes */
b 7b b 7b
8: mov w7, w8 8: mov w7, w25
add w8, w8, #16 add w25, w25, #16
9: ext v1.16b, v1.16b, v1.16b, #1 9: ext v1.16b, v1.16b, v1.16b, #1
adds w7, w7, #1 adds w7, w7, #1
bne 9b bne 9b
eor v0.16b, v0.16b, v1.16b eor v0.16b, v0.16b, v1.16b
st1 {v0.16b}, [x0] st1 {v0.16b}, [x19]
10: str w8, [x3] 10: str w25, [x22]
frame_pop
ret ret
ENDPROC(ce_aes_ccm_auth_data) ENDPROC(ce_aes_ccm_auth_data)
@@ -126,19 +145,29 @@ ENTRY(ce_aes_ccm_final)
ENDPROC(ce_aes_ccm_final) ENDPROC(ce_aes_ccm_final)
.macro aes_ccm_do_crypt,enc .macro aes_ccm_do_crypt,enc
ldr x8, [x6, #8] /* load lower ctr */ frame_push 8
ld1 {v0.16b}, [x5] /* load mac */
CPU_LE( rev x8, x8 ) /* keep swabbed ctr in reg */ mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x5
mov x25, x6
ldr x26, [x25, #8] /* load lower ctr */
ld1 {v0.16b}, [x24] /* load mac */
CPU_LE( rev x26, x26 ) /* keep swabbed ctr in reg */
0: /* outer loop */ 0: /* outer loop */
ld1 {v1.8b}, [x6] /* load upper ctr */ ld1 {v1.8b}, [x25] /* load upper ctr */
prfm pldl1strm, [x1] prfm pldl1strm, [x20]
add x8, x8, #1 add x26, x26, #1
rev x9, x8 rev x9, x26
cmp w4, #12 /* which key size? */ cmp w23, #12 /* which key size? */
sub w7, w4, #2 /* get modified # of rounds */ sub w7, w23, #2 /* get modified # of rounds */
ins v1.d[1], x9 /* no carry in lower ctr */ ins v1.d[1], x9 /* no carry in lower ctr */
ld1 {v3.4s}, [x3] /* load first round key */ ld1 {v3.4s}, [x22] /* load first round key */
add x10, x3, #16 add x10, x22, #16
bmi 1f bmi 1f
bne 4f bne 4f
mov v5.16b, v3.16b mov v5.16b, v3.16b
@@ -165,9 +194,9 @@ CPU_LE( rev x8, x8 ) /* keep swabbed ctr in reg */
bpl 2b bpl 2b
aese v0.16b, v4.16b aese v0.16b, v4.16b
aese v1.16b, v4.16b aese v1.16b, v4.16b
subs w2, w2, #16 subs w21, w21, #16
bmi 6f /* partial block? */ bmi 7f /* partial block? */
ld1 {v2.16b}, [x1], #16 /* load next input block */ ld1 {v2.16b}, [x20], #16 /* load next input block */
.if \enc == 1 .if \enc == 1
eor v2.16b, v2.16b, v5.16b /* final round enc+mac */ eor v2.16b, v2.16b, v5.16b /* final round enc+mac */
eor v1.16b, v1.16b, v2.16b /* xor with crypted ctr */ eor v1.16b, v1.16b, v2.16b /* xor with crypted ctr */
@@ -176,18 +205,29 @@ CPU_LE( rev x8, x8 ) /* keep swabbed ctr in reg */
eor v1.16b, v2.16b, v5.16b /* final round enc */ eor v1.16b, v2.16b, v5.16b /* final round enc */
.endif .endif
eor v0.16b, v0.16b, v2.16b /* xor mac with pt ^ rk[last] */ eor v0.16b, v0.16b, v2.16b /* xor mac with pt ^ rk[last] */
st1 {v1.16b}, [x0], #16 /* write output block */ st1 {v1.16b}, [x19], #16 /* write output block */
bne 0b beq 5f
CPU_LE( rev x8, x8 )
st1 {v0.16b}, [x5] /* store mac */ if_will_cond_yield_neon
str x8, [x6, #8] /* store lsb end of ctr (BE) */ st1 {v0.16b}, [x24] /* store mac */
5: ret do_cond_yield_neon
ld1 {v0.16b}, [x24] /* reload mac */
6: eor v0.16b, v0.16b, v5.16b /* final round mac */ endif_yield_neon
b 0b
5:
CPU_LE( rev x26, x26 )
st1 {v0.16b}, [x24] /* store mac */
str x26, [x25, #8] /* store lsb end of ctr (BE) */
6: frame_pop
ret
7: eor v0.16b, v0.16b, v5.16b /* final round mac */
eor v1.16b, v1.16b, v5.16b /* final round enc */ eor v1.16b, v1.16b, v5.16b /* final round enc */
st1 {v0.16b}, [x5] /* store mac */ st1 {v0.16b}, [x24] /* store mac */
add w2, w2, #16 /* process partial tail block */ add w21, w21, #16 /* process partial tail block */
7: ldrb w9, [x1], #1 /* get 1 byte of input */ 8: ldrb w9, [x20], #1 /* get 1 byte of input */
umov w6, v1.b[0] /* get top crypted ctr byte */ umov w6, v1.b[0] /* get top crypted ctr byte */
umov w7, v0.b[0] /* get top mac byte */ umov w7, v0.b[0] /* get top mac byte */
.if \enc == 1 .if \enc == 1
@@ -197,13 +237,13 @@ CPU_LE( rev x8, x8 )
eor w9, w9, w6 eor w9, w9, w6
eor w7, w7, w9 eor w7, w7, w9
.endif .endif
strb w9, [x0], #1 /* store out byte */ strb w9, [x19], #1 /* store out byte */
strb w7, [x5], #1 /* store mac byte */ strb w7, [x24], #1 /* store mac byte */
subs w2, w2, #1 subs w21, w21, #1
beq 5b beq 6b
ext v0.16b, v0.16b, v0.16b, #1 /* shift out mac byte */ ext v0.16b, v0.16b, v0.16b, #1 /* shift out mac byte */
ext v1.16b, v1.16b, v1.16b, #1 /* shift out ctr byte */ ext v1.16b, v1.16b, v1.16b, #1 /* shift out ctr byte */
b 7b b 8b
.endm .endm
/* /*
......
@@ -30,18 +30,21 @@
.endm .endm
/* prepare for encryption with key in rk[] */ /* prepare for encryption with key in rk[] */
.macro enc_prepare, rounds, rk, ignore .macro enc_prepare, rounds, rk, temp
load_round_keys \rounds, \rk mov \temp, \rk
load_round_keys \rounds, \temp
.endm .endm
/* prepare for encryption (again) but with new key in rk[] */ /* prepare for encryption (again) but with new key in rk[] */
.macro enc_switch_key, rounds, rk, ignore .macro enc_switch_key, rounds, rk, temp
load_round_keys \rounds, \rk mov \temp, \rk
load_round_keys \rounds, \temp
.endm .endm
/* prepare for decryption with key in rk[] */ /* prepare for decryption with key in rk[] */
.macro dec_prepare, rounds, rk, ignore .macro dec_prepare, rounds, rk, temp
load_round_keys \rounds, \rk mov \temp, \rk
load_round_keys \rounds, \temp
.endm .endm
.macro do_enc_Nx, de, mc, k, i0, i1, i2, i3 .macro do_enc_Nx, de, mc, k, i0, i1, i2, i3
......
@@ -14,12 +14,12 @@
.align 4 .align 4
aes_encrypt_block4x: aes_encrypt_block4x:
encrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 encrypt_block4x v0, v1, v2, v3, w22, x21, x8, w7
ret ret
ENDPROC(aes_encrypt_block4x) ENDPROC(aes_encrypt_block4x)
aes_decrypt_block4x: aes_decrypt_block4x:
decrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 decrypt_block4x v0, v1, v2, v3, w22, x21, x8, w7
ret ret
ENDPROC(aes_decrypt_block4x) ENDPROC(aes_decrypt_block4x)
@@ -31,57 +31,71 @@ ENDPROC(aes_decrypt_block4x)
*/ */
AES_ENTRY(aes_ecb_encrypt) AES_ENTRY(aes_ecb_encrypt)
stp x29, x30, [sp, #-16]! frame_push 5
mov x29, sp
enc_prepare w3, x2, x5 mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
.Lecbencrestart:
enc_prepare w22, x21, x5
.LecbencloopNx: .LecbencloopNx:
subs w4, w4, #4 subs w23, w23, #4
bmi .Lecbenc1x bmi .Lecbenc1x
ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */ ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 pt blocks */
bl aes_encrypt_block4x bl aes_encrypt_block4x
st1 {v0.16b-v3.16b}, [x0], #64 st1 {v0.16b-v3.16b}, [x19], #64
cond_yield_neon .Lecbencrestart
b .LecbencloopNx b .LecbencloopNx
.Lecbenc1x: .Lecbenc1x:
adds w4, w4, #4 adds w23, w23, #4
beq .Lecbencout beq .Lecbencout
.Lecbencloop: .Lecbencloop:
ld1 {v0.16b}, [x1], #16 /* get next pt block */ ld1 {v0.16b}, [x20], #16 /* get next pt block */
encrypt_block v0, w3, x2, x5, w6 encrypt_block v0, w22, x21, x5, w6
st1 {v0.16b}, [x0], #16 st1 {v0.16b}, [x19], #16
subs w4, w4, #1 subs w23, w23, #1
bne .Lecbencloop bne .Lecbencloop
.Lecbencout: .Lecbencout:
ldp x29, x30, [sp], #16 frame_pop
ret ret
AES_ENDPROC(aes_ecb_encrypt) AES_ENDPROC(aes_ecb_encrypt)
AES_ENTRY(aes_ecb_decrypt) AES_ENTRY(aes_ecb_decrypt)
stp x29, x30, [sp, #-16]! frame_push 5
mov x29, sp
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
dec_prepare w3, x2, x5 .Lecbdecrestart:
dec_prepare w22, x21, x5
.LecbdecloopNx: .LecbdecloopNx:
subs w4, w4, #4 subs w23, w23, #4
bmi .Lecbdec1x bmi .Lecbdec1x
ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */ ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 ct blocks */
bl aes_decrypt_block4x bl aes_decrypt_block4x
st1 {v0.16b-v3.16b}, [x0], #64 st1 {v0.16b-v3.16b}, [x19], #64
cond_yield_neon .Lecbdecrestart
b .LecbdecloopNx b .LecbdecloopNx
.Lecbdec1x: .Lecbdec1x:
adds w4, w4, #4 adds w23, w23, #4
beq .Lecbdecout beq .Lecbdecout
.Lecbdecloop: .Lecbdecloop:
ld1 {v0.16b}, [x1], #16 /* get next ct block */ ld1 {v0.16b}, [x20], #16 /* get next ct block */
decrypt_block v0, w3, x2, x5, w6 decrypt_block v0, w22, x21, x5, w6
st1 {v0.16b}, [x0], #16 st1 {v0.16b}, [x19], #16
subs w4, w4, #1 subs w23, w23, #1
bne .Lecbdecloop bne .Lecbdecloop
.Lecbdecout: .Lecbdecout:
ldp x29, x30, [sp], #16 frame_pop
ret ret
AES_ENDPROC(aes_ecb_decrypt) AES_ENDPROC(aes_ecb_decrypt)
@@ -94,78 +108,100 @@ AES_ENDPROC(aes_ecb_decrypt)
*/ */
AES_ENTRY(aes_cbc_encrypt) AES_ENTRY(aes_cbc_encrypt)
ld1 {v4.16b}, [x5] /* get iv */ frame_push 6
enc_prepare w3, x2, x6
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x5
.Lcbcencrestart:
ld1 {v4.16b}, [x24] /* get iv */
enc_prepare w22, x21, x6
.Lcbcencloop4x: .Lcbcencloop4x:
subs w4, w4, #4 subs w23, w23, #4
bmi .Lcbcenc1x bmi .Lcbcenc1x
ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */ ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 pt blocks */
eor v0.16b, v0.16b, v4.16b /* ..and xor with iv */ eor v0.16b, v0.16b, v4.16b /* ..and xor with iv */
encrypt_block v0, w3, x2, x6, w7 encrypt_block v0, w22, x21, x6, w7
eor v1.16b, v1.16b, v0.16b eor v1.16b, v1.16b, v0.16b
encrypt_block v1, w3, x2, x6, w7 encrypt_block v1, w22, x21, x6, w7
eor v2.16b, v2.16b, v1.16b eor v2.16b, v2.16b, v1.16b
encrypt_block v2, w3, x2, x6, w7 encrypt_block v2, w22, x21, x6, w7
eor v3.16b, v3.16b, v2.16b eor v3.16b, v3.16b, v2.16b
encrypt_block v3, w3, x2, x6, w7 encrypt_block v3, w22, x21, x6, w7
st1 {v0.16b-v3.16b}, [x0], #64 st1 {v0.16b-v3.16b}, [x19], #64
mov v4.16b, v3.16b mov v4.16b, v3.16b
st1 {v4.16b}, [x24] /* return iv */
cond_yield_neon .Lcbcencrestart
b .Lcbcencloop4x b .Lcbcencloop4x
.Lcbcenc1x: .Lcbcenc1x:
adds w4, w4, #4 adds w23, w23, #4
beq .Lcbcencout beq .Lcbcencout
.Lcbcencloop: .Lcbcencloop:
ld1 {v0.16b}, [x1], #16 /* get next pt block */ ld1 {v0.16b}, [x20], #16 /* get next pt block */
eor v4.16b, v4.16b, v0.16b /* ..and xor with iv */ eor v4.16b, v4.16b, v0.16b /* ..and xor with iv */
encrypt_block v4, w3, x2, x6, w7 encrypt_block v4, w22, x21, x6, w7
st1 {v4.16b}, [x0], #16 st1 {v4.16b}, [x19], #16
subs w4, w4, #1 subs w23, w23, #1
bne .Lcbcencloop bne .Lcbcencloop
.Lcbcencout: .Lcbcencout:
st1 {v4.16b}, [x5] /* return iv */ st1 {v4.16b}, [x24] /* return iv */
frame_pop
ret ret
AES_ENDPROC(aes_cbc_encrypt) AES_ENDPROC(aes_cbc_encrypt)
AES_ENTRY(aes_cbc_decrypt) AES_ENTRY(aes_cbc_decrypt)
stp x29, x30, [sp, #-16]! frame_push 6
mov x29, sp
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x5
ld1 {v7.16b}, [x5] /* get iv */ .Lcbcdecrestart:
dec_prepare w3, x2, x6 ld1 {v7.16b}, [x24] /* get iv */
dec_prepare w22, x21, x6
.LcbcdecloopNx: .LcbcdecloopNx:
subs w4, w4, #4 subs w23, w23, #4
bmi .Lcbcdec1x bmi .Lcbcdec1x
ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */ ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 ct blocks */
mov v4.16b, v0.16b mov v4.16b, v0.16b
mov v5.16b, v1.16b mov v5.16b, v1.16b
mov v6.16b, v2.16b mov v6.16b, v2.16b
bl aes_decrypt_block4x bl aes_decrypt_block4x
sub x1, x1, #16 sub x20, x20, #16
eor v0.16b, v0.16b, v7.16b eor v0.16b, v0.16b, v7.16b
eor v1.16b, v1.16b, v4.16b eor v1.16b, v1.16b, v4.16b
ld1 {v7.16b}, [x1], #16 /* reload 1 ct block */ ld1 {v7.16b}, [x20], #16 /* reload 1 ct block */
eor v2.16b, v2.16b, v5.16b eor v2.16b, v2.16b, v5.16b
eor v3.16b, v3.16b, v6.16b eor v3.16b, v3.16b, v6.16b
st1 {v0.16b-v3.16b}, [x0], #64 st1 {v0.16b-v3.16b}, [x19], #64
st1 {v7.16b}, [x24] /* return iv */
cond_yield_neon .Lcbcdecrestart
b .LcbcdecloopNx b .LcbcdecloopNx
.Lcbcdec1x: .Lcbcdec1x:
adds w4, w4, #4 adds w23, w23, #4
beq .Lcbcdecout beq .Lcbcdecout
.Lcbcdecloop: .Lcbcdecloop:
ld1 {v1.16b}, [x1], #16 /* get next ct block */ ld1 {v1.16b}, [x20], #16 /* get next ct block */
mov v0.16b, v1.16b /* ...and copy to v0 */ mov v0.16b, v1.16b /* ...and copy to v0 */
decrypt_block v0, w3, x2, x6, w7 decrypt_block v0, w22, x21, x6, w7
eor v0.16b, v0.16b, v7.16b /* xor with iv => pt */ eor v0.16b, v0.16b, v7.16b /* xor with iv => pt */
mov v7.16b, v1.16b /* ct is next iv */ mov v7.16b, v1.16b /* ct is next iv */
st1 {v0.16b}, [x0], #16 st1 {v0.16b}, [x19], #16
subs w4, w4, #1 subs w23, w23, #1
bne .Lcbcdecloop bne .Lcbcdecloop
.Lcbcdecout: .Lcbcdecout:
st1 {v7.16b}, [x5] /* return iv */ st1 {v7.16b}, [x24] /* return iv */
ldp x29, x30, [sp], #16 frame_pop
ret ret
AES_ENDPROC(aes_cbc_decrypt) AES_ENDPROC(aes_cbc_decrypt)
@@ -176,19 +212,26 @@ AES_ENDPROC(aes_cbc_decrypt)
*/ */
AES_ENTRY(aes_ctr_encrypt) AES_ENTRY(aes_ctr_encrypt)
stp x29, x30, [sp, #-16]! frame_push 6
mov x29, sp
enc_prepare w3, x2, x6 mov x19, x0
ld1 {v4.16b}, [x5] mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x5
.Lctrrestart:
enc_prepare w22, x21, x6
ld1 {v4.16b}, [x24]
umov x6, v4.d[1] /* keep swabbed ctr in reg */ umov x6, v4.d[1] /* keep swabbed ctr in reg */
rev x6, x6 rev x6, x6
cmn w6, w4 /* 32 bit overflow? */
bcs .Lctrloop
.LctrloopNx: .LctrloopNx:
subs w4, w4, #4 subs w23, w23, #4
bmi .Lctr1x bmi .Lctr1x
cmn w6, #4 /* 32 bit overflow? */
bcs .Lctr1x
ldr q8, =0x30000000200000001 /* addends 1,2,3[,0] */ ldr q8, =0x30000000200000001 /* addends 1,2,3[,0] */
dup v7.4s, w6 dup v7.4s, w6
mov v0.16b, v4.16b mov v0.16b, v4.16b
@@ -200,25 +243,27 @@ AES_ENTRY(aes_ctr_encrypt)
mov v1.s[3], v8.s[0] mov v1.s[3], v8.s[0]
mov v2.s[3], v8.s[1] mov v2.s[3], v8.s[1]
mov v3.s[3], v8.s[2] mov v3.s[3], v8.s[2]
ld1 {v5.16b-v7.16b}, [x1], #48 /* get 3 input blocks */ ld1 {v5.16b-v7.16b}, [x20], #48 /* get 3 input blocks */
bl aes_encrypt_block4x bl aes_encrypt_block4x
eor v0.16b, v5.16b, v0.16b eor v0.16b, v5.16b, v0.16b
ld1 {v5.16b}, [x1], #16 /* get 1 input block */ ld1 {v5.16b}, [x20], #16 /* get 1 input block */
eor v1.16b, v6.16b, v1.16b eor v1.16b, v6.16b, v1.16b
eor v2.16b, v7.16b, v2.16b eor v2.16b, v7.16b, v2.16b
eor v3.16b, v5.16b, v3.16b eor v3.16b, v5.16b, v3.16b
st1 {v0.16b-v3.16b}, [x0], #64 st1 {v0.16b-v3.16b}, [x19], #64
add x6, x6, #4 add x6, x6, #4
rev x7, x6 rev x7, x6
ins v4.d[1], x7 ins v4.d[1], x7
cbz w4, .Lctrout cbz w23, .Lctrout
st1 {v4.16b}, [x24] /* return next CTR value */
cond_yield_neon .Lctrrestart
b .LctrloopNx b .LctrloopNx
.Lctr1x: .Lctr1x:
adds w4, w4, #4 adds w23, w23, #4
beq .Lctrout beq .Lctrout
.Lctrloop: .Lctrloop:
mov v0.16b, v4.16b mov v0.16b, v4.16b
encrypt_block v0, w3, x2, x8, w7 encrypt_block v0, w22, x21, x8, w7
adds x6, x6, #1 /* increment BE ctr */ adds x6, x6, #1 /* increment BE ctr */
rev x7, x6 rev x7, x6
@@ -226,22 +271,22 @@ AES_ENTRY(aes_ctr_encrypt)
bcs .Lctrcarry /* overflow? */ bcs .Lctrcarry /* overflow? */
.Lctrcarrydone: .Lctrcarrydone:
subs w4, w4, #1 subs w23, w23, #1
bmi .Lctrtailblock /* blocks <0 means tail block */ bmi .Lctrtailblock /* blocks <0 means tail block */
ld1 {v3.16b}, [x1], #16 ld1 {v3.16b}, [x20], #16
eor v3.16b, v0.16b, v3.16b eor v3.16b, v0.16b, v3.16b
st1 {v3.16b}, [x0], #16 st1 {v3.16b}, [x19], #16
bne .Lctrloop bne .Lctrloop
.Lctrout: .Lctrout:
st1 {v4.16b}, [x5] /* return next CTR value */ st1 {v4.16b}, [x24] /* return next CTR value */
ldp x29, x30, [sp], #16 .Lctrret:
frame_pop
ret ret
.Lctrtailblock: .Lctrtailblock:
st1 {v0.16b}, [x0] st1 {v0.16b}, [x19]
ldp x29, x30, [sp], #16 b .Lctrret
ret
.Lctrcarry: .Lctrcarry:
umov x7, v4.d[0] /* load upper word of ctr */ umov x7, v4.d[0] /* load upper word of ctr */
@@ -274,10 +319,16 @@ CPU_LE( .quad 1, 0x87 )
CPU_BE( .quad 0x87, 1 ) CPU_BE( .quad 0x87, 1 )
AES_ENTRY(aes_xts_encrypt) AES_ENTRY(aes_xts_encrypt)
stp x29, x30, [sp, #-16]! frame_push 6
mov x29, sp
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x6
ld1 {v4.16b}, [x6] ld1 {v4.16b}, [x24]
cbz w7, .Lxtsencnotfirst cbz w7, .Lxtsencnotfirst
enc_prepare w3, x5, x8 enc_prepare w3, x5, x8
@@ -286,15 +337,17 @@ AES_ENTRY(aes_xts_encrypt)
ldr q7, .Lxts_mul_x ldr q7, .Lxts_mul_x
b .LxtsencNx b .LxtsencNx
.Lxtsencrestart:
ld1 {v4.16b}, [x24]
.Lxtsencnotfirst: .Lxtsencnotfirst:
enc_prepare w3, x2, x8 enc_prepare w22, x21, x8
.LxtsencloopNx: .LxtsencloopNx:
ldr q7, .Lxts_mul_x ldr q7, .Lxts_mul_x
next_tweak v4, v4, v7, v8 next_tweak v4, v4, v7, v8
.LxtsencNx: .LxtsencNx:
subs w4, w4, #4 subs w23, w23, #4
bmi .Lxtsenc1x bmi .Lxtsenc1x
ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */ ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 pt blocks */
next_tweak v5, v4, v7, v8 next_tweak v5, v4, v7, v8
eor v0.16b, v0.16b, v4.16b eor v0.16b, v0.16b, v4.16b
next_tweak v6, v5, v7, v8 next_tweak v6, v5, v7, v8
@@ -307,35 +360,43 @@ AES_ENTRY(aes_xts_encrypt)
eor v0.16b, v0.16b, v4.16b eor v0.16b, v0.16b, v4.16b
eor v1.16b, v1.16b, v5.16b eor v1.16b, v1.16b, v5.16b
eor v2.16b, v2.16b, v6.16b eor v2.16b, v2.16b, v6.16b
st1 {v0.16b-v3.16b}, [x0], #64 st1 {v0.16b-v3.16b}, [x19], #64
mov v4.16b, v7.16b mov v4.16b, v7.16b
cbz w4, .Lxtsencout cbz w23, .Lxtsencout
st1 {v4.16b}, [x24]
cond_yield_neon .Lxtsencrestart
b .LxtsencloopNx b .LxtsencloopNx
.Lxtsenc1x: .Lxtsenc1x:
adds w4, w4, #4 adds w23, w23, #4
beq .Lxtsencout beq .Lxtsencout
.Lxtsencloop: .Lxtsencloop:
ld1 {v1.16b}, [x1], #16 ld1 {v1.16b}, [x20], #16
eor v0.16b, v1.16b, v4.16b eor v0.16b, v1.16b, v4.16b
encrypt_block v0, w3, x2, x8, w7 encrypt_block v0, w22, x21, x8, w7
eor v0.16b, v0.16b, v4.16b eor v0.16b, v0.16b, v4.16b
st1 {v0.16b}, [x0], #16 st1 {v0.16b}, [x19], #16
subs w4, w4, #1 subs w23, w23, #1
beq .Lxtsencout beq .Lxtsencout
next_tweak v4, v4, v7, v8 next_tweak v4, v4, v7, v8
b .Lxtsencloop b .Lxtsencloop
.Lxtsencout: .Lxtsencout:
st1 {v4.16b}, [x6] st1 {v4.16b}, [x24]
ldp x29, x30, [sp], #16 frame_pop
ret ret
AES_ENDPROC(aes_xts_encrypt) AES_ENDPROC(aes_xts_encrypt)
AES_ENTRY(aes_xts_decrypt) AES_ENTRY(aes_xts_decrypt)
stp x29, x30, [sp, #-16]! frame_push 6
mov x29, sp
ld1 {v4.16b}, [x6] mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x6
ld1 {v4.16b}, [x24]
cbz w7, .Lxtsdecnotfirst cbz w7, .Lxtsdecnotfirst
enc_prepare w3, x5, x8 enc_prepare w3, x5, x8
@@ -344,15 +405,17 @@ AES_ENTRY(aes_xts_decrypt)
ldr q7, .Lxts_mul_x ldr q7, .Lxts_mul_x
b .LxtsdecNx b .LxtsdecNx
.Lxtsdecrestart:
ld1 {v4.16b}, [x24]
.Lxtsdecnotfirst: .Lxtsdecnotfirst:
dec_prepare w3, x2, x8 dec_prepare w22, x21, x8
.LxtsdecloopNx: .LxtsdecloopNx:
ldr q7, .Lxts_mul_x ldr q7, .Lxts_mul_x
next_tweak v4, v4, v7, v8 next_tweak v4, v4, v7, v8
.LxtsdecNx: .LxtsdecNx:
subs w4, w4, #4 subs w23, w23, #4
bmi .Lxtsdec1x bmi .Lxtsdec1x
ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */ ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 ct blocks */
next_tweak v5, v4, v7, v8 next_tweak v5, v4, v7, v8
eor v0.16b, v0.16b, v4.16b eor v0.16b, v0.16b, v4.16b
next_tweak v6, v5, v7, v8 next_tweak v6, v5, v7, v8
@@ -365,26 +428,28 @@ AES_ENTRY(aes_xts_decrypt)
eor v0.16b, v0.16b, v4.16b eor v0.16b, v0.16b, v4.16b
eor v1.16b, v1.16b, v5.16b eor v1.16b, v1.16b, v5.16b
eor v2.16b, v2.16b, v6.16b eor v2.16b, v2.16b, v6.16b
st1 {v0.16b-v3.16b}, [x0], #64 st1 {v0.16b-v3.16b}, [x19], #64
mov v4.16b, v7.16b mov v4.16b, v7.16b
cbz w4, .Lxtsdecout cbz w23, .Lxtsdecout
st1 {v4.16b}, [x24]
cond_yield_neon .Lxtsdecrestart
b .LxtsdecloopNx b .LxtsdecloopNx
.Lxtsdec1x: .Lxtsdec1x:
adds w4, w4, #4 adds w23, w23, #4
beq .Lxtsdecout beq .Lxtsdecout
.Lxtsdecloop: .Lxtsdecloop:
ld1 {v1.16b}, [x1], #16 ld1 {v1.16b}, [x20], #16
eor v0.16b, v1.16b, v4.16b eor v0.16b, v1.16b, v4.16b
decrypt_block v0, w3, x2, x8, w7 decrypt_block v0, w22, x21, x8, w7
eor v0.16b, v0.16b, v4.16b eor v0.16b, v0.16b, v4.16b
st1 {v0.16b}, [x0], #16 st1 {v0.16b}, [x19], #16
subs w4, w4, #1 subs w23, w23, #1
beq .Lxtsdecout beq .Lxtsdecout
next_tweak v4, v4, v7, v8 next_tweak v4, v4, v7, v8
b .Lxtsdecloop b .Lxtsdecloop
.Lxtsdecout: .Lxtsdecout:
st1 {v4.16b}, [x6] st1 {v4.16b}, [x24]
ldp x29, x30, [sp], #16 frame_pop
ret ret
AES_ENDPROC(aes_xts_decrypt) AES_ENDPROC(aes_xts_decrypt)
@@ -393,43 +458,61 @@ AES_ENDPROC(aes_xts_decrypt)
* int blocks, u8 dg[], int enc_before, int enc_after) * int blocks, u8 dg[], int enc_before, int enc_after)
*/ */
AES_ENTRY(aes_mac_update) AES_ENTRY(aes_mac_update)
ld1 {v0.16b}, [x4] /* get dg */ frame_push 6
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x6
ld1 {v0.16b}, [x23] /* get dg */
enc_prepare w2, x1, x7 enc_prepare w2, x1, x7
cbz w5, .Lmacloop4x cbz w5, .Lmacloop4x
encrypt_block v0, w2, x1, x7, w8 encrypt_block v0, w2, x1, x7, w8
.Lmacloop4x: .Lmacloop4x:
subs w3, w3, #4 subs w22, w22, #4
bmi .Lmac1x bmi .Lmac1x
ld1 {v1.16b-v4.16b}, [x0], #64 /* get next pt block */ ld1 {v1.16b-v4.16b}, [x19], #64 /* get next pt block */
eor v0.16b, v0.16b, v1.16b /* ..and xor with dg */ eor v0.16b, v0.16b, v1.16b /* ..and xor with dg */
encrypt_block v0, w2, x1, x7, w8 encrypt_block v0, w21, x20, x7, w8
eor v0.16b, v0.16b, v2.16b eor v0.16b, v0.16b, v2.16b
encrypt_block v0, w2, x1, x7, w8 encrypt_block v0, w21, x20, x7, w8
eor v0.16b, v0.16b, v3.16b eor v0.16b, v0.16b, v3.16b
encrypt_block v0, w2, x1, x7, w8 encrypt_block v0, w21, x20, x7, w8
eor v0.16b, v0.16b, v4.16b eor v0.16b, v0.16b, v4.16b
cmp w3, wzr cmp w22, wzr
csinv x5, x6, xzr, eq csinv x5, x24, xzr, eq
cbz w5, .Lmacout cbz w5, .Lmacout
encrypt_block v0, w2, x1, x7, w8 encrypt_block v0, w21, x20, x7, w8
st1 {v0.16b}, [x23] /* return dg */
cond_yield_neon .Lmacrestart
b .Lmacloop4x b .Lmacloop4x
.Lmac1x: .Lmac1x:
add w3, w3, #4 add w22, w22, #4
.Lmacloop: .Lmacloop:
cbz w3, .Lmacout cbz w22, .Lmacout
ld1 {v1.16b}, [x0], #16 /* get next pt block */ ld1 {v1.16b}, [x19], #16 /* get next pt block */
eor v0.16b, v0.16b, v1.16b /* ..and xor with dg */ eor v0.16b, v0.16b, v1.16b /* ..and xor with dg */
subs w3, w3, #1 subs w22, w22, #1
csinv x5, x6, xzr, eq csinv x5, x24, xzr, eq
cbz w5, .Lmacout cbz w5, .Lmacout
encrypt_block v0, w2, x1, x7, w8 .Lmacenc:
encrypt_block v0, w21, x20, x7, w8
b .Lmacloop b .Lmacloop
.Lmacout: .Lmacout:
st1 {v0.16b}, [x4] /* return dg */ st1 {v0.16b}, [x23] /* return dg */
frame_pop
ret ret
.Lmacrestart:
ld1 {v0.16b}, [x23] /* get dg */
enc_prepare w21, x20, x0
b .Lmacloop4x
AES_ENDPROC(aes_mac_update) AES_ENDPROC(aes_mac_update)
@@ -565,54 +565,61 @@ ENDPROC(aesbs_decrypt8)
* int blocks) * int blocks)
*/ */
.macro __ecb_crypt, do8, o0, o1, o2, o3, o4, o5, o6, o7 .macro __ecb_crypt, do8, o0, o1, o2, o3, o4, o5, o6, o7
stp x29, x30, [sp, #-16]! frame_push 5
mov x29, sp
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
99: mov x5, #1 99: mov x5, #1
lsl x5, x5, x4 lsl x5, x5, x23
subs w4, w4, #8 subs w23, w23, #8
csel x4, x4, xzr, pl csel x23, x23, xzr, pl
csel x5, x5, xzr, mi csel x5, x5, xzr, mi
ld1 {v0.16b}, [x1], #16 ld1 {v0.16b}, [x20], #16
tbnz x5, #1, 0f tbnz x5, #1, 0f
ld1 {v1.16b}, [x1], #16 ld1 {v1.16b}, [x20], #16
tbnz x5, #2, 0f tbnz x5, #2, 0f
ld1 {v2.16b}, [x1], #16 ld1 {v2.16b}, [x20], #16
tbnz x5, #3, 0f tbnz x5, #3, 0f
ld1 {v3.16b}, [x1], #16 ld1 {v3.16b}, [x20], #16
tbnz x5, #4, 0f tbnz x5, #4, 0f
ld1 {v4.16b}, [x1], #16 ld1 {v4.16b}, [x20], #16
tbnz x5, #5, 0f tbnz x5, #5, 0f
ld1 {v5.16b}, [x1], #16 ld1 {v5.16b}, [x20], #16
tbnz x5, #6, 0f tbnz x5, #6, 0f
ld1 {v6.16b}, [x1], #16 ld1 {v6.16b}, [x20], #16
tbnz x5, #7, 0f tbnz x5, #7, 0f
ld1 {v7.16b}, [x1], #16 ld1 {v7.16b}, [x20], #16
0: mov bskey, x2 0: mov bskey, x21
mov rounds, x3 mov rounds, x22
bl \do8 bl \do8
st1 {\o0\().16b}, [x0], #16 st1 {\o0\().16b}, [x19], #16
tbnz x5, #1, 1f tbnz x5, #1, 1f
st1 {\o1\().16b}, [x0], #16 st1 {\o1\().16b}, [x19], #16
tbnz x5, #2, 1f tbnz x5, #2, 1f
st1 {\o2\().16b}, [x0], #16 st1 {\o2\().16b}, [x19], #16
tbnz x5, #3, 1f tbnz x5, #3, 1f
st1 {\o3\().16b}, [x0], #16 st1 {\o3\().16b}, [x19], #16
tbnz x5, #4, 1f tbnz x5, #4, 1f
st1 {\o4\().16b}, [x0], #16 st1 {\o4\().16b}, [x19], #16
tbnz x5, #5, 1f tbnz x5, #5, 1f
st1 {\o5\().16b}, [x0], #16 st1 {\o5\().16b}, [x19], #16
tbnz x5, #6, 1f tbnz x5, #6, 1f
st1 {\o6\().16b}, [x0], #16 st1 {\o6\().16b}, [x19], #16
tbnz x5, #7, 1f tbnz x5, #7, 1f
st1 {\o7\().16b}, [x0], #16 st1 {\o7\().16b}, [x19], #16
cbnz x4, 99b cbz x23, 1f
cond_yield_neon
b 99b
1: ldp x29, x30, [sp], #16 1: frame_pop
ret ret
.endm .endm
@@ -632,43 +639,49 @@ ENDPROC(aesbs_ecb_decrypt)
*/ */
.align 4 .align 4
ENTRY(aesbs_cbc_decrypt) ENTRY(aesbs_cbc_decrypt)
stp x29, x30, [sp, #-16]! frame_push 6
mov x29, sp
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x5
99: mov x6, #1 99: mov x6, #1
lsl x6, x6, x4 lsl x6, x6, x23
subs w4, w4, #8 subs w23, w23, #8
csel x4, x4, xzr, pl csel x23, x23, xzr, pl
csel x6, x6, xzr, mi csel x6, x6, xzr, mi
ld1 {v0.16b}, [x1], #16 ld1 {v0.16b}, [x20], #16
mov v25.16b, v0.16b mov v25.16b, v0.16b
tbnz x6, #1, 0f tbnz x6, #1, 0f
ld1 {v1.16b}, [x1], #16 ld1 {v1.16b}, [x20], #16
mov v26.16b, v1.16b mov v26.16b, v1.16b
tbnz x6, #2, 0f tbnz x6, #2, 0f
ld1 {v2.16b}, [x1], #16 ld1 {v2.16b}, [x20], #16
mov v27.16b, v2.16b mov v27.16b, v2.16b
tbnz x6, #3, 0f tbnz x6, #3, 0f
ld1 {v3.16b}, [x1], #16 ld1 {v3.16b}, [x20], #16
mov v28.16b, v3.16b mov v28.16b, v3.16b
tbnz x6, #4, 0f tbnz x6, #4, 0f
ld1 {v4.16b}, [x1], #16 ld1 {v4.16b}, [x20], #16
mov v29.16b, v4.16b mov v29.16b, v4.16b
tbnz x6, #5, 0f tbnz x6, #5, 0f
ld1 {v5.16b}, [x1], #16 ld1 {v5.16b}, [x20], #16
mov v30.16b, v5.16b mov v30.16b, v5.16b
tbnz x6, #6, 0f tbnz x6, #6, 0f
ld1 {v6.16b}, [x1], #16 ld1 {v6.16b}, [x20], #16
mov v31.16b, v6.16b mov v31.16b, v6.16b
tbnz x6, #7, 0f tbnz x6, #7, 0f
ld1 {v7.16b}, [x1] ld1 {v7.16b}, [x20]
0: mov bskey, x2 0: mov bskey, x21
mov rounds, x3 mov rounds, x22
bl aesbs_decrypt8 bl aesbs_decrypt8
ld1 {v24.16b}, [x5] // load IV ld1 {v24.16b}, [x24] // load IV
eor v1.16b, v1.16b, v25.16b eor v1.16b, v1.16b, v25.16b
eor v6.16b, v6.16b, v26.16b eor v6.16b, v6.16b, v26.16b
@@ -679,34 +692,36 @@ ENTRY(aesbs_cbc_decrypt)
eor v3.16b, v3.16b, v30.16b eor v3.16b, v3.16b, v30.16b
eor v5.16b, v5.16b, v31.16b eor v5.16b, v5.16b, v31.16b
st1 {v0.16b}, [x0], #16 st1 {v0.16b}, [x19], #16
mov v24.16b, v25.16b mov v24.16b, v25.16b
tbnz x6, #1, 1f tbnz x6, #1, 1f
st1 {v1.16b}, [x0], #16 st1 {v1.16b}, [x19], #16
mov v24.16b, v26.16b mov v24.16b, v26.16b
tbnz x6, #2, 1f tbnz x6, #2, 1f
st1 {v6.16b}, [x0], #16 st1 {v6.16b}, [x19], #16
mov v24.16b, v27.16b mov v24.16b, v27.16b
tbnz x6, #3, 1f tbnz x6, #3, 1f
st1 {v4.16b}, [x0], #16 st1 {v4.16b}, [x19], #16
mov v24.16b, v28.16b mov v24.16b, v28.16b
tbnz x6, #4, 1f tbnz x6, #4, 1f
st1 {v2.16b}, [x0], #16 st1 {v2.16b}, [x19], #16
mov v24.16b, v29.16b mov v24.16b, v29.16b
tbnz x6, #5, 1f tbnz x6, #5, 1f
st1 {v7.16b}, [x0], #16 st1 {v7.16b}, [x19], #16
mov v24.16b, v30.16b mov v24.16b, v30.16b
tbnz x6, #6, 1f tbnz x6, #6, 1f
st1 {v3.16b}, [x0], #16 st1 {v3.16b}, [x19], #16
mov v24.16b, v31.16b mov v24.16b, v31.16b
tbnz x6, #7, 1f tbnz x6, #7, 1f
ld1 {v24.16b}, [x1], #16 ld1 {v24.16b}, [x20], #16
st1 {v5.16b}, [x0], #16 st1 {v5.16b}, [x19], #16
1: st1 {v24.16b}, [x5] // store IV 1: st1 {v24.16b}, [x24] // store IV
cbnz x4, 99b cbz x23, 2f
cond_yield_neon
b 99b
ldp x29, x30, [sp], #16 2: frame_pop
ret ret
ENDPROC(aesbs_cbc_decrypt) ENDPROC(aesbs_cbc_decrypt)
@@ -731,87 +746,93 @@ CPU_BE( .quad 0x87, 1 )
*/ */
__xts_crypt8: __xts_crypt8:
mov x6, #1 mov x6, #1
lsl x6, x6, x4 lsl x6, x6, x23
subs w4, w4, #8 subs w23, w23, #8
csel x4, x4, xzr, pl csel x23, x23, xzr, pl
csel x6, x6, xzr, mi csel x6, x6, xzr, mi
ld1 {v0.16b}, [x1], #16 ld1 {v0.16b}, [x20], #16
next_tweak v26, v25, v30, v31 next_tweak v26, v25, v30, v31
eor v0.16b, v0.16b, v25.16b eor v0.16b, v0.16b, v25.16b
tbnz x6, #1, 0f tbnz x6, #1, 0f
ld1 {v1.16b}, [x1], #16 ld1 {v1.16b}, [x20], #16
next_tweak v27, v26, v30, v31 next_tweak v27, v26, v30, v31
eor v1.16b, v1.16b, v26.16b eor v1.16b, v1.16b, v26.16b
tbnz x6, #2, 0f tbnz x6, #2, 0f
ld1 {v2.16b}, [x1], #16 ld1 {v2.16b}, [x20], #16
next_tweak v28, v27, v30, v31 next_tweak v28, v27, v30, v31
eor v2.16b, v2.16b, v27.16b eor v2.16b, v2.16b, v27.16b
tbnz x6, #3, 0f tbnz x6, #3, 0f
ld1 {v3.16b}, [x1], #16 ld1 {v3.16b}, [x20], #16
next_tweak v29, v28, v30, v31 next_tweak v29, v28, v30, v31
eor v3.16b, v3.16b, v28.16b eor v3.16b, v3.16b, v28.16b
tbnz x6, #4, 0f tbnz x6, #4, 0f
ld1 {v4.16b}, [x1], #16 ld1 {v4.16b}, [x20], #16
str q29, [sp, #16] str q29, [sp, #.Lframe_local_offset]
eor v4.16b, v4.16b, v29.16b eor v4.16b, v4.16b, v29.16b
next_tweak v29, v29, v30, v31 next_tweak v29, v29, v30, v31
tbnz x6, #5, 0f tbnz x6, #5, 0f
ld1 {v5.16b}, [x1], #16 ld1 {v5.16b}, [x20], #16
str q29, [sp, #32] str q29, [sp, #.Lframe_local_offset + 16]
eor v5.16b, v5.16b, v29.16b eor v5.16b, v5.16b, v29.16b
next_tweak v29, v29, v30, v31 next_tweak v29, v29, v30, v31
tbnz x6, #6, 0f tbnz x6, #6, 0f
ld1 {v6.16b}, [x1], #16 ld1 {v6.16b}, [x20], #16
str q29, [sp, #48] str q29, [sp, #.Lframe_local_offset + 32]
eor v6.16b, v6.16b, v29.16b eor v6.16b, v6.16b, v29.16b
next_tweak v29, v29, v30, v31 next_tweak v29, v29, v30, v31
tbnz x6, #7, 0f tbnz x6, #7, 0f
ld1 {v7.16b}, [x1], #16 ld1 {v7.16b}, [x20], #16
str q29, [sp, #64] str q29, [sp, #.Lframe_local_offset + 48]
eor v7.16b, v7.16b, v29.16b eor v7.16b, v7.16b, v29.16b
next_tweak v29, v29, v30, v31 next_tweak v29, v29, v30, v31
0: mov bskey, x2 0: mov bskey, x21
mov rounds, x3 mov rounds, x22
br x7 br x7
ENDPROC(__xts_crypt8) ENDPROC(__xts_crypt8)
.macro __xts_crypt, do8, o0, o1, o2, o3, o4, o5, o6, o7 .macro __xts_crypt, do8, o0, o1, o2, o3, o4, o5, o6, o7
stp x29, x30, [sp, #-80]! frame_push 6, 64
mov x29, sp
ldr q30, .Lxts_mul_x mov x19, x0
ld1 {v25.16b}, [x5] mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x5
0: ldr q30, .Lxts_mul_x
ld1 {v25.16b}, [x24]
99: adr x7, \do8 99: adr x7, \do8
bl __xts_crypt8 bl __xts_crypt8
ldp q16, q17, [sp, #16] ldp q16, q17, [sp, #.Lframe_local_offset]
ldp q18, q19, [sp, #48] ldp q18, q19, [sp, #.Lframe_local_offset + 32]
eor \o0\().16b, \o0\().16b, v25.16b eor \o0\().16b, \o0\().16b, v25.16b
eor \o1\().16b, \o1\().16b, v26.16b eor \o1\().16b, \o1\().16b, v26.16b
eor \o2\().16b, \o2\().16b, v27.16b eor \o2\().16b, \o2\().16b, v27.16b
eor \o3\().16b, \o3\().16b, v28.16b eor \o3\().16b, \o3\().16b, v28.16b
st1 {\o0\().16b}, [x0], #16 st1 {\o0\().16b}, [x19], #16
mov v25.16b, v26.16b mov v25.16b, v26.16b
tbnz x6, #1, 1f tbnz x6, #1, 1f
st1 {\o1\().16b}, [x0], #16 st1 {\o1\().16b}, [x19], #16
mov v25.16b, v27.16b mov v25.16b, v27.16b
tbnz x6, #2, 1f tbnz x6, #2, 1f
st1 {\o2\().16b}, [x0], #16 st1 {\o2\().16b}, [x19], #16
mov v25.16b, v28.16b mov v25.16b, v28.16b
tbnz x6, #3, 1f tbnz x6, #3, 1f
st1 {\o3\().16b}, [x0], #16 st1 {\o3\().16b}, [x19], #16
mov v25.16b, v29.16b mov v25.16b, v29.16b
tbnz x6, #4, 1f tbnz x6, #4, 1f
@@ -820,18 +841,22 @@ ENDPROC(__xts_crypt8)
eor \o6\().16b, \o6\().16b, v18.16b eor \o6\().16b, \o6\().16b, v18.16b
eor \o7\().16b, \o7\().16b, v19.16b eor \o7\().16b, \o7\().16b, v19.16b
st1 {\o4\().16b}, [x0], #16 st1 {\o4\().16b}, [x19], #16
tbnz x6, #5, 1f tbnz x6, #5, 1f
st1 {\o5\().16b}, [x0], #16 st1 {\o5\().16b}, [x19], #16
tbnz x6, #6, 1f tbnz x6, #6, 1f
st1 {\o6\().16b}, [x0], #16 st1 {\o6\().16b}, [x19], #16
tbnz x6, #7, 1f tbnz x6, #7, 1f
st1 {\o7\().16b}, [x0], #16 st1 {\o7\().16b}, [x19], #16
cbz x23, 1f
st1 {v25.16b}, [x24]
cbnz x4, 99b cond_yield_neon 0b
b 99b
1: st1 {v25.16b}, [x5] 1: st1 {v25.16b}, [x24]
ldp x29, x30, [sp], #80 frame_pop
ret ret
.endm .endm
@@ -856,24 +881,31 @@ ENDPROC(aesbs_xts_decrypt)
* int rounds, int blocks, u8 iv[], u8 final[]) * int rounds, int blocks, u8 iv[], u8 final[])
*/ */
ENTRY(aesbs_ctr_encrypt) ENTRY(aesbs_ctr_encrypt)
stp x29, x30, [sp, #-16]! frame_push 8
mov x29, sp
mov x19, x0
cmp x6, #0 mov x20, x1
cset x10, ne mov x21, x2
add x4, x4, x10 // do one extra block if final mov x22, x3
mov x23, x4
ldp x7, x8, [x5] mov x24, x5
ld1 {v0.16b}, [x5] mov x25, x6
cmp x25, #0
cset x26, ne
add x23, x23, x26 // do one extra block if final
98: ldp x7, x8, [x24]
ld1 {v0.16b}, [x24]
CPU_LE( rev x7, x7 ) CPU_LE( rev x7, x7 )
CPU_LE( rev x8, x8 ) CPU_LE( rev x8, x8 )
adds x8, x8, #1 adds x8, x8, #1
adc x7, x7, xzr adc x7, x7, xzr
99: mov x9, #1 99: mov x9, #1
lsl x9, x9, x4 lsl x9, x9, x23
subs w4, w4, #8 subs w23, w23, #8
csel x4, x4, xzr, pl csel x23, x23, xzr, pl
csel x9, x9, xzr, le csel x9, x9, xzr, le
tbnz x9, #1, 0f tbnz x9, #1, 0f
@@ -891,82 +923,85 @@ CPU_LE( rev x8, x8 )
tbnz x9, #7, 0f tbnz x9, #7, 0f
next_ctr v7 next_ctr v7
0: mov bskey, x2 0: mov bskey, x21
mov rounds, x3 mov rounds, x22
bl aesbs_encrypt8 bl aesbs_encrypt8
lsr x9, x9, x10 // disregard the extra block lsr x9, x9, x26 // disregard the extra block
tbnz x9, #0, 0f tbnz x9, #0, 0f
ld1 {v8.16b}, [x1], #16 ld1 {v8.16b}, [x20], #16
eor v0.16b, v0.16b, v8.16b eor v0.16b, v0.16b, v8.16b
st1 {v0.16b}, [x0], #16 st1 {v0.16b}, [x19], #16
tbnz x9, #1, 1f tbnz x9, #1, 1f
ld1 {v9.16b}, [x1], #16 ld1 {v9.16b}, [x20], #16
eor v1.16b, v1.16b, v9.16b eor v1.16b, v1.16b, v9.16b
st1 {v1.16b}, [x0], #16 st1 {v1.16b}, [x19], #16
tbnz x9, #2, 2f tbnz x9, #2, 2f
ld1 {v10.16b}, [x1], #16 ld1 {v10.16b}, [x20], #16
eor v4.16b, v4.16b, v10.16b eor v4.16b, v4.16b, v10.16b
st1 {v4.16b}, [x0], #16 st1 {v4.16b}, [x19], #16
tbnz x9, #3, 3f tbnz x9, #3, 3f
ld1 {v11.16b}, [x1], #16 ld1 {v11.16b}, [x20], #16
eor v6.16b, v6.16b, v11.16b eor v6.16b, v6.16b, v11.16b
st1 {v6.16b}, [x0], #16 st1 {v6.16b}, [x19], #16
tbnz x9, #4, 4f tbnz x9, #4, 4f
ld1 {v12.16b}, [x1], #16 ld1 {v12.16b}, [x20], #16
eor v3.16b, v3.16b, v12.16b eor v3.16b, v3.16b, v12.16b
st1 {v3.16b}, [x0], #16 st1 {v3.16b}, [x19], #16
tbnz x9, #5, 5f tbnz x9, #5, 5f
ld1 {v13.16b}, [x1], #16 ld1 {v13.16b}, [x20], #16
eor v7.16b, v7.16b, v13.16b eor v7.16b, v7.16b, v13.16b
st1 {v7.16b}, [x0], #16 st1 {v7.16b}, [x19], #16
tbnz x9, #6, 6f tbnz x9, #6, 6f
ld1 {v14.16b}, [x1], #16 ld1 {v14.16b}, [x20], #16
eor v2.16b, v2.16b, v14.16b eor v2.16b, v2.16b, v14.16b
st1 {v2.16b}, [x0], #16 st1 {v2.16b}, [x19], #16
tbnz x9, #7, 7f tbnz x9, #7, 7f
ld1 {v15.16b}, [x1], #16 ld1 {v15.16b}, [x20], #16
eor v5.16b, v5.16b, v15.16b eor v5.16b, v5.16b, v15.16b
st1 {v5.16b}, [x0], #16 st1 {v5.16b}, [x19], #16
8: next_ctr v0 8: next_ctr v0
cbnz x4, 99b st1 {v0.16b}, [x24]
cbz x23, 0f
cond_yield_neon 98b
b 99b
0: st1 {v0.16b}, [x5] 0: frame_pop
ldp x29, x30, [sp], #16
ret ret
/* /*
* If we are handling the tail of the input (x6 != NULL), return the * If we are handling the tail of the input (x6 != NULL), return the
* final keystream block back to the caller. * final keystream block back to the caller.
*/ */
1: cbz x6, 8b 1: cbz x25, 8b
st1 {v1.16b}, [x6] st1 {v1.16b}, [x25]
b 8b b 8b
2: cbz x6, 8b 2: cbz x25, 8b
st1 {v4.16b}, [x6] st1 {v4.16b}, [x25]
b 8b b 8b
3: cbz x6, 8b 3: cbz x25, 8b
st1 {v6.16b}, [x6] st1 {v6.16b}, [x25]
b 8b b 8b
4: cbz x6, 8b 4: cbz x25, 8b
st1 {v3.16b}, [x6] st1 {v3.16b}, [x25]
b 8b b 8b
5: cbz x6, 8b 5: cbz x25, 8b
st1 {v7.16b}, [x6] st1 {v7.16b}, [x25]
b 8b b 8b
6: cbz x6, 8b 6: cbz x25, 8b
st1 {v2.16b}, [x6] st1 {v2.16b}, [x25]
b 8b b 8b
7: cbz x6, 8b 7: cbz x25, 8b
st1 {v5.16b}, [x6] st1 {v5.16b}, [x25]
b 8b b 8b
ENDPROC(aesbs_ctr_encrypt) ENDPROC(aesbs_ctr_encrypt)
@@ -100,9 +100,10 @@
dCONSTANT .req d0 dCONSTANT .req d0
qCONSTANT .req q0 qCONSTANT .req q0
BUF .req x0 BUF .req x19
LEN .req x1 LEN .req x20
CRC .req x2 CRC .req x21
CONST .req x22
vzr .req v9 vzr .req v9
@@ -123,7 +124,14 @@ ENTRY(crc32_pmull_le)
ENTRY(crc32c_pmull_le) ENTRY(crc32c_pmull_le)
adr_l x3, .Lcrc32c_constants adr_l x3, .Lcrc32c_constants
0: bic LEN, LEN, #15 0: frame_push 4, 64
mov BUF, x0
mov LEN, x1
mov CRC, x2
mov CONST, x3
bic LEN, LEN, #15
ld1 {v1.16b-v4.16b}, [BUF], #0x40 ld1 {v1.16b-v4.16b}, [BUF], #0x40
movi vzr.16b, #0 movi vzr.16b, #0
fmov dCONSTANT, CRC fmov dCONSTANT, CRC
@@ -132,7 +140,7 @@ ENTRY(crc32c_pmull_le)
cmp LEN, #0x40 cmp LEN, #0x40
b.lt less_64 b.lt less_64
ldr qCONSTANT, [x3] ldr qCONSTANT, [CONST]
loop_64: /* 64 bytes Full cache line folding */ loop_64: /* 64 bytes Full cache line folding */
sub LEN, LEN, #0x40 sub LEN, LEN, #0x40
@@ -162,10 +170,21 @@ loop_64: /* 64 bytes Full cache line folding */
eor v4.16b, v4.16b, v8.16b eor v4.16b, v4.16b, v8.16b
cmp LEN, #0x40 cmp LEN, #0x40
b.ge loop_64 b.lt less_64
if_will_cond_yield_neon
stp q1, q2, [sp, #.Lframe_local_offset]
stp q3, q4, [sp, #.Lframe_local_offset + 32]
do_cond_yield_neon
ldp q1, q2, [sp, #.Lframe_local_offset]
ldp q3, q4, [sp, #.Lframe_local_offset + 32]
ldr qCONSTANT, [CONST]
movi vzr.16b, #0
endif_yield_neon
b loop_64
less_64: /* Folding cache line into 128bit */ less_64: /* Folding cache line into 128bit */
ldr qCONSTANT, [x3, #16] ldr qCONSTANT, [CONST, #16]
pmull2 v5.1q, v1.2d, vCONSTANT.2d pmull2 v5.1q, v1.2d, vCONSTANT.2d
pmull v1.1q, v1.1d, vCONSTANT.1d pmull v1.1q, v1.1d, vCONSTANT.1d
@@ -204,8 +223,8 @@ fold_64:
eor v1.16b, v1.16b, v2.16b eor v1.16b, v1.16b, v2.16b
/* final 32-bit fold */ /* final 32-bit fold */
ldr dCONSTANT, [x3, #32] ldr dCONSTANT, [CONST, #32]
ldr d3, [x3, #40] ldr d3, [CONST, #40]
ext v2.16b, v1.16b, vzr.16b, #4 ext v2.16b, v1.16b, vzr.16b, #4
and v1.16b, v1.16b, v3.16b and v1.16b, v1.16b, v3.16b
@@ -213,7 +232,7 @@ fold_64:
eor v1.16b, v1.16b, v2.16b eor v1.16b, v1.16b, v2.16b
/* Finish up with the bit-reversed barrett reduction 64 ==> 32 bits */ /* Finish up with the bit-reversed barrett reduction 64 ==> 32 bits */
ldr qCONSTANT, [x3, #48] ldr qCONSTANT, [CONST, #48]
and v2.16b, v1.16b, v3.16b and v2.16b, v1.16b, v3.16b
ext v2.16b, vzr.16b, v2.16b, #8 ext v2.16b, vzr.16b, v2.16b, #8
@@ -223,6 +242,7 @@ fold_64:
eor v1.16b, v1.16b, v2.16b eor v1.16b, v1.16b, v2.16b
mov w0, v1.s[1] mov w0, v1.s[1]
frame_pop
ret ret
ENDPROC(crc32_pmull_le) ENDPROC(crc32_pmull_le)
ENDPROC(crc32c_pmull_le) ENDPROC(crc32c_pmull_le)
......
@@ -74,13 +74,19 @@
.text .text
.cpu generic+crypto .cpu generic+crypto
arg1_low32 .req w0 arg1_low32 .req w19
arg2 .req x1 arg2 .req x20
arg3 .req x2 arg3 .req x21
vzr .req v13 vzr .req v13
ENTRY(crc_t10dif_pmull) ENTRY(crc_t10dif_pmull)
frame_push 3, 128
mov arg1_low32, w0
mov arg2, x1
mov arg3, x2
movi vzr.16b, #0 // init zero register movi vzr.16b, #0 // init zero register
// adjust the 16-bit initial_crc value, scale it to 32 bits // adjust the 16-bit initial_crc value, scale it to 32 bits
@@ -175,8 +181,25 @@ CPU_LE( ext v12.16b, v12.16b, v12.16b, #8 )
subs arg3, arg3, #128 subs arg3, arg3, #128
// check if there is another 64B in the buffer to be able to fold // check if there is another 64B in the buffer to be able to fold
b.ge _fold_64_B_loop b.lt _fold_64_B_end
if_will_cond_yield_neon
stp q0, q1, [sp, #.Lframe_local_offset]
stp q2, q3, [sp, #.Lframe_local_offset + 32]
stp q4, q5, [sp, #.Lframe_local_offset + 64]
stp q6, q7, [sp, #.Lframe_local_offset + 96]
do_cond_yield_neon
ldp q0, q1, [sp, #.Lframe_local_offset]
ldp q2, q3, [sp, #.Lframe_local_offset + 32]
ldp q4, q5, [sp, #.Lframe_local_offset + 64]
ldp q6, q7, [sp, #.Lframe_local_offset + 96]
ldr_l q10, rk3, x8
movi vzr.16b, #0 // init zero register
endif_yield_neon
b _fold_64_B_loop
_fold_64_B_end:
// at this point, the buffer pointer is pointing at the last y Bytes // at this point, the buffer pointer is pointing at the last y Bytes
// of the buffer the 64B of folded data is in 4 of the vector // of the buffer the 64B of folded data is in 4 of the vector
// registers: v0, v1, v2, v3 // registers: v0, v1, v2, v3
@@ -304,6 +327,7 @@ _barrett:
_cleanup: _cleanup:
// scale the result back to 16 bits // scale the result back to 16 bits
lsr x0, x0, #16 lsr x0, x0, #16
frame_pop
ret ret
_less_than_128: _less_than_128:
......
@@ -213,22 +213,31 @@
.endm .endm
.macro __pmull_ghash, pn .macro __pmull_ghash, pn
ld1 {SHASH.2d}, [x3] frame_push 5
ld1 {XL.2d}, [x1]
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
0: ld1 {SHASH.2d}, [x22]
ld1 {XL.2d}, [x20]
ext SHASH2.16b, SHASH.16b, SHASH.16b, #8 ext SHASH2.16b, SHASH.16b, SHASH.16b, #8
eor SHASH2.16b, SHASH2.16b, SHASH.16b eor SHASH2.16b, SHASH2.16b, SHASH.16b
__pmull_pre_\pn __pmull_pre_\pn
/* do the head block first, if supplied */ /* do the head block first, if supplied */
cbz x4, 0f cbz x23, 1f
ld1 {T1.2d}, [x4] ld1 {T1.2d}, [x23]
b 1f mov x23, xzr
b 2f
0: ld1 {T1.2d}, [x2], #16 1: ld1 {T1.2d}, [x21], #16
sub w0, w0, #1 sub w19, w19, #1
1: /* multiply XL by SHASH in GF(2^128) */ 2: /* multiply XL by SHASH in GF(2^128) */
CPU_LE( rev64 T1.16b, T1.16b ) CPU_LE( rev64 T1.16b, T1.16b )
ext T2.16b, XL.16b, XL.16b, #8 ext T2.16b, XL.16b, XL.16b, #8
@@ -250,9 +259,18 @@ CPU_LE( rev64 T1.16b, T1.16b )
eor T2.16b, T2.16b, XH.16b eor T2.16b, T2.16b, XH.16b
eor XL.16b, XL.16b, T2.16b eor XL.16b, XL.16b, T2.16b
cbnz w0, 0b cbz w19, 3f
if_will_cond_yield_neon
st1 {XL.2d}, [x20]
do_cond_yield_neon
b 0b
endif_yield_neon
b 1b
st1 {XL.2d}, [x1] 3: st1 {XL.2d}, [x20]
frame_pop
ret ret
.endm .endm
@@ -304,38 +322,55 @@ ENDPROC(pmull_ghash_update_p8)
.endm .endm
.macro pmull_gcm_do_crypt, enc .macro pmull_gcm_do_crypt, enc
ld1 {SHASH.2d}, [x4] frame_push 10
ld1 {XL.2d}, [x1]
ldr x8, [x5, #8] // load lower counter mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x5
mov x25, x6
mov x26, x7
.if \enc == 1
ldr x27, [sp, #96] // first stacked arg
.endif
ldr x28, [x24, #8] // load lower counter
CPU_LE( rev x28, x28 )
0: mov x0, x25
load_round_keys w26, x0
ld1 {SHASH.2d}, [x23]
ld1 {XL.2d}, [x20]
movi MASK.16b, #0xe1 movi MASK.16b, #0xe1
ext SHASH2.16b, SHASH.16b, SHASH.16b, #8 ext SHASH2.16b, SHASH.16b, SHASH.16b, #8
CPU_LE( rev x8, x8 )
shl MASK.2d, MASK.2d, #57 shl MASK.2d, MASK.2d, #57
eor SHASH2.16b, SHASH2.16b, SHASH.16b eor SHASH2.16b, SHASH2.16b, SHASH.16b
.if \enc == 1 .if \enc == 1
ld1 {KS.16b}, [x7] ld1 {KS.16b}, [x27]
.endif .endif
0: ld1 {CTR.8b}, [x5] // load upper counter 1: ld1 {CTR.8b}, [x24] // load upper counter
ld1 {INP.16b}, [x3], #16 ld1 {INP.16b}, [x22], #16
rev x9, x8 rev x9, x28
add x8, x8, #1 add x28, x28, #1
sub w0, w0, #1 sub w19, w19, #1
ins CTR.d[1], x9 // set lower counter ins CTR.d[1], x9 // set lower counter
.if \enc == 1 .if \enc == 1
eor INP.16b, INP.16b, KS.16b // encrypt input eor INP.16b, INP.16b, KS.16b // encrypt input
st1 {INP.16b}, [x2], #16 st1 {INP.16b}, [x21], #16
.endif .endif
rev64 T1.16b, INP.16b rev64 T1.16b, INP.16b
cmp w6, #12 cmp w26, #12
b.ge 2f // AES-192/256? b.ge 4f // AES-192/256?
1: enc_round CTR, v21 2: enc_round CTR, v21
ext T2.16b, XL.16b, XL.16b, #8 ext T2.16b, XL.16b, XL.16b, #8
ext IN1.16b, T1.16b, T1.16b, #8 ext IN1.16b, T1.16b, T1.16b, #8
@@ -390,27 +425,39 @@ CPU_LE( rev x8, x8 )
.if \enc == 0 .if \enc == 0
eor INP.16b, INP.16b, KS.16b eor INP.16b, INP.16b, KS.16b
st1 {INP.16b}, [x2], #16 st1 {INP.16b}, [x21], #16
.endif .endif
cbnz w0, 0b cbz w19, 3f
CPU_LE( rev x8, x8 ) if_will_cond_yield_neon
st1 {XL.2d}, [x1] st1 {XL.2d}, [x20]
str x8, [x5, #8] // store lower counter .if \enc == 1
st1 {KS.16b}, [x27]
.endif
do_cond_yield_neon
b 0b
endif_yield_neon
b 1b
3: st1 {XL.2d}, [x20]
.if \enc == 1 .if \enc == 1
st1 {KS.16b}, [x7] st1 {KS.16b}, [x27]
.endif .endif
CPU_LE( rev x28, x28 )
str x28, [x24, #8] // store lower counter
frame_pop
ret ret
2: b.eq 3f // AES-192? 4: b.eq 5f // AES-192?
enc_round CTR, v17 enc_round CTR, v17
enc_round CTR, v18 enc_round CTR, v18
3: enc_round CTR, v19 5: enc_round CTR, v19
enc_round CTR, v20 enc_round CTR, v20
b 1b b 2b
.endm .endm
/* /*
......
@@ -63,11 +63,12 @@ static void (*pmull_ghash_update)(int blocks, u64 dg[], const char *src,
asmlinkage void pmull_gcm_encrypt(int blocks, u64 dg[], u8 dst[], asmlinkage void pmull_gcm_encrypt(int blocks, u64 dg[], u8 dst[],
const u8 src[], struct ghash_key const *k, const u8 src[], struct ghash_key const *k,
u8 ctr[], int rounds, u8 ks[]); u8 ctr[], u32 const rk[], int rounds,
u8 ks[]);
asmlinkage void pmull_gcm_decrypt(int blocks, u64 dg[], u8 dst[], asmlinkage void pmull_gcm_decrypt(int blocks, u64 dg[], u8 dst[],
const u8 src[], struct ghash_key const *k, const u8 src[], struct ghash_key const *k,
u8 ctr[], int rounds); u8 ctr[], u32 const rk[], int rounds);
asmlinkage void pmull_gcm_encrypt_block(u8 dst[], u8 const src[], asmlinkage void pmull_gcm_encrypt_block(u8 dst[], u8 const src[],
u32 const rk[], int rounds); u32 const rk[], int rounds);
@@ -368,26 +369,29 @@ static int gcm_encrypt(struct aead_request *req)
pmull_gcm_encrypt_block(ks, iv, NULL, pmull_gcm_encrypt_block(ks, iv, NULL,
num_rounds(&ctx->aes_key)); num_rounds(&ctx->aes_key));
put_unaligned_be32(3, iv + GCM_IV_SIZE); put_unaligned_be32(3, iv + GCM_IV_SIZE);
kernel_neon_end();
err = skcipher_walk_aead_encrypt(&walk, req, true); err = skcipher_walk_aead_encrypt(&walk, req, false);
while (walk.nbytes >= AES_BLOCK_SIZE) { while (walk.nbytes >= AES_BLOCK_SIZE) {
int blocks = walk.nbytes / AES_BLOCK_SIZE; int blocks = walk.nbytes / AES_BLOCK_SIZE;
kernel_neon_begin();
pmull_gcm_encrypt(blocks, dg, walk.dst.virt.addr, pmull_gcm_encrypt(blocks, dg, walk.dst.virt.addr,
walk.src.virt.addr, &ctx->ghash_key, walk.src.virt.addr, &ctx->ghash_key,
iv, num_rounds(&ctx->aes_key), ks); iv, ctx->aes_key.key_enc,
num_rounds(&ctx->aes_key), ks);
kernel_neon_end();
err = skcipher_walk_done(&walk, err = skcipher_walk_done(&walk,
walk.nbytes % AES_BLOCK_SIZE); walk.nbytes % AES_BLOCK_SIZE);
} }
kernel_neon_end();
} else { } else {
__aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv, __aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv,
num_rounds(&ctx->aes_key)); num_rounds(&ctx->aes_key));
put_unaligned_be32(2, iv + GCM_IV_SIZE); put_unaligned_be32(2, iv + GCM_IV_SIZE);
err = skcipher_walk_aead_encrypt(&walk, req, true); err = skcipher_walk_aead_encrypt(&walk, req, false);
while (walk.nbytes >= AES_BLOCK_SIZE) { while (walk.nbytes >= AES_BLOCK_SIZE) {
int blocks = walk.nbytes / AES_BLOCK_SIZE; int blocks = walk.nbytes / AES_BLOCK_SIZE;
@@ -467,15 +471,19 @@ static int gcm_decrypt(struct aead_request *req)
pmull_gcm_encrypt_block(tag, iv, ctx->aes_key.key_enc, pmull_gcm_encrypt_block(tag, iv, ctx->aes_key.key_enc,
num_rounds(&ctx->aes_key)); num_rounds(&ctx->aes_key));
put_unaligned_be32(2, iv + GCM_IV_SIZE); put_unaligned_be32(2, iv + GCM_IV_SIZE);
kernel_neon_end();
err = skcipher_walk_aead_decrypt(&walk, req, true); err = skcipher_walk_aead_decrypt(&walk, req, false);
while (walk.nbytes >= AES_BLOCK_SIZE) { while (walk.nbytes >= AES_BLOCK_SIZE) {
int blocks = walk.nbytes / AES_BLOCK_SIZE; int blocks = walk.nbytes / AES_BLOCK_SIZE;
kernel_neon_begin();
pmull_gcm_decrypt(blocks, dg, walk.dst.virt.addr, pmull_gcm_decrypt(blocks, dg, walk.dst.virt.addr,
walk.src.virt.addr, &ctx->ghash_key, walk.src.virt.addr, &ctx->ghash_key,
iv, num_rounds(&ctx->aes_key)); iv, ctx->aes_key.key_enc,
num_rounds(&ctx->aes_key));
kernel_neon_end();
err = skcipher_walk_done(&walk, err = skcipher_walk_done(&walk,
walk.nbytes % AES_BLOCK_SIZE); walk.nbytes % AES_BLOCK_SIZE);
@@ -483,14 +491,12 @@ static int gcm_decrypt(struct aead_request *req)
if (walk.nbytes) if (walk.nbytes)
pmull_gcm_encrypt_block(iv, iv, NULL, pmull_gcm_encrypt_block(iv, iv, NULL,
num_rounds(&ctx->aes_key)); num_rounds(&ctx->aes_key));
kernel_neon_end();
} else { } else {
__aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv, __aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv,
num_rounds(&ctx->aes_key)); num_rounds(&ctx->aes_key));
put_unaligned_be32(2, iv + GCM_IV_SIZE); put_unaligned_be32(2, iv + GCM_IV_SIZE);
err = skcipher_walk_aead_decrypt(&walk, req, true); err = skcipher_walk_aead_decrypt(&walk, req, false);
while (walk.nbytes >= AES_BLOCK_SIZE) { while (walk.nbytes >= AES_BLOCK_SIZE) {
int blocks = walk.nbytes / AES_BLOCK_SIZE; int blocks = walk.nbytes / AES_BLOCK_SIZE;
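Taken together, the glue hunks above move kernel_neon_begin()/kernel_neon_end() inside the walk loop and start the walks with atomic == false (the third argument changing from true to false), so the NEON unit is only held for one chunk at a time and the walk may sleep in between chunks. A stand-alone C analogue of that bracketing pattern, with stub helpers standing in for the kernel APIs (none of these names are from the patch):

#include <stddef.h>
#include <stdint.h>

static void simd_begin(void) { }	/* stand-in for kernel_neon_begin() */
static void simd_end(void) { }		/* stand-in for kernel_neon_end()   */

static void simd_crypt_chunk(uint8_t *dst, const uint8_t *src, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)		/* placeholder for the real SIMD work */
		dst[i] = src[i];
}

static void crypt_request(uint8_t *dst, const uint8_t *src,
			  size_t total, size_t chunk)
{
	while (total) {
		size_t n = total < chunk ? total : chunk;

		simd_begin();			/* claim the SIMD unit	    */
		simd_crypt_chunk(dst, src, n);
		simd_end();			/* release it per chunk	    */

		dst += n;			/* the walk may sleep here  */
		src += n;
		total -= n;
	}
}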
......
@@ -69,30 +69,36 @@
* int blocks) * int blocks)
*/ */
ENTRY(sha1_ce_transform) ENTRY(sha1_ce_transform)
frame_push 3
mov x19, x0
mov x20, x1
mov x21, x2
/* load round constants */ /* load round constants */
loadrc k0.4s, 0x5a827999, w6 0: loadrc k0.4s, 0x5a827999, w6
loadrc k1.4s, 0x6ed9eba1, w6 loadrc k1.4s, 0x6ed9eba1, w6
loadrc k2.4s, 0x8f1bbcdc, w6 loadrc k2.4s, 0x8f1bbcdc, w6
loadrc k3.4s, 0xca62c1d6, w6 loadrc k3.4s, 0xca62c1d6, w6
/* load state */ /* load state */
ld1 {dgav.4s}, [x0] ld1 {dgav.4s}, [x19]
ldr dgb, [x0, #16] ldr dgb, [x19, #16]
/* load sha1_ce_state::finalize */ /* load sha1_ce_state::finalize */
ldr_l w4, sha1_ce_offsetof_finalize, x4 ldr_l w4, sha1_ce_offsetof_finalize, x4
ldr w4, [x0, x4] ldr w4, [x19, x4]
/* load input */ /* load input */
0: ld1 {v8.4s-v11.4s}, [x1], #64 1: ld1 {v8.4s-v11.4s}, [x20], #64
sub w2, w2, #1 sub w21, w21, #1
CPU_LE( rev32 v8.16b, v8.16b ) CPU_LE( rev32 v8.16b, v8.16b )
CPU_LE( rev32 v9.16b, v9.16b ) CPU_LE( rev32 v9.16b, v9.16b )
CPU_LE( rev32 v10.16b, v10.16b ) CPU_LE( rev32 v10.16b, v10.16b )
CPU_LE( rev32 v11.16b, v11.16b ) CPU_LE( rev32 v11.16b, v11.16b )
1: add t0.4s, v8.4s, k0.4s 2: add t0.4s, v8.4s, k0.4s
mov dg0v.16b, dgav.16b mov dg0v.16b, dgav.16b
add_update c, ev, k0, 8, 9, 10, 11, dgb add_update c, ev, k0, 8, 9, 10, 11, dgb
@@ -123,16 +129,25 @@ CPU_LE( rev32 v11.16b, v11.16b )
add dgbv.2s, dgbv.2s, dg1v.2s add dgbv.2s, dgbv.2s, dg1v.2s
add dgav.4s, dgav.4s, dg0v.4s add dgav.4s, dgav.4s, dg0v.4s
cbnz w2, 0b cbz w21, 3f
if_will_cond_yield_neon
st1 {dgav.4s}, [x19]
str dgb, [x19, #16]
do_cond_yield_neon
b 0b
endif_yield_neon
b 1b
/* /*
* Final block: add padding and total bit count. * Final block: add padding and total bit count.
* Skip if the input size was not a round multiple of the block size, * Skip if the input size was not a round multiple of the block size,
* the padding is handled by the C code in that case. * the padding is handled by the C code in that case.
*/ */
cbz x4, 3f 3: cbz x4, 4f
ldr_l w4, sha1_ce_offsetof_count, x4 ldr_l w4, sha1_ce_offsetof_count, x4
ldr x4, [x0, x4] ldr x4, [x19, x4]
movi v9.2d, #0 movi v9.2d, #0
mov x8, #0x80000000 mov x8, #0x80000000
movi v10.2d, #0 movi v10.2d, #0
@@ -141,10 +156,11 @@ CPU_LE( rev32 v11.16b, v11.16b )
mov x4, #0 mov x4, #0
mov v11.d[0], xzr mov v11.d[0], xzr
mov v11.d[1], x7 mov v11.d[1], x7
b 1b b 2b
/* store new state */ /* store new state */
3: st1 {dgav.4s}, [x0] 4: st1 {dgav.4s}, [x19]
str dgb, [x0, #16] str dgb, [x19, #16]
frame_pop
ret ret
ENDPROC(sha1_ce_transform) ENDPROC(sha1_ce_transform)
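When sha1_ce_state::finalize is set, the code above appends the standard SHA-1 padding itself: a 0x80 byte, zero padding, and the total message length in bits as a big-endian 64-bit value in the last eight bytes. A byte-level sketch of that final block (illustrative only; the assembly builds it directly in vector registers):

#include <stdint.h>
#include <string.h>

static void sha_final_block(uint8_t block[64], uint64_t total_bytes)
{
	uint64_t bits = total_bytes << 3;
	int i;

	memset(block, 0, 64);
	block[0] = 0x80;			/* the 0x80000000 word above  */
	for (i = 0; i < 8; i++)
		block[63 - i] = (uint8_t)(bits >> (8 * i));
}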
@@ -79,30 +79,36 @@
*/ */
.text .text
ENTRY(sha2_ce_transform) ENTRY(sha2_ce_transform)
frame_push 3
mov x19, x0
mov x20, x1
mov x21, x2
/* load round constants */ /* load round constants */
adr_l x8, .Lsha2_rcon 0: adr_l x8, .Lsha2_rcon
ld1 { v0.4s- v3.4s}, [x8], #64 ld1 { v0.4s- v3.4s}, [x8], #64
ld1 { v4.4s- v7.4s}, [x8], #64 ld1 { v4.4s- v7.4s}, [x8], #64
ld1 { v8.4s-v11.4s}, [x8], #64 ld1 { v8.4s-v11.4s}, [x8], #64
ld1 {v12.4s-v15.4s}, [x8] ld1 {v12.4s-v15.4s}, [x8]
/* load state */ /* load state */
ld1 {dgav.4s, dgbv.4s}, [x0] ld1 {dgav.4s, dgbv.4s}, [x19]
/* load sha256_ce_state::finalize */ /* load sha256_ce_state::finalize */
ldr_l w4, sha256_ce_offsetof_finalize, x4 ldr_l w4, sha256_ce_offsetof_finalize, x4
ldr w4, [x0, x4] ldr w4, [x19, x4]
/* load input */ /* load input */
0: ld1 {v16.4s-v19.4s}, [x1], #64 1: ld1 {v16.4s-v19.4s}, [x20], #64
sub w2, w2, #1 sub w21, w21, #1
CPU_LE( rev32 v16.16b, v16.16b ) CPU_LE( rev32 v16.16b, v16.16b )
CPU_LE( rev32 v17.16b, v17.16b ) CPU_LE( rev32 v17.16b, v17.16b )
CPU_LE( rev32 v18.16b, v18.16b ) CPU_LE( rev32 v18.16b, v18.16b )
CPU_LE( rev32 v19.16b, v19.16b ) CPU_LE( rev32 v19.16b, v19.16b )
1: add t0.4s, v16.4s, v0.4s 2: add t0.4s, v16.4s, v0.4s
mov dg0v.16b, dgav.16b mov dg0v.16b, dgav.16b
mov dg1v.16b, dgbv.16b mov dg1v.16b, dgbv.16b
@@ -131,16 +137,24 @@ CPU_LE( rev32 v19.16b, v19.16b )
add dgbv.4s, dgbv.4s, dg1v.4s add dgbv.4s, dgbv.4s, dg1v.4s
/* handled all input blocks? */ /* handled all input blocks? */
cbnz w2, 0b cbz w21, 3f
if_will_cond_yield_neon
st1 {dgav.4s, dgbv.4s}, [x19]
do_cond_yield_neon
b 0b
endif_yield_neon
b 1b
/* /*
* Final block: add padding and total bit count. * Final block: add padding and total bit count.
* Skip if the input size was not a round multiple of the block size, * Skip if the input size was not a round multiple of the block size,
* the padding is handled by the C code in that case. * the padding is handled by the C code in that case.
*/ */
cbz x4, 3f 3: cbz x4, 4f
ldr_l w4, sha256_ce_offsetof_count, x4 ldr_l w4, sha256_ce_offsetof_count, x4
ldr x4, [x0, x4] ldr x4, [x19, x4]
movi v17.2d, #0 movi v17.2d, #0
mov x8, #0x80000000 mov x8, #0x80000000
movi v18.2d, #0 movi v18.2d, #0
@@ -149,9 +163,10 @@ CPU_LE( rev32 v19.16b, v19.16b )
mov x4, #0 mov x4, #0
mov v19.d[0], xzr mov v19.d[0], xzr
mov v19.d[1], x7 mov v19.d[1], x7
b 1b b 2b
/* store new state */ /* store new state */
3: st1 {dgav.4s, dgbv.4s}, [x0] 4: st1 {dgav.4s, dgbv.4s}, [x19]
frame_pop
ret ret
ENDPROC(sha2_ce_transform) ENDPROC(sha2_ce_transform)
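The frame_push/if_will_cond_yield_neon changes in the SHA-1 and SHA-256 transforms follow the same shape: the hash state lives in vector registers inside the loop, is spilled to memory when a reschedule is pending, and is reloaded after yielding. A user-space C analogue of that control flow, with stand-in helpers (none of these are kernel APIs):

#include <stddef.h>
#include <stdint.h>

struct digest_state {
	uint32_t h[8];
};

static int resched_pending(void) { return 0; }		/* stand-in	  */
static void yield_to_scheduler(void) { }		/* stand-in	  */
static void compress_block(struct digest_state *s, const uint8_t *p)
{
	(void)s;
	(void)p;	/* placeholder for the real compression function */
}

static void transform_yielding(struct digest_state *mem,
			       const uint8_t *src, size_t blocks)
{
	struct digest_state regs = *mem;	/* ld1 {dgav.4s, dgbv.4s}, [x19] */

	while (blocks--) {
		compress_block(&regs, src);
		src += 64;

		if (blocks && resched_pending()) {
			*mem = regs;		/* spill before yielding    */
			yield_to_scheduler();	/* do_cond_yield_neon	    */
			regs = *mem;		/* reload and keep going    */
		}
	}
	*mem = regs;				/* final store of the state */
}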
// SPDX-License-Identifier: GPL-2.0
// This code is taken from the OpenSSL project but the author (Andy Polyakov)
// has relicensed it under the GPLv2. Therefore this program is free software;
// you can redistribute it and/or modify it under the terms of the GNU General
// Public License version 2 as published by the Free Software Foundation.
//
// The original headers, including the original license headers, are
// included below for completeness.
// Copyright 2014-2016 The OpenSSL Project Authors. All Rights Reserved. // Copyright 2014-2016 The OpenSSL Project Authors. All Rights Reserved.
// //
// Licensed under the OpenSSL license (the "License"). You may not use // Licensed under the OpenSSL license (the "License"). You may not use
@@ -10,8 +20,6 @@
// project. The module is, however, dual licensed under OpenSSL and // project. The module is, however, dual licensed under OpenSSL and
// CRYPTOGAMS licenses depending on where you obtain it. For further // CRYPTOGAMS licenses depending on where you obtain it. For further
// details see http://www.openssl.org/~appro/cryptogams/. // details see http://www.openssl.org/~appro/cryptogams/.
//
// Permission to use under GPLv2 terms is granted.
// ==================================================================== // ====================================================================
// //
// SHA256/512 for ARMv8. // SHA256/512 for ARMv8.
......
@@ -41,9 +41,16 @@
*/ */
.text .text
ENTRY(sha3_ce_transform) ENTRY(sha3_ce_transform)
/* load state */ frame_push 4
add x8, x0, #32
ld1 { v0.1d- v3.1d}, [x0] mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
0: /* load state */
add x8, x19, #32
ld1 { v0.1d- v3.1d}, [x19]
ld1 { v4.1d- v7.1d}, [x8], #32 ld1 { v4.1d- v7.1d}, [x8], #32
ld1 { v8.1d-v11.1d}, [x8], #32 ld1 { v8.1d-v11.1d}, [x8], #32
ld1 {v12.1d-v15.1d}, [x8], #32 ld1 {v12.1d-v15.1d}, [x8], #32
@@ -51,13 +58,13 @@ ENTRY(sha3_ce_transform)
ld1 {v20.1d-v23.1d}, [x8], #32 ld1 {v20.1d-v23.1d}, [x8], #32
ld1 {v24.1d}, [x8] ld1 {v24.1d}, [x8]
0: sub w2, w2, #1 1: sub w21, w21, #1
mov w8, #24 mov w8, #24
adr_l x9, .Lsha3_rcon adr_l x9, .Lsha3_rcon
/* load input */ /* load input */
ld1 {v25.8b-v28.8b}, [x1], #32 ld1 {v25.8b-v28.8b}, [x20], #32
ld1 {v29.8b-v31.8b}, [x1], #24 ld1 {v29.8b-v31.8b}, [x20], #24
eor v0.8b, v0.8b, v25.8b eor v0.8b, v0.8b, v25.8b
eor v1.8b, v1.8b, v26.8b eor v1.8b, v1.8b, v26.8b
eor v2.8b, v2.8b, v27.8b eor v2.8b, v2.8b, v27.8b
@@ -66,10 +73,10 @@ ENTRY(sha3_ce_transform)
eor v5.8b, v5.8b, v30.8b eor v5.8b, v5.8b, v30.8b
eor v6.8b, v6.8b, v31.8b eor v6.8b, v6.8b, v31.8b
tbnz x3, #6, 2f // SHA3-512 tbnz x22, #6, 3f // SHA3-512
ld1 {v25.8b-v28.8b}, [x1], #32 ld1 {v25.8b-v28.8b}, [x20], #32
ld1 {v29.8b-v30.8b}, [x1], #16 ld1 {v29.8b-v30.8b}, [x20], #16
eor v7.8b, v7.8b, v25.8b eor v7.8b, v7.8b, v25.8b
eor v8.8b, v8.8b, v26.8b eor v8.8b, v8.8b, v26.8b
eor v9.8b, v9.8b, v27.8b eor v9.8b, v9.8b, v27.8b
@@ -77,34 +84,34 @@ ENTRY(sha3_ce_transform)
eor v11.8b, v11.8b, v29.8b eor v11.8b, v11.8b, v29.8b
eor v12.8b, v12.8b, v30.8b eor v12.8b, v12.8b, v30.8b
tbnz x3, #4, 1f // SHA3-384 or SHA3-224 tbnz x22, #4, 2f // SHA3-384 or SHA3-224
// SHA3-256 // SHA3-256
ld1 {v25.8b-v28.8b}, [x1], #32 ld1 {v25.8b-v28.8b}, [x20], #32
eor v13.8b, v13.8b, v25.8b eor v13.8b, v13.8b, v25.8b
eor v14.8b, v14.8b, v26.8b eor v14.8b, v14.8b, v26.8b
eor v15.8b, v15.8b, v27.8b eor v15.8b, v15.8b, v27.8b
eor v16.8b, v16.8b, v28.8b eor v16.8b, v16.8b, v28.8b
b 3f b 4f
1: tbz x3, #2, 3f // bit 2 cleared? SHA-384 2: tbz x22, #2, 4f // bit 2 cleared? SHA-384
// SHA3-224 // SHA3-224
ld1 {v25.8b-v28.8b}, [x1], #32 ld1 {v25.8b-v28.8b}, [x20], #32
ld1 {v29.8b}, [x1], #8 ld1 {v29.8b}, [x20], #8
eor v13.8b, v13.8b, v25.8b eor v13.8b, v13.8b, v25.8b
eor v14.8b, v14.8b, v26.8b eor v14.8b, v14.8b, v26.8b
eor v15.8b, v15.8b, v27.8b eor v15.8b, v15.8b, v27.8b
eor v16.8b, v16.8b, v28.8b eor v16.8b, v16.8b, v28.8b
eor v17.8b, v17.8b, v29.8b eor v17.8b, v17.8b, v29.8b
b 3f b 4f
// SHA3-512 // SHA3-512
2: ld1 {v25.8b-v26.8b}, [x1], #16 3: ld1 {v25.8b-v26.8b}, [x20], #16
eor v7.8b, v7.8b, v25.8b eor v7.8b, v7.8b, v25.8b
eor v8.8b, v8.8b, v26.8b eor v8.8b, v8.8b, v26.8b
3: sub w8, w8, #1 4: sub w8, w8, #1
eor3 v29.16b, v4.16b, v9.16b, v14.16b eor3 v29.16b, v4.16b, v9.16b, v14.16b
eor3 v26.16b, v1.16b, v6.16b, v11.16b eor3 v26.16b, v1.16b, v6.16b, v11.16b
@@ -183,17 +190,33 @@ ENTRY(sha3_ce_transform)
eor v0.16b, v0.16b, v31.16b eor v0.16b, v0.16b, v31.16b
cbnz w8, 3b cbnz w8, 4b
cbnz w2, 0b cbz w21, 5f
if_will_cond_yield_neon
add x8, x19, #32
st1 { v0.1d- v3.1d}, [x19]
st1 { v4.1d- v7.1d}, [x8], #32
st1 { v8.1d-v11.1d}, [x8], #32
st1 {v12.1d-v15.1d}, [x8], #32
st1 {v16.1d-v19.1d}, [x8], #32
st1 {v20.1d-v23.1d}, [x8], #32
st1 {v24.1d}, [x8]
do_cond_yield_neon
b 0b
endif_yield_neon
b 1b
/* save state */ /* save state */
st1 { v0.1d- v3.1d}, [x0], #32 5: st1 { v0.1d- v3.1d}, [x19], #32
st1 { v4.1d- v7.1d}, [x0], #32 st1 { v4.1d- v7.1d}, [x19], #32
st1 { v8.1d-v11.1d}, [x0], #32 st1 { v8.1d-v11.1d}, [x19], #32
st1 {v12.1d-v15.1d}, [x0], #32 st1 {v12.1d-v15.1d}, [x19], #32
st1 {v16.1d-v19.1d}, [x0], #32 st1 {v16.1d-v19.1d}, [x19], #32
st1 {v20.1d-v23.1d}, [x0], #32 st1 {v20.1d-v23.1d}, [x19], #32
st1 {v24.1d}, [x0] st1 {v24.1d}, [x19]
frame_pop
ret ret
ENDPROC(sha3_ce_transform) ENDPROC(sha3_ce_transform)
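The digest-size tests above (tbnz on bits 6, 4 and 2 of x22) select how many input bytes are absorbed per Keccak block; the rate is 200 - 2 * digest_size bytes, i.e. 72 for SHA3-512, 104 for SHA3-384, 136 for SHA3-256 and 144 for SHA3-224. As a one-line sketch:

/* Bytes absorbed per block for each digest size (the Keccak rate). */
static unsigned int sha3_rate_bytes(unsigned int digest_size)
{
	return 200 - 2 * digest_size;	/* 64->72, 48->104, 32->136, 28->144 */
}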
......
#! /usr/bin/env perl #! /usr/bin/env perl
# SPDX-License-Identifier: GPL-2.0
# This code is taken from the OpenSSL project but the author (Andy Polyakov)
# has relicensed it under the GPLv2. Therefore this program is free software;
# you can redistribute it and/or modify it under the terms of the GNU General
# Public License version 2 as published by the Free Software Foundation.
#
# The original headers, including the original license headers, are
# included below for completeness.
# Copyright 2014-2016 The OpenSSL Project Authors. All Rights Reserved. # Copyright 2014-2016 The OpenSSL Project Authors. All Rights Reserved.
# #
# Licensed under the OpenSSL license (the "License"). You may not use # Licensed under the OpenSSL license (the "License"). You may not use
@@ -11,8 +21,6 @@
# project. The module is, however, dual licensed under OpenSSL and # project. The module is, however, dual licensed under OpenSSL and
# CRYPTOGAMS licenses depending on where you obtain it. For further # CRYPTOGAMS licenses depending on where you obtain it. For further
# details see http://www.openssl.org/~appro/cryptogams/. # details see http://www.openssl.org/~appro/cryptogams/.
#
# Permission to use under GPLv2 terms is granted.
# ==================================================================== # ====================================================================
# #
# SHA256/512 for ARMv8. # SHA256/512 for ARMv8.
......
@@ -107,17 +107,23 @@
*/ */
.text .text
ENTRY(sha512_ce_transform) ENTRY(sha512_ce_transform)
frame_push 3
mov x19, x0
mov x20, x1
mov x21, x2
/* load state */ /* load state */
ld1 {v8.2d-v11.2d}, [x0] 0: ld1 {v8.2d-v11.2d}, [x19]
/* load first 4 round constants */ /* load first 4 round constants */
adr_l x3, .Lsha512_rcon adr_l x3, .Lsha512_rcon
ld1 {v20.2d-v23.2d}, [x3], #64 ld1 {v20.2d-v23.2d}, [x3], #64
/* load input */ /* load input */
0: ld1 {v12.2d-v15.2d}, [x1], #64 1: ld1 {v12.2d-v15.2d}, [x20], #64
ld1 {v16.2d-v19.2d}, [x1], #64 ld1 {v16.2d-v19.2d}, [x20], #64
sub w2, w2, #1 sub w21, w21, #1
CPU_LE( rev64 v12.16b, v12.16b ) CPU_LE( rev64 v12.16b, v12.16b )
CPU_LE( rev64 v13.16b, v13.16b ) CPU_LE( rev64 v13.16b, v13.16b )
@@ -196,9 +202,18 @@ CPU_LE( rev64 v19.16b, v19.16b )
add v11.2d, v11.2d, v3.2d add v11.2d, v11.2d, v3.2d
/* handled all input blocks? */ /* handled all input blocks? */
cbnz w2, 0b cbz w21, 3f
if_will_cond_yield_neon
st1 {v8.2d-v11.2d}, [x19]
do_cond_yield_neon
b 0b
endif_yield_neon
b 1b
/* store new state */ /* store new state */
3: st1 {v8.2d-v11.2d}, [x0] 3: st1 {v8.2d-v11.2d}, [x19]
frame_pop
ret ret
ENDPROC(sha512_ce_transform) ENDPROC(sha512_ce_transform)
// SPDX-License-Identifier: GPL-2.0
// This code is taken from the OpenSSL project but the author (Andy Polyakov)
// has relicensed it under the GPLv2. Therefore this program is free software;
// you can redistribute it and/or modify it under the terms of the GNU General
// Public License version 2 as published by the Free Software Foundation.
//
// The original headers, including the original license headers, are
// included below for completeness.
// Copyright 2014-2016 The OpenSSL Project Authors. All Rights Reserved. // Copyright 2014-2016 The OpenSSL Project Authors. All Rights Reserved.
// //
// Licensed under the OpenSSL license (the "License"). You may not use // Licensed under the OpenSSL license (the "License"). You may not use
@@ -10,8 +20,6 @@
// project. The module is, however, dual licensed under OpenSSL and // project. The module is, however, dual licensed under OpenSSL and
// CRYPTOGAMS licenses depending on where you obtain it. For further // CRYPTOGAMS licenses depending on where you obtain it. For further
// details see http://www.openssl.org/~appro/cryptogams/. // details see http://www.openssl.org/~appro/cryptogams/.
//
// Permission to use under GPLv2 terms is granted.
// ==================================================================== // ====================================================================
// //
// SHA256/512 for ARMv8. // SHA256/512 for ARMv8.
......
// SPDX-License-Identifier: GPL-2.0
#include <linux/linkage.h>
#include <asm/assembler.h>
.irp b, 0, 1, 2, 3, 4, 5, 6, 7, 8
.set .Lv\b\().4s, \b
.endr
.macro sm4e, rd, rn
.inst 0xcec08400 | .L\rd | (.L\rn << 5)
.endm
/*
* void sm4_ce_do_crypt(const u32 *rk, u32 *out, const u32 *in);
*/
.text
ENTRY(sm4_ce_do_crypt)
ld1 {v8.4s}, [x2]
ld1 {v0.4s-v3.4s}, [x0], #64
CPU_LE( rev32 v8.16b, v8.16b )
ld1 {v4.4s-v7.4s}, [x0]
sm4e v8.4s, v0.4s
sm4e v8.4s, v1.4s
sm4e v8.4s, v2.4s
sm4e v8.4s, v3.4s
sm4e v8.4s, v4.4s
sm4e v8.4s, v5.4s
sm4e v8.4s, v6.4s
sm4e v8.4s, v7.4s
rev64 v8.4s, v8.4s
ext v8.16b, v8.16b, v8.16b, #8
CPU_LE( rev32 v8.16b, v8.16b )
st1 {v8.4s}, [x1]
ret
ENDPROC(sm4_ce_do_crypt)
// SPDX-License-Identifier: GPL-2.0
#include <asm/neon.h>
#include <asm/simd.h>
#include <crypto/sm4.h>
#include <linux/module.h>
#include <linux/cpufeature.h>
#include <linux/crypto.h>
#include <linux/types.h>
MODULE_ALIAS_CRYPTO("sm4");
MODULE_ALIAS_CRYPTO("sm4-ce");
MODULE_DESCRIPTION("SM4 symmetric cipher using ARMv8 Crypto Extensions");
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
MODULE_LICENSE("GPL v2");
asmlinkage void sm4_ce_do_crypt(const u32 *rk, void *out, const void *in);
static void sm4_ce_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
{
const struct crypto_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
if (!may_use_simd()) {
crypto_sm4_encrypt(tfm, out, in);
} else {
kernel_neon_begin();
sm4_ce_do_crypt(ctx->rkey_enc, out, in);
kernel_neon_end();
}
}
static void sm4_ce_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
{
const struct crypto_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
if (!may_use_simd()) {
crypto_sm4_decrypt(tfm, out, in);
} else {
kernel_neon_begin();
sm4_ce_do_crypt(ctx->rkey_dec, out, in);
kernel_neon_end();
}
}
static struct crypto_alg sm4_ce_alg = {
.cra_name = "sm4",
.cra_driver_name = "sm4-ce",
.cra_priority = 200,
.cra_flags = CRYPTO_ALG_TYPE_CIPHER,
.cra_blocksize = SM4_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct crypto_sm4_ctx),
.cra_module = THIS_MODULE,
.cra_u.cipher = {
.cia_min_keysize = SM4_KEY_SIZE,
.cia_max_keysize = SM4_KEY_SIZE,
.cia_setkey = crypto_sm4_set_key,
.cia_encrypt = sm4_ce_encrypt,
.cia_decrypt = sm4_ce_decrypt
}
};
static int __init sm4_ce_mod_init(void)
{
return crypto_register_alg(&sm4_ce_alg);
}
static void __exit sm4_ce_mod_fini(void)
{
crypto_unregister_alg(&sm4_ce_alg);
}
module_cpu_feature_match(SM4, sm4_ce_mod_init);	/* gate on the SM4 CE instructions, not SM3 */
module_exit(sm4_ce_mod_fini);
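A hypothetical caller-side sketch (not part of this patch) of exercising the registered cipher through the single-block cipher API; the crypto core picks the implementation with the highest cra_priority, so with this module loaded "sm4-ce" is expected to be selected over the lower-priority generic C implementation:

#include <crypto/sm4.h>
#include <linux/crypto.h>
#include <linux/err.h>
#include <linux/types.h>

static int sm4_encrypt_one_block(const u8 key[SM4_KEY_SIZE],
				 const u8 in[SM4_BLOCK_SIZE],
				 u8 out[SM4_BLOCK_SIZE])
{
	struct crypto_cipher *tfm;
	int err;

	tfm = crypto_alloc_cipher("sm4", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	err = crypto_cipher_setkey(tfm, key, SM4_KEY_SIZE);
	if (!err)
		crypto_cipher_encrypt_one(tfm, out, in);

	crypto_free_cipher(tfm);
	return err;
}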
@@ -15,7 +15,6 @@ obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o
obj-$(CONFIG_CRYPTO_AES_586) += aes-i586.o obj-$(CONFIG_CRYPTO_AES_586) += aes-i586.o
obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
obj-$(CONFIG_CRYPTO_SALSA20_586) += salsa20-i586.o
obj-$(CONFIG_CRYPTO_SERPENT_SSE2_586) += serpent-sse2-i586.o obj-$(CONFIG_CRYPTO_SERPENT_SSE2_586) += serpent-sse2-i586.o
obj-$(CONFIG_CRYPTO_AES_X86_64) += aes-x86_64.o obj-$(CONFIG_CRYPTO_AES_X86_64) += aes-x86_64.o
@@ -24,7 +23,6 @@ obj-$(CONFIG_CRYPTO_CAMELLIA_X86_64) += camellia-x86_64.o
obj-$(CONFIG_CRYPTO_BLOWFISH_X86_64) += blowfish-x86_64.o obj-$(CONFIG_CRYPTO_BLOWFISH_X86_64) += blowfish-x86_64.o
obj-$(CONFIG_CRYPTO_TWOFISH_X86_64) += twofish-x86_64.o obj-$(CONFIG_CRYPTO_TWOFISH_X86_64) += twofish-x86_64.o
obj-$(CONFIG_CRYPTO_TWOFISH_X86_64_3WAY) += twofish-x86_64-3way.o obj-$(CONFIG_CRYPTO_TWOFISH_X86_64_3WAY) += twofish-x86_64-3way.o
obj-$(CONFIG_CRYPTO_SALSA20_X86_64) += salsa20-x86_64.o
obj-$(CONFIG_CRYPTO_CHACHA20_X86_64) += chacha20-x86_64.o obj-$(CONFIG_CRYPTO_CHACHA20_X86_64) += chacha20-x86_64.o
obj-$(CONFIG_CRYPTO_SERPENT_SSE2_X86_64) += serpent-sse2-x86_64.o obj-$(CONFIG_CRYPTO_SERPENT_SSE2_X86_64) += serpent-sse2-x86_64.o
obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
@@ -38,6 +36,16 @@ obj-$(CONFIG_CRYPTO_SHA512_SSSE3) += sha512-ssse3.o
obj-$(CONFIG_CRYPTO_CRCT10DIF_PCLMUL) += crct10dif-pclmul.o obj-$(CONFIG_CRYPTO_CRCT10DIF_PCLMUL) += crct10dif-pclmul.o
obj-$(CONFIG_CRYPTO_POLY1305_X86_64) += poly1305-x86_64.o obj-$(CONFIG_CRYPTO_POLY1305_X86_64) += poly1305-x86_64.o
obj-$(CONFIG_CRYPTO_AEGIS128_AESNI_SSE2) += aegis128-aesni.o
obj-$(CONFIG_CRYPTO_AEGIS128L_AESNI_SSE2) += aegis128l-aesni.o
obj-$(CONFIG_CRYPTO_AEGIS256_AESNI_SSE2) += aegis256-aesni.o
obj-$(CONFIG_CRYPTO_MORUS640_GLUE) += morus640_glue.o
obj-$(CONFIG_CRYPTO_MORUS1280_GLUE) += morus1280_glue.o
obj-$(CONFIG_CRYPTO_MORUS640_SSE2) += morus640-sse2.o
obj-$(CONFIG_CRYPTO_MORUS1280_SSE2) += morus1280-sse2.o
# These modules require assembler to support AVX. # These modules require assembler to support AVX.
ifeq ($(avx_supported),yes) ifeq ($(avx_supported),yes)
obj-$(CONFIG_CRYPTO_CAMELLIA_AESNI_AVX_X86_64) += \ obj-$(CONFIG_CRYPTO_CAMELLIA_AESNI_AVX_X86_64) += \
@@ -55,11 +63,12 @@ ifeq ($(avx2_supported),yes)
obj-$(CONFIG_CRYPTO_SHA1_MB) += sha1-mb/ obj-$(CONFIG_CRYPTO_SHA1_MB) += sha1-mb/
obj-$(CONFIG_CRYPTO_SHA256_MB) += sha256-mb/ obj-$(CONFIG_CRYPTO_SHA256_MB) += sha256-mb/
obj-$(CONFIG_CRYPTO_SHA512_MB) += sha512-mb/ obj-$(CONFIG_CRYPTO_SHA512_MB) += sha512-mb/
obj-$(CONFIG_CRYPTO_MORUS1280_AVX2) += morus1280-avx2.o
endif endif
aes-i586-y := aes-i586-asm_32.o aes_glue.o aes-i586-y := aes-i586-asm_32.o aes_glue.o
twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o
salsa20-i586-y := salsa20-i586-asm_32.o salsa20_glue.o
serpent-sse2-i586-y := serpent-sse2-i586-asm_32.o serpent_sse2_glue.o serpent-sse2-i586-y := serpent-sse2-i586-asm_32.o serpent_sse2_glue.o
aes-x86_64-y := aes-x86_64-asm_64.o aes_glue.o aes-x86_64-y := aes-x86_64-asm_64.o aes_glue.o
@@ -68,10 +77,16 @@ camellia-x86_64-y := camellia-x86_64-asm_64.o camellia_glue.o
blowfish-x86_64-y := blowfish-x86_64-asm_64.o blowfish_glue.o blowfish-x86_64-y := blowfish-x86_64-asm_64.o blowfish_glue.o
twofish-x86_64-y := twofish-x86_64-asm_64.o twofish_glue.o twofish-x86_64-y := twofish-x86_64-asm_64.o twofish_glue.o
twofish-x86_64-3way-y := twofish-x86_64-asm_64-3way.o twofish_glue_3way.o twofish-x86_64-3way-y := twofish-x86_64-asm_64-3way.o twofish_glue_3way.o
salsa20-x86_64-y := salsa20-x86_64-asm_64.o salsa20_glue.o
chacha20-x86_64-y := chacha20-ssse3-x86_64.o chacha20_glue.o chacha20-x86_64-y := chacha20-ssse3-x86_64.o chacha20_glue.o
serpent-sse2-x86_64-y := serpent-sse2-x86_64-asm_64.o serpent_sse2_glue.o serpent-sse2-x86_64-y := serpent-sse2-x86_64-asm_64.o serpent_sse2_glue.o
aegis128-aesni-y := aegis128-aesni-asm.o aegis128-aesni-glue.o
aegis128l-aesni-y := aegis128l-aesni-asm.o aegis128l-aesni-glue.o
aegis256-aesni-y := aegis256-aesni-asm.o aegis256-aesni-glue.o
morus640-sse2-y := morus640-sse2-asm.o morus640-sse2-glue.o
morus1280-sse2-y := morus1280-sse2-asm.o morus1280-sse2-glue.o
ifeq ($(avx_supported),yes) ifeq ($(avx_supported),yes)
camellia-aesni-avx-x86_64-y := camellia-aesni-avx-asm_64.o \ camellia-aesni-avx-x86_64-y := camellia-aesni-avx-asm_64.o \
camellia_aesni_avx_glue.o camellia_aesni_avx_glue.o
@@ -87,6 +102,8 @@ ifeq ($(avx2_supported),yes)
camellia-aesni-avx2-y := camellia-aesni-avx2-asm_64.o camellia_aesni_avx2_glue.o camellia-aesni-avx2-y := camellia-aesni-avx2-asm_64.o camellia_aesni_avx2_glue.o
chacha20-x86_64-y += chacha20-avx2-x86_64.o chacha20-x86_64-y += chacha20-avx2-x86_64.o
serpent-avx2-y := serpent-avx2-asm_64.o serpent_avx2_glue.o serpent-avx2-y := serpent-avx2-asm_64.o serpent_avx2_glue.o
morus1280-avx2-y := morus1280-avx2-asm.o morus1280-avx2-glue.o
endif endif
aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o
......
/*
* AES-NI + SSE2 implementation of AEGIS-128
*
* Copyright (c) 2017-2018 Ondrej Mosnacek <omosnacek@gmail.com>
* Copyright (C) 2017-2018 Red Hat, Inc. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 as published
* by the Free Software Foundation.
*/
#include <linux/linkage.h>
#include <asm/frame.h>
#define STATE0 %xmm0
#define STATE1 %xmm1
#define STATE2 %xmm2
#define STATE3 %xmm3
#define STATE4 %xmm4
#define KEY %xmm5
#define MSG %xmm5
#define T0 %xmm6
#define T1 %xmm7
#define STATEP %rdi
#define LEN %rsi
#define SRC %rdx
#define DST %rcx
.section .rodata.cst16.aegis128_const, "aM", @progbits, 32
.align 16
.Laegis128_const_0:
.byte 0x00, 0x01, 0x01, 0x02, 0x03, 0x05, 0x08, 0x0d
.byte 0x15, 0x22, 0x37, 0x59, 0x90, 0xe9, 0x79, 0x62
.Laegis128_const_1:
.byte 0xdb, 0x3d, 0x18, 0x55, 0x6d, 0xc2, 0x2f, 0xf1
.byte 0x20, 0x11, 0x31, 0x42, 0x73, 0xb5, 0x28, 0xdd
.section .rodata.cst16.aegis128_counter, "aM", @progbits, 16
.align 16
.Laegis128_counter:
.byte 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
.byte 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
.text
/*
* aegis128_update
* input:
* STATE[0-4] - input state
* output:
* STATE[0-4] - output state (shifted positions)
* changed:
* T0
*/
.macro aegis128_update
movdqa STATE4, T0
aesenc STATE0, STATE4
aesenc STATE1, STATE0
aesenc STATE2, STATE1
aesenc STATE3, STATE2
aesenc T0, STATE3
.endm
/*
* __load_partial: internal ABI
* input:
* LEN - bytes
* SRC - src
* output:
* MSG - message block
* changed:
* T0
* %r8
* %r9
*/
__load_partial:
xor %r9, %r9
pxor MSG, MSG
mov LEN, %r8
and $0x1, %r8
jz .Lld_partial_1
mov LEN, %r8
and $0x1E, %r8
add SRC, %r8
mov (%r8), %r9b
.Lld_partial_1:
mov LEN, %r8
and $0x2, %r8
jz .Lld_partial_2
mov LEN, %r8
and $0x1C, %r8
add SRC, %r8
shl $0x10, %r9
mov (%r8), %r9w
.Lld_partial_2:
mov LEN, %r8
and $0x4, %r8
jz .Lld_partial_4
mov LEN, %r8
and $0x18, %r8
add SRC, %r8
shl $32, %r9
mov (%r8), %r8d
xor %r8, %r9
.Lld_partial_4:
movq %r9, MSG
mov LEN, %r8
and $0x8, %r8
jz .Lld_partial_8
mov LEN, %r8
and $0x10, %r8
add SRC, %r8
pslldq $8, MSG
movq (%r8), T0
pxor T0, MSG
.Lld_partial_8:
ret
ENDPROC(__load_partial)
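A portable equivalent of __load_partial above, for readers who do not want to trace the bit tests: it gathers the remaining len (< 16) bytes into a zero-padded 16-byte block without ever reading past src + len. The assembly achieves the same with byte/word/dword/qword loads selected by the bits of LEN.

#include <stdint.h>
#include <string.h>

static void load_partial_block(uint8_t block[16], const uint8_t *src,
			       size_t len)
{
	memset(block, 0, 16);
	memcpy(block, src, len);	/* len < 16 by the callers' contract */
}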
/*
* __store_partial: internal ABI
* input:
* LEN - bytes
* DST - dst
* output:
* T0 - message block
* changed:
* %r8
* %r9
* %r10
*/
__store_partial:
mov LEN, %r8
mov DST, %r9
movq T0, %r10
cmp $8, %r8
jl .Lst_partial_8
mov %r10, (%r9)
psrldq $8, T0
movq T0, %r10
sub $8, %r8
add $8, %r9
.Lst_partial_8:
cmp $4, %r8
jl .Lst_partial_4
mov %r10d, (%r9)
shr $32, %r10
sub $4, %r8
add $4, %r9
.Lst_partial_4:
cmp $2, %r8
jl .Lst_partial_2
mov %r10w, (%r9)
shr $0x10, %r10
sub $2, %r8
add $2, %r9
.Lst_partial_2:
cmp $1, %r8
jl .Lst_partial_1
mov %r10b, (%r9)
.Lst_partial_1:
ret
ENDPROC(__store_partial)
/*
* void crypto_aegis128_aesni_init(void *state, const void *key, const void *iv);
*/
ENTRY(crypto_aegis128_aesni_init)
FRAME_BEGIN
/* load IV: */
movdqu (%rdx), T1
/* load key: */
movdqa (%rsi), KEY
pxor KEY, T1
movdqa T1, STATE0
movdqa KEY, STATE3
movdqa KEY, STATE4
/* load the constants: */
movdqa .Laegis128_const_0, STATE2
movdqa .Laegis128_const_1, STATE1
pxor STATE2, STATE3
pxor STATE1, STATE4
/* update 10 times with KEY / KEY xor IV: */
aegis128_update; pxor KEY, STATE4
aegis128_update; pxor T1, STATE3
aegis128_update; pxor KEY, STATE2
aegis128_update; pxor T1, STATE1
aegis128_update; pxor KEY, STATE0
aegis128_update; pxor T1, STATE4
aegis128_update; pxor KEY, STATE3
aegis128_update; pxor T1, STATE2
aegis128_update; pxor KEY, STATE1
aegis128_update; pxor T1, STATE0
/* store the state: */
movdqu STATE0, 0x00(STATEP)
movdqu STATE1, 0x10(STATEP)
movdqu STATE2, 0x20(STATEP)
movdqu STATE3, 0x30(STATEP)
movdqu STATE4, 0x40(STATEP)
FRAME_END
ret
ENDPROC(crypto_aegis128_aesni_init)
/*
* void crypto_aegis128_aesni_ad(void *state, unsigned int length,
* const void *data);
*/
ENTRY(crypto_aegis128_aesni_ad)
FRAME_BEGIN
cmp $0x10, LEN
jb .Lad_out
/* load the state: */
movdqu 0x00(STATEP), STATE0
movdqu 0x10(STATEP), STATE1
movdqu 0x20(STATEP), STATE2
movdqu 0x30(STATEP), STATE3
movdqu 0x40(STATEP), STATE4
mov SRC, %r8
and $0xF, %r8
jnz .Lad_u_loop
.align 8
.Lad_a_loop:
movdqa 0x00(SRC), MSG
aegis128_update
pxor MSG, STATE4
sub $0x10, LEN
cmp $0x10, LEN
jl .Lad_out_1
movdqa 0x10(SRC), MSG
aegis128_update
pxor MSG, STATE3
sub $0x10, LEN
cmp $0x10, LEN
jl .Lad_out_2
movdqa 0x20(SRC), MSG
aegis128_update
pxor MSG, STATE2
sub $0x10, LEN
cmp $0x10, LEN
jl .Lad_out_3
movdqa 0x30(SRC), MSG
aegis128_update
pxor MSG, STATE1
sub $0x10, LEN
cmp $0x10, LEN
jl .Lad_out_4
movdqa 0x40(SRC), MSG
aegis128_update
pxor MSG, STATE0
sub $0x10, LEN
cmp $0x10, LEN
jl .Lad_out_0
add $0x50, SRC
jmp .Lad_a_loop
.align 8
.Lad_u_loop:
movdqu 0x00(SRC), MSG
aegis128_update
pxor MSG, STATE4
sub $0x10, LEN
cmp $0x10, LEN
jl .Lad_out_1
movdqu 0x10(SRC), MSG
aegis128_update
pxor MSG, STATE3
sub $0x10, LEN
cmp $0x10, LEN
jl .Lad_out_2
movdqu 0x20(SRC), MSG
aegis128_update
pxor MSG, STATE2
sub $0x10, LEN
cmp $0x10, LEN
jl .Lad_out_3
movdqu 0x30(SRC), MSG
aegis128_update
pxor MSG, STATE1
sub $0x10, LEN
cmp $0x10, LEN
jl .Lad_out_4
movdqu 0x40(SRC), MSG
aegis128_update
pxor MSG, STATE0
sub $0x10, LEN
cmp $0x10, LEN
jl .Lad_out_0
add $0x50, SRC
jmp .Lad_u_loop
/* store the state: */
.Lad_out_0:
movdqu STATE0, 0x00(STATEP)
movdqu STATE1, 0x10(STATEP)
movdqu STATE2, 0x20(STATEP)
movdqu STATE3, 0x30(STATEP)
movdqu STATE4, 0x40(STATEP)
FRAME_END
ret
.Lad_out_1:
movdqu STATE4, 0x00(STATEP)
movdqu STATE0, 0x10(STATEP)
movdqu STATE1, 0x20(STATEP)
movdqu STATE2, 0x30(STATEP)
movdqu STATE3, 0x40(STATEP)
FRAME_END
ret
.Lad_out_2:
movdqu STATE3, 0x00(STATEP)
movdqu STATE4, 0x10(STATEP)
movdqu STATE0, 0x20(STATEP)
movdqu STATE1, 0x30(STATEP)
movdqu STATE2, 0x40(STATEP)
FRAME_END
ret
.Lad_out_3:
movdqu STATE2, 0x00(STATEP)
movdqu STATE3, 0x10(STATEP)
movdqu STATE4, 0x20(STATEP)
movdqu STATE0, 0x30(STATEP)
movdqu STATE1, 0x40(STATEP)
FRAME_END
ret
.Lad_out_4:
movdqu STATE1, 0x00(STATEP)
movdqu STATE2, 0x10(STATEP)
movdqu STATE3, 0x20(STATEP)
movdqu STATE4, 0x30(STATEP)
movdqu STATE0, 0x40(STATEP)
FRAME_END
ret
.Lad_out:
FRAME_END
ret
ENDPROC(crypto_aegis128_aesni_ad)
.macro encrypt_block a s0 s1 s2 s3 s4 i
movdq\a (\i * 0x10)(SRC), MSG
movdqa MSG, T0
pxor \s1, T0
pxor \s4, T0
movdqa \s2, T1
pand \s3, T1
pxor T1, T0
movdq\a T0, (\i * 0x10)(DST)
aegis128_update
pxor MSG, \s4
sub $0x10, LEN
cmp $0x10, LEN
jl .Lenc_out_\i
.endm
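The T0 computation in encrypt_block (and the matching one in decrypt_block further down) is the AEGIS-128 keystream function: each 16-byte block is combined with S1 ^ S4 ^ (S2 & S3) of the current (rotated) state. A byte-wise sketch:

#include <stdint.h>

/* out = in ^ S1 ^ S4 ^ (S2 & S3), byte-wise over one 16-byte block. */
static void aegis128_keystream_xor(uint8_t out[16], const uint8_t in[16],
				   const uint8_t s1[16], const uint8_t s2[16],
				   const uint8_t s3[16], const uint8_t s4[16])
{
	int i;

	for (i = 0; i < 16; i++)
		out[i] = in[i] ^ s1[i] ^ s4[i] ^ (s2[i] & s3[i]);
}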
/*
* void crypto_aegis128_aesni_enc(void *state, unsigned int length,
* const void *src, void *dst);
*/
ENTRY(crypto_aegis128_aesni_enc)
FRAME_BEGIN
cmp $0x10, LEN
jb .Lenc_out
/* load the state: */
movdqu 0x00(STATEP), STATE0
movdqu 0x10(STATEP), STATE1
movdqu 0x20(STATEP), STATE2
movdqu 0x30(STATEP), STATE3
movdqu 0x40(STATEP), STATE4
mov SRC, %r8
or DST, %r8
and $0xF, %r8
jnz .Lenc_u_loop
.align 8
.Lenc_a_loop:
encrypt_block a STATE0 STATE1 STATE2 STATE3 STATE4 0
encrypt_block a STATE4 STATE0 STATE1 STATE2 STATE3 1
encrypt_block a STATE3 STATE4 STATE0 STATE1 STATE2 2
encrypt_block a STATE2 STATE3 STATE4 STATE0 STATE1 3
encrypt_block a STATE1 STATE2 STATE3 STATE4 STATE0 4
add $0x50, SRC
add $0x50, DST
jmp .Lenc_a_loop
.align 8
.Lenc_u_loop:
encrypt_block u STATE0 STATE1 STATE2 STATE3 STATE4 0
encrypt_block u STATE4 STATE0 STATE1 STATE2 STATE3 1
encrypt_block u STATE3 STATE4 STATE0 STATE1 STATE2 2
encrypt_block u STATE2 STATE3 STATE4 STATE0 STATE1 3
encrypt_block u STATE1 STATE2 STATE3 STATE4 STATE0 4
add $0x50, SRC
add $0x50, DST
jmp .Lenc_u_loop
/* store the state: */
.Lenc_out_0:
movdqu STATE4, 0x00(STATEP)
movdqu STATE0, 0x10(STATEP)
movdqu STATE1, 0x20(STATEP)
movdqu STATE2, 0x30(STATEP)
movdqu STATE3, 0x40(STATEP)
FRAME_END
ret
.Lenc_out_1:
movdqu STATE3, 0x00(STATEP)
movdqu STATE4, 0x10(STATEP)
movdqu STATE0, 0x20(STATEP)
movdqu STATE1, 0x30(STATEP)
movdqu STATE2, 0x40(STATEP)
FRAME_END
ret
.Lenc_out_2:
movdqu STATE2, 0x00(STATEP)
movdqu STATE3, 0x10(STATEP)
movdqu STATE4, 0x20(STATEP)
movdqu STATE0, 0x30(STATEP)
movdqu STATE1, 0x40(STATEP)
FRAME_END
ret
.Lenc_out_3:
movdqu STATE1, 0x00(STATEP)
movdqu STATE2, 0x10(STATEP)
movdqu STATE3, 0x20(STATEP)
movdqu STATE4, 0x30(STATEP)
movdqu STATE0, 0x40(STATEP)
FRAME_END
ret
.Lenc_out_4:
movdqu STATE0, 0x00(STATEP)
movdqu STATE1, 0x10(STATEP)
movdqu STATE2, 0x20(STATEP)
movdqu STATE3, 0x30(STATEP)
movdqu STATE4, 0x40(STATEP)
FRAME_END
ret
.Lenc_out:
FRAME_END
ret
ENDPROC(crypto_aegis128_aesni_enc)
/*
* void crypto_aegis128_aesni_enc_tail(void *state, unsigned int length,
* const void *src, void *dst);
*/
ENTRY(crypto_aegis128_aesni_enc_tail)
FRAME_BEGIN
/* load the state: */
movdqu 0x00(STATEP), STATE0
movdqu 0x10(STATEP), STATE1
movdqu 0x20(STATEP), STATE2
movdqu 0x30(STATEP), STATE3
movdqu 0x40(STATEP), STATE4
/* encrypt message: */
call __load_partial
movdqa MSG, T0
pxor STATE1, T0
pxor STATE4, T0
movdqa STATE2, T1
pand STATE3, T1
pxor T1, T0
call __store_partial
aegis128_update
pxor MSG, STATE4
/* store the state: */
movdqu STATE4, 0x00(STATEP)
movdqu STATE0, 0x10(STATEP)
movdqu STATE1, 0x20(STATEP)
movdqu STATE2, 0x30(STATEP)
movdqu STATE3, 0x40(STATEP)
FRAME_END
ret
ENDPROC(crypto_aegis128_aesni_enc_tail)
.macro decrypt_block a s0 s1 s2 s3 s4 i
movdq\a (\i * 0x10)(SRC), MSG
pxor \s1, MSG
pxor \s4, MSG
movdqa \s2, T1
pand \s3, T1
pxor T1, MSG
movdq\a MSG, (\i * 0x10)(DST)
aegis128_update
pxor MSG, \s4
sub $0x10, LEN
cmp $0x10, LEN
jl .Ldec_out_\i
.endm
/*
* void crypto_aegis128_aesni_dec(void *state, unsigned int length,
* const void *src, void *dst);
*/
ENTRY(crypto_aegis128_aesni_dec)
FRAME_BEGIN
cmp $0x10, LEN
jb .Ldec_out
/* load the state: */
movdqu 0x00(STATEP), STATE0
movdqu 0x10(STATEP), STATE1
movdqu 0x20(STATEP), STATE2
movdqu 0x30(STATEP), STATE3
movdqu 0x40(STATEP), STATE4
mov SRC, %r8
or DST, %r8
and $0xF, %r8
jnz .Ldec_u_loop
.align 8
.Ldec_a_loop:
decrypt_block a STATE0 STATE1 STATE2 STATE3 STATE4 0
decrypt_block a STATE4 STATE0 STATE1 STATE2 STATE3 1
decrypt_block a STATE3 STATE4 STATE0 STATE1 STATE2 2
decrypt_block a STATE2 STATE3 STATE4 STATE0 STATE1 3
decrypt_block a STATE1 STATE2 STATE3 STATE4 STATE0 4
add $0x50, SRC
add $0x50, DST
jmp .Ldec_a_loop
.align 8
.Ldec_u_loop:
decrypt_block u STATE0 STATE1 STATE2 STATE3 STATE4 0
decrypt_block u STATE4 STATE0 STATE1 STATE2 STATE3 1
decrypt_block u STATE3 STATE4 STATE0 STATE1 STATE2 2
decrypt_block u STATE2 STATE3 STATE4 STATE0 STATE1 3
decrypt_block u STATE1 STATE2 STATE3 STATE4 STATE0 4
add $0x50, SRC
add $0x50, DST
jmp .Ldec_u_loop
/* store the state: */
.Ldec_out_0:
movdqu STATE4, 0x00(STATEP)
movdqu STATE0, 0x10(STATEP)
movdqu STATE1, 0x20(STATEP)
movdqu STATE2, 0x30(STATEP)
movdqu STATE3, 0x40(STATEP)
FRAME_END
ret
.Ldec_out_1:
movdqu STATE3, 0x00(STATEP)
movdqu STATE4, 0x10(STATEP)
movdqu STATE0, 0x20(STATEP)
movdqu STATE1, 0x30(STATEP)
movdqu STATE2, 0x40(STATEP)
FRAME_END
ret
.Ldec_out_2:
movdqu STATE2, 0x00(STATEP)
movdqu STATE3, 0x10(STATEP)
movdqu STATE4, 0x20(STATEP)
movdqu STATE0, 0x30(STATEP)
movdqu STATE1, 0x40(STATEP)
FRAME_END
ret
.Ldec_out_3:
movdqu STATE1, 0x00(STATEP)
movdqu STATE2, 0x10(STATEP)
movdqu STATE3, 0x20(STATEP)
movdqu STATE4, 0x30(STATEP)
movdqu STATE0, 0x40(STATEP)
FRAME_END
ret
.Ldec_out_4:
movdqu STATE0, 0x00(STATEP)
movdqu STATE1, 0x10(STATEP)
movdqu STATE2, 0x20(STATEP)
movdqu STATE3, 0x30(STATEP)
movdqu STATE4, 0x40(STATEP)
FRAME_END
ret
.Ldec_out:
FRAME_END
ret
ENDPROC(crypto_aegis128_aesni_dec)
/*
* void crypto_aegis128_aesni_dec_tail(void *state, unsigned int length,
* const void *src, void *dst);
*/
ENTRY(crypto_aegis128_aesni_dec_tail)
FRAME_BEGIN
/* load the state: */
movdqu 0x00(STATEP), STATE0
movdqu 0x10(STATEP), STATE1
movdqu 0x20(STATEP), STATE2
movdqu 0x30(STATEP), STATE3
movdqu 0x40(STATEP), STATE4
/* decrypt message: */
call __load_partial
pxor STATE1, MSG
pxor STATE4, MSG
movdqa STATE2, T1
pand STATE3, T1
pxor T1, MSG
movdqa MSG, T0
call __store_partial
/* mask with byte count: */
movq LEN, T0
punpcklbw T0, T0
punpcklbw T0, T0
punpcklbw T0, T0
punpcklbw T0, T0
movdqa .Laegis128_counter, T1
pcmpgtb T1, T0
pand T0, MSG
aegis128_update
pxor MSG, STATE4
/* store the state: */
movdqu STATE4, 0x00(STATEP)
movdqu STATE0, 0x10(STATEP)
movdqu STATE1, 0x20(STATEP)
movdqu STATE2, 0x30(STATEP)
movdqu STATE3, 0x40(STATEP)
FRAME_END
ret
ENDPROC(crypto_aegis128_aesni_dec_tail)
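The punpcklbw/pcmpgtb sequence above builds a byte mask from the remaining length so that only the first len bytes of the decrypted partial block are absorbed back into the state; the zero padding therefore cannot influence the tag. The equivalent in plain C:

#include <stddef.h>
#include <stdint.h>

/* Zero every byte at index >= len, which is exactly what comparing the
 * broadcast length against the 0..15 counter table produces. */
static void mask_partial_block(uint8_t block[16], size_t len)
{
	size_t i;

	for (i = 0; i < 16; i++)
		if (i >= len)
			block[i] = 0;
}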
/*
* void crypto_aegis128_aesni_final(void *state, void *tag_xor,
* u64 assoclen, u64 cryptlen);
*/
ENTRY(crypto_aegis128_aesni_final)
FRAME_BEGIN
/* load the state: */
movdqu 0x00(STATEP), STATE0
movdqu 0x10(STATEP), STATE1
movdqu 0x20(STATEP), STATE2
movdqu 0x30(STATEP), STATE3
movdqu 0x40(STATEP), STATE4
/* prepare length block: */
movq %rdx, MSG
movq %rcx, T0
pslldq $8, T0
pxor T0, MSG
psllq $3, MSG /* multiply by 8 (to get bit count) */
pxor STATE3, MSG
/* update state: */
aegis128_update; pxor MSG, STATE4
aegis128_update; pxor MSG, STATE3
aegis128_update; pxor MSG, STATE2
aegis128_update; pxor MSG, STATE1
aegis128_update; pxor MSG, STATE0
aegis128_update; pxor MSG, STATE4
aegis128_update; pxor MSG, STATE3
/* xor tag: */
movdqu (%rsi), MSG
pxor STATE0, MSG
pxor STATE1, MSG
pxor STATE2, MSG
pxor STATE3, MSG
pxor STATE4, MSG
movdqu MSG, (%rsi)
FRAME_END
ret
ENDPROC(crypto_aegis128_aesni_final)
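The length block prepared above packs the associated-data and message lengths, converted to bit counts, as two little-endian 64-bit values before XORing them into the state. A portable sketch:

#include <stdint.h>

/* Low quadword: assoclen in bits; high quadword: cryptlen in bits; both
 * little-endian, matching the movq/pslldq/psllq sequence above. */
static void aegis128_length_block(uint8_t block[16], uint64_t assoclen,
				  uint64_t cryptlen)
{
	uint64_t abits = assoclen * 8;
	uint64_t cbits = cryptlen * 8;
	int i;

	for (i = 0; i < 8; i++) {
		block[i] = (uint8_t)(abits >> (8 * i));
		block[i + 8] = (uint8_t)(cbits >> (8 * i));
	}
}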
/*
* The AEGIS-128 Authenticated-Encryption Algorithm
* Glue for AES-NI + SSE2 implementation
*
* Copyright (c) 2017-2018 Ondrej Mosnacek <omosnacek@gmail.com>
* Copyright (C) 2017-2018 Red Hat, Inc. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License as published by the Free
* Software Foundation; either version 2 of the License, or (at your option)
* any later version.
*/
#include <crypto/cryptd.h>
#include <crypto/internal/aead.h>
#include <crypto/internal/skcipher.h>
#include <crypto/scatterwalk.h>
#include <linux/module.h>
#include <asm/fpu/api.h>
#include <asm/cpu_device_id.h>
#define AEGIS128_BLOCK_ALIGN 16
#define AEGIS128_BLOCK_SIZE 16
#define AEGIS128_NONCE_SIZE 16
#define AEGIS128_STATE_BLOCKS 5
#define AEGIS128_KEY_SIZE 16
#define AEGIS128_MIN_AUTH_SIZE 8
#define AEGIS128_MAX_AUTH_SIZE 16
asmlinkage void crypto_aegis128_aesni_init(void *state, void *key, void *iv);
asmlinkage void crypto_aegis128_aesni_ad(
void *state, unsigned int length, const void *data);
asmlinkage void crypto_aegis128_aesni_enc(
void *state, unsigned int length, const void *src, void *dst);
asmlinkage void crypto_aegis128_aesni_dec(
void *state, unsigned int length, const void *src, void *dst);
asmlinkage void crypto_aegis128_aesni_enc_tail(
void *state, unsigned int length, const void *src, void *dst);
asmlinkage void crypto_aegis128_aesni_dec_tail(
void *state, unsigned int length, const void *src, void *dst);
asmlinkage void crypto_aegis128_aesni_final(
		void *state, void *tag_xor, unsigned int assoclen,
		unsigned int cryptlen);
struct aegis_block {
u8 bytes[AEGIS128_BLOCK_SIZE] __aligned(AEGIS128_BLOCK_ALIGN);
};
struct aegis_state {
struct aegis_block blocks[AEGIS128_STATE_BLOCKS];
};
struct aegis_ctx {
struct aegis_block key;
};
struct aegis_crypt_ops {
int (*skcipher_walk_init)(struct skcipher_walk *walk,
struct aead_request *req, bool atomic);
void (*crypt_blocks)(void *state, unsigned int length, const void *src,
void *dst);
void (*crypt_tail)(void *state, unsigned int length, const void *src,
void *dst);
};
static void crypto_aegis128_aesni_process_ad(
struct aegis_state *state, struct scatterlist *sg_src,
unsigned int assoclen)
{
struct scatter_walk walk;
struct aegis_block buf;
unsigned int pos = 0;
scatterwalk_start(&walk, sg_src);
while (assoclen != 0) {
unsigned int size = scatterwalk_clamp(&walk, assoclen);
unsigned int left = size;
void *mapped = scatterwalk_map(&walk);
const u8 *src = (const u8 *)mapped;
if (pos + size >= AEGIS128_BLOCK_SIZE) {
if (pos > 0) {
unsigned int fill = AEGIS128_BLOCK_SIZE - pos;
memcpy(buf.bytes + pos, src, fill);
crypto_aegis128_aesni_ad(state,
AEGIS128_BLOCK_SIZE,
buf.bytes);
pos = 0;
left -= fill;
src += fill;
}
crypto_aegis128_aesni_ad(state, left, src);
src += left & ~(AEGIS128_BLOCK_SIZE - 1);
left &= AEGIS128_BLOCK_SIZE - 1;
}
memcpy(buf.bytes + pos, src, left);
pos += left;
assoclen -= size;
scatterwalk_unmap(mapped);
scatterwalk_advance(&walk, size);
scatterwalk_done(&walk, 0, assoclen);
}
if (pos > 0) {
memset(buf.bytes + pos, 0, AEGIS128_BLOCK_SIZE - pos);
crypto_aegis128_aesni_ad(state, AEGIS128_BLOCK_SIZE, buf.bytes);
}
}
static void crypto_aegis128_aesni_process_crypt(
struct aegis_state *state, struct aead_request *req,
const struct aegis_crypt_ops *ops)
{
struct skcipher_walk walk;
u8 *src, *dst;
unsigned int chunksize, base;
ops->skcipher_walk_init(&walk, req, false);
while (walk.nbytes) {
src = walk.src.virt.addr;
dst = walk.dst.virt.addr;
chunksize = walk.nbytes;
ops->crypt_blocks(state, chunksize, src, dst);
base = chunksize & ~(AEGIS128_BLOCK_SIZE - 1);
src += base;
dst += base;
chunksize &= AEGIS128_BLOCK_SIZE - 1;
if (chunksize > 0)
ops->crypt_tail(state, chunksize, src, dst);
skcipher_walk_done(&walk, 0);
}
}
static struct aegis_ctx *crypto_aegis128_aesni_ctx(struct crypto_aead *aead)
{
u8 *ctx = crypto_aead_ctx(aead);
ctx = PTR_ALIGN(ctx, __alignof__(struct aegis_ctx));
return (void *)ctx;
}
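crypto_aegis128_aesni_ctx() relies on the context area being over-allocated by __alignof__(struct aegis_ctx) (see cra_ctxsize further down) so the raw pointer can always be rounded up to a suitably aligned address for the 16-byte-aligned key block. The rounding itself is the usual power-of-two trick:

#include <stdint.h>

/* Round p up to the next multiple of align (align must be a power of two). */
static inline void *ptr_align_up(void *p, uintptr_t align)
{
	return (void *)(((uintptr_t)p + align - 1) & ~(align - 1));
}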
static int crypto_aegis128_aesni_setkey(struct crypto_aead *aead, const u8 *key,
unsigned int keylen)
{
struct aegis_ctx *ctx = crypto_aegis128_aesni_ctx(aead);
if (keylen != AEGIS128_KEY_SIZE) {
crypto_aead_set_flags(aead, CRYPTO_TFM_RES_BAD_KEY_LEN);
return -EINVAL;
}
memcpy(ctx->key.bytes, key, AEGIS128_KEY_SIZE);
return 0;
}
static int crypto_aegis128_aesni_setauthsize(struct crypto_aead *tfm,
unsigned int authsize)
{
if (authsize > AEGIS128_MAX_AUTH_SIZE)
return -EINVAL;
if (authsize < AEGIS128_MIN_AUTH_SIZE)
return -EINVAL;
return 0;
}
static void crypto_aegis128_aesni_crypt(struct aead_request *req,
struct aegis_block *tag_xor,
unsigned int cryptlen,
const struct aegis_crypt_ops *ops)
{
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
struct aegis_ctx *ctx = crypto_aegis128_aesni_ctx(tfm);
struct aegis_state state;
kernel_fpu_begin();
crypto_aegis128_aesni_init(&state, ctx->key.bytes, req->iv);
crypto_aegis128_aesni_process_ad(&state, req->src, req->assoclen);
crypto_aegis128_aesni_process_crypt(&state, req, ops);
crypto_aegis128_aesni_final(&state, tag_xor, req->assoclen, cryptlen);
kernel_fpu_end();
}
static int crypto_aegis128_aesni_encrypt(struct aead_request *req)
{
static const struct aegis_crypt_ops OPS = {
.skcipher_walk_init = skcipher_walk_aead_encrypt,
.crypt_blocks = crypto_aegis128_aesni_enc,
.crypt_tail = crypto_aegis128_aesni_enc_tail,
};
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
struct aegis_block tag = {};
unsigned int authsize = crypto_aead_authsize(tfm);
unsigned int cryptlen = req->cryptlen;
crypto_aegis128_aesni_crypt(req, &tag, cryptlen, &OPS);
scatterwalk_map_and_copy(tag.bytes, req->dst,
req->assoclen + cryptlen, authsize, 1);
return 0;
}
static int crypto_aegis128_aesni_decrypt(struct aead_request *req)
{
static const struct aegis_block zeros = {};
static const struct aegis_crypt_ops OPS = {
.skcipher_walk_init = skcipher_walk_aead_decrypt,
.crypt_blocks = crypto_aegis128_aesni_dec,
.crypt_tail = crypto_aegis128_aesni_dec_tail,
};
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
struct aegis_block tag;
unsigned int authsize = crypto_aead_authsize(tfm);
unsigned int cryptlen = req->cryptlen - authsize;
scatterwalk_map_and_copy(tag.bytes, req->src,
req->assoclen + cryptlen, authsize, 0);
crypto_aegis128_aesni_crypt(req, &tag, cryptlen, &OPS);
return crypto_memneq(tag.bytes, zeros.bytes, authsize) ? -EBADMSG : 0;
}
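Note how the tag is verified: the expected tag is copied out of the request first, crypto_aegis128_aesni_final() XORs the freshly computed tag into it, and a correct tag therefore leaves only zero bytes, which crypto_memneq() then compares in constant time. A sketch of the same check (illustration only, not audited for constant-time behaviour):

#include <stdint.h>

static int aegis_tag_ok(const uint8_t tag_xor[16], unsigned int authsize)
{
	uint8_t acc = 0;
	unsigned int i;

	for (i = 0; i < authsize; i++)
		acc |= tag_xor[i];	/* non-zero iff any tag byte differed */

	return acc == 0;		/* 1 = authentic, 0 = reject (-EBADMSG) */
}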
static int crypto_aegis128_aesni_init_tfm(struct crypto_aead *aead)
{
return 0;
}
static void crypto_aegis128_aesni_exit_tfm(struct crypto_aead *aead)
{
}
static int cryptd_aegis128_aesni_setkey(struct crypto_aead *aead,
const u8 *key, unsigned int keylen)
{
struct cryptd_aead **ctx = crypto_aead_ctx(aead);
struct cryptd_aead *cryptd_tfm = *ctx;
return crypto_aead_setkey(&cryptd_tfm->base, key, keylen);
}
static int cryptd_aegis128_aesni_setauthsize(struct crypto_aead *aead,
unsigned int authsize)
{
struct cryptd_aead **ctx = crypto_aead_ctx(aead);
struct cryptd_aead *cryptd_tfm = *ctx;
return crypto_aead_setauthsize(&cryptd_tfm->base, authsize);
}
static int cryptd_aegis128_aesni_encrypt(struct aead_request *req)
{
struct crypto_aead *aead = crypto_aead_reqtfm(req);
struct cryptd_aead **ctx = crypto_aead_ctx(aead);
struct cryptd_aead *cryptd_tfm = *ctx;
aead = &cryptd_tfm->base;
if (irq_fpu_usable() && (!in_atomic() ||
!cryptd_aead_queued(cryptd_tfm)))
aead = cryptd_aead_child(cryptd_tfm);
aead_request_set_tfm(req, aead);
return crypto_aead_encrypt(req);
}
static int cryptd_aegis128_aesni_decrypt(struct aead_request *req)
{
struct crypto_aead *aead = crypto_aead_reqtfm(req);
struct cryptd_aead **ctx = crypto_aead_ctx(aead);
struct cryptd_aead *cryptd_tfm = *ctx;
aead = &cryptd_tfm->base;
if (irq_fpu_usable() && (!in_atomic() ||
!cryptd_aead_queued(cryptd_tfm)))
aead = cryptd_aead_child(cryptd_tfm);
aead_request_set_tfm(req, aead);
return crypto_aead_decrypt(req);
}
static int cryptd_aegis128_aesni_init_tfm(struct crypto_aead *aead)
{
struct cryptd_aead *cryptd_tfm;
struct cryptd_aead **ctx = crypto_aead_ctx(aead);
cryptd_tfm = cryptd_alloc_aead("__aegis128-aesni", CRYPTO_ALG_INTERNAL,
CRYPTO_ALG_INTERNAL);
if (IS_ERR(cryptd_tfm))
return PTR_ERR(cryptd_tfm);
*ctx = cryptd_tfm;
crypto_aead_set_reqsize(aead, crypto_aead_reqsize(&cryptd_tfm->base));
return 0;
}
static void cryptd_aegis128_aesni_exit_tfm(struct crypto_aead *aead)
{
struct cryptd_aead **ctx = crypto_aead_ctx(aead);
cryptd_free_aead(*ctx);
}
static struct aead_alg crypto_aegis128_aesni_alg[] = {
{
.setkey = crypto_aegis128_aesni_setkey,
.setauthsize = crypto_aegis128_aesni_setauthsize,
.encrypt = crypto_aegis128_aesni_encrypt,
.decrypt = crypto_aegis128_aesni_decrypt,
.init = crypto_aegis128_aesni_init_tfm,
.exit = crypto_aegis128_aesni_exit_tfm,
.ivsize = AEGIS128_NONCE_SIZE,
.maxauthsize = AEGIS128_MAX_AUTH_SIZE,
.chunksize = AEGIS128_BLOCK_SIZE,
.base = {
.cra_flags = CRYPTO_ALG_INTERNAL,
.cra_blocksize = 1,
.cra_ctxsize = sizeof(struct aegis_ctx) +
__alignof__(struct aegis_ctx),
.cra_alignmask = 0,
.cra_name = "__aegis128",
.cra_driver_name = "__aegis128-aesni",
.cra_module = THIS_MODULE,
}
}, {
.setkey = cryptd_aegis128_aesni_setkey,
.setauthsize = cryptd_aegis128_aesni_setauthsize,
.encrypt = cryptd_aegis128_aesni_encrypt,
.decrypt = cryptd_aegis128_aesni_decrypt,
.init = cryptd_aegis128_aesni_init_tfm,
.exit = cryptd_aegis128_aesni_exit_tfm,
.ivsize = AEGIS128_NONCE_SIZE,
.maxauthsize = AEGIS128_MAX_AUTH_SIZE,
.chunksize = AEGIS128_BLOCK_SIZE,
.base = {
.cra_flags = CRYPTO_ALG_ASYNC,
.cra_blocksize = 1,
.cra_ctxsize = sizeof(struct cryptd_aead *),
.cra_alignmask = 0,
.cra_priority = 400,
.cra_name = "aegis128",
.cra_driver_name = "aegis128-aesni",
.cra_module = THIS_MODULE,
}
}
};
static const struct x86_cpu_id aesni_cpu_id[] = {
X86_FEATURE_MATCH(X86_FEATURE_AES),
X86_FEATURE_MATCH(X86_FEATURE_XMM2),
{}
};
MODULE_DEVICE_TABLE(x86cpu, aesni_cpu_id);
static int __init crypto_aegis128_aesni_module_init(void)
{
if (!x86_match_cpu(aesni_cpu_id))
return -ENODEV;
return crypto_register_aeads(crypto_aegis128_aesni_alg,
ARRAY_SIZE(crypto_aegis128_aesni_alg));
}
static void __exit crypto_aegis128_aesni_module_exit(void)
{
crypto_unregister_aeads(crypto_aegis128_aesni_alg,
ARRAY_SIZE(crypto_aegis128_aesni_alg));
}
module_init(crypto_aegis128_aesni_module_init);
module_exit(crypto_aegis128_aesni_module_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Ondrej Mosnacek <omosnacek@gmail.com>");
MODULE_DESCRIPTION("AEGIS-128 AEAD algorithm -- AESNI+SSE2 implementation");
MODULE_ALIAS_CRYPTO("aegis128");
MODULE_ALIAS_CRYPTO("aegis128-aesni");
/*
* AES-NI + SSE2 implementation of AEGIS-128L
*
* Copyright (c) 2017-2018 Ondrej Mosnacek <omosnacek@gmail.com>
* Copyright (C) 2017-2018 Red Hat, Inc. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 as published
* by the Free Software Foundation.
*/
#include <linux/linkage.h>
#include <asm/frame.h>
#define STATE0 %xmm0
#define STATE1 %xmm1
#define STATE2 %xmm2
#define STATE3 %xmm3
#define STATE4 %xmm4
#define STATE5 %xmm5
#define STATE6 %xmm6
#define STATE7 %xmm7
#define MSG0 %xmm8
#define MSG1 %xmm9
#define T0 %xmm10
#define T1 %xmm11
#define T2 %xmm12
#define T3 %xmm13
#define STATEP %rdi
#define LEN %rsi
#define SRC %rdx
#define DST %rcx
.section .rodata.cst16.aegis128l_const, "aM", @progbits, 32
.align 16
.Laegis128l_const_0:
.byte 0x00, 0x01, 0x01, 0x02, 0x03, 0x05, 0x08, 0x0d
.byte 0x15, 0x22, 0x37, 0x59, 0x90, 0xe9, 0x79, 0x62
.Laegis128l_const_1:
.byte 0xdb, 0x3d, 0x18, 0x55, 0x6d, 0xc2, 0x2f, 0xf1
.byte 0x20, 0x11, 0x31, 0x42, 0x73, 0xb5, 0x28, 0xdd
.section .rodata.cst16.aegis128l_counter, "aM", @progbits, 16
.align 16
.Laegis128l_counter0:
.byte 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
.byte 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
.Laegis128l_counter1:
.byte 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17
.byte 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f
.text
/*
* __load_partial: internal ABI
* input:
* LEN - bytes
* SRC - src
* output:
* MSG0 - first message block
* MSG1 - second message block
* changed:
* T0
* %r8
* %r9
*/
__load_partial:
xor %r9, %r9
pxor MSG0, MSG0
pxor MSG1, MSG1
mov LEN, %r8
and $0x1, %r8
jz .Lld_partial_1
mov LEN, %r8
and $0x1E, %r8
add SRC, %r8
mov (%r8), %r9b
.Lld_partial_1:
mov LEN, %r8
and $0x2, %r8
jz .Lld_partial_2
mov LEN, %r8
and $0x1C, %r8
add SRC, %r8
shl $0x10, %r9
mov (%r8), %r9w
.Lld_partial_2:
mov LEN, %r8
and $0x4, %r8
jz .Lld_partial_4
mov LEN, %r8
and $0x18, %r8
add SRC, %r8
shl $32, %r9
mov (%r8), %r8d
xor %r8, %r9
.Lld_partial_4:
movq %r9, MSG0
mov LEN, %r8
and $0x8, %r8
jz .Lld_partial_8
mov LEN, %r8
and $0x10, %r8
add SRC, %r8
pslldq $8, MSG0
movq (%r8), T0
pxor T0, MSG0
.Lld_partial_8:
mov LEN, %r8
and $0x10, %r8
jz .Lld_partial_16
movdqa MSG0, MSG1
movdqu (SRC), MSG0
.Lld_partial_16:
ret
ENDPROC(__load_partial)
/*
* __store_partial: internal ABI
* input:
* LEN - bytes
* DST - dst
* output:
* T0 - first message block
* T1 - second message block
* changed:
* %r8
* %r9
* %r10
*/
__store_partial:
mov LEN, %r8
mov DST, %r9
cmp $16, %r8
jl .Lst_partial_16
movdqu T0, (%r9)
movdqa T1, T0
sub $16, %r8
add $16, %r9
.Lst_partial_16:
movq T0, %r10
cmp $8, %r8
jl .Lst_partial_8
mov %r10, (%r9)
psrldq $8, T0
movq T0, %r10
sub $8, %r8
add $8, %r9
.Lst_partial_8:
cmp $4, %r8
jl .Lst_partial_4
mov %r10d, (%r9)
shr $32, %r10
sub $4, %r8
add $4, %r9
.Lst_partial_4:
cmp $2, %r8
jl .Lst_partial_2
mov %r10w, (%r9)
shr $0x10, %r10
sub $2, %r8
add $2, %r9
.Lst_partial_2:
cmp $1, %r8
jl .Lst_partial_1
mov %r10b, (%r9)
.Lst_partial_1:
ret
ENDPROC(__store_partial)
.macro update
movdqa STATE7, T0
aesenc STATE0, STATE7
aesenc STATE1, STATE0
aesenc STATE2, STATE1
aesenc STATE3, STATE2
aesenc STATE4, STATE3
aesenc STATE5, STATE4
aesenc STATE6, STATE5
aesenc T0, STATE6
.endm
.macro update0
update
pxor MSG0, STATE7
pxor MSG1, STATE3
.endm
.macro update1
update
pxor MSG0, STATE6
pxor MSG1, STATE2
.endm
.macro update2
update
pxor MSG0, STATE5
pxor MSG1, STATE1
.endm
.macro update3
update
pxor MSG0, STATE4
pxor MSG1, STATE0
.endm
.macro update4
update
pxor MSG0, STATE3
pxor MSG1, STATE7
.endm
.macro update5
update
pxor MSG0, STATE2
pxor MSG1, STATE6
.endm
.macro update6
update
pxor MSG0, STATE1
pxor MSG1, STATE5
.endm
.macro update7
update
pxor MSG0, STATE0
pxor MSG1, STATE4
.endm
.macro state_load
movdqu 0x00(STATEP), STATE0
movdqu 0x10(STATEP), STATE1
movdqu 0x20(STATEP), STATE2
movdqu 0x30(STATEP), STATE3
movdqu 0x40(STATEP), STATE4
movdqu 0x50(STATEP), STATE5
movdqu 0x60(STATEP), STATE6
movdqu 0x70(STATEP), STATE7
.endm
.macro state_store s0 s1 s2 s3 s4 s5 s6 s7
movdqu \s7, 0x00(STATEP)
movdqu \s0, 0x10(STATEP)
movdqu \s1, 0x20(STATEP)
movdqu \s2, 0x30(STATEP)
movdqu \s3, 0x40(STATEP)
movdqu \s4, 0x50(STATEP)
movdqu \s5, 0x60(STATEP)
movdqu \s6, 0x70(STATEP)
.endm
.macro state_store0
state_store STATE0 STATE1 STATE2 STATE3 STATE4 STATE5 STATE6 STATE7
.endm
.macro state_store1
state_store STATE7 STATE0 STATE1 STATE2 STATE3 STATE4 STATE5 STATE6
.endm
.macro state_store2
state_store STATE6 STATE7 STATE0 STATE1 STATE2 STATE3 STATE4 STATE5
.endm
.macro state_store3
state_store STATE5 STATE6 STATE7 STATE0 STATE1 STATE2 STATE3 STATE4
.endm
.macro state_store4
state_store STATE4 STATE5 STATE6 STATE7 STATE0 STATE1 STATE2 STATE3
.endm
.macro state_store5
state_store STATE3 STATE4 STATE5 STATE6 STATE7 STATE0 STATE1 STATE2
.endm
.macro state_store6
state_store STATE2 STATE3 STATE4 STATE5 STATE6 STATE7 STATE0 STATE1
.endm
.macro state_store7
state_store STATE1 STATE2 STATE3 STATE4 STATE5 STATE6 STATE7 STATE0
.endm
/*
* void crypto_aegis128l_aesni_init(void *state, const void *key, const void *iv);
*/
ENTRY(crypto_aegis128l_aesni_init)
FRAME_BEGIN
/* load key: */
movdqa (%rsi), MSG1
movdqa MSG1, STATE0
movdqa MSG1, STATE4
movdqa MSG1, STATE5
movdqa MSG1, STATE6
movdqa MSG1, STATE7
/* load IV: */
movdqu (%rdx), MSG0
pxor MSG0, STATE0
pxor MSG0, STATE4
/* load the constants: */
movdqa .Laegis128l_const_0, STATE2
movdqa .Laegis128l_const_1, STATE1
movdqa STATE1, STATE3
pxor STATE2, STATE5
pxor STATE1, STATE6
pxor STATE2, STATE7
/* update 10 times with IV and KEY: */
update0
update1
update2
update3
update4
update5
update6
update7
update0
update1
state_store1
FRAME_END
ret
ENDPROC(crypto_aegis128l_aesni_init)
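/*
* ad_block: absorb one 32-byte block of associated data.
* \a selects the load suffix (movdqa for a 16-byte aligned source, movdqu
* otherwise) and \i is the block index within a group of eight, so each
* iteration of the loops below consumes 0x100 bytes of AD.
*/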
.macro ad_block a i
movdq\a (\i * 0x20 + 0x00)(SRC), MSG0
movdq\a (\i * 0x20 + 0x10)(SRC), MSG1
update\i
sub $0x20, LEN
cmp $0x20, LEN
jl .Lad_out_\i
.endm
/*
* void crypto_aegis128l_aesni_ad(void *state, unsigned int length,
* const void *data);
*/
ENTRY(crypto_aegis128l_aesni_ad)
FRAME_BEGIN
cmp $0x20, LEN
jb .Lad_out
state_load
mov SRC, %r8
and $0xf, %r8
jnz .Lad_u_loop
.align 8
.Lad_a_loop:
ad_block a 0
ad_block a 1
ad_block a 2
ad_block a 3
ad_block a 4
ad_block a 5
ad_block a 6
ad_block a 7
add $0x100, SRC
jmp .Lad_a_loop
.align 8
.Lad_u_loop:
ad_block u 0
ad_block u 1
ad_block u 2
ad_block u 3
ad_block u 4
ad_block u 5
ad_block u 6
ad_block u 7
add $0x100, SRC
jmp .Lad_u_loop
.Lad_out_0:
state_store0
FRAME_END
ret
.Lad_out_1:
state_store1
FRAME_END
ret
.Lad_out_2:
state_store2
FRAME_END
ret
.Lad_out_3:
state_store3
FRAME_END
ret
.Lad_out_4:
state_store4
FRAME_END
ret
.Lad_out_5:
state_store5
FRAME_END
ret
.Lad_out_6:
state_store6
FRAME_END
ret
.Lad_out_7:
state_store7
FRAME_END
ret
.Lad_out:
FRAME_END
ret
ENDPROC(crypto_aegis128l_aesni_ad)
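/*
* crypt: XOR one 32-byte block with the AEGIS-128L keystream:
* \m0 ^= S_1 ^ S_6 ^ (S_2 & S_3)
* \m1 ^= S_2 ^ S_5 ^ (S_6 & S_7)
* The crypt0..crypt7 wrappers pass the state registers in the order that
* matches the current rotation of the register-to-state mapping.
*/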
.macro crypt m0 m1 s0 s1 s2 s3 s4 s5 s6 s7
pxor \s1, \m0
pxor \s6, \m0
movdqa \s2, T3
pand \s3, T3
pxor T3, \m0
pxor \s2, \m1
pxor \s5, \m1
movdqa \s6, T3
pand \s7, T3
pxor T3, \m1
.endm
.macro crypt0 m0 m1
crypt \m0 \m1 STATE0 STATE1 STATE2 STATE3 STATE4 STATE5 STATE6 STATE7
.endm
.macro crypt1 m0 m1
crypt \m0 \m1 STATE7 STATE0 STATE1 STATE2 STATE3 STATE4 STATE5 STATE6
.endm
.macro crypt2 m0 m1
crypt \m0 \m1 STATE6 STATE7 STATE0 STATE1 STATE2 STATE3 STATE4 STATE5
.endm
.macro crypt3 m0 m1
crypt \m0 \m1 STATE5 STATE6 STATE7 STATE0 STATE1 STATE2 STATE3 STATE4
.endm
.macro crypt4 m0 m1
crypt \m0 \m1 STATE4 STATE5 STATE6 STATE7 STATE0 STATE1 STATE2 STATE3
.endm
.macro crypt5 m0 m1
crypt \m0 \m1 STATE3 STATE4 STATE5 STATE6 STATE7 STATE0 STATE1 STATE2
.endm
.macro crypt6 m0 m1
crypt \m0 \m1 STATE2 STATE3 STATE4 STATE5 STATE6 STATE7 STATE0 STATE1
.endm
.macro crypt7 m0 m1
crypt \m0 \m1 STATE1 STATE2 STATE3 STATE4 STATE5 STATE6 STATE7 STATE0
.endm
.macro encrypt_block a i
movdq\a (\i * 0x20 + 0x00)(SRC), MSG0
movdq\a (\i * 0x20 + 0x10)(SRC), MSG1
movdqa MSG0, T0
movdqa MSG1, T1
crypt\i T0, T1
movdq\a T0, (\i * 0x20 + 0x00)(DST)
movdq\a T1, (\i * 0x20 + 0x10)(DST)
update\i
sub $0x20, LEN
cmp $0x20, LEN
jl .Lenc_out_\i
.endm
.macro decrypt_block a i
movdq\a (\i * 0x20 + 0x00)(SRC), MSG0
movdq\a (\i * 0x20 + 0x10)(SRC), MSG1
crypt\i MSG0, MSG1
movdq\a MSG0, (\i * 0x20 + 0x00)(DST)
movdq\a MSG1, (\i * 0x20 + 0x10)(DST)
update\i
sub $0x20, LEN
cmp $0x20, LEN
jl .Ldec_out_\i
.endm
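/*
* encrypt_block keeps the plaintext in MSG0/MSG1 and encrypts a copy in
* T0/T1, while decrypt_block decrypts MSG0/MSG1 in place first; in both
* cases the subsequent state update absorbs the plaintext, as required by
* the AEGIS-128L specification.
*/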
/*
* void crypto_aegis128l_aesni_enc(void *state, unsigned int length,
* const void *src, void *dst);
*/
ENTRY(crypto_aegis128l_aesni_enc)
FRAME_BEGIN
cmp $0x20, LEN
jb .Lenc_out
state_load
mov SRC, %r8
or DST, %r8
and $0xf, %r8
jnz .Lenc_u_loop
.align 8
.Lenc_a_loop:
encrypt_block a 0
encrypt_block a 1
encrypt_block a 2
encrypt_block a 3
encrypt_block a 4
encrypt_block a 5
encrypt_block a 6
encrypt_block a 7
add $0x100, SRC
add $0x100, DST
jmp .Lenc_a_loop
.align 8
.Lenc_u_loop:
encrypt_block u 0
encrypt_block u 1
encrypt_block u 2
encrypt_block u 3
encrypt_block u 4
encrypt_block u 5
encrypt_block u 6
encrypt_block u 7
add $0x100, SRC
add $0x100, DST
jmp .Lenc_u_loop
.Lenc_out_0:
state_store0
FRAME_END
ret
.Lenc_out_1:
state_store1
FRAME_END
ret
.Lenc_out_2:
state_store2
FRAME_END
ret
.Lenc_out_3:
state_store3
FRAME_END
ret
.Lenc_out_4:
state_store4
FRAME_END
ret
.Lenc_out_5:
state_store5
FRAME_END
ret
.Lenc_out_6:
state_store6
FRAME_END
ret
.Lenc_out_7:
state_store7
FRAME_END
ret
.Lenc_out:
FRAME_END
ret
ENDPROC(crypto_aegis128l_aesni_enc)
/*
* void crypto_aegis128l_aesni_enc_tail(void *state, unsigned int length,
* const void *src, void *dst);
*/
ENTRY(crypto_aegis128l_aesni_enc_tail)
FRAME_BEGIN
state_load
/* encrypt message: */
call __load_partial
movdqa MSG0, T0
movdqa MSG1, T1
crypt0 T0, T1
call __store_partial
update0
state_store0
FRAME_END
ret
ENDPROC(crypto_aegis128l_aesni_enc_tail)
/*
* void crypto_aegis128l_aesni_dec(void *state, unsigned int length,
* const void *src, void *dst);
*/
ENTRY(crypto_aegis128l_aesni_dec)
FRAME_BEGIN
cmp $0x20, LEN
jb .Ldec_out
state_load
mov SRC, %r8
or DST, %r8
and $0xF, %r8
jnz .Ldec_u_loop
.align 8
.Ldec_a_loop:
decrypt_block a 0
decrypt_block a 1
decrypt_block a 2
decrypt_block a 3
decrypt_block a 4
decrypt_block a 5
decrypt_block a 6
decrypt_block a 7
add $0x100, SRC
add $0x100, DST
jmp .Ldec_a_loop
.align 8
.Ldec_u_loop:
decrypt_block u 0
decrypt_block u 1
decrypt_block u 2
decrypt_block u 3
decrypt_block u 4
decrypt_block u 5
decrypt_block u 6
decrypt_block u 7
add $0x100, SRC
add $0x100, DST
jmp .Ldec_u_loop
.Ldec_out_0:
state_store0
FRAME_END
ret
.Ldec_out_1:
state_store1
FRAME_END
ret
.Ldec_out_2:
state_store2
FRAME_END
ret
.Ldec_out_3:
state_store3
FRAME_END
ret
.Ldec_out_4:
state_store4
FRAME_END
ret
.Ldec_out_5:
state_store5
FRAME_END
ret
.Ldec_out_6:
state_store6
FRAME_END
ret
.Ldec_out_7:
state_store7
FRAME_END
ret
.Ldec_out:
FRAME_END
ret
ENDPROC(crypto_aegis128l_aesni_dec)
/*
* void crypto_aegis128l_aesni_dec_tail(void *state, unsigned int length,
* const void *src, void *dst);
*/
ENTRY(crypto_aegis128l_aesni_dec_tail)
FRAME_BEGIN
state_load
/* decrypt message: */
call __load_partial
crypt0 MSG0, MSG1
movdqa MSG0, T0
movdqa MSG1, T1
call __store_partial
/* mask with byte count: */
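/*
* Broadcast LEN to all byte lanes and compare it against the per-byte
* indices in .Laegis128l_counter0 (bytes 0..15) and .Laegis128l_counter1
* (bytes 16..31); the resulting masks zero every decrypted byte at or
* beyond LEN, so padding garbage is not absorbed into the state below.
*/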
movq LEN, T0
punpcklbw T0, T0
punpcklbw T0, T0
punpcklbw T0, T0
punpcklbw T0, T0
movdqa T0, T1
movdqa .Laegis128l_counter0, T2
movdqa .Laegis128l_counter1, T3
pcmpgtb T2, T0
pcmpgtb T3, T1
pand T0, MSG0
pand T1, MSG1
update0
state_store0
FRAME_END
ret
ENDPROC(crypto_aegis128l_aesni_dec_tail)
/*
* void crypto_aegis128l_aesni_final(void *state, void *tag_xor,
* u64 assoclen, u64 cryptlen);
*/
ENTRY(crypto_aegis128l_aesni_final)
FRAME_BEGIN
state_load
/* prepare length block: */
movq %rdx, MSG0
movq %rcx, T0
pslldq $8, T0
pxor T0, MSG0
psllq $3, MSG0 /* multiply by 8 (to get bit count) */
pxor STATE2, MSG0
movdqa MSG0, MSG1
/* update state: */
update0
update1
update2
update3
update4
update5
update6
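/*
* After seven updates the register-to-state mapping has rotated by seven,
* so STATE1..STATE7 now hold state words S_0..S_6, whose XOR is the tag.
*/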
/* xor tag: */
movdqu (%rsi), T0
pxor STATE1, T0
pxor STATE2, T0
pxor STATE3, T0
pxor STATE4, T0
pxor STATE5, T0
pxor STATE6, T0
pxor STATE7, T0
movdqu T0, (%rsi)
FRAME_END
ret
ENDPROC(crypto_aegis128l_aesni_final)
/*
* The AEGIS-128L Authenticated-Encryption Algorithm
* Glue for AES-NI + SSE2 implementation
*
* Copyright (c) 2017-2018 Ondrej Mosnacek <omosnacek@gmail.com>
* Copyright (C) 2017-2018 Red Hat, Inc. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License as published by the Free
* Software Foundation; either version 2 of the License, or (at your option)
* any later version.
*/
#include <crypto/cryptd.h>
#include <crypto/internal/aead.h>
#include <crypto/internal/skcipher.h>
#include <crypto/scatterwalk.h>
#include <linux/module.h>
#include <asm/fpu/api.h>
#include <asm/cpu_device_id.h>
#define AEGIS128L_BLOCK_ALIGN 16
#define AEGIS128L_BLOCK_SIZE 32
#define AEGIS128L_NONCE_SIZE 16
#define AEGIS128L_STATE_BLOCKS 8
#define AEGIS128L_KEY_SIZE 16
#define AEGIS128L_MIN_AUTH_SIZE 8
#define AEGIS128L_MAX_AUTH_SIZE 16
asmlinkage void crypto_aegis128l_aesni_init(void *state, void *key, void *iv);
asmlinkage void crypto_aegis128l_aesni_ad(
void *state, unsigned int length, const void *data);
asmlinkage void crypto_aegis128l_aesni_enc(
void *state, unsigned int length, const void *src, void *dst);
asmlinkage void crypto_aegis128l_aesni_dec(
void *state, unsigned int length, const void *src, void *dst);
asmlinkage void crypto_aegis128l_aesni_enc_tail(
void *state, unsigned int length, const void *src, void *dst);
asmlinkage void crypto_aegis128l_aesni_dec_tail(
void *state, unsigned int length, const void *src, void *dst);
asmlinkage void crypto_aegis128l_aesni_final(
void *state, void *tag_xor, u64 assoclen,
u64 cryptlen);
struct aegis_block {
u8 bytes[AEGIS128L_BLOCK_SIZE] __aligned(AEGIS128L_BLOCK_ALIGN);
};
struct aegis_state {
struct aegis_block blocks[AEGIS128L_STATE_BLOCKS];
};
struct aegis_ctx {
struct aegis_block key;
};
struct aegis_crypt_ops {
int (*skcipher_walk_init)(struct skcipher_walk *walk,
struct aead_request *req, bool atomic);
void (*crypt_blocks)(void *state, unsigned int length, const void *src,
void *dst);
void (*crypt_tail)(void *state, unsigned int length, const void *src,
void *dst);
};
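/*
 * Feed the associated data to the assembly AD routine in multiples of
 * AEGIS128L_BLOCK_SIZE, buffering partial blocks across scatterlist
 * entries and zero-padding the final partial block.
 */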
static void crypto_aegis128l_aesni_process_ad(
struct aegis_state *state, struct scatterlist *sg_src,
unsigned int assoclen)
{
struct scatter_walk walk;
struct aegis_block buf;
unsigned int pos = 0;
scatterwalk_start(&walk, sg_src);
while (assoclen != 0) {
unsigned int size = scatterwalk_clamp(&walk, assoclen);
unsigned int left = size;
void *mapped = scatterwalk_map(&walk);
const u8 *src = (const u8 *)mapped;
if (pos + size >= AEGIS128L_BLOCK_SIZE) {
if (pos > 0) {
unsigned int fill = AEGIS128L_BLOCK_SIZE - pos;
memcpy(buf.bytes + pos, src, fill);
crypto_aegis128l_aesni_ad(state,
AEGIS128L_BLOCK_SIZE,
buf.bytes);
pos = 0;
left -= fill;
src += fill;
}
crypto_aegis128l_aesni_ad(state, left, src);
src += left & ~(AEGIS128L_BLOCK_SIZE - 1);
left &= AEGIS128L_BLOCK_SIZE - 1;
}
memcpy(buf.bytes + pos, src, left);
pos += left;
assoclen -= size;
scatterwalk_unmap(mapped);
scatterwalk_advance(&walk, size);
scatterwalk_done(&walk, 0, assoclen);
}
if (pos > 0) {
memset(buf.bytes + pos, 0, AEGIS128L_BLOCK_SIZE - pos);
crypto_aegis128l_aesni_ad(state, AEGIS128L_BLOCK_SIZE, buf.bytes);
}
}
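/*
 * Walk the src/dst scatterlists, en/decrypting whole 32-byte blocks with
 * ops->crypt_blocks and handing any trailing partial block to
 * ops->crypt_tail.
 */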
static void crypto_aegis128l_aesni_process_crypt(
struct aegis_state *state, struct aead_request *req,
const struct aegis_crypt_ops *ops)
{
struct skcipher_walk walk;
u8 *src, *dst;
unsigned int chunksize, base;
ops->skcipher_walk_init(&walk, req, false);
while (walk.nbytes) {
src = walk.src.virt.addr;
dst = walk.dst.virt.addr;
chunksize = walk.nbytes;
ops->crypt_blocks(state, chunksize, src, dst);
base = chunksize & ~(AEGIS128L_BLOCK_SIZE - 1);
src += base;
dst += base;
chunksize &= AEGIS128L_BLOCK_SIZE - 1;
if (chunksize > 0)
ops->crypt_tail(state, chunksize, src, dst);
skcipher_walk_done(&walk, 0);
}
}
static struct aegis_ctx *crypto_aegis128l_aesni_ctx(struct crypto_aead *aead)
{
u8 *ctx = crypto_aead_ctx(aead);
ctx = PTR_ALIGN(ctx, __alignof__(struct aegis_ctx));
return (void *)ctx;
}
static int crypto_aegis128l_aesni_setkey(struct crypto_aead *aead,
const u8 *key, unsigned int keylen)
{
struct aegis_ctx *ctx = crypto_aegis128l_aesni_ctx(aead);
if (keylen != AEGIS128L_KEY_SIZE) {
crypto_aead_set_flags(aead, CRYPTO_TFM_RES_BAD_KEY_LEN);
return -EINVAL;
}
memcpy(ctx->key.bytes, key, AEGIS128L_KEY_SIZE);
return 0;
}
static int crypto_aegis128l_aesni_setauthsize(struct crypto_aead *tfm,
unsigned int authsize)
{
if (authsize > AEGIS128L_MAX_AUTH_SIZE)
return -EINVAL;
if (authsize < AEGIS128L_MIN_AUTH_SIZE)
return -EINVAL;
return 0;
}
static void crypto_aegis128l_aesni_crypt(struct aead_request *req,
struct aegis_block *tag_xor,
unsigned int cryptlen,
const struct aegis_crypt_ops *ops)
{
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
struct aegis_ctx *ctx = crypto_aegis128l_aesni_ctx(tfm);
struct aegis_state state;
kernel_fpu_begin();
crypto_aegis128l_aesni_init(&state, ctx->key.bytes, req->iv);
crypto_aegis128l_aesni_process_ad(&state, req->src, req->assoclen);
crypto_aegis128l_aesni_process_crypt(&state, req, ops);
crypto_aegis128l_aesni_final(&state, tag_xor, req->assoclen, cryptlen);
kernel_fpu_end();
}
static int crypto_aegis128l_aesni_encrypt(struct aead_request *req)
{
static const struct aegis_crypt_ops OPS = {
.skcipher_walk_init = skcipher_walk_aead_encrypt,
.crypt_blocks = crypto_aegis128l_aesni_enc,
.crypt_tail = crypto_aegis128l_aesni_enc_tail,
};
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
struct aegis_block tag = {};
unsigned int authsize = crypto_aead_authsize(tfm);
unsigned int cryptlen = req->cryptlen;
crypto_aegis128l_aesni_crypt(req, &tag, cryptlen, &OPS);
scatterwalk_map_and_copy(tag.bytes, req->dst,
req->assoclen + cryptlen, authsize, 1);
return 0;
}
static int crypto_aegis128l_aesni_decrypt(struct aead_request *req)
{
static const struct aegis_block zeros = {};
static const struct aegis_crypt_ops OPS = {
.skcipher_walk_init = skcipher_walk_aead_decrypt,
.crypt_blocks = crypto_aegis128l_aesni_dec,
.crypt_tail = crypto_aegis128l_aesni_dec_tail,
};
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
struct aegis_block tag;
unsigned int authsize = crypto_aead_authsize(tfm);
unsigned int cryptlen = req->cryptlen - authsize;
scatterwalk_map_and_copy(tag.bytes, req->src,
req->assoclen + cryptlen, authsize, 0);
crypto_aegis128l_aesni_crypt(req, &tag, cryptlen, &OPS);
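/*
 * crypto_aegis128l_aesni_final() XORs the freshly computed tag into 'tag',
 * so a correct tag leaves the buffer all zeroes; compare in constant time.
 */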
return crypto_memneq(tag.bytes, zeros.bytes, authsize) ? -EBADMSG : 0;
}
static int crypto_aegis128l_aesni_init_tfm(struct crypto_aead *aead)
{
return 0;
}
static void crypto_aegis128l_aesni_exit_tfm(struct crypto_aead *aead)
{
}
static int cryptd_aegis128l_aesni_setkey(struct crypto_aead *aead,
const u8 *key, unsigned int keylen)
{
struct cryptd_aead **ctx = crypto_aead_ctx(aead);
struct cryptd_aead *cryptd_tfm = *ctx;
return crypto_aead_setkey(&cryptd_tfm->base, key, keylen);
}
static int cryptd_aegis128l_aesni_setauthsize(struct crypto_aead *aead,
unsigned int authsize)
{
struct cryptd_aead **ctx = crypto_aead_ctx(aead);
struct cryptd_aead *cryptd_tfm = *ctx;
return crypto_aead_setauthsize(&cryptd_tfm->base, authsize);
}
static int cryptd_aegis128l_aesni_encrypt(struct aead_request *req)
{
struct crypto_aead *aead = crypto_aead_reqtfm(req);
struct cryptd_aead **ctx = crypto_aead_ctx(aead);
struct cryptd_aead *cryptd_tfm = *ctx;
aead = &cryptd_tfm->base;
if (irq_fpu_usable() && (!in_atomic() ||
!cryptd_aead_queued(cryptd_tfm)))
aead = cryptd_aead_child(cryptd_tfm);
aead_request_set_tfm(req, aead);
return crypto_aead_encrypt(req);
}
static int cryptd_aegis128l_aesni_decrypt(struct aead_request *req)
{
struct crypto_aead *aead = crypto_aead_reqtfm(req);
struct cryptd_aead **ctx = crypto_aead_ctx(aead);
struct cryptd_aead *cryptd_tfm = *ctx;
aead = &cryptd_tfm->base;
if (irq_fpu_usable() && (!in_atomic() ||
!cryptd_aead_queued(cryptd_tfm)))
aead = cryptd_aead_child(cryptd_tfm);
aead_request_set_tfm(req, aead);
return crypto_aead_decrypt(req);
}
static int cryptd_aegis128l_aesni_init_tfm(struct crypto_aead *aead)
{
struct cryptd_aead *cryptd_tfm;
struct cryptd_aead **ctx = crypto_aead_ctx(aead);
cryptd_tfm = cryptd_alloc_aead("__aegis128l-aesni", CRYPTO_ALG_INTERNAL,
CRYPTO_ALG_INTERNAL);
if (IS_ERR(cryptd_tfm))
return PTR_ERR(cryptd_tfm);
*ctx = cryptd_tfm;
crypto_aead_set_reqsize(aead, crypto_aead_reqsize(&cryptd_tfm->base));
return 0;
}
static void cryptd_aegis128l_aesni_exit_tfm(struct crypto_aead *aead)
{
struct cryptd_aead **ctx = crypto_aead_ctx(aead);
cryptd_free_aead(*ctx);
}
static struct aead_alg crypto_aegis128l_aesni_alg[] = {
{
.setkey = crypto_aegis128l_aesni_setkey,
.setauthsize = crypto_aegis128l_aesni_setauthsize,
.encrypt = crypto_aegis128l_aesni_encrypt,
.decrypt = crypto_aegis128l_aesni_decrypt,
.init = crypto_aegis128l_aesni_init_tfm,
.exit = crypto_aegis128l_aesni_exit_tfm,
.ivsize = AEGIS128L_NONCE_SIZE,
.maxauthsize = AEGIS128L_MAX_AUTH_SIZE,
.chunksize = AEGIS128L_BLOCK_SIZE,
.base = {
.cra_flags = CRYPTO_ALG_INTERNAL,
.cra_blocksize = 1,
.cra_ctxsize = sizeof(struct aegis_ctx) +
__alignof__(struct aegis_ctx),
.cra_alignmask = 0,
.cra_name = "__aegis128l",
.cra_driver_name = "__aegis128l-aesni",
.cra_module = THIS_MODULE,
}
}, {
.setkey = cryptd_aegis128l_aesni_setkey,
.setauthsize = cryptd_aegis128l_aesni_setauthsize,
.encrypt = cryptd_aegis128l_aesni_encrypt,
.decrypt = cryptd_aegis128l_aesni_decrypt,
.init = cryptd_aegis128l_aesni_init_tfm,
.exit = cryptd_aegis128l_aesni_exit_tfm,
.ivsize = AEGIS128L_NONCE_SIZE,
.maxauthsize = AEGIS128L_MAX_AUTH_SIZE,
.chunksize = AEGIS128L_BLOCK_SIZE,
.base = {
.cra_flags = CRYPTO_ALG_ASYNC,
.cra_blocksize = 1,
.cra_ctxsize = sizeof(struct cryptd_aead *),
.cra_alignmask = 0,
.cra_priority = 400,
.cra_name = "aegis128l",
.cra_driver_name = "aegis128l-aesni",
.cra_module = THIS_MODULE,
}
}
};
static const struct x86_cpu_id aesni_cpu_id[] = {
X86_FEATURE_MATCH(X86_FEATURE_AES),
X86_FEATURE_MATCH(X86_FEATURE_XMM2),
{}
};
MODULE_DEVICE_TABLE(x86cpu, aesni_cpu_id);
static int __init crypto_aegis128l_aesni_module_init(void)
{
if (!x86_match_cpu(aesni_cpu_id))
return -ENODEV;
return crypto_register_aeads(crypto_aegis128l_aesni_alg,
ARRAY_SIZE(crypto_aegis128l_aesni_alg));
}
static void __exit crypto_aegis128l_aesni_module_exit(void)
{
crypto_unregister_aeads(crypto_aegis128l_aesni_alg,
ARRAY_SIZE(crypto_aegis128l_aesni_alg));
}
module_init(crypto_aegis128l_aesni_module_init);
module_exit(crypto_aegis128l_aesni_module_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Ondrej Mosnacek <omosnacek@gmail.com>");
MODULE_DESCRIPTION("AEGIS-128L AEAD algorithm -- AESNI+SSE2 implementation");
MODULE_ALIAS_CRYPTO("aegis128l");
MODULE_ALIAS_CRYPTO("aegis128l-aesni");
@@ -364,5 +364,5 @@ module_exit(ghash_pclmulqdqni_mod_exit);
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("GHASH Message Digest Algorithm, "
-		   "acclerated by PCLMULQDQ-NI");
+		   "accelerated by PCLMULQDQ-NI");
 MODULE_ALIAS_CRYPTO("ghash");
/*
* The MORUS-1280 Authenticated-Encryption Algorithm
* Glue for AVX2 implementation
*
* Copyright (c) 2016-2018 Ondrej Mosnacek <omosnacek@gmail.com>
* Copyright (C) 2017-2018 Red Hat, Inc. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License as published by the Free
* Software Foundation; either version 2 of the License, or (at your option)
* any later version.
*/
#include <crypto/internal/aead.h>
#include <crypto/morus1280_glue.h>
#include <linux/module.h>
#include <asm/fpu/api.h>
#include <asm/cpu_device_id.h>
asmlinkage void crypto_morus1280_avx2_init(void *state, const void *key,
const void *iv);
asmlinkage void crypto_morus1280_avx2_ad(void *state, const void *data,
unsigned int length);
asmlinkage void crypto_morus1280_avx2_enc(void *state, const void *src,
void *dst, unsigned int length);
asmlinkage void crypto_morus1280_avx2_dec(void *state, const void *src,
void *dst, unsigned int length);
asmlinkage void crypto_morus1280_avx2_enc_tail(void *state, const void *src,
void *dst, unsigned int length);
asmlinkage void crypto_morus1280_avx2_dec_tail(void *state, const void *src,
void *dst, unsigned int length);
asmlinkage void crypto_morus1280_avx2_final(void *state, void *tag_xor,
u64 assoclen, u64 cryptlen);
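/*
 * MORUS1280_DECLARE_ALGS() (from <crypto/morus1280_glue.h>) expands to the
 * glue wrappers and the crypto_morus1280_avx2_algs aead_alg array that tie
 * the asm entry points above into the common MORUS-1280 glue code, using
 * driver name "morus1280-avx2" and priority 400.
 */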
MORUS1280_DECLARE_ALGS(avx2, "morus1280-avx2", 400);
static const struct x86_cpu_id avx2_cpu_id[] = {
X86_FEATURE_MATCH(X86_FEATURE_AVX2),
{}
};
MODULE_DEVICE_TABLE(x86cpu, avx2_cpu_id);
static int __init crypto_morus1280_avx2_module_init(void)
{
if (!x86_match_cpu(avx2_cpu_id))
return -ENODEV;
return crypto_register_aeads(crypto_morus1280_avx2_algs,
ARRAY_SIZE(crypto_morus1280_avx2_algs));
}
static void __exit crypto_morus1280_avx2_module_exit(void)
{
crypto_unregister_aeads(crypto_morus1280_avx2_algs,
ARRAY_SIZE(crypto_morus1280_avx2_algs));
}
module_init(crypto_morus1280_avx2_module_init);
module_exit(crypto_morus1280_avx2_module_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Ondrej Mosnacek <omosnacek@gmail.com>");
MODULE_DESCRIPTION("MORUS-1280 AEAD algorithm -- AVX2 implementation");
MODULE_ALIAS_CRYPTO("morus1280");
MODULE_ALIAS_CRYPTO("morus1280-avx2");
/*
* The MORUS-1280 Authenticated-Encryption Algorithm
* Glue for SSE2 implementation
*
* Copyright (c) 2016-2018 Ondrej Mosnacek <omosnacek@gmail.com>
* Copyright (C) 2017-2018 Red Hat, Inc. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License as published by the Free
* Software Foundation; either version 2 of the License, or (at your option)
* any later version.
*/
#include <crypto/internal/aead.h>
#include <crypto/morus1280_glue.h>
#include <linux/module.h>
#include <asm/fpu/api.h>
#include <asm/cpu_device_id.h>
asmlinkage void crypto_morus1280_sse2_init(void *state, const void *key,
const void *iv);
asmlinkage void crypto_morus1280_sse2_ad(void *state, const void *data,
unsigned int length);
asmlinkage void crypto_morus1280_sse2_enc(void *state, const void *src,
void *dst, unsigned int length);
asmlinkage void crypto_morus1280_sse2_dec(void *state, const void *src,
void *dst, unsigned int length);
asmlinkage void crypto_morus1280_sse2_enc_tail(void *state, const void *src,
void *dst, unsigned int length);
asmlinkage void crypto_morus1280_sse2_dec_tail(void *state, const void *src,
void *dst, unsigned int length);
asmlinkage void crypto_morus1280_sse2_final(void *state, void *tag_xor,
u64 assoclen, u64 cryptlen);
MORUS1280_DECLARE_ALGS(sse2, "morus1280-sse2", 350);
static const struct x86_cpu_id sse2_cpu_id[] = {
X86_FEATURE_MATCH(X86_FEATURE_XMM2),
{}
};
MODULE_DEVICE_TABLE(x86cpu, sse2_cpu_id);
static int __init crypto_morus1280_sse2_module_init(void)
{
if (!x86_match_cpu(sse2_cpu_id))
return -ENODEV;
return crypto_register_aeads(crypto_morus1280_sse2_algs,
ARRAY_SIZE(crypto_morus1280_sse2_algs));
}
static void __exit crypto_morus1280_sse2_module_exit(void)
{
crypto_unregister_aeads(crypto_morus1280_sse2_algs,
ARRAY_SIZE(crypto_morus1280_sse2_algs));
}
module_init(crypto_morus1280_sse2_module_init);
module_exit(crypto_morus1280_sse2_module_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Ondrej Mosnacek <omosnacek@gmail.com>");
MODULE_DESCRIPTION("MORUS-1280 AEAD algorithm -- SSE2 implementation");
MODULE_ALIAS_CRYPTO("morus1280");
MODULE_ALIAS_CRYPTO("morus1280-sse2");
@@ -86,6 +86,11 @@ obj-$(CONFIG_CRYPTO_KEYWRAP) += keywrap.o
 obj-$(CONFIG_CRYPTO_GCM) += gcm.o
 obj-$(CONFIG_CRYPTO_CCM) += ccm.o
 obj-$(CONFIG_CRYPTO_CHACHA20POLY1305) += chacha20poly1305.o
+obj-$(CONFIG_CRYPTO_AEGIS128) += aegis128.o
+obj-$(CONFIG_CRYPTO_AEGIS128L) += aegis128l.o
+obj-$(CONFIG_CRYPTO_AEGIS256) += aegis256.o
+obj-$(CONFIG_CRYPTO_MORUS640) += morus640.o
+obj-$(CONFIG_CRYPTO_MORUS1280) += morus1280.o
 obj-$(CONFIG_CRYPTO_PCRYPT) += pcrypt.o
 obj-$(CONFIG_CRYPTO_CRYPTD) += cryptd.o
 obj-$(CONFIG_CRYPTO_MCRYPTD) += mcryptd.o
@@ -137,6 +142,7 @@ obj-$(CONFIG_CRYPTO_USER_API_HASH) += algif_hash.o
 obj-$(CONFIG_CRYPTO_USER_API_SKCIPHER) += algif_skcipher.o
 obj-$(CONFIG_CRYPTO_USER_API_RNG) += algif_rng.o
 obj-$(CONFIG_CRYPTO_USER_API_AEAD) += algif_aead.o
+obj-$(CONFIG_CRYPTO_ZSTD) += zstd.o
 ecdh_generic-y := ecc.o
 ecdh_generic-y += ecdh.o
......
@@ -10,6 +10,7 @@
  *
  */
+#include <crypto/algapi.h>
 #include <linux/err.h>
 #include <linux/errno.h>
 #include <linux/fips.h>
@@ -59,6 +60,15 @@ static int crypto_check_alg(struct crypto_alg *alg)
 	if (alg->cra_blocksize > PAGE_SIZE / 8)
 		return -EINVAL;
 
+	if (!alg->cra_type && (alg->cra_flags & CRYPTO_ALG_TYPE_MASK) ==
+			      CRYPTO_ALG_TYPE_CIPHER) {
+		if (alg->cra_alignmask > MAX_CIPHER_ALIGNMASK)
+			return -EINVAL;
+
+		if (alg->cra_blocksize > MAX_CIPHER_BLOCKSIZE)
+			return -EINVAL;
+	}
+
 	if (alg->cra_priority < 0)
 		return -EINVAL;
......
@@ -108,6 +108,7 @@ static int crypto_authenc_setkey(struct crypto_aead *authenc, const u8 *key,
 				       CRYPTO_TFM_RES_MASK);
 
 out:
+	memzero_explicit(&keys, sizeof(keys));
 	return err;
 
 badkey:
......
@@ -90,6 +90,7 @@ static int crypto_authenc_esn_setkey(struct crypto_aead *authenc_esn, const u8 *
 				       CRYPTO_TFM_RES_MASK);
 
 out:
+	memzero_explicit(&keys, sizeof(keys));
 	return err;
 
 badkey:
......