提交 4e34e51f 编写于 作者: E Eric Biggers 提交者: Herbert Xu

crypto: arm/chacha20 - always use vrev for 16-bit rotates

The 4-way ChaCha20 NEON code implements 16-bit rotates with vrev32.16,
but the one-way code (used on remainder blocks) implements it with
vshl + vsri, which is slower.  Switch the one-way code to vrev32.16 too.
Signed-off-by: NEric Biggers <ebiggers@google.com>
Acked-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
上级 f53ad3e1
......@@ -51,9 +51,8 @@ ENTRY(chacha20_block_xor_neon)
.Ldoubleround:
// x0 += x1, x3 = rotl32(x3 ^ x0, 16)
vadd.i32 q0, q0, q1
veor q4, q3, q0
vshl.u32 q3, q4, #16
vsri.u32 q3, q4, #16
veor q3, q3, q0
vrev32.16 q3, q3
// x2 += x3, x1 = rotl32(x1 ^ x2, 12)
vadd.i32 q2, q2, q3
......@@ -82,9 +81,8 @@ ENTRY(chacha20_block_xor_neon)
// x0 += x1, x3 = rotl32(x3 ^ x0, 16)
vadd.i32 q0, q0, q1
veor q4, q3, q0
vshl.u32 q3, q4, #16
vsri.u32 q3, q4, #16
veor q3, q3, q0
vrev32.16 q3, q3
// x2 += x3, x1 = rotl32(x1 ^ x2, 12)
vadd.i32 q2, q2, q3
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册