Commit 521cdde7, authored by Ard Biesheuvel, committed by Herbert Xu

crypto: aegis - avoid prerotated AES tables

The generic AES code provides four sets of lookup tables, where each
set consists of four tables containing the same 32-bit values, but
rotated by 0, 8, 16 and 24 bits, respectively. This makes sense for
CISC architectures such as x86 which support memory operands, but
for other architectures, the rotates are quite cheap, and using all
four tables needlessly thrashes the D-cache, and actually hurts rather
than helps performance.

Since x86 already has its own implementation of AEGIS based on AES-NI
instructions, let's tweak the generic implementation towards other
architectures, and avoid the prerotated tables, and perform the
rotations inline. On ARM Cortex-A53, this results in a ~8% speedup.

Acked-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Parent 368b1bdc
@@ -10,6 +10,7 @@
 #define _CRYPTO_AEGIS_H
 
 #include <crypto/aes.h>
+#include <linux/bitops.h>
 #include <linux/types.h>
 
 #define AEGIS_BLOCK_SIZE 16
@@ -53,16 +54,13 @@ static void crypto_aegis_aesenc(union aegis_block *dst,
 				const union aegis_block *key)
 {
 	const u8 *s = src->bytes;
-	const u32 *t0 = crypto_ft_tab[0];
-	const u32 *t1 = crypto_ft_tab[1];
-	const u32 *t2 = crypto_ft_tab[2];
-	const u32 *t3 = crypto_ft_tab[3];
+	const u32 *t = crypto_ft_tab[0];
 	u32 d0, d1, d2, d3;
 
-	d0 = t0[s[ 0]] ^ t1[s[ 5]] ^ t2[s[10]] ^ t3[s[15]];
-	d1 = t0[s[ 4]] ^ t1[s[ 9]] ^ t2[s[14]] ^ t3[s[ 3]];
-	d2 = t0[s[ 8]] ^ t1[s[13]] ^ t2[s[ 2]] ^ t3[s[ 7]];
-	d3 = t0[s[12]] ^ t1[s[ 1]] ^ t2[s[ 6]] ^ t3[s[11]];
+	d0 = t[s[ 0]] ^ rol32(t[s[ 5]], 8) ^ rol32(t[s[10]], 16) ^ rol32(t[s[15]], 24);
+	d1 = t[s[ 4]] ^ rol32(t[s[ 9]], 8) ^ rol32(t[s[14]], 16) ^ rol32(t[s[ 3]], 24);
+	d2 = t[s[ 8]] ^ rol32(t[s[13]], 8) ^ rol32(t[s[ 2]], 16) ^ rol32(t[s[ 7]], 24);
+	d3 = t[s[12]] ^ rol32(t[s[ 1]], 8) ^ rol32(t[s[ 6]], 16) ^ rol32(t[s[11]], 24);
 
 	dst->words32[0] = cpu_to_le32(d0) ^ key->words32[0];
 	dst->words32[1] = cpu_to_le32(d1) ^ key->words32[1];
......