Commits · 3d387ef08c40382315b8e9baa4bc9a07f7c49fce · openanolis / cloud-kernel

21 Jun, 2013 2 commits

Revert "crypto: blowfish - add AVX2/x86_64 implementation of blowfish cipher" · 3d387ef0

Authored by Jussi Kivilinna on Jun 08, 2013

This reverts commit 60488010.

The instruction (vpgatherdd) that this implementation relied on turned out to be
a slow performer on real hardware (i5-4570). The previous 4-way blowfish
implementation is therefore faster and this implementation should be removed.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

3d387ef0

crypto: camellia-aesni-avx2 - tune assembly code for more performance · acfffdb8

Authored by Jussi Kivilinna on Jun 08, 2013

Add implementation tuned for more performance on real hardware. Changes are
mostly around the part mixing 128-bit extract and insert instructions and
AES-NI instructions. Also 'vpbroadcastb' instructions have been changed to
'vpshufb with zero mask'.

Tests on Intel Core i5-4570:

tcrypt ECB results, old-AVX2 vs new-AVX2:

size    128bit key      256bit key
        enc     dec     enc     dec
256     1.00x   1.00x   1.00x   1.00x
1k      1.08x   1.09x   1.05x   1.06x
8k      1.06x   1.06x   1.06x   1.06x

tcrypt ECB results, AVX vs new-AVX2:

size    128bit key      256bit key
        enc     dec     enc     dec
256     1.00x   1.00x   1.00x   1.00x
1k      1.51x   1.50x   1.52x   1.50x
8k      1.47x   1.48x   1.48x   1.48x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

acfffdb8

28 May, 2013 2 commits

crypto: sha256_ssse3 - add sha224 support · a710f761

Authored by Jussi Kivilinna on May 21, 2013

Add sha224 implementation to sha256_ssse3 module.

This also fixes a sha256_ssse3 module autoloading issue when 'sha224' is used
before 'sha256'. Previously, in such a case, only sha256_generic was loaded and
not sha256_ssse3 (since it did not provide sha224). If 'sha256' was then used
after 'sha224', sha256_ssse3 would remain unloaded.

Cc: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

a710f761

crypto: sha512_ssse3 - add sha384 support · 340991e3

Authored by Jussi Kivilinna on May 21, 2013

Add sha384 implementation to sha512_ssse3 module.

This also fixes a sha512_ssse3 module autoloading issue when 'sha384' is used
before 'sha512'. Previously, in such a case, only sha512_generic was loaded and
not sha512_ssse3 (since it did not provide sha384). If 'sha512' was then used
after 'sha384', sha512_ssse3 would remain unloaded. For example, this
happens with the tcrypt testing module, since it tests 'sha384' before 'sha512'.

Cc: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

340991e3

24 May, 2013 1 commit

crypto: crct10dif - Glue code to cast accelerated CRCT10DIF assembly as a crypto transform · 0b95a7f8

Authored by Tim Chen on May 01, 2013

Glue code that plugs the PCLMULQDQ accelerated CRC T10 DIF hash into the
crypto framework. The config CRYPTO_CRCT10DIF_PCLMUL should be turned
on to enable the feature. The crc_t10dif crypto library function will
use this faster algorithm when the crct10dif_pclmul module is loaded.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

0b95a7f8

20 May, 2013 1 commit

crypto: crct10dif - Accelerated CRC T10 DIF computation with PCLMULQDQ instruction · 31d93962

Authored by Tim Chen on May 01, 2013

This is the x86_64 CRC T10 DIF transform accelerated with the PCLMULQDQ
instructions. Details discussing the implementation can be found in the
paper:

"Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction"
http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

31d93962
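
Note: as a reading aid for the two crct10dif commits, here is a minimal bit-at-a-time sketch of the CRC T10 DIF checksum in Python. It is not the PCLMULQDQ folding algorithm from the paper and not the kernel code. It only models the value such an accelerated routine is expected to reproduce, assuming the usual T10 DIF parameters (16-bit polynomial 0x8BB7, initial value 0, most significant bit first, no final XOR). The function name `crc_t10dif` is reused here purely for illustration.

```python
def crc_t10dif(data: bytes, crc: int = 0) -> int:
    """Bit-at-a-time CRC-16 using the T10 DIF polynomial 0x8BB7 (MSB first).

    Assumed parameters: initial value 0, no bit reflection, no final XOR.
    """
    poly = 0x8BB7
    for byte in data:
        # Feed the next input byte into the top 8 bits of the 16-bit register.
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                # Top bit set: shift it out and reduce by the polynomial.
                crc = ((crc << 1) & 0xFFFF) ^ poly
            else:
                crc = (crc << 1) & 0xFFFF
    return crc
```

A slow reference like this is mainly useful for cross-checking a faster implementation over random buffers of varying length.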

25 Apr, 2013 15 commits

crypto: camellia - add AVX2/AES-NI/x86_64 assembler implementation of camellia cipher · f3f935a7

Authored by Jussi Kivilinna on Apr 13, 2013

Patch adds AVX2/AES-NI/x86-64 implementation of Camellia cipher, requiring
32 parallel blocks for input (512 bytes). Compared to the AVX implementation, this
version is extended to use the 256-bit wide YMM registers. For the AES-NI
instructions, data is split into two 128-bit registers and merged afterwards.
Even with this additional handling, performance should be higher compared
to the AES-NI/AVX implementation.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

f3f935a7

crypto: serpent - add AVX2/x86_64 assembler implementation of serpent cipher · 56d76c96

Authored by Jussi Kivilinna on Apr 13, 2013

Patch adds AVX2/x86-64 implementation of Serpent cipher, requiring 16 parallel
blocks for input (256 bytes). The implementation is based on the AVX implementation
and extends it to use the 256-bit wide YMM registers. Since serpent does not use
table look-ups, this implementation should be close to two times faster than
the AVX implementation.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

56d76c96

crypto: twofish - add AVX2/x86_64 assembler implementation of twofish cipher · cf1521a1

Authored by Jussi Kivilinna on Apr 13, 2013

Patch adds AVX2/x86-64 implementation of Twofish cipher, requiring 16 parallel
blocks for input (256 bytes). Table look-ups are performed using the vpgatherdd
instruction directly from vector registers and thus should be faster than
earlier implementations. The implementation also uses 256-bit wide YMM registers,
which should give an additional speed-up compared to the AVX implementation.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

cf1521a1

crypto: blowfish - add AVX2/x86_64 implementation of blowfish cipher · 60488010

Authored by Jussi Kivilinna on Apr 13, 2013

Patch adds AVX2/x86-64 implementation of Blowfish cipher, requiring 32 parallel
blocks for input (256 bytes). Table look-ups are performed using the vpgatherdd
instruction directly from vector registers and thus should be faster than
earlier implementations.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

60488010
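
Note: the byte counts quoted in the four AVX2 commit messages above follow from the cipher block size multiplied by the number of parallel blocks. The small Python sketch below only reproduces that arithmetic. The block sizes are the standard ones (128-bit blocks for Camellia, Serpent and Twofish, 64-bit blocks for Blowfish); the dictionary and function names are made up for illustration.

```python
# Cipher block sizes in bytes: Camellia, Serpent and Twofish use 128-bit
# blocks, Blowfish uses 64-bit blocks.
BLOCK_BYTES = {"camellia": 16, "serpent": 16, "twofish": 16, "blowfish": 8}

# Parallel block counts as stated in the AVX2 commit messages.
PARALLEL_BLOCKS = {"camellia": 32, "serpent": 16, "twofish": 16, "blowfish": 32}


def min_input_bytes(cipher: str) -> int:
    """Input size needed to fill one pass of the parallel implementation."""
    return BLOCK_BYTES[cipher] * PARALLEL_BLOCKS[cipher]


if __name__ == "__main__":
    for name in BLOCK_BYTES:
        print(name, min_input_bytes(name))
```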

crypto: aesni_intel - add more optimized XTS mode for x86-64 · c456a9cd

Authored by Jussi Kivilinna on Apr 08, 2013

Add more optimized XTS code for aesni_intel in 64-bit mode, for smaller stack
usage and boost for speed.

tcrypt results, with Intel i5-2450M:
256-bit key
        enc     dec
16B     0.98x   0.99x
64B     0.64x   0.63x
256B    1.29x   1.32x
1024B   1.54x   1.58x
8192B   1.57x   1.60x

512-bit key
        enc     dec
16B     0.98x   0.99x
64B     0.60x   0.59x
256B    1.24x   1.25x
1024B   1.39x   1.42x
8192B   1.38x   1.42x

I chose not to optimize for block sizes smaller than 256 bytes, since XTS is
practically always used with data blocks of size 512 bytes. This is why
performance is reduced in tcrypt for 64-byte blocks.

Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

c456a9cd

crypto: x86/camellia-aesni-avx - add more optimized XTS code · b5c5b072

Authored by Jussi Kivilinna on Apr 08, 2013

Add more optimized XTS code for camellia-aesni-avx, for smaller stack usage
and small boost for speed.

tcrypt results, with Intel i5-2450M:
        enc     dec
16B     1.10x   1.01x
64B     0.82x   0.77x
256B    1.14x   1.10x
1024B   1.17x   1.16x
8192B   1.10x   1.11x

Since XTS is practically always used with data blocks of size 512 bytes or
more, I chose not to make use of camellia-2way for block sizes smaller than
256 bytes. This causes a slower result in tcrypt for 64 bytes.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

b5c5b072

crypto: cast6-avx: use new optimized XTS code · 70177286

Authored by Jussi Kivilinna on Apr 08, 2013

Change cast6-avx to use the new XTS code, for smaller stack usage and small
boost to performance.

tcrypt results, with Intel i5-2450M:
        enc     dec
16B     1.01x   1.01x
64B     1.01x   1.00x
256B    1.09x   1.02x
1024B   1.08x   1.06x
8192B   1.08x   1.07x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

70177286

crypto: x86/twofish-avx - use optimized XTS code · 18be4527

Authored by Jussi Kivilinna on Apr 08, 2013

Change twofish-avx to use the new XTS code, for smaller stack usage and small
boost to performance.

tcrypt results, with Intel i5-2450M:
        enc     dec
16B     1.03x   1.02x
64B     0.91x   0.91x
256B    1.10x   1.09x
1024B   1.12x   1.11x
8192B   1.12x   1.11x

Since XTS is practically always used with data blocks of size 512 bytes or
more, I chose not to make use of twofish-3way for block sizes smaller than
128 bytes. This causes a slower result in tcrypt for 64 bytes.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

18be4527

crypto: x86 - add more optimized XTS-mode for serpent-avx · a05248ed

Authored by Jussi Kivilinna on Apr 08, 2013

This patch adds AVX optimized XTS-mode helper functions/macros and converts
serpent-avx to use the new facilities. Benefits are slightly improved speed
and reduced stack usage as use of temporary IV-array is avoided.

tcrypt results, with Intel i5-2450M:
        enc     dec
16B     1.00x   1.00x
64B     1.00x   1.00x
256B    1.04x   1.06x
1024B   1.09x   1.09x
8192B   1.10x   1.09x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

a05248ed
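
Note: the XTS commits above mention avoiding a temporary IV array. One way to picture that is to derive each block's tweak from the previous one on the fly instead of precomputing and storing them all. The Python sketch below is only a model of the standard XTS tweak update (multiplication by x in GF(2^128), little-endian byte order, reduction constant 0x87); it is not the kernel's AVX code, and the function names are hypothetical. In real XTS the first tweak is the encrypted sector number, which is taken as given here.

```python
from typing import Iterator


def xts_next_tweak(tweak: bytes) -> bytes:
    """Multiply a 16-byte XTS tweak by x in GF(2^128) (little-endian)."""
    if len(tweak) != 16:
        raise ValueError("XTS tweak must be 16 bytes")
    n = int.from_bytes(tweak, "little") << 1
    if n >> 128:
        # A bit was shifted out of the 128-bit value: drop it and fold it
        # back in with the reduction constant 0x87.
        n = (n & ((1 << 128) - 1)) ^ 0x87
    return n.to_bytes(16, "little")


def xts_tweaks(first: bytes, count: int) -> Iterator[bytes]:
    """Yield `count` consecutive tweaks one at a time, without storing them."""
    tweak = first
    for _ in range(count):
        yield tweak
        tweak = xts_next_tweak(tweak)
```

Because each tweak depends only on the previous one, a consumer can process blocks in a streaming fashion and keep a single 16-byte value live at a time.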

crypto: crc32-pclmul - Use gas macro for pclmulqdq · 57ae1b05

Authored by Sandy Wu on Mar 28, 2013

Occurs when CONFIG_CRYPTO_CRC32C_INTEL=y and CONFIG_CRYPTO_CRC32C_INTEL=y.
Older versions of binutils do not support the pclmulqdq instruction. The
PCLMULQDQ gas macro is used instead.

Signed-off-by: Sandy Wu <sandyw@twitter.com>
Cc: stable@vger.kernel.org # 3.8+
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

57ae1b05

crypto: sha512 - Create module providing optimized SHA512 routines using... · 87de4579

Authored by Tim Chen on Mar 26, 2013

crypto: sha512 - Create module providing optimized SHA512 routines using SSSE3, AVX or AVX2 instructions.

We added glue code and config options to create a crypto
module that uses SSE/AVX/AVX2 optimized SHA512 x86_64 assembly routines.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

87de4579

crypto: sha512 - Optimized SHA512 x86_64 assembly routine using AVX2 RORX instruction. · 5663535b

Authored by Tim Chen on Mar 26, 2013

Provides a SHA512 x86_64 assembly routine optimized with SSE, AVX and
AVX2's RORX instructions. Speedup of 70% or more has been
measured over the generic implementation.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

5663535b
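
Note: RORX, named in this commit, is a rotate-right instruction, and rotations are the core of the SHA-2 round functions. As a rough illustration of the operation being accelerated, the Python sketch below models a 64-bit rotate right and the two "big sigma" functions from the SHA-512 specification. It is not the assembly routine from the commit, and the helper names are chosen here for illustration.

```python
MASK64 = (1 << 64) - 1


def rotr64(x: int, n: int) -> int:
    """Rotate the 64-bit value x right by n bit positions."""
    n %= 64
    return ((x >> n) | (x << (64 - n))) & MASK64


def big_sigma0(a: int) -> int:
    """SHA-512 Sigma0: rotations by 28, 34 and 39 bits, XORed together."""
    return rotr64(a, 28) ^ rotr64(a, 34) ^ rotr64(a, 39)


def big_sigma1(e: int) -> int:
    """SHA-512 Sigma1: rotations by 14, 18 and 41 bits, XORed together."""
    return rotr64(e, 14) ^ rotr64(e, 18) ^ rotr64(e, 41)
```

Each round evaluates both functions, so three rotations of the same source value are needed per function, which is where a rotate that writes to a separate destination register helps.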

crypto: sha512 - Optimized SHA512 x86_64 assembly routine using AVX instructions. · e01d69cb

Authored by Tim Chen on Mar 26, 2013

Provides a SHA512 x86_64 assembly routine optimized with SSE and AVX instructions.
Speedup of 60% or more has been measured over the generic implementation.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

e01d69cb

crypto: sha512 - Optimized SHA512 x86_64 assembly routine using Supplemental SSE3 instructions. · bf215cee

Authored by Tim Chen on Mar 26, 2013

Provides a SHA512 x86_64 assembly routine optimized with SSSE3 instructions.
Speedup of 40% or more has been measured over the generic implementation.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

bf215cee

crypto: sha256 - Create module providing optimized SHA256 routines using... · 8275d1aa

Authored by Tim Chen on Mar 26, 2013

crypto: sha256 - Create module providing optimized SHA256 routines using SSSE3, AVX or AVX2 instructions.

We added glue code and config options to create a crypto
module that uses SSE/AVX/AVX2 optimized SHA256 x86_64 assembly routines.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

8275d1aa

03 Apr, 2013 5 commits

crypto: sha256 - Optimized sha256 x86_64 routine using AVX2's RORX instructions · d34a4600

Authored by Tim Chen on Mar 26, 2013

Provides a SHA256 x86_64 assembly routine optimized with SSE, AVX and
AVX2's RORX instructions. Speedup of 70% or more has been
measured over the generic implementation.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

d34a4600

crypto: sha256 - Optimized sha256 x86_64 assembly routine with AVX instructions. · ec2b4c85

Authored by Tim Chen on Mar 26, 2013

Provides a SHA256 x86_64 assembly routine optimized with SSE and AVX instructions.
Speedup of 60% or more has been measured over the generic implementation.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

ec2b4c85

crypto: sha256 - Optimized sha256 x86_64 assembly routine using Supplemental SSE3 instructions. · 46d208a2

Authored by Tim Chen on Mar 26, 2013

Provides a SHA256 x86_64 assembly routine optimized with SSSE3 instructions.
Speedup of 40% or more has been measured over the generic implementation.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

46d208a2

crypto: x86 - build AVX block cipher implementations only if assembler supports AVX instructions · 873b9caf

Authored by Jussi Kivilinna on Mar 24, 2013

These modules require AVX support in the assembler, so add a new check to the
Makefile for this.

The other option would be to use CONFIG_AS_AVX inside source files, but that would
result in dummy/empty/no-functionality modules being created.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

873b9caf

crypto: x86/crc32-pclmul - assembly clean-ups: use ENTRY/ENDPROC · eca17269

Authored by Jussi Kivilinna on Mar 24, 2013

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

eca17269

10 Mar, 2013 1 commit

crypto: crc32c - Update the links to the white papers on CRC32C calculations... · 918731fa

Authored by Tim Chen on Feb 21, 2013

crypto: crc32c - Update the links to the white papers on CRC32C calculations with PCLMULQDQ instructions.

Herbert,

The following patch updates the stale link to the CRC32C white paper
that was referenced.

Tim

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

918731fa

26 Feb, 2013 1 commit

crypto: crc32c - Kill pointless CRYPTO_CRC32C_X86_64 option · ca81a1a1

Authored by Herbert Xu on Feb 26, 2013

This bool option can never be set to anything other than y. So
let's just kill it.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

ca81a1a1

20 Jan, 2013 12 commits

crypto: crc32-pclmul - Kill warning on x86-32 · 79836276

Authored by Herbert Xu on Jan 20, 2013

This patch removes a gratuitous warning on x86-32:

arch/x86/crypto/crc32-pclmul_asm.S:87:2: warning: #warning Using 32bit code support [-Wcpp]

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

79836276

crypto: x86/twofish - assembler clean-ups: use ENTRY/ENDPROC, localize jump labels · d3f5188d

Authored by Jussi Kivilinna on Jan 19, 2013

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

d3f5188d

crypto: x86/sha1 - assembler clean-ups: use ENTRY/ENDPROC · ac9d55dd

Authored by Jussi Kivilinna on Jan 19, 2013

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

ac9d55dd

crypto: x86/serpent - use ENTRY/ENDPROC for assembler functions and localize jump targets · 2dcfd44d

Authored by Jussi Kivilinna on Jan 19, 2013

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2dcfd44d

crypto: x86/salsa20 - assembler cleanup, use ENTRY/ENDPROC for assembler... · 04443808

Authored by Jussi Kivilinna on Jan 19, 2013

crypto: x86/salsa20 - assembler cleanup, use ENTRY/ENDPROC for assembler functions and rename ECRYPT_* to salsa20_*

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

04443808

crypto: x86/ghash - assembler clean-up: use ENDPROC at end of assembler functions · b05d3f37

Authored by Jussi Kivilinna on Jan 19, 2013

Signed-off-by: Jussi Kivilinna <jussi.kivilinn@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

b05d3f37

crypto: x86/crc32c - assembler clean-up: use ENTRY/ENDPROC · 698a5abb

Authored by Jussi Kivilinna on Jan 19, 2013

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

698a5abb

crypto: cast6-avx: use ENTRY()/ENDPROC() for assembler functions · 1985fecf

Authored by Jussi Kivilinna on Jan 19, 2013

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

1985fecf

crypto: cast5-avx: use ENTRY()/ENDPROC() for assembler functions and localize jump targets · e17e209e

Authored by Jussi Kivilinna on Jan 19, 2013

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

e17e209e

crypto: camellia-x86_64/aes-ni: use ENTRY()/ENDPROC() for assembler functions... · 59990684

Authored by Jussi Kivilinna on Jan 19, 2013

crypto: camellia-x86_64/aes-ni: use ENTRY()/ENDPROC() for assembler functions and localize jump targets

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

59990684

crypto: blowfish-x86_64: use ENTRY()/ENDPROC() for assembler functions and localize jump targets · 5186e395

Authored by Jussi Kivilinna on Jan 19, 2013

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

5186e395

crypto: aesni-intel - add ENDPROC statements for assembler functions · 8309b745

Authored by Jussi Kivilinna on Jan 19, 2013

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

8309b745

openanolis / cloud-kernel synced successfully more than 1 year ago