- 17 7月, 2015 19 次提交
-
-
由 Martin Willi 提交于
Extends the x86_64 SSE2 Poly1305 authenticator by a function processing two consecutive Poly1305 blocks in parallel using a derived key r^2. Loop unrolling can be more effectively mapped to SSE instructions, further increasing throughput. For large messages, throughput increases by ~45-65% compared to single block SSE2: testing speed of poly1305 (poly1305-simd) test 0 ( 96 byte blocks, 16 bytes per update, 6 updates): 3790063 opers/sec, 363846076 bytes/sec test 1 ( 96 byte blocks, 32 bytes per update, 3 updates): 5913378 opers/sec, 567684355 bytes/sec test 2 ( 96 byte blocks, 96 bytes per update, 1 updates): 9352574 opers/sec, 897847104 bytes/sec test 3 ( 288 byte blocks, 16 bytes per update, 18 updates): 1362145 opers/sec, 392297990 bytes/sec test 4 ( 288 byte blocks, 32 bytes per update, 9 updates): 2007075 opers/sec, 578037628 bytes/sec test 5 ( 288 byte blocks, 288 bytes per update, 1 updates): 3709811 opers/sec, 1068425798 bytes/sec test 6 ( 1056 byte blocks, 32 bytes per update, 33 updates): 566272 opers/sec, 597984182 bytes/sec test 7 ( 1056 byte blocks, 1056 bytes per update, 1 updates): 1111657 opers/sec, 1173910108 bytes/sec test 8 ( 2080 byte blocks, 32 bytes per update, 65 updates): 288857 opers/sec, 600823808 bytes/sec test 9 ( 2080 byte blocks, 2080 bytes per update, 1 updates): 590746 opers/sec, 1228751888 bytes/sec test 10 ( 4128 byte blocks, 4128 bytes per update, 1 updates): 301825 opers/sec, 1245936902 bytes/sec test 11 ( 8224 byte blocks, 8224 bytes per update, 1 updates): 153075 opers/sec, 1258896201 bytes/sec testing speed of poly1305 (poly1305-simd) test 0 ( 96 byte blocks, 16 bytes per update, 6 updates): 3809514 opers/sec, 365713411 bytes/sec test 1 ( 96 byte blocks, 32 bytes per update, 3 updates): 5973423 opers/sec, 573448627 bytes/sec test 2 ( 96 byte blocks, 96 bytes per update, 1 updates): 9446779 opers/sec, 906890803 bytes/sec test 3 ( 288 byte blocks, 16 bytes per update, 18 updates): 1364814 opers/sec, 393066691 bytes/sec test 4 ( 288 byte blocks, 32 bytes per update, 9 updates): 2045780 opers/sec, 589184697 bytes/sec test 5 ( 288 byte blocks, 288 bytes per update, 1 updates): 3711946 opers/sec, 1069040592 bytes/sec test 6 ( 1056 byte blocks, 32 bytes per update, 33 updates): 573686 opers/sec, 605812732 bytes/sec test 7 ( 1056 byte blocks, 1056 bytes per update, 1 updates): 1647802 opers/sec, 1740079440 bytes/sec test 8 ( 2080 byte blocks, 32 bytes per update, 65 updates): 292970 opers/sec, 609378224 bytes/sec test 9 ( 2080 byte blocks, 2080 bytes per update, 1 updates): 943229 opers/sec, 1961916528 bytes/sec test 10 ( 4128 byte blocks, 4128 bytes per update, 1 updates): 494623 opers/sec, 2041804569 bytes/sec test 11 ( 8224 byte blocks, 8224 bytes per update, 1 updates): 254045 opers/sec, 2089271014 bytes/sec Benchmark results from a Core i5-4670T. Signed-off-by: NMartin Willi <martin@strongswan.org> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Martin Willi 提交于
Implements an x86_64 assembler driver for the Poly1305 authenticator. This single block variant holds the 130-bit integer in 5 32-bit words, but uses SSE to do two multiplications/additions in parallel. When calling updates with small blocks, the overhead for kernel_fpu_begin/ kernel_fpu_end() negates the perfmance gain. We therefore use the poly1305-generic fallback for small updates. For large messages, throughput increases by ~5-10% compared to poly1305-generic: testing speed of poly1305 (poly1305-generic) test 0 ( 96 byte blocks, 16 bytes per update, 6 updates): 4080026 opers/sec, 391682496 bytes/sec test 1 ( 96 byte blocks, 32 bytes per update, 3 updates): 6221094 opers/sec, 597225024 bytes/sec test 2 ( 96 byte blocks, 96 bytes per update, 1 updates): 9609750 opers/sec, 922536057 bytes/sec test 3 ( 288 byte blocks, 16 bytes per update, 18 updates): 1459379 opers/sec, 420301267 bytes/sec test 4 ( 288 byte blocks, 32 bytes per update, 9 updates): 2115179 opers/sec, 609171609 bytes/sec test 5 ( 288 byte blocks, 288 bytes per update, 1 updates): 3729874 opers/sec, 1074203856 bytes/sec test 6 ( 1056 byte blocks, 32 bytes per update, 33 updates): 593000 opers/sec, 626208000 bytes/sec test 7 ( 1056 byte blocks, 1056 bytes per update, 1 updates): 1081536 opers/sec, 1142102332 bytes/sec test 8 ( 2080 byte blocks, 32 bytes per update, 65 updates): 302077 opers/sec, 628320576 bytes/sec test 9 ( 2080 byte blocks, 2080 bytes per update, 1 updates): 554384 opers/sec, 1153120176 bytes/sec test 10 ( 4128 byte blocks, 4128 bytes per update, 1 updates): 278715 opers/sec, 1150536345 bytes/sec test 11 ( 8224 byte blocks, 8224 bytes per update, 1 updates): 140202 opers/sec, 1153022070 bytes/sec testing speed of poly1305 (poly1305-simd) test 0 ( 96 byte blocks, 16 bytes per update, 6 updates): 3790063 opers/sec, 363846076 bytes/sec test 1 ( 96 byte blocks, 32 bytes per update, 3 updates): 5913378 opers/sec, 567684355 bytes/sec test 2 ( 96 byte blocks, 96 bytes per update, 1 updates): 9352574 opers/sec, 897847104 bytes/sec test 3 ( 288 byte blocks, 16 bytes per update, 18 updates): 1362145 opers/sec, 392297990 bytes/sec test 4 ( 288 byte blocks, 32 bytes per update, 9 updates): 2007075 opers/sec, 578037628 bytes/sec test 5 ( 288 byte blocks, 288 bytes per update, 1 updates): 3709811 opers/sec, 1068425798 bytes/sec test 6 ( 1056 byte blocks, 32 bytes per update, 33 updates): 566272 opers/sec, 597984182 bytes/sec test 7 ( 1056 byte blocks, 1056 bytes per update, 1 updates): 1111657 opers/sec, 1173910108 bytes/sec test 8 ( 2080 byte blocks, 32 bytes per update, 65 updates): 288857 opers/sec, 600823808 bytes/sec test 9 ( 2080 byte blocks, 2080 bytes per update, 1 updates): 590746 opers/sec, 1228751888 bytes/sec test 10 ( 4128 byte blocks, 4128 bytes per update, 1 updates): 301825 opers/sec, 1245936902 bytes/sec test 11 ( 8224 byte blocks, 8224 bytes per update, 1 updates): 153075 opers/sec, 1258896201 bytes/sec Benchmark results from a Core i5-4670T. Signed-off-by: NMartin Willi <martin@strongswan.org> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Martin Willi 提交于
As architecture specific drivers need a software fallback, export Poly1305 init/update/final functions together with some helpers in a header file. Signed-off-by: NMartin Willi <martin@strongswan.org> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Martin Willi 提交于
The AVX2 variant of ChaCha20 is used only for messages with >= 512 bytes length. With the existing test vectors, the implementation could not be tested. Due that lack of such a long official test vector, this one is self-generated using chacha20-generic. Signed-off-by: NMartin Willi <martin@strongswan.org> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Martin Willi 提交于
Extends the x86_64 ChaCha20 implementation by a function processing eight ChaCha20 blocks in parallel using AVX2. For large messages, throughput increases by ~55-70% compared to four block SSSE3: testing speed of chacha20 (chacha20-simd) encryption test 0 (256 bit key, 16 byte blocks): 42249230 operations in 10 seconds (675987680 bytes) test 1 (256 bit key, 64 byte blocks): 46441641 operations in 10 seconds (2972265024 bytes) test 2 (256 bit key, 256 byte blocks): 33028112 operations in 10 seconds (8455196672 bytes) test 3 (256 bit key, 1024 byte blocks): 11568759 operations in 10 seconds (11846409216 bytes) test 4 (256 bit key, 8192 byte blocks): 1448761 operations in 10 seconds (11868250112 bytes) testing speed of chacha20 (chacha20-simd) encryption test 0 (256 bit key, 16 byte blocks): 41999675 operations in 10 seconds (671994800 bytes) test 1 (256 bit key, 64 byte blocks): 45805908 operations in 10 seconds (2931578112 bytes) test 2 (256 bit key, 256 byte blocks): 32814947 operations in 10 seconds (8400626432 bytes) test 3 (256 bit key, 1024 byte blocks): 19777167 operations in 10 seconds (20251819008 bytes) test 4 (256 bit key, 8192 byte blocks): 2279321 operations in 10 seconds (18672197632 bytes) Benchmark results from a Core i5-4670T. Signed-off-by: NMartin Willi <martin@strongswan.org> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Martin Willi 提交于
Extends the x86_64 SSSE3 ChaCha20 implementation by a function processing four ChaCha20 blocks in parallel. This avoids the word shuffling needed in the single block variant, further increasing throughput. For large messages, throughput increases by ~110% compared to single block SSSE3: testing speed of chacha20 (chacha20-simd) encryption test 0 (256 bit key, 16 byte blocks): 43141886 operations in 10 seconds (690270176 bytes) test 1 (256 bit key, 64 byte blocks): 46845874 operations in 10 seconds (2998135936 bytes) test 2 (256 bit key, 256 byte blocks): 18458512 operations in 10 seconds (4725379072 bytes) test 3 (256 bit key, 1024 byte blocks): 5360533 operations in 10 seconds (5489185792 bytes) test 4 (256 bit key, 8192 byte blocks): 692846 operations in 10 seconds (5675794432 bytes) testing speed of chacha20 (chacha20-simd) encryption test 0 (256 bit key, 16 byte blocks): 42249230 operations in 10 seconds (675987680 bytes) test 1 (256 bit key, 64 byte blocks): 46441641 operations in 10 seconds (2972265024 bytes) test 2 (256 bit key, 256 byte blocks): 33028112 operations in 10 seconds (8455196672 bytes) test 3 (256 bit key, 1024 byte blocks): 11568759 operations in 10 seconds (11846409216 bytes) test 4 (256 bit key, 8192 byte blocks): 1448761 operations in 10 seconds (11868250112 bytes) Benchmark results from a Core i5-4670T. Signed-off-by: NMartin Willi <martin@strongswan.org> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Martin Willi 提交于
Implements an x86_64 assembler driver for the ChaCha20 stream cipher. This single block variant works on a single state matrix using SSE instructions. It requires SSSE3 due the use of pshufb for efficient 8/16-bit rotate operations. For large messages, throughput increases by ~65% compared to chacha20-generic: testing speed of chacha20 (chacha20-generic) encryption test 0 (256 bit key, 16 byte blocks): 45089207 operations in 10 seconds (721427312 bytes) test 1 (256 bit key, 64 byte blocks): 43839521 operations in 10 seconds (2805729344 bytes) test 2 (256 bit key, 256 byte blocks): 12702056 operations in 10 seconds (3251726336 bytes) test 3 (256 bit key, 1024 byte blocks): 3371173 operations in 10 seconds (3452081152 bytes) test 4 (256 bit key, 8192 byte blocks): 422468 operations in 10 seconds (3460857856 bytes) testing speed of chacha20 (chacha20-simd) encryption test 0 (256 bit key, 16 byte blocks): 43141886 operations in 10 seconds (690270176 bytes) test 1 (256 bit key, 64 byte blocks): 46845874 operations in 10 seconds (2998135936 bytes) test 2 (256 bit key, 256 byte blocks): 18458512 operations in 10 seconds (4725379072 bytes) test 3 (256 bit key, 1024 byte blocks): 5360533 operations in 10 seconds (5489185792 bytes) test 4 (256 bit key, 8192 byte blocks): 692846 operations in 10 seconds (5675794432 bytes) Benchmark results from a Core i5-4670T. Signed-off-by: NMartin Willi <martin@strongswan.org> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Martin Willi 提交于
As architecture specific drivers need a software fallback, export a ChaCha20 en-/decryption function together with some helpers in a header file. Signed-off-by: NMartin Willi <martin@strongswan.org> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Martin Willi 提交于
Adds individual ChaCha20 and Poly1305 and a combined rfc7539esp AEAD speed test using mode numbers 214, 321 and 213. For Poly1305 we add a specific speed template, as it expects the key prepended to the input data. Signed-off-by: NMartin Willi <martin@strongswan.org> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
This patch converts rfc7539 and rfc7539esp to the new AEAD interface. The test vectors for rfc7539esp have also been updated to include the IV. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Tested-by: NMartin Willi <martin@strongswan.org>
-
由 Tadeusz Struk 提交于
Introduce constrains for RSA keys lengths. Only key lengths of 512, 1024, 1536, 2048, 3072, and 4096 bits will be supported. Signed-off-by: NTadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Tadeusz Struk 提交于
Add RSA support to QAT driver. Removed unused RNG rings. Signed-off-by: NTadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Tadeusz Struk 提交于
Add code that loads the MMP firmware Signed-off-by: NTadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Pingchao Yang 提交于
Load Modular Math Processor(MMP) firmware into QAT devices to support public key algorithm acceleration. Signed-off-by: NPingchao Yang <pingchao.yang@intel.com> Signed-off-by: NTadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
Now that all implementations of rfc4309 have been converted we can reenable the test. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
This patch converts the nx ccm and 4309 implementations to the new AEAD interface. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
This patch converts the ARM64 aes-ce-ccm implementation to the new AEAD interface. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Reviewed-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
-
由 Herbert Xu 提交于
This patch converts generic ccm and its associated transforms to the new AEAD interface. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
This patch disables the rfc4309 test while the conversion to the new seqiv calling convention takes place. It also replaces the rfc4309 test vectors with ones that will work with the new IV convention. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
- 14 7月, 2015 21 次提交
-
-
vmx-crypto driver make use of some VSX instructions which are only available if VSX is enabled. Running in cases where VSX are not enabled vmx-crypto fails in a VSX exception. In order to fix this enable_kernel_vsx() was added to turn on VSX instructions for vmx-crypto. Signed-off-by: NLeonidas S. Barbosa <leosilva@linux.vnet.ibm.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
enable_kernel_vsx() function was commented since anything was using it. However, vmx-crypto driver uses VSX instructions which are only available if VSX is enable. Otherwise it rises an exception oops. This patch uncomment enable_kernel_vsx() routine and makes it available. Signed-off-by: NLeonidas S. Barbosa <leosilva@linux.vnet.ibm.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Krzysztof Kozlowski 提交于
platform_driver does not need to set an owner because platform_driver_register() will set it. Signed-off-by: NKrzysztof Kozlowski <k.kozlowski@samsung.com> Acked-by: NBoris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
Now that all implementations of rfc4106 have been converted we can reenable the test. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
This patch converts rfc4106 to the new calling convention where the IV is now part of the AD and needs to be skipped. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
This patch converts rfc4106 to the new calling convention where the IV is now part of the AD and needs to be skipped. This patch also makes use of type-safe AEAD functions where possible. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
This patch converts rfc4106 to the new calling convention where the IV is now part of the AD and needs to be skipped. This patch also makes use of the new type-safe way of freeing instances. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
This patch converts rfc4106 to the new calling convention where the IV is now in the AD and needs to be skipped. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
This patch allows the AEAD speed tests to cope with the new seqiv calling convention as well as the old one. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
This patch disables the rfc4106 test while the conversion to the new seqiv calling convention takes place. It also converts the rfc4106 test vectors to the new format. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
This patch allows the CRYPTO_ALG_AEAD_NEW flag to be propagated. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
This patch replaces the seqniv generator with seqiv when the underlying algorithm understands the new calling convention. This not only makes more sense as now seqiv is solely responsible for IV generation rather than also determining how the IV is going to be used, it also allows for optimisations in the underlying implementation. For example, the space for the IV could be used to add padding for authentication. This patch also removes the unnecessary copying of IV to dst during seqiv decryption as the IV is part of the AD and not cipher text. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
This patch fixes a bug where we were incorrectly including the IV in the AD during encryption. The IV must remain in the plain text for it to be encrypted. During decryption there is no need to copy the IV to dst because it's now part of the AD. This patch removes an unncessary check on authsize which would be performed by the underlying decrypt call. Finally this patch makes use of the type-safe init/exit functions. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
This patch allows the CRYPTO_ALG_AEAD_NEW flag to be propagated. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
This patch allows the CRYPTO_ALG_AEAD_NEW flag to be propagated. It also restores the ASYNC bit that went missing during the AEAD conversion. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
This patch adds a type-safe function for freeing AEAD instances to struct aead_instance. This replaces the existing free function in struct crypto_template which does not know the type of the instance that it's freeing. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
Currently the task of freeing an instance is given to the crypto template. However, it has no type information on the instance so we have to resort to checking type information at runtime. This patch introduces a free function to crypto_type that will be used to free an instance. This can then be used to free an instance in a type-safe manner. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
The transform context is shared memory and must not be written to without locking. This patch adds locking to nx-842 to prevent context corruption. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
This patch adds a type-safe queueing interface for AEAD. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
The function __crypto_dequeue_request is completely unused. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6由 Herbert Xu 提交于
Merge the crypto tree to pull in the nx reentrancy patch.
-