1. 24 2月, 2016 5 次提交
    • J
      x86/asm/crypto: Create stack frames in crypto functions · 8691ccd7
      Josh Poimboeuf 提交于
      The crypto code has several callable non-leaf functions which don't
      honor CONFIG_FRAME_POINTER, which can result in bad stack traces.
      
      Create stack frames for them when CONFIG_FRAME_POINTER is enabled.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chris J Arges <chris.j.arges@canonical.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: live-patching@vger.kernel.org
      Link: http://lkml.kernel.org/r/6c20192bcf1102ae18ae5a242cabf30ce9b29895.1453405861.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8691ccd7
    • J
      x86/asm/crypto: Don't use RBP as a scratch register · 68874ac3
      Josh Poimboeuf 提交于
      The frame pointer (RBP) is getting clobbered in
      sha1_mb_mgr_submit_avx2() before a function call, which can mess up
      stack traces.  Use R12 instead.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chris J Arges <chris.j.arges@canonical.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: live-patching@vger.kernel.org
      Link: http://lkml.kernel.org/r/15a3eb7ebe68e37755927915f45e4f0bde4d18c5.1453405861.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      68874ac3
    • J
      x86/asm/crypto: Simplify stack usage in sha-mb functions · aec4d0e3
      Josh Poimboeuf 提交于
      sha1_mb_mgr_flush_avx2() and sha1_mb_mgr_submit_avx2() both allocate a
      lot of stack space which is never used.  Also, many of the registers
      being saved aren't being clobbered so there's no need to save them.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chris J Arges <chris.j.arges@canonical.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: live-patching@vger.kernel.org
      Link: http://lkml.kernel.org/r/9402e4d87580d6b2376ed95f67b84bdcce3c830e.1453405861.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      aec4d0e3
    • J
      x86/asm/crypto: Move jump_table to .rodata section · f66f6191
      Josh Poimboeuf 提交于
      stacktool reports the following warning:
      
        stacktool: arch/x86/crypto/crc32c-pcl-intel-asm_64.o: crc_pcl()+0x11dd: can't decode instruction
      
      It gets confused when trying to decode jump_table data.  Move jump_table
      to the .rodata section which is a more appropriate home for read-only
      data.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chris J Arges <chris.j.arges@canonical.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: live-patching@vger.kernel.org
      Link: http://lkml.kernel.org/r/1dbf80c097bb9d89c0cbddc01a815ada690e3b32.1453405861.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f66f6191
    • J
      x86/asm/crypto: Move .Lbswap_mask data to .rodata section · 1253cab8
      Josh Poimboeuf 提交于
      stacktool reports the following warning:
      
        stacktool: arch/x86/crypto/aesni-intel_asm.o: _aesni_inc_init(): can't find starting instruction
      
      stacktool gets confused when it tries to disassemble the following data
      in the .text section:
      
        .Lbswap_mask:
                .byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
      
      Move it to .rodata which is a more appropriate section for read-only
      data.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chris J Arges <chris.j.arges@canonical.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: live-patching@vger.kernel.org
      Link: http://lkml.kernel.org/r/b6a2f3f8bda705143e127c025edb2b53c86e6eb4.1453405861.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1253cab8
  2. 17 2月, 2016 1 次提交
    • S
      crypto: xts - consolidate sanity check for keys · 28856a9e
      Stephan Mueller 提交于
      The patch centralizes the XTS key check logic into the service function
      xts_check_key which is invoked from the different XTS implementations.
      With this, the XTS implementations in ARM, ARM64, PPC and S390 have now
      a sanity check for the XTS keys similar to the other arches.
      
      In addition, this service function received a check to ensure that the
      key != the tweak key which is mandated by FIPS 140-2 IG A.9. As the
      check is not present in the standards defining XTS, it is only enforced
      in FIPS mode of the kernel.
      Signed-off-by: NStephan Mueller <smueller@chronox.de>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      28856a9e
  3. 06 2月, 2016 1 次提交
    • W
      crypto: sha-mb - Fix load failure · fd09967b
      Wang, Rui Y 提交于
      On  Monday, February 1, 2016 4:18 PM, Herbert Xu wrote:
      >
      > On Wed, Jan 27, 2016 at 05:08:35PM +0800, Rui Wang wrote:
      >>
      >> +static int sha1_mb_async_import(struct ahash_request *req, const void
      >> +*in) {
      >> +	struct ahash_request *mcryptd_req = ahash_request_ctx(req);
      >> +	struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
      >> +	struct sha1_mb_ctx *ctx = crypto_ahash_ctx(tfm);
      >> +	struct mcryptd_ahash *mcryptd_tfm = ctx->mcryptd_tfm;
      >> +	struct crypto_shash *child = mcryptd_ahash_child(mcryptd_tfm);
      >> +	struct mcryptd_hash_request_ctx *rctx;
      >> +	struct shash_desc *desc;
      >> +	int err;
      >> +
      >> +	memcpy(mcryptd_req, req, sizeof(*req));
      >> +	ahash_request_set_tfm(mcryptd_req, &mcryptd_tfm->base);
      >> +	rctx = ahash_request_ctx(mcryptd_req);
      >> +	desc = &rctx->desc;
      >> +	desc->tfm = child;
      >> +	desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP;
      >> +
      >> +	err = crypto_shash_init(desc);
      >> +	if (err)
      >> +		return err;
      >
      > What is this desc for?
      
      Hi Herbert,
      
      Yeah I just realized that the call to crypto_shash_init() isn't necessary
      here. What it does is overwritten by crypto_ahash_import(). But this desc
      still needs to be initialized here because it's newly allocated by
      ahash_request_alloc(). We eventually calls the shash version of import()
      which needs desc as an argument. The real context to be imported is then
      derived from shash_desc_ctx(desc).
      
      desc is a sub-field of struct mcryptd_hash_request_ctx, which is again a
      sub-field of the bigger blob allocated by ahash_request_alloc(). The entire
      blob's size is set in sha1_mb_async_init_tfm(). So a better version is as
      follows:
      
      (just removed the call to crypto_shash_init())
      
      >From 4bcb73adbef99aada94c49f352063619aa24d43d Mon Sep 17 00:00:00 2001
      From: Rui Wang <rui.y.wang@intel.com>
      Date: Mon, 14 Dec 2015 17:22:13 +0800
      Subject: [PATCH v2 1/4] crypto x86/sha1_mb: Fix load failure
      
      modprobe sha1_mb fails with the following message:
      
      modprobe: ERROR: could not insert 'sha1_mb': No such device
      
      It is because it needs to set its statesize and implement its
      import() and export() interface.
      
      v2: remove redundant call to crypto_shash_init()
      Signed-off-by: NRui Wang <rui.y.wang@intel.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      fd09967b
  4. 30 1月, 2016 1 次提交
  5. 27 1月, 2016 1 次提交
  6. 25 1月, 2016 1 次提交
  7. 19 12月, 2015 1 次提交
  8. 04 12月, 2015 1 次提交
    • W
      crypto: ghash-clmulni - Fix load failure · 3a020a72
      Wang, Rui Y 提交于
      ghash_clmulni_intel fails to load on Linux 4.3+ with the following message:
      "modprobe: ERROR: could not insert 'ghash_clmulni_intel': Invalid argument"
      
      After 8996eafd ("crypto: ahash - ensure statesize is non-zero") all ahash
      drivers are required to implement import()/export(), and must have a non-
      zero statesize.
      
      This patch has been tested with the algif_hash interface. The calculated
      digest values, after several rounds of import()s and export()s, match those
      calculated by tcrypt.
      Signed-off-by: NRui Wang <rui.y.wang@intel.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      3a020a72
  9. 08 10月, 2015 1 次提交
  10. 24 9月, 2015 1 次提交
  11. 21 9月, 2015 8 次提交
  12. 14 9月, 2015 1 次提交
    • D
      x86/fpu: Rename XSAVE macros · d91cab78
      Dave Hansen 提交于
      There are two concepts that have some confusing naming:
       1. Extended State Component numbers (currently called
          XFEATURE_BIT_*)
       2. Extended State Component masks (currently called XSTATE_*)
      
      The numbers are (currently) from 0-9.  State component 3 is the
      bounds registers for MPX, for instance.
      
      But when we want to enable "state component 3", we go set a bit
      in XCR0.  The bit we set is 1<<3.  We can check to see if a
      state component feature is enabled by looking at its bit.
      
      The current 'xfeature_bit's are at best xfeature bit _numbers_.
      Calling them bits is at best inconsistent with ending the enum
      list with 'XFEATURES_NR_MAX'.
      
      This patch renames the enum to be 'xfeature'.  These also
      happen to be what the Intel documentation calls a "state
      component".
      
      We also want to differentiate these from the "XSTATE_*" macros.
      The "XSTATE_*" macros are a mask, and we rename them to match.
      
      These macros are reasonably widely used so this patch is a
      wee bit big, but this really is just a rename.
      
      The only non-mechanical part of this is the
      
      	s/XSTATE_EXTEND_MASK/XFEATURE_MASK_EXTEND/
      
      We need a better name for it, but that's another patch.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: dave@sr71.net
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/20150902233126.38653250@viggo.jf.intel.com
      [ Ported to v4.3-rc1. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      d91cab78
  13. 04 9月, 2015 1 次提交
  14. 17 8月, 2015 1 次提交
  15. 17 7月, 2015 6 次提交
    • M
      crypto: poly1305 - Add a four block AVX2 variant for x86_64 · b1ccc8f4
      Martin Willi 提交于
      Extends the x86_64 Poly1305 authenticator by a function processing four
      consecutive Poly1305 blocks in parallel using AVX2 instructions.
      
      For large messages, throughput increases by ~15-45% compared to two
      block SSE2:
      
      testing speed of poly1305 (poly1305-simd)
      test  0 (   96 byte blocks,   16 bytes per update,   6 updates): 3809514 opers/sec,  365713411 bytes/sec
      test  1 (   96 byte blocks,   32 bytes per update,   3 updates): 5973423 opers/sec,  573448627 bytes/sec
      test  2 (   96 byte blocks,   96 bytes per update,   1 updates): 9446779 opers/sec,  906890803 bytes/sec
      test  3 (  288 byte blocks,   16 bytes per update,  18 updates): 1364814 opers/sec,  393066691 bytes/sec
      test  4 (  288 byte blocks,   32 bytes per update,   9 updates): 2045780 opers/sec,  589184697 bytes/sec
      test  5 (  288 byte blocks,  288 bytes per update,   1 updates): 3711946 opers/sec, 1069040592 bytes/sec
      test  6 ( 1056 byte blocks,   32 bytes per update,  33 updates):  573686 opers/sec,  605812732 bytes/sec
      test  7 ( 1056 byte blocks, 1056 bytes per update,   1 updates): 1647802 opers/sec, 1740079440 bytes/sec
      test  8 ( 2080 byte blocks,   32 bytes per update,  65 updates):  292970 opers/sec,  609378224 bytes/sec
      test  9 ( 2080 byte blocks, 2080 bytes per update,   1 updates):  943229 opers/sec, 1961916528 bytes/sec
      test 10 ( 4128 byte blocks, 4128 bytes per update,   1 updates):  494623 opers/sec, 2041804569 bytes/sec
      test 11 ( 8224 byte blocks, 8224 bytes per update,   1 updates):  254045 opers/sec, 2089271014 bytes/sec
      
      testing speed of poly1305 (poly1305-simd)
      test  0 (   96 byte blocks,   16 bytes per update,   6 updates): 3826224 opers/sec,  367317552 bytes/sec
      test  1 (   96 byte blocks,   32 bytes per update,   3 updates): 5948638 opers/sec,  571069267 bytes/sec
      test  2 (   96 byte blocks,   96 bytes per update,   1 updates): 9439110 opers/sec,  906154627 bytes/sec
      test  3 (  288 byte blocks,   16 bytes per update,  18 updates): 1367756 opers/sec,  393913872 bytes/sec
      test  4 (  288 byte blocks,   32 bytes per update,   9 updates): 2056881 opers/sec,  592381958 bytes/sec
      test  5 (  288 byte blocks,  288 bytes per update,   1 updates): 3711153 opers/sec, 1068812179 bytes/sec
      test  6 ( 1056 byte blocks,   32 bytes per update,  33 updates):  574940 opers/sec,  607136745 bytes/sec
      test  7 ( 1056 byte blocks, 1056 bytes per update,   1 updates): 1948830 opers/sec, 2057964585 bytes/sec
      test  8 ( 2080 byte blocks,   32 bytes per update,  65 updates):  293308 opers/sec,  610082096 bytes/sec
      test  9 ( 2080 byte blocks, 2080 bytes per update,   1 updates): 1235224 opers/sec, 2569267792 bytes/sec
      test 10 ( 4128 byte blocks, 4128 bytes per update,   1 updates):  684405 opers/sec, 2825226316 bytes/sec
      test 11 ( 8224 byte blocks, 8224 bytes per update,   1 updates):  367101 opers/sec, 3019039446 bytes/sec
      
      Benchmark results from a Core i5-4670T.
      Signed-off-by: NMartin Willi <martin@strongswan.org>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      b1ccc8f4
    • M
      crypto: poly1305 - Add a two block SSE2 variant for x86_64 · da35b22d
      Martin Willi 提交于
      Extends the x86_64 SSE2 Poly1305 authenticator by a function processing two
      consecutive Poly1305 blocks in parallel using a derived key r^2. Loop
      unrolling can be more effectively mapped to SSE instructions, further
      increasing throughput.
      
      For large messages, throughput increases by ~45-65% compared to single
      block SSE2:
      
      testing speed of poly1305 (poly1305-simd)
      test  0 (   96 byte blocks,   16 bytes per update,   6 updates): 3790063 opers/sec,  363846076 bytes/sec
      test  1 (   96 byte blocks,   32 bytes per update,   3 updates): 5913378 opers/sec,  567684355 bytes/sec
      test  2 (   96 byte blocks,   96 bytes per update,   1 updates): 9352574 opers/sec,  897847104 bytes/sec
      test  3 (  288 byte blocks,   16 bytes per update,  18 updates): 1362145 opers/sec,  392297990 bytes/sec
      test  4 (  288 byte blocks,   32 bytes per update,   9 updates): 2007075 opers/sec,  578037628 bytes/sec
      test  5 (  288 byte blocks,  288 bytes per update,   1 updates): 3709811 opers/sec, 1068425798 bytes/sec
      test  6 ( 1056 byte blocks,   32 bytes per update,  33 updates):  566272 opers/sec,  597984182 bytes/sec
      test  7 ( 1056 byte blocks, 1056 bytes per update,   1 updates): 1111657 opers/sec, 1173910108 bytes/sec
      test  8 ( 2080 byte blocks,   32 bytes per update,  65 updates):  288857 opers/sec,  600823808 bytes/sec
      test  9 ( 2080 byte blocks, 2080 bytes per update,   1 updates):  590746 opers/sec, 1228751888 bytes/sec
      test 10 ( 4128 byte blocks, 4128 bytes per update,   1 updates):  301825 opers/sec, 1245936902 bytes/sec
      test 11 ( 8224 byte blocks, 8224 bytes per update,   1 updates):  153075 opers/sec, 1258896201 bytes/sec
      
      testing speed of poly1305 (poly1305-simd)
      test  0 (   96 byte blocks,   16 bytes per update,   6 updates): 3809514 opers/sec,  365713411 bytes/sec
      test  1 (   96 byte blocks,   32 bytes per update,   3 updates): 5973423 opers/sec,  573448627 bytes/sec
      test  2 (   96 byte blocks,   96 bytes per update,   1 updates): 9446779 opers/sec,  906890803 bytes/sec
      test  3 (  288 byte blocks,   16 bytes per update,  18 updates): 1364814 opers/sec,  393066691 bytes/sec
      test  4 (  288 byte blocks,   32 bytes per update,   9 updates): 2045780 opers/sec,  589184697 bytes/sec
      test  5 (  288 byte blocks,  288 bytes per update,   1 updates): 3711946 opers/sec, 1069040592 bytes/sec
      test  6 ( 1056 byte blocks,   32 bytes per update,  33 updates):  573686 opers/sec,  605812732 bytes/sec
      test  7 ( 1056 byte blocks, 1056 bytes per update,   1 updates): 1647802 opers/sec, 1740079440 bytes/sec
      test  8 ( 2080 byte blocks,   32 bytes per update,  65 updates):  292970 opers/sec,  609378224 bytes/sec
      test  9 ( 2080 byte blocks, 2080 bytes per update,   1 updates):  943229 opers/sec, 1961916528 bytes/sec
      test 10 ( 4128 byte blocks, 4128 bytes per update,   1 updates):  494623 opers/sec, 2041804569 bytes/sec
      test 11 ( 8224 byte blocks, 8224 bytes per update,   1 updates):  254045 opers/sec, 2089271014 bytes/sec
      
      Benchmark results from a Core i5-4670T.
      Signed-off-by: NMartin Willi <martin@strongswan.org>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      da35b22d
    • M
      crypto: poly1305 - Add a SSE2 SIMD variant for x86_64 · c70f4abe
      Martin Willi 提交于
      Implements an x86_64 assembler driver for the Poly1305 authenticator. This
      single block variant holds the 130-bit integer in 5 32-bit words, but uses
      SSE to do two multiplications/additions in parallel.
      
      When calling updates with small blocks, the overhead for kernel_fpu_begin/
      kernel_fpu_end() negates the perfmance gain. We therefore use the
      poly1305-generic fallback for small updates.
      
      For large messages, throughput increases by ~5-10% compared to
      poly1305-generic:
      
      testing speed of poly1305 (poly1305-generic)
      test  0 (   96 byte blocks,   16 bytes per update,   6 updates): 4080026 opers/sec,  391682496 bytes/sec
      test  1 (   96 byte blocks,   32 bytes per update,   3 updates): 6221094 opers/sec,  597225024 bytes/sec
      test  2 (   96 byte blocks,   96 bytes per update,   1 updates): 9609750 opers/sec,  922536057 bytes/sec
      test  3 (  288 byte blocks,   16 bytes per update,  18 updates): 1459379 opers/sec,  420301267 bytes/sec
      test  4 (  288 byte blocks,   32 bytes per update,   9 updates): 2115179 opers/sec,  609171609 bytes/sec
      test  5 (  288 byte blocks,  288 bytes per update,   1 updates): 3729874 opers/sec, 1074203856 bytes/sec
      test  6 ( 1056 byte blocks,   32 bytes per update,  33 updates):  593000 opers/sec,  626208000 bytes/sec
      test  7 ( 1056 byte blocks, 1056 bytes per update,   1 updates): 1081536 opers/sec, 1142102332 bytes/sec
      test  8 ( 2080 byte blocks,   32 bytes per update,  65 updates):  302077 opers/sec,  628320576 bytes/sec
      test  9 ( 2080 byte blocks, 2080 bytes per update,   1 updates):  554384 opers/sec, 1153120176 bytes/sec
      test 10 ( 4128 byte blocks, 4128 bytes per update,   1 updates):  278715 opers/sec, 1150536345 bytes/sec
      test 11 ( 8224 byte blocks, 8224 bytes per update,   1 updates):  140202 opers/sec, 1153022070 bytes/sec
      
      testing speed of poly1305 (poly1305-simd)
      test  0 (   96 byte blocks,   16 bytes per update,   6 updates): 3790063 opers/sec,  363846076 bytes/sec
      test  1 (   96 byte blocks,   32 bytes per update,   3 updates): 5913378 opers/sec,  567684355 bytes/sec
      test  2 (   96 byte blocks,   96 bytes per update,   1 updates): 9352574 opers/sec,  897847104 bytes/sec
      test  3 (  288 byte blocks,   16 bytes per update,  18 updates): 1362145 opers/sec,  392297990 bytes/sec
      test  4 (  288 byte blocks,   32 bytes per update,   9 updates): 2007075 opers/sec,  578037628 bytes/sec
      test  5 (  288 byte blocks,  288 bytes per update,   1 updates): 3709811 opers/sec, 1068425798 bytes/sec
      test  6 ( 1056 byte blocks,   32 bytes per update,  33 updates):  566272 opers/sec,  597984182 bytes/sec
      test  7 ( 1056 byte blocks, 1056 bytes per update,   1 updates): 1111657 opers/sec, 1173910108 bytes/sec
      test  8 ( 2080 byte blocks,   32 bytes per update,  65 updates):  288857 opers/sec,  600823808 bytes/sec
      test  9 ( 2080 byte blocks, 2080 bytes per update,   1 updates):  590746 opers/sec, 1228751888 bytes/sec
      test 10 ( 4128 byte blocks, 4128 bytes per update,   1 updates):  301825 opers/sec, 1245936902 bytes/sec
      test 11 ( 8224 byte blocks, 8224 bytes per update,   1 updates):  153075 opers/sec, 1258896201 bytes/sec
      
      Benchmark results from a Core i5-4670T.
      Signed-off-by: NMartin Willi <martin@strongswan.org>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      c70f4abe
    • M
      crypto: chacha20 - Add an eight block AVX2 variant for x86_64 · 3d1e93cd
      Martin Willi 提交于
      Extends the x86_64 ChaCha20 implementation by a function processing eight
      ChaCha20 blocks in parallel using AVX2.
      
      For large messages, throughput increases by ~55-70% compared to four block
      SSSE3:
      
      testing speed of chacha20 (chacha20-simd) encryption
      test 0 (256 bit key, 16 byte blocks): 42249230 operations in 10 seconds (675987680 bytes)
      test 1 (256 bit key, 64 byte blocks): 46441641 operations in 10 seconds (2972265024 bytes)
      test 2 (256 bit key, 256 byte blocks): 33028112 operations in 10 seconds (8455196672 bytes)
      test 3 (256 bit key, 1024 byte blocks): 11568759 operations in 10 seconds (11846409216 bytes)
      test 4 (256 bit key, 8192 byte blocks): 1448761 operations in 10 seconds (11868250112 bytes)
      
      testing speed of chacha20 (chacha20-simd) encryption
      test 0 (256 bit key, 16 byte blocks): 41999675 operations in 10 seconds (671994800 bytes)
      test 1 (256 bit key, 64 byte blocks): 45805908 operations in 10 seconds (2931578112 bytes)
      test 2 (256 bit key, 256 byte blocks): 32814947 operations in 10 seconds (8400626432 bytes)
      test 3 (256 bit key, 1024 byte blocks): 19777167 operations in 10 seconds (20251819008 bytes)
      test 4 (256 bit key, 8192 byte blocks): 2279321 operations in 10 seconds (18672197632 bytes)
      
      Benchmark results from a Core i5-4670T.
      Signed-off-by: NMartin Willi <martin@strongswan.org>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      3d1e93cd
    • M
      crypto: chacha20 - Add a four block SSSE3 variant for x86_64 · 274f938e
      Martin Willi 提交于
      Extends the x86_64 SSSE3 ChaCha20 implementation by a function processing
      four ChaCha20 blocks in parallel. This avoids the word shuffling needed
      in the single block variant, further increasing throughput.
      
      For large messages, throughput increases by ~110% compared to single block
      SSSE3:
      
      testing speed of chacha20 (chacha20-simd) encryption
      test 0 (256 bit key, 16 byte blocks): 43141886 operations in 10 seconds (690270176 bytes)
      test 1 (256 bit key, 64 byte blocks): 46845874 operations in 10 seconds (2998135936 bytes)
      test 2 (256 bit key, 256 byte blocks): 18458512 operations in 10 seconds (4725379072 bytes)
      test 3 (256 bit key, 1024 byte blocks): 5360533 operations in 10 seconds (5489185792 bytes)
      test 4 (256 bit key, 8192 byte blocks): 692846 operations in 10 seconds (5675794432 bytes)
      
      testing speed of chacha20 (chacha20-simd) encryption
      test 0 (256 bit key, 16 byte blocks): 42249230 operations in 10 seconds (675987680 bytes)
      test 1 (256 bit key, 64 byte blocks): 46441641 operations in 10 seconds (2972265024 bytes)
      test 2 (256 bit key, 256 byte blocks): 33028112 operations in 10 seconds (8455196672 bytes)
      test 3 (256 bit key, 1024 byte blocks): 11568759 operations in 10 seconds (11846409216 bytes)
      test 4 (256 bit key, 8192 byte blocks): 1448761 operations in 10 seconds (11868250112 bytes)
      
      Benchmark results from a Core i5-4670T.
      Signed-off-by: NMartin Willi <martin@strongswan.org>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      274f938e
    • M
      crypto: chacha20 - Add a SSSE3 SIMD variant for x86_64 · c9320b6d
      Martin Willi 提交于
      Implements an x86_64 assembler driver for the ChaCha20 stream cipher. This
      single block variant works on a single state matrix using SSE instructions.
      It requires SSSE3 due the use of pshufb for efficient 8/16-bit rotate
      operations.
      
      For large messages, throughput increases by ~65% compared to
      chacha20-generic:
      
      testing speed of chacha20 (chacha20-generic) encryption
      test 0 (256 bit key, 16 byte blocks): 45089207 operations in 10 seconds (721427312 bytes)
      test 1 (256 bit key, 64 byte blocks): 43839521 operations in 10 seconds (2805729344 bytes)
      test 2 (256 bit key, 256 byte blocks): 12702056 operations in 10 seconds (3251726336 bytes)
      test 3 (256 bit key, 1024 byte blocks): 3371173 operations in 10 seconds (3452081152 bytes)
      test 4 (256 bit key, 8192 byte blocks): 422468 operations in 10 seconds (3460857856 bytes)
      
      testing speed of chacha20 (chacha20-simd) encryption
      test 0 (256 bit key, 16 byte blocks): 43141886 operations in 10 seconds (690270176 bytes)
      test 1 (256 bit key, 64 byte blocks): 46845874 operations in 10 seconds (2998135936 bytes)
      test 2 (256 bit key, 256 byte blocks): 18458512 operations in 10 seconds (4725379072 bytes)
      test 3 (256 bit key, 1024 byte blocks): 5360533 operations in 10 seconds (5489185792 bytes)
      test 4 (256 bit key, 8192 byte blocks): 692846 operations in 10 seconds (5675794432 bytes)
      
      Benchmark results from a Core i5-4670T.
      Signed-off-by: NMartin Willi <martin@strongswan.org>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      c9320b6d
  16. 14 7月, 2015 1 次提交
  17. 29 6月, 2015 1 次提交
  18. 15 6月, 2015 1 次提交
    • J
      crypto: aesni - fix crypto_fpu_exit() section mismatch · de1e0087
      Jeremiah Mahler 提交于
      The '__init aesni_init()' function calls the '__exit crypto_fpu_exit()'
      function directly.  Since they are in different sections, this generates
      a warning.
      
        make CONFIG_DEBUG_SECTION_MISMATCH=y
        ...
        WARNING: arch/x86/crypto/aesni-intel.o(.init.text+0x12b): Section
        mismatch in reference from the function init_module() to the function
        .exit.text:crypto_fpu_exit()
        The function __init init_module() references
        a function __exit crypto_fpu_exit().
        This is often seen when error handling in the init function
        uses functionality in the exit path.
        The fix is often to remove the __exit annotation of
        crypto_fpu_exit() so it may be used outside an exit section.
      
      Fix the warning by removing the __exit annotation.
      Signed-off-by: NJeremiah Mahler <jmmahler@gmail.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      de1e0087
  19. 03 6月, 2015 2 次提交
  20. 22 5月, 2015 1 次提交
    • I
      x86/fpu, crypto: Fix AVX2 feature tests · b54b4bbb
      Ingo Molnar 提交于
      For some CPU models I broke the AVX2 feature detection in:
      
        7bc371fa ("x86/fpu, crypto x86/camellia_aesni_avx2: Simplify the camellia_aesni_init() xfeature checks")
        534ff06e ("x86/fpu, crypto x86/serpent_avx2: Simplify the init() xfeature checks")
      
      ... because I did not realize that it's possible for a CPU to support
      the xstate necessary for AVX2 execution (XSTATE_YMM), but not have
      the AVX2 instructions themselves.
      
      Restore the necessary CPUID checks as well.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      b54b4bbb
  21. 19 5月, 2015 3 次提交
    • I
      x86/fpu, crypto x86/sha1_mb: Remove FPU internal headers from sha1_mb.c · 57dd083e
      Ingo Molnar 提交于
      This file only uses the public FPU APIs, so remove the xcr.h, fpu/xstate.h
      and fpu/internal.h headers and add the fpu/api.h include.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      57dd083e
    • I
      x86/fpu, crypto x86/serpent_avx2: Simplify the init() xfeature checks · 534ff06e
      Ingo Molnar 提交于
      Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.
      
      This has the following advantages to the driver:
      
       - Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.
      
       - Removes detection complexity from the driver, no more raw XGETBV instruction
      
       - Shrinks the code a bit.
      
       - Standardizes feature name error message printouts across drivers
      
      There are also advantages to the x86 FPU code: once all drivers
      are decoupled from internals we can move them out of common
      headers and we'll also be able to remove xcr.h.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      534ff06e
    • I
      x86/fpu, crypto x86/sha1_ssse3: Simplify the sha1_ssse3_mod_init() xfeature checks · d1e50966
      Ingo Molnar 提交于
      Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.
      
      This has the following advantages to the driver:
      
       - Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.
      
       - Removes detection complexity from the driver, no more raw XGETBV instruction
      
       - Shrinks the code a bit.
      
       - Standardizes feature name error message printouts across drivers
      
      There are also advantages to the x86 FPU code: once all drivers
      are decoupled from internals we can move them out of common
      headers and we'll also be able to remove xcr.h.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      d1e50966