1. 16 4月, 2016 1 次提交
    • D
      x86: Fix non-static inlines · a3819e3e
      Denys Vlasenko 提交于
      Four instances of incorrect usage of non-static "inline" crept up
      in arch/x86, all trivial; cleaning them up:
      
      EVT_TO_HPET_DEV() - made static, it is only used in kernel/hpet.c
      
      Debug version of check_iommu_entries() is an __init function.
      Non-debug dummy empty version of it is declared "inline" instead -
      which doesn't help to eliminate it (the caller is in a different unit,
      inlining doesn't happen).
      Switch to non-inlined __init function, which does eliminate it
      (by discarding it as part of __init section).
      
      crypto/sha-mb/sha1_mb.c: looks like they just forgot to add "static"
      to their two internal inlines, which emitted two unused functions into
      vmlinux.
      
            text     data      bss       dec     hex filename
        95903394 20860288 35991552 152755234 91adc22 vmlinux_before
        95903266 20860288 35991552 152755106 91adba2 vmlinux
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1460739626-12179-1-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a3819e3e
  2. 15 4月, 2016 1 次提交
  3. 13 4月, 2016 3 次提交
  4. 31 3月, 2016 2 次提交
  5. 24 2月, 2016 5 次提交
    • J
      x86/asm/crypto: Create stack frames in crypto functions · 8691ccd7
      Josh Poimboeuf 提交于
      The crypto code has several callable non-leaf functions which don't
      honor CONFIG_FRAME_POINTER, which can result in bad stack traces.
      
      Create stack frames for them when CONFIG_FRAME_POINTER is enabled.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chris J Arges <chris.j.arges@canonical.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: live-patching@vger.kernel.org
      Link: http://lkml.kernel.org/r/6c20192bcf1102ae18ae5a242cabf30ce9b29895.1453405861.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8691ccd7
    • J
      x86/asm/crypto: Don't use RBP as a scratch register · 68874ac3
      Josh Poimboeuf 提交于
      The frame pointer (RBP) is getting clobbered in
      sha1_mb_mgr_submit_avx2() before a function call, which can mess up
      stack traces.  Use R12 instead.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chris J Arges <chris.j.arges@canonical.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: live-patching@vger.kernel.org
      Link: http://lkml.kernel.org/r/15a3eb7ebe68e37755927915f45e4f0bde4d18c5.1453405861.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      68874ac3
    • J
      x86/asm/crypto: Simplify stack usage in sha-mb functions · aec4d0e3
      Josh Poimboeuf 提交于
      sha1_mb_mgr_flush_avx2() and sha1_mb_mgr_submit_avx2() both allocate a
      lot of stack space which is never used.  Also, many of the registers
      being saved aren't being clobbered so there's no need to save them.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chris J Arges <chris.j.arges@canonical.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: live-patching@vger.kernel.org
      Link: http://lkml.kernel.org/r/9402e4d87580d6b2376ed95f67b84bdcce3c830e.1453405861.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      aec4d0e3
    • J
      x86/asm/crypto: Move jump_table to .rodata section · f66f6191
      Josh Poimboeuf 提交于
      stacktool reports the following warning:
      
        stacktool: arch/x86/crypto/crc32c-pcl-intel-asm_64.o: crc_pcl()+0x11dd: can't decode instruction
      
      It gets confused when trying to decode jump_table data.  Move jump_table
      to the .rodata section which is a more appropriate home for read-only
      data.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chris J Arges <chris.j.arges@canonical.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: live-patching@vger.kernel.org
      Link: http://lkml.kernel.org/r/1dbf80c097bb9d89c0cbddc01a815ada690e3b32.1453405861.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f66f6191
    • J
      x86/asm/crypto: Move .Lbswap_mask data to .rodata section · 1253cab8
      Josh Poimboeuf 提交于
      stacktool reports the following warning:
      
        stacktool: arch/x86/crypto/aesni-intel_asm.o: _aesni_inc_init(): can't find starting instruction
      
      stacktool gets confused when it tries to disassemble the following data
      in the .text section:
      
        .Lbswap_mask:
                .byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
      
      Move it to .rodata which is a more appropriate section for read-only
      data.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chris J Arges <chris.j.arges@canonical.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: live-patching@vger.kernel.org
      Link: http://lkml.kernel.org/r/b6a2f3f8bda705143e127c025edb2b53c86e6eb4.1453405861.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1253cab8
  6. 17 2月, 2016 1 次提交
    • S
      crypto: xts - consolidate sanity check for keys · 28856a9e
      Stephan Mueller 提交于
      The patch centralizes the XTS key check logic into the service function
      xts_check_key which is invoked from the different XTS implementations.
      With this, the XTS implementations in ARM, ARM64, PPC and S390 have now
      a sanity check for the XTS keys similar to the other arches.
      
      In addition, this service function received a check to ensure that the
      key != the tweak key which is mandated by FIPS 140-2 IG A.9. As the
      check is not present in the standards defining XTS, it is only enforced
      in FIPS mode of the kernel.
      Signed-off-by: NStephan Mueller <smueller@chronox.de>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      28856a9e
  7. 06 2月, 2016 1 次提交
    • W
      crypto: sha-mb - Fix load failure · fd09967b
      Wang, Rui Y 提交于
      On  Monday, February 1, 2016 4:18 PM, Herbert Xu wrote:
      >
      > On Wed, Jan 27, 2016 at 05:08:35PM +0800, Rui Wang wrote:
      >>
      >> +static int sha1_mb_async_import(struct ahash_request *req, const void
      >> +*in) {
      >> +	struct ahash_request *mcryptd_req = ahash_request_ctx(req);
      >> +	struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
      >> +	struct sha1_mb_ctx *ctx = crypto_ahash_ctx(tfm);
      >> +	struct mcryptd_ahash *mcryptd_tfm = ctx->mcryptd_tfm;
      >> +	struct crypto_shash *child = mcryptd_ahash_child(mcryptd_tfm);
      >> +	struct mcryptd_hash_request_ctx *rctx;
      >> +	struct shash_desc *desc;
      >> +	int err;
      >> +
      >> +	memcpy(mcryptd_req, req, sizeof(*req));
      >> +	ahash_request_set_tfm(mcryptd_req, &mcryptd_tfm->base);
      >> +	rctx = ahash_request_ctx(mcryptd_req);
      >> +	desc = &rctx->desc;
      >> +	desc->tfm = child;
      >> +	desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP;
      >> +
      >> +	err = crypto_shash_init(desc);
      >> +	if (err)
      >> +		return err;
      >
      > What is this desc for?
      
      Hi Herbert,
      
      Yeah I just realized that the call to crypto_shash_init() isn't necessary
      here. What it does is overwritten by crypto_ahash_import(). But this desc
      still needs to be initialized here because it's newly allocated by
      ahash_request_alloc(). We eventually calls the shash version of import()
      which needs desc as an argument. The real context to be imported is then
      derived from shash_desc_ctx(desc).
      
      desc is a sub-field of struct mcryptd_hash_request_ctx, which is again a
      sub-field of the bigger blob allocated by ahash_request_alloc(). The entire
      blob's size is set in sha1_mb_async_init_tfm(). So a better version is as
      follows:
      
      (just removed the call to crypto_shash_init())
      
      >From 4bcb73adbef99aada94c49f352063619aa24d43d Mon Sep 17 00:00:00 2001
      From: Rui Wang <rui.y.wang@intel.com>
      Date: Mon, 14 Dec 2015 17:22:13 +0800
      Subject: [PATCH v2 1/4] crypto x86/sha1_mb: Fix load failure
      
      modprobe sha1_mb fails with the following message:
      
      modprobe: ERROR: could not insert 'sha1_mb': No such device
      
      It is because it needs to set its statesize and implement its
      import() and export() interface.
      
      v2: remove redundant call to crypto_shash_init()
      Signed-off-by: NRui Wang <rui.y.wang@intel.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      fd09967b
  8. 30 1月, 2016 1 次提交
  9. 27 1月, 2016 1 次提交
  10. 25 1月, 2016 1 次提交
  11. 19 12月, 2015 1 次提交
  12. 04 12月, 2015 1 次提交
    • W
      crypto: ghash-clmulni - Fix load failure · 3a020a72
      Wang, Rui Y 提交于
      ghash_clmulni_intel fails to load on Linux 4.3+ with the following message:
      "modprobe: ERROR: could not insert 'ghash_clmulni_intel': Invalid argument"
      
      After 8996eafd ("crypto: ahash - ensure statesize is non-zero") all ahash
      drivers are required to implement import()/export(), and must have a non-
      zero statesize.
      
      This patch has been tested with the algif_hash interface. The calculated
      digest values, after several rounds of import()s and export()s, match those
      calculated by tcrypt.
      Signed-off-by: NRui Wang <rui.y.wang@intel.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      3a020a72
  13. 08 10月, 2015 1 次提交
  14. 24 9月, 2015 1 次提交
  15. 21 9月, 2015 8 次提交
  16. 14 9月, 2015 1 次提交
    • D
      x86/fpu: Rename XSAVE macros · d91cab78
      Dave Hansen 提交于
      There are two concepts that have some confusing naming:
       1. Extended State Component numbers (currently called
          XFEATURE_BIT_*)
       2. Extended State Component masks (currently called XSTATE_*)
      
      The numbers are (currently) from 0-9.  State component 3 is the
      bounds registers for MPX, for instance.
      
      But when we want to enable "state component 3", we go set a bit
      in XCR0.  The bit we set is 1<<3.  We can check to see if a
      state component feature is enabled by looking at its bit.
      
      The current 'xfeature_bit's are at best xfeature bit _numbers_.
      Calling them bits is at best inconsistent with ending the enum
      list with 'XFEATURES_NR_MAX'.
      
      This patch renames the enum to be 'xfeature'.  These also
      happen to be what the Intel documentation calls a "state
      component".
      
      We also want to differentiate these from the "XSTATE_*" macros.
      The "XSTATE_*" macros are a mask, and we rename them to match.
      
      These macros are reasonably widely used so this patch is a
      wee bit big, but this really is just a rename.
      
      The only non-mechanical part of this is the
      
      	s/XSTATE_EXTEND_MASK/XFEATURE_MASK_EXTEND/
      
      We need a better name for it, but that's another patch.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: dave@sr71.net
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/20150902233126.38653250@viggo.jf.intel.com
      [ Ported to v4.3-rc1. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      d91cab78
  17. 04 9月, 2015 1 次提交
  18. 17 8月, 2015 1 次提交
  19. 17 7月, 2015 6 次提交
    • M
      crypto: poly1305 - Add a four block AVX2 variant for x86_64 · b1ccc8f4
      Martin Willi 提交于
      Extends the x86_64 Poly1305 authenticator by a function processing four
      consecutive Poly1305 blocks in parallel using AVX2 instructions.
      
      For large messages, throughput increases by ~15-45% compared to two
      block SSE2:
      
      testing speed of poly1305 (poly1305-simd)
      test  0 (   96 byte blocks,   16 bytes per update,   6 updates): 3809514 opers/sec,  365713411 bytes/sec
      test  1 (   96 byte blocks,   32 bytes per update,   3 updates): 5973423 opers/sec,  573448627 bytes/sec
      test  2 (   96 byte blocks,   96 bytes per update,   1 updates): 9446779 opers/sec,  906890803 bytes/sec
      test  3 (  288 byte blocks,   16 bytes per update,  18 updates): 1364814 opers/sec,  393066691 bytes/sec
      test  4 (  288 byte blocks,   32 bytes per update,   9 updates): 2045780 opers/sec,  589184697 bytes/sec
      test  5 (  288 byte blocks,  288 bytes per update,   1 updates): 3711946 opers/sec, 1069040592 bytes/sec
      test  6 ( 1056 byte blocks,   32 bytes per update,  33 updates):  573686 opers/sec,  605812732 bytes/sec
      test  7 ( 1056 byte blocks, 1056 bytes per update,   1 updates): 1647802 opers/sec, 1740079440 bytes/sec
      test  8 ( 2080 byte blocks,   32 bytes per update,  65 updates):  292970 opers/sec,  609378224 bytes/sec
      test  9 ( 2080 byte blocks, 2080 bytes per update,   1 updates):  943229 opers/sec, 1961916528 bytes/sec
      test 10 ( 4128 byte blocks, 4128 bytes per update,   1 updates):  494623 opers/sec, 2041804569 bytes/sec
      test 11 ( 8224 byte blocks, 8224 bytes per update,   1 updates):  254045 opers/sec, 2089271014 bytes/sec
      
      testing speed of poly1305 (poly1305-simd)
      test  0 (   96 byte blocks,   16 bytes per update,   6 updates): 3826224 opers/sec,  367317552 bytes/sec
      test  1 (   96 byte blocks,   32 bytes per update,   3 updates): 5948638 opers/sec,  571069267 bytes/sec
      test  2 (   96 byte blocks,   96 bytes per update,   1 updates): 9439110 opers/sec,  906154627 bytes/sec
      test  3 (  288 byte blocks,   16 bytes per update,  18 updates): 1367756 opers/sec,  393913872 bytes/sec
      test  4 (  288 byte blocks,   32 bytes per update,   9 updates): 2056881 opers/sec,  592381958 bytes/sec
      test  5 (  288 byte blocks,  288 bytes per update,   1 updates): 3711153 opers/sec, 1068812179 bytes/sec
      test  6 ( 1056 byte blocks,   32 bytes per update,  33 updates):  574940 opers/sec,  607136745 bytes/sec
      test  7 ( 1056 byte blocks, 1056 bytes per update,   1 updates): 1948830 opers/sec, 2057964585 bytes/sec
      test  8 ( 2080 byte blocks,   32 bytes per update,  65 updates):  293308 opers/sec,  610082096 bytes/sec
      test  9 ( 2080 byte blocks, 2080 bytes per update,   1 updates): 1235224 opers/sec, 2569267792 bytes/sec
      test 10 ( 4128 byte blocks, 4128 bytes per update,   1 updates):  684405 opers/sec, 2825226316 bytes/sec
      test 11 ( 8224 byte blocks, 8224 bytes per update,   1 updates):  367101 opers/sec, 3019039446 bytes/sec
      
      Benchmark results from a Core i5-4670T.
      Signed-off-by: NMartin Willi <martin@strongswan.org>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      b1ccc8f4
    • M
      crypto: poly1305 - Add a two block SSE2 variant for x86_64 · da35b22d
      Martin Willi 提交于
      Extends the x86_64 SSE2 Poly1305 authenticator by a function processing two
      consecutive Poly1305 blocks in parallel using a derived key r^2. Loop
      unrolling can be more effectively mapped to SSE instructions, further
      increasing throughput.
      
      For large messages, throughput increases by ~45-65% compared to single
      block SSE2:
      
      testing speed of poly1305 (poly1305-simd)
      test  0 (   96 byte blocks,   16 bytes per update,   6 updates): 3790063 opers/sec,  363846076 bytes/sec
      test  1 (   96 byte blocks,   32 bytes per update,   3 updates): 5913378 opers/sec,  567684355 bytes/sec
      test  2 (   96 byte blocks,   96 bytes per update,   1 updates): 9352574 opers/sec,  897847104 bytes/sec
      test  3 (  288 byte blocks,   16 bytes per update,  18 updates): 1362145 opers/sec,  392297990 bytes/sec
      test  4 (  288 byte blocks,   32 bytes per update,   9 updates): 2007075 opers/sec,  578037628 bytes/sec
      test  5 (  288 byte blocks,  288 bytes per update,   1 updates): 3709811 opers/sec, 1068425798 bytes/sec
      test  6 ( 1056 byte blocks,   32 bytes per update,  33 updates):  566272 opers/sec,  597984182 bytes/sec
      test  7 ( 1056 byte blocks, 1056 bytes per update,   1 updates): 1111657 opers/sec, 1173910108 bytes/sec
      test  8 ( 2080 byte blocks,   32 bytes per update,  65 updates):  288857 opers/sec,  600823808 bytes/sec
      test  9 ( 2080 byte blocks, 2080 bytes per update,   1 updates):  590746 opers/sec, 1228751888 bytes/sec
      test 10 ( 4128 byte blocks, 4128 bytes per update,   1 updates):  301825 opers/sec, 1245936902 bytes/sec
      test 11 ( 8224 byte blocks, 8224 bytes per update,   1 updates):  153075 opers/sec, 1258896201 bytes/sec
      
      testing speed of poly1305 (poly1305-simd)
      test  0 (   96 byte blocks,   16 bytes per update,   6 updates): 3809514 opers/sec,  365713411 bytes/sec
      test  1 (   96 byte blocks,   32 bytes per update,   3 updates): 5973423 opers/sec,  573448627 bytes/sec
      test  2 (   96 byte blocks,   96 bytes per update,   1 updates): 9446779 opers/sec,  906890803 bytes/sec
      test  3 (  288 byte blocks,   16 bytes per update,  18 updates): 1364814 opers/sec,  393066691 bytes/sec
      test  4 (  288 byte blocks,   32 bytes per update,   9 updates): 2045780 opers/sec,  589184697 bytes/sec
      test  5 (  288 byte blocks,  288 bytes per update,   1 updates): 3711946 opers/sec, 1069040592 bytes/sec
      test  6 ( 1056 byte blocks,   32 bytes per update,  33 updates):  573686 opers/sec,  605812732 bytes/sec
      test  7 ( 1056 byte blocks, 1056 bytes per update,   1 updates): 1647802 opers/sec, 1740079440 bytes/sec
      test  8 ( 2080 byte blocks,   32 bytes per update,  65 updates):  292970 opers/sec,  609378224 bytes/sec
      test  9 ( 2080 byte blocks, 2080 bytes per update,   1 updates):  943229 opers/sec, 1961916528 bytes/sec
      test 10 ( 4128 byte blocks, 4128 bytes per update,   1 updates):  494623 opers/sec, 2041804569 bytes/sec
      test 11 ( 8224 byte blocks, 8224 bytes per update,   1 updates):  254045 opers/sec, 2089271014 bytes/sec
      
      Benchmark results from a Core i5-4670T.
      Signed-off-by: NMartin Willi <martin@strongswan.org>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      da35b22d
    • M
      crypto: poly1305 - Add a SSE2 SIMD variant for x86_64 · c70f4abe
      Martin Willi 提交于
      Implements an x86_64 assembler driver for the Poly1305 authenticator. This
      single block variant holds the 130-bit integer in 5 32-bit words, but uses
      SSE to do two multiplications/additions in parallel.
      
      When calling updates with small blocks, the overhead for kernel_fpu_begin/
      kernel_fpu_end() negates the perfmance gain. We therefore use the
      poly1305-generic fallback for small updates.
      
      For large messages, throughput increases by ~5-10% compared to
      poly1305-generic:
      
      testing speed of poly1305 (poly1305-generic)
      test  0 (   96 byte blocks,   16 bytes per update,   6 updates): 4080026 opers/sec,  391682496 bytes/sec
      test  1 (   96 byte blocks,   32 bytes per update,   3 updates): 6221094 opers/sec,  597225024 bytes/sec
      test  2 (   96 byte blocks,   96 bytes per update,   1 updates): 9609750 opers/sec,  922536057 bytes/sec
      test  3 (  288 byte blocks,   16 bytes per update,  18 updates): 1459379 opers/sec,  420301267 bytes/sec
      test  4 (  288 byte blocks,   32 bytes per update,   9 updates): 2115179 opers/sec,  609171609 bytes/sec
      test  5 (  288 byte blocks,  288 bytes per update,   1 updates): 3729874 opers/sec, 1074203856 bytes/sec
      test  6 ( 1056 byte blocks,   32 bytes per update,  33 updates):  593000 opers/sec,  626208000 bytes/sec
      test  7 ( 1056 byte blocks, 1056 bytes per update,   1 updates): 1081536 opers/sec, 1142102332 bytes/sec
      test  8 ( 2080 byte blocks,   32 bytes per update,  65 updates):  302077 opers/sec,  628320576 bytes/sec
      test  9 ( 2080 byte blocks, 2080 bytes per update,   1 updates):  554384 opers/sec, 1153120176 bytes/sec
      test 10 ( 4128 byte blocks, 4128 bytes per update,   1 updates):  278715 opers/sec, 1150536345 bytes/sec
      test 11 ( 8224 byte blocks, 8224 bytes per update,   1 updates):  140202 opers/sec, 1153022070 bytes/sec
      
      testing speed of poly1305 (poly1305-simd)
      test  0 (   96 byte blocks,   16 bytes per update,   6 updates): 3790063 opers/sec,  363846076 bytes/sec
      test  1 (   96 byte blocks,   32 bytes per update,   3 updates): 5913378 opers/sec,  567684355 bytes/sec
      test  2 (   96 byte blocks,   96 bytes per update,   1 updates): 9352574 opers/sec,  897847104 bytes/sec
      test  3 (  288 byte blocks,   16 bytes per update,  18 updates): 1362145 opers/sec,  392297990 bytes/sec
      test  4 (  288 byte blocks,   32 bytes per update,   9 updates): 2007075 opers/sec,  578037628 bytes/sec
      test  5 (  288 byte blocks,  288 bytes per update,   1 updates): 3709811 opers/sec, 1068425798 bytes/sec
      test  6 ( 1056 byte blocks,   32 bytes per update,  33 updates):  566272 opers/sec,  597984182 bytes/sec
      test  7 ( 1056 byte blocks, 1056 bytes per update,   1 updates): 1111657 opers/sec, 1173910108 bytes/sec
      test  8 ( 2080 byte blocks,   32 bytes per update,  65 updates):  288857 opers/sec,  600823808 bytes/sec
      test  9 ( 2080 byte blocks, 2080 bytes per update,   1 updates):  590746 opers/sec, 1228751888 bytes/sec
      test 10 ( 4128 byte blocks, 4128 bytes per update,   1 updates):  301825 opers/sec, 1245936902 bytes/sec
      test 11 ( 8224 byte blocks, 8224 bytes per update,   1 updates):  153075 opers/sec, 1258896201 bytes/sec
      
      Benchmark results from a Core i5-4670T.
      Signed-off-by: NMartin Willi <martin@strongswan.org>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      c70f4abe
    • M
      crypto: chacha20 - Add an eight block AVX2 variant for x86_64 · 3d1e93cd
      Martin Willi 提交于
      Extends the x86_64 ChaCha20 implementation by a function processing eight
      ChaCha20 blocks in parallel using AVX2.
      
      For large messages, throughput increases by ~55-70% compared to four block
      SSSE3:
      
      testing speed of chacha20 (chacha20-simd) encryption
      test 0 (256 bit key, 16 byte blocks): 42249230 operations in 10 seconds (675987680 bytes)
      test 1 (256 bit key, 64 byte blocks): 46441641 operations in 10 seconds (2972265024 bytes)
      test 2 (256 bit key, 256 byte blocks): 33028112 operations in 10 seconds (8455196672 bytes)
      test 3 (256 bit key, 1024 byte blocks): 11568759 operations in 10 seconds (11846409216 bytes)
      test 4 (256 bit key, 8192 byte blocks): 1448761 operations in 10 seconds (11868250112 bytes)
      
      testing speed of chacha20 (chacha20-simd) encryption
      test 0 (256 bit key, 16 byte blocks): 41999675 operations in 10 seconds (671994800 bytes)
      test 1 (256 bit key, 64 byte blocks): 45805908 operations in 10 seconds (2931578112 bytes)
      test 2 (256 bit key, 256 byte blocks): 32814947 operations in 10 seconds (8400626432 bytes)
      test 3 (256 bit key, 1024 byte blocks): 19777167 operations in 10 seconds (20251819008 bytes)
      test 4 (256 bit key, 8192 byte blocks): 2279321 operations in 10 seconds (18672197632 bytes)
      
      Benchmark results from a Core i5-4670T.
      Signed-off-by: NMartin Willi <martin@strongswan.org>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      3d1e93cd
    • M
      crypto: chacha20 - Add a four block SSSE3 variant for x86_64 · 274f938e
      Martin Willi 提交于
      Extends the x86_64 SSSE3 ChaCha20 implementation by a function processing
      four ChaCha20 blocks in parallel. This avoids the word shuffling needed
      in the single block variant, further increasing throughput.
      
      For large messages, throughput increases by ~110% compared to single block
      SSSE3:
      
      testing speed of chacha20 (chacha20-simd) encryption
      test 0 (256 bit key, 16 byte blocks): 43141886 operations in 10 seconds (690270176 bytes)
      test 1 (256 bit key, 64 byte blocks): 46845874 operations in 10 seconds (2998135936 bytes)
      test 2 (256 bit key, 256 byte blocks): 18458512 operations in 10 seconds (4725379072 bytes)
      test 3 (256 bit key, 1024 byte blocks): 5360533 operations in 10 seconds (5489185792 bytes)
      test 4 (256 bit key, 8192 byte blocks): 692846 operations in 10 seconds (5675794432 bytes)
      
      testing speed of chacha20 (chacha20-simd) encryption
      test 0 (256 bit key, 16 byte blocks): 42249230 operations in 10 seconds (675987680 bytes)
      test 1 (256 bit key, 64 byte blocks): 46441641 operations in 10 seconds (2972265024 bytes)
      test 2 (256 bit key, 256 byte blocks): 33028112 operations in 10 seconds (8455196672 bytes)
      test 3 (256 bit key, 1024 byte blocks): 11568759 operations in 10 seconds (11846409216 bytes)
      test 4 (256 bit key, 8192 byte blocks): 1448761 operations in 10 seconds (11868250112 bytes)
      
      Benchmark results from a Core i5-4670T.
      Signed-off-by: NMartin Willi <martin@strongswan.org>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      274f938e
    • M
      crypto: chacha20 - Add a SSSE3 SIMD variant for x86_64 · c9320b6d
      Martin Willi 提交于
      Implements an x86_64 assembler driver for the ChaCha20 stream cipher. This
      single block variant works on a single state matrix using SSE instructions.
      It requires SSSE3 due the use of pshufb for efficient 8/16-bit rotate
      operations.
      
      For large messages, throughput increases by ~65% compared to
      chacha20-generic:
      
      testing speed of chacha20 (chacha20-generic) encryption
      test 0 (256 bit key, 16 byte blocks): 45089207 operations in 10 seconds (721427312 bytes)
      test 1 (256 bit key, 64 byte blocks): 43839521 operations in 10 seconds (2805729344 bytes)
      test 2 (256 bit key, 256 byte blocks): 12702056 operations in 10 seconds (3251726336 bytes)
      test 3 (256 bit key, 1024 byte blocks): 3371173 operations in 10 seconds (3452081152 bytes)
      test 4 (256 bit key, 8192 byte blocks): 422468 operations in 10 seconds (3460857856 bytes)
      
      testing speed of chacha20 (chacha20-simd) encryption
      test 0 (256 bit key, 16 byte blocks): 43141886 operations in 10 seconds (690270176 bytes)
      test 1 (256 bit key, 64 byte blocks): 46845874 operations in 10 seconds (2998135936 bytes)
      test 2 (256 bit key, 256 byte blocks): 18458512 operations in 10 seconds (4725379072 bytes)
      test 3 (256 bit key, 1024 byte blocks): 5360533 operations in 10 seconds (5489185792 bytes)
      test 4 (256 bit key, 8192 byte blocks): 692846 operations in 10 seconds (5675794432 bytes)
      
      Benchmark results from a Core i5-4670T.
      Signed-off-by: NMartin Willi <martin@strongswan.org>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      c9320b6d
  20. 14 7月, 2015 1 次提交
  21. 29 6月, 2015 1 次提交