1. 17 7月, 2015 5 次提交
    • M
      crypto: poly1305 - Add a SSE2 SIMD variant for x86_64 · c70f4abe
      Martin Willi 提交于
      Implements an x86_64 assembler driver for the Poly1305 authenticator. This
      single block variant holds the 130-bit integer in 5 32-bit words, but uses
      SSE to do two multiplications/additions in parallel.
      
      When calling updates with small blocks, the overhead for kernel_fpu_begin/
      kernel_fpu_end() negates the perfmance gain. We therefore use the
      poly1305-generic fallback for small updates.
      
      For large messages, throughput increases by ~5-10% compared to
      poly1305-generic:
      
      testing speed of poly1305 (poly1305-generic)
      test  0 (   96 byte blocks,   16 bytes per update,   6 updates): 4080026 opers/sec,  391682496 bytes/sec
      test  1 (   96 byte blocks,   32 bytes per update,   3 updates): 6221094 opers/sec,  597225024 bytes/sec
      test  2 (   96 byte blocks,   96 bytes per update,   1 updates): 9609750 opers/sec,  922536057 bytes/sec
      test  3 (  288 byte blocks,   16 bytes per update,  18 updates): 1459379 opers/sec,  420301267 bytes/sec
      test  4 (  288 byte blocks,   32 bytes per update,   9 updates): 2115179 opers/sec,  609171609 bytes/sec
      test  5 (  288 byte blocks,  288 bytes per update,   1 updates): 3729874 opers/sec, 1074203856 bytes/sec
      test  6 ( 1056 byte blocks,   32 bytes per update,  33 updates):  593000 opers/sec,  626208000 bytes/sec
      test  7 ( 1056 byte blocks, 1056 bytes per update,   1 updates): 1081536 opers/sec, 1142102332 bytes/sec
      test  8 ( 2080 byte blocks,   32 bytes per update,  65 updates):  302077 opers/sec,  628320576 bytes/sec
      test  9 ( 2080 byte blocks, 2080 bytes per update,   1 updates):  554384 opers/sec, 1153120176 bytes/sec
      test 10 ( 4128 byte blocks, 4128 bytes per update,   1 updates):  278715 opers/sec, 1150536345 bytes/sec
      test 11 ( 8224 byte blocks, 8224 bytes per update,   1 updates):  140202 opers/sec, 1153022070 bytes/sec
      
      testing speed of poly1305 (poly1305-simd)
      test  0 (   96 byte blocks,   16 bytes per update,   6 updates): 3790063 opers/sec,  363846076 bytes/sec
      test  1 (   96 byte blocks,   32 bytes per update,   3 updates): 5913378 opers/sec,  567684355 bytes/sec
      test  2 (   96 byte blocks,   96 bytes per update,   1 updates): 9352574 opers/sec,  897847104 bytes/sec
      test  3 (  288 byte blocks,   16 bytes per update,  18 updates): 1362145 opers/sec,  392297990 bytes/sec
      test  4 (  288 byte blocks,   32 bytes per update,   9 updates): 2007075 opers/sec,  578037628 bytes/sec
      test  5 (  288 byte blocks,  288 bytes per update,   1 updates): 3709811 opers/sec, 1068425798 bytes/sec
      test  6 ( 1056 byte blocks,   32 bytes per update,  33 updates):  566272 opers/sec,  597984182 bytes/sec
      test  7 ( 1056 byte blocks, 1056 bytes per update,   1 updates): 1111657 opers/sec, 1173910108 bytes/sec
      test  8 ( 2080 byte blocks,   32 bytes per update,  65 updates):  288857 opers/sec,  600823808 bytes/sec
      test  9 ( 2080 byte blocks, 2080 bytes per update,   1 updates):  590746 opers/sec, 1228751888 bytes/sec
      test 10 ( 4128 byte blocks, 4128 bytes per update,   1 updates):  301825 opers/sec, 1245936902 bytes/sec
      test 11 ( 8224 byte blocks, 8224 bytes per update,   1 updates):  153075 opers/sec, 1258896201 bytes/sec
      
      Benchmark results from a Core i5-4670T.
      Signed-off-by: NMartin Willi <martin@strongswan.org>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      c70f4abe
    • M
      crypto: chacha20 - Add an eight block AVX2 variant for x86_64 · 3d1e93cd
      Martin Willi 提交于
      Extends the x86_64 ChaCha20 implementation by a function processing eight
      ChaCha20 blocks in parallel using AVX2.
      
      For large messages, throughput increases by ~55-70% compared to four block
      SSSE3:
      
      testing speed of chacha20 (chacha20-simd) encryption
      test 0 (256 bit key, 16 byte blocks): 42249230 operations in 10 seconds (675987680 bytes)
      test 1 (256 bit key, 64 byte blocks): 46441641 operations in 10 seconds (2972265024 bytes)
      test 2 (256 bit key, 256 byte blocks): 33028112 operations in 10 seconds (8455196672 bytes)
      test 3 (256 bit key, 1024 byte blocks): 11568759 operations in 10 seconds (11846409216 bytes)
      test 4 (256 bit key, 8192 byte blocks): 1448761 operations in 10 seconds (11868250112 bytes)
      
      testing speed of chacha20 (chacha20-simd) encryption
      test 0 (256 bit key, 16 byte blocks): 41999675 operations in 10 seconds (671994800 bytes)
      test 1 (256 bit key, 64 byte blocks): 45805908 operations in 10 seconds (2931578112 bytes)
      test 2 (256 bit key, 256 byte blocks): 32814947 operations in 10 seconds (8400626432 bytes)
      test 3 (256 bit key, 1024 byte blocks): 19777167 operations in 10 seconds (20251819008 bytes)
      test 4 (256 bit key, 8192 byte blocks): 2279321 operations in 10 seconds (18672197632 bytes)
      
      Benchmark results from a Core i5-4670T.
      Signed-off-by: NMartin Willi <martin@strongswan.org>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      3d1e93cd
    • M
      crypto: chacha20 - Add a four block SSSE3 variant for x86_64 · 274f938e
      Martin Willi 提交于
      Extends the x86_64 SSSE3 ChaCha20 implementation by a function processing
      four ChaCha20 blocks in parallel. This avoids the word shuffling needed
      in the single block variant, further increasing throughput.
      
      For large messages, throughput increases by ~110% compared to single block
      SSSE3:
      
      testing speed of chacha20 (chacha20-simd) encryption
      test 0 (256 bit key, 16 byte blocks): 43141886 operations in 10 seconds (690270176 bytes)
      test 1 (256 bit key, 64 byte blocks): 46845874 operations in 10 seconds (2998135936 bytes)
      test 2 (256 bit key, 256 byte blocks): 18458512 operations in 10 seconds (4725379072 bytes)
      test 3 (256 bit key, 1024 byte blocks): 5360533 operations in 10 seconds (5489185792 bytes)
      test 4 (256 bit key, 8192 byte blocks): 692846 operations in 10 seconds (5675794432 bytes)
      
      testing speed of chacha20 (chacha20-simd) encryption
      test 0 (256 bit key, 16 byte blocks): 42249230 operations in 10 seconds (675987680 bytes)
      test 1 (256 bit key, 64 byte blocks): 46441641 operations in 10 seconds (2972265024 bytes)
      test 2 (256 bit key, 256 byte blocks): 33028112 operations in 10 seconds (8455196672 bytes)
      test 3 (256 bit key, 1024 byte blocks): 11568759 operations in 10 seconds (11846409216 bytes)
      test 4 (256 bit key, 8192 byte blocks): 1448761 operations in 10 seconds (11868250112 bytes)
      
      Benchmark results from a Core i5-4670T.
      Signed-off-by: NMartin Willi <martin@strongswan.org>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      274f938e
    • M
      crypto: chacha20 - Add a SSSE3 SIMD variant for x86_64 · c9320b6d
      Martin Willi 提交于
      Implements an x86_64 assembler driver for the ChaCha20 stream cipher. This
      single block variant works on a single state matrix using SSE instructions.
      It requires SSSE3 due the use of pshufb for efficient 8/16-bit rotate
      operations.
      
      For large messages, throughput increases by ~65% compared to
      chacha20-generic:
      
      testing speed of chacha20 (chacha20-generic) encryption
      test 0 (256 bit key, 16 byte blocks): 45089207 operations in 10 seconds (721427312 bytes)
      test 1 (256 bit key, 64 byte blocks): 43839521 operations in 10 seconds (2805729344 bytes)
      test 2 (256 bit key, 256 byte blocks): 12702056 operations in 10 seconds (3251726336 bytes)
      test 3 (256 bit key, 1024 byte blocks): 3371173 operations in 10 seconds (3452081152 bytes)
      test 4 (256 bit key, 8192 byte blocks): 422468 operations in 10 seconds (3460857856 bytes)
      
      testing speed of chacha20 (chacha20-simd) encryption
      test 0 (256 bit key, 16 byte blocks): 43141886 operations in 10 seconds (690270176 bytes)
      test 1 (256 bit key, 64 byte blocks): 46845874 operations in 10 seconds (2998135936 bytes)
      test 2 (256 bit key, 256 byte blocks): 18458512 operations in 10 seconds (4725379072 bytes)
      test 3 (256 bit key, 1024 byte blocks): 5360533 operations in 10 seconds (5489185792 bytes)
      test 4 (256 bit key, 8192 byte blocks): 692846 operations in 10 seconds (5675794432 bytes)
      
      Benchmark results from a Core i5-4670T.
      Signed-off-by: NMartin Willi <martin@strongswan.org>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      c9320b6d
    • H
      crypto: aes-ce-ccm - Convert to new AEAD interface · 2642d6ab
      Herbert Xu 提交于
      This patch converts the ARM64 aes-ce-ccm implementation to the
      new AEAD interface.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Reviewed-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Tested-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      2642d6ab
  2. 14 7月, 2015 2 次提交
  3. 06 7月, 2015 1 次提交
  4. 04 7月, 2015 7 次提交
  5. 03 7月, 2015 3 次提交
    • H
      ARM64 / SMP: Switch pr_err() to pr_debug() for disabled GICC entry · f9058929
      Hanjun Guo 提交于
      It is normal that firmware presents GICC entry or entries (processors)
      with disabled flag in ACPI MADT, taking a system of 16 cpus for example,
      ACPI firmware may present 8 ebabled first with another 8 cpus disabled
      in MADT, the disabled cpus can be hot-added later.
      
      Firmware may also present more cpus than the hardware actually has, but
      disabled the unused ones, and easily enable it when the hardware has such
      cpus to make the firmware code scalable.
      
      So that's not an error for disabled cpus in MADT, we can switch pr_err()
      to pr_debug() to make the boot a little quieter by default.
      
      Since hwid for disabled cpus often are invalid, and we check invalid hwid
      first in the code, for use case that hot add cpus later will be filtered
      out and will not be counted in possible cups, so move this check before
      the hwid one to prepare the code to count for disabeld cpus when cpu
      hot-plug is introduced.
      Signed-off-by: NHanjun Guo <hanjun.guo@linaro.org>
      Reviewed-by: NAl Stone <ahs3@redhat.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      f9058929
    • T
      [IA64] Drop debug test/printk that some special pages are marked reserved · 43c518d1
      Tony Luck 提交于
      In commit 92923ca3 "mm: meminit: only set page reserved in the memblock region"
      we dropped setting the reserved bits for all pages. This results in some warnings
      on ia64:
      
      put_kernel_page: page at 0xe000000005588000 not in reserved memory
      put_kernel_page: page at 0xe000000005588000 not in reserved memory
      put_kernel_page: page at 0xe000000005580000 not in reserved memory
      put_kernel_page: page at 0xe000000005580000 not in reserved memory
      put_kernel_page: page at 0xe000000005580000 not in reserved memory
      put_kernel_page: page at 0xe000000005580000 not in reserved memory
      
      the two different pages match up with two objects from the loaded kernel
      that get mapped by arch/ia64/mm/init.c:setup_gate()
      
      a000000101588000 D __start_gate_section
      a000000101580000 D empty_zero_page
      
      In a discussion with Mel Gorman:
        http://lkml.kernel.org/r/20150526102219.GB13750%40suse.de
      he suggested that while the preferred approach might be to
      set the reserved bit for these pages, it would also be OK
      to just drop the test:
         "as it's a debugging check that is ia-64 specific"
      
      After hunting around a bit and failin to find a good place to mark these
      pages as reserved - I decided to just delete the test.
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      43c518d1
    • J
      arm64: cpuidle: add __init section marker to arm_cpuidle_init · ea389daa
      Jisheng Zhang 提交于
      It is not needed after booting, this patch moves the arm_cpuidle_init()
      function to the __init section.
      Signed-off-by: NJisheng Zhang <jszhang@marvell.com>
      Reviewed-by: NKrzysztof Kozlowski <k.kozlowski@samsung.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      ea389daa
  6. 02 7月, 2015 4 次提交
  7. 01 7月, 2015 13 次提交
  8. 30 6月, 2015 3 次提交
    • P
      perf/x86: Fix 'active_events' imbalance · 93472aff
      Peter Zijlstra 提交于
      Commit 1b7b938f ("perf/x86/intel: Fix PMI handling for Intel PT") conditionally
      increments active_events in x86_add_exclusive() but unconditionally decrements in
      x86_del_exclusive().
      
      These extra decrements can lead to the situation where
      active_events is zero and thus the PMI handler is 'disabled'
      while we have active events on the PMU generating PMIs.
      
      This leads to a truckload of:
      
        Uhhuh. NMI received for unknown reason 21 on CPU 28.
        Do you have a strange power saving mode enabled?
        Dazed and confused, but trying to continue
      
      messages and generally messes up perf.
      
      Remove the condition on the increment, double increment balanced
      by a double decrement is perfectly fine.
      
      Restructure the code a little bit to make the unconditional inc
      a bit more natural.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: alexander.shishkin@linux.intel.com
      Cc: brgerst@gmail.com
      Cc: dvlasenk@redhat.com
      Cc: luto@amacapital.net
      Cc: oleg@redhat.com
      Fixes: 1b7b938f ("perf/x86/intel: Fix PMI handling for Intel PT")
      Link: http://lkml.kernel.org/r/20150624144750.GJ18673@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      93472aff
    • I
      x86/fpu: Fix FPU related boot regression when CPUID masking BIOS feature is enabled · db52ef74
      Ingo Molnar 提交于
      Mike Galbraith reported:
      
        " My i7-4790 box is having one hell of a time with this merge
          window, dead in the water.
      
          BIOS setting "Limit CPUID Maximum" upsets new fpu code
          mightily. "
      
      It turns out that Linux does a double workaround here, as per:
      
        066941bd ("x86: unmask CPUID levels on Intel CPUs")
      
      it undoes the BIOS workaround - but as a side effect the CPUID
      state is not completely constant during early init anymore,
      and the new FPU init code did not take this into account.
      
      So what happened is that the xstate init code did not have full
      CPUID available, which broke subsequent attempts to use xstate
      features.
      
      Fix this by ordering the early FPU init code to after we've
      stabilized the CPUID state.
      Reported-bisected-and-tested-by: NMike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20150627082514.GA10894@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      db52ef74
    • Q
      intel_pmc_ipc: Add Intel Apollo Lake PMC IPC driver · 0a8b8353
      qipeng.zha 提交于
      This driver provides support for PMC control on Apollo Lake platforms.
      The PMC is an ARC processor which defines some IPC commands for
      communication with other entities in the CPU.
      Signed-off-by: Nqipeng.zha <qipeng.zha@intel.com>
      [fengguang.wu@intel.com: Fix Sparse and Cocinelle warnings]
      Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NDarren Hart <dvhart@linux.intel.com>
      0a8b8353
  9. 29 6月, 2015 2 次提交