1. 26 6月, 2012 1 次提交
  2. 22 2月, 2012 1 次提交
  3. 27 1月, 2012 1 次提交
  4. 26 1月, 2012 1 次提交
  5. 27 12月, 2011 1 次提交
  6. 26 9月, 2011 1 次提交
  7. 16 9月, 2011 1 次提交
    • L
      asm alternatives: remove incorrect alignment notes · a7f934d4
      Linus Torvalds 提交于
      On x86-64, they were just wasteful: with the explicitly added (now
      unnecessary) padding, the size of the alternatives structure was 16
      bytes, and an alignment of 8 bytes didn't hurt much.
      
      However, it was still silly, since the natural size and alignment for
      the structure is actually just 12 bytes, 4-byte aligned since commit
      59e97e4d ("x86: Make alternative instruction pointers relative").
      So removing the padding, and removing the extra alignment is just a good
      idea.
      
      On x86-32, the alignment of 4 bytes was correct, but was incorrectly
      hardcoded as 8 bytes in <asm/alternative-asm.h>.  That header file had
      used to be an x86-64 only header file, but various unification efforts
      have made it be used for x86-32 too (ie the unification of rwlock and
      rwsem).
      
      That in turn caused x86-32 boot failures, because the extra alignment
      would result in random zero-filled words in the altinstructions section,
      causing oopses early at boot when doing alternative instruction
      replacement.
      
      So just remove all the alignment noise entirely.  It's wrong, and it's
      unnecessary.  The section itself is already properly aligned by the
      linker scripts, and all additions to the section had better be of the
      proper 12-byte format, keeping it aligned.  So if the align directive
      were to ever make a difference, that would be an indication of a serious
      bug to begin with.
      Reported-by: NWerner Landgraf <w.landgraf@ru.r>
      Acked-by: NAndrew Lutomirski <luto@mit.edu>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a7f934d4
  8. 23 8月, 2011 1 次提交
  9. 10 8月, 2011 1 次提交
    • M
      crypto: sha1 - SSSE3 based SHA1 implementation for x86-64 · 66be8951
      Mathias Krause 提交于
      This is an assembler implementation of the SHA1 algorithm using the
      Supplemental SSE3 (SSSE3) instructions or, when available, the
      Advanced Vector Extensions (AVX).
      
      Testing with the tcrypt module shows the raw hash performance is up to
      2.3 times faster than the C implementation, using 8k data blocks on a
      Core 2 Duo T5500. For the smalest data set (16 byte) it is still 25%
      faster.
      
      Since this implementation uses SSE/YMM registers it cannot safely be
      used in every situation, e.g. while an IRQ interrupts a kernel thread.
      The implementation falls back to the generic SHA1 variant, if using
      the SSE/YMM registers is not possible.
      
      With this algorithm I was able to increase the throughput of a single
      IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using
      the SSSE3 variant -- a speedup of +34.8%.
      
      Saving and restoring SSE/YMM state might make the actual throughput
      fluctuate when there are FPU intensive userland applications running.
      For example, meassuring the performance using iperf2 directly on the
      machine under test gives wobbling numbers because iperf2 uses the FPU
      for each packet to check if the reporting interval has expired (in the
      above test I got min/max/avg: 402/484/464 MBit/s).
      
      Using this algorithm on a IPsec gateway gives much more reasonable and
      stable numbers, albeit not as high as in the directly connected case.
      Here is the result from an RFC 2544 test run with a EXFO Packet Blazer
      FTB-8510:
      
       frame size    sha1-generic     sha1-ssse3    delta
          64 byte     37.5 MBit/s    37.5 MBit/s     0.0%
         128 byte     56.3 MBit/s    62.5 MBit/s   +11.0%
         256 byte     87.5 MBit/s   100.0 MBit/s   +14.3%
         512 byte    131.3 MBit/s   150.0 MBit/s   +14.2%
        1024 byte    162.5 MBit/s   193.8 MBit/s   +19.3%
        1280 byte    175.0 MBit/s   212.5 MBit/s   +21.4%
        1420 byte    175.0 MBit/s   218.7 MBit/s   +25.0%
        1518 byte    150.0 MBit/s   181.2 MBit/s   +20.8%
      
      The throughput for the largest frame size is lower than for the
      previous size because the IP packets need to be fragmented in this
      case to make there way through the IPsec tunnel.
      Signed-off-by: NMathias Krause <minipli@googlemail.com>
      Cc: Maxim Locktyukhin <maxim.locktyukhin@intel.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      66be8951
  10. 14 7月, 2011 1 次提交
  11. 26 6月, 2011 1 次提交
    • C
      x86: Add support for cmpxchg_double · 3824abd1
      Christoph Lameter 提交于
      A simple implementation that only supports the word size and does not
      have a fallback mode (would require a spinlock).
      
      Add 32 and 64 bit support for cmpxchg_double. cmpxchg double uses
      the cmpxchg8b or cmpxchg16b instruction on x86 processors to compare
      and swap 2 machine words. This allows lockless algorithms to move more
      context information through critical sections.
      
      Set a flag CONFIG_CMPXCHG_DOUBLE to signal that support for double word
      cmpxchg detection has been build into the kernel. Note that each subsystem
      using cmpxchg_double has to implement a fall back mechanism as long as
      we offer support for processors that do not implement cmpxchg_double.
      Reviewed-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      Link: http://lkml.kernel.org/r/20110601172614.173427964@linux.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
      3824abd1
  12. 25 5月, 2011 1 次提交
  13. 18 5月, 2011 2 次提交
  14. 29 3月, 2011 1 次提交
    • C
      x86: A fast way to check capabilities of the current cpu · 349c004e
      Christoph Lameter 提交于
      Add this_cpu_has() which determines if the current cpu has a certain
      ability using a segment prefix and a bit test operation.
      
      For that we need to add bit operations to x86s percpu.h.
      
      Many uses of cpu_has use a pointer passed to a function to determine
      the current flags. That is no longer necessary after this patch.
      
      However, this patch only converts the straightforward cases where
      cpu_has is used with this_cpu_ptr. The rest is work for later.
      
      -tj: Rolled up patch to add x86_ prefix and use percpu_read() instead
           of percpu_read_stable().
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      349c004e
  15. 16 2月, 2011 1 次提交
    • R
      perf, x86: Add support for AMD family 15h core counters · 4979d272
      Robert Richter 提交于
      This patch adds support for AMD family 15h core counters. There are
      major changes compared to family 10h. First, there is a new perfctr
      msr range for up to 6 counters. Northbridge counters are separate
      now. This patch only adds support for core counters. Second, certain
      events may only be scheduled on certain counters. For this we need to
      extend the event scheduling and constraints.
      
      We use cpu feature flags to calculate family 15h msr address offsets.
      This way we later can implement a faster ALTERNATIVE() version for
      this.
      Signed-off-by: NRobert Richter <robert.richter@amd.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20110215135210.GB5874@erda.amd.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4979d272
  16. 25 9月, 2010 1 次提交
  17. 14 9月, 2010 1 次提交
  18. 09 9月, 2010 3 次提交
  19. 02 8月, 2010 1 次提交
  20. 31 7月, 2010 1 次提交
  21. 20 7月, 2010 2 次提交
  22. 08 7月, 2010 3 次提交
  23. 17 6月, 2010 1 次提交
    • V
      x86: Look for IA32_ENERGY_PERF_BIAS support · 23016bf0
      Venkatesh Pallipadi 提交于
      The new IA32_ENERGY_PERF_BIAS MSR allows system software to give
      hardware a hint whether OS policy favors more power saving,
      or more performance.  This allows the OS to have some influence
      on internal hardware power/performance tradeoffs where the OS
      has previously had no influence.
      
      The support for this feature is indicated by CPUID.06H.ECX.bit3,
      as documented in the Intel Architectures Software Developer's Manual.
      
      This patch discovers support of this feature and displays it
      as "epb" in /proc/cpuinfo.
      Signed-off-by: NVenkatesh Pallipadi <venki@google.com>
      LKML-Reference: <alpine.LFD.2.00.1006032310160.6669@localhost.localdomain>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      23016bf0
  24. 28 5月, 2010 1 次提交
  25. 12 5月, 2010 1 次提交
    • H
      x86: Add new static_cpu_has() function using alternatives · a3c8acd0
      H. Peter Anvin 提交于
      For CPU-feature-specific code that touches performance-critical paths,
      introduce a static patching version of [boot_]cpu_has().  This is run
      at alternatives time and is therefore not appropriate for most
      initialization code, but on the other hand initialization code is
      generally not performance critical.
      
      On gcc 4.5+ this uses the new "asm goto" feature.
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <1273135546-29690-2-git-send-email-avi@redhat.com>
      a3c8acd0
  26. 10 4月, 2010 1 次提交
  27. 14 2月, 2010 1 次提交
  28. 17 12月, 2009 1 次提交
  29. 19 10月, 2009 1 次提交
  30. 15 9月, 2009 1 次提交
    • P
      x86: Move APERF/MPERF into a X86_FEATURE · a8303aaf
      Peter Zijlstra 提交于
      Move the APERFMPERF capacility into a X86_FEATURE flag so that it
      can be used outside of the acpi cpufreq driver.
      
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: Yanmin <yanmin_zhang@linux.intel.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Yinghai Lu <yhlu.kernel@gmail.com>
      Cc: cpufreq@vger.kernel.org
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a8303aaf
  31. 04 9月, 2009 1 次提交
  32. 10 6月, 2009 1 次提交
  33. 09 6月, 2009 1 次提交
    • A
      x86: Detect use of extended APIC ID for AMD CPUs · 42937e81
      Andreas Herrmann 提交于
      Booting a 32-bit kernel on Magny-Cours results in the following panic:
      
        ...
        Using APIC driver default
        ...
        Overriding APIC driver with bigsmp
        ...
        Getting VERSION: 80050010
        Getting VERSION: 80050010
        Getting ID: 10000000
        Getting ID: ef000000
        Getting LVT0: 700
        Getting LVT1: 10000
        Kernel panic - not syncing: Boot APIC ID in local APIC unexpected (16 vs 0)
        Pid: 1, comm: swapper Not tainted 2.6.30-rcX #2
        Call Trace:
         [<c05194da>] ? panic+0x38/0xd3
         [<c0743102>] ? native_smp_prepare_cpus+0x259/0x31f
         [<c073b19d>] ? kernel_init+0x3e/0x141
         [<c073b15f>] ? kernel_init+0x0/0x141
         [<c020325f>] ? kernel_thread_helper+0x7/0x10
      
      The reason is that default_get_apic_id handled extension of local APIC
      ID field just in case of XAPIC.
      
      Thus for this AMD CPU, default_get_apic_id() returns 0 and
      bigsmp_get_apic_id() returns 16 which leads to the respective kernel
      panic.
      
      This patch introduces a Linux specific feature flag to indicate
      support for extended APIC id (8 bits instead of 4 bits width) and sets
      the flag on AMD CPUs if applicable.
      Signed-off-by: NAndreas Herrmann <andreas.herrmann3@amd.com>
      Cc: <stable@kernel.org>
      LKML-Reference: <20090608135509.GA12431@alberich.amd.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      42937e81
  34. 11 5月, 2009 1 次提交
    • Y
      x86: clean up and fix setup_clear/force_cpu_cap handling · 3e0c3737
      Yinghai Lu 提交于
      setup_force_cpu_cap() only have one user (Xen guest code),
      but it should not reuse cleared_cpu_cpus, otherwise it
      will have problems on SMP.
      
      Need to have a separate cpu_cpus_set array too, for forced-on
      flags, beyond the forced-off flags.
      
      Also need to setup handling before all cpus caps are combined.
      
      [ Impact: fix the forced-set CPU feature flag logic ]
      
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NYinghai Lu <yinghai.lu@kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3e0c3737