1. 03 Apr, 2013 2 commits
  2. 16 Feb, 2013 1 commit
  3. 01 Dec, 2012 1 commit
      KVM: x86: Emulate IA32_TSC_ADJUST MSR · ba904635
      Authored by Will Auld
      CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported
      
      The basic design is to emulate the MSR by directing reads and writes to a
      vcpu-specific location that stores the value of the emulated MSR, while
      also adding that value to the vmcs tsc_offset. This way the IA32_TSC_ADJUST
      value is included in all reads of the TSC, whether through rdmsr or rdtsc,
      as long as the "use TSC counter offsetting" VM-execution control is enabled
      along with the IA32_TSC_ADJUST control.
      
      However, because hardware will only return TSC + IA32_TSC_ADJUST +
      vmcs tsc_offset for a guest process when it does a rdtsc (with the correct
      settings), the value of our virtualized IA32_TSC_ADJUST must be stored in
      one of these three locations. The argument against storing it in the actual
      MSR is performance: the MSR is likely to be written seldom, while the
      save/restore would be required on every transition. IA32_TSC_ADJUST was
      created as a way to solve some issues with writing the TSC itself, so that
      is not an option either.
      
      The remaining option, defined above as our solution, has the problem of
      returning incorrect vmcs tsc_offset values (unless we intercept and fix
      them, which is not done here), as mentioned above. More problematic,
      however, is that storing the data in the vmcs tsc_offset has a different
      semantic effect on the system than using the actual MSR. This is
      illustrated in the following example:
      
      The hypervisor sets IA32_TSC_ADJUST, then the guest sets it, and a guest
      process performs a rdtsc. In this case the guest process will get
      TSC + IA32_TSC_ADJUST_hypervisor + vmcs tsc_offset, where the tsc_offset
      includes IA32_TSC_ADJUST_guest. While the total system semantics have
      changed, the semantics as seen by the guest have not, and hence this will
      not cause a problem.
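      
      As a rough illustration of the scheme (the types and helper names below are
      hypothetical, not the actual KVM code), a write to the emulated MSR would be
      folded into the tsc_offset roughly like this:
      
        /* Hypothetical sketch: keep the guest-visible IA32_TSC_ADJUST in a
         * per-vcpu field and fold any change into the VMCS tsc_offset, so guest
         * rdtsc/rdmsr observe it without saving/restoring the real MSR on every
         * host/guest transition. */
        struct vcpu_tsc_state {
                u64 ia32_tsc_adjust;    /* emulated MSR value, returned on RDMSR */
                s64 tsc_offset;         /* mirrors the VMCS "TSC offset" field */
        };
      
        static void emulate_tsc_adjust_wrmsr(struct vcpu_tsc_state *t, u64 data)
        {
                s64 delta = (s64)data - (s64)t->ia32_tsc_adjust;
      
                t->ia32_tsc_adjust = data;
                t->tsc_offset += delta;                 /* guest RDTSC now includes it */
                vmcs_write_tsc_offset(t->tsc_offset);   /* hypothetical VMCS writer */
        }
      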
      Signed-off-by: Will Auld <will.auld@intel.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  4. 30 Nov, 2012 1 commit
  5. 14 Nov, 2012 1 commit
  6. 03 Oct, 2012 1 commit
  7. 19 Sep, 2012 2 commits
  8. 10 Sep, 2012 1 commit
  9. 21 Jul, 2012 1 commit
  10. 26 Jun, 2012 1 commit
  11. 22 Feb, 2012 1 commit
  12. 27 Jan, 2012 1 commit
  13. 26 Jan, 2012 1 commit
  14. 27 Dec, 2011 1 commit
  15. 26 Sep, 2011 1 commit
  16. 16 Sep, 2011 1 commit
      asm alternatives: remove incorrect alignment notes · a7f934d4
      Authored by Linus Torvalds
      On x86-64, they were just wasteful: with the explicitly added (now
      unnecessary) padding, the size of the alternatives structure was 16
      bytes, and an alignment of 8 bytes didn't hurt much.
      
      However, it was still silly, since the natural size and alignment for
      the structure is actually just 12 bytes, 4-byte aligned since commit
      59e97e4d ("x86: Make alternative instruction pointers relative").
      So removing the padding, and removing the extra alignment is just a good
      idea.
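      
      For reference, the relative-pointer layout referred to above looks roughly
      like this (field names approximate), which is where the natural 12-byte
      size and 4-byte alignment come from:
      
        /* Approximate alternatives record after 59e97e4d (relative pointers):
         * two s32 offsets plus three small fields, i.e. 12 bytes with a
         * natural alignment of 4. */
        struct alt_instr {
                s32 instr_offset;       /* original instruction, relative to the record */
                s32 repl_offset;        /* replacement instruction, relative */
                u16 cpuid;              /* CPU feature bit gating the replacement */
                u8  instrlen;           /* length of the original instruction */
                u8  replacementlen;     /* length of the replacement */
        };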
      
      On x86-32, the alignment of 4 bytes was correct, but was incorrectly
      hardcoded as 8 bytes in <asm/alternative-asm.h>.  That header file used to
      be an x86-64-only header file, but various unification efforts (i.e. the
      unification of rwlock and rwsem) have made it used for x86-32 too.
      
      That in turn caused x86-32 boot failures, because the extra alignment
      would result in random zero-filled words in the altinstructions section,
      causing oopses early at boot when doing alternative instruction
      replacement.
      
      So just remove all the alignment noise entirely.  It's wrong, and it's
      unnecessary.  The section itself is already properly aligned by the
      linker scripts, and all additions to the section had better be of the
      proper 12-byte format, keeping it aligned.  So if the align directive
      were to ever make a difference, that would be an indication of a serious
      bug to begin with.
      Reported-by: Werner Landgraf <w.landgraf@ru.r>
      Acked-by: Andrew Lutomirski <luto@mit.edu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 23 Aug, 2011 1 commit
  18. 10 Aug, 2011 1 commit
      crypto: sha1 - SSSE3 based SHA1 implementation for x86-64 · 66be8951
      Authored by Mathias Krause
      This is an assembler implementation of the SHA1 algorithm using the
      Supplemental SSE3 (SSSE3) instructions or, when available, the
      Advanced Vector Extensions (AVX).
      
      Testing with the tcrypt module shows that the raw hash performance is up
      to 2.3 times faster than the C implementation, using 8k data blocks on a
      Core 2 Duo T5500. For the smallest data set (16 bytes) it is still 25%
      faster.
      
      Since this implementation uses SSE/YMM registers, it cannot safely be used
      in every situation, e.g. while an IRQ interrupts a kernel thread. The
      implementation falls back to the generic SHA1 variant if using the SSE/YMM
      registers is not possible.
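      
      A simplified sketch of that fallback decision (the transform helper here is
      illustrative; the real glue code differs):
      
        /* Use the SSSE3/AVX path only when FPU/SSE state may be touched in
         * the current context, otherwise call the generic C implementation. */
        static int sha1_ssse3_update(struct shash_desc *desc, const u8 *data,
                                     unsigned int len)
        {
                if (!irq_fpu_usable())  /* e.g. an IRQ interrupted a kernel thread */
                        return crypto_sha1_update(desc, data, len);
      
                kernel_fpu_begin();                     /* save FPU/SSE/YMM state */
                sha1_transform_ssse3(desc, data, len);  /* assembler core (sketch) */
                kernel_fpu_end();
                return 0;
        }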
      
      With this algorithm I was able to increase the throughput of a single
      IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using
      the SSSE3 variant -- a speedup of +34.8%.
      
      Saving and restoring SSE/YMM state might make the actual throughput
      fluctuate when there are FPU-intensive userland applications running.
      For example, measuring the performance using iperf2 directly on the
      machine under test gives wobbling numbers because iperf2 uses the FPU
      for each packet to check if the reporting interval has expired (in the
      above test I got min/max/avg: 402/484/464 MBit/s).
      
      Using this algorithm on an IPsec gateway gives much more reasonable and
      stable numbers, albeit not as high as in the directly connected case.
      Here is the result from an RFC 2544 test run with an EXFO Packet Blazer
      FTB-8510:
      
       frame size    sha1-generic     sha1-ssse3    delta
          64 byte     37.5 MBit/s    37.5 MBit/s     0.0%
         128 byte     56.3 MBit/s    62.5 MBit/s   +11.0%
         256 byte     87.5 MBit/s   100.0 MBit/s   +14.3%
         512 byte    131.3 MBit/s   150.0 MBit/s   +14.2%
        1024 byte    162.5 MBit/s   193.8 MBit/s   +19.3%
        1280 byte    175.0 MBit/s   212.5 MBit/s   +21.4%
        1420 byte    175.0 MBit/s   218.7 MBit/s   +25.0%
        1518 byte    150.0 MBit/s   181.2 MBit/s   +20.8%
      
      The throughput for the largest frame size is lower than for the previous
      size because the IP packets need to be fragmented in this case to make
      their way through the IPsec tunnel.
      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Cc: Maxim Locktyukhin <maxim.locktyukhin@intel.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  19. 14 Jul, 2011 1 commit
  20. 26 Jun, 2011 1 commit
      x86: Add support for cmpxchg_double · 3824abd1
      Authored by Christoph Lameter
      A simple implementation that only supports the word size and does not
      have a fallback mode (would require a spinlock).
      
      Add 32 and 64 bit support for cmpxchg_double. cmpxchg_double uses the
      cmpxchg8b or cmpxchg16b instruction on x86 processors to compare and swap
      two machine words. This allows lockless algorithms to move more context
      information through critical sections.
      
      Set a flag, CONFIG_CMPXCHG_DOUBLE, to signal that support for double-word
      cmpxchg detection has been built into the kernel. Note that each subsystem
      using cmpxchg_double has to implement a fallback mechanism as long as we
      offer support for processors that do not implement cmpxchg_double.
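      
      A hedged usage sketch (the struct, the runtime feature check and the locked
      fallback are illustrative, not taken from an actual user):
      
        /* Atomically replace a pointer together with a tag; the two words
         * must be adjacent and aligned to twice the word size for
         * cmpxchg8b/cmpxchg16b. */
        struct tagged_ptr {
                void *ptr;
                unsigned long tag;
        } __attribute__((aligned(2 * sizeof(void *))));
      
        static bool replace_tagged(struct tagged_ptr *t,
                                   void *old_ptr, unsigned long old_tag,
                                   void *new_ptr, unsigned long new_tag)
        {
        #ifdef CONFIG_CMPXCHG_DOUBLE
                if (system_has_cmpxchg_double())        /* CPU really has the instruction */
                        return cmpxchg_double(&t->ptr, &t->tag,
                                              old_ptr, old_tag, new_ptr, new_tag);
        #endif
                /* the subsystem's lock-based fallback (sketch) */
                return replace_tagged_locked(t, old_ptr, old_tag, new_ptr, new_tag);
        }
      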
      Reviewed-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Link: http://lkml.kernel.org/r/20110601172614.173427964@linux.com
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  21. 25 May, 2011 1 commit
  22. 18 May, 2011 2 commits
  23. 29 Mar, 2011 1 commit
      x86: A fast way to check capabilities of the current cpu · 349c004e
      Authored by Christoph Lameter
      Add this_cpu_has(), which determines if the current cpu has a certain
      capability using a segment prefix and a bit test operation.
      
      For that we need to add bit operations to x86's percpu.h.
      
      Many uses of cpu_has use a pointer passed to a function to determine
      the current flags. That is no longer necessary after this patch.
      
      However, this patch only converts the straightforward cases where
      cpu_has is used with this_cpu_ptr. The rest is work for later.
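      
      An illustrative before/after of such a conversion (X86_FEATURE_ARAT is just
      an example feature bit):
      
        /* Before: needs a pointer to the current cpu's cpuinfo. */
        static bool arat_supported_old(void)
        {
                struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
      
                return cpu_has(c, X86_FEATURE_ARAT);
        }
      
        /* After: a segment-prefixed bit test against the per-cpu cpuinfo. */
        static bool arat_supported_new(void)
        {
                return this_cpu_has(X86_FEATURE_ARAT);
        }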
      
      -tj: Rolled up patch to add x86_ prefix and use percpu_read() instead
           of percpu_read_stable().
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  24. 16 Feb, 2011 1 commit
      perf, x86: Add support for AMD family 15h core counters · 4979d272
      Authored by Robert Richter
      This patch adds support for AMD family 15h core counters. There are
      major changes compared to family 10h. First, there is a new perfctr
      msr range for up to 6 counters. Northbridge counters are separate
      now. This patch only adds support for core counters. Second, certain
      events may only be scheduled on certain counters. For this we need to
      extend the event scheduling and constraints.
      
      We use cpu feature flags to calculate the family 15h MSR address offsets.
      This way we can later implement a faster ALTERNATIVE() version for this.
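      
      A simplified sketch of the offset calculation this enables (constants and
      naming are illustrative, not the exact perf code):
      
        /* Family 15h core counters live in a new MSR range with interleaved
         * control/counter registers, so the per-index address offset depends
         * on a cpu feature flag. */
        static unsigned int amd_event_addr_offset(int index)
        {
                if (boot_cpu_has(X86_FEATURE_PERFCTR_CORE))
                        return index << 1;      /* family 15h: spaced ctl/ctr pairs */
      
                return index;                   /* legacy family 10h layout */
        }
      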
      Signed-off-by: Robert Richter <robert.richter@amd.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20110215135210.GB5874@erda.amd.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  25. 25 Sep, 2010 1 commit
  26. 14 Sep, 2010 1 commit
  27. 09 Sep, 2010 3 commits
  28. 02 Aug, 2010 1 commit
  29. 31 Jul, 2010 1 commit
  30. 20 Jul, 2010 2 commits
  31. 08 Jul, 2010 3 commits
  32. 17 Jun, 2010 1 commit
      x86: Look for IA32_ENERGY_PERF_BIAS support · 23016bf0
      Authored by Venkatesh Pallipadi
      The new IA32_ENERGY_PERF_BIAS MSR allows system software to give hardware
      a hint about whether OS policy favors more power saving or more
      performance.  This gives the OS some influence on internal hardware
      power/performance tradeoffs where it previously had none.
      
      The support for this feature is indicated by CPUID.06H.ECX.bit3,
      as documented in the Intel Architectures Software Developer's Manual.
      
      This patch discovers support of this feature and displays it
      as "epb" in /proc/cpuinfo.
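      
      As an aside, the same CPUID bit can be checked from user space, e.g. with a
      small stand-alone program (a sketch, not part of this patch):
      
        /* Check CPUID.06H:ECX bit 3 (IA32_ENERGY_PERF_BIAS support). */
        #include <stdio.h>
        #include <cpuid.h>
      
        int main(void)
        {
                unsigned int eax, ebx, ecx, edx;
      
                if (!__get_cpuid(6, &eax, &ebx, &ecx, &edx))
                        return 1;       /* leaf 06H not available */
      
                printf("epb: %s\n", (ecx & (1u << 3)) ? "supported" : "not supported");
                return 0;
        }
      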
      Signed-off-by: Venkatesh Pallipadi <venki@google.com>
      LKML-Reference: <alpine.LFD.2.00.1006032310160.6669@localhost.localdomain>
      Signed-off-by: Len Brown <len.brown@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>