1. 30 May 2014 (2 commits)
  2. 28 Feb 2014 (2 commits)
  3. 21 Feb 2014 (1 commit)
  4. 19 Feb 2014 (1 commit)
  5. 07 Dec 2013 (1 commit)
  6. 11 Oct 2013 (1 commit)
  7. 29 Jun 2013 (1 commit)
  8. 21 Jun 2013 (3 commits)
  9. 25 Apr 2013 (1 commit)
  10. 21 Apr 2013 (1 commit)
  11. 10 Apr 2013 (1 commit)
  12. 03 Apr 2013 (6 commits)
  13. 16 Mar 2013 (1 commit)
  14. 16 Feb 2013 (1 commit)
  15. 01 Dec 2012 (1 commit)
    • KVM: x86: Emulate IA32_TSC_ADJUST MSR · ba904635
      Committed by Will Auld
      CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported
      
      The basic design is to emulate the MSR with a per-vcpu location that
      stores the value of the emulated MSR, while also adding that value to
      the vmcs tsc_offset. This way the IA32_TSC_ADJUST value is included
      in all reads of the TSC, whether through rdmsr or rdtsc, as long as
      the "use TSC counter offsetting" VM-execution control is enabled and
      the IA32_TSC_ADJUST emulation is active.
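
      A minimal, self-contained model of this scheme (illustrative only;
      the real patch operates on struct kvm_vcpu and the VMCS, and the
      names below are stand-ins, not the actual KVM identifiers):

        #include <stdint.h>

        struct vcpu_model {
                uint64_t vmcs_tsc_offset;    /* "use TSC offsetting" offset */
                uint64_t tsc_adjust_shadow;  /* emulated IA32_TSC_ADJUST */
        };

        /* Guest WRMSR to IA32_TSC_ADJUST: fold the delta into tsc_offset
         * so rdtsc and rdmsr(IA32_TSC) both observe the adjustment. */
        static void wrmsr_tsc_adjust(struct vcpu_model *v, uint64_t data)
        {
                v->vmcs_tsc_offset += data - v->tsc_adjust_shadow;
                v->tsc_adjust_shadow = data;
        }

        /* Guest RDMSR of IA32_TSC_ADJUST: served from the shadow copy. */
        static uint64_t rdmsr_tsc_adjust(const struct vcpu_model *v)
        {
                return v->tsc_adjust_shadow;
        }

        /* What a guest rdtsc observes while offsetting is enabled. */
        static uint64_t guest_rdtsc(const struct vcpu_model *v,
                                    uint64_t host_tsc)
        {
                return host_tsc + v->vmcs_tsc_offset;
        }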
      
      However, because hardware will only return TSC + IA32_TSC_ADJUST +
      vmcs tsc_offset for a guest process when it executes rdtsc (with the
      correct settings), the value of our virtualized IA32_TSC_ADJUST must
      be stored in one of these three locations. The argument against
      storing it in the actual MSR is performance: the MSR is likely to be
      written seldom, while a save/restore would be required on every
      transition. IA32_TSC_ADJUST was created as a way to solve some issues
      with writing the TSC itself, so that is not an option either.
      
      The remaining option, defined above as our solution, has the problem
      of returning incorrect vmcs tsc_offset values (unless we intercept
      and fix them, which is not done here) as mentioned above. More
      problematic, however, is that storing the data in vmcs tsc_offset
      has a different semantic effect on the system than using the actual
      MSR. This is illustrated in the following example:
      
      The hypervisor sets IA32_TSC_ADJUST, then the guest sets it, and a
      guest process performs a rdtsc. In this case the guest process gets
      TSC + IA32_TSC_ADJUST_hypervisor + vmcs tsc_offset, which includes
      IA32_TSC_ADJUST_guest. While the total system semantics change, the
      semantics as seen by the guest do not, and hence this will not cause
      a problem.
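
      In the toy model above, with made-up numbers: suppose the hypervisor
      had already folded its own adjustment into vmcs_tsc_offset = 1000
      before the guest writes IA32_TSC_ADJUST = 50:

        struct vcpu_model v = { .vmcs_tsc_offset = 1000 };
        wrmsr_tsc_adjust(&v, 50);   /* offset becomes 1050 */
        /* guest_rdtsc(&v, t) == t + 1050: the guest sees its own
         * adjustment applied on top of the hypervisor's, matching
         * real-MSR semantics from the guest's point of view. */
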
      Signed-off-by: Will Auld <will.auld@intel.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  16. 30 Nov 2012 (1 commit)
  17. 14 Nov 2012 (1 commit)
  18. 03 Oct 2012 (1 commit)
  19. 19 Sep 2012 (2 commits)
  20. 10 Sep 2012 (1 commit)
  21. 21 Jul 2012 (1 commit)
  22. 26 Jun 2012 (1 commit)
  23. 22 Feb 2012 (1 commit)
  24. 27 Jan 2012 (1 commit)
  25. 26 Jan 2012 (1 commit)
  26. 27 Dec 2011 (1 commit)
  27. 26 Sep 2011 (1 commit)
  28. 16 Sep 2011 (1 commit)
    • asm alternatives: remove incorrect alignment notes · a7f934d4
      Committed by Linus Torvalds
      On x86-64, they were just wasteful: with the explicitly added (now
      unnecessary) padding, the size of the alternatives structure was 16
      bytes, and an alignment of 8 bytes didn't hurt much.
      
      However, it was still silly, since the natural size and alignment for
      the structure is actually just 12 bytes, 4-byte aligned since commit
      59e97e4d ("x86: Make alternative instruction pointers relative").
      So removing the padding and the extra alignment is just a good idea.
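
      A sketch of that record layout (field names per the kernel header
      after 59e97e4d; see arch/x86/include/asm/alternative.h for the
      authoritative definition):

        struct alt_instr {
                s32 instr_offset;   /* original instruction, relative */
                s32 repl_offset;    /* replacement instruction, relative */
                u16 cpuid;          /* CPU feature bit for replacement */
                u8  instrlen;       /* length of original instruction */
                u8  replacementlen; /* length of replacement */
        };  /* 4+4+2+1+1 = 12 bytes, so natural alignment is 4 bytes */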
      
      On x86-32, the alignment of 4 bytes was correct, but was incorrectly
      hardcoded as 8 bytes in <asm/alternative-asm.h>.  That header file
      used to be an x86-64-only header file, but various unification
      efforts (i.e. the unification of rwlock and rwsem) mean it is now
      used for x86-32 too.
      
      That in turn caused x86-32 boot failures, because the extra alignment
      would result in random zero-filled words in the altinstructions section,
      causing oopses early at boot when doing alternative instruction
      replacement.
      
      So just remove all the alignment noise entirely.  It's wrong, and it's
      unnecessary.  The section itself is already properly aligned by the
      linker scripts, and all additions to the section had better be of the
      proper 12-byte format, keeping it aligned.  So if the align directive
      were to ever make a difference, that would be an indication of a serious
      bug to begin with.
      Reported-by: Werner Landgraf <w.landgraf@ru.r>
      Acked-by: Andrew Lutomirski <luto@mit.edu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  29. 23 Aug 2011 (1 commit)
  30. 10 Aug 2011 (1 commit)
    • crypto: sha1 - SSSE3 based SHA1 implementation for x86-64 · 66be8951
      Committed by Mathias Krause
      This is an assembler implementation of the SHA1 algorithm using the
      Supplemental SSE3 (SSSE3) instructions or, when available, the
      Advanced Vector Extensions (AVX).
      
      Testing with the tcrypt module shows the raw hash performance is up
      to 2.3 times faster than the C implementation, using 8k data blocks
      on a Core 2 Duo T5500. For the smallest data set (16 bytes) it is
      still 25% faster.
      
      Since this implementation uses SSE/YMM registers it cannot safely be
      used in every situation, e.g. while an IRQ interrupts a kernel
      thread. The implementation falls back to the generic SHA1 variant if
      using the SSE/YMM registers is not possible.
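
      The guard around the SIMD path looks roughly like this (a sketch of
      the glue code; sha1_apply_transform_asm() is a hypothetical stand-in
      for the assembler entry point, while irq_fpu_usable(),
      kernel_fpu_begin()/kernel_fpu_end() and crypto_sha1_update() are
      real kernel APIs):

        static int sha1_ssse3_update(struct shash_desc *desc,
                                     const u8 *data, unsigned int len)
        {
                /* SSE/YMM state cannot be touched here, e.g. in an IRQ
                 * that interrupted a kernel thread: use the generic C
                 * implementation instead. */
                if (!irq_fpu_usable())
                        return crypto_sha1_update(desc, data, len);

                kernel_fpu_begin();      /* save FPU/SIMD state */
                sha1_apply_transform_asm(desc, data, len);
                kernel_fpu_end();        /* restore FPU/SIMD state */
                return 0;
        }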
      
      With this algorithm I was able to increase the throughput of a single
      IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using
      the SSSE3 variant -- a speedup of +34.8%.
      
      Saving and restoring SSE/YMM state might make the actual throughput
      fluctuate when there are FPU-intensive userland applications
      running. For example, measuring the performance using iperf2
      directly on the machine under test gives unstable numbers because
      iperf2 uses the FPU for each packet to check if the reporting
      interval has expired (in the above test I got min/max/avg:
      402/484/464 MBit/s).
      
      Using this algorithm on an IPsec gateway gives much more reasonable
      and stable numbers, albeit not as high as in the directly connected
      case. Here is the result from an RFC 2544 test run with an EXFO
      Packet Blazer FTB-8510:
      
       frame size    sha1-generic     sha1-ssse3    delta
          64 byte     37.5 MBit/s    37.5 MBit/s     0.0%
         128 byte     56.3 MBit/s    62.5 MBit/s   +11.0%
         256 byte     87.5 MBit/s   100.0 MBit/s   +14.3%
         512 byte    131.3 MBit/s   150.0 MBit/s   +14.2%
        1024 byte    162.5 MBit/s   193.8 MBit/s   +19.3%
        1280 byte    175.0 MBit/s   212.5 MBit/s   +21.4%
        1420 byte    175.0 MBit/s   218.7 MBit/s   +25.0%
        1518 byte    150.0 MBit/s   181.2 MBit/s   +20.8%
      
      The throughput for the largest frame size is lower than for the
      previous size because the IP packets need to be fragmented in this
      case to make their way through the IPsec tunnel.
      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Cc: Maxim Locktyukhin <maxim.locktyukhin@intel.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>