1. 28 8月, 2012 1 次提交
    • A
      KVM: VMX: Separate saving pre-realmode state from setting segments · baa7e81e
      Avi Kivity 提交于
      Commit b246dd5d ("KVM: VMX: Fix KVM_SET_SREGS with big real mode
      segments") moved fix_rmode_seg() to vmx_set_segment(), so that it is
      applied not just on transitions to real mode, but also on KVM_SET_SREGS
      (migration).  However fix_rmode_seg() not only munges the vmcs segments,
      it also sets up the save area for us to restore when returning to
      protected mode or to return in vmx_get_segment().
      
      Move saving the segment into a new function, save_rmode_seg(), and
      call it just during the transition.
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      baa7e81e
  2. 14 8月, 2012 2 次提交
  3. 06 8月, 2012 1 次提交
  4. 02 8月, 2012 1 次提交
    • A
      KVM: VMX: Fix ds/es corruption on i386 with preemption · aa67f609
      Avi Kivity 提交于
      Commit b2da15ac ("KVM: VMX: Optimize %ds, %es reload") broke i386
      in the following scenario:
      
        vcpu_load
        ...
        vmx_save_host_state
        vmx_vcpu_run
        (ds.rpl, es.rpl cleared by hardware)
      
        interrupt
          push ds, es  # pushes bad ds, es
          schedule
            vmx_vcpu_put
              vmx_load_host_state
                reload ds, es (with __USER_DS)
          pop ds, es  # of other thread's stack
          iret
        # other thread runs
        interrupt
          push ds, es
          schedule  # back in vcpu thread
          pop ds, es  # now with rpl=0
          iret
        ...
        vcpu_put
        resume_userspace
        iret  # clears ds, es due to mismatched rpl
      
      (instead of resume_userspace, we might return with SYSEXIT and then
      take an exception; when the exception IRETs we end up with cleared
      ds, es)
      
      Fix by avoiding the optimization on i386 and reloading ds, es on the
      lightweight exit path.
      Reported-by: NChris Clayron <chris2553@googlemail.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      aa67f609
  5. 21 7月, 2012 1 次提交
  6. 12 7月, 2012 1 次提交
    • M
      KVM: VMX: Implement PCID/INVPCID for guests with EPT · ad756a16
      Mao, Junjie 提交于
      This patch handles PCID/INVPCID for guests.
      
      Process-context identifiers (PCIDs) are a facility by which a logical processor
      may cache information for multiple linear-address spaces so that the processor
      may retain cached information when software switches to a different linear
      address space. Refer to section 4.10.1 in IA32 Intel Software Developer's Manual
      Volume 3A for details.
      
      For guests with EPT, the PCID feature is enabled and INVPCID behaves as running
      natively.
      For guests without EPT, the PCID feature is disabled and INVPCID triggers #UD.
      Signed-off-by: NJunjie Mao <junjie.mao@intel.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      ad756a16
  7. 11 7月, 2012 1 次提交
  8. 09 7月, 2012 8 次提交
  9. 04 7月, 2012 1 次提交
  10. 06 6月, 2012 1 次提交
  11. 05 6月, 2012 4 次提交
  12. 17 5月, 2012 2 次提交
    • A
      KVM: VMX: Optimize %ds, %es reload · b2da15ac
      Avi Kivity 提交于
      On x86_64, we can defer %ds and %es reload to the heavyweight context switch,
      since nothing in the lightweight paths uses the host %ds or %es (they are
      ignored by the processor).  Furthermore we can avoid the load if the segments
      are null, by letting the hardware load the null segments for us.  This is the
      expected case.
      
      On i386, we could avoid the reload entirely, since the entry.S paths take care
      of reload, except for the SYSEXIT path which leaves %ds and %es set to __USER_DS.
      So we set them to the same values as well.
      
      Saves about 70 cycles out of 1600 (around 4%; noisy measurements).
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      b2da15ac
    • A
      KVM: VMX: Fix %ds/%es clobber · 512d5649
      Avi Kivity 提交于
      The vmx exit code unconditionally restores %ds and %es to __USER_DS.  This
      can override the user's values, since %ds and %es are not saved and restored
      in x86_64 syscalls.  In practice, this isn't dangerous since nobody uses
      segment registers in long mode, least of all programs that use KVM.
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      512d5649
  13. 14 5月, 2012 1 次提交
    • X
      KVM: VMX: unlike vmcs on fail path · 5f3fbc34
      Xiao Guangrong 提交于
      fix:
      
      [ 1529.577273] Call Trace:
      [ 1529.577289]  [<ffffffffa060d58f>] kvm_arch_hardware_disable+0x13/0x30 [kvm]
      [ 1529.577302]  [<ffffffffa05fa2d4>] hardware_disable_nolock+0x35/0x39 [kvm]
      [ 1529.577311]  [<ffffffffa05fa29f>] ? cpumask_clear_cpu.constprop.31+0x13/0x13 [kvm]
      [ 1529.577315]  [<ffffffff81096ba8>] on_each_cpu+0x44/0x84
      [ 1529.577326]  [<ffffffffa05f98b5>] hardware_disable_all_nolock+0x34/0x36 [kvm]
      [ 1529.577335]  [<ffffffffa05f98e2>] hardware_disable_all+0x2b/0x39 [kvm]
      [ 1529.577349]  [<ffffffffa05fafe5>] kvm_put_kvm+0xed/0x10f [kvm]
      [ 1529.577358]  [<ffffffffa05fb3d7>] kvm_vm_release+0x22/0x28 [kvm]
      Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      5f3fbc34
  14. 19 4月, 2012 1 次提交
    • A
      KVM: VMX: Fix kvm_set_shared_msr() called in preemptible context · 2225fd56
      Avi Kivity 提交于
      kvm_set_shared_msr() may not be called in preemptible context,
      but vmx_set_msr() does so:
      
        BUG: using smp_processor_id() in preemptible [00000000] code: qemu-kvm/22713
        caller is kvm_set_shared_msr+0x32/0xa0 [kvm]
        Pid: 22713, comm: qemu-kvm Not tainted 3.4.0-rc3+ #39
        Call Trace:
         [<ffffffff8131fa82>] debug_smp_processor_id+0xe2/0x100
         [<ffffffffa0328ae2>] kvm_set_shared_msr+0x32/0xa0 [kvm]
         [<ffffffffa03a103b>] vmx_set_msr+0x28b/0x2d0 [kvm_intel]
         ...
      
      Making kvm_set_shared_msr() work in preemptible is cleaner, but
      it's used in the fast path.  Making two variants is overkill, so
      this patch just disables preemption around the call.
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      2225fd56
  15. 08 4月, 2012 1 次提交
  16. 06 4月, 2012 1 次提交
  17. 08 3月, 2012 6 次提交
  18. 22 2月, 2012 1 次提交
    • L
      i387: Split up <asm/i387.h> into exported and internal interfaces · 1361b83a
      Linus Torvalds 提交于
      While various modules include <asm/i387.h> to get access to things we
      actually *intend* for them to use, most of that header file was really
      pretty low-level internal stuff that we really don't want to expose to
      others.
      
      So split the header file into two: the small exported interfaces remain
      in <asm/i387.h>, while the internal definitions that are only used by
      core architecture code are now in <asm/fpu-internal.h>.
      
      The guiding principle for this was to expose functions that we export to
      modules, and leave them in <asm/i387.h>, while stuff that is used by
      task switching or was marked GPL-only is in <asm/fpu-internal.h>.
      
      The fpu-internal.h file could be further split up too, especially since
      arch/x86/kvm/ uses some of the remaining stuff for its module.  But that
      kvm usage should probably be abstracted out a bit, and at least now the
      internal FPU accessor functions are much more contained.  Even if it
      isn't perhaps as contained as it _could_ be.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1202211340330.5354@i5.linux-foundation.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      1361b83a
  19. 19 2月, 2012 1 次提交
    • L
      i387: move TS_USEDFPU flag from thread_info to task_struct · f94edacf
      Linus Torvalds 提交于
      This moves the bit that indicates whether a thread has ownership of the
      FPU from the TS_USEDFPU bit in thread_info->status to a word of its own
      (called 'has_fpu') in task_struct->thread.has_fpu.
      
      This fixes two independent bugs at the same time:
      
       - changing 'thread_info->status' from the scheduler causes nasty
         problems for the other users of that variable, since it is defined to
         be thread-synchronous (that's what the "TS_" part of the naming was
         supposed to indicate).
      
         So perfectly valid code could (and did) do
      
      	ti->status |= TS_RESTORE_SIGMASK;
      
         and the compiler was free to do that as separate load, or and store
         instructions.  Which can cause problems with preemption, since a task
         switch could happen in between, and change the TS_USEDFPU bit. The
         change to TS_USEDFPU would be overwritten by the final store.
      
         In practice, this seldom happened, though, because the 'status' field
         was seldom used more than once, so gcc would generally tend to
         generate code that used a read-modify-write instruction and thus
         happened to avoid this problem - RMW instructions are naturally low
         fat and preemption-safe.
      
       - On x86-32, the current_thread_info() pointer would, during interrupts
         and softirqs, point to a *copy* of the real thread_info, because
         x86-32 uses %esp to calculate the thread_info address, and thus the
         separate irq (and softirq) stacks would cause these kinds of odd
         thread_info copy aliases.
      
         This is normally not a problem, since interrupts aren't supposed to
         look at thread information anyway (what thread is running at
         interrupt time really isn't very well-defined), but it confused the
         heck out of irq_fpu_usable() and the code that tried to squirrel
         away the FPU state.
      
         (It also caused untold confusion for us poor kernel developers).
      
      It also turns out that using 'task_struct' is actually much more natural
      for most of the call sites that care about the FPU state, since they
      tend to work with the task struct for other reasons anyway (ie
      scheduling).  And the FPU data that we are going to save/restore is
      found there too.
      
      Thanks to Arjan Van De Ven <arjan@linux.intel.com> for pointing us to
      the %esp issue.
      
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Reported-and-tested-by: NRaphael Prevost <raphael@buro.asia>
      Acked-and-tested-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Tested-by: NPeter Anvin <hpa@zytor.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f94edacf
  20. 17 2月, 2012 1 次提交
    • L
      i387: don't ever touch TS_USEDFPU directly, use helper functions · 6d59d7a9
      Linus Torvalds 提交于
      This creates three helper functions that do the TS_USEDFPU accesses, and
      makes everybody that used to do it by hand use those helpers instead.
      
      In addition, there's a couple of helper functions for the "change both
      CR0.TS and TS_USEDFPU at the same time" case, and the places that do
      that together have been changed to use those.  That means that we have
      fewer random places that open-code this situation.
      
      The intent is partly to clarify the code without actually changing any
      semantics yet (since we clearly still have some hard to reproduce bug in
      this area), but also to make it much easier to use another approach
      entirely to caching the CR0.TS bit for software accesses.
      
      Right now we use a bit in the thread-info 'status' variable (this patch
      does not change that), but we might want to make it a full field of its
      own or even make it a per-cpu variable.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6d59d7a9
  21. 13 1月, 2012 1 次提交
  22. 27 12月, 2011 2 次提交