1. 09 7月, 2012 2 次提交
    • A
      KVM: VMX: Relax check on unusable segment · f0495f9b
      Avi Kivity 提交于
      Some userspace (e.g. QEMU 1.1) munge the d and g bits of segment
      descriptors, causing us not to recognize them as unusable segments
      with emulate_invalid_guest_state=1.  Relax the check by testing for
      segment not present (a non-present segment cannot be usable).
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      f0495f9b
    • A
      KVM: VMX: Return correct CPL during transition to protected mode · d881e6f6
      Avi Kivity 提交于
      In protected mode, the CPL is defined as the lower two bits of CS, as set by
      the last far jump.  But during the transition to protected mode, there is no
      last far jump, so we need to return zero (the inherited real mode CPL).
      
      Fix by reading CPL from the cache during the transition.  This isn't 100%
      correct since we don't set the CPL cache on a far jump, but since protected
      mode transition will always jump to a segment with RPL=0, it will always
      work.
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      d881e6f6
  2. 04 7月, 2012 1 次提交
  3. 06 6月, 2012 1 次提交
  4. 05 6月, 2012 4 次提交
  5. 17 5月, 2012 2 次提交
    • A
      KVM: VMX: Optimize %ds, %es reload · b2da15ac
      Avi Kivity 提交于
      On x86_64, we can defer %ds and %es reload to the heavyweight context switch,
      since nothing in the lightweight paths uses the host %ds or %es (they are
      ignored by the processor).  Furthermore we can avoid the load if the segments
      are null, by letting the hardware load the null segments for us.  This is the
      expected case.
      
      On i386, we could avoid the reload entirely, since the entry.S paths take care
      of reload, except for the SYSEXIT path which leaves %ds and %es set to __USER_DS.
      So we set them to the same values as well.
      
      Saves about 70 cycles out of 1600 (around 4%; noisy measurements).
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      b2da15ac
    • A
      KVM: VMX: Fix %ds/%es clobber · 512d5649
      Avi Kivity 提交于
      The vmx exit code unconditionally restores %ds and %es to __USER_DS.  This
      can override the user's values, since %ds and %es are not saved and restored
      in x86_64 syscalls.  In practice, this isn't dangerous since nobody uses
      segment registers in long mode, least of all programs that use KVM.
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      512d5649
  6. 14 5月, 2012 1 次提交
    • X
      KVM: VMX: unlike vmcs on fail path · 5f3fbc34
      Xiao Guangrong 提交于
      fix:
      
      [ 1529.577273] Call Trace:
      [ 1529.577289]  [<ffffffffa060d58f>] kvm_arch_hardware_disable+0x13/0x30 [kvm]
      [ 1529.577302]  [<ffffffffa05fa2d4>] hardware_disable_nolock+0x35/0x39 [kvm]
      [ 1529.577311]  [<ffffffffa05fa29f>] ? cpumask_clear_cpu.constprop.31+0x13/0x13 [kvm]
      [ 1529.577315]  [<ffffffff81096ba8>] on_each_cpu+0x44/0x84
      [ 1529.577326]  [<ffffffffa05f98b5>] hardware_disable_all_nolock+0x34/0x36 [kvm]
      [ 1529.577335]  [<ffffffffa05f98e2>] hardware_disable_all+0x2b/0x39 [kvm]
      [ 1529.577349]  [<ffffffffa05fafe5>] kvm_put_kvm+0xed/0x10f [kvm]
      [ 1529.577358]  [<ffffffffa05fb3d7>] kvm_vm_release+0x22/0x28 [kvm]
      Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      5f3fbc34
  7. 19 4月, 2012 1 次提交
    • A
      KVM: VMX: Fix kvm_set_shared_msr() called in preemptible context · 2225fd56
      Avi Kivity 提交于
      kvm_set_shared_msr() may not be called in preemptible context,
      but vmx_set_msr() does so:
      
        BUG: using smp_processor_id() in preemptible [00000000] code: qemu-kvm/22713
        caller is kvm_set_shared_msr+0x32/0xa0 [kvm]
        Pid: 22713, comm: qemu-kvm Not tainted 3.4.0-rc3+ #39
        Call Trace:
         [<ffffffff8131fa82>] debug_smp_processor_id+0xe2/0x100
         [<ffffffffa0328ae2>] kvm_set_shared_msr+0x32/0xa0 [kvm]
         [<ffffffffa03a103b>] vmx_set_msr+0x28b/0x2d0 [kvm_intel]
         ...
      
      Making kvm_set_shared_msr() work in preemptible is cleaner, but
      it's used in the fast path.  Making two variants is overkill, so
      this patch just disables preemption around the call.
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      2225fd56
  8. 08 4月, 2012 1 次提交
  9. 06 4月, 2012 1 次提交
  10. 08 3月, 2012 6 次提交
  11. 22 2月, 2012 1 次提交
    • L
      i387: Split up <asm/i387.h> into exported and internal interfaces · 1361b83a
      Linus Torvalds 提交于
      While various modules include <asm/i387.h> to get access to things we
      actually *intend* for them to use, most of that header file was really
      pretty low-level internal stuff that we really don't want to expose to
      others.
      
      So split the header file into two: the small exported interfaces remain
      in <asm/i387.h>, while the internal definitions that are only used by
      core architecture code are now in <asm/fpu-internal.h>.
      
      The guiding principle for this was to expose functions that we export to
      modules, and leave them in <asm/i387.h>, while stuff that is used by
      task switching or was marked GPL-only is in <asm/fpu-internal.h>.
      
      The fpu-internal.h file could be further split up too, especially since
      arch/x86/kvm/ uses some of the remaining stuff for its module.  But that
      kvm usage should probably be abstracted out a bit, and at least now the
      internal FPU accessor functions are much more contained.  Even if it
      isn't perhaps as contained as it _could_ be.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1202211340330.5354@i5.linux-foundation.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      1361b83a
  12. 19 2月, 2012 1 次提交
    • L
      i387: move TS_USEDFPU flag from thread_info to task_struct · f94edacf
      Linus Torvalds 提交于
      This moves the bit that indicates whether a thread has ownership of the
      FPU from the TS_USEDFPU bit in thread_info->status to a word of its own
      (called 'has_fpu') in task_struct->thread.has_fpu.
      
      This fixes two independent bugs at the same time:
      
       - changing 'thread_info->status' from the scheduler causes nasty
         problems for the other users of that variable, since it is defined to
         be thread-synchronous (that's what the "TS_" part of the naming was
         supposed to indicate).
      
         So perfectly valid code could (and did) do
      
      	ti->status |= TS_RESTORE_SIGMASK;
      
         and the compiler was free to do that as separate load, or and store
         instructions.  Which can cause problems with preemption, since a task
         switch could happen in between, and change the TS_USEDFPU bit. The
         change to TS_USEDFPU would be overwritten by the final store.
      
         In practice, this seldom happened, though, because the 'status' field
         was seldom used more than once, so gcc would generally tend to
         generate code that used a read-modify-write instruction and thus
         happened to avoid this problem - RMW instructions are naturally low
         fat and preemption-safe.
      
       - On x86-32, the current_thread_info() pointer would, during interrupts
         and softirqs, point to a *copy* of the real thread_info, because
         x86-32 uses %esp to calculate the thread_info address, and thus the
         separate irq (and softirq) stacks would cause these kinds of odd
         thread_info copy aliases.
      
         This is normally not a problem, since interrupts aren't supposed to
         look at thread information anyway (what thread is running at
         interrupt time really isn't very well-defined), but it confused the
         heck out of irq_fpu_usable() and the code that tried to squirrel
         away the FPU state.
      
         (It also caused untold confusion for us poor kernel developers).
      
      It also turns out that using 'task_struct' is actually much more natural
      for most of the call sites that care about the FPU state, since they
      tend to work with the task struct for other reasons anyway (ie
      scheduling).  And the FPU data that we are going to save/restore is
      found there too.
      
      Thanks to Arjan Van De Ven <arjan@linux.intel.com> for pointing us to
      the %esp issue.
      
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Reported-and-tested-by: NRaphael Prevost <raphael@buro.asia>
      Acked-and-tested-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Tested-by: NPeter Anvin <hpa@zytor.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f94edacf
  13. 17 2月, 2012 1 次提交
    • L
      i387: don't ever touch TS_USEDFPU directly, use helper functions · 6d59d7a9
      Linus Torvalds 提交于
      This creates three helper functions that do the TS_USEDFPU accesses, and
      makes everybody that used to do it by hand use those helpers instead.
      
      In addition, there's a couple of helper functions for the "change both
      CR0.TS and TS_USEDFPU at the same time" case, and the places that do
      that together have been changed to use those.  That means that we have
      fewer random places that open-code this situation.
      
      The intent is partly to clarify the code without actually changing any
      semantics yet (since we clearly still have some hard to reproduce bug in
      this area), but also to make it much easier to use another approach
      entirely to caching the CR0.TS bit for software accesses.
      
      Right now we use a bit in the thread-info 'status' variable (this patch
      does not change that), but we might want to make it a full field of its
      own or even make it a per-cpu variable.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6d59d7a9
  14. 13 1月, 2012 1 次提交
  15. 27 12月, 2011 6 次提交
    • A
      KVM: VMX: Intercept RDPMC · fee84b07
      Avi Kivity 提交于
      Intercept RDPMC and forward it to the PMU emulation code.
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      Signed-off-by: NGleb Natapov <gleb@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      fee84b07
    • A
      KVM: Move cpuid code to new file · 00b27a3e
      Avi Kivity 提交于
      The cpuid code has grown; put it into a separate file.
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      00b27a3e
    • X
      KVM: introduce id_to_memslot function · 28a37544
      Xiao Guangrong 提交于
      Introduce id_to_memslot to get memslot by slot id
      Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      28a37544
    • G
      KVM: VMX: remove unneeded vmx_load_host_state() calls. · 46199f33
      Gleb Natapov 提交于
      vmx_load_host_state() does not handle msrs switching (except
      MSR_KERNEL_GS_BASE) since commit 26bb0981. Remove call to it
      where it is no longer make sense.
      Signed-off-by: NGleb Natapov <gleb@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      46199f33
    • N
      KVM: nVMX: Fix warning-causing idt-vectoring-info behavior · 51cfe38e
      Nadav Har'El 提交于
      When L0 wishes to inject an interrupt while L2 is running, it emulates an exit
      to L1 with EXIT_REASON_EXTERNAL_INTERRUPT. This was explained in the original
      nVMX patch 23, titled "Correct handling of interrupt injection".
      
      Unfortunately, it is possible (though rare) that at this point there is valid
      idt_vectoring_info in vmcs02. For example, L1 injected some interrupt to L2,
      and when L2 tried to run this interrupt's handler, it got a page fault - so
      it returns the original interrupt vector in idt_vectoring_info. The problem
      is that if this is the case, we cannot exit to L1 with EXTERNAL_INTERRUPT
      like we wished to, because the VMX spec guarantees that idt_vectoring_info
      and exit_reason_external_interrupt can never happen together. This is not
      just specified in the spec - a KVM L1 actually prints a kernel warning
      "unexpected, valid vectoring info" if we violate this guarantee, and some
      users noticed these warnings in L1's logs.
      
      In order to better emulate a processor, which would never return the external
      interrupt and the idt-vectoring-info together, we need to separate the two
      injection steps: First, complete L1's injection into L2 (i.e., enter L2,
      injecting to it the idt-vectoring-info); Second, after entry into L2 succeeds
      and it exits back to L0, exit to L1 with the EXIT_REASON_EXTERNAL_INTERRUPT.
      Most of this is already in the code - the only change we need is to remain
      in L2 (and not exit to L1) in this case.
      
      Note that the previous patch ensures (by using KVM_REQ_IMMEDIATE_EXIT) that
      although we do enter L2 first, it will exit immediately after processing its
      injection, allowing us to promptly inject to L1.
      
      Note how we test vmcs12->idt_vectoring_info_field; This isn't really the
      vmcs12 value (we haven't exited to L1 yet, so vmcs12 hasn't been updated),
      but rather the place we save, at the end of vmx_vcpu_run, the vmcs02 value
      of this field. This was explained in patch 25 ("Correct handling of idt
      vectoring info") of the original nVMX patch series.
      
      Thanks to Dave Allan and to Federico Simoncelli for reporting this bug,
      to Abel Gordon for helping me figure out the solution, and to Avi Kivity
      for helping to improve it.
      Signed-off-by: NNadav Har'El <nyh@il.ibm.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      51cfe38e
    • N
      KVM: nVMX: Add KVM_REQ_IMMEDIATE_EXIT · d6185f20
      Nadav Har'El 提交于
      This patch adds a new vcpu->requests bit, KVM_REQ_IMMEDIATE_EXIT.
      This bit requests that when next entering the guest, we should run it only
      for as little as possible, and exit again.
      
      We use this new option in nested VMX: When L1 launches L2, but L0 wishes L1
      to continue running so it can inject an event to it, we unfortunately cannot
      just pretend to have run L2 for a little while - We must really launch L2,
      otherwise certain one-off vmcs12 parameters (namely, L1 injection into L2)
      will be lost. So the existing code runs L2 in this case.
      But L2 could potentially run for a long time until it exits, and the
      injection into L1 will be delayed. The new KVM_REQ_IMMEDIATE_EXIT allows us
      to request that L2 will be entered, as necessary, but will exit as soon as
      possible after entry.
      
      Our implementation of this request uses smp_send_reschedule() to send a
      self-IPI, with interrupts disabled. The interrupts remain disabled until the
      guest is entered, and then, after the entry is complete (often including
      processing an injection and jumping to the relevant handler), the physical
      interrupt is noticed and causes an exit.
      
      On recent Intel processors, we could have achieved the same goal by using
      MTF instead of a self-IPI. Another technique worth considering in the future
      is to use VM_EXIT_ACK_INTR_ON_EXIT and a highest-priority vector IPI - to
      slightly improve performance by avoiding the useless interrupt handler
      which ends up being called when smp_send_reschedule() is used.
      Signed-off-by: NNadav Har'El <nyh@il.ibm.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      d6185f20
  16. 17 11月, 2011 3 次提交
  17. 26 9月, 2011 7 次提交