1. 06 Dec 2009, 1 commit
  2. 03 Dec 2009, 17 commits
    • KVM: VMX: Fix comparison of guest efer with stale host value · d5696725
      Committed by Avi Kivity
      update_transition_efer() masks out some efer bits when deciding whether
      to switch the msr during guest entry; for example, NX is emulated using the
      mmu so we don't need to disable it, and LMA/LME are handled by the hardware.
      
      However, with shared msrs, the comparison is made against a stale value;
      at the time of the guest switch we may be running with another guest's efer.
      
      Fix by deferring the mask/compare to the actual point of guest entry.
      
      Noted by Marcelo.
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86 emulator: limit instructions to 15 bytes · eb3c79e6
      Committed by Avi Kivity
      While we are never normally passed an instruction that exceeds 15 bytes,
      smp games can cause us to attempt to interpret one, which will cause
      large latencies in non-preempt hosts.
      
      Cc: stable@kernel.org
      Signed-off-by: Avi Kivity <avi@redhat.com>
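The guard described above can be modeled in plain C. This is an illustrative sketch only, with hypothetical names rather than the kernel's: the emulator's byte-fetch path refuses to consume more than the architectural 15-byte maximum instead of looping indefinitely on a stream that never decodes.

```c
#include <assert.h>
#include <stddef.h>

/* x86 instructions are architecturally at most 15 bytes long. */
#define X86_MAX_INSN_LEN 15

struct fetch_ctx {
    const unsigned char *src;   /* guest instruction bytes */
    size_t pos;                 /* bytes consumed so far */
};

/* Fetch one more instruction byte; fail once the architectural limit is
 * reached, so a maliciously patched byte stream cannot cause an unbounded
 * decode loop (and the resulting latency spike on non-preempt hosts). */
static int insn_fetch(struct fetch_ctx *c, unsigned char *out)
{
    if (c->pos >= X86_MAX_INSN_LEN)
        return -1;              /* decode error: instruction too long */
    *out = c->src[c->pos++];
    return 0;
}
```

The 16th fetch attempt fails regardless of the bytes supplied, bounding the work per emulated instruction.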
    • KVM: x86: Add KVM_GET/SET_VCPU_EVENTS · 3cfc3092
      Committed by Jan Kiszka
      This new IOCTL exports all as-yet user-invisible state related to
      exceptions, interrupts, and NMIs. Together with appropriate user space
      changes, this fixes sporadic problems of vmsave/restore, live migration
      and system reset.
      
      [avi: future-proof abi by adding a flags field]
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86 shared msr infrastructure · 18863bdd
      Committed by Avi Kivity
      The various syscall-related MSRs are fairly expensive to switch.  Currently
      we switch them on every vcpu preemption, which is far too often:
      
      - if we're switching to a kernel thread (idle task, threaded interrupt,
        kernel-mode virtio server (vhost-net), for example) and back, then
        there's no need to switch those MSRs since kernel threads won't
        be exiting to userspace.
      
      - if we're switching to another guest running an identical OS, most likely
        those MSRs will have the same value, so there's little point in reloading
        them.
      
      - if we're running the same OS on the guest and host, the MSRs will have
        identical values and reloading is unnecessary.
      
      This patch uses the new user return notifiers to implement last-minute
      switching, and checks the msr values to avoid unnecessary reloading.
      Signed-off-by: Avi Kivity <avi@redhat.com>
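The last-minute switching described above can be sketched as a small userspace model, with made-up names standing in for the kernel's infrastructure: context switches only stage the desired MSR values cheaply, and the expensive write (a stand-in for wrmsr here) happens once, on return to userspace, and only for MSRs whose value actually changed.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SHARED_MSRS 3

static unsigned long long cpu_msr[NR_SHARED_MSRS];   /* what hardware holds */
static unsigned long long want_msr[NR_SHARED_MSRS];  /* what current context wants */
static bool dirty;
static int wrmsr_count;                              /* counts simulated wrmsr cost */

static void fake_wrmsr(int slot, unsigned long long val)
{
    cpu_msr[slot] = val;
    wrmsr_count++;
}

/* Called on every context switch: cheap, just records the desired value. */
static void set_shared_msr(int slot, unsigned long long val)
{
    want_msr[slot] = val;
    dirty = true;
}

/* User-return notifier: reconcile hardware with the desired values,
 * skipping MSRs whose value is already correct. */
static void on_user_return(void)
{
    if (!dirty)
        return;
    for (int i = 0; i < NR_SHARED_MSRS; i++)
        if (cpu_msr[i] != want_msr[i])
            fake_wrmsr(i, want_msr[i]);
    dirty = false;
}
```

Switching between two guests running identical OSes thus costs zero MSR writes, and switching to a kernel thread and back costs none either, since no user return happens in between.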
    • KVM: allow userspace to adjust kvmclock offset · afbcf7ab
      Committed by Glauber Costa
      When we migrate a kvm guest that uses pvclock between two hosts, we may
      suffer a large skew. This is because there can be significant differences
      between the monotonic clock of the hosts involved. When a new host with
      a much larger monotonic time starts running the guest, the view of time
      will be significantly impacted.
      
      Situation is much worse when we do the opposite, and migrate to a host with
      a smaller monotonic clock.
      
      This proposed ioctl allows userspace to inform us of the monotonic
      clock value on the source host, so we can keep the time skew small
      and, more importantly, ensure time never goes backwards. Userspace
      may also need to read back the current value, since from the first
      migration onwards it is no longer reflected by a simple call to
      clock_gettime().
      
      [marcelo: future-proof abi with a flags field]
      [jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it]
      Signed-off-by: Glauber Costa <glommer@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
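The offset bookkeeping behind this ioctl can be sketched as follows. This is a simplified model with hypothetical function names (the real interface is the KVM_GET_CLOCK/KVM_SET_CLOCK ioctl pair): the kernel stores the delta between the guest-visible clock and its own monotonic clock, so migrating means reading the clock on the source and setting it on the destination.

```c
#include <assert.h>

struct kvm_clock_state {
    long long kvmclock_offset;  /* guest clock minus host monotonic clock */
};

/* KVM_SET_CLOCK analogue: userspace supplies the guest-visible clock value;
 * we store the delta against our own monotonic clock. */
static void set_clock(struct kvm_clock_state *s,
                      long long guest_ns, long long host_mono_ns)
{
    s->kvmclock_offset = guest_ns - host_mono_ns;
}

/* KVM_GET_CLOCK analogue / guest read: apply the delta so the guest view
 * stays stable across hosts with different monotonic bases. */
static long long get_clock(const struct kvm_clock_state *s,
                           long long host_mono_ns)
{
    return host_mono_ns + s->kvmclock_offset;
}
```

Because the destination host initializes its offset from the source's guest-visible value, the guest clock continues from where it left off instead of jumping with the difference between the hosts' monotonic clocks.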
    • KVM: SVM: Cleanup NMI singlestep · 6be7d306
      Committed by Jan Kiszka
      Push the NMI-related singlestep variable into vcpu_svm. It's dealing
      with an AMD-specific deficit, nothing generic for x86.
      Acked-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      
       arch/x86/include/asm/kvm_host.h |    1 -
       arch/x86/kvm/svm.c              |   12 +++++++-----
       2 files changed, 7 insertions(+), 6 deletions(-)
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    • KVM: x86: Fix guest single-stepping while interruptible · 94fe45da
      Committed by Jan Kiszka
      Commit 705c5323 opened the doors of hell by unconditionally injecting
      single-step flags as long as guest_debug signaled this. This doesn't
      work when the guest branches into some interrupt or exception handler
      and triggers a vmexit with flag reloading.
      
      Fix it by saving cs:rip when user space requests single-stepping and
      restricting the trace flag injection to this guest code position.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
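The save-and-restrict logic can be modeled in a few lines. This is a hypothetical sketch, not the kernel's code: cs:rip is recorded when user space requests single-stepping, and the trace flag is injected only while the guest is still at that position, so a branch into an interrupt or exception handler no longer inherits a stray TF.

```c
#include <assert.h>

struct ss_state {
    unsigned long cs, rip;  /* position where single-step was requested */
    int pending;
};

/* User space enabled single-stepping while the guest sits at cs:rip. */
static void request_singlestep(struct ss_state *s,
                               unsigned long cs, unsigned long rip)
{
    s->cs = cs;
    s->rip = rip;
    s->pending = 1;
}

/* Should TF be injected for the next guest entry at cs:rip?  Only if the
 * guest is still exactly where the request was made. */
static int inject_tf(const struct ss_state *s,
                     unsigned long cs, unsigned long rip)
{
    return s->pending && s->cs == cs && s->rip == rip;
}
```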
    • KVM: Xen PV-on-HVM guest support · ffde22ac
      Committed by Ed Swierk
      Support for Xen PV-on-HVM guests can be implemented almost entirely in
      userspace, except for handling one annoying MSR that maps a Xen
      hypercall blob into guest address space.
      
      A generic mechanism to delegate MSR writes to userspace seems overkill
      and risks encouraging similar MSR abuse in the future.  Thus this patch
      adds special support for the Xen HVM MSR.
      
      I implemented a new ioctl, KVM_XEN_HVM_CONFIG, that lets userspace tell
      KVM which MSR the guest will write to, as well as the starting address
      and size of the hypercall blobs (one each for 32-bit and 64-bit) that
      userspace has loaded from files.  When the guest writes to the MSR, KVM
      copies one page of the blob from userspace to the guest.
      
      I've tested this patch with a hacked-up version of Gerd's userspace
      code, booting a number of guests (CentOS 5.3 i386 and x86_64, and
      FreeBSD 8.0-RC1 amd64) and exercising PV network and block devices.
      
      [jan: fix i386 build warning]
      [avi: future proof abi with a flags field]
      Signed-off-by: Ed Swierk <eswierk@aristanetworks.com>
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: SVM: Support Pause Filter in AMD processors · 565d0998
      Committed by Mark Langsdorf
      New AMD processors (Family 0x10 models 8+) support the Pause
      Filter Feature.  This feature creates a new field in the VMCB
      called Pause Filter Count.  If Pause Filter Count is greater
      than 0 and intercepting PAUSEs is enabled, the processor will
      increment an internal counter when a PAUSE instruction occurs
      instead of intercepting.  When the internal counter reaches the
      Pause Filter Count value, a PAUSE intercept will occur.
      
      This feature can be used to detect contended spinlocks,
      especially when the lock holding VCPU is not scheduled.
      Rescheduling another VCPU prevents the VCPU seeking the
      lock from wasting its quantum by spinning idly.
      
      Experimental results show that most spinlocks are held
      for less than 1000 PAUSE cycles or more than a few
      thousand.  Default the Pause Filter Counter to 3000 to
      detect the contended spinlocks.
      
      Processor support for this feature is indicated by a CPUID
      bit.
      
      On a 24 core system running 4 guests each with 16 VCPUs,
      this patch improved overall performance of each guest's
      32 job kernbench by approximately 3-5% when combined
      with a scheduler algorithm that caused the VCPU to
      sleep for a brief period. Further performance improvement
      may be possible with a more sophisticated yield algorithm.
      Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
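The counter behaviour the commit describes can be modeled as a small state machine. This is an illustrative userspace model of the hardware semantics, not KVM code: with a non-zero Pause Filter Count, the processor absorbs PAUSEs and only raises an intercept when the internal counter reaches the count; with the count at zero, every PAUSE intercepts as before.

```c
#include <assert.h>

struct pause_filter {
    int count;      /* VMCB Pause Filter Count; 0 = filtering disabled */
    int internal;   /* processor-internal PAUSE counter */
};

/* Returns 1 when a PAUSE should cause a #VMEXIT, 0 when the hardware
 * swallows it without exiting. */
static int on_pause(struct pause_filter *pf)
{
    if (pf->count == 0)
        return 1;                 /* no filtering: always intercept */
    if (++pf->internal < pf->count)
        return 0;                 /* absorbed by the filter */
    pf->internal = 0;
    return 1;                     /* counter reached the limit: intercept */
}
```

A short spin (fewer PAUSEs than the count) never exits, while a long contended spin eventually does, which is exactly the signal used to reschedule another VCPU.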
    • KVM: VMX: Add support for Pause-Loop Exiting · 4b8d54f9
      Committed by Zhai, Edwin
      New NHM processors will support Pause-Loop Exiting by adding 2 VM-execution
      control fields:
      PLE_Gap    - upper bound on the amount of time between two successive
                   executions of PAUSE in a loop.
      PLE_Window - upper bound on the amount of time a guest is allowed to execute in
                   a PAUSE loop
      
      If the time between this execution of PAUSE and the previous one
      exceeds PLE_Gap, the processor considers this PAUSE to belong to a
      new loop. Otherwise, the processor determines the total execution
      time of this loop (since the first PAUSE in it) and triggers a VM
      exit if that total exceeds PLE_Window.
      * Refer to SDM volume 3b, sections 21.6.13 & 22.1.3.
      
      Pause-Loop Exiting can be used to detect Lock-Holder Preemption,
      where one VP is scheduled out while holding a spinlock and other VPs
      spinning on the same lock are scheduled in, wasting CPU time.
      
      Our tests indicate that most spinlocks are held for less than 212
      cycles. Performance tests show that with 2X LP over-commitment we
      can get a +2% perf improvement for kernel build (even more perf
      gain with more LPs).
      Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
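The PLE_Gap/PLE_Window decision above can be sketched as a tiny model. The constants and names here are illustrative only, not the real VM-execution control encoding: a PAUSE arriving more than PLE_Gap cycles after the previous one starts a new loop; otherwise the loop's accumulated time is checked against PLE_Window.

```c
#include <assert.h>

struct ple {
    unsigned long gap;         /* PLE_Gap: max spacing within one loop */
    unsigned long window;      /* PLE_Window: max total loop time */
    unsigned long last_pause;  /* timestamp of the previous PAUSE */
    unsigned long loop_start;  /* timestamp of the first PAUSE in this loop */
};

/* Returns 1 if a PAUSE executed at time `now` (in cycles) triggers a
 * VM exit under Pause-Loop Exiting. */
static int ple_on_pause(struct ple *p, unsigned long now)
{
    if (now - p->last_pause > p->gap)
        p->loop_start = now;              /* gap exceeded: new loop begins */
    p->last_pause = now;
    return now - p->loop_start > p->window;
}
```

Tightly spaced PAUSEs accumulate toward the window and eventually exit, while an isolated PAUSE (or one after a long gap) simply restarts the accounting.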
    • KVM: x86: Rework guest single-step flag injection and filtering · 91586a3b
      Committed by Jan Kiszka
      Push TF and RF injection and filtering on guest single-stepping into
      the vendor get/set_rflags callbacks. This makes the whole mechanism
      more robust wrt user space IOCTL order and instruction emulations.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86: Refactor guest debug IOCTL handling · 355be0b9
      Committed by Jan Kiszka
      Much of the so-far vendor-specific code for setting up guest debug
      can actually be handled by the generic code. This also fixes a minor
      deficit in the SVM part wrt processing KVM_GUESTDBG_ENABLE.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Activate Virtualization On Demand · 10474ae8
      Committed by Alexander Graf
      X86 CPUs need to have some magic happening to enable the virtualization
      extensions on them. This magic can result in unpleasant results for
      users, like blocking other VMMs from working (vmx) or using invalid TLB
      entries (svm).
      
      Currently KVM activates virtualization when the respective kernel module
      is loaded. This blocks us from autoloading KVM modules without breaking
      other VMMs.
      
      To circumvent this problem at least a bit, this patch introduces
      on-demand activation of virtualization: the extensions are enabled
      on creation of the first virtual machine and disabled on destruction
      of the last one.
      
      So using this, KVM can be easily autoloaded, while keeping other
      hypervisors usable.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
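The first-VM/last-VM policy reduces to plain reference counting. This sketch uses made-up names (the real code also manages per-CPU state, CPU hotplug, and error paths): a usage counter turns the virtualization extensions on with the first VM and off with the last.

```c
#include <assert.h>

static int kvm_usage_count;     /* number of live VMs */
static int hardware_enabled;    /* are VMX/SVM extensions currently on? */

static int vm_created(void)
{
    if (kvm_usage_count++ == 0)
        hardware_enabled = 1;   /* stand-in for hardware_enable_all() */
    return 0;
}

static void vm_destroyed(void)
{
    if (--kvm_usage_count == 0)
        hardware_enabled = 0;   /* stand-in for hardware_disable_all() */
}
```

With this policy the kvm modules can stay loaded permanently; other VMMs only see the extensions held while at least one KVM guest actually exists.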
    • KVM: Move irq ack notifier list to arch independent code · 136bdfee
      Committed by Gleb Natapov
      The mask irq notifier list is already there.
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Maintain back mapping from irqchip/pin to gsi · 3e71f88b
      Committed by Gleb Natapov
      Maintain back mapping from irqchip/pin to gsi to speedup
      interrupt acknowledgment notifications.
      
      [avi: build fix on non-x86/ia64]
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Move irq sharing information to irqchip level · 1a6e4a8c
      Committed by Gleb Natapov
      This removes the assumption that the maximum number of GSIs is
      smaller than the number of pins. Sharing is tracked at the pin
      level, not the GSI level.
      
      [avi: no PIC on ia64]
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Don't pass kvm_run arguments · 851ba692
      Committed by Avi Kivity
      They're just copies of vcpu->run, which is readily accessible.
      Signed-off-by: Avi Kivity <avi@redhat.com>
  3. 02 Dec 2009, 3 commits
  4. 01 Dec 2009, 1 commit
    • x86, mm: Correct the implementation of is_untracked_pat_range() · ccef0864
      Committed by H. Peter Anvin
      The semantics the PAT code expect of is_untracked_pat_range() is "is
      this range completely contained inside the untracked region."  This
      means that checkin 8a271389 was
      technically wrong, because the implementation was needlessly confusing.
      
      The sane interface is for it to take a semiclosed range like just
      about everything else (as evidenced by the sheer number of "- 1"'s
      removed by that patch) so change the actual implementation to match.
      Reported-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jack Steiner <steiner@sgi.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      LKML-Reference: <20091119202341.GA4420@sgi.com>
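The "completely contained" semantics with a half-open range can be shown directly. This is a generic sketch with example bounds, not the actual untracked-region values: a range [start, end) counts as untracked only if it lies entirely inside the untracked region [r_start, r_end), which is what removes the sheer number of "- 1" adjustments that closed ranges require.

```c
#include <assert.h>

/* Is the half-open range [start, end) completely contained inside the
 * half-open region [r_start, r_end)? */
static int range_inside(unsigned long start, unsigned long end,
                        unsigned long r_start, unsigned long r_end)
{
    return start >= r_start && end <= r_end;
}
```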
  5. 27 Nov 2009, 12 commits
  6. 26 Nov 2009, 3 commits
    • x86: Clean up the loadsegment() macro · 64b028b2
      Committed by Ingo Molnar
      Make it readable in the source too, not just in the assembly output.
      No change in functionality.
      
      Cc: Brian Gerst <brgerst@gmail.com>
      LKML-Reference: <1259176706-5908-1-git-send-email-brgerst@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: Optimize loadsegment() · 79b0379c
      Committed by Brian Gerst
      Zero the input register in the exception handler instead of
      using an extra register to pass in a zero value.
      Signed-off-by: Brian Gerst <brgerst@gmail.com>
      LKML-Reference: <1259176706-5908-1-git-send-email-brgerst@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • block: add helpers to run flush_dcache_page() against a bio and a request's pages · 2d4dc890
      Committed by Ilya Loginov
      The mtdblock driver doesn't call flush_dcache_page for pages in a
      request. This causes problems on architectures where the icache
      doesn't fill from the dcache or with dcache aliases. The patch fixes
      this.
      
      The ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE symbol was introduced to avoid
      pointless empty cache-thrashing loops on architectures for which
      flush_dcache_page() is a no-op. The new helpers flush the pages on
      architectures where ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE equals 1 and
      do nothing otherwise.
      
      See "fix mtd_blkdevs problem with caches on some architectures" discussion
      on LKML for more information.
      Signed-off-by: Ilya Loginov <isloginov@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Peter Horton <phorton@bitbox.co.uk>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  7. 25 Nov 2009, 1 commit
    • x86: Rename global percpu symbol dr7 to cpu_dr7 · 28b4e0d8
      Committed by Tejun Heo
      Percpu symbols now occupy the same namespace as other global
      symbols and as such short global symbols without subsystem
      prefix tend to collide with local variables.  dr7 percpu
      variable used by x86 was hit by this. Rename it to cpu_dr7.
      
      The rename also makes it more consistent with its fellow
      cpu_debugreg percpu variable.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <20091125115856.GA17856@elte.hu>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
  8. 24 Nov 2009, 2 commits