1. 01 11月, 2012 1 次提交
    • X
      KVM: x86: fix vcpu->mmio_fragments overflow · 87da7e66
      Xiao Guangrong 提交于
      After commit b3356bf0 (KVM: emulator: optimize "rep ins" handling),
      the pieces of io data can be collected and write them to the guest memory
      or MMIO together
      
      Unfortunately, kvm splits the mmio access into 8 bytes and store them to
      vcpu->mmio_fragments. If the guest uses "rep ins" to move large data, it
      will cause vcpu->mmio_fragments overflow
      
      The bug can be exposed by isapc (-M isapc):
      
      [23154.818733] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
      [ ......]
      [23154.858083] Call Trace:
      [23154.859874]  [<ffffffffa04f0e17>] kvm_get_cr8+0x1d/0x28 [kvm]
      [23154.861677]  [<ffffffffa04fa6d4>] kvm_arch_vcpu_ioctl_run+0xcda/0xe45 [kvm]
      [23154.863604]  [<ffffffffa04f5a1a>] ? kvm_arch_vcpu_load+0x17b/0x180 [kvm]
      
      Actually, we can use one mmio_fragment to store a large mmio access then
      split it when we pass the mmio-exit-info to userspace. After that, we only
      need two entries to store mmio info for the cross-mmio pages access
      Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      87da7e66
  2. 23 9月, 2012 2 次提交
    • J
      KVM: x86: Fix guest debug across vcpu INIT reset · c8639010
      Jan Kiszka 提交于
      If we reset a vcpu on INIT, we so far overwrote dr7 as provided by
      KVM_SET_GUEST_DEBUG, and we also cleared switch_db_regs unconditionally.
      
      Fix this by saving the dr7 used for guest debugging and calculating the
      effective register value as well as switch_db_regs on any potential
      change. This will change to focus of the set_guest_debug vendor op to
      update_dp_bp_intercept.
      
      Found while trying to stop on start_secondary.
      Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      c8639010
    • A
      KVM: Add resampling irqfds for level triggered interrupts · 7a84428a
      Alex Williamson 提交于
      To emulate level triggered interrupts, add a resample option to
      KVM_IRQFD.  When specified, a new resamplefd is provided that notifies
      the user when the irqchip has been resampled by the VM.  This may, for
      instance, indicate an EOI.  Also in this mode, posting of an interrupt
      through an irqfd only asserts the interrupt.  On resampling, the
      interrupt is automatically de-asserted prior to user notification.
      This enables level triggered interrupts to be posted and re-enabled
      from vfio with no userspace intervention.
      
      All resampling irqfds can make use of a single irq source ID, so we
      reserve a new one for this interface.
      Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      7a84428a
  3. 22 9月, 2012 1 次提交
    • S
      x86, kvm: fix kvm's usage of kernel_fpu_begin/end() · b1a74bf8
      Suresh Siddha 提交于
      Preemption is disabled between kernel_fpu_begin/end() and as such
      it is not a good idea to use these routines in kvm_load/put_guest_fpu()
      which can be very far apart.
      
      kvm_load/put_guest_fpu() routines are already called with
      preemption disabled and KVM already uses the preempt notifier to save
      the guest fpu state using kvm_put_guest_fpu().
      
      So introduce __kernel_fpu_begin/end() routines which don't touch
      preemption and use them instead of kernel_fpu_begin/end()
      for KVM's use model of saving/restoring guest FPU state.
      
      Also with this change (and with eagerFPU model), fix the host cr0.TS vm-exit
      state in the case of VMX. For eagerFPU case, host cr0.TS is always clear.
      So no need to worry about it. For the traditional lazyFPU restore case,
      change the cr0.TS bit for the host state during vm-exit to be always clear
      and cr0.TS bit is set in the __vmx_load_host_state() when the FPU
      (guest FPU or the host task's FPU) state is not active. This ensures
      that the host/guest FPU state is properly saved, restored
      during context-switch and with interrupts (using irq_fpu_usable()) not
      stomping on the active FPU state.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Link: http://lkml.kernel.org/r/1348164109.26695.338.camel@sbsiddha-desk.sc.intel.com
      Cc: Avi Kivity <avi@redhat.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      b1a74bf8
  4. 20 9月, 2012 2 次提交
    • G
      KVM: optimize apic interrupt delivery · 1e08ec4a
      Gleb Natapov 提交于
      Most interrupt are delivered to only one vcpu. Use pre-build tables to
      find interrupt destination instead of looping through all vcpus. In case
      of logical mode loop only through vcpus in a logical cluster irq is sent
      to.
      Signed-off-by: NGleb Natapov <gleb@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      1e08ec4a
    • A
      KVM: MMU: Optimize pte permission checks · 97d64b78
      Avi Kivity 提交于
      walk_addr_generic() permission checks are a maze of branchy code, which is
      performed four times per lookup.  It depends on the type of access, efer.nxe,
      cr0.wp, cr4.smep, and in the near future, cr4.smap.
      
      Optimize this away by precalculating all variants and storing them in a
      bitmap.  The bitmap is recalculated when rarely-changing variables change
      (cr0, cr4) and is indexed by the often-changing variables (page fault error
      code, pte access permissions).
      
      The permission check is moved to the end of the loop, otherwise an SMEP
      fault could be reported as a false positive, when PDE.U=1 but PTE.U=0.
      Noted by Xiao Guangrong.
      
      The result is short, branch-free code.
      Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      97d64b78
  5. 19 9月, 2012 1 次提交
  6. 18 9月, 2012 1 次提交
    • M
      KVM: make processes waiting on vcpu mutex killable · 9fc77441
      Michael S. Tsirkin 提交于
      vcpu mutex can be held for unlimited time so
      taking it with mutex_lock on an ioctl is wrong:
      one process could be passed a vcpu fd and
      call this ioctl on the vcpu used by another process,
      it will then be unkillable until the owner exits.
      
      Call mutex_lock_killable instead and return status.
      Note: mutex_lock_interruptible would be even nicer,
      but I am not sure all users are prepared to handle EINTR
      from these ioctls. They might misinterpret it as an error.
      
      Cleanup paths expect a vcpu that can't be used by
      any userspace so this will always succeed - catch bugs
      by calling BUG_ON.
      
      Catch callers that don't check return state by adding
      __must_check.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      9fc77441
  7. 10 9月, 2012 1 次提交
    • X
      KVM: fix error paths for failed gfn_to_page() calls · 4484141a
      Xiao Guangrong 提交于
      This bug was triggered:
      [ 4220.198458] BUG: unable to handle kernel paging request at fffffffffffffffe
      [ 4220.203907] IP: [<ffffffff81104d85>] put_page+0xf/0x34
      ......
      [ 4220.237326] Call Trace:
      [ 4220.237361]  [<ffffffffa03830d0>] kvm_arch_destroy_vm+0xf9/0x101 [kvm]
      [ 4220.237382]  [<ffffffffa036fe53>] kvm_put_kvm+0xcc/0x127 [kvm]
      [ 4220.237401]  [<ffffffffa03702bc>] kvm_vcpu_release+0x18/0x1c [kvm]
      [ 4220.237407]  [<ffffffff81145425>] __fput+0x111/0x1ed
      [ 4220.237411]  [<ffffffff8114550f>] ____fput+0xe/0x10
      [ 4220.237418]  [<ffffffff81063511>] task_work_run+0x5d/0x88
      [ 4220.237424]  [<ffffffff8104c3f7>] do_exit+0x2bf/0x7ca
      
      The test case:
      
      	printf(fmt, ##args);		\
      	exit(-1);} while (0)
      
      static int create_vm(void)
      {
      	int sys_fd, vm_fd;
      
      	sys_fd = open("/dev/kvm", O_RDWR);
      	if (sys_fd < 0)
      		die("open /dev/kvm fail.\n");
      
      	vm_fd = ioctl(sys_fd, KVM_CREATE_VM, 0);
      	if (vm_fd < 0)
      		die("KVM_CREATE_VM fail.\n");
      
      	return vm_fd;
      }
      
      static int create_vcpu(int vm_fd)
      {
      	int vcpu_fd;
      
      	vcpu_fd = ioctl(vm_fd, KVM_CREATE_VCPU, 0);
      	if (vcpu_fd < 0)
      		die("KVM_CREATE_VCPU ioctl.\n");
      	printf("Create vcpu.\n");
      	return vcpu_fd;
      }
      
      static void *vcpu_thread(void *arg)
      {
      	int vm_fd = (int)(long)arg;
      
      	create_vcpu(vm_fd);
      	return NULL;
      }
      
      int main(int argc, char *argv[])
      {
      	pthread_t thread;
      	int vm_fd;
      
      	(void)argc;
      	(void)argv;
      
      	vm_fd = create_vm();
      	pthread_create(&thread, NULL, vcpu_thread, (void *)(long)vm_fd);
      	printf("Exit.\n");
      	return 0;
      }
      
      It caused by release kvm->arch.ept_identity_map_addr which is the
      error page.
      
      The parent thread can send KILL signal to the vcpu thread when it was
      exiting which stops faulting pages and potentially allocating memory.
      So gfn_to_pfn/gfn_to_page may fail at this time
      
      Fixed by checking the page before it is used
      Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      4484141a
  8. 06 9月, 2012 4 次提交
  9. 05 9月, 2012 3 次提交
  10. 31 8月, 2012 1 次提交
  11. 28 8月, 2012 2 次提交
  12. 22 8月, 2012 2 次提交
  13. 14 8月, 2012 1 次提交
  14. 09 8月, 2012 1 次提交
  15. 07 8月, 2012 2 次提交
  16. 06 8月, 2012 4 次提交
  17. 05 8月, 2012 1 次提交
  18. 04 8月, 2012 1 次提交
  19. 02 8月, 2012 1 次提交
    • B
      KVM: x86: apply kvmclock offset to guest wall clock time · 4b648665
      Bruce Rogers 提交于
      When a guest migrates to a new host, the system time difference from the
      previous host is used in the updates to the kvmclock system time visible
      to the guest, resulting in a continuation of correct kvmclock based guest
      timekeeping.
      
      The wall clock component of the kvmclock provided time is currently not
      updated with this same time offset. Since the Linux guest caches the
      wall clock based time, this discrepency is not noticed until the guest is
      rebooted. After reboot the guest's time calculations are off.
      
      This patch adjusts the wall clock by the kvmclock_offset, resulting in
      correct guest time after a reboot.
      
      Cc: Zachary Amsden <zamsden@gmail.com>
      Signed-off-by: NBruce Rogers <brogers@suse.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      4b648665
  20. 01 8月, 2012 1 次提交
  21. 26 7月, 2012 1 次提交
  22. 21 7月, 2012 1 次提交
  23. 20 7月, 2012 1 次提交
    • X
      KVM: x86: remove unnecessary mark_page_dirty · 9d3c92af
      Xiao Guangrong 提交于
      fix:
      [  132.474633] 3.5.0-rc1+ #50 Not tainted
      [  132.474634] -------------------------------
      [  132.474635] include/linux/kvm_host.h:369 suspicious rcu_dereference_check() usage!
      [  132.474636]
      [  132.474636] other info that might help us debug this:
      [  132.474636]
      [  132.474638]
      [  132.474638] rcu_scheduler_active = 1, debug_locks = 1
      [  132.474640] 1 lock held by qemu-kvm/2832:
      [  132.474657]  #0:  (&vcpu->mutex){+.+.+.}, at: [<ffffffffa01e1636>] vcpu_load+0x1e/0x91 [kvm]
      [  132.474658]
      [  132.474658] stack backtrace:
      [  132.474660] Pid: 2832, comm: qemu-kvm Not tainted 3.5.0-rc1+ #50
      [  132.474661] Call Trace:
      [  132.474665]  [<ffffffff81092f40>] lockdep_rcu_suspicious+0xfc/0x105
      [  132.474675]  [<ffffffffa01e0c85>] kvm_memslots+0x6d/0x75 [kvm]
      [  132.474683]  [<ffffffffa01e0ca1>] gfn_to_memslot+0x14/0x4c [kvm]
      [  132.474693]  [<ffffffffa01e3575>] mark_page_dirty+0x17/0x2a [kvm]
      [  132.474706]  [<ffffffffa01f21ea>] kvm_arch_vcpu_ioctl+0xbcf/0xc07 [kvm]
      
      Actually, we do not write vcpu->arch.time at this time, mark_page_dirty
      should be removed.
      Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      9d3c92af
  24. 19 7月, 2012 1 次提交
  25. 12 7月, 2012 1 次提交
    • M
      KVM: VMX: Implement PCID/INVPCID for guests with EPT · ad756a16
      Mao, Junjie 提交于
      This patch handles PCID/INVPCID for guests.
      
      Process-context identifiers (PCIDs) are a facility by which a logical processor
      may cache information for multiple linear-address spaces so that the processor
      may retain cached information when software switches to a different linear
      address space. Refer to section 4.10.1 in IA32 Intel Software Developer's Manual
      Volume 3A for details.
      
      For guests with EPT, the PCID feature is enabled and INVPCID behaves as running
      natively.
      For guests without EPT, the PCID feature is disabled and INVPCID triggers #UD.
      Signed-off-by: NJunjie Mao <junjie.mao@intel.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      ad756a16
  26. 09 7月, 2012 1 次提交
  27. 25 6月, 2012 1 次提交
    • M
      KVM: host side for eoi optimization · ae7a2a3f
      Michael S. Tsirkin 提交于
      Implementation of PV EOI using shared memory.
      This reduces the number of exits an interrupt
      causes as much as by half.
      
      The idea is simple: there's a bit, per APIC, in guest memory,
      that tells the guest that it does not need EOI.
      We set it before injecting an interrupt and clear
      before injecting a nested one. Guest tests it using
      a test and clear operation - this is necessary
      so that host can detect interrupt nesting -
      and if set, it can skip the EOI MSR.
      
      There's a new MSR to set the address of said register
      in guest memory. Otherwise not much changed:
      - Guest EOI is not required
      - Register is tested & ISR is automatically cleared on exit
      
      For testing results see description of previous patch
      'kvm_para: guest side for eoi avoidance'.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      ae7a2a3f