1. 08 6月, 2022 29 次提交
  2. 07 6月, 2022 7 次提交
    • M
      KVM: SVM: fix tsc scaling cache logic · 11d39e8c
      Maxim Levitsky 提交于
      SVM uses a per-cpu variable to cache the current value of the
      tsc scaling multiplier msr on each cpu.
      
      Commit 1ab9287a
      ("KVM: X86: Add vendor callbacks for writing the TSC multiplier")
      broke this caching logic.
      
      Refactor the code so that all TSC scaling multiplier writes go through
      a single function which checks and updates the cache.
      
      This fixes the following scenario:
      
      1. A CPU runs a guest with some tsc scaling ratio.
      
      2. New guest with different tsc scaling ratio starts on this CPU
         and terminates almost immediately.
      
         This ensures that the short running guest had set the tsc scaling ratio just
         once when it was set via KVM_SET_TSC_KHZ. Due to the bug,
         the per-cpu cache is not updated.
      
      3. The original guest continues to run, it doesn't restore the msr
         value back to its own value, because the cache matches,
         and thus continues to run with a wrong tsc scaling ratio.
      
      Fixes: 1ab9287a ("KVM: X86: Add vendor callbacks for writing the TSC multiplier")
      Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220606181149.103072-1-mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      11d39e8c
    • V
      KVM: selftests: Make hyperv_clock selftest more stable · eae260be
      Vitaly Kuznetsov 提交于
      hyperv_clock doesn't always give a stable test result, especially with
      AMD CPUs. The test compares Hyper-V MSR clocksource (acquired either
      with rdmsr() from within the guest or KVM_GET_MSRS from the host)
      against rdtsc(). To increase the accuracy, increase the measured delay
      (done with nop loop) by two orders of magnitude and take the mean rdtsc()
      value before and after rdmsr()/KVM_GET_MSRS.
      Reported-by: NMaxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com>
      Tested-by: NMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220601144322.1968742-1-vkuznets@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      eae260be
    • B
      KVM: x86/MMU: Zap non-leaf SPTEs when disabling dirty logging · 5ba7c4c6
      Ben Gardon 提交于
      Currently disabling dirty logging with the TDP MMU is extremely slow.
      On a 96 vCPU / 96G VM backed with gigabyte pages, it takes ~200 seconds
      to disable dirty logging with the TDP MMU, as opposed to ~4 seconds with
      the shadow MMU.
      
      When disabling dirty logging, zap non-leaf parent entries to allow
      replacement with huge pages instead of recursing and zapping all of the
      child, leaf entries. This reduces the number of TLB flushes required.
      and reduces the disable dirty log time with the TDP MMU to ~3 seconds.
      
      Opportunistically add a WARN() to catch GFNs that are mapped at a
      higher level than their max level.
      Signed-off-by: NBen Gardon <bgardon@google.com>
      Message-Id: <20220525230904.1584480-1-bgardon@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      5ba7c4c6
    • J
      x86: drop bogus "cc" clobber from __try_cmpxchg_user_asm() · 1df931d9
      Jan Beulich 提交于
      As noted (and fixed) a couple of times in the past, "=@cc<cond>" outputs
      and clobbering of "cc" don't work well together. The compiler appears to
      mean to reject such, but doesn't - in its upstream form - quite manage
      to yet for "cc". Furthermore two similar macros don't clobber "cc", and
      clobbering "cc" is pointless in asm()-s for x86 anyway - the compiler
      always assumes status flags to be clobbered there.
      
      Fixes: 989b5db2 ("x86/uaccess: Implement macros for CMPXCHG on user addresses")
      Signed-off-by: NJan Beulich <jbeulich@suse.com>
      Message-Id: <485c0c0b-a3a7-0b7c-5264-7d00c01de032@suse.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      1df931d9
    • S
      KVM: x86/mmu: Check every prev_roots in __kvm_mmu_free_obsolete_roots() · cf4a8693
      Shaoqin Huang 提交于
      When freeing obsolete previous roots, check prev_roots as intended, not
      the current root.
      Signed-off-by: NShaoqin Huang <shaoqin.huang@intel.com>
      Fixes: 527d5cd7 ("KVM: x86/mmu: Zap only obsolete roots if a root shadow page is zapped")
      Message-Id: <20220607005905.2933378-1-shaoqin.huang@intel.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: NSean Christopherson <seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      cf4a8693
    • S
      entry/kvm: Exit to user mode when TIF_NOTIFY_SIGNAL is set · 3e684903
      Seth Forshee 提交于
      A livepatch transition may stall indefinitely when a kvm vCPU is heavily
      loaded. To the host, the vCPU task is a user thread which is spending a
      very long time in the ioctl(KVM_RUN) syscall. During livepatch
      transition, set_notify_signal() will be called on such tasks to
      interrupt the syscall so that the task can be transitioned. This
      interrupts guest execution, but when xfer_to_guest_mode_work() sees that
      TIF_NOTIFY_SIGNAL is set but not TIF_SIGPENDING it concludes that an
      exit to user mode is unnecessary, and guest execution is resumed without
      transitioning the task for the livepatch.
      
      This handling of TIF_NOTIFY_SIGNAL is incorrect, as set_notify_signal()
      is expected to break tasks out of interruptible kernel loops and cause
      them to return to userspace. Change xfer_to_guest_mode_work() to handle
      TIF_NOTIFY_SIGNAL the same as TIF_SIGPENDING, signaling to the vCPU run
      loop that an exit to userpsace is needed. Any pending task_work will be
      run when get_signal() is called from exit_to_user_mode_loop(), so there
      is no longer any need to run task work from xfer_to_guest_mode_work().
      Suggested-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Signed-off-by: NSeth Forshee <sforshee@digitalocean.com>
      Message-Id: <20220504180840.2907296-1-sforshee@digitalocean.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      3e684903
    • A
      KVM: Don't null dereference ops->destroy · e8bc2427
      Alexey Kardashevskiy 提交于
      A KVM device cleanup happens in either of two callbacks:
      1) destroy() which is called when the VM is being destroyed;
      2) release() which is called when a device fd is closed.
      
      Most KVM devices use 1) but Book3s's interrupt controller KVM devices
      (XICS, XIVE, XIVE-native) use 2) as they need to close and reopen during
      the machine execution. The error handling in kvm_ioctl_create_device()
      assumes destroy() is always defined which leads to NULL dereference as
      discovered by Syzkaller.
      
      This adds a checks for destroy!=NULL and adds a missing release().
      
      This is not changing kvm_destroy_devices() as devices with defined
      release() should have been removed from the KVM devices list by then.
      Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e8bc2427
  3. 06 6月, 2022 4 次提交