1. 07 May, 2021: 2 commits
  2. 26 Apr, 2021: 5 commits
  3. 24 Apr, 2021: 1 commit
  4. 22 Apr, 2021: 2 commits
    • KVM: Boost vCPU candidate in user mode which is delivering interrupt · 52acd22f
      Authored by Wanpeng Li
      Both the lock-holder vCPU and a halted IPI receiver are candidates for a
      boost. However, the PLE handler was originally designed to deal with the
      lock-holder preemption problem: Intel PLE occurs when the spinlock
      waiter is in kernel mode. That assumption doesn't hold for IPI receivers,
      which can be in either kernel or user mode, so a vCPU candidate in user
      mode is not boosted even when it should respond to an IPI. Some benchmarks
      such as pbzip2 and swaptions do the TLB shootdown in kernel mode but spend
      most of their time in user mode. This can lead to a long run of continuous
      PLE events, because the IPI sender keeps triggering PLE events until the
      receiver is scheduled, while the receiver is never a candidate for a boost.
      
      This patch boosts a vCPU candidate in user mode which is delivering an
      interrupt. With it, pbzip2 speeds up by about 10% in a 96-vCPU VM in an
      over-subscription scenario (the host is a 2-socket, 48-core, 96-thread
      Intel CLX box). There is no performance regression for other benchmarks
      such as Unixbench spawn (which mostly contends a read/write lock in
      kernel mode) and ebizzy (which mostly contends a read/write semaphore
      and does TLB shootdowns in kernel mode).
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1618542490-14756-1-git-send-email-wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      52acd22f
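
      A rough C sketch of the kind of check this commit describes: when the
      PLE handler looks for a vCPU worth yielding to, a user-mode vCPU with an
      interrupt being delivered is treated as a valid boost target too. The
      helper names below are assumptions for illustration, not the literal
      upstream diff.

          /* Sketch: is this preempted vCPU a useful boost target? (names assumed) */
          static bool vcpu_is_boost_candidate(struct kvm_vcpu *vcpu)
          {
                  /* Classic case: a likely lock holder preempted in kernel mode. */
                  if (kvm_arch_vcpu_in_kernel(vcpu))
                          return true;

                  /* New case from this commit: a user-mode vCPU with an interrupt
                   * (e.g. a TLB-shootdown IPI) pending delivery should also be
                   * boosted, so the sender stops PLE-exiting in a tight loop.
                   */
                  return kvm_arch_dy_has_pending_interrupt(vcpu);
          }
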
    • KVM: x86: Support KVM VMs sharing SEV context · 54526d1f
      Authored by Nathan Tempelman
      Add a capability for userspace to mirror the SEV encryption context from
      one VM to another. On our side, this is intended to support a
      Migration Helper vCPU, but it can also be used generically to support
      other in-guest workloads scheduled by the host. The intention is for
      the primary guest and the mirror to have nearly identical memslots.
      
      The primary benefits of this are that:
      1) The VMs do not share KVM contexts (think APIC/MSRs/etc), so they
      can't accidentally clobber each other.
      2) The VMs can have different memory views, which is necessary for post-copy
      migration (the migration vCPUs on the target need to read and write pages
      on which the primary guest would VMEXIT).
      
      This does not change the threat model for AMD SEV. Any memory involved
      is still owned by the primary guest and its initial state is still
      attested to through the normal SEV_LAUNCH_* flows. If userspace wanted
      to circumvent SEV, it could achieve the same effect by simply attaching
      a vCPU to the primary VM.

      This patch deliberately leaves userspace in charge of the memslots for the
      mirror, as it already has the power to mess with them in the primary guest.
      
      This patch does not support SEV-ES (much less SNP), as it does not
      handle handing off attested VMSAs to the mirror.
      
      For additional context, we need a Migration Helper because SEV PSP
      migration is far too slow for our live migration on its own. Using
      an in-guest migrator lets us speed this up significantly.
      Signed-off-by: Nathan Tempelman <natet@google.com>
      Message-Id: <20210408223214.2582277-1-natet@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      54526d1f
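
      A minimal userspace sketch of how this capability could be consumed,
      assuming it is exposed as KVM_CAP_VM_COPY_ENC_CONTEXT_FROM and enabled
      via KVM_ENABLE_CAP on the mirror VM with the source VM's fd in args[0];
      treat the names and calling convention as assumptions rather than a
      verbatim copy of the upstream API.

          #include <sys/ioctl.h>
          #include <linux/kvm.h>

          /* Sketch: create a mirror VM that shares the primary VM's SEV context. */
          int make_sev_mirror(int kvm_fd, int primary_vm_fd)
          {
                  int mirror_vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0);
                  if (mirror_vm_fd < 0)
                          return -1;

                  struct kvm_enable_cap cap = {
                          .cap  = KVM_CAP_VM_COPY_ENC_CONTEXT_FROM, /* assumed name */
                          .args = { (unsigned long)primary_vm_fd },
                  };
                  /* After this, the mirror reuses the primary's SEV encryption
                   * context; userspace still sets up the mirror's memslots itself.
                   */
                  if (ioctl(mirror_vm_fd, KVM_ENABLE_CAP, &cap) < 0)
                          return -1;
                  return mirror_vm_fd;
          }
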
  5. 20 Apr, 2021: 4 commits
  6. 17 Apr, 2021: 2 commits
  7. 01 Apr, 2021: 3 commits
    • KVM: x86: Prevent 'hv_clock->system_time' from going negative in kvm_guest_time_update() · 77fcbe82
      Authored by Vitaly Kuznetsov
      When guest time is reset with KVM_SET_CLOCK(0), it is possible for
      'hv_clock->system_time' to become a small negative number. This happens
      because the KVM_SET_CLOCK handler sets 'kvm->arch.kvmclock_offset' based
      on get_kvmclock_ns(kvm), but when KVM_REQ_CLOCK_UPDATE is handled,
      kvm_guest_time_update() does (in the masterclock-in-use case):
      
      hv_clock.system_time = ka->master_kernel_ns + v->kvm->arch.kvmclock_offset;
      
      'master_kernel_ns' represents the last time the masterclock was updated,
      which can precede the KVM_SET_CLOCK() call. Normally this is not a
      problem, as the difference is very small, e.g. I'm observing
      hv_clock.system_time = -70 ns. The issue comes from the fact that
      'hv_clock.system_time' is stored as unsigned, so 'system_time / 100' in
      compute_tsc_page_parameters() becomes a very big number.
      
      Use 'master_kernel_ns' instead of get_kvmclock_ns() when masterclock is in
      use and get_kvmclock_base_ns() when it's not to prevent 'system_time' from
      going negative.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210331124130.337992-2-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      77fcbe82
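
      A hedged sketch of the offset computation the commit describes for the
      KVM_SET_CLOCK path; the field names (use_master_clock, master_kernel_ns,
      kvmclock_offset) are assumed from the description above rather than
      copied from the upstream diff.

          /* Sketch: compute "now" from the same reference that
           * kvm_guest_time_update() will later add the offset to, so the
           * resulting hv_clock.system_time can never dip below zero.
           */
          u64 now_raw_ns;

          if (ka->use_master_clock)
                  now_raw_ns = ka->master_kernel_ns;      /* masterclock in use */
          else
                  now_raw_ns = get_kvmclock_base_ns();    /* masterclock not in use */
          ka->kvmclock_offset = user_ns.clock - now_raw_ns;
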
    • KVM: x86: disable interrupts while pvclock_gtod_sync_lock is taken · a83829f5
      Authored by Paolo Bonzini
      pvclock_gtod_sync_lock can be taken with interrupts disabled if the
      preempt notifier calls get_kvmclock_ns to update the Xen
      runstate information:
      
         spin_lock include/linux/spinlock.h:354 [inline]
         get_kvmclock_ns+0x25/0x390 arch/x86/kvm/x86.c:2587
         kvm_xen_update_runstate+0x3d/0x2c0 arch/x86/kvm/xen.c:69
         kvm_xen_update_runstate_guest+0x74/0x320 arch/x86/kvm/xen.c:100
         kvm_xen_runstate_set_preempted arch/x86/kvm/xen.h:96 [inline]
         kvm_arch_vcpu_put+0x2d8/0x5a0 arch/x86/kvm/x86.c:4062
      
      So change the users of the spinlock to spin_lock_irqsave and
      spin_unlock_irqrestore.
      
      Reported-by: syzbot+b282b65c2c68492df769@syzkaller.appspotmail.com
      Fixes: 30b5c851 ("KVM: x86/xen: Add support for vCPU runstate information")
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a83829f5
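
      A short sketch of the locking change described above: each taker of
      pvclock_gtod_sync_lock switches from plain spin_lock() to the
      irqsave/irqrestore variants, so the lock is safe to take from the
      preempt-notifier path where interrupts are already disabled. Illustrative
      only, not the full diff.

          unsigned long flags;

          /* Before: spin_lock(&ka->pvclock_gtod_sync_lock); */
          spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags);
          /* ... read or update the kvmclock/gtod copy ... */
          spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags);
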
    • KVM: x86: reduce pvclock_gtod_sync_lock critical sections · c2c647f9
      Authored by Paolo Bonzini
      There is no need to include changes to vcpu->requests into
      the pvclock_gtod_sync_lock critical section.  The changes to
      the shared data structures (in pvclock_update_vm_gtod_copy)
      already occur under the lock.
      
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c2c647f9
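
      A rough sketch of the reshuffle this commit describes: only the update
      of the shared gtod copy stays under pvclock_gtod_sync_lock, while the
      per-vCPU request bookkeeping moves outside it. Function and field names
      follow the commit text but are otherwise assumptions.

          spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags);
          pvclock_update_vm_gtod_copy(kvm);   /* shared data: keep under the lock */
          spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags);

          /* vcpu->requests updates do not need the lock. */
          kvm_for_each_vcpu(i, vcpu, kvm)
                  kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
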
  8. 31 Mar, 2021: 1 commit
  9. 19 Mar, 2021: 2 commits
    • KVM: X86: Fix missing local pCPU when executing wbinvd on all dirty pCPUs · c2162e13
      Authored by Wanpeng Li
      In order to deal with noncoherent DMA, we should execute wbinvd on
      all dirty pCPUs when the guest's wbinvd exits, to maintain data consistency.
      smp_call_function_many() does not execute the provided function on the
      local core, so replace it with on_each_cpu_mask().
      Reported-by: Nadav Amit <namit@vmware.com>
      Cc: Nadav Amit <namit@vmware.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1615517151-7465-1-git-send-email-wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c2162e13
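
      A hedged sketch of the emulation path after the change: unlike
      smp_call_function_many(), on_each_cpu_mask() also runs the callback on
      the calling CPU when it is in the mask. The helper and mask names follow
      the commit text and may not match the upstream code exactly.

          static void wbinvd_ipi(void *ignored)
          {
                  wbinvd();
          }

          /* On a guest WBINVD exit with noncoherent DMA assigned: */
          cpu = get_cpu();
          cpumask_set_cpu(cpu, vcpu->arch.wbinvd_dirty_mask);
          /* on_each_cpu_mask() includes the local CPU;
           * smp_call_function_many() would silently skip it.
           */
          on_each_cpu_mask(vcpu->arch.wbinvd_dirty_mask, wbinvd_ipi, NULL, 1);
          put_cpu();
          cpumask_clear(vcpu->arch.wbinvd_dirty_mask);
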
    • KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ish · b318e8de
      Authored by Sean Christopherson
      Fix a plethora of issues with MSR filtering by installing the resulting
      filter as an atomic bundle instead of updating the live filter one range
      at a time.  The KVM_X86_SET_MSR_FILTER ioctl() isn't truly atomic, as
      the hardware MSR bitmaps won't be updated until the next VM-Enter, but
      the relevant software struct is atomically updated, which is what KVM
      really needs.
      
      Similar to the approach used for modifying memslots, make arch.msr_filter
      a SRCU-protected pointer, do all the work configuring the new filter
      outside of kvm->lock, and then acquire kvm->lock only when the new filter
      has been vetted and created.  That way vCPU readers either see the old
      filter or the new filter in their entirety, not some half-baked state.
      
      Yuan Yao pointed out a use-after-free in kvm_msr_allowed() due to a
      TOCTOU bug [*], but that's just the tip of the iceberg...
      
        - Nothing is __rcu annotated, making it nigh impossible to audit the
          code for correctness.
        - kvm_add_msr_filter() has an unpaired smp_wmb().  Violation of kernel
          coding style aside, the lack of an smp_rmb() anywhere casts all code
          into doubt.
        - kvm_clear_msr_filter() has a double-free TOCTOU bug, as it grabs
          count before taking the lock.
        - kvm_clear_msr_filter() also has a memory leak due to the same TOCTOU bug.
      
      The entire approach of updating the live filter is also flawed.  While
      installing a new filter is inherently racy if vCPUs are running, fixing
      the above issues also makes it trivial to ensure certain behavior is
      deterministic, e.g. KVM can provide deterministic behavior for MSRs with
      identical settings in the old and new filters.  An atomic update of the
      filter also prevents KVM from getting into a half-baked state, e.g. if
      installing a filter fails, the existing approach would leave the filter
      in a half-baked state, having already committed whatever bits of the
      filter were already processed.
      
      [*] https://lkml.kernel.org/r/20210312083157.25403-1-yaoyuan0329os@gmail.com
      
      Fixes: 1a155254 ("KVM: x86: Introduce MSR filtering")
      Cc: stable@vger.kernel.org
      Cc: Alexander Graf <graf@amazon.com>
      Reported-by: Yuan Yao <yaoyuan0329os@gmail.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210316184436.2544875-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b318e8de
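
      A generic sketch of the publish-then-free pattern the commit describes:
      build the new filter outside the lock, swap the SRCU-protected pointer
      under kvm->lock, and free the old filter only after readers drain. It is
      illustrative only; the structure name and allocation flags are
      assumptions.

          struct kvm_x86_msr_filter *new, *old;

          new = kzalloc(sizeof(*new), GFP_KERNEL_ACCOUNT);
          if (!new)
                  return -ENOMEM;
          /* ... vet and fill in every range of 'new'; no locks held here ... */

          mutex_lock(&kvm->lock);
          old = rcu_replace_pointer(kvm->arch.msr_filter, new,
                                    mutex_is_locked(&kvm->lock));
          mutex_unlock(&kvm->lock);

          synchronize_srcu(&kvm->srcu);   /* wait out vCPUs still reading 'old' */
          kfree(old);

          /* Readers see either the old or the new filter, never a mix:
           *   idx = srcu_read_lock(&kvm->srcu);
           *   filter = srcu_dereference(kvm->arch.msr_filter, &kvm->srcu);
           *   ...
           *   srcu_read_unlock(&kvm->srcu, idx);
           */
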
  10. 18 Mar, 2021: 1 commit
    • x86: Fix various typos in comments · d9f6e12f
      Authored by Ingo Molnar
      Fix ~144 single-word typos in arch/x86/ code comments.
      
      Doing this in a single commit should reduce the churn.
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: linux-kernel@vger.kernel.org
      d9f6e12f
  11. 17 Mar, 2021: 1 commit
    • KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs · e880c6ea
      Authored by Vitaly Kuznetsov
      When a KVM_REQ_MASTERCLOCK_UPDATE request is issued (e.g. after migration)
      we need to make sure no vCPU sees stale values in PV clock structures, so
      all vCPUs are kicked with KVM_REQ_CLOCK_UPDATE. The Hyper-V TSC page
      clocksource, however, is global, and kvm_guest_time_update() only updates
      it on vCPU0. This is not entirely correct: nothing blocks another vCPU from
      entering the guest before the update on vCPU0 finishes, and it can read
      stale values from the page.
      
      Invalidate TSC page in kvm_gen_update_masterclock() to switch all vCPUs
      to using MSR based clocksource (HV_X64_MSR_TIME_REF_COUNT).
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210316143736.964151-3-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e880c6ea
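
      A sketch of the ordering the commit describes in
      kvm_gen_update_masterclock(); the helper names are taken from the commit
      text where possible and are otherwise assumptions.

          /* Mark the Hyper-V TSC page stale *before* kicking vCPUs, so a vCPU
           * that re-enters the guest early falls back to the
           * HV_X64_MSR_TIME_REF_COUNT MSR instead of reading a half-updated
           * TSC page.
           */
          kvm_hv_invalidate_tsc_page(kvm);

          /* Now request clock updates on all vCPUs; the TSC page contents are
           * recomputed later, on vCPU0, in kvm_guest_time_update().
           */
          kvm_make_mclock_inprogress_request(kvm);
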
  12. 15 Mar, 2021: 8 commits
  13. 08 Mar, 2021: 1 commit
  14. 06 Mar, 2021: 1 commit
  15. 03 Mar, 2021: 2 commits
  16. 26 Feb, 2021: 1 commit
  17. 19 Feb, 2021: 3 commits