1. 27 9月, 2019 3 次提交
    • P
      selftests: kvm: add test for dirty logging inside nested guests · 09444420
      Paolo Bonzini 提交于
      Check that accesses by nested guests are logged according to the
      L1 physical addresses rather than L2.
      
      Most of the patch is really adding EPT support to the testing
      framework.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      09444420
    • P
      KVM: x86: fix nested guest live migration with PML · 1f4e5fc8
      Paolo Bonzini 提交于
      Shadow paging is fundamentally incompatible with the page-modification
      log, because the GPAs in the log come from the wrong memory map.
      In particular, for the EPT page-modification log, the GPAs in the log come
      from L2 rather than L1.  (If there was a non-EPT page-modification log,
      we couldn't use it for shadow paging because it would log GVAs rather
      than GPAs).
      
      Therefore, we need to rely on write protection to record dirty pages.
      This has the side effect of bypassing PML, since writes now result in an
      EPT violation vmexit.
      
      This is relatively easy to add to KVM, because pretty much the only place
      that needs changing is spte_clear_dirty.  The first access to the page
      already goes through the page fault path and records the correct GPA;
      it's only subsequent accesses that are wrong.  Therefore, we can equip
      set_spte (where the first access happens) to record that the SPTE will
      have to be write protected, and then spte_clear_dirty will use this
      information to do the right thing.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      1f4e5fc8
    • P
      KVM: x86: assign two bits to track SPTE kinds · 6eeb4ef0
      Paolo Bonzini 提交于
      Currently, we are overloading SPTE_SPECIAL_MASK to mean both
      "A/D bits unavailable" and MMIO, where the difference between the
      two is determined by mio_mask and mmio_value.
      
      However, the next patch will need two bits to distinguish
      availability of A/D bits from write protection.  So, while at
      it give MMIO its own bit pattern, and move the two bits from
      bit 62 to bits 52..53 since Intel is allocating EPT page table
      bits from the top.
      Reviewed-by: NJunaid Shahid <junaids@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      6eeb4ef0
  2. 26 9月, 2019 8 次提交
  3. 25 9月, 2019 8 次提交
    • V
      KVM: vmx: fix build warnings in hv_enable_direct_tlbflush() on i386 · cab01850
      Vitaly Kuznetsov 提交于
      The following was reported on i386:
      
        arch/x86/kvm/vmx/vmx.c: In function 'hv_enable_direct_tlbflush':
        arch/x86/kvm/vmx/vmx.c:503:10: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      
      pr_debugs() in this function are  more or less useless, let's just
      remove them. evmcs->hv_vm_id can use 'unsigned long' instead of 'u64'.
      
      Also, simplify the code a little bit.
      Reported-by: Nkbuild test robot <lkp@intel.com>
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      cab01850
    • S
      KVM: x86: Don't check kvm_rebooting in __kvm_handle_fault_on_reboot() · f209a26d
      Sean Christopherson 提交于
      Remove the kvm_rebooting check from VMX/SVM instruction exception fixup
      now that kvm_spurious_fault() conditions its BUG() on !kvm_rebooting.
      Because the 'cleanup_insn' functionally is also gone, deferring to
      kvm_spurious_fault() means __kvm_handle_fault_on_reboot() can eliminate
      its .fixup code entirely and have its exception table entry branch
      directly to the call to kvm_spurious_fault().
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f209a26d
    • S
      KVM: x86: Drop ____kvm_handle_fault_on_reboot() · 98cd382d
      Sean Christopherson 提交于
      Remove the variation of __kvm_handle_fault_on_reboot() that accepts a
      post-fault cleanup instruction now that its sole user (VMREAD) uses
      a different method for handling faults.
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      98cd382d
    • S
      KVM: VMX: Add error handling to VMREAD helper · 6e202097
      Sean Christopherson 提交于
      Now that VMREAD flows require a taken branch, courtesy of commit
      
        3901336e ("x86/kvm: Don't call kvm_spurious_fault() from .fixup")
      
      bite the bullet and add full error handling to VMREAD, i.e. replace the
      JMP added by __ex()/____kvm_handle_fault_on_reboot() with a hinted Jcc.
      
      To minimize the code footprint, add a helper function, vmread_error(),
      to handle both faults and failures so that the inline flow has a single
      CALL.
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      6e202097
    • S
      KVM: VMX: Optimize VMX instruction error and fault handling · 52a9fcbc
      Sean Christopherson 提交于
      Rework the VMX instruction helpers using asm-goto to branch directly
      to error/fault "handlers" in lieu of using __ex(), i.e. the generic
      ____kvm_handle_fault_on_reboot().  Branching directly to fault handling
      code during fixup avoids the extra JMP that is inserted after every VMX
      instruction when using the generic "fault on reboot" (see commit
      3901336e, "x86/kvm: Don't call kvm_spurious_fault() from .fixup").
      
      Opportunistically clean up the helpers so that they all have consistent
      error handling and messages.
      
      Leave the usage of ____kvm_handle_fault_on_reboot() (via __ex()) in
      kvm_cpu_vmxoff() and nested_vmx_check_vmentry_hw() as is.  The VMXOFF
      case is not a fast path, i.e. the cleanliness of __ex() is worth the
      JMP, and the extra JMP in nested_vmx_check_vmentry_hw() is unavoidable.
      
      Note, VMREAD cannot get the asm-goto treatment as output operands aren't
      compatible with GCC's asm-goto due to internal compiler restrictions.
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      52a9fcbc
    • S
      KVM: x86: Check kvm_rebooting in kvm_spurious_fault() · 4b526de5
      Sean Christopherson 提交于
      Explicitly check kvm_rebooting in kvm_spurious_fault() prior to invoking
      BUG(), as opposed to assuming the caller has already done so.  Letting
      kvm_spurious_fault() be called "directly" will allow VMX to better
      optimize its low level assembly flows.
      
      As a happy side effect, kvm_spurious_fault() no longer needs to be
      marked as a dead end since it doesn't unconditionally BUG().
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      4b526de5
    • V
      KVM: selftests: fix ucall on x86 · 90a48843
      Vitaly Kuznetsov 提交于
      After commit e8bb4755eea2("KVM: selftests: Split ucall.c into architecture
      specific files") selftests which use ucall on x86 started segfaulting and
      apparently it's gcc to blame: it "optimizes" ucall() function throwing away
      va_start/va_end part because it thinks the structure is not being used.
      Previously, it couldn't do that because the there was also MMIO version and
      the decision which particular implementation to use was done at runtime.
      
      With older gccs it's possible to solve the problem by adding 'volatile'
      to 'struct ucall' but at least with gcc-8.3 this trick doesn't work.
      
      'memory' clobber seems to do the job.
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      90a48843
    • W
      Revert "locking/pvqspinlock: Don't wait if vCPU is preempted" · 89340d09
      Wanpeng Li 提交于
      This patch reverts commit 75437bb3 (locking/pvqspinlock: Don't
      wait if vCPU is preempted).  A large performance regression was caused
      by this commit.  on over-subscription scenarios.
      
      The test was run on a Xeon Skylake box, 2 sockets, 40 cores, 80 threads,
      with three VMs of 80 vCPUs each.  The score of ebizzy -M is reduced from
      13000-14000 records/s to 1700-1800 records/s:
      
                Host                Guest                score
      
      vanilla w/o kvm optimizations     upstream    1700-1800 records/s
      vanilla w/o kvm optimizations     revert      13000-14000 records/s
      vanilla w/ kvm optimizations      upstream    4500-5000 records/s
      vanilla w/ kvm optimizations      revert      14000-15500 records/s
      
      Exit from aggressive wait-early mechanism can result in premature yield
      and extra scheduling latency.
      
      Actually, only 6% of wait_early events are caused by vcpu_is_preempted()
      being true.  However, when one vCPU voluntarily releases its vCPU, all
      the subsequently waiters in the queue will do the same and the cascading
      effect leads to bad performance.
      
      kvm optimizations:
      [1] commit d73eb57b (KVM: Boost vCPUs that are delivering interrupts)
      [2] commit 266e85a5 (KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption)
      
      Tested-by: loobinliu@tencent.com
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: loobinliu@tencent.com
      Cc: stable@vger.kernel.org
      Fixes: 75437bb3 (locking/pvqspinlock: Don't wait if vCPU is preempted)
      Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      89340d09
  4. 24 9月, 2019 21 次提交