1. 15 12月, 2020 2 次提交
  2. 12 12月, 2020 1 次提交
    • P
      KVM: x86: reinstate vendor-agnostic check on SPEC_CTRL cpuid bits · 39485ed9
      Paolo Bonzini 提交于
      Until commit e7c587da ("x86/speculation: Use synthetic bits for
      IBRS/IBPB/STIBP"), KVM was testing both Intel and AMD CPUID bits before
      allowing the guest to write MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD.
      Testing only Intel bits on VMX processors, or only AMD bits on SVM
      processors, fails if the guests are created with the "opposite" vendor
      as the host.
      
      While at it, also tweak the host CPU check to use the vendor-agnostic
      feature bit X86_FEATURE_IBPB, since we only care about the availability
      of the MSR on the host here and not about specific CPUID bits.
      
      Fixes: e7c587da ("x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP")
      Cc: stable@vger.kernel.org
      Reported-by: NDenis V. Lunev <den@openvz.org>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      39485ed9
  3. 15 11月, 2020 8 次提交
    • J
      kvm: x86: Sink cpuid update into vendor-specific set_cr4 functions · 2259c17f
      Jim Mattson 提交于
      On emulated VM-entry and VM-exit, update the CPUID bits that reflect
      CR4.OSXSAVE and CR4.PKE.
      
      This fixes a bug where the CPUID bits could continue to reflect L2 CR4
      values after emulated VM-exit to L1. It also fixes a related bug where
      the CPUID bits could continue to reflect L1 CR4 values after emulated
      VM-entry to L2. The latter bug is mainly relevant to SVM, wherein
      CPUID is not a required intercept. However, it could also be relevant
      to VMX, because the code to conditionally update these CPUID bits
      assumes that the guest CPUID and the guest CR4 are always in sync.
      
      Fixes: 8eb3f87d ("KVM: nVMX: fix guest CR4 loading when emulating L2 to L1 exit")
      Fixes: 2acf923e ("KVM: VMX: Enable XSAVE/XRSTOR for guest")
      Fixes: b9baba86 ("KVM, pkeys: expose CPUID/CR4 to guest")
      Reported-by: NAbhiroop Dabral <adabral@paloaltonetworks.com>
      Signed-off-by: NJim Mattson <jmattson@google.com>
      Reviewed-by: NRicardo Koller <ricarkol@google.com>
      Reviewed-by: NPeter Shier <pshier@google.com>
      Cc: Haozhong Zhang <haozhong.zhang@intel.com>
      Cc: Dexuan Cui <dexuan.cui@intel.com>
      Cc: Huaitong Han <huaitong.han@intel.com>
      Message-Id: <20201029170648.483210-1-jmattson@google.com>
      2259c17f
    • P
      KVM: X86: Implement ring-based dirty memory tracking · fb04a1ed
      Peter Xu 提交于
      This patch is heavily based on previous work from Lei Cao
      <lei.cao@stratus.com> and Paolo Bonzini <pbonzini@redhat.com>. [1]
      
      KVM currently uses large bitmaps to track dirty memory.  These bitmaps
      are copied to userspace when userspace queries KVM for its dirty page
      information.  The use of bitmaps is mostly sufficient for live
      migration, as large parts of memory are be dirtied from one log-dirty
      pass to another.  However, in a checkpointing system, the number of
      dirty pages is small and in fact it is often bounded---the VM is
      paused when it has dirtied a pre-defined number of pages. Traversing a
      large, sparsely populated bitmap to find set bits is time-consuming,
      as is copying the bitmap to user-space.
      
      A similar issue will be there for live migration when the guest memory
      is huge while the page dirty procedure is trivial.  In that case for
      each dirty sync we need to pull the whole dirty bitmap to userspace
      and analyse every bit even if it's mostly zeros.
      
      The preferred data structure for above scenarios is a dense list of
      guest frame numbers (GFN).  This patch series stores the dirty list in
      kernel memory that can be memory mapped into userspace to allow speedy
      harvesting.
      
      This patch enables dirty ring for X86 only.  However it should be
      easily extended to other archs as well.
      
      [1] https://patchwork.kernel.org/patch/10471409/Signed-off-by: NLei Cao <lei.cao@stratus.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NPeter Xu <peterx@redhat.com>
      Message-Id: <20201001012222.5767-1-peterx@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      fb04a1ed
    • P
      KVM: X86: Don't track dirty for KVM_SET_[TSS_ADDR|IDENTITY_MAP_ADDR] · ff5a983c
      Peter Xu 提交于
      Originally, we have three code paths that can dirty a page without
      vcpu context for X86:
      
        - init_rmode_identity_map
        - init_rmode_tss
        - kvmgt_rw_gpa
      
      init_rmode_identity_map and init_rmode_tss will be setup on
      destination VM no matter what (and the guest cannot even see them), so
      it does not make sense to track them at all.
      
      To do this, allow __x86_set_memory_region() to return the userspace
      address that just allocated to the caller.  Then in both of the
      functions we directly write to the userspace address instead of
      calling kvm_write_*() APIs.
      
      Another trivial change is that we don't need to explicitly clear the
      identity page table root in init_rmode_identity_map() because no
      matter what we'll write to the whole page with 4M huge page entries.
      Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPeter Xu <peterx@redhat.com>
      Message-Id: <20201001012044.5151-4-peterx@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ff5a983c
    • P
      KVM: x86: fix apic_accept_events vs check_nested_events · 1c96dcce
      Paolo Bonzini 提交于
      vmx_apic_init_signal_blocked is buggy in that it returns true
      even in VMX non-root mode.  In non-root mode, however, INITs
      are not latched, they just cause a vmexit.  Previously,
      KVM was waiting for them to be processed when kvm_apic_accept_events
      and in the meanwhile it ate the SIPIs that the processor received.
      
      However, in order to implement the wait-for-SIPI activity state,
      KVM will have to process KVM_APIC_SIPI in vmx_check_nested_events,
      and it will not be possible anymore to disregard SIPIs in non-root
      mode as the code is currently doing.
      
      By calling kvm_x86_ops.nested_ops->check_events, we can force a vmexit
      (with the side-effect of latching INITs) before incorrectly injecting
      an INIT or SIPI in a guest, and therefore vmx_apic_init_signal_blocked
      can do the right thing.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      1c96dcce
    • S
      KVM: x86: Return bool instead of int for CR4 and SREGS validity checks · ee69c92b
      Sean Christopherson 提交于
      Rework the common CR4 and SREGS checks to return a bool instead of an
      int, i.e. true/false instead of 0/-EINVAL, and add "is" to the name to
      clarify the polarity of the return value (which is effectively inverted
      by this change).
      
      No functional changed intended.
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20201007014417.29276-6-sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ee69c92b
    • S
      KVM: x86: Move vendor CR4 validity check to dedicated kvm_x86_ops hook · c2fe3cd4
      Sean Christopherson 提交于
      Split out VMX's checks on CR4.VMXE to a dedicated hook, .is_valid_cr4(),
      and invoke the new hook from kvm_valid_cr4().  This fixes an issue where
      KVM_SET_SREGS would return success while failing to actually set CR4.
      
      Fixing the issue by explicitly checking kvm_x86_ops.set_cr4()'s return
      in __set_sregs() is not a viable option as KVM has already stuffed a
      variety of vCPU state.
      
      Note, kvm_valid_cr4() and is_valid_cr4() have different return types and
      inverted semantics.  This will be remedied in a future patch.
      
      Fixes: 5e1746d6 ("KVM: nVMX: Allow setting the VMXE bit in CR4")
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20201007014417.29276-5-sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c2fe3cd4
    • S
      KVM: VMX: Drop explicit 'nested' check from vmx_set_cr4() · a447e38a
      Sean Christopherson 提交于
      Drop vmx_set_cr4()'s explicit check on the 'nested' module param now
      that common x86 handles the check by incorporating VMXE into the CR4
      reserved bits, via kvm_cpu_caps.  X86_FEATURE_VMX is set in kvm_cpu_caps
      (by vmx_set_cpu_caps()), if and only if 'nested' is true.
      
      No functional change intended.
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20201007014417.29276-3-sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      a447e38a
    • S
      KVM: VMX: Drop guest CPUID check for VMXE in vmx_set_cr4() · d3a9e414
      Sean Christopherson 提交于
      Drop vmx_set_cr4()'s somewhat hidden guest_cpuid_has() check on VMXE now
      that common x86 handles the check by incorporating VMXE into the CR4
      reserved bits, i.e. in cr4_guest_rsvd_bits.  This fixes a bug where KVM
      incorrectly rejects KVM_SET_SREGS with CR4.VMXE=1 if it's executed
      before KVM_SET_CPUID{,2}.
      
      Fixes: 5e1746d6 ("KVM: nVMX: Allow setting the VMXE bit in CR4")
      Reported-by: NStas Sergeev <stsp@users.sourceforge.net>
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20201007014417.29276-2-sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      d3a9e414
  4. 31 10月, 2020 2 次提交
  5. 24 10月, 2020 1 次提交
    • P
      KVM: vmx: rename pi_init to avoid conflict with paride · a3ff25fc
      Paolo Bonzini 提交于
      allyesconfig results in:
      
      ld: drivers/block/paride/paride.o: in function `pi_init':
      (.text+0x1340): multiple definition of `pi_init'; arch/x86/kvm/vmx/posted_intr.o:posted_intr.c:(.init.text+0x0): first defined here
      make: *** [Makefile:1164: vmlinux] Error 1
      
      because commit:
      
      commit 8888cdd0
      Author: Xiaoyao Li <xiaoyao.li@intel.com>
      Date:   Wed Sep 23 11:31:11 2020 -0700
      
          KVM: VMX: Extract posted interrupt support to separate files
      
      added another pi_init(), though one already existed in the paride code.
      Reported-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      a3ff25fc
  6. 22 10月, 2020 4 次提交
  7. 20 10月, 2020 1 次提交
  8. 03 10月, 2020 1 次提交
  9. 29 9月, 2020 1 次提交
  10. 28 9月, 2020 19 次提交