1. 24 January 2020 (10 commits)
  2. 23 January 2020 (1 commit)
  3. 21 January 2020 (7 commits)
    • KVM: x86: Refactor and rename bit() to feature_bit() macro · 87382003
      Sean Christopherson authored
      Rename bit() to __feature_bit() to give it a more descriptive name, and
      add a macro, feature_bit(), to stuff the X86_FEATURE_ prefix to keep
      line lengths manageable for code that hardcodes the bit to be retrieved.
      
      No functional change intended.
      
      Cc: Jim Mattson <jmattson@google.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
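
      A minimal sketch of the renamed helpers as described above (the exact
      kernel definitions may differ in detail):

        /* Extract a feature's bit position within its 32-bit CPUID word. */
        #define __feature_bit(x)    ((x) & 31)

        /* Stuff the X86_FEATURE_ prefix so hardcoded call sites stay short,
         * e.g. feature_bit(SSE) expands to __feature_bit(X86_FEATURE_SSE). */
        #define feature_bit(name)   __feature_bit(X86_FEATURE_##name)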
    • KVM: x86: Add dedicated emulator helpers for querying CPUID features · 5ae78e95
      Sean Christopherson authored
      Add feature-specific helpers for querying guest CPUID support from the
      emulator instead of having the emulator do a full CPUID and perform its
      own bit tests.  The primary motivation is to eliminate the emulator's
      usage of bit() so that future patches can add more extensive build-time
      assertions on the usage of bit() without having to expose yet more code
      to the emulator.
      
      Note, providing a generic guest_cpuid_has() to the emulator doesn't work
      due to the existing build-time assertions in guest_cpuid_has(), which
      require the feature being checked to be a compile-time constant.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
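
      A hedged sketch of one such feature-specific helper, wired into the
      emulator through its ops table (names follow the kernel's style but
      should be read as illustrative):

        /* KVM-side helper: answers exactly one CPUID question, so the
         * feature is a compile-time constant and guest_cpuid_has()'s
         * build-time assertions still hold. */
        static bool emulator_guest_has_movbe(struct x86_emulate_ctxt *ctxt)
        {
                return guest_cpuid_has(emul_to_vcpu(ctxt), X86_FEATURE_MOVBE);
        }

        /* Emulator side: call through the ops table instead of doing a
         * full CPUID and testing bits locally with bit(). */
        if (!ctxt->ops->guest_has_movbe(ctxt))
                return emulate_ud(ctxt);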
    • KVM: x86: Add macro to ensure reserved cr4 bits checks stay in sync · 345599f9
      Sean Christopherson authored
      Add a helper macro to generate the set of reserved cr4 bits for both
      host and guest, ensuring that a check added on guest capabilities is
      also added for host capabilities, and vice versa.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
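
      An abridged sketch of such a macro (the real one covers many more
      features; __cpu_has is either a host or a guest capability check):

        #define __cr4_reserved_bits(__cpu_has, __c)             \
        ({                                                      \
                u64 __reserved_bits = CR4_RESERVED_BITS;        \
                if (!__cpu_has(__c, X86_FEATURE_XSAVE))         \
                        __reserved_bits |= X86_CR4_OSXSAVE;     \
                if (!__cpu_has(__c, X86_FEATURE_SMEP))          \
                        __reserved_bits |= X86_CR4_SMEP;        \
                if (!__cpu_has(__c, X86_FEATURE_LA57))          \
                        __reserved_bits |= X86_CR4_LA57;        \
                __reserved_bits;                                \
        })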
    • KVM: x86: Ensure all logical CPUs have consistent reserved cr4 bits · f1cdecf5
      Sean Christopherson authored
      Check the current CPU's reserved cr4 bits against the mask calculated
      for the boot CPU to ensure consistent behavior across all CPUs.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
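
      A sketch of the idea, assuming the check runs from KVM's per-CPU
      compatibility hook (the hook placement is an assumption):

        /* Refuse to bring up a CPU whose reserved cr4 bits differ from
         * the mask computed for the boot CPU. */
        if (__cr4_reserved_bits(cpu_has, c) != cr4_reserved_bits)
                return -EIO;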
    • KVM: x86: Don't let userspace set host-reserved cr4 bits · b11306b5
      Sean Christopherson authored
      Calculate the host-reserved cr4 bits at runtime based on the system's
      capabilities (using logic similar to __do_cpuid_func()), and use the
      dynamically generated mask for the reserved bit check in kvm_set_cr4()
      instead of using the static CR4_RESERVED_BITS define.  This prevents
      userspace from "enabling" features in cr4 that are not supported by the
      system, e.g. by ignoring KVM_GET_SUPPORTED_CPUID and specifying a bogus
      CPUID for the vCPU.
      
      Allowing userspace to set unsupported bits in cr4 can lead to a variety
      of undesirable behavior, e.g. failed VM-Enter, and in general increases
      KVM's attack surface.  A crafty userspace can even abuse CR4.LA57 to
      induce an unchecked #GP on a WRMSR.
      
      On a platform without LA57 support:
      
        KVM_SET_CPUID2 // CPUID_7_0_ECX.LA57 = 1
        KVM_SET_SREGS  // CR4.LA57 = 1
        KVM_SET_MSRS   // KERNEL_GS_BASE = 0x0004000000000000
        KVM_RUN
      
      leads to a #GP when writing KERNEL_GS_BASE into hardware:
      
        unchecked MSR access error: WRMSR to 0xc0000102 (tried to write 0x0004000000000000)
        at rIP: 0xffffffffa00f239a (vmx_prepare_switch_to_guest+0x10a/0x1d0 [kvm_intel])
        Call Trace:
         kvm_arch_vcpu_ioctl_run+0x671/0x1c70 [kvm]
         kvm_vcpu_ioctl+0x36b/0x5d0 [kvm]
         do_vfs_ioctl+0xa1/0x620
         ksys_ioctl+0x66/0x70
         __x64_sys_ioctl+0x16/0x20
         do_syscall_64+0x4c/0x170
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7fc08133bf47
      
      Note, the above sequence fails VM-Enter due to invalid guest state.
      Userspace can allow VM-Enter to succeed (after the WRMSR #GP) by adding
      a KVM_SET_SREGS w/ CR4.LA57=0 after KVM_SET_MSRS, in which case KVM will
      technically leak the host's KERNEL_GS_BASE into the guest.  But, as
      KERNEL_GS_BASE is a userspace-defined value/address, the leak is largely
      benign as a malicious userspace would simply be exposing its own data to
      the guest, and attacking a benevolent userspace would require multiple
      bugs in the userspace VMM.
      
      Cc: stable@vger.kernel.org
      Cc: Jun Nakajima <jun.nakajima@intel.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
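
      A hedged sketch of computing the host-reserved mask once at setup time
      (abridged; the helper name and exact feature list are illustrative):

        static u64 kvm_host_cr4_reserved_bits(void)
        {
                u64 reserved = CR4_RESERVED_BITS;

                /* One check per cr4-controllable feature: a bit stays
                 * reserved unless the host actually supports it. */
                if (!boot_cpu_has(X86_FEATURE_LA57))
                        reserved |= X86_CR4_LA57;
                if (!boot_cpu_has(X86_FEATURE_UMIP))
                        reserved |= X86_CR4_UMIP;
                /* ... */
                return reserved;
        }

      kvm_set_cr4() then rejects any cr4 value that intersects this mask,
      regardless of what CPUID userspace gave the vCPU.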
    • KVM: x86: check kvm_pit outside kvm_vm_ioctl_reinject() · cad23e72
      Miaohe Lin authored
      Check kvm_pit outside kvm_vm_ioctl_reinject() to keep the code style
      consistent with the other kvm_pit functions and to prepare for further
      cleanups.
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
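
      A sketch of the resulting call-site shape in the KVM_REINJECT_CONTROL
      ioctl handling (abridged; surrounding context assumed):

        case KVM_REINJECT_CONTROL: {
                struct kvm_reinject_control control;

                r = -EFAULT;
                if (copy_from_user(&control, argp, sizeof(control)))
                        goto out;
                r = -ENXIO;
                if (!kvm->arch.vpit)    /* check now lives in the caller */
                        goto out;
                r = kvm_vm_ioctl_reinject(kvm, &control);
                break;
        }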
    • KVM: VMX: FIXED+PHYSICAL mode single target IPI fastpath · 1e9e2622
      Wanpeng Li authored
      In our production observations, ICR and TSCDEADLINE MSR writes cause
      the bulk of MSR-write vmexits, and multicast IPIs are not as common as
      unicast IPIs such as RESCHEDULE_VECTOR and CALL_FUNCTION_SINGLE_VECTOR.

      This patch introduces a mechanism to handle certain performance-critical
      WRMSRs at a very early stage of the KVM VMExit handler.

      The mechanism is specifically used to accelerate writes to the x2APIC
      ICR that attempt to send a virtual IPI with physical destination mode,
      fixed delivery mode and a single target, which was found to be one of
      the main causes of VMExits for Linux workloads.

      The mechanism significantly reduces the latency of such virtual IPIs
      because the physical IPI is sent to the target vCPU at a very early
      stage of the KVM VMExit handler, before host interrupts are enabled and
      before expensive operations such as reacquiring KVM's SRCU lock.
      Latency is reduced even further when KVM can use the APICv
      posted-interrupt mechanism, which delivers the virtual IPI directly to
      the target vCPU without needing to kick it in the host.

      Testing on a Xeon Skylake server:

      The virtual IPI latency, from the sender's write to the receiver's
      receipt, drops by more than 200 CPU cycles.
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
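
      A simplified sketch of the fastpath (function names follow the
      description above and should be read as illustrative):

        /* Runs straight out of VM-exit with host IRQs still disabled,
         * before the heavyweight exit handling. */
        static int handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu)
        {
                u32 msr = kvm_rcx_read(vcpu);
                u64 data = kvm_read_edx_eax(vcpu);

                /* Only x2APIC ICR writes requesting a fixed, physical,
                 * single-target IPI are accelerated here; everything
                 * else falls through to the regular exit handler. */
                if (msr == (APIC_BASE_MSR + (APIC_ICR >> 4)))
                        return handle_fastpath_set_x2apic_icr_irqoff(vcpu, data);

                return 1;       /* not handled */
        }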
  4. 09 January 2020 (6 commits)
    • KVM: x86: Use gpa_t for cr2/gpa to fix TDP support on 32-bit KVM · 736c291c
      Sean Christopherson authored
      Convert a plethora of parameters and variables in the MMU and page fault
      flows from type gva_t to gpa_t to properly handle TDP on 32-bit KVM.
      
      Thanks to PSE and PAE paging, 32-bit kernels can access 64-bit physical
      addresses.  When TDP is enabled, the fault address is a guest physical
      address and thus can be a 64-bit value, even when both KVM and its guest
      are using 32-bit virtual addressing, e.g. VMX's VMCS.GUEST_PHYSICAL is a
      64-bit field, not a natural width field.
      
      Using a gva_t for the fault address means KVM will incorrectly drop the
      upper 32-bits of the GPA.  Ditto for gva_to_gpa() when it is used to
      translate L2 GPAs to L1 GPAs.
      
      Opportunistically rename variables and parameters to better reflect the
      dual address modes, e.g. use "cr2_or_gpa" for fault addresses and plain
      "addr" instead of "vaddr" when the address may be either a GVA or an L2
      GPA.  Similarly, use "gpa" in the nonpaging_page_fault() flows to avoid
      a confusing "gpa_t gva" declaration; this also sets the stage for a
      future patch to combine nonpaging_page_fault() and tdp_page_fault() with
      minimal churn.
      
      Sprinkle in a few comments to document flows where an address is known
      to be a GVA and thus can be safely truncated to a 32-bit value.  Add
      WARNs in kvm_handle_page_fault() and FNAME(gva_to_gpa_nested)() to help
      document such cases and detect bugs.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
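
      A sketch of the kind of signature change involved (one of many such
      functions touched by the patch):

        /* Before: gva_t is 32 bits on 32-bit KVM, silently truncating a
         * 64-bit guest physical fault address when TDP is enabled. */
        int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2,
                               u64 error_code, void *insn, int insn_len);

        /* After: gpa_t is always 64 bits, and the parameter name records
         * that the value is either a cr2 (GVA) or a GPA. */
        int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
                               u64 error_code, void *insn, int insn_len);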
    • KVM: x86: Add a WARN on TIF_NEED_FPU_LOAD in kvm_load_guest_fpu() · 95145c25
      Sean Christopherson authored
      WARN once in kvm_load_guest_fpu() if TIF_NEED_FPU_LOAD is observed, as
      that would mean that KVM is corrupting userspace's FPU by saving
      unknown register state into arch.user_fpu.  Add a comment to explain
      why KVM WARNs on TIF_NEED_FPU_LOAD instead of implementing logic
      similar to fpu__copy().
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
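
      An abridged sketch of the check's placement:

        static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
        {
                fpregs_lock();

                /* If TIF_NEED_FPU_LOAD is set, the registers currently on
                 * the CPU are not userspace's; saving them into
                 * arch.user_fpu would corrupt userspace's FPU state. */
                WARN_ON_ONCE(test_thread_flag(TIF_NEED_FPU_LOAD));

                copy_fpregs_to_fpstate(vcpu->arch.user_fpu);
                /* ... load the guest's FPU state ... */

                fpregs_unlock();
        }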
    • KVM: x86: Fix potential put_fpu() w/o load_fpu() on MPX platform · f958bd23
      Sean Christopherson authored
      Unlike most state managed by XSAVE, MPX is initialized to zero on INIT.
      Because INITs are usually recognized in the context of a VCPU_RUN call,
      kvm_vcpu_reset() puts the guest's FPU so that the FPU state is resident
      in memory, zeros the MPX state, and reloads FPU state to hardware.  But,
      in the unlikely event that an INIT is recognized during
      kvm_arch_vcpu_ioctl_get_mpstate() via kvm_apic_accept_events(),
      kvm_vcpu_reset() will call kvm_put_guest_fpu() without a preceding
      kvm_load_guest_fpu() and corrupt the guest's FPU state (and possibly
      userspace's FPU state as well).
      
      Given that MPX is being removed from the kernel[*], fix the bug with the
      simple-but-ugly approach of loading the guest's FPU during
      KVM_GET_MP_STATE.
      
      [*] See commit f240652b ("x86/mpx: Remove MPX APIs").
      
      Fixes: f775b13e ("x86,kvm: move qemu/guest FPU switching out to vcpu_run")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
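
      A sketch of the simple-but-ugly fix (abridged from the mp_state ioctl
      path; the real mp_state bookkeeping is elided):

        int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
                                            struct kvm_mp_state *mp_state)
        {
                vcpu_load(vcpu);

                /* Load the guest FPU so that a kvm_vcpu_reset() triggered
                 * by an INIT recognized in kvm_apic_accept_events() puts
                 * FPU state that was actually loaded. */
                if (kvm_mpx_supported())
                        kvm_load_guest_fpu(vcpu);

                kvm_apic_accept_events(vcpu);
                mp_state->mp_state = vcpu->arch.mp_state;

                if (kvm_mpx_supported())
                        kvm_put_guest_fpu(vcpu);
                vcpu_put(vcpu);
                return 0;
        }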
    • KVM: x86: Fix some comment typos · 0a03cbda
      Miaohe Lin authored
      Fix some typos in comments.
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: X86: Convert the last users of "shorthand = 0" to use macros · 150a84fe
      Peter Xu 提交于
      Peter Xu authored
      Change the last users of "shorthand = 0" to use APIC_DEST_NOSHORT.
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
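
      The conversion is mechanical, e.g.:

        /* was: irq.shorthand = 0; */
        irq.shorthand = APIC_DEST_NOSHORT;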
    • KVM: X86: Use APIC_DEST_* macros properly in kvm_lapic_irq.dest_mode · c96001c5
      Peter Xu authored
      We were using either APIC_DEST_PHYSICAL|APIC_DEST_LOGICAL or 0|1 to
      fill in kvm_lapic_irq.dest_mode.  That is fine only because, in most
      cases, dest_mode is checked against APIC_DEST_PHYSICAL (which equals
      0).  However, it is not consistent, and it becomes a problem as soon
      as we want to start checking against APIC_DEST_LOGICAL, which does
      not equal 1.

      This patch first introduces a kvm_lapic_irq_dest_mode() helper that
      takes a boolean destination mode and returns the matching APIC_DEST_*
      macro, then replaces the 0|1 settings of irq.dest_mode with the helper.
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
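
      The helper itself is small; a sketch matching the description:

        static inline int kvm_lapic_irq_dest_mode(bool dest_mode_logical)
        {
                return dest_mode_logical ? APIC_DEST_LOGICAL
                                         : APIC_DEST_PHYSICAL;
        }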
  5. 23 November 2019 (3 commits)
    • KVM: x86: Grab KVM's srcu lock when setting nested state · ad5996d9
      Sean Christopherson authored
      Acquire kvm->srcu for the duration of ->set_nested_state() to fix a bug
      where nVMX dereferences ->memslots without holding ->srcu or
      ->slots_lock.
      
      The other half of nested migration, ->get_nested_state(), does not need
      to acquire ->srcu as it is purely a dump of internal KVM (and CPU)
      state to userspace.
      
      Detected as an RCU lockdep splat that is 100% reproducible by running
      KVM's state_test selftest with CONFIG_PROVE_LOCKING=y.  Note that the
      failing function, kvm_is_visible_gfn(), is only checking the validity of
      a gfn, it's not actually accessing guest memory (which is more or less
      unsupported during vmx_set_nested_state() due to incorrect MMU state),
      i.e. vmx_set_nested_state() itself isn't fundamentally broken.  In any
      case, setting nested state isn't a fast path so there's no reason to go
      out of our way to avoid taking ->srcu.
      
        =============================
        WARNING: suspicious RCU usage
        5.4.0-rc7+ #94 Not tainted
        -----------------------------
        include/linux/kvm_host.h:626 suspicious rcu_dereference_check() usage!
      
                     other info that might help us debug this:
      
        rcu_scheduler_active = 2, debug_locks = 1
        1 lock held by evmcs_test/10939:
         #0: ffff88826ffcb800 (&vcpu->mutex){+.+.}, at: kvm_vcpu_ioctl+0x85/0x630 [kvm]
      
        stack backtrace:
        CPU: 1 PID: 10939 Comm: evmcs_test Not tainted 5.4.0-rc7+ #94
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        Call Trace:
         dump_stack+0x68/0x9b
         kvm_is_visible_gfn+0x179/0x180 [kvm]
         mmu_check_root+0x11/0x30 [kvm]
         fast_cr3_switch+0x40/0x120 [kvm]
         kvm_mmu_new_cr3+0x34/0x60 [kvm]
         nested_vmx_load_cr3+0xbd/0x1f0 [kvm_intel]
         nested_vmx_enter_non_root_mode+0xab8/0x1d60 [kvm_intel]
         vmx_set_nested_state+0x256/0x340 [kvm_intel]
         kvm_arch_vcpu_ioctl+0x491/0x11a0 [kvm]
         kvm_vcpu_ioctl+0xde/0x630 [kvm]
         do_vfs_ioctl+0xa2/0x6c0
         ksys_ioctl+0x66/0x70
         __x64_sys_ioctl+0x16/0x20
         do_syscall_64+0x54/0x200
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
        RIP: 0033:0x7f59a2b95f47
      
      Fixes: 8fcc4b59 ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
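
      An abridged sketch of the fix in the vCPU ioctl handling:

        case KVM_SET_NESTED_STATE: {
                /* ... copy the kvm_nested_state header from userspace ... */

                idx = srcu_read_lock(&vcpu->kvm->srcu);
                r = kvm_x86_ops->set_nested_state(vcpu,
                                                  user_kvm_nested_state,
                                                  &kvm_state);
                srcu_read_unlock(&vcpu->kvm->srcu, idx);
                break;
        }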
    • KVM: x86: Open code shared_msr_update() in its only caller · 05c19c2f
      Sean Christopherson authored
      Fold shared_msr_update() into its sole user to eliminate its pointless
      bounds check, its godawful printk, its misleading comment (it's called
      under a global lock), and its woefully inaccurate name.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
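
      A sketch of the folded result: the per-CPU online hook snapshots each
      shared MSR's host value directly, with no bounds check or printk:

        static void kvm_shared_msr_cpu_online(void)
        {
                unsigned int cpu = smp_processor_id();
                struct kvm_shared_msrs *smsr = per_cpu_ptr(shared_msrs, cpu);
                u64 value;
                int i;

                for (i = 0; i < shared_msrs_global.nr; ++i) {
                        rdmsrl_safe(shared_msrs_global.msrs[i], &value);
                        smsr->values[i].host = value;
                        smsr->values[i].curr = value;
                }
        }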
    • KVM: x86: Remove a spurious export of a static function · 24885d1d
      Sean Christopherson authored
      A recent change inadvertently exported a static function, which results
      in modpost throwing a warning.  Fix it.
      
      Fixes: cbbaa272 ("KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES")
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  6. 21 November 2019 (5 commits)
    • KVM: x86: remove set but not used variable 'called' · db5a95ec
      Mao Wenan authored
      Fixes gcc '-Wunused-but-set-variable' warning:
      
      arch/x86/kvm/x86.c: In function kvm_make_scan_ioapic_request_mask:
      arch/x86/kvm/x86.c:7911:7: warning: variable 'called' set but not
      used [-Wunused-but-set-variable]

      The variable has been unused since commit 7ee30bc1 ("KVM: x86: deliver
      KVM IOAPIC scan request to target vCPUs").
      Signed-off-by: Mao Wenan <maowenan@huawei.com>
      Fixes: 7ee30bc1 ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs")
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionality · c11f83e0
      Paolo Bonzini authored
      The current guest mitigation of TAA is both too heavy and not really
      sufficient.  It is too heavy because it will cause some affected CPUs
      (those that have MDS_NO but lack TAA_NO) to fall back to VERW and
      get the corresponding slowdown.  It is not really sufficient because
      it will cause the MDS_NO bit to disappear upon microcode update, so
      that VMs started before the microcode update will not be runnable
      anymore afterwards, even with tsx=on.
      
      Instead, if tsx=on on the host, we can emulate MSR_IA32_TSX_CTRL for
      the guest and let it run without the VERW mitigation.  Even though
      MSR_IA32_TSX_CTRL is quite heavyweight, and we do not want to write
      it on every vmentry, we can use the shared MSR functionality because
      the host kernel need not protect itself from TSX-based side-channels.
      Tested-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
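
      A hedged sketch of the guest WRMSR handling this enables (abridged
      from the vmx MSR-write path; the exact control flow may differ):

        case MSR_IA32_TSX_CTRL:
                if (!msr_info->host_initiated &&
                    !(vcpu->arch.arch_capabilities & ARCH_CAP_TSX_CTRL_MSR))
                        return 1;
                /* Only RTM_DISABLE and CPUID_CLEAR are defined. */
                if (data & ~(TSX_CTRL_RTM_DISABLE | TSX_CTRL_CPUID_CLEAR))
                        return 1;
                /* Written through the shared (user-return) MSR machinery,
                 * not on every vmentry. */
                break;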
    • KVM: x86: implement MSR_IA32_TSX_CTRL effect on CPUID · edef5c36
      Paolo Bonzini authored
      Because KVM always emulates CPUID, the CPUID clear bit
      (bit 1) of MSR_IA32_TSX_CTRL must be emulated "manually"
      by the hypervisor when performing said emulation.
      
      Right now neither kvm-intel.ko nor kvm-amd.ko implement
      MSR_IA32_TSX_CTRL but this will change in the next patch.
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Tested-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
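
      A sketch of the emulation point in KVM's CPUID handling for leaf 7
      (abridged; F() is KVM's feature-bit helper):

        if (function == 7 && index == 0) {
                u64 data;

                /* If the guest has set TSX_CTRL.CPUID_CLEAR, hide HLE
                 * and RTM from CPUID.7.0:EBX. */
                if (!__kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data, true) &&
                    (data & TSX_CTRL_CPUID_CLEAR))
                        *ebx &= ~(F(HLE) | F(RTM));
        }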
    • KVM: x86: do not modify masked bits of shared MSRs · de1fca5d
      Paolo Bonzini authored
      "Shared MSRs" are guest MSRs that are written to the host MSRs but
      keep their value until the next return to userspace.  They support
      a mask, so that some bits keep the host value, but this mask is
      only used to skip an unnecessary MSR write and the value written
      to the MSR is always the guest MSR.
      
      Fix this and, while at it, do not update smsr->values[slot].curr if
      the wrmsr fails for whatever reason.  A failure should only happen
      due to reserved bits, in which case the recorded value would not
      match the hardware when the user-return notifier runs, and the host
      value would always be restored anyway.  Still, it is untidy, and in
      rare cases fixing it actually avoids spurious WRMSRs on return to
      userspace.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Tested-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
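
      A sketch of the fixed write path (close to the shape described above;
      registration of the user-return notifier is elided):

        int kvm_set_shared_msr(unsigned slot, u64 value, u64 mask)
        {
                unsigned int cpu = smp_processor_id();
                struct kvm_shared_msrs *smsr = per_cpu_ptr(shared_msrs, cpu);
                int err;

                /* Masked-off bits keep the host's value. */
                value = (value & mask) | (smsr->values[slot].host & ~mask);
                if (value == smsr->values[slot].curr)
                        return 0;

                /* Do not record a value the hardware rejected. */
                err = wrmsrl_safe(shared_msrs_global.msrs[slot], value);
                if (err)
                        return 1;

                smsr->values[slot].curr = value;
                /* ... register the user-return notifier if needed ... */
                return 0;
        }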
    • KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES · cbbaa272
      Paolo Bonzini authored
      KVM does not implement MSR_IA32_TSX_CTRL, so it must not be presented
      to the guests.  It is also confusing to have !ARCH_CAP_TSX_CTRL_MSR &&
      !RTM && ARCH_CAP_TAA_NO: lack of MSR_IA32_TSX_CTRL suggests TSX was not
      hidden (it actually was), yet the value says that TSX is not vulnerable
      to microarchitectural data sampling.  Fix both.
      
      Cc: stable@vger.kernel.org
      Tested-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
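
      A hedged sketch of the intent in kvm_get_arch_capabilities() (the
      exact fix may differ in detail):

        /* KVM does not (at this point) implement MSR_IA32_TSX_CTRL, and
         * if the host hides TSX (no RTM) the TAA_NO claim is misleading,
         * so drop both bits in that case. */
        if (!boot_cpu_has(X86_FEATURE_RTM))
                data &= ~(ARCH_CAP_TAA_NO | ARCH_CAP_TSX_CTRL_MSR);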
  7. 20 November 2019 (1 commit)
  8. 15 November 2019 (5 commits)
  9. 13 November 2019 (1 commit)
  10. 12 November 2019 (1 commit)
    • KVM: X86: Fix initialization of MSR lists · 7a5ee6ed
      Chenyi Qiang authored
      The three MSR lists (msrs_to_save[], emulated_msrs[] and
      msr_based_features[]) are global arrays in kvm.ko.  They are adjusted
      (supported MSRs are copied forward, overriding unsupported ones) when
      kvm-{intel,amd}.ko is loaded, but they are not reset to their initial
      values when the module is removed.  Thus, on the next load,
      kvm-{intel,amd}.ko operates on the already-modified arrays, with some
      MSRs lost and some duplicated.
      
      So define three constant arrays to hold the initial MSR lists and
      initialize msrs_to_save[], emulated_msrs[] and msr_based_features[]
      based on the constant arrays.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
      Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
      [Remove now useless conditionals. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
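
      A sketch of the fix: immutable source lists plus working copies that
      are rebuilt from scratch on every module load (kvm_msr_is_supported()
      is a hypothetical stand-in for the real per-MSR probing):

        static const u32 msrs_to_save_all[] = {
                MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP,
                MSR_IA32_SYSENTER_EIP, /* ... */
        };

        static u32 msrs_to_save[ARRAY_SIZE(msrs_to_save_all)];
        static unsigned num_msrs_to_save;

        static void kvm_init_msr_list(void)
        {
                unsigned i;

                num_msrs_to_save = 0;   /* start clean on every load */
                for (i = 0; i < ARRAY_SIZE(msrs_to_save_all); i++) {
                        /* stand-in for the real support check */
                        if (!kvm_msr_is_supported(msrs_to_save_all[i]))
                                continue;
                        msrs_to_save[num_msrs_to_save++] =
                                msrs_to_save_all[i];
                }
        }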