  1. 21 Apr 2020, 12 commits
  2. 16 Apr 2020, 6 commits
    • KVM: nVMX: Invalidate all roots when emulating INVVPID without EPT · d6e3f838
      Junaid Shahid authored
      Free all roots when emulating INVVPID for L1 and EPT is disabled, as
      outstanding changes to the page tables managed by L1 need to be
      recognized.  Because L1 and L2 share an MMU when EPT is disabled, and
      because VPID is not tracked by the MMU role, all roots in the current
      MMU (root_mmu) need to be freed, otherwise a future nested VM-Enter or
      VM-Exit could do a fast CR3 switch (without a flush/sync) and consume
      stale SPTEs.
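
      The changelog names both the MMU (root_mmu) and the scope (all roots), so the
      core of the change in handle_invvpid() plausibly reduces to the call sketched
      below; this is reconstructed from the changelog rather than copied from the
      diff, and the exact placement is an assumption:

          /*
           * L1 and L2 share root_mmu when EPT is disabled, and VPID is not part
           * of the MMU role, so drop every cached root rather than trusting a
           * later fast CR3 switch to notice the stale SPTEs.
           */
          if (!enable_ept)
                  kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu,
                                     KVM_MMU_ROOTS_ALL);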
      
      Fixes: 5c614b35 ("KVM: nVMX: nested VPID emulation")
      Signed-off-by: Junaid Shahid <junaids@google.com>
      [sean: ported to upstream KVM, reworded the comment and changelog]
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200320212833.3507-5-sean.j.christopherson@intel.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d6e3f838
    • KVM: nVMX: Invalidate all EPTP contexts when emulating INVEPT for L1 · f8aa7e39
      Sean Christopherson authored
      Free all L2 (guest_mmu) roots when emulating INVEPT for L1.  Outstanding
      changes to the EPT tables managed by L1 need to be recognized, and
      relying on KVM to always flush L2's EPTP context on nested VM-Enter is
      dangerous.
      
      Similar to handle_invpcid(), rely on kvm_mmu_free_roots() to do a remote
      TLB flush if necessary, e.g. if L1 has never entered L2 then there is
      nothing to be done.
      
      Nuking all L2 roots is overkill for the single-context variant, but it's
      the safe and easy bet.  A more precise zap mechanism will be added in
      the future.  Add a TODO to call out that KVM only needs to invalidate
      affected contexts.
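
      Given that the changelog names both kvm_mmu_free_roots() and guest_mmu, the
      heart of the fix in handle_invept() plausibly comes down to the call below; a
      sketch reconstructed from the changelog, not the verbatim upstream hunk:

          /* TODO: invalidate only the affected EPTP context(s). */
          kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL);

      kvm_mmu_free_roots() only triggers a remote TLB flush when it actually frees
      something, which is what makes the "L1 has never entered L2" case a no-op.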
      
      Fixes: 14c07ad8 ("x86/kvm/mmu: introduce guest_mmu")
      Reported-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200320212833.3507-4-sean.j.christopherson@intel.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f8aa7e39
    • KVM: nVMX: Validate the EPTP when emulating INVEPT(EXTENT_CONTEXT) · eed0030e
      Sean Christopherson authored
      Signal VM-Fail for the single-context variant of INVEPT if the specified
      EPTP is invalid.  Per the INVEPT pseudocode in Intel's SDM, it's subject
      to the standard EPT checks:
      
        If VM entry with the "enable EPT" VM execution control set to 1 would
        fail due to the EPTP value then VMfail(Invalid operand to INVEPT/INVVPID);
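
      A sketch of the kind of check the single-context branch of handle_invept()
      gains; the helper name nested_vmx_check_eptp() is an assumption here, standing
      in for whatever routine applies the standard EPTP checks:

          case VMX_EPT_EXTENT_CONTEXT:
                  /* Apply the standard EPT checks before touching any roots. */
                  if (!nested_vmx_check_eptp(vcpu, operand.eptp))
                          return nested_vmx_failValid(vcpu,
                                  VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
                  break;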
      
      Fixes: bfd0a56b ("nEPT: Nested INVEPT")
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200320212833.3507-3-sean.j.christopherson@intel.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      eed0030e
    • KVM: VMX: Flush all EPTP/VPID contexts on remote TLB flush · e8eff282
      Sean Christopherson authored
      Flush all EPTP/VPID contexts if a TLB flush _may_ have been triggered by
      a remote or deferred TLB flush, i.e. by KVM_REQ_TLB_FLUSH.  Remote TLB
      flushes require all contexts to be invalidated, not just the active
      contexts, e.g. all mappings in all contexts for a given HVA need to be
      invalidated on a mmu_notifier invalidation.  Similarly, the instigator
      of the deferred TLB flush may be expecting all contexts to be flushed,
      e.g. vmx_vcpu_load_vmcs().
      
      Without nested VMX, flushing only the current EPTP/VPID context isn't
      problematic because KVM uses a constant VPID for each vCPU, and
      mmu_alloc_direct_roots() all but guarantees KVM will use a single EPTP
      for L1.  In the rare case where a different EPTP is created or reused,
      KVM (currently) unconditionally flushes the new EPTP context prior to
      entering the guest.
      
      With nested VMX, KVM conditionally uses a different VPID for L2, and
      unconditionally uses a different EPTP for L2.  Because KVM doesn't
      _intentionally_ guarantee L2's EPTP/VPID context is flushed on nested
      VM-Enter, it'd be possible for a malicious L1 to attack the host and/or
      different VMs by exploiting the lack of flushing for L2.
      
        1) Launch nested guest from malicious L1.

        2) Nested VM-Enter to L2.

        3) Access target GPA 'g'.  CPU inserts TLB entry tagged with L2's ASID
           mapping 'g' to host PFN 'x'.

        4) Nested VM-Exit to L1.

        5) L1 triggers kernel same-page merging (ksm) by duplicating/zeroing
           the page for PFN 'x'.

        6) Host kernel merges PFN 'x' with PFN 'y', i.e. unmaps PFN 'x' and
           remaps the page to PFN 'y'.  mmu_notifier sends invalidate command,
           KVM flushes TLB only for L1's ASID.

        7) Host kernel reallocates PFN 'x' to some other task/guest.

        8) Nested VM-Enter to L2.  KVM does not invalidate L2's EPTP or VPID.

        9) L2 accesses GPA 'g' and gains read/write access to PFN 'x' via its
           stale TLB entry.
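
      The scenario is easier to see with a toy model of an ASID-tagged TLB.  This is
      a self-contained userspace sketch, not KVM code; ASID 1 stands in for L1 and
      ASID 2 for L2:

          #include <stdbool.h>
          #include <stdio.h>

          struct tlb_entry { int asid; unsigned long gpa, pfn; bool valid; };
          static struct tlb_entry tlb[16];

          static void tlb_insert(int asid, unsigned long gpa, unsigned long pfn)
          {
                  for (int i = 0; i < 16; i++)
                          if (!tlb[i].valid) {
                                  tlb[i] = (struct tlb_entry){ asid, gpa, pfn, true };
                                  return;
                          }
          }

          static void tlb_flush_asid(int asid)      /* flush a single context */
          {
                  for (int i = 0; i < 16; i++)
                          if (tlb[i].asid == asid)
                                  tlb[i].valid = false;
          }

          static bool tlb_lookup(int asid, unsigned long gpa, unsigned long *pfn)
          {
                  for (int i = 0; i < 16; i++)
                          if (tlb[i].valid && tlb[i].asid == asid && tlb[i].gpa == gpa) {
                                  *pfn = tlb[i].pfn;
                                  return true;
                          }
                  return false;
          }

          int main(void)
          {
                  unsigned long pfn;

                  tlb_insert(2, 0x1000, 0xabc);   /* L2 touches GPA 'g', cached under its ASID */
                  tlb_flush_asid(1);              /* mmu_notifier invalidation flushes only L1 */
                  if (tlb_lookup(2, 0x1000, &pfn))
                          printf("stale L2 translation survives: GPA 0x1000 -> PFN 0x%lx\n", pfn);
                  return 0;
          }

      Flushing every context on a remote/deferred flush, the equivalent of running
      tlb_flush_asid() for all ASIDs, is what closes the hole.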
      
      However, current KVM unconditionally flushes L1's EPTP/VPID context on
      nested VM-Exit.  But that behavior is mostly unintentional; KVM doesn't
      go out of its way to flush EPTP/VPID on nested VM-Enter/VM-Exit.  Rather,
      a TLB flush is guaranteed to occur prior to re-entering L1 due to
      __kvm_mmu_new_cr3() always being called with skip_tlb_flush=false.  On
      nested VM-Enter, this happens via kvm_init_shadow_ept_mmu() (nested EPT
      enabled) or in nested_vmx_load_cr3() (nested EPT disabled).  On nested
      VM-Exit it occurs via nested_vmx_load_cr3().
      
      This also fixes a bug where a deferred TLB flush in the context of L2,
      with EPT disabled, would flush L1's VPID instead of L2's VPID, as
      vmx_flush_tlb() flushes L1's VPID regardless of is_guest_mode().
      
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Ben Gardon <bgardon@google.com>
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Junaid Shahid <junaids@google.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: John Haxby <john.haxby@oracle.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Fixes: efebf0aa ("KVM: nVMX: Do not flush TLB on L1<->L2 transitions if L1 uses VPID and EPT")
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200320212833.3507-2-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e8eff282
    • kvm: nVMX: match comment with return type for nested_vmx_exit_reflected · 69c09755
      Oliver Upton authored
      nested_vmx_exit_reflected() returns a bool, not int. As such, refer to
      the return values as true/false in the comment instead of 1/0.
      Signed-off-by: Oliver Upton <oupton@google.com>
      Message-Id: <20200414221241.134103-1-oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      69c09755
    • kvm: nVMX: reflect MTF VM-exits if injected by L1 · b045ae90
      Oliver Upton authored
      According to SDM 26.6.2, it is possible to inject an MTF VM-exit via the
      VM-entry interruption-information field regardless of the 'monitor trap
      flag' VM-execution control. KVM appropriately copies the VM-entry
      interruption-information field from vmcs12 to vmcs02. However, if L1
      has not set the 'monitor trap flag' VM-execution control, KVM fails to
      reflect the subsequent MTF VM-exit into L1.
      
      Fix this by consulting the VM-entry interruption-information field of
      vmcs12 to determine if L1 has injected the MTF VM-exit. If so, reflect
      the exit, regardless of the 'monitor trap flag' VM-execution control.
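
      A sketch of the check described above, consulting vmcs12's VM-entry
      interruption-information field; the helper names and exact placement are
      assumptions, while the field and interruption-info encoding are the
      architectural ones:

          static bool nested_vmx_exit_handled_mtf(struct vmcs12 *vmcs12)
          {
                  u32 entry_intr_info = vmcs12->vm_entry_intr_info_field;

                  if (nested_cpu_has_mtf(vmcs12))
                          return true;

                  /*
                   * An MTF VM-exit may be injected into the guest by setting the
                   * interruption-type to 7 (other event) and the vector field to 0,
                   * regardless of the 'monitor trap flag' VM-execution control.
                   */
                  return entry_intr_info == (INTR_INFO_VALID_MASK |
                                             INTR_TYPE_OTHER_EVENT);
          }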
      
      Fixes: 5f3d45e7 ("kvm/x86: add support for MONITOR_TRAP_FLAG")
      Signed-off-by: Oliver Upton <oupton@google.com>
      Reviewed-by: Peter Shier <pshier@google.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Message-Id: <20200414224746.240324-1-oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b045ae90
  3. 14 Apr 2020, 1 commit
  4. 07 Apr 2020, 3 commits
  5. 03 Apr 2020, 1 commit
    • x86/kvm: fix a missing-prototypes "vmread_error" · 514ccc19
      Qian Cai authored
      Commit 842f4be9 ("KVM: VMX: Add a trampoline to fix VMREAD error
      handling") removed the declaration of vmread_error(), which causes a W=1
      build failure with KVM_WERROR=y.  Fix it by adding the declaration back.
      
      arch/x86/kvm/vmx/vmx.c:359:17: error: no previous prototype for 'vmread_error' [-Werror=missing-prototypes]
       asmlinkage void vmread_error(unsigned long field, bool fault)
                       ^~~~~~~~~~~~
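
      The warning itself is generic: any non-static definition without a prior
      declaration trips -Wmissing-prototypes.  A minimal userspace reproduction of
      the problem and of the fix (re-adding the declaration), independent of the KVM
      code:

          /* build with: gcc -Wmissing-prototypes -Werror -c vmread_error_demo.c */
          void vmread_error(unsigned long field, _Bool fault);   /* the re-added declaration */

          void vmread_error(unsigned long field, _Bool fault)    /* warns if the line above is removed */
          {
                  (void)field;
                  (void)fault;
          }
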
      Signed-off-by: Qian Cai <cai@lca.pw>
      Message-Id: <20200402153955.1695-1-cai@lca.pw>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      514ccc19
  6. 31 Mar 2020, 7 commits
  7. 25 Mar 2020, 2 commits
  8. 24 Mar 2020, 4 commits
    • KVM: VMX: Gracefully handle faults on VMXON · 4f6ea0a8
      Sean Christopherson authored
      Gracefully handle faults on VMXON, e.g. #GP due to VMX being disabled by
      BIOS, instead of letting the fault crash the system.  Now that KVM uses
      cpufeatures to query support instead of reading MSR_IA32_FEAT_CTL
      directly, it's possible for a bug in a different subsystem to cause KVM
      to incorrectly attempt VMXON[*].  Crashing the system is especially
      annoying if the system is configured such that hardware_enable() will
      be triggered during boot.
      
      Opportunistically rename @addr to @vmxon_pointer and use a named param
      to reference it in the inline assembly.
      
      Print 0xdeadbeef in the ultra-"rare" case that reading MSR_IA32_FEAT_CTL
      also faults.
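
      A rough reconstruction of the hardened VMXON helper based on the details above
      (the named asm parameter, the rdmsr on fault, the 0xdeadbeef fallback); the
      exception-table plumbing shown is the usual kernel pattern rather than the
      verbatim patch:

          static int kvm_cpu_vmxon(u64 vmxon_pointer)
          {
                  u64 msr;

                  cr4_set_bits(X86_CR4_VMXE);

                  asm_volatile_goto("1: vmxon %[vmxon_pointer]\n\t"
                                    _ASM_EXTABLE(1b, %l[fault])
                                    : : [vmxon_pointer] "m"(vmxon_pointer)
                                    : : fault);
                  return 0;

          fault:
                  WARN_ONCE(1, "VMXON faulted, MSR_IA32_FEAT_CTL (0x3a) = 0x%llx\n",
                            rdmsrl_safe(MSR_IA32_FEAT_CTL, &msr) ? 0xdeadbeef : msr);
                  cr4_clear_bits(X86_CR4_VMXE);

                  return -EFAULT;
          }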
      
      [*] https://lkml.kernel.org/r/20200226231615.13664-1-sean.j.christopherson@intel.com

      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200321193751.24985-4-sean.j.christopherson@intel.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4f6ea0a8
    • KVM: VMX: Fold loaded_vmcs_init() into alloc_loaded_vmcs() · d260f9ef
      Sean Christopherson authored
      Subsume loaded_vmcs_init() into alloc_loaded_vmcs(), its only remaining
      caller, and drop the VMCLEAR on the shadow VMCS, which is guaranteed to
      be NULL.  loaded_vmcs_init() was previously used by loaded_vmcs_clear(),
      but loaded_vmcs_clear() also subsumed loaded_vmcs_init() to properly
      handle smp_wmb() with respect to VMCLEAR.
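
      The folded-in initialization plausibly leaves alloc_loaded_vmcs() ending in a
      fragment like the one below; this is a sketch of the shape described above, not
      the exact upstream code, and the rest of the allocator is omitted:

          vmcs_clear(loaded_vmcs->vmcs);    /* was loaded_vmcs_init() */

          loaded_vmcs->shadow_vmcs = NULL;  /* guaranteed NULL, so no VMCLEAR */
          loaded_vmcs->cpu = -1;
          loaded_vmcs->launched = 0;
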
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200321193751.24985-3-sean.j.christopherson@intel.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d260f9ef
    • KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support · 31603d4f
      Sean Christopherson authored
      VMCLEAR all in-use VMCSes during a crash, even if kdump's NMI shootdown
      interrupted a KVM update of the percpu in-use VMCS list.
      
      Because NMIs are not blocked by disabling IRQs, it's possible that
      crash_vmclear_local_loaded_vmcss() could be called while the percpu list
      of VMCSes is being modified, e.g. in the middle of list_add() in
      vmx_vcpu_load_vmcs().  This potential corner case was called out in the
      original commit[*], but the analysis of its impact was wrong.
      
      Skipping the VMCLEARs is wrong because it all but guarantees that a
      loaded, and therefore cached, VMCS will live across kexec and corrupt
      memory in the new kernel.  Corruption will occur because the CPU's VMCS
      cache is non-coherent, i.e. not snooped, and so the writeback of VMCS
      memory on its eviction will overwrite random memory in the new kernel.
      The VMCS will live because the NMI shootdown also disables VMX, i.e. the
      in-progress VMCLEAR will #UD, and existing Intel CPUs do not flush the
      VMCS cache on VMXOFF.
      
      Furthermore, interrupting list_add() and list_del() is safe due to
      crash_vmclear_local_loaded_vmcss() using forward iteration.  list_add()
      ensures the new entry is not visible to forward iteration unless the
      entire add completes, via WRITE_ONCE(prev->next, new).  A bad "prev"
      pointer could be observed if the NMI shootdown interrupted list_del() or
      list_add(), but list_for_each_entry() does not consume ->prev.
      
      In addition to removing the temporary disabling of VMCLEAR, open code
      loaded_vmcs_init() in __loaded_vmcs_clear() and reorder VMCLEAR so that
      the VMCS is deleted from the list only after it's been VMCLEAR'd.
      Deleting the VMCS before VMCLEAR would allow a race where the NMI
      shootdown could arrive between list_del() and vmcs_clear() and thus
      neither flow would execute a successful VMCLEAR.  Alternatively, more
      code could be moved into loaded_vmcs_init(), but that gets rather silly
      as the only other user, alloc_loaded_vmcs(), doesn't need the smp_wmb()
      and would need to work around the list_del().
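
      Put together, the reordered teardown in __loaded_vmcs_clear() reads roughly as
      follows; this sketches the ordering described above rather than reproducing the
      literal diff:

          vmcs_clear(loaded_vmcs->vmcs);
          if (loaded_vmcs->shadow_vmcs)
                  vmcs_clear(loaded_vmcs->shadow_vmcs);

          /* Delete from the percpu list only after the VMCLEARs, so an NMI
           * shootdown arriving in between still finds this VMCS via forward
           * iteration. */
          list_del(&loaded_vmcs->loaded_vmcss_on_cpu_link);

          /* Publish "no longer loaded" only after the list update; pairs with
           * the corresponding barrier in vmx_vcpu_load_vmcs(). */
          smp_wmb();

          loaded_vmcs->cpu = -1;
          loaded_vmcs->launched = 0;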
      
      Update the smp_*() comments related to the list manipulation, and
      opportunistically reword them to improve clarity.
      
      [*] https://patchwork.kernel.org/patch/1675731/#3720461
      
      Fixes: 8f536b76 ("KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200321193751.24985-2-sean.j.christopherson@intel.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      31603d4f
    • KVM: VMX: don't allow memory operands for inline asm that modifies SP · 428b8f1d
      Nick Desaulniers authored
      THUNK_TARGET defines [thunk_target] as having "rm" input constraints
      when CONFIG_RETPOLINE is not set, which isn't constrained enough for
      this specific case.
      
      For inline assembly that modifies the stack pointer before using this
      input, the underspecification of constraints is dangerous, and results
      in an indirect call to a previously pushed flags register.
      
      In this case `entry`'s stack slot is good enough to satisfy the "m"
      constraint in "rm", but the inline assembly in
      handle_external_interrupt_irqoff() modifies the stack pointer via
      push+pushf before using this input, which in this case results in
      calling what was the previous state of the flags register, rather than
      `entry`.
      
      Be more specific in the constraints by requiring `entry` be in a
      register, and not a memory operand.
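
      The hazard is easy to show in a standalone example.  The sketch below is not
      KVM code; it simply uses an asm template that moves the stack pointer before an
      indirect call, which is exactly the situation where a memory operand (as
      permitted by "rm") can end up naming the wrong stack slot while a register
      operand stays valid:

          #include <stdio.h>

          static unsigned long hits;
          static void handler(void) { hits++; }      /* stands in for the IRQ entry stub */

          static void call_entry(void (*entry)(void))
          {
                  asm volatile("push %%rax\n\t"      /* SP moves before the call... */
                               "call *%0\n\t"        /* ...so %0 must live in a register */
                               "pop %%rax"
                               :
                               : "r"(entry)          /* an "rm" constraint reintroduces the bug */
                               : "rcx", "rdx", "rsi", "rdi",
                                 "r8", "r9", "r10", "r11", "cc", "memory");
          }

          int main(void)
          {
                  call_entry(handler);
                  printf("handler ran %lu time(s)\n", hits);
                  return 0;
          }
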
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Reported-by: syzbot+3f29ca2efb056a761e38@syzkaller.appspotmail.com
      Debugged-by: Alexander Potapenko <glider@google.com>
      Debugged-by: Paolo Bonzini <pbonzini@redhat.com>
      Debugged-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      Message-Id: <20200323191243.30002-1-ndesaulniers@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      428b8f1d
  9. 18 Mar 2020, 1 commit
  10. 17 Mar 2020, 3 commits
    • KVM: VMX: access regs array in vmenter.S in its natural order · bb03911f
      Uros Bizjak authored
      Registers in "regs" array are indexed as rax/rcx/rdx/.../rsi/rdi/r8/...
      Reorder access to "regs" array in vmenter.S to follow its natural order.
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      bb03911f
    • KVM: nVMX: properly handle errors in nested_vmx_handle_enlightened_vmptrld() · b6a0653a
      Vitaly Kuznetsov authored
      nested_vmx_handle_enlightened_vmptrld() fails in two cases:
      - when we fail to kvm_vcpu_map() the supplied GPA
      - when revision_id is incorrect.
      Genuine Hyper-V raises #UD in the former case (at least with *some*
      incorrect GPAs) and does VMfailInvalid() in the latter.  KVM doesn't do
      anything so L1 just gets stuck retrying the same faulty VMLAUNCH.
      
      nested_vmx_handle_enlightened_vmptrld() has two call sites:
      nested_vmx_run() and nested_get_vmcs12_pages().  In the former, mirror
      genuine Hyper-V behavior: raise #UD or do VMfailInvalid.  The latter
      doesn't need to do much: the failure there happens after migration when
      L2 was running (and L1 did something weird like writing to the VP assist
      page from a different vCPU), so just kill L1 with KVM_EXIT_INTERNAL_ERROR.
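
      The behavior described above suggests a multi-valued return so the two call
      sites can react differently; the enum and value names below are illustrative
      assumptions, not necessarily what the patch introduces:

          enum nested_evmptrld_status {
                  EVMPTRLD_DISABLED,
                  EVMPTRLD_SUCCEEDED,
                  EVMPTRLD_VMFAIL,        /* bad revision_id */
                  EVMPTRLD_ERROR,         /* kvm_vcpu_map() failed on the GPA */
          };

          /* In nested_vmx_run(): mirror genuine Hyper-V. */
          switch (nested_vmx_handle_enlightened_vmptrld(vcpu, launch)) {
          case EVMPTRLD_ERROR:
                  kvm_queue_exception(vcpu, UD_VECTOR);
                  return 1;
          case EVMPTRLD_VMFAIL:
                  return nested_vmx_failInvalid(vcpu);
          default:
                  break;
          }

          /* In nested_get_vmcs12_pages(): L2 was already running, give up. */
          vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
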
      Reported-by: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      [Squash kbuild autopatch. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b6a0653a
    • KVM: nVMX: stop abusing need_vmcs12_to_shadow_sync for eVMCS mapping · e942dbf8
      Vitaly Kuznetsov authored
      When vmx_set_nested_state() happens, we may not have all the required
      data to map the enlightened VMCS: e.g. the HV_X64_MSR_VP_ASSIST_PAGE MSR may
      not yet be restored, so we need a postponed action. Currently, we (ab)use
      need_vmcs12_to_shadow_sync/nested_sync_vmcs12_to_shadow() for that but
      this is not ideal:
      - We may not need to sync anything if L2 is running
      - It is hard to propagate errors from nested_sync_vmcs12_to_shadow()
       as we call it from vmx_prepare_switch_to_guest(), which happens just
       before we do VMLAUNCH; the code is not ready to handle errors there.
      
      Move the eVMCS mapping to nested_get_vmcs12_pages() and request
      KVM_REQ_GET_VMCS12_PAGES; this seems less abusive in nature.
      It would probably be possible to introduce a specialized KVM_REQ_EVMCS_MAP
      but it is undesirable to propagate eVMCS specifics all the way up to x86.c
      
      Note, we don't need to request KVM_REQ_GET_VMCS12_PAGES from
      vmx_set_nested_state() directly as nested_vmx_enter_non_root_mode() already
      does that. Requesting KVM_REQ_GET_VMCS12_PAGES is done to document the
      (non-obvious) side-effect and to be future proof.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e942dbf8