1. 21 April 2020 (4 commits)
    • KVM: VMX: Clean up vmx_flush_tlb_gva() · ad104b5e
      Authored by Sean Christopherson
      Refactor vmx_flush_tlb_gva() to remove a superfluous local variable and
      clean up its comment, which is oddly located below the code it is
      commenting.
      
      No functional change intended.
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200320212833.3507-16-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ad104b5e
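      A minimal C sketch of the result, reconstructed from the description
      rather than copied from the patch (the comment moves above the call it
      documents, and the superfluous local variable is gone):

          static void vmx_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr)
          {
                  /*
                   * vpid_sync_vcpu_addr() is a nop if vpid == 0, which is
                   * fine: with VPID disabled, every VM-Enter/VM-Exit flushes
                   * linear mappings anyway.
                   */
                  vpid_sync_vcpu_addr(to_vmx(vcpu)->vpid, addr);
          }
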
    • KVM: x86: Move "flush guest's TLB" logic to separate kvm_x86_ops hook · e64419d9
      Authored by Sean Christopherson
      Add a dedicated hook to handle flushing TLB entries on behalf of the
      guest, i.e. for a paravirtualized TLB flush, and use it directly instead
      of bouncing through kvm_vcpu_flush_tlb().
      
      For VMX, change the effective implementation to never do INVEPT and
      flush only the current context, i.e. to always flush via
      INVVPID(SINGLE_CONTEXT).  The INVEPT performed by __vmx_flush_tlb() when
      @invalidate_gpa=false and enable_vpid=0 is unnecessary, as it will only
      flush guest-physical mappings; linear and combined mappings are flushed
      by VM-Enter when VPID is disabled, and changes in the guest page tables
      do not affect guest-physical mappings.
      
      When EPT and VPID are enabled, doing INVVPID is not required (by Intel's
      architecture) to invalidate guest-physical mappings, i.e. TLB entries
      that cache guest-physical mappings can live across INVVPID as the
      mappings are associated with an EPTP, not a VPID.  The intent of
      @invalidate_gpa is to inform vmx_flush_tlb() that it must "invalidate
      gpa mappings", i.e. do INVEPT and not simply INVVPID.  Other than nested
      VPID handling, which now calls vpid_sync_context() directly, the only
      scenario where KVM can safely do INVVPID instead of INVEPT (when EPT is
      enabled) is if KVM is flushing TLB entries from the guest's perspective,
      i.e. is only required to invalidate linear mappings.
      
      For SVM, flushing TLB entries from the guest's perspective can be done
      by flushing the current ASID, as changes to the guest's page tables are
      associated only with the current ASID.
      
      Adding a dedicated ->tlb_flush_guest() paves the way toward removing
      @invalidate_gpa, which is a potentially dangerous control flag as its
      meaning is not exactly crystal clear, even for those who are familiar
      with the subtleties of what mappings Intel CPUs are/aren't allowed to
      keep across various invalidation scenarios.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200320212833.3507-15-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e64419d9
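      Roughly, the new hook and its per-vendor implementations could look like
      the sketch below (reconstructed from the commit message; the exact
      helper signatures, e.g. svm_flush_tlb()'s parameters, are assumptions):

          /* New kvm_x86_ops member: flush TLB entries from the guest's
           * perspective, e.g. for a paravirtualized TLB flush.
           */
          void (*tlb_flush_guest)(struct kvm_vcpu *vcpu);

          static void vmx_flush_tlb_guest(struct kvm_vcpu *vcpu)
          {
                  /*
                   * Guest page table changes affect only linear and combined
                   * mappings, which are tagged with the VPID, so a
                   * single-context INVVPID suffices; no INVEPT is needed.
                   */
                  vpid_sync_context(to_vmx(vcpu)->vpid);
          }

          static void svm_flush_tlb_guest(struct kvm_vcpu *vcpu)
          {
                  /* Flushing the current ASID covers guest page table
                   * changes, which are associated only with that ASID.
                   */
                  svm_flush_tlb(vcpu, false /* invalidate_gpa */);
          }
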
    • KVM: VMX: Handle INVVPID fallback logic in vpid_sync_vcpu_addr() · ab4b3597
      Authored by Sean Christopherson
      Directly invoke vpid_sync_context() to do a global INVVPID when the
      individual address variant is not supported instead of deferring such
      behavior to the caller.  This allows for additional consolidation of
      code as the logic is basically identical to the emulation of the
      individual address variant in handle_invvpid().
      
      No functional change intended.
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200320212833.3507-12-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ab4b3597
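      The resulting helper is presumably along these lines (a sketch based on
      the description; the exact constant and feature-check names follow the
      VMX code's usual conventions and may differ):

          static inline void vpid_sync_vcpu_addr(int vpid, gva_t addr)
          {
                  if (vpid == 0)
                          return;

                  if (cpu_has_vmx_invvpid_individual_addr())
                          __invvpid(VMX_VPID_EXTENT_INDIVIDUAL_ADDR, vpid, addr);
                  else
                          /* Fallback: flush the entire VPID context. */
                          vpid_sync_context(vpid);
          }
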
    • KVM: x86: Sync SPTEs when injecting page/EPT fault into L1 · ee1fa209
      Authored by Junaid Shahid
      When injecting a page fault or EPT violation/misconfiguration, KVM is
      not syncing any shadow PTEs associated with the faulting address,
      including those in previous MMUs that are associated with L1's current
      EPTP (in a nested EPT scenario), nor is it flushing any hardware TLB
      entries.  All of this is done by kvm_mmu_invalidate_gva().
      
      Page faults that are either !PRESENT or RSVD are exempt from the flushing,
      as the CPU is not allowed to cache such translations.
      Signed-off-by: Junaid Shahid <junaids@google.com>
      Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200320212833.3507-8-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ee1fa209
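      In code, the injection path presumably gains a guarded invalidation
      along these lines (a sketch; the call site and the root parameter are
      approximations):

          /*
           * Sync SPTEs and flush the TLB for the faulting address before
           * injecting the fault, unless the CPU could not have cached the
           * translation in the first place (!PRESENT or RSVD faults).
           */
          if ((fault->error_code & PFERR_PRESENT_MASK) &&
              !(fault->error_code & PFERR_RSVD_MASK))
                  kvm_mmu_invalidate_gva(vcpu, fault_mmu, fault->address,
                                         fault_mmu->root_hpa);

          fault_mmu->inject_page_fault(vcpu, fault);
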
  2. 14 April 2020 (1 commit)
  3. 07 April 2020 (1 commit)
    • KVM: VMX: fix crash cleanup when KVM wasn't used · dbef2808
      Authored by Vitaly Kuznetsov
      If KVM wasn't used at all before we crash, the cleanup procedure fails with:
       BUG: unable to handle page fault for address: ffffffffffffffc8
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 23215067 P4D 23215067 PUD 23217067 PMD 0
       Oops: 0000 [#8] SMP PTI
       CPU: 0 PID: 3542 Comm: bash Kdump: loaded Tainted: G      D           5.6.0-rc2+ #823
       RIP: 0010:crash_vmclear_local_loaded_vmcss.cold+0x19/0x51 [kvm_intel]
      
      The root cause is that loaded_vmcss_on_cpu list is not yet initialized,
      we initialize it in hardware_enable() but this only happens when we start
      a VM.
      
      Previously, we had a bitmap of enabled CPUs, and that was masking the
      issue.
      
      Initialize the loaded_vmcss_on_cpu list earlier, right before we assign
      the crash_vmclear_loaded_vmcss pointer.  The blocked_vcpu_on_cpu list
      and blocked_vcpu_on_cpu_lock are moved along with it for consistency.
      
      Fixes: 31603d4f ("KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200401081348.1345307-1-vkuznets@redhat.com>
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      dbef2808
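      The fix plausibly amounts to hoisting the per-CPU initialization into
      module init, e.g. (a sketch; placement within vmx_init() is
      approximate):

          static int __init vmx_init(void)
          {
                  int cpu;

                  /*
                   * Initialize the per-CPU lists unconditionally, before
                   * publishing the crash callback, so that
                   * crash_vmclear_local_loaded_vmcss() never walks an
                   * uninitialized list.
                   */
                  for_each_possible_cpu(cpu) {
                          INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));

                          /* Moved here along with the VMCS list for consistency. */
                          INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
                          spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
                  }

          #ifdef CONFIG_KEXEC_CORE
                  rcu_assign_pointer(crash_vmclear_loaded_vmcss,
                                     crash_vmclear_local_loaded_vmcss);
          #endif
                  /* ... remainder of init unchanged ... */
                  return 0;
          }
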
  4. 31 March 2020 (6 commits)
  5. 25 March 2020 (2 commits)
  6. 24 March 2020 (4 commits)
    • KVM: VMX: Gracefully handle faults on VMXON · 4f6ea0a8
      Authored by Sean Christopherson
      Gracefully handle faults on VMXON, e.g. #GP due to VMX being disabled by
      BIOS, instead of letting the fault crash the system.  Now that KVM uses
      cpufeatures to query support instead of reading MSR_IA32_FEAT_CTL
      directly, it's possible for a bug in a different subsystem to cause KVM
      to incorrectly attempt VMXON[*].  Crashing the system is especially
      annoying if the system is configured such that hardware_enable() will
      be triggered during boot.
      
      Opportunistically rename @addr to @vmxon_pointer and use a named param
      to reference it in the inline assembly.
      
      Print 0xdeadbeef in the ultra-"rare" case that reading MSR_IA32_FEAT_CTL
      also faults.
      
      [*] https://lkml.kernel.org/r/20200226231615.13664-1-sean.j.christopherson@intel.com

      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200321193751.24985-4-sean.j.christopherson@intel.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4f6ea0a8
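      The fault handling presumably wraps VMXON in an exception fixup,
      roughly (a sketch; helper names beyond those in the message are
      assumptions):

          static int kvm_cpu_vmxon(u64 vmxon_pointer)
          {
                  u64 msr;

                  cr4_set_bits(X86_CR4_VMXE);

                  /* Named param per the commit; a fault jumps to the fixup
                   * label instead of crashing the system.
                   */
                  asm_volatile_goto("1: vmxon %[vmxon_pointer]\n\t"
                                    _ASM_EXTABLE(1b, %l[fault])
                                    : : [vmxon_pointer] "m"(vmxon_pointer)
                                    : : fault);
                  return 0;

          fault:
                  /* 0xdeadbeef flags the ultra-rare case where even the
                   * RDMSR of MSR_IA32_FEAT_CTL faults.
                   */
                  WARN_ONCE(1, "VMXON faulted, MSR_IA32_FEAT_CTL (0x3a) = 0x%llx\n",
                            rdmsrl_safe(MSR_IA32_FEAT_CTL, &msr) ? 0xdeadbeef : msr);
                  cr4_clear_bits(X86_CR4_VMXE);
                  return -EFAULT;
          }
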
    • KVM: VMX: Fold loaded_vmcs_init() into alloc_loaded_vmcs() · d260f9ef
      Authored by Sean Christopherson
      Subsume loaded_vmcs_init() into alloc_loaded_vmcs(), its only remaining
      caller, and drop the VMCLEAR on the shadow VMCS, which is guaranteed to
      be NULL.  loaded_vmcs_init() was previously used by loaded_vmcs_clear(),
      but loaded_vmcs_clear() also subsumed loaded_vmcs_init() to properly
      handle smp_wmb() with respect to VMCLEAR.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200321193751.24985-3-sean.j.christopherson@intel.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d260f9ef
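      After the fold, the allocation path presumably clears the fresh VMCS
      inline, e.g. (a sketch; fields beyond those named in the message are
      assumptions):

          int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs)
          {
                  loaded_vmcs->vmcs = alloc_vmcs(false);
                  if (!loaded_vmcs->vmcs)
                          return -ENOMEM;

                  /*
                   * Formerly loaded_vmcs_init().  The VMCLEAR of the shadow
                   * VMCS is dropped: shadow_vmcs is guaranteed to be NULL
                   * here, and no smp_wmb() is needed since the VMCS cannot
                   * yet be in use on another CPU.
                   */
                  vmcs_clear(loaded_vmcs->vmcs);

                  loaded_vmcs->shadow_vmcs = NULL;
                  loaded_vmcs->cpu = -1;
                  loaded_vmcs->launched = 0;
                  /* ... remaining setup unchanged ... */
                  return 0;
          }
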
    • KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support · 31603d4f
      Authored by Sean Christopherson
      VMCLEAR all in-use VMCSes during a crash, even if kdump's NMI shootdown
      interrupted a KVM update of the percpu in-use VMCS list.
      
      Because NMIs are not blocked by disabling IRQs, it's possible that
      crash_vmclear_local_loaded_vmcss() could be called while the percpu list
      of VMCSes is being modified, e.g. in the middle of list_add() in
      vmx_vcpu_load_vmcs().  This potential corner case was called out in the
      original commit[*], but the analysis of its impact was wrong.
      
      Skipping the VMCLEARs is wrong because it all but guarantees that a
      loaded, and therefore cached, VMCS will live across kexec and corrupt
      memory in the new kernel.  Corruption will occur because the CPU's VMCS
      cache is non-coherent, i.e. not snooped, and so the writeback of VMCS
      memory on its eviction will overwrite random memory in the new kernel.
      The VMCS will live because the NMI shootdown also disables VMX, i.e. the
      in-progress VMCLEAR will #UD, and existing Intel CPUs do not flush the
      VMCS cache on VMXOFF.
      
      Furthermore, interrupting list_add() and list_del() is safe due to
      crash_vmclear_local_loaded_vmcss() using forward iteration.  list_add()
      ensures the new entry is not visible to forward iteration unless the
      entire add completes, via WRITE_ONCE(prev->next, new).  A bad "prev"
      pointer could be observed if the NMI shootdown interrupted list_del() or
      list_add(), but list_for_each_entry() does not consume ->prev.
      
      In addition to removing the temporary disabling of VMCLEAR, open code
      loaded_vmcs_init() in __loaded_vmcs_clear() and reorder VMCLEAR so that
      the VMCS is deleted from the list only after it's been VMCLEAR'd.
      Deleting the VMCS before VMCLEAR would allow a race where the NMI
      shootdown could arrive between list_del() and vmcs_clear() and thus
      neither flow would execute a successful VMCLEAR.  Alternatively, more
      code could be moved into loaded_vmcs_init(), but that gets rather silly
      as the only other user, alloc_loaded_vmcs(), doesn't need the smp_wmb()
      and would need to work around the list_del().
      
      Update the smp_*() comments related to the list manipulation, and
      opportunistically reword them to improve clarity.
      
      [*] https://patchwork.kernel.org/patch/1675731/#3720461
      
      Fixes: 8f536b76 ("KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200321193751.24985-2-sean.j.christopherson@intel.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      31603d4f
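      The reordering described above presumably leaves __loaded_vmcs_clear()
      looking roughly like this (a sketch; the surrounding per-CPU checks are
      elided):

          static void __loaded_vmcs_clear(void *arg)
          {
                  struct loaded_vmcs *loaded_vmcs = arg;

                  /* VMCLEAR before list_del(): if the NMI shootdown lands in
                   * between, the entry is still visible to forward iteration
                   * and has already been cleared, so no VMCLEAR is lost.
                   */
                  vmcs_clear(loaded_vmcs->vmcs);
                  if (loaded_vmcs->shadow_vmcs && loaded_vmcs->launched)
                          vmcs_clear(loaded_vmcs->shadow_vmcs);

                  list_del(&loaded_vmcs->loaded_vmcss_on_cpu_link);

                  /*
                   * Ensure the deletion is visible before cpu is set to -1,
                   * so another CPU cannot observe cpu == -1 and re-add this
                   * loaded_vmcs to its own list while it is still on ours.
                   */
                  smp_wmb();

                  loaded_vmcs->cpu = -1;
                  loaded_vmcs->launched = 0;
          }
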
    • KVM: VMX: don't allow memory operands for inline asm that modifies SP · 428b8f1d
      Authored by Nick Desaulniers
      THUNK_TARGET defines [thunk_target] as having "rm" input constraints
      when CONFIG_RETPOLINE is not set, which isn't constrained enough for
      this specific case.
      
      For inline assembly that modifies the stack pointer before using this
      input, the underspecification of constraints is dangerous, and results
      in an indirect call to a previously pushed flags register.
      
      In this case `entry`'s stack slot is good enough to satisfy the "m"
      constraint in "rm", but the inline assembly in
      handle_external_interrupt_irqoff() modifies the stack pointer via
      push+pushf before using this input, which in this case results in
      calling what was the previous state of the flags register, rather than
      `entry`.
      
      Be more specific in the constraints by requiring `entry` be in a
      register, and not a memory operand.
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Reported-by: syzbot+3f29ca2efb056a761e38@syzkaller.appspotmail.com
      Debugged-by: Alexander Potapenko <glider@google.com>
      Debugged-by: Paolo Bonzini <pbonzini@redhat.com>
      Debugged-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      Message-Id: <20200323191243.30002-1-ndesaulniers@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      428b8f1d
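      A condensed sketch of the pattern and the fix (paraphrased from the
      description; the real asm in handle_external_interrupt_irqoff() also
      builds the SS/RSP half of the interrupt frame on 64-bit):

          /* The asm builds an interrupt frame (flags, then CS) before the
           * indirect call, and the callee returns via IRET.  With "rm" the
           * compiler may choose a stack slot for [thunk_target], but the
           * pushes below move the stack pointer, so the operand would
           * resolve to the wrong memory: here, the just-pushed flags.
           * "r" forces the target into a register, immune to the pushes.
           */
          asm volatile("pushf\n\t"
                       __ASM_SIZE(push) " $%c[cs]\n\t"
                       CALL_NOSPEC
                       :
                       ASM_CALL_CONSTRAINT
                       :
                       [thunk_target] "r"(entry), /* was THUNK_TARGET(entry), i.e. "rm" */
                       [cs] "i"(__KERNEL_CS));
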
  7. 18 March 2020 (1 commit)
  8. 17 March 2020 (21 commits)