    KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support · 31603d4f
    Committed by Sean Christopherson
    VMCLEAR all in-use VMCSes during a crash, even if kdump's NMI shootdown
    interrupted a KVM update of the percpu in-use VMCS list.
    
    Because NMIs are not blocked by disabling IRQs, it's possible that
    crash_vmclear_local_loaded_vmcss() could be called while the percpu list
    of VMCSes is being modified, e.g. in the middle of list_add() in
    vmx_vcpu_load_vmcs().  This potential corner case was called out in the
    original commit[*], but the analysis of its impact was wrong.
    
    Skipping the VMCLEARs is wrong because it all but guarantees that a
    loaded, and therefore cached, VMCS will live across kexec and corrupt
    memory in the new kernel.  Corruption will occur because the CPU's VMCS
    cache is non-coherent, i.e. not snooped, and so the writeback of VMCS
    memory on its eviction will overwrite random memory in the new kernel.
    The VMCS will live because the NMI shootdown also disables VMX, i.e. the
    in-progress VMCLEAR will #UD, and existing Intel CPUs do not flush the
    VMCS cache on VMXOFF.
    
    Furthermore, interrupting list_add() and list_del() is safe due to
    crash_vmclear_local_loaded_vmcss() using forward iteration.  list_add()
    ensures the new entry is not visible to forward iteration unless the
    entire add completes, via WRITE_ONCE(prev->next, new).  A bad "prev"
    pointer could be observed if the NMI shootdown interrupted list_del() or
    list_add(), but list_for_each_entry() does not consume ->prev.
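
    The forward-iteration argument can be checked with a small user-space
    sketch. This is not kernel code. It copies the store order of the
    kernel's __list_add() and __list_del()/list_del(), but the names
    struct node, WRITE_ONCE_PTR(), list_add_partial(), list_del_partial()
    and count_forward() are hypothetical, and NULL stands in for
    LIST_POISON1/2. A `steps` argument stops the update early to model the
    NMI landing mid-update. Because the NMI interrupts the same CPU that is
    doing the update, that CPU sees its own stores in program order, so
    cross-CPU memory ordering is deliberately left out of the model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's struct list_head. */
struct node {
	struct node *next, *prev;
	int id;
};

/* Hypothetical stand-in for WRITE_ONCE(): one volatile pointer store. */
#define WRITE_ONCE_PTR(x, val) (*(struct node *volatile *)&(x) = (val))

static void list_init(struct node *head)
{
	head->next = head;
	head->prev = head;
}

/*
 * Same store order as the kernel's __list_add(new, head, head->next),
 * stopping after `steps` stores to model an NMI arriving mid-add.
 */
static void list_add_partial(struct node *new, struct node *head, int steps)
{
	struct node *prev = head, *next = head->next;

	if (steps > 0) next->prev = new;
	if (steps > 1) new->next = next;
	if (steps > 2) new->prev = prev;
	if (steps > 3) WRITE_ONCE_PTR(prev->next, new); /* publish last */
}

/*
 * Same order as __list_del() followed by list_del()'s poisoning
 * (NULL here instead of LIST_POISON1/2), stopping after `steps` stages.
 */
static void list_del_partial(struct node *entry, int steps)
{
	struct node *prev = entry->prev, *next = entry->next;

	if (steps > 0) next->prev = prev;
	if (steps > 1) WRITE_ONCE_PTR(prev->next, next); /* unlink */
	if (steps > 2) { entry->next = NULL; entry->prev = NULL; }
}

/* Forward walk like list_for_each_entry(): follows ->next only. */
static int count_forward(const struct node *head)
{
	int n = 0;

	for (const struct node *p = head->next; p != head; p = p->next)
		n++;
	return n;
}
```

    Interrupting the add at any point leaves the walk seeing either the old
    list or the complete new one. Interrupting the delete leaves the entry
    either still reachable with a valid ->next or already unlinked, and the
    poisoned pointers are written only after the entry is unreachable. The
    stale ->prev pointers left behind are never read by the walk.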
    
    In addition to removing the temporary disabling of VMCLEAR, open code
    loaded_vmcs_init() in __loaded_vmcs_clear() and reorder VMCLEAR so that
    the VMCS is deleted from the list only after it's been VMCLEAR'd.
    Deleting the VMCS before VMCLEAR would allow a race where the NMI
    shootdown could arrive between list_del() and vmcs_clear() and thus
    neither flow would execute a successful VMCLEAR.  Alternatively, more
    code could be moved into loaded_vmcs_init(), but that gets rather silly
    as the only other user, alloc_loaded_vmcs(), doesn't need the smp_wmb()
    and would need to work around the list_del().
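
    The reordering argument can be modelled as a two-step race. The sketch
    below is a hypothetical model, not the real __loaded_vmcs_clear(). It
    tracks only whether the VMCS is still on the percpu list and whether a
    VMCLEAR has run. It assumes, as described above, that the crash path
    VMCLEARs everything still on the list and that the interrupted path
    never resumes. struct vmcs_model, crash_vmclear_model() and
    vmcleared_after_crash() are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one loaded VMCS. */
struct vmcs_model {
	bool on_list;   /* still on the percpu loaded-VMCS list */
	bool vmcleared; /* a VMCLEAR has been executed for it */
};

/* Crash path: VMCLEAR whatever is still on the list. */
static void crash_vmclear_model(struct vmcs_model *v)
{
	if (v->on_list)
		v->vmcleared = true;
}

/*
 * Run the first `steps` operations of the clear path, then fire the
 * "NMI shootdown". Operations not yet executed never happen.
 * clear_before_del selects the new order (VMCLEAR, then list_del())
 * versus the old one (list_del(), then VMCLEAR).
 */
static bool vmcleared_after_crash(bool clear_before_del, int steps)
{
	struct vmcs_model v = { .on_list = true, .vmcleared = false };

	for (int i = 0; i < steps; i++) {
		bool do_clear = clear_before_del ? (i == 0) : (i == 1);

		if (do_clear)
			v.vmcleared = true; /* vmcs_clear() */
		else
			v.on_list = false;  /* list_del() */
	}
	crash_vmclear_model(&v);
	return v.vmcleared;
}
```

    With the old order, an NMI after list_del() but before the VMCLEAR
    leaves the VMCS neither on the list nor cleared, so
    vmcleared_after_crash(false, 1) is false. With the new order every
    interruption point ends with the VMCS VMCLEAR'd by one of the two flows.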
    
    Update the smp_*() comments related to the list manipulation, and
    opportunistically reword them to improve clarity.
    
    [*] https://patchwork.kernel.org/patch/1675731/#3720461
    
    Fixes: 8f536b76 ("KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump")
    Cc: stable@vger.kernel.org
    Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
    Message-Id: <20200321193751.24985-2-sean.j.christopherson@intel.com>
    Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>