• S
    KVM: nVMX: Reset the segment cache when stuffing guest segs · fc387d8d
    Sean Christopherson 提交于
    Explicitly reset the segment cache after stuffing guest segment regs in
    prepare_vmcs02_rare().  Although the cache is reset when switching to
    vmcs02, there is nothing that prevents KVM from re-populating the cache
    prior to writing vmcs02 with vmcs12's values.  E.g. if the vCPU is
    preempted after switching to vmcs02 but before prepare_vmcs02_rare(),
    kvm_arch_vcpu_put() will dereference GUEST_SS_AR_BYTES via .get_cpl()
    and cache the stale vmcs02 value.  While the current code base only
    caches stale data in the preemption case, it's theoretically possible
    future code could read a segment register during the nested flow itself,
    i.e. this isn't technically illegal behavior in kvm_arch_vcpu_put(),
    although it did introduce the bug.
    
    This manifests as an unexpected nested VM-Enter failure when running
    with unrestricted guest disabled if the above preemption case coincides
    with L1 switching L2's CPL, e.g. when switching from a L2 vCPU at CPL3
    to to a L2 vCPU at CPL0.  stack_segment_valid() will see the new SS_SEL
    but the old SS_AR_BYTES and incorrectly mark the guest state as invalid
    due to SS.dpl != SS.rpl.
    
    Don't bother updating the cache even though prepare_vmcs02_rare() writes
    every segment.  With unrestricted guest, guest segments are almost never
    read, let alone L2 guest segments.  On the other hand, populating the
    cache requires a large number of memory writes, i.e. it's unlikely to be
    a net win.  Updating the cache would be a win when unrestricted guest is
    not supported, as guest_state_valid() will immediately cache all segment
    registers.  But, nested virtualization without unrestricted guest is
    dirt slow, saving some VMREADs won't change that, and every CPU
    manufactured in the last decade supports unrestricted guest.  In other
    words, the extra (minor) complexity isn't worth the trouble.
    
    Note, kvm_arch_vcpu_put() may see stale data when querying guest CPL
    depending on when preemption occurs.  This is "ok" in that the usage is
    imperfect by nature, i.e. it's used heuristically to improve performance
    but doesn't affect functionality.  kvm_arch_vcpu_put() could be "fixed"
    by also disabling preemption while loading segments, but that's
    pointless and misleading as reading state from kvm_sched_{in,out}() is
    guaranteed to see stale data in one form or another.  E.g. even if all
    the usage of regs_avail is fixed to call kvm_register_mark_available()
    after the associated state is set, the individual state might still be
    stale with respect to the overall vCPU state.  I.e. making functional
    decisions in an asynchronous hook is doomed from the get go.  Thankfully
    KVM doesn't do that.
    
    Fixes: de63ad4c ("KVM: X86: implement the logic for spinlock optimization")
    Cc: stable@vger.kernel.org
    Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
    Message-Id: <20200923184452.980-2-sean.j.christopherson@intel.com>
    Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
    fc387d8d
nested.c 201.5 KB