1. 08 December 2021, 25 commits
  2. 26 November 2021, 1 commit
  3. 18 November 2021, 3 commits
    • KVM: Disallow user memslot with size that exceeds "unsigned long" · 6b285a55
      Sean Christopherson authored
      Reject userspace memslots whose size exceeds the storage capacity of an
      "unsigned long".  KVM's uAPI takes the size as u64 to support large slots
      on 64-bit hosts, but does not account for the size being truncated on
      32-bit hosts in various flows.  The access_ok() check on the userspace
      virtual address in particular casts the size to "unsigned long" and will
      check the wrong number of bytes.
      
      KVM doesn't actually support slots whose size doesn't fit in an "unsigned
      long", e.g. KVM's internal kvm_memory_slot.npages is an "unsigned long",
      not a "u64", and misc arch specific code follows that behavior.
      
      Fixes: fa3d315a ("KVM: Validate userspace_addr of memslot when registered")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Message-Id: <20211104002531.1176691-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Ensure local memslot copies operate on up-to-date arch-specific data · bda44d84
      Sean Christopherson authored
      When modifying memslots, snapshot the "old" memslot and copy it to the
      "new" memslot's arch data after (re)acquiring slots_arch_lock.  x86 can
      change a memslot's arch data while memslot updates are in-progress so
      long as it holds slots_arch_lock, thus snapshotting a memslot without
      holding the lock can result in the consumption of stale data.
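      
      A simplified sketch of the ordering this describes; helper and field names
      follow virt/kvm/kvm_main.c, but the real kvm_set_memslot() flow has more
      steps and error handling:
      
          mutex_lock(&kvm->slots_arch_lock);
      
          /*
           * Snapshot the old slot only while slots_arch_lock is held: x86 may
           * update a live slot's arch data, but never while this lock is taken,
           * so the copy below cannot go stale.
           */
          old = *id_to_memslot(__kvm_memslots(kvm, as_id), new->id);
          new->arch = old.arch;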
      
      Fixes: b10a038e ("KVM: mmu: Add slots_arch_lock for memslot arch fields")
      Cc: stable@vger.kernel.org
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211104002531.1176691-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Kill kvm_map_gfn() / kvm_unmap_gfn() and gfn_to_pfn_cache · 357a18ad
      David Woodhouse authored
      In commit 7e2175eb ("KVM: x86: Fix recording of guest steal time /
      preempted status") I removed the only user of these functions because
      it was basically impossible to use them safely.
      
      There are two stages to the GFN->PFN mapping; first through the KVM
      memslots to a userspace HVA and then through the page tables to
      translate that HVA to an underlying PFN. Invalidations of the former
      were being handled correctly, but no attempt was made to use the MMU
      notifiers to invalidate the cache when the HVA->PFN mapping changed.
      
      As a prelude to reinventing the gfn_to_pfn_cache with more usable
      semantics, rip it out entirely and untangle the implementation of
      the unsafe kvm_vcpu_map()/kvm_vcpu_unmap() functions from it.
      
      All current users of kvm_vcpu_map() also look broken right now, and
      will be dealt with separately. They broadly fall into two classes:
      
      * Those which map, access the data and immediately unmap. This is
        mostly gratuitous and could just as well use the existing user
        HVA, and could probably benefit from a gfn_to_hva_cache as they
        do so.
      
      * Those which keep the mapping around for a longer time, perhaps
        even using the PFN directly from the guest. These will need to
        be converted to the new gfn_to_pfn_cache and then kvm_vcpu_map()
        can be removed too.
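      
      For reference, a hedged sketch of the two translation stages described
      above (the function name is made up and this is not the removed cache
      code):
      
          static kvm_pfn_t example_gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
          {
                  struct page *page;
                  unsigned long hva;
      
                  /* Stage 1: GFN -> HVA via the memslots (invalidated correctly). */
                  hva = gfn_to_hva(kvm, gfn);
                  if (kvm_is_error_hva(hva))
                          return KVM_PFN_NOSLOT;
      
                  /*
                   * Stage 2: HVA -> PFN via the host page tables; this is the
                   * mapping the old cache never hooked the MMU notifiers for.
                   */
                  if (get_user_pages_fast(hva, 1, FOLL_WRITE, &page) != 1)
                          return KVM_PFN_ERR_FAULT;
      
                  return page_to_pfn(page);
          }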
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20211115165030.7422-8-dwmw2@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. 11 November 2021, 1 commit
  5. 30 September 2021, 6 commits
  6. 23 September 2021, 1 commit
  7. 22 September 2021, 3 commits
    • KVM: KVM: Use cpumask_available() to check for NULL cpumask when kicking vCPUs · 0bbc2ca8
      Sean Christopherson authored
      Check for a NULL cpumask_var_t when kicking multiple vCPUs via
      cpumask_available(), which performs a !NULL check if and only if cpumasks
      are configured to be allocated off-stack.  This is a meaningless
      optimization, e.g. avoids a TEST+Jcc and TEST+CMOV on x86, but more
      importantly helps document that the NULL check is necessary even though
      all callers pass in a local variable.
      
      No functional change intended.
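      
      A small sketch of the pattern, assuming an off-stack-capable config; the
      function names here are illustrative, not the actual call sites:
      
          static void example_ack(void *info)
          {
                  /* IPI callback body is irrelevant to the sketch. */
          }
      
          static void example_kick_cpus(cpumask_var_t cpus, bool wait)
          {
                  /*
                   * cpumask_available() compiles to a real !NULL test only when
                   * CONFIG_CPUMASK_OFFSTACK=y (cpumask_var_t is then a pointer
                   * whose allocation may have failed); otherwise it is
                   * compile-time true, so the check costs nothing.
                   */
                  if (!cpumask_available(cpus) || cpumask_empty(cpus))
                          return;
      
                  smp_call_function_many(cpus, example_ack, NULL, wait);
          }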
      
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210827092516.1027264-3-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Clean up benign vcpu->cpu data races when kicking vCPUs · 85b64045
      Sean Christopherson authored
      Fix a benign data race reported by syzbot+KCSAN[*] by ensuring vcpu->cpu
      is read exactly once, and by ensuring the vCPU is booted from guest mode
      if kvm_arch_vcpu_should_kick() returns true.  Fix a similar race in
      kvm_make_vcpus_request_mask() by ensuring the vCPU is interrupted if
      kvm_request_needs_ipi() returns true.
      
      Reading vcpu->cpu before vcpu->mode (via kvm_arch_vcpu_should_kick() or
      kvm_request_needs_ipi()) means the target vCPU could get migrated (change
      vcpu->cpu) and enter !OUTSIDE_GUEST_MODE between reading vcpu->cpu and
      reading vcpu->mode.  If that happens, the kick/IPI will be sent to the
      old pCPU, not the new pCPU that is now running the vCPU or reading SPTEs.
      
      Although failing to kick the vCPU is not exactly ideal, practically
      speaking it cannot cause a functional issue unless there is also a bug in
      the caller, and any such bug would exist regardless of kvm_vcpu_kick()'s
      behavior.
      
      The purpose of sending an IPI is purely to get a vCPU into the host (or
      out of reading SPTEs) so that the vCPU can recognize a change in state,
      e.g. a KVM_REQ_* request.  If the vCPU's handling of the state change is
      required for correctness, KVM must ensure either the vCPU sees the change
      before entering the guest, or that the sender sees the vCPU as running in
      guest mode.  All architectures handle this by (a) sending the request
      before calling kvm_vcpu_kick() and (b) checking for requests _after_
      setting vcpu->mode.
      
      x86's READING_SHADOW_PAGE_TABLES has similar requirements; KVM needs to
      ensure it kicks and waits for vCPUs that started reading SPTEs _before_
      MMU changes were finalized, but any vCPU that starts reading after MMU
      changes were finalized will see the new state and can continue on
      uninterrupted.
      
      For uses of kvm_vcpu_kick() that are not paired with a KVM_REQ_*, e.g.
      x86's kvm_arch_sync_dirty_log(), the order of the kick must not be relied
      upon for functional correctness, e.g. in the dirty log case, userspace
      cannot assume it has a 100% complete log if vCPUs are still running.
      
      All that said, eliminate the benign race since the cost of doing so is an
      "extra" atomic cmpxchg() in the case where the target vCPU is loaded by
      the current pCPU or is not loaded at all.  I.e. the kick will be skipped
      due to kvm_vcpu_exiting_guest_mode() seeing a compatible vcpu->mode as
      opposed to the kick being skipped because of the cpu checks.
      
      Keep the "cpu != me" checks even though they appear useless/impossible at
      first glance.  x86 processes guest IPI writes in a fast path that runs in
      IN_GUEST_MODE, i.e. can call kvm_vcpu_kick() from IN_GUEST_MODE.  And
      calling kvm_vm_bugged()->kvm_make_vcpus_request_mask() from IN_GUEST_MODE or
      READING_SHADOW_PAGE_TABLES is perfectly reasonable.
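      
      A condensed sketch of the resulting kick path, assuming the standard
      kvm_vcpu_kick() structure (wake-up handling and arch hooks elided):
      
          me = get_cpu();
      
          if (kvm_arch_vcpu_should_kick(vcpu)) {
                  /*
                   * Read vcpu->cpu exactly once, and only after the should-kick
                   * check has observed vcpu->mode, so a migrating vCPU cannot
                   * leave us with a stale pCPU.
                   */
                  cpu = READ_ONCE(vcpu->cpu);
                  if (cpu != me && (unsigned int)cpu < nr_cpu_ids && cpu_online(cpu))
                          smp_send_reschedule(cpu);
          }
      
          put_cpu();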
      
      Note, a race with the cpu_online() check in kvm_vcpu_kick() likely
      persists, e.g. the vCPU could exit guest mode and get offlined between
      the cpu_online() check and the sending of smp_send_reschedule().  But,
      the online check appears to exist only to avoid a WARN in x86's
      native_smp_send_reschedule() that fires if the target CPU is not online.
      The reschedule WARN exists because CPU offlining takes the CPU out of the
      scheduling pool, i.e. the WARN is intended to detect the case where the
      kernel attempts to schedule a task on an offline CPU.  The actual sending
      of the IPI is a non-issue as at worst it will simply be dropped on the
      floor.  In other words, KVM's usurping of the reschedule IPI could
      theoretically trigger a WARN if the stars align, but there will be no
      loss of functionality.
      
      [*] https://syzkaller.appspot.com/bug?extid=cd4154e502f43f10808a
      
      Cc: Venkatesh Srinivas <venkateshs@google.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Fixes: 97222cc8 ("KVM: Emulate local APIC in kernel")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210827092516.1027264-2-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: do not shrink halt_poll_ns below grow_start · ae232ea4
      Sergey Senozhatsky authored
      grow_halt_poll_ns() ignores values between 0 and
      halt_poll_ns_grow_start (10000 by default). However,
      when we shrink halt_poll_ns we may fall way below
      halt_poll_ns_grow_start and end up with halt_poll_ns
      values that don't make a lot of sense, like 1, 9,
      or 19.
      
      VCPU1 trace (halt_poll_ns_shrink equals 2):
      
      VCPU1 grow 10000
      VCPU1 shrink 5000
      VCPU1 shrink 2500
      VCPU1 shrink 1250
      VCPU1 shrink 625
      VCPU1 shrink 312
      VCPU1 shrink 156
      VCPU1 shrink 78
      VCPU1 shrink 39
      VCPU1 shrink 19
      VCPU1 shrink 9
      VCPU1 shrink 4
      
      Mirror what grow_halt_poll_ns() does and set halt_poll_ns
      to 0 as soon as the newly shrunken halt_poll_ns value falls
      below halt_poll_ns_grow_start.
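      
      A sketch of the adjusted shrink path, following the naming in
      virt/kvm/kvm_main.c (tracepoint omitted):
      
          static void shrink_halt_poll_ns(struct kvm_vcpu *vcpu)
          {
                  unsigned int val, shrink, grow_start;
      
                  val = vcpu->halt_poll_ns;
                  shrink = READ_ONCE(halt_poll_ns_shrink);
                  grow_start = READ_ONCE(halt_poll_ns_grow_start);
      
                  if (shrink == 0)
                          val = 0;
                  else
                          val /= shrink;
      
                  /* Mirror grow_halt_poll_ns(): don't leave 5000, 2500, ..., 4 behind. */
                  if (val < grow_start)
                          val = 0;
      
                  vcpu->halt_poll_ns = val;
          }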
      Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20210902031100.252080-1-senozhatsky@chromium.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>