1. 01 Dec, 2022 1 commit
  2. 30 Nov, 2022 2 commits
    • D
      KVM: x86/xen: Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured · d8ba8ba4
      David Woodhouse committed
      Closer inspection of the Xen code shows that we aren't supposed to be
      using the XEN_RUNSTATE_UPDATE flag unconditionally. It should be
      explicitly enabled by guests through the HYPERVISOR_vm_assist hypercall.
      If we randomly set the top bit of ->state_entry_time for a guest that
      hasn't asked for it and doesn't expect it, that could make the runtimes
      fail to add up and confuse the guest. Without the flag it's perfectly
      safe for a vCPU to read its own vcpu_runstate_info; just not for one
      vCPU to read *another's*.
      
      I briefly pondered adding a word for the whole set of VMASST_TYPE_*
      flags but the only one we care about for HVM guests is this, so it
      seemed a bit pointless.
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20221127122210.248427-3-dwmw2@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • D
      KVM: x86/xen: Compatibility fixes for shared runstate area · 5ec3289b
      David Woodhouse committed
      The guest runstate area can be arbitrarily byte-aligned. In fact, even
      when a sane 32-bit guest aligns the overall structure nicely, the 64-bit
      fields in the structure end up being unaligned due to the fact that the
      32-bit ABI only aligns them to 32 bits.
      
      So setting the ->state_entry_time field to something|XEN_RUNSTATE_UPDATE
      is buggy, because if it's unaligned then we can't update the whole field
      atomically; the low bytes might be observable before the _UPDATE bit is.
      Xen actually updates the *byte* containing that top bit, on its own. KVM
      should do the same.
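
A minimal userspace sketch of the byte-wide update described above, not the KVM code itself. It assumes a little-endian layout (as on x86), where bit 63 of the field lives in its eighth byte; the helper names set_update_flag() and read_entry_time() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Top bit of the 64-bit state_entry_time field. */
#define XEN_RUNSTATE_UPDATE (UINT64_C(1) << 63)

/*
 * Hypothetical helper: set or clear the update flag by touching only the
 * single byte that holds bit 63. On a little-endian machine that is the
 * last of the eight bytes, so the store cannot be torn even when the
 * 64-bit field itself is unaligned.
 */
static void set_update_flag(uint8_t *entry_time, int on)
{
    uint8_t mask = (uint8_t)(XEN_RUNSTATE_UPDATE >> 56); /* 0x80 */
    uint8_t *top;

    assert(entry_time != NULL);
    top = entry_time + sizeof(uint64_t) - 1;
    if (on)
        *top |= mask;
    else
        *top &= (uint8_t)~mask;
}

/* Hypothetical helper: read the field back; memcpy tolerates any alignment. */
static uint64_t read_entry_time(const uint8_t *entry_time)
{
    uint64_t v;

    memcpy(&v, entry_time, sizeof(v));
    return v;
}
```

A writer following this pattern would set the flag first, update the remaining fields, and clear the flag last, so that a reader seeing the flag knows the contents are in flux.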
      
      In addition, we cannot assume that the runstate area fits within a single
      page. One option might be to make the gfn_to_pfn cache cope with regions
      that cross a page — but getting a contiguous virtual kernel mapping of a
      discontiguous set of IOMEM pages is a distinctly non-trivial exercise,
      and it seems this is the *only* current use case for the GPC which would
      benefit from it.
      
      An earlier version of the runstate code did use a gfn_to_hva cache for
      this purpose, but it still had the single-page restriction because it
      used the uhva directly — because it needs to be able to do so atomically
      when the vCPU is being scheduled out, so it used pagefault_disable()
      around the accesses and didn't just use kvm_write_guest_cached() which
      has a fallback path.
      
      So... use a pair of GPCs for the first and potential second page covering
      the runstate area. We can get away with locking both at once because
      nothing else takes more than one GPC lock at a time so we can invent
      a trivial ordering rule.
      
      The common case where it's all in the same page is kept as a fast path,
      but in both cases, the actual guest structure (compat or not) is built
      up from the fields in @vx, following preset pointers to the state and
      times fields. The only difference is whether those pointers point to
      the kernel stack (in the split case) or to guest memory directly via
      the GPC.  The fast path is also fixed to use a byte access for the
      XEN_RUNSTATE_UPDATE bit, then the only real difference is the dual
      memcpy.
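
The split case can be sketched in userspace as follows. This is only an illustration of the "dual memcpy" idea under stated assumptions: the structure image has already been built in a local buffer, the two pages are plain byte arrays, and the function name write_split() and its parameters are hypothetical rather than taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy a structure image of `len` bytes to byte
 * `offset` of page1. Whatever does not fit in page1 continues at the
 * start of page2. page2 may be NULL when everything fits in page1.
 */
static void write_split(uint8_t *page1, uint8_t *page2, size_t page_size,
                        size_t offset, const void *src, size_t len)
{
    size_t first;

    assert(page1 != NULL && src != NULL);
    assert(offset < page_size && len <= page_size);

    first = page_size - offset;      /* room left in the first page */
    if (first >= len) {
        /* Common case: the whole image lands in one page. */
        memcpy(page1 + offset, src, len);
        return;
    }

    /* Split case: two copies, one per page. */
    assert(page2 != NULL);
    memcpy(page1 + offset, src, first);
    memcpy(page2, (const uint8_t *)src + first, len - first);
}
```

In the commit as described, the single-page case writes to guest memory directly through the GPC rather than through a staging buffer; the sketch stages both cases only to keep the example short.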
      
      Finally, Xen also does write the runstate area immediately when it's
      configured. Flip the kvm_xen_update_runstate() and …_guest() functions
      and call the latter directly when the runstate area is set. This means
      that other ioctls which modify the runstate also write it immediately
      to the guest when they do so, which is also intended.
      
      Update the xen_shinfo_test to exercise the pathological case where the
      XEN_RUNSTATE_UPDATE flag in the top byte of the state_entry_time is
      actually in a different page to the rest of the 64-bit word.
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 29 Nov, 2022 1 commit
  4. 24 Nov, 2022 2 commits
  5. 28 Oct, 2022 1 commit
  6. 27 Oct, 2022 1 commit
    • M
      KVM: Initialize gfn_to_pfn_cache locks in dedicated helper · 52491a38
      Michal Luczaj committed
      Move the gfn_to_pfn_cache lock initialization to another helper and
      call the new helper during VM/vCPU creation.  There are race
      conditions possible due to kvm_gfn_to_pfn_cache_init()'s
      ability to re-initialize the cache's locks.
      
      For example: a race between ioctl(KVM_XEN_HVM_EVTCHN_SEND) and
      kvm_gfn_to_pfn_cache_init() leads to a corrupted shinfo gpc lock.
      
                      (thread 1)                |           (thread 2)
                                                |
       kvm_xen_set_evtchn_fast                  |
        read_lock_irqsave(&gpc->lock, ...)      |
                                                | kvm_gfn_to_pfn_cache_init
                                                |  rwlock_init(&gpc->lock)
        read_unlock_irqrestore(&gpc->lock, ...) |
      
      Rename "cache_init" and "cache_destroy" to activate+deactivate to
      avoid implying that the cache really is destroyed/freed.
      
      Note, there are more races in the newly named kvm_gpc_activate() that
      will be addressed separately.
      
      Fixes: 982ed0de ("KVM: Reinstate gfn_to_pfn_cache with invalidation support")
      Cc: stable@vger.kernel.org
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Michal Luczaj <mhal@rbox.co>
      [sean: call out that this is a bug fix]
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20221013211234.1318131-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  7. 27 Sep, 2022 1 commit
  8. 11 Aug, 2022 2 commits
  9. 13 Jul, 2022 1 commit
  10. 14 Apr, 2022 1 commit
  11. 02 Apr, 2022 15 commits
  12. 11 Feb, 2022 2 commits
    • S
      KVM: xen: Use static_call() for invoking kvm_x86_ops hooks · 0264a351
      Sean Christopherson committed
      Use static_call() for invoking kvm_x86_ops functions that already have a
      defined static call, mostly as a step toward having _all_ calls to
      kvm_x86_ops route through a static_call() in order to simplify auditing,
      e.g. via grep, that all functions have an entry in kvm-x86-ops.h, but
      also because there's no reason not to use a static_call().
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220128005208.4008533-8-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • D
      KVM: x86/xen: Fix runstate updates to be atomic when preempting vCPU · fcb732d8
      David Woodhouse committed
      There are circumstances when kvm_xen_update_runstate_guest() should not
      sleep because it ends up being called from __schedule() when the vCPU
      is preempted:
      
      [  222.830825]  kvm_xen_update_runstate_guest+0x24/0x100
      [  222.830878]  kvm_arch_vcpu_put+0x14c/0x200
      [  222.830920]  kvm_sched_out+0x30/0x40
      [  222.830960]  __schedule+0x55c/0x9f0
      
      To handle this, make it use the same trick as __kvm_xen_has_interrupt(),
      of using the hva from the gfn_to_hva_cache directly. Then it can use
      pagefault_disable() around the accesses and just bail out if the page
      is absent (which is unlikely).
      
      I almost switched to using a gfn_to_pfn_cache here and bailing out if
      kvm_map_gfn() fails, like kvm_steal_time_set_preempted() does — but on
      closer inspection it looks like kvm_map_gfn() will *always* fail in
      atomic context for a page in IOMEM, which means it will silently fail
      to make the update every single time for such guests, AFAICT. So I
      didn't do it that way after all. And will probably fix that one too.
      
      Cc: stable@vger.kernel.org
      Fixes: 30b5c851 ("KVM: x86/xen: Add support for vCPU runstate information")
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <b17a93e5ff4561e57b1238e3e7ccd0b613eb827e.camel@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  13. 24 Jan, 2022 1 commit
  14. 07 Jan, 2022 3 commits
    • D
      KVM: x86: Fix wall clock writes in Xen shared_info not to mark page dirty · 55749769
      David Woodhouse committed
      When dirty ring logging is enabled, any dirty logging without an active
      vCPU context will cause a kernel oops. But we've already declared that
      the shared_info page doesn't get dirty tracking anyway, since it would
      be kind of insane to mark it dirty every time we deliver an event channel
      interrupt. Userspace is supposed to just assume it's always dirty any
      time a vCPU can run or event channels are routed.
      
      So stop using the generic kvm_write_wall_clock() and just write directly
      through the gfn_to_pfn_cache that we already have set up.
      
      We can make kvm_write_wall_clock() static in x86.c again now, but let's
      not remove the 'sec_hi_ofs' argument even though it's not used yet. At
      some point we *will* want to use that for KVM guests too.
      
      Fixes: 629b5348 ("KVM: x86/xen: update wallclock region")
      Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20211210163625.2886-6-dwmw2@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • D
      KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery · 14243b38
      David Woodhouse committed
      This adds basic support for delivering 2-level event channels to a guest.
      
      Initially, it only supports delivery via the IRQ routing table, triggered
      by an eventfd. In order to do so, it has a kvm_xen_set_evtchn_fast()
      function which will use the pre-mapped shared_info page if it already
      exists and is still valid, while the slow path through the irqfd_inject
      workqueue will remap the shared_info page if necessary.
      
      It sets the bits in the shared_info page but not the vcpu_info; that is
      deferred to __kvm_xen_has_interrupt() which raises the vector to the
      appropriate vCPU.
      
      Add a 'verbose' mode to xen_shinfo_test while adding test cases for this.
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20211210163625.2886-5-dwmw2@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • D
      KVM: x86/xen: Maintain valid mapping of Xen shared_info page · 1cfc9c4b
      David Woodhouse committed
      Use the newly reinstated gfn_to_pfn_cache to maintain a kernel mapping
      of the Xen shared_info page so that it can be accessed in atomic context.
      
      Note that we do not participate in dirty tracking for the shared info
      page and we do not explicitly mark it dirty every single time we deliver
      an event channel interrupt. We wouldn't want to do that even if we *did*
      have a valid vCPU context with which to do so.
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20211210163625.2886-4-dwmw2@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  15. 18 Nov, 2021 3 commits
  16. 25 Oct, 2021 1 commit
    • D
      KVM: x86/xen: Fix kvm_xen_has_interrupt() sleeping in kvm_vcpu_block() · 0985dba8
      David Woodhouse committed
      In kvm_vcpu_block, the current task is set to TASK_INTERRUPTIBLE before
      making a final check whether the vCPU should be woken from HLT by any
      incoming interrupt.
      
      This is a problem for the get_user() in __kvm_xen_has_interrupt(), which
      really shouldn't be sleeping when the task state has already been set.
      I think it's actually harmless as it would just manifest itself as a
      spurious wakeup, but it's causing a debug warning:
      
      [  230.963649] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000b6bcdbc9>] prepare_to_swait_exclusive+0x30/0x80
      
      Fix the warning by turning it into an *explicit* spurious wakeup. When
      invoked with !task_is_running(current) (and we might as well add
      in_atomic() there while we're at it), just return 1 to indicate that
      an IRQ is pending, which will cause a wakeup and then something will
      call it again in a context that *can* sleep so it can fault the page
      back in.
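
The control flow described above can be modelled in a few lines of userspace C. This is a sketch only: the function name has_interrupt_sketch() is hypothetical, `can_fault` stands in for "the task is running and we are not in atomic context", and `pending` stands in for the guest's upcall-pending byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the check. When the caller must not sleep, report
 * "interrupt pending" unconditionally. The resulting wakeup may be
 * spurious, but the check then runs again from a context that is allowed
 * to fault the page in and read the real value.
 */
static int has_interrupt_sketch(bool can_fault, const uint8_t *pending)
{
    if (!can_fault)
        return 1;               /* explicit spurious wakeup */

    assert(pending != NULL);    /* sleepable path: the byte is readable */
    return *pending != 0;
}
```

The trade-off is an occasional unnecessary wakeup in exchange for never performing a blocking access while the task state has already been changed.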
      
      Cc: stable@vger.kernel.org
      Fixes: 40da8ccd ("KVM: x86/xen: Add event channel interrupt vector upcall")
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <168bf8c689561da904e48e2ff5ae4713eaef9e2d.camel@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  17. 05 Aug, 2021 1 commit
    • P
      KVM: xen: do not use struct gfn_to_hva_cache · 319afe68
      Paolo Bonzini committed
      gfn_to_hva_cache is not thread-safe, so it is usually used only within
      a vCPU (whose code is protected by vcpu->mutex).  The Xen interface
      implementation has such a cache in kvm->arch, but it is not really
      used except to store the location of the shared info page.  Replace
      shinfo_set and shinfo_cache with just the value that is passed via
      KVM_XEN_ATTR_TYPE_SHARED_INFO; the only complication is that the
      initialization value is not zero anymore and therefore kvm_xen_init_vm
      needs to be introduced.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  18. 26 Apr, 2021 1 commit