提交 · 8acc35186ed63436bfaf60051c8bb53f344dcbfc · openeuler / Kernel

01 12月, 2022 1 次提交

KVM: x86/xen: Add runstate tests for 32-bit mode and crossing page boundary · 8acc3518

由 David Woodhouse 提交于 11月 19, 2022

Torture test the cases where the runstate crosses a page boundary, and
and especially the case where it's configured in 32-bit mode and doesn't,
but then switching to 64-bit mode makes it go onto the second page.

To simplify this, make the KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST ioctl
also update the guest runstate area. It already did so if the actual
runstate changed, as a side-effect of kvm_xen_update_runstate(). So
doing it in the plain adjustment case is making it more consistent, as
well as giving us a nice way to trigger the update without actually
running the vCPU again and changing the values.
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: NPaul Durrant <paul@xen.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8acc3518

30 11月, 2022 2 次提交

KVM: x86/xen: Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured · d8ba8ba4

由 David Woodhouse 提交于 11月 27, 2022

Closer inspection of the Xen code shows that we aren't supposed to be
using the XEN_RUNSTATE_UPDATE flag unconditionally. It should be
explicitly enabled by guests through the HYPERVISOR_vm_assist hypercall.
If we randomly set the top bit of ->state_entry_time for a guest that
hasn't asked for it and doesn't expect it, that could make the runtimes
fail to add up and confuse the guest. Without the flag it's perfectly
safe for a vCPU to read its own vcpu_runstate_info; just not for one
vCPU to read *another's*.

I briefly pondered adding a word for the whole set of VMASST_TYPE_*
flags but the only one we care about for HVM guests is this, so it
seemed a bit pointless.
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20221127122210.248427-3-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d8ba8ba4

KVM: x86/xen: Compatibility fixes for shared runstate area · 5ec3289b

由 David Woodhouse 提交于 11月 18, 2022

The guest runstate area can be arbitrarily byte-aligned. In fact, even
when a sane 32-bit guest aligns the overall structure nicely, the 64-bit
fields in the structure end up being unaligned due to the fact that the
32-bit ABI only aligns them to 32 bits.

So setting the ->state_entry_time field to something|XEN_RUNSTATE_UPDATE
is buggy, because if it's unaligned then we can't update the whole field
atomically; the low bytes might be observable before the _UPDATE bit is.
Xen actually updates the *byte* containing that top bit, on its own. KVM
should do the same.

In addition, we cannot assume that the runstate area fits within a single
page. One option might be to make the gfn_to_pfn cache cope with regions
that cross a page — but getting a contiguous virtual kernel mapping of a
discontiguous set of IOMEM pages is a distinctly non-trivial exercise,
and it seems this is the *only* current use case for the GPC which would
benefit from it.

An earlier version of the runstate code did use a gfn_to_hva cache for
this purpose, but it still had the single-page restriction because it
used the uhva directly — because it needs to be able to do so atomically
when the vCPU is being scheduled out, so it used pagefault_disable()
around the accesses and didn't just use kvm_write_guest_cached() which
has a fallback path.

So... use a pair of GPCs for the first and potential second page covering
the runstate area. We can get away with locking both at once because
nothing else takes more than one GPC lock at a time so we can invent
a trivial ordering rule.

The common case where it's all in the same page is kept as a fast path,
but in both cases, the actual guest structure (compat or not) is built
up from the fields in @vx, following preset pointers to the state and
times fields. The only difference is whether those pointers point to
the kernel stack (in the split case) or to guest memory directly via
the GPC. The fast path is also fixed to use a byte access for the
XEN_RUNSTATE_UPDATE bit, then the only real difference is the dual
memcpy.

Finally, Xen also does write the runstate area immediately when it's
configured. Flip the kvm_xen_update_runstate() and …_guest() functions
and call the latter directly when the runstate area is set. This means
that other ioctls which modify the runstate also write it immediately
to the guest when they do so, which is also intended.

Update the xen_shinfo_test to exercise the pathological case where the
XEN_RUNSTATE_UPDATE flag in the top byte of the state_entry_time is
actually in a different page to the rest of the 64-bit word.
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5ec3289b

29 11月, 2022 1 次提交
- D
  KVM: x86/xen: Add CPL to Xen hypercall tracepoint · c3f37199
  由 David Woodhouse 提交于 11月 14, 2022
```
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
```
  c3f37199
24 11月, 2022 2 次提交

KVM: x86/xen: Only do in-kernel acceleration of hypercalls for guest CPL0 · c2b8cdfa

由 David Woodhouse 提交于 11月 12, 2022

There are almost no hypercalls which are valid from CPL > 0, and definitely
none which are handled by the kernel.

Fixes: 2fd6df2f ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
Reported-by: NMichal Luczaj <mhal@rbox.co>
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Cc: stable@kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c2b8cdfa

KVM: x86/xen: Validate port number in SCHEDOP_poll · 4ea9439f

由 David Woodhouse 提交于 11月 12, 2022

We shouldn't allow guests to poll on arbitrary port numbers off the end
of the event channel table.

Fixes: 1a65105a ("KVM: x86/xen: handle PV spinlocks slowpath")
[dwmw2: my bug though; the original version did check the validity as a
 side-effect of an idr_find() which I ripped out in refactoring.]
Reported-by: NMichal Luczaj <mhal@rbox.co>
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Cc: stable@kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4ea9439f

28 10月, 2022 1 次提交

KVM: x86/xen: Fix eventfd error handling in kvm_xen_eventfd_assign() · 73536338

由 Eiichi Tsukata 提交于 10月 28, 2022

Should not call eventfd_ctx_put() in case of error.

Fixes: 2fd6df2f ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
Reported-by: syzbot+6f0c896c5a9449a10ded@syzkaller.appspotmail.com
Signed-off-by: NEiichi Tsukata <eiichi.tsukata@nutanix.com>
Message-Id: <20221028092631.117438-1-eiichi.tsukata@nutanix.com>
[Introduce new goto target instead. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

73536338

27 10月, 2022 1 次提交

KVM: Initialize gfn_to_pfn_cache locks in dedicated helper · 52491a38

由 Michal Luczaj 提交于 10月 13, 2022

Move the gfn_to_pfn_cache lock initialization to another helper and
call the new helper during VM/vCPU creation.  There are race
conditions possible due to kvm_gfn_to_pfn_cache_init()'s
ability to re-initialize the cache's locks.

For example: a race between ioctl(KVM_XEN_HVM_EVTCHN_SEND) and
kvm_gfn_to_pfn_cache_init() leads to a corrupted shinfo gpc lock.

                (thread 1)                |           (thread 2)
                                          |
 kvm_xen_set_evtchn_fast                  |
  read_lock_irqsave(&gpc->lock, ...)      |
                                          | kvm_gfn_to_pfn_cache_init
                                          |  rwlock_init(&gpc->lock)
  read_unlock_irqrestore(&gpc->lock, ...) |

Rename "cache_init" and "cache_destroy" to activate+deactivate to
avoid implying that the cache really is destroyed/freed.

Note, there more races in the newly named kvm_gpc_activate() that will
be addressed separately.

Fixes: 982ed0de ("KVM: Reinstate gfn_to_pfn_cache with invalidation support")
Cc: stable@vger.kernel.org
Suggested-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NMichal Luczaj <mhal@rbox.co>
[sean: call out that this is a bug fix]
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20221013211234.1318131-2-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

52491a38

27 9月, 2022 1 次提交

KVM: remove KVM_REQ_UNHALT · c59fb127

由 Paolo Bonzini 提交于 9月 21, 2022

KVM_REQ_UNHALT is now unnecessary because it is replaced by the return
value of kvm_vcpu_block/kvm_vcpu_halt.  Remove it.

No functional change intended.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Acked-by: NMarc Zyngier <maz@kernel.org>
Message-Id: <20220921003201.1441511-13-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c59fb127

11 8月, 2022 2 次提交

KVM: x86/xen: Stop Xen timer before changing IRQ · c0368991

由 Coleman Dietsch 提交于 8月 08, 2022

Stop Xen timer (if it's running) prior to changing the IRQ vector and
potentially (re)starting the timer. Changing the IRQ vector while the
timer is still running can result in KVM injecting a garbage event, e.g.
vm_xen_inject_timer_irqs() could see a non-zero xen.timer_pending from
a previous timer but inject the new xen.timer_virq.

Fixes: 53639526 ("KVM: x86/xen: handle PV timers oneshot mode")
Cc: stable@vger.kernel.org
Link: https://syzkaller.appspot.com/bug?id=8234a9dfd3aafbf092cc5a7cd9842e3ebc45fc42
Reported-by: syzbot+e54f930ed78eb0f85281@syzkaller.appspotmail.com
Signed-off-by: NColeman Dietsch <dietschc@csp.edu>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Acked-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20220808190607.323899-3-dietschc@csp.edu>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c0368991

KVM: x86/xen: Initialize Xen timer only once · af735db3

由 Coleman Dietsch 提交于 8月 08, 2022

Add a check for existing xen timers before initializing a new one.

Currently kvm_xen_init_timer() is called on every
KVM_XEN_VCPU_ATTR_TYPE_TIMER, which is causing the following ODEBUG
crash when vcpu->arch.xen.timer is already set.

ODEBUG: init active (active state 0)
object type: hrtimer hint: xen_timer_callbac0
RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:502
Call Trace:
__debug_object_init
debug_hrtimer_init
debug_init
hrtimer_init
kvm_xen_init_timer
kvm_xen_vcpu_set_attr
kvm_arch_vcpu_ioctl
kvm_vcpu_ioctl
vfs_ioctl

Fixes: 53639526 ("KVM: x86/xen: handle PV timers oneshot mode")
Cc: stable@vger.kernel.org
Link: https://syzkaller.appspot.com/bug?id=8234a9dfd3aafbf092cc5a7cd9842e3ebc45fc42
Reported-by: syzbot+e54f930ed78eb0f85281@syzkaller.appspotmail.com
Signed-off-by: NColeman Dietsch <dietschc@csp.edu>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220808190607.323899-2-dietschc@csp.edu>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

af735db3

13 7月, 2022 1 次提交

KVM: x86: Query vcpu->vcpu_idx directly and drop its accessor, again · 79f772b9

由 Sean Christopherson 提交于 6月 14, 2022

Read vcpu->vcpu_idx directly instead of bouncing through the one-line
wrapper, kvm_vcpu_get_idx(), and drop the wrapper.  The wrapper is a
remnant of the original implementation and serves no purpose; remove it
(again) before it gains more users.

kvm_vcpu_get_idx() was removed in the not-too-distant past by commit
4eeef242 ("KVM: x86: Query vcpu->vcpu_idx directly and drop its
accessor"), but was unintentionally re-introduced by commit a54d8066
("KVM: Keep memslots in tree-based structures instead of array-based ones"),
likely due to a rebase goof.  The wrapper then managed to gain users in
KVM's Xen code.

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20220614225615.3843835-1-seanjc@google.com

79f772b9

14 4月, 2022 1 次提交

KVM: x86/xen: Remove the redundantly included header file lapic.h · 04c97512

由 Like Xu 提交于 4月 06, 2022

The header lapic.h is included more than once, remove one of them.
Signed-off-by: NLike Xu <likexu@tencent.com>
Message-Id: <20220406063715.55625-2-likexu@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

04c97512

02 4月, 2022 15 次提交

KVM: x86/xen: handle PV spinlocks slowpath · 1a65105a

由 Boris Ostrovsky 提交于 3月 03, 2022

Add support for SCHEDOP_poll hypercall.

This implementation is optimized for polling for a single channel, which
is what Linux does. Polling for multiple channels is not especially
efficient (and has not been tested).

PV spinlocks slow path uses this hypercall, and explicitly crash if it's
not supported.

[ dwmw2: Rework to use kvm_vcpu_halt(), not supported for 32-bit guests ]
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-17-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1a65105a

KVM: x86/xen: Advertise and document KVM_XEN_HVM_CONFIG_EVTCHN_SEND · 661a20fa

由 David Woodhouse 提交于 3月 03, 2022

At the end of the patch series adding this batch of event channel
acceleration features, finally add the feature bit which advertises
them and document it all.

For SCHEDOP_poll we need to wake a polling vCPU when a given port
is triggered, even when it's masked — and we want to implement that
in the kernel, for efficiency. So we want the kernel to know that it
has sole ownership of event channel delivery. Thus, we allow
userspace to make the 'promise' by setting the corresponding feature
bit in its KVM_XEN_HVM_CONFIG call. As we implement SCHEDOP_poll
bypass later, we will do so only if that promise has been made by
userspace.
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-16-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

661a20fa

KVM: x86/xen: Support per-vCPU event channel upcall via local APIC · fde0451b

由 David Woodhouse 提交于 3月 03, 2022

Windows uses a per-vCPU vector, and it's delivered via the local APIC
basically like an MSI (with associated EOI) unlike the traditional
guest-wide vector which is just magically asserted by Xen (and in the
KVM case by kvm_xen_has_interrupt() / kvm_cpu_get_extint()).

Now that the kernel is able to raise event channel events for itself,
being able to do so for Windows guests is also going to be useful.
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-15-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fde0451b

KVM: x86/xen: Kernel acceleration for XENVER_version · 28d1629f

由 David Woodhouse 提交于 3月 03, 2022

Turns out this is a fast path for PV guests because they use it to
trigger the event channel upcall. So letting it bounce all the way up
to userspace is not great.
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-14-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

28d1629f

KVM: x86/xen: handle PV timers oneshot mode · 53639526

由 Joao Martins 提交于 3月 03, 2022

If the guest has offloaded the timer virq, handle the following
hypercalls for programming the timer:

    VCPUOP_set_singleshot_timer
    VCPUOP_stop_singleshot_timer
    set_timer_op(timestamp_ns)

The event channel corresponding to the timer virq is then used to inject
events once timer deadlines are met. For now we back the PV timer with
hrtimer.

[ dwmw2: Add save/restore, 32-bit compat mode, immediate delivery,
         don't check timer in kvm_vcpu_has_event() ]
Signed-off-by: NJoao Martins <joao.m.martins@oracle.com>
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-13-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

53639526

KVM: x86/xen: Add KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID · 942c2490

由 David Woodhouse 提交于 3月 03, 2022

In order to intercept hypercalls such as VCPUOP_set_singleshot_timer, we
need to be aware of the Xen CPU numbering.

This looks a lot like the Hyper-V handling of vpidx, for obvious reasons.
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-12-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

942c2490

KVM: x86/xen: handle PV IPI vcpu yield · 0ec6c5c5

由 Joao Martins 提交于 3月 03, 2022

Cooperative Linux guests after an IPI-many may yield vcpu if
any of the IPI'd vcpus were preempted (i.e. runstate is 'runnable'.)
Support SCHEDOP_yield for handling yield.
Signed-off-by: NJoao Martins <joao.m.martins@oracle.com>
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-11-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0ec6c5c5

KVM: x86/xen: intercept EVTCHNOP_send from guests · 2fd6df2f

由 Joao Martins 提交于 3月 03, 2022

Userspace registers a sending @port to either deliver to an @eventfd
or directly back to a local event channel port.

After binding events the guest or host may wish to bind those
events to a particular vcpu. This is usually done for unbound
and and interdomain events. Update requests are handled via the
KVM_XEN_EVTCHN_UPDATE flag.

Unregistered ports are handled by the emulator.
Co-developed-by: NAnkur Arora <ankur.a.arora@oracle.com>
Co-developed-By: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NJoao Martins <joao.m.martins@oracle.com>
Signed-off-by: NAnkur Arora <ankur.a.arora@oracle.com>
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-10-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2fd6df2f

KVM: x86/xen: Support direct injection of event channel events · 35025735

由 David Woodhouse 提交于 3月 03, 2022

This adds a KVM_XEN_HVM_EVTCHN_SEND ioctl which allows direct injection
of events given an explicit { vcpu, port, priority } in precisely the
same form that those fields are given in the IRQ routing table.

Userspace is currently able to inject 2-level events purely by setting
the bits in the shared_info and vcpu_info, but FIFO event channels are
harder to deal with; we will need the kernel to take sole ownership of
delivery when we support those.

A patch advertising this feature with a new bit in the KVM_CAP_XEN_HVM
ioctl will be added in a subsequent patch.
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-9-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

35025735

KVM: x86/xen: Make kvm_xen_set_evtchn() reusable from other places · 8733068b

由 David Woodhouse 提交于 3月 03, 2022

Clean it up to return -errno on error consistently, while still being
compatible with the return conventions for kvm_arch_set_irq_inatomic()
and the kvm_set_irq() callback.

We use -ENOTCONN to indicate when the port is masked. No existing users
care, except that it's negative.

Also allow it to optimise the vCPU lookup. Unless we abuse the lapic
map, there is no quick lookup from APIC ID to a vCPU; the logic in
kvm_get_vcpu_by_id() will just iterate over all vCPUs till it finds
the one it wants. So do that just once and stash the result in the
struct kvm_xen_evtchn for next time.
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-8-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8733068b

KVM: x86/xen: Use gfn_to_pfn_cache for vcpu_time_info · 69d413cf

由 David Woodhouse 提交于 3月 03, 2022

This switches the final pvclock to kvm_setup_pvclock_pfncache() and now
the old kvm_setup_pvclock_page() can be removed.
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-7-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

69d413cf

KVM: x86/xen: Use gfn_to_pfn_cache for vcpu_info · 7caf9571

由 David Woodhouse 提交于 3月 03, 2022

Currently, the fast path of kvm_xen_set_evtchn_fast() doesn't set the
index bits in the target vCPU's evtchn_pending_sel, because it only has
a userspace virtual address with which to do so. It just sets them in
the kernel, and kvm_xen_has_interrupt() then completes the delivery to
the actual vcpu_info structure when the vCPU runs.

Using a gfn_to_pfn_cache allows kvm_xen_set_evtchn_fast() to do the full
delivery in the common case.

Clean up the fallback case too, by moving the deferred delivery out into
a separate kvm_xen_inject_pending_events() function which isn't ever
called in atomic contexts as __kvm_xen_has_interrupt() is.
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-6-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7caf9571

KVM: x86/xen: Use gfn_to_pfn_cache for runstate area · a795cd43

由 David Woodhouse 提交于 3月 03, 2022

Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-4-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a795cd43

KVM: Remove dirty handling from gfn_to_pfn_cache completely · cf1d88b3

由 David Woodhouse 提交于 3月 03, 2022

It isn't OK to cache the dirty status of a page in internal structures
for an indefinite period of time.

Any time a vCPU exits the run loop to userspace might be its last; the
VMM might do its final check of the dirty log, flush the last remaining
dirty pages to the destination and complete a live migration. If we
have internal 'dirty' state which doesn't get flushed until the vCPU
is finally destroyed on the source after migration is complete, then
we have lost data because that will escape the final copy.

This problem already exists with the use of kvm_vcpu_unmap() to mark
pages dirty in e.g. VMX nesting.

Note that the actual Linux MM already considers the page to be dirty
since we have a writeable mapping of it. This is just about the KVM
dirty logging.

For the nesting-style use cases (KVM_GUEST_USES_PFN) we will need to
track which gfn_to_pfn_caches have been used and explicitly mark the
corresponding pages dirty before returning to userspace. But we would
have needed external tracking of that anyway, rather than walking the
full list of GPCs to find those belonging to this vCPU which are dirty.

So let's rely *solely* on that external tracking, and keep it simple
rather than laying a tempting trap for callers to fall into.
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-3-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

cf1d88b3

KVM: Use enum to track if cached PFN will be used in guest and/or host · d0d96121

由 Sean Christopherson 提交于 3月 03, 2022

Replace the guest_uses_pa and kernel_map booleans in the PFN cache code
with a unified enum/bitmask. Using explicit names makes it easier to
review and audit call sites.

Opportunistically add a WARN to prevent passing garbage; instantating a
cache without declaring its usage is either buggy or pointless.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-2-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d0d96121

11 2月, 2022 2 次提交

KVM: xen: Use static_call() for invoking kvm_x86_ops hooks · 0264a351

由 Sean Christopherson 提交于 1月 28, 2022

Use static_call() for invoking kvm_x86_ops function that already have a
defined static call, mostly as a step toward having _all_ calls to
kvm_x86_ops route through a static_call() in order to simplify auditing,
e.g. via grep, that all functions have an entry in kvm-x86-ops.h, but
also because there's no reason not to use a static_call().
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220128005208.4008533-8-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0264a351

KVM: x86/xen: Fix runstate updates to be atomic when preempting vCPU · fcb732d8

由 David Woodhouse 提交于 10月 25, 2021

There are circumstances whem kvm_xen_update_runstate_guest() should not
sleep because it ends up being called from __schedule() when the vCPU
is preempted:

[  222.830825]  kvm_xen_update_runstate_guest+0x24/0x100
[  222.830878]  kvm_arch_vcpu_put+0x14c/0x200
[  222.830920]  kvm_sched_out+0x30/0x40
[  222.830960]  __schedule+0x55c/0x9f0

To handle this, make it use the same trick as __kvm_xen_has_interrupt(),
of using the hva from the gfn_to_hva_cache directly. Then it can use
pagefault_disable() around the accesses and just bail out if the page
is absent (which is unlikely).

I almost switched to using a gfn_to_pfn_cache here and bailing out if
kvm_map_gfn() fails, like kvm_steal_time_set_preempted() does — but on
closer inspection it looks like kvm_map_gfn() will *always* fail in
atomic context for a page in IOMEM, which means it will silently fail
to make the update every single time for such guests, AFAICT. So I
didn't do it that way after all. And will probably fix that one too.

Cc: stable@vger.kernel.org
Fixes: 30b5c851 ("KVM: x86/xen: Add support for vCPU runstate information")
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Message-Id: <b17a93e5ff4561e57b1238e3e7ccd0b613eb827e.camel@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fcb732d8

24 1月, 2022 1 次提交

x86,kvm/xen: Remove superfluous .fixup usage · adb759e5

由 Peter Zijlstra 提交于 1月 23, 2022

Commit 14243b38 ("KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN and
event channel delivery") adds superfluous .fixup usage after the whole
.fixup section was removed in commit e5eefda5 ("x86: Remove .fixup
section").

Fixes: 14243b38 ("KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery")
Reported-by: NBorislav Petkov <bp@alien8.de>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220123124219.GH20638@worktop.programming.kicks-ass.net>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

adb759e5

07 1月, 2022 3 次提交

KVM: x86: Fix wall clock writes in Xen shared_info not to mark page dirty · 55749769

由 David Woodhouse 提交于 12月 10, 2021

When dirty ring logging is enabled, any dirty logging without an active
vCPU context will cause a kernel oops. But we've already declared that
the shared_info page doesn't get dirty tracking anyway, since it would
be kind of insane to mark it dirty every time we deliver an event channel
interrupt. Userspace is supposed to just assume it's always dirty any
time a vCPU can run or event channels are routed.

So stop using the generic kvm_write_wall_clock() and just write directly
through the gfn_to_pfn_cache that we already have set up.

We can make kvm_write_wall_clock() static in x86.c again now, but let's
not remove the 'sec_hi_ofs' argument even though it's not used yet. At
some point we *will* want to use that for KVM guests too.

Fixes: 629b5348 ("KVM: x86/xen: update wallclock region")
Reported-by: Nbutt3rflyh4ck <butterflyhuangxx@gmail.com>
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211210163625.2886-6-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

55749769

KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery · 14243b38

由 David Woodhouse 提交于 12月 10, 2021

This adds basic support for delivering 2 level event channels to a guest.

Initially, it only supports delivery via the IRQ routing table, triggered
by an eventfd. In order to do so, it has a kvm_xen_set_evtchn_fast()
function which will use the pre-mapped shared_info page if it already
exists and is still valid, while the slow path through the irqfd_inject
workqueue will remap the shared_info page if necessary.

It sets the bits in the shared_info page but not the vcpu_info; that is
deferred to __kvm_xen_has_interrupt() which raises the vector to the
appropriate vCPU.

Add a 'verbose' mode to xen_shinfo_test while adding test cases for this.
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211210163625.2886-5-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

14243b38

KVM: x86/xen: Maintain valid mapping of Xen shared_info page · 1cfc9c4b

由 David Woodhouse 提交于 12月 10, 2021

Use the newly reinstated gfn_to_pfn_cache to maintain a kernel mapping
of the Xen shared_info page so that it can be accessed in atomic context.

Note that we do not participate in dirty tracking for the shared info
page and we do not explicitly mark it dirty every single tim we deliver
an event channel interrupts. We wouldn't want to do that even if we *did*
have a valid vCPU context with which to do so.
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211210163625.2886-4-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1cfc9c4b

18 11月, 2021 3 次提交

KVM: x86: Assume a 64-bit hypercall for guests with protected state · b5aead00

由 Tom Lendacky 提交于 5月 24, 2021

When processing a hypercall for a guest with protected state, currently
SEV-ES guests, the guest CS segment register can't be checked to
determine if the guest is in 64-bit mode. For an SEV-ES guest, it is
expected that communication between the guest and the hypervisor is
performed to shared memory using the GHCB. In order to use the GHCB, the
guest must have been in long mode, otherwise writes by the guest to the
GHCB would be encrypted and not be able to be comprehended by the
hypervisor.

Create a new helper function, is_64_bit_hypercall(), that assumes the
guest is in 64-bit mode when the guest has protected state, and returns
true, otherwise invoking is_64_bit_mode() to determine the mode. Update
the hypercall related routines to use is_64_bit_hypercall() instead of
is_64_bit_mode().

Add a WARN_ON_ONCE() to is_64_bit_mode() to catch occurences of calls to
this helper function for a guest running with protected state.

Fixes: f1c6366e ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
Reported-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
Message-Id: <e0b20c770c9d0d1403f23d83e785385104211f74.1621878537.git.thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b5aead00

KVM: x86/xen: Use sizeof_field() instead of open-coding it · 6a834754

由 David Woodhouse 提交于 11月 15, 2021

Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211115165030.7422-4-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6a834754

KVM: x86/xen: Fix get_attr of KVM_XEN_ATTR_TYPE_SHARED_INFO · 4e843647

由 David Woodhouse 提交于 11月 15, 2021

In commit 319afe68 ("KVM: xen: do not use struct gfn_to_hva_cache") we
stopped storing this in-kernel as a GPA, and started storing it as a GFN.
Which means we probably should have stopped calling gpa_to_gfn() on it
when userspace asks for it back.

Cc: stable@vger.kernel.org
Fixes: 319afe68 ("KVM: xen: do not use struct gfn_to_hva_cache")
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211115165030.7422-2-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4e843647

25 10月, 2021 1 次提交

KVM: x86/xen: Fix kvm_xen_has_interrupt() sleeping in kvm_vcpu_block() · 0985dba8

由 David Woodhouse 提交于 10月 23, 2021

In kvm_vcpu_block, the current task is set to TASK_INTERRUPTIBLE before
making a final check whether the vCPU should be woken from HLT by any
incoming interrupt.

This is a problem for the get_user() in __kvm_xen_has_interrupt(), which
really shouldn't be sleeping when the task state has already been set.
I think it's actually harmless as it would just manifest itself as a
spurious wakeup, but it's causing a debug warning:

[  230.963649] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000b6bcdbc9>] prepare_to_swait_exclusive+0x30/0x80

Fix the warning by turning it into an *explicit* spurious wakeup. When
invoked with !task_is_running(current) (and we might as well add
in_atomic() there while we're at it), just return 1 to indicate that
an IRQ is pending, which will cause a wakeup and then something will
call it again in a context that *can* sleep so it can fault the page
back in.

Cc: stable@vger.kernel.org
Fixes: 40da8ccd ("KVM: x86/xen: Add event channel interrupt vector upcall")
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>

Message-Id: <168bf8c689561da904e48e2ff5ae4713eaef9e2d.camel@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0985dba8

05 8月, 2021 1 次提交

KVM: xen: do not use struct gfn_to_hva_cache · 319afe68

由 Paolo Bonzini 提交于 8月 04, 2021

gfn_to_hva_cache is not thread-safe, so it is usually used only within
a vCPU (whose code is protected by vcpu->mutex).  The Xen interface
implementation has such a cache in kvm->arch, but it is not really
used except to store the location of the shared info page.  Replace
shinfo_set and shinfo_cache with just the value that is passed via
KVM_XEN_ATTR_TYPE_SHARED_INFO; the only complication is that the
initialization value is not zero anymore and therefore kvm_xen_init_vm
needs to be introduced.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

319afe68

26 4月, 2021 1 次提交

KVM: x86: Rename GPR accessors to make mode-aware variants the defaults · 27b4a9c4

由 Sean Christopherson 提交于 4月 21, 2021

Append raw to the direct variants of kvm_register_read/write(), and
drop the "l" from the mode-aware variants.  I.e. make the mode-aware
variants the default, and make the direct variants scary sounding so as
to discourage use.  Accessing the full 64-bit values irrespective of
mode is rarely the desired behavior.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210422022128.3464144-10-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

27b4a9c4

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功