提交 · 2f9f5cddb29b4fbdf2d328c7a6326d53227e6329 · openeuler / Kernel

21 1月, 2020 2 次提交

KVM: Fix some grammar mistakes · 00116795

由 Miaohe Lin 提交于 12月 11, 2019

Fix some grammar mistakes in the comments.
Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

00116795

KVM: Fix some wrong function names in comment · 668effb6

由 Miaohe Lin 提交于 12月 11, 2019

Fix some wrong function names in comment. mmu_check_roots is a typo for
mmu_check_root, vmcs_read_any should be vmcs12_read_any and so on.
Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

668effb6

09 1月, 2020 2 次提交

KVM: x86: Use gpa_t for cr2/gpa to fix TDP support on 32-bit KVM · 736c291c

由 Sean Christopherson 提交于 12月 06, 2019

Convert a plethora of parameters and variables in the MMU and page fault
flows from type gva_t to gpa_t to properly handle TDP on 32-bit KVM.

Thanks to PSE and PAE paging, 32-bit kernels can access 64-bit physical
addresses.  When TDP is enabled, the fault address is a guest physical
address and thus can be a 64-bit value, even when both KVM and its guest
are using 32-bit virtual addressing, e.g. VMX's VMCS.GUEST_PHYSICAL is a
64-bit field, not a natural width field.

Using a gva_t for the fault address means KVM will incorrectly drop the
upper 32-bits of the GPA.  Ditto for gva_to_gpa() when it is used to
translate L2 GPAs to L1 GPAs.

Opportunistically rename variables and parameters to better reflect the
dual address modes, e.g. use "cr2_or_gpa" for fault addresses and plain
"addr" instead of "vaddr" when the address may be either a GVA or an L2
GPA.  Similarly, use "gpa" in the nonpaging_page_fault() flows to avoid
a confusing "gpa_t gva" declaration; this also sets the stage for a
future patch to combing nonpaging_page_fault() and tdp_page_fault() with
minimal churn.

Sprinkle in a few comments to document flows where an address is known
to be a GVA and thus can be safely truncated to a 32-bit value.  Add
WARNs in kvm_handle_page_fault() and FNAME(gva_to_gpa_nested)() to help
document such cases and detect bugs.

Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

736c291c

KVM: get rid of var page in kvm_set_pfn_dirty() · d29c03a5

由 Miaohe Lin 提交于 12月 05, 2019

We can get rid of unnecessary var page in
kvm_set_pfn_dirty() , thus make code style
similar with kvm_set_pfn_accessed().
Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d29c03a5

13 12月, 2019 1 次提交

KVM: arm/arm64: Properly handle faulting of device mappings · 6d674e28

由 Marc Zyngier 提交于 12月 11, 2019

A device mapping is normally always mapped at Stage-2, since there
is very little gain in having it faulted in.

Nonetheless, it is possible to end-up in a situation where the device
mapping has been removed from Stage-2 (userspace munmaped the VFIO
region, and the MMU notifier did its job), but present in a userspace
mapping (userpace has mapped it back at the same address). In such
a situation, the device mapping will be demand-paged as the guest
performs memory accesses.

This requires to be careful when dealing with mapping size, cache
management, and to handle potential execution of a device mapping.
Reported-by: NAlexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Tested-by: NAlexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: NJames Morse <james.morse@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20191211165651.7889-2-maz@kernel.org

6d674e28

07 12月, 2019 1 次提交

KVM: arm/arm64: Remove excessive permission check in kvm_arch_prepare_memory_region · 97418e96

由 Jia He 提交于 12月 06, 2019

In kvm_arch_prepare_memory_region, arm kvm regards the memory region as
writable if the flag has no KVM_MEM_READONLY, and the vm is readonly if
!VM_WRITE.

But there is common usage for setting kvm memory region as follows:
e.g. qemu side (see the PROT_NONE flag)
1. mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
   memory_region_init_ram_ptr()
2. re mmap the above area with read/write authority.

Such example is used in virtio-fs qemu codes which hasn't been upstreamed
[1]. But seems we can't forbid this example.

Without this patch, it will cause an EPERM during kvm_set_memory_region()
and cause qemu boot crash.

As told by Ard, "the underlying assumption is incorrect, i.e., that the
value of vm_flags at this point in time defines how the VMA is used
during its lifetime. There may be other cases where a VMA is created
with VM_READ vm_flags that are changed to VM_READ|VM_WRITE later, and
we are currently rejecting this use case as well."

[1] https://gitlab.com/virtio-fs/qemu/blob/5a356e/hw/virtio/vhost-user-fs.c#L488Suggested-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NJia He <justin.he@arm.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Reviewed-by: NChristoffer Dall <christoffer.dall@arm.com>
Link: https://lore.kernel.org/r/20191206020802.196108-1-justin.he@arm.com

97418e96

06 12月, 2019 3 次提交

KVM: arm/arm64: vgic: Use wrapper function to lock/unlock all vcpus in kvm_vgic_create() · 72a610f3

由 Miaohe Lin 提交于 11月 30, 2019

Use wrapper function lock_all_vcpus()/unlock_all_vcpus()
in kvm_vgic_create() to remove duplicated code dealing
with locking and unlocking all vcpus in a vm.
Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Reviewed-by: NEric Auger <eric.auger@redhat.com>
Reviewed-by: NSteven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/1575081918-11401-1-git-send-email-linmiaohe@huawei.com

72a610f3

KVM: arm/arm64: vgic: Fix potential double free dist->spis in __kvm_vgic_destroy() · 0bda9498

由 Miaohe Lin 提交于 11月 28, 2019

In kvm_vgic_dist_init() called from kvm_vgic_map_resources(), if
dist->vgic_model is invalid, dist->spis will be freed without set
dist->spis = NULL. And in vgicv2 resources clean up path,
__kvm_vgic_destroy() will be called to free allocated resources.
And dist->spis will be freed again in clean up chain because we
forget to set dist->spis = NULL in kvm_vgic_dist_init() failed
path. So double free would happen.
Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Reviewed-by: NEric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/1574923128-19956-1-git-send-email-linmiaohe@huawei.com

0bda9498

KVM: arm/arm64: Get rid of unused arg in cpu_init_hyp_mode() · 7e0befd5

由 Miaohe Lin 提交于 11月 21, 2019

As arg dummy is not really needed, there's no need to pass
NULL when calling cpu_init_hyp_mode(). So clean it up.

Fixes: 67f69197 ("arm64: kvm: allows kvm cpu hotplug")
Reviewed-by: NSteven Price <steven.price@arm.com>
Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/1574320559-5662-1-git-send-email-linmiaohe@huawei.com

7e0befd5

23 11月, 2019 1 次提交

KVM: Fix jump label out_free_* in kvm_init() · faf0be22

由 Miaohe Lin 提交于 11月 23, 2019

The jump label out_free_1 and out_free_2 deal with
the same stuff, so git rid of one and rename the
label out_free_0a to retain the label name order.
Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

faf0be22

15 11月, 2019 3 次提交

KVM: remember position in kvm->vcpus array · 8750e72a

由 Radim Krčmář 提交于 11月 07, 2019

Fetching an index for any vcpu in kvm->vcpus array by traversing
the entire array everytime is costly.
This patch remembers the position of each vcpu in kvm->vcpus array
by storing it in vcpus_idx under kvm_vcpu structure.
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NNitesh Narayan Lal <nitesh@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8750e72a

KVM: MMIO: get rid of odd out_err label in kvm_coalesced_mmio_init · b139b5a2

由 Miaohe Lin 提交于 11月 09, 2019

The out_err label and var ret is unnecessary, clean them up.
Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b139b5a2

KVM: Add a comment describing the /dev/kvm no_compat handling · 9cb09e7c

由 Marc Zyngier 提交于 11月 14, 2019

Add a comment explaining the rational behind having both
no_compat open and ioctl callbacks to fend off compat tasks.
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9cb09e7c

14 11月, 2019 1 次提交

KVM: Forbid /dev/kvm being opened by a compat task when CONFIG_KVM_COMPAT=n · b9876e6d

由 Marc Zyngier 提交于 11月 13, 2019

On a system without KVM_COMPAT, we prevent IOCTLs from being issued
by a compat task. Although this prevents most silly things from
happening, it can still confuse a 32bit userspace that is able
to open the kvm device (the qemu test suite seems to be pretty
mad with this behaviour).

Take a more radical approach and return a -ENODEV to the compat
task.
Reported-by: NPeter Maydell <peter.maydell@linaro.org>
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b9876e6d

12 11月, 2019 1 次提交

KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved · a78986aa

由 Sean Christopherson 提交于 11月 11, 2019

Explicitly exempt ZONE_DEVICE pages from kvm_is_reserved_pfn() and
instead manually handle ZONE_DEVICE on a case-by-case basis. For things
like page refcounts, KVM needs to treat ZONE_DEVICE pages like normal
pages, e.g. put pages grabbed via gup(). But for flows such as setting
A/D bits or shifting refcounts for transparent huge pages, KVM needs to
to avoid processing ZONE_DEVICE pages as the flows in question lack the
underlying machinery for proper handling of ZONE_DEVICE pages.

This fixes a hang reported by Adam Borowski[*] in dev_pagemap_cleanup()
when running a KVM guest backed with /dev/dax memory, as KVM straight up
doesn't put any references to ZONE_DEVICE pages acquired by gup().

Note, Dan Williams proposed an alternative solution of doing put_page()
on ZONE_DEVICE pages immediately after gup() in order to simplify the
auditing needed to ensure is_zone_device_page() is called if and only if
the backing device is pinned (via gup()). But that approach would break
kvm_vcpu_{un}map() as KVM requires the page to be pinned from map() 'til
unmap() when accessing guest memory, unlike KVM's secondary MMU, which
coordinates with mmu_notifier invalidations to avoid creating stale
page references, i.e. doesn't rely on pages being pinned.

[*] http://lkml.kernel.org/r/20190919115547.GA17963@angband.plReported-by: NAdam Borowski <kilobyte@angband.pl>
Analyzed-by: NDavid Hildenbrand <david@redhat.com>
Acked-by: NDan Williams <dan.j.williams@intel.com>
Cc: stable@vger.kernel.org
Fixes: 3565fce3 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a78986aa

11 11月, 2019 2 次提交

KVM: fix placement of refcount initialization · e2d3fcaf

由 Paolo Bonzini 提交于 11月 04, 2019

Reported by syzkaller:

   =============================
   WARNING: suspicious RCU usage
   -----------------------------
   ./include/linux/kvm_host.h:536 suspicious rcu_dereference_check() usage!

   other info that might help us debug this:

   rcu_scheduler_active = 2, debug_locks = 1
   no locks held by repro_11/12688.

   stack backtrace:
   Call Trace:
    dump_stack+0x7d/0xc5
    lockdep_rcu_suspicious+0x123/0x170
    kvm_dev_ioctl+0x9a9/0x1260 [kvm]
    do_vfs_ioctl+0x1a1/0xfb0
    ksys_ioctl+0x6d/0x80
    __x64_sys_ioctl+0x73/0xb0
    do_syscall_64+0x108/0xaa0
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

Commit a97b0e77 (kvm: call kvm_arch_destroy_vm if vm creation fails)
sets users_count to 1 before kvm_arch_init_vm(), however, if kvm_arch_init_vm()
fails, we need to decrease this count.  By moving it earlier, we can push
the decrease to out_err_no_arch_destroy_vm without introducing yet another
error label.

syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=15209b84e00000

Reported-by: syzbot+75475908cd0910f141ee@syzkaller.appspotmail.com
Fixes: a97b0e77 ("kvm: call kvm_arch_destroy_vm if vm creation fails")
Cc: Jim Mattson <jmattson@google.com>
Analyzed-by: NWanpeng Li <wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e2d3fcaf

KVM: Fix NULL-ptr deref after kvm_create_vm fails · 8a44119a

由 Paolo Bonzini 提交于 11月 04, 2019

Reported by syzkaller:

    kasan: CONFIG_KASAN_INLINE enabled
    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] PREEMPT SMP KASAN
    CPU: 0 PID: 14727 Comm: syz-executor.3 Not tainted 5.4.0-rc4+ #0
    RIP: 0010:kvm_coalesced_mmio_init+0x5d/0x110 arch/x86/kvm/../../../virt/kvm/coalesced_mmio.c:121
    Call Trace:
     kvm_dev_ioctl_create_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:3446 [inline]
     kvm_dev_ioctl+0x781/0x1490 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3494
     vfs_ioctl fs/ioctl.c:46 [inline]
     file_ioctl fs/ioctl.c:509 [inline]
     do_vfs_ioctl+0x196/0x1150 fs/ioctl.c:696
     ksys_ioctl+0x62/0x90 fs/ioctl.c:713
     __do_sys_ioctl fs/ioctl.c:720 [inline]
     __se_sys_ioctl fs/ioctl.c:718 [inline]
     __x64_sys_ioctl+0x6e/0xb0 fs/ioctl.c:718
     do_syscall_64+0xca/0x5d0 arch/x86/entry/common.c:290
     entry_SYSCALL_64_after_hwframe+0x49/0xbe

Commit 9121923c ("kvm: Allocate memslots and buses before calling kvm_arch_init_vm")
moves memslots and buses allocations around, however, if kvm->srcu/irq_srcu fails
initialization, NULL will be returned instead of error code, NULL will not be intercepted
in kvm_dev_ioctl_create_vm() and be dereferenced by kvm_coalesced_mmio_init(), this patch
fixes it.

Moving the initialization is required anyway to avoid an incorrect synchronize_srcu that
was also reported by syzkaller:

 wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136
 __synchronize_srcu+0x197/0x250 kernel/rcu/srcutree.c:921
 synchronize_srcu_expedited kernel/rcu/srcutree.c:946 [inline]
 synchronize_srcu+0x239/0x3e8 kernel/rcu/srcutree.c:997
 kvm_page_track_unregister_notifier+0xe7/0x130 arch/x86/kvm/page_track.c:212
 kvm_mmu_uninit_vm+0x1e/0x30 arch/x86/kvm/mmu.c:5828
 kvm_arch_destroy_vm+0x4a2/0x5f0 arch/x86/kvm/x86.c:9579
 kvm_create_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:702 [inline]

so do it.

Reported-by: syzbot+89a8060879fa0bd2db4f@syzkaller.appspotmail.com
Reported-by: syzbot+e27e7027eb2b80e44225@syzkaller.appspotmail.com
Fixes: 9121923c ("kvm: Allocate memslots and buses before calling kvm_arch_init_vm")
Cc: Jim Mattson <jmattson@google.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8a44119a

08 11月, 2019 3 次提交

KVM: arm64: Opportunistically turn off WFI trapping when using direct LPI injection · ef2e78dd

由 Marc Zyngier 提交于 11月 07, 2019

Just like we do for WFE trapping, it can be useful to turn off
WFI trapping when the physical CPU is not oversubscribed (that
is, the vcpu is the only runnable process on this CPU) *and*
that we're using direct injection of interrupts.

The conditions are reevaluated on each vcpu_load(), ensuring that
we don't switch to this mode on a busy system.

On a GICv4 system, this has the effect of reducing the generation
of doorbell interrupts to zero when the right conditions are
met, which is a huge improvement over the current situation
(where the doorbells are screaming if the CPU ever hits a
blocking WFI).
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Reviewed-by: NZenghui Yu <yuzenghui@huawei.com>
Reviewed-by: NChristoffer Dall <christoffer.dall@arm.com>
Link: https://lore.kernel.org/r/20191107160412.30301-3-maz@kernel.org

ef2e78dd

KVM: vgic-v4: Track the number of VLPIs per vcpu · 5bd90b09

由 Marc Zyngier 提交于 11月 07, 2019

In order to find out whether a vcpu is likely to be the target of
VLPIs (and to further optimize the way we deal with those), let's
track the number of VLPIs a vcpu can receive.

This gets implemented with an atomic variable that gets incremented
or decremented on map, unmap and move of a VLPI.
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Reviewed-by: NZenghui Yu <yuzenghui@huawei.com>
Reviewed-by: NChristoffer Dall <christoffer.dall@arm.com>
Link: https://lore.kernel.org/r/20191107160412.30301-2-maz@kernel.org

5bd90b09

KVM: arm/arm64: Let the timer expire in hardirq context on RT · 9090825f

由 Thomas Gleixner 提交于 11月 07, 2019

The timers are canceled from an preempt-notifier which is invoked with
disabled preemption which is not allowed on PREEMPT_RT.
The timer callback is short so in could be invoked in hard-IRQ context
on -RT.

Let the timer expire on hard-IRQ context even on -RT.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Tested-by: NJulien Grall <julien.grall@arm.com>
Acked-by: NMarc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20191107095424.16647-1-bigeasy@linutronix.de

9090825f

05 11月, 2019 1 次提交

kvm: x86: mmu: Recovery of shattered NX large pages · 1aa9b957

由 Junaid Shahid 提交于 11月 04, 2019

The page table pages corresponding to broken down large pages are zapped in
FIFO order, so that the large page can potentially be recovered, if it is
not longer being used for execution.  This removes the performance penalty
for walking deeper EPT page tables.

By default, one large page will last about one hour once the guest
reaches a steady state.
Signed-off-by: NJunaid Shahid <junaids@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

1aa9b957

04 11月, 2019 1 次提交

kvm: Add helper function for creating VM worker threads · c57c8046

由 Junaid Shahid 提交于 11月 04, 2019

Add a function to create a kernel thread associated with a given VM. In
particular, it ensures that the worker thread inherits the priority and
cgroups of the calling thread.
Signed-off-by: NJunaid Shahid <junaids@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

c57c8046

31 10月, 2019 1 次提交

kvm: call kvm_arch_destroy_vm if vm creation fails · a97b0e77

由 Jim Mattson 提交于 10月 25, 2019

In kvm_create_vm(), if we've successfully called kvm_arch_init_vm(), but
then fail later in the function, we need to call kvm_arch_destroy_vm()
so that it can do any necessary cleanup (like freeing memory).

Fixes: 44a95dae ("KVM: x86: Detect and Initialize AVIC support")
Signed-off-by: NJohn Sperbeck <jsperbeck@google.com>
Signed-off-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NJunaid Shahid <junaids@google.com>
[Remove dependency on "kvm: Don't clear reference count on
 kvm_create_vm() error path" which was not committed. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a97b0e77

29 10月, 2019 3 次提交

KVM: arm/arm64: vgic: Don't rely on the wrong pending table · ca185b26

由 Zenghui Yu 提交于 10月 29, 2019

It's possible that two LPIs locate in the same "byte_offset" but target
two different vcpus, where their pending status are indicated by two
different pending tables.  In such a scenario, using last_byte_offset
optimization will lead KVM relying on the wrong pending table entry.
Let us use last_ptr instead, which can be treated as a byte index into
a pending table and also, can be vcpu specific.

Fixes: 28077125 ("KVM: arm64: vgic-v3: KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES")
Cc: stable@vger.kernel.org
Signed-off-by: NZenghui Yu <yuzenghui@huawei.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Acked-by: NEric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20191029071919.177-4-yuzenghui@huawei.com

ca185b26

KVM: arm/arm64: vgic: Fix some comments typo · bad36e4e

由 Zenghui Yu 提交于 10月 29, 2019

Fix various comments, including wrong function names, grammar mistakes
and specification references.
Signed-off-by: NZenghui Yu <yuzenghui@huawei.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20191029071919.177-3-yuzenghui@huawei.com

bad36e4e

KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/put · 8e01d9a3

由 Marc Zyngier 提交于 10月 27, 2019

When the VHE code was reworked, a lot of the vgic stuff was moved around,
but the GICv4 residency code did stay untouched, meaning that we come
in and out of residency on each flush/sync, which is obviously suboptimal.

To address this, let's move things around a bit:

- Residency entry (flush) moves to vcpu_load
- Residency exit (sync) moves to vcpu_put
- On blocking (entry to WFI), we "put"
- On unblocking (exit from WFI), we "load"

Because these can nest (load/block/put/load/unblock/put, for example),
we now have per-VPE tracking of the residency state.

Additionally, vgic_v4_put gains a "need doorbell" parameter, which only
gets set to true when blocking because of a WFI. This allows a finer
control of the doorbell, which now also gets disabled as soon as
it gets signaled.
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20191027144234.8395-2-maz@kernel.org

8e01d9a3

25 10月, 2019 1 次提交

kvm: Allocate memslots and buses before calling kvm_arch_init_vm · 9121923c

由 Jim Mattson 提交于 10月 24, 2019

This reorganization will allow us to call kvm_arch_destroy_vm in the
event that kvm_create_vm fails after calling kvm_arch_init_vm.
Suggested-by: NJunaid Shahid <junaids@google.com>
Signed-off-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NJunaid Shahid <junaids@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9121923c

22 10月, 2019 9 次提交

KVM: Add separate helper for putting borrowed reference to kvm · 149487bd

由 Sean Christopherson 提交于 10月 21, 2019

Add a new helper, kvm_put_kvm_no_destroy(), to handle putting a borrowed
reference[*] to the VM when installing a new file descriptor fails.  KVM
expects the refcount to remain valid in this case, as the in-progress
ioctl() has an explicit reference to the VM.  The primary motiviation
for the helper is to document that the 'kvm' pointer is still valid
after putting the borrowed reference, e.g. to document that doing
mutex(&kvm->lock) immediately after putting a ref to kvm isn't broken.

[*] When exposing a new object to userspace via a file descriptor, e.g.
    a new vcpu, KVM grabs a reference to itself (the VM) prior to making
    the object visible to userspace to avoid prematurely freeing the VM
    in the scenario where userspace immediately closes file descriptor.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

149487bd

KVM: Don't shrink/grow vCPU halt_poll_ns if host side polling is disabled · 44551b2f

由 Wanpeng Li 提交于 9月 29, 2019

Don't waste cycles to shrink/grow vCPU halt_poll_ns if host
side polling is disabled.
Acked-by: NMarcelo Tosatti <mtosatti@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

44551b2f

KVM: arm64: Provide VCPU attributes for stolen time · 58772e9a

由 Steven Price 提交于 10月 21, 2019

Allow user space to inform the KVM host where in the physical memory
map the paravirtualized time structures should be located.

User space can set an attribute on the VCPU providing the IPA base
address of the stolen time structure for that VCPU. This must be
repeated for every VCPU in the VM.

The address is given in terms of the physical address visible to
the guest and must be 64 byte aligned. The guest will discover the
address via a hypercall.
Signed-off-by: NSteven Price <steven.price@arm.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

58772e9a

KVM: Allow kvm_device_ops to be const · 8538cb22

由 Steven Price 提交于 10月 21, 2019

Currently a kvm_device_ops structure cannot be const without triggering
compiler warnings. However the structure doesn't need to be written to
and, by marking it const, it can be read-only in memory. Add some more
const keywords to allow this.
Reviewed-by: NAndrew Jones <drjones@redhat.com>
Signed-off-by: NSteven Price <steven.price@arm.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

8538cb22

KVM: arm64: Support stolen time reporting via shared structure · 8564d637

由 Steven Price 提交于 10月 21, 2019

Implement the service call for configuring a shared structure between a
VCPU and the hypervisor in which the hypervisor can write the time
stolen from the VCPU's execution time by other tasks on the host.

User space allocates memory which is placed at an IPA also chosen by user
space. The hypervisor then updates the shared structure using
kvm_put_guest() to ensure single copy atomicity of the 64-bit value
reporting the stolen time in nanoseconds.

Whenever stolen time is enabled by the guest, the stolen time counter is
reset.

The stolen time itself is retrieved from the sched_info structure
maintained by the Linux scheduler code. We enable SCHEDSTATS when
selecting KVM Kconfig to ensure this value is meaningful.
Signed-off-by: NSteven Price <steven.price@arm.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

8564d637

KVM: arm64: Implement PV_TIME_FEATURES call · b48c1a45

由 Steven Price 提交于 10月 21, 2019

This provides a mechanism for querying which paravirtualized time
features are available in this hypervisor.

Also add the header file which defines the ABI for the paravirtualized
time features we're about to add.
Signed-off-by: NSteven Price <steven.price@arm.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

b48c1a45

KVM: arm/arm64: Factor out hypercall handling from PSCI code · 55009c6e

由 Christoffer Dall 提交于 10月 21, 2019

We currently intertwine the KVM PSCI implementation with the general
dispatch of hypercall handling, which makes perfect sense because PSCI
is the only category of hypercalls we support.

However, as we are about to support additional hypercalls, factor out
this functionality into a separate hypercall handler file.
Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
[steven.price@arm.com: rebased]
Reviewed-by: NAndrew Jones <drjones@redhat.com>
Signed-off-by: NSteven Price <steven.price@arm.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

55009c6e

KVM: arm/arm64: Allow user injection of external data aborts · da345174

由 Christoffer Dall 提交于 10月 11, 2019

In some scenarios, such as buggy guest or incorrect configuration of the
VMM and firmware description data, userspace will detect a memory access
to a portion of the IPA, which is not mapped to any MMIO region.

For this purpose, the appropriate action is to inject an external abort
to the guest.  The kernel already has functionality to inject an
external abort, but we need to wire up a signal from user space that
lets user space tell the kernel to do this.

It turns out, we already have the set event functionality which we can
perfectly reuse for this.
Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

da345174

KVM: arm/arm64: Allow reporting non-ISV data aborts to userspace · c726200d

由 Christoffer Dall 提交于 10月 11, 2019

For a long time, if a guest accessed memory outside of a memslot using
any of the load/store instructions in the architecture which doesn't
supply decoding information in the ESR_EL2 (the ISV bit is not set), the
kernel would print the following message and terminate the VM as a
result of returning -ENOSYS to userspace:

  load/store instruction decoding not implemented

The reason behind this message is that KVM assumes that all accesses
outside a memslot is an MMIO access which should be handled by
userspace, and we originally expected to eventually implement some sort
of decoding of load/store instructions where the ISV bit was not set.

However, it turns out that many of the instructions which don't provide
decoding information on abort are not safe to use for MMIO accesses, and
the remaining few that would potentially make sense to use on MMIO
accesses, such as those with register writeback, are not used in
practice.  It also turns out that fetching an instruction from guest
memory can be a pretty horrible affair, involving stopping all CPUs on
SMP systems, handling multiple corner cases of address translation in
software, and more.  It doesn't appear likely that we'll ever implement
this in the kernel.

What is much more common is that a user has misconfigured his/her guest
and is actually not accessing an MMIO region, but just hitting some
random hole in the IPA space.  In this scenario, the error message above
is almost misleading and has led to a great deal of confusion over the
years.

It is, nevertheless, ABI to userspace, and we therefore need to
introduce a new capability that userspace explicitly enables to change
behavior.

This patch introduces KVM_CAP_ARM_NISV_TO_USER (NISV meaning Non-ISV)
which does exactly that, and introduces a new exit reason to report the
event to userspace.  User space can then emulate an exception to the
guest, restart the guest, suspend the guest, or take any other
appropriate action as per the policy of the running system.
Reported-by: NHeinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
Reviewed-by: NAlexander Graf <graf@amazon.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

c726200d

20 10月, 2019 3 次提交

KVM: arm64: pmu: Reset sample period on overflow handling · 8c3252c0

由 Marc Zyngier 提交于 10月 06, 2019

The PMU emulation code uses the perf event sample period to trigger
the overflow detection. This works fine  for the *first* overflow
handling, but results in a huge number of interrupts on the host,
unrelated to the number of interrupts handled in the guest (a x20
factor is pretty common for the cycle counter). On a slow system
(such as a SW model), this can result in the guest only making
forward progress at a glacial pace.

It turns out that the clue is in the name. The sample period is
exactly that: a period. And once the an overflow has occured,
the following period should be the full width of the associated
counter, instead of whatever the guest had initially programed.

Reset the sample period to the architected value in the overflow
handler, which now results in a number of host interrupts that is
much closer to the number of interrupts in the guest.

Fixes: b02386eb ("arm64: KVM: Add PMU overflow interrupt routing")
Reviewed-by: NAndrew Murray <andrew.murray@arm.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

8c3252c0

KVM: arm64: pmu: Set the CHAINED attribute before creating the in-kernel event · 725ce669

由 Marc Zyngier 提交于 10月 08, 2019

The current convention for KVM to request a chained event from the
host PMU is to set bit[0] in attr.config1 (PERF_ATTR_CFG1_KVM_PMU_CHAINED).

But as it turns out, this bit gets set *after* we create the kernel
event that backs our virtual counter, meaning that we never get
a 64bit counter.

Moving the setting to an earlier point solves the problem.

Fixes: 80f393a2 ("KVM: arm/arm64: Support chained PMU counters")
Reviewed-by: NAndrew Murray <andrew.murray@arm.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

725ce669

KVM: arm64: pmu: Fix cycle counter truncation · f4e23cf9

由 Marc Zyngier 提交于 10月 03, 2019

When a counter is disabled, its value is sampled before the event
is being disabled, and the value written back in the shadow register.

In that process, the value gets truncated to 32bit, which is adequate
for any counter but the cycle counter (defined as a 64bit counter).

This obviously results in a corrupted counter, and things like
"perf record -e cycles" not working at all when run in a guest...
A similar, but less critical bug exists in kvm_pmu_get_counter_value.

Make the truncation conditional on the counter not being the cycle
counter, which results in a minor code reorganisation.

Fixes: 80f393a2 ("KVM: arm/arm64: Support chained PMU counters")
Reviewed-by: NAndrew Murray <andrew.murray@arm.com>
Reported-by: NJulien Thierry <julien.thierry.kdev@gmail.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

f4e23cf9

01 10月, 2019 1 次提交

kvm: x86, powerpc: do not allow clearing largepages debugfs entry · 833b45de

由 Paolo Bonzini 提交于 9月 30, 2019

The largepages debugfs entry is incremented/decremented as shadow
pages are created or destroyed.  Clearing it will result in an
underflow, which is harmless to KVM but ugly (and could be
misinterpreted by tools that use debugfs information), so make
this particular statistic read-only.

Cc: kvm-ppc@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

833b45de

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功