提交 · 5bd90b0989731520f2cdcfbbe467f1271f3cc803 · openeuler / Kernel

08 11月, 2019 2 次提交

KVM: vgic-v4: Track the number of VLPIs per vcpu · 5bd90b09

由 Marc Zyngier 提交于 11月 07, 2019

In order to find out whether a vcpu is likely to be the target of
VLPIs (and to further optimize the way we deal with those), let's
track the number of VLPIs a vcpu can receive.

This gets implemented with an atomic variable that gets incremented
or decremented on map, unmap and move of a VLPI.
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Reviewed-by: NZenghui Yu <yuzenghui@huawei.com>
Reviewed-by: NChristoffer Dall <christoffer.dall@arm.com>
Link: https://lore.kernel.org/r/20191107160412.30301-2-maz@kernel.org

5bd90b09

KVM: arm/arm64: Let the timer expire in hardirq context on RT · 9090825f

由 Thomas Gleixner 提交于 11月 07, 2019

The timers are canceled from an preempt-notifier which is invoked with
disabled preemption which is not allowed on PREEMPT_RT.
The timer callback is short so in could be invoked in hard-IRQ context
on -RT.

Let the timer expire on hard-IRQ context even on -RT.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Tested-by: NJulien Grall <julien.grall@arm.com>
Acked-by: NMarc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20191107095424.16647-1-bigeasy@linutronix.de

9090825f

29 10月, 2019 3 次提交

KVM: arm/arm64: vgic: Don't rely on the wrong pending table · ca185b26

由 Zenghui Yu 提交于 10月 29, 2019

It's possible that two LPIs locate in the same "byte_offset" but target
two different vcpus, where their pending status are indicated by two
different pending tables.  In such a scenario, using last_byte_offset
optimization will lead KVM relying on the wrong pending table entry.
Let us use last_ptr instead, which can be treated as a byte index into
a pending table and also, can be vcpu specific.

Fixes: 28077125 ("KVM: arm64: vgic-v3: KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES")
Cc: stable@vger.kernel.org
Signed-off-by: NZenghui Yu <yuzenghui@huawei.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Acked-by: NEric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20191029071919.177-4-yuzenghui@huawei.com

ca185b26

KVM: arm/arm64: vgic: Fix some comments typo · bad36e4e

由 Zenghui Yu 提交于 10月 29, 2019

Fix various comments, including wrong function names, grammar mistakes
and specification references.
Signed-off-by: NZenghui Yu <yuzenghui@huawei.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20191029071919.177-3-yuzenghui@huawei.com

bad36e4e

KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/put · 8e01d9a3

由 Marc Zyngier 提交于 10月 27, 2019

When the VHE code was reworked, a lot of the vgic stuff was moved around,
but the GICv4 residency code did stay untouched, meaning that we come
in and out of residency on each flush/sync, which is obviously suboptimal.

To address this, let's move things around a bit:

- Residency entry (flush) moves to vcpu_load
- Residency exit (sync) moves to vcpu_put
- On blocking (entry to WFI), we "put"
- On unblocking (exit from WFI), we "load"

Because these can nest (load/block/put/load/unblock/put, for example),
we now have per-VPE tracking of the residency state.

Additionally, vgic_v4_put gains a "need doorbell" parameter, which only
gets set to true when blocking because of a WFI. This allows a finer
control of the doorbell, which now also gets disabled as soon as
it gets signaled.
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20191027144234.8395-2-maz@kernel.org

8e01d9a3

01 10月, 2019 1 次提交

kvm: x86, powerpc: do not allow clearing largepages debugfs entry · 833b45de

由 Paolo Bonzini 提交于 9月 30, 2019

The largepages debugfs entry is incremented/decremented as shadow
pages are created or destroyed.  Clearing it will result in an
underflow, which is harmless to KVM but ugly (and could be
misinterpreted by tools that use debugfs information), so make
this particular statistic read-only.

Cc: kvm-ppc@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

833b45de

18 9月, 2019 1 次提交

KVM: coalesced_mmio: add bounds checking · b60fe990

由 Matt Delco 提交于 9月 16, 2019

The first/last indexes are typically shared with a user app.
The app can change the 'last' index that the kernel uses
to store the next result.  This change sanity checks the index
before using it for writing to a potentially arbitrary address.

This fixes CVE-2019-14821.

Cc: stable@vger.kernel.org
Fixes: 5f94c174 ("KVM: Add coalesced MMIO support (common part)")
Signed-off-by: NMatt Delco <delco@chromium.org>
Signed-off-by: NJim Mattson <jmattson@google.com>
Reported-by: syzbot+983c866c3dd6efa3662a@syzkaller.appspotmail.com
[Use READ_ONCE. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b60fe990

11 9月, 2019 1 次提交

KVM: arm/arm64: vgic: Use the appropriate TRACE_INCLUDE_PATH · aac60f1a

由 Zenghui Yu 提交于 9月 11, 2019

Commit 49dfe94f ("KVM: arm/arm64: Fix TRACE_INCLUDE_PATH") fixes
TRACE_INCLUDE_PATH to the correct relative path to the define_trace.h
and explains why did the old one work.

The same fix should be applied to virt/kvm/arm/vgic/trace.h.
Reviewed-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: NZenghui Yu <yuzenghui@huawei.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

aac60f1a

09 9月, 2019 1 次提交

KVM: arm/arm64: vgic: Allow more than 256 vcpus for KVM_IRQ_LINE · 92f35b75

由 Marc Zyngier 提交于 8月 18, 2019

While parts of the VGIC support a large number of vcpus (we
bravely allow up to 512), other parts are more limited.

One of these limits is visible in the KVM_IRQ_LINE ioctl, which
only allows 256 vcpus to be signalled when using the CPU or PPI
types. Unfortunately, we've cornered ourselves badly by allocating
all the bits in the irq field.

Since the irq_type subfield (8 bit wide) is currently only taking
the values 0, 1 and 2 (and we have been careful not to allow anything
else), let's reduce this field to only 4 bits, and allocate the
remaining 4 bits to a vcpu2_index, which acts as a multiplier:

  vcpu_id = 256 * vcpu2_index + vcpu_index

With that, and a new capability (KVM_CAP_ARM_IRQ_LINE_LAYOUT_2)
allowing this to be discovered, it becomes possible to inject
PPIs to up to 4096 vcpus. But please just don't.

Whilst we're there, add a clarification about the use of KVM_IRQ_LINE
on arm, which is not completely conditionned by KVM_CAP_IRQCHIP.
Reported-by: NZenghui Yu <yuzenghui@huawei.com>
Reviewed-by: NEric Auger <eric.auger@redhat.com>
Reviewed-by: NZenghui Yu <yuzenghui@huawei.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

92f35b75

28 8月, 2019 1 次提交

KVM: arm/arm64: vgic-v2: Handle SGI bits in GICD_I{S,C}PENDR0 as WI · 82e40f55

由 Marc Zyngier 提交于 8月 28, 2019

A guest is not allowed to inject a SGI (or clear its pending state)
by writing to GICD_ISPENDR0 (resp. GICD_ICPENDR0), as these bits are
defined as WI (as per ARM IHI 0048B 4.3.7 and 4.3.8).

Make sure we correctly emulate the architecture.

Fixes: 96b29800 ("KVM: arm/arm64: vgic-new: Add PENDING registers handlers")
Cc: stable@vger.kernel.org # 4.7+
Reported-by: NAndre Przywara <andre.przywara@arm.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Signed-off-by: NWill Deacon <will@kernel.org>

82e40f55

27 8月, 2019 1 次提交

KVM: arm/arm64: vgic: Fix potential deadlock when ap_list is long · d4a8061a

由 Heyi Guo 提交于 8月 27, 2019

If the ap_list is longer than 256 entries, merge_final() in list_sort()
will call the comparison callback with the same element twice, causing
a deadlock in vgic_irq_cmp().

Fix it by returning early when irqa == irqb.

Cc: stable@vger.kernel.org # 4.7+
Fixes: 8e444745 ("KVM: arm/arm64: vgic-new: Add IRQ sorting")
Signed-off-by: NZenghui Yu <yuzenghui@huawei.com>
Signed-off-by: NHeyi Guo <guoheyi@huawei.com>
[maz: massaged commit log and patch, added Fixes and Cc-stable]
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Signed-off-by: NWill Deacon <will@kernel.org>

d4a8061a

25 8月, 2019 2 次提交

KVM: arm/arm64: vgic: Use a single IO device per redistributor · 3109741a

由 Eric Auger 提交于 8月 23, 2019

At the moment we use 2 IO devices per GICv3 redistributor: one
one for the RD_base frame and one for the SGI_base frame.

Instead we can use a single IO device per redistributor (the 2
frames are contiguous). This saves slots on the KVM_MMIO_BUS
which is currently limited to NR_IOBUS_DEVS (1000).

This change allows to instantiate up to 512 redistributors and may
speed the guest boot with a large number of VCPUs.
Signed-off-by: NEric Auger <eric.auger@redhat.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

3109741a

KVM: arm/arm64: vgic: Remove spurious semicolons · 926c6156

由 Marc Zyngier 提交于 8月 25, 2019

Detected by Coccinelle (and Will Deacon) using
scripts/coccinelle/misc/semicolon.cocci.
Reported-by: NWill Deacon <will@kernel.org>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

926c6156

24 8月, 2019 1 次提交

KVM: arm/arm64: VGIC: Properly initialise private IRQ affinity · 2e16f3e9

由 Andre Przywara 提交于 8月 23, 2019

At the moment we initialise the target *mask* of a virtual IRQ to the
VCPU it belongs to, even though this mask is only defined for GICv2 and
quickly runs out of bits for many GICv3 guests.
This behaviour triggers an UBSAN complaint for more than 32 VCPUs:
------
[ 5659.462377] UBSAN: Undefined behaviour in virt/kvm/arm/vgic/vgic-init.c:223:21
[ 5659.471689] shift exponent 32 is too large for 32-bit type 'unsigned int'
------
Also for GICv3 guests the reporting of TARGET in the "vgic-state" debugfs
dump is wrong, due to this very same problem.

Because there is no requirement to create the VGIC device before the
VCPUs (and QEMU actually does it the other way round), we can't safely
initialise mpidr or targets in kvm_vgic_vcpu_init(). But since we touch
every private IRQ for each VCPU anyway later (in vgic_init()), we can
just move the initialisation of those fields into there, where we
definitely know the VGIC type.

On the way make sure we really have either a VGICv2 or a VGICv3 device,
since the existing code is just checking for "VGICv3 or not", silently
ignoring the uninitialised case.
Signed-off-by: NAndre Przywara <andre.przywara@arm.com>
Reported-by: NDave Martin <dave.martin@arm.com>
Tested-by: NJulien Grall <julien.grall@arm.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

2e16f3e9

22 8月, 2019 1 次提交

KVM: arm/arm64: Only skip MMIO insn once · 2113c5f6

由 Andrew Jones 提交于 8月 22, 2019

If after an MMIO exit to userspace a VCPU is immediately run with an
immediate_exit request, such as when a signal is delivered or an MMIO
emulation completion is needed, then the VCPU completes the MMIO
emulation and immediately returns to userspace. As the exit_reason
does not get changed from KVM_EXIT_MMIO in these cases we have to
be careful not to complete the MMIO emulation again, when the VCPU is
eventually run again, because the emulation does an instruction skip
(and doing too many skips would be a waste of guest code :-) We need
to use additional VCPU state to track if the emulation is complete.
As luck would have it, we already have 'mmio_needed', which even
appears to be used in this way by other architectures already.

Fixes: 0d640732 ("arm64: KVM: Skip MMIO insn after emulation")
Acked-by: NMark Rutland <mark.rutland@arm.com>
Signed-off-by: NAndrew Jones <drjones@redhat.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

2113c5f6

19 8月, 2019 12 次提交

KVM: Call kvm_arch_vcpu_blocking early into the blocking sequence · 07ab0f8d

由 Marc Zyngier 提交于 8月 02, 2019

When a vpcu is about to block by calling kvm_vcpu_block, we call
back into the arch code to allow any form of synchronization that
may be required at this point (SVN stops the AVIC, ARM synchronises
the VMCR and enables GICv4 doorbells). But this synchronization
comes in quite late, as we've potentially waited for halt_poll_ns
to expire.

Instead, let's move kvm_arch_vcpu_blocking() to the beginning of
kvm_vcpu_block(), which on ARM has several benefits:

- VMCR gets synchronised early, meaning that any interrupt delivered
  during the polling window will be evaluated with the correct guest
  PMR
- GICv4 doorbells are enabled, which means that any guest interrupt
  directly injected during that window will be immediately recognised

Tang Nianyao ran some tests on a GICv4 machine to evaluate such
change, and reported up to a 10% improvement for netperf:

<quote>
	netperf result:
	D06 as server, intel 8180 server as client
	with change:
	package 512 bytes - 5500 Mbits/s
	package 64 bytes - 760 Mbits/s
	without change:
	package 512 bytes - 5000 Mbits/s
	package 64 bytes - 710 Mbits/s
</quote>
Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

07ab0f8d

KVM: arm/arm64: vgic: Make function comments match function declarations · 0ed5f5d6

由 Alexandru Elisei 提交于 8月 15, 2019

Since commit 503a6286 ("KVM: arm/arm64: vgic: Rely on the GIC driver to
parse the firmware tables"), the vgic_v{2,3}_probe functions stopped using
a DT node. Commit 90977732 ("KVM: arm/arm64: vgic-new: vgic_init:
implement kvm_vgic_hyp_init") changed the functions again, and now they
require exactly one argument, a struct gic_kvm_info populated by the GIC
driver. Unfortunately the comments regressed and state that a DT node is
used instead. Change the function comments to reflect the current
prototypes.
Signed-off-by: NAlexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

0ed5f5d6

KVM: arm/arm64: vgic-irqfd: Implement kvm_arch_set_irq_inatomic · 41108170

由 Marc Zyngier 提交于 3月 18, 2019

Now that we have a cache of MSI->LPI translations, it is pretty
easy to implement kvm_arch_set_irq_inatomic (this cache can be
parsed without sleeping).

Hopefully, this will improve some LPI-heavy workloads.
Tested-by: NAndre Przywara <andre.przywara@arm.com>
Reviewed-by: NEric Auger <eric.auger@redhat.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

41108170

KVM: arm/arm64: vgic-its: Check the LPI translation cache on MSI injection · 86a7dae8

由 Marc Zyngier 提交于 3月 18, 2019

When performing an MSI injection, let's first check if the translation
is already in the cache. If so, let's inject it quickly without
going through the whole translation process.
Tested-by: NAndre Przywara <andre.przywara@arm.com>
Reviewed-by: NEric Auger <eric.auger@redhat.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

86a7dae8

KVM: arm/arm64: vgic-its: Cache successful MSI->LPI translation · 89489ee9

由 Marc Zyngier 提交于 3月 18, 2019

On a successful translation, preserve the parameters in the LPI
translation cache. Each translation is reusing the last slot
in the list, naturally evicting the least recently used entry.
Tested-by: NAndre Przywara <andre.przywara@arm.com>
Reviewed-by: NEric Auger <eric.auger@redhat.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

89489ee9

KVM: arm/arm64: vgic-its: Invalidate MSI-LPI translation cache on vgic teardown · cbfda481

由 Marc Zyngier 提交于 6月 10, 2019

In order to avoid leaking vgic_irq structures on teardown, we need to
drop all references to LPIs before deallocating the cache itself.

This is done by invalidating the cache on vgic teardown.
Tested-by: NAndre Przywara <andre.przywara@arm.com>
Reviewed-by: NEric Auger <eric.auger@redhat.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

cbfda481

KVM: arm/arm64: vgic-its: Invalidate MSI-LPI translation cache on ITS disable · 363518f3

由 Marc Zyngier 提交于 5月 22, 2019

If an ITS gets disabled, we need to make sure that further interrupts
won't hit in the cache. For that, we invalidate the translation cache
when the ITS is disabled.
Tested-by: NAndre Przywara <andre.przywara@arm.com>
Reviewed-by: NEric Auger <eric.auger@redhat.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

363518f3

KVM: arm/arm64: vgic-its: Invalidate MSI-LPI translation cache on disabling LPIs · b4931afc

由 Marc Zyngier 提交于 5月 22, 2019

If a vcpu disables LPIs at its redistributor level, we need to make sure
we won't pend more interrupts. For this, we need to invalidate the LPI
translation cache.
Tested-by: NAndre Przywara <andre.przywara@arm.com>
Reviewed-by: NEric Auger <eric.auger@redhat.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

b4931afc

KVM: arm/arm64: vgic-its: Invalidate MSI-LPI translation cache on specific commands · 0c144848

由 Marc Zyngier 提交于 3月 18, 2019

The LPI translation cache needs to be discarded when an ITS command
may affect the translation of an LPI (DISCARD, MAPC and MAPD with V=0)
or the routing of an LPI to a redistributor with disabled LPIs (MOVI,
MOVALL).

We decide to perform a full invalidation of the cache, irrespective
of the LPI that is affected. Commands are supposed to be rare enough
that it doesn't matter.
Tested-by: NAndre Przywara <andre.przywara@arm.com>
Reviewed-by: NEric Auger <eric.auger@redhat.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

0c144848

KVM: arm/arm64: vgic-its: Add MSI-LPI translation cache invalidation · 7d825fd6

由 Marc Zyngier 提交于 6月 10, 2019

There's a number of cases where we need to invalidate the caching
of translations, so let's add basic support for that.
Tested-by: NAndre Przywara <andre.przywara@arm.com>
Reviewed-by: NEric Auger <eric.auger@redhat.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

7d825fd6

KVM: arm/arm64: vgic: Add __vgic_put_lpi_locked primitive · 1bb3691d

由 Marc Zyngier 提交于 3月 18, 2019

Our LPI translation cache needs to be able to drop the refcount
on an LPI whilst already holding the lpi_list_lock.

Let's add a new primitive for this.
Tested-by: NAndre Przywara <andre.przywara@arm.com>
Reviewed-by: NEric Auger <eric.auger@redhat.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

1bb3691d

KVM: arm/arm64: vgic: Add LPI translation cache definition · 24cab82c

由 Marc Zyngier 提交于 3月 18, 2019

Add the basic data structure that expresses an MSI to LPI
translation as well as the allocation/release hooks.

The size of the cache is arbitrarily defined as 16*nr_vcpus.
Tested-by: NAndre Przywara <andre.przywara@arm.com>
Reviewed-by: NEric Auger <eric.auger@redhat.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

24cab82c

09 8月, 2019 2 次提交

kvm: remove unnecessary PageReserved check · 8f946da7

由 Paolo Bonzini 提交于 8月 05, 2019

The same check is already done in kvm_is_reserved_pfn.
Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8f946da7

KVM: arm/arm64: vgic: Reevaluate level sensitive interrupts on enable · 16e604a4

由 Alexandru Elisei 提交于 8月 07, 2019

A HW mapped level sensitive interrupt asserted by a device will not be put
into the ap_list if it is disabled at the VGIC level. When it is enabled
again, it will be inserted into the ap_list and written to a list register
on guest entry regardless of the state of the device.

We could argue that this can also happen on real hardware, when the command
to enable the interrupt reached the GIC before the device had the chance to
de-assert the interrupt signal; however, we emulate the distributor and
redistributors in software and we can do better than that.
Signed-off-by: NAlexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

16e604a4

05 8月, 2019 5 次提交

KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block · 5eeaf10e

由 Marc Zyngier 提交于 8月 02, 2019

Since commit commit 328e5664 ("KVM: arm/arm64: vgic: Defer
touching GICH_VMCR to vcpu_load/put"), we leave ICH_VMCR_EL2 (or
its GICv2 equivalent) loaded as long as we can, only syncing it
back when we're scheduled out.

There is a small snag with that though: kvm_vgic_vcpu_pending_irq(),
which is indirectly called from kvm_vcpu_check_block(), needs to
evaluate the guest's view of ICC_PMR_EL1. At the point were we
call kvm_vcpu_check_block(), the vcpu is still loaded, and whatever
changes to PMR is not visible in memory until we do a vcpu_put().

Things go really south if the guest does the following:

	mov x0, #0	// or any small value masking interrupts
	msr ICC_PMR_EL1, x0

	[vcpu preempted, then rescheduled, VMCR sampled]

	mov x0, #ff	// allow all interrupts
	msr ICC_PMR_EL1, x0
	wfi		// traps to EL2, so samping of VMCR

	[interrupt arrives just after WFI]

Here, the hypervisor's view of PMR is zero, while the guest has enabled
its interrupts. kvm_vgic_vcpu_pending_irq() will then say that no
interrupts are pending (despite an interrupt being received) and we'll
block for no reason. If the guest doesn't have a periodic interrupt
firing once it has blocked, it will stay there forever.

To avoid this unfortuante situation, let's resync VMCR from
kvm_arch_vcpu_blocking(), ensuring that a following kvm_vcpu_check_block()
will observe the latest value of PMR.

This has been found by booting an arm64 Linux guest with the pseudo NMI
feature, and thus using interrupt priorities to mask interrupts instead
of the usual PSTATE masking.

Cc: stable@vger.kernel.org # 4.12
Fixes: 328e5664 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put")
Signed-off-by: NMarc Zyngier <maz@kernel.org>

5eeaf10e

KVM: no need to check return value of debugfs_create functions · 3e7093d0

由 Greg KH 提交于 7月 31, 2019

When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Also, when doing this, change kvm_arch_create_vcpu_debugfs() to return
void instead of an integer, as we should not care at all about if this
function actually does anything or not.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <x86@kernel.org>
Cc: <kvm@vger.kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

3e7093d0

KVM: remove kvm_arch_has_vcpu_debugfs() · 741cbbae

由 Paolo Bonzini 提交于 8月 03, 2019

There is no need for this function as all arches have to implement
kvm_arch_create_vcpu_debugfs() no matter what.  A #define symbol
let us actually simplify the code.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

741cbbae

KVM: Fix leak vCPU's VMCS value into other pCPU · 17e433b5

由 Wanpeng Li 提交于 8月 05, 2019

After commit d73eb57b (KVM: Boost vCPUs that are delivering interrupts), a
five years old bug is exposed. Running ebizzy benchmark in three 80 vCPUs VMs
on one 80 pCPUs Skylake server, a lot of rcu_sched stall warning splatting
in the VMs after stress testing:

 INFO: rcu_sched detected stalls on CPUs/tasks: { 4 41 57 62 77} (detected by 15, t=60004 jiffies, g=899, c=898, q=15073)
 Call Trace:
   flush_tlb_mm_range+0x68/0x140
   tlb_flush_mmu.part.75+0x37/0xe0
   tlb_finish_mmu+0x55/0x60
   zap_page_range+0x142/0x190
   SyS_madvise+0x3cd/0x9c0
   system_call_fastpath+0x1c/0x21

swait_active() sustains to be true before finish_swait() is called in
kvm_vcpu_block(), voluntarily preempted vCPUs are taken into account
by kvm_vcpu_on_spin() loop greatly increases the probability condition
kvm_arch_vcpu_runnable(vcpu) is checked and can be true, when APICv
is enabled the yield-candidate vCPU's VMCS RVI field leaks(by
vmx_sync_pir_to_irr()) into spinning-on-a-taken-lock vCPU's current
VMCS.

This patch fixes it by checking conservatively a subset of events.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Marc Zyngier <Marc.Zyngier@arm.com>
Cc: stable@vger.kernel.org
Fixes: 98f4a146 (KVM: add kvm_arch_vcpu_runnable() test to kvm_vcpu_on_spin() loop)
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

17e433b5

KVM: Check preempted_in_kernel for involuntary preemption · 046ddeed

由 Wanpeng Li 提交于 8月 01, 2019

preempted_in_kernel is updated in preempt_notifier when involuntary preemption
ocurrs, it can be stale when the voluntarily preempted vCPUs are taken into
account by kvm_vcpu_on_spin() loop. This patch lets it just check preempted_in_kernel
for involuntary preemption.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

046ddeed

26 7月, 2019 1 次提交

KVM: arm: vgic-v3: Mark expected switch fall-through · 1a8248c7

由 Anders Roxell 提交于 7月 26, 2019

When fall-through warnings was enabled by default the following warnings
was starting to show up:

../virt/kvm/arm/hyp/vgic-v3-sr.c: In function ‘__vgic_v3_save_aprs’:
../virt/kvm/arm/hyp/vgic-v3-sr.c:351:24: warning: this statement may fall
 through [-Wimplicit-fallthrough=]
   cpu_if->vgic_ap0r[2] = __vgic_v3_read_ap0rn(2);
   ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
../virt/kvm/arm/hyp/vgic-v3-sr.c:352:2: note: here
  case 6:
  ^~~~
../virt/kvm/arm/hyp/vgic-v3-sr.c:353:24: warning: this statement may fall
 through [-Wimplicit-fallthrough=]
   cpu_if->vgic_ap0r[1] = __vgic_v3_read_ap0rn(1);
   ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
../virt/kvm/arm/hyp/vgic-v3-sr.c:354:2: note: here
  default:
  ^~~~~~~

Rework so that the compiler doesn't warn about fall-through.

Fixes: d93512ef0f0e ("Makefile: Globally enable fall-through warning")
Signed-off-by: NAnders Roxell <anders.roxell@linaro.org>
Signed-off-by: NMarc Zyngier <maz@kernel.org>

1a8248c7

24 7月, 2019 1 次提交

Documentation: move Documentation/virtual to Documentation/virt · 2f5947df

由 Christoph Hellwig 提交于 7月 24, 2019

Renaming docs seems to be en vogue at the moment, so fix on of the
grossly misnamed directories.  We usually never use "virtual" as
a shortcut for virtualization in the kernel, but always virt,
as seen in the virt/ top-level directory.  Fix up the documentation
to match that.

Fixes: ed16648e ("Move kvm, uml, and lguest subdirectories under a common "virtual" directory, I.E:")
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2f5947df

23 7月, 2019 1 次提交

KVM: arm/arm64: Introduce kvm_pmu_vcpu_init() to setup PMU counter index · bca031e2

由 Zenghui Yu 提交于 7月 18, 2019

We use "pmc->idx" and the "chained" bitmap to determine if the pmc is
chained, in kvm_pmu_pmc_is_chained().  But idx might be uninitialized
(and random) when we doing this decision, through a KVM_ARM_VCPU_INIT
ioctl -> kvm_pmu_vcpu_reset(). And the test_bit() against this random
idx will potentially hit a KASAN BUG [1].

In general, idx is the static property of a PMU counter that is not
expected to be modified across resets, as suggested by Julien.  It
looks more reasonable if we can setup the PMU counter idx for a vcpu
in its creation time. Introduce a new function - kvm_pmu_vcpu_init()
for this basic setup. Oh, and the KASAN BUG will get fixed this way.

[1] https://www.spinics.net/lists/kvm-arm/msg36700.html

Fixes: 80f393a2 ("KVM: arm/arm64: Support chained PMU counters")
Suggested-by: NAndrew Murray <andrew.murray@arm.com>
Suggested-by: NJulien Thierry <julien.thierry@arm.com>
Acked-by: NJulien Thierry <julien.thierry@arm.com>
Signed-off-by: NZenghui Yu <yuzenghui@huawei.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

bca031e2

20 7月, 2019 1 次提交

KVM: Boost vCPUs that are delivering interrupts · d73eb57b

由 Wanpeng Li 提交于 7月 18, 2019

Inspired by commit 9cac38dd (KVM/s390: Set preempted flag during
vcpu wakeup and interrupt delivery), we want to also boost not just
lock holders but also vCPUs that are delivering interrupts. Most
smp_call_function_many calls are synchronous, so the IPI target vCPUs
are also good yield candidates.  This patch introduces vcpu->ready to
boost vCPUs during wakeup and interrupt delivery time; unlike s390 we do
not reuse vcpu->preempted so that voluntarily preempted vCPUs are taken
into account by kvm_vcpu_on_spin, but vmx_vcpu_pi_put is not affected
(VT-d PI handles voluntary preemption separately, in pi_pre_block).

Testing on 80 HT 2 socket Xeon Skylake server, with 80 vCPUs VM 80GB RAM:
ebizzy -M

            vanilla     boosting    improved
1VM          21443       23520         9%
2VM           2800        8000       180%
3VM           1800        3100        72%

Testing on my Haswell desktop 8 HT, with 8 vCPUs VM 8GB RAM, two VMs,
one running ebizzy -M, the other running 'stress --cpu 2':

w/ boosting + w/o pv sched yield(vanilla)

            vanilla     boosting   improved
              1570         4000      155%

w/ boosting + w/ pv sched yield(vanilla)

            vanilla     boosting   improved
              1844         5157      179%

w/o boosting, perf top in VM:

 72.33%  [kernel]       [k] smp_call_function_many
  4.22%  [kernel]       [k] call_function_i
  3.71%  [kernel]       [k] async_page_fault

w/ boosting, perf top in VM:

 38.43%  [kernel]       [k] smp_call_function_many
  6.31%  [kernel]       [k] async_page_fault
  6.13%  libc-2.23.so   [.] __memcpy_avx_unaligned
  4.88%  [kernel]       [k] call_function_interrupt

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d73eb57b

13 7月, 2019 1 次提交

arm64: switch to generic version of pte allocation · 50f11a8a

由 Mike Rapoport 提交于 7月 11, 2019

The PTE allocations in arm64 are identical to the generic ones modulo the
GFP flags.

Using the generic pte_alloc_one() functions ensures that the user page
tables are allocated with __GFP_ACCOUNT set.

The arm64 definition of PGALLOC_GFP is removed and replaced with
GFP_PGTABLE_USER for p[gum]d_alloc_one() for the user page tables and
GFP_PGTABLE_KERNEL for the kernel page tables. The KVM memory cache is now
using GFP_PGTABLE_USER.

The mappings created with create_pgd_mapping() are now using
GFP_PGTABLE_KERNEL.

The conversion to the generic version of pte_free_kernel() removes the NULL
check for pte.

The pte_free() version on arm64 is identical to the generic one and
can be simply dropped.

[cai@lca.pw: fix a bogus GFP flag in pgd_alloc()]
  Link: https://lore.kernel.org/r/1559656836-24940-1-git-send-email-cai@lca.pw/
[and fix it more]
  Link: https://lore.kernel.org/linux-mm/20190617151252.GF16810@rapoport-lnx/
Link: http://lkml.kernel.org/r/1557296232-15361-5-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Guo Ren <ren_guo@c-sky.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sam Creasey <sammy@sammy.net>
Cc: Vincent Chen <deanbo422@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

50f11a8a

10 7月, 2019 1 次提交

KVM: Properly check if "page" is valid in kvm_vcpu_unmap · b614c602

由 KarimAllah Ahmed 提交于 7月 10, 2019

The field "page" is initialized to KVM_UNMAPPED_PAGE when it is not used
(i.e. when the memory lives outside kernel control). So this check will
always end up using kunmap even for memremap regions.

Fixes: e45adf66 ("KVM: Introduce a new guest mapping API")
Cc: stable@vger.kernel.org
Signed-off-by: NKarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b614c602

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功