Commits · 822ca7f82b21822bb7435a6d76feffe60a86ec40 · openeuler / Kernel

15 May, 2022 · 1 commit

KVM: arm64: vgic-v3: List M1 Pro/Max as requiring the SEIS workaround · cae88930

Authored by Marc Zyngier on May 14, 2022

Unsurprisingly, Apple M1 Pro/Max have the exact same defect as the
original M1 and generate random SErrors in the host when a guest
tickles the GICv3 CPU interface the wrong way.

Add the part numbers for both the CPU types found in these two
new implementations, and add them to the hall of shame. This also
applies to the Ultra version, as it is composed of 2 Max SoCs.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220514102524.3188730-1-maz@kernel.org

cae88930
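
The check this commit extends amounts to comparing MIDR_EL1 fields against a list of known-broken parts. The following is a minimal sketch of that idea, not the kernel's actual code: `cpu_needs_seis_workaround()` and `seis_broken_parts` are hypothetical names, and the Apple implementer code and part numbers are quoted from memory of the kernel's cputype definitions, so treat them as assumptions to verify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* MIDR_EL1 layout: implementer in bits [31:24], part number in bits [15:4]. */
#define MIDR_IMPLEMENTER(midr) (((midr) >> 24) & 0xffu)
#define MIDR_PARTNUM(midr)     (((midr) >> 4) & 0xfffu)

/* Assumed values, quoted from memory; verify against cputype.h. */
#define IMP_APPLE 0x61u

static const uint32_t seis_broken_parts[] = {
    0x022, 0x023, /* M1: Icestorm, Firestorm */
    0x024, 0x025, /* M1 Pro: Icestorm, Firestorm */
    0x028, 0x029, /* M1 Max: Icestorm, Firestorm (the Ultra is two Max SoCs) */
};

/* Hypothetical helper: is this CPU on the list of broken SEIS implementations? */
static bool cpu_needs_seis_workaround(uint32_t midr)
{
    size_t n = sizeof(seis_broken_parts) / sizeof(seis_broken_parts[0]);

    if (MIDR_IMPLEMENTER(midr) != IMP_APPLE)
        return false;

    for (size_t i = 0; i < n; i++) {
        if (MIDR_PARTNUM(midr) == seis_broken_parts[i])
            return true;
    }
    return false;
}

/* Usage: build two MIDR values by hand and check them against the list. */
static void seis_list_example(void)
{
    uint32_t m1_pro = (IMP_APPLE << 24) | (0x024u << 4);
    uint32_t other  = (0x41u << 24) | (0xd0cu << 4); /* some non-Apple part */

    assert(cpu_needs_seis_workaround(m1_pro));
    assert(!cpu_needs_seis_workaround(other));
}
```

Keeping the list explicit means a system that implements SEIS correctly is never matched by accident.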

22 Jan, 2022 · 1 commit

KVM: arm64: vgic-v3: Restrict SEIS workaround to known broken systems · d11a327e

Authored by Marc Zyngier on Jan 21, 2022

Contrary to what df652bcf ("KVM: arm64: vgic-v3: Work around GICv3
locally generated SErrors") was asserting, there is at least one other
system out there (Cavium ThunderX2) implementing SEIS, and not in
an obviously broken way.

So instead of imposing the M1 workaround on an innocent bystander,
let's limit it to the two known broken Apple implementations.

Fixes: df652bcf ("KVM: arm64: vgic-v3: Work around GICv3 locally generated SErrors")
Reported-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220122103912.795026-1-maz@kernel.org

d11a327e

16 Dec, 2021 · 1 commit

KVM: arm64: vgic: Demote userspace-triggered console prints to kvm_debug() · 440523b9

Authored by Marc Zyngier on Dec 16, 2021

Running the KVM selftests results in these messages being dumped
in the kernel console:

[  188.051073] kvm [469]: VGIC redist and dist frames overlap
[  188.056820] kvm [469]: VGIC redist and dist frames overlap
[  188.076199] kvm [469]: VGIC redist and dist frames overlap

Being able to trigger this from userspace is definitely not on,
so demote these warnings to kvm_debug().

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211216104507.1482017-1-maz@kernel.org

440523b9

15 Dec, 2021 · 1 commit

KVM: arm64: pkvm: Disable GICv2 support · a770ee80

Authored by Quentin Perret on Dec 08, 2021

GICv2 requires having device mappings in guests and the hypervisor,
which is incompatible with the current pKVM EL2 page ownership model
which only covers memory. While it would be desirable to support pKVM
with GICv2, this will require a lot more work, so let's make the
current assumption clear until then.

Co-developed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20211208152300.2478542-3-qperret@google.com

a770ee80

08 Dec, 2021 · 1 commit

KVM: Use 'unsigned long' as kvm_for_each_vcpu()'s index · 46808a4c

Authored by Marc Zyngier on Nov 16, 2021

Everywhere we use kvm_for_each_vcpu(), we use an int as the vcpu
index. Unfortunately, we're about to rework the iterator,
which requires this to be upgraded to an unsigned long.

Let's bite the bullet and repaint all of it in one go.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Message-Id: <20211116160403.4074052-7-maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

46808a4c

17 Oct, 2021 · 2 commits

KVM: arm64: vgic-v3: Reduce common group trapping to ICV_DIR_EL1 when possible · 0924729b

Authored by Marc Zyngier on Oct 10, 2021

On systems that advertise ICH_VTR_EL2.SEIS, we trap all GICv3 sysreg
accesses from the guest. From a performance perspective, this is OK
as long as the guest doesn't hammer the GICv3 CPU interface.

In most cases, this is fine, unless the guest actively uses
priorities and switches PMR_EL1 very often. Which is exactly what
happens when a Linux guest runs with irqchip.gicv3_pseudo_nmi=1.
In these conditions, performance plummets as we hit PMR each time
we mask/unmask interrupts. Not good.

There is however an opportunity for improvement. Careful reading
of the architecture specification indicates that the only GICv3
sysreg belonging to the common group (which contains the SGI
registers, PMR, DIR, CTLR and RPR) that is allowed to generate
a SError is DIR. Everything else is safe.

It is thus possible to substitute the trapping of all the common
group with just that of DIR if it is supported by the implementation.
Yes, that's yet another optional bit of the architecture.
So let's just do that, as it leads to an impressive result on
the M1:

Without this change:
	bash-5.1# /host/home/maz/hackbench 100 process 1000
	Running with 100*40 (== 4000) tasks.
	Time: 56.596

With this change:
	bash-5.1# /host/home/maz/hackbench 100 process 1000
	Running with 100*40 (== 4000) tasks.
	Time: 8.649

which is a pretty convincing result.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20211010150910.2911495-4-maz@kernel.org

0924729b
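
The trap selection described in this commit can be sketched as a small decision function. This is an illustrative model only: `seis_trap_bits()` is a hypothetical name, and the bit positions for ICH_VTR_EL2 and ICH_HCR_EL2 are as I recall them from the GICv3 architecture specification, so treat them as assumptions. For scale, the hackbench numbers above (56.596 s versus 8.649 s) correspond to roughly a 6.5x speed-up.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions; check them against the GICv3 specification. */
#define ICH_VTR_TDS   (1u << 19) /* trapping DIR on its own is supported */
#define ICH_VTR_SEIS  (1u << 22) /* locally generated SErrors are implemented */

#define ICH_HCR_TC    (1u << 10) /* trap the whole common group */
#define ICH_HCR_TALL0 (1u << 11) /* trap all group 0 accesses */
#define ICH_HCR_TALL1 (1u << 12) /* trap all group 1 accesses */
#define ICH_HCR_TDIR  (1u << 14) /* trap only ICV_DIR_EL1 */

/* Hypothetical helper: which trap bits to set for a given ICH_VTR_EL2 value. */
static uint32_t seis_trap_bits(uint32_t ich_vtr)
{
    uint32_t hcr;

    if (!(ich_vtr & ICH_VTR_SEIS))
        return 0; /* no SEIS, no workaround needed */

    /* Group 0 and group 1 registers stay fully trapped. */
    hcr = ICH_HCR_TALL0 | ICH_HCR_TALL1;

    /* For the common group, prefer the narrow DIR-only trap when available. */
    hcr |= (ich_vtr & ICH_VTR_TDS) ? ICH_HCR_TDIR : ICH_HCR_TC;

    return hcr;
}

/* Usage: the three cases that matter. */
static void trap_selection_example(void)
{
    assert(seis_trap_bits(0) == 0);
    assert(seis_trap_bits(ICH_VTR_SEIS) ==
           (ICH_HCR_TALL0 | ICH_HCR_TALL1 | ICH_HCR_TC));
    assert(seis_trap_bits(ICH_VTR_SEIS | ICH_VTR_TDS) ==
           (ICH_HCR_TALL0 | ICH_HCR_TALL1 | ICH_HCR_TDIR));
}
```

With the DIR-only trap, PMR accesses no longer exit to the hypervisor, which is where the pseudo-NMI workload was losing its time.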

KVM: arm64: vgic-v3: Work around GICv3 locally generated SErrors · df652bcf

Authored by Marc Zyngier on Oct 10, 2021

The infamous M1 has a feature nobody else ever implemented,
in the form of the "GIC locally generated SError interrupts",
also known as SEIS for short.

These SErrors are generated when a guest does something that violates
the GIC state machine. It would have been simpler to just *ignore*
the damned thing, but that's not what this HW does. Oh well.

This part of the architecture is also amazingly under-specified.
There is a whole 10 lines that describe the feature in a spec that
is 930 pages long, and some of these lines are factually wrong.
Oh, and it is deprecated, so the incentive to clarify it is low.

Now, the spec says that this should be a *virtual* SError when
HCR_EL2.AMO is set. As it turns out, that's not always the case
on this CPU, and the SError sometimes fires on the host as a
physical SError. Goodbye, cruel world. This clearly is a HW bug,
and it means that a guest can easily take the host down, on demand.

Thankfully, we have seen systems that were just as broken in the
past, and we have the perfect vaccine for it.

Apple M1, please meet the Cavium ThunderX workaround. All your
GIC accesses will be trapped, sanitised, and emulated. Only the
signalling aspect of the HW will be used. It won't be super speedy,
but it will at least be safe. You're most welcome.

Given that this has only ever been seen on this single implementation,
that the spec is unclear at best and that we cannot trust it to ever
be implemented correctly, gate the workaround solely on ICH_VTR_EL2.SEIS
being set.

Tested-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211010150910.2911495-3-maz@kernel.org

df652bcf

11 Oct, 2021 · 1 commit

KVM: arm64: vgic-v3: Check redist region is not above the VM IPA size · 4612d98f

Authored by Ricardo Koller on Oct 04, 2021

Verify that the redistributor regions do not extend beyond the
VM-specified IPA range (phys_size). This can happen when using
KVM_VGIC_V3_ADDR_TYPE_REDIST or KVM_VGIC_V3_ADDR_TYPE_REDIST_REGIONS
with:

  base + size > phys_size AND base < phys_size

Add the missing check into vgic_v3_alloc_redist_region() which is called
when setting the regions, and into vgic_v3_check_base() which is called
when attempting the first vcpu-run. The vcpu-run check does not apply to
KVM_VGIC_V3_ADDR_TYPE_REDIST_REGIONS because the region sizes are known
before the first vcpu-run. Note that using the REDIST_REGIONS API
results in a different check, which already exists, at first vcpu run:
that the number of redist regions is enough for all vcpus.

Finally, this patch also enables some extra tests in
vgic_v3_alloc_redist_region() by calculating "size" early for the legacy
redist API, such as checking that the REDIST region can fit all the
already created vcpus.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005011921.437353-3-ricarkol@google.com

4612d98f
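
The condition quoted in this commit (`base + size > phys_size AND base < phys_size`) is a plain range check. Below is a minimal sketch of such a check, assuming a hypothetical helper name `redist_region_out_of_range()`; it also guards against `base + size` wrapping around, which a naive comparison would miss.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if [base, base + size) is not fully inside
 * the guest IPA space [0, phys_size). The first test catches the case
 * where base + size overflows 64 bits.
 */
static bool redist_region_out_of_range(uint64_t base, uint64_t size,
                                       uint64_t phys_size)
{
    uint64_t end = base + size;

    return end < base || end > phys_size;
}

/* Usage: a 40-bit IPA space is chosen purely for illustration. */
static void ipa_check_example(void)
{
    uint64_t phys_size = 1ULL << 40;
    uint64_t redist    = 2 * 0x10000; /* one 2x64KB redistributor */

    /* Starts below phys_size but runs past it: the case from the commit. */
    assert(redist_region_out_of_range(phys_size - 0x10000, redist, phys_size));
    /* Ends exactly at the limit: still acceptable. */
    assert(!redist_region_out_of_range(phys_size - redist, redist, phys_size));
    /* base + size wraps around and must be rejected. */
    assert(redist_region_out_of_range(UINT64_MAX - 0xfff, 0x2000, phys_size));
}
```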

20 Aug, 2021 · 1 commit

KVM: arm64: vgic: Resample HW pending state on deactivation · 3134cc8b

Authored by Marc Zyngier on Aug 19, 2021

When a mapped level interrupt (a timer, for example) is deactivated
by the guest, the corresponding host interrupt is equally deactivated.
However, the fate of the pending state still needs to be dealt
with in SW.

This is especially true when the interrupt was in the active+pending
state in the virtual distributor at the point where the guest
was entered. On exit, the pending state is potentially stale
(the guest may have put the interrupt in a non-pending state).

If we don't do anything, the interrupt will be spuriously injected
in the guest. Although this shouldn't have any ill effect (spurious
interrupts are always possible), we can improve the emulation by
detecting the deactivation-while-pending case and resampling the
interrupt.

While we're at it, move the logic into a common helper that can
be shared between the two GIC implementations.

Fixes: e40cc57b ("KVM: arm/arm64: vgic: Support level-triggered mapped interrupts")
Reported-by: Raghavendra Rao Ananta <rananta@google.com>
Tested-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210819180305.1670525-1-maz@kernel.org

3134cc8b

01 Jun, 2021 · 1 commit

KVM: arm64: vgic: Implement SW-driven deactivation · 354920e7

Authored by Marc Zyngier on Mar 15, 2021

In order to deal with these systems that do not offer HW-based
deactivation of interrupts, let's implement a SW-based approach:

- When the irq is queued into a LR, treat it as a pure virtual
  interrupt and set the EOI flag in the LR.

- When the interrupt state is read back from the LR, force a
  deactivation when the state is invalid (neither active nor
  pending).

Interrupts requiring such treatment get the VGIC_SW_RESAMPLE flag.

Signed-off-by: Marc Zyngier <maz@kernel.org>

354920e7
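
The two steps listed in this commit can be modelled with a toy list register. This is a sketch under stated assumptions, not the kernel implementation: `struct toy_irq`, `toy_populate_lr()` and `toy_needs_sw_deactivation()` are hypothetical, and the ICH_LR bit positions are quoted from memory of the GICv3 specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed ICH_LR<n>_EL2 bit positions; verify against the GICv3 spec. */
#define LR_EOI     (1ULL << 41) /* maintenance interrupt on EOI (HW == 0 only) */
#define LR_HW      (1ULL << 61) /* deactivation forwarded to the physical irq */
#define LR_PENDING (1ULL << 62)
#define LR_ACTIVE  (1ULL << 63)

struct toy_irq {
    uint32_t vintid;
    uint32_t pintid;
    bool hw;          /* mapped to a physical interrupt */
    bool sw_resample; /* models the VGIC_SW_RESAMPLE flag */
    bool pending;
};

/* Step 1: queue the irq into an LR. */
static uint64_t toy_populate_lr(const struct toy_irq *irq)
{
    uint64_t lr = irq->vintid;

    if (irq->pending)
        lr |= LR_PENDING;

    if (irq->hw && !irq->sw_resample)
        lr |= LR_HW | ((uint64_t)irq->pintid << 32);
    else if (irq->sw_resample)
        lr |= LR_EOI; /* treated as a pure virtual interrupt, EOI flag set */

    return lr;
}

/* Step 2: on read-back, an invalid state means SW must deactivate. */
static bool toy_needs_sw_deactivation(const struct toy_irq *irq, uint64_t lr)
{
    return irq->sw_resample && !(lr & (LR_PENDING | LR_ACTIVE));
}

/* Usage: follow one interrupt through pending, active and invalid. */
static void sw_deactivation_example(void)
{
    struct toy_irq irq = { .vintid = 27, .pintid = 27, .hw = true,
                           .sw_resample = true, .pending = true };
    uint64_t lr = toy_populate_lr(&irq);

    assert((lr & LR_EOI) != 0 && (lr & LR_HW) == 0);
    assert(!toy_needs_sw_deactivation(&irq, lr));

    lr = (lr & ~LR_PENDING) | LR_ACTIVE; /* guest acknowledges */
    assert(!toy_needs_sw_deactivation(&irq, lr));

    lr &= ~LR_ACTIVE;                    /* guest signals EOI */
    assert(toy_needs_sw_deactivation(&irq, lr));
}
```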

25 Mar, 2021 · 1 commit

KVM: arm64: GICv4.1: Try to save VLPI state in save_pending_tables · f66b7b15

Authored by Shenming Lu on Mar 22, 2021

After pausing all vCPUs and devices capable of interrupting, in order
to save the states of all interrupts, besides flushing the states in
KVM's vgic, we also try to flush the states of VLPIs in the virtual
pending tables into guest RAM, but we need to have GICv4.1 and safely
unmap the vPEs first.

Saving VSGIs needs the vPEs to be mapped and might therefore conflict
with the saving of VLPIs. However, since we map the vPEs back at the
end of save_pending_tables and both save operations require kvm->lock
to be held (and thus only happen serially), this works fine.

Signed-off-by: Shenming Lu <lushenming@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210322060158.1584-5-lushenming@huawei.com

f66b7b15

06 Mar, 2021 · 2 commits

KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 compatibility · 9739f6ef

Authored by Marc Zyngier on Mar 05, 2021

It looks like we have broken firmware out there that wrongly advertises
a GICv2 compatibility interface, despite the CPUs not being able to deal
with it.

To work around this, check that the CPU initialising KVM is actually able
to switch to MMIO instead of system registers, and use that as a
precondition to enable GICv2 compatibility in KVM.

Note that the detection happens on a single CPU. If the firmware is
lying *and* the CPUs are asymmetric, all hope is lost anyway.

Reported-by: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Message-Id: <20210305185254.3730990-8-maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9739f6ef

KVM: arm64: Rename __vgic_v3_get_ich_vtr_el2() to __vgic_v3_get_gic_config() · b9d699e2

Authored by Marc Zyngier on Mar 05, 2021

As we are about to report a bit more information to the rest of
the kernel, rename __vgic_v3_get_ich_vtr_el2() to the more
explicit __vgic_v3_get_gic_config().

No functional change.

Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Message-Id: <20210305185254.3730990-7-maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

b9d699e2

27 Dec, 2020 · 1 commit

KVM: arm64: Consolidate dist->ready setting into kvm_vgic_map_resources() · 101068b5

Authored by Marc Zyngier on Dec 27, 2020

dist->ready setting is pointlessly spread across the two vgic
backends, while it could be consolidated in kvm_vgic_map_resources().

Move it there, and slightly simplify the flows in both backends.

Suggested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

101068b5

24 Dec, 2020 · 1 commit

KVM: arm64: Move double-checked lock to kvm_vgic_map_resources() · 1c91f06d

Authored by Alexandru Elisei on Dec 01, 2020

kvm_vgic_map_resources() is called when a VCPU is first run and it maps all
the VGIC MMIO regions. To prevent double-initialization, the VGIC uses the
ready variable to keep track of the state of resources and the global KVM
mutex to protect against concurrent accesses. After the lock is taken, the
variable is checked again in case another VCPU took the lock between the
current VCPU reading ready as false and taking the lock.

The double-checked lock pattern is spread across four different functions:
in kvm_vcpu_first_run_init(), in kvm_vgic_map_resources() and in
vgic_{v2,v3}_map_resources(), which makes it hard to reason about and
introduces minor code duplication. Consolidate the checks in
kvm_vgic_map_resources(), where the lock is taken.

No functional change intended.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201201150157.223625-4-alexandru.elisei@arm.com

1c91f06d
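
For readers unfamiliar with the pattern named in this commit, here is a minimal user-space sketch of a double-checked lock. It assumes C11 atomics and a toy spinlock in place of the global KVM mutex; `struct toy_vgic`, `toy_lock()` and `toy_map_resources()` are hypothetical stand-ins, not kernel interfaces.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_vgic {
    atomic_flag lock;  /* stands in for the global KVM mutex */
    atomic_bool ready; /* stands in for the VGIC ready variable */
    int map_calls;     /* counts how often the mapping step ran */
};

static void toy_lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ; /* spin; a real mutex would sleep instead */
}

static void toy_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

static void toy_map_resources(struct toy_vgic *v)
{
    /* First check, without the lock: the common case once initialised. */
    if (atomic_load_explicit(&v->ready, memory_order_acquire))
        return;

    toy_lock(&v->lock);
    /* Second check, with the lock held: another VCPU may have won the race. */
    if (!atomic_load_explicit(&v->ready, memory_order_relaxed)) {
        v->map_calls++; /* stands in for mapping the VGIC MMIO regions */
        atomic_store_explicit(&v->ready, true, memory_order_release);
    }
    toy_unlock(&v->lock);
}

/* Usage: however often it is called, the mapping step runs once. */
static void double_checked_example(void)
{
    struct toy_vgic v = { .lock = ATOMIC_FLAG_INIT, .ready = false,
                          .map_calls = 0 };

    toy_map_resources(&v);
    toy_map_resources(&v);
    assert(v.map_calls == 1);
}
```

Keeping both checks in the function that takes the lock is what makes the pattern easy to audit, which is the point of the consolidation.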

16 Sep, 2020 · 1 commit

KVM: arm64: nVHE: Fix pointers during SMCCC convertion · a071261d

Authored by Andrew Scull on Sep 15, 2020

The host need not concern itself with the pointer differences for the
hyp interfaces that are shared between VHE and nVHE so leave it to the
hyp to handle.

As the SMCCC function IDs are converted into function calls, it is a
suitable place to also convert any pointer arguments into hyp pointers.
This, additionally, eases the reuse of the handlers in different
contexts.

Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200915104643.2543892-20-ascull@google.com

a071261d

28 May, 2020 · 1 commit

KVM: arm64: vgic-v3: Take cpu_if pointer directly instead of vcpu · fc5d1f1a

Authored by Christoffer Dall on Dec 01, 2018

If we move the used_lrs field to the version-specific cpu interface
structure, the following functions only operate on the struct
vgic_v3_cpu_if and not the full vcpu:

  __vgic_v3_save_state
  __vgic_v3_restore_state
  __vgic_v3_activate_traps
  __vgic_v3_deactivate_traps
  __vgic_v3_save_aprs
  __vgic_v3_restore_aprs

This is going to be very useful for nested virt, so move the used_lrs
field and change the prototypes and implementations of these functions to
take the cpu_if parameter directly.

No functional change.

Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

fc5d1f1a

16 May, 2020 · 2 commits

KVM: Fix spelling in code comments · 656012c7

Authored by Fuad Tabba on Apr 01, 2020

Fix spelling and typos (e.g., repeated words) in comments.

Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200401140310.29701-1-tabba@google.com

656012c7

KVM: arm64: Move virt/kvm/arm to arch/arm64 · 9ed24f4b

Authored by Marc Zyngier on May 13, 2020

Now that the 32bit KVM/arm host is a distant memory, let's move the
whole of the KVM/arm64 code into the arm64 tree.

As they said in the song: Welcome Home (Sanitarium).

Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200513104034.74741-1-maz@kernel.org

9ed24f4b

24 Mar, 2020 · 2 commits

KVM: arm64: GICv4.1: Plumb SGI implementation selection in the distributor · 2291ff2f

Authored by Marc Zyngier on Mar 04, 2020

The GICv4.1 architecture gives the hypervisor the option to let
the guest choose whether it wants the good old SGIs with an
active state, or the new, HW-based ones that do not have one.

For this, plumb the configuration of SGIs into the GICv3 MMIO
handling, present the GICD_TYPER2.nASSGIcap to the guest,
and handle the GICD_CTLR.nASSGIreq setting.

In order to be able to deal with the restore of a guest, also
apply the GICD_CTLR.nASSGIreq setting at first run so that we
can move the restored SGIs to the HW if that's what the guest
had selected in a previous life.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20200304203330.4967-21-maz@kernel.org

2291ff2f

irqchip/gic-v4.1: Move doorbell management to the GICv4 abstraction layer · ae699ad3

Authored by Marc Zyngier on Mar 04, 2020

In order to hide some of the differences between v4.0 and v4.1, move
the doorbell management out of the KVM code, and into the GICv4-specific
layer. This allows the calling code to ask for the doorbell when blocking,
and otherwise to leave the doorbell permanently disabled.

This matches the v4.1 code perfectly, and only results in a minor
refactoring of the v4.0 code.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20200304203330.4967-14-maz@kernel.org

ae699ad3

29 Oct, 2019 · 3 commits

KVM: arm/arm64: vgic: Don't rely on the wrong pending table · ca185b26

Authored by Zenghui Yu on Oct 29, 2019

It's possible that two LPIs are located at the same "byte_offset" but
target two different vcpus, where their pending status is indicated by two
different pending tables.  In such a scenario, using the last_byte_offset
optimization will lead KVM to rely on the wrong pending table entry.
Let us use last_ptr instead, which can be treated as a byte index into
a pending table and can also be vcpu-specific.

Fixes: 28077125 ("KVM: arm64: vgic-v3: KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES")
Cc: stable@vger.kernel.org
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20191029071919.177-4-yuzenghui@huawei.com

ca185b26
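
The bug fixed here is easiest to see with two LPIs whose pending bits sit at the same byte offset of two different tables. The sketch below is a toy model, assuming guest memory is a flat byte array and that each vcpu's pending table starts at a hypothetical `pendbase`; `struct scan_cache` and `lpi_is_pending()` are illustrative names only. The cache is keyed on the full pointer, as the commit proposes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MEM_SIZE 0x4000u

/* One-entry cache of the last pending-table byte that was read. */
struct scan_cache {
    bool     valid;
    uint64_t last_ptr; /* pendbase + byte offset, so it is table-specific */
    uint8_t  last_byte;
};

/* Hypothetical helper: look up one LPI's pending bit in a toy guest memory. */
static bool lpi_is_pending(const uint8_t *guest_mem, uint64_t pendbase,
                           uint32_t intid, struct scan_cache *c)
{
    uint64_t ptr = pendbase + intid / 8;

    assert(ptr < TOY_MEM_SIZE);

    /* Re-read whenever the pointer changes, not merely the byte offset. */
    if (!c->valid || c->last_ptr != ptr) {
        c->last_byte = guest_mem[ptr];
        c->last_ptr = ptr;
        c->valid = true;
    }
    return (c->last_byte >> (intid % 8)) & 1u;
}

/* Usage: LPIs 8192 and 8193 share byte offset 1024 but use different tables. */
static void last_ptr_example(void)
{
    static uint8_t mem[TOY_MEM_SIZE];
    struct scan_cache c = { .valid = false };

    mem[0x1000 + 8192 / 8] = 0x01; /* table A: LPI 8192 pending */
    mem[0x2000 + 8192 / 8] = 0x02; /* table B: LPI 8193 pending */

    assert(lpi_is_pending(mem, 0x1000, 8192, &c));
    /* A cache keyed on byte offset alone would reuse table A's byte here. */
    assert(lpi_is_pending(mem, 0x2000, 8193, &c));
}
```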

KVM: arm/arm64: vgic: Fix some comments typo · bad36e4e

Authored by Zenghui Yu on Oct 29, 2019

Fix various comments, including wrong function names, grammar mistakes
and specification references.

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20191029071919.177-3-yuzenghui@huawei.com

bad36e4e

KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/put · 8e01d9a3

Authored by Marc Zyngier on Oct 27, 2019

When the VHE code was reworked, a lot of the vgic stuff was moved around,
but the GICv4 residency code did stay untouched, meaning that we come
in and out of residency on each flush/sync, which is obviously suboptimal.

To address this, let's move things around a bit:

- Residency entry (flush) moves to vcpu_load
- Residency exit (sync) moves to vcpu_put
- On blocking (entry to WFI), we "put"
- On unblocking (exit from WFI), we "load"

Because these can nest (load/block/put/load/unblock/put, for example),
we now have per-VPE tracking of the residency state.

Additionally, vgic_v4_put gains a "need doorbell" parameter, which only
gets set to true when blocking because of a WFI. This allows a finer
control of the doorbell, which now also gets disabled as soon as
it gets signaled.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20191027144234.8395-2-maz@kernel.org

8e01d9a3
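
The nesting rule in this commit (load and put may be called while already in the requested state) can be modelled with a per-VPE flag. This is a simplified sketch with hypothetical names (`struct toy_vpe`, `toy_v4_load()`, `toy_v4_put()`); it does not model the doorbell being disabled once it has been signaled.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_vpe {
    bool resident;         /* per-VPE tracking of the residency state */
    bool doorbell_enabled;
    int  transitions;      /* counts real residency changes */
};

/* Make the VPE resident; a nested load is a no-op. */
static void toy_v4_load(struct toy_vpe *vpe)
{
    if (vpe->resident)
        return;
    vpe->resident = true;
    vpe->doorbell_enabled = false;
    vpe->transitions++;
}

/* Make the VPE non-resident; only a blocking put asks for the doorbell. */
static void toy_v4_put(struct toy_vpe *vpe, bool need_doorbell)
{
    if (!vpe->resident)
        return;
    vpe->resident = false;
    vpe->doorbell_enabled = need_doorbell;
    vpe->transitions++;
}

/* Usage: the load/block/put/load/unblock/put sequence from the commit. */
static void residency_example(void)
{
    struct toy_vpe vpe = { 0 };

    toy_v4_load(&vpe);        /* vcpu_load */
    toy_v4_put(&vpe, true);   /* block on WFI: doorbell requested */
    assert(vpe.doorbell_enabled);
    toy_v4_put(&vpe, false);  /* vcpu_put: already non-resident, no-op */
    assert(vpe.doorbell_enabled);
    toy_v4_load(&vpe);        /* vcpu_load */
    toy_v4_load(&vpe);        /* unblock: already resident, no-op */
    toy_v4_put(&vpe, false);  /* vcpu_put */

    assert(vpe.transitions == 4);
    assert(!vpe.resident && !vpe.doorbell_enabled);
}
```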

28 Aug, 2019 · 1 commit

KVM: arm/arm64: vgic-v2: Handle SGI bits in GICD_I{S,C}PENDR0 as WI · 82e40f55

Authored by Marc Zyngier on Aug 28, 2019

A guest is not allowed to inject a SGI (or clear its pending state)
by writing to GICD_ISPENDR0 (resp. GICD_ICPENDR0), as these bits are
defined as WI (as per ARM IHI 0048B 4.3.7 and 4.3.8).

Make sure we correctly emulate the architecture.

Fixes: 96b29800 ("KVM: arm/arm64: vgic-new: Add PENDING registers handlers")
Cc: stable@vger.kernel.org # 4.7+
Reported-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>

82e40f55
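
Write-ignore (WI) behaviour for the SGI bits is a simple mask on the written value. The following sketch assumes a hypothetical helper, `pendr_effective_write()`, that filters a 32-bit write to GICD_ISPENDR<n> or GICD_ICPENDR<n> before it is applied; it is an illustration of the rule, not the kernel's MMIO handler.

```c
#include <assert.h>
#include <stdint.h>

#define NR_SGIS 16u /* INTIDs 0..15 are SGIs */

/*
 * Hypothetical helper: register 0 covers INTIDs 0..31, and its low
 * 16 bits are the SGIs, which a guest write must not affect.
 */
static uint32_t pendr_effective_write(unsigned int n, uint32_t val)
{
    if (n == 0)
        val &= ~((1u << NR_SGIS) - 1u);
    return val;
}

/* Usage: SGI bits are dropped in register 0 and kept everywhere else. */
static void sgi_wi_example(void)
{
    assert(pendr_effective_write(0, 0xffffffffu) == 0xffff0000u);
    assert(pendr_effective_write(1, 0x0000ffffu) == 0x0000ffffu);
}
```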

19 Aug, 2019 · 1 commit

KVM: arm/arm64: vgic: Make function comments match function declarations · 0ed5f5d6

Authored by Alexandru Elisei on Aug 15, 2019

Since commit 503a6286 ("KVM: arm/arm64: vgic: Rely on the GIC driver to
parse the firmware tables"), the vgic_v{2,3}_probe functions stopped using
a DT node. Commit 90977732 ("KVM: arm/arm64: vgic-new: vgic_init:
implement kvm_vgic_hyp_init") changed the functions again, and now they
require exactly one argument, a struct gic_kvm_info populated by the GIC
driver. Unfortunately the comments regressed and state that a DT node is
used instead. Change the function comments to reflect the current
prototypes.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

0ed5f5d6

05 Aug, 2019 · 1 commit

KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block · 5eeaf10e

Authored by Marc Zyngier on Aug 02, 2019

Since commit 328e5664 ("KVM: arm/arm64: vgic: Defer
touching GICH_VMCR to vcpu_load/put"), we leave ICH_VMCR_EL2 (or
its GICv2 equivalent) loaded as long as we can, only syncing it
back when we're scheduled out.

There is a small snag with that though: kvm_vgic_vcpu_pending_irq(),
which is indirectly called from kvm_vcpu_check_block(), needs to
evaluate the guest's view of ICC_PMR_EL1. At the point where we
call kvm_vcpu_check_block(), the vcpu is still loaded, and whatever
change was made to PMR is not visible in memory until we do a vcpu_put().

Things go really south if the guest does the following:

	mov x0, #0	// or any small value masking interrupts
	msr ICC_PMR_EL1, x0

	[vcpu preempted, then rescheduled, VMCR sampled]

	mov x0, #0xff	// allow all interrupts
	msr ICC_PMR_EL1, x0
	wfi		// traps to EL2, so sampling of VMCR

	[interrupt arrives just after WFI]

Here, the hypervisor's view of PMR is zero, while the guest has enabled
its interrupts. kvm_vgic_vcpu_pending_irq() will then say that no
interrupts are pending (despite an interrupt being received) and we'll
block for no reason. If the guest doesn't have a periodic interrupt
firing once it has blocked, it will stay there forever.

To avoid this unfortunate situation, let's resync VMCR from
kvm_arch_vcpu_blocking(), ensuring that a following kvm_vcpu_check_block()
will observe the latest value of PMR.

This has been found by booting an arm64 Linux guest with the pseudo NMI
feature, and thus using interrupt priorities to mask interrupts instead
of the usual PSTATE masking.

Cc: stable@vger.kernel.org # 4.12
Fixes: 328e5664 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put")
Signed-off-by: Marc Zyngier <maz@kernel.org>

5eeaf10e
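
The stale-PMR scenario in this commit can be replayed with a toy model that keeps a live PMR and an in-memory copy. All names here (`struct toy_vcpu`, `toy_sync_vmcr()`, `toy_would_block()`) are hypothetical; the only architectural fact assumed is that an interrupt is signaled when its priority value is lower than PMR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_vcpu {
    uint8_t hw_pmr;       /* live ICC_PMR_EL1 value while the vcpu is loaded */
    uint8_t saved_pmr;    /* in-memory copy, refreshed only on sync */
    bool    irq_pending;
    uint8_t irq_priority; /* a lower value means a higher priority */
};

/* Models syncing VMCR back to memory. */
static void toy_sync_vmcr(struct toy_vcpu *v)
{
    v->saved_pmr = v->hw_pmr;
}

/* Models the pending check: it only sees the in-memory PMR. */
static bool toy_pending_irq(const struct toy_vcpu *v)
{
    return v->irq_pending && v->irq_priority < v->saved_pmr;
}

/* Decide whether the vcpu would block, with or without the resync. */
static bool toy_would_block(struct toy_vcpu *v, bool resync_on_block)
{
    if (resync_on_block)
        toy_sync_vmcr(v);
    return !toy_pending_irq(v);
}

/* Usage: replay the sequence from the commit message. */
static void stale_pmr_example(void)
{
    struct toy_vcpu v = { .hw_pmr = 0 };

    toy_sync_vmcr(&v);      /* preempted with PMR == 0, VMCR sampled */
    v.hw_pmr = 0xff;        /* guest unmasks all interrupts, then WFI */
    v.irq_pending = true;   /* interrupt arrives just after WFI */
    v.irq_priority = 0xa0;

    assert(toy_would_block(&v, false)); /* stale view: blocks for no reason */
    assert(!toy_would_block(&v, true)); /* resynced view: stays runnable */
}
```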

19 Jun, 2019 · 1 commit

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234 · caab277b

Authored by Thomas Gleixner on Jun 03, 2019

Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license version 2 as
  published by the free software foundation this program is
  distributed in the hope that it will be useful but without any
  warranty without even the implied warranty of merchantability or
  fitness for a particular purpose see the gnu general public license
  for more details you should have received a copy of the gnu general
  public license along with this program if not see http www gnu org
  licenses

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 503 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Enrico Weigelt <info@metux.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

caab277b

20 Mar, 2019 · 1 commit

KVM: arm/arm64: vgic-its: Take the srcu lock when writing to guest memory · a6ecfb11

Authored by Marc Zyngier on Mar 19, 2019

When halting a guest, QEMU flushes the virtual ITS caches, which
amounts to writing to the various tables that the guest has allocated.

When doing this, we fail to take the srcu lock, and the kernel
shouts loudly if running a lockdep kernel:

[   69.680416] =============================
[   69.680819] WARNING: suspicious RCU usage
[   69.681526] 5.1.0-rc1-00008-g600025238f51-dirty #18 Not tainted
[   69.682096] -----------------------------
[   69.682501] ./include/linux/kvm_host.h:605 suspicious rcu_dereference_check() usage!
[   69.683225]
[   69.683225] other info that might help us debug this:
[   69.683225]
[   69.683975]
[   69.683975] rcu_scheduler_active = 2, debug_locks = 1
[   69.684598] 6 locks held by qemu-system-aar/4097:
[   69.685059]  #0: 0000000034196013 (&kvm->lock){+.+.}, at: vgic_its_set_attr+0x244/0x3a0
[   69.686087]  #1: 00000000f2ed935e (&its->its_lock){+.+.}, at: vgic_its_set_attr+0x250/0x3a0
[   69.686919]  #2: 000000005e71ea54 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[   69.687698]  #3: 00000000c17e548d (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[   69.688475]  #4: 00000000ba386017 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[   69.689978]  #5: 00000000c2c3c335 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[   69.690729]
[   69.690729] stack backtrace:
[   69.691151] CPU: 2 PID: 4097 Comm: qemu-system-aar Not tainted 5.1.0-rc1-00008-g600025238f51-dirty #18
[   69.691984] Hardware name: rockchip evb_rk3399/evb_rk3399, BIOS 2019.04-rc3-00124-g2feec69fb1 03/15/2019
[   69.692831] Call trace:
[   69.694072]  lockdep_rcu_suspicious+0xcc/0x110
[   69.694490]  gfn_to_memslot+0x174/0x190
[   69.694853]  kvm_write_guest+0x50/0xb0
[   69.695209]  vgic_its_save_tables_v0+0x248/0x330
[   69.695639]  vgic_its_set_attr+0x298/0x3a0
[   69.696024]  kvm_device_ioctl_attr+0x9c/0xd8
[   69.696424]  kvm_device_ioctl+0x8c/0xf8
[   69.696788]  do_vfs_ioctl+0xc8/0x960
[   69.697128]  ksys_ioctl+0x8c/0xa0
[   69.697445]  __arm64_sys_ioctl+0x28/0x38
[   69.697817]  el0_svc_common+0xd8/0x138
[   69.698173]  el0_svc_handler+0x38/0x78
[   69.698528]  el0_svc+0x8/0xc

The fix is to obviously take the srcu lock, just like we do on the
read side of things since bf308242. One wonders why this wasn't
fixed at the same time, but hey...

Fixes: bf308242 ("KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock")
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

a6ecfb11

20 Feb, 2019 · 1 commit

arm/arm64: KVM: Introduce kvm_call_hyp_ret() · 7aa8d146

Authored by Marc Zyngier on Jan 05, 2019

Until now, we haven't differentiated between HYP calls that
have a return value and those that don't. As we're about to
change this, introduce kvm_call_hyp_ret(), and change all
call sites that actually make use of a return value.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>

7aa8d146

24 Jan, 2019 · 1 commit

KVM: arm/arm64: vgic: Make vgic_irq->irq_lock a raw_spinlock · 8fa3adb8

Authored by Julien Thierry on Jan 07, 2019

vgic_irq->irq_lock must always be taken with interrupts disabled as
it is used in interrupt context.

For configurations such as PREEMPT_RT_FULL, this means that it should
be a raw_spinlock since RT spinlocks are interruptible.

Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>

8fa3adb8

12 Aug, 2018 · 1 commit

KVM: arm/arm64: vgic: Do not use spin_lock_irqsave/restore with irq disabled · d0823cb3

Authored by Jia He on Aug 03, 2018

kvm_vgic_sync_hwstate is only called with IRQs disabled.
There is thus no need to call spin_lock_irqsave/restore in
vgic_fold_lr_state and vgic_prune_ap_list.

This patch replaces them with the non irq-safe version.

Signed-off-by: Jia He <jia.he@hxt-semitech.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
[maz: commit message tidy-up]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

d0823cb3

21 Jul, 2018 · 1 commit

KVM: arm/arm64: vgic: Signal IRQs using their configured group · 87322099

Authored by Christoffer Dall on Jul 16, 2018

Now that we have a group configuration on the struct IRQ, use this state
when populating the LR and signaling interrupts as either group 0 or
group 1 to the VM.  Depending on the model of the emulated GIC, and the
guest's configuration of the VMCR, interrupts may be signaled as IRQs or
FIQs to the VM.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

87322099

21 Jun, 2018 · 1 commit

KVM: arm/arm64: Drop resource size check for GICV window · ba56bc3a

Authored by Ard Biesheuvel on Jun 01, 2018

When booting a 64 KB pages kernel on an ACPI GICv3 system that
implements support for v2 emulation, the following warning is
produced:

  GICV size 0x2000 not a multiple of page size 0x10000

and support for v2 emulation is disabled, preventing GICv2 VMs
from being able to run on such hosts.

The reason is that vgic_v3_probe() performs a sanity check on the
size of the window (it should be a multiple of the page size),
while the ACPI MADT parsing code hardcodes the size of the window
to 8 KB. This makes sense, considering that ACPI does not bother
to describe the size in the first place, under the assumption that
platforms implementing ACPI will follow the architecture and not
put anything else in the same 64 KB window.

So let's just drop the sanity check altogether, and assume that
the window is at least 64 KB in size.

Fixes: 90977732 ("KVM: arm/arm64: vgic-new: vgic_init: implement kvm_vgic_hyp_init")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

ba56bc3a

25 May, 2018 · 5 commits

KVM: arm/arm64: Implement KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION · 04c11093

Authored by Eric Auger on May 22, 2018

Now that all the internals are ready to handle multiple redistributor
regions, let's allow the userspace to register them.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

04c11093

KVM: arm/arm64: Check all vcpu redistributors are set on map_resources · c957a6d6

Authored by Eric Auger on May 22, 2018

On vcpu first run, we finally know the actual number of vcpus.
This is a synchronization point to check that all redistributors
were assigned. In kvm_vgic_map_resources() we check that both dist and
redist were set, then check for potential base address inconsistencies.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

c957a6d6

KVM: arm/arm64: Adapt vgic_v3_check_base to multiple rdist regions · 028bf278

Authored by Eric Auger on May 22, 2018

vgic_v3_check_base() currently only handles the case of a unique
legacy redistributor region whose size is not explicitly set but
inferred, instead, from the number of online vcpus.

We adapt it to handle the case of multiple redistributor regions
with explicitly defined size. We rely on two new helpers:
- vgic_v3_rdist_overlap() is used to detect overlap with the dist
  region if defined
- vgic_v3_rd_region_size() computes the size of the redist region,
  whether it is a legacy unique region or a new explicitly sized
  region.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

028bf278
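
The two helpers named in this commit can be approximated as follows. This is a sketch under assumptions: a redistributor occupies 2x64KB, a `count` of zero marks the legacy region whose size is inferred from the number of online vcpus, and the distributor size is passed in rather than hard-coded. The `toy_` names and the addresses in the example are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_REDIST_SIZE (2 * 0x10000ULL) /* one 2x64KB redistributor */

struct toy_rdist_region {
    uint64_t base;
    uint32_t count; /* number of redistributors; 0 means legacy region */
};

/* Size of a region: explicit count, or inferred from the online vcpus. */
static uint64_t toy_rd_region_size(const struct toy_rdist_region *r,
                                   uint32_t online_vcpus)
{
    uint32_t n = r->count ? r->count : online_vcpus;

    return (uint64_t)n * TOY_REDIST_SIZE;
}

/* True if the region intersects [dist_base, dist_base + dist_size). */
static bool toy_rdist_overlaps_dist(const struct toy_rdist_region *r,
                                    uint32_t online_vcpus,
                                    uint64_t dist_base, uint64_t dist_size)
{
    uint64_t size = toy_rd_region_size(r, online_vcpus);

    return dist_base + dist_size > r->base && dist_base < r->base + size;
}

/* Usage: one legacy region and one explicitly sized region. */
static void rdist_region_example(void)
{
    struct toy_rdist_region legacy   = { .base = 0x08010000, .count = 0 };
    struct toy_rdist_region explicit_r = { .base = 0x08008000, .count = 1 };

    assert(toy_rd_region_size(&legacy, 4) == 0x80000);
    assert(toy_rd_region_size(&explicit_r, 4) == 0x20000);

    /* Distributor at 0x08000000, 64KB: adjacent to legacy, overlaps the other. */
    assert(!toy_rdist_overlaps_dist(&legacy, 4, 0x08000000, 0x10000));
    assert(toy_rdist_overlaps_dist(&explicit_r, 4, 0x08000000, 0x10000));
}
```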

KVM: arm/arm64: Helper to locate free rdist index · dc524619

Authored by Eric Auger on May 22, 2018

We introduce vgic_v3_rdist_free_slot() to help identify
where we can place a new 2x64KB redistributor.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

dc524619

KVM: arm/arm64: Replace the single rdist region by a list · dbd9733a

Authored by Eric Auger on May 22, 2018

At the moment KVM supports a single rdist region. We want to
support several separate rdist regions so let's introduce a list
of them. This patch currently only cares about a single
entry in this list as the functionality to register several redist
regions is not yet there. So this only translates the existing code
into something functionally similar using that new data struct.

The redistributor region handle is stored in the vgic_cpu structure
to allow later computation of the TYPER last bit.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

dbd9733a

15 May, 2018 · 1 commit

KVM: arm/arm64: VGIC/ITS save/restore: protect kvm_read_guest() calls · 711702b5

Authored by Andre Przywara on May 11, 2018

kvm_read_guest() will eventually look up in kvm_memslots(), which requires
either holding the kvm->slots_lock or being inside a kvm->srcu critical
section.
In contrast to x86 and s390 we don't take the SRCU lock on every guest
exit, so we have to do it individually for each kvm_read_guest() call.
Use the newly introduced wrapper for that.

Cc: Stable <stable@vger.kernel.org> # 4.12+
Reported-by: Jan Glauber <jan.glauber@caviumnetworks.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

711702b5
