Commits · 6f2f10cabe73944488a62df16695c86e20d4c3f9 · openeuler / Kernel

15 Jun, 2017 · 1 commit
  
Merge branch 'kvmarm-master/master' into HEAD · 6f2f10ca

Committed by Marc Zyngier on Jun 15, 2017
  
08 Jun, 2017 · 9 commits

KVM: arm/arm64: Don't assume initialized vgic when setting PMU IRQ · ebb127f2

Committed by Christoffer Dall on May 16, 2017

The PMU IRQ number is set through the VCPU device's KVM_SET_DEVICE_ATTR
ioctl handler for the KVM_ARM_VCPU_PMU_V3_IRQ attribute, but there is no
enforced or stated requirement that this must happen after initializing
the VGIC. As a result, calling vgic_valid_spi(), which relies on
nr_spis being set during VGIC init, can incorrectly fail.

Introduce irq_is_spi, which determines if an IRQ number is within the
SPI range without verifying it against the actual VGIC properties.
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>

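For illustration only, the difference between an architectural range check and a check against a configured VGIC can be sketched in plain C. The INTID ranges follow the GIC architecture (SGIs 0-15, PPIs 16-31, SPIs 32-1019); the function names and the `nr_spis` parameter are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdbool.h>

/* GIC architectural INTID ranges (SGIs 0-15, PPIs 16-31, SPIs 32-1019). */
#define NR_SGIS          16u
#define NR_PRIVATE_IRQS  32u   /* SGIs + PPIs */
#define MAX_SPI_INTID    1019u

/* Architectural check only: no VGIC state is consulted, so it gives the
 * right answer even before the VGIC has been initialized. */
static bool sketch_irq_is_spi(unsigned int irq)
{
    return irq >= NR_PRIVATE_IRQS && irq <= MAX_SPI_INTID;
}

static bool sketch_irq_is_ppi(unsigned int irq)
{
    return irq >= NR_SGIS && irq < NR_PRIVATE_IRQS;
}

/* Stricter check that needs a configured VGIC: before VGIC init nr_spis
 * is still 0, so every SPI is rejected (the failure described above). */
static bool sketch_vgic_valid_spi(unsigned int irq, unsigned int nr_spis)
{
    return irq >= NR_PRIVATE_IRQS && irq < NR_PRIVATE_IRQS + nr_spis;
}
```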

KVM: arm/arm64: Disallow userspace control of in-kernel IRQ lines · cb3f0ad8

Committed by Christoffer Dall on May 16, 2017

When injecting an IRQ to the VGIC, you now have to present an owner
token for that IRQ line to show that you are the owner of that line.

IRQ lines driven from userspace or via an irqfd do not have an owner and
will simply pass a NULL pointer.

Also get rid of the unused kvm_vgic_inject_mapped_irq prototype.
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>

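A minimal sketch of the owner-token idea, assuming a hypothetical per-line `owner` field: the caller must present the same token that owns the line, so userspace and irqfd callers (which pass NULL) can only drive unowned lines. The structure, function name and the `-EPERM` return value are illustrative assumptions, not the kernel's exact code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-interrupt state; owner is NULL for lines that no
 * in-kernel device has claimed. */
struct sketch_irq {
    void *owner;
    bool  line_level;
};

/* Drive an IRQ line. The token must match the line's owner exactly:
 * NULL for unowned lines, the device's token for claimed ones. */
static int sketch_inject_irq(struct sketch_irq *irq, bool level, void *owner)
{
    if (irq->owner != owner)
        return -EPERM;

    irq->line_level = level;
    return 0;
}
```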

KVM: arm/arm64: Check if irq lines to the GIC are already used · abcb851d

Committed by Christoffer Dall on May 04, 2017

We check if other in-kernel devices have already been connected to the
GIC for a particular interrupt line when possible.

For the PMU, we can do this whenever setting the PMU interrupt number
from userspace.

For the timers, we have to wait until we try to enable the timer,
because we have a concept of default IRQ numbers that userspace
shouldn't have to work around in the initialization phase.
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>

KVM: arm/arm64: Introduce an allocator for in-kernel irq lines · c6ccd30e

Committed by Christoffer Dall on May 04, 2017

Allowing multiple devices to signal the same interrupt line is very
confusing and almost certainly indicates a configuration error.

Therefore, introduce a very simple allocator which allows a device to
claim an interrupt line from the vgic for a given VM.
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>

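The allocator can be pictured as a first-come, first-served claim on each interrupt line. This sketch assumes a flat, fixed-size array of owner pointers and hypothetical names; the table size and the `-EINVAL`/`-EBUSY` return values are illustrative choices, not taken from the kernel.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_NR_IRQS 64u   /* hypothetical size of the VM's INTID space */

/* owners[intid] is NULL while the line is free. */
struct sketch_vgic {
    void *owners[SKETCH_NR_IRQS];
};

/* Claim an interrupt line for the device identified by 'owner'.
 * Re-claiming by the same owner is harmless; a claim by a different
 * owner fails, which is how a second device on the same line is caught. */
static int sketch_claim_irq(struct sketch_vgic *vgic, unsigned int intid,
                            void *owner)
{
    if (intid >= SKETCH_NR_IRQS || owner == NULL)
        return -EINVAL;

    if (vgic->owners[intid] != NULL && vgic->owners[intid] != owner)
        return -EBUSY;

    vgic->owners[intid] = owner;
    return 0;
}
```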

KVM: arm/arm64: Allow setting the timer IRQ numbers from userspace · 99a1db7a

Committed by Christoffer Dall on May 02, 2017

First we define an ABI using the vcpu devices that lets userspace set
the interrupt numbers for the various timers on both the 32-bit and
64-bit KVM/ARM implementations.

Second, we add the definitions for the groups and attributes introduced
by the above ABI.  (We add the PMU define on the 32-bit side as well for
symmetry and it may get used some day.)

Third, we set up the arch-specific vcpu device operation handlers to
call into the timer code for anything related to the
KVM_ARM_VCPU_TIMER_CTRL group.

Fourth, we implement support for getting and setting the timer interrupt
numbers using the above defined ABI in the arch timer code.

Fifth, we introduce error checking upon enabling the arch timer (which
is called when first running a VCPU) to check that all VCPUs are
configured to use the same PPI for the timer (as mandated by the
architecture) and that the virtual and physical timers are not
configured to use the same IRQ number.
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>

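The fifth step, validating the configuration when the timer is first enabled, might look roughly like the check below. It assumes each VCPU's timer setup is reduced to a hypothetical pair of INTIDs; the struct and function names are illustrative, and the PPI-range test is an extra assumption beyond what the text states.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-VCPU timer configuration (INTIDs). */
struct sketch_timer_cfg {
    unsigned int vtimer_irq;
    unsigned int ptimer_irq;
};

static bool sketch_is_ppi(unsigned int irq)
{
    return irq >= 16 && irq < 32;
}

/* Accept the configuration only if every VCPU uses the same PPI for each
 * timer and the virtual and physical timers do not share an IRQ number. */
static bool sketch_timer_irqs_are_valid(const struct sketch_timer_cfg *cfg,
                                        size_t nr_vcpus)
{
    if (nr_vcpus == 0)
        return true;

    unsigned int vtimer = cfg[0].vtimer_irq;
    unsigned int ptimer = cfg[0].ptimer_irq;

    if (!sketch_is_ppi(vtimer) || !sketch_is_ppi(ptimer) || vtimer == ptimer)
        return false;

    for (size_t i = 1; i < nr_vcpus; i++) {
        if (cfg[i].vtimer_irq != vtimer || cfg[i].ptimer_irq != ptimer)
            return false;
    }
    return true;
}
```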

KVM: arm/arm64: Move timer IRQ default init to arch_timer.c · 85e69ad7

Committed by Christoffer Dall on May 02, 2017

We currently initialize the arch timer IRQ numbers from the reset code,
presumably because we once intended to model multiple CPU or SoC types
from within the kernel and have hard-coded reset values in the reset
code.

As we are moving towards userspace being in charge of more fine-grained
CPU emulation and stitching together the pieces needed to emulate a
particular type of CPU, we should no longer have a tight coupling
between resetting a VCPU and setting IRQ numbers.

Therefore, move the logic to define and use the default IRQ numbers to
the timer code and set the IRQ number immediately when creating the
VCPU.
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>

KVM: arm/arm64: Move irq_is_ppi() to header file · 3cba4af3

Committed by Christoffer Dall on May 02, 2017

We are about to need this define in the arch timer code as well so move
it to a common location.
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>

KVM: arm: Handle VCPU device attributes in guest.c · 2227e439

Committed by Christoffer Dall on May 02, 2017

As we are about to support VCPU attributes to set the timer IRQ numbers
in guest.c, move the static inlines for the VCPU attributes handlers
from the header file to guest.c.
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>

KVM: arm64: Allow creating the PMU without the in-kernel GIC · a2befacf

Committed by Christoffer Dall on May 02, 2017

Since we got support for devices in userspace which allows reporting the
PMU overflow output status to userspace, we should actually allow
creating the PMU on systems without an in-kernel irqchip, which in turn
requires us to slightly clarify error codes for the ABI and move things
around for the initialization phase.
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>

07 Jun, 2017 · 3 commits

arm: KVM: Allow unaligned accesses at HYP · 33b5c388

Committed by Marc Zyngier on Jun 06, 2017

We currently have the HSCTLR.A bit set, trapping unaligned accesses
at HYP, but we're not really prepared to deal with it.

Since the rest of the kernel is pretty happy about that, let's follow
its example and set HSCTLR.A to zero. Modern CPUs don't really care.

Cc: stable@vger.kernel.org
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>

arm64: KVM: Allow unaligned accesses at EL2 · 78fd6dcf

Committed by Marc Zyngier on Jun 06, 2017

We currently have the SCTLR_EL2.A bit set, trapping unaligned accesses
at EL2, but we're not really prepared to deal with it. This went
unnoticed until GCC 7 started emitting such accesses (in particular,
64-bit writes on a 32-bit boundary).

Since the rest of the kernel is pretty happy about that, let's follow
its example and set SCTLR_EL2.A to zero. Modern CPUs don't really
care.

Cc: stable@vger.kernel.org
Reported-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>

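As a sketch of the register change itself: the A (alignment check) bit is bit 1 of SCTLR_EL2 (and of HSCTLR on 32-bit), so the fix amounts to leaving that bit clear in the value programmed at EL2/HYP. The helpers below show that bit manipulation plus a simplified model of the alignment test an enabled A bit enforces; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCTLR_ELx_A (UINT64_C(1) << 1)   /* Alignment check enable */

/* Return the SCTLR value with alignment checking disabled. */
static uint64_t sketch_sctlr_allow_unaligned(uint64_t sctlr)
{
    return sctlr & ~SCTLR_ELx_A;
}

/* Simplified model: with A set, an access of 'size' bytes (a power of two)
 * faults unless the address is size-aligned, so a 64-bit store to an
 * address that is only 32-bit aligned would trap. */
static bool sketch_access_would_fault(uint64_t sctlr, uint64_t addr,
                                      unsigned int size)
{
    return (sctlr & SCTLR_ELx_A) && (addr & (size - 1)) != 0;
}
```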

arm64: KVM: Preserve RES1 bits in SCTLR_EL2 · d68c1f7f

Committed by Marc Zyngier on Jun 06, 2017

__do_hyp_init has the rather bad habit of ignoring RES1 bits and
writing them back as zero. On a v8.0-8.2 CPU, this doesn't do anything
bad, but may end up being pretty nasty on future revisions of the
architecture.

Let's preserve those bits so that we don't have to fix this later on.

Cc: stable@vger.kernel.org
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>

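The general pattern for not clobbering RES1 bits is to OR the architecture's RES1 mask into whatever value gets written, instead of building the register value from the wanted flags alone. The mask below is a hypothetical placeholder, not the real SCTLR_EL2 RES1 layout, which should be taken from the Arm Architecture Reference Manual.

```c
#include <stdint.h>

/* Placeholder RES1 mask for illustration only; the real SCTLR_EL2 RES1
 * bits are defined by the Arm Architecture Reference Manual. */
#define SKETCH_SCTLR_EL2_RES1 ((UINT64_C(1) << 4) | (UINT64_C(1) << 5))

/* Problematic pattern: RES1 bits end up written back as zero. */
static uint64_t sketch_sctlr_value_buggy(uint64_t flags)
{
    return flags;
}

/* Preserving pattern: RES1 bits are always written as one. */
static uint64_t sketch_sctlr_value(uint64_t flags)
{
    return SKETCH_SCTLR_EL2_RES1 | flags;
}
```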

06 Jun, 2017 · 2 commits

KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pages · d6dbdd3c

Committed by Marc Zyngier on Jun 05, 2017

Under memory pressure, we start ageing pages, which amounts to parsing
the page tables. Since we don't want to allocate any extra level,
we pass NULL for our private allocation cache, which means that
stage2_get_pud() is allowed to fail. This results in the following
splat:

[ 1520.409577] Unable to handle kernel NULL pointer dereference at virtual address 00000008
[ 1520.417741] pgd = ffff810f52fef000
[ 1520.421201] [00000008] *pgd=0000010f636c5003, *pud=0000010f56f48003, *pmd=0000000000000000
[ 1520.429546] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[ 1520.435156] Modules linked in:
[ 1520.438246] CPU: 15 PID: 53550 Comm: qemu-system-aar Tainted: G        W       4.12.0-rc4-00027-g1885c397eaec #7205
[ 1520.448705] Hardware name: FOXCONN R2-1221R-A4/C2U4N_MB, BIOS G31FB12A 10/26/2016
[ 1520.463726] task: ffff800ac5fb4e00 task.stack: ffff800ce04e0000
[ 1520.469666] PC is at stage2_get_pmd+0x34/0x110
[ 1520.474119] LR is at kvm_age_hva_handler+0x44/0xf0
[ 1520.478917] pc : [<ffff0000080b137c>] lr : [<ffff0000080b149c>] pstate: 40000145
[ 1520.486325] sp : ffff800ce04e33d0
[ 1520.489644] x29: ffff800ce04e33d0 x28: 0000000ffff40064
[ 1520.494967] x27: 0000ffff27e00000 x26: 0000000000000000
[ 1520.500289] x25: ffff81051ba65008 x24: 0000ffff40065000
[ 1520.505618] x23: 0000ffff40064000 x22: 0000000000000000
[ 1520.510947] x21: ffff810f52b20000 x20: 0000000000000000
[ 1520.516274] x19: 0000000058264000 x18: 0000000000000000
[ 1520.521603] x17: 0000ffffa6fe7438 x16: ffff000008278b70
[ 1520.526940] x15: 000028ccd8000000 x14: 0000000000000008
[ 1520.532264] x13: ffff7e0018298000 x12: 0000000000000002
[ 1520.537582] x11: ffff000009241b93 x10: 0000000000000940
[ 1520.542908] x9 : ffff0000092ef800 x8 : 0000000000000200
[ 1520.548229] x7 : ffff800ce04e36a8 x6 : 0000000000000000
[ 1520.553552] x5 : 0000000000000001 x4 : 0000000000000000
[ 1520.558873] x3 : 0000000000000000 x2 : 0000000000000008
[ 1520.571696] x1 : ffff000008fd5000 x0 : ffff0000080b149c
[ 1520.577039] Process qemu-system-aar (pid: 53550, stack limit = 0xffff800ce04e0000)
[...]
[ 1521.510735] [<ffff0000080b137c>] stage2_get_pmd+0x34/0x110
[ 1521.516221] [<ffff0000080b149c>] kvm_age_hva_handler+0x44/0xf0
[ 1521.522054] [<ffff0000080b0610>] handle_hva_to_gpa+0xb8/0xe8
[ 1521.527716] [<ffff0000080b3434>] kvm_age_hva+0x44/0xf0
[ 1521.532854] [<ffff0000080a58b0>] kvm_mmu_notifier_clear_flush_young+0x70/0xc0
[ 1521.539992] [<ffff000008238378>] __mmu_notifier_clear_flush_young+0x88/0xd0
[ 1521.546958] [<ffff00000821eca0>] page_referenced_one+0xf0/0x188
[ 1521.552881] [<ffff00000821f36c>] rmap_walk_anon+0xec/0x250
[ 1521.558370] [<ffff000008220f78>] rmap_walk+0x78/0xa0
[ 1521.563337] [<ffff000008221104>] page_referenced+0x164/0x180
[ 1521.569002] [<ffff0000081f1af0>] shrink_active_list+0x178/0x3b8
[ 1521.574922] [<ffff0000081f2058>] shrink_node_memcg+0x328/0x600
[ 1521.580758] [<ffff0000081f23f4>] shrink_node+0xc4/0x328
[ 1521.585986] [<ffff0000081f2718>] do_try_to_free_pages+0xc0/0x340
[ 1521.592000] [<ffff0000081f2a64>] try_to_free_pages+0xcc/0x240
[...]

The trivial fix is to handle this NULL pud value early, rather than
dereferencing it blindly.

Cc: stable@vger.kernel.org
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>

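The shape of the fix, handling a NULL pud before dereferencing it, is sketched below with simplified, hypothetical page-table types. The names echo stage2_get_pud() and stage2_get_pmd() from the trace above, but the real kernel signatures and types differ from this model; here the "cache" is simply a preallocated table.

```c
#include <stddef.h>

/* Simplified, hypothetical two-level model of the stage2 walk. */
struct sketch_pmd { unsigned long val; };
struct sketch_pud { struct sketch_pmd *pmd_table; };

/* Returns NULL when the next level is missing and there is no cache to
 * allocate it from (the ageing path passes cache == NULL). */
static struct sketch_pud *sketch_get_pud(struct sketch_pud *pud,
                                         struct sketch_pmd *cache)
{
    if (pud->pmd_table == NULL) {
        if (cache == NULL)
            return NULL;          /* not allowed to allocate: fail */
        pud->pmd_table = cache;   /* "allocate" from the cache */
    }
    return pud;
}

static struct sketch_pmd *sketch_get_pmd(struct sketch_pud *pud_slot,
                                         struct sketch_pmd *cache,
                                         size_t index)
{
    struct sketch_pud *pud = sketch_get_pud(pud_slot, cache);

    /* The fix: bail out early instead of dereferencing a NULL pud. */
    if (pud == NULL)
        return NULL;

    return &pud->pmd_table[index];
}
```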

KVM: arm/arm64: vgic-v3: Fix nr_pre_bits bitfield extraction · d68356cc

Committed by Christoffer Dall on Jun 04, 2017

We used to extract PRIbits from ICH_VTR_EL2, which was the upper field
in the register word, so a mask wasn't necessary. Now that we have
switched to looking at PREbits, which occupies bits 26 through 28 with
the PRIbits field above it potentially non-zero, we really need to mask
off the field value, otherwise fun things may happen.
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>

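The masking issue can be shown with plain bit arithmetic. In ICH_VTR_EL2, PRIbits sits in bits [31:29] and PREbits in bits [28:26], each encoding the number of implemented bits minus one, so shifting right by 26 without a mask drags PRIbits along. The macro and function names here are illustrative, not the kernel's.

```c
#include <stdint.h>

#define SKETCH_VTR_PREBITS_SHIFT 26
#define SKETCH_VTR_PREBITS_MASK  (UINT32_C(7) << SKETCH_VTR_PREBITS_SHIFT)
#define SKETCH_VTR_PRIBITS_SHIFT 29

/* Buggy: PRIbits (bits 31:29) leak into the result. */
static uint32_t sketch_nr_pre_bits_unmasked(uint32_t vtr)
{
    return (vtr >> SKETCH_VTR_PREBITS_SHIFT) + 1;
}

/* Fixed: isolate bits 28:26 before shifting. */
static uint32_t sketch_nr_pre_bits(uint32_t vtr)
{
    return ((vtr & SKETCH_VTR_PREBITS_MASK) >> SKETCH_VTR_PREBITS_SHIFT) + 1;
}

/* PRIbits is the top field, so a plain shift has always been enough. */
static uint32_t sketch_nr_pri_bits(uint32_t vtr)
{
    return (vtr >> SKETCH_VTR_PRIBITS_SHIFT) + 1;
}
```

For example, with vtr = 0x90000000 (PRIbits = PREbits = 4, i.e. 5 bits each), the masked helper yields 5 while the unmasked one yields 37.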

04 Jun, 2017 · 12 commits

KVM: arm/arm64: timer: remove request-less vcpu kick · 1b6502e5

Committed by Andrew Jones on Jun 04, 2017

The timer work is only scheduled for a VCPU when that VCPU is
blocked. This means we only need to wake it up, not kick (IPI)
it. While calling kvm_vcpu_kick() would just do the wake up,
and not kick, anyway, let's change this to avoid request-less
vcpu kicks, as they're generally not a good idea (see
"Request-less VCPU Kicks" in
Documentation/virtual/kvm/vcpu-requests.rst).
Signed-off-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>

KVM: arm/arm64: PMU: remove request-less vcpu kick · b7484931

Committed by Andrew Jones on Jun 04, 2017

Refactor PMU overflow handling in order to remove the request-less
vcpu kick.  Now, since kvm_vgic_inject_irq() uses vcpu requests,
there should be no chance that a kick sent at just the wrong time
(between the VCPU's call to kvm_pmu_flush_hwstate() and its entry into
guest mode) causes the guest not to see updated GIC state until its
next exit some time later for some other reason.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>

KVM: arm/arm64: use vcpu requests for irq injection · 325f9c64

Committed by Andrew Jones on Jun 04, 2017

Don't use request-less VCPU kicks when injecting IRQs, as a VCPU
kick meant to trigger the interrupt injection could be sent while
the VCPU is outside guest mode, which means no IPI is sent, and
after it has called kvm_vgic_flush_hwstate(), meaning it won't see
the updated GIC state until its next exit some time later for some
other reason.  The receiving VCPU only needs to check this request
in VCPU RUN to handle it. When the request is pending, checking it
issues a memory barrier that ensures all state is visible. See
"Ensuring Requests Are Seen" in
Documentation/virtual/kvm/vcpu-requests.rst.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>

KVM: arm/arm64: change exit request to sleep request · 7b244e2b

Committed by Andrew Jones on Jun 04, 2017

A request called EXIT is too generic. All requests are meant to cause
exits, but different requests have different flags. Let's not make
it difficult to decide if the EXIT request is correct for some case
by just always providing unique requests for each case. This patch
changes EXIT to SLEEP, because that's what the request is asking the
VCPU to do.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Acked-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>

KVM: arm/arm64: optimize VCPU RUN · 424c989b

Committed by Andrew Jones on Jun 04, 2017

We can make a small optimization by not checking the state of
the power_off field on each run. This is done by treating
power_off like pause, only checking it when we get the EXIT
VCPU request. When a VCPU powers off another VCPU, the EXIT
request is already made, so we just need to make sure the
request is also made on self power off. kvm_vcpu_kick() isn't
necessary for these cases, as the VCPU would just be kicking
itself, but we add it anyway as a self kick doesn't cost much,
and it makes the code more future-proof.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>

KVM: arm/arm64: use vcpu requests for power_off · cc9b43f9

Committed by Andrew Jones on Jun 04, 2017

System shutdown is currently using request-less VCPU kicks. This
leaves open a tiny race window, as it doesn't ensure the state
change to power_off is seen by a VCPU just about to enter guest
mode. VCPU requests, OTOH, are guaranteed to be seen (see "Ensuring
Requests Are Seen" in Documentation/virtual/kvm/vcpu-requests.rst).
This patch applies the EXIT request used by pause to power_off,
fixing the race.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>

KVM: arm/arm64: replace pause checks with vcpu request checks · 0592c005

Committed by Andrew Jones on Jun 04, 2017

The current use of KVM_REQ_VCPU_EXIT for pause is fine.  Even the
requester clearing the request is OK, as this is the special case
where the sole requesting thread and receiving VCPU are executing
synchronously (see "Clearing Requests" in
Documentation/virtual/kvm/vcpu-requests.rst). However, that's about
to change, so let's ensure only the receiving VCPU clears the
request. Additionally, by guaranteeing KVM_REQ_VCPU_EXIT is always
set when pause is, we can avoid checking pause directly in VCPU RUN.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>

KVM: arm/arm64: properly use vcpu requests · 6a6d73be

Committed by Andrew Jones on Jun 04, 2017

arm/arm64 already has one VCPU request used when setting pause,
but it doesn't properly check requests in VCPU RUN. Check it
and also make sure we set vcpu->mode at the appropriate time
(before the check) and with the appropriate barriers. See
Documentation/virtual/kvm/vcpu-requests.rst. Also make sure we
don't leave any vcpu requests we don't intend to handle later
set in the request bitmap. If we don't clear them, then
kvm_request_pending() may return true when it shouldn't.

Using VCPU requests properly fixes a small race where pause
could get set just as a VCPU was entering guest mode.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>

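The ordering described here (set vcpu->mode before the request check, with the appropriate barriers) can be modeled with C11 atomics: the VCPU publishes that it is entering guest mode, issues a full barrier, then checks for pending requests; the requester sets the request bit, issues a full barrier, then sends an IPI only if the VCPU is in guest mode. With both fences in place, either the VCPU sees the request or the requester sees IN_GUEST_MODE, so no request is missed. All names are hypothetical stand-ins for vcpu->mode, vcpu->requests and the kick logic.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum sketch_mode { OUTSIDE_GUEST_MODE, IN_GUEST_MODE };

struct sketch_vcpu {
    _Atomic int mode;
    atomic_ulong requests;
};

/* Requester side. Returns true when an IPI is needed to force the VCPU
 * out of guest mode. (The seq_cst RMW is already a full barrier; the
 * explicit fence mirrors the smp_mb() of the documented pattern.) */
static bool sketch_make_request_and_kick(struct sketch_vcpu *vcpu,
                                         unsigned int req)
{
    atomic_fetch_or(&vcpu->requests, 1UL << req);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load(&vcpu->mode) == IN_GUEST_MODE;
}

/* VCPU side, just before entering the guest. Returns true if it may
 * enter, false if it must go back and handle pending requests. */
static bool sketch_vcpu_try_enter_guest(struct sketch_vcpu *vcpu)
{
    atomic_store(&vcpu->mode, IN_GUEST_MODE);
    atomic_thread_fence(memory_order_seq_cst);
    if (atomic_load(&vcpu->requests) != 0) {
        atomic_store(&vcpu->mode, OUTSIDE_GUEST_MODE);
        return false;
    }
    return true;
}
```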

KVM: Add documentation for VCPU requests · 3bb96149

Committed by Andrew Jones on Jun 04, 2017

Signed-off-by: Andrew Jones <drjones@redhat.com>
Acked-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>

KVM: add kvm_request_pending · 2fa6e1e1

Committed by Radim Krčmář on Jun 04, 2017

A first step in vcpu->requests encapsulation.  Additionally, we now
use READ_ONCE() when accessing vcpu->requests, which ensures we
always load vcpu->requests when it's accessed.  This is important as
other threads can change it any time.  Also, READ_ONCE() documents
that vcpu->requests is used with other threads, likely requiring
memory barriers, which it does.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
[ Documented the new use of READ_ONCE() and converted another check
  in arch/mips/kvm/vz.c ]
Signed-off-by: Andrew Jones <drjones@redhat.com>
Acked-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>

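A small model of what encapsulating vcpu->requests provides: a single "is anything pending?" helper that performs a fresh load on every call (the role READ_ONCE() plays in the kernel), plus a test-and-clear helper that only the receiving VCPU calls. C11 atomics stand in for the kernel primitives, and all names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_vcpu_reqs {
    atomic_ulong requests;
};

/* Analogue of kvm_request_pending(): the compiler cannot cache a stale
 * value of 'requests' across calls or loop iterations. */
static bool sketch_request_pending(struct sketch_vcpu_reqs *v)
{
    return atomic_load_explicit(&v->requests, memory_order_relaxed) != 0;
}

static void sketch_make_request(struct sketch_vcpu_reqs *v, unsigned int req)
{
    atomic_fetch_or(&v->requests, 1UL << req);
}

/* Test-and-clear, called only by the receiving VCPU. */
static bool sketch_check_request(struct sketch_vcpu_reqs *v, unsigned int req)
{
    unsigned long bit = 1UL << req;

    if (!(atomic_load(&v->requests) & bit))
        return false;
    atomic_fetch_and(&v->requests, ~bit);
    return true;
}
```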

KVM: improve arch vcpu request defining · 2387149e

Committed by Andrew Jones on Jun 04, 2017

Marc Zyngier suggested that we define the arch specific VCPU request
base, rather than requiring each arch to remember to start from 8.
That suggestion, along with Radim Krcmar's recent VCPU request flag
addition, snowballed into defining something of an arch VCPU request
defining API.

No functional change.

(Looks like x86 is running out of arch VCPU request bits.  Maybe
 someday we'll need to extend to 64.)
Signed-off-by: Andrew Jones <drjones@redhat.com>
Acked-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>

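The idea of a shared arch request base can be sketched as a macro that offsets each architecture's request number past the generic ones, plus a compile-time bound check. The base of 8 comes from the text above; the macro names, the example request names and the 32-bit bitmap width are illustrative assumptions rather than the kernel's definitions.

```c
#include <assert.h>

/* Generic requests occupy bits 0..7; arch requests start after them. */
#define SKETCH_REQUEST_ARCH_BASE 8
#define SKETCH_NR_REQUEST_BITS   32   /* assumed width of the bitmap */

/* Translate an arch-local request number into a global bit number. */
#define SKETCH_ARCH_REQ(nr) ((nr) + SKETCH_REQUEST_ARCH_BASE)

/* Example arch requests, numbered from 0 instead of from 8. */
#define SKETCH_REQ_SLEEP       SKETCH_ARCH_REQ(0)
#define SKETCH_REQ_IRQ_PENDING SKETCH_ARCH_REQ(1)

/* Compile-time check that the arch requests still fit in the bitmap. */
static_assert(SKETCH_REQ_IRQ_PENDING < SKETCH_NR_REQUEST_BITS,
              "arch request number out of range");
```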

KVM: arm/arm64: Use uaccess functions for GICv3 {sc}active · 0710f9a6

Committed by Christoffer Dall on Jun 04, 2017

We recently rewrote the sactive and cactive handlers to take the kvm
lock for guest accesses to these registers. However, when accessed from
userspace this lock is already held. Unfortunately we forgot to change
the private accessors for GICv3, because these are redistributor
registers and not distributor registers.
Signed-off-by: Christoffer Dall <cdall@linaro.org>

24 May, 2017 · 1 commit

KVM: arm/arm64: Fix isues with GICv2 on GICv3 migration · 28232a43

Committed by Christoffer Dall on May 20, 2017

We have been a little loose with our intermediate VMCR representation
where we had a 'ctlr' field, but we failed to differentiate between the
GICv2 GICC_CTLR and ICC_CTLR_EL1 layouts, and therefore ended up mapping
the wrong bits into the individual fields of the ICH_VMCR_EL2 when
emulating a GICv2 on a GICv3 system.

Fix this by using explicit fields for the VMCR bits instead.

Cc: Eric Auger <eric.auger@redhat.com>
Reported-by: wanghaibin <wanghaibin.wang@huawei.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>

23 May, 2017 · 3 commits

KVM: arm/arm64: Simplify active_change_prepare and plug race · abd72296

Committed by Christoffer Dall on May 06, 2017

We don't need to stop a specific VCPU when changing the active state,
because private IRQs can only be modified by a running VCPU for the
VCPU itself and it is therefore already stopped.

However, it is also possible for two VCPUs to be modifying the active
state of SPIs at the same time, which can cause the thread to be stuck
in the loop that checks other VCPU threads for a potentially very long
time, or to modify the active state of a running VCPU.  Fix this by
serializing all accesses to setting and clearing the active state of
interrupts using the KVM mutex.
Reported-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>

KVM: arm/arm64: Separate guest and uaccess writes to dist {sc}active · 3197191e

Committed by Christoffer Dall on May 16, 2017

Factor out the core register modifier functionality from the entry
points from the register description table, and only call the
prepare/finish functions from the guest path, not the uaccess path.
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>

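The split can be sketched with a pthread mutex standing in for kvm->lock: one core function does the register update, the guest entry point wraps it in prepare/finish locking, and the userspace (uaccess) entry point calls the core directly because its caller already holds the lock. The mutex, struct and function names are simplified, hypothetical illustrations.

```c
#include <pthread.h>
#include <stdint.h>

struct sketch_dist {
    pthread_mutex_t lock;   /* stand-in for kvm->lock */
    uint32_t active;        /* one active bit per interrupt */
};

/* Core modifier: assumes the caller holds dist->lock. */
static void sketch_set_active_core(struct sketch_dist *dist, uint32_t val)
{
    dist->active |= val;
}

/* Guest path: prepare (lock), modify, finish (unlock). */
static void sketch_guest_write_sactive(struct sketch_dist *dist, uint32_t val)
{
    pthread_mutex_lock(&dist->lock);
    sketch_set_active_core(dist, val);
    pthread_mutex_unlock(&dist->lock);
}

/* Userspace path: the caller already holds dist->lock, so taking it
 * again here would typically deadlock on a non-recursive mutex. */
static void sketch_uaccess_write_sactive(struct sketch_dist *dist, uint32_t val)
{
    sketch_set_active_core(dist, val);
}
```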

KVM: arm/arm64: Allow GICv2 to supply a uaccess register function · 2602087e

Committed by Christoffer Dall on May 16, 2017

We are about to differentiate between writes from a VCPU and from
userspace to the GIC's GICD_ISACTIVER and GICD_ICACTIVER registers due
to different synchronization requirements.

Expand the macro to define a register description for the GIC to take
uaccess functions as well.
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>

18 May, 2017 · 2 commits

KVM: arm/arm64: Hold slots_lock when unregistering kvm io bus devices · fa472fa9

Committed by Christoffer Dall on May 17, 2017

We were not holding kvm->slots_lock when calling
kvm_io_bus_unregister_dev(), as required.

This only affects the error path, but still, let's do our due
diligence.

Reported-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>

KVM: arm/arm64: Fix bug when registering redist iodevs · 552c9f47

Committed by Christoffer Dall on May 17, 2017

If userspace creates the VCPUs after initializing the VGIC, then we end
up in a situation where we trigger a bug in kvm_vcpu_get_idx(), because
it is called prior to adding the VCPU into the vcpus array on the VM.

There is no tight coupling between the VCPU index and the area of the
redistributor region used for the VCPU, so we can simply ensure that all
creations of redistributors are serialized per VM, and increment an
offset when we successfully add a redistributor.

The vgic_register_redist_iodev() function can be called from two paths.
The first is vgic_redister_all_redist_iodev(), which is called via the
kvm_vgic_addr() device attribute handler; this path already holds the
kvm->lock mutex.

The other path is via kvm_vgic_vcpu_init, which is called through a
longer chain from kvm_vm_ioctl_create_vcpu(), which releases the
kvm->lock mutex just before calling kvm_arch_vcpu_create(), so we can
simply take this mutex again later for our purposes.

Fixes: ab6f468c10 ("KVM: arm/arm64: Register iodevs when setting redist base and creating VCPUs")
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Tested-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>

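The serialization idea can be sketched as follows: registering a redistributor takes a per-VM mutex, places the new frame at the current offset, and advances the offset only once registration succeeds, so placement no longer depends on the VCPU index. The struct, the 128 KiB stride (two 64 KiB frames per GICv3 redistributor) and the function names are illustrative assumptions.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

#define SKETCH_REDIST_STRIDE (2u * 64u * 1024u)  /* RD + SGI frames, 64 KiB each */

struct sketch_vm {
    pthread_mutex_t lock;       /* stand-in for kvm->lock */
    uint64_t redist_base;       /* guest physical base of the region */
    uint64_t redist_size;       /* size of the region in bytes */
    uint64_t redist_offset;     /* next free offset in the region */
};

/* Register one redistributor; on success *addr receives its base. */
static int sketch_register_redist(struct sketch_vm *vm, uint64_t *addr)
{
    int ret = 0;

    pthread_mutex_lock(&vm->lock);
    if (vm->redist_offset + SKETCH_REDIST_STRIDE > vm->redist_size) {
        ret = -ENOSPC;          /* region is full */
    } else {
        *addr = vm->redist_base + vm->redist_offset;
        vm->redist_offset += SKETCH_REDIST_STRIDE;  /* advance only on success */
    }
    pthread_mutex_unlock(&vm->lock);
    return ret;
}
```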

16 May, 2017 · 4 commits

kvm: arm/arm64: Fix use after free of stage2 page table · 0c428a6a

Committed by Suzuki K Poulose on May 16, 2017

We yield the kvm->mmu_lock occasionally while performing an operation
(e.g., unmap or permission changes) on a large area of stage2 mappings.
However, this could cause another thread to clear and free up
the stage2 page tables while we were waiting to regain the lock, and
thus the original thread could end up accessing memory that was
freed. This patch fixes the problem by making sure that the stage2
pagetable is still valid after we regain the lock. The fact that
mmu_notifier->release() could be called twice (via __mmu_notifier_release
and mmu_notifier_unregister) increases the likelihood of hitting
this race where there are two threads trying to unmap the entire guest
shadow pages.

While at it, clean up the redundant checks around cond_resched_lock in
stage2_wp_range(), as cond_resched_lock already does the same checks.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: andreyknvl@google.com
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>

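The pattern of the fix is: whenever the lock may have been dropped (cond_resched_lock() in the kernel), re-check that the stage2 PGD still exists before touching the tables again. Below is a simplified, single-threaded model with hypothetical names, where a callback stands in for another thread freeing the PGD while the lock was released.

```c
#include <stddef.h>

struct sketch_kvm {
    void *pgd;   /* NULL once the stage2 tables have been freed */
};

/* Stand-in for what another thread may do while the lock is dropped. */
typedef void (*sketch_other_thread_fn)(struct sketch_kvm *kvm);

/* Example of such a thread: a concurrent teardown freeing the tables. */
static void sketch_free_pgd(struct sketch_kvm *kvm)
{
    kvm->pgd = NULL;
}

/* Unmap 'nr_chunks' chunks, yielding between chunks. Returns the number
 * of chunks actually processed. */
static size_t sketch_unmap_range(struct sketch_kvm *kvm, size_t nr_chunks,
                                 sketch_other_thread_fn other)
{
    size_t done = 0;

    for (size_t i = 0; i < nr_chunks; i++) {
        /* The fix: the tables may have vanished while we yielded. */
        if (kvm->pgd == NULL)
            break;

        /* ... unmap one chunk using kvm->pgd ... */
        done++;

        if (other)
            other(kvm);   /* lock dropped and re-acquired here */
    }
    return done;
}
```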

kvm: arm/arm64: Force reading uncached stage2 PGD · 2952a607

Committed by Suzuki K Poulose on May 16, 2017

Make sure we don't use a cached value of the KVM stage2 PGD while
resetting the PGD.

Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>

KVM: arm64: Restore host physical timer access on hyp_panic() · e8ec032b

Committed by James Morse on Apr 25, 2017

When KVM panics, it hurriedly restores the host context and parachutes
into the host's panic() code. At some point panic() touches the physical
timer/counter. Unless we are an arm64 system with VHE, this traps back
to EL2. If we're lucky, we panic again.

Add a __timer_save_state() call to KVM's hyp_panic() path; this saves the
guest registers and disables the traps for the host.

Fixes: 53fd5b64 ("arm64: KVM: Add panic handling")
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>

KVM: arm: Restore banked registers and physical timer access on hyp_panic() · d2e19368

Committed by James Morse on Apr 25, 2017

When KVM panics, it hurriedly restores the host context and parachutes
into the host's panic() code. This looks like it was copied from arm64,
but the 32-bit KVM panic code needs to restore the host's banked
registers too.

At some point panic() touches the physical timer/counter, which will
trap back to HYP. If we're lucky, we panic again.

Add a __timer_save_state() call to KVM's hyp_panic() path; this saves the
guest registers and disables the traps for the host.

Fixes: c36b6db5 ("ARM: KVM: Add panic handling code")
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>

15 May, 2017 · 3 commits

KVM: arm: rename pm_fake handler to trap_raz_wi · 9b619a8f

Committed by Zhichao Huang on May 11, 2017

pm_fake doesn't quite describe what the handler does (ignoring writes
and returning 0 for reads).

As we're about to use it (a lot) in a different context, rename it
with an (admittedly cryptic) name that makes sense for all users.
Signed-off-by: Zhichao Huang <zhichao.huang@linaro.org>
Reviewed-by: Alex Bennee <alex.bennee@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>

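Read-as-zero/write-ignored (RAZ/WI) semantics are simple enough to model directly. The sketch uses a hypothetical, simplified access descriptor rather than the kernel's own register-access structures, and returns true to mean "access handled", which is an assumption about the handler convention.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified description of a trapped register access. */
struct sketch_access {
    bool     is_write;
    uint64_t regval;   /* value written, or destination for a read */
};

/* Read-as-zero / write-ignored: reads return 0, writes are dropped.
 * Returns true to indicate the access has been handled. */
static bool sketch_trap_raz_wi(struct sketch_access *p)
{
    if (!p->is_write)
        p->regval = 0;
    /* Writes: nothing to do, the value is simply discarded. */
    return true;
}
```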

KVM: arm: plug potential guest hardware debug leakage · 661e6b02

Committed by Zhichao Huang on May 11, 2017

Hardware debugging in guests is currently not intercepted, which means
that a malicious guest can bring down the entire machine by writing
to the debug registers.

This patch enables trapping of all debug registers, preventing the
guests from accessing them. This includes access to the debug mode
(DBGDSCR) in the guest world at all times, which could otherwise mess
with the host state. Reads return 0 and writes are ignored (RAZ_WI).

As a result, the guest cannot detect any working hardware-based debug
support. As debug exceptions are still routed to the guest, normal
debugging using software-based breakpoints still works.

To support debugging using hardware registers we need to implement a
debug register aware world switch as well as special trapping for
registers that may affect the host state.

Cc: stable@vger.kernel.org
Signed-off-by: Zhichao Huang <zhichao.huang@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>

kvm: arm/arm64: Fix race in resetting stage2 PGD · 6c0d706b

Committed by Suzuki K Poulose on May 03, 2017

In kvm_free_stage2_pgd() we check the stage2 PGD before holding
the lock and proceed to take the lock if it is valid. We then unmap
the page tables and release the lock. We reset the PGD only after
dropping this lock, which could cause a race condition where another
thread waiting on, or even holding, the lock could potentially see
that the PGD is still valid, proceed to perform a stage2 operation,
and later encounter a NULL PGD.

[223090.242280] Unable to handle kernel NULL pointer dereference at virtual address 00000040
[223090.262330] PC is at unmap_stage2_range+0x8c/0x428
[223090.262332] LR is at kvm_unmap_hva_handler+0x2c/0x3c
[223090.262531] Call trace:
[223090.262533] [<ffff0000080adb78>] unmap_stage2_range+0x8c/0x428
[223090.262535] [<ffff0000080adf40>] kvm_unmap_hva_handler+0x2c/0x3c
[223090.262537] [<ffff0000080ace2c>] handle_hva_to_gpa+0xb0/0x104
[223090.262539] [<ffff0000080af988>] kvm_unmap_hva+0x5c/0xbc
[223090.262543] [<ffff0000080a2478>] kvm_mmu_notifier_invalidate_page+0x50/0x8c
[223090.262547] [<ffff0000082274f8>] __mmu_notifier_invalidate_page+0x5c/0x84
[223090.262551] [<ffff00000820b700>] try_to_unmap_one+0x1d0/0x4a0
[223090.262553] [<ffff00000820c5c8>] rmap_walk+0x1cc/0x2e0
[223090.262555] [<ffff00000820c90c>] try_to_unmap+0x74/0xa4
[223090.262557] [<ffff000008230ce4>] migrate_pages+0x31c/0x5ac
[223090.262561] [<ffff0000081f869c>] compact_zone+0x3fc/0x7ac
[223090.262563] [<ffff0000081f8ae0>] compact_zone_order+0x94/0xb0
[223090.262564] [<ffff0000081f91c0>] try_to_compact_pages+0x108/0x290
[223090.262569] [<ffff0000081d5108>] __alloc_pages_direct_compact+0x70/0x1ac
[223090.262571] [<ffff0000081d64a0>] __alloc_pages_nodemask+0x434/0x9f4
[223090.262572] [<ffff0000082256f0>] alloc_pages_vma+0x230/0x254
[223090.262574] [<ffff000008235e5c>] do_huge_pmd_anonymous_page+0x114/0x538
[223090.262576] [<ffff000008201bec>] handle_mm_fault+0xd40/0x17a4
[223090.262577] [<ffff0000081fb324>] __get_user_pages+0x12c/0x36c
[223090.262578] [<ffff0000081fb804>] get_user_pages_unlocked+0xa4/0x1b8
[223090.262579] [<ffff0000080a3ce8>] __gfn_to_pfn_memslot+0x280/0x31c
[223090.262580] [<ffff0000080a3dd0>] gfn_to_pfn_prot+0x4c/0x5c
[223090.262582] [<ffff0000080af3f8>] kvm_handle_guest_abort+0x240/0x774
[223090.262584] [<ffff0000080b2bac>] handle_exit+0x11c/0x1ac
[223090.262586] [<ffff0000080ab99c>] kvm_arch_vcpu_ioctl_run+0x31c/0x648
[223090.262587] [<ffff0000080a1d78>] kvm_vcpu_ioctl+0x378/0x768
[223090.262590] [<ffff00000825df5c>] do_vfs_ioctl+0x324/0x5a4
[223090.262591] [<ffff00000825e26c>] SyS_ioctl+0x90/0xa4
[223090.262595] [<ffff000008085d84>] el0_svc_naked+0x38/0x3c

This patch moves the stage2 PGD manipulation under the lock.
Reported-by: Alexander Graf <agraf@suse.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>

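The reordering can be sketched with a pthread mutex standing in for kvm->mmu_lock: check the PGD, unmap, and clear the pointer all inside the critical section, then free the memory only after no lock holder can still observe the old pointer as valid. Names and types are hypothetical simplifications of kvm_free_stage2_pgd().

```c
#include <pthread.h>
#include <stdlib.h>

struct sketch_kvm_mmu {
    pthread_mutex_t mmu_lock;   /* stand-in for kvm->mmu_lock */
    void *pgd;                  /* stage2 PGD, NULL once freed */
};

static void sketch_free_stage2_pgd(struct sketch_kvm_mmu *kvm)
{
    void *pgd = NULL;

    pthread_mutex_lock(&kvm->mmu_lock);
    if (kvm->pgd) {
        /* ... unmap the whole IPA range using kvm->pgd ... */
        pgd = kvm->pgd;
        kvm->pgd = NULL;   /* reset while still holding the lock */
    }
    pthread_mutex_unlock(&kvm->mmu_lock);

    /* Anyone taking the lock from now on sees NULL, never a stale PGD. */
    free(pgd);
}
```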

openeuler / Kernel: synced successfully nearly 2 years ago