提交 · c8d95668c4caf845d2fa5e5a0c0df83cac00fc37 · openanolis / cloud-kernel

14 7月, 2019 1 次提交

KVM: arm/arm64: vgic: Fix kvm_device leak in vgic_its_destroy · 512bbb11

由 Dave Martin 提交于 6月 06, 2019

[ Upstream commit 4729ec8c1e1145234aeeebad5d96d77f4ccbb00a ]

kvm_device->destroy() seems to be supposed to free its kvm_device
struct, but vgic_its_destroy() is not currently doing this,
resulting in a memory leak, resulting in kmemleak reports such as
the following:

unreferenced object 0xffff800aeddfe280 (size 128):
  comm "qemu-system-aar", pid 13799, jiffies 4299827317 (age 1569.844s)
  [...]
  backtrace:
    [<00000000a08b80e2>] kmem_cache_alloc+0x178/0x208
    [<00000000dcad2bd3>] kvm_vm_ioctl+0x350/0xbc0

Fix it.

Cc: Andre Przywara <andre.przywara@arm.com>
Fixes: 1085fdc6 ("KVM: arm64: vgic-its: Introduce new KVM ITS device")
Signed-off-by: NDave Martin <Dave.Martin@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

512bbb11

19 6月, 2019 1 次提交

KVM: arm/arm64: Move cc/it checks under hyp's Makefile to avoid instrumentation · 60b30097

由 James Morse 提交于 5月 22, 2019

[ Upstream commit 623e1528d4090bd1abaf93ec46f047dee9a6fb32 ]

KVM has helpers to handle the condition codes of trapped aarch32
instructions. These are marked __hyp_text and used from HYP, but they
aren't built by the 'hyp' Makefile, which has all the runes to avoid ASAN
and KCOV instrumentation.

Move this code to a new hyp/aarch32.c to avoid a hyp-panic when starting
an aarch32 guest on a host built with the ASAN/KCOV debug options.

Fixes: 021234ef ("KVM: arm64: Make kvm_condition_valid32() accessible from EL2")
Fixes: 8cebe750 ("arm64: KVM: Make kvm_skip_instr32 available to HYP")
Signed-off-by: NJames Morse <james.morse@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

60b30097

09 6月, 2019 1 次提交

KVM: s390: Do not report unusabled IDs via KVM_CAP_MAX_VCPU_ID · 6a2fbec7

由 Thomas Huth 提交于 5月 23, 2019

commit a86cb413f4bf273a9d341a3ab2c2ca44e12eb317 upstream.

KVM_CAP_MAX_VCPU_ID is currently always reporting KVM_MAX_VCPU_ID on all
architectures. However, on s390x, the amount of usable CPUs is determined
during runtime - it is depending on the features of the machine the code
is running on. Since we are using the vcpu_id as an index into the SCA
structures that are defined by the hardware (see e.g. the sca_add_vcpu()
function), it is not only the amount of CPUs that is limited by the hard-
ware, but also the range of IDs that we can use.
Thus KVM_CAP_MAX_VCPU_ID must be determined during runtime on s390x, too.
So the handling of KVM_CAP_MAX_VCPU_ID has to be moved from the common
code into the architecture specific code, and on s390x we have to return
the same value here as for KVM_CAP_MAX_VCPUS.
This problem has been discovered with the kvm_create_max_vcpus selftest.
With this change applied, the selftest now passes on s390x, too.
Reviewed-by: NAndrew Jones <drjones@redhat.com>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NThomas Huth <thuth@redhat.com>
Message-Id: <20190523164309.13345-9-thuth@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

6a2fbec7

26 5月, 2019 1 次提交

KVM: arm/arm64: Ensure vcpu target is unset on reset failure · a1251522

由 Andrew Jones 提交于 4月 04, 2019

[ Upstream commit 811328fc3222f7b55846de0cd0404339e2e1e6d7 ]

A failed KVM_ARM_VCPU_INIT should not set the vcpu target,
as the vcpu target is used by kvm_vcpu_initialized() to
determine if other vcpu ioctls may proceed. We need to set
the target before calling kvm_reset_vcpu(), but if that call
fails, we should then unset it and clear the feature bitmap
while we're at it.
Signed-off-by: NAndrew Jones <drjones@redhat.com>
[maz: Simplified patch, completed commit message]
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

a1251522

04 5月, 2019 2 次提交

KVM: arm/arm64: vgic-its: Take the srcu lock when parsing the memslots · e4705ae7

由 Marc Zyngier 提交于 3月 19, 2019

[ Upstream commit 7494cec6cb3ba7385a6a223b81906384f15aae34 ]

Calling kvm_is_visible_gfn() implies that we're parsing the memslots,
and doing this without the srcu lock is frown upon:

[12704.164532] =============================
[12704.164544] WARNING: suspicious RCU usage
[12704.164560] 5.1.0-rc1-00008-g600025238f51-dirty #16 Tainted: G        W
[12704.164573] -----------------------------
[12704.164589] ./include/linux/kvm_host.h:605 suspicious rcu_dereference_check() usage!
[12704.164602] other info that might help us debug this:
[12704.164616] rcu_scheduler_active = 2, debug_locks = 1
[12704.164631] 6 locks held by qemu-system-aar/13968:
[12704.164644]  #0: 000000007ebdae4f (&kvm->lock){+.+.}, at: vgic_its_set_attr+0x244/0x3a0
[12704.164691]  #1: 000000007d751022 (&its->its_lock){+.+.}, at: vgic_its_set_attr+0x250/0x3a0
[12704.164726]  #2: 00000000219d2706 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[12704.164761]  #3: 00000000a760aecd (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[12704.164794]  #4: 000000000ef8e31d (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[12704.164827]  #5: 000000007a872093 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[12704.164861] stack backtrace:
[12704.164878] CPU: 2 PID: 13968 Comm: qemu-system-aar Tainted: G        W         5.1.0-rc1-00008-g600025238f51-dirty #16
[12704.164887] Hardware name: rockchip evb_rk3399/evb_rk3399, BIOS 2019.04-rc3-00124-g2feec69fb1 03/15/2019
[12704.164896] Call trace:
[12704.164910]  dump_backtrace+0x0/0x138
[12704.164920]  show_stack+0x24/0x30
[12704.164934]  dump_stack+0xbc/0x104
[12704.164946]  lockdep_rcu_suspicious+0xcc/0x110
[12704.164958]  gfn_to_memslot+0x174/0x190
[12704.164969]  kvm_is_visible_gfn+0x28/0x70
[12704.164980]  vgic_its_check_id.isra.0+0xec/0x1e8
[12704.164991]  vgic_its_save_tables_v0+0x1ac/0x330
[12704.165001]  vgic_its_set_attr+0x298/0x3a0
[12704.165012]  kvm_device_ioctl_attr+0x9c/0xd8
[12704.165022]  kvm_device_ioctl+0x8c/0xf8
[12704.165035]  do_vfs_ioctl+0xc8/0x960
[12704.165045]  ksys_ioctl+0x8c/0xa0
[12704.165055]  __arm64_sys_ioctl+0x28/0x38
[12704.165067]  el0_svc_common+0xd8/0x138
[12704.165078]  el0_svc_handler+0x38/0x78
[12704.165089]  el0_svc+0x8/0xc

Make sure the lock is taken when doing this.

Fixes: bf308242 ("KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock")
Reviewed-by: NEric Auger <eric.auger@redhat.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>

e4705ae7

KVM: arm/arm64: vgic-its: Take the srcu lock when writing to guest memory · 0371fa03

由 Marc Zyngier 提交于 3月 19, 2019

[ Upstream commit a6ecfb11bf37743c1ac49b266595582b107b61d4 ]

When halting a guest, QEMU flushes the virtual ITS caches, which
amounts to writing to the various tables that the guest has allocated.

When doing this, we fail to take the srcu lock, and the kernel
shouts loudly if running a lockdep kernel:

[   69.680416] =============================
[   69.680819] WARNING: suspicious RCU usage
[   69.681526] 5.1.0-rc1-00008-g600025238f51-dirty #18 Not tainted
[   69.682096] -----------------------------
[   69.682501] ./include/linux/kvm_host.h:605 suspicious rcu_dereference_check() usage!
[   69.683225]
[   69.683225] other info that might help us debug this:
[   69.683225]
[   69.683975]
[   69.683975] rcu_scheduler_active = 2, debug_locks = 1
[   69.684598] 6 locks held by qemu-system-aar/4097:
[   69.685059]  #0: 0000000034196013 (&kvm->lock){+.+.}, at: vgic_its_set_attr+0x244/0x3a0
[   69.686087]  #1: 00000000f2ed935e (&its->its_lock){+.+.}, at: vgic_its_set_attr+0x250/0x3a0
[   69.686919]  #2: 000000005e71ea54 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[   69.687698]  #3: 00000000c17e548d (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[   69.688475]  #4: 00000000ba386017 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[   69.689978]  #5: 00000000c2c3c335 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[   69.690729]
[   69.690729] stack backtrace:
[   69.691151] CPU: 2 PID: 4097 Comm: qemu-system-aar Not tainted 5.1.0-rc1-00008-g600025238f51-dirty #18
[   69.691984] Hardware name: rockchip evb_rk3399/evb_rk3399, BIOS 2019.04-rc3-00124-g2feec69fb1 03/15/2019
[   69.692831] Call trace:
[   69.694072]  lockdep_rcu_suspicious+0xcc/0x110
[   69.694490]  gfn_to_memslot+0x174/0x190
[   69.694853]  kvm_write_guest+0x50/0xb0
[   69.695209]  vgic_its_save_tables_v0+0x248/0x330
[   69.695639]  vgic_its_set_attr+0x298/0x3a0
[   69.696024]  kvm_device_ioctl_attr+0x9c/0xd8
[   69.696424]  kvm_device_ioctl+0x8c/0xf8
[   69.696788]  do_vfs_ioctl+0xc8/0x960
[   69.697128]  ksys_ioctl+0x8c/0xa0
[   69.697445]  __arm64_sys_ioctl+0x28/0x38
[   69.697817]  el0_svc_common+0xd8/0x138
[   69.698173]  el0_svc_handler+0x38/0x78
[   69.698528]  el0_svc+0x8/0xc

The fix is to obviously take the srcu lock, just like we do on the
read side of things since bf308242. One wonders why this wasn't
fixed at the same time, but hey...

Fixes: bf308242 ("KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock")
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>

0371fa03

24 3月, 2019 4 次提交

KVM: Call kvm_arch_memslots_updated() before updating memslots · 23ad135a

由 Sean Christopherson 提交于 2月 05, 2019

commit 152482580a1b0accb60676063a1ac57b2d12daf6 upstream.

kvm_arch_memslots_updated() is at this point in time an x86-specific
hook for handling MMIO generation wraparound. x86 stashes 19 bits of
the memslots generation number in its MMIO sptes in order to avoid
full page fault walks for repeat faults on emulated MMIO addresses.
Because only 19 bits are used, wrapping the MMIO generation number is
possible, if unlikely. kvm_arch_memslots_updated() alerts x86 that
the generation has changed so that it can invalidate all MMIO sptes in
case the effective MMIO generation has wrapped so as to avoid using a
stale spte, e.g. a (very) old spte that was created with generation==0.

Given that the purpose of kvm_arch_memslots_updated() is to prevent
consuming stale entries, it needs to be called before the new generation
is propagated to memslots. Invalidating the MMIO sptes after updating
memslots means that there is a window where a vCPU could dereference
the new memslots generation, e.g. 0, and incorrectly reuse an old MMIO
spte that was created with (pre-wrap) generation==0.

Fixes: e59dbe09 ("KVM: Introduce kvm_arch_memslots_updated()")
Cc: <stable@vger.kernel.org>
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

23ad135a

KVM: arm/arm64: vgic: Always initialize the group of private IRQs · 04131dfc

由 Christoffer Dall 提交于 1月 10, 2019

[ Upstream commit ab2d5eb03dbb7b37a1c6356686fb48626ab0c93e ]

We currently initialize the group of private IRQs during
kvm_vgic_vcpu_init, and the value of the group depends on the GIC model
we are emulating.  However, CPUs created before creating (and
initializing) the VGIC might end up with the wrong group if the VGIC
is created as GICv3 later.

Since we have no enforced ordering of creating the VGIC and creating
VCPUs, we can end up with part the VCPUs being properly intialized and
the remaining incorrectly initialized.  That also means that we have no
single place to do the per-cpu data structure initialization which
depends on knowing the emulated GIC model (which is only the group
field).

This patch removes the incorrect comment from kvm_vgic_vcpu_init and
initializes the group of all previously created VCPUs's private
interrupts in vgic_init in addition to the existing initialization in
kvm_vgic_vcpu_init.
Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

04131dfc

arm/arm64: KVM: Allow a VCPU to fully reset itself · b78379c3

由 Marc Zyngier 提交于 12月 20, 2018

[ Upstream commit 358b28f09f0ab074d781df72b8a671edb1547789 ]

The current kvm_psci_vcpu_on implementation will directly try to
manipulate the state of the VCPU to reset it.  However, since this is
not done on the thread that runs the VCPU, we can end up in a strangely
corrupted state when the source and target VCPUs are running at the same
time.

Fix this by factoring out all reset logic from the PSCI implementation
and forwarding the required information along with a request to the
target VCPU.
Reviewed-by: NAndrew Jones <drjones@redhat.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

b78379c3

KVM: arm/arm64: vgic: Make vgic_dist->lpi_list_lock a raw_spinlock · 5f4a64b0

由 Julien Thierry 提交于 1月 07, 2019

[ Upstream commit fc3bc475231e12e9c0142f60100cf84d077c79e1 ]

vgic_dist->lpi_list_lock must always be taken with interrupts disabled as
it is used in interrupt context.

For configurations such as PREEMPT_RT_FULL, this means that it should
be a raw_spinlock since RT spinlocks are interruptible.
Signed-off-by: NJulien Thierry <julien.thierry@arm.com>
Acked-by: NChristoffer Dall <christoffer.dall@arm.com>
Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

5f4a64b0

13 2月, 2019 1 次提交

arm64: KVM: Skip MMIO insn after emulation · c709eeb0

由 Mark Rutland 提交于 11月 09, 2018

[ Upstream commit 0d640732dbebed0f10f18526de21652931f0b2f2 ]

When we emulate an MMIO instruction, we advance the CPU state within
decode_hsr(), before emulating the instruction effects.

Having this logic in decode_hsr() is opaque, and advancing the state
before emulation is problematic. It gets in the way of applying
consistent single-step logic, and it prevents us from being able to fail
an MMIO instruction with a synchronous exception.

Clean this up by only advancing the CPU state *after* the effects of the
instruction are emulated.

Cc: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
Reviewed-by: NChristoffer Dall <christoffer.dall@arm.com>
Signed-off-by: NMark Rutland <mark.rutland@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

c709eeb0

17 1月, 2019 1 次提交

KVM: arm/arm64: Fix VMID alloc race by reverting to lock-less · 4f14f446

由 Christoffer Dall 提交于 12月 11, 2018

commit fb544d1ca65a89f7a3895f7531221ceeed74ada7 upstream.

We recently addressed a VMID generation race by introducing a read/write
lock around accesses and updates to the vmid generation values.

However, kvm_arch_vcpu_ioctl_run() also calls need_new_vmid_gen() but
does so without taking the read lock.

As far as I can tell, this can lead to the same kind of race:

  VM 0, VCPU 0			VM 0, VCPU 1
  ------------			------------
  update_vttbr (vmid 254)
  				update_vttbr (vmid 1) // roll over
				read_lock(kvm_vmid_lock);
				force_vm_exit()
  local_irq_disable
  need_new_vmid_gen == false //because vmid gen matches

  enter_guest (vmid 254)
  				kvm_arch.vttbr = <PGD>:<VMID 1>
				read_unlock(kvm_vmid_lock);

  				enter_guest (vmid 1)

Which results in running two VCPUs in the same VM with different VMIDs
and (even worse) other VCPUs from other VMs could now allocate clashing
VMID 254 from the new generation as long as VCPU 0 is not exiting.

Attempt to solve this by making sure vttbr is updated before another CPU
can observe the updated VMID generation.

Cc: stable@vger.kernel.org
Fixes: f0cf47d9 "KVM: arm/arm64: Close VMID generation race"
Reviewed-by: NJulien Thierry <julien.thierry@arm.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

4f14f446

10 1月, 2019 5 次提交

KVM: arm/arm64: vgic: Fix off-by-one bug in vgic_get_irq() · f318d0cf

由 Gustavo A. R. Silva 提交于 12月 12, 2018

commit c23b2e6fc4ca346018618266bcabd335c0a8a49e upstream.

When using the nospec API, it should be taken into account that:

"...if the CPU speculates past the bounds check then
 * array_index_nospec() will clamp the index within the range of [0,
 * size)."

The above is part of the header for macro array_index_nospec() in
linux/nospec.h

Now, in this particular case, if intid evaluates to exactly VGIC_MAX_SPI
or to exaclty VGIC_MAX_PRIVATE, the array_index_nospec() macro ends up
returning VGIC_MAX_SPI - 1 or VGIC_MAX_PRIVATE - 1 respectively, instead
of VGIC_MAX_SPI or VGIC_MAX_PRIVATE, which, based on the original logic:

	/* SGIs and PPIs */
	if (intid <= VGIC_MAX_PRIVATE)
 		return &vcpu->arch.vgic_cpu.private_irqs[intid];

 	/* SPIs */
	if (intid <= VGIC_MAX_SPI)
 		return &kvm->arch.vgic.spis[intid - VGIC_NR_PRIVATE_IRQS];

are valid values for intid.

Fix this by calling array_index_nospec() macro with VGIC_MAX_PRIVATE + 1
and VGIC_MAX_SPI + 1 as arguments for its parameter size.

Fixes: 41b87599 ("KVM: arm/arm64: vgic: fix possible spectre-v1 in vgic_get_irq()")
Cc: stable@vger.kernel.org
Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
[dropped the SPI part which was fixed separately]
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

f318d0cf

KVM: arm/arm64: vgic-v2: Set active_source to 0 when restoring state · 47ffaa7d

由 Christoffer Dall 提交于 12月 11, 2018

commit 60c3ab30d8c2ff3a52606df03f05af2aae07dc6b upstream.

When restoring the active state from userspace, we don't know which CPU
was the source for the active state, and this is not architecturally
exposed in any of the register state.

Set the active_source to 0 in this case.  In the future, we can expand
on this and exposse the information as additional information to
userspace for GICv2 if anyone cares.

Cc: stable@vger.kernel.org
Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

47ffaa7d

KVM: arm/arm64: vgic: Cap SPIs to the VM-defined maximum · 6318b1b7

由 Marc Zyngier 提交于 12月 04, 2018

commit bea2ef803ade3359026d5d357348842bca9edcf1 upstream.

SPIs should be checked against the VMs specific configuration, and
not the architectural maximum.

Cc: stable@vger.kernel.org
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

6318b1b7

KVM: arm/arm64: vgic: Do not cond_resched_lock() with IRQs disabled · f0fcc4d1

由 Julien Thierry 提交于 11月 26, 2018

commit 2e2f6c3c0b08eed3fcf7de3c7684c940451bdeb1 upstream.

To change the active state of an MMIO, halt is requested for all vcpus of
the affected guest before modifying the IRQ state. This is done by calling
cond_resched_lock() in vgic_mmio_change_active(). However interrupts are
disabled at this point and we cannot reschedule a vcpu.

We actually don't need any of this, as kvm_arm_halt_guest ensures that
all the other vcpus are out of the guest. Let's just drop that useless
code.
Signed-off-by: NJulien Thierry <julien.thierry@arm.com>
Suggested-by: NChristoffer Dall <christoffer.dall@arm.com>
Cc: stable@vger.kernel.org
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

f0fcc4d1

arm/arm64: KVM: vgic: Force VM halt when changing the active state of GICv3 PPIs/SGIs · 0fa68518

由 Marc Zyngier 提交于 12月 18, 2018

commit 107352a24900fb458152b92a4e72fbdc83fd5510 upstream.

We currently only halt the guest when a vCPU messes with the active
state of an SPI. This is perfectly fine for GICv2, but isn't enough
for GICv3, where all vCPUs can access the state of any other vCPU.

Let's broaden the condition to include any GICv3 interrupt that
has an active state (i.e. all but LPIs).

Cc: stable@vger.kernel.org
Reviewed-by: NChristoffer Dall <christoffer.dall@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

0fa68518

14 11月, 2018 2 次提交

KVM: arm64: Fix caching of host MDCR_EL2 value · 59571785

由 Mark Rutland 提交于 10月 17, 2018

commit da5a3ce66b8bb51b0ea8a89f42aac153903f90fb upstream.

At boot time, KVM stashes the host MDCR_EL2 value, but only does this
when the kernel is not running in hyp mode (i.e. is non-VHE). In these
cases, the stashed value of MDCR_EL2.HPMN happens to be zero, which can
lead to CONSTRAINED UNPREDICTABLE behaviour.

Since we use this value to derive the MDCR_EL2 value when switching
to/from a guest, after a guest have been run, the performance counters
do not behave as expected. This has been observed to result in accesses
via PMXEVTYPER_EL0 and PMXEVCNTR_EL0 not affecting the relevant
counters, resulting in events not being counted. In these cases, only
the fixed-purpose cycle counter appears to work as expected.

Fix this by always stashing the host MDCR_EL2 value, regardless of VHE.

Cc: Christopher Dall <christoffer.dall@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: stable@vger.kernel.org
Fixes: 1e947bad ("arm64: KVM: Skip HYP setup when already running in HYP")
Tested-by: NRobin Murphy <robin.murphy@arm.com>
Signed-off-by: NMark Rutland <mark.rutland@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

59571785

KVM: arm/arm64: Ensure only THP is candidate for adjustment · 3e286d39

由 Punit Agrawal 提交于 10月 01, 2018

commit fd2ef358282c849c193aa36dadbf4f07f7dcd29b upstream.

PageTransCompoundMap() returns true for hugetlbfs and THP
hugepages. This behaviour incorrectly leads to stage 2 faults for
unsupported hugepage sizes (e.g., 64K hugepage with 4K pages) to be
treated as THP faults.

Tighten the check to filter out hugetlbfs pages. This also leads to
consistently mapping all unsupported hugepage sizes as PTE level
entries at stage 2.
Signed-off-by: NPunit Agrawal <punit.agrawal@arm.com>
Reviewed-by: NSuzuki Poulose <suzuki.poulose@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

3e286d39

07 9月, 2018 2 次提交

KVM: Remove obsolete kvm_unmap_hva notifier backend · a35381e1

由 Marc Zyngier 提交于 8月 23, 2018

kvm_unmap_hva is long gone, and we only have kvm_unmap_hva_range to
deal with. Drop the now obsolete code.

Fixes: fb1522e0 ("KVM: update to new mmu_notifier semantic v2")
Cc: James Hogan <jhogan@kernel.org>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>

a35381e1

KVM: arm/arm64: Clean dcache to PoC when changing PTE due to CoW · 694556d5

由 Marc Zyngier 提交于 8月 23, 2018

When triggering a CoW, we unmap the RO page via an MMU notifier
(invalidate_range_start), and then populate the new PTE using another
one (change_pte). In the meantime, we'll have copied the old page
into the new one.

The problem is that the data for the new page is sitting in the
cache, and should the guest have an uncached mapping to that page
(or its MMU off), following accesses will bypass the cache.

In a way, this is similar to what happens on a translation fault:
We need to clean the page to the PoC before mapping it. So let's just
do that.

This fixes a KVM unit test regression observed on a HiSilicon platform,
and subsequently reproduced on Seattle.

Fixes: a9c0e12e ("KVM: arm/arm64: Only clean the dcache on translation fault")
Cc: stable@vger.kernel.org # v4.16+
Reported-by: NMike Galbraith <efault@gmx.de>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>

694556d5

13 8月, 2018 2 次提交

KVM: arm/arm64: Skip updating PTE entry if no change · 976d34e2

由 Punit Agrawal 提交于 8月 13, 2018

When there is contention on faulting in a particular page table entry
at stage 2, the break-before-make requirement of the architecture can
lead to additional refaulting due to TLB invalidation.

Avoid this by skipping a page table update if the new value of the PTE
matches the previous value.

Cc: stable@vger.kernel.org
Fixes: d5d8184d ("KVM: ARM: Memory virtualization setup")
Reviewed-by: NSuzuki Poulose <suzuki.poulose@arm.com>
Acked-by: NChristoffer Dall <christoffer.dall@arm.com>
Signed-off-by: NPunit Agrawal <punit.agrawal@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

976d34e2

KVM: arm/arm64: Skip updating PMD entry if no change · 86658b81

由 Punit Agrawal 提交于 8月 13, 2018

Contention on updating a PMD entry by a large number of vcpus can lead
to duplicate work when handling stage 2 page faults. As the page table
update follows the break-before-make requirement of the architecture,
it can lead to repeated refaults due to clearing the entry and
flushing the tlbs.

This problem is more likely when -

* there are large number of vcpus
* the mapping is large block mapping

such as when using PMD hugepages (512MB) with 64k pages.

Fix this by skipping the page table update if there is no change in
the entry being updated.

Cc: stable@vger.kernel.org
Fixes: ad361f09 ("KVM: ARM: Support hugetlbfs backed huge pages")
Reviewed-by: NSuzuki Poulose <suzuki.poulose@arm.com>
Acked-by: NChristoffer Dall <christoffer.dall@arm.com>
Signed-off-by: NPunit Agrawal <punit.agrawal@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

86658b81

12 8月, 2018 3 次提交

KVM: arm/arm64: vgic: Do not use spin_lock_irqsave/restore with irq disabled · d0823cb3

由 Jia He 提交于 8月 03, 2018

kvm_vgic_sync_hwstate is only called with IRQ being disabled.
There is thus no need to call spin_lock_irqsave/restore in
vgic_fold_lr_state and vgic_prune_ap_list.

This patch replace them with the non irq-safe version.
Signed-off-by: NJia He <jia.he@hxt-semitech.com>
Acked-by: NChristoffer Dall <christoffer.dall@arm.com>
[maz: commit message tidy-up]
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

d0823cb3

KVM: arm/arm64: vgic: Move DEBUG_SPINLOCK_BUG_ON to vgic.h · dc961e53

由 Jia He 提交于 8月 03, 2018

DEBUG_SPINLOCK_BUG_ON can be used with both vgic-v2 and vgic-v3,
so let's move it to vgic.h
Signed-off-by: NJia He <jia.he@hxt-semitech.com>
[maz: commit message tidy-up]
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

dc961e53

KVM: arm/arm64: vgic-v3: Add core support for Group0 SGIs · 6249f2a4

由 Marc Zyngier 提交于 8月 06, 2018

Although vgic-v3 now supports Group0 interrupts, it still doesn't
deal with Group0 SGIs. As usually with the GIC, nothing is simple:

- ICC_SGI1R can signal SGIs of both groups, since GICD_CTLR.DS==1
  with KVM (as per 8.1.10, Non-secure EL1 access)

- ICC_SGI0R can only generate Group0 SGIs

- ICC_ASGI1R sees its scope refocussed to generate only Group0
  SGIs (as per the note at the bottom of Table 8-14)

We only support Group1 SGIs so far, so no material change.
Reviewed-by: NEric Auger <eric.auger@redhat.com>
Reviewed-by: NChristoffer Dall <christoffer.dall@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

6249f2a4

31 7月, 2018 2 次提交

KVM: arm/arm64: Fix lost IRQs from emulated physcial timer when blocked · 245715cb

由 Christoffer Dall 提交于 7月 25, 2018

When the VCPU is blocked (for example from WFI) we don't inject the
physical timer interrupt if it should fire while the CPU is blocked, but
instead we just wake up the VCPU and expect kvm_timer_vcpu_load to take
care of injecting the interrupt.

Unfortunately, kvm_timer_vcpu_load() doesn't actually do that, it only
has support to schedule a soft timer if the emulated phys timer is
expected to fire in the future.

Follow the same pattern as kvm_timer_update_state() and update the irq
state after potentially scheduling a soft timer.
Reported-by: NAndre Przywara <andre.przywara@arm.com>
Cc: Stable <stable@vger.kernel.org> # 4.15+
Fixes: bbdd52cf ("KVM: arm/arm64: Avoid phys timer emulation in vcpu entry/exit")
Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

245715cb

KVM: arm/arm64: Fix potential loss of ptimer interrupts · 7afc4ddb

由 Christoffer Dall 提交于 7月 25, 2018

kvm_timer_update_state() is called when changing the phys timer
configuration registers, either via vcpu reset, as a result of a trap
from the guest, or when userspace programs the registers.

phys_timer_emulate() is in turn called by kvm_timer_update_state() to
either cancel an existing software timer, or program a new software
timer, to emulate the behavior of a real phys timer, based on the change
in configuration registers.

Unfortunately, the interaction between these two functions left a small
race; if the conceptual emulated phys timer should actually fire, but
the soft timer hasn't executed its callback yet, we cancel the timer in
phys_timer_emulate without injecting an irq.  This only happens if the
check in kvm_timer_update_state is called before the timer should fire,
which is relatively unlikely, but possible.

The solution is to update the state of the phys timer after calling
phys_timer_emulate, which will pick up the pending timer state and
update the interrupt value.

Note that this leaves the opportunity of raising the interrupt twice,
once in the just-programmed soft timer, and once in
kvm_timer_update_state.  Since this always happens synchronously with
the VCPU execution, there is no harm in this, and the guest ever only
sees a single timer interrupt.

Cc: Stable <stable@vger.kernel.org> # 4.15+
Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

7afc4ddb

24 7月, 2018 1 次提交

KVM: arm/arm64: vgic: Fix possible spectre-v1 write in vgic_mmio_write_apr() · 6b8b9a48

由 Mark Rutland 提交于 7月 10, 2018

It's possible for userspace to control n. Sanitize n when using it as an
array index, to inhibit the potential spectre-v1 write gadget.

Note that while it appears that n must be bound to the interval [0,3]
due to the way it is extracted from addr, we cannot guarantee that
compiler transformations (and/or future refactoring) will ensure this is
the case, and given this is a slow path it's better to always perform
the masking.

Found by smatch.
Signed-off-by: NMark Rutland <mark.rutland@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

6b8b9a48

21 7月, 2018 11 次提交

KVM: arm: Add 32bit get/set events support · b0960b95

由 James Morse 提交于 7月 19, 2018

arm64's new use of KVMs get_events/set_events API calls isn't just
or RAS, it allows an SError that has been made pending by KVM as
part of its device emulation to be migrated.

Wire this up for 32bit too.

We only need to read/write the HCR_VA bit, and check that no esr has
been provided, as we don't yet support VDFSR.
Signed-off-by: NJames Morse <james.morse@arm.com>
Reviewed-by: NDongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

b0960b95

KVM: arm64: Share the parts of get/set events useful to 32bit · 539aee0e

由 James Morse 提交于 7月 19, 2018

The get/set events helpers to do some work to check reserved
and padding fields are zero. This is useful on 32bit too.

Move this code into virt/kvm/arm/arm.c, and give the arch
code some underscores.

This is temporarily hidden behind __KVM_HAVE_VCPU_EVENTS until
32bit is wired up.
Signed-off-by: NJames Morse <james.morse@arm.com>
Reviewed-by: NDongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

539aee0e

arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS · b7b27fac

由 Dongjiu Geng 提交于 7月 19, 2018

For the migrating VMs, user space may need to know the exception
state. For example, in the machine A, KVM make an SError pending,
when migrate to B, KVM also needs to pend an SError.

This new IOCTL exports user-invisible states related to SError.
Together with appropriate user space changes, user space can get/set
the SError exception state to do migrate/snapshot/suspend.
Signed-off-by: NDongjiu Geng <gengdongjiu@huawei.com>
Reviewed-by: NJames Morse <james.morse@arm.com>
[expanded documentation wording]
Signed-off-by: NJames Morse <james.morse@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

b7b27fac

KVM: arm/arm64: vgic: Let userspace opt-in to writable v2 IGROUPR · 32f8777e

由 Christoffer Dall 提交于 7月 16, 2018

Simply letting IGROUPR be writable from userspace would break
migration from old kernels to newer kernels, because old kernels
incorrectly report interrupt groups as group 1. This would not be a big
problem if userspace wrote GICD_IIDR as read from the kernel, because we
could detect the incompatibility and return an error to userspace.
Unfortunately, this is not the case with current userspace
implementations and simply letting IGROUPR be writable from userspace for
an emulated GICv2 silently breaks migration and causes the destination
VM to no longer run after migration.

We now encourage userspace to write the read and expected value of
GICD_IIDR as the first part of a GIC register restore, and if we observe
a write to GICD_IIDR we know that userspace has been updated and has had
a chance to cope with older kernels (VGICv2 IIDR.Revision == 0)
incorrectly reporting interrupts as group 1, and therefore we now allow
groups to be user writable.
Reviewed-by: NAndrew Jones <drjones@redhat.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

32f8777e

KVM: arm/arm64: vgic: Allow configuration of interrupt groups · d53c2c29

由 Christoffer Dall 提交于 7月 16, 2018

Implement the required MMIO accessors for GICv2 and GICv3 for the
IGROUPR distributor and redistributor registers.

This can allow guests to change behavior compared to running on previous
versions of KVM, but only to align with the architecture and hardware
implementations.

This also allows userspace to configure the interrupts groups for GICv3.
We don't allow userspace to write the groups on GICv2 just yet, because
that would result in GICv2 guests not receiving interrupts after
migrating from an older kernel that exposes GICv2 interrupts as group 1.
Reviewed-by: NAndrew Jones <drjones@redhat.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

d53c2c29

KVM: arm/arm64: vgic: Return error on incompatible uaccess GICD_IIDR writes · b489edc3

由 Christoffer Dall 提交于 7月 16, 2018

If userspace attempts to write a GICD_IIDR that does not match the
kernel version, return an error to userspace.  The intention is to allow
implementation changes inside KVM while avoiding silently breaking
migration resulting in guests not running without any clear indication
of what went wrong.
Reviewed-by: NAndrew Jones <drjones@redhat.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

b489edc3

KVM: arm/arm64: vgic: Permit uaccess writes to return errors · c6e0917b

由 Christoffer Dall 提交于 7月 16, 2018

Currently we do not allow any vgic mmio write operations to fail, which
makes sense from mmio traps from the guest. However, we should be able
to report failures to userspace, if userspace writes incompatible values
to read-only registers. Rework the internal interface to allow errors
to be returned on the write side for userspace writes.
Reviewed-by: NAndrew Jones <drjones@redhat.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

c6e0917b

KVM: arm/arm64: vgic: Signal IRQs using their configured group · 87322099

由 Christoffer Dall 提交于 7月 16, 2018

Now when we have a group configuration on the struct IRQ, use this state
when populating the LR and signaling interrupts as either group 0 or
group 1 to the VM.  Depending on the model of the emulated GIC, and the
guest's configuration of the VMCR, interrupts may be signaled as IRQs or
FIQs to the VM.
Reviewed-by: NAndrew Jones <drjones@redhat.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

87322099

KVM: arm/arm64: vgic: Add group field to struct irq · 8df3c8f3

由 Christoffer Dall 提交于 7月 16, 2018

In preparation for proper group 0 and group 1 support in the vgic, we
add a field in the struct irq to store the group of all interrupts.

We initialize the group to group 0 when emulating GICv2 and to group 1
when emulating GICv3, just like we treat them today.  LPIs are always
group 1.  We also continue to ignore writes from the guest, preserving
existing functionality, for now.

Finally, we also add this field to the vgic debug logic to show the
group for all interrupts.
Reviewed-by: NAndrew Jones <drjones@redhat.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

8df3c8f3

KVM: arm/arm64: vgic: GICv2 IGROUPR should read as zero · dd6251e4

由 Christoffer Dall 提交于 7月 16, 2018

We currently don't support grouping in the emulated VGIC, which is a
known defect on KVM (not hurting any currently used guests as far as
we're aware). This is currently handled by treating all interrupts as
group 0 interrupts for an emulated GICv2 and always signaling interrupts
as group 0 to the virtual CPU interface.

However, when reading which group interrupts belongs to in the guest
from the emulated VGIC, the VGIC currently reports group 1 instead of
group 0, which is misleading. Fix this temporarily before introducing
full group support by changing the hander to _raz instead of _rao.

Fixes: fb848db3 "KVM: arm/arm64: vgic-new: Add GICv2 MMIO handling framework"
Reviewed-by: NAndrew Jones <drjones@redhat.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

dd6251e4

KVM: arm/arm64: vgic: Keep track of implementation revision · aa075b0f

由 Christoffer Dall 提交于 7月 16, 2018

As we are about to tweak implementation aspects of the VGIC emulation,
while still preserving some level of backwards compatibility support,
add a field to keep track of the implementation revision field which is
reported to the VM and to userspace.
Reviewed-by: NAndrew Jones <drjones@redhat.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

aa075b0f

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功