提交 · 4f5abad9e826bd579b0661efa32682d9c9bc3fa8 · openanolis / cloud-kernel

16 1月, 2018 1 次提交

KVM: arm/arm64: mask/unmask daif around VHE guests · 4f5abad9

由 James Morse 提交于 1月 15, 2018

Non-VHE systems take an exception to EL2 in order to world-switch into the
guest. When returning from the guest KVM implicitly restores the DAIF
flags when it returns to the kernel at EL1.

With VHE none of this exception-level jumping happens, so KVMs
world-switch code is exposed to the host kernel's DAIF values, and KVM
spills the guest-exit DAIF values back into the host kernel.
On entry to a guest we have Debug and SError exceptions unmasked, KVM
has switched VBAR but isn't prepared to handle these. On guest exit
Debug exceptions are left disabled once we return to the host and will
stay this way until we enter user space.

Add a helper to mask/unmask DAIF around VHE guests. The unmask can only
happen after the hosts VBAR value has been synchronised by the isb in
__vhe_hyp_call (via kvm_call_hyp()). Masking could be as late as
setting KVMs VBAR value, but is kept here for symmetry.
Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NJames Morse <james.morse@arm.com>
Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

4f5abad9

13 1月, 2018 1 次提交

KVM: arm64: Change hyp_panic()s dependency on tpidr_el2 · c97e166e

由 James Morse 提交于 1月 08, 2018

Make tpidr_el2 a cpu-offset for per-cpu variables in the same way the
host uses tpidr_el1. This lets tpidr_el{1,2} have the same value, and
on VHE they can be the same register.

KVM calls hyp_panic() when anything unexpected happens. This may occur
while a guest owns the EL1 registers. KVM stashes the vcpu pointer in
tpidr_el2, which it uses to find the host context in order to restore
the host EL1 registers before parachuting into the host's panic().

The host context is a struct kvm_cpu_context allocated in the per-cpu
area, and mapped to hyp. Given the per-cpu offset for this CPU, this is
easy to find. Change hyp_panic() to take a pointer to the
struct kvm_cpu_context. Wrap these calls with an asm function that
retrieves the struct kvm_cpu_context from the host's per-cpu area.

Copy the per-cpu offset from the hosts tpidr_el1 into tpidr_el2 during
kvm init. (Later patches will make this unnecessary for VHE hosts)

We print out the vcpu pointer as part of the panic message. Add a back
reference to the 'running vcpu' in the host cpu context to preserve this.
Signed-off-by: NJames Morse <james.morse@arm.com>
Reviewed-by: NChristoffer Dall <cdall@linaro.org>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

c97e166e

29 11月, 2017 1 次提交

KVM: arm/arm64: debug: Introduce helper for single-step · 696673d1

由 Alex Bennée 提交于 11月 16, 2017

After emulating instructions we may want return to user-space to handle
single-step debugging. Introduce a helper function, which, if
single-step is enabled, sets the run structure for return and returns
true.
Signed-off-by: NAlex Bennée <alex.bennee@linaro.org>
Reviewed-by: NJulien Thierry <julien.thierry@arm.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

696673d1

03 11月, 2017 1 次提交

arm64/sve: KVM: Prevent guests from using SVE · 17eed27b

由 Dave Martin 提交于 10月 31, 2017

Until KVM has full SVE support, guests must not be allowed to
execute SVE instructions.

This patch enables the necessary traps, and also ensures that the
traps are disabled again on exit from the guest so that the host
can still use SVE if it wants to.

On guest exit, high bits of the SVE Zn registers may have been
clobbered as a side-effect the execution of FPSIMD instructions in
the guest.  The existing KVM host FPSIMD restore code is not
sufficient to restore these bits, so this patch explicitly marks
the CPU as not containing cached vector state for any task, thus
forcing a reload on the next return to userspace.  This is an
interim measure, in advance of adding full SVE awareness to KVM.

This marking of cached vector state in the CPU as invalid is done
using __this_cpu_write(fpsimd_last_state, NULL) in fpsimd.c.  Due
to the repeated use of this rather obscure operation, it makes
sense to factor it out as a separate helper with a clearer name.
This patch factors it out as fpsimd_flush_cpu_state(), and ports
all callers to use it.

As a side effect of this refactoring, a this_cpu_write() in
fpsimd_cpu_pm_notifier() is changed to __this_cpu_write().  This
should be fine, since cpu_pm_enter() is supposed to be called only
with interrupts disabled.
Signed-off-by: NDave Martin <Dave.Martin@arm.com>
Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

17eed27b

01 9月, 2017 1 次提交

KVM: update to new mmu_notifier semantic v2 · fb1522e0

由 Jérôme Glisse 提交于 8月 31, 2017

Calls to mmu_notifier_invalidate_page() were replaced by calls to
mmu_notifier_invalidate_range() and are now bracketed by calls to
mmu_notifier_invalidate_range_start()/end()

Remove now useless invalidate_page callback.

Changed since v1 (Linus Torvalds)
    - remove now useless kvm_arch_mmu_notifier_invalidate_page()
Signed-off-by: NJérôme Glisse <jglisse@redhat.com>
Tested-by: NMike Galbraith <efault@gmx.de>
Tested-by: NAdam Borowski <kilobyte@angband.pl>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fb1522e0

04 6月, 2017 3 次提交

KVM: arm/arm64: use vcpu requests for irq injection · 325f9c64

由 Andrew Jones 提交于 6月 04, 2017

Don't use request-less VCPU kicks when injecting IRQs, as a VCPU
kick meant to trigger the interrupt injection could be sent while
the VCPU is outside guest mode, which means no IPI is sent, and
after it has called kvm_vgic_flush_hwstate(), meaning it won't see
the updated GIC state until its next exit some time later for some
other reason.  The receiving VCPU only needs to check this request
in VCPU RUN to handle it.  By checking it, if it's pending, a
memory barrier will be issued that ensures all state is visible.
See "Ensuring Requests Are Seen" of
Documentation/virtual/kvm/vcpu-requests.rst
Signed-off-by: NAndrew Jones <drjones@redhat.com>
Reviewed-by: NChristoffer Dall <cdall@linaro.org>
Signed-off-by: NChristoffer Dall <cdall@linaro.org>

325f9c64

KVM: arm/arm64: change exit request to sleep request · 7b244e2b

由 Andrew Jones 提交于 6月 04, 2017

A request called EXIT is too generic. All requests are meant to cause
exits, but different requests have different flags. Let's not make
it difficult to decide if the EXIT request is correct for some case
by just always providing unique requests for each case. This patch
changes EXIT to SLEEP, because that's what the request is asking the
VCPU to do.
Signed-off-by: NAndrew Jones <drjones@redhat.com>
Acked-by: NChristoffer Dall <cdall@linaro.org>
Signed-off-by: NChristoffer Dall <cdall@linaro.org>

7b244e2b

KVM: improve arch vcpu request defining · 2387149e

由 Andrew Jones 提交于 6月 04, 2017

Marc Zyngier suggested that we define the arch specific VCPU request
base, rather than requiring each arch to remember to start from 8.
That suggestion, along with Radim Krcmar's recent VCPU request flag
addition, snowballed into defining something of an arch VCPU request
defining API.

No functional change.

(Looks like x86 is running out of arch VCPU request bits.  Maybe
 someday we'll need to extend to 64.)
Signed-off-by: NAndrew Jones <drjones@redhat.com>
Acked-by: NChristoffer Dall <cdall@linaro.org>
Signed-off-by: NChristoffer Dall <cdall@linaro.org>

2387149e

23 5月, 2017 1 次提交

KVM: arm/arm64: Simplify active_change_prepare and plug race · abd72296

由 Christoffer Dall 提交于 5月 06, 2017

We don't need to stop a specific VCPU when changing the active state,
because private IRQs can only be modified by a running VCPU for the
VCPU itself and it is therefore already stopped.

However, it is also possible for two VCPUs to be modifying the active
state of SPIs at the same time, which can cause the thread being stuck
in the loop that checks other VCPU threads for a potentially very long
time, or to modify the active state of a running VCPU.  Fix this by
serializing all accesses to setting and clearing the active state of
interrupts using the KVM mutex.
Reported-by: NAndrew Jones <drjones@redhat.com>
Signed-off-by: NChristoffer Dall <cdall@linaro.org>
Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>

abd72296

18 5月, 2017 1 次提交

arm64/cpufeature: don't use mutex in bringup path · 63a1e1c9

由 Mark Rutland 提交于 5月 16, 2017

Currently, cpus_set_cap() calls static_branch_enable_cpuslocked(), which
must take the jump_label mutex.

We call cpus_set_cap() in the secondary bringup path, from the idle
thread where interrupts are disabled. Taking a mutex in this path "is a
NONO" regardless of whether it's contended, and something we must avoid.
We didn't spot this until recently, as ___might_sleep() won't warn for
this case until all CPUs have been brought up.

This patch avoids taking the mutex in the secondary bringup path. The
poking of static keys is deferred until enable_cpu_capabilities(), which
runs in a suitable context on the boot CPU. To account for the static
keys being set later, cpus_have_const_cap() is updated to use another
static key to check whether the const cap keys have been initialised,
falling back to the caps bitmap until this is the case.

This means that users of cpus_have_const_cap() gain should only gain a
single additional NOP in the fast path once the const caps are
initialised, but should always see the current cap value.

The hyp code should never dereference the caps array, since the caps are
initialized before we run the module initcall to initialise hyp. A check
is added to the hyp init code to document this requirement.

This change will sidestep a number of issues when the upcoming hotplug
locking rework is merged.
Signed-off-by: NMark Rutland <mark.rutland@arm.com>
Reviewed-by: NMarc Zyniger <marc.zyngier@arm.com>
Reviewed-by: NSuzuki Poulose <suzuki.poulose@arm.com>
Acked-by: NWill Deacon <will.deacon@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Sewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

63a1e1c9

27 4月, 2017 2 次提交

KVM: mark requests that need synchronization · 7a97cec2

由 Paolo Bonzini 提交于 4月 27, 2017

kvm_make_all_requests() provides a synchronization that waits until all
kicked VCPUs have acknowledged the kick.  This is important for
KVM_REQ_MMU_RELOAD as it prevents freeing while lockless paging is
underway.

This patch adds the synchronization property into all requests that are
currently being used with kvm_make_all_requests() in order to preserve
the current behavior and only introduce a new framework.  Removing it
from requests where it is not necessary is left for future patches.
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7a97cec2

KVM: mark requests that do not need a wakeup · 930f7fd6

由 Radim Krčmář 提交于 4月 26, 2017

Some operations must ensure that the guest is not running with stale
data, but if the guest is halted, then the update can wait until another
event happens.  kvm_make_all_requests() currently doesn't wake up, so we
can mark all requests used with it.

First 8 bits were arbitrarily reserved for request numbers.

Most uses of requests have the request type as a constant, so a compiler
will optimize the '&'.

An alternative would be to have an inline function that would return
whether the request needs a wake-up or not, but I like this one better
even though it might produce worse assembly.
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Reviewed-by: NAndrew Jones <drjones@redhat.com>
Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

930f7fd6

09 4月, 2017 2 次提交

arm/arm64: KVM: Use __hyp_reset_vectors() directly · 0fb26593

由 Marc Zyngier 提交于 4月 03, 2017

__cpu_reset_hyp_mode doesn't need to be passed any argument now,
as the hyp-stub implementations are self-contained, and is now
reduced to just calling __hyp_reset_vectors(). Let's drop the
wrapper and use the stub hypercall directly.
Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NChristoffer Dall <cdall@linaro.org>

0fb26593

arm64: KVM: Convert __cpu_reset_hyp_mode to using __hyp_reset_vectors · 4adb1341

由 Marc Zyngier 提交于 4月 03, 2017

We are now able to use the hyp stub to reset HYP mode. Time to
kiss __kvm_hyp_reset goodbye, and use __hyp_reset_vectors.
Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
Reviewed-by: NJames Morse <james.morse@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NChristoffer Dall <cdall@linaro.org>

4adb1341

07 4月, 2017 1 次提交

kvm: make KVM_COALESCED_MMIO_PAGE_OFFSET public · 4b4357e0

由 Paolo Bonzini 提交于 3月 31, 2017

Its value has never changed; we might as well make it part of the ABI instead
of using the return value of KVM_CHECK_EXTENSION(KVM_CAP_COALESCED_MMIO).

Because PPC does not always make MMIO available, the code has to be made
dependent on CONFIG_KVM_MMIO rather than KVM_COALESCED_MMIO_PAGE_OFFSET.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

4b4357e0

09 3月, 2017 2 次提交

KVM: arm64: Increase number of user memslots to 512 · 955a3fc6

由 Linu Cherian 提交于 3月 08, 2017

Having only 32 memslots is a real constraint for the maximum
number of PCI devices that can be assigned to a single guest.
Assuming each PCI device/virtual function having two memory BAR
regions, we could assign only 15 devices/virtual functions to a
guest.

Hence increase KVM_USER_MEM_SLOTS to 512 as done in other archs like
powerpc.
Reviewed-by: NChristoffer Dall <cdall@linaro.org>
Signed-off-by: NLinu Cherian <linu.cherian@cavium.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

955a3fc6

KVM: arm/arm64: Remove KVM_PRIVATE_MEM_SLOTS definition that are unused · 3e92f94a

由 Linu Cherian 提交于 3月 08, 2017

arm/arm64 architecture doesnt use private memslots, hence removing
KVM_PRIVATE_MEM_SLOTS macro definition.
Reviewed-by: NChristoffer Dall <cdall@linaro.org>
Signed-off-by: NLinu Cherian <linu.cherian@cavium.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

3e92f94a

08 2月, 2017 1 次提交

KVM: arm/arm64: Move cntvoff to each timer context · 90de943a

由 Jintack Lim 提交于 2月 03, 2017

Make cntvoff per each timer context. This is helpful to abstract kvm
timer functions to work with timer context without considering timer
types (e.g. physical timer or virtual timer).

This also would pave the way for ever doing adjustments of the cntvoff
on a per-CPU basis if that should ever make sense.
Signed-off-by: NJintack Lim <jintack@cs.columbia.edu>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

90de943a

03 2月, 2017 1 次提交

arm64: KVM: Save/restore the host SPE state when entering/leaving a VM · f85279b4

由 Will Deacon 提交于 9月 22, 2016

The SPE buffer is virtually addressed, using the page tables of the CPU
MMU. Unusually, this means that the EL0/1 page table may be live whilst
we're executing at EL2 on non-VHE configurations. When VHE is in use,
we can use the same property to profile the guest behind its back.

This patch adds the relevant disabling and flushing code to KVM so that
the host can make use of SPE without corrupting guest memory, and any
attempts by a guest to use SPE will result in a trap.
Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

f85279b4

05 11月, 2016 1 次提交

arm/arm64: KVM: Perform local TLB invalidation when multiplexing vcpus on a single CPU · 94d0e598

由 Marc Zyngier 提交于 10月 18, 2016

Architecturally, TLBs are private to the (physical) CPU they're
associated with. But when multiple vcpus from the same VM are
being multiplexed on the same CPU, the TLBs are not private
to the vcpus (and are actually shared across the VMID).

Let's consider the following scenario:

- vcpu-0 maps PA to VA
- vcpu-1 maps PA' to VA

If run on the same physical CPU, vcpu-1 can hit TLB entries generated
by vcpu-0 accesses, and access the wrong physical page.

The solution to this is to keep a per-VM map of which vcpu ran last
on each given physical CPU, and invalidate local TLBs when switching
to a different vcpu from the same VM.
Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

94d0e598

08 9月, 2016 1 次提交

KVM: Add provisioning for ulong vm stats and u64 vcpu stats · 8a7e75d4

由 Suraj Jitindar Singh 提交于 8月 02, 2016

vms and vcpus have statistics associated with them which can be viewed
within the debugfs. Currently it is assumed within the vcpu_stat_get() and
vm_stat_get() functions that all of these statistics are represented as
u32s, however the next patch adds some u64 vcpu statistics.

Change all vcpu statistics to u64 and modify vcpu_stat_get() accordingly.
Since vcpu statistics are per vcpu, they will only be updated by a single
vcpu at a time so this shouldn't present a problem on 32-bit machines
which can't atomically increment 64-bit numbers. However vm statistics
could potentially be updated by multiple vcpus from that vm at a time.
To avoid the overhead of atomics make all vm statistics ulong such that
they are 64-bit on 64-bit systems where they can be atomically incremented
and are 32-bit on 32-bit systems which may not be able to atomically
increment 64-bit numbers. Modify vm_stat_get() to expect ulongs.
Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: NDavid Matlack <dmatlack@google.com>
Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

8a7e75d4

19 7月, 2016 1 次提交

KVM: arm/arm64: Extend arch CAP checks to allow per-VM capabilities · b46f01ce

由 Andre Przywara 提交于 7月 15, 2016

KVM capabilities can be a per-VM property, though ARM/ARM64 currently
does not pass on the VM pointer to the architecture specific
capability handlers.
Add a "struct kvm*" parameter to those function to later allow proper
per-VM capability reporting.
Signed-off-by: NAndre Przywara <andre.przywara@arm.com>
Reviewed-by: NEric Auger <eric.auger@linaro.org>
Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
Acked-by: NChristoffer Dall <christoffer.dall@linaro.org>
Tested-by: NEric Auger <eric.auger@redhat.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

b46f01ce

04 7月, 2016 3 次提交

arm: KVM: Allow hyp teardown · e537ecd7

由 Marc Zyngier 提交于 6月 30, 2016

So far, KVM was getting in the way of kexec on 32bit (and the arm64
kexec hackers couldn't be bothered to fix it on 32bit...).

With simpler page tables, tearing KVM down becomes very easy, so
let's just do it.
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

e537ecd7

arm/arm64: KVM: Drop boot_pgd · 12fda812

由 Marc Zyngier 提交于 6月 30, 2016

Since we now only have one set of page tables, the concept of
boot_pgd is useless and can be removed. We still keep it as
an element of the "extended idmap" thing.
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

12fda812

arm64: KVM: Simplify HYP init/teardown · 3421e9d8

由 Marc Zyngier 提交于 6月 30, 2016

Now that we only have the "merged page tables" case to deal with,
there is a bunch of things we can simplify in the HYP code (both
at init and teardown time).
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

3421e9d8

20 5月, 2016 2 次提交

KVM: arm/arm64: vgic-new: Synchronize changes to active state · 35a2d585

由 Christoffer Dall 提交于 5月 20, 2016

When modifying the active state of an interrupt via the MMIO interface,
we should ensure that the write has the intended effect.

If a guest sets an interrupt to active, but that interrupt is already
flushed into a list register on a running VCPU, then that VCPU will
write the active state back into the struct vgic_irq upon returning from
the guest and syncing its state.  This is a non-benign race, because the
guest can observe that an interrupt is not active, and it can have a
reasonable expectations that other VCPUs will not ack any IRQs, and then
set the state to active, and expect it to stay that way.  Currently we
are not honoring this case.

Thefore, change both the SACTIVE and CACTIVE mmio handlers to stop the
world, change the irq state, potentially queue the irq if we're setting
it to active, and then continue.

We take this chance to slightly optimize these functions by not stopping
the world when touching private interrupts where there is inherently no
possible race.
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

35a2d585

KVM: arm/arm64: Provide functionality to pause and resume a guest · b13216cf

由 Christoffer Dall 提交于 4月 27, 2016

For some rare corner cases in our VGIC emulation later we have to stop
the guest to make sure the VGIC state is consistent.
Provide the necessary framework to pause and resume a guest.
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: NAndre Przywara <andre.przywara@arm.com>

b13216cf

13 5月, 2016 1 次提交

KVM: halt_polling: provide a way to qualify wakeups during poll · 3491caf2

由 Christian Borntraeger 提交于 5月 13, 2016

Some wakeups should not be considered a sucessful poll. For example on
s390 I/O interrupts are usually floating, which means that _ALL_ CPUs
would be considered runnable - letting all vCPUs poll all the time for
transactional like workload, even if one vCPU would be enough.
This can result in huge CPU usage for large guests.
This patch lets architectures provide a way to qualify wakeups if they
should be considered a good/bad wakeups in regard to polls.

For s390 the implementation will fence of halt polling for anything but
known good, single vCPU events. The s390 implementation for floating
interrupts does a wakeup for one vCPU, but the interrupt will be delivered
by whatever CPU checks first for a pending interrupt. We prefer the
woken up CPU by marking the poll of this CPU as "good" poll.
This code will also mark several other wakeup reasons like IPI or
expired timers as "good". This will of course also mark some events as
not sucessful. As  KVM on z runs always as a 2nd level hypervisor,
we prefer to not poll, unless we are really sure, though.

This patch successfully limits the CPU usage for cases like uperf 1byte
transactional ping pong workload or wakeup heavy workload like OLTP
while still providing a proper speedup.

This also introduced a new vcpu stat "halt_poll_no_tuning" that marks
wakeups that are considered not good for polling.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Radim Krčmář <rkrcmar@redhat.com> (for an earlier version)
Cc: David Matlack <dmatlack@google.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
[Rename config symbol. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

3491caf2

03 5月, 2016 1 次提交

arm64: kvm: Fix kvm teardown for systems using the extended idmap · c612505f

由 James Morse 提交于 4月 29, 2016

If memory is located above 1<<VA_BITS, kvm adds an extra level to its page
tables, merging the runtime tables and boot tables that contain the idmap.
This lets us avoid the trampoline dance during initialisation.

This also means there is no trampoline page mapped, so
__cpu_reset_hyp_mode() can't call __kvm_hyp_reset() in this page. The good
news is the idmap is still mapped, so we don't need the trampoline page.
The bad news is we can't call it directly as the idmap is above
HYP_PAGE_OFFSET, so its address is masked by kvm_call_hyp.

Add a function __extended_idmap_trampoline which will branch into
__kvm_hyp_reset in the idmap, change kvm_hyp_reset_entry() to return
this address if __kvm_cpu_uses_extended_idmap(). In this case
__kvm_hyp_reset() will still switch to the boot tables (which are the
merged tables that were already in use), and branch into the idmap (where
it already was).

This fixes boot failures on these systems, where we fail to execute the
missing trampoline page when tearing down kvm in init_subsystems():
[    2.508922] kvm [1]: 8-bit VMID
[    2.512057] kvm [1]: Hyp mode initialized successfully
[    2.517242] kvm [1]: interrupt-controller@e1140000 IRQ13
[    2.522622] kvm [1]: timer IRQ3
[    2.525783] Kernel panic - not syncing: HYP panic:
[    2.525783] PS:200003c9 PC:0000007ffffff820 ESR:86000005
[    2.525783] FAR:0000007ffffff820 HPFAR:00000000003ffff0 PAR:0000000000000000
[    2.525783] VCPU:          (null)
[    2.525783]
[    2.547667] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W       4.6.0-rc5+ #1
[    2.555137] Hardware name: Default string Default string/Default string, BIOS ROD0084E 09/03/2015
[    2.563994] Call trace:
[    2.566432] [<ffffff80080888d0>] dump_backtrace+0x0/0x240
[    2.571818] [<ffffff8008088b24>] show_stack+0x14/0x20
[    2.576858] [<ffffff80083423ac>] dump_stack+0x94/0xb8
[    2.581899] [<ffffff8008152130>] panic+0x10c/0x250
[    2.586677] [<ffffff8008152024>] panic+0x0/0x250
[    2.591281] SMP: stopping secondary CPUs
[    3.649692] SMP: failed to stop secondary CPUs 0-2,4-7
[    3.654818] Kernel Offset: disabled
[    3.658293] Memory Limit: none
[    3.661337] ---[ end Kernel panic - not syncing: HYP panic:
[    3.661337] PS:200003c9 PC:0000007ffffff820 ESR:86000005
[    3.661337] FAR:0000007ffffff820 HPFAR:00000000003ffff0 PAR:0000000000000000
[    3.661337] VCPU:          (null)
[    3.661337]
Reported-by: NWill Deacon <will.deacon@arm.com>
Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NJames Morse <james.morse@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

c612505f

28 4月, 2016 1 次提交

arm64: kvm: allows kvm cpu hotplug · 67f69197

由 AKASHI Takahiro 提交于 4月 27, 2016

The current kvm implementation on arm64 does cpu-specific initialization
at system boot, and has no way to gracefully shutdown a core in terms of
kvm. This prevents kexec from rebooting the system at EL2.

This patch adds a cpu tear-down function and also puts an existing cpu-init
code into a separate function, kvm_arch_hardware_disable() and
kvm_arch_hardware_enable() respectively.
We don't need the arm64 specific cpu hotplug hook any more.

Since this patch modifies common code between arm and arm64, one stub
definition, __cpu_reset_hyp_mode(), is added on arm side to avoid
compilation errors.
Signed-off-by: NAKASHI Takahiro <takahiro.akashi@linaro.org>
[Rebase, added separate VHE init/exit path, changed resets use of
 kvm_call_hyp() to the __version, en/disabled hardware in init_subsystems(),
 added icache maintenance to __kvm_hyp_reset() and removed lr restore, removed
 guest-enter after teardown handling]
Signed-off-by: NJames Morse <james.morse@arm.com>
Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

67f69197

06 4月, 2016 1 次提交

arm64: KVM: Warn when PARange is less than 40 bits · 6141570c

由 Marc Zyngier 提交于 4月 05, 2016

We always thought that 40bits of PA range would be the minimum people
would actually build. Anything less is terrifyingly small.

Turns out that we were both right and wrong. Nobody has ever built
such a system, but the ARM Foundation Model has a PARange set to 36bits.
Just because we can. Oh well. Now, the KVM API explicitely says that
we offer a 40bit PA space to the VM, so we shouldn't run KVM on
the Foundation Model at all.

That being said, this patch offers a less agressive alternative, and
loudly warns about the configuration being unsupported. You'll still
be able to run VMs (at your own risks, though).

This is just a workaround until we have a proper userspace API where
we report the PARange to userspace.
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

6141570c

29 3月, 2016 1 次提交

arm64: perf: Move PMU register related defines to asm/perf_event.h · b8cfadfc

由 Shannon Zhao 提交于 3月 24, 2016

To use the ARMv8 PMU related register defines from the KVM code, we move
the relevant definitions to asm/perf_event.h header file and rename them
with prefix ARMV8_PMU_. This allows us to get rid of kvm_perf_event.h.
Signed-off-by: NAnup Patel <anup.patel@linaro.org>
Signed-off-by: NShannon Zhao <shannon.zhao@linaro.org>
Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
Reviewed-by: NAndrew Jones <drjones@redhat.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

b8cfadfc

05 3月, 2016 1 次提交

arm64: Fix misspellings in comments. · ef769e32

由 Adam Buchbinder 提交于 2月 24, 2016

Signed-off-by: NAdam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

ef769e32

01 3月, 2016 7 次提交

arm64: KVM: Move kvm_call_hyp back to its original localtion · 22b39ca3

由 Marc Zyngier 提交于 3月 01, 2016

In order to reduce the risk of a bad merge, let's move the new
kvm_call_hyp back to its original location in the file. This has
zero impact from a code point of view.
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Acked-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

22b39ca3

arm64: KVM: Add a new vcpu device control group for PMUv3 · bb0c70bc

由 Shannon Zhao 提交于 1月 11, 2016

To configure the virtual PMUv3 overflow interrupt number, we use the
vcpu kvm_device ioctl, encapsulating the KVM_ARM_VCPU_PMU_V3_IRQ
attribute within the KVM_ARM_VCPU_PMU_V3_CTRL group.

After configuring the PMUv3, call the vcpu ioctl with attribute
KVM_ARM_VCPU_PMU_V3_INIT to initialize the PMUv3.
Signed-off-by: NShannon Zhao <shannon.zhao@linaro.org>
Acked-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NAndrew Jones <drjones@redhat.com>
Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

bb0c70bc

arm64: KVM: Add a new feature bit for PMUv3 · 808e7381

由 Shannon Zhao 提交于 1月 11, 2016

To support guest PMUv3, use one bit of the VCPU INIT feature array.
Initialize the PMU when initialzing the vcpu with that bit and PMU
overflow interrupt set.
Signed-off-by: NShannon Zhao <shannon.zhao@linaro.org>
Acked-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NAndrew Jones <drjones@redhat.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

808e7381

arm64: KVM: Add access handler for PMUSERENR register · d692b8ad

由 Shannon Zhao 提交于 9月 08, 2015

This register resets as unknown in 64bit mode while it resets as zero
in 32bit mode. Here we choose to reset it as zero for consistency.

PMUSERENR_EL0 holds some bits which decide whether PMU registers can be
accessed from EL0. Add some check helpers to handle the access from EL0.

When these bits are zero, only reading PMUSERENR will trap to EL2 and
writing PMUSERENR or reading/writing other PMU registers will trap to
EL1 other than EL2 when HCR.TGE==0. To current KVM configuration
(HCR.TGE==0) there is no way to get these traps. Here we write 0xf to
physical PMUSERENR register on VM entry, so that it will trap PMU access
from EL0 to EL2. Within the register access handler we check the real
value of guest PMUSERENR register to decide whether this access is
allowed. If not allowed, return false to inject UND to guest.
Signed-off-by: NShannon Zhao <shannon.zhao@linaro.org>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

d692b8ad

arm64: KVM: Add access handler for PMSWINC register · 7a0adc70

由 Shannon Zhao 提交于 9月 08, 2015

Add access handler which emulates writing and reading PMSWINC
register and add support for creating software increment event.
Signed-off-by: NShannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: NAndrew Jones <drjones@redhat.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

7a0adc70

arm64: KVM: Add access handler for PMOVSSET and PMOVSCLR register · 76d883c4

由 Shannon Zhao 提交于 9月 08, 2015

Since the reset value of PMOVSSET and PMOVSCLR is UNKNOWN, use
reset_unknown for its reset handler. Add a handler to emulate writing
PMOVSSET or PMOVSCLR register.

When writing non-zero value to PMOVSSET, the counter and its interrupt
is enabled, kick this vcpu to sync PMU interrupt.
Signed-off-by: NShannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: NAndrew Jones <drjones@redhat.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

76d883c4

arm64: KVM: Add access handler for PMINTENSET and PMINTENCLR register · 9db52c78

由 Shannon Zhao 提交于 9月 08, 2015

Since the reset value of PMINTENSET and PMINTENCLR is UNKNOWN, use
reset_unknown for its reset handler. Add a handler to emulate writing
PMINTENSET or PMINTENCLR register.
Signed-off-by: NShannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: NAndrew Jones <drjones@redhat.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

9db52c78

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功