提交 · 90c73795afa24890bd2ae4f3b359de04b4147d37 · openeuler / Kernel

30 4月, 2019 6 次提交

KVM: PPC: Book3S HV: Add a new KVM device for the XIVE native exploitation mode · 90c73795

由 Cédric Le Goater 提交于 4月 18, 2019

This is the basic framework for the new KVM device supporting the XIVE
native exploitation mode. The user interface exposes a new KVM device
to be created by QEMU, only available when running on a L0 hypervisor.
Support for nested guests is not available yet.

The XIVE device reuses the device structure of the XICS-on-XIVE device
as they have a lot in common. That could possibly change in the future
if the need arise.
Signed-off-by: NCédric Le Goater <clg@kaod.org>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

90c73795

KVM: PPC: Book3S HV: Flush TLB on secondary radix threads · 70ea13f6

由 Paul Mackerras 提交于 4月 29, 2019

When running on POWER9 with kvm_hv.indep_threads_mode = N and the host
in SMT1 mode, KVM will run guest VCPUs on offline secondary threads.
If those guests are in radix mode, we fail to load the LPID and flush
the TLB if necessary, leading to the guest crashing with an
unsupported MMU fault. This arises from commit 9a4506e1 ("KVM:
PPC: Book3S HV: Make radix handle process scoped LPID flush in C,
with relocation on", 2018-05-17), which didn't consider the case
where indep_threads_mode = N.

For simplicity, this makes the real-mode guest entry path flush the
TLB in the same place for both radix and hash guests, as we did before
9a4506e1, though the code is now C code rather than assembly code.
We also have the radix TLB flush open-coded rather than calling
radix__local_flush_tlb_lpid_guest(), because the TLB flush can be
called in real mode, and in real mode we don't want to invoke the
tracepoint code.

Fixes: 9a4506e1 ("KVM: PPC: Book3S HV: Make radix handle process scoped LPID flush in C, with relocation on")
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

70ea13f6

KVM: PPC: Book3S HV: Move HPT guest TLB flushing to C code · 2940ba0c

由 Paul Mackerras 提交于 4月 29, 2019

This replaces assembler code in book3s_hv_rmhandlers.S that checks
the kvm->arch.need_tlb_flush cpumask and optionally does a TLB flush
with C code in book3s_hv_builtin.c.  Note that unlike the radix
version, the hash version doesn't do an explicit ERAT invalidation
because we will invalidate and load up the SLB before entering the
guest, and that will invalidate the ERAT.
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

2940ba0c

KVM: PPC: Book3S: Allocate guest TCEs on demand too · e1a1ef84

由 Alexey Kardashevskiy 提交于 3月 29, 2019

We already allocate hardware TCE tables in multiple levels and skip
intermediate levels when we can, now it is a turn of the KVM TCE tables.
Thankfully these are allocated already in 2 levels.

This moves the table's last level allocation from the creating helper to
kvmppc_tce_put() and kvm_spapr_tce_fault(). Since such allocation cannot
be done in real mode, this creates a virtual mode version of
kvmppc_tce_put() which handles allocations.

This adds kvmppc_rm_ioba_validate() to do an additional test if
the consequent kvmppc_tce_put() needs a page which has not been allocated;
if this is the case, we bail out to virtual mode handlers.

The allocations are protected by a new mutex as kvm->lock is not suitable
for the task because the fault handler is called with the mmap_sem held
but kvmhv_setup_mmu() locks kvm->lock and mmap_sem in the reverse order.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

e1a1ef84

KVM: PPC: Book3S HV: Avoid lockdep debugging in TCE realmode handlers · 2001825e

由 Alexey Kardashevskiy 提交于 3月 29, 2019

The kvmppc_tce_to_ua() helper is called from real and virtual modes
and it works fine as long as CONFIG_DEBUG_LOCKDEP is not enabled.
However if the lockdep debugging is on, the lockdep will most likely break
in kvm_memslots() because of srcu_dereference_check() so we need to use
PPC-own kvm_memslots_raw() which uses realmode safe
rcu_dereference_raw_notrace().

This creates a realmode copy of kvmppc_tce_to_ua() which replaces
kvm_memslots() with kvm_memslots_raw().

Since kvmppc_rm_tce_to_ua() becomes static and can only be used inside
HV KVM, this moves it earlier under CONFIG_KVM_BOOK3S_HV_POSSIBLE.

This moves truly virtual-mode kvmppc_tce_to_ua() to where it belongs and
drops the prmap parameter which was never used in the virtual mode.

Fixes: d3695aa4 ("KVM: PPC: Add support for multiple-TCE hcalls", 2016-02-15)
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

2001825e

KVM: PPC: Book3S HV: Implement real mode H_PAGE_INIT handler · eadfb1c5

由 Suraj Jitindar Singh 提交于 3月 22, 2019

Implement a real mode handler for the H_CALL H_PAGE_INIT which can be
used to zero or copy a guest page. The page is defined to be 4k and must
be 4k aligned.

The in-kernel real mode handler halves the time to handle this H_CALL
compared to handling it in userspace for a hash guest.
Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

eadfb1c5

27 2月, 2019 1 次提交

KVM: PPC: Fix compilation when KVM is not enabled · e74d53e3

由 Paul Mackerras 提交于 2月 25, 2019

Compiling with CONFIG_PPC_POWERNV=y and KVM disabled currently gives
an error like this:

  CC      arch/powerpc/kernel/dbell.o
In file included from arch/powerpc/kernel/dbell.c:20:0:
arch/powerpc/include/asm/kvm_ppc.h: In function ‘xics_on_xive’:
arch/powerpc/include/asm/kvm_ppc.h:625:9: error: implicit declaration of function ‘xive_enabled’ [-Werror=implicit-function-declaration]
  return xive_enabled() && cpu_has_feature(CPU_FTR_HVMODE);
         ^
cc1: all warnings being treated as errors
scripts/Makefile.build:276: recipe for target 'arch/powerpc/kernel/dbell.o' failed
make[3]: *** [arch/powerpc/kernel/dbell.o] Error 1

Fix this by making the xics_on_xive() definition conditional on the
same symbol (CONFIG_KVM_BOOK3S_64_HANDLER) that determines whether we
include <asm/xive.h> or not, since that's the header that defines
xive_enabled().

Fixes: 03f95332 ("KVM: PPC: Book3S: Allow XICS emulation to work in nested hosts using XIVE")
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

e74d53e3

21 2月, 2019 1 次提交

KVM: PPC: Book3S HV: Simplify machine check handling · 884dfb72

由 Paul Mackerras 提交于 2月 21, 2019

This makes the handling of machine check interrupts that occur inside
a guest simpler and more robust, with less done in assembler code and
in real mode.

Now, when a machine check occurs inside a guest, we always get the
machine check event struct and put a copy in the vcpu struct for the
vcpu where the machine check occurred.  We no longer call
machine_check_queue_event() from kvmppc_realmode_mc_power7(), because
on POWER8, when a vcpu is running on an offline secondary thread and
we call machine_check_queue_event(), that calls irq_work_queue(),
which doesn't work because the CPU is offline, but instead triggers
the WARN_ON(lazy_irq_pending()) in pnv_smp_cpu_kill_self() (which
fires again and again because nothing clears the condition).

All that machine_check_queue_event() actually does is to cause the
event to be printed to the console.  For a machine check occurring in
the guest, we now print the event in kvmppc_handle_exit_hv()
instead.

The assembly code at label machine_check_realmode now just calls C
code and then continues exiting the guest.  We no longer either
synthesize a machine check for the guest in assembly code or return
to the guest without a machine check.

The code in kvmppc_handle_exit_hv() is extended to handle the case
where the guest is not FWNMI-capable.  In that case we now always
synthesize a machine check interrupt for the guest.  Previously, if
the host thinks it has recovered the machine check fully, it would
return to the guest without any notification that the machine check
had occurred.  If the machine check was caused by some action of the
guest (such as creating duplicate SLB entries), it is much better to
tell the guest that it has caused a problem.  Therefore we now always
generate a machine check interrupt for guests that are not
FWNMI-capable.
Reviewed-by: NAravinda Prasad <aravinda@linux.vnet.ibm.com>
Reviewed-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

884dfb72

19 2月, 2019 1 次提交

KVM: PPC: Book3S: Allow XICS emulation to work in nested hosts using XIVE · 03f95332

由 Paul Mackerras 提交于 2月 04, 2019

Currently, the KVM code assumes that if the host kernel is using the
XIVE interrupt controller (the new interrupt controller that first
appeared in POWER9 systems), then the in-kernel XICS emulation will
use the XIVE hardware to deliver interrupts to the guest. However,
this only works when the host is running in hypervisor mode and has
full access to all of the XIVE functionality. It doesn't work in any
nested virtualization scenario, either with PR KVM or nested-HV KVM,
because the XICS-on-XIVE code calls directly into the native-XIVE
routines, which are not initialized and cannot function correctly
because they use OPAL calls, and OPAL is not available in a guest.

This means that using the in-kernel XICS emulation in a nested
hypervisor that is using XIVE as its interrupt controller will cause a
(nested) host kernel crash. To fix this, we change most of the places
where the current code calls xive_enabled() to select between the
XICS-on-XIVE emulation and the plain XICS emulation to call a new
function, xics_on_xive(), which returns false in a guest.

However, there is a further twist. The plain XICS emulation has some
functions which are used in real mode and access the underlying XICS
controller (the interrupt controller of the host) directly. In the
case of a nested hypervisor, this means doing XICS hypercalls
directly. When the nested host is using XIVE as its interrupt
controller, these hypercalls will fail. Therefore this also adds
checks in the places where the XICS emulation wants to access the
underlying interrupt controller directly, and if that is XIVE, makes
the code use the virtual mode fallback paths, which call generic
kernel infrastructure rather than doing direct XICS access.
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
Reviewed-by: NCédric Le Goater <clg@kaod.org>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

03f95332

17 12月, 2018 2 次提交

KVM: PPC: Add load_from_eaddr and store_to_eaddr to the kvmppc_ops struct · dceadcf9

由 Suraj Jitindar Singh 提交于 12月 14, 2018

The kvmppc_ops struct is used to store function pointers to kvm
implementation specific functions.

Introduce two new functions load_from_eaddr and store_to_eaddr to be
used to load from and store to a guest effective address respectively.

Also implement these for the kvm-hv module. If we are using the radix
mmu then we can call the functions to access quadrant 1 and 2.
Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

dceadcf9

KVM: PPC: Pass change type down to memslot commit function · f032b734

由 Bharata B Rao 提交于 12月 12, 2018

Currently, kvm_arch_commit_memory_region() gets called with a
parameter indicating what type of change is being made to the memslot,
but it doesn't pass it down to the platform-specific memslot commit
functions.  This adds the `change' parameter to the lower-level
functions so that they can use it in future.

[paulus@ozlabs.org - fix book E also.]
Signed-off-by: NBharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

f032b734

09 10月, 2018 5 次提交

KVM: PPC: Book3S HV: Add a VM capability to enable nested virtualization · aa069a99

由 Paul Mackerras 提交于 9月 21, 2018

With this, userspace can enable a KVM-HV guest to run nested guests
under it.

The administrator can control whether any nested guests can be run;
setting the "nested" module parameter to false prevents any guests
becoming nested hypervisors (that is, any attempt to enable the nested
capability on a guest will fail).  Guests which are already nested
hypervisors will continue to be so.
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

aa069a99

KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests · 95a6432c

由 Paul Mackerras 提交于 10月 08, 2018

This creates an alternative guest entry/exit path which is used for
radix guests on POWER9 systems when we have indep_threads_mode=Y.  In
these circumstances there is exactly one vcpu per vcore and there is
no coordination required between vcpus or vcores; the vcpu can enter
the guest without needing to synchronize with anything else.

The new fast path is implemented almost entirely in C in book3s_hv.c
and runs with the MMU on until the guest is entered.  On guest exit
we use the existing path until the point where we are committed to
exiting the guest (as distinct from handling an interrupt in the
low-level code and returning to the guest) and we have pulled the
guest context from the XIVE.  At that point we check a flag in the
stack frame to see whether we came in via the old path and the new
path; if we came in via the new path then we go back to C code to do
the rest of the process of saving the guest context and restoring the
host context.

The C code is split into separate functions for handling the
OS-accessible state and the hypervisor state, with the idea that the
latter can be replaced by a hypercall when we implement nested
virtualization.
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
[mpe: Fix CONFIG_ALTIVEC=n build]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

95a6432c

KVM: PPC: Book3S HV: Move interrupt delivery on guest entry to C code · f7035ce9

由 Paul Mackerras 提交于 10月 08, 2018

This is based on a patch by Suraj Jitindar Singh.

This moves the code in book3s_hv_rmhandlers.S that generates an
external, decrementer or privileged doorbell interrupt just before
entering the guest to C code in book3s_hv_builtin.c.  This is to
make future maintenance and modification easier.  The algorithm
expressed in the C code is almost identical to the previous
algorithm.
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f7035ce9

KVM: PPC: Remove redundand permission bits removal · a3ac077b

由 Alexey Kardashevskiy 提交于 9月 10, 2018

The kvmppc_gpa_to_ua() helper itself takes care of the permission
bits in the TCE and yet every single caller removes them.

This changes semantics of kvmppc_gpa_to_ua() so it takes TCEs
(which are GPAs + TCE permission bits) to make the callers simpler.

This should cause no behavioural change.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

a3ac077b

KVM: PPC: Validate TCEs against preregistered memory page sizes · 42de7b9e

由 Alexey Kardashevskiy 提交于 9月 10, 2018

The userspace can request an arbitrary supported page size for a DMA
window and this works fine as long as the mapped memory is backed with
the pages of the same or bigger size; if this is not the case,
mm_iommu_ua_to_hpa{_rm}() fail and tables do not populated with
dangerously incorrect TCEs.

However since it is quite easy to misconfigure the KVM and we do not do
reverts to all changes made to TCE tables if an error happens in a middle,
we better do the acceptable page size validation before we even touch
the tables.

This enhances kvmppc_tce_validate() to check the hardware IOMMU page sizes
against the preregistered memory page sizes.

Since the new check uses real/virtual mode helpers, this renames
kvmppc_tce_validate() to kvmppc_rm_tce_validate() to handle the real mode
case and mirrors it for the virtual mode under the old name. The real
mode handler is not used for the virtual mode as:
1. it uses _lockless() list traversing primitives instead of RCU;
2. realmode's mm_iommu_ua_to_hpa_rm() uses vmalloc_to_phys() which
virtual mode does not have to use and since on POWER9+radix only virtual
mode handlers actually work, we do not want to slow down that path even
a bit.

This removes EXPORT_SYMBOL_GPL(kvmppc_tce_validate) as the validators
are static now.

From now on the attempts on mapping IOMMU pages bigger than allowed
will result in KVM exit.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
[mpe: Fix KVM_HV=n build]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

42de7b9e

22 5月, 2018 3 次提交

KVM: PPC: Reimplement LOAD_VMX/STORE_VMX instruction mmio emulation with analyse_instr() input · acc9eb93

由 Simon Guo 提交于 5月 21, 2018

This patch reimplements LOAD_VMX/STORE_VMX MMIO emulation with
analyse_instr() input. When emulating the store, the VMX reg will need to
be flushed so that the right reg val can be retrieved before writing to
IO MEM.

This patch also adds support for lvebx/lvehx/lvewx/stvebx/stvehx/stvewx
MMIO emulation. To meet the requirement of handling different element
sizes, kvmppc_handle_load128_by2x64()/kvmppc_handle_store128_by2x64()
were replaced with kvmppc_handle_vmx_load()/kvmppc_handle_vmx_store().

The framework used is similar to VSX instruction MMIO emulation.
Suggested-by: NPaul Mackerras <paulus@ozlabs.org>
Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

acc9eb93

KVM: PPC: Add giveup_ext() hook to PPC KVM ops · 2e6baa46

由 Simon Guo 提交于 5月 21, 2018

Currently HV will save math regs(FP/VEC/VSX) when trap into host. But
PR KVM will only save math regs when qemu task switch out of CPU, or
when returning from qemu code.

To emulate FP/VEC/VSX mmio load, PR KVM need to make sure that math
regs were flushed firstly and then be able to update saved VCPU
FPR/VEC/VSX area reasonably.

This patch adds giveup_ext() field to KVM ops. Only PR KVM has non-NULL
giveup_ext() ops. kvmppc_complete_mmio_load() can invoke that hook
(when not NULL) to flush math regs accordingly, before updating saved
register vals.

Math regs flush is also necessary for STORE, which will be covered
in later patch within this patch series.
Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

2e6baa46

KVM: PPC: Reimplement non-SIMD LOAD/STORE instruction mmio emulation with analyse_instr() input · 70923603

由 Simon Guo 提交于 5月 21, 2018

This patch reimplements non-SIMD LOAD/STORE instruction MMIO emulation
with analyse_instr() input. It utilizes the BYTEREV/UPDATE/SIGNEXT
properties exported by analyse_instr() and invokes
kvmppc_handle_load(s)/kvmppc_handle_store() accordingly.

It also moves CACHEOP type handling into the skeleton.

instruction_type within kvm_ppc.h is renamed to avoid conflict with
sstep.h.
Suggested-by: NPaul Mackerras <paulus@ozlabs.org>
Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

70923603

30 3月, 2018 1 次提交

powerpc/64: Use array of paca pointers and allocate pacas individually · d2e60075

由 Nicholas Piggin 提交于 2月 14, 2018

Change the paca array into an array of pointers to pacas. Allocate
pacas individually.

This allows flexibility in where the PACAs are allocated. Future work
will allocate them node-local. Platforms that don't have address limits
on PACAs would be able to defer PACA allocations until later in boot
rather than allocate all possible ones up-front then freeing unused.

This is slightly more overhead (one additional indirection) for cross
CPU paca references, but those aren't too common.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

d2e60075

19 3月, 2018 1 次提交

KVM: PPC: Remove unused kvm_unmap_hva callback · 39c983ea

由 Paul Mackerras 提交于 2月 22, 2018

Since commit fb1522e0 ("KVM: update to new mmu_notifier semantic
v2", 2017-08-31), the MMU notifier code in KVM no longer calls the
kvm_unmap_hva callback.  This removes the PPC implementations of
kvm_unmap_hva().
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

39c983ea

09 2月, 2018 1 次提交

KVM: PPC: Book3S: Add MMIO emulation for VMX instructions · 09f98496

由 Jose Ricardo Ziviani 提交于 2月 03, 2018

This patch provides the MMIO load/store vector indexed
X-Form emulation.

Instructions implemented:
lvx: the quadword in storage addressed by the result of EA &
0xffff_ffff_ffff_fff0 is loaded into VRT.

stvx: the contents of VRS are stored into the quadword in storage
addressed by the result of EA & 0xffff_ffff_ffff_fff0.
Reported-by: NGopesh Kumar Chaudhary <gopchaud@in.ibm.com>
Reported-by: NBalamuruhan S <bala24@linux.vnet.ibm.com>
Signed-off-by: NJose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

09f98496

19 1月, 2018 3 次提交

powerpc/64: Rename soft_enabled to irq_soft_mask · 4e26bc4a

由 Madhavan Srinivasan 提交于 12月 20, 2017

Rename the paca->soft_enabled to paca->irq_soft_mask as it is no
longer used as a flag for interrupt state, but a mask.
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

4e26bc4a

powerpc/64: Move set_soft_enabled() and rename · 0b63acf4

由 Madhavan Srinivasan 提交于 12月 20, 2017

Move set_soft_enabled() from powerpc/kernel/irq.c to asm/hw_irq.c, to
encourage updates to paca->soft_enabled done via these access
function. Add "memory" clobber to hint compiler since
paca->soft_enabled memory is the target here.

Renaming it as soft_enabled_set() will make namespaces works better as
prefix than a postfix when new soft_enabled manipulation functions are
introduced.
Reviewed-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

0b63acf4

powerpc/64: Add #defines for paca->soft_enabled flags · c2e480ba

由 Madhavan Srinivasan 提交于 12月 20, 2017

Two #defines IRQS_ENABLED and IRQS_DISABLED are added to be used when
updating paca->soft_enabled. Replace the hardcoded values used when
updating paca->soft_enabled with IRQ_(EN|DIS)ABLED #define. No logic
change.
Reviewed-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

c2e480ba

23 11月, 2017 1 次提交

KVM: PPC: Book3S HV: Fix migration and HPT resizing of HPT guests on radix hosts · ded13fc1

由 Paul Mackerras 提交于 11月 22, 2017

This fixes two errors that prevent a guest using the HPT MMU from
successfully migrating to a POWER9 host in radix MMU mode, or resizing
its HPT when running on a radix host.

The first bug was that commit 8dc6cca5 ("KVM: PPC: Book3S HV:
Don't rely on host's page size information", 2017-09-11) missed two
uses of hpte_base_page_size(), one in the HPT rehashing code and
one in kvm_htab_write() (which is used on the destination side in
migrating a HPT guest). Instead we use kvmppc_hpte_base_page_shift().
Having the shift count means that we can use left and right shifts
instead of multiplication and division in a few places.

Along the way, this adds a check in kvm_htab_write() to ensure that the
page size encoding in the incoming HPTEs is recognized, and if not
return an EINVAL error to userspace.

The second bug was that kvm_htab_write was performing some but not all
of the functions of kvmhv_setup_mmu(), resulting in the destination VM
being left in radix mode as far as the hardware is concerned. The
simplest fix for now is make kvm_htab_write() call
kvmppc_setup_partition_table() like kvmppc_hv_setup_htab_rma() does.
In future it would be better to refactor the code more extensively
to remove the duplication.

Fixes: 8dc6cca5 ("KVM: PPC: Book3S HV: Don't rely on host's page size information")
Fixes: 7a84084c ("KVM: PPC: Book3S HV: Set partition table rather than SDR1 on POWER9")
Reported-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
Tested-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

ded13fc1

01 11月, 2017 1 次提交

KVM: PPC: Book3S HV: Add infrastructure for running HPT guests on radix host · 18c3640c

由 Paul Mackerras 提交于 9月 13, 2017

This sets up the machinery for switching a guest between HPT (hashed
page table) and radix MMU modes, so that in future we can run a HPT
guest on a radix host on POWER9 machines.

* The KVM_PPC_CONFIGURE_V3_MMU ioctl can now specify either HPT or
  radix mode, on a radix host.

* The KVM_CAP_PPC_MMU_HASH_V3 capability now returns 1 on POWER9
  with HV KVM on a radix host.

* The KVM_PPC_GET_SMMU_INFO returns information about the HPT MMU on a
  radix host.

* The KVM_PPC_ALLOCATE_HTAB ioctl on a radix host will switch the
  guest to HPT mode and allocate a HPT.

* For simplicity, we now allocate the rmap array for each memslot,
  even on a radix host, since it will be needed if the guest switches
  to HPT mode.

* Since we cannot yet run a HPT guest on a radix host, the KVM_RUN
  ioctl will return an EINVAL error in that case.
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

18c3640c

19 6月, 2017 1 次提交

KVM: PPC: Book3S HV: Allow userspace to set the desired SMT mode · 3c313524

由 Paul Mackerras 提交于 2月 06, 2017

This allows userspace to set the desired virtual SMT (simultaneous
multithreading) mode for a VM, that is, the number of VCPUs that
get assigned to each virtual core.  Previously, the virtual SMT mode
was fixed to the number of threads per subcore, and if userspace
wanted to have fewer vcpus per vcore, then it would achieve that by
using a sparse CPU numbering.  This had the disadvantage that the
vcpu numbers can get quite large, particularly for SMT1 guests on
a POWER8 with 8 threads per core.  With this patch, userspace can
set its desired virtual SMT mode and then use contiguous vcpu
numbering.

On POWER8, where the threading mode is "strict", the virtual SMT mode
must be less than or equal to the number of threads per subcore.  On
POWER9, which implements a "loose" threading mode, the virtual SMT
mode can be any power of 2 between 1 and 8, even though there is
effectively one thread per subcore, since the threads are independent
and can all be in different partitions.
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

3c313524

27 4月, 2017 1 次提交

KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller · 5af50993

由 Benjamin Herrenschmidt 提交于 4月 05, 2017

This patch makes KVM capable of using the XIVE interrupt controller
to provide the standard PAPR "XICS" style hypercalls. It is necessary
for proper operations when the host uses XIVE natively.

This has been lightly tested on an actual system, including PCI
pass-through with a TG3 device.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Cleanup pr_xxx(), unsplit pr_xxx() strings, etc., fix build
 failures by adding KVM_XIVE which depends on KVM_XICS and XIVE, and
 adding empty stubs for the kvm_xive_xxx() routines, fixup subject,
 integrate fixes from Paul for building PR=y HV=n]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

5af50993

20 4月, 2017 5 次提交

KVM: PPC: VFIO: Add in-kernel acceleration for VFIO · 121f80ba

由 Alexey Kardashevskiy 提交于 3月 22, 2017

This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
and H_STUFF_TCE requests targeted an IOMMU TCE table used for VFIO
without passing them to user space which saves time on switching
to user space and back.

This adds H_PUT_TCE/H_PUT_TCE_INDIRECT/H_STUFF_TCE handlers to KVM.
KVM tries to handle a TCE request in the real mode, if failed
it passes the request to the virtual mode to complete the operation.
If it a virtual mode handler fails, the request is passed to
the user space; this is not expected to happen though.

To avoid dealing with page use counters (which is tricky in real mode),
this only accelerates SPAPR TCE IOMMU v2 clients which are required
to pre-register the userspace memory. The very first TCE request will
be handled in the VFIO SPAPR TCE driver anyway as the userspace view
of the TCE table (iommu_table::it_userspace) is not allocated till
the very first mapping happens and we cannot call vmalloc in real mode.

If we fail to update a hardware IOMMU table unexpected reason, we just
clear it and move on as there is nothing really we can do about it -
for example, if we hot plug a VFIO device to a guest, existing TCE tables
will be mirrored automatically to the hardware and there is no interface
to report to the guest about possible failures.

This adds new attribute - KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE - to
the VFIO KVM device. It takes a VFIO group fd and SPAPR TCE table fd
and associates a physical IOMMU table with the SPAPR TCE table (which
is a guest view of the hardware IOMMU table). The iommu_table object
is cached and referenced so we do not have to look up for it in real mode.

This does not implement the UNSET counterpart as there is no use for it -
once the acceleration is enabled, the existing userspace won't
disable it unless a VFIO container is destroyed; this adds necessary
cleanup to the KVM_DEV_VFIO_GROUP_DEL handler.

This advertises the new KVM_CAP_SPAPR_TCE_VFIO capability to the user
space.

This adds real mode version of WARN_ON_ONCE() as the generic version
causes problems with rcu_sched. Since we testing what vmalloc_to_phys()
returns in the code, this also adds a check for already existing
vmalloc_to_phys() call in kvmppc_rm_h_put_tce_indirect().

This finally makes use of vfio_external_user_iommu_id() which was
introduced quite some time ago and was considered for removal.

Tests show that this patch increases transmission speed from 220MB/s
to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card).
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: NAlex Williamson <alex.williamson@redhat.com>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

121f80ba

KVM: PPC: iommu: Unify TCE checking · b1af23d8

由 Alexey Kardashevskiy 提交于 3月 22, 2017

This reworks helpers for checking TCE update parameters in way they
can be used in KVM.

This should cause no behavioral change.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Acked-by: NMichael Ellerman <mpe@ellerman.id.au>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

b1af23d8

KVM: PPC: Pass kvm* to kvmppc_find_table() · 503bfcbe

由 Alexey Kardashevskiy 提交于 3月 22, 2017

The guest view TCE tables are per KVM anyway (not per VCPU) so pass kvm*
there. This will be used in the following patches where we will be
attaching VFIO containers to LIOBNs via ioctl() to KVM (rather than
to VCPU).
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

503bfcbe

KVM: PPC: Book3S: Add MMIO emulation for FP and VSX instructions · 6f63e81b

由 Bin Lu 提交于 2月 21, 2017

This patch provides the MMIO load/store emulation for instructions
of 'double & vector unsigned char & vector signed char & vector
unsigned short & vector signed short & vector unsigned int & vector
signed int & vector double '.

The instructions that this adds emulation for are:

- ldx, ldux, lwax,
- lfs, lfsx, lfsu, lfsux, lfd, lfdx, lfdu, lfdux,
- stfs, stfsx, stfsu, stfsux, stfd, stfdx, stfdu, stfdux, stfiwx,
- lxsdx, lxsspx, lxsiwax, lxsiwzx, lxvd2x, lxvw4x, lxvdsx,
- stxsdx, stxsspx, stxsiwx, stxvd2x, stxvw4x

[paulus@ozlabs.org - some cleanups, fixes and rework, make it
 compile for Book E, fix build when PR KVM is built in]
Signed-off-by: NBin Lu <lblulb@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

6f63e81b

KVM: PPC: Provide functions for queueing up FP/VEC/VSX unavailable interrupts · 307d9279

由 Paul Mackerras 提交于 3月 22, 2017

This provides functions that can be used for generating interrupts
indicating that a given functional unit (floating point, vector, or
VSX) is unavailable.  These functions will be used in instruction
emulation code.
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

307d9279

10 4月, 2017 3 次提交

powerpc: Consolidate variants of real-mode MMIOs · d381d7ca

由 Benjamin Herrenschmidt 提交于 4月 05, 2017

We have all sort of variants of MMIO accessors for the real mode
instructions. This creates a clean set of accessors based on
Linux normal naming conventions, replacing all occurrences of
the old ones in the tree.

I have purposefully removed the "out/in" variants in favor of
only including __raw variants. Any code using these is already
pretty much hand tuned to operate in a very specific environment.
I've fixed up the 2 users (only one of them actually needed
a barrier in the first place).
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

d381d7ca

powerpc/kvm: Remove obsolete kvm_vm_ioctl_xics_irq declaration · f50d6bd3

由 Benjamin Herrenschmidt 提交于 4月 05, 2017

The function doesn't exist anymore
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f50d6bd3

powerpc/kvm: Make kvmppc_xics_create_icp static · 936774cd

由 Benjamin Herrenschmidt 提交于 4月 05, 2017

It's only used within the same file it's defined
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

936774cd

31 1月, 2017 3 次提交

KVM: PPC: Book3S HV: Outline of KVM-HV HPT resizing implementation · 5e985969

由 David Gibson 提交于 12月 20, 2016

This adds a not yet working outline of the HPT resizing PAPR
extension.  Specifically it adds the necessary ioctl() functions,
their basic steps, the work function which will handle preparation for
the resize, and synchronization between these, the guest page fault
path and guest HPT update path.

The actual guts of the implementation isn't here yet, so for now the
calls will always fail.
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

5e985969

KVM: PPC: Book3S HV: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size · f98a8bf9

由 David Gibson 提交于 12月 20, 2016

The KVM_PPC_ALLOCATE_HTAB ioctl() is used to set the size of hashed page
table (HPT) that userspace expects a guest VM to have, and is also used to
clear that HPT when necessary (e.g. guest reboot).

At present, once the ioctl() is called for the first time, the HPT size can
never be changed thereafter - it will be cleared but always sized as from
the first call.

With upcoming HPT resize implementation, we're going to need to allow
userspace to resize the HPT at reset (to change it back to the default size
if the guest changed it).

So, we need to allow this ioctl() to change the HPT size.

This patch also updates Documentation/virtual/kvm/api.txt to reflect
the new behaviour.  In fact the documentation was already slightly
incorrect since 572abd56 "KVM: PPC: Book3S HV: Don't fall back to
smaller HPT size in allocation ioctl"
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

f98a8bf9

KVM: PPC: Book3S HV: Split HPT allocation from activation · aae0777f

由 David Gibson 提交于 12月 20, 2016

Currently, kvmppc_alloc_hpt() both allocates a new hashed page table (HPT)
and sets it up as the active page table for a VM. For the upcoming HPT
resize implementation we're going to want to allocate HPTs separately from
activating them.

So, split the allocation itself out into kvmppc_allocate_hpt() and perform
the activation with a new kvmppc_set_hpt() function. Likewise we split
kvmppc_free_hpt(), which just frees the HPT, from kvmppc_release_hpt()
which unsets it as an active HPT, then frees it.

We also move the logic to fall back to smaller HPT sizes if the first try
fails into the single caller which used that behaviour,
kvmppc_hv_setup_htab_rma(). This introduces a slight semantic change, in
that previously if the initial attempt at CMA allocation failed, we would
fall back to attempting smaller sizes with the page allocator. Now, we
try first CMA, then the page allocator at each size. As far as I can tell
this change should be harmless.

To match, we make kvmppc_free_hpt() just free the actual HPT itself. The
call to kvmppc_free_lpid() that was there, we move to the single caller.
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

aae0777f

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功