提交 · df158189dbcc2e0ee29dc4b917d45ee5bf25a35e · openeuler / Kernel

17 5月, 2018 5 次提交

KVM: PPC: Book 3S HV: Do ptesync in radix guest exit path · df158189

由 Paul Mackerras 提交于 5月 17, 2018

A radix guest can execute tlbie instructions to invalidate TLB entries.
After a tlbie or a group of tlbies, it must then do the architected
sequence eieio; tlbsync; ptesync to ensure that the TLB invalidation
has been processed by all CPUs in the system before it can rely on
no CPU using any translation that it just invalidated.

In fact it is the ptesync which does the actual synchronization in
this sequence, and hardware has a requirement that the ptesync must
be executed on the same CPU thread as the tlbies which it is expected
to order. Thus, if a vCPU gets moved from one physical CPU to
another after it has done some tlbies but before it can get to do the
ptesync, the ptesync will not have the desired effect when it is
executed on the second physical CPU.

To fix this, we do a ptesync in the exit path for radix guests. If
there are any pending tlbies, this will wait for them to complete.
If there aren't, then ptesync will just do the same as sync.
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

df158189

KVM: PPC: Book3S HV: XIVE: Resend re-routed interrupts on CPU priority change · 9dc81d6b

由 Benjamin Herrenschmidt 提交于 5月 10, 2018

When a vcpu priority (CPPR) is set to a lower value (masking more
interrupts), we stop processing interrupts already in the queue
for the priorities that have now been masked.

If those interrupts were previously re-routed to a different
CPU, they might still be stuck until the older one that has
them in its queue processes them. In the case of guest CPU
unplug, that can be never.

To address that without creating additional overhead for
the normal interrupt processing path, this changes H_CPPR
handling so that when such a priority change occurs, we
scan the interrupt queue for that vCPU, and for any
interrupt in there that has been re-routed, we replace it
with a dummy and force a re-trigger.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

9dc81d6b

KVM: PPC: Book3S HV: Make radix clear pte when unmapping · 7e3d9a1d

由 Nicholas Piggin 提交于 5月 09, 2018

The current partition table unmap code clears the _PAGE_PRESENT bit
out of the pte, which leaves pud_huge/pmd_huge true and does not
clear pud_present/pmd_present.  This can confuse subsequent page
faults and possibly lead to the guest looping doing continual
hypervisor page faults.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

7e3d9a1d

KVM: PPC: Book3S HV: Make radix use correct tlbie sequence in kvmppc_radix_tlbie_page · e2560b10

由 Nicholas Piggin 提交于 5月 09, 2018

The standard eieio ; tlbsync ; ptesync must follow tlbie to ensure it
is ordered with respect to subsequent operations.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

e2560b10

KVM: PPC: Book3S HV: Snapshot timebase offset on guest entry · 57b8daa7

由 Paul Mackerras 提交于 4月 20, 2018

Currently, the HV KVM guest entry/exit code adds the timebase offset
from the vcore struct to the timebase on guest entry, and subtracts
it on guest exit. Which is fine, except that it is possible for
userspace to change the offset using the SET_ONE_REG interface while
the vcore is running, as there is only one timebase offset per vcore
but potentially multiple VCPUs in the vcore. If that were to happen,
KVM would subtract a different offset on guest exit from that which
it had added on guest entry, leading to the timebase being out of sync
between cores in the host, which then leads to bad things happening
such as hangs and spurious watchdog timeouts.

To fix this, we add a new field 'tb_offset_applied' to the vcore struct
which stores the offset that is currently applied to the timebase.
This value is set from the vcore tb_offset field on guest entry, and
is what is subtracted from the timebase on guest exit. Since it is
zero when the timebase offset is not applied, we can simplify the
logic in kvmhv_start_timing and kvmhv_accumulate_time.

In addition, we had secondary threads reading the timebase while
running concurrently with code on the primary thread which would
eventually add or subtract the timebase offset from the timebase.
This occurred while saving or restoring the DEC register value on
the secondary threads. Although no specific incorrect behaviour has
been observed, this is a race which should be fixed. To fix it, we
move the DEC saving code to just before we call kvmhv_commence_exit,
and the DEC restoring code to after the point where we have waited
for the primary thread to switch the MMU context and add the timebase
offset. That way we are sure that the timebase contains the guest
timebase value in both cases.
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

57b8daa7

27 4月, 2018 1 次提交

powerpc/kvm/booke: Fix altivec related build break · b2d7ecbe

由 Laurentiu Tudor 提交于 4月 26, 2018

Add missing "altivec unavailable" interrupt injection helper
thus fixing the linker error below:

arch/powerpc/kvm/emulate_loadstore.o: In function `kvmppc_check_altivec_disabled':
arch/powerpc/kvm/emulate_loadstore.c: undefined reference to `.kvmppc_core_queue_vec_unavail'

Fixes: 09f98496 ("KVM: PPC: Book3S: Add MMIO emulation for VMX instructions")
Signed-off-by: NLaurentiu Tudor <laurentiu.tudor@nxp.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

b2d7ecbe

11 4月, 2018 1 次提交

KVM: PPC: Book3S HV: trace_tlbie must not be called in realmode · 19ce7909

由 Nicholas Piggin 提交于 4月 06, 2018

This crashes with a "Bad real address for load" attempting to load
from the vmalloc region in realmode (faulting address is in DAR).

  Oops: Bad interrupt in KVM entry/exit code, sig: 6 [#1]
  LE SMP NR_CPUS=2048 NUMA PowerNV
  CPU: 53 PID: 6582 Comm: qemu-system-ppc Not tainted 4.16.0-01530-g43d1859f0994
  NIP:  c0000000000155ac LR: c0000000000c2430 CTR: c000000000015580
  REGS: c000000fff76dd80 TRAP: 0200   Not tainted  (4.16.0-01530-g43d1859f0994)
  MSR:  9000000000201003 <SF,HV,ME,RI,LE>  CR: 48082222  XER: 00000000
  CFAR: 0000000102900ef0 DAR: d00017fffd941a28 DSISR: 00000040 SOFTE: 3
  NIP [c0000000000155ac] perf_trace_tlbie+0x2c/0x1a0
  LR [c0000000000c2430] do_tlbies+0x230/0x2f0

I suspect the reason is the per-cpu data is not in the linear chunk.
This could be restored if that was able to be fixed, but for now,
just remove the tracepoints.

Fixes: 0428491c ("powerpc/mm: Trace tlbie(l) instructions")
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

19ce7909

03 4月, 2018 1 次提交

KVM: PPC: Book3S HV: Fix ppc_breakpoint_available compile error · e303c087

由 Nicholas Piggin 提交于 4月 01, 2018

arch/powerpc/kvm/book3s_hv.c: In function ‘kvmppc_h_set_mode’:
arch/powerpc/kvm/book3s_hv.c:745:8: error: implicit declaration of function ‘ppc_breakpoint_available’
   if (!ppc_breakpoint_available())
        ^~~~~~~~~~~~~~~~~~~~~~~~

Fixes: 398e712c ("KVM: PPC: Book3S HV: Return error from h_set_mode(SET_DAWR) on POWER9")
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

e303c087

31 3月, 2018 2 次提交

powerpc/64s: Remove POWER4 support · 471d7ff8

由 Nicholas Piggin 提交于 2月 21, 2018

POWER4 has been broken since at least the change 49d09bf2
("powerpc/64s: Optimise MSR handling in exception handling"), which
requires mtmsrd L=1 support. This was introduced in ISA v2.01, and
POWER4 supports ISA v2.00.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

471d7ff8

powerpc/kvm: Fix guest boot failure on Power9 since DAWR changes · ca9a16c3

由 Aneesh Kumar K.V 提交于 3月 30, 2018

SLOF checks for 'sc 1' (hypercall) support by issuing a hcall with
H_SET_DABR. Since the recent commit e8ebedbf ("KVM: PPC: Book3S
HV: Return error from h_set_dabr() on POWER9") changed H_SET_DABR to
return H_UNSUPPORTED on Power9, we see guest boot failures, the
symptom is the boot seems to just stop in SLOF, eg:

  SLOF ***************************************************************
  QEMU Starting
   Build Date = Sep 24 2017 12:23:07
   FW Version = buildd@ release 20170724
  <no further output>

SLOF can cope if H_SET_DABR returns H_HARDWARE. So wwitch the return
value to H_HARDWARE instead of H_UNSUPPORTED so that we don't break
the guest boot.

That does mean we return a different error to PowerVM in this case,
but that's probably not a big concern.

Fixes: e8ebedbf ("KVM: PPC: Book3S HV: Return error from h_set_dabr() on POWER9")
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

ca9a16c3

30 3月, 2018 3 次提交

powerpc/64s: Allocate LPPACAs individually · 499dcd41

由 Nicholas Piggin 提交于 2月 14, 2018

We no longer allocate lppacas in an array, so this patch removes the
1kB static alignment for the structure, and enforces the PAPR
alignment requirements at allocation time. We can not reduce the 1kB
allocation size however, due to existing KVM hypervisors.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

499dcd41

powerpc/64: Use array of paca pointers and allocate pacas individually · d2e60075

由 Nicholas Piggin 提交于 2月 14, 2018

Change the paca array into an array of pointers to pacas. Allocate
pacas individually.

This allows flexibility in where the PACAs are allocated. Future work
will allocate them node-local. Platforms that don't have address limits
on PACAs would be able to defer PACA allocations until later in boot
rather than allocate all possible ones up-front then freeing unused.

This is slightly more overhead (one additional indirection) for cross
CPU paca references, but those aren't too common.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

d2e60075

powerpc/64s: Do not allocate lppaca if we are not virtualized · 8e0b634b

由 Nicholas Piggin 提交于 2月 14, 2018

The "lppaca" is a structure registered with the hypervisor. This is
unnecessary when running on non-virtualised platforms. One field from
the lppaca (pmcregs_in_use) is also used by the host, so move the host
part out into the paca (lppaca field is still updated in
guest mode).
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
[mpe: Fix non-pseries build with some #ifdefs]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

8e0b634b

28 3月, 2018 1 次提交

KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot() in page fault handler · 31c8b0d0

由 Paul Mackerras 提交于 3月 01, 2018

This changes the hypervisor page fault handler for radix guests to use
the generic KVM __gfn_to_pfn_memslot() function instead of using
get_user_pages_fast() and then handling the case of VM_PFNMAP vmas
specially.  The old code missed the case of VM_IO vmas; with this
change, VM_IO vmas will now be handled correctly by code within
__gfn_to_pfn_memslot.

Currently, __gfn_to_pfn_memslot calls hva_to_pfn, which only uses
__get_user_pages_fast for the initial lookup in the cases where
either atomic or async is set.  Since we are not setting either
atomic or async, we do our own __get_user_pages_fast first, for now.

This also adds code to check for the KVM_MEM_READONLY flag on the
memslot.  If it is set and this is a write access, we synthesize a
data storage interrupt for the guest.

In the case where the page is not normal RAM (i.e. page == NULL in
kvmppc_book3s_radix_page_fault(), we read the PTE from the Linux page
tables because we need the mapping attribute bits as well as the PFN.
(The mapping attribute bits indicate whether accesses have to be
non-cacheable and/or guarded.)
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

31c8b0d0

27 3月, 2018 3 次提交

KVM: PPC: Book3S HV: Handle migration with POWER9 disabled DAWR · b53221e7

由 Michael Neuling 提交于 3月 27, 2018

POWER9 with the DAWR disabled causes problems for partition
migration. Either we have to fail the migration (since we lose the
DAWR) or we silently drop the DAWR and allow the migration to pass.

This patch does the latter and allows the migration to pass (at the
cost of silently losing the DAWR). This is not ideal but hopefully the
best overall solution. This approach has been acked by Paulus.

With this patch kvmppc_set_one_reg() will store the DAWR in the vcpu
but won't actually set it on POWER9 hardware.
Signed-off-by: NMichael Neuling <mikey@neuling.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

b53221e7

KVM: PPC: Book3S HV: Return error from h_set_dabr() on POWER9 · e8ebedbf

由 Michael Neuling 提交于 3月 27, 2018

POWER7 compat mode guests can use h_set_dabr on POWER9. POWER9 should
use the DAWR but since it's disabled there we can't.

This returns H_UNSUPPORTED on a h_set_dabr() on POWER9 where the DAWR
is disabled.

Current Linux guests ignore this error, so they will silently not get
the DAWR (sigh). The same error code is being used by POWERVM in this
case.
Signed-off-by: NMichael Neuling <mikey@neuling.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

e8ebedbf

KVM: PPC: Book3S HV: Return error from h_set_mode(SET_DAWR) on POWER9 · 398e712c

由 Michael Neuling 提交于 3月 27, 2018

Return H_P2 on a h_set_mode(SET_DAWR) on POWER9 where the DAWR is
disabled.

Current Linux guests ignore this error, so they will silently not get
the DAWR (sigh). The same error code is being used by POWERVM in this
case.
Signed-off-by: NMichael Neuling <mikey@neuling.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

398e712c

23 3月, 2018 5 次提交

KVM: PPC: Book3S HV: Work around TEXASR bug in fake suspend state · 681c617b

由 Paul Mackerras 提交于 3月 21, 2018

This works around a hardware bug in "Nimbus" POWER9 DD2.2 processors,
where the contents of the TEXASR can get corrupted while a thread is
in fake suspend state.  The workaround is for the instruction emulation
code to use the value saved at the most recent guest exit in real
suspend mode.  We achieve this by simply not saving the TEXASR into
the vcpu struct on an exit in fake suspend state.  We also have to
take care to set the orig_texasr field only on guest exit in real
suspend state.

This also means that on guest entry in fake suspend state, TEXASR
will be restored to the value it had on the last exit in real suspend
state, effectively counteracting any hardware-caused corruption.  This
works because TEXASR may not be written in suspend state.

With this, the guest might see the wrong values in TEXASR if it reads
it while in suspend state, but will see the correct value in
non-transactional state (e.g. after a treclaim), and treclaim will
work correctly.

With this workaround, the code will actually run slightly faster, and
will operate correctly on systems without the TEXASR bug (since TEXASR
may not be written in suspend state, and is only changed by failure
recording, which will have already been done before we get into fake
suspend state).  Therefore these changes are not made subject to a CPU
feature bit.
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

681c617b

KVM: PPC: Book3S HV: Work around XER[SO] bug in fake suspend mode · 87a11bb6

由 Suraj Jitindar Singh 提交于 3月 21, 2018

This works around a hardware bug in "Nimbus" POWER9 DD2.2 processors,
where a treclaim performed in fake suspend mode can cause subsequent
reads from the XER register to return inconsistent values for the SO
(summary overflow) bit.  The inconsistent SO bit state can potentially
be observed on any thread in the core.  We have to do the treclaim
because that is the only way to get the thread out of suspend state
(fake or real) and into non-transactional state.

The workaround for the bug is to force the core into SMT4 mode before
doing the treclaim.  This patch adds the code to do that, conditional
on the CPU_FTR_P9_TM_XER_SO_BUG feature bit.
Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

87a11bb6

KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9 · 4bb3c7a0

由 Paul Mackerras 提交于 3月 21, 2018

POWER9 has hardware bugs relating to transactional memory and thread
reconfiguration (changes to hardware SMT mode). Specifically, the core
does not have enough storage to store a complete checkpoint of all the
architected state for all four threads. The DD2.2 version of POWER9
includes hardware modifications designed to allow hypervisor software
to implement workarounds for these problems. This patch implements
those workarounds in KVM code so that KVM guests see a full, working
transactional memory implementation.

The problems center around the use of TM suspended state, where the
CPU has a checkpointed state but execution is not transactional. The
workaround is to implement a "fake suspend" state, which looks to the
guest like suspended state but the CPU does not store a checkpoint.
In this state, any instruction that would cause a transition to
transactional state (rfid, rfebb, mtmsrd, tresume) or would use the
checkpointed state (treclaim) causes a "soft patch" interrupt (vector
0x1500) to the hypervisor so that it can be emulated. The trechkpt
instruction also causes a soft patch interrupt.

On POWER9 DD2.2, we avoid returning to the guest in any state which
would require a checkpoint to be present. The trechkpt in the guest
entry path which would normally create that checkpoint is replaced by
either a transition to fake suspend state, if the guest is in suspend
state, or a rollback to the pre-transactional state if the guest is in
transactional state. Fake suspend state is indicated by a flag in the
PACA plus a new bit in the PSSCR. The new PSSCR bit is write-only and
reads back as 0.

On exit from the guest, if the guest is in fake suspend state, we still
do the treclaim instruction as we would in real suspend state, in order
to get into non-transactional state, but we do not save the resulting
register state since there was no checkpoint.

Emulation of the instructions that cause a softpatch interrupt is
handled in two paths. If the guest is in real suspend mode, we call
kvmhv_p9_tm_emulation_early() to handle the cases where the guest is
transitioning to transactional state. This is called before we do the
treclaim in the guest exit path; because we haven't done treclaim, we
can get back to the guest with the transaction still active. If the
instruction is a case that kvmhv_p9_tm_emulation_early() doesn't
handle, or if the guest is in fake suspend state, then we proceed to
do the complete guest exit path and subsequently call
kvmhv_p9_tm_emulation() in host context with the MMU on. This handles
all the cases including the cases that generate program interrupts
(illegal instruction or TM Bad Thing) and facility unavailable
interrupts.

The emulation is reasonably straightforward and is mostly concerned
with checking for exception conditions and updating the state of
registers such as MSR and CR0. The treclaim emulation takes care to
ensure that the TEXASR register gets updated as if it were the guest
treclaim instruction that had done failure recording, not the treclaim
done in hypervisor state in the guest exit path.

With this, the KVM_CAP_PPC_HTM capability returns true (1) even if
transactional memory is not available to host userspace.
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

4bb3c7a0

powerpc/mm: Fixup tlbie vs store ordering issue on POWER9 · a5d4b589

由 Aneesh Kumar K.V 提交于 3月 23, 2018

On POWER9, under some circumstances, a broadcast TLB invalidation
might complete before all previous stores have drained, potentially
allowing stale stores from becoming visible after the invalidation.
This works around it by doubling up those TLB invalidations which was
verified by HW to be sufficient to close the risk window.

This will be documented in a yet-to-be-published errata.

Fixes: 1a472c9d ("powerpc/mm/radix: Add tlbflush routines")
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Enable the feature in the DT CPU features code for all Power9,
      rename the feature to CPU_FTR_P9_TLBIE_BUG per benh.]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

a5d4b589

KVM: PPC: Book3S HV: Fix duplication of host SLB entries · cda4a147

由 Paul Mackerras 提交于 3月 22, 2018

Since commit 6964e6a4 ("KVM: PPC: Book3S HV: Do SLB load/unload
with guest LPCR value loaded", 2018-01-11), we have been seeing
occasional machine check interrupts on POWER8 systems when running
KVM guests, due to SLB multihit errors.

This turns out to be due to the guest exit code reloading the host
SLB entries from the SLB shadow buffer when the SLB was not previously
cleared in the guest entry path. This can happen because the path
which skips from the guest entry code to the guest exit code without
entering the guest now does the skip before the SLB is cleared and
loaded with guest values, but the host values are loaded after the
point in the guest exit path that we skip to.

To fix this, we move the code that reloads the host SLB values up
so that it occurs just before the point in the guest exit code (the
label guest_bypass:) where we skip to from the guest entry path.
Reported-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Fixes: 6964e6a4 ("KVM: PPC: Book3S HV: Do SLB load/unload with guest LPCR value loaded")
Tested-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

cda4a147

19 3月, 2018 4 次提交

KVM: PPC: Book3S HV: Handle 1GB pages in radix page fault handler · 58c5c276

由 Paul Mackerras 提交于 2月 24, 2018

This adds code to the radix hypervisor page fault handler to handle the
case where the guest memory is backed by 1GB hugepages, and put them
into the partition-scoped radix tree at the PUD level. The code is
essentially analogous to the code for 2MB pages. This also rearranges
kvmppc_create_pte() to make it easier to follow.
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

58c5c276

KVM: PPC: Book3S HV: Streamline setting of reference and change bits · f7caf712

由 Paul Mackerras 提交于 2月 24, 2018

When using the radix MMU, we can get hypervisor page fault interrupts
with the DSISR_SET_RC bit set in DSISR/HSRR1, indicating that an
attempt to set the R (reference) or C (change) bit in a PTE atomically
failed. Previously we would find the corresponding Linux PTE and
check the permission and dirty bits there, but this is not really
necessary since we only need to do what the hardware was trying to
do, namely set R or C atomically. This removes the code that reads
the Linux PTE and just update the partition-scoped PTE, having first
checked that it is still present, and if the access is a write, that
the PTE still has write permission.

Furthermore, we now check whether any other relevant bits are set
in DSISR, and if there are, then we proceed with the rest of the
function in order to handle whatever condition they represent,
instead of returning to the guest as we did previously.
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

f7caf712

KVM: PPC: Book3S HV: Radix page fault handler optimizations · c4c8a764

由 Paul Mackerras 提交于 2月 23, 2018

This improves the handling of transparent huge pages in the radix
hypervisor page fault handler.  Previously, if a small page is faulted
in to a 2MB region of guest physical space, that means that there is
a page table pointer at the PMD level, which could never be replaced
by a leaf (2MB) PMD entry.  This adds the code to clear the PMD,
invlidate the page walk cache and free the page table page in this
situation, so that the leaf PMD entry can be created.

This also adds code to check whether a PMD or PTE being inserted is
the same as is already there (because of a race with another CPU that
faulted on the same page) and if so, we don't replace the existing
entry, meaning that we don't invalidate the PTE or PMD and do a TLB
invalidation.
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

c4c8a764

KVM: PPC: Remove unused kvm_unmap_hva callback · 39c983ea

由 Paul Mackerras 提交于 2月 22, 2018

Since commit fb1522e0 ("KVM: update to new mmu_notifier semantic
v2", 2017-08-31), the MMU notifier code in KVM no longer calls the
kvm_unmap_hva callback.  This removes the PPC implementations of
kvm_unmap_hva().
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

39c983ea

14 3月, 2018 1 次提交

KVM: PPC: Book3S HV: Fix trap number return from __kvmppc_vcore_entry · a8b48a4d

由 Paul Mackerras 提交于 3月 07, 2018

This fixes a bug where the trap number that is returned by
__kvmppc_vcore_entry gets corrupted.  The effect of the corruption
is that IPIs get ignored on POWER9 systems when the IPI is sent via
a doorbell interrupt to a CPU which is executing in a KVM guest.
The effect of the IPI being ignored is often that another CPU locks
up inside smp_call_function_many() (and if that CPU is holding a
spinlock, other CPUs then lock up inside raw_spin_lock()).

The trap number is currently held in register r12 for most of the
assembly-language part of the guest exit path.  In that path, we
call kvmppc_subcore_exit_guest(), which is a C function, without
restoring r12 afterwards.  Depending on the kernel config and the
compiler, it may modify r12 or it may not, so some config/compiler
combinations see the bug and others don't.

To fix this, we arrange for the trap number to be stored on the
stack from the 'guest_bypass:' label until the end of the function,
then the trap number is loaded and returned in r12 as before.

Cc: stable@vger.kernel.org # v4.8+
Fixes: fd7bacbc ("KVM: PPC: Book3S HV: Fix TB corruption in guest exit path on HMI interrupt")
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

a8b48a4d

03 3月, 2018 1 次提交

KVM: PPC: Book3S HV: Fix guest time accounting with VIRT_CPU_ACCOUNTING_GEN · 61bd0f66

由 Laurent Vivier 提交于 3月 02, 2018

Since commit 8b24e69f ("KVM: PPC: Book3S HV: Close race with testing
for signals on guest entry"), if CONFIG_VIRT_CPU_ACCOUNTING_GEN is set, the
guest time is not accounted to guest time and user time, but instead to
system time.

This is because guest_enter()/guest_exit() are called while interrupts
are disabled and the tick counter cannot be updated between them.

To fix that, move guest_exit() after local_irq_enable(), and as
guest_enter() is called with IRQ disabled, call guest_enter_irqoff()
instead.

Fixes: 8b24e69f ("KVM: PPC: Book3S HV: Close race with testing for signals on guest entry")
Signed-off-by: NLaurent Vivier <lvivier@redhat.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

61bd0f66

02 3月, 2018 2 次提交

KVM: PPC: Book3S HV: Fix VRMA initialization with 2MB or 1GB memory backing · debd574f

由 Paul Mackerras 提交于 3月 02, 2018

The current code for initializing the VRMA (virtual real memory area)
for HPT guests requires the page size of the backing memory to be one
of 4kB, 64kB or 16MB.  With a radix host we have the possibility that
the backing memory page size can be 2MB or 1GB.  In these cases, if the
guest switches to HPT mode, KVM will not initialize the VRMA and the
guest will fail to run.

In fact it is not necessary that the VRMA page size is the same as the
backing memory page size; any VRMA page size less than or equal to the
backing memory page size is acceptable.  Therefore we now choose the
largest page size out of the set {4k, 64k, 16M} which is not larger
than the backing memory page size.
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

debd574f

KVM: PPC: Book3S HV: Fix handling of large pages in radix page fault handler · c3856aeb

由 Paul Mackerras 提交于 2月 23, 2018

This fixes several bugs in the radix page fault handler relating to
the way large pages in the memory backing the guest were handled.
First, the check for large pages only checked for explicit huge pages
and missed transparent huge pages.  Then the check that the addresses
(host virtual vs. guest physical) had appropriate alignment was
wrong, meaning that the code never put a large page in the partition
scoped radix tree; it was always demoted to a small page.

Fixing this exposed bugs in kvmppc_create_pte().  We were never
invalidating a 2MB PTE, which meant that if a page was initially
faulted in without write permission and the guest then attempted
to store to it, we would never update the PTE to have write permission.
If we find a valid 2MB PTE in the PMD, we need to clear it and
do a TLB invalidation before installing either the new 2MB PTE or
a pointer to a page table page.

This also corrects an assumption that get_user_pages_fast would set
the _PAGE_DIRTY bit if we are writing, which is not true.  Instead we
mark the page dirty explicitly with set_page_dirty_lock().  This
also means we don't need the dirty bit set on the host PTE when
providing write access on a read fault.
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

c3856aeb

22 2月, 2018 1 次提交

treewide/trivial: Remove ';;$' typo noise · ed7158ba

由 Ingo Molnar 提交于 2月 22, 2018

On lkml suggestions were made to split up such trivial typo fixes into per subsystem
patches:

  --- a/arch/x86/boot/compressed/eboot.c
  +++ b/arch/x86/boot/compressed/eboot.c
  @@ -439,7 +439,7 @@ setup_uga32(void **uga_handle, unsigned long size, u32 *width, u32 *height)
          struct efi_uga_draw_protocol *uga = NULL, *first_uga;
          efi_guid_t uga_proto = EFI_UGA_PROTOCOL_GUID;
          unsigned long nr_ugas;
  -       u32 *handles = (u32 *)uga_handle;;
  +       u32 *handles = (u32 *)uga_handle;
          efi_status_t status = EFI_INVALID_PARAMETER;
          int i;

This patch is the result of the following script:

  $ sed -i 's/;;$/;/g' $(git grep -E ';;$'  | grep "\.[ch]:"  | grep -vwE 'for|ia64' | cut -d: -f1 | sort | uniq)

... followed by manual review to make sure it's all good.

Splitting this up is just crazy talk, let's get over with this and just do it.
Reported-by: NPavel Machek <pavel@ucw.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

ed7158ba

13 2月, 2018 2 次提交

KVM: PPC: Book3S: Fix compile error that occurs with some gcc versions · 6df3877f

由 Paul Mackerras 提交于 2月 13, 2018

Some versions of gcc generate a warning that the variable "emulated"
may be used uninitialized in function kvmppc_handle_load128_by2x64().
It would be used uninitialized if kvmppc_handle_load128_by2x64 was
ever called with vcpu->arch.mmio_vmx_copy_nums == 0, but neither of
the callers ever do that, so there is no actual bug. When gcc
generates a warning, it causes the build to fail because arch/powerpc
is compiled with -Werror.

This silences the warning by initializing "emulated" to EMULATE_DONE.

Fixes: 09f98496 ("KVM: PPC: Book3S: Add MMIO emulation for VMX instructions")
Reported-by: NMichael Ellerman <mpe@ellerman.id.au>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

6df3877f

KVM: PPC: Fix compile error that occurs when CONFIG_ALTIVEC=n · c662f773

由 Paul Mackerras 提交于 2月 13, 2018

Commit accb757d ("KVM: Move vcpu_load to arch-specific
kvm_arch_vcpu_ioctl_run", 2017-12-04) added a "goto out"
statement and an "out:" label to kvm_arch_vcpu_ioctl_run().
Since the only "goto out" is inside a CONFIG_VSX block,
compiling with CONFIG_VSX=n gives a warning that label "out"
is defined but not used, and because arch/powerpc is compiled
with -Werror, that becomes a compile error that makes the kernel
build fail.

Merge commit 1ab03c07 ("Merge tag 'kvm-ppc-next-4.16-2' of
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc",
2018-02-09) added a similar block of code inside a #ifdef
CONFIG_ALTIVEC, with a "goto out" statement.

In order to make the build succeed, this adds a #ifdef around the
"out:" label.  This is a minimal, ugly fix, to be replaced later
by a refactoring of the code.  Since CONFIG_VSX depends on
CONFIG_ALTIVEC, it is sufficient to use #ifdef CONFIG_ALTIVEC here.

Fixes: accb757d ("KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_run")
Reported-by: NChristian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

c662f773

09 2月, 2018 4 次提交

KVM: PPC: Book3S: Add MMIO emulation for VMX instructions · 09f98496

由 Jose Ricardo Ziviani 提交于 2月 03, 2018

This patch provides the MMIO load/store vector indexed
X-Form emulation.

Instructions implemented:
lvx: the quadword in storage addressed by the result of EA &
0xffff_ffff_ffff_fff0 is loaded into VRT.

stvx: the contents of VRS are stored into the quadword in storage
addressed by the result of EA & 0xffff_ffff_ffff_fff0.
Reported-by: NGopesh Kumar Chaudhary <gopchaud@in.ibm.com>
Reported-by: NBalamuruhan S <bala24@linux.vnet.ibm.com>
Signed-off-by: NJose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

09f98496

KVM: PPC: Book3S HV: Branch inside feature section · d20fe50a

由 Alexander Graf 提交于 2月 08, 2018

We ended up with code that did a conditional branch inside a feature
section to code outside of the feature section. Depending on how the
object file gets organized, that might mean we exceed the 14bit
relocation limit for conditional branches:

arch/powerpc/kvm/built-in.o:arch/powerpc/kvm/book3s_hv_rmhandlers.S:416:(__ftr_alt_97+0x8): relocation truncated to fit: R_PPC64_REL14 against `.text'+1ca4

So instead of doing a conditional branch outside of the feature section,
let's just jump at the end of the same, making the branch very short.
Signed-off-by: NAlexander Graf <agraf@suse.de>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

d20fe50a

KVM: PPC: Book3S HV: Make HPT resizing work on POWER9 · 790a9df5

由 David Gibson 提交于 2月 02, 2018

This adds code to enable the HPT resizing code to work on POWER9,
which uses a slightly modified HPT entry format compared to POWER8.
On POWER9, we convert HPTEs read from the HPT from the new format to
the old format so that the rest of the HPT resizing code can work as
before.  HPTEs written to the new HPT are converted to the new format
as the last step before writing them into the new HPT.

This takes out the checks added by commit bcd3bb63 ("KVM: PPC:
Book3S HV: Disable HPT resizing on POWER9 for now", 2017-02-18),
now that HPT resizing works on POWER9.

On POWER9, when we pivot to the new HPT, we now call
kvmppc_setup_partition_table() to update the partition table in order
to make the hardware use the new HPT.

[paulus@ozlabs.org - added kvmppc_setup_partition_table() call,
 wrote commit message.]
Tested-by: NLaurent Vivier <lvivier@redhat.com>
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

790a9df5

KVM: PPC: Book3S HV: Fix handling of secondary HPTEG in HPT resizing code · 05f2bb03

由 Paul Mackerras 提交于 2月 07, 2018

This fixes the computation of the HPTE index to use when the HPT
resizing code encounters a bolted HPTE which is stored in its
secondary HPTE group.  The code inverts the HPTE group number, which
is correct, but doesn't then mask it with new_hash_mask.  As a result,
new_pteg will be effectively negative, resulting in new_hptep
pointing before the new HPT, which will corrupt memory.

In addition, this removes two BUG_ON statements.  The condition that
the BUG_ONs were testing -- that we have computed the hash value
incorrectly -- has never been observed in testing, and if it did
occur, would only affect the guest, not the host.  Given that
BUG_ON should only be used in conditions where the kernel (i.e.
the host kernel, in this case) can't possibly continue execution,
it is not appropriate here.
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

05f2bb03

08 2月, 2018 1 次提交

KVM: PPC: Book3S PR: Fix broken select due to misspelling · 57ea5f16

由 Ulf Magnusson 提交于 2月 05, 2018

Commit 76d837a4 ("KVM: PPC: Book3S PR: Don't include SPAPR TCE code
on non-pseries platforms") added a reference to the globally undefined
symbol PPC_SERIES. Looking at the rest of the commit, PPC_PSERIES was
probably intended.

Change PPC_SERIES to PPC_PSERIES.

Discovered with the
https://github.com/ulfalizer/Kconfiglib/blob/master/examples/list_undefined.py
script.

Fixes: 76d837a4 ("KVM: PPC: Book3S PR: Don't include SPAPR TCE code on non-pseries platforms")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: NUlf Magnusson <ulfalizer@gmail.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

57ea5f16

01 2月, 2018 2 次提交

KVM: PPC: Book3S PR: Fix svcpu copying with preemption enabled · 07ae5389

由 Alexander Graf 提交于 1月 31, 2018

When copying between the vcpu and svcpu, we may get scheduled away onto
a different host CPU which in turn means our svcpu pointer may change.

That means we need to atomically copy to and from the svcpu with preemption
disabled, so that all code around it always sees a coherent state.
Reported-by: NSimon Guo <wei.guo.simon@gmail.com>
Fixes: 3d3319b4 ("KVM: PPC: Book3S: PR: Enable interrupts earlier")
Signed-off-by: NAlexander Graf <agraf@suse.de>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

07ae5389

KVM: PPC: Book3S HV: Drop locks before reading guest memory · 36ee41d1

由 Paul Mackerras 提交于 1月 30, 2018

Running with CONFIG_DEBUG_ATOMIC_SLEEP reveals that HV KVM tries to
read guest memory, in order to emulate guest instructions, while
preempt is disabled and a vcore lock is held.  This occurs in
kvmppc_handle_exit_hv(), called from post_guest_process(), when
emulating guest doorbell instructions on POWER9 systems, and also
when checking whether we have hit a hypervisor breakpoint.
Reading guest memory can cause a page fault and thus cause the
task to sleep, so we need to avoid reading guest memory while
holding a spinlock or when preempt is disabled.

To fix this, we move the preempt_enable() in kvmppc_run_core() to
before the loop that calls post_guest_process() for each vcore that
has just run, and we drop and re-take the vcore lock around the calls
to kvmppc_emulate_debug_inst() and kvmppc_emulate_doorbell_instr().

Dropping the lock is safe with respect to the iteration over the
runnable vcpus in post_guest_process(); for_each_runnable_thread
is actually safe to use locklessly.  It is possible for a vcpu
to become runnable and add itself to the runnable_threads array
(code near the beginning of kvmppc_run_vcpu()) and then get included
in the iteration in post_guest_process despite the fact that it
has not just run.  This is benign because vcpu->arch.trap and
vcpu->arch.ceded will be zero.

Cc: stable@vger.kernel.org # v4.13+
Fixes: 57900694 ("KVM: PPC: Book3S HV: Virtualize doorbell facility on POWER9")
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

36ee41d1

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功