Commits · c0ab85267e25e34ce8b7e4429f0ef01fa0795b80 · openanolis / cloud-kernel

24 May, 2018 2 commits

bpf: powerpc64: add JIT support for multi-function programs · 8484ce83

Authored by Sandipan Das on May 24, 2018

This adds support for bpf-to-bpf function calls in the powerpc64
JIT compiler. The JIT compiler converts the bpf call instructions
to native branch instructions. After a round of the usual passes,
the start addresses of the JITed images for the callee functions
are known. Finally, to fixup the branch target addresses, we need
to perform an extra pass.

Because of the address range in which JITed images are allocated
on powerpc64, the offsets of the start addresses of these images
from __bpf_call_base are as large as 64 bits. So, for a function
call, we cannot use the imm field of the instruction to determine
the callee's address. Instead, we use the alternative method of
getting it from the list of function addresses in the auxiliary
data of the caller by using the off field as an index.
Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

8484ce83
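
For commit 8484ce83, the lookup it describes (using the off field as an index into the caller's list of function addresses) can be sketched in plain user-space C. This is only an illustration: `sketch_insn`, `sketch_aux` and `sketch_callee_addr` are hypothetical names, not the kernel's own types or helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a BPF call instruction: only the two
 * fields the commit message mentions. */
struct sketch_insn {
	int16_t off;	/* used here as an index into the function list */
	int32_t imm;	/* 32 bits: too narrow for a 64-bit offset */
};

/* Hypothetical stand-in for the caller's auxiliary data. */
struct sketch_aux {
	const uint64_t *func_addrs;	/* start addresses of JITed callees */
	size_t func_cnt;
};

/* Stores the callee address and returns 0 on success, or returns -1
 * if the index taken from the off field is out of range. */
static int sketch_callee_addr(const struct sketch_aux *aux,
			      const struct sketch_insn *insn,
			      uint64_t *addr)
{
	if (insn->off < 0 || (size_t)insn->off >= aux->func_cnt)
		return -1;
	*addr = aux->func_addrs[insn->off];
	return 0;
}
```

The point of the indirection is that a full 64-bit address fits in the table entry even though it would not fit in the 32-bit imm field.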

bpf: powerpc64: pad function address loads with NOPs · 4ea69b2f

Authored by Sandipan Das on May 24, 2018

For multi-function programs, loading the address of a callee
function to a register requires emitting instructions whose
count varies from one to five depending on the nature of the
address.

Since we come to know of the callee's address only before the
extra pass, the number of instructions required to load this
address may vary from what was previously generated. This can
make the JITed image grow or shrink.

To avoid this, we should generate a constant five-instruction
sequence when loading function addresses by padding the optimized load
sequence with NOPs.
Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

4ea69b2f
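
For commit 4ea69b2f, the padding idea can be sketched as follows. The optimized load sequence itself is not modelled here; the sketch assumes the caller has already emitted between one and five instruction words and only fills the remaining slots. `sketch_pad_load` and the `SKETCH_` constants are hypothetical names; `0x60000000` is the powerpc `ori 0,0,0` no-op encoding.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_LOAD_INSNS 5		/* fixed sequence length from the commit message */
#define SKETCH_PPC_NOP 0x60000000u	/* "ori 0,0,0", the powerpc no-op */

/* Pad an already-emitted load sequence of `emitted` instructions up to
 * SKETCH_LOAD_INSNS words. Returns the total length, which is always
 * SKETCH_LOAD_INSNS, or 0 if `emitted` is outside the range 1..5. */
static size_t sketch_pad_load(uint32_t buf[SKETCH_LOAD_INSNS], size_t emitted)
{
	if (emitted < 1 || emitted > SKETCH_LOAD_INSNS)
		return 0;
	for (size_t i = emitted; i < SKETCH_LOAD_INSNS; i++)
		buf[i] = SKETCH_PPC_NOP;
	return SKETCH_LOAD_INSNS;
}
```

Because the returned length does not depend on the address being loaded, a later pass can rewrite the sequence in place without moving any code that follows it.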

22 May, 2018 1 commit

powerpc/64s: Add support for a store forwarding barrier at kernel entry/exit · a048a07d

Authored by Nicholas Piggin on May 22, 2018

On some CPUs we can prevent a vulnerability related to store-to-load
forwarding by preventing store forwarding between privilege domains,
by inserting a barrier in kernel entry and exit paths.

This is known to be the case on at least Power7, Power8 and Power9
powerpc CPUs.

Barriers must generally be inserted before the first load after moving
to a higher privilege, and after the last store before moving to a
lower privilege. HV and PR privilege transitions must be protected.

Barriers are added as patch sections, with all kernel/hypervisor entry
points patched, and the exit points to lower privilege levels patched
similarly to the RFI flush patching.

Firmware advertisement is not implemented yet, so CPU flush types
are hard coded.

Thanks to Michal Suchánek for bug fixes and review.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michal Suchánek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a048a07d

18 May, 2018 1 commit

powerpc/64s: Clear PCR on boot · faf37c44

Authored by Michael Neuling on May 18, 2018

Clear the PCR (Processor Compatibility Register) on boot to ensure we
are not running in a compatibility mode.

We've seen this cause problems when a crash (and kdump) occurs while
running compat mode guests. The kdump kernel then runs with the PCR
set and causes problems. The symptom in the kdump kernel (also seen in
petitboot after fast-reboot) is early userspace programs taking
sigills on newer instructions (seen in libc).
Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: stable@vger.kernel.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

faf37c44

17 May, 2018 7 commits

powerpc/powernv: Fix NVRAM sleep in invalid context when crashing · c1d2a313

Authored by Nicholas Piggin on May 15, 2018

Similarly to opal_event_shutdown, opal_nvram_write can be called in
the crash path with irqs disabled. Special case the delay to avoid
sleeping in invalid context.

Fixes: 3b807033 ("powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops")
Cc: stable@vger.kernel.org # v3.2
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

c1d2a313

powerpc: Allow LD_DEAD_CODE_DATA_ELIMINATION to be selected · 4c1d9bb0

Authored by Nicholas Piggin on May 09, 2018

This requires further changes to the linker script to KEEP some tables
and wildcard compiler generated sections into the right place. This
includes ppc32 modifications from Christophe Leroy.

When compiling powernv_defconfig with this option, the resulting
kernel is almost 400kB smaller (and still boots):

    text      data       bss        dec   filename
11827621   4810490   1341080   17979191   vmlinux
11752437   4598858   1338776   17690071   vmlinux.dcde

Mathieu's numbers for a custom Mac Mini G4 config show an almost 200kB
saving. It also had some increase in vmlinux size for as-yet
unknown reasons.

    text      data       bss        dec   filename
 7461457   2475122   1428064   11364643   vmlinux
 7386425   2364370   1425432   11176227   vmlinux.dcde

Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> [8xx]
Tested-by: Mathieu Malaterre <malat@debian.org> [32-bit powermac]
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>

4c1d9bb0
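
For commit 4c1d9bb0, the savings can be read off the `dec` column of the two `size` tables by subtraction. The helper below is a trivial sketch with a hypothetical name, `sketch_dec_saving`. With the numbers quoted in the commit message, the `dec` totals differ by 289,120 bytes for powernv_defconfig and by 188,416 bytes for the Mac Mini G4 config.

```c
/* Bytes saved between two `size` runs, taken from their "dec" totals. */
static long sketch_dec_saving(long dec_before, long dec_after)
{
	return dec_before - dec_after;
}
```

For example, `sketch_dec_saving(17979191L, 17690071L)` evaluates to 289120 and `sketch_dec_saving(11364643L, 11176227L)` to 188416.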

KVM: PPC: Book 3S HV: Do ptesync in radix guest exit path · df158189

Authored by Paul Mackerras on May 17, 2018

A radix guest can execute tlbie instructions to invalidate TLB entries.
After a tlbie or a group of tlbies, it must then do the architected
sequence eieio; tlbsync; ptesync to ensure that the TLB invalidation
has been processed by all CPUs in the system before it can rely on
no CPU using any translation that it just invalidated.

In fact it is the ptesync which does the actual synchronization in
this sequence, and hardware has a requirement that the ptesync must
be executed on the same CPU thread as the tlbies which it is expected
to order. Thus, if a vCPU gets moved from one physical CPU to
another after it has done some tlbies but before it can get to do the
ptesync, the ptesync will not have the desired effect when it is
executed on the second physical CPU.

To fix this, we do a ptesync in the exit path for radix guests. If
there are any pending tlbies, this will wait for them to complete.
If there aren't, then ptesync will just do the same as sync.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

df158189

KVM: PPC: Book3S HV: XIVE: Resend re-routed interrupts on CPU priority change · 9dc81d6b

Authored by Benjamin Herrenschmidt on May 10, 2018

When a vcpu priority (CPPR) is set to a lower value (masking more
interrupts), we stop processing interrupts already in the queue
for the priorities that have now been masked.

If those interrupts were previously re-routed to a different
CPU, they might still be stuck until the older one that has
them in its queue processes them. In the case of guest CPU
unplug, that can be never.

To address that without creating additional overhead for
the normal interrupt processing path, this changes H_CPPR
handling so that when such a priority change occurs, we
scan the interrupt queue for that vCPU, and for any
interrupt in there that has been re-routed, we replace it
with a dummy and force a re-trigger.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

9dc81d6b

KVM: PPC: Book3S HV: Make radix clear pte when unmapping · 7e3d9a1d

Authored by Nicholas Piggin on May 09, 2018

The current partition table unmap code clears the _PAGE_PRESENT bit
out of the pte, which leaves pud_huge/pmd_huge true and does not
clear pud_present/pmd_present.  This can confuse subsequent page
faults and possibly lead to the guest looping doing continual
hypervisor page faults.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

7e3d9a1d

KVM: PPC: Book3S HV: Make radix use correct tlbie sequence in kvmppc_radix_tlbie_page · e2560b10

Authored by Nicholas Piggin on May 09, 2018

The standard eieio ; tlbsync ; ptesync must follow tlbie to ensure it
is ordered with respect to subsequent operations.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

e2560b10

KVM: PPC: Book3S HV: Snapshot timebase offset on guest entry · 57b8daa7

Authored by Paul Mackerras on April 20, 2018

Currently, the HV KVM guest entry/exit code adds the timebase offset
from the vcore struct to the timebase on guest entry, and subtracts
it on guest exit. Which is fine, except that it is possible for
userspace to change the offset using the SET_ONE_REG interface while
the vcore is running, as there is only one timebase offset per vcore
but potentially multiple VCPUs in the vcore. If that were to happen,
KVM would subtract a different offset on guest exit from that which
it had added on guest entry, leading to the timebase being out of sync
between cores in the host, which then leads to bad things happening
such as hangs and spurious watchdog timeouts.

To fix this, we add a new field 'tb_offset_applied' to the vcore struct
which stores the offset that is currently applied to the timebase.
This value is set from the vcore tb_offset field on guest entry, and
is what is subtracted from the timebase on guest exit. Since it is
zero when the timebase offset is not applied, we can simplify the
logic in kvmhv_start_timing and kvmhv_accumulate_time.

In addition, we had secondary threads reading the timebase while
running concurrently with code on the primary thread which would
eventually add or subtract the timebase offset from the timebase.
This occurred while saving or restoring the DEC register value on
the secondary threads. Although no specific incorrect behaviour has
been observed, this is a race which should be fixed. To fix it, we
move the DEC saving code to just before we call kvmhv_commence_exit,
and the DEC restoring code to after the point where we have waited
for the primary thread to switch the MMU context and add the timebase
offset. That way we are sure that the timebase contains the guest
timebase value in both cases.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

57b8daa7
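
For commit 57b8daa7, the snapshot idea can be sketched in user-space C. The struct below is a hypothetical stand-in that keeps only the two fields named in the commit message, and the two functions are illustrative, not the kernel's entry/exit code.

```c
#include <stdint.h>

/* Hypothetical stand-in for the vcore struct. */
struct sketch_vcore {
	uint64_t tb_offset;		/* may be changed by userspace at any time */
	uint64_t tb_offset_applied;	/* offset currently added to the timebase */
};

/* Guest entry: snapshot the offset, then add the snapshot to the timebase. */
static uint64_t sketch_guest_entry(struct sketch_vcore *vc, uint64_t tb)
{
	vc->tb_offset_applied = vc->tb_offset;
	return tb + vc->tb_offset_applied;
}

/* Guest exit: subtract exactly what was added, then record that no
 * offset is applied any more. */
static uint64_t sketch_guest_exit(struct sketch_vcore *vc, uint64_t tb)
{
	tb -= vc->tb_offset_applied;
	vc->tb_offset_applied = 0;
	return tb;
}
```

Even if `tb_offset` changes between the two calls, the exit path undoes the value that entry applied, so the host timebase comes back unchanged.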

16 May, 2018 1 commit

proc: introduce proc_create_single{,_data} · 3f3942ac

Authored by Christoph Hellwig on May 15, 2018

Variants of proc_create{,_data} that directly take a seq_file show
callback and drastically reduce the boilerplate code in the callers.

All trivial callers converted over.
Signed-off-by: Christoph Hellwig <hch@lst.de>

3f3942ac

14 May, 2018 2 commits

misc: IBM Virtual Management Channel Driver (VMC) · 0eca353e

Authored by Bryant G. Ly on April 25, 2018

This driver is a logical device which provides an
interface between the hypervisor and a management
partition. This interface is like a message
passing interface. This management partition
is intended to provide an alternative to HMC-based
system management.

VMC enables the Management LPAR to provide basic
logical partition functions:
- Logical Partition Configuration
- Boot, start, and stop actions for individual
  partitions
- Display of partition status
- Management of virtual Ethernet
- Management of virtual Storage
- Basic system management

This driver is to be used for the POWER Virtual
Management Channel Virtual Adapter on the PowerPC
platform. It provides a character device which
allows for both request/response and async message
support through the /dev/ibmvmc node.
Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Reviewed-by: Steven Royer <seroyer@linux.vnet.ibm.com>
Reviewed-by: Adam Reznechek <adreznec@linux.vnet.ibm.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Taylor Jakobson <tjakobs@us.ibm.com>
Tested-by: Brad Warrum <bwarrum@us.ibm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

0eca353e

softirq/powerpc: Switch to generic local_softirq_pending() implementation · 1321a5de

Authored by Frederic Weisbecker on May 08, 2018

Remove the ad-hoc implementation, the generic code now allows us not to
reinvent the wheel.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/1525786706-22846-9-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

1321a5de

09 May, 2018 6 commits

swiotlb: move the SWIOTLB config symbol to lib/Kconfig · 09230cbc

Authored by Christoph Hellwig on April 24, 2018

This way we have one central definition of it, and users can select it as
needed.  The new option is not user visible, which is the behavior
it had in most architectures, with a few notable exceptions:

 - On x86_64 and mips/loongson3 it used to be user selectable, but
   defaulted to y.  It now is unconditional, which seems like the right
   thing for 64-bit architectures without guaranteed availability of
   IOMMUs.
 - on powerpc the symbol is user selectable and defaults to n, but
   many boards select it.  This change assumes no working setup
   required a manual selection, but if that turned out to be wrong
   we'll have to add another select statement or two for the respective
   boards.
Signed-off-by: Christoph Hellwig <hch@lst.de>

09230cbc

arch: define the ARCH_DMA_ADDR_T_64BIT config symbol in lib/Kconfig · 4965a687

Authored by Christoph Hellwig on April 03, 2018

Define this symbol if the architecture either uses 64-bit pointers or the
PHYS_ADDR_T_64BIT is set.  This covers 95% of the old arch magic.  We only
need an additional select for Xen on ARM (why anyway?), and we now always
set ARCH_DMA_ADDR_T_64BIT on mips boards with 64-bit physical addressing
instead of only doing it when highmem is set.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: James Hogan <jhogan@kernel.org>

4965a687

arch: remove the ARCH_PHYS_ADDR_T_64BIT config symbol · d4a451d5

Authored by Christoph Hellwig on April 03, 2018

Instead select the PHYS_ADDR_T_64BIT for 32-bit architectures that need a
64-bit phys_addr_t type directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: James Hogan <jhogan@kernel.org>

d4a451d5

scatterlist: move the NEED_SG_DMA_LENGTH config symbol to lib/Kconfig · 86596f0a

Authored by Christoph Hellwig on April 05, 2018

This way we have one central definition of it, and users can select it as
needed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>

86596f0a

iommu-helper: move the IOMMU_HELPER config symbol to lib/ · a4ce5a48

Authored by Christoph Hellwig on April 03, 2018

This way we have one central definition of it, and users can select it as
needed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>

a4ce5a48

iommu-helper: mark iommu_is_span_boundary as inline · 79c1879e

Authored by Christoph Hellwig on April 03, 2018

This avoids selecting IOMMU_HELPER just for this function.  And we only
use it once or twice in normal builds so this often even is a size
reduction.
Signed-off-by: Christoph Hellwig <hch@lst.de>

79c1879e

08 May, 2018 3 commits

dma-debug: remove CONFIG_HAVE_DMA_API_DEBUG · 6e88628d

Authored by Christoph Hellwig on May 08, 2018

There is no arch specific code required for dma-debug, so there is no
need to opt into the support either.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>

6e88628d

dma-debug: move initialization to common code · 15b28bbc

Authored by Christoph Hellwig on April 16, 2018

Most mainstream architectures are using 65536 entries, so let's stick to
that.  If someone is really desperate to override it that can still be
done through <asm/dma-mapping.h>, but I'd rather see a really good
rationale for that.

dma_debug_init is now called as a core_initcall, which for many
architectures means much earlier, and provides dma-debug functionality
earlier in the boot process.  This should be safe as it only relies
on the memory allocator already being available.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>

15b28bbc

powerpc/pseries: Fix CONFIG_NUMA=n build · 6c0a8f6b

Authored by Michael Ellerman on May 08, 2018

The build is failing with CONFIG_NUMA=n and some compiler versions:

arch/powerpc/platforms/pseries/hotplug-cpu.o: In function `dlpar_online_cpu':
hotplug-cpu.c:(.text+0x12c): undefined reference to `timed_topology_update'
arch/powerpc/platforms/pseries/hotplug-cpu.o: In function `dlpar_cpu_remove':
hotplug-cpu.c:(.text+0x400): undefined reference to `timed_topology_update'

Fix it by moving the empty version of timed_topology_update() into the
existing #ifdef block, which has the right guard of SPLPAR && NUMA.

Fixes: cee5405d ("powerpc/hotplug: Improve responsiveness of hotplug change")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

6c0a8f6b

07 May, 2018 4 commits

powerpc/trace/syscalls: Update syscall name matching logic to account for ppc_ prefix · edf6a2df

Authored by Naveen N. Rao on May 04, 2018

Some syscall entry functions on powerpc are prefixed with
ppc_/ppc32_/ppc64_ rather than the usual sys_/__se_sys prefix. fork(),
clone(), swapcontext() are some examples of syscalls with such entry
points. We need to match against these names when initializing ftrace
syscall tracing.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

edf6a2df

powerpc/trace/syscalls: Update syscall name matching logic · 0b7758aa

Authored by Naveen N. Rao on May 04, 2018

On powerpc64 ABIv1, we are enabling syscall tracing for only ~20
syscalls. This is due to commit e145242e ("syscalls/core,
syscalls/x86: Clean up syscall stub naming convention") which has
changed the syscall entry wrapper prefix from "SyS" to "__se_sys".

Update the logic for ABIv1 to not just skip the initial dot, but also
the "__se_sys" prefix.

Fixes: commit e145242e ("syscalls/core, syscalls/x86: Clean up syscall stub naming convention")
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

0b7758aa
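
For commit 0b7758aa, the matching rule for ABIv1 symbols can be sketched as a small string comparison. This is an illustration only: `sketch_match_sym_name` is a hypothetical name, and the sketch assumes the syscall metadata uses names of the form `sys_foo` while the ABIv1 symbol is either `.sys_foo` or `.__se_sys_foo`.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if an ABIv1 symbol such as ".__se_sys_read" or
 * ".sys_read" corresponds to the syscall name "sys_read". */
static bool sketch_match_sym_name(const char *sym, const char *name)
{
	/* ABIv1 text symbols start with a dot; anything else is rejected. */
	if (sym[0] != '.')
		return false;

	/* Plain case: only the leading dot differs. */
	if (strcmp(sym + 1, name) == 0)
		return true;

	/* Wrapper case: skip ".__se_" (6 characters) so that the
	 * remainder starts at "sys_...". */
	return strncmp(sym, ".__se_sys", 9) == 0 &&
	       strcmp(sym + 6, name) == 0;
}
```

Skipping only the dot, as the older logic did, would leave `__se_sys_read` to be compared with `sys_read`, which never matches. That is consistent with the commit message's observation that only a handful of syscalls were being traced.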

powerpc/64: Remove unused paca->soft_enabled · c4ec1f03

Authored by Michael Ellerman on May 02, 2018

In commit 4e26bc4a ("powerpc/64: Rename soft_enabled to
irq_soft_mask") we renamed paca->soft_enabled. But then in commit
8e0b634b ("powerpc/64s: Do not allocate lppaca if we are not
virtualized") we added it back. Oops. This happened because the two
patches were in flight at the same time and rebased vs each other
multiple times, and we missed it in review.

Fixes: 8e0b634b ("powerpc/64s: Do not allocate lppaca if we are not virtualized")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

c4ec1f03

PCI: remove PCI_DMA_BUS_IS_PHYS · 325ef185

Authored by Christoph Hellwig on April 12, 2018

This was used by the ide, scsi and networking code in the past to
determine if they should bounce payloads.  Now that the dma mapping
always has to support dma to all physical memory (thanks to swiotlb
for non-iommu systems) there is no need for this crude hack any more.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Palmer Dabbelt <palmer@sifive.com> (for riscv)
Reviewed-by: Jens Axboe <axboe@kernel.dk>

325ef185

04 May, 2018 1 commit

bpf, ppc64: remove ld_abs/ld_ind · dbf44daf

Authored by Daniel Borkmann on May 04, 2018

Since LD_ABS/LD_IND instructions are now removed from the core and
reimplemented through a combination of inlined BPF instructions and
a slow-path helper, we can get rid of the complexity from ppc64 JIT.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

dbf44daf

27 April, 2018 2 commits

powerpc/kvm/booke: Fix altivec related build break · b2d7ecbe

Authored by Laurentiu Tudor on April 26, 2018

Add missing "altivec unavailable" interrupt injection helper
thus fixing the linker error below:

arch/powerpc/kvm/emulate_loadstore.o: In function `kvmppc_check_altivec_disabled':
arch/powerpc/kvm/emulate_loadstore.c: undefined reference to `.kvmppc_core_queue_vec_unavail'

Fixes: 09f98496 ("KVM: PPC: Book3S: Add MMIO emulation for VMX instructions")
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

b2d7ecbe

powerpc: Fix deadlock with multiple calls to smp_send_stop · 6029755e

Authored by Nicholas Piggin on April 27, 2018

smp_send_stop can lock up the IPI path for any subsequent calls,
because the receiving CPUs spin in their handler function. This
started becoming a problem with the addition of an smp_send_stop
call in the reboot path, because panics can reboot after doing
their own smp_send_stop.

The NMI IPI variant was fixed with ac61c115 ("powerpc: Fix
smp_send_stop NMI IPI handling"), which leaves the smp_call_function
variant.

This is fixed by having smp_send_stop only ever do the
smp_call_function once. This is a bit less robust than the NMI IPI
fix, because any other call to smp_call_function after smp_send_stop
could deadlock, but that has always been the case, and it has not
been a problem before.

Fixes: f2748bdf ("powerpc/powernv: Always stop secondaries before reboot/shutdown")
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

6029755e
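
For commit 6029755e, the "only ever do it once" guard can be sketched with a static flag. The names below are hypothetical stand-ins (`sketch_send_stop_ipi` plays the role of the cross-CPU call), and the sketch ignores concurrency between callers for simplicity.

```c
#include <stdbool.h>

static int sketch_ipi_calls;	/* counts how often the stop request was sent */

/* Hypothetical stand-in for the cross-CPU call that stops other CPUs. */
static void sketch_send_stop_ipi(void)
{
	sketch_ipi_calls++;
}

/* Send the stop request only on the first call. Later calls return
 * early, because the receiving CPUs are already spinning in their
 * handler and would never acknowledge a second request. */
static void sketch_smp_send_stop(void)
{
	static bool stopped;

	if (stopped)
		return;
	stopped = true;
	sketch_send_stop_ipi();
}
```

Calling `sketch_smp_send_stop()` twice, as a panic followed by a reboot would, leaves `sketch_ipi_calls` at 1.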

25 April, 2018 5 commits

signal/powerpc: Replace TRAP_FIXME with TRAP_UNK · e821fa42

Authored by Eric W. Biederman on April 17, 2018

Using an si_code of 0 that aliases with SI_USER is clearly the wrong
thing to do, and causes problems in interesting ways.

For use in unknown_exception the recently defined TRAP_UNK
semantically is a perfect fit.  For use in RunModeException it looks
like something more specific than TRAP_UNK could be used.  No one has
bothered to find a better fit than the broken si_code of 0 in all of
these years and I don't see an obvious better fit, so switching
RunModeException to return TRAP_UNK is clearly an improvement.

Recent history suggests no one actually cares about crazy corner
cases of the kernel behavior like this so I don't expect any
regressions from changing this.  However if something does
happen this change is easy to revert.

Though I wonder if SIGKILL might not be a better fit.

Cc: Paul Mackerras <paulus@samba.org>
Cc: Kumar Gala <kumar.gala@freescale.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Fixes: 9bad068c24d7 ("[PATCH] ppc32: support for e500 and 85xx")
Fixes: 0ed70f6105ef ("PPC32: Provide proper siginfo information on various exceptions.")
History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

e821fa42

signal/powerpc: Replace FPE_FIXME with FPE_FLTUNK · aeb1c0f6

Authored by Eric W. Biederman on April 17, 2018

Using an si_code of 0 that aliases with SI_USER is clearly the
wrong thing to do, and causes problems in interesting ways.

The newly defined FPE_FLTUNK semantically appears to fit the
bill so use it instead.

Cc: Paul Mackerras <paulus@samba.org>
Cc: Kumar Gala <kumar.gala@freescale.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Fixes: 9bad068c24d7 ("[PATCH] ppc32: support for e500 and 85xx")
Fixes: 0ed70f6105ef ("PPC32: Provide proper siginfo information on various exceptions.")
History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

aeb1c0f6

signal: Ensure every siginfo we send has all bits initialized · 3eb0f519

Authored by Eric W. Biederman on April 17, 2018

Call clear_siginfo to ensure every stack allocated siginfo is properly
initialized before being passed to the signal sending functions.

Note: It is not safe to depend on C initializers to initialize struct
siginfo on the stack because C is allowed to skip holes when
initializing a structure.

The initialization of struct siginfo in tracehook_report_syscall_exit
was moved from the helper user_single_step_siginfo into
tracehook_report_syscall_exit itself, to make it clear that the local
variable siginfo gets fully initialized.

In a few cases the scope of struct siginfo has been reduced to make it
clear that siginfo is not used on other paths in the function
in which it is declared.

Instances of using memset to initialize siginfo have been replaced
with calls to clear_siginfo for clarity.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

3eb0f519
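
For commit 3eb0f519, the reason for clearing the whole object rather than relying on an initializer can be illustrated with a small struct that contains padding. `sketch_info` and `sketch_clear_info` are hypothetical; the helper only mirrors the spirit of clear_siginfo by zeroing every byte, padding included.

```c
#include <string.h>

/* Hypothetical struct: on typical ABIs there is a padding hole
 * between `code` and `addr`. */
struct sketch_info {
	int signo;
	char code;
	void *addr;
};

/* Zero every byte of the object, including any padding, before the
 * individual fields are filled in. */
static void sketch_clear_info(struct sketch_info *info)
{
	memset(info, 0, sizeof(*info));
}
```

A caller would run `sketch_clear_info(&info)` first and assign `signo`, `code` and `addr` afterwards, so no stale stack bytes remain in the holes when the object is copied elsewhere.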

powerpc: Fix smp_send_stop NMI IPI handling · ac61c115

Authored by Nicholas Piggin on April 25, 2018

The NMI IPI handler for a receiving CPU increments nmi_ipi_busy_count
over the handler function call, which causes later smp_send_nmi_ipi()
callers to spin until the call is finished.

The stop_this_cpu() function never returns, so the busy count is never
decremented, which can cause the system to hang in some cases. For
example panic() will call smp_send_stop() early on which calls
stop_this_cpu() on other CPUs, then later in the reboot path,
pnv_restart() will call smp_send_stop() again, which hangs.

Fix this by adding a special case to the stop_this_cpu() handler to
decrement the busy count, because it will never return.

Now that the NMI/non-NMI versions of stop_this_cpu() are different,
split them out into separate functions rather than doing #ifdef tricks
to share the body between the two functions.

Fixes: 6bed3237 ("powerpc: use NMI IPI for smp_send_stop")
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Split out the functions, tweak change log a bit]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

ac61c115

rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops · 682e6b4d

Authored by Nicholas Piggin on April 10, 2018

The OPAL RTC driver does not sleep in case it gets OPAL_BUSY or
OPAL_BUSY_EVENT from firmware, which causes large scheduling
latencies. Up to 50 seconds have been observed here when the RTC stops
responding (a BMC reboot can do it).

Fix this by converting it to the standard form OPAL_BUSY loop that
sleeps.

Fixes: 628daa8d ("powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks")
Cc: stable@vger.kernel.org # v3.2+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

682e6b4d
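
For commit 682e6b4d, the shape of a busy loop that sleeps between retries can be sketched in user-space C. Everything here is hypothetical: the return codes, the callback types and the fake firmware stub are illustrative stand-ins, not the OPAL API.

```c
/* Hypothetical return codes standing in for success and the two busy states. */
enum sketch_rc { SK_SUCCESS = 0, SK_BUSY = 1, SK_BUSY_EVENT = 2 };

typedef enum sketch_rc (*sketch_fw_call)(void *ctx);
typedef void (*sketch_sleep_fn)(void);

/* Retry the firmware call until it stops reporting busy, yielding via
 * sleep_fn between attempts instead of spinning. */
static enum sketch_rc sketch_busy_loop(sketch_fw_call call, void *ctx,
				       sketch_sleep_fn sleep_fn)
{
	enum sketch_rc rc = SK_BUSY;

	while (rc == SK_BUSY || rc == SK_BUSY_EVENT) {
		rc = call(ctx);
		if (rc == SK_BUSY || rc == SK_BUSY_EVENT)
			sleep_fn();
	}
	return rc;
}

/* Example firmware stub: reports busy until *remaining reaches zero. */
static enum sketch_rc sketch_fake_fw(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return SK_BUSY;
	}
	return SK_SUCCESS;
}

static int sketch_sleeps;	/* counts how often the loop yielded */

/* Example sleep stub: a real caller would sleep for a short delay here. */
static void sketch_fake_sleep(void)
{
	sketch_sleeps++;
}
```

With a stub that is busy three times, the loop yields three times and then returns success. A crash path with interrupts disabled could not sleep and would need a delay that does not schedule instead.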

24 April, 2018 5 commits

powerpc/mce: Fix a bug where mce loops on memory UE. · 75ecfb49

Authored by Mahesh Salgaonkar on April 23, 2018

The current code extracts the physical address for UE errors and then
hooks it up into memory failure infrastructure. On successful
extraction of physical address it wrongly sets "handled = 1" which
means this UE error has been recovered. Since MCE handler gets return
value as handled = 1, it assumes that error has been recovered and
goes back to same NIP. This causes MCE interrupt again and again in a
loop leading to hard lockup.

Also, initialize phys_addr to ULONG_MAX so that we don't end up
queuing undesired page to hwpoison.

Without this patch we see:
  Severe Machine check interrupt [Recovered]
    NIP: [000000001002588c] PID: 7109 Comm: find
    Initiator: CPU
    Error type: UE [Load/Store]
      Effective address: 00007fffd2755940
      Physical address:  000020181a080000
  ...
  Severe Machine check interrupt [Recovered]
    NIP: [000000001002588c] PID: 7109 Comm: find
    Initiator: CPU
    Error type: UE [Load/Store]
      Effective address: 00007fffd2755940
      Physical address:  000020181a080000
  Severe Machine check interrupt [Recovered]
    NIP: [000000001002588c] PID: 7109 Comm: find
    Initiator: CPU
    Error type: UE [Load/Store]
      Effective address: 00007fffd2755940
      Physical address:  000020181a080000
  Memory failure: 0x20181a08: recovery action for dirty LRU page: Recovered
  Memory failure: 0x20181a08: already hardware poisoned
  Memory failure: 0x20181a08: already hardware poisoned
  Memory failure: 0x20181a08: already hardware poisoned
  Memory failure: 0x20181a08: already hardware poisoned
  Memory failure: 0x20181a08: already hardware poisoned
  Memory failure: 0x20181a08: already hardware poisoned
  ...
  Watchdog CPU:38 Hard LOCKUP

After this patch we see:

  Severe Machine check interrupt [Not recovered]
    NIP: [00007fffaae585f4] PID: 7168 Comm: find
    Initiator: CPU
    Error type: UE [Load/Store]
      Effective address: 00007fffaafe28ac
      Physical address:  00002017c0bd0000
  find[7168]: unhandled signal 7 at 00007fffaae585f4 nip 00007fffaae585f4 lr 00007fffaae585e0 code 4
  Memory failure: 0x2017c0bd: recovery action for dirty LRU page: Recovered

Fixes: 01eaac2b ("powerpc/mce: Hookup ierror (instruction) UE errors")
Fixes: ba41e1e1 ("powerpc/mce: Hookup derror (load/store) UE errors")
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

75ecfb49

powerpc/powernv/npu: Do a PID GPU TLB flush when invalidating a large address range · d0cf9b56

Authored by Alistair Popple on April 17, 2018

The NPU has a limited number of address translation shootdown (ATSD)
registers and the GPU has limited bandwidth to process ATSDs. This can
result in contention of ATSD registers leading to soft lockups on some
threads, particularly when invalidating a large address range in
pnv_npu2_mn_invalidate_range().

At some threshold it becomes more efficient to flush the entire GPU
TLB for the given MM context (PID) than individually flushing each
address in the range. This patch will result in ranges greater than
2MB being converted from 32+ ATSDs into a single ATSD which will flush
the TLB for the given PID on each GPU.

Fixes: 1ab66d1f ("powerpc/powernv: Introduce address translation services for Nvlink2")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Tested-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

d0cf9b56
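
For commit d0cf9b56, the threshold decision can be sketched as a single comparison. The names are hypothetical and the sketch only chooses a strategy; it does not model the ATSD registers themselves. The 2MB threshold is the one stated in the commit message.

```c
#include <stdint.h>

#define SKETCH_ATSD_THRESHOLD (2ull * 1024 * 1024)	/* 2MB, from the commit message */

enum sketch_flush { SK_FLUSH_RANGE, SK_FLUSH_PID };

/* Choose between invalidating each address in [start, end) and a
 * single flush of the whole TLB for the PID. */
static enum sketch_flush sketch_pick_flush(uint64_t start, uint64_t end)
{
	if (end - start > SKETCH_ATSD_THRESHOLD)
		return SK_FLUSH_PID;
	return SK_FLUSH_RANGE;
}
```

A range of exactly 2MB still takes the per-address path in this sketch, since the commit message speaks of ranges greater than 2MB.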

powerpc/powernv/npu: Prevent overwriting of pnv_npu2_init_context() callback parameters · a1409ada

Authored by Alistair Popple on April 11, 2018

There is a single npu context per set of callback parameters. Callers
should be prevented from overwriting existing callback values so
instead return an error if different parameters are passed.

Fixes: 1ab66d1f ("powerpc/powernv: Introduce address translation services for Nvlink2")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Reviewed-by: Mark Hairgrove <mhairgrove@nvidia.com>
Tested-by: Mark Hairgrove <mhairgrove@nvidia.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

a1409ada

powerpc/powernv/npu: Add lock to prevent race in concurrent context init/destroy · 28a5933e

Authored by Alistair Popple on April 11, 2018

The pnv_npu2_init_context() and pnv_npu2_destroy_context() functions
are used to allocate/free contexts to allow address translation and
shootdown by the NPU on a particular GPU. Context initialisation is
implicitly safe as it is protected by the requirement mmap_sem be held
in write mode, however pnv_npu2_destroy_context() does not require
mmap_sem to be held and it is not safe to call with a concurrent
initialisation for a different GPU.

It was assumed the driver would ensure destruction was not called
concurrently with initialisation. However the driver may be simplified
by allowing concurrent initialisation and destruction for different
GPUs. As npu context creation/destruction is not a performance
critical path and the critical section is not large a single spinlock
is used for simplicity.

Fixes: 1ab66d1f ("powerpc/powernv: Introduce address translation services for Nvlink2")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Reviewed-by: Mark Hairgrove <mhairgrove@nvidia.com>
Tested-by: Mark Hairgrove <mhairgrove@nvidia.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

28a5933e

powerpc/powernv/memtrace: Let the arch hotunplug code flush cache · 7fd6641d

Authored by Balbir Singh on April 06, 2018

Don't do this via custom code, instead now that we have support in the
arch hotplug/hotunplug code, rely on those routines to do the right
thing.

The existing flush doesn't work because it uses ppc64_caches.l1d.size
instead of ppc64_caches.l1d.line_size.

Fixes: 9d5171a8 ("powerpc/powernv: Enable removal of memory for in memory tracing")
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

7fd6641d

openanolis / cloud-kernel synced successfully almost 2 years ago
