- 23 Mar 2018, 1 commit
-
-
Submitted by Paul Mackerras
POWER9 has hardware bugs relating to transactional memory and thread reconfiguration (changes to hardware SMT mode). Specifically, the core does not have enough storage to store a complete checkpoint of all the architected state for all four threads. The DD2.2 version of POWER9 includes hardware modifications designed to allow hypervisor software to implement workarounds for these problems. This patch implements those workarounds in KVM code so that KVM guests see a full, working transactional memory implementation.

The problems center around the use of TM suspended state, where the CPU has a checkpointed state but execution is not transactional. The workaround is to implement a "fake suspend" state, which looks to the guest like suspended state but the CPU does not store a checkpoint. In this state, any instruction that would cause a transition to transactional state (rfid, rfebb, mtmsrd, tresume) or would use the checkpointed state (treclaim) causes a "soft patch" interrupt (vector 0x1500) to the hypervisor so that it can be emulated. The trechkpt instruction also causes a soft patch interrupt.

On POWER9 DD2.2, we avoid returning to the guest in any state which would require a checkpoint to be present. The trechkpt in the guest entry path which would normally create that checkpoint is replaced by either a transition to fake suspend state, if the guest is in suspend state, or a rollback to the pre-transactional state if the guest is in transactional state. Fake suspend state is indicated by a flag in the PACA plus a new bit in the PSSCR. The new PSSCR bit is write-only and reads back as 0.

On exit from the guest, if the guest is in fake suspend state, we still do the treclaim instruction as we would in real suspend state, in order to get into non-transactional state, but we do not save the resulting register state since there was no checkpoint.

Emulation of the instructions that cause a softpatch interrupt is handled in two paths. If the guest is in real suspend mode, we call kvmhv_p9_tm_emulation_early() to handle the cases where the guest is transitioning to transactional state. This is called before we do the treclaim in the guest exit path; because we haven't done treclaim, we can get back to the guest with the transaction still active. If the instruction is a case that kvmhv_p9_tm_emulation_early() doesn't handle, or if the guest is in fake suspend state, then we proceed to do the complete guest exit path and subsequently call kvmhv_p9_tm_emulation() in host context with the MMU on. This handles all the cases including the cases that generate program interrupts (illegal instruction or TM Bad Thing) and facility unavailable interrupts.

The emulation is reasonably straightforward and is mostly concerned with checking for exception conditions and updating the state of registers such as MSR and CR0. The treclaim emulation takes care to ensure that the TEXASR register gets updated as if it were the guest treclaim instruction that had done failure recording, not the treclaim done in hypervisor state in the guest exit path.

With this, the KVM_CAP_PPC_HTM capability returns true (1) even if transactional memory is not available to host userspace.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
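A self-contained C sketch of the two-path dispatch described above. Only kvmhv_p9_tm_emulation_early() and kvmhv_p9_tm_emulation() are named in the commit text; the types, field names and return values here are invented for illustration, not the actual KVM code:

    #include <stdbool.h>

    struct vcpu { bool fake_suspend; };        /* flag lives in the PACA */

    enum { RESUME_GUEST = 0, EXIT_TO_HOST = 1 };

    bool kvmhv_p9_tm_emulation_early(struct vcpu *v); /* pre-treclaim cases */
    int  kvmhv_p9_tm_emulation(struct vcpu *v);       /* full exit, MMU on  */

    int handle_softpatch(struct vcpu *v)
    {
            /* Real suspend: try the early path before treclaim, so the
             * guest can resume with its transaction still active. */
            if (!v->fake_suspend && kvmhv_p9_tm_emulation_early(v))
                    return RESUME_GUEST;

            /* Fake suspend, or a case the early path does not handle:
             * take the complete guest exit and emulate in host context. */
            return kvmhv_p9_tm_emulation(v);
    }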
-
- 23 Jan 2018, 1 commit
-
-
Submitted by Nicholas Piggin
The fallback RFI flush is used when firmware does not provide a way to flush the cache. It's a "displacement flush" that evicts useful data by displacing it with an uninteresting buffer.

The flush has to take care to work with implementation-specific cache replacement policies, so the recipe has been in flux. The initial slow but conservative approach was to touch all lines of a congruence class, with dependencies between each load. It has since been determined that a linear pattern of loads without dependencies is sufficient, and is significantly faster.

Measuring the speed of a null syscall with the RFI fallback flush enabled gives the relative improvement:

P8 - 1.83x
P9 - 1.75x

The flush also becomes simpler and more adaptable to different cache geometries.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
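A conceptual C rendering of the new recipe, a linear walk of independent loads over a buffer large enough to cover the L1-D. The buffer size and the C loop itself are illustrative assumptions; the real flush is hand-written assembly over a per-CPU buffer:

    #define L1D_SIZE_ASSUMED  (64 * 1024)  /* assumed cache size  */
    #define L1D_LINE_SIZE     128          /* POWER8/9 line size  */

    static char flush_buffer[L1D_SIZE_ASSUMED];

    static void fallback_l1d_flush(void)
    {
            volatile const char *p = flush_buffer;

            /* No load depends on the previous one, so they can issue
             * back to back; each touch displaces one cache line. */
            for (unsigned long i = 0; i < L1D_SIZE_ASSUMED; i += L1D_LINE_SIZE)
                    (void)p[i];
    }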
-
- 20 Jan 2018, 1 commit
-
-
Submitted by Ram Pai
Handle data and instruction exceptions caused by memory protection keys. The CPU will detect the key fault if the HPTE is already programmed with the key. However, if the HPTE has not yet been hashed in, the hardware will not detect the key fault; in that case the software detects the pkey violation.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 19 Jan 2018, 2 commits
-
-
Submitted by Madhavan Srinivasan
Two new bitmask fields are introduced: "IRQ_DISABLE_MASK_PMU", to support masking of PMIs, and "IRQ_DISABLE_MASK_ALL", to aid interrupt-masking checks. A couple of new IRQ #defines, "PACA_IRQ_PMI" and "SOFTEN_VALUE_0xf0*", are added for use in the exception code to check for PMIs. In the masked_interrupt handler, for PMIs we clear MSR[EE] and return. In __check_irq_replay(), we replay a PMI by calling the performance_monitor_common handler.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
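A hedged sketch of the soft-mask bit layout this implies. IRQ_DISABLE_MASK_PMU and IRQ_DISABLE_MASK_ALL come from the commit text; the numeric values and the _LINUX/_NONE names are assumptions for illustration:

    #define IRQ_DISABLE_MASK_NONE   0x00
    #define IRQ_DISABLE_MASK_LINUX  0x01  /* "normal" external/decrementer  */
    #define IRQ_DISABLE_MASK_PMU    0x02  /* performance monitor interrupts */
    #define IRQ_DISABLE_MASK_ALL    (IRQ_DISABLE_MASK_LINUX | IRQ_DISABLE_MASK_PMU)

    /* e.g. the masked-interrupt path asks whether a PMI is soft-masked: */
    static inline int pmi_soft_masked(unsigned char soft_mask)
    {
            return soft_mask & IRQ_DISABLE_MASK_PMU;
    }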
-
Submitted by Madhavan Srinivasan
To support the addition of a "bitmask" to the MASKABLE_* macros, factor out the EXCEPTION_PROLOG_1 macro, making explicit the interrupt masking supported by a given interrupt handler. The patch correspondingly extends the MASKABLE_* macros with an additional parameter. The "bitmask" parameter is passed to the SOFTEN_TEST macro to decide on masking the interrupt.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 16 Jan 2018, 1 commit
-
-
Submitted by Benjamin Herrenschmidt
The only difference between EXC_COMMON_HV and EXC_COMMON is that the former adds "2" to the trap number, which is supposed to represent the fact that this is an "HV" interrupt using HSRR0/1. However, KVM is the only thing that cares, and it has its own separate macros. In fact we have only one user of EXC_COMMON_HV, and it's for an unknown interrupt case; all the others already use EXC_COMMON.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 10 Jan 2018, 3 commits
-
-
Submitted by Michael Ellerman
On some CPUs we can prevent the Meltdown vulnerability by flushing the L1-D cache on exit from kernel to user mode, and from hypervisor to guest. This is known to be the case on at least Power7, Power8 and Power9. At this time we do not know the status of the vulnerability on other CPUs such as the 970 (Apple G5), pasemi CPUs (AmigaOne X1000) or Freescale CPUs. As more information comes to light we can enable this, or other mechanisms on those CPUs.

The vulnerability occurs when the load of an architecturally inaccessible memory region (eg. userspace load of kernel memory) is speculatively executed to the point where its result can influence the address of a subsequent speculatively executed load. In order for that to happen, the first load must hit in the L1, because before the load is sent to the L2 the permission check is performed. Therefore if no kernel addresses hit in the L1 the vulnerability can not occur. We can ensure that is the case by flushing the L1 whenever we return to userspace. Similarly for hypervisor vs guest.

In order to flush the L1-D cache on exit, we add a section of nops at each (h)rfi location that returns to a lower privileged context, and patch that with some sequence. Newer firmwares are able to advertise to us that there is a special nop instruction that flushes the L1-D. If we do not see that advertised, we fall back to doing a displacement flush in software.

For guest kernels we support migration between some CPU versions, and different CPUs may use different flush instructions. So that we are prepared to migrate to a machine with a different flush instruction activated, we may have to patch more than one flush instruction at boot if the hypervisor tells us to.

In the end this patch is mostly the work of Nicholas Piggin and Michael Ellerman. However a cast of thousands contributed to analysis of the issue, earlier versions of the patch, back ports testing etc. Many thanks to all of them.

Tested-by: Jon Masters <jcm@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Submitted by Nicholas Piggin
In the SLB miss handler we may be returning to user or kernel. We need to add a check early on and save the result in the cr4 register, and then we bifurcate the return path based on that.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Submitted by Nicholas Piggin
This commit does simple conversions of rfi/rfid to the new macros that include the expected destination context. By simple we mean cases where there is a single well known destination context, and it's simply a matter of substituting the instruction for the appropriate macro.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 14 Nov 2017, 1 commit
-
-
Submitted by Michael Ellerman
On 64-bit Book3s, when we take an instruction fault the reason for the fault may be reported in SRR1. For data faults the reason is reported in DSISR (Data Storage Interrupt Status Register). The reasons reported in each do not necessarily correspond, so we mask the SRR1 bits before copying them to the DSISR, which is then used by the page fault code.

Prior to commit b4c001dc ("powerpc/mm: Use symbolic constants for filtering SRR1 bits on ISIs") we used a hard-coded mask of 0x58200000, which corresponds to:

DSISR_NOHPTE       0x40000000 /* no translation found */
DSISR_NOEXEC_OR_G  0x10000000 /* exec of no-exec or guarded */
DSISR_PROTFAULT    0x08000000 /* protection fault */
DSISR_KEYFAULT     0x00200000 /* Storage Key fault */

That commit added a #define for the mask, DSISR_SRR1_MATCH_64S, but incorrectly used the similarly named DSISR_BAD_FAULT_64S instead. This had the effect of changing the mask to 0xa43a0000, which omits everything but DSISR_KEYFAULT.

Luckily this had no visible effect, because in practice we hardly use the DSISR bits. The lack of DSISR_NOHPTE means a TLB flush optimisation was missed in the native HPTE code, and DSISR_NOEXEC_OR_G and DSISR_PROTFAULT are both only used to trigger rare warnings. So we got lucky, but let's fix it. The new value only has bits between 17 and 30 set, so we can continue to use andis.

Fixes: b4c001dc ("powerpc/mm: Use symbolic constants for filtering SRR1 bits on ISIs")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
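Reproducing the arithmetic from the constants quoted above confirms the intended mask value:

    #define DSISR_NOHPTE        0x40000000  /* no translation found       */
    #define DSISR_NOEXEC_OR_G   0x10000000  /* exec of no-exec or guarded */
    #define DSISR_PROTFAULT     0x08000000  /* protection fault           */
    #define DSISR_KEYFAULT      0x00200000  /* storage key fault          */

    #define DSISR_SRR1_MATCH_64S (DSISR_NOHPTE | DSISR_NOEXEC_OR_G | \
                                  DSISR_PROTFAULT | DSISR_KEYFAULT)
    /* = 0x40000000 | 0x10000000 | 0x08000000 | 0x00200000 = 0x58200000 */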
-
- 13 Nov 2017, 1 commit
-
-
Submitted by Benjamin Herrenschmidt
Commit 398a719d ("powerpc/mm: Update bits used to skip hash_page") mistakenly dropped the DSISR_DABRMATCH bit from the mask of bits tested to skip trying to hash a page. As a result, DABR matches would no longer be detected. This adds it back.

We open code it in the 2 places where it matters rather than fold it into DSISR_BAD_FAULT_32S/64S, because this isn't technically a bad fault, and while we would never hit it with the current code, I prefer if page_fault_is_bad() didn't trigger on these.

Fixes: 398a719d ("powerpc/mm: Update bits used to skip hash_page")
Cc: stable@vger.kernel.org # v4.14
Tested-by: Pedro Miraglia Franco de Carvalho <pedromfc@br.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 06 Nov 2017, 2 commits
-
-
Submitted by Michael Ellerman
CONFIG_PPC_STD_MMU_64 indicates support for the "standard" powerpc MMU on 64-bit CPUs. The "standard" MMU refers to the hash page table MMU found in "server" processors, from IBM mainly.

Currently CONFIG_PPC_STD_MMU_64 is == CONFIG_PPC_BOOK3S_64. While it's annoying to have two symbols that always have the same value, it's not quite annoying enough to bother removing one.

However with the arrival of Power9, we now have the situation where CONFIG_PPC_STD_MMU_64 is enabled, but the kernel is running using the Radix MMU - *not* the "standard" MMU. So it is now actively confusing to use it, because it implies that code is disabled or inactive when the Radix MMU is in use, however that is not necessarily true.

So s/CONFIG_PPC_STD_MMU_64/CONFIG_PPC_BOOK3S_64/, and do some minor formatting updates of some of the affected lines. This will be a pain for backports, but c'est la vie.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Submitted by Nicholas Piggin
If the host takes a system reset interrupt while a guest is running, the CPU must exit the guest before processing the host exception handler. After this patch, taking a sysrq+x with a CPU running in a guest gives a trace like this:

cpu 0x27: Vector: 100 (System Reset) at [c000000fdf5776f0]
    pc: c008000010158b80: kvmppc_run_core+0x16b8/0x1ad0 [kvm_hv]
    lr: c008000010158b80: kvmppc_run_core+0x16b8/0x1ad0 [kvm_hv]
    sp: c000000fdf577850
   msr: 9000000002803033
  current = 0xc000000fdf4b1e00
  paca    = 0xc00000000fd4d680   softe: 3   irq_happened: 0x01
    pid   = 6608, comm = qemu-system-ppc
Linux version 4.14.0-rc7-01489-g47e1893a404a-dirty #26 SMP
[c000000fdf577a00] c008000010159dd4 kvmppc_vcpu_run_hv+0x3dc/0x12d0 [kvm_hv]
[c000000fdf577b30] c0080000100a537c kvmppc_vcpu_run+0x44/0x60 [kvm]
[c000000fdf577b60] c0080000100a1ae0 kvm_arch_vcpu_ioctl_run+0x118/0x310 [kvm]
[c000000fdf577c00] c008000010093e98 kvm_vcpu_ioctl+0x530/0x7c0 [kvm]
[c000000fdf577d50] c000000000357bf8 do_vfs_ioctl+0xd8/0x8c0
[c000000fdf577df0] c000000000358448 SyS_ioctl+0x68/0x100
[c000000fdf577e30] c00000000000b220 system_call+0x58/0x6c
--- Exception: c01 (System Call) at 00007fff76868df0
SP (7fff7069baf0) is in userspace

Fixes: e36d0a2e ("powerpc/powernv: Implement NMI IPI with OPAL_SIGNAL_SYSTEM_RESET")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 02 Nov 2017, 1 commit
-
-
Submitted by Greg Kroah-Hartman
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases:
 - file had no licensing information in it,
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information.

Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5 lines of source.
 - File already had some variant of a license header in it (even if <5 lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license identifiers to apply.

 - when both scanners couldn't find any license traces, the file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was:

   SPDX license identifier                              # files
   ---------------------------------------------------|-------
   GPL-2.0                                               11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note", otherwise it was "GPL-2.0". Results of that was:

   SPDX license identifier                              # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                         930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary:

   SPDX license identifier                              # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                         270
   GPL-2.0+ WITH Linux-syscall-note                        169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)      21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)      17
   LGPL-2.1+ WITH Linux-syscall-note                        15
   GPL-1.0+ WITH Linux-syscall-note                         14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)      5
   LGPL-2.0+ WITH Linux-syscall-note                         4
   LGPL-2.1 WITH Linux-syscall-note                          3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)                3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)               1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became the concluded license(s).
 - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred.
 - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics).
 - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation.
 - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time.

In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there were new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related.

Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with the SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files.

In the initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with:
 - a full scancode scan run, collecting the matched texts, detected license ids and scores
 - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct

This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types). Finally Greg ran the script using the .csv files to generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
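For reference, the added tag is a single comment line at the top of each file; per the comment-type distinction mentioned above, .c sources use the first form and headers the second:

    // SPDX-License-Identifier: GPL-2.0
    /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */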
-
- 22 Oct 2017, 2 commits
-
-
Submitted by Michael Ellerman
Back in 2008 we added support for "fast little-endian switch" in the syscall path. This added a special case syscall number 0x1ebe, which is caught very early in the system call exception and switches endian with as little overhead as possible. See commit 745a14cc ("[POWERPC] Add fast little-endian switch system call") for full details.

Although it is fast, it's also completely non-standard. The "syscall number" is out of the range of normal syscalls, it can't be traced or audited, and it's a bit of a wart. To the best of our knowledge it was only used by one program, now long since discontinued.

So in an effort to shake out any current users, put it behind a config option, and make it default n. If anyone *is* using it they can quickly reinstate it with a rebuild, and we can flip it to default y.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Submitted by Michael Ellerman
So we can #ifdef them in the next patch.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 16 Oct 2017, 1 commit
-
-
Submitted by Balbir Singh
Extract the physical_address for UE errors by walking the page tables for the mm and address at the NIP, to extract the instruction. Then use the instruction to find the effective address via analyse_instr().

We might have page table walking races, but we expect them to be rare; the physical address extraction is best effort. The idea is to then hook up this infrastructure to memory failure eventually.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
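A best-effort flow in the shape described above, sketched as self-contained C. Only analyse_instr() is named in the commit text; the other helper names and the simplified signatures are invented for the sketch:

    #include <stdbool.h>
    #include <stdint.h>

    struct instr_info { uint64_t ea; bool is_load_store; };

    bool read_insn_at_nip(uint64_t nip, uint32_t *insn);   /* page-table walk */
    bool analyse_instr(struct instr_info *out, uint32_t insn);
    bool ea_to_phys(uint64_t ea, uint64_t *pa);            /* second walk */

    bool ue_physical_address(uint64_t nip, uint64_t *pa)
    {
            uint32_t insn;
            struct instr_info info;

            /* Walks can race with concurrent page-table updates; any
             * failure just means "no physical address", by design. */
            if (!read_insn_at_nip(nip, &insn))
                    return false;
            if (!analyse_instr(&info, insn) || !info.is_load_store)
                    return false;
            return ea_to_phys(info.ea, pa);
    }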
-
- 06 Oct 2017, 1 commit
-
-
Submitted by Cyril Bur
When using transactional memory (TM), the CPU can be in one of six states as far as TM is concerned, encoded in the Machine State Register (MSR). Certain state transitions are illegal and if attempted trigger a "TM Bad Thing" type program check exception. If we ever hit one of these exceptions it's treated as a bug, ie. we oops, and kill the process and/or panic, depending on configuration.

One case where we can trigger a TM Bad Thing is when returning to userspace after a system call or interrupt, using RFID. When this happens the CPU first restores the user register state, in particular r1 (the stack pointer), and then attempts to update the MSR. However the MSR update is not allowed and so we take the program check with the user register state, but the kernel MSR.

This tricks the exception entry code into thinking we have a bad kernel stack pointer, because the MSR says we're coming from the kernel, but r1 is pointing to userspace.

To avoid this we instead always switch to the emergency stack if we take a TM Bad Thing from the kernel. That way none of the user register values are used, other than for printing in the oops message.

This is the fix for CVE-2017-1000255.

Fixes: 5d176f75 ("powerpc: tm: Enable transactional memory (TM) lazily for userspace")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
[mpe: Rewrite change log & comments, tweak asm slightly]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 27 Sep 2017, 1 commit
-
-
Submitted by Michael Neuling
POWER9 DD2.1 and earlier have an issue where some cache-inhibited vector loads will return bad data. The workaround is in two parts: one firmware/microcode part triggers HMI interrupts when hitting such loads, and the other part is this patch, which then emulates the instructions in Linux. The affected instructions are limited to lxvd2x, lxvw4x, lxvb16x and lxvh8x.

When an instruction triggers the HMI, all threads in the core will be sent to the HMI handler, not just the one running the vector load. In general, these spurious HMIs are detected by the emulation code and we just return back to the running process.

Unfortunately, if a spurious interrupt occurs on a vector load that's to normal memory, we have no way to detect that it's spurious (unless we walk the page tables, which is very expensive). In this case we emulate the load, but we need to do so using a vector load itself to ensure 128-bit atomicity is preserved.

Some additional debugfs counters for emulated instructions are also added.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Switch CONFIG_PPC_BOOK3S_64 to CONFIG_VSX to unbreak the build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 23 Aug 2017, 7 commits
-
-
Submitted by Nicholas Piggin
HVI interrupts have always used 0x500, so remove the dead branch.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Submitted by Nicholas Piggin
POWER9 host external interrupts use the h_virt_irq_common handler, so use that to replay them rather than using the hardware_interrupt_common handler. Both call do_IRQ, but using the correct handler reduces i-cache footprint.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Submitted by Nicholas Piggin
This results in smaller code, and fewer branches. This relies on the fact that both the 0xe80 and 0xa00 handlers call the same upper level code, namely doorbell_exception().

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Mention we rely on the implementation of the 0xe80/0xa00 handlers]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Submitted by Nicholas Piggin
Places in the kernel where r13 is not the PACA pointer must have maskable interrupts disabled, so r13 does not have to be restored when returning from a soft-masked interrupt. We should never have interrupts soft disabled when we're in user space.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Submitted by Nicholas Piggin
MSR_EE is always enabled in SRR1 for masked interrupts, so we can use xor to clear it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
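A one-line illustration of why xor suffices here; the invariant (the bit is known set) is from the commit text, and MSR_EE is the architected external-interrupt-enable bit:

    #define MSR_EE 0x8000UL   /* MSR external-interrupt enable bit */

    unsigned long srr1_clear_ee(unsigned long srr1)
    {
            /* precondition: srr1 & MSR_EE is non-zero */
            return srr1 ^ MSR_EE;   /* a single xori, no mask register needed */
    }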
-
Submitted by Nicholas Piggin
Interrupts which do not require EE to be cleared can all be tested with a single bitwise test.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Submitted by Michael Ellerman
In __replay_interrupt() we take the address of a local label so we can return to it later. However the assembler turns the local label into a symbol with a name like ".L1^B42" - where "^B" is literally "\002". This does not make for pleasant stack traces. Fix it by giving the label a sensible name.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 10 Aug 2017, 1 commit
-
-
Submitted by Nicholas Piggin
The powerpc kernel/watchdog.o should be built when HARDLOCKUP_DETECTOR and HAVE_HARDLOCKUP_DETECTOR_ARCH are both selected. If only the former is selected, then the generic perf watchdog has been selected.

To simplify this check, introduce a new Kconfig symbol PPC_WATCHDOG that depends on both. This Kconfig option means the powerpc specific watchdog is enabled. Without this patch, Book3E will attempt to build the powerpc watchdog.

Fixes: 2104180a ("powerpc/64s: implement arch-specific hardlockup watchdog")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
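The condition the new symbol encodes, rendered here as C preprocessor logic purely for illustration; the actual change is a Kconfig symbol depending on both options, not C code:

    #if defined(CONFIG_HARDLOCKUP_DETECTOR) && \
        defined(CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH)
    /* PPC_WATCHDOG case: build arch/powerpc/kernel/watchdog.o */
    #else
    /* only the generic perf-based hardlockup detector (or none) is in use */
    #endif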
-
- 03 Aug 2017, 2 commits
-
-
Submitted by Benjamin Herrenschmidt
This uses the newly defined constants rather than open-coded numbers. There is a side effect on 64-bit, which is to pass through some of the new P9 bits that we didn't before.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Submitted by Benjamin Herrenschmidt
We test a number of bits from DSISR/SRR1 before deciding to call hash_page(). If any of these is set, we go directly to do_page_fault(), as the bits indicate a fault that needs to be handled there (no hashing needed). This updates the current open-coded masks to use the new DSISR definitions.

This *does* change the masks actually used in two ways:

 - We used to test various bits that were defined as "always 0" in the architecture and could be repurposed for something else. From now on, we just ignore such bits.
 - We were missing some new bits defined on P9.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 31 Jul 2017, 1 commit
-
-
Submitted by Nicholas Piggin
The watchdog soft-NMI exception stack setup loads a stack pointer twice, which is an obvious error. It ends up using the system reset interrupt (true-NMI) stack, which is also a bug because the watchdog could be preempted by a system reset interrupt that overwrites the NMI stack.

Change the soft-NMI to use the "emergency stack". The current kernel stack is not used, because of the longer-term goal to prevent asynchronous stack access using soft-disable.

Fixes: 2104180a ("powerpc/64s: implement arch-specific hardlockup watchdog")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 18 Jul 2017, 1 commit
-
-
Submitted by Nicholas Piggin
A previous optimisation incorrectly assumed the PAPR hcall does not use r12, and clobbers it upon entry. In fact it is used as an input. This can result in KVM guests crashing (observed with PR KVM).

Instead of using r12 to save r13, this patch saves r13 in ctr. This is more costly, but not as slow as using the SPRG.

Fixes: acd7d8ce ("powerpc/64s: Optimize hypercall/syscall entry")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 13 Jul 2017, 1 commit
-
-
Submitted by Nicholas Piggin
Implement an arch-specific watchdog rather than use the perf-based hardlockup detector. The new watchdog takes the soft-NMI directly, rather than going through perf. Perf interrupts are to be made maskable in future, so that would prevent the perf detector from working in those regions.

Additionally, implement an SMP-based detector where all CPUs watch one another by pinging a shared cpumask. This is because powerpc Book3S does not have a true periodic local NMI, but some platforms do implement a true NMI IPI.

If a CPU is stuck with interrupts hard disabled, the soft-NMI watchdog does not work, but the SMP watchdog will. Even on platforms without a true NMI IPI to get a good trace from the stuck CPU, other CPUs will notice the lockup sufficiently to report it and panic.

[npiggin@gmail.com: honor watchdog disable at boot/hotplug]
Link: http://lkml.kernel.org/r/20170621001346.5bb337c9@roar.ozlabs.ibm.com
[npiggin@gmail.com: fix false positive warning at CPU unplug]
Link: http://lkml.kernel.org/r/20170630080740.20766-1-npiggin@gmail.com
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20170616065715.18390-6-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Tested-by: Babu Moger <babu.moger@oracle.com> [sparc]
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
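A toy model of the SMP cross-check described above, with every name invented for the sketch (locking and per-CPU details omitted). Each CPU periodically clears its bit in a shared pending mask; whoever empties the mask refills it and restamps the epoch, and a bit that survives past the timeout marks a stuck CPU:

    #include <stdint.h>

    #define NCPUS             8
    #define WD_TIMEOUT_TICKS  1000

    static uint64_t wd_pending = (1ULL << NCPUS) - 1;  /* lock omitted */
    static uint64_t wd_epoch_start;

    void wd_heartbeat(int cpu, uint64_t now)
    {
            wd_pending &= ~(1ULL << cpu);              /* "I am alive"  */

            if (wd_pending == 0) {                     /* all checked in */
                    wd_pending = (1ULL << NCPUS) - 1;
                    wd_epoch_start = now;
            } else if (now - wd_epoch_start > WD_TIMEOUT_TICKS) {
                    /* Some CPU never cleared its bit: report the hard
                     * lockup; a platform NMI IPI can fetch its trace. */
            }
    }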
-
- 03 Jul 2017, 2 commits
-
-
Submitted by Naveen N. Rao
Blacklist all functions involved while handling a trap. We:

 - convert some of the symbols into private symbols, and
 - blacklist most functions involved while handling a trap.

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
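For C code, per-function kprobes blacklisting uses the NOKPROBE_SYMBOL() macro (assembly symbols use _ASM_NOKPROBE_SYMBOL instead); a minimal example with a hypothetical handler, not one of the symbols this patch actually tags:

    #include <linux/kprobes.h>

    void handle_some_trap(void)
    {
            /* trap handling that must never itself hit a kprobe */
    }
    NOKPROBE_SYMBOL(handle_some_trap);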
-
Submitted by Naveen N. Rao
Commit b48bbb82 ("powerpc/64s: Don't unbalance the return branch predictor in __replay_interrupt()") introduced the __replay_interrupt_return symbol with a '.L' prefix in hopes of keeping it private. However, due to the use of LOAD_REG_ADDR(), the assembler kept this symbol visible. Fix the same by instead using the local label '1'.

Fixes: b48bbb82 ("powerpc/64s: Don't unbalance the return branch predictor in __replay_interrupt()")
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 27 Jun 2017, 1 commit
-
-
Submitted by Benjamin Herrenschmidt
On POWER9 the ERAT may be incorrect on wakeup from some stop states that lose state. This causes random segvs and illegal instructions when these stop states are enabled. This patch invalidates the ERAT on wakeup on POWER9 to prevent this from causing a problem.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Merge comment change with upstream changes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 21 Jun 2017, 3 commits
-
-
Submitted by Michael Ellerman
As for slb_miss_realmode(), rename slb_allocate_realmode() to avoid confusion over whether it runs in real or virtual mode - it runs in both.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
-
Submitted by Michael Ellerman
slb_miss_realmode() doesn't always run in real mode, which is what the name implies. So rename it to avoid confusing people.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
-
Submitted by Michael Ellerman
All the callers of slb_miss_realmode currently open code the #ifndef CONFIG_RELOCATABLE check and the branch via CTR in the RELOCATABLE case. We have a macro to do this, BRANCH_TO_COMMON(), so use it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
-
- 20 Jun 2017, 2 commits
-
-
Submitted by Nicholas Piggin
The SLB miss handler uses r3 for the faulting address, but r12 can mostly be freed up to save r3 in. It just requires SRR1 to be reloaded again on error. It would be more conventional to use r12 for SRR1 (and use r11 to save r3), but slb_allocate_realmode clobbers r11 and not r12.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Submitted by Nicholas Piggin
The EXCEPTION_PROLOG_1 used by SLB miss already saves CTR when the kernel is built with CONFIG_RELOCATABLE. So it does not have to be saved and reloaded when branching to slb_miss_realmode. It can be restored from the PACA as usual.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-