提交 · 7e4d4233260be0611c7fbdb2730c12731c4b8dc0 · openeuler / Kernel

29 11月, 2017 2 次提交

powerpc: Do not assign thread.tidr if already assigned · 7e4d4233

由 Vaibhav Jain 提交于 11月 24, 2017

If set_thread_tidr() is called twice for same task_struct then it will
allocate a new tidr value to it leaving the previous value still
dangling in the vas_thread_ida table.

To fix this the patch changes set_thread_tidr() to check if a tidr
value is already assigned to the task_struct and if yes then returns
zero.

Fixes: ec233ede("powerpc: Add support for setting SPRN_TIDR")
Signed-off-by: NVaibhav Jain <vaibhav@linux.vnet.ibm.com>
Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
[mpe: Modify to return 0 in the success case, not the TID value]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7e4d4233

powerpc: Avoid signed to unsigned conversion in set_thread_tidr() · aca7573f

由 Vaibhav Jain 提交于 11月 28, 2017

There is an unsafe signed to unsigned conversion in set_thread_tidr()
that may cause an error value to be assigned to SPRN_TIDR register and
used as thread-id.

The issue happens as assign_thread_tidr() returns an int and
thread.tidr is an unsigned-long. So a negative error code returned
from assign_thread_tidr() will fail the error check and gets assigned
as tidr as a large positive value.

To fix this the patch assigns the return value of assign_thread_tidr()
to a temporary int and assigns it to thread.tidr iff its '> 0'.

The patch shouldn't impact the calling convention of set_thread_tidr()
i.e all -ve return-values are error codes and a return value of '0'
indicates success.

Fixes: ec233ede("powerpc: Add support for setting SPRN_TIDR")
Signed-off-by: NVaibhav Jain <vaibhav@linux.vnet.ibm.com>
Reviewed-by: Christophe Lombard clombard@linux.vnet.ibm.com
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

aca7573f

24 11月, 2017 1 次提交

powerpc/kexec: Fix kexec/kdump in P9 guest kernels · 2621e945

由 Michael Ellerman 提交于 11月 24, 2017

The code that cleans up the IAMR/AMOR before kexec'ing failed to
remember that when we're running as a guest AMOR is not writable, it's
hypervisor privileged.

They symptom is that the kexec stops before entering purgatory and
nothing else is seen on the console. If you examine the state of the
system all threads will be in the 0x700 program check handler.

Fix it by making the write to AMOR dependent on HV mode.

Fixes: 1e2a516e ("powerpc/kexec: Fix radix to hash kexec due to IAMR/AMOR")
Cc: stable@vger.kernel.org # v4.10+
Reported-by: NYilin Zhang <yilzhang@redhat.com>
Debugged-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Acked-by: NBalbir Singh <bsingharora@gmail.com>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Tested-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

2621e945

23 11月, 2017 2 次提交

powerpc/powernv: Fix kexec crashes caused by tlbie tracing · a3961f82

由 Mahesh Salgaonkar 提交于 11月 22, 2017

Rebooting into a new kernel with kexec fails in trace_tlbie() which is
called from native_hpte_clear(). This happens if the running kernel
has CONFIG_LOCKDEP enabled. With lockdep enabled, the tracepoints
always execute few RCU checks regardless of whether tracing is on or
off. We are already in the last phase of kexec sequence in real mode
with HILE_BE set. At this point the RCU check ends up in
RCU_LOCKDEP_WARN and causes kexec to fail.

Fix this by not calling trace_tlbie() from native_hpte_clear().

mpe: It's not safe to call trace points at this point in the kexec
path, even if we could avoid the RCU checks/warnings. The only
solution is to not call them.

Fixes: 0428491c ("powerpc/mm: Trace tlbie(l) instructions")
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Reported-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Suggested-by: NMichael Ellerman <mpe@ellerman.id.au>
Acked-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

a3961f82

cxl: Check if vphb exists before iterating over AFU devices · 12841f87

由 Vaibhav Jain 提交于 11月 23, 2017

During an eeh a kernel-oops is reported if no vPHB is allocated to the
AFU. This happens as during AFU init, an error in creation of vPHB is
a non-fatal error. Hence afu->phb should always be checked for NULL
before iterating over it for the virtual AFU pci devices.

This patch fixes the kenel-oops by adding a NULL pointer check for
afu->phb before it is dereferenced.

Fixes: 9e8df8a2 ("cxl: EEH support")
Cc: stable@vger.kernel.org # v4.3+
Signed-off-by: NVaibhav Jain <vaibhav@linux.vnet.ibm.com>
Acked-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

12841f87

22 11月, 2017 5 次提交

powerpc/64s: Fix Power9 DD2.1 logic in DT CPU features · 4d6c51b1

由 Michael Ellerman 提交于 11月 22, 2017

I got the logic wrong in the DT CPU features code when I added the
Power9 DD2.1 feature. We should be setting the bit if we detect a
DD2.1, not clearing it if we detect a DD2.0.

This code isn't actually exercised at the moment so nothing is
actually broken.

Fixes: 3ffa9d9e ("powerpc/64s: Fix Power9 DD2.0 workarounds by adding DD2.1 feature")
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

4d6c51b1

powerpc/perf: Fix IMC_MAX_PMU macro · 73ce9aec

由 Madhavan Srinivasan 提交于 11月 22, 2017

IMC_MAX_PMU is used for static storage (per_nest_pmu_arr) which holds
nest pmu information. Current value for the macro is 32 based on
the initial number of nest pmu units supported by the nest microcode.
But going forward, microcode could support more nest units. Instead
of static storage, patch to fix the code to dynamically allocate an
array based on the number of nest imc units found in the device tree.

Fixes:8f95faaa ('powerpc/powernv: Detect and create IMC device')
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

73ce9aec

powerpc/perf: Fix pmu_count to count only nest imc pmus · de34787f

由 Madhavan Srinivasan 提交于 11月 22, 2017

"pmu_count" in opal_imc_counters_probe() is intended to hold
the number of successful nest imc pmu registerations. But
current code also counts other imc units like core_imc and
thread_imc. Patch add a check to count only nest imc pmus.
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

de34787f

powerpc: Fix boot on BOOK3S_32 with CONFIG_STRICT_KERNEL_RWX · 252eb558

由 Christophe Leroy 提交于 11月 21, 2017

On powerpc32, patch_instruction() is called by apply_feature_fixups()
which is called from early_init()

There is the following note in front of early_init():
 * Note that the kernel may be running at an address which is different
 * from the address that it was linked at, so we must use RELOC/PTRRELOC
 * to access static data (including strings).  -- paulus

Therefore, slab_is_available() cannot be called yet, and
text_poke_area must be addressed with PTRRELOC()

Fixes: 95902e6c ("powerpc/mm: Implement STRICT_KERNEL_RWX on PPC32")
Cc: stable@vger.kernel.org # v4.14+
Reported-by: NMeelis Roos <mroos@linux.ee>
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

252eb558

powerpc/perf/imc: Use cpu_to_node() not topology_physical_package_id() · f3f1dfd6

由 Michael Ellerman 提交于 10月 16, 2017

init_imc_pmu() uses topology_physical_package_id() to detect the
node id of the processor it is on to get local memory, but that's
wrong, and can lead to crashes. Fix it to use cpu_to_node().

Fixes: 885dcd70 ("powerpc/perf: Add nest IMC PMU support")
Cc: stable@vger.kernel.org # v4.14+
Reported-By: NRob Lippert <rlippert@google.com>
Tested-By: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f3f1dfd6

21 11月, 2017 1 次提交

powerpc/vas: Export chip_to_vas_id() · 62b49c42

由 Sukadev Bhattiprolu 提交于 11月 20, 2017

Export the symbol chip_to_vas_id() to fix a build failure when
CONFIG_CRYPTO_DEV_NX_COMPRESS_POWERNV=m.

Fixes: d4ef61b5 ("powerpc/vas, nx-842: Define and use chip_to_vas_id()")
Reported-by: NHaren Myneni <hbabu@us.ibm.com>
Reported-by: NJosh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

62b49c42

20 11月, 2017 1 次提交

powerpc/64s/slice: Use addr limit when computing slice mask · 7a06c668

由 Aneesh Kumar K.V 提交于 11月 10, 2017

While computing slice mask for the free area we need make sure we only
search in the addr limit applicable for this mmap. We update the
slb_addr_limit after we request for a mmap above 128TB. But the
following mmap request with hint addr below 128TB should still limit
its search to below 128TB. ie. we should not use slb_addr_limit to
compute slice mask in this case. Instead, we should derive high addr
limit based on the mmap hint addr value.

Fixes: f4ea6dcb ("powerpc/mm: Enable mappings above 128TB")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7a06c668

15 11月, 2017 1 次提交

powerpc/64s: Fix Power9 DD2.0 workarounds by adding DD2.1 feature · 3ffa9d9e

由 Michael Ellerman 提交于 11月 15, 2017

Recently we added a CPU feature for Power9 DD2.0, to capture the fact
that some workarounds are required only on Power9 DD1 and DD2.0 but
not DD2.1 or later.

Then in commit 9d2f510a ("powerpc/64s/idle: avoid POWER9 DD1 and
DD2.0 ERAT workaround on DD2.1") and commit e3646330
"powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 PMU workaround on
DD2.1") we changed CPU_FTR_SECTIONs to check for DD1 or DD20, eg:

  BEGIN_FTR_SECTION
          PPC_INVALIDATE_ERAT
  END_FTR_SECTION_IFSET(CPU_FTR_POWER9_DD1 | CPU_FTR_POWER9_DD20)

Unfortunately although this reads as "if set DD1 or DD2.0", the or is
a bitwise or and actually generates a mask of both bits. The code that
does the feature patching then checks that the value of the CPU
features masked with that mask are equal to the mask.

So the end result is we're checking for DD1 and DD20 being set, which
never happens. Yes the API is terrible.

Removing the ERAT workaround on DD2.0 results in random SEGVs, the
system tends to boot, but things randomly die including sometimes
dhclient, udev etc.

To fix the problem and hopefully avoid it in future, we remove the
DD2.0 CPU feature and instead add a DD2.1 (or later) feature. This
allows us to easily express that the workarounds are required if DD2.1
is not set.

At some point we will drop the DD1 workarounds entirely and some of
this can be cleaned up.

Fixes: 9d2f510a ("powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 ERAT workaround on DD2.1")
Fixes: e3646330 ("powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 PMU workaround on DD2.1")
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

3ffa9d9e

14 11月, 2017 1 次提交

powerpc/64s: Fix masking of SRR1 bits on instruction fault · 475b581f

由 Michael Ellerman 提交于 11月 14, 2017

On 64-bit Book3s, when we take an instruction fault the reason for the
fault may be reported in SRR1. For data faults the reason is reported
in DSISR (Data Storage Instruction Status Register).

The reasons reported in each do not necessarily correspond, so we mask
the SRR1 bits before copying them to the DSISR, which is then used by
the page fault code.

Prior to commit b4c001dc ("powerpc/mm: Use symbolic constants for
filtering SRR1 bits on ISIs") we used a hard-coded mask of 0x58200000,
which corresponds to:

  DSISR_NOHPTE		0x40000000 /* no translation found */
  DSISR_NOEXEC_OR_G	0x10000000 /* exec of no-exec or guarded */
  DSISR_PROTFAULT	0x08000000 /* protection fault */
  DSISR_KEYFAULT	0x00200000 /* Storage Key fault */

That commit added a #define for the mask, DSISR_SRR1_MATCH_64S, but
incorrectly used a different similarly named DSISR_BAD_FAULT_64S.

This had the effect of changing the mask to 0xa43a0000, which omits
everything but DSISR_KEYFAULT.

Luckily this had no visible effect, because in practice we hardly use
the DSISR bits. The lack of DSISR_NOHPTE means a TLB flush
optimisation was missed in the native HPTE code, and DSISR_NOEXEC_OR_G
and DSISR_PROTFAULT are both only used to trigger rare warnings.

So we got lucky, but let's fix it. The new value only has bits between
17 and 30 set, so we can continue to use andis.

Fixes: b4c001dc ("powerpc/mm: Use symbolic constants for filtering SRR1 bits on ISIs")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

475b581f

13 11月, 2017 14 次提交

powerpc/64s: mm_context.addr_limit is only used on hash · 4722476b

由 Nicholas Piggin 提交于 11月 10, 2017

Radix keeps no meaningful state in addr_limit, so remove it from radix
code and rename to slb_addr_limit to make it clear it applies to hash
only.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

4722476b

powerpc/64s/radix: Fix 128TB-512TB virtual address boundary case allocation · 85e3f1ad

由 Nicholas Piggin 提交于 11月 10, 2017

Radix VA space allocations test addresses against mm->task_size which
is 512TB, even in cases where the intention is to limit allocation to
below 128TB.

This results in mmap with a hint address below 128TB but address +
length above 128TB succeeding when it should fail (as hash does after
the previous patch).

Set the high address limit to be considered up front, and base
subsequent allocation checks on that consistently.

Fixes: f4ea6dcb ("powerpc/mm: Enable mappings above 128TB")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

85e3f1ad

powerpc/64s/hash: Allow MAP_FIXED allocations to cross 128TB boundary · 35602f82

由 Nicholas Piggin 提交于 11月 10, 2017

While mapping hints with a length that cross 128TB are disallowed,
MAP_FIXED allocations that cross 128TB are allowed. These are failing
on hash (on radix they succeed). Add an additional case for fixed
mappings to expand the addr_limit when crossing 128TB.

Fixes: f4ea6dcb ("powerpc/mm: Enable mappings above 128TB")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

35602f82

powerpc/64s/hash: Fix fork() with 512TB process address space · effc1b25

由 Nicholas Piggin 提交于 11月 10, 2017

Hash unconditionally resets the addr_limit to default (128TB) when the
mm context is initialised. If a process has > 128TB mappings when it
forks, the child will not get the 512TB addr_limit, so accesses to
valid > 128TB mappings will fail in the child.

Fix this by only resetting the addr_limit to default if it was 0. Non
zero indicates it was duplicated from the parent (0 means exec()).

Fixes: f4ea6dcb ("powerpc/mm: Enable mappings above 128TB")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

effc1b25

powerpc/64s/hash: Fix 128TB-512TB virtual address boundary case allocation · 6a72dc03

由 Nicholas Piggin 提交于 11月 10, 2017

When allocating VA space with a hint that crosses 128TB, the SLB
addr_limit variable is not expanded if addr is not > 128TB, but the
slice allocation looks at task_size, which is 512TB. This results in
slice_check_fit() incorrectly succeeding because the slice_count
truncates off bit 128 of the requested mask, so the comparison to the
available mask succeeds.

Fix this by using mm->context.addr_limit instead of mm->task_size for
testing allocation limits. This causes such allocations to fail.

Fixes: f4ea6dcb ("powerpc/mm: Enable mappings above 128TB")
Cc: stable@vger.kernel.org # v4.12+
Reported-by: NFlorian Weimer <fweimer@redhat.com>
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

6a72dc03

powerpc/64s/hash: Fix 512T hint detection to use >= 128T · 7ece3709

由 Michael Ellerman 提交于 11月 13, 2017

Currently userspace is able to request mmap() search between 128T-512T
by specifying a hint address that is greater than 128T. But that means
a hint of 128T exactly will return an address below 128T, which is
confusing and wrong.

So fix the logic to check the hint is greater than *or equal* to 128T.

Fixes: f4ea6dcb ("powerpc/mm: Enable mappings above 128TB")
Cc: stable@vger.kernel.org # v4.12+
Suggested-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Suggested-by: NNicholas Piggin <npiggin@gmail.com>
[mpe: Split out of Nick's bigger patch]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7ece3709

powerpc: Fix DABR match on hash based systems · f23ab3ef

由 Benjamin Herrenschmidt 提交于 11月 10, 2017

Commit 398a719d ("powerpc/mm: Update bits used to skip hash_page")
mistakenly dropped the DSISR_DABRMATCH bit from the mask of bit tested
to skip trying to hash a page.

As a result, the DABR matches would no longer be detected.

This adds it back. We open code it in the 2 places where it matters
rather than fold it into DSISR_BAD_FAULT_32S/64S because this isn't
technically a bad fault and while we would never hit it with the
current code, I prefer if page_fault_is_bad() didn't trigger on these.

Fixes: 398a719d ("powerpc/mm: Update bits used to skip hash_page")
Cc: stable@vger.kernel.org # v4.14
Tested-by: NPedro Miraglia Franco de Carvalho <pedromfc@br.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

f23ab3ef

powerpc/signal: Properly handle return value from uprobe_deny_signal() · 46725b17

由 Naveen N. Rao 提交于 8月 31, 2017

When a uprobe is installed on an instruction that we currently do not
emulate, we copy the instruction into a xol buffer and single step
that instruction. If that instruction generates a fault, we abort the
single stepping before invoking the signal handler. Once the signal
handler is done, the uprobe trap is hit again since the instruction is
retried and the process repeats.

We use uprobe_deny_signal() to detect if the xol instruction triggered
a signal. If so, we clear TIF_SIGPENDING and set TIF_UPROBE so that the
signal is not handled until after the single stepping is aborted. In
this case, uprobe_deny_signal() returns true and get_signal() ends up
returning 0. However, in do_signal(), we are not looking at the return
value, but depending on ksig.sig for further action, all with an
uninitialized ksig that is not touched in this scenario. Fix the same
by initializing ksig.sig to 0.

Fixes: 129b69df ("powerpc: Use get_signal() signal_setup_done()")
Cc: stable@vger.kernel.org # v3.17+
Reported-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

46725b17

powerpc/fadump: use kstrtoint to handle sysfs store · dcdc4679

由 Michal Suchanek 提交于 6月 26, 2017

Currently sysfs store handlers in fadump use if buf[0] == 'char'.

This means input "100foo" is interpreted as '1' and "01" as '0'.

Change to kstrtoint so leading zeroes and the like is handled in
expected way.
Signed-off-by: NMichal Suchanek <msuchanek@suse.de>
Acked-by: NHari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Michal Suchanek <a class="moz-txt-link-rfc2396E" href="mailto:msuchanek@suse.de"><msuchanek@suse.de></a></pre>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

dcdc4679

powerpc/lib: Implement UACCESS_FLUSHCACHE API · 6c44741d

由 Oliver O'Halloran 提交于 10月 19, 2017

Implement the architecture specific portitions of the UACCESS_FLUSHCACHE
API. This provides functions for the copy_user_flushcache iterator that
ensure that when the copy is finished the destination buffer contains
a copy of the original and that the destination buffer is clean in the
processor caches.
Signed-off-by: NOliver O'Halloran <oohall@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

6c44741d

powerpc/lib: Implement PMEM API · 32ce3862

由 Oliver O'Halloran 提交于 10月 19, 2017

Implement the architecture specific cache maintence functions that make
up the "PMEM API". Currently the writeback and invalidate functions
are the same since the function of the DCBST (data cache block store)
instruction is typically interpreted as "writeback to the point of
coherency" rather than to memory. As a result implementing the API
requires a full cache flush rather than just a cache write back. This
will probably change in the not-too-distant future.
Signed-off-by: NOliver O'Halloran <oohall@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

32ce3862

powerpc/powernv/npu: Don't explicitly flush nmmu tlb · 1b2c2b12

由 Alistair Popple 提交于 9月 06, 2017

The nest mmu required an explicit flush as a tlbi would not flush it in the
same way as the core. However an alternate firmware fix exists which should
eliminate the need for this flush, so instead add a device-tree property
(ibm,nmmu-flush) on the NVLink2 PHB to enable it only if required.
Signed-off-by: NAlistair Popple <alistair@popple.id.au>
Reviewed-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

1b2c2b12

powerpc/powernv/npu: Use flush_all_mm() instead of flush_tlb_mm() · 2a31ad09

由 Alistair Popple 提交于 9月 06, 2017

With the optimisations introduced by commit a46cc7a908 ("powerpc/mm/radix:
Improve TLB/PWC flushes"), flush_tlb_mm() no longer flushes the page walk
cache with radix. Switch to using flush_all_mm() to ensure the pwc and tlb
are properly flushed on the nmmu.
Signed-off-by: NAlistair Popple <alistair@popple.id.au>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

2a31ad09

powerpc/powernv/idle: Round up latency and residency values · 8d4e10e9

由 Vaidyanathan Srinivasan 提交于 8月 24, 2017

On PowerNV platforms, firmware provides exit latency and
target residency for each of the idle states in nano
seconds.  Cpuidle framework expects the values in micro
seconds.  Round up to nearest micro seconds to avoid errors
in cases where the values are defined as fractional micro
seconds.

Default idle state of 'snooze' has exit latency of zero.  If
other states have fractional micro second exit latency, they
would get rounded down to zero micro second and make cpuidle
framework choose deeper idle state when snooze loop is the
right choice.
Reported-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NVaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Reviewed-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

8d4e10e9

12 11月, 2017 12 次提交

powerpc/kprobes: refactor kprobe_lookup_name for safer string operations · acdfe931

由 Naveen N. Rao 提交于 10月 23, 2017

Use safer string manipulation functions when dealing with a
user-provided string in kprobe_lookup_name().
Reported-by: NDavid Laight <David.Laight@ACULAB.COM>
Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

acdfe931

powerpc/kprobes: Blacklist emulate_update_regs() from kprobes · 67ac0bfe

由 Naveen N. Rao 提交于 10月 23, 2017

Commit 3cdfcbfd ("powerpc: Change analyse_instr so it doesn't
modify *regs") introduced emulate_update_regs() to perform part of what
emulate_step() was doing earlier. However, this function was not added
to the kprobes blacklist. Add it so as to prevent it from being probed.
Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

67ac0bfe

powerpc/kprobes: Do not disable interrupts for optprobes and kprobes_on_ftrace · f72180cc

由 Naveen N. Rao 提交于 10月 23, 2017

Per Documentation/kprobes.txt, we don't necessarily need to disable
interrupts before invoking the kprobe handlers. Masami submitted
similar changes for x86 via commit a19b2e3d ("kprobes/x86: Remove
IRQ disabling from ftrace-based/optimized kprobes"). Do the same for
powerpc.
Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f72180cc

powerpc/kprobes: Disable preemption before invoking probe handler for optprobes · 8a2d71a3

由 Naveen N. Rao 提交于 10月 23, 2017

Per Documentation/kprobes.txt, probe handlers need to be invoked with
preemption disabled. Update optimized_callback() to do so. Also move
get_kprobe_ctlblk() invocation post preemption disable, since it
accesses pre-cpu data.

This was not an issue so far since optprobes wasn't selected if
CONFIG_PREEMPT was enabled. Commit a30b85df ("kprobes: Use
synchronize_rcu_tasks() for optprobe with CONFIG_PREEMPT=y") changes
this.
Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

8a2d71a3

powerpc/64s: ppc_save_regs is now needed for all 64s builds · fc2a5a61

由 Stephen Rothwell 提交于 11月 02, 2017

Commit 78adf6c2 ("powerpc/64s: Implement system reset idle wakeup
reason"), added a call to ppc_save_regs() in the book3s code.

ppc_save_regs() is only built if XMON and/or KEXEC_CORE are enabled,
which is usually the case, however if they're not enabled then the
build breaks.

Fix it by making the Makefile check also build ppc_save_regs.o if
CONFIG_PPC_BOOK3S is enabled.

Fixes: 78adf6c2 ("powerpc/64s: Implement system reset idle wakeup reason")
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
[mpe: Write change log]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

fc2a5a61

powerpc/mm/radix: Fix crashes on Power9 DD1 with radix MMU and STRICT_RWX · f79ad50e

由 Balbir Singh 提交于 10月 16, 2017

When using the radix MMU on Power9 DD1, to work around a hardware
problem, radix__pte_update() is required to do a two stage update of
the PTE. First we write a zero value into the PTE, then we flush the
TLB, and then we write the new PTE value.

In the normal case that works OK, but it does not work if we're
updating the PTE that maps the code we're executing, because the
mapping is removed by the TLB flush and we can no longer execute from
it. Unfortunately the STRICT_RWX code needs to do exactly that.

The exact symptoms when we hit this case vary, sometimes we print an
oops and then get stuck after that, but I've also seen a machine just
get stuck continually page faulting with no oops printed. The variance
is presumably due to the exact layout of the text and the page size
used for the mappings. In all cases we are unable to boot to a shell.

There are possible solutions such as creating a second mapping of the
TLB flush code, executing from that, and then jumping back to the
original. However we don't want to add that level of complexity for a
DD1 work around.

So just detect that we're running on Power9 DD1 and refrain from
changing the permissions, effectively disabling STRICT_RWX on Power9
DD1.

Fixes: 7614ff32 ("powerpc/mm/radix: Implement STRICT_RWX/mark_rodata_ro() for Radix")
Cc: stable@vger.kernel.org # v4.13+
Reported-by: NAndrew Jeffery <andrew@aj.id.au>
[Changelog as suggested by Michael Ellerman <mpe@ellerman.id.au>]
Signed-off-by: NBalbir Singh <bsingharora@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f79ad50e

crypto/nx: Do not initialize workmem allocation · 0f46a79a

由 Haren Myneni 提交于 9月 24, 2017

We are using percpu send window on P9 NX (powerNV) instead of opening
/ closing per each crypto session. Means txwin is removed from
workmem. So we do not need to initialize workmem for each request.
Signed-off-by: NHaren Myneni <haren@us.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

0f46a79a

crypto/nx: Use percpu send window for NX requests · 976dd649

由 Haren Myneni 提交于 9月 24, 2017

For P9 NX, the send window is opened for each crypto session and
closed upon free. But VAS supports 64K windows per chip for all
coprocessors including in user space support. So there is a
possibility of not getting the window for kernel requests.

This patch reserves windows for each coprocessor type (NX842) and are
available forever for kernel requests, Opens each window for each CPU
on the corresponding chip during driver initialization. So then use
the percpu txwin for NX requests depends on the CPU on which the
process is executing.
Signed-off-by: NHaren Myneni <haren@us.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

976dd649

powerpc/vas: Add support for user receive window · 6c8e6bb2

由 Sukadev Bhattiprolu 提交于 11月 07, 2017

Add support for user space receive window (for the Fast thread-wakeup
coprocessor type)
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

6c8e6bb2

powerpc/vas: Define vas_win_id() · 61f3cca8

由 Sukadev Bhattiprolu 提交于 11月 07, 2017

Define an interface to return a system-wide unique id for a given VAS
window.

The vas_win_id() will be used in a follow-on patch to generate an unique
handle for a user space receive window. Applications can use this handle
to pair send and receive windows for fast thread-wakeup.

The hardware refers to this system-wide unique id as a Partition Send
Window ID which is expected to be used during fault handling. Hence the
"pswid" in the function names.
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

61f3cca8

powerpc/vas: Define vas_win_paste_addr() · 5676be2f

由 Sukadev Bhattiprolu 提交于 11月 07, 2017

Define an interface that the NX drivers can use to find the physical
paste address of a send window. This interface is expected to be used
with the mmap() operation of the NX driver's device. i.e the user space
process can use driver's mmap() operation to map the send window's paste
address into their address space and then use copy and paste instructions
to submit the CRBs to the NX engine.

Note that kernel drivers will use vas_paste_crb() directly and don't need
this interface.
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

5676be2f

powerpc: Define set_thread_uses_vas() · 9d2a4d71

由 Sukadev Bhattiprolu 提交于 11月 07, 2017

A CP_ABORT instruction is required in processes that have mapped a VAS
"paste address" with the intention of using COPY/PASTE instructions.
But since CP_ABORT is expensive, we want to restrict it to only
processes that use/intend to use COPY/PASTE.

Define an interface, set_thread_uses_vas(), that VAS can use to
indicate that the current process opened a send window. During context
switch, issue CP_ABORT only for processes that have the flag set.

Thanks for input from Nick Piggin, Michael Ellerman.
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
[mpe: Fix to not use new_thread after _switch() returns]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

9d2a4d71

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功