Commits · 36a7eeaff7d06cef253c8df6dfe363bfc4a553f8 · openeuler / Kernel

30 Jul, 2018 3 commits

powerpc/405: move PPC405_ERR77 in asm-405.h · 36a7eeaf

Committed by Christophe Leroy on Jul 05, 2018

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

36a7eeaf

powerpc: remove unneeded inclusions of cpu_has_feature.h · 8c58259b

Committed by Christophe Leroy on Jul 05, 2018

Files not using cpu_has_feature() don't need cpu_has_feature.h.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

8c58259b

powerpc: remove kdump.h from page.h · db0a2b63

Committed by Christophe Leroy on Jul 05, 2018

page.h doesn't need kdump.h.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

db0a2b63

24 Jul, 2018 35 commits

powerpc/powernv: implement opal_put_chars_atomic · 17cc1dd4

Committed by Nicholas Piggin on May 01, 2018

The RAW console does not need writes to be atomic, so relax
opal_put_chars to be able to do partial writes, and implement an
_atomic variant which does not take a spinlock. This API is used
in xmon, so the less locking that is used, the better chance there
is that a crash can be debugged.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

17cc1dd4

powerpc/powernv: move opal console flushing to udbg · ac4ac788

Committed by Nicholas Piggin on May 01, 2018

OPAL console writes do not have to synchronously flush firmware /
hardware buffers unless they are going through the udbg path.

Remove the unconditional flushing from opal_put_chars. Flush if
there was no space in the buffer as an optimisation (callers loop
waiting for success in that case). udbg flushing is moved to
udbg_opal_putc.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

ac4ac788

powerpc/powernv: Remove OPALv1 support from opal console driver · b74d2807

Committed by Nicholas Piggin on May 01, 2018

opal_put_chars deals with partial writes because in OPALv1,
opal_console_write_buffer_space did not work correctly. That firmware
is not supported.

This reworks the opal_put_chars code to no longer deal with partial
writes by turning them into full writes. Partial write handling is still
supported in terms of what gets returned to the caller, but it may not
go to the console atomically. A warning message is printed in this
case.

This allows console flushing to be moved out of the opal_write_lock
spinlock. Flushing under the lock could cause it to be held for long
periods if the console is busy (especially if it was being spammed by
firmware), which is dangerous because the lock is taken by xmon to debug
the system. Flushing outside the lock improves the situation a bit.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

b74d2807

powerpc/powernv: Implement and use opal_flush_console · d2a2262e

Committed by Nicholas Piggin on May 01, 2018

A new console flushing firmware API was introduced to replace event
polling loops, and implemented in opal-kmsg with affddff6
("powerpc/powernv: Add a kmsg_dumper that flushes console output on
panic"), to flush the console in the panic path.

The OPAL console driver has other situations where interrupts are off
and it needs to flush the console synchronously. These still use a
polling loop.

So move the opal-kmsg flush code to opal_flush_console, and use the
new function in opal-kmsg and opal_put_chars.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

d2a2262e

powerpc/powernv: opal-kmsg use flush fallback from console code · e00da0f2

Committed by Nicholas Piggin on May 01, 2018

Use the more refined and tested event polling loop from opal_put_chars
as the fallback console flush in the opal-kmsg path. This loop is used
by the console driver today, whereas the opal-kmsg fallback is not
likely to have been used for years.

Use WARN_ONCE rather than a printk when the fallback is invoked to
prepare for moving the console flush into a common function.

Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

e00da0f2

powerpc/powernv: opal-kmsg standardise OPAL_BUSY handling · 3a80bfc7

Committed by Nicholas Piggin on May 01, 2018

OPAL_CONSOLE_FLUSH is documented as being able to return OPAL_BUSY,
so implement the standard OPAL_BUSY handling for it.

Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

3a80bfc7

powerpc/powernv: Fix OPAL console driver OPAL_BUSY loops · 36d2dabc

Committed by Nicholas Piggin on May 01, 2018

The OPAL console driver does not delay in case it gets OPAL_BUSY or
OPAL_BUSY_EVENT from firmware.

It can't yet be made to sleep because it is called under spinlock,
but it can be changed to the standard OPAL_BUSY loop form, and a
delay added to keep it from hitting the firmware too frequently.
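The loop shape described above can be sketched in userspace. This is a minimal illustration, not the driver code: the return-code values are mirrored here as assumptions, and `fake_firmware_write()`, `fake_delay()` and `fake_poll_events()` are hypothetical stubs standing in for the firmware call, the delay, and event polling.

```c
#include <assert.h>

/* Return codes mirrored for illustration; treat the values as assumptions. */
#define OPAL_SUCCESS      0
#define OPAL_BUSY        (-2)
#define OPAL_BUSY_EVENT  (-12)

static int busy_replies_left;  /* stub state: how many busy replies remain */
static int delays_taken;       /* counts calls to the stubbed delay */
static int polls_taken;        /* counts calls to the stubbed event poll */

/* Hypothetical firmware stub: alternates busy replies, then succeeds. */
static int fake_firmware_write(void)
{
	if (busy_replies_left > 0) {
		busy_replies_left--;
		return (busy_replies_left & 1) ? OPAL_BUSY : OPAL_BUSY_EVENT;
	}
	return OPAL_SUCCESS;
}

static void fake_delay(void)       { delays_taken++; }
static void fake_poll_events(void) { polls_taken++; }

/* Retry while busy, delaying on every busy reply so firmware is not hammered. */
static int write_with_busy_loop(void)
{
	int rc = OPAL_BUSY;

	while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
		rc = fake_firmware_write();
		if (rc == OPAL_BUSY_EVENT) {
			fake_delay();
			fake_poll_events();
		} else if (rc == OPAL_BUSY) {
			fake_delay();
		}
	}
	assert(rc != OPAL_BUSY && rc != OPAL_BUSY_EVENT);
	return rc;
}
```

Because the loop cannot sleep under a spinlock, the delay in a real driver would have to be a busy-wait; the stub here only counts how often it would be called.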

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

36d2dabc

powerpc/powernv: opal_put_chars partial write fix · bd90284c

Committed by Nicholas Piggin on May 01, 2018

The intention here is to consume and discard the remaining buffer
upon error. This works if there has not been a previous partial write.
If there has been, then total_len is no longer the total number of bytes
to copy. total_len is always "bytes left to copy", so it should be
added to written bytes.

This code may not be exercised any more if partial writes will not be
hit, but this is a small bugfix before a larger change.
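The accounting can be modelled outside the kernel. The sketch below is a hypothetical model, not the driver code: `put_chars_model()`, `chunk` and `fail_after` are names invented for illustration. For a 100-byte buffer written in 30-byte chunks where the third write fails, it reports 100 bytes consumed, whereas returning `total_len` alone at that point would report only the 40 bytes still outstanding.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the byte accounting only.
 * chunk:      how many bytes each successful write accepts
 * fail_after: how many writes succeed before a hard error (negative: never)
 */
static size_t put_chars_model(size_t total_len, size_t chunk, int fail_after)
{
	size_t written = 0;

	assert(chunk > 0);
	while (total_len > 0) {
		size_t len = total_len < chunk ? total_len : chunk;

		if (fail_after-- == 0) {
			/*
			 * Consume and discard the rest. total_len is only
			 * "bytes left to copy", so add it to what was
			 * already written.
			 */
			return written + total_len;
		}
		written += len;
		total_len -= len;
	}
	return written;
}
```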
Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

bd90284c

powerpc/powernv/opal-dump : Use IRQ_HANDLED instead of numbers in interrupt handler · b29336c0

Committed by Mukesh Ojha on Feb 20, 2017

Fixes: 8034f715 ("powernv/opal-dump: Convert to irq domain")

Convert all explicit numeric return values to IRQ_HANDLED, which is the
proper return value for an interrupt handler.

This also removes error messages such as "nobody cared", which
appeared when the handler returned -1 or 0.

Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

b29336c0

powerpc/powernv/opal-dump : Handles opal_dump_info properly · a5bbe8fd

Committed by Mukesh Ojha on Feb 20, 2017

Move the return value check of 'opal_dump_info' to the proper place;
previously all the dump info was filled in unnecessarily even on failure.

Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

a5bbe8fd

powerpc/tm: Remove struct thread_info param from tm_reclaim_thread() · edd00b83

Committed by Cyril Bur on Feb 01, 2018

Since commit dc310669 ("powerpc: tm: Always use fp_state and
vr_state to store live registers") tm_reclaim_thread() doesn't use the
parameter anymore. Both callers have to bother getting it even though
they have no need for a struct thread_info either.

Just remove it and adjust the callers.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

edd00b83

powerpc/tm: Update function prototype comment · a596a7e9

Committed by Cyril Bur on Feb 05, 2018

In commit eb5c3f1c ("powerpc: Always save/restore checkpointed regs
during treclaim/trecheckpoint") __tm_recheckpoint was modified to no
longer take the second parameter 'unsigned long orig_msr' as part of a
TM rewrite to simplify the reclaiming/recheckpointing process.

There is a comment in the asm file where the function is declared which
has an incorrect prototype with the 'orig_msr' parameter.

This patch corrects the comment.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

a596a7e9

powerpc/64: add 32 bytes prechecking before using VMX optimization on memcmp() · c2a4e54e

Committed by Simon Guo on Jun 07, 2018

This patch is based on the previous VMX patch on memcmp().

To optimize ppc64 memcmp() with VMX instructions, we need to consider
the penalty VMX brings: if the kernel uses VMX instructions, it needs
to save/restore the current thread's VMX registers. There are 32 x 128-bit
VMX registers in PPC, which means 32 x 16 = 512 bytes to load and store.

The major concern regarding memcmp() performance in the kernel is KSM,
which uses memcmp() frequently to merge identical pages. So it makes
sense to examine KSM to see whether any improvement can be made
here.  Cyril Bur indicates in the following mail that memcmp() for KSM
has a higher probability of failing (mismatching) early, within the
first bytes:
	https://patchwork.ozlabs.org/patch/817322/#1773629
This patch follows up on that.

Testing shows that KSM memcmp() tends to fail early, within the first 32
bytes.  More specifically:
    - 76% of cases fail/mismatch before 16 bytes;
    - 83% of cases fail/mismatch before 32 bytes;
    - 84% of cases fail/mismatch before 64 bytes;
So 32 bytes looks like a better choice than other sizes for pre-checking.

The early failure also holds for memcmp() in the non-KSM case. With a
non-typical call load, ~73% of cases fail before the first 32 bytes.

This patch adds a 32-byte pre-check before jumping into VMX
operations, to avoid the unnecessary VMX penalty. It is not limited to
the KSM case. Testing shows ~20% improvement in memcmp() average
execution time with this patch.

And note the 32B pre-checking is only performed when the compare size
is long enough (>=4K currently) to allow VMX operation.
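The control flow of the pre-check can be sketched in portable C. This is an illustration under stated assumptions, not the assembly from the patch: `compare_vmx_model()` is a hypothetical stand-in that uses plain `memcmp()` instead of VMX, and `vmx_path_taken` is a counter added only to make the behaviour observable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VMX_THRESHOLD  4096  /* ">=4K currently", per the text above */
#define PRECHECK_BYTES 32

static int vmx_path_taken;   /* instrumentation for this sketch only */

/* Stand-in for the VMX loop; plain memcmp() keeps the sketch portable. */
static int compare_vmx_model(const void *s1, const void *s2, size_t n)
{
	vmx_path_taken++;
	return memcmp(s1, s2, n);
}

static int memcmp_precheck_model(const void *s1, const void *s2, size_t n)
{
	if (n >= VMX_THRESHOLD) {
		int ret;

		assert(n >= PRECHECK_BYTES);
		/* Cheap scalar check of the first 32 bytes. */
		ret = memcmp(s1, s2, PRECHECK_BYTES);
		if (ret)
			return ret;  /* mismatch found, VMX cost avoided */

		/* Only now pay for the VMX register save/restore. */
		return compare_vmx_model((const char *)s1 + PRECHECK_BYTES,
					 (const char *)s2 + PRECHECK_BYTES,
					 n - PRECHECK_BYTES);
	}
	return memcmp(s1, s2, n);
}
```

With two 8K buffers that differ at byte 5, the model returns without entering the VMX stand-in; if they first differ at byte 5000, the stand-in is entered once.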

The detailed data and analysis are at:
https://github.com/justdoitqd/publicFiles/blob/master/memcmp/README.md

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

c2a4e54e

powerpc/64: enhance memcmp() with VMX instruction for long bytes comparision · d58badfb

Committed by Simon Guo on Jun 07, 2018

This patch adds VMX primitives to do memcmp() in case the compare size
is equal to or greater than 4K bytes. The KSM feature can benefit from this.

Test results with the following test program (replace the "^>" with ""):
------
># cat tools/testing/selftests/powerpc/stringloops/memcmp.c
>#include <malloc.h>
>#include <stdlib.h>
>#include <string.h>
>#include <time.h>
>#include "utils.h"
>#define SIZE (1024 * 1024 * 900)
>#define ITERATIONS 40

int test_memcmp(const void *s1, const void *s2, size_t n);

static int testcase(void)
{
        char *s1;
        char *s2;
        unsigned long i;

        s1 = memalign(128, SIZE);
        if (!s1) {
                perror("memalign");
                exit(1);
        }

        s2 = memalign(128, SIZE);
        if (!s2) {
                perror("memalign");
                exit(1);
        }

        for (i = 0; i < SIZE; i++)  {
                s1[i] = i & 0xff;
                s2[i] = i & 0xff;
        }
        for (i = 0; i < ITERATIONS; i++) {
		int ret = test_memcmp(s1, s2, SIZE);

		if (ret) {
			printf("return %d at[%ld]! should have returned zero\n", ret, i);
			abort();
		}
	}

        return 0;
}

int main(void)
{
        return test_harness(testcase, "memcmp");
}
------
Without this patch (but with the first patch "powerpc/64: Align bytes
before fall back to .Lshort in powerpc64 memcmp()." in the series):
	4.726728762 seconds time elapsed                                          ( +-  3.54%)
With VMX patch:
	4.234335473 seconds time elapsed                                          ( +-  2.63%)
		There is ~+10% improvement.

Testing with an unaligned and different-offset version (shifting s1 and s2
by a random offset within 16 bytes) can achieve an improvement higher than 10%.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

d58badfb

powerpc: add vcmpequd/vcmpequb ppc instruction macro · f1ecbaf4

Committed by Simon Guo on Jun 07, 2018

Some old tool chains don't know about instructions like vcmpequd.

This patch adds .long macros for vcmpequd and vcmpequb, in
preparation for optimizing ppc64 memcmp with VMX instructions.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

f1ecbaf4

powerpc/64: Align bytes before fall back to .Lshort in powerpc64 memcmp() · 2d9ee327

Committed by Simon Guo on Jun 07, 2018

Currently memcmp() 64bytes version in powerpc will fall back to .Lshort
(compare per byte mode) if either src or dst address is not 8 bytes aligned.
It can be optimized in 2 situations:

1) If both addresses have the same offset from an 8-byte boundary:
memcmp() can first compare the unaligned bytes up to the 8-byte boundary
and then compare the remaining 8-byte-aligned content in .Llong mode.

2) If the src/dst addresses do not have the same offset from an 8-byte boundary:
memcmp() can align the src address to 8 bytes, increment the dst address
accordingly, then load src in aligned mode and dst in unaligned mode.

This patch optimizes memcmp() behavior in the above 2 situations.
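The two situations can be told apart from the low three bits of each address. The sketch below only illustrates that classification; `classify_alignment()`, `head_bytes()` and the enum names are hypothetical and do not appear in the patch, which is written in assembly.

```c
#include <assert.h>
#include <stdint.h>

enum memcmp_path {
	PATH_BOTH_ALIGNED,  /* both addresses already 8-byte aligned */
	PATH_SAME_OFFSET,   /* situation 1: same offset from an 8-byte boundary */
	PATH_DIFF_OFFSET    /* situation 2: offsets differ */
};

static enum memcmp_path classify_alignment(uintptr_t src, uintptr_t dst)
{
	unsigned int src_off = (unsigned int)(src & 7);
	unsigned int dst_off = (unsigned int)(dst & 7);

	if (src_off == 0 && dst_off == 0)
		return PATH_BOTH_ALIGNED;
	if (src_off == dst_off)
		return PATH_SAME_OFFSET;
	return PATH_DIFF_OFFSET;
}

/* Bytes to compare one by one before addr reaches an 8-byte boundary. */
static unsigned int head_bytes(uintptr_t addr)
{
	unsigned int n = (unsigned int)((8 - (addr & 7)) & 7);

	assert(n < 8);
	return n;
}
```

In situation 1 both pointers reach alignment after the same `head_bytes()`; in situation 2 only src becomes aligned after that many bytes, which is why dst is then loaded in unaligned mode.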

Tested with both little/big endian. Performance result below is based on
little endian.

The following are the test results for the case where src/dst have the same offset
(a similar result was observed when src/dst have different offsets):
(1) 256 bytes
Test with the existing tools/testing/selftests/powerpc/stringloops/memcmp:
- without patch
	29.773018302 seconds time elapsed                                          ( +- 0.09% )
- with patch
	16.485568173 seconds time elapsed                                          ( +-  0.02% )
		-> There is ~+80% improvement

(2) 32 bytes
To observe performance impact on < 32 bytes, modify
tools/testing/selftests/powerpc/stringloops/memcmp.c with following:
-------
 #include <string.h>
 #include "utils.h"

-#define SIZE 256
+#define SIZE 32
 #define ITERATIONS 10000

 int test_memcmp(const void *s1, const void *s2, size_t n);
--------

- Without patch
	0.244746482 seconds time elapsed                                          ( +-  0.36%)
- with patch
	0.215069477 seconds time elapsed                                          ( +-  0.51%)
		-> There is ~+13% improvement

(3) 0~8 bytes
To observe <8 bytes performance impact, modify
tools/testing/selftests/powerpc/stringloops/memcmp.c with following:
-------
 #include <string.h>
 #include "utils.h"

-#define SIZE 256
-#define ITERATIONS 10000
+#define SIZE 8
+#define ITERATIONS 1000000

 int test_memcmp(const void *s1, const void *s2, size_t n);
-------
- Without patch
       1.845642503 seconds time elapsed                                          ( +- 0.12% )
- With patch
       1.849767135 seconds time elapsed                                          ( +- 0.26% )
		-> They are nearly the same. (-0.2%)

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

2d9ee327

powerpc/pseries/mm: Improve error reporting on HCALL failures · ca42d8d2

Committed by Aneesh Kumar K.V on Jun 29, 2018

This patch adds error reporting to H_ENTER and H_READ hcalls. A
failure of either of these hcalls is mostly fatal and it would be good to
log the failure reason.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

ca42d8d2

powerpc/pseries: Use pr_xxx() in lpar.c · 65471d76

Committed by Aneesh Kumar K.V on Jun 29, 2018

Switch from printk to pr_fmt() / pr_xxx().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

65471d76

powerpc/mm/hash: Reduce contention on hpte lock · 27d8959d

Committed by Aneesh Kumar K.V on Jun 29, 2018

We already do this in some places. This patch makes sure we always try
to search for the hpte without holding the lock and redo the compare with
the lock held once a match is found.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

27d8959d

powerpc/mm/hash: Add hpte_get_old_v and use that instead of opencoding · a833280b

Committed by Aneesh Kumar K.V on Jun 29, 2018

No functional change.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

a833280b

powerpc/mm/hash: Remove the superfluous bitwise operation when find hpte group · 1531cff4

Committed by Aneesh Kumar K.V on Jun 29, 2018

When computing the starting slot number for a hash page table group we used
to do this:

hpte_group = ((hash & htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL;

Multiplying by 8 (HPTES_PER_GROUP) implies the last three bits are 0. Hence we
really don't need to clear them separately.
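The equivalence is easy to check in isolation. The sketch below is a userspace illustration rather than the kernel code: `HPTES_PER_GROUP` is 8 as stated above, `hpte_group_old()` and `hpte_group_new()` are hypothetical helper names, and `htab_hash_mask` is passed in as a parameter instead of being a global.

```c
#include <assert.h>

#define HPTES_PER_GROUP 8UL

/* Old form: multiply, then clear the low three bits explicitly. */
static unsigned long hpte_group_old(unsigned long hash,
				    unsigned long htab_hash_mask)
{
	return ((hash & htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL;
}

/* New form: a multiple of 8 already has its low three bits clear. */
static unsigned long hpte_group_new(unsigned long hash,
				    unsigned long htab_hash_mask)
{
	unsigned long group = (hash & htab_hash_mask) * HPTES_PER_GROUP;

	assert((group & 0x7UL) == 0);
	return group;
}
```

Multiplying an unsigned value by 8 is a left shift by three, so the low bits stay zero even if the product wraps around.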
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

1531cff4

powerpc/mm: Increase MAX_PHYSMEM_BITS to 128TB with SPARSEMEM_VMEMMAP config · 7d4340bb

Committed by Aneesh Kumar K.V on Jun 21, 2018

We do this only with VMEMMAP config so that our page_to_[nid/section] etc. are not
impacted.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

7d4340bb

powerpc/mm: Check memblock_add against MAX_PHYSMEM_BITS range · 6aba0c84

Committed by Aneesh Kumar K.V on Jun 21, 2018

With SPARSEMEM config enabled, we make sure that we don't add sections beyond
the MAX_PHYSMEM_BITS range. This results in not building a vmemmap mapping for
the range beyond the max range. But our memblock layer looks at the device tree
and creates a mapping for the full memory range. Prevent this by checking against
MAX_PHYSMEM_BITS when doing memblock_add.

We don't do a similar check for memblock_reserve_range. If the reserve range is beyond
MAX_PHYSMEM_BITS we expect that to be configured with 'nomap'. Any other
reserved range should come from existing memblock ranges which we already
filtered while adding.
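The kind of range check described above can be sketched as follows. This is a hedged illustration, not the code from the patch: `clamp_to_physmem_limit()` is a hypothetical name, and the value 47 for `MAX_PHYSMEM_BITS` is an assumption chosen so that the limit is 128TB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PHYSMEM_BITS 47  /* assumption: 2^47 bytes = 128TB */

/*
 * Returns false if the range starts at or above the limit (skip it),
 * otherwise trims *size so that the range ends at the limit at most.
 */
static bool clamp_to_physmem_limit(uint64_t base, uint64_t *size)
{
	const uint64_t max_mem = UINT64_C(1) << MAX_PHYSMEM_BITS;

	assert(size != NULL);
	if (base >= max_mem)
		return false;
	/* Compare against the remaining room to avoid base + size overflow. */
	if (*size > max_mem - base)
		*size = max_mem - base;
	return true;
}
```

A caller would run this check on each range read from the device tree and add the range to memblock only when it returns true.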

This avoids a crash like the one below when running on a system with system RAM
configured above MAX_PHYSMEM_BITS:

 Unable to handle kernel paging request for data at address 0xc00a001000000440
 Faulting instruction address: 0xc000000001034118
 cpu 0x0: Vector: 300 (Data Access) at [c00000000124fb30]
     pc: c000000001034118: __free_pages_bootmem+0xc0/0x1c0
     lr: c00000000103b258: free_all_bootmem+0x19c/0x22c
     sp: c00000000124fdb0
    msr: 9000000002001033
    dar: c00a001000000440
  dsisr: 40000000
   current = 0xc00000000120dd00
   paca    = 0xc000000001f60000   irqmask: 0x03   irq_happened: 0x01
     pid   = 0, comm = swapper
 [c00000000124fe20] c00000000103b258 free_all_bootmem+0x19c/0x22c
 [c00000000124fee0] c000000001010a68 mem_init+0x3c/0x5c
 [c00000000124ff00] c00000000100401c start_kernel+0x298/0x5e4
 [c00000000124ff90] c00000000000b57c start_here_common+0x1c/0x520

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

6aba0c84

powerpc: Add ppc64le and ppc64_book3e allmodconfig targets · 64de5d8d

Committed by Michael Ellerman on Jul 10, 2018

Similar to what we just did for 32-bit, add phony targets for generating
a little endian and Book3E allmodconfig. These aren't covered by the
regular allmodconfig, which is big endian and Book3S due to the way
the Kconfig symbols are structured.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

64de5d8d

powerpc: Add ppc32_allmodconfig defconfig target · 8db0c9d4

Committed by Michael Ellerman on Jul 10, 2018

Because the allmodconfig logic just sets every symbol to M or Y, it
has the effect of always generating a 64-bit config, because
CONFIG_PPC64 becomes Y.

So to make it easier for folks to test 32-bit code, provide a phony
defconfig target that generates a 32-bit allmodconfig.

The 32-bit port has several mutually exclusive CPU types; we choose
the Book3S variants as that's what the help text in Kconfig says is
most common.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

8db0c9d4

powerpc64s: Show ori31 availability in spectre_v1 sysfs file not v2 · 6d44acae

Committed by Michael Ellerman on Jul 09, 2018

When I added the spectre_v2 information in sysfs, I included the
availability of the ori31 speculation barrier.

Although the ori31 barrier can be used to mitigate v2, it's primarily
intended as a spectre v1 mitigation. Spectre v2 is mitigated by
hardware changes.

So rework the sysfs files to show the ori31 information in the
spectre_v1 file, rather than v2.

Currently we display eg:

  $ grep . spectre_v*
  spectre_v1:Mitigation: __user pointer sanitization
  spectre_v2:Mitigation: Indirect branch cache disabled, ori31 speculation barrier enabled

After:

  $ grep . spectre_v*
  spectre_v1:Mitigation: __user pointer sanitization, ori31 speculation barrier enabled
  spectre_v2:Mitigation: Indirect branch cache disabled

Fixes: d6fbe1c5 ("powerpc/64s: Wire up cpu_show_spectre_v2()")
Cc: stable@vger.kernel.org # v4.17+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

6d44acae

powerpc: NMI IPI make NMI IPIs fully sychronous · 5b73151f

Committed by Nicholas Piggin on Apr 25, 2018

There is an asynchronous aspect to smp_send_nmi_ipi. The caller waits
for all CPUs to call in to the handler, but it does not wait for
completion of the handler. This is a needless complication, so remove
it and always wait synchronously.

The synchronous wait allows the caller to easily time out and clear
the wait for completion (zero nmi_ipi_busy_count) in the case of badly
behaved handlers. This would have prevented the recent smp_send_stop
NMI IPI bug from causing the system to hang.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

5b73151f

powerpc/64s: make PACA_IRQ_HARD_DIS track MSR[EE] closely · 9b81c021

Committed by Nicholas Piggin on Jun 03, 2018

When the masked interrupt handler clears MSR[EE] for an interrupt in
the PACA_IRQ_MUST_HARD_MASK set, it does not set PACA_IRQ_HARD_DIS.
This makes them get out of synch.

With that taken into account, it's only low level irq manipulation
(and interrupt entry before reconcile) where they can be out of synch.
This makes the code less surprising.

It also allows the IRQ replay code to rely on the IRQ_HARD_DIS value
and not have to mtmsrd again in this case (e.g., for an external
interrupt that has been masked). The bigger benefit might just be
that there is not such an element of surprise in these two bits of
state.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

9b81c021

powerpc/pkeys: make protection key 0 less special · 07f522d2

Committed by Ram Pai on Jul 17, 2018

Applications need the ability to associate an address-range with some
key and later revert to its initial default key. Pkey-0 comes close to
providing this function but falls short, because the current
implementation does not allow applications to explicitly associate pkey-0
with the address range.

Let's make pkey-0 less special and treat it almost like any other key.
Thus it can be explicitly associated with any address range, and can be
freed. This gives the application more flexibility and power.  The
ability to free pkey-0 must be used responsibly, since pkey-0 is
associated with almost all address ranges by default.

Even with this change pkey-0 continues to be slightly more special
from the following point of view.
(a) it is implicitly allocated.
(b) it is the default key assigned to any address-range.
(c) its permissions cannot be modified by userspace.

NOTE: (c) is specific to powerpc only. pkey-0 is associated by default
with all pages including kernel pages, and pkeys are also active in
kernel mode. If any permission is denied on pkey-0, the kernel running
in the context of the application will be unable to operate.

Tested on powerpc.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
[mpe: Drop #define PKEY_0 0 in favour of plain old 0]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

07f522d2

powerpc/pkeys: Preallocate execute-only key · a4fcc877

Committed by Ram Pai on Jul 17, 2018

execute-only key is allocated dynamically. This is a problem. When a
thread implicitly creates an execute-only key, and resets the UAMOR
for that key, the UAMOR value does not percolate to all the other
threads. Any other thread may ignorantly change the permissions on the
key. This can cause the key to be not execute-only for that thread.

Preallocate the execute-only key and ensure that no thread can change
the permission of the key, by resetting the corresponding bit in
UAMOR.

Fixes: 5586cf61 ("powerpc: introduce execute-only pkey")
Cc: stable@vger.kernel.org # v4.16+
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

a4fcc877

powerpc/pkeys: Fix calculation of total pkeys. · fe6a2804

Committed by Ram Pai on Jul 17, 2018

The calculation of the total number of pkeys is off by 1. Fix it.

Fixes: 4fb158f6 ("powerpc: track allocation status of all pkeys")
Cc: stable@vger.kernel.org # v4.16+
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

fe6a2804

powerpc/pkeys: Save the pkey registers before fork · c76662e8

Committed by Ram Pai on Jul 17, 2018

When a thread forks the contents of AMR, IAMR, UAMOR registers in the
newly forked thread are not inherited.

Save the registers before forking, for content of those
registers to be automatically copied into the new thread.

Fixes: cf43d3b2 ("powerpc: Enable pkey subsystem")
Cc: stable@vger.kernel.org # v4.16+
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

c76662e8

powerpc/pkeys: key allocation/deallocation must not change pkey registers · 4a4a5e5d

Committed by Ram Pai on Jul 17, 2018

Key allocation and deallocation have the side effect of programming the
UAMOR/AMR/IAMR registers. This is wrong, since it's the responsibility of
the application and not that of the kernel, to modify the permission on
the key.

Do not modify the pkey registers at key allocation/deallocation.

This patch also fixes a bug where a sys_pkey_free() resets the UAMOR
bits of the key, thus making its permissions unmodifiable from user
space. Later if the same key gets reallocated from a different thread
this thread will no longer be able to change the permissions on the key.

Fixes: cf43d3b2 ("powerpc: Enable pkey subsystem")
Cc: stable@vger.kernel.org # v4.16+
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

4a4a5e5d

powerpc/pkeys: Deny read/write/execute by default · de113256

Committed by Ram Pai on Jul 17, 2018

Deny all permissions on all keys, with some exceptions. pkey-0 must
allow all permissions, or else everything comes to a screeching halt.
Execute-only key must allow execute permission.
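The resulting default masks can be sketched as bit arithmetic. Everything below is an illustrative model built on assumptions that are not stated in the commit message: two register bits per key, key 0 in the most significant bits, a set bit meaning "denied", and a limit of 32 keys. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 64-bit register, 2 bits per key, key 0 at the top. */
#define PKEY_REG_BITS     64
#define AMR_BITS_PER_PKEY 2
#define NR_PKEYS          32

static unsigned int pkeyshift(unsigned int pkey)
{
	assert(pkey < NR_PKEYS);
	return PKEY_REG_BITS - (pkey + 1) * AMR_BITS_PER_PKEY;
}

/* Read/write mask: deny everything, then allow all access for pkey-0. */
static uint64_t default_amr(void)
{
	return ~UINT64_C(0) & ~(UINT64_C(3) << pkeyshift(0));
}

/* Execute mask: deny everything except pkey-0 and the execute-only key. */
static uint64_t default_iamr(unsigned int execute_only_key)
{
	uint64_t iamr = ~UINT64_C(0);

	iamr &= ~(UINT64_C(3) << pkeyshift(0));
	iamr &= ~(UINT64_C(3) << pkeyshift(execute_only_key));
	return iamr;
}

/* The two deny bits for one key; 0 means nothing is denied. */
static unsigned int key_bits(uint64_t reg, unsigned int pkey)
{
	return (unsigned int)((reg >> pkeyshift(pkey)) & 3);
}
```

Under these assumptions `key_bits(default_amr(), 0)` is 0 while every other key reads back as 3, which matches the "deny by default, except pkey-0" rule described above.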

Fixes: cf43d3b2 ("powerpc: Enable pkey subsystem")
Cc: stable@vger.kernel.org # v4.16+
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

de113256

powerpc/pkeys: Give all threads control of their key permissions · a57a04c7

Committed by Ram Pai on Jul 17, 2018

Currently in a multithreaded application, a key allocated by one
thread is not usable by other threads. By "not usable" we mean that
other threads are unable to change the access permissions for that
key for themselves.

When a new key is allocated in one thread, the corresponding UAMOR
bits for that thread get enabled, however the UAMOR bits for that key
for all other threads remain disabled.

Other threads have no way to set permissions on the key, and the
current default permissions are that read/write is enabled for all
keys, which means the key has no effect for other threads. Although
that may be the desired behaviour in some circumstances, having all
threads able to control their permissions for the key is more
flexible.

The current behaviour also differs from the x86 behaviour, which is
problematic for users.

To fix this, enable the UAMOR bits for all keys, at process
creation (in start_thread(), ie exec time). Since the contents of
UAMOR are inherited at fork, all threads are capable of modifying the
permissions on any key.

This is technically an ABI break on powerpc, but pkey support is fairly
new on powerpc and not widely used, and this brings us into
line with x86.

Fixes: cf43d3b2 ("powerpc: Enable pkey subsystem")
Cc: stable@vger.kernel.org # v4.16+
Tested-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
[mpe: Reword some of the changelog]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

a57a04c7

20 Jul, 2018 1 commit

powerpc/prom_init: Remove linux,stdout-package property · ec933639

Committed by Murilo Opsfelder Araujo on Jul 18, 2018

This property was added in 2004 and the only use of it, which was
already inside `#if 0`, was removed a month later.

Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

ec933639

19 Jul, 2018 1 commit

powerpc/powernv/npu: Add a debugfs setting to change ATSD threshold · 99c3ce33

Committed by Alistair Popple on Apr 17, 2018

The threshold at which it becomes more efficient to coalesce a range
of ATSDs into a single per-PID ATSD is currently not well understood
due to a lack of real-world work loads. This patch adds a debugfs
parameter allowing the threshold to be altered at runtime in order to
aid future development and refinement of the value.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

99c3ce33

openeuler / Kernel synced successfully more than 1 year ago
