1. 30 Jul, 2018 3 commits
  2. 24 Jul, 2018 35 commits
    • powerpc/powernv: implement opal_put_chars_atomic · 17cc1dd4
      Nicholas Piggin authored
      The RAW console does not need writes to be atomic, so relax
      opal_put_chars to be able to do partial writes, and implement an
      _atomic variant which does not take a spinlock. This API is used
      in xmon, so the less locking that is used, the better chance there
      is that a crash can be debugged.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: move opal console flushing to udbg · ac4ac788
      Nicholas Piggin authored
      OPAL console writes do not have to synchronously flush firmware /
      hardware buffers unless they are going through the udbg path.
      
      Remove the unconditional flushing from opal_put_chars. Flush if
      there was no space in the buffer as an optimisation (callers loop
      waiting for success in that case). udbg flushing is moved to
      udbg_opal_putc.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Remove OPALv1 support from opal console driver · b74d2807
      Nicholas Piggin authored
      opal_put_chars deals with partial writes because in OPALv1,
      opal_console_write_buffer_space did not work correctly. That firmware
      is not supported.
      
      This reworks the opal_put_chars code to no longer deal with partial
      writes by turning them into full writes. Partial write handling is still
      supported in terms of what gets returned to the caller, but it may not
      go to the console atomically. A warning message is printed in this
      case.
      
      This allows console flushing to be moved out of the opal_write_lock
      spinlock. That could cause the lock to be held for long periods if the
      console is busy (especially if it was being spammed by firmware),
      which is dangerous because the lock is taken by xmon to debug the
      system. Flushing outside the lock improves the situation a bit.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Implement and use opal_flush_console · d2a2262e
      Nicholas Piggin authored
      A new console flushing firmware API was introduced to replace event
      polling loops, and implemented in opal-kmsg with affddff6
      ("powerpc/powernv: Add a kmsg_dumper that flushes console output on
      panic"), to flush the console in the panic path.
      
      The OPAL console driver has other situations where interrupts are off
      and it needs to flush the console synchronously. These still use a
      polling loop.
      
      So move the opal-kmsg flush code to opal_flush_console, and use the
      new function in opal-kmsg and opal_put_chars.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Reviewed-by: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: opal-kmsg use flush fallback from console code · e00da0f2
      Nicholas Piggin authored
      Use the more refined and tested event polling loop from opal_put_chars
      as the fallback console flush in the opal-kmsg path. This loop is used
      by the console driver today, whereas the opal-kmsg fallback is not
      likely to have been used for years.
      
      Use WARN_ONCE rather than a printk when the fallback is invoked to
      prepare for moving the console flush into a common function.
      Reviewed-by: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: opal-kmsg standardise OPAL_BUSY handling · 3a80bfc7
      Nicholas Piggin authored
      OPAL_CONSOLE_FLUSH is documented as being able to return OPAL_BUSY,
      so implement the standard OPAL_BUSY handling for it.
      Reviewed-by: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Fix OPAL console driver OPAL_BUSY loops · 36d2dabc
      Nicholas Piggin authored
      The OPAL console driver does not delay in case it gets OPAL_BUSY or
      OPAL_BUSY_EVENT from firmware.
      
      It can't yet be made to sleep because it is called under spinlock,
      but it can be changed to the standard OPAL_BUSY loop form, and a
      delay added to keep it from hitting the firmware too frequently.
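
      A minimal sketch of a busy-retry loop with a back-off delay, written
      as a user-space mock. Every name here (FW_BUSY, mock_fw_write,
      mock_delay, mock_poll_events) is a hypothetical stand-in and the
      return codes are placeholders, not the real OPAL values or driver code.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder return codes for this sketch, not the real OPAL values. */
enum { FW_SUCCESS = 0, FW_BUSY = -1, FW_BUSY_EVENT = -2 };

static int busy_replies_left;   /* busy replies the mock firmware still gives */
static int delay_calls;         /* how many times the caller backed off */
static int poll_calls;          /* how many times events were polled */

/* Mock firmware write: reports busy a few times, then succeeds. */
static int mock_fw_write(const char *buf, size_t len)
{
    (void)buf;
    (void)len;
    if (busy_replies_left > 0)
        return (busy_replies_left-- % 2) ? FW_BUSY : FW_BUSY_EVENT;
    return FW_SUCCESS;
}

static void mock_delay(void)       { delay_calls++; }
static void mock_poll_events(void) { poll_calls++; }

/* Retry while the firmware is busy, delaying between attempts. */
int write_with_busy_loop(const char *buf, size_t len)
{
    int rc = FW_BUSY;

    assert(buf != NULL);
    while (rc == FW_BUSY || rc == FW_BUSY_EVENT) {
        rc = mock_fw_write(buf, len);
        if (rc == FW_BUSY_EVENT) {
            mock_delay();          /* back off before retrying */
            mock_poll_events();    /* let pending events be processed */
        } else if (rc == FW_BUSY) {
            mock_delay();
        }
    }
    return rc;
}
```

      The delay is a busy-wait stand-in because, as the text notes, the
      caller may hold a spinlock and cannot sleep.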
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: opal_put_chars partial write fix · bd90284c
      Nicholas Piggin authored
      The intention here is to consume and discard the remaining buffer
      upon error. This works if there has not been a previous partial write.
      If there has been, then total_len is no longer total number of bytes
      to copy. total_len is always "bytes left to copy", so it should be
      added to written bytes.
      
      This code may not be exercised any more if partial writes will not be
      hit, but this is a small bugfix before a larger change.
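
      The accounting issue can be illustrated with a small user-space mock.
      This is a sketch under assumptions: struct mock_console, mock_write and
      put_chars are hypothetical names, and only the written/total_len
      bookkeeping mirrors the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mock console: accepts at most `chunk` bytes per call and
 * starts failing once `fail_after` bytes have been accepted. */
struct mock_console {
    size_t chunk;
    size_t fail_after;
    size_t accepted;
};

/* Returns 0 on success and -1 on error; *done is the byte count accepted. */
static int mock_write(struct mock_console *c, size_t len, size_t *done)
{
    if (c->accepted >= c->fail_after) {
        *done = 0;
        return -1;
    }
    *done = len < c->chunk ? len : c->chunk;
    c->accepted += *done;
    return 0;
}

/* Returns the number of bytes consumed from the caller's buffer. */
size_t put_chars(struct mock_console *c, size_t total_len)
{
    size_t written = 0;

    assert(c->chunk > 0);
    while (total_len > 0) {
        size_t done;
        int rc = mock_write(c, total_len, &done);

        total_len -= done;      /* total_len now means "bytes left to copy" */
        written += done;
        if (rc != 0) {
            /* Consume and discard the rest. Assigning `written = total_len`
             * here would forget the bytes from earlier partial writes. */
            written += total_len;
            break;
        }
    }
    return written;
}
```

      With a partial write of 4 bytes followed by an error, the caller sees
      all 10 bytes consumed rather than only the 6 that were left.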
      Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv/opal-dump : Use IRQ_HANDLED instead of numbers in interrupt handler · b29336c0
      Mukesh Ojha authored
      Fixes: 8034f715 ("powernv/opal-dump: Convert to irq domain")
      
      Convert the explicit numeric return values to IRQ_HANDLED, which is
      the proper return value for an interrupt handler.
      
      This also removes error messages such as "nobody cared", which were
      triggered when the handler returned -1 or 0.
      Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
      Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv/opal-dump : Handles opal_dump_info properly · a5bbe8fd
      Mukesh Ojha authored
      Move the return value check of 'opal_dump_info' to the proper place.
      Previously all the dump info was filled in unnecessarily, even on failure.
      Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
      Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
      Acked-by: Jeremy Kerr <jk@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/tm: Remove struct thread_info param from tm_reclaim_thread() · edd00b83
      Cyril Bur authored
      Since commit dc310669 ("powerpc: tm: Always use fp_state and
      vr_state to store live registers") tm_reclaim_thread() doesn't use the
      parameter anymore. Both callers still have to bother getting it, even
      though they have no other need for a struct thread_info.
      
      Just remove it and adjust the callers.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/tm: Update function prototype comment · a596a7e9
      Cyril Bur authored
      In commit eb5c3f1c ("powerpc: Always save/restore checkpointed regs
      during treclaim/trecheckpoint") __tm_recheckpoint was modified to no
      longer take the second parameter 'unsigned long orig_msr' as part of a
      TM rewrite to simplify the reclaiming/recheckpointing process.
      
      There is a comment in the asm file where the function is declared which
      has an incorrect prototype with the 'orig_msr' parameter.
      
      This patch corrects the comment.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64: add 32 bytes prechecking before using VMX optimization on memcmp() · c2a4e54e
      Simon Guo authored
      This patch is based on the previous VMX patch on memcmp().
      
      To optimize ppc64 memcmp() with VMX instructions, we need to consider
      the VMX penalty: if the kernel uses VMX instructions, it needs
      to save/restore the current thread's VMX registers. There are 32 x 128-bit
      VMX registers in PPC, which means 32 x 16 = 512 bytes for load and store.
      
      The major concern regarding memcmp() performance in the kernel is KSM,
      which uses memcmp() frequently to merge identical pages. So it makes
      sense to take some measurements on KSM to see whether any improvement
      can be made here.  Cyril Bur indicates that memcmp() for KSM has a
      higher probability of failing (mismatching) early, within the first
      bytes, in the following mail:
      	https://patchwork.ozlabs.org/patch/817322/#1773629
      This patch is a follow-up on that.
      
      Testing shows that KSM memcmp() tends to fail early, within the first 32
      bytes.  More specifically:
          - 76% of cases fail/mismatch before 16 bytes;
          - 83% of cases fail/mismatch before 32 bytes;
          - 84% of cases fail/mismatch before 64 bytes;
      So 32 bytes looks like a better choice than the other sizes for pre-checking.
      
      The early failure also holds for memcmp() in the non-KSM case. With a
      non-typical call load, ~73% of cases fail within the first 32 bytes.
      
      This patch adds a 32-byte pre-check before jumping into VMX
      operations, to avoid the unnecessary VMX penalty. It is not limited to
      the KSM case. Testing shows ~20% improvement in memcmp() average
      execution time with this patch.
      
      And note the 32B pre-checking is only performed when the compare size
      is long enough (>=4K currently) to allow VMX operation.
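
      The idea can be sketched in portable C. This is an illustration under
      assumptions, not the kernel's assembly: vector_compare() is a
      hypothetical stand-in for the VMX loop (it just calls memcmp()), and
      the counter only exists to show when the costly path is entered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PRECHECK_BYTES   32
#define VECTOR_THRESHOLD 4096   /* ">=4K currently", per the text above */

/* Counts entries into the (pretend) expensive vector path. */
static unsigned long vector_path_entries;

/* Stand-in for the VMX loop, which would need to save and restore the
 * vector registers before doing the comparison. */
static int vector_compare(const void *a, const void *b, size_t n)
{
    vector_path_entries++;
    return memcmp(a, b, n);
}

int memcmp_with_precheck(const void *a, const void *b, size_t n)
{
    int rc;

    assert(a != NULL && b != NULL);

    /* Short compares never use the vector path. */
    if (n < VECTOR_THRESHOLD)
        return memcmp(a, b, n);

    /* Cheap check of the first 32 bytes: most mismatches show up here. */
    rc = memcmp(a, b, PRECHECK_BYTES);
    if (rc != 0)
        return rc;

    /* Only now pay the vector penalty for the remaining bytes. */
    return vector_compare((const char *)a + PRECHECK_BYTES,
                          (const char *)b + PRECHECK_BYTES,
                          n - PRECHECK_BYTES);
}
```

      A mismatch within the first 32 bytes returns without touching the
      vector path, which is where the reported saving would come from.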
      
      The detailed data and analysis are at:
      https://github.com/justdoitqd/publicFiles/blob/master/memcmp/README.md
      Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64: enhance memcmp() with VMX instruction for long bytes comparision · d58badfb
      Simon Guo authored
      This patch adds VMX primitives to do memcmp() in case the compare size
      is equal to or greater than 4K bytes. The KSM feature can benefit from this.
      
      Test results with the following test program (replace the "^>" with ""):
      ------
      ># cat tools/testing/selftests/powerpc/stringloops/memcmp.c
      >#include <malloc.h>
      >#include <stdlib.h>
      >#include <string.h>
      >#include <time.h>
      >#include "utils.h"
      >#define SIZE (1024 * 1024 * 900)
      >#define ITERATIONS 40
      
      int test_memcmp(const void *s1, const void *s2, size_t n);
      
      static int testcase(void)
      {
              char *s1;
              char *s2;
              unsigned long i;
      
              s1 = memalign(128, SIZE);
              if (!s1) {
                      perror("memalign");
                      exit(1);
              }
      
              s2 = memalign(128, SIZE);
              if (!s2) {
                      perror("memalign");
                      exit(1);
              }
      
              for (i = 0; i < SIZE; i++)  {
                      s1[i] = i & 0xff;
                      s2[i] = i & 0xff;
              }
              for (i = 0; i < ITERATIONS; i++) {
                      int ret = test_memcmp(s1, s2, SIZE);
      
                      if (ret) {
                              printf("return %d at[%lu]! should have returned zero\n", ret, i);
                              abort();
                      }
              }
      
              return 0;
      }
      
      int main(void)
      {
              return test_harness(testcase, "memcmp");
      }
      ------
      Without this patch (but with the first patch "powerpc/64: Align bytes
      before fall back to .Lshort in powerpc64 memcmp()." in the series):
      	4.726728762 seconds time elapsed                                          ( +-  3.54%)
      With VMX patch:
      	4.234335473 seconds time elapsed                                          ( +-  2.63%)
      		There is ~+10% improvement.
      
      Testing with the unaligned, different-offset version (s1 and s2 shifted
      by a random offset within 16 bytes) can achieve an improvement higher than 10%.
      Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: add vcmpequd/vcmpequb ppc instruction macro · f1ecbaf4
      Simon Guo authored
      Some old tool chains don't know about instructions like vcmpequd.
      
      This patch adds .long macro for vcmpequd and vcmpequb, which is
      a preparation to optimize ppc64 memcmp with VMX instructions.
      Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64: Align bytes before fall back to .Lshort in powerpc64 memcmp() · 2d9ee327
      Simon Guo authored
      Currently memcmp() 64bytes version in powerpc will fall back to .Lshort
      (compare per byte mode) if either src or dst address is not 8-byte aligned.
      It can be optimized in 2 situations:
      
      1) If both addresses have the same offset from an 8-byte boundary:
      memcmp() can first compare the unaligned bytes up to the 8-byte boundary
      and then compare the remaining 8-byte-aligned content in .Llong mode.
      
      2) If the src/dst addresses do not have the same offset from an 8-byte boundary:
      memcmp() can align the src address to 8 bytes, increment the dst address
      accordingly, then load src in aligned mode and dst in unaligned mode.
      
      This patch optimizes memcmp() behavior in the above 2 situations.
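
      Situation 1 can be sketched in portable C. This is a simplified
      illustration, not the kernel's assembly: cmp_bytes() is a hypothetical
      stand-in for the .Lshort path, and the 8-byte loop stands in for
      .Llong. Situation 2 is not covered by this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-by-byte compare, standing in for the .Lshort path. */
static int cmp_bytes(const unsigned char *a, const unsigned char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    return 0;
}

/* Both pointers must share the same offset from an 8-byte boundary. */
int memcmp_same_offset(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = s1, *b = s2;
    size_t head;
    int rc;

    assert(((uintptr_t)a & 7) == ((uintptr_t)b & 7));

    /* Head: the unaligned bytes up to the next 8-byte boundary. */
    head = (8 - ((uintptr_t)a & 7)) & 7;
    if (head > n)
        head = n;
    rc = cmp_bytes(a, b, head);
    if (rc != 0)
        return rc;
    a += head;
    b += head;
    n -= head;

    /* Body: both pointers are now 8-byte aligned, compare word by word. */
    while (n >= 8) {
        uint64_t wa, wb;

        memcpy(&wa, a, 8);
        memcpy(&wb, b, 8);
        if (wa != wb)
            return cmp_bytes(a, b, 8);   /* locate the differing byte */
        a += 8;
        b += 8;
        n -= 8;
    }

    /* Tail: fewer than 8 bytes left. */
    return cmp_bytes(a, b, n);
}
```

      Falling back to the byte loop on a differing word keeps the result
      independent of endianness, at the cost of re-reading at most 8 bytes.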
      
      Tested with both little/big endian. Performance result below is based on
      little endian.
      
      Following are the test results for the case where src/dst have the same
      offset (a similar result was observed when src/dst have different offsets):
      (1) 256 bytes
      Test with the existing tools/testing/selftests/powerpc/stringloops/memcmp:
      - without patch
      	29.773018302 seconds time elapsed                                          ( +- 0.09% )
      - with patch
      	16.485568173 seconds time elapsed                                          ( +-  0.02% )
      		-> There is ~+80% improvement
      
      (2) 32 bytes
      To observe performance impact on < 32 bytes, modify
      tools/testing/selftests/powerpc/stringloops/memcmp.c with following:
      -------
       #include <string.h>
       #include "utils.h"
      
      -#define SIZE 256
      +#define SIZE 32
       #define ITERATIONS 10000
      
       int test_memcmp(const void *s1, const void *s2, size_t n);
      --------
      
      - Without patch
      	0.244746482 seconds time elapsed                                          ( +-  0.36%)
      - with patch
      	0.215069477 seconds time elapsed                                          ( +-  0.51%)
      		-> There is ~+13% improvement
      
      (3) 0~8 bytes
      To observe <8 bytes performance impact, modify
      tools/testing/selftests/powerpc/stringloops/memcmp.c with following:
      -------
       #include <string.h>
       #include "utils.h"
      
      -#define SIZE 256
      -#define ITERATIONS 10000
      +#define SIZE 8
      +#define ITERATIONS 1000000
      
       int test_memcmp(const void *s1, const void *s2, size_t n);
      -------
      - Without patch
             1.845642503 seconds time elapsed                                          ( +- 0.12% )
      - With patch
             1.849767135 seconds time elapsed                                          ( +- 0.26% )
      		-> They are nearly the same. (-0.2%)
      Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pseries/mm: Improve error reporting on HCALL failures · ca42d8d2
      Aneesh Kumar K.V authored
      This patch adds error reporting to the H_ENTER and H_READ hcalls. A
      failure of either of these hcalls is mostly fatal, and it would be good to
      log the failure reason.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      [mpe: Split out of larger patch]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pseries: Use pr_xxx() in lpar.c · 65471d76
      Aneesh Kumar K.V authored
      Switch from printk to pr_fmt() / pr_xxx().
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      [mpe: Split out of larger patch]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/hash: Reduce contention on hpte lock · 27d8959d
      Aneesh Kumar K.V authored
      We already do this in some places. This patch makes sure we always try to
      search for the hpte without holding the lock, and redo the compare with
      the lock held once a match is found.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • a833280b
    • powerpc/mm/hash: Remove the superfluous bitwise operation when find hpte group · 1531cff4
      Aneesh Kumar K.V authored
      When computing the starting slot number for a hash page table group we used
      to do this:
      hpte_group = ((hash & htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL;
      
      Multiplying by 8 (HPTES_PER_GROUP) implies the last three bits are 0. Hence we
      really don't need to clear them separately.
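
      The equivalence can be checked with a short user-space sketch. The
      function names and the sample mask are assumptions for illustration;
      uint64_t is used in place of unsigned long so the mask constant stays
      64 bits wide on any host.

```c
#include <assert.h>
#include <stdint.h>

#define HPTES_PER_GROUP 8

/* Old form: multiply, then clear the low three bits. */
uint64_t hpte_group_old(uint64_t hash, uint64_t htab_hash_mask)
{
    return ((hash & htab_hash_mask) * HPTES_PER_GROUP) & ~(uint64_t)0x7;
}

/* New form: the multiplication alone already leaves the low bits zero. */
uint64_t hpte_group_new(uint64_t hash, uint64_t htab_hash_mask)
{
    /* The argument relies on the multiplier being a multiple of 8. */
    assert(HPTES_PER_GROUP % 8 == 0);
    return (hash & htab_hash_mask) * HPTES_PER_GROUP;
}

/* Returns 1 if both forms agree for a range of sample hash values. */
int forms_agree(uint64_t htab_hash_mask)
{
    for (uint64_t hash = 0; hash < 100000; hash++)
        if (hpte_group_old(hash, htab_hash_mask) !=
            hpte_group_new(hash, htab_hash_mask))
            return 0;
    return 1;
}
```

      Any value of the form x * 8 has its three low bits clear, so the mask
      with ~0x7 never changes the result.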
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Increase MAX_PHYSMEM_BITS to 128TB with SPARSEMEM_VMEMMAP config · 7d4340bb
      Aneesh Kumar K.V authored
      We do this only with VMEMMAP config so that our page_to_[nid/section] etc are not
      impacted.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Check memblock_add against MAX_PHYSMEM_BITS range · 6aba0c84
      Aneesh Kumar K.V authored
      With SPARSEMEM config enabled, we make sure that we don't add sections beyond
      the MAX_PHYSMEM_BITS range. This results in not building the vmemmap mapping for
      ranges beyond the max range. But our memblock layer looks at the device tree and
      creates mappings for the full memory range. Prevent this by checking against
      MAX_PHYSMEM_BITS when doing memblock_add.
      
      We don't do similar check for memeblock_reserve_range. If reserve range is beyond
      MAX_PHYSMEM_BITS we expect that to be configured with 'nomap'. Any other
      reserved range should come from existing memblock ranges which we already
      filtered while adding.
      
      This avoids a crash like the one below when running on a system with
      system RAM configured above MAX_PHYSMEM_BITS:
      
       Unable to handle kernel paging request for data at address 0xc00a001000000440
       Faulting instruction address: 0xc000000001034118
       cpu 0x0: Vector: 300 (Data Access) at [c00000000124fb30]
           pc: c000000001034118: __free_pages_bootmem+0xc0/0x1c0
           lr: c00000000103b258: free_all_bootmem+0x19c/0x22c
           sp: c00000000124fdb0
          msr: 9000000002001033
          dar: c00a001000000440
        dsisr: 40000000
         current = 0xc00000000120dd00
         paca    = 0xc000000001f60000^I irqmask: 0x03^I irq_happened: 0x01
           pid   = 0, comm = swapper
       [c00000000124fe20] c00000000103b258 free_all_bootmem+0x19c/0x22c
       [c00000000124fee0] c000000001010a68 mem_init+0x3c/0x5c
       [c00000000124ff00] c00000000100401c start_kernel+0x298/0x5e4
       [c00000000124ff90] c00000000000b57c start_here_common+0x1c/0x520
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Add ppc64le and ppc64_book3e allmodconfig targets · 64de5d8d
      Michael Ellerman authored
      Similarly as we just did for 32-bit, add phony targets for generating
      a little endian and Book3E allmodconfig. These aren't covered by the
      regular allmodconfig, which is big endian and Book3S due to the way
      the Kconfig symbols are structured.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Add ppc32_allmodconfig defconfig target · 8db0c9d4
      Michael Ellerman authored
      Because the allmodconfig logic just sets every symbol to M or Y, it
      has the effect of always generating a 64-bit config, because
      CONFIG_PPC64 becomes Y.
      
      So to make it easier for folks to test 32-bit code, provide a phony
      defconfig target that generates a 32-bit allmodconfig.
      
      The 32-bit port has several mutually exclusive CPU types, we choose
      the Book3S variants as that's what the help text in Kconfig says is
      most common.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc64s: Show ori31 availability in spectre_v1 sysfs file not v2 · 6d44acae
      Michael Ellerman authored
      When I added the spectre_v2 information in sysfs, I included the
      availability of the ori31 speculation barrier.
      
      Although the ori31 barrier can be used to mitigate v2, it's primarily
      intended as a spectre v1 mitigation. Spectre v2 is mitigated by
      hardware changes.
      
      So rework the sysfs files to show the ori31 information in the
      spectre_v1 file, rather than v2.
      
      Currently we display eg:
      
        $ grep . spectre_v*
        spectre_v1:Mitigation: __user pointer sanitization
        spectre_v2:Mitigation: Indirect branch cache disabled, ori31 speculation barrier enabled
      
      After:
      
        $ grep . spectre_v*
        spectre_v1:Mitigation: __user pointer sanitization, ori31 speculation barrier enabled
        spectre_v2:Mitigation: Indirect branch cache disabled
      
      Fixes: d6fbe1c5 ("powerpc/64s: Wire up cpu_show_spectre_v2()")
      Cc: stable@vger.kernel.org # v4.17+
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: NMI IPI make NMI IPIs fully sychronous · 5b73151f
      Nicholas Piggin authored
      There is an asynchronous aspect to smp_send_nmi_ipi. The caller waits
      for all CPUs to call in to the handler, but it does not wait for
      completion of the handler. This is a needless complication, so remove
      it and always wait synchronously.
      
      The synchronous wait allows the caller to easily time out and clear
      the wait for completion (zero nmi_ipi_busy_count) in the case of badly
      behaved handlers. This would have prevented the recent smp_send_stop
      NMI IPI bug from causing the system to hang.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s: make PACA_IRQ_HARD_DIS track MSR[EE] closely · 9b81c021
      Nicholas Piggin authored
      When the masked interrupt handler clears MSR[EE] for an interrupt in
      the PACA_IRQ_MUST_HARD_MASK set, it does not set PACA_IRQ_HARD_DIS.
      This makes them get out of synch.
      
      With that taken into account, it's only low level irq manipulation
      (and interrupt entry before reconcile) where they can be out of synch.
      This makes the code less surprising.
      
      It also allows the IRQ replay code to rely on the IRQ_HARD_DIS value
      and not have to mtmsrd again in this case (e.g., for an external
      interrupt that has been masked). The bigger benefit might just be
      that there is not such an element of surprise in these two bits of
      state.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pkeys: make protection key 0 less special · 07f522d2
      Ram Pai authored
      Applications need the ability to associate an address range with some
      key and later revert to its initial default key. Pkey-0 comes close to
      providing this function but falls short, because the current
      implementation does not allow applications to explicitly associate
      pkey-0 with the address range.
      
      Let's make pkey-0 less special and treat it almost like any other key.
      Thus it can be explicitly associated with any address range, and can be
      freed. This gives the application more flexibility and power.  The
      ability to free pkey-0 must be used responsibly, since pkey-0 is
      associated with almost all address ranges by default.
      
      Even with this change pkey-0 continues to be slightly more special
      from the following point of view.
      (a) it is implicitly allocated.
      (b) it is the default key assigned to any address-range.
      (c) its permissions cannot be modified by userspace.
      
      NOTE: (c) is specific to powerpc only. pkey-0 is associated by default
      with all pages including kernel pages, and pkeys are also active in
      kernel mode. If any permission is denied on pkey-0, the kernel running
      in the context of the application will be unable to operate.
      
      Tested on powerpc.
      Signed-off-by: Ram Pai <linuxram@us.ibm.com>
      [mpe: Drop #define PKEY_0 0 in favour of plain old 0]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pkeys: Preallocate execute-only key · a4fcc877
      Ram Pai authored
      execute-only key is allocated dynamically. This is a problem. When a
      thread implicitly creates an execute-only key, and resets the UAMOR
      for that key, the UAMOR value does not percolate to all the other
      threads. Any other thread may ignorantly change the permissions on the
      key. This can cause the key to be not execute-only for that thread.
      
      Preallocate the execute-only key and ensure that no thread can change
      the permission of the key, by resetting the corresponding bit in
      UAMOR.
      
      Fixes: 5586cf61 ("powerpc: introduce execute-only pkey")
      Cc: stable@vger.kernel.org # v4.16+
      Signed-off-by: Ram Pai <linuxram@us.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pkeys: Fix calculation of total pkeys. · fe6a2804
      Ram Pai authored
      Total number of pkeys calculation is off by 1. Fix it.
      
      Fixes: 4fb158f6 ("powerpc: track allocation status of all pkeys")
      Cc: stable@vger.kernel.org # v4.16+
      Signed-off-by: Ram Pai <linuxram@us.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pkeys: Save the pkey registers before fork · c76662e8
      Ram Pai authored
      When a thread forks, the contents of the AMR, IAMR and UAMOR registers
      are not inherited by the newly forked thread.
      
      Save the registers before forking, so that their contents are
      automatically copied into the new thread.
      
      Fixes: cf43d3b2 ("powerpc: Enable pkey subsystem")
      Cc: stable@vger.kernel.org # v4.16+
      Signed-off-by: Ram Pai <linuxram@us.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pkeys: key allocation/deallocation must not change pkey registers · 4a4a5e5d
      Ram Pai authored
      Key allocation and deallocation have the side effect of programming the
      UAMOR/AMR/IAMR registers. This is wrong, since it's the responsibility of
      the application, and not that of the kernel, to modify the permissions on
      the key.
      
      Do not modify the pkey registers at key allocation/deallocation.
      
      This patch also fixes a bug where a sys_pkey_free() resets the UAMOR
      bits of the key, thus making its permissions unmodifiable from user
      space. Later if the same key gets reallocated from a different thread
      this thread will no longer be able to change the permissions on the key.
      
      Fixes: cf43d3b2 ("powerpc: Enable pkey subsystem")
      Cc: stable@vger.kernel.org # v4.16+
      Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
      Signed-off-by: Ram Pai <linuxram@us.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pkeys: Deny read/write/execute by default · de113256
      Ram Pai authored
      Deny all permissions on all keys, with some exceptions. pkey-0 must
      allow all permissions, or else everything comes to a screeching halt.
      The execute-only key must allow execute permission.
      
      Fixes: cf43d3b2 ("powerpc: Enable pkey subsystem")
      Cc: stable@vger.kernel.org # v4.16+
      Signed-off-by: Ram Pai <linuxram@us.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pkeys: Give all threads control of their key permissions · a57a04c7
      Ram Pai authored
      Currently in a multithreaded application, a key allocated by one
      thread is not usable by other threads. By "not usable" we mean that
      other threads are unable to change the access permissions for that
      key for themselves.
      
      When a new key is allocated in one thread, the corresponding UAMOR
      bits for that thread get enabled, however the UAMOR bits for that key
      for all other threads remain disabled.
      
      Other threads have no way to set permissions on the key, and the
      current default permissions are that read/write is enabled for all
      keys, which means the key has no effect for other threads. Although
      that may be the desired behaviour in some circumstances, having all
      threads able to control their permissions for the key is more
      flexible.
      
      The current behaviour also differs from the x86 behaviour, which is
      problematic for users.
      
      To fix this, enable the UAMOR bits for all keys, at process
      creation (in start_thread(), ie exec time). Since the contents of
      UAMOR are inherited at fork, all threads are capable of modifying the
      permissions on any key.
      
      This is technically an ABI break on powerpc, but pkey support is fairly
      new on powerpc and not widely used, and this brings us into
      line with x86.
      
      Fixes: cf43d3b2 ("powerpc: Enable pkey subsystem")
      Cc: stable@vger.kernel.org # v4.16+
      Tested-by: Florian Weimer <fweimer@redhat.com>
      Signed-off-by: Ram Pai <linuxram@us.ibm.com>
      [mpe: Reword some of the changelog]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  3. 20 Jul, 2018 1 commit
  4. 19 Jul, 2018 1 commit