1. 19 Sep, 2018 23 commits
    • powerpc: remove old GCC version checks · f2910f0e
      Nicholas Piggin authored
      GCC 4.6 is now the minimum supported version.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Reviewed-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: Add a SLB preload cache · 89ca4e12
      Nicholas Piggin authored
      When switching processes, currently all user SLBEs are cleared, and a
      few (exec_base, pc, and stack) are preloaded. In trivial testing with
      small apps, this tends to miss the heap and low 256MB segments, and it
      will also miss commonly accessed segments on large memory workloads.
      
      Add a simple round-robin preload cache that just inserts the last SLB
      miss into the head of the cache and preloads those at context switch
      time. Every 256 context switches, the oldest entry is removed from the
      cache to shrink the cache and require fewer slbmte if they are unused.
      
      Much more could go into this, including into the SLB entry reclaim
      side to track some LRU information etc, which would require a study of
      large memory workloads. But this is a simple thing we can do now that
      is an obvious win for common workloads.
      
      With the full series, process switching speed on the context_switch
      benchmark on POWER9/hash (with kernel speculation security measures
      disabled) increases from 140K/s to 178K/s (27%).
      
      POWER8 does not change much (within 1%), it's unclear why it does not
      see a big gain like POWER9.
      
      Booting to busybox init with 256MB segments has SLB misses go down
      from 945 to 69, and with 1T segments 900 to 21. These could almost all
      be eliminated by preloading a bit more carefully with ELF binary
      loading.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
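The preload cache described in this commit can be pictured with a small user-space sketch. This is not the kernel code: the capacity of 16 entries, the struct layout, and every function name here are assumptions made for illustration. Only the behaviour (record the last miss, preload at context switch, drop the oldest entry every 256 switches) comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PRELOAD_NR   16u   /* assumed capacity, not stated in the commit */
#define AGE_INTERVAL 256u  /* "every 256 context switches" */

struct preload_cache {
    uint32_t esid[PRELOAD_NR]; /* ring buffer of segment identifiers */
    unsigned int tail;         /* index of the oldest entry */
    unsigned int nr;           /* number of valid entries */
    unsigned int switches;     /* context switches seen so far */
};

static bool preload_contains(const struct preload_cache *pc, uint32_t esid)
{
    for (unsigned int i = 0; i < pc->nr; i++)
        if (pc->esid[(pc->tail + i) % PRELOAD_NR] == esid)
            return true;
    return false;
}

/* Drop the oldest entry, if there is one. */
static void preload_age(struct preload_cache *pc)
{
    if (pc->nr == 0)
        return;
    pc->tail = (pc->tail + 1) % PRELOAD_NR;
    pc->nr--;
}

/* Record an SLB miss; returns false if the segment was already cached. */
static bool preload_add(struct preload_cache *pc, uint32_t esid)
{
    if (preload_contains(pc, esid))
        return false;
    if (pc->nr == PRELOAD_NR)
        preload_age(pc);       /* full: make room by dropping the oldest */
    pc->esid[(pc->tail + pc->nr) % PRELOAD_NR] = esid;
    pc->nr++;
    assert(pc->nr <= PRELOAD_NR);
    return true;
}

/*
 * Called once per context switch: age the cache periodically, then copy
 * out the segments to preload (oldest first). Returns the entry count.
 */
static unsigned int preload_on_switch(struct preload_cache *pc, uint32_t *out)
{
    if (++pc->switches % AGE_INTERVAL == 0)
        preload_age(pc);
    for (unsigned int i = 0; i < pc->nr; i++)
        out[i] = pc->esid[(pc->tail + i) % PRELOAD_NR];
    return pc->nr;
}
```

Periodic aging keeps the cache from pinning segments that a process touched once and never again, which is the stated reason for needing fewer slbmte at switch time.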
    • powerpc/64s/hash: provide arch_setup_exec hooks for hash slice setup · 2e162674
      Nicholas Piggin authored
      This will be used by the SLB code in the next patch, but for now this
      sets the slb_addr_limit to the correct size for 32-bit tasks.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • N
    • powerpc/64s/hash: SLB allocation status bitmaps · 655deecf
      Nicholas Piggin authored
      Add 32-entry bitmaps to track the allocation status of the first 32
      SLB entries, and whether they are user or kernel entries. These are
      used to allocate free SLB entries first, before resorting to the round
      robin allocator.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
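A minimal sketch of the "free slots first, round robin second" policy follows. The names, the two bolted slots, and the idea of clearing user slots in one step are assumptions for illustration, not the kernel's actual structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SLB_NR_BOLTED 2u   /* assumed number of slots that are never reused */
#define SLB_NR_TOTAL  32u  /* the bitmaps cover the first 32 entries */

struct slb_alloc {
    uint32_t used;     /* bit i set: slot i holds a valid entry */
    uint32_t kernel;   /* bit i set: slot i holds a kernel entry */
    unsigned int rr;   /* next round-robin victim */
};

static void slb_alloc_init(struct slb_alloc *a)
{
    a->used = (1u << SLB_NR_BOLTED) - 1;  /* bolted slots are always in use */
    a->kernel = a->used;
    a->rr = SLB_NR_BOLTED;
}

/* Pick a slot: the lowest free one if any, else the round-robin victim. */
static unsigned int slb_alloc_slot(struct slb_alloc *a, bool is_kernel)
{
    unsigned int slot = SLB_NR_TOTAL;

    for (unsigned int i = 0; i < SLB_NR_TOTAL; i++) {
        if (!(a->used & (1u << i))) {
            slot = i;
            break;
        }
    }
    if (slot == SLB_NR_TOTAL) {
        /* No free slot: overwrite a valid entry, skipping bolted slots. */
        slot = a->rr;
        a->rr = (a->rr + 1 < SLB_NR_TOTAL) ? a->rr + 1 : SLB_NR_BOLTED;
    }
    assert(slot >= SLB_NR_BOLTED && slot < SLB_NR_TOTAL);

    a->used |= 1u << slot;
    if (is_kernel)
        a->kernel |= 1u << slot;
    else
        a->kernel &= ~(1u << slot);
    return slot;
}

/* Assumed helper: release every user slot, keeping kernel entries. */
static void slb_free_user(struct slb_alloc *a)
{
    a->used &= a->kernel;
}
```

Tracking the user/kernel split in a second bitmap lets all user slots be released with a single mask operation while kernel entries stay installed.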
    • powerpc/64s/hash: remove user SLB data from the paca · 8fed04d0
      Nicholas Piggin authored
      User SLB mapping data is copied into the PACA from the mm->context so
      it can be accessed by the SLB miss handlers.
      
      After the C conversion, SLB miss handlers now run with relocation on,
      and user SLB misses are able to take recursive kernel SLB misses, so
      the user SLB mapping data can be removed from the paca and accessed
      directly.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: convert SLB miss handlers to C · 5e46e29e
      Nicholas Piggin authored
      This patch moves SLB miss handlers completely to C, using the standard
      exception handler macros to set up the stack and branch to C.
      
      This can be done because the segment containing the kernel stack is
      always bolted, so accessing it with relocation on will not cause an
      SLB exception.
      
      Arbitrary kernel memory may not be accessed when handling kernel space
      SLB misses, so care should be taken there. However user SLB misses can
      access any kernel memory, which can be used to move some fields out of
      the paca (in later patches).
      
      User SLB misses could quite easily reconcile IRQs and set up a first
      class kernel environment and exit via ret_from_except, however that
      doesn't seem to be necessary at the moment, so we only do that if a
      bad fault is encountered.
      
      [ Credit to Aneesh for bug fixes, error checks, and improvements to bad
        address handling, etc ]
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      
      Since RFC:
      - Added MSR[RI] handling
      - Fixed up a register loss bug exposed by irq tracing (Aneesh)
      - Reject misses outside the defined kernel regions (Aneesh)
      - Added several more sanity checks and error handling (Aneesh), we may
        look at consolidating these tests and tightening up the code but for
        a first pass we decided it's better to check carefully.
      
      Since v1:
      - Fixed SLB cache corruption (Aneesh)
      - Fixed untidy SLBE allocation "leak" in get_vsid error case
      - Now survives some stress testing on real hardware
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: Use POWER9 SLBIA IH=3 variant in switch_slb · 82d8f4c2
      Nicholas Piggin authored
      POWER9 introduces SLBIA IH=3, which invalidates all SLB entries and
      associated lookaside information that have a class value of 1, which
      Linux assigns to user addresses. This matches what switch_slb wants,
      and allows a simple fast implementation that avoids the slb_cache
      complexity.
      
      As a side-effect, the POWER5 < DD2.1 SLB invalidation workaround is
      also avoided on POWER9.
      
      Process context switching rate improves by about 2.2% for a small
      process that hits the slb cache, which is the best case for the
      current code.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: Use POWER6 SLBIA IH=1 variant in switch_slb · 5141c182
      Nicholas Piggin authored
      On processors that support the hint (e.g., POWER6 and newer), SLBIA
      IH=1 removes all non-zero SLBEs, but only invalidates ERAT entries
      associated with a class value of 1, which Linux assigns to user
      addresses.
      
      This prevents kernel ERAT entries from being invalidated when
      context switching (if the thread faulted in more than 8 user SLBEs).
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: remove the vmalloc segment from the bolted SLB · 85376e2a
      Nicholas Piggin authored
      Remove the vmalloc segment from bolted SLBEs. This is not required to
      be bolted, and seems like it was added to help pre-load the SLB on
      context switch. However there are now other segments like the vmemmap
      segment and non-zero node memory that often take misses after a context
      switch, so it is better to solve this in a more general way.
      
      A subsequent change will track free SLB entries and use those rather
      than overwriting valid entries round-robin, which makes it far less
      likely for kernel SLBEs to be evicted after they are installed.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: move POWER5 < DD2.1 slbie workaround where it is needed · 8b92887c
      Nicholas Piggin authored
      The POWER5 < DD2.1 issue is that slbie needs to be issued more than
      once. It came in with this change:
      
      ChangeSet@1.1608, 2004-04-29 07:12:31-07:00, david@gibson.dropbear.id.au
        [PATCH] POWER5 erratum workaround
      
        Early POWER5 revisions (<DD2.1) have a problem requiring slbie
        instructions to be repeated under some circumstances.  The patch below
        adds a workaround (patch made by Anton Blanchard).
      
      (aka. 3e4520f7605243abf66a7ccd3d2e49e48e8c0483 in the full history tree)
      
      The extra slbie in switch_slb is done even for the case where slbia is
      called (slb_flush_and_rebolt). I don't believe that is required
      because there are other slb_flush_and_rebolt callers which do not
      issue the workaround slbie, which would be broken if it was required.
      
      It also seems to be fine inside the isync with the first slbie, as it
      is in the kernel stack switch code.
      
      So move this workaround to where it is required. This is not much of
      an optimisation because this is the fast path, but it makes the code
      more understandable and neater.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Retain slbie_data initialisation to avoid compiler warning]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: avoid the POWER5 < DD2.1 slb invalidate workaround on POWER8/9 · 505ea82e
      Nicholas Piggin authored
      I only have POWER8/9 to test, so just remove it for those.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: Fix stab_rr off by one initialization · 09b4438d
      Nicholas Piggin authored
      This causes SLB allocation to start 1 beyond the start of the SLB.
      There is no real problem because after it wraps it starts behaving
      properly, it's just surprising to see when looking at SLB traces.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powernv/pseries: consolidate code for mce early handling. · db7d31ac
      Mahesh Salgaonkar authored
      Now that other platforms also implement a real mode MCE handler,
      let's consolidate the code by sharing the existing powernv machine
      check early code. Rename machine_check_powernv_early to
      machine_check_common_early and reuse the code.
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pseries: Dump the SLB contents on SLB MCE errors. · c6d15258
      Mahesh Salgaonkar authored
      If we get a machine check exception due to SLB errors, dump the
      current SLB contents, which is very helpful in debugging the root
      cause of SLB errors. Introduce an exclusive per-cpu buffer to hold
      faulty SLB entries. The real mode MCE handler saves the old SLB
      contents into this buffer, accessible through the paca, and prints
      them out later in virtual mode.
      
      With this patch the console will log SLB contents like below on SLB MCE
      errors:
      
      [  507.297236] SLB contents of cpu 0x1
      [  507.297237] Last SLB entry inserted at slot 16
      [  507.297238] 00 c000000008000000 400ea1b217000500
      [  507.297239]   1T  ESID=   c00000  VSID=      ea1b217 LLP:100
      [  507.297240] 01 d000000008000000 400d43642f000510
      [  507.297242]   1T  ESID=   d00000  VSID=      d43642f LLP:110
      [  507.297243] 11 f000000008000000 400a86c85f000500
      [  507.297244]   1T  ESID=   f00000  VSID=      a86c85f LLP:100
      [  507.297245] 12 00007f0008000000 4008119624000d90
      [  507.297246]   1T  ESID=       7f  VSID=      8119624 LLP:110
      [  507.297247] 13 0000000018000000 00092885f5150d90
      [  507.297247]  256M ESID=        1  VSID=   92885f5150 LLP:110
      [  507.297248] 14 0000010008000000 4009e7cb50000d90
      [  507.297249]   1T  ESID=        1  VSID=      9e7cb50 LLP:110
      [  507.297250] 15 d000000008000000 400d43642f000510
      [  507.297251]   1T  ESID=   d00000  VSID=      d43642f LLP:110
      [  507.297252] 16 d000000008000000 400d43642f000510
      [  507.297253]   1T  ESID=   d00000  VSID=      d43642f LLP:110
      [  507.297253] ----------------------------------
      [  507.297254] SLB cache ptr value = 3
      [  507.297254] Valid SLB cache entries:
      [  507.297255] 00 EA[0-35]=    7f000
      [  507.297256] 01 EA[0-35]=        1
      [  507.297257] 02 EA[0-35]=     1000
      [  507.297257] Rest of SLB cache entries:
      [  507.297258] 03 EA[0-35]=    7f000
      [  507.297258] 04 EA[0-35]=        1
      [  507.297259] 05 EA[0-35]=     1000
      [  507.297260] 06 EA[0-35]=       12
      [  507.297260] 07 EA[0-35]=    7f000
      Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
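The decoded lines in the sample log above (segment size, ESID, VSID, LLP) can be reproduced from the two raw 64-bit words of each entry. The sketch below is an illustration only: the mask and shift values are assumptions chosen because they reproduce the values in the log, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed field layout; it reproduces the sample log shown above. */
#define SLB_ESID_V      0x0000000008000000ULL /* entry valid bit */
#define SLB_VSID_B      0xc000000000000000ULL /* segment size field */
#define SLB_VSID_B_1T   0x4000000000000000ULL /* 1T segment */
#define SLB_VSID_LLP    0x0000000000000130ULL /* L and LP page size bits */
#define ESID_SHIFT_256M 28
#define ESID_SHIFT_1T   40
#define VSID_SHIFT_256M 12
#define VSID_SHIFT_1T   24

struct slb_fields {
    bool valid;        /* entry is marked valid */
    bool is_1t;        /* 1T segment, otherwise 256M */
    uint64_t esid;
    uint64_t vsid;
    unsigned int llp;
};

/* Split the raw ESID word (e) and VSID word (v) into their fields. */
static struct slb_fields slb_decode(uint64_t e, uint64_t v)
{
    struct slb_fields f;

    f.valid = (e & SLB_ESID_V) != 0;
    f.is_1t = (v & SLB_VSID_B) == SLB_VSID_B_1T;
    f.esid = e >> (f.is_1t ? ESID_SHIFT_1T : ESID_SHIFT_256M);
    f.vsid = (v & ~SLB_VSID_B) >> (f.is_1t ? VSID_SHIFT_1T : VSID_SHIFT_256M);
    f.llp = (unsigned int)(v & SLB_VSID_LLP);
    return f;
}
```

For slot 00 in the log, `slb_decode(0xc000000008000000, 0x400ea1b217000500)` yields a 1T segment with ESID c00000, VSID ea1b217 and LLP 100, matching the printed line.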
    • powerpc/pseries: Display machine check error details. · 8f0b8056
      Mahesh Salgaonkar authored
      Extract the MCE error details from the RTAS extended log and display
      them on the console.
      
      With this patch you should now see mce logs like below:
      
      [  142.371818] Severe Machine check interrupt [Recovered]
      [  142.371822]   NIP [d00000000ca301b8]: init_module+0x1b8/0x338 [bork_kernel]
      [  142.371822]   Initiator: CPU
      [  142.371823]   Error type: SLB [Multihit]
      [  142.371824]     Effective address: d00000000ca70000
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pseries: Flush SLB contents on SLB MCE errors. · a43c1590
      Mahesh Salgaonkar authored
      On pseries, as of today the system crashes if we get a machine check
      exception due to SLB errors. These are soft errors and can be fixed
      by flushing the SLBs, so the kernel can continue to function instead
      of crashing. We do this in real mode before turning on the MMU;
      otherwise we would run into nested machine checks. This patch now
      fetches the RTAS error log in real mode and flushes the SLBs on
      SLB/ERAT errors.
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Michal Suchanek <msuchanek@suse.com>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pseries: Define MCE error event section. · 04fce21c
      Mahesh Salgaonkar authored
      On pseries, the machine check error details are part of the RTAS
      extended event log, passed under the Machine check exception section.
      This patch adds the definition of the RTAS MCE event section and
      related helper functions.
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • selftests/powerpc: Do not fail with reschedule · 44d947ef
      Breno Leitao authored
      There are cases where the test does not expect the transaction to be
      aborted, but the test process might have been rescheduled, either at
      the OS level or by KVM (if it is running on a KVM guest machine). The
      reschedule causes a treclaim/recheckpoint, which dooms the
      transaction, aborting it as soon as the process is scheduled back
      onto the CPU. This might cause the test to fail, but it is not a real
      failure.
      
      If that is the case, TEXASR[FC] is indicated with either
      TM_CAUSE_RESCHEDULE or TM_CAUSE_KVM_RESCHEDULE for KVM interruptions.
      
      In this scenario, ignore these two failure causes so that the whole
      test does not return failure.
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Reviewed-by: Gustavo Romero <gromero@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/iommu: Avoid derefence before pointer check · 984ecdd6
      Breno Leitao authored
      The tbl pointer is dereferenced by IOMMU_PAGE_SIZE before it is
      checked for NULL.
      
      Move the dereference to after the check, where 'tbl' is guaranteed
      not to be NULL.
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
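The pattern behind this fix can be shown in a few lines. The struct and function below are hypothetical stand-ins, not the kernel's iommu_table or its callers. The point is only the ordering: check the pointer first, then dereference it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a table that carries its page shift. */
struct table_sketch {
    unsigned int page_shift;
};

/* Returns the page size in bytes, or 0 when no table is present. */
static unsigned long table_page_size(const struct table_sketch *tbl)
{
    unsigned long size;

    if (!tbl)
        return 0;

    /* Dereference only after the NULL check above. */
    size = 1UL << tbl->page_shift;
    assert(size != 0);
    return size;
}
```

Computing the size in the declaration, before the check, would dereference a NULL pointer on the error path, which is the kind of ordering bug the commit describes.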
    • powerpc/xive: Use xive_cpu->chip_id instead of looking it up again · 8ac9e5bf
      Breno Leitao authored
      Function xive_native_get_ipi() might use chip_id without it being
      initialized, if the CPU node is not found, as reported by smatch:
      
        error: uninitialized symbol 'chip_id'
      
      As suggested by Cédric, we can use xc->chip_id instead of consulting
      the device tree for chip id, which is safe since xive_prepare_cpu()
      should have initialized ->chip_id by the time xive_native_get_ipi() is
      called.
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Reviewed-by: Cédric Le Goater <clg@kaod.org>
      [mpe: Tweak change log]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • ocxl: Fix access to the AFU Descriptor Data · 6f8e45f7
      Christophe Lombard authored
      The AFU Information DVSEC capability is a means to extract common,
      general information about all of the AFUs associated with a Function,
      independent of the specific functionality that each AFU provides.
      Writing to the AFU Index field gives access to the descriptor data
      for each AFU.
      
      With the current code, we cannot access this data when the index >= 1
      because we are writing to the wrong location. All requests for the
      data of each AFU point to that of AFU 0, which could have an impact
      when using a card with more than one AFU per function.
      
      This patch fixes the access to the AFU Descriptor Data indexed by the
      AFU Info Index field.
      
      Fixes: 5ef3166e ("ocxl: Driver code for 'generic' opencapi devices")
      Cc: stable <stable@vger.kernel.org>     # 4.16
      Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
      Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
      Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/memtrace: Remove memory in chunks · 3f7daf3d
      Rashmica Gupta authored
      When hot-removing memory release_mem_region_adjustable() splits iomem
      resources if they are not the exact size of the memory being
      hot-deleted. Adding this memory back to the kernel adds a new resource.
      
      Eg a node has memory 0x0 - 0xfffffffff. Hot-removing 1GB from
      0xf40000000 results in the single resource 0x0-0xfffffffff being split
      into two resources: 0x0-0xf3fffffff and 0xf80000000-0xfffffffff.
      
      When we hot-add the memory back we now have three resources:
      0x0-0xf3fffffff, 0xf40000000-0xf7fffffff, and 0xf80000000-0xfffffffff.
      
      This is an issue if we try to remove some memory that overlaps
      resources. Eg when trying to remove 2GB at address 0xf40000000,
      release_mem_region_adjustable() fails as it expects the chunk of memory
      to be within the boundaries of a single resource. We then get the
      warning: "Unable to release resource" and attempting to use memtrace
      again gives us this error: "bash: echo: write error: Resource
      temporarily unavailable"
      
      This patch makes memtrace remove memory in chunks that are always the
      same size from an address that is always equal to end_of_memory -
      n*size, for some n. So hotremoving and hotadding memory of different
      sizes will now not attempt to remove memory that spans multiple
      resources.
      Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
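The address arithmetic from this commit message can be checked with a short sketch. The helper names and the inclusive-range struct are assumptions for illustration. The addresses are the ones used in the example above, where memory ends at 0x1000000000 and chunks are 1GB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive address range, like the iomem resources in the example. */
struct range {
    uint64_t start;
    uint64_t end;
};

/* Base of the n-th fixed-size chunk counted down from end_of_memory. */
static uint64_t chunk_base(uint64_t end_of_memory, uint64_t size, uint64_t n)
{
    assert(n * size <= end_of_memory);
    return end_of_memory - n * size;
}

/* True if [base, base + size) lies entirely inside one resource. */
static bool range_contains(const struct range *r, uint64_t base, uint64_t size)
{
    return base >= r->start && base + size - 1 <= r->end;
}
```

With 1GB chunks, `chunk_base(0x1000000000, 0x40000000, 3)` is 0xf40000000, and that chunk fits the resource 0xf40000000-0xf7fffffff exactly. A 2GB removal at the same address fits no single resource, which is the failure the commit avoids by always using same-sized chunks.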
  2. 17 Sep, 2018 7 commits
  3. 14 Sep, 2018 2 commits
    • powerpc/vdso: Correct call frame information · 56d20861
      Alan Modra authored
      Call Frame Information is used by gdb for back-traces and inserting
      breakpoints on function return for the "finish" command.  This failed
      when inside __kernel_clock_gettime.  More concerning than difficulty
      debugging is that CFI is also used by stack frame unwinding code to
      implement exceptions.  If you have an app that needs to handle
      asynchronous exceptions for some reason, and you are unlucky enough to
      get one inside the VDSO time functions, your app will crash.
      
      What's wrong:  There is control flow in __kernel_clock_gettime that
      reaches label 99 without saving lr in r12.  CFI info however is
      interpreted by the unwinder without reference to control flow: It's a
      simple matter of "Execute all the CFI opcodes up to the current
      address".  That means the unwinder thinks r12 contains the return
      address at label 99.  Disabuse it of that notion by resetting CFI for
      the return address at label 99.
      
      Note that the ".cfi_restore lr" could have gone anywhere from the
      "mtlr r12" a few instructions earlier to the instruction at label 99.
      I put the CFI as late as possible, because in general that's best
      practice (and if possible grouped with other CFI in order to reduce
      the number of CFI opcodes executed when unwinding).  Using r12 as the
      return address is perfectly fine after the "mtlr r12" since r12 on
      that code path still contains the return address.
      
      __get_datapage also has a CFI error.  That function temporarily saves
      lr in r0, and reflects that fact with ".cfi_register lr,r0".  A later
      use of r0 means the CFI at that point isn't correct, as r0 no longer
      contains the return address.  Fix that too.
      Signed-off-by: Alan Modra <amodra@gmail.com>
      Tested-by: Reza Arbab <arbab@linux.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • powerpc/tm: Fix HFSCR bit for no suspend case · dd9a8c5a
      Michael Neuling authored
      Currently on P9N DD2.1 we end up taking infinite TM facility
      unavailable exceptions on the first TM usage by userspace.
      
      In the special case of TM no suspend (P9N DD2.1), Linux is told TM is
      off via CPU dt-ftrs but told to (partially) use it via
      OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED. So HFSCR[TM] will be off from
      dt-ftrs but we need to turn it on for the no suspend case.
      
      This patch fixes this by enabling HFSCR TM in this case.
      
      Cc: stable@vger.kernel.org # 4.15+
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  4. 10 Sep, 2018 1 commit
  5. 09 Sep, 2018 7 commits