1. 12 Nov 2018, 1 commit
    • powerpc/mm/64s: Fix preempt warning in slb_allocate_kernel() · c8b00bb7
      Committed by Michael Ellerman
      With preempt enabled we see warnings in do_slb_fault():
      
        BUG: using smp_processor_id() in preemptible [00000000] code: kworker/u33:0/98
        futex hash table entries: 4096 (order: 3, 524288 bytes)
        caller is do_slb_fault+0x204/0x230
        CPU: 5 PID: 98 Comm: kworker/u33:0 Not tainted 4.19.0-rc3-gcc-7.3.1-00022-g1936f094 #138
        Call Trace:
          dump_stack+0xb4/0x104 (unreliable)
          check_preemption_disabled+0x148/0x150
          do_slb_fault+0x204/0x230
          data_access_slb_common+0x138/0x180
      
      This is caused by the get_paca() in slb_allocate_kernel(), which
      includes a call to debug_smp_processor_id().
      
      slb_allocate_kernel() can only be called from do_slb_fault(), and in
      that path interrupts are hard disabled and so we can't be preempted,
      but we can't update the preempt flags (in thread_info) because that
      could cause an SLB fault.
      
      So just use local_paca which is safe and doesn't cause the warning.
      
      Fixes: 48e7b769 ("powerpc/64s/hash: Convert SLB miss handlers to C")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
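      To make the reasoning above concrete, here is a toy Python model (not kernel code) of why the warning is a false positive on this path. As the log shows, the debug check behind get_paca() does not treat hard-disabled interrupts as "not preemptible", and the fault path deliberately leaves the preempt count alone. ToyCpu and toy_slb_fault are hypothetical names that only mimic the behaviour the commit message describes.

```python
class ToyCpu:
    """Toy stand-in for one CPU's state; none of these names are kernel APIs."""

    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.preempt_count = 0          # 0 means "preemptible" to the debug check
        self.irqs_hard_disabled = False
        self.warnings = []

    def checked_cpu_id(self):
        # Models the check reached via debug_smp_processor_id(): it is satisfied
        # by a raised preempt count, but it cannot see the hard-disable state.
        if self.preempt_count == 0:
            self.warnings.append("BUG: using smp_processor_id() in preemptible code")
        return self.cpu_id

    def raw_cpu_id(self):
        # Models local_paca: read the current CPU's data with no check at all.
        return self.cpu_id


def toy_slb_fault(cpu, use_local_paca):
    # The SLB fault path hard-disables interrupts, so it cannot be preempted,
    # but it must not bump preempt_count: updating thread_info could itself
    # take an SLB fault.
    cpu.irqs_hard_disabled = True
    try:
        return cpu.raw_cpu_id() if use_local_paca else cpu.checked_cpu_id()
    finally:
        cpu.irqs_hard_disabled = False
```

      Both paths return the same CPU; only the checked one records a warning, which is why switching to the unchecked accessor is enough to silence it.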
  2. 06 Nov 2018, 3 commits
  3. 14 Oct 2018, 7 commits
    • powerpc/mm: Increase the max addressable memory to 2PB · 4ffe713b
      Committed by Aneesh Kumar K.V
      Currently we limit the max addressable memory to 128TB. This patch increases the
      limit to 2PB. We can have devices like nvdimm which add memory above the 512TB
      limit.
      
      We still don't support regular system RAM above 512TB. One of the challenges with
      that is the percpu allocator, which allocates per-node memory and uses the max
      distance between them as the percpu offsets. This means that with a large gap in
      the address space (system RAM above 1PB) we will run out of vmalloc space to map
      the percpu allocation.
      
      In order to support addressable memory above 512TB, the kernel should be able to
      linear map this range. To do that with hash translation we now add 4 contexts
      to the kernel linear map region. Our per-context addressable range is 512TB. We
      still keep the VMALLOC and VMEMMAP regions at their old size. The SLB miss
      handlers are updated to validate these limits.
      
      We also limit this update to SPARSEMEM_VMEMMAP and SPARSEMEM_EXTREME.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
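      The 2PB figure follows from the numbers above: 4 contexts × 512TB = 2^2 × 2^49 = 2^51 bytes = 2PB. The small Python sketch below checks that arithmetic and shows one simplified way to pick which linear-map context a byte offset falls in. The helper names are hypothetical, and this models the arithmetic only, not the kernel's context-lookup code.

```python
TB = 1 << 40
PB = 1 << 50

PER_CONTEXT_RANGE = 512 * TB   # per-context addressable range, from the commit
MAX_LINEAR_MAP = 2 * PB        # new maximum addressable memory


def contexts_needed(size, per_context=PER_CONTEXT_RANGE):
    # Ceiling division: a partially used context still needs its own id.
    return -(-size // per_context)


def context_index(offset, per_context=PER_CONTEXT_RANGE):
    # Which context a byte offset into the linear map falls in (toy model).
    if not 0 <= offset < MAX_LINEAR_MAP:
        raise ValueError("offset is outside the 2PB linear map")
    return offset // per_context
```

      Because 512TB is 2^49, context_index(offset) is the same as offset >> 49 for any offset inside the map.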
    • powerpc/mm/hash: Rename get_ea_context to get_user_context · c9f80734
      Committed by Aneesh Kumar K.V
      We will be adding get_kernel_context later. Rename the function to indicate
      that it handles context allocation for user-space addresses.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: Add some SLB debugging tests · e15a4fea
      Committed by Nicholas Piggin
      This adds CONFIG_DEBUG_VM checks to ensure:
        - The kernel stack is in the SLB after it's flushed and bolted.
        - We don't insert an SLB entry for an address that is already in the SLB.
        - The kernel SLB miss handler does not take an SLB miss.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
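      The second check (no duplicate insertion) can be pictured with a toy SLB in Python. ToySlb and its methods are hypothetical, the model covers only 256MB segments (ESID = address >> 28), and it stands in for the real CONFIG_DEBUG_VM test rather than reproducing it.

```python
SID_SHIFT = 28  # 256MB segments only, for simplicity


class ToySlb:
    """Hypothetical SLB model: each slot holds an ESID or None."""

    def __init__(self, nr_entries=32):
        self.slots = [None] * nr_entries

    @staticmethod
    def esid(ea):
        return ea >> SID_SHIFT

    def contains(self, ea):
        return self.esid(ea) in self.slots

    def insert(self, slot, ea, debug_vm=True):
        # Debug-only sanity check: refuse to install a second entry for a
        # segment that is already present in the SLB.
        if debug_vm and self.contains(ea):
            raise AssertionError(f"ESID {self.esid(ea):#x} is already in the SLB")
        self.slots[slot] = self.esid(ea)
```

      The check is worth having because, on POWER, two valid SLB entries matching the same address form a multi-hit, which can raise a machine check instead of failing quietly.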
    • powerpc/64s/hash: Simplify slb_flush_and_rebolt() · 94ee4272
      Committed by Nicholas Piggin
      slb_flush_and_rebolt() is misleading: it is called in virtual mode, so
      it cannot possibly change the stack and therefore should not be touching the
      shadow area. And since vmalloc is no longer bolted, it should not
      change any bolted mappings at all.
      
      Change the name to slb_flush_and_restore_bolted(), and have it just
      load the kernel stack from what's currently in the shadow SLB area.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: Add a SLB preload cache · 5434ae74
      Committed by Nicholas Piggin
      When switching processes, currently all user SLBEs are cleared, and a
      few (exec_base, pc, and stack) are preloaded. In trivial testing with
      small apps, this tends to miss the heap and low 256MB segments, and it
      will also miss commonly accessed segments on large memory workloads.
      
      Add a simple round-robin preload cache that just inserts the last SLB
      miss into the head of the cache and preloads those at context switch
      time. Every 256 context switches, the oldest entry is removed from the
      cache to shrink the cache and require fewer slbmte if they are unused.
      
      Much more could go into this, including into the SLB entry reclaim
      side to track some LRU information etc, which would require a study of
      large memory workloads. But this is a simple thing we can do now that
      is an obvious win for common workloads.
      
      With the full series, process switching speed on the context_switch
      benchmark on POWER9/hash (with kernel speculation security measures
      disabled) increases from 140K/s to 178K/s (27%).
      
      POWER8 does not change much (within 1%); it's unclear why it does not
      see a big gain like POWER9.
      
      Booting to busybox init with 256MB segments has SLB misses go down
      from 945 to 69, and with 1T segments from 900 to 21. These could almost all
      be eliminated by preloading a bit more carefully with ELF binary
      loading.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
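      A minimal Python sketch of the round-robin preload idea described above, under stated assumptions: the capacity of 16 is an illustrative guess, skipping ESIDs that are already cached is a choice made for this sketch, and ToyPreloadCache is a hypothetical name. A deque with maxlen gives the "insert at head, oldest falls off" behaviour directly.

```python
from collections import deque


class ToyPreloadCache:
    """Hypothetical per-thread preload cache of recently missed ESIDs."""

    def __init__(self, capacity=16, shrink_interval=256):
        # Index 0 is the head (newest miss); the right end is the oldest.
        # With maxlen set, appendleft() silently drops the oldest entry.
        self.esids = deque(maxlen=capacity)
        self.shrink_interval = shrink_interval
        self.switches = 0

    def record_miss(self, esid):
        # Insert the latest SLB miss at the head, unless it is already cached.
        if esid not in self.esids:
            self.esids.appendleft(esid)

    def context_switch(self):
        # Every shrink_interval switches, retire the oldest entry so segments
        # the task stopped using no longer cost a preload each time.
        self.switches += 1
        if self.switches % self.shrink_interval == 0 and self.esids:
            self.esids.pop()
        # The remaining ESIDs are the ones to preload for the incoming task.
        return list(self.esids)
```

      For example, with capacity 3, recording misses for ESIDs 1, 2, 3, 4 leaves [4, 3, 2] to preload at the next switch; entry 1 was pushed out by the newest miss.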
    • powerpc/64s/hash: Add SLB allocation status bitmaps · 126b11b2
      Committed by Nicholas Piggin
      Add 32-entry bitmaps to track the allocation status of the first 32
      SLB entries, and whether they are user or kernel entries. These are
      used to allocate free SLB entries first, before resorting to the round-robin
      allocator.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
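      The "free entries first, then round-robin" policy can be sketched with two integer bitmaps in Python. Everything here is hypothetical: the class name, nr_bolted defaulting to 2, and the flush_user helper, which is added only to show why recording user versus kernel ownership is useful.

```python
NR_TRACKED = 32  # the bitmaps cover the first 32 SLB entries


class ToySlbAllocator:
    """Hypothetical model: free-slot bitmap first, round-robin as fallback."""

    def __init__(self, nr_entries=32, nr_bolted=2):
        self.nr_entries = nr_entries
        self.nr_bolted = nr_bolted
        bolted = (1 << nr_bolted) - 1
        self.alloc_map = bolted    # bit set = slot in use
        self.kernel_map = bolted   # bit set = slot holds a kernel entry
        self.rr = nr_bolted        # round-robin cursor over non-bolted slots

    def alloc(self, kernel):
        tracked = (1 << min(self.nr_entries, NR_TRACKED)) - 1
        free = ~self.alloc_map & tracked
        if free:
            # Lowest free slot: free & -free isolates the lowest set bit.
            slot = (free & -free).bit_length() - 1
        else:
            # No free slot: overwrite round-robin, never touching bolted slots.
            slot = self.rr
            self.rr = self.rr + 1 if self.rr + 1 < self.nr_entries else self.nr_bolted
        if slot < NR_TRACKED:
            bit = 1 << slot
            self.alloc_map |= bit
            if kernel:
                self.kernel_map |= bit
            else:
                self.kernel_map &= ~bit
        return slot

    def flush_user(self):
        # Illustration only: all user slots become free in one mask operation.
        self.alloc_map &= self.kernel_map
```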
    • powerpc/64s/hash: Convert SLB miss handlers to C · 48e7b769
      Committed by Nicholas Piggin
      This patch moves SLB miss handlers completely to C, using the standard
      exception handler macros to set up the stack and branch to C.
      
      This can be done because the segment containing the kernel stack is
      always bolted, so accessing it with relocation on will not cause an
      SLB exception.
      
      Arbitrary kernel memory must not be accessed when handling kernel
      space SLB misses, so care should be taken there. However user SLB
      misses can access any kernel memory, which can be used to move some
      fields out of the paca (in later patches).
      
      User SLB misses could quite easily reconcile IRQs, set up a first-class
      kernel environment and exit via ret_from_except; however, that
      doesn't seem to be necessary at the moment, so we only do that if a
      bad fault is encountered.
      
      [ Credit to Aneesh for bug fixes, error checks, and improvements to
        bad address handling, etc ]
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Disallow tracing for all of slb.c for now.]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  4. 03 Oct 2018, 1 commit
  5. 19 Sep 2018, 11 commits
    • powerpc/64s/hash: Add a SLB preload cache · 89ca4e12
      Committed by Nicholas Piggin
      When switching processes, currently all user SLBEs are cleared, and a
      few (exec_base, pc, and stack) are preloaded. In trivial testing with
      small apps, this tends to miss the heap and low 256MB segments, and it
      will also miss commonly accessed segments on large memory workloads.
      
      Add a simple round-robin preload cache that just inserts the last SLB
      miss into the head of the cache and preloads those at context switch
      time. Every 256 context switches, the oldest entry is removed from the
      cache to shrink the cache and require fewer slbmte if they are unused.
      
      Much more could go into this, including into the SLB entry reclaim
      side to track some LRU information etc, which would require a study of
      large memory workloads. But this is a simple thing we can do now that
      is an obvious win for common workloads.
      
      With the full series, process switching speed on the context_switch
      benchmark on POWER9/hash (with kernel speculation security measures
      disabled) increases from 140K/s to 178K/s (27%).
      
      POWER8 does not change much (within 1%); it's unclear why it does not
      see a big gain like POWER9.
      
      Booting to busybox init with 256MB segments has SLB misses go down
      from 945 to 69, and with 1T segments from 900 to 21. These could almost all
      be eliminated by preloading a bit more carefully with ELF binary
      loading.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: SLB allocation status bitmaps · 655deecf
      Committed by Nicholas Piggin
      Add 32-entry bitmaps to track the allocation status of the first 32
      SLB entries, and whether they are user or kernel entries. These are
      used to allocate free SLB entries first, before resorting to the round-robin
      allocator.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: remove user SLB data from the paca · 8fed04d0
      Committed by Nicholas Piggin
      User SLB mapping data is copied into the PACA from the mm->context so
      it can be accessed by the SLB miss handlers.
      
      After the C conversion, SLB miss handlers now run with relocation on,
      and user SLB misses are able to take recursive kernel SLB misses, so
      the user SLB mapping data can be removed from the paca and accessed
      directly.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: convert SLB miss handlers to C · 5e46e29e
      Committed by Nicholas Piggin
      This patch moves SLB miss handlers completely to C, using the standard
      exception handler macros to set up the stack and branch to C.
      
      This can be done because the segment containing the kernel stack is
      always bolted, so accessing it with relocation on will not cause an
      SLB exception.
      
      Arbitrary kernel memory may not be accessed when handling kernel space
      SLB misses, so care should be taken there. However user SLB misses can
      access any kernel memory, which can be used to move some fields out of
      the paca (in later patches).
      
      User SLB misses could quite easily reconcile IRQs, set up a first-class
      kernel environment and exit via ret_from_except; however, that
      doesn't seem to be necessary at the moment, so we only do that if a
      bad fault is encountered.
      
      [ Credit to Aneesh for bug fixes, error checks, and improvements to bad
        address handling, etc ]
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      
      Since RFC:
      - Added MSR[RI] handling
      - Fixed up a register loss bug exposed by irq tracing (Aneesh)
      - Reject misses outside the defined kernel regions (Aneesh)
      - Added several more sanity checks and error handling (Aneesh); we may
        look at consolidating these tests and tightening up the code, but for
        a first pass we decided it's better to check carefully.
      
      Since v1:
      - Fixed SLB cache corruption (Aneesh)
      - Fixed untidy SLBE allocation "leak" in get_vsid error case
      - Now survives some stress testing on real hardware
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: Use POWER9 SLBIA IH=3 variant in switch_slb · 82d8f4c2
      Committed by Nicholas Piggin
      POWER9 introduces SLBIA IH=3, which invalidates all SLB entries and
      associated lookaside information that have a class value of 1, which
      Linux assigns to user addresses. This matches what switch_slb wants,
      and allows a simple fast implementation that avoids the slb_cache
      complexity.
      
      As a side-effect, the POWER5 < DD2.1 SLB invalidation workaround is
      also avoided on POWER9.
      
      The process context switching rate improves by about 2.2% for a small
      process that hits the slb cache, which is the best case for the current
      code.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
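      The selective invalidation can be modelled in a few lines of Python. This toy ignores the lookaside (ERAT) side entirely and the names are hypothetical; it only shows that class-1 (user) entries go while class-0 kernel entries survive, without any per-process record (like the slb_cache) of which user ESIDs were inserted.

```python
from dataclasses import dataclass

USER_CLASS = 1  # Linux gives user-address SLBEs class 1, per the commit message


@dataclass
class ToySlbe:
    esid: int
    cls: int            # the SLBE class value
    valid: bool = True


def toy_slbia_ih3(slb):
    # Invalidate every valid entry whose class is 1 and leave kernel
    # (class 0) entries alone; return how many entries were dropped.
    dropped = 0
    for entry in slb:
        if entry.valid and entry.cls == USER_CLASS:
            entry.valid = False
            dropped += 1
    return dropped
```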
    • powerpc/64s/hash: Use POWER6 SLBIA IH=1 variant in switch_slb · 5141c182
      Committed by Nicholas Piggin
      The SLBIA IH=1 hint will remove all non-zero SLBEs, but only
      invalidate ERAT entries associated with a class value of 1 (which
      Linux assigns to user addresses), on processors that support the hint
      (e.g., POWER6 and newer).
      
      This prevents kernel ERAT entries from being invalidated when
      context switching (if the thread faulted in more than 8 user SLBEs).
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: remove the vmalloc segment from the bolted SLB · 85376e2a
      Committed by Nicholas Piggin
      Remove the vmalloc segment from bolted SLBEs. This is not required to
      be bolted, and seems like it was added to help pre-load the SLB on
      context switch. However there are now other segments like the vmemmap
      segment and non-zero node memory that often take misses after a context
      switch, so it is better to solve this in a more general way.
      
      A subsequent change will track free SLB entries and use those rather
      than overwriting valid entries round-robin, which makes it far less
      likely for kernel SLBEs to be evicted after they are installed.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: move POWER5 < DD2.1 slbie workaround where it is needed · 8b92887c
      Committed by Nicholas Piggin
      The POWER5 < DD2.1 issue is that slbie needs to be issued more than
      once. It came in with this change:
      
      ChangeSet@1.1608, 2004-04-29 07:12:31-07:00, david@gibson.dropbear.id.au
        [PATCH] POWER5 erratum workaround
      
        Early POWER5 revisions (<DD2.1) have a problem requiring slbie
        instructions to be repeated under some circumstances.  The patch below
        adds a workaround (patch made by Anton Blanchard).
      
      (aka. 3e4520f7605243abf66a7ccd3d2e49e48e8c0483 in the full history tree)
      
      The extra slbie in switch_slb is done even for the case where slbia is
      called (slb_flush_and_rebolt). I don't believe that is required
      because there are other slb_flush_and_rebolt callers which do not
      issue the workaround slbie, which would be broken if it was required.
      
      It also seems to be fine inside the isync with the first slbie, as it
      is in the kernel stack switch code.
      
      So move this workaround to where it is required. This is not much of
      an optimisation because this is the fast path, but it makes the code
      more understandable and neater.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Retain slbie_data initialisation to avoid compiler warning]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: avoid the POWER5 < DD2.1 slb invalidate workaround on POWER8/9 · 505ea82e
      Committed by Nicholas Piggin
      I only have POWER8/9 to test, so just remove it for those.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: Fix stab_rr off by one initialization · 09b4438d
      Committed by Nicholas Piggin
      This causes SLB allocation to start 1 beyond the start of the SLB.
      There is no real problem because after it wraps it starts behaving
      properly; it's just surprising to see when looking at SLB traces.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
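      The off-by-one is easy to see in a toy model, assuming (as the description implies) that the allocator advances its cursor before using it and wraps back to the first non-bolted slot. make_rr_allocator is a hypothetical helper, not the kernel function; with 8 entries and 2 bolted, starting the cursor at 2 hands out slot 3 first, while starting it one lower hands out slot 2.

```python
def make_rr_allocator(nr_entries, nr_bolted, initial):
    # Round-robin victim selection that advances the cursor *before* using it
    # and skips the bolted entries when it wraps around.
    rr = initial

    def next_slot():
        nonlocal rr
        rr = rr + 1 if rr < nr_entries - 1 else nr_bolted
        return rr

    return next_slot
```

      Under this assumption the fix is simply to initialise the cursor to one below the first usable slot, so the very first allocation lands on it.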
    • powerpc/pseries: Dump the SLB contents on SLB MCE errors. · c6d15258
      Committed by Mahesh Salgaonkar
      If we get a machine check exception due to SLB errors, then dump the
      current SLB contents, which is very helpful in debugging the root
      cause of SLB errors. Introduce an exclusive per-CPU buffer to hold
      faulty SLB entries. In real mode, the MCE handler saves the old SLB
      contents into this buffer, which is accessible through the paca, and
      the contents are printed later in virtual mode.
      
      With this patch the console will log SLB contents like below on SLB MCE
      errors:
      
      [  507.297236] SLB contents of cpu 0x1
      [  507.297237] Last SLB entry inserted at slot 16
      [  507.297238] 00 c000000008000000 400ea1b217000500
      [  507.297239]   1T  ESID=   c00000  VSID=      ea1b217 LLP:100
      [  507.297240] 01 d000000008000000 400d43642f000510
      [  507.297242]   1T  ESID=   d00000  VSID=      d43642f LLP:110
      [  507.297243] 11 f000000008000000 400a86c85f000500
      [  507.297244]   1T  ESID=   f00000  VSID=      a86c85f LLP:100
      [  507.297245] 12 00007f0008000000 4008119624000d90
      [  507.297246]   1T  ESID=       7f  VSID=      8119624 LLP:110
      [  507.297247] 13 0000000018000000 00092885f5150d90
      [  507.297247]  256M ESID=        1  VSID=   92885f5150 LLP:110
      [  507.297248] 14 0000010008000000 4009e7cb50000d90
      [  507.297249]   1T  ESID=        1  VSID=      9e7cb50 LLP:110
      [  507.297250] 15 d000000008000000 400d43642f000510
      [  507.297251]   1T  ESID=   d00000  VSID=      d43642f LLP:110
      [  507.297252] 16 d000000008000000 400d43642f000510
      [  507.297253]   1T  ESID=   d00000  VSID=      d43642f LLP:110
      [  507.297253] ----------------------------------
      [  507.297254] SLB cache ptr value = 3
      [  507.297254] Valid SLB cache entries:
      [  507.297255] 00 EA[0-35]=    7f000
      [  507.297256] 01 EA[0-35]=        1
      [  507.297257] 02 EA[0-35]=     1000
      [  507.297257] Rest of SLB cache entries:
      [  507.297258] 03 EA[0-35]=    7f000
      [  507.297258] 04 EA[0-35]=        1
      [  507.297259] 05 EA[0-35]=     1000
      [  507.297260] 06 EA[0-35]=       12
      [  507.297260] 07 EA[0-35]=    7f000
      Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
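      The ESID, VSID and LLP columns in the dump above can be reproduced from the raw ESID/VSID words. The constants below are the hash-MMU field layout as I understand it from the kernel's book3s64 headers (valid bit, segment-size field, shift values, LLP mask); treat them as assumptions, although they do reproduce the sample lines above. decode_slbe is a hypothetical helper.

```python
SID_SHIFT = 28                   # 256MB segment
SID_SHIFT_1T = 40                # 1TB segment
SID_MASK = 0xfffffffff           # 36-bit ESID for 256MB segments
SID_MASK_1T = 0xffffff           # 24-bit ESID for 1TB segments
SLB_ESID_V = 0x08000000          # valid bit in the ESID word
SLB_VSID_B = 0xc000000000000000  # segment-size field in the VSID word
SLB_VSID_B_1T = 0x4000000000000000
SLB_VSID_SHIFT = 12
SLB_VSID_SHIFT_1T = 24
SLB_VSID_LLP = 0x130             # L bit (0x100) | LP bits (0x30)


def decode_slbe(esid_word, vsid_word):
    """Return (size, esid, vsid, llp) for a valid entry, else None."""
    if not esid_word & SLB_ESID_V:
        return None
    llp = vsid_word & SLB_VSID_LLP
    if vsid_word & SLB_VSID_B_1T:
        esid = (esid_word >> SID_SHIFT_1T) & SID_MASK_1T
        vsid = (vsid_word & ~SLB_VSID_B) >> SLB_VSID_SHIFT_1T
        return "1T", esid, vsid, llp
    esid = (esid_word >> SID_SHIFT) & SID_MASK
    vsid = (vsid_word & ~SLB_VSID_B) >> SLB_VSID_SHIFT
    return "256M", esid, vsid, llp
```

      For example, slot 13 (0000000018000000 / 00092885f5150d90) decodes to a 256M segment with ESID 0x1, VSID 0x92885f5150 and LLP 0x110, matching the printed line.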
  6. 23 Aug 2018, 1 commit
  7. 10 Aug 2018, 1 commit
  8. 03 Jun 2018, 2 commits
  9. 30 Mar 2018, 1 commit
  10. 21 Jun 2017, 1 commit
  11. 13 Apr 2017, 1 commit
  12. 31 Mar 2017, 1 commit
  13. 02 Mar 2017, 1 commit
  14. 11 Apr 2016, 1 commit
    • powerpc/mm: Remove long disabled SLB code · 1f4c66e8
      Committed by Michael Ellerman
      We have a bunch of SLB related code in the tree which is there to handle
      dynamic VSIDs - but currently it's all disabled at compile time. The
      comments say "Keep that around for when we re-implement dynamic VSIDs".
      
      But that was over 10 years ago (commit 3c726f8d ("[PATCH] ppc64:
      support 64k pages")). The chance that it would still work unchanged is
      minimal, and in the meantime it's confusing to folks browsing/grepping
      the code. If we ever want to re-instate it, it's in the git history.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
  15. 19 Dec 2015, 1 commit
  16. 01 Oct 2015, 2 commits
  17. 12 Aug 2015, 3 commits
  18. 08 Oct 2014, 1 commit