1. 26 1月, 2020 1 次提交
  2. 16 1月, 2020 1 次提交
  3. 15 6月, 2019 1 次提交
  4. 31 5月, 2019 1 次提交
  5. 30 4月, 2019 1 次提交
    • N
      powerpc/64s: Reimplement book3s idle code in C · 10d91611
      Nicholas Piggin 提交于
      Reimplement Book3S idle code in C, moving POWER7/8/9 implementation
      speific HV idle code to the powernv platform code.
      
      Book3S assembly stubs are kept in common code and used only to save
      the stack frame and non-volatile GPRs before executing architected
      idle instructions, and restoring the stack and reloading GPRs then
      returning to C after waking from idle.
      
      The complex logic dealing with threads and subcores, locking, SPRs,
      HMIs, timebase resync, etc., is all done in C which makes it more
      maintainable.
      
      This is not a strict translation to C code, there are some
      significant differences:
      
      - Idle wakeup no longer uses the ->cpu_restore call to reinit SPRs,
        but saves and restores them itself.
      
      - The optimisation where EC=ESL=0 idle modes did not have to save GPRs
        or change MSR is restored, because it's now simple to do. ESL=1
        sleeps that do not lose GPRs can use this optimization too.
      
      - KVM secondary entry and cede is now more of a call/return style
        rather than branchy. nap_state_lost is not required because KVM
        always returns via NVGPR restoring path.
      
      - KVM secondary wakeup from offline sequence is moved entirely into
        the offline wakeup, which avoids a hwsync in the normal idle wakeup
        path.
      
      Performance measured with context switch ping-pong on different
      threads or cores, is possibly improved a small amount, 1-3% depending
      on stop state and core vs thread test for shallow states. Deep states
      it's in the noise compared with other latencies.
      
      KVM improvements:
      
      - Idle sleepers now always return to caller rather than branch out
        to KVM first.
      
      - This allows optimisations like very fast return to caller when no
        state has been lost.
      
      - KVM no longer requires nap_state_lost because it controls NVGPR
        save/restore itself on the way in and out.
      
      - The heavy idle wakeup KVM request check can be moved out of the
        normal host idle code and into the not-performance-critical offline
        code.
      
      - KVM nap code now returns from where it is called, which makes the
        flow a bit easier to follow.
      Reviewed-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      [mpe: Squash the KVM changes in]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      10d91611
  6. 21 4月, 2019 1 次提交
    • C
      powerpc/32s: Implement Kernel Userspace Access Protection · a68c31fc
      Christophe Leroy 提交于
      This patch implements Kernel Userspace Access Protection for
      book3s/32.
      
      Due to limitations of the processor page protection capabilities,
      the protection is only against writing. read protection cannot be
      achieved using page protection.
      
      The previous patch modifies the page protection so that RW user
      pages are RW for Key 0 and RO for Key 1, and it sets Key 0 for
      both user and kernel.
      
      This patch changes userspace segment registers are set to Ku 0
      and Ks 1. When kernel needs to write to RW pages, the associated
      segment register is then changed to Ks 0 in order to allow write
      access to the kernel.
      
      In order to avoid having the read all segment registers when
      locking/unlocking the access, some data is kept in the thread_struct
      and saved on stack on exceptions. The field identifies both the
      first unlocked segment and the first segment following the last
      unlocked one. When no segment is unlocked, it contains value 0.
      
      As the hash_page() function is not able to easily determine if a
      protfault is due to a bad kernel access to userspace, protfaults
      need to be handled by handle_page_fault when KUAP is set.
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      [mpe: Drop allow_read/write_to/from_user() as they're now in kup.h,
            and adapt allow_user_access() to do nothing when to == NULL]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a68c31fc
  7. 23 2月, 2019 4 次提交
  8. 21 2月, 2019 1 次提交
  9. 31 10月, 2018 1 次提交
  10. 14 10月, 2018 2 次提交
    • N
      powerpc/64s/hash: Add a SLB preload cache · 5434ae74
      Nicholas Piggin 提交于
      When switching processes, currently all user SLBEs are cleared, and a
      few (exec_base, pc, and stack) are preloaded. In trivial testing with
      small apps, this tends to miss the heap and low 256MB segments, and it
      will also miss commonly accessed segments on large memory workloads.
      
      Add a simple round-robin preload cache that just inserts the last SLB
      miss into the head of the cache and preloads those at context switch
      time. Every 256 context switches, the oldest entry is removed from the
      cache to shrink the cache and require fewer slbmte if they are unused.
      
      Much more could go into this, including into the SLB entry reclaim
      side to track some LRU information etc, which would require a study of
      large memory workloads. But this is a simple thing we can do now that
      is an obvious win for common workloads.
      
      With the full series, process switching speed on the context_switch
      benchmark on POWER9/hash (with kernel speculation security masures
      disabled) increases from 140K/s to 178K/s (27%).
      
      POWER8 does not change much (within 1%), it's unclear why it does not
      see a big gain like POWER9.
      
      Booting to busybox init with 256MB segments has SLB misses go down
      from 945 to 69, and with 1T segments 900 to 21. These could almost all
      be eliminated by preloading a bit more carefully with ELF binary
      loading.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      5434ae74
    • N
      powerpc/64: Interrupts save PPR on stack rather than thread_struct · 4c2de74c
      Nicholas Piggin 提交于
      PPR is the odd register out when it comes to interrupt handling, it is
      saved in current->thread.ppr while all others are saved on the stack.
      
      The difficulty with this is that accessing thread.ppr can cause a SLB
      fault, but the SLB fault handler implementation in C change had
      assumed the normal exception entry handlers would not cause an SLB
      fault.
      
      Fix this by allocating room in the interrupt stack to save PPR.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      4c2de74c
  11. 03 10月, 2018 1 次提交
  12. 19 9月, 2018 1 次提交
    • N
      powerpc/64s/hash: Add a SLB preload cache · 89ca4e12
      Nicholas Piggin 提交于
      When switching processes, currently all user SLBEs are cleared, and a
      few (exec_base, pc, and stack) are preloaded. In trivial testing with
      small apps, this tends to miss the heap and low 256MB segments, and it
      will also miss commonly accessed segments on large memory workloads.
      
      Add a simple round-robin preload cache that just inserts the last SLB
      miss into the head of the cache and preloads those at context switch
      time. Every 256 context switches, the oldest entry is removed from the
      cache to shrink the cache and require fewer slbmte if they are unused.
      
      Much more could go into this, including into the SLB entry reclaim
      side to track some LRU information etc, which would require a study of
      large memory workloads. But this is a simple thing we can do now that
      is an obvious win for common workloads.
      
      With the full series, process switching speed on the context_switch
      benchmark on POWER9/hash (with kernel speculation security masures
      disabled) increases from 140K/s to 178K/s (27%).
      
      POWER8 does not change much (within 1%), it's unclear why it does not
      see a big gain like POWER9.
      
      Booting to busybox init with 256MB segments has SLB misses go down
      from 945 to 69, and with 1T segments 900 to 21. These could almost all
      be eliminated by preloading a bit more carefully with ELF binary
      loading.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      89ca4e12
  13. 30 7月, 2018 1 次提交
  14. 03 6月, 2018 2 次提交
  15. 31 3月, 2018 1 次提交
  16. 30 3月, 2018 2 次提交
  17. 20 1月, 2018 1 次提交
  18. 12 11月, 2017 2 次提交
  19. 02 7月, 2017 1 次提交
  20. 29 6月, 2017 1 次提交
  21. 19 6月, 2017 1 次提交
  22. 08 6月, 2017 1 次提交
  23. 05 5月, 2017 1 次提交
    • S
      powerpc/64e: Don't place the stack beyond TASK_SIZE · 61baf155
      Scott Wood 提交于
      Commit f4ea6dcb ("powerpc/mm: Enable mappings above 128TB") increased
      the task size on book3s, and introduced a mechanism to dynamically
      control whether a task uses these larger addresses.  While the change to
      the task size itself was ifdef-protected to only apply on book3s, the
      change to STACK_TOP_USER64 was not.  On book3e, this had the effect of
      trying to use addresses up to 128TiB for the stack despite a 64TiB task
      size limit -- which broke 64-bit userspace producing the following errors:
      
      Starting init: /sbin/init exists but couldn't execute it (error -14)
      Starting init: /bin/sh exists but couldn't execute it (error -14)
      Kernel panic - not syncing: No working init found.  Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.
      
      Fixes: f4ea6dcb ("powerpc/mm: Enable mappings above 128TB")
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NScott Wood <oss@buserror.net>
      61baf155
  24. 01 4月, 2017 1 次提交
    • A
      powerpc/mm: Enable mappings above 128TB · f4ea6dcb
      Aneesh Kumar K.V 提交于
      Not all user space application is ready to handle wide addresses. It's
      known that at least some JIT compilers use higher bits in pointers to
      encode their information. It collides with valid pointers with 512TB
      addresses and leads to crashes.
      
      To mitigate this, we are not going to allocate virtual address space
      above 128TB by default.
      
      But userspace can ask for allocation from full address space by
      specifying hint address (with or without MAP_FIXED) above 128TB.
      
      If hint address set above 128TB, but MAP_FIXED is not specified, we try
      to look for unmapped area by specified address. If it's already
      occupied, we look for unmapped area in *full* address space, rather than
      from 128TB window.
      
      This approach helps to easily make application's memory allocator aware
      about large address space without manually tracking allocated virtual
      address space.
      
      This is going to be a per mmap decision. ie, we can have some mmaps with
      larger addresses and other that do not.
      
      A sample memory layout looks like:
      
        10000000-10010000 r-xp 00000000 fc:00 9057045          /home/max_addr_512TB
        10010000-10020000 r--p 00000000 fc:00 9057045          /home/max_addr_512TB
        10020000-10030000 rw-p 00010000 fc:00 9057045          /home/max_addr_512TB
        10029630000-10029660000 rw-p 00000000 00:00 0          [heap]
        7fff834a0000-7fff834b0000 rw-p 00000000 00:00 0
        7fff834b0000-7fff83670000 r-xp 00000000 fc:00 9177190  /lib/powerpc64le-linux-gnu/libc-2.23.so
        7fff83670000-7fff83680000 r--p 001b0000 fc:00 9177190  /lib/powerpc64le-linux-gnu/libc-2.23.so
        7fff83680000-7fff83690000 rw-p 001c0000 fc:00 9177190  /lib/powerpc64le-linux-gnu/libc-2.23.so
        7fff83690000-7fff836a0000 rw-p 00000000 00:00 0
        7fff836a0000-7fff836c0000 r-xp 00000000 00:00 0        [vdso]
        7fff836c0000-7fff83700000 r-xp 00000000 fc:00 9177193  /lib/powerpc64le-linux-gnu/ld-2.23.so
        7fff83700000-7fff83710000 r--p 00030000 fc:00 9177193  /lib/powerpc64le-linux-gnu/ld-2.23.so
        7fff83710000-7fff83720000 rw-p 00040000 fc:00 9177193  /lib/powerpc64le-linux-gnu/ld-2.23.so
        7fffdccf0000-7fffdcd20000 rw-p 00000000 00:00 0        [stack]
        1000000000000-1000000010000 rw-p 00000000 00:00 0
        1ffff83710000-1ffff83720000 rw-p 00000000 00:00 0
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      f4ea6dcb
  25. 31 3月, 2017 1 次提交
    • A
      powerpc/mm/hash: Increase VA range to 128TB · f6eedbba
      Aneesh Kumar K.V 提交于
      We update the hash linux page table layout such that we can support
      512TB. But we limit the TASK_SIZE to 128TB. We can switch to 128TB by
      default without conditional because that is the max virtual address
      supported by other architectures. We will later add a mechanism to
      on-demand increase the application's effective address range to 512TB.
      
      Having the page table layout changed to accommodate 512TB makes testing
      large memory configuration easier with less code changes to kernel
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      f6eedbba
  26. 31 1月, 2017 1 次提交
    • G
      powernv: Pass PSSCR value and mask to power9_idle_stop · 09206b60
      Gautham R. Shenoy 提交于
      The power9_idle_stop method currently takes only the requested stop
      level as a parameter and picks up the rest of the PSSCR bits from a
      hand-coded macro. This is not a very flexible design, especially when
      the firmware has the capability to communicate the psscr value and the
      mask associated with a particular stop state via device tree.
      
      This patch modifies the power9_idle_stop API to take as parameters the
      PSSCR value and the PSSCR mask corresponding to the stop state that
      needs to be set. These PSSCR value and mask are respectively obtained
      by parsing the "ibm,cpu-idle-state-psscr" and
      "ibm,cpu-idle-state-psscr-mask" fields from the device tree.
      
      In addition to this, the patch adds support for handling stop states
      for which ESL and EC bits in the PSSCR are zero. As per the
      architecture, a wakeup from these stop states resumes execution from
      the subsequent instruction as opposed to waking up at the System
      Vector.
      
      The older firmware sets only the Requested Level (RL) field in the
      psscr and psscr-mask exposed in the device tree. For older firmware
      where psscr-mask=0xf, this patch will set the default sane values that
      the set for for remaining PSSCR fields (i.e PSLL, MTL, ESL, EC, and
      TR). For the new firmware, the patch will validate that the invariants
      required by the ISA for the psscr values are maintained by the
      firmware.
      
      This skiboot patch that exports fully populated PSSCR values and the
      mask for all the stop states can be found here:
      https://lists.ozlabs.org/pipermail/skiboot/2016-September/004869.html
      
      [Optimize the number of instructions before entering STOP with
      ESL=EC=0, validate the PSSCR values provided by the firimware
      maintains the invariants required as per the ISA suggested by Balbir
      Singh]
      Acked-by: NBalbir Singh <bsingharora@gmail.com>
      Signed-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      09206b60
  27. 25 1月, 2017 1 次提交
  28. 17 11月, 2016 1 次提交
  29. 16 11月, 2016 2 次提交
    • C
      locking/core, arch: Remove cpu_relax_lowlatency() · 5bd0b85b
      Christian Borntraeger 提交于
      As there are no users left, we can remove cpu_relax_lowlatency()
      implementations from every architecture.
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Noam Camus <noamc@ezchip.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Cc: <linux-arch@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1477386195-32736-6-git-send-email-borntraeger@de.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5bd0b85b
    • C
      locking/core: Introduce cpu_relax_yield() · 79ab11cd
      Christian Borntraeger 提交于
      For spinning loops people do often use barrier() or cpu_relax().
      For most architectures cpu_relax and barrier are the same, but on
      some architectures cpu_relax can add some latency.
      For example on power,sparc64 and arc, cpu_relax can shift the CPU
      towards other hardware threads in an SMT environment.
      On s390 cpu_relax does even more, it uses an hypercall to the
      hypervisor to give up the timeslice.
      In contrast to the SMT yielding this can result in larger latencies.
      In some places this latency is unwanted, so another variant
      "cpu_relax_lowlatency" was introduced. Before this is used in more
      and more places, lets revert the logic and provide a cpu_relax_yield
      that can be called in places where yielding is more important than
      latency. By default this is the same as cpu_relax on all architectures.
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Noam Camus <noamc@ezchip.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1477386195-32736-2-git-send-email-borntraeger@de.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      79ab11cd
  30. 14 11月, 2016 1 次提交
  31. 04 10月, 2016 2 次提交