1. 04 May, 2011 2 commits
    • powerpc: Save Come-From Address Register (CFAR) in exception frame · 48404f2e
      Paul Mackerras authored
      Recent 64-bit server processors (POWER6 and POWER7) have a "Come-From
      Address Register" (CFAR), that records the address of the most recent
      branch or rfid (return from interrupt) instruction for debugging purposes.
      
      This saves the value of the CFAR in the exception entry code and stores
      it in the exception frame.  We also make xmon print the CFAR value in
      its register dump code.
      
      Rather than extend the pt_regs struct at this time, we steal the orig_gpr3
      field, which is only used for system calls, and use it for the CFAR value
      for all exceptions/interrupts other than system calls.  This means we
      don't save the CFAR on system calls, which is not a great problem since
      system calls tend not to happen unexpectedly, and also avoids adding the
      overhead of reading the CFAR to the system call entry path.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
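The field reuse described in this commit can be sketched in plain C. This is a minimal illustration, not the kernel's code: `struct regs_sketch` is a hypothetical, trimmed-down stand-in for `pt_regs`, the helper names are invented for the example, and the use of vector 0xc00 to recognise a system call is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the kernel's pt_regs. */
struct regs_sketch {
    uint64_t gpr[32];
    uint64_t nip;
    uint64_t orig_gpr3; /* system call: original r3; other exceptions: CFAR */
    uint64_t trap;      /* exception vector that created this frame */
};

/* Assumption of this sketch: system calls enter through vector 0xc00. */
#define TRAP_SYSCALL 0xc00u

/* The slot only holds a CFAR value when the frame is not a system call. */
static bool regs_has_cfar(const struct regs_sketch *regs)
{
    return regs->trap != TRAP_SYSCALL;
}

/* Copy the saved CFAR into *cfar; return false if the frame has none. */
static bool regs_get_cfar(const struct regs_sketch *regs, uint64_t *cfar)
{
    if (!regs_has_cfar(regs))
        return false;
    *cfar = regs->orig_gpr3;
    return true;
}
```

A register dump routine in the style of xmon would call `regs_get_cfar()` and print the value only when it returns true, which mirrors the commit's point that system call frames carry no CFAR.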
    • powerpc: Add Initiate Coprocessor Store Word (icswx) support · 851d2e2f
      Tseng-Hui (Frank) Lin authored
      Icswx is a PowerPC instruction to send data to a co-processor. On Book-S
      processors the LPAR_ID and process ID (PID) of the owning process are
      registered in the window context of the co-processor at initialization
      time. When the icswx instruction is executed the L2 generates a cop-reg
      transaction on PowerBus. The transaction has no address and the
      processor does not perform an MMU access to authenticate the transaction.
      The co-processor compares the LPAR_ID and the PID included in the
      transaction and the LPAR_ID and PID held in the window context to
      determine if the process is authorized to generate the transaction.
      
      The OS needs to assign a 16-bit PID for the process. This cop-PID needs
      to be updated during context switch. The cop-PID needs to be destroyed
      when the context is destroyed.
      Signed-off-by: Sonny Rao <sonnyrao@linux.vnet.ibm.com>
      Signed-off-by: Tseng-Hui (Frank) Lin <thlin@linux.vnet.ibm.com>
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
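The cop-PID lifecycle described above (assign, update on context switch, destroy with the context) might look roughly like the following. This is a sketch under stated assumptions: the bitmap allocator, the reserved value 0, `struct ctx_sketch`, and the `hw_cop_pid` variable standing in for the hardware register are all hypothetical and not taken from the patch.

```c
#include <stdint.h>

#define COP_PID_NONE 0u      /* assumption: 0 means "no cop-PID assigned" */
#define COP_PID_MAX  0xFFFFu /* cop-PIDs are 16 bits wide */

/* One bit per possible cop-PID. */
static uint8_t cop_pid_map[(COP_PID_MAX + 1) / 8];

/* Hypothetical per-process context holding the assigned cop-PID. */
struct ctx_sketch {
    uint16_t cop_pid;
};

/* Stand-in for the hardware register updated on context switch. */
static uint16_t hw_cop_pid;

/* Return the lowest free cop-PID, or COP_PID_NONE if all are in use. */
static uint16_t cop_pid_alloc(void)
{
    for (uint32_t id = 1; id <= COP_PID_MAX; id++) {
        uint8_t bit = (uint8_t)(1u << (id % 8));
        if (!(cop_pid_map[id / 8] & bit)) {
            cop_pid_map[id / 8] |= bit;
            return (uint16_t)id;
        }
    }
    return COP_PID_NONE;
}

/* Context switch: make the incoming context's cop-PID the current one. */
static void ctx_switch_in(const struct ctx_sketch *next)
{
    hw_cop_pid = next->cop_pid;
}

/* Context teardown: release the cop-PID so it can be reused. */
static void ctx_destroy(struct ctx_sketch *ctx)
{
    if (ctx->cop_pid != COP_PID_NONE) {
        uint8_t bit = (uint8_t)(1u << (ctx->cop_pid % 8));
        cop_pid_map[ctx->cop_pid / 8] &= (uint8_t)~bit;
    }
    ctx->cop_pid = COP_PID_NONE;
}
```

A linear scan keeps the sketch short; a real allocator would also need locking, which is omitted here.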
  2. 27 April, 2011 2 commits
  3. 20 April, 2011 1 commit
  4. 12 April, 2011 2 commits
  5. 02 February, 2011 1 commit
  6. 29 November, 2010 1 commit
  7. 02 September, 2010 1 commit
    • powerpc: Feature nop out reservation clear when stcx checks address · f89451fb
      Anton Blanchard authored
      The POWER architecture does not require stcx to check that it is operating
      on the same address as the larx. This means it is possible for an
      exception handler to execute a larx, get a reservation, decide
      not to do the stcx and then return back with an active reservation. If the
      interrupted code was in the middle of a larx/stcx sequence the stcx could
      incorrectly succeed.
      
      All recent POWER CPUs check the address before letting the stcx succeed
      so we can create a CPU feature and nop it out. As Ben suggested, we can
      only do this in our syscall path because there is a remote possibility
      some kernel code gets interrupted by an exception that ends up operating
      on the same cacheline.
      
      Thanks to Paul Mackerras and Derek Williams for the idea.
      
      To test this I used a very simple null syscall (actually getppid) testcase
      at http://ozlabs.org/~anton/junkcode/null_syscall.c
      
      I tested against 2.6.35-git10 with the following changes against the
      pseries_defconfig:
      
      CONFIG_VIRT_CPU_ACCOUNTING=n
      CONFIG_AUDIT=n
      CONFIG_PPC_4K_PAGES=n
      CONFIG_PPC_64K_PAGES=y
      CONFIG_FORCE_MAX_ZONEORDER=9
      CONFIG_PPC_SUBPAGE_PROT=n
      CONFIG_FUNCTION_TRACER=n
      CONFIG_FUNCTION_GRAPH_TRACER=n
      CONFIG_IRQSOFF_TRACER=n
      CONFIG_STACK_TRACER=n
      
      to remove the overhead of virtual CPU accounting, syscall auditing and
      the ftrace mcount tracers. 64kB pages were enabled to minimise TLB misses.
      
      POWER6: +8.2%
      POWER7: +7.0%
      
      Another suggestion was to use a larx to something in the L1 instead of a stcx.
      This was almost as fast as removing the larx on POWER6, but only 3.5% faster
      on POWER7. We can use this to speed up the reservation clear in our
      exception exit code.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  8. 22 June, 2010 1 commit
    • powerpc, hw_breakpoints: Implement hw_breakpoints for 64-bit server processors · 5aae8a53
      K.Prasad authored
      Implement perf-events based hw-breakpoint interfaces for PowerPC
      64-bit server (Book III S) processors.  This allows access to a
      given location to be used as an event that can be counted or
      profiled by the perf_events subsystem.
      
      This is done using the DABR (data breakpoint register), which can
      also be used for process debugging via ptrace.  When perf_event
      hw_breakpoint support is configured in, the perf_event subsystem
      manages the DABR and arbitrates access to it, and ptrace then
      creates a perf_event when it is requested to set a data breakpoint.
      
      [Adopted suggestions from Paul Mackerras <paulus@samba.org> to
      - emulate_step() all system-wide breakpoints and single-step only the
        per-task breakpoints
      - perform arch-specific cleanup before unregistration through
        arch_unregister_hw_breakpoint()
      ]
      Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  9. 09 June, 2010 1 commit
    • powerpc: Enable asymmetric SMT scheduling on POWER7 · 76cbd8a8
      Michael Neuling authored
      The POWER7 core has dynamic SMT mode switching which is controlled by
      the hypervisor.  There are 3 SMT modes:
      	SMT1 uses thread  0
      	SMT2 uses threads 0 & 1
      	SMT4 uses threads 0, 1, 2 & 3
      When in any particular SMT mode, all threads have the same performance
      as each other (ie. at any moment in time, all threads perform the same).
      
      The SMT mode switching works such that when linux has threads 2 & 3 idle
      and 0 & 1 active, it will cede (H_CEDE hypercall) threads 2 and 3 in the
      idle loop and the hypervisor will automatically switch to SMT2 for that
      core (independent of other cores).  The opposite is not true, so if
      threads 0 & 1 are idle and 2 & 3 are active, we will stay in SMT4 mode.
      
      Similarly if thread 0 is active and threads 1, 2 & 3 are idle, we'll go
      into SMT1 mode.
      
      If we can get the core into a lower SMT mode (SMT1 is best), the threads
      will perform better (since they share less core resources).  Hence when
      we have idle threads, we want them to be the higher ones.
      
      This adds a feature bit for asymmetric packing to powerpc and then
      enables it on POWER7.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: linuxppc-dev@ozlabs.org
      LKML-Reference: <20100608045702.31FB5CC8C7@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
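The mode-switching rule in this commit message reduces to "the highest active thread decides the SMT mode", which is why packing work onto the lowest threads helps. A small sketch of that reasoning follows. The function names and the bitmask encoding (bit n set means thread n is active) are assumptions made for the example, and the result for an all-idle core is arbitrary here.

```c
enum smt_mode { SMT1 = 1, SMT2 = 2, SMT4 = 4 };

/*
 * Mode the core ends up in for a given set of active threads.
 * Bit n of active_mask is set when hardware thread n is active.
 */
static enum smt_mode smt_mode_for(unsigned active_mask)
{
    if (active_mask & 0xCu) /* thread 2 or 3 active: must stay in SMT4 */
        return SMT4;
    if (active_mask & 0x2u) /* thread 1 active, 2 and 3 idle */
        return SMT2;
    return SMT1;            /* only thread 0 active (or, in this sketch, none) */
}

/* Asymmetric packing: place nr_tasks (0..4) on the lowest-numbered threads. */
static unsigned packed_mask(unsigned nr_tasks)
{
    return (1u << nr_tasks) - 1u;
}
```

With two runnable tasks, `smt_mode_for(packed_mask(2))` gives SMT2, while the same two tasks on threads 2 and 3 (`0xC`) leave the core in SMT4.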
  10. 21 May, 2010 1 commit
  11. 05 May, 2010 2 commits
  12. 17 February, 2010 1 commit
    • powerpc: Use lwsync for acquire barrier if CPU supports it · 5a0e9b57
      Anton Blanchard authored
      Nick Piggin discovered that lwsync barriers around locks were faster than isync
      on 970. That was a long time ago and I completely dropped the ball in testing
      his patches across other ppc64 processors.
      
      Turns out the idea helps on other chips. Using a microbenchmark that
      uses a lot of threads to contend on a global pthread mutex (and therefore a
      global futex), POWER6 improves 8% and POWER7 improves 2%. I checked POWER5
      and while I couldn't measure an improvement, there was no regression.
      
      This patch uses the lwsync patching code to replace the isyncs with lwsyncs
      on CPUs that support the instruction. We were marking POWER3 and RS64 as lwsync
      capable but in reality they treat it as a full sync (ie slow). Remove the
      CPU_FTR_LWSYNC bit from these CPUs so they continue to use the faster isync
      method.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  13. 17 March, 2009 1 commit
  14. 23 February, 2009 1 commit
  15. 21 December, 2008 2 commits
  16. 16 December, 2008 1 commit
  17. 05 November, 2008 1 commit
  18. 16 September, 2008 1 commit
    • powerpc: Add new CPU feature: CPU_FTR_CP_USE_DCBTZ · 2a929436
      Mark Nelson authored
      Add a new CPU feature bit, CPU_FTR_CP_USE_DCBTZ, to be added to the
      64bit powerpc chips that benefit from having dcbt and dcbz
      instructions used in their memory copy routines.
      
      This will be used in a subsequent patch that updates copy_4K_page().
      The new bit is added to Cell, PPC970 and Power4 because they show
      better performance with the new copy_4K_page() when dcbt and dcbz
      instructions are used.
      Signed-off-by: Mark Nelson <markn@au1.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
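How a per-CPU feature bit can steer the choice of copy routine is sketched below. This only illustrates the idea: the bit value, the `cpu_spec_sketch` table, the "other-cpu" entry, and the variant names are all hypothetical. Only the list of chips that carry the bit (Cell, PPC970, POWER4) comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit value; the real CPU_FTR_CP_USE_DCBTZ value is not assumed. */
#define FTR_SKETCH_CP_USE_DCBTZ (1ull << 0)

/* Hypothetical, minimal CPU table entry. */
struct cpu_spec_sketch {
    const char *name;
    uint64_t features;
};

static const struct cpu_spec_sketch cpu_table[] = {
    { "POWER4",    FTR_SKETCH_CP_USE_DCBTZ },
    { "PPC970",    FTR_SKETCH_CP_USE_DCBTZ },
    { "Cell",      FTR_SKETCH_CP_USE_DCBTZ },
    { "other-cpu", 0 }, /* hypothetical chip without the bit */
};

/* Test a feature bit on a CPU table entry. */
static bool cpu_has(const struct cpu_spec_sketch *cpu, uint64_t ftr)
{
    return (cpu->features & ftr) != 0;
}

/* Look a CPU up by name; returns NULL when it is not in the table. */
static const struct cpu_spec_sketch *find_cpu(const char *name)
{
    for (size_t i = 0; i < sizeof(cpu_table) / sizeof(cpu_table[0]); i++)
        if (strcmp(cpu_table[i].name, name) == 0)
            return &cpu_table[i];
    return NULL;
}

/* Name of the page-copy variant this sketch would select for a CPU. */
static const char *copy_variant(const struct cpu_spec_sketch *cpu)
{
    return cpu_has(cpu, FTR_SKETCH_CP_USE_DCBTZ) ? "dcbt/dcbz" : "plain";
}
```

The kernel patches instruction sequences at boot according to feature bits instead of branching at run time; the run-time test here just keeps the example self-contained.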
  19. 20 August, 2008 1 commit
  20. 04 August, 2008 1 commit
  21. 25 July, 2008 1 commit
  22. 15 July, 2008 1 commit
    • powerpc: Add PPC_FEATURE_PSERIES_PERFMON_COMPAT · 0f473314
      Nathan Lynch authored
      Background from Maynard Johnson:
      As of POWER6, a set of 32 common events is defined that must be
      supported on all future POWER processors.  The main impetus for this
      compat set is the need to support partition migration, especially from
      processor P(n) to processor P(n+1), where performance software that's
      running in the new partition may not be knowledgeable about processor
      P(n+1).  If a performance tool determines it does not support the
      physical processor, but is told (via the
      PPC_FEATURE_PSERIES_PERFMON_COMPAT bit) that the processor supports
      the notion of the PMU compat set, then the performance tool can
      surface just those events to the user of the tool.
      
      PPC_FEATURE_PSERIES_PERFMON_COMPAT indicates that the PMU supports at
      least this basic subset of events which is compatible across POWER
      processor lines.
      Signed-off-by: Nathan Lynch <ntl@pobox.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
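The decision a performance tool makes with this bit can be sketched as follows. The `pmu_support` enum and the function name are hypothetical. The bit value 0x00000040 is used on the assumption that it matches the definition in the kernel's user-visible headers, and the hardware capability word is passed in as a plain argument so the logic can be exercised on any machine.

```c
#include <stdbool.h>

#ifndef PPC_FEATURE_PSERIES_PERFMON_COMPAT
/* Assumed to match the kernel's user-visible definition of the bit. */
#define PPC_FEATURE_PSERIES_PERFMON_COMPAT 0x00000040
#endif

/* Hypothetical classification of what the tool can offer the user. */
enum pmu_support {
    PMU_FULL,        /* tool knows this processor: expose all its events */
    PMU_COMPAT_ONLY, /* unknown processor, but the compat set is available */
    PMU_NONE         /* unknown processor and no compat set */
};

/*
 * knows_cpu: whether the tool has an event table for the physical CPU.
 * hwcap:     the hardware capability word reported for the process.
 */
static enum pmu_support pmu_support_level(bool knows_cpu, unsigned long hwcap)
{
    if (knows_cpu)
        return PMU_FULL;
    if (hwcap & PPC_FEATURE_PSERIES_PERFMON_COMPAT)
        return PMU_COMPAT_ONLY;
    return PMU_NONE;
}
```

On Linux the capability word would typically be obtained with `getauxval(AT_HWCAP)` from `<sys/auxv.h>`. That call is left out of the block so the sketch does not depend on running on a POWER system.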
  23. 09 July, 2008 1 commit
  24. 03 July, 2008 1 commit
    • powerpc: Fixup lwsync at runtime · 2d1b2027
      Kumar Gala authored
      To allow for a single kernel image on e500 v1/v2/mc we need to fixup lwsync
      at runtime.  On e500v1/v2 lwsync causes an illop so we need to patch up
      the code.  We default to 'sync' since that is always safe and if the cpu
      is capable we will replace 'sync' with 'lwsync'.
      
      We introduce CPU_FTR_LWSYNC as a way to determine at runtime if this is
      needed.  This flag could be moved elsewhere since we don't really use it
      for the normal CPU_FTR purpose.
      
      Finally we only store the relative offset in the fixup section to keep it
      as small as possible rather than using a full fixup_entry.
      Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
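The runtime fixup described above can be modelled in user space as a pass over a table of offsets that rewrites `sync` into `lwsync`. This is a simplified sketch, not the kernel's implementation: the function name and table layout are hypothetical, and offsets are counted in instruction words from the start of the code buffer, which is only one possible reading of the "relative offset" the commit mentions. The two opcodes are the standard encodings of `sync` and `lwsync`.

```c
#include <stddef.h>
#include <stdint.h>

#define INST_SYNC   0x7c0004acu /* sync   (always safe) */
#define INST_LWSYNC 0x7c2004acu /* lwsync (illegal on e500v1/v2) */

/*
 * Replace 'sync' with 'lwsync' at each recorded site when the CPU
 * supports lwsync. fixups[i] is an offset in instruction words from
 * the start of 'code' (an assumption of this sketch).
 * Returns the number of instructions patched.
 */
static size_t fixup_lwsync(uint32_t *code, const int32_t *fixups,
                           size_t nr_fixups, int cpu_has_lwsync)
{
    size_t patched = 0;

    if (!cpu_has_lwsync)
        return 0; /* keep the safe default */

    for (size_t i = 0; i < nr_fixups; i++) {
        uint32_t *dest = code + fixups[i];

        if (*dest == INST_SYNC) { /* only touch sites still holding sync */
            *dest = INST_LWSYNC;
            patched++;
        }
    }
    return patched;
}
```

Storing one small offset per site, instead of a full fixup record, is what keeps the table compact. A real kernel would also have to flush the instruction cache after patching, which this model omits.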
  25. 01 July, 2008 3 commits
  26. 30 June, 2008 1 commit
  27. 26 June, 2008 2 commits
  28. 19 June, 2008 1 commit
    • powerpc/booke: Add support for new e500mc core · 3dfa8773
      Kumar Gala authored
      The new e500mc core from Freescale is based on the e500v2 but with the
      following changes:
      
      * Supports only the Enhanced Debug Architecture (DSRR0/1, etc)
      * Floating Point
      * No SPE
      * Supports lwsync
      * Doorbell Exceptions
      * Hypervisor
      * Cache line size is now 64-bytes (e500v1/v2 have a 32-byte cache line)
      Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
  29. 06 February, 2008 1 commit
  30. 24 December, 2007 1 commit
  31. 13 November, 2007 1 commit
    • [POWERPC] Avoid unpaired stwcx. on some processors · b64f87c1
      Becky Bruce authored
      The context switch code in the kernel issues a dummy stwcx. to clear the
      reservation, as recommended by the architecture.  However, some processors
      can have issues if this stwcx. to address A occurs while the reservation
      is already held to a different address B.  To avoid this problem, the dummy
      stwcx. needs to be paired with a dummy lwarx to the same address.
      
      This adds the dummy lwarx, and creates a cpu feature bit to indicate
      which cpus are affected.  Tested on mpc8641_hpcn_defconfig in
      arch/powerpc; build tested in arch/ppc.
      Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  32. 17 October, 2007 1 commit
    • [POWERPC] Add 1TB workaround for PA6T · f66bce5e
      Olof Johansson authored
      PA6T has a bug where the slbie instruction does not honor the large
      segment bit.  As a result, we have to always use slbia when switching
      context.
      
      We don't have to worry about changing the slbie's during fault processing,
      since they should never be replacing one VSID with another using the
      same ESID.  I.e. there's no risk for inserting duplicate entries due to a
      failed slbie of the old entry.  So as long as we clear it out on context
      switch we should be fine.
      Signed-off-by: Olof Johansson <olof@lixom.net>
      Signed-off-by: Paul Mackerras <paulus@samba.org>