1. 02 9月, 2010 2 次提交
    • P
      powerpc: Account time using timebase rather than PURR · cf9efce0
      Paul Mackerras 提交于
      Currently, when CONFIG_VIRT_CPU_ACCOUNTING is enabled, we use the
      PURR register for measuring the user and system time used by
      processes, as well as other related times such as hardirq and
      softirq times.  This turns out to be quite confusing for users
      because it means that a program will often be measured as taking
      less time when run on a multi-threaded processor (SMT2 or SMT4 mode)
      than it does when run on a single-threaded processor (ST mode), even
      though the program takes longer to finish.  The discrepancy is
      accounted for as stolen time, which is also confusing, particularly
      when there are no other partitions running.
      
      This changes the accounting to use the timebase instead, meaning that
      the reported user and system times are the actual number of real-time
      seconds that the program was executing on the processor thread,
      regardless of which SMT mode the processor is in.  Thus a program will
      generally show greater user and system times when run on a
      multi-threaded processor than on a single-threaded processor.
      
      On pSeries systems on POWER5 or later processors, we measure the
      stolen time (time when this partition wasn't running) using the
      hypervisor dispatch trace log.  We check for new entries in the
      log on every entry from user mode and on every transition from
      kernel process context to soft or hard IRQ context (i.e. when
      account_system_vtime() gets called).  So that we can correctly
      distinguish time stolen from user time and time stolen from system
      time, without having to check the log on every exit to user mode,
      we store separate timestamps for exit to user mode and entry from
      user mode.
      
      On systems that have a SPURR (POWER6 and POWER7), we read the SPURR
      in account_system_vtime() (as before), and then apportion the SPURR
      ticks since the last time we read it between scaled user time and
      scaled system time according to the relative proportions of user
      time and system time over the same interval.  This avoids having to
      read the SPURR on every kernel entry and exit.  On systems that have
      PURR but not SPURR (i.e., POWER5), we do the same using the PURR
      rather than the SPURR.
      
      This disables the DTL user interface in /sys/debug/kernel/powerpc/dtl
      for now since it conflicts with the use of the dispatch trace log
      by the time accounting code.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      cf9efce0
    • M
      powerpc: Move arch_sd_sibling_asym_packing() to smp.c · e1f0ece1
      Michael Neuling 提交于
      Simple cleanup by moving arch_sd_sibling_asym_packing from process.c to
      smp.c to save an #ifdef CONFIG_SMP
      
      No functionality change.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      e1f0ece1
  2. 24 8月, 2010 2 次提交
  3. 18 8月, 2010 1 次提交
    • D
      Make do_execve() take a const filename pointer · d7627467
      David Howells 提交于
      Make do_execve() take a const filename pointer so that kernel_execve() compiles
      correctly on ARM:
      
      arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type
      
      This also requires the argv and envp arguments to be consted twice, once for
      the pointer array and once for the strings the array points to.  This is
      because do_execve() passes a pointer to the filename (now const) to
      copy_strings_kernel().  A simpler alternative would be to cast the filename
      pointer in do_execve() when it's passed to copy_strings_kernel().
      
      do_execve() may not change any of the strings it is passed as part of the argv
      or envp lists as they are some of them in .rodata, so marking these strings as
      const should be fine.
      
      Further kernel_execve() and sys_execve() need to be changed to match.
      
      This has been test built on x86_64, frv, arm and mips.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Tested-by: NRalf Baechle <ralf@linux-mips.org>
      Acked-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d7627467
  4. 14 8月, 2010 1 次提交
  5. 09 7月, 2010 1 次提交
    • B
      powerpc/book3e: Hack to get gdb moving along on Book3E 64-bit · a2e19811
      Benjamin Herrenschmidt 提交于
      Our handling of debug interrupts on Book3E 64-bit is not quite
      the way it should be just yet. This is a workaround to let gdb
      work at least for now. We ensure that when context switching,
      we set the appropriate DBCR0 value for the new task. We also
      make sure that we turn off MSR[DE] within the kernel, and set
      it as part of the bits that get set when going back to userspace.
      
      In the long run, we will probably set the userspace DBCR0 on the
      exception exit code path and ensure we have some proper kernel
      value to set on the way into the kernel, a bit like ppc32 does,
      but that will take more work.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      a2e19811
  6. 29 6月, 2010 1 次提交
  7. 22 6月, 2010 1 次提交
    • K
      powerpc, hw_breakpoints: Implement hw_breakpoints for 64-bit server processors · 5aae8a53
      K.Prasad 提交于
      Implement perf-events based hw-breakpoint interfaces for PowerPC
      64-bit server (Book III S) processors.  This allows access to a
      given location to be used as an event that can be counted or
      profiled by the perf_events subsystem.
      
      This is done using the DABR (data breakpoint register), which can
      also be used for process debugging via ptrace.  When perf_event
      hw_breakpoint support is configured in, the perf_event subsystem
      manages the DABR and arbitrates access to it, and ptrace then
      creates a perf_event when it is requested to set a data breakpoint.
      
      [Adopted suggestions from Paul Mackerras <paulus@samba.org> to
      - emulate_step() all system-wide breakpoints and single-step only the
        per-task breakpoints
      - perform arch-specific cleanup before unregistration through
        arch_unregister_hw_breakpoint()
      ]
      Signed-off-by: NK.Prasad <prasad@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      5aae8a53
  8. 15 6月, 2010 1 次提交
  9. 09 6月, 2010 2 次提交
    • P
      powerpc: Exclude arch_sd_sibiling_asym_packing() on UP · 89275d59
      Peter Zijlstra 提交于
      Only SMP systems care about load-balance features, plus this
      saves some .text space on UP and also fixes the build.
      Reported-by: NMichael Ellerman <michael@ellerman.id.au>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Michael Neuling <mikey@neuling.org>
      LKML-Reference: <tip-76cbd8a8@git.kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      89275d59
    • M
      powerpc: Enable asymmetric SMT scheduling on POWER7 · 76cbd8a8
      Michael Neuling 提交于
      The POWER7 core has dynamic SMT mode switching which is controlled by
      the hypervisor.  There are 3 SMT modes:
      	SMT1 uses thread  0
      	SMT2 uses threads 0 & 1
      	SMT4 uses threads 0, 1, 2 & 3
      When in any particular SMT mode, all threads have the same performance
      as each other (ie. at any moment in time, all threads perform the same).
      
      The SMT mode switching works such that when linux has threads 2 & 3 idle
      and 0 & 1 active, it will cede (H_CEDE hypercall) threads 2 and 3 in the
      idle loop and the hypervisor will automatically switch to SMT2 for that
      core (independent of other cores).  The opposite is not true, so if
      threads 0 & 1 are idle and 2 & 3 are active, we will stay in SMT4 mode.
      
      Similarly if thread 0 is active and threads 1, 2 & 3 are idle, we'll go
      into SMT1 mode.
      
      If we can get the core into a lower SMT mode (SMT1 is best), the threads
      will perform better (since they share less core resources).  Hence when
      we have idle threads, we want them to be the higher ones.
      
      This adds a feature bit for asymmetric packing to powerpc and then
      enables it on POWER7.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: linuxppc-dev@ozlabs.org
      LKML-Reference: <20100608045702.31FB5CC8C7@localhost.localdomain>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      76cbd8a8
  10. 05 5月, 2010 1 次提交
  11. 17 2月, 2010 2 次提交
    • D
      powerpc/booke: Add support for advanced debug registers · 3bffb652
      Dave Kleikamp 提交于
      powerpc/booke: Add support for advanced debug registers
      
      From: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      
      Based on patches originally written by Torez Smith.
      
      This patch defines context switch and trap related functionality
      for BookE specific Debug Registers. It adds support to ptrace()
      for setting and getting BookE related Debug Registers
      Signed-off-by: NDave Kleikamp <shaggy@linux.vnet.ibm.com>
      Cc: Torez Smith  <lnxtorez@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Gibson <dwg@au1.ibm.com>
      Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com>
      Cc: Kumar Gala <galak@kernel.crashing.org>
      Cc: Sergio Durigan Junior <sergiodj@br.ibm.com>
      Cc: Thiago Jung Bauermann <bauerman@br.ibm.com>
      Cc: linuxppc-dev list <Linuxppc-dev@ozlabs.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      3bffb652
    • D
      powerpc/booke: Introduce new CONFIG options for advanced debug registers · 172ae2e7
      Dave Kleikamp 提交于
      powerpc/booke: Introduce new CONFIG options for advanced debug registers
      
      From: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      
      Introduce new config options to simplify the ifdefs pertaining to the
      advanced debug registers for booke and 40x processors:
      
      CONFIG_PPC_ADV_DEBUG_REGS - boolean: true for dac-based processors
      CONFIG_PPC_ADV_DEBUG_IACS - number of IAC registers
      CONFIG_PPC_ADV_DEBUG_DACS - number of DAC registers
      CONFIG_PPC_ADV_DEBUG_DVCS - number of DVC registers
      CONFIG_PPC_ADV_DEBUG_DAC_RANGE - DAC ranges supported
      
      Beginning conservatively, since I only have the facilities to test 440
      hardware.  I believe all 40x and booke platforms support at least 2 IAC
      and 2 DAC registers.  For 440, 4 IAC and 2 DVC registers are enabled, as
      well as the DAC ranges.
      Signed-off-by: NDave Kleikamp <shaggy@linux.vnet.ibm.com>
      Acked-by: NDavid Gibson <dwg@au1.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      172ae2e7
  12. 01 2月, 2010 1 次提交
  13. 27 10月, 2009 1 次提交
    • K
      powerpc: Fix compile errors found by new ppc64e_defconfig · ce7a35c7
      Kumar Gala 提交于
      Fix the following 3 issues:
      
      arch/powerpc/kernel/process.c: In function 'arch_randomize_brk':
      arch/powerpc/kernel/process.c:1183: error: 'mmu_highuser_ssize' undeclared (first use in this function)
      arch/powerpc/kernel/process.c:1183: error: (Each undeclared identifier is reported only once
      arch/powerpc/kernel/process.c:1183: error: for each function it appears in.)
      arch/powerpc/kernel/process.c:1183: error: 'MMU_SEGSIZE_1T' undeclared (first use in this function)
      
      In file included from arch/powerpc/kernel/setup_64.c:60:
      arch/powerpc/include/asm/mmu-hash64.h:132: error: redefinition of 'struct mmu_psize_def'
      arch/powerpc/include/asm/mmu-hash64.h:159: error: expected identifier or '(' before numeric constant
      arch/powerpc/include/asm/mmu-hash64.h:396: error: conflicting types for 'mm_context_t'
      arch/powerpc/include/asm/mmu-book3e.h:184: error: previous declaration of 'mm_context_t' was here
      
      cc1: warnings being treated as errors
      arch/powerpc/kernel/pci_64.c: In function 'pcibios_unmap_io_space':
      arch/powerpc/kernel/pci_64.c:100: error: unused variable 'res'
      Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      ce7a35c7
  14. 14 10月, 2009 1 次提交
    • S
      powerpc/ftrace: show real return addresses in modules · 9135c3cc
      Steven Rostedt 提交于
      When the function graph tracer is enabled, it replaces the return address
      with a hook back to the tracer. This makes back traces see the hook instead
      of the actual return address.
      
      The current code also shows the real address by checking if the return
      address jumps to the return_to_handler. If it is, is also prints out
      the saved real return address.
      
      On powerpc64, some modules may return to mod_return_to_handler, which
      is not checked. This patch will also show the real address if a return
      is to mod_return_to_handler as well.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      9135c3cc
  15. 24 9月, 2009 1 次提交
    • A
      powerpc: Move 64bit heap above 1TB on machines with 1TB segments · 8bbde7a7
      Anton Blanchard 提交于
      If we are using 1TB segments and we are allowed to randomise the heap, we can
      put it above 1TB so it is backed by a 1TB segment. Otherwise the heap will be
      in the bottom 1TB which always uses 256MB segments and this may result in a
      performance penalty.
      
      This functionality is disabled when heap randomisation is turned off:
      
      echo 1 > /proc/sys/kernel/randomize_va_space
      
      which may be useful when trying to allocate the maximum amount of 16M or 16G
      pages.
      
      On a microbenchmark that repeatedly touches 32GB of memory with a stride of
      256MB + 4kB (designed to stress 256MB segments while still mapping nicely into
      the L1 cache), we see the improvement:
      
      Force malloc to use heap all the time:
      # export MALLOC_MMAP_MAX_=0 MALLOC_TRIM_THRESHOLD_=-1
      
      Disable heap randomization:
      # echo 1 > /proc/sys/kernel/randomize_va_space
      # time ./test
      12.51s
      
      Enable heap randomization:
      # echo 2 > /proc/sys/kernel/randomize_va_space
      # time ./test
      1.70s
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      8bbde7a7
  16. 11 9月, 2009 1 次提交
  17. 20 8月, 2009 1 次提交
  18. 26 6月, 2009 1 次提交
  19. 09 6月, 2009 1 次提交
  20. 03 4月, 2009 1 次提交
  21. 23 2月, 2009 4 次提交
  22. 31 12月, 2008 1 次提交
    • M
      [PATCH] idle cputime accounting · 79741dd3
      Martin Schwidefsky 提交于
      The cpu time spent by the idle process actually doing something is
      currently accounted as idle time. This is plain wrong, the architectures
      that support VIRT_CPU_ACCOUNTING=y can do better: distinguish between the
      time spent doing nothing and the time spent by idle doing work. The first
      is accounted with account_idle_time and the second with account_system_time.
      The architectures that use the account_xxx_time interface directly and not
      the account_xxx_ticks interface now need to do the check for the idle
      process in their arch code. In particular to improve the system vs true
      idle time accounting the arch code needs to measure the true idle time
      instead of just testing for the idle process.
      To improve the tick based accounting as well we would need an architecture
      primitive that can tell us if the pt_regs of the interrupted context
      points to the magic instruction that halts the cpu.
      
      In addition idle time is no more added to the stime of the idle process.
      This field now contains the system time of the idle process as it should
      be. On systems without VIRT_CPU_ACCOUNTING this will always be zero as
      every tick that occurs while idle is running will be accounted as idle
      time.
      
      This patch contains the necessary common code changes to be able to
      distinguish idle system time and true idle time. The architectures with
      support for VIRT_CPU_ACCOUNTING need some changes to exploit this.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      79741dd3
  23. 03 12月, 2008 2 次提交
  24. 04 8月, 2008 1 次提交
  25. 28 7月, 2008 1 次提交
  26. 26 7月, 2008 1 次提交
  27. 25 7月, 2008 1 次提交
  28. 15 7月, 2008 1 次提交
  29. 09 7月, 2008 1 次提交
  30. 03 7月, 2008 1 次提交
  31. 01 7月, 2008 2 次提交