1. 02 9月, 2010 1 次提交
    • P
      powerpc: Account time using timebase rather than PURR · cf9efce0
      Paul Mackerras 提交于
      Currently, when CONFIG_VIRT_CPU_ACCOUNTING is enabled, we use the
      PURR register for measuring the user and system time used by
      processes, as well as other related times such as hardirq and
      softirq times.  This turns out to be quite confusing for users
      because it means that a program will often be measured as taking
      less time when run on a multi-threaded processor (SMT2 or SMT4 mode)
      than it does when run on a single-threaded processor (ST mode), even
      though the program takes longer to finish.  The discrepancy is
      accounted for as stolen time, which is also confusing, particularly
      when there are no other partitions running.
      
      This changes the accounting to use the timebase instead, meaning that
      the reported user and system times are the actual number of real-time
      seconds that the program was executing on the processor thread,
      regardless of which SMT mode the processor is in.  Thus a program will
      generally show greater user and system times when run on a
      multi-threaded processor than on a single-threaded processor.
      
      On pSeries systems on POWER5 or later processors, we measure the
      stolen time (time when this partition wasn't running) using the
      hypervisor dispatch trace log.  We check for new entries in the
      log on every entry from user mode and on every transition from
      kernel process context to soft or hard IRQ context (i.e. when
      account_system_vtime() gets called).  So that we can correctly
      distinguish time stolen from user time and time stolen from system
      time, without having to check the log on every exit to user mode,
      we store separate timestamps for exit to user mode and entry from
      user mode.
      
      On systems that have a SPURR (POWER6 and POWER7), we read the SPURR
      in account_system_vtime() (as before), and then apportion the SPURR
      ticks since the last time we read it between scaled user time and
      scaled system time according to the relative proportions of user
      time and system time over the same interval.  This avoids having to
      read the SPURR on every kernel entry and exit.  On systems that have
      PURR but not SPURR (i.e., POWER5), we do the same using the PURR
      rather than the SPURR.
      
      This disables the DTL user interface in /sys/debug/kernel/powerpc/dtl
      for now since it conflicts with the use of the dispatch trace log
      by the time accounting code.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      cf9efce0
  2. 21 5月, 2010 1 次提交
    • A
      powerpc: Use smt_snooze_delay=-1 to always busy loop · b878dc00
      Anton Blanchard 提交于
      Right now if we want to busy loop and not give up any time to the hypervisor
      we put a very large value into smt_snooze_delay. This is sometimes useful
      when running a single partition and you want to avoid any latencies due
      to the hypervisor or CPU power state transitions. While this works, it's a bit
      ugly - how big a number is enough now we have NO_HZ and can be idle for a very
      long time.
      
      The patch below makes smt_snooze_delay signed, and a negative value means loop
      forever:
      
      echo -1 > /sys/devices/system/cpu/cpu0/smt_snooze_delay
      
      This change shouldn't affect the existing userspace tools (eg ppc64_cpu), but
      I'm cc-ing Nathan just to be sure.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      b878dc00
  3. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  4. 11 9月, 2009 1 次提交
    • P
      powerpc: Fix bug where perf_counters breaks oprofile · a6dbf93a
      Paul Mackerras 提交于
      Currently there is a bug where if you use oprofile on a pSeries
      machine, then use perf_counters, then use oprofile again, oprofile
      will not work correctly; it will lose the PMU configuration the next
      time the hypervisor does a partition context switch, and thereafter
      won't count anything.
      
      Maynard Johnson identified the sequence causing the problem:
      - oprofile setup calls ppc_enable_pmcs(), which calls
        pseries_lpar_enable_pmcs, which tells the hypervisor that we want
        to use the PMU, and sets the "PMU in use" flag in the lppaca.
        This flag tells the hypervisor whether it needs to save and restore
        the PMU config.
      - The perf_counter code sets and clears the "PMU in use" flag directly
        as it context-switches the PMU between tasks, and leaves it clear
        when it finishes.
      - oprofile setup, called for a new oprofile run, calls ppc_enable_pmcs,
        which does nothing because it has already been called.  In particular
        it doesn't set the "PMU in use" flag.
      
      This fixes the problem by arranging for ppc_enable_pmcs to always set
      the "PMU in use" flag.  It makes the perf_counter code call
      ppc_enable_pmcs also rather than calling the lower-level function
      directly, and removes the setting of the "PMU in use" flag from
      pseries_lpar_enable_pmcs, since that is now done in its caller.
      
      This also removes the declaration of pasemi_enable_pmcs because it
      isn't defined anywhere.
      Reported-by: NMaynard Johnson <mpjohn@us.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Cc: <stable@kernel.org)
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      a6dbf93a
  5. 21 5月, 2009 1 次提交
  6. 07 9月, 2008 1 次提交
  7. 26 8月, 2008 1 次提交
  8. 18 8月, 2008 1 次提交
  9. 25 7月, 2008 1 次提交
  10. 14 5月, 2008 1 次提交
  11. 24 4月, 2008 2 次提交
  12. 18 4月, 2008 3 次提交
  13. 17 4月, 2008 1 次提交
  14. 26 3月, 2008 1 次提交
  15. 03 12月, 2007 1 次提交
  16. 20 11月, 2007 1 次提交
    • L
      [POWERPC] Fix RTAS os-term usage on kernel panic · a2b51812
      Linas Vepstas 提交于
      The rtas_os_term() routine was being called at the wrong time.
      The actual rtas call "os-term" will not ever return, and so
      calling it from the panic notifier is too early.  Instead,
      call it from the machine_reset() call.
      
      This splits the rtas_os_term() routine into two: one part to capture
      the kernel panic message, invoked during the panic notifier, and
      another part that is invoked during machine_reset().
      
      Prior to this patch, the os-term call was never being made,
      because panic_timeout was always non-zero.  Calling os-term
      helps keep the hypervisor happy!  We have to keep the hypervisor
      happy to avoid service, dump and error reporting problems.
      Signed-off-by: NLinas Vepstas <linas@austin.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      a2b51812
  17. 11 10月, 2007 1 次提交
  18. 22 7月, 2007 1 次提交
  19. 16 7月, 2007 1 次提交
  20. 14 6月, 2007 1 次提交
    • J
      [POWERPC] Donate idle CPU cycles on dedicated partitions · d8c391a5
      Jake Moilanen 提交于
      A Power6 can give up CPU cycles on a dedicated CPU (as opposed to a
      shared CPU) to other shared processors if the administrator asks for it
      (via the HMC).
      
      This enables that to work properly on P6.
      
      This just involves setting a bit in the CAS structure as well as the
      VPA.  To donate cycles, a CPU has to have all SMT threads idle and
      have the donate bit set in the VPA.  Then call H_CEDE.
      
      The reason why shared processors just aren't used is because dedicated
      CPUs are guaranteed an actual processor, yet the system is still able to
      increase the capacity of the shared CPU pool.
      
      Also rename the VPA's cpuctls_task_attrs field to a more accurate name.
      Signed-off-by: NJake Moilanen <moilanen@austin.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      d8c391a5
  21. 07 5月, 2007 1 次提交
  22. 27 4月, 2007 1 次提交
  23. 13 4月, 2007 2 次提交
  24. 09 3月, 2007 1 次提交
  25. 17 2月, 2007 1 次提交
  26. 14 2月, 2007 3 次提交
  27. 09 12月, 2006 2 次提交
  28. 08 12月, 2006 1 次提交
  29. 04 12月, 2006 2 次提交
    • L
      [POWERPC] Wrap cpu_die() with CONFIG_HOTPLUG_CPU · 088df4d2
      Linas Vepstas 提交于
      Per email discussion, it appears that rtas_stop_self()
      and pSeries_mach_cpu_die() should not be compiled if
      CONFIG_HOTPLUG_CPU is not defined. This patch adds
      #ifdefs around these bits of code.
      Signed-off-by: NLinas Vepstas <linas@austin.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      088df4d2
    • B
      [POWERPC] Make pci_read_irq_line the default · f90bb153
      Benjamin Herrenschmidt 提交于
      This patch reworks the way IRQs are fixed up on PCI for arch powerpc.
      
      It makes pci_read_irq_line() called by default in the PCI code for
      devices that are probed, and add an optional per-device fixup in
      ppc_md for platforms that really need to correct what they obtain
      from pci_read_irq_line().
      
      It also removes ppc_md.irq_bus_setup which was only used by pSeries
      and should not be needed anymore.
      
      I've also removed the pSeries s7a workaround as it can't work with
      the current interrupt code anyway. I'm trying to get one of these
      machines working so I can test a proper fix for that problem.
      
      I also haven't updated the old-style fixup code from 85xx_cds.c
      because it's actually buggy :) It assigns pci_dev->irq hard coded
      numbers which is no good with the new IRQ mapping code. It should
      at least use irq_create_mapping(NULL, hard_coded_number); and possibly
      also set_irq_type() to set them as level low.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      f90bb153
  30. 07 10月, 2006 1 次提交
  31. 05 10月, 2006 1 次提交
    • D
      IRQ: Maintain regs pointer globally rather than passing to IRQ handlers · 7d12e780
      David Howells 提交于
      Maintain a per-CPU global "struct pt_regs *" variable which can be used instead
      of passing regs around manually through all ~1800 interrupt handlers in the
      Linux kernel.
      
      The regs pointer is used in few places, but it potentially costs both stack
      space and code to pass it around.  On the FRV arch, removing the regs parameter
      from all the genirq function results in a 20% speed up of the IRQ exit path
      (ie: from leaving timer_interrupt() to leaving do_IRQ()).
      
      Where appropriate, an arch may override the generic storage facility and do
      something different with the variable.  On FRV, for instance, the address is
      maintained in GR28 at all times inside the kernel as part of general exception
      handling.
      
      Having looked over the code, it appears that the parameter may be handed down
      through up to twenty or so layers of functions.  Consider a USB character
      device attached to a USB hub, attached to a USB controller that posts its
      interrupts through a cascaded auxiliary interrupt controller.  A character
      device driver may want to pass regs to the sysrq handler through the input
      layer which adds another few layers of parameter passing.
      
      I've build this code with allyesconfig for x86_64 and i386.  I've runtested the
      main part of the code on FRV and i386, though I can't test most of the drivers.
      I've also done partial conversion for powerpc and MIPS - these at least compile
      with minimal configurations.
      
      This will affect all archs.  Mostly the changes should be relatively easy.
      Take do_IRQ(), store the regs pointer at the beginning, saving the old one:
      
      	struct pt_regs *old_regs = set_irq_regs(regs);
      
      And put the old one back at the end:
      
      	set_irq_regs(old_regs);
      
      Don't pass regs through to generic_handle_irq() or __do_IRQ().
      
      In timer_interrupt(), this sort of change will be necessary:
      
      	-	update_process_times(user_mode(regs));
      	-	profile_tick(CPU_PROFILING, regs);
      	+	update_process_times(user_mode(get_irq_regs()));
      	+	profile_tick(CPU_PROFILING);
      
      I'd like to move update_process_times()'s use of get_irq_regs() into itself,
      except that i386, alone of the archs, uses something other than user_mode().
      
      Some notes on the interrupt handling in the drivers:
      
       (*) input_dev() is now gone entirely.  The regs pointer is no longer stored in
           the input_dev struct.
      
       (*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking.  It does
           something different depending on whether it's been supplied with a regs
           pointer or not.
      
       (*) Various IRQ handler function pointers have been moved to type
           irq_handler_t.
      Signed-Off-By: NDavid Howells <dhowells@redhat.com>
      (cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)
      7d12e780
  32. 04 10月, 2006 1 次提交