1. 28 1月, 2013 1 次提交
    • F
      cputime: Generic on-demand virtual cputime accounting · abf917cd
      Frederic Weisbecker 提交于
      If we want to stop the tick further idle, we need to be
      able to account the cputime without using the tick.
      
      Virtual based cputime accounting solves that problem by
      hooking into kernel/user boundaries.
      
      However implementing CONFIG_VIRT_CPU_ACCOUNTING require
      low level hooks and involves more overhead. But we already
      have a generic context tracking subsystem that is required
      for RCU needs by archs which plan to shut down the tick
      outside idle.
      
      This patch implements a generic virtual based cputime
      accounting that relies on these generic kernel/user hooks.
      
      There are some upsides of doing this:
      
      - This requires no arch code to implement CONFIG_VIRT_CPU_ACCOUNTING
      if context tracking is already built (already necessary for RCU in full
      tickless mode).
      
      - We can rely on the generic context tracking subsystem to dynamically
      (de)activate the hooks, so that we can switch anytime between virtual
      and tick based accounting. This way we don't have the overhead
      of the virtual accounting when the tick is running periodically.
      
      And one downside:
      
      - There is probably more overhead than a native virtual based cputime
      accounting. But this relies on hooks that are already set anyway.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      abf917cd
  2. 15 11月, 2012 3 次提交
  3. 10 9月, 2012 3 次提交
  4. 07 9月, 2012 1 次提交
  5. 09 3月, 2012 2 次提交
    • G
      powerpc/eeh: Introduce EEH device · eb740b5f
      Gavin Shan 提交于
      Original EEH implementation depends on struct pci_dn heavily. However,
      EEH shouldn't depend on that actually because EEH needn't share much
      information with other PCI components. That's to say, EEH should have
      worked independently.
      
      The patch introduces struct eeh_dev so that EEH core components needn't
      be working based on struct pci_dn in future. Also, struct pci_dn, struct
      eeh_dev instances are created in dynamic fasion and the binding with EEH
      device, OF node, PCI device is implemented as well.
      
      The EEH devices are created after PHBs are detected and initialized, but
      PCI emunation hasn't started yet. Apart from that, PHB might be created
      dynamically through DLPAR component and the EEH devices should be creatd
      as well. Another case might be OF node is created dynamically by DR
      (Dynamic Reconfiguration), which has been defined by PAPR. For those OF
      nodes created by DR, EEH devices should be also created accordingly. The
      binding between EEH device and OF node is done while the EEH device is
      initially created.
      
      The binding between EEH device and PCI device should be done after PCI
      emunation is done. Besides, PCI hotplug also needs the binding so that
      the EEH devices could be traced from the newly coming PCI buses or PCI
      devices.
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      eb740b5f
    • G
      powerpc/eeh: Platform dependent EEH operations · aa1e6374
      Gavin Shan 提交于
      EEH has been implemented on RTAS-compliant pSeries platform.
      That's to say, the EEH operations will be implemented through RTAS
      calls eventually. The situation limited feasible extension on EEH.
      In order to support EEH on multiple platforms like pseries and powernv
      simutaneously. We have to split the platform dependent EEH options
      up out of current implementation.
      
      The patch addresses supporting EEH on multiple platforms. The pseries
      platform dependent EEH operations will be abstracted by struct eeh_ops.
      EEH core components will be built based on the registered EEH operations.
      With the mechanism, what the individual platform needs to do is implement
      platform dependent EEH operations.
      
      For now, the pseries platform is covered under the mechanism. That means
      we have to think about other platforms to support EEH, like powernv.
      Besides, we only have framework for the mechanism and we have to implement
      it for pseries platform later.
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      aa1e6374
  6. 24 2月, 2012 2 次提交
  7. 23 2月, 2012 2 次提交
    • K
      powerpc/mpic: Remove duplicate MPIC_WANTS_RESET flag · e55d7f73
      Kyle Moffett 提交于
      There are two separate flags controlling whether or not the MPIC is
      reset during initialization, which is completely unnecessary, and only
      one of them can be specified in the device tree.
      
      Also, most platforms in-tree right now do actually want to reset the
      MPIC during initialization anyways, which means lots of duplicate code
      passing the MPIC_WANTS_RESET flag.
      
      Fix all of the callers which currently do not pass the MPIC_WANTS_RESET
      flag to pass the MPIC_NO_RESET flag, then remove the MPIC_WANTS_RESET
      flag and make the code reset the MPIC by default.
      Signed-off-by: NKyle Moffett <Kyle.D.Moffett@boeing.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      e55d7f73
    • K
      powerpc/mpic: Remove MPIC_BROKEN_FRR_NIRQS and duplicate irq_count · 5019609f
      Kyle Moffett 提交于
      The mpic->irq_count variable is only used as a software error-checking
      limit to determine whether or not an IRQ number is valid.  In board code
      which does not manually specify an IRQ count to mpic_alloc(), i.e. 0, it
      is automatically detected from the number of ISUs and the ISU size.
      
      In practice, all hardware ends up with irq_count == num_sources, so all
      of the runtime checks on mpic->irq_count should just check the value of
      mpic->num_sources instead.
      
      When platform hardware does not correctly report the number of IRQs,
      which only happens on the MPC85xx/MPC86xx, the MPIC_BROKEN_FRR_NIRQS
      flag is used to override the detected value of num_sources with the
      manual irq_count parameter.  Since there's no need to manually specify
      the number of IRQs except in this case, the extra flag can be eliminated
      and the test changed to "irq_count != 0".
      Signed-off-by: NKyle Moffett <Kyle.D.Moffett@boeing.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      5019609f
  8. 08 12月, 2011 3 次提交
  9. 07 12月, 2011 1 次提交
  10. 01 11月, 2011 1 次提交
  11. 05 8月, 2011 1 次提交
  12. 29 6月, 2011 1 次提交
    • B
      powerpc/pseries: Re-implement HVSI as part of hvc_vio · 4d2bb3f5
      Benjamin Herrenschmidt 提交于
      On pseries machines, consoles are provided by the hypervisor using
      a low level get_chars/put_chars type interface. However, this is
      really just a transport to the service processor which implements
      them either as "raw" console (networked consoles, HMC, ...) or as
      "hvsi" serial ports.
      
      The later is a simple packet protocol on top of the raw character
      interface that is supposed to convey additional "serial port" style
      semantics. In practice however, all it does is provide a way to
      read the CD line and set/clear our DTR line, that's it.
      
      We currently implement the "raw" protocol as an hvc console backend
      (/dev/hvcN) and the "hvsi" protocol using a separate tty driver
      (/dev/hvsi0).
      
      However this is quite impractical. The arbitrary difference between
      the two type of devices has been a major source of user (and distro)
      confusion. Additionally, there's an additional mini -hvsi implementation
      in the pseries platform code for our low level debug console and early
      boot kernel messages, which means code duplication, though that low
      level variant is impractical as it's incapable of doing the initial
      protocol negociation to establish the link to the FSP.
      
      This essentially replaces the dedicated hvsi driver and the platform
      udbg code completely by extending the existing hvc_vio backend used
      in "raw" mode so that:
      
       - It now supports HVSI as well
       - We add support for hvc backend providing tiocm{get,set}
       - It also provides a udbg interface for early debug and boot console
      
      This is overall less code, though this will only be obvious once we
      remove the old "hvsi" driver, which is still available for now. When
      the old driver is enabled, the new code still kicks in for the low
      level udbg console, replacing the old mini implementation in the platform
      code, it just doesn't provide the higher level "hvc" interface.
      
      In addition to producing generally simler code, this has several benefits
      over our current situation:
      
       - The user/distro only has to deal with /dev/hvcN for the hypervisor
      console, avoiding all sort of confusion that has plagued us in the past
      
       - The tty, kernel and low level debug console all use the same code
      base which supports the full protocol establishment process, thus the
      console is now available much earlier than it used to be with the
      old HVSI driver. The kernel console works much earlier and udbg is
      available much earlier too. Hackers can enable a hard coded very-early
      debug console as well that works with HVSI (previously that was only
      supported for the "raw" mode).
      
      I've tried to keep the same semantics as hvsi relative to how I react
      to things like CD changes, with some subtle differences though:
      
       - I clear DTR on close if HUPCL is set
      
       - Current hvsi triggers a hangup if it detects a up->down transition
         on CD (you can still open a console with CD down). My new implementation
         triggers a hangup if the link to the FSP is severed, and severs it upon
         detecting a up->down transition on CD.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      4d2bb3f5
  13. 19 5月, 2011 1 次提交
    • N
      powerpc: Ensure dtl buffers do not cross 4k boundary · af442a1b
      Nishanth Aravamudan 提交于
      Future releases of fimrware will enforce a requirement that DTL buffers
      do not cross a 4k boundary. Commit
      127493d5 satisfies this requirement for
      CONFIG_VIRT_CPU_ACCOUNTING=y kernels, but if !CONFIG_VIRT_CPU_ACCOUNTING
      && CONFIG_DTL=y, the current code will fail at dtl registration time.
      Fix this by making the kmem cache from
      127493d5 visible outside of setup.c and
      using the same cache in both dtl.c and setup.c. This requires a bit of
      reorganization to ensure ordering of the kmem cache and buffer
      allocations.
      
      Note: Since firmware now limits the size of the buffer, I made
      dtl_buf_entries read-only in debugfs.
      
      Tested with upcoming firmware with the 4 combinations of
      CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_DTL.
      Signed-off-by: NNishanth Aravamudan <nacc@us.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      af442a1b
  14. 04 5月, 2011 1 次提交
  15. 20 4月, 2011 1 次提交
    • B
      powerpc/xics: Rewrite XICS driver · 0b05ac6e
      Benjamin Herrenschmidt 提交于
      This is a significant rework of the XICS driver, too significant to
      conveniently break it up into a series of smaller patches to be honest.
      
      The driver is moved to a more generic location to allow new platforms
      to use it, and is broken up into separate ICP and ICS "backends". For
      now we have the native and "hypervisor" ICP backends and one common
      RTAS ICS backend.
      
      The driver supports one ICP backend instanciation, and many ICS ones,
      in order to accomodate future platforms with multiple possibly different
      interrupt "sources" mechanisms.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      0b05ac6e
  16. 18 4月, 2011 1 次提交
    • N
      powerpc/pseries: Use a kmem cache for DTL buffers · 127493d5
      Nishanth Aravamudan 提交于
      PAPR specifies that DTL buffers can not cross AMS environments (aka CMO
      in the PAPR) and can not cross a memory entitlement granule boundary
      (4k). This is found in section 14.11.3.2 H_REGISTER_VPA of the PAPR.
      kmalloc does not guarantee an alignment of the allocation, though,
      beyond 8 bytes (at least in my understanding). Create a special kmem
      cache for DTL buffers with the alignment requirement.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      127493d5
  17. 05 4月, 2011 1 次提交
  18. 29 3月, 2011 1 次提交
  19. 10 3月, 2011 1 次提交
  20. 02 9月, 2010 1 次提交
    • P
      powerpc: Account time using timebase rather than PURR · cf9efce0
      Paul Mackerras 提交于
      Currently, when CONFIG_VIRT_CPU_ACCOUNTING is enabled, we use the
      PURR register for measuring the user and system time used by
      processes, as well as other related times such as hardirq and
      softirq times.  This turns out to be quite confusing for users
      because it means that a program will often be measured as taking
      less time when run on a multi-threaded processor (SMT2 or SMT4 mode)
      than it does when run on a single-threaded processor (ST mode), even
      though the program takes longer to finish.  The discrepancy is
      accounted for as stolen time, which is also confusing, particularly
      when there are no other partitions running.
      
      This changes the accounting to use the timebase instead, meaning that
      the reported user and system times are the actual number of real-time
      seconds that the program was executing on the processor thread,
      regardless of which SMT mode the processor is in.  Thus a program will
      generally show greater user and system times when run on a
      multi-threaded processor than on a single-threaded processor.
      
      On pSeries systems on POWER5 or later processors, we measure the
      stolen time (time when this partition wasn't running) using the
      hypervisor dispatch trace log.  We check for new entries in the
      log on every entry from user mode and on every transition from
      kernel process context to soft or hard IRQ context (i.e. when
      account_system_vtime() gets called).  So that we can correctly
      distinguish time stolen from user time and time stolen from system
      time, without having to check the log on every exit to user mode,
      we store separate timestamps for exit to user mode and entry from
      user mode.
      
      On systems that have a SPURR (POWER6 and POWER7), we read the SPURR
      in account_system_vtime() (as before), and then apportion the SPURR
      ticks since the last time we read it between scaled user time and
      scaled system time according to the relative proportions of user
      time and system time over the same interval.  This avoids having to
      read the SPURR on every kernel entry and exit.  On systems that have
      PURR but not SPURR (i.e., POWER5), we do the same using the PURR
      rather than the SPURR.
      
      This disables the DTL user interface in /sys/debug/kernel/powerpc/dtl
      for now since it conflicts with the use of the dispatch trace log
      by the time accounting code.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      cf9efce0
  21. 21 5月, 2010 1 次提交
    • A
      powerpc: Use smt_snooze_delay=-1 to always busy loop · b878dc00
      Anton Blanchard 提交于
      Right now if we want to busy loop and not give up any time to the hypervisor
      we put a very large value into smt_snooze_delay. This is sometimes useful
      when running a single partition and you want to avoid any latencies due
      to the hypervisor or CPU power state transitions. While this works, it's a bit
      ugly - how big a number is enough now we have NO_HZ and can be idle for a very
      long time.
      
      The patch below makes smt_snooze_delay signed, and a negative value means loop
      forever:
      
      echo -1 > /sys/devices/system/cpu/cpu0/smt_snooze_delay
      
      This change shouldn't affect the existing userspace tools (eg ppc64_cpu), but
      I'm cc-ing Nathan just to be sure.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      b878dc00
  22. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  23. 11 9月, 2009 1 次提交
    • P
      powerpc: Fix bug where perf_counters breaks oprofile · a6dbf93a
      Paul Mackerras 提交于
      Currently there is a bug where if you use oprofile on a pSeries
      machine, then use perf_counters, then use oprofile again, oprofile
      will not work correctly; it will lose the PMU configuration the next
      time the hypervisor does a partition context switch, and thereafter
      won't count anything.
      
      Maynard Johnson identified the sequence causing the problem:
      - oprofile setup calls ppc_enable_pmcs(), which calls
        pseries_lpar_enable_pmcs, which tells the hypervisor that we want
        to use the PMU, and sets the "PMU in use" flag in the lppaca.
        This flag tells the hypervisor whether it needs to save and restore
        the PMU config.
      - The perf_counter code sets and clears the "PMU in use" flag directly
        as it context-switches the PMU between tasks, and leaves it clear
        when it finishes.
      - oprofile setup, called for a new oprofile run, calls ppc_enable_pmcs,
        which does nothing because it has already been called.  In particular
        it doesn't set the "PMU in use" flag.
      
      This fixes the problem by arranging for ppc_enable_pmcs to always set
      the "PMU in use" flag.  It makes the perf_counter code call
      ppc_enable_pmcs also rather than calling the lower-level function
      directly, and removes the setting of the "PMU in use" flag from
      pseries_lpar_enable_pmcs, since that is now done in its caller.
      
      This also removes the declaration of pasemi_enable_pmcs because it
      isn't defined anywhere.
      Reported-by: NMaynard Johnson <mpjohn@us.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Cc: <stable@kernel.org)
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      a6dbf93a
  24. 21 5月, 2009 1 次提交
  25. 07 9月, 2008 1 次提交
  26. 26 8月, 2008 1 次提交
  27. 18 8月, 2008 1 次提交
  28. 25 7月, 2008 1 次提交
  29. 14 5月, 2008 1 次提交
  30. 24 4月, 2008 2 次提交