1. 29 Jun, 2011 1 commit
    • B
      powerpc/pseries: Re-implement HVSI as part of hvc_vio · 4d2bb3f5
      Benjamin Herrenschmidt committed
      On pseries machines, consoles are provided by the hypervisor using
      a low level get_chars/put_chars type interface. However, this is
      really just a transport to the service processor which implements
      them either as "raw" console (networked consoles, HMC, ...) or as
      "hvsi" serial ports.
      
      The latter is a simple packet protocol on top of the raw character
      interface that is supposed to convey additional "serial port" style
      semantics. In practice however, all it does is provide a way to
      read the CD line and set/clear our DTR line, that's it.
      
      We currently implement the "raw" protocol as an hvc console backend
      (/dev/hvcN) and the "hvsi" protocol using a separate tty driver
      (/dev/hvsi0).
      
      However this is quite impractical. The arbitrary difference between
      the two types of devices has been a major source of user (and distro)
      confusion. There's also an additional mini-hvsi implementation
      in the pseries platform code for our low level debug console and early
      boot kernel messages, which means code duplication, though that low
      level variant is impractical as it's incapable of doing the initial
      protocol negotiation to establish the link to the FSP.
      
      This essentially replaces the dedicated hvsi driver and the platform
      udbg code completely by extending the existing hvc_vio backend used
      in "raw" mode so that:
      
       - It now supports HVSI as well
       - We add support for hvc backend providing tiocm{get,set}
       - It also provides a udbg interface for early debug and boot console
      
      This is overall less code, though this will only be obvious once we
      remove the old "hvsi" driver, which is still available for now. When
      the old driver is enabled, the new code still kicks in for the low
      level udbg console, replacing the old mini implementation in the platform
      code, it just doesn't provide the higher level "hvc" interface.
      
      In addition to producing generally simpler code, this has several benefits
      over our current situation:
      
       - The user/distro only has to deal with /dev/hvcN for the hypervisor
      console, avoiding all sorts of confusion that has plagued us in the past
      
       - The tty, kernel and low level debug console all use the same code
      base which supports the full protocol establishment process, thus the
      console is now available much earlier than it used to be with the
      old HVSI driver. The kernel console works much earlier and udbg is
      available much earlier too. Hackers can enable a hard coded very-early
      debug console as well that works with HVSI (previously that was only
      supported for the "raw" mode).
      
      I've tried to keep the same semantics as hvsi relative to how I react
      to things like CD changes, with some subtle differences though:
      
       - I clear DTR on close if HUPCL is set
      
       - Current hvsi triggers a hangup if it detects an up->down transition
         on CD (you can still open a console with CD down). My new implementation
         triggers a hangup if the link to the FSP is severed, and severs it upon
         detecting an up->down transition on CD.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  2. 19 May, 2011 1 commit
    • N
      powerpc: Ensure dtl buffers do not cross 4k boundary · af442a1b
      Nishanth Aravamudan committed
      Future releases of firmware will enforce a requirement that DTL buffers
      do not cross a 4k boundary. Commit
      127493d5 satisfies this requirement for
      CONFIG_VIRT_CPU_ACCOUNTING=y kernels, but if !CONFIG_VIRT_CPU_ACCOUNTING
      && CONFIG_DTL=y, the current code will fail at dtl registration time.
      Fix this by making the kmem cache from
      127493d5 visible outside of setup.c and
      using the same cache in both dtl.c and setup.c. This requires a bit of
      reorganization to ensure ordering of the kmem cache and buffer
      allocations.
      
      Note: Since firmware now limits the size of the buffer, I made
      dtl_buf_entries read-only in debugfs.
      
      Tested with upcoming firmware with the 4 combinations of
      CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_DTL.
      Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  3. 04 May, 2011 1 commit
  4. 20 Apr, 2011 1 commit
    • B
      powerpc/xics: Rewrite XICS driver · 0b05ac6e
      Benjamin Herrenschmidt committed
      This is a significant rework of the XICS driver, too significant to
      conveniently break it up into a series of smaller patches to be honest.
      
      The driver is moved to a more generic location to allow new platforms
      to use it, and is broken up into separate ICP and ICS "backends". For
      now we have the native and "hypervisor" ICP backends and one common
      RTAS ICS backend.
      
      The driver supports one ICP backend instantiation, and many ICS ones,
      in order to accommodate future platforms with multiple possibly different
      interrupt "sources" mechanisms.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  5. 18 Apr, 2011 1 commit
    • N
      powerpc/pseries: Use a kmem cache for DTL buffers · 127493d5
      Nishanth Aravamudan committed
      PAPR specifies that DTL buffers can not cross AMS environments (aka CMO
      in the PAPR) and can not cross a memory entitlement granule boundary
      (4k). This is found in section 14.11.3.2 H_REGISTER_VPA of the PAPR.
      kmalloc does not guarantee an alignment of the allocation, though,
      beyond 8 bytes (at least in my understanding). Create a special kmem
      cache for DTL buffers with the alignment requirement.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
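To illustrate why 8-byte alignment alone is not enough for the requirement described above, here is a minimal sketch of a boundary check. The helper names and the `GRANULE_SIZE` macro are hypothetical and are not taken from the kernel source; the only figure from the commit message is the 4k granule.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed granule size: the 4k entitlement granule named in the commit message. */
#define GRANULE_SIZE 4096u

/* Hypothetical helper: true if [addr, addr + len) spans more than one granule. */
static bool crosses_granule(uintptr_t addr, size_t len)
{
    if (len == 0)
        return false;
    return (addr / GRANULE_SIZE) != ((addr + len - 1) / GRANULE_SIZE);
}

/* Hypothetical helper: true if addr is a multiple of align.
 * align must be a power of two. */
static bool is_aligned(uintptr_t addr, size_t align)
{
    return (addr & (align - 1)) == 0;
}
```

A buffer that is only 8-byte aligned can start a few bytes before a granule boundary and straddle it, whereas a buffer aligned to its own power-of-two size (at most 4k) cannot. That is the property an allocator with an explicit alignment argument can provide.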
  6. 05 Apr, 2011 1 commit
  7. 29 Mar, 2011 1 commit
  8. 10 Mar, 2011 1 commit
  9. 02 Sep, 2010 1 commit
    • P
      powerpc: Account time using timebase rather than PURR · cf9efce0
      Paul Mackerras committed
      Currently, when CONFIG_VIRT_CPU_ACCOUNTING is enabled, we use the
      PURR register for measuring the user and system time used by
      processes, as well as other related times such as hardirq and
      softirq times.  This turns out to be quite confusing for users
      because it means that a program will often be measured as taking
      less time when run on a multi-threaded processor (SMT2 or SMT4 mode)
      than it does when run on a single-threaded processor (ST mode), even
      though the program takes longer to finish.  The discrepancy is
      accounted for as stolen time, which is also confusing, particularly
      when there are no other partitions running.
      
      This changes the accounting to use the timebase instead, meaning that
      the reported user and system times are the actual number of real-time
      seconds that the program was executing on the processor thread,
      regardless of which SMT mode the processor is in.  Thus a program will
      generally show greater user and system times when run on a
      multi-threaded processor than on a single-threaded processor.
      
      On pSeries systems on POWER5 or later processors, we measure the
      stolen time (time when this partition wasn't running) using the
      hypervisor dispatch trace log.  We check for new entries in the
      log on every entry from user mode and on every transition from
      kernel process context to soft or hard IRQ context (i.e. when
      account_system_vtime() gets called).  So that we can correctly
      distinguish time stolen from user time and time stolen from system
      time, without having to check the log on every exit to user mode,
      we store separate timestamps for exit to user mode and entry from
      user mode.
      
      On systems that have a SPURR (POWER6 and POWER7), we read the SPURR
      in account_system_vtime() (as before), and then apportion the SPURR
      ticks since the last time we read it between scaled user time and
      scaled system time according to the relative proportions of user
      time and system time over the same interval.  This avoids having to
      read the SPURR on every kernel entry and exit.  On systems that have
      PURR but not SPURR (i.e., POWER5), we do the same using the PURR
      rather than the SPURR.
      
      This disables the DTL user interface in /sys/kernel/debug/powerpc/dtl
      for now since it conflicts with the use of the dispatch trace log
      by the time accounting code.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
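The SPURR apportioning step described above can be sketched as a proportional split. This is a simplified model under stated assumptions, not the kernel's accounting code: the struct and function names are hypothetical, the zero-interval fallback is an arbitrary choice, and the multiplication is assumed not to overflow 64 bits.

```c
#include <stdint.h>

/* Hypothetical result type: scaled ticks attributed to each class. */
struct scaled_split {
    uint64_t user;
    uint64_t system;
};

/*
 * Split spurr_delta (ticks since the last read) between user and system
 * in proportion to the user and system time over the same interval.
 * Assumes spurr_delta * user_time fits in 64 bits.
 */
static struct scaled_split apportion_ticks(uint64_t spurr_delta,
                                           uint64_t user_time,
                                           uint64_t system_time)
{
    struct scaled_split out = { 0, 0 };
    uint64_t total = user_time + system_time;

    if (total == 0) {
        /* Assumption for this sketch: with no measured time, charge system. */
        out.system = spurr_delta;
        return out;
    }

    out.user = spurr_delta * user_time / total;
    /* Give the remainder to system so the two parts always sum to the delta. */
    out.system = spurr_delta - out.user;
    return out;
}
```

Computing the system share as the remainder keeps the total exact despite integer division, so no ticks are lost between reads.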
  10. 21 May, 2010 1 commit
    • A
      powerpc: Use smt_snooze_delay=-1 to always busy loop · b878dc00
      Anton Blanchard committed
      Right now if we want to busy loop and not give up any time to the hypervisor
      we put a very large value into smt_snooze_delay. This is sometimes useful
      when running a single partition and you want to avoid any latencies due
      to the hypervisor or CPU power state transitions. While this works, it's a bit
      ugly - how big a number is enough now that we have NO_HZ and can be idle for a very
      long time?
      
      The patch below makes smt_snooze_delay signed, and a negative value means loop
      forever:
      
      echo -1 > /sys/devices/system/cpu/cpu0/smt_snooze_delay
      
      This change shouldn't affect the existing userspace tools (eg ppc64_cpu), but
      I'm cc-ing Nathan just to be sure.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
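The signed semantics described above can be modelled with a small predicate. This is only a sketch of the idea: the function name is hypothetical, the delay and elapsed values are assumed to share the same (unspecified) time unit, and this is not the kernel's idle-loop code.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether an idle CPU should keep busy looping
 * instead of giving time back to the hypervisor.
 * A negative delay means "loop forever"; otherwise loop until the delay expires.
 */
static bool should_keep_snoozing(long snooze_delay, unsigned long long elapsed)
{
    if (snooze_delay < 0)
        return true;
    return elapsed < (unsigned long long)snooze_delay;
}
```

With a signed value there is no need to guess a "large enough" number: -1 expresses the intent directly, and non-negative values keep their existing meaning.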
  11. 30 Mar, 2010 1 commit
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo committed
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  12. 11 Sep, 2009 1 commit
    • P
      powerpc: Fix bug where perf_counters breaks oprofile · a6dbf93a
      Paul Mackerras committed
      Currently there is a bug where if you use oprofile on a pSeries
      machine, then use perf_counters, then use oprofile again, oprofile
      will not work correctly; it will lose the PMU configuration the next
      time the hypervisor does a partition context switch, and thereafter
      won't count anything.
      
      Maynard Johnson identified the sequence causing the problem:
      - oprofile setup calls ppc_enable_pmcs(), which calls
        pseries_lpar_enable_pmcs, which tells the hypervisor that we want
        to use the PMU, and sets the "PMU in use" flag in the lppaca.
        This flag tells the hypervisor whether it needs to save and restore
        the PMU config.
      - The perf_counter code sets and clears the "PMU in use" flag directly
        as it context-switches the PMU between tasks, and leaves it clear
        when it finishes.
      - oprofile setup, called for a new oprofile run, calls ppc_enable_pmcs,
        which does nothing because it has already been called.  In particular
        it doesn't set the "PMU in use" flag.
      
      This fixes the problem by arranging for ppc_enable_pmcs to always set
      the "PMU in use" flag.  It makes the perf_counter code call
      ppc_enable_pmcs also rather than calling the lower-level function
      directly, and removes the setting of the "PMU in use" flag from
      pseries_lpar_enable_pmcs, since that is now done in its caller.
      
      This also removes the declaration of pasemi_enable_pmcs because it
      isn't defined anywhere.
      Reported-by: Maynard Johnson <mpjohn@us.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Cc: <stable@kernel.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
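The failing sequence and the fix described above can be reproduced with a toy state model. All names here are hypothetical stand-ins, and the two booleans are a deliberate simplification of the real hypervisor call and lppaca flag; the sketch only shows why setting the flag unconditionally fixes the second oprofile run.

```c
#include <stdbool.h>

/* Simplified model: one bit for "enable already done", one for the flag. */
struct pmu_state {
    bool hw_enabled;   /* the hypervisor has been told we want the PMU */
    bool in_use_flag;  /* models the "PMU in use" flag */
};

/* Old behaviour (as described): the flag is only set on the first call. */
static void enable_pmcs_old(struct pmu_state *s)
{
    if (s->hw_enabled)
        return;            /* already called: does nothing, flag untouched */
    s->hw_enabled = true;
    s->in_use_flag = true;
}

/* New behaviour (as described): always set the flag, then enable once. */
static void enable_pmcs_new(struct pmu_state *s)
{
    s->in_use_flag = true;
    if (s->hw_enabled)
        return;
    s->hw_enabled = true;
}

/* Models perf_counter leaving the flag clear when it finishes. */
static void perf_counter_finish(struct pmu_state *s)
{
    s->in_use_flag = false;
}
```

Running enable, perf_counter finish, enable again leaves the flag clear with the old variant and set with the new one, which matches the oprofile, perf_counters, oprofile sequence in the report.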
  13. 21 May, 2009 1 commit
  14. 07 Sep, 2008 1 commit
  15. 26 Aug, 2008 1 commit
  16. 18 Aug, 2008 1 commit
  17. 25 Jul, 2008 1 commit
  18. 14 May, 2008 1 commit
  19. 24 Apr, 2008 2 commits
  20. 18 Apr, 2008 3 commits
  21. 17 Apr, 2008 1 commit
  22. 26 Mar, 2008 1 commit
  23. 03 Dec, 2007 1 commit
  24. 20 Nov, 2007 1 commit
    • L
      [POWERPC] Fix RTAS os-term usage on kernel panic · a2b51812
      Linas Vepstas committed
      The rtas_os_term() routine was being called at the wrong time.
      The actual rtas call "os-term" will not ever return, and so
      calling it from the panic notifier is too early.  Instead,
      call it from the machine_reset() call.
      
      This splits the rtas_os_term() routine into two: one part to capture
      the kernel panic message, invoked during the panic notifier, and
      another part that is invoked during machine_reset().
      
      Prior to this patch, the os-term call was never being made,
      because panic_timeout was always non-zero.  Calling os-term
      helps keep the hypervisor happy!  We have to keep the hypervisor
      happy to avoid service, dump and error reporting problems.
      Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  25. 11 Oct, 2007 1 commit
  26. 22 Jul, 2007 1 commit
  27. 16 Jul, 2007 1 commit
  28. 14 Jun, 2007 1 commit
    • J
      [POWERPC] Donate idle CPU cycles on dedicated partitions · d8c391a5
      Jake Moilanen committed
      A Power6 can give up CPU cycles on a dedicated CPU (as opposed to a
      shared CPU) to other shared processors if the administrator asks for it
      (via the HMC).
      
      This enables that to work properly on P6.
      
      This just involves setting a bit in the CAS structure as well as the
      VPA.  To donate cycles, a CPU has to have all SMT threads idle and
      have the donate bit set in the VPA.  Then call H_CEDE.
      
      The reason why shared processors just aren't used is because dedicated
      CPUs are guaranteed an actual processor, yet the system is still able to
      increase the capacity of the shared CPU pool.
      
      Also rename the VPA's cpuctls_task_attrs field to a more accurate name.
      Signed-off-by: Jake Moilanen <moilanen@austin.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  29. 07 May, 2007 1 commit
  30. 27 Apr, 2007 1 commit
  31. 13 Apr, 2007 2 commits
  32. 09 Mar, 2007 1 commit
  33. 17 Feb, 2007 1 commit
  34. 14 Feb, 2007 3 commits