1. 11 11月, 2008 1 次提交
    • F
      tracing, x86: add low level support for ftrace return tracing · caf4b323
      Frederic Weisbecker 提交于
      Impact: add infrastructure for function-return tracing
      
      Add low level support for ftrace return tracing.
      
      This plug-in stores return addresses on the thread_info structure of
      the current task.
      
      The index of the current return address is initialized when the task
      is the first one (init) and when a process forks (the child). It is
      not needed when a task does a sys_execve because after this syscall,
      it still needs to return on the kernel functions it called.
      
      Note that the code of return_to_handler has been suggested by Steven
      Rostedt as almost all of the ideas of improvements in this V3.
      
      For purpose of security, arch/x86/kernel/process_32.c is not traced
      because __switch_to() changes the current task during its execution.
      That could cause inconsistency in the stored return address of this
      function even if I didn't have any crash after testing with tracing on
      this function enabled.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      caf4b323
  2. 21 10月, 2008 1 次提交
  3. 20 10月, 2008 2 次提交
    • M
      container freezer: implement freezer cgroup subsystem · dc52ddc0
      Matt Helsley 提交于
      This patch implements a new freezer subsystem in the control groups
      framework.  It provides a way to stop and resume execution of all tasks in
      a cgroup by writing in the cgroup filesystem.
      
      The freezer subsystem in the container filesystem defines a file named
      freezer.state.  Writing "FROZEN" to the state file will freeze all tasks
      in the cgroup.  Subsequently writing "RUNNING" will unfreeze the tasks in
      the cgroup.  Reading will return the current state.
      
      * Examples of usage :
      
         # mkdir /containers/freezer
         # mount -t cgroup -ofreezer freezer  /containers
         # mkdir /containers/0
         # echo $some_pid > /containers/0/tasks
      
      to get status of the freezer subsystem :
      
         # cat /containers/0/freezer.state
         RUNNING
      
      to freeze all tasks in the container :
      
         # echo FROZEN > /containers/0/freezer.state
         # cat /containers/0/freezer.state
         FREEZING
         # cat /containers/0/freezer.state
         FROZEN
      
      to unfreeze all tasks in the container :
      
         # echo RUNNING > /containers/0/freezer.state
         # cat /containers/0/freezer.state
         RUNNING
      
      This is the basic mechanism which should do the right thing for user space
      task in a simple scenario.
      
      It's important to note that freezing can be incomplete.  In that case we
      return EBUSY.  This means that some tasks in the cgroup are busy doing
      something that prevents us from completely freezing the cgroup at this
      time.  After EBUSY, the cgroup will remain partially frozen -- reflected
      by freezer.state reporting "FREEZING" when read.  The state will remain
      "FREEZING" until one of these things happens:
      
      	1) Userspace cancels the freezing operation by writing "RUNNING" to
      		the freezer.state file
      	2) Userspace retries the freezing operation by writing "FROZEN" to
      		the freezer.state file (writing "FREEZING" is not legal
      		and returns EIO)
      	3) The tasks that blocked the cgroup from entering the "FROZEN"
      		state disappear from the cgroup's set of tasks.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: export thaw_process]
      Signed-off-by: NCedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: NMatt Helsley <matthltc@us.ibm.com>
      Acked-by: NSerge E. Hallyn <serue@us.ibm.com>
      Tested-by: NMatt Helsley <matthltc@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dc52ddc0
    • M
      container freezer: make refrigerator always available · 8174f150
      Matt Helsley 提交于
      Now that the TIF_FREEZE flag is available in all architectures, extract
      the refrigerator() and freeze_task() from kernel/power/process.c and make
      it available to all.
      
      The refrigerator() can now be used in a control group subsystem
      implementing a control group freezer.
      Signed-off-by: NCedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: NMatt Helsley <matthltc@us.ibm.com>
      Acked-by: NSerge E. Hallyn <serue@us.ibm.com>
      Tested-by: NMatt Helsley <matthltc@us.ibm.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8174f150
  4. 14 10月, 2008 1 次提交
    • M
      tracing: Kernel Tracepoints · 97e1c18e
      Mathieu Desnoyers 提交于
      Implementation of kernel tracepoints. Inspired from the Linux Kernel
      Markers. Allows complete typing verification by declaring both tracing
      statement inline functions and probe registration/unregistration static
      inline functions within the same macro "DEFINE_TRACE". No format string
      is required. See the tracepoint Documentation and Samples patches for
      usage examples.
      
      Taken from the documentation patch :
      
      "A tracepoint placed in code provides a hook to call a function (probe)
      that you can provide at runtime. A tracepoint can be "on" (a probe is
      connected to it) or "off" (no probe is attached). When a tracepoint is
      "off" it has no effect, except for adding a tiny time penalty (checking
      a condition for a branch) and space penalty (adding a few bytes for the
      function call at the end of the instrumented function and adds a data
      structure in a separate section).  When a tracepoint is "on", the
      function you provide is called each time the tracepoint is executed, in
      the execution context of the caller. When the function provided ends its
      execution, it returns to the caller (continuing from the tracepoint
      site).
      
      You can put tracepoints at important locations in the code. They are
      lightweight hooks that can pass an arbitrary number of parameters, which
      prototypes are described in a tracepoint declaration placed in a header
      file."
      
      Addition and removal of tracepoints is synchronized by RCU using the
      scheduler (and preempt_disable) as guarantees to find a quiescent state
      (this is really RCU "classic"). The update side uses rcu_barrier_sched()
      with call_rcu_sched() and the read/execute side uses
      "preempt_disable()/preempt_enable()".
      
      We make sure the previous array containing probes, which has been
      scheduled for deletion by the rcu callback, is indeed freed before we
      proceed to the next update. It therefore limits the rate of modification
      of a single tracepoint to one update per RCU period. The objective here
      is to permit fast batch add/removal of probes on _different_
      tracepoints.
      
      Changelog :
      - Use #name ":" #proto as string to identify the tracepoint in the
        tracepoint table. This will make sure not type mismatch happens due to
        connexion of a probe with the wrong type to a tracepoint declared with
        the same name in a different header.
      - Add tracepoint_entry_free_old.
      - Change __TO_TRACE to get rid of the 'i' iterator.
      
      Masami Hiramatsu <mhiramat@redhat.com> :
      Tested on x86-64.
      
      Performance impact of a tracepoint : same as markers, except that it
      adds about 70 bytes of instructions in an unlikely branch of each
      instrumented function (the for loop, the stack setup and the function
      call). It currently adds a memory read, a test and a conditional branch
      at the instrumentation site (in the hot path). Immediate values will
      eventually change this into a load immediate, test and branch, which
      removes the memory read which will make the i-cache impact smaller
      (changing the memory read for a load immediate removes 3-4 bytes per
      site on x86_32 (depending on mov prefixes), or 7-8 bytes on x86_64, it
      also saves the d-cache hit).
      
      About the performance impact of tracepoints (which is comparable to
      markers), even without immediate values optimizations, tests done by
      Hideo Aoki on ia64 show no regression. His test case was using hackbench
      on a kernel where scheduler instrumentation (about 5 events in code
      scheduler code) was added.
      
      Quoting Hideo Aoki about Markers :
      
      I evaluated overhead of kernel marker using linux-2.6-sched-fixes git
      tree, which includes several markers for LTTng, using an ia64 server.
      
      While the immediate trace mark feature isn't implemented on ia64, there
      is no major performance regression. So, I think that we don't have any
      issues to propose merging marker point patches into Linus's tree from
      the viewpoint of performance impact.
      
      I prepared two kernels to evaluate. The first one was compiled without
      CONFIG_MARKERS. The second one was enabled CONFIG_MARKERS.
      
      I downloaded the original hackbench from the following URL:
      http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c
      
      I ran hackbench 5 times in each condition and calculated the average and
      difference between the kernels.
      
          The parameter of hackbench: every 50 from 50 to 800
          The number of CPUs of the server: 2, 4, and 8
      
      Below is the results. As you can see, major performance regression
      wasn't found in any case. Even if number of processes increases,
      differences between marker-enabled kernel and marker- disabled kernel
      doesn't increase. Moreover, if number of CPUs increases, the differences
      doesn't increase either.
      
      Curiously, marker-enabled kernel is better than marker-disabled kernel
      in more than half cases, although I guess it comes from the difference
      of memory access pattern.
      
      * 2 CPUs
      
      Number of | without      | with         | diff     | diff    |
      processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
      --------------------------------------------------------------
             50 |      4.811   |       4.872  |  +0.061  |  +1.27  |
            100 |      9.854   |      10.309  |  +0.454  |  +4.61  |
            150 |     15.602   |      15.040  |  -0.562  |  -3.6   |
            200 |     20.489   |      20.380  |  -0.109  |  -0.53  |
            250 |     25.798   |      25.652  |  -0.146  |  -0.56  |
            300 |     31.260   |      30.797  |  -0.463  |  -1.48  |
            350 |     36.121   |      35.770  |  -0.351  |  -0.97  |
            400 |     42.288   |      42.102  |  -0.186  |  -0.44  |
            450 |     47.778   |      47.253  |  -0.526  |  -1.1   |
            500 |     51.953   |      52.278  |  +0.325  |  +0.63  |
            550 |     58.401   |      57.700  |  -0.701  |  -1.2   |
            600 |     63.334   |      63.222  |  -0.112  |  -0.18  |
            650 |     68.816   |      68.511  |  -0.306  |  -0.44  |
            700 |     74.667   |      74.088  |  -0.579  |  -0.78  |
            750 |     78.612   |      79.582  |  +0.970  |  +1.23  |
            800 |     85.431   |      85.263  |  -0.168  |  -0.2   |
      --------------------------------------------------------------
      
      * 4 CPUs
      
      Number of | without      | with         | diff     | diff    |
      processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
      --------------------------------------------------------------
             50 |      2.586   |       2.584  |  -0.003  |  -0.1   |
            100 |      5.254   |       5.283  |  +0.030  |  +0.56  |
            150 |      8.012   |       8.074  |  +0.061  |  +0.76  |
            200 |     11.172   |      11.000  |  -0.172  |  -1.54  |
            250 |     13.917   |      14.036  |  +0.119  |  +0.86  |
            300 |     16.905   |      16.543  |  -0.362  |  -2.14  |
            350 |     19.901   |      20.036  |  +0.135  |  +0.68  |
            400 |     22.908   |      23.094  |  +0.186  |  +0.81  |
            450 |     26.273   |      26.101  |  -0.172  |  -0.66  |
            500 |     29.554   |      29.092  |  -0.461  |  -1.56  |
            550 |     32.377   |      32.274  |  -0.103  |  -0.32  |
            600 |     35.855   |      35.322  |  -0.533  |  -1.49  |
            650 |     39.192   |      38.388  |  -0.804  |  -2.05  |
            700 |     41.744   |      41.719  |  -0.025  |  -0.06  |
            750 |     45.016   |      44.496  |  -0.520  |  -1.16  |
            800 |     48.212   |      47.603  |  -0.609  |  -1.26  |
      --------------------------------------------------------------
      
      * 8 CPUs
      
      Number of | without      | with         | diff     | diff    |
      processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
      --------------------------------------------------------------
             50 |      2.094   |       2.072  |  -0.022  |  -1.07  |
            100 |      4.162   |       4.273  |  +0.111  |  +2.66  |
            150 |      6.485   |       6.540  |  +0.055  |  +0.84  |
            200 |      8.556   |       8.478  |  -0.078  |  -0.91  |
            250 |     10.458   |      10.258  |  -0.200  |  -1.91  |
            300 |     12.425   |      12.750  |  +0.325  |  +2.62  |
            350 |     14.807   |      14.839  |  +0.032  |  +0.22  |
            400 |     16.801   |      16.959  |  +0.158  |  +0.94  |
            450 |     19.478   |      19.009  |  -0.470  |  -2.41  |
            500 |     21.296   |      21.504  |  +0.208  |  +0.98  |
            550 |     23.842   |      23.979  |  +0.137  |  +0.57  |
            600 |     26.309   |      26.111  |  -0.198  |  -0.75  |
            650 |     28.705   |      28.446  |  -0.259  |  -0.9   |
            700 |     31.233   |      31.394  |  +0.161  |  +0.52  |
            750 |     34.064   |      33.720  |  -0.344  |  -1.01  |
            800 |     36.320   |      36.114  |  -0.206  |  -0.57  |
      --------------------------------------------------------------
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Acked-by: N'Peter Zijlstra' <peterz@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      97e1c18e
  5. 26 7月, 2008 1 次提交
    • A
      build kernel/profile.o only when requested · b03f6489
      Adrian Bunk 提交于
      Build kernel/profile.o only if CONFIG_PROFILING is enabled.
      
      This makes CONFIG_PROFILING=n kernels smaller.
      
      As a bonus, some profile_tick() calls and one branch from schedule() are
      now eliminated with CONFIG_PROFILING=n (but I doubt these are
      measurable effects).
      
      This patch changes the effects of CONFIG_PROFILING=n, but I don't think
      having more than two choices would be the better choice.
      
      This patch also adds the name of the first parameter to the prototypes
      of profile_{hits,tick}() since I anyway had to add them for the dummy
      functions.
      Signed-off-by: NAdrian Bunk <bunk@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b03f6489
  6. 18 7月, 2008 1 次提交
  7. 17 7月, 2008 1 次提交
  8. 11 7月, 2008 1 次提交
  9. 30 6月, 2008 1 次提交
  10. 26 6月, 2008 1 次提交
    • J
      Add generic helpers for arch IPI function calls · 3d442233
      Jens Axboe 提交于
      This adds kernel/smp.c which contains helpers for IPI function calls. In
      addition to supporting the existing smp_call_function() in a more efficient
      manner, it also adds a more scalable variant called smp_call_function_single()
      for calling a given function on a single CPU only.
      
      The core of this is based on the x86-64 patch from Nick Piggin, lots of
      changes since then. "Alan D. Brunelle" <Alan.Brunelle@hp.com> has
      contributed lots of fixes and suggestions as well. Also thanks to
      Paul E. McKenney <paulmck@linux.vnet.ibm.com> for reviewing RCU usage
      and getting rid of the data allocation fallback deadlock.
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      3d442233
  11. 06 6月, 2008 2 次提交
  12. 24 5月, 2008 4 次提交
  13. 06 5月, 2008 1 次提交
    • P
      sched: add optional support for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK · 3e51f33f
      Peter Zijlstra 提交于
      this replaces the rq->clock stuff (and possibly cpu_clock()).
      
       - architectures that have an 'imperfect' hardware clock can set
         CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
      
       - the 'jiffie' window might be superfulous when we update tick_gtod
         before the __update_sched_clock() call in sched_clock_tick()
      
       - cpu_clock() might be implemented as:
      
           sched_clock_cpu(smp_processor_id())
      
         if the accuracy proves good enough - how far can TSC drift in a
         single jiffie when considering the filtering and idle hooks?
      
      [ mingo@elte.hu: various fixes and cleanups ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3e51f33f
  14. 29 4月, 2008 1 次提交
  15. 18 4月, 2008 1 次提交
  16. 17 4月, 2008 1 次提交
  17. 09 2月, 2008 4 次提交
    • H
      avoid overflows in kernel/time.c · bdc80787
      H. Peter Anvin 提交于
      When the conversion factor between jiffies and milli- or microseconds is
      not a single multiply or divide, as for the case of HZ == 300, we currently
      do a multiply followed by a divide.  The intervening result, however, is
      subject to overflows, especially since the fraction is not simplified (for
      HZ == 300, we multiply by 300 and divide by 1000).
      
      This is exposed to the user when passing a large timeout to poll(), for
      example.
      
      This patch replaces the multiply-divide with a reciprocal multiplication on
      32-bit platforms.  When the input is an unsigned long, there is no portable
      way to do this on 64-bit platforms there is no portable way to do this
      since it requires a 128-bit intermediate result (which gcc does support on
      64-bit platforms but may generate libgcc calls, e.g.  on 64-bit s390), but
      since the output is a 32-bit integer in the cases affected, just simplify
      the multiply-divide (*3/10 instead of *300/1000).
      
      The reciprocal multiply used can have off-by-one errors in the upper half
      of the valid output range.  This could be avoided at the expense of having
      to deal with a potential 65-bit intermediate result.  Since the intent is
      to avoid overflow problems and most of the other time conversions are only
      semiexact, the off-by-one errors were considered an acceptable tradeoff.
      
      At Ralf Baechle's suggestion, this version uses a Perl script to compute
      the necessary constants.  We already have dependencies on Perl for kernel
      compiles.  This does, however, require the Perl module Math::BigInt, which
      is included in the standard Perl distribution starting with version 5.8.0.
      In order to support older versions of Perl, include a table of canned
      constants in the script itself, and structure the script so that
      Math::BigInt isn't required if pulling values from said table.
      
      Running the script requires that the HZ value is available from the
      Makefile.  Thus, this patch also adds the Kconfig variable CONFIG_HZ to the
      architectures which didn't already have it (alpha, cris, frv, h8300, m32r,
      m68k, m68knommu, sparc, v850, and xtensa.) It does *not* touch the sh or
      sh64 architectures, since Paul Mundt has dealt with those separately in the
      sh tree.
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>,
      Cc: Sam Ravnborg <sam@ravnborg.org>,
      Cc: Paul Mundt <lethal@linux-sh.org>,
      Cc: Richard Henderson <rth@twiddle.net>,
      Cc: Michael Starvik <starvik@axis.com>,
      Cc: David Howells <dhowells@redhat.com>,
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>,
      Cc: Hirokazu Takata <takata@linux-m32r.org>,
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>,
      Cc: Roman Zippel <zippel@linux-m68k.org>,
      Cc: William L. Irwin <sparclinux@vger.kernel.org>,
      Cc: Chris Zankel <chris@zankel.net>,
      Cc: H. Peter Anvin <hpa@zytor.com>,
      Cc: Jan Engelhardt <jengelh@computergmbh.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bdc80787
    • P
      namespaces: cleanup the code managed with PID_NS option · 74bd59bb
      Pavel Emelyanov 提交于
      Just like with the user namespaces, move the namespace management code into
      the separate .c file and mark the (already existing) PID_NS option as "depend
      on NAMESPACES"
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Acked-by: NSerge Hallyn <serue@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      74bd59bb
    • P
      namespaces: cleanup the code managed with the USER_NS option · aee16ce7
      Pavel Emelyanov 提交于
      Make the user_namespace.o compilation depend on this option and move the
      init_user_ns into user.c file to make the kernel compile and work without the
      namespaces support.  This make the user namespace code be organized similar to
      other namespaces'.
      
      Also mask the USER_NS option as "depend on NAMESPACES".
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Acked-by: NSerge Hallyn <serue@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      aee16ce7
    • P
      namespaces: move the UTS namespace under UTS_NS option · 58bfdd6d
      Pavel Emelyanov 提交于
      Currently all the namespace management code is in the kernel/utsname.c file,
      so just compile it out and make stubs in the appropriate header.
      
      The init namespace itself is in init/version.c and is in the kernel all the
      time.
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Acked-by: NSerge Hallyn <serue@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      58bfdd6d
  18. 08 2月, 2008 1 次提交
  19. 06 2月, 2008 2 次提交
    • M
      latency.c: use QoS infrastructure · f011e2e2
      Mark Gross 提交于
      Replace latency.c use with pm_qos_params use.
      Signed-off-by: Nmark gross <mgross@linux.intel.com>
      Cc: "John W. Linville" <linville@tuxdriver.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Jaroslav Kysela <perex@suse.cz>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f011e2e2
    • M
      pm qos infrastructure and interface · d82b3518
      Mark Gross 提交于
      The following patch is a generalization of the latency.c implementation done
      by Arjan last year.  It provides infrastructure for more than one parameter,
      and exposes a user mode interface for processes to register pm_qos
      expectations of processes.
      
      This interface provides a kernel and user mode interface for registering
      performance expectations by drivers, subsystems and user space applications on
      one of the parameters.
      
      Currently we have {cpu_dma_latency, network_latency, network_throughput} as
      the initial set of pm_qos parameters.
      
      The infrastructure exposes multiple misc device nodes one per implemented
      parameter.  The set of parameters implement is defined by pm_qos_power_init()
      and pm_qos_params.h.  This is done because having the available parameters
      being runtime configurable or changeable from a driver was seen as too easy to
      abuse.
      
      For each parameter a list of performance requirements is maintained along with
      an aggregated target value.  The aggregated target value is updated with
      changes to the requirement list or elements of the list.  Typically the
      aggregated target value is simply the max or min of the requirement values
      held in the parameter list elements.
      
      >From kernel mode the use of this interface is simple:
      
      pm_qos_add_requirement(param_id, name, target_value):
      
        Will insert a named element in the list for that identified PM_QOS
        parameter with the target value.  Upon change to this list the new target is
        recomputed and any registered notifiers are called only if the target value
        is now different.
      
      pm_qos_update_requirement(param_id, name, new_target_value):
      
        Will search the list identified by the param_id for the named list element
        and then update its target value, calling the notification tree if the
        aggregated target is changed.  with that name is already registered.
      
      pm_qos_remove_requirement(param_id, name):
      
        Will search the identified list for the named element and remove it, after
        removal it will update the aggregate target and call the notification tree
        if the target was changed as a result of removing the named requirement.
      
      >From user mode:
      
        Only processes can register a pm_qos requirement.  To provide for
        automatic cleanup for process the interface requires the process to register
        its parameter requirements in the following way:
      
        To register the default pm_qos target for the specific parameter, the
        process must open one of /dev/[cpu_dma_latency, network_latency,
        network_throughput]
      
        As long as the device node is held open that process has a registered
        requirement on the parameter.  The name of the requirement is
        "process_<PID>" derived from the current->pid from within the open system
        call.
      
        To change the requested target value the process needs to write a s32
        value to the open device node.  This translates to a
        pm_qos_update_requirement call.
      
        To remove the user mode request for a target value simply close the device
        node.
      
      [akpm@linux-foundation.org: fix warnings]
      [akpm@linux-foundation.org: fix build]
      [akpm@linux-foundation.org: fix build again]
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Nmark gross <mgross@linux.intel.com>
      Cc: "John W. Linville" <linville@tuxdriver.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Jaroslav Kysela <perex@suse.cz>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Venki Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: Adam Belay <abelay@novell.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d82b3518
  20. 03 2月, 2008 1 次提交
    • P
      kobject: Always build in kernel/ksysfs.o. · dfacd68e
      Paul Mundt 提交于
      kernel/ksysfs.c seems to be a random dumping group for misc globals
      that the rest of the tree depend on. This has caused problems with
      exports in the past when sysfs is disabled, which can already be
      observed in commit-id 51107301.
      
      The latest one is the kernel_kobj usage, which presently results in:
      
      fs/built-in.o: In function `debugfs_init':
      inode.c:(.init.text+0xc34): undefined reference to `kernel_kobj'
      make: *** [.tmp_vmlinux1] Error 1
      
      kernel/ksysfs.c itself at this point only contains globals and some
      basic sysfs initialization, the sysfs initialization code is optimized
      out when we build with sysfs disabled. Given that, it's easier to just
      build in unconditionally, rather than trying to find some other random
      place to dump and initialize the globals.
      
      Additionally, the current trend seems to be decoupling of kobjects from
      sysfs, in which case it still makes sense to perform the kernel_kobj
      initialization that happens here even if sysfs is disabled, as
      lib/kobject.o is built-in unconditionally.
      Signed-off-by: NPaul Mundt <lethal@linux-sh.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      dfacd68e
  21. 30 1月, 2008 2 次提交
  22. 26 1月, 2008 3 次提交
  23. 15 11月, 2007 1 次提交
    • A
      revert "Task Control Groups: example CPU accounting subsystem" · cfb52856
      Andrew Morton 提交于
      Revert 62d0df64.
      
      This was originally intended as a simple initial example of how to create a
      control groups subsystem; it wasn't intended for mainline, but I didn't make
      this clear enough to Andrew.
      
      The CFS cgroup subsystem now has better functionality for the per-cgroup usage
      accounting (based directly on CFS stats) than the "usage" status file in this
      patch, and the "load" status file is rather simplistic - although having a
      per-cgroup load average report would be a useful feature, I don't believe this
      patch actually provides it.  If it gets into the final 2.6.24 we'd probably
      have to support this interface for ever.
      
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cfb52856
  24. 21 10月, 2007 1 次提交
    • A
      [PATCH] audit: watching subtrees · 74c3cbe3
      Al Viro 提交于
      New kind of audit rule predicates: "object is visible in given subtree".
      The part that can be sanely implemented, that is.  Limitations:
      	* if you have hardlink from outside of tree, you'd better watch
      it too (or just watch the object itself, obviously)
      	* if you mount something under a watched tree, tell audit
      that new chunk should be added to watched subtrees
      	* if you umount something in a watched tree and it's still mounted
      elsewhere, you will get matches on events happening there.  New command
      tells audit to recalculate the trees, trimming such sources of false
      positives.
      
      Note that it's _not_ about path - if something mounted in several places
      (multiple mount, bindings, different namespaces, etc.), the match does
      _not_ depend on which one we are using for access.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      74c3cbe3
  25. 20 10月, 2007 4 次提交