1. 29 4月, 2008 5 次提交
    • B
      cgroups: add an owner to the mm_struct · cf475ad2
      Balbir Singh 提交于
      Remove the mem_cgroup member from mm_struct and instead adds an owner.
      
      This approach was suggested by Paul Menage.  The advantage of this approach
      is that, once the mm->owner is known, using the subsystem id, the cgroup
      can be determined.  It also allows several control groups that are
      virtually grouped by mm_struct, to exist independent of the memory
      controller i.e., without adding mem_cgroup's for each controller, to
      mm_struct.
      
      A new config option CONFIG_MM_OWNER is added and the memory resource
      controller selects this config option.
      
      This patch also adds cgroup callbacks to notify subsystems when mm->owner
      changes.  The mm_cgroup_changed callback is called with the task_lock() of
      the new task held and is called just prior to changing the mm->owner.
      
      I am indebted to Paul Menage for the several reviews of this patchset and
      helping me make it lighter and simpler.
      
      This patch was tested on a powerpc box, it was compiled with both the
      MM_OWNER config turned on and off.
      
      After the thread group leader exits, it's moved to init_css_state by
      cgroup_exit(), thus all future charges from runnings threads would be
      redirected to the init_css_set's subsystem.
      Signed-off-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelianov <xemul@openvz.org>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: David Rientjes <rientjes@google.com>,
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Reviewed-by: NPaul Menage <menage@google.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cf475ad2
    • S
      cgroups: implement device whitelist · 08ce5f16
      Serge E. Hallyn 提交于
      Implement a cgroup to track and enforce open and mknod restrictions on device
      files.  A device cgroup associates a device access whitelist with each cgroup.
       A whitelist entry has 4 fields.  'type' is a (all), c (char), or b (block).
      'all' means it applies to all types and all major and minor numbers.  Major
      and minor are either an integer or * for all.  Access is a composition of r
      (read), w (write), and m (mknod).
      
      The root device cgroup starts with rwm to 'all'.  A child devcg gets a copy of
      the parent.  Admins can then remove devices from the whitelist or add new
      entries.  A child cgroup can never receive a device access which is denied its
      parent.  However when a device access is removed from a parent it will not
      also be removed from the child(ren).
      
      An entry is added using devices.allow, and removed using
      devices.deny.  For instance
      
      	echo 'c 1:3 mr' > /cgroups/1/devices.allow
      
      allows cgroup 1 to read and mknod the device usually known as
      /dev/null.  Doing
      
      	echo a > /cgroups/1/devices.deny
      
      will remove the default 'a *:* mrw' entry.
      
      CAP_SYS_ADMIN is needed to change permissions or move another task to a new
      cgroup.  A cgroup may not be granted more permissions than the cgroup's parent
      has.  Any task can move itself between cgroups.  This won't be sufficient, but
      we can decide the best way to adequately restrict movement later.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: fix may-be-used-uninitialized warning]
      Signed-off-by: NSerge E. Hallyn <serue@us.ibm.com>
      Acked-by: NJames Morris <jmorris@namei.org>
      Looks-good-to: Pavel Emelyanov <xemul@openvz.org>
      Cc: Daniel Hokka Zakrisson <daniel@hozac.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      08ce5f16
    • P
      CGroup API files: make CGROUP_DEBUG default to off · 418d7d87
      Paul Menage 提交于
      The cgroup debug subsystem isn't generally useful for users.  It should
      default to "n".
      Signed-off-by: NPaul Menage <menage@google.com>
      Cc: "Li Zefan" <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: "YAMAMOTO Takashi" <yamamoto@valinux.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      418d7d87
    • A
      let LOG_BUF_SHIFT default to 17 · f17a32e9
      Adrian Bunk 提交于
      16 kB is often no longer enough for a normal boot of an UP system.
      
      And even less when people e.g. use suspend.
      
      17 seems to be a more reasonable default for current kernels on current
      hardware (it's just the default, anyone who is memory limited can still lower
      it).
      Signed-off-by: NAdrian Bunk <bunk@kernel.org>
      Acked-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f17a32e9
    • I
      make CC_OPTIMIZE_FOR_SIZE non-experimental · 96fffeb4
      Ingo Molnar 提交于
      this option has been the default on a wide range of distributions
      for a long time - time to make it non-experimental.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      96fffeb4
  2. 27 4月, 2008 1 次提交
  3. 20 4月, 2008 1 次提交
  4. 14 4月, 2008 1 次提交
    • C
      slub: No need for per node slab counters if !SLUB_DEBUG · 0f389ec6
      Christoph Lameter 提交于
      The per node counters are used mainly for showing data through the sysfs API.
      If that API is not compiled in then there is no point in keeping track of this
      data. Disable counters for the number of slabs and the number of total slabs
      if !SLUB_DEBUG. Incrementing the per node counters is also accessing a
      potentially contended cacheline so this could actually be a performance
      benefit to embedded systems.
      
      SLABINFO support is also affected. It now must depends on SLUB_DEBUG (which
      is on by default).
      
      Patch also avoids a check for a NULL kmem_cache_node pointer in new_slab()
      if the system is not compiled with NUMA support.
      
      [penberg@cs.helsinki.fi: fix oops and move ->nr_slabs into CONFIG_SLUB_DEBUG]
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      0f389ec6
  5. 11 3月, 2008 1 次提交
    • P
      rcu: move PREEMPT_RCU config option back under PREEMPT · 21bbb39c
      Paul E. McKenney 提交于
      The original preemptible-RCU patch put the choice between classic and
      preemptible RCU into kernel/Kconfig.preempt, which resulted in build failures
      on machines not supporting CONFIG_PREEMPT.  This choice was therefore moved to
      init/Kconfig, which worked, but placed the choice between classic and
      preemptible RCU at the top level, a very obtuse choice indeed.
      
      This patch changes from the Kconfig "choice" mechanism to a pair of booleans,
      only one of which (CONFIG_PREEMPT_RCU) is user-visible, and is located in
      kernel/Kconfig.preempt, where one would expect it to be.  The other
      (CONFIG_CLASSIC_RCU) is in init/Kconfig so that it is available to all
      architectures, hopefully avoiding build breakage.  Thanks to Roman Zippel for
      suggesting this approach.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Cc: Josh Triplett <josh@freedesktop.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      21bbb39c
  6. 05 3月, 2008 3 次提交
  7. 24 2月, 2008 1 次提交
  8. 13 2月, 2008 1 次提交
  9. 10 2月, 2008 1 次提交
  10. 09 2月, 2008 5 次提交
  11. 08 2月, 2008 2 次提交
  12. 07 2月, 2008 1 次提交
  13. 06 2月, 2008 3 次提交
  14. 03 2月, 2008 2 次提交
    • M
      Move Kconfig.instrumentation to arch/Kconfig and init/Kconfig · 125e5645
      Mathieu Desnoyers 提交于
      Move the instrumentation Kconfig to
      
      arch/Kconfig for architecture dependent options
        - oprofile
        - kprobes
      
      and
      
      init/Kconfig for architecture independent options
        - profiling
        - markers
      
      Remove the "Instrumentation Support" menu. Everything moves to "General setup".
      Delete the kernel/Kconfig.instrumentation file.
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NSam Ravnborg <sam@ravnborg.org>
      125e5645
    • M
      Create arch/Kconfig · fb32e03f
      Mathieu Desnoyers 提交于
      Puts the content of arch/Kconfig in the "General setup" menu.
      
      Linus:
      
      > Should it come with a re-duplication of it's content into each
      > architecture, which was the case previously ? The oprofile and kprobes
      > menu entries were litteraly cut and pasted from one architecture to
      > another. Should we put its content in init/Kconfig then ?
      
      I don't think it's a good idea to go back to making it per-architecture,
      although that extensive "depends on <list-of-archiectures-here>" might
      indicate that there certainly is room for cleanup there.
      
      And I don't think it's wrong keeping it in kernel/Kconfig.xyz per se, I
      just think it's wrong to (a) lump the code together when it really doesn't
      necessarily need to and (b) show it to users as some kind of choice that
      is tied together (whether it then has common code or not).
      
      On the per-architecture side, I do think it would be better to *not* have
      internal architecture knowledge in a generic file, and as such a line like
      
              depends on X86_32 || IA64 || PPC || S390 || SPARC64 || X86_64 || AVR32
      
      really shouldn't exist in a file like kernel/Kconfig.instrumentation.
      
      It would be much better to do
      
              depends on ARCH_SUPPORTS_KPROBES
      
      in that generic file, and then architectures that do support it would just
      have a
      
              bool ARCH_SUPPORTS_KPROBES
                      default y
      
      in *their* architecture files. That would seem to be much more logical,
      and is readable both for arch maintainers *and* for people who have no
      clue - and don't care - about which architecture is supposed to support
      which interface...
      
      Sam Ravnborg:
      
      Stuff it into a new file: arch/Kconfig
      We can then extend this file to include all the 'trailing'
      Kconfig things that are anyway equal for all ARCHs.
      
      But it should be kept clean - so if we introduce such a file
      then we should use ARCH_HAS_whatever in the arch specific Kconfig
      files to enable stuff that is not shared.
      
      [...]
      
      The above suggestion is actually not exactly the best way to do it...
      First the naming..
      A quick grep shows following usage today (in Kconfig files)
      ARCH_HAS        51
      ARCH_SUPPORTS   4
      HAVE_ARCH       7
      
      ARCH_HAS is the clear winner.
      
      In the common Kconfig file do:
      
      config FOO
              depends on ARCH_HAS_FOO
              bool "bla bla"
      
      config ARCH_HAS_FOO
              def_bool n
      
      In the arch specific Kconfig file in a suitable place do:
      
      config SUITABLE_OPTION
              select ARCH_HAS_FOO
      
      The naming of ARCH_HAS_ is fixed and shall be:
      ARCH_HAS_<config option it will enable>
      
      Only a single line added pr. architecture.
      And we will end up with a (maybe even commented) list of trivial selects.
      
      - Yet another update :
      
      Moving to HAVE_* now.
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Signed-off-by: NSam Ravnborg <sam@ravnborg.org>
      fb32e03f
  15. 01 2月, 2008 1 次提交
  16. 29 1月, 2008 1 次提交
  17. 28 1月, 2008 1 次提交
  18. 26 1月, 2008 1 次提交
  19. 25 1月, 2008 1 次提交
  20. 03 1月, 2008 1 次提交
  21. 03 12月, 2007 1 次提交
    • S
      sched: cpu accounting controller (V2) · d842de87
      Srivatsa Vaddagiri 提交于
      Commit cfb52856 removed a useful feature for
      us, which provided a cpu accounting resource controller.  This feature would be
      useful if someone wants to group tasks only for accounting purpose and doesnt
      really want to exercise any control over their cpu consumption.
      
      The patch below reintroduces the feature. It is based on Paul Menage's
      original patch (Commit 62d0df64), with
      these differences:
      
              - Removed load average information. I felt it needs more thought (esp
      	  to deal with SMP and virtualized platforms) and can be added for
      	  2.6.25 after more discussions.
              - Convert group cpu usage to be nanosecond accurate (as rest of the cfs
      	  stats are) and invoke cpuacct_charge() from the respective scheduler
      	  classes
      	- Make accounting scalable on SMP systems by splitting the usage
      	  counter to be per-cpu
      	- Move the code from kernel/cpu_acct.c to kernel/sched.c (since the
      	  code is not big enough to warrant a new file and also this rightly
      	  needs to live inside the scheduler. Also things like accessing
      	  rq->lock while reading cpu usage becomes easier if the code lived in
      	  kernel/sched.c)
      
      The patch also modifies the cpu controller not to provide the same accounting
      information.
      Tested-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      
       Tested the patches on top of 2.6.24-rc3. The patches work fine. Ran
       some simple tests like cpuspin (spin on the cpu), ran several tasks in
       the same group and timed them. Compared their time stamps with
       cpuacct.usage.
      Signed-off-by: NSrivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
      Signed-off-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d842de87
  22. 23 11月, 2007 1 次提交
  23. 15 11月, 2007 2 次提交
    • E
      pidns: Place under CONFIG_EXPERIMENTAL · 57d5f66b
      Eric W. Biederman 提交于
      This is my trivial patch to swat innumerable little bugs with a single
      blow.
      
      After some intensive review (my apologies for not having gotten to this
      sooner) what we have looks like a good base to build on with the current
      pid namespace code but it is not complete, and it is still much to simple
      to find issues where the kernel does the wrong thing outside of the initial
      pid namespace.
      
      Until the dust settles and we are certain we have the ABI and the
      implementation is as correct as humanly possible let's keep process ID
      namespaces behind CONFIG_EXPERIMENTAL.
      
      Allowing us the option of fixing any ABI or other bugs we find as long as
      they are minor.
      
      Allowing users of the kernel to avoid those bugs simply by ensuring their
      kernel does not have support for multiple pid namespaces.
      
      [akpm@linux-foundation.org: coding-style cleanups]
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Adrian Bunk <bunk@kernel.org>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Kir Kolyshkin <kir@swsoft.com>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      57d5f66b
    • A
      revert "Task Control Groups: example CPU accounting subsystem" · cfb52856
      Andrew Morton 提交于
      Revert 62d0df64.
      
      This was originally intended as a simple initial example of how to create a
      control groups subsystem; it wasn't intended for mainline, but I didn't make
      this clear enough to Andrew.
      
      The CFS cgroup subsystem now has better functionality for the per-cgroup usage
      accounting (based directly on CFS stats) than the "usage" status file in this
      patch, and the "load" status file is rather simplistic - although having a
      per-cgroup load average report would be a useful feature, I don't believe this
      patch actually provides it.  If it gets into the final 2.6.24 we'd probably
      have to support this interface for ever.
      
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cfb52856
  24. 25 10月, 2007 1 次提交
  25. 21 10月, 2007 1 次提交
    • A
      [PATCH] audit: watching subtrees · 74c3cbe3
      Al Viro 提交于
      New kind of audit rule predicates: "object is visible in given subtree".
      The part that can be sanely implemented, that is.  Limitations:
      	* if you have hardlink from outside of tree, you'd better watch
      it too (or just watch the object itself, obviously)
      	* if you mount something under a watched tree, tell audit
      that new chunk should be added to watched subtrees
      	* if you umount something in a watched tree and it's still mounted
      elsewhere, you will get matches on events happening there.  New command
      tells audit to recalculate the trees, trimming such sources of false
      positives.
      
      Note that it's _not_ about path - if something mounted in several places
      (multiple mount, bindings, different namespaces, etc.), the match does
      _not_ depend on which one we are using for access.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      74c3cbe3