1. 20 10月, 2007 10 次提交
    • P
      Task Control Groups: automatic userspace notification of idle cgroups · 81a6a5cd
      Paul Menage 提交于
      Add the following files to the cgroup filesystem:
      
      notify_on_release - configures/reports whether the cgroup subsystem should
      attempt to run a release script when this cgroup becomes unused
      
      release_agent - configures/reports the release agent to be used for this
      hierarchy (top level in each hierarchy only)
      
      releasable - reports whether this cgroup would have been auto-released if
      notify_on_release was true and a release agent was configured (mainly useful
      for debugging)
      
      To avoid locking issues, invoking the userspace release agent is done via a
      workqueue task; cgroups that need to have their release agents invoked by
      the workqueue task are linked on to a list.
      
      [pj@sgi.com: Need to include kmod.h]
      Signed-off-by: NPaul Menage <menage@google.com>
      Cc: Serge E. Hallyn <serue@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      81a6a5cd
    • P
      Task Control Groups: shared cgroup subsystem group arrays · 817929ec
      Paul Menage 提交于
      Replace the struct css_set embedded in task_struct with a pointer; all tasks
      that have the same set of memberships across all hierarchies will share a
      css_set object, and will be linked via their css_sets field to the "tasks"
      list_head in the css_set.
      
      Assuming that many tasks share the same cgroup assignments, this reduces
      overall space usage and keeps the size of the task_struct down (three pointers
      added to task_struct compared to a non-cgroups kernel, no matter how many
      subsystems are registered).
      
      [akpm@linux-foundation.org: fix a printk]
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: NPaul Menage <menage@google.com>
      Cc: Serge E. Hallyn <serue@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Serge E. Hallyn <serue@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      817929ec
    • P
      Task Control Groups: add procfs interface · a424316c
      Paul Menage 提交于
      Add:
      
      /proc/cgroups - general system info
      
      /proc/*/cgroup - per-task cgroup membership info
      
      [a.p.zijlstra@chello.nl: cgroups: bdi init hooks]
      Signed-off-by: NPaul Menage <menage@google.com>
      Cc: Serge E. Hallyn <serue@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a424316c
    • P
      Task Control Groups: add cgroup_clone() interface · 697f4161
      Paul Menage 提交于
      Add support for cgroup_clone(), a way to create new cgroups intended to
      be used for systems such as namespace unsharing.  A new subsystem callback,
      post_clone(), is added to allow subsystems to automatically configure cloned
      cgroups.
      Signed-off-by: NPaul Menage <menage@google.com>
      Cc: Serge E. Hallyn <serue@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      697f4161
    • P
      Task Control Groups: add fork()/exit() hooks · b4f48b63
      Paul Menage 提交于
      This adds the necessary hooks to the fork() and exit() paths to ensure
      that new children inherit their parent's cgroup assignments, and that
      exiting processes release reference counts on their cgroups.
      Signed-off-by: NPaul Menage <menage@google.com>
      Cc: Serge E. Hallyn <serue@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b4f48b63
    • P
      Add cgroup write_uint() helper method · 355e0c48
      Paul Menage 提交于
      Add write_uint() helper method for cgroup subsystems
      
      This helper is analagous to the read_uint() helper method for
      reporting u64 values to userspace. It's designed to reduce the amount
      of boilerplate requierd for creating new cgroup subsystems.
      Signed-off-by: NPaul Menage <menage@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      355e0c48
    • P
      Task Control Groups: add tasks file interface · bbcb81d0
      Paul Menage 提交于
      Add the per-directory "tasks" file for cgroupfs mounts; this allows the
      user to determine which tasks are members of a cgroup by reading a
      cgroup's "tasks", and to move a task into a cgroup by writing its pid to
      its "tasks".
      Signed-off-by: NPaul Menage <menage@google.com>
      Cc: Serge E. Hallyn <serue@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bbcb81d0
    • P
      Task Control Groups: basic task cgroup framework · ddbcc7e8
      Paul Menage 提交于
      Generic Process Control Groups
      --------------------------
      
      There have recently been various proposals floating around for
      resource management/accounting and other task grouping subsystems in
      the kernel, including ResGroups, User BeanCounters, NSProxy
      cgroups, and others.  These all need the basic abstraction of being
      able to group together multiple processes in an aggregate, in order to
      track/limit the resources permitted to those processes, or control
      other behaviour of the processes, and all implement this grouping in
      different ways.
      
      This patchset provides a framework for tracking and grouping processes
      into arbitrary "cgroups" and assigning arbitrary state to those
      groupings, in order to control the behaviour of the cgroup as an
      aggregate.
      
      The intention is that the various resource management and
      virtualization/cgroup efforts can also become task cgroup
      clients, with the result that:
      
      - the userspace APIs are (somewhat) normalised
      
      - it's easier to test e.g. the ResGroups CPU controller in
       conjunction with the BeanCounters memory controller, or use either of
      them as the resource-control portion of a virtual server system.
      
      - the additional kernel footprint of any of the competing resource
       management systems is substantially reduced, since it doesn't need
       to provide process grouping/containment, hence improving their
       chances of getting into the kernel
      
      This patch:
      
      Add the main task cgroups framework - the cgroup filesystem, and the
      basic structures for tracking membership and associating subsystem state
      objects to tasks.
      Signed-off-by: NPaul Menage <menage@google.com>
      Cc: Serge E. Hallyn <serue@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ddbcc7e8
    • P
      cpuset: zero malloc - revert the old cpuset fix · 55a230aa
      Paul Jackson 提交于
      The cpuset code to present a list of tasks using a cpuset to user space could
      write to an array that it had kmalloc'd, after a kmalloc request of zero size.
      
      The problem was that the code didn't check for writes past the allocated end
      of the array until -after- the first write.
      
      This is a race condition that is likely rare -- it would only show up if a
      cpuset went from being empty to having a task in it, during the brief time
      between the allocation and the first write.
      
      Prior to roughly 2.6.22 kernels, this was also a benign problem, because a
      zero kmalloc returned a few usable bytes anyway, and no harm was done with the
      bogus write.
      
      With the 2.6.22 kernel changes to make issue a warning if code tries to write
      to the location returned from a zero size allocation, this problem is no
      longer benign.  This cpuset code would occassionally trigger that warning.
      
      The fix is trivial -- check before storing into the array, not after, whether
      the array is big enough to hold the store.
      
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: "Serge E. Hallyn" <serue@us.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: Paul Menage <menage@google.com>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      55a230aa
    • A
      Add kernel/notifier.c · fe9d4f57
      Alexey Dobriyan 提交于
      There is separate notifier header, but no separate notifier .c file.
      
      Extract notifier code out of kernel/sys.c which will remain for
      misc syscalls I hope. Merge kernel/die_notifier.c into kernel/notifier.c.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NAlexey Dobriyan <adobriyan@sw.ru>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fe9d4f57
  2. 19 10月, 2007 30 次提交