1. 15 Sep, 2012 1 commit
    • cgroup: Remove CGROUP_BUILTIN_SUBSYS_COUNT · be45c900
      Daniel Wagner committed
      CGROUP_BUILTIN_SUBSYS_COUNT is used as start index or stop index when
      looping over the subsys array looking either at the builtin or the
      module subsystems. Since all the builtin subsystems have an id which
      is lower than CGROUP_BUILTIN_SUBSYS_COUNT we know that any module will
      have an id larger than CGROUP_BUILTIN_SUBSYS_COUNT. In short the ids
      are sorted.
      
      We are about to change id assignment to happen only at compile time
      later in this series. That means we can't rely on the above trick
      since all ids will always be defined at compile time. Furthermore,
      ordering the builtin subsystems and the module subsystems is not
      really necessary.
      
      So we need a different way to know which subsystem is a builtin or a
      module one. We can use the subsys[]->module pointer for this. Any
      place where we need to know if a subsys is a module we just check for
      the pointer. If it is NULL then the subsystem is a builtin one.
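
The pointer check described above can be sketched in userspace C. This is only an illustration under stated assumptions: `struct module` and `struct cgroup_subsys` here are simplified stand-ins, and the helper names `subsys_is_module()` and `count_module_subsys()` are hypothetical, not kernel functions.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct module {
	const char *name;
};

struct cgroup_subsys {
	const char *name;
	struct module *module;	/* NULL means the subsystem is builtin */
};

/* Nonzero if the slot is populated and the subsystem came from a module. */
static int subsys_is_module(const struct cgroup_subsys *ss)
{
	return ss != NULL && ss->module != NULL;
}

/*
 * Count modular subsystems by inspecting every slot, instead of
 * assuming that module ids start at some fixed index.
 */
static size_t count_module_subsys(struct cgroup_subsys *const subsys[],
				  size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (subsys_is_module(subsys[i]))
			count++;
	return count;
}
```

Because the loop looks at each entry, it gives the same answer whatever order builtin and modular subsystems appear in.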
      
      With this we are able to drop the CGROUP_BUILTIN_SUBSYS_COUNT
      enum. Though we need to introduce a temporary placeholder so that we
      don't get a compilation error when only CONFIG_CGROUP is selected and
      no single controller. An empty enum definition is not valid. Later in
      this series we are able to remove the placeholder again.
      
      And with this change we get a fix for this:
      
      kernel/cgroup.c: In function ‘cgroup_load_subsys’:
      kernel/cgroup.c:4326:38: warning: array subscript is below array bounds [-Warray-bounds]
      
      when CONFIG_CGROUP=y and no built in controller was enabled.

      Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Cc: Gao feng <gaofeng@cn.fujitsu.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: netdev@vger.kernel.org
      Cc: cgroups@vger.kernel.org
  2. 25 Aug, 2012 1 commit
    • cgroup: add xattr support · 03b1cde6
      Aristeu Rozanski committed
      This is one of the items in the plumber's wish list.
      
      For use cases:
      
      >> What would the use case be for this?
      >
      > Attaching meta information to services, in an easily discoverable
      > way. For example, in systemd we create one cgroup for each service, and
      > could then store data like the main pid of the specific service as an
      > xattr on the cgroup itself. That way we'd have almost all service state
      > in the cgroupfs, which would make it possible to terminate systemd and
      > later restart it without losing any state information. But there's more:
      > for example, some very peculiar services cannot be terminated on
      > shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
      > services in question could just mark that on their cgroup, by setting an
      > xattr. On the more desktopy side of things there are other
      > possibilities: for example there are plans defining what an application
      > is along the lines of a cgroup (i.e. an app being a collection of
      > processes). With xattrs one could then attach an icon or human readable
      > program name on the cgroup.
      >
      > The key idea is that this would allow attaching runtime meta information
      > to cgroups and everything they model (services, apps, vms), that doesn't
      > need any complex userspace infrastructure, has good access control
      > (i.e. because the file system enforces that anyway, and there's the
      > "trusted." xattr namespace), notifications (inotify), and can easily be
      > shared among applications.
      >
      > Lennart
      
      v7:
      - no changes
      v6:
      - remove user xattr namespace, only allow trusted and security
      v5:
      - check for capabilities before setting/removing xattrs
      v4:
      - no changes
      v3:
      - instead of config option, use mount option to enable xattr support
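
From userspace, attaching metadata as described in the quoted use case could look like the following sketch. It uses the standard Linux `setxattr(2)` and `getxattr(2)` calls. The wrapper names, the cgroup path and the attribute name `trusted.main_pid` are hypothetical, and it assumes cgroupfs was mounted with xattr support enabled.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/xattr.h>

/*
 * Store a small piece of metadata on a cgroup directory.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int cgroup_set_meta(const char *cgroup_dir, const char *name,
			   const void *value, size_t len)
{
	return setxattr(cgroup_dir, name, value, len, 0);
}

/*
 * Read the metadata back into buf.
 * Returns the value length on success, -1 with errno set on failure.
 */
static ssize_t cgroup_get_meta(const char *cgroup_dir, const char *name,
			       void *buf, size_t len)
{
	return getxattr(cgroup_dir, name, buf, len);
}
```

A service manager might call `cgroup_set_meta("/sys/fs/cgroup/systemd/foo.service", "trusted.main_pid", "1234", 4)`. Writing to the "trusted." namespace requires CAP_SYS_ADMIN, which matches the capability check added in v5.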

      Original-patch-by: Li Zefan <lizefan@huawei.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Lennart Poettering <lpoetter@redhat.com>
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Aristeu Rozanski <aris@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  3. 07 Jun, 2012 1 commit
  4. 12 Apr, 2012 1 commit
  5. 02 Apr, 2012 7 commits
    • cgroup: make css->refcnt clearing on cgroup removal optional · 48ddbe19
      Tejun Heo committed
      Currently, cgroup removal tries to drain all css references.  If there
      are active css references, the removal logic waits and retries
      ->pre_destroy() until either all refs drop to zero or removal is
      cancelled.
      
      This semantics is unusual and adds non-trivial complexity to cgroup
      core and IMHO is fundamentally misguided in that it couples internal
      implementation details (references to internal data structure) with
      externally visible operation (rmdir).  To userland, this is a behavior
      peculiarity which is unnecessary and difficult to expect (css refs is
      otherwise invisible from userland), and, to policy implementations,
      this is an unnecessary restriction (e.g. blkcg wants to hold css refs
      for caching purposes but can't as that becomes visible as rmdir hang).
      
      Unfortunately, memcg currently depends on ->pre_destroy() retrials and
      cgroup removal vetoing and can't be immediately switched to the new
      behavior.  This patch introduces the new behavior of not waiting for
      css refs to drain and maintains the old behavior for subsystems which
      have __DEPRECATED_clear_css_refs set.
      
      Once memcg is updated, we can drop the code paths for the old
      behavior as proposed in the following patch.  Note that the following
      patch is incorrect in that the dput work item is in cgroup and may lose
      some dputs when multiple css's are released back-to-back, and
      __css_put() triggers check_for_release() when refcnt reaches 0 instead
      of 1; however, it shows what part can be removed.
      
        http://thread.gmane.org/gmane.linux.kernel.containers/22559/focus=75251
      
      Note that, in not-too-distant future, cgroup core will start emitting
      warning messages for subsys which require the old behavior, so please
      get moving.

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    • cgroup: use negative bias on css->refcnt to block css_tryget() · 28b4c27b
      Tejun Heo committed
      When a cgroup is about to be removed, cgroup_clear_css_refs() is
      called to check and ensure that there are no active css references.
      
      This is currently achieved by dropping the refcnt to zero iff it has
      only the base ref.  If all css refs could be dropped to zero, ref
      clearing is successful and CSS_REMOVED is set on all css.  If not, the
      base ref is restored.  While css ref is zero w/o CSS_REMOVED set, any
      css_tryget() attempt on it busy loops so that they are atomic
      w.r.t. the whole css ref clearing.
      
      This does work but dropping and re-instating the base ref is somewhat
      hairy and makes it difficult to add more logic to the put path as
      there are two of them - the regular css_put() and the reversible base
      ref clearing.
      
      This patch updates css ref clearing such that blocking new
      css_tryget() and putting the base ref are separate operations.
      CSS_DEACT_BIAS, defined as INT_MIN, is added to css->refcnt and
      css_tryget() busy loops while refcnt is negative.  After all css refs
      are deactivated, if they were all one, ref clearing succeeded and
      CSS_REMOVED is set and the base ref is put using the regular
      css_put(); otherwise, CSS_DEACT_BIAS is subtracted from the refcnts
      and the original positive values are restored.
      
      css_refcnt() accessor which always returns the unbiased positive
      reference counts is added and used to simplify refcnt usages.  While
      at it, relocate and reformat comments in cgroup_has_css_refs().
      
      This separates css->refcnt deactivation and putting the base ref,
      which enables the next patch to make ref clearing optional.
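
The biasing scheme can be illustrated with C11 atomics. This is a userspace sketch and not the kernel code: `struct ref` and the `ref_*` helpers are hypothetical names, and unlike css_tryget(), which busy loops while the count is negative, `ref_tryget()` here simply fails so that the example can run single-threaded.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEACT_BIAS INT_MIN	/* mirrors CSS_DEACT_BIAS from the message */

/* Hypothetical stand-in for css->refcnt. */
struct ref {
	atomic_int cnt;
};

/* Unbiased count: strip the bias if the counter is deactivated. */
static int ref_value(int v)
{
	return v >= 0 ? v : v - DEACT_BIAS;
}

/* Take a reference unless the counter is deactivated or already zero. */
static bool ref_tryget(struct ref *r)
{
	int v = atomic_load(&r->cnt);

	while (v > 0) {
		/* On failure, v is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&r->cnt, &v, v + 1))
			return true;
	}
	return false;
}

/* Block new tryget attempts by making the counter negative. */
static void ref_deactivate(struct ref *r)
{
	atomic_fetch_add(&r->cnt, DEACT_BIAS);
}

/* Undo the bias and restore the original positive value. */
static void ref_reactivate(struct ref *r)
{
	atomic_fetch_sub(&r->cnt, DEACT_BIAS);
}
```

Adding the bias leaves the real count recoverable through `ref_value()`, so deactivation never has to drop and re-instate the base reference.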

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
    • cgroup: implement cgroup_rm_cftypes() · 79578621
      Tejun Heo committed
      Implement cgroup_rm_cftypes() which removes an array of cftypes from a
      subsystem.  It can be called whether the target subsys is attached or
      not.  cgroup core will remove the specified file from all existing
      cgroups.
      
      This will be used to improve sub-subsys modularity and will be helpful
      for unified hierarchy.

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
    • cgroup: introduce struct cfent · 05ef1d7c
      Tejun Heo committed
      This patch adds cfent (cgroup file entry) which is the association
      between a cgroup and a file.  This is the in-cgroup representation of
      files under a cgroup directory.  This simplifies walking
      cgroup files and thus cgroup_clear_directory(), which is now
      implemented in two parts - cgroup_rm_file() and a loop around it.
      
      cgroup_rm_file() will be used to implement cftype removal and cfent is
      scheduled to serve cgroup specific per-file data (e.g. for sysfs-like
      "sever" semantics).
      
      v2: - cfe was freed from cgroup_rm_file() which led to use-after-free
            if the file had openers at the time of removal.  Moved to
            cgroup_diput().
      
          - cgroup_clear_directory() triggered WARN_ON_ONCE() if d_subdirs
            wasn't empty after removing all files.  This triggered
            spuriously if some files were open during directory clearing.
            Removed.
      
      v3: - In cgroup_diput(), WARN_ONCE(!list_empty(&cfe->node)) could be
            spuriously triggered for root cgroups because they don't go
            through cgroup_clear_directory() on unmount.  Don't trigger WARN
            for root cgroups.

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Glauber Costa <glommer@parallels.com>
    • cgroup: remove cgroup_add_file[s]() · db0416b6
      Tejun Heo committed
      No controller is using cgroup_add_file[s]().  Unexport them, and
      convert cgroup_add_files() to handle NULL entry terminated array
      instead of taking count explicitly and continue creation on failure
      for internal use.

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
    • cgroup: implement cgroup_add_cftypes() and friends · 8e3f6541
      Tejun Heo committed
      Currently, cgroup directories are populated by subsys->populate()
      callback explicitly creating files on each cgroup creation.  This
      level of flexibility isn't needed or desirable.  It provides largely
      unused flexibility which invites abuse while severely limiting what
      the core layer can do through the lack of structure and conventions.

      For each cgroup file type, the only distinction that cgroup users are
      making is whether a cgroup is root or not, which can easily be
      expressed with flags.
      
      This patch introduces cgroup_add_cftypes().  These deal with cftypes
      instead of individual files - controllers indicate that certain types
      of files exist for certain subsystem.  Newly added CFTYPE_*_ON_ROOT
      flags indicate whether a cftype should be excluded or created only on
      the root cgroup.
      
      cgroup_add_cftypes() can be called any time whether the target
      subsystem is currently attached or not.  cgroup core will create files
      on the existing cgroups as necessary.
      
      Also, cgroup_subsys->base_cftypes is added to ease registration of the
      base files for the subsystem.  If non-NULL on subsys init, the cftypes
      pointed to by ->base_cftypes are automatically registered on subsys
      init / load.
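
A flag-driven cftype table can be sketched as follows. This is a simplified userspace model: the `struct cftype` below carries only a name and flags, the flag values and the NULL-name terminator are assumptions made for illustration, and `cft_applies()` and `count_files()` are hypothetical helpers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed flag values for the CFTYPE_*_ON_ROOT flags. */
#define CFTYPE_ONLY_ON_ROOT	(1U << 0)	/* create only on the root cgroup */
#define CFTYPE_NOT_ON_ROOT	(1U << 1)	/* never create on the root cgroup */

/* Hypothetical, reduced cftype: the real structure has many more members. */
struct cftype {
	const char *name;	/* NULL terminates the array in this sketch */
	unsigned int flags;
};

/* Decide whether a file type should exist on a given cgroup. */
static bool cft_applies(const struct cftype *cft, bool is_root)
{
	if ((cft->flags & CFTYPE_ONLY_ON_ROOT) && !is_root)
		return false;
	if ((cft->flags & CFTYPE_NOT_ON_ROOT) && is_root)
		return false;
	return true;
}

/* Walk a terminated cftype array and count the files a cgroup would get. */
static size_t count_files(const struct cftype *cfts, bool is_root)
{
	size_t n = 0;

	for (; cfts->name != NULL; cfts++)
		if (cft_applies(cfts, is_root))
			n++;
	return n;
}
```

With the root distinction expressed as data, the core can decide which files to create on any cgroup, including cgroups that already exist when the table is registered.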
      
      Further patches will convert the existing users and remove the file
      based interface.  Note that this interface allows dynamic addition of
      files to an active controller.  This will be used for sub-controller
      modularity and unified hierarchy in the longer term.
      
      This patch implements the new mechanism but doesn't apply it to any
      user.
      
      v2: replaced DECLARE_CGROUP_CFTYPES[_COND]() with
          cgroup_subsys->base_cftypes, which works better for cgroup_subsys
          which is loaded as module.

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
    • cgroup: build list of all cgroups under a given cgroupfs_root · b0ca5a84
      Tejun Heo committed
      Build a list of all cgroups anchored at cgroupfs_root->allcg_list and
      going through cgroup->allcg_node.  The list is protected by
      cgroup_mutex and will be used to improve cgroup file handling.

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
  6. 22 Mar, 2012 1 commit
  7. 03 Feb, 2012 1 commit
    • cgroup: remove cgroup_subsys argument from callbacks · 761b3ef5
      Li Zefan committed
      The argument is not used at all, and it's not necessary, because
      a specific callback handler of course knows which subsys it
      belongs to.
      
      Now only ->populate() takes this argument, because the handlers of
      this callback always call cgroup_add_file()/cgroup_add_files().
      
      So we reduce a few lines of code, though the shrinking of object size
      is minimal.
      
       16 files changed, 113 insertions(+), 162 deletions(-)
      
         text    data     bss      dec    hex filename
      5486240  656987 7039960 13183187 c928d3 vmlinux.o.orig
      5486170  656987 7039960 13183117 c9288d vmlinux.o

      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  8. 21 Jan, 2012 2 commits
  9. 04 Jan, 2012 1 commit
  10. 13 Dec, 2011 2 commits
  11. 03 Nov, 2011 1 commit
    • memcg: replace ss->id_lock with a rwlock · c1e2ee2d
      Andrew Bresticker committed
      While back-porting Johannes Weiner's patch "mm: memcg-aware global
      reclaim" for an internal effort, we noticed a significant performance
      regression during page-reclaim heavy workloads due to high contention of
      the ss->id_lock.  This lock protects idr map, and serializes calls to
      idr_get_next() in css_get_next() (which is used during the memcg hierarchy
      walk).
      
      Since idr_get_next() is just doing a look up, we need only serialize it
      with respect to idr_remove()/idr_get_new().  By making the ss->id_lock a
      rwlock, contention is greatly reduced and performance improves.
      
      Tested: cat a 256m file from a ramdisk in a 128m container 50 times on
      each core (one file + container per core) in parallel on a NUMA machine.
      Result is the time for the test to complete in 1 of the containers.
      Both kernels included Johannes' memcg-aware global reclaim patches.
      
      Before rwlock patch: 1710.778s
      After rwlock patch: 152.227s

      Signed-off-by: Andrew Bresticker <abrestic@google.com>
      Cc: Paul Menage <menage@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 09 Jul, 2011 1 commit
  13. 27 May, 2011 2 commits
    • cgroup: remove the ns_cgroup · a77aea92
      Daniel Lezcano committed
      The ns_cgroup is an annoying cgroup at the namespace / cgroup frontier and
      leads to some problems:
      
        * cgroup creation is out-of-control
        * cgroup name can conflict when pids are looping
        * it is not possible to have a single process handling a lot of
          namespaces without falling into an exponential creation time
        * we may want to create a namespace without creating a cgroup
      
        The ns_cgroup was replaced by a compatibility flag 'clone_children',
        where a newly created cgroup will copy the parent cgroup values.
        The userspace has to manually create a cgroup and add a task to
        the 'tasks' file.
      
      This patch removes the ns_cgroup as suggested in the following thread:
      
      https://lists.linux-foundation.org/pipermail/containers/2009-June/018616.html
      
      The 'cgroup_clone' function is removed because it is no longer used.
      
      This is a userspace-visible change.  Commit 45531757 ("cgroup: notify
      ns_cgroup deprecated") (merged into 2.6.27) caused the kernel to emit a
      printk warning users that the feature is planned for removal.  Since that
      time we have heard from XXX users who were affected by this.

      Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
      Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Jamal Hadi Salim <hadi@cyberus.ca>
      Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Paul Menage <menage@google.com>
      Acked-by: Matt Helsley <matthltc@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroups: add per-thread subsystem callbacks · f780bdb7
      Ben Blum committed
      Add cgroup subsystem callbacks for per-thread attachment in atomic contexts
      
      Add can_attach_task(), pre_attach(), and attach_task() as new callbacks
      for cgroups' subsystem interface.  Unlike can_attach and attach, these
      are for per-thread operations, to be called potentially many times when
      attaching an entire threadgroup.
      
      Also, the old "bool threadgroup" interface is removed, as replaced by
      this.  All subsystems are modified for the new interface - of note is
      cpuset, which requires from/to nodemasks for attach to be globally scoped
      (though per-cpuset would work too) to persist from its pre_attach to
      attach_task and attach.
      
      This is a pre-patch for cgroup-procs-writable.patch.

      Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Reviewed-by: Paul Menage <menage@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 31 Mar, 2011 1 commit
  15. 16 Feb, 2011 2 commits
    • perf: Add cgroup support · e5d1367f
      Stephane Eranian committed
      This kernel patch adds the ability to filter monitoring based on
      container groups (cgroups). This is for use in per-cpu mode only.
      
      The cgroup to monitor is passed as a file descriptor in the pid
      argument to the syscall. The file descriptor must be opened to
      the cgroup name in the cgroup filesystem. For instance, if the
      cgroup name is foo and cgroupfs is mounted in /cgroup, then the
      file descriptor is opened to /cgroup/foo. Cgroup mode is
      activated by passing PERF_FLAG_PID_CGROUP in the flags argument
      to the syscall.
      
      For instance to measure in cgroup foo on CPU1 assuming
      cgroupfs is mounted under /cgroup:
      
      struct perf_event_attr attr;
      int cgroup_fd, fd;
      
      cgroup_fd = open("/cgroup/foo", O_RDONLY);
      fd = perf_event_open(&attr, cgroup_fd, 1, -1, PERF_FLAG_PID_CGROUP);
      close(cgroup_fd);

      Signed-off-by: Stephane Eranian <eranian@google.com>
      [ added perf_cgroup_{exit,attach} ]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4d590250.114ddf0a.689e.4482@mx.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • cgroup: Fix cgroup_subsys::exit callback · d41d5a01
      Peter Zijlstra committed
      Make the ::exit method act like ::attach, it is after all very nearly
      the same thing.
      
      The bug had no effect on correctness - fixing it is an optimization for
      the scheduler. Also, later perf-cgroups patches rely on it.

      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Paul Menage <menage@google.com>
      LKML-Reference: <1297160655.13327.92.camel@laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  16. 02 Nov, 2010 1 commit
  17. 28 Oct, 2010 1 commit
    • cgroup: add clone_children control file · 97978e6d
      Daniel Lezcano committed
      The ns_cgroup is a control group interacting with the namespaces.  When a
      new namespace is created, a corresponding cgroup is automatically created
      too.  The cgroup name is the pid of the process who did 'unshare' or the
      child of 'clone'.
      
      This cgroup is tied to the namespace because it prevents a process from
      escaping the control group, and it uses the post_clone callback so that
      the child cgroup inherits the values of the parent cgroup.

      Unfortunately, the more we use this cgroup, the more problems we face
      with it:

      (1) when a process unshares, the cgroup name may conflict with a
          previous cgroup with the same pid, so unshare or clone returns -EEXIST

      (2) the cgroup creation is out of control because an application may
          create several namespaces, for which the system will automatically
          create several cgroups behind its back and leave them on the
          cgroupfs (e.g. a vrf based on the network namespace).

      (3) the mix of (1) and (2) forces an administrator to regularly check
          and clean these cgroups.
      
      This patchset removes the ns_cgroup by adding a new flag to the cgroup and
      the cgroupfs mount option.  It enables the copy of the parent cgroup when
      a child cgroup is created.  We can then safely remove the ns_cgroup as
      this flag provides compatibility.  We now have to manually create and add
      the task to a cgroup, which is consistent with the cgroup framework.
      
      This patch:
      
      Sent as an answer to a previous thread around the ns_cgroup.
      
      https://lists.linux-foundation.org/pipermail/containers/2009-June/018627.html
      
      It adds a control file 'clone_children' for a cgroup.  This control file
      is a boolean specifying if the child cgroup should be a clone of the
      parent cgroup or not.  The default value is 'false'.
      
      This flag makes the child cgroup call the post_clone callback of every
      subsystem that provides one.

      At present, cpuset is the only subsystem that implements the
      post_clone callback.
      
      The option can be set at mount time by specifying the 'clone_children'
      mount option.

      Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
      Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Acked-by: Paul Menage <menage@google.com>
      Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Jamal Hadi Salim <hadi@cyberus.ca>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 10 Sep, 2010 1 commit
  19. 05 Sep, 2010 1 commit
    • cgroups: fix API thinko · 73457f0f
      Michael S. Tsirkin committed
      The cgroup_attach_task_current_cg API that we have upstream is backwards:
      we really need an API to attach to the cgroups from another process A to
      the current one.

      In our case (vhost), a privileged user wants to attach its task to cgroups
      from a less privileged one; the API makes us run it in the other
      task's context, and this fails.
      
      So let's make the API generic and just pass in 'from' and 'to' tasks.
      Add an inline wrapper for cgroup_attach_task_current_cg to avoid
      breaking bisect.

      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Paul Menage <menage@google.com>
  20. 20 Aug, 2010 1 commit
  21. 28 Jul, 2010 1 commit
  22. 09 Jun, 2010 1 commit
    • sched: Fix PROVE_RCU vs cpu_cgroup · dc61b1d6
      Peter Zijlstra committed
      PROVE_RCU has a few issues with the cpu_cgroup because the scheduler
      typically holds rq->lock around the css rcu derefs but the generic
      cgroup code doesn't (and can't) know about that lock.
      
      Provide means to add extra checks to the css dereference and use that
      in the scheduler to annotate its users.
      
      The addition of rq->lock to these checks is correct because the
      cgroup_subsys::attach() method takes the rq->lock for each task it
      moves, therefore by holding that lock, we ensure the task is pinned to
      the current cgroup and the RCU dereference is valid.
      
      That leaves one genuine race in __sched_setscheduler() where we used
      task_group() without holding any of the required locks and thus raced
      with the cgroup code. Solve this by moving the check under the
      appropriate lock.

      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  23. 28 May, 2010 1 commit
  24. 05 May, 2010 1 commit
  25. 13 Mar, 2010 6 commits
    • cgroups: remove events before destroying subsystem state objects · a0a4db54
      Kirill A. Shutemov committed
      Events should be removed after rmdir of cgroup directory, but before
      destroying subsystem state objects.  Let's take reference to cgroup
      directory dentry to do that.

      Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hioryu@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Dan Malek <dan@embeddedalley.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroup: implement eventfd-based generic API for notifications · 0dea1168
      Kirill A. Shutemov committed
      This patchset introduces eventfd-based API for notifications in cgroups
      and implements memory notifications on top of it.
      
      It uses statistics in the memory controller to track memory usage.
      
      Output of time(1) on building kernel on tmpfs:
      
      Root cgroup before changes:
      	make -j2  506.37 user 60.93s system 193% cpu 4:52.77 total
      Non-root cgroup before changes:
      	make -j2  507.14 user 62.66s system 193% cpu 4:54.74 total
      Root cgroup after changes (0 thresholds):
      	make -j2  507.13 user 62.20s system 193% cpu 4:53.55 total
      Non-root cgroup after changes (0 thresholds):
      	make -j2  507.70 user 64.20s system 193% cpu 4:55.70 total
      Root cgroup after changes (1 thresholds, never crossed):
      	make -j2  506.97 user 62.20s system 193% cpu 4:53.90 total
      Non-root cgroup after changes (1 thresholds, never crossed):
      	make -j2  507.55 user 64.08s system 193% cpu 4:55.63 total
      
      This patch:
      
      Introduce the write-only file "cgroup.event_control" in every cgroup.
      
      To register new notification handler you need:
      - create an eventfd;
      - open a control file to be monitored. Callbacks register_event() and
        unregister_event() must be defined for the control file;
      - write "<event_fd> <control_fd> <args>" to cgroup.event_control.
        Interpretation of args is defined by control file implementation;
      
      eventfd will be woken up by control file implementation or when the
      cgroup is removed.
      
      To unregister notification handler just close eventfd.
      
      If you need notification functionality for a control file you have to
      implement callbacks register_event() and unregister_event() in the
      struct cftype.
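
The registration step listed above can be sketched in userspace as follows. The helper name `register_cgroup_event()` is hypothetical, and the caller is assumed to have already created the eventfd and opened both the control file and cgroup.event_control.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Write "<event_fd> <control_fd> <args>" to an already opened
 * cgroup.event_control descriptor. Returns 0 on success, -1 on failure.
 */
static int register_cgroup_event(int event_control_fd, int event_fd,
				 int control_fd, const char *args)
{
	char line[128];
	int n;

	n = snprintf(line, sizeof(line), "%d %d %s", event_fd, control_fd, args);
	if (n < 0 || (size_t)n >= sizeof(line))
		return -1;	/* formatting failed or args too long */

	/* The whole registration is sent in a single write. */
	if (write(event_control_fd, line, (size_t)n) != (ssize_t)n)
		return -1;
	return 0;
}
```

A caller would typically obtain `event_fd` from `eventfd(0, 0)`, open the monitored control file read-only, open cgroup.event_control write-only, and then block in `read(event_fd, ...)` on an 8-byte counter until the event fires. How `args` is interpreted depends on the control file, as the message notes.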
      
      [kamezawa.hiroyu@jp.fujitsu.com: Kconfig fix]
      Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Paul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Dan Malek <dan@embeddedalley.com>
      Cc: Vladislav Buzov <vbuzov@embeddedalley.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Alexander Shishkin <virtuoso@slind.org>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroups: subsystem module unloading · cf5d5941
      Ben Blum committed
      Provides support for unloading modular subsystems.
      
      This patch adds a new function cgroup_unload_subsys which is to be used
      for removing a loaded subsystem during module deletion.  Reference
      counting of the subsystems' modules is moved from once (at load time) to
      once per attached hierarchy (in parse_cgroupfs_options and
      rebind_subsystems) (i.e., 0 or 1).

      Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroups: subsystem module loading interface · e6a1105b
      Ben Blum committed
      Add interface between cgroups subsystem management and module loading
      
      This patch implements rudimentary module-loading support for cgroups -
      namely, a cgroup_load_subsys (similar to cgroup_init_subsys) for use as a
      module initcall, and a struct module pointer in struct cgroup_subsys.
      
      Several functions that might be wanted by modules have had EXPORT_SYMBOL
      added to them, but it's unclear exactly which functions want it and which
      won't.
      Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
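      As a rough illustration of the shape this commit describes (a load
      function meant to be called from a module's init routine, plus an owner
      pointer stored in the subsystem structure), here is a user-space
      sketch. Every identifier in it (ld_module, ld_subsys, ld_load_subsys,
      ld_module_init) is hypothetical, and the validation rules are
      assumptions chosen for the example, not the kernel's actual checks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct module and struct cgroup_subsys. */
struct ld_module {
    const char *name;
};

struct ld_subsys {
    const char *name;
    int (*create)(void);        /* assumed mandatory setup callback */
    struct ld_module *module;   /* owner of a modular subsystem */
    int loaded;
};

/* Validate and register a subsystem; returns 0 or a negative errno. */
static int ld_load_subsys(struct ld_subsys *ss)
{
    if (ss == NULL || ss->name == NULL || ss->create == NULL)
        return -EINVAL;         /* incomplete description */
    if (ss->loaded)
        return -EEXIST;         /* already registered */
    ss->loaded = 1;
    return 0;
}

static int ld_create(void)
{
    return 0;
}

static struct ld_module ld_this_module = { "ld_cls" };

static struct ld_subsys ld_cls_subsys = {
    .name = "ld_cls",
    .create = ld_create,
    .module = &ld_this_module,
};

/* What a module's init routine might do: load, and pass on any error. */
static int ld_module_init(void)
{
    return ld_load_subsys(&ld_cls_subsys);
}
```

      Returning the load function's result directly from the init routine
      means a failed registration also fails the module load.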
    • B
      cgroups: revamp subsys array · aae8aab4
      Ben Blum authored
      This patch series provides the ability for cgroup subsystems to be
      compiled as modules both within and outside the kernel tree.  This is
      mainly useful for classifiers and subsystems that hook into components
      that are already modules.  cls_cgroup and blkio-cgroup serve as the
      example use cases for this feature.
      
      It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
      which modular subsystems can use to register and depart during runtime.
      The net_cls classifier subsystem serves as the example for a subsystem
      which can be converted into a module using these changes.
      
      Patch #1 sets up the subsys[] array so its contents can be dynamic as
      modules appear and (eventually) disappear.  Iterations over the array are
      modified to handle when subsystems are absent, and the dynamic section of
      the array is protected by cgroup_mutex.
      
      Patch #2 implements an interface for modules to load subsystems, called
      cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
      pointer in struct cgroup_subsys.
      
      Patch #3 adds a mechanism for unloading modular subsystems, which includes
      a more advanced rework of the rudimentary reference counting introduced in
      patch 2.
      
      Patch #4 modifies the net_cls subsystem, which already had some module
      declarations, to be configurable as a module, which also serves as a
      simple proof-of-concept.
      
      Part of implementing patches 2 and 4 involved updating css pointers in
      each css_set when the module appears or leaves.  In doing this, it was
      discovered that css_sets always remain linked to the dummy cgroup,
      regardless of whether or not any subsystems are actually bound to it
      (i.e., not mounted on an actual hierarchy).  The subsystem loading and
      unloading code therefore should keep in mind the special cases where the
      added subsystem is the only one in the dummy cgroup (and therefore all
      css_sets need to be linked back into it) and where the removed subsys was
      the only one in the dummy cgroup (and therefore all css_sets should be
      unlinked from it) - however, as all css_sets always stay attached to the
      dummy cgroup anyway, these cases are ignored.  Any fix that addresses this
      issue should also make sure these cases are addressed in the subsystem
      loading and unloading code.
      
      This patch:
      
      Make subsys[] able to be dynamically populated to support modular
      subsystems
      
      This patch reworks the way the subsys[] array is used so that subsystems
      can register themselves after boot time, and lets the cgroups internals
      cope with subsystems that are absent or that may appear or disappear at
      runtime.
      Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
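      The idea of a subsys[] array whose tail can be populated and emptied at
      runtime can be sketched in user space as follows. This is an assumed
      model rather than the kernel implementation: the tbl_ names and slot
      counts are hypothetical, and a pthread mutex stands in for
      cgroup_mutex. It shows the two points the commit message makes:
      iterations skip empty slots, and changes to the dynamic section happen
      under the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical sizes; not the kernel's values. */
#define TBL_BUILTIN_COUNT 2     /* slots filled at "boot" */
#define TBL_SUBSYS_COUNT  4     /* builtin plus dynamic slots */

struct tbl_subsys {
    const char *name;
};

static struct tbl_subsys tbl_cpu = { "cpu" };
static struct tbl_subsys tbl_mem = { "memory" };

/* Builtin entries sit at the front; dynamic slots start out NULL. */
static struct tbl_subsys *tbl_table[TBL_SUBSYS_COUNT] = {
    &tbl_cpu, &tbl_mem,
};

/* Stand-in for cgroup_mutex: guards the dynamic part of the table. */
static pthread_mutex_t tbl_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Register into the first free dynamic slot; return its index or -1. */
static int tbl_register(struct tbl_subsys *ss)
{
    int id = -1;

    pthread_mutex_lock(&tbl_mutex);
    for (int i = TBL_BUILTIN_COUNT; i < TBL_SUBSYS_COUNT; i++) {
        if (tbl_table[i] == NULL) {
            tbl_table[i] = ss;
            id = i;
            break;
        }
    }
    pthread_mutex_unlock(&tbl_mutex);
    return id;
}

/* Clear a dynamic slot; builtin slots are never removed. */
static void tbl_unregister(int id)
{
    assert(id >= TBL_BUILTIN_COUNT && id < TBL_SUBSYS_COUNT);
    pthread_mutex_lock(&tbl_mutex);
    tbl_table[id] = NULL;
    pthread_mutex_unlock(&tbl_mutex);
}

/* Walk every slot, skipping those where no subsystem is present. */
static int tbl_count_present(void)
{
    int n = 0;

    pthread_mutex_lock(&tbl_mutex);
    for (int i = 0; i < TBL_SUBSYS_COUNT; i++) {
        if (tbl_table[i] == NULL)
            continue;           /* subsystem absent: nothing to visit */
        n++;
    }
    pthread_mutex_unlock(&tbl_mutex);
    return n;
}
```

      Registering a third subsystem takes the first free dynamic slot, and
      unregistering it leaves a NULL that later iterations skip.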
    • D
      cgroup: introduce coalesce css_get() and css_put() · d7b9fff7
      Daisuke Nishimura authored
      Currently, css_get() and css_put() increment/decrement css->refcnt one
      by one.
      
      This patch adds a new function, __css_get(), which takes "count" as an
      argument and increments css->refcnt by "count".  It also adds a new
      "count" argument to __css_put() and changes that function to decrement
      css->refcnt by "count".
      
      These coalesced versions of __css_get()/__css_put() will be used later
      to improve the performance of memcg's move-charge feature, replacing
      repeated calls to css_get()/css_put().
      
      No change is needed for current users of css_get()/css_put().
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Acked-by: Paul Menage <menage@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
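      The coalescing idea in this commit, one atomic update of the reference
      count by "count" instead of "count" separate updates, can be sketched
      in user space with C11 atomics. The names below (demo_css,
      demo_css_get_many, demo_css_put_many) are hypothetical and the kernel's
      own refcount handling differs; the sketch only shows the batching.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for an object with a shared reference count. */
struct demo_css {
    atomic_int refcnt;
};

/* Take `count` references with a single atomic add. */
static void demo_css_get_many(struct demo_css *css, int count)
{
    assert(count > 0);
    atomic_fetch_add(&css->refcnt, count);
}

/* Drop `count` references at once; returns the count that remains. */
static int demo_css_put_many(struct demo_css *css, int count)
{
    int old;

    assert(count > 0);
    old = atomic_fetch_sub(&css->refcnt, count);
    assert(old >= count);       /* never drop more than was held */
    return old - count;
}

/* Single-reference wrappers, so existing callers need not change. */
static void demo_css_get(struct demo_css *css)
{
    demo_css_get_many(css, 1);
}

static int demo_css_put(struct demo_css *css)
{
    return demo_css_put_many(css, 1);
}
```

      A caller that moves a batch of 32 charges would call
      demo_css_get_many(css, 32) once rather than demo_css_get(css) 32 times,
      which reduces the number of atomic operations on a contended counter.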