1. 14 Jun, 2013: 7 commits
    • cgroup: remove cgroup->count and use · 6f3d828f
      Committed by Tejun Heo
      cgroup->count tracks the number of css_sets associated with the cgroup
      and is used only to verify that no css_set is associated when the cgroup
      is being destroyed.  It's superfluous as the destruction path can
      simply check whether cgroup->cset_links is empty instead.
      
      Drop cgroup->count and check ->cset_links directly from
      cgroup_destroy_locked().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: drop unnecessary RCU dancing from __put_css_set() · ddd69148
      Committed by Tejun Heo
      __put_css_set() does RCU read access on @cgrp across dropping
      @cgrp->count so that it can continue accessing @cgrp even if the count
      reached zero and destruction of the cgroup commenced.  Given that both
      sides - __css_put() and cgroup_destroy_locked() - are cold paths, this
      is unnecessary.  Just making cgroup_destroy_locked() grab css_set_lock
      while checking @cgrp->count is enough.
      
      Remove the RCU read locking from __put_css_set() and make
      cgroup_destroy_locked() read-lock css_set_lock when checking
      @cgrp->count.  This will also allow removing @cgrp->count.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: rename CGRP_REMOVED to CGRP_DEAD · 54766d4a
      Committed by Tejun Heo
      We will add another flag indicating that the cgroup is in the process
      of being killed.  REMOVING / REMOVED is more difficult to distinguish
      and cgroup_is_removing()/cgroup_is_removed() are a bit awkward.  Also,
      later percpu_ref usage will involve "kill"ing the refcnt.
      
       s/CGRP_REMOVED/CGRP_DEAD/
       s/cgroup_is_removed()/cgroup_is_dead()/
      
      This patch is purely cosmetic.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: use kzalloc() instead of kmalloc() · f4f4be2b
      Committed by Tejun Heo
      There's no point in using kmalloc() instead of the clearing variant
      for trivial stuff.  We can live dangerously elsewhere.  Use kzalloc()
      instead and drop 0 inits.
      
      While at it, do trivial code reorganization in cgroup_file_open().
      
      This patch doesn't introduce any functional changes.
      
      v2: I was caught in the very distant past where list_del() didn't
          poison and the initial version converted list_del()s to
          list_del_init()s too.  Li and Kent took me out of the stasis
          chamber.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <koverstreet@google.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: bring some sanity to naming around cg_cgroup_link · 69d0206c
      Committed by Tejun Heo
      cgroups and css_sets are mapped M:N and this M:N mapping is
      represented by struct cg_cgroup_link which forms linked lists on both
      sides.  The naming around this mapping is already confusing and struct
      cg_cgroup_link exacerbates the situation quite a bit.
      
      From the cgroup side, it starts off ->css_sets and runs through
      ->cgrp_link_list.  From the css_set side, it starts off ->cg_links and
      runs through ->cg_link_list.  This is rather reversed as
      cgrp_link_list is used to iterate css_sets and cg_link_list cgroups.
      Also, this is the only place which is still using the confusing "cg"
      for css_sets.  This patch cleans it up a bit.
      
      * s/cgroup->css_sets/cgroup->cset_links/
        s/css_set->cg_links/css_set->cgrp_links/
        s/cgroup_iter->cg_link/cgroup_iter->cset_link/
      
      * s/cg_cgroup_link/cgrp_cset_link/
      
      * s/cgrp_cset_link->cg/cgrp_cset_link->cset/
        s/cgrp_cset_link->cgrp_link_list/cgrp_cset_link->cset_link/
        s/cgrp_cset_link->cg_link_list/cgrp_cset_link->cgrp_link/
      
      * s/init_css_set_link/init_cgrp_cset_link/
        s/free_cg_links/free_cgrp_cset_links/
        s/allocate_cg_links/allocate_cgrp_cset_links/
      
      * s/cgl[12]/link[12]/ in compare_css_sets()
      
      * s/saved_link/tmp_link/ s/tmp/tmp_links/ and a couple of similar
        adjustments.
      
      * Comment and whiteline adjustments.
      
      After the changes, we have
      
      	list_for_each_entry(link, &cont->cset_links, cset_link) {
      		struct css_set *cset = link->cset;
      
      instead of
      
      	list_for_each_entry(link, &cont->css_sets, cgrp_link_list) {
      		struct css_set *cset = link->cg;
      
      This patch is purely cosmetic.
      
      v2: Fix broken sentences in the patch description.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
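      The M:N mapping described in this commit can be illustrated with a
      minimal userspace sketch (not kernel code): one link object per
      (cgroup, css_set) pair, threaded onto both sides' lists under the new
      names.  The tiny list helpers are simplified stand-ins for
      <linux/list.h>, and the link's cgrp back-pointer is an assumption of
      this sketch rather than something the commit states.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, loosely modelled on <linux/list.h>. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *head) { head->next = head->prev = head; }

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

struct cgroup  { struct list_head cset_links; };  /* chains link->cset_link */
struct css_set { struct list_head cgrp_links; };  /* chains link->cgrp_link */

/* One link object per (cgroup, css_set) pair; it sits on both lists. */
struct cgrp_cset_link {
	struct cgroup *cgrp;         /* assumed back-pointer for this sketch */
	struct css_set *cset;
	struct list_head cset_link;  /* entry on cgroup->cset_links */
	struct list_head cgrp_link;  /* entry on css_set->cgrp_links */
};

static void link_cgrp_cset(struct cgrp_cset_link *link,
			   struct cgroup *cgrp, struct css_set *cset)
{
	link->cgrp = cgrp;
	link->cset = cset;
	list_add_tail(&link->cset_link, &cgrp->cset_links);
	list_add_tail(&link->cgrp_link, &cset->cgrp_links);
}

/* Walk cgroup->cset_links: each entry yields one css_set via link->cset. */
static int count_csets(struct cgroup *cgrp)
{
	int n = 0;

	for (struct list_head *p = cgrp->cset_links.next;
	     p != &cgrp->cset_links; p = p->next) {
		struct cgrp_cset_link *link =
			container_of(p, struct cgrp_cset_link, cset_link);
		assert(link->cgrp == cgrp && link->cset != NULL);
		n++;
	}
	return n;
}

/* Walk css_set->cgrp_links: each entry yields one cgroup via link->cgrp. */
static int count_cgroups(struct css_set *cset)
{
	int n = 0;

	for (struct list_head *p = cset->cgrp_links.next;
	     p != &cset->cgrp_links; p = p->next) {
		struct cgrp_cset_link *link =
			container_of(p, struct cgrp_cset_link, cgrp_link);
		assert(link->cset == cset && link->cgrp != NULL);
		n++;
	}
	return n;
}
```

      With the new naming, the list head and the member it chains through
      read as a matching pair (cset_links / cset_link, cgrp_links /
      cgrp_link), which is the point of the rename.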
    • cgroup: consistently use @cset for struct css_set variables · 5abb8855
      Committed by Tejun Heo
      cgroup.c uses @cg for most struct css_set variables, which in itself
      could be a bit confusing, but made much worse by the fact that there
      are places which use @cg for struct cgroup variables.
      compare_css_sets() epitomizes this confusion - @[old_]cg are struct
      css_set while @cg[12] are struct cgroup.
      
      It's not like the whole deal with cgroup, css_set and cg_cgroup_link
      isn't already confusing enough.  Let's give it some sanity by
      uniformly using @cset for all struct css_set variables.
      
      * s/cg/cset/ for all css_set variables.
      
      * s/oldcg/old_cset/ s/oldcgrp/old_cgrp/.  The same for the ones
        prefixed with "new".
      
      * s/cg/cgrp/ for cgroup variables in compare_css_sets().
      
      * s/css/cset/ for the cgroup variable in task_cgroup_from_root().
      
      * Whiteline adjustments.
      
      This patch is purely cosmetic.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: remove now unused css_depth() · 3fc3db9a
      Committed by Tejun Heo
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
  2. 06 Jun, 2013: 3 commits
    • cgroup: clean up the cftype array for the base cgroup files · d5c56ced
      Committed by Tejun Heo
      * Rename it from files[] (really?) to cgroup_base_files[].
      
      * Drop CGROUP_FILE_GENERIC_PREFIX which was defined as "cgroup." and
        used inconsistently.  Just use "cgroup." directly.
      
      * Collect insane files at the end.  Note that only the insane ones are
        missing "cgroup." prefix.
      
      This patch doesn't introduce any functional changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: mark "notify_on_release" and "release_agent" cgroup files insane · cc5943a7
      Committed by Tejun Heo
      The empty cgroup notification mechanism currently implemented in
      cgroup is tragically outdated.  Forking and execing a userland process
      stopped being a viable notification mechanism more than a decade ago.
      We're gonna have a saner mechanism.  Let's make it clear that this
      abomination is going away.
      
      Mark "notify_on_release" and "release_agent" with CFTYPE_INSANE.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: mark "tasks" cgroup file as insane · f12dc020
      Committed by Tejun Heo
      Some resources controlled by cgroup aren't per-task and cgroup core
      allowing threads of a single thread_group to be in different cgroups
      forced memcg to explicitly find the group leader and use it.  This is
      gonna be nasty when transitioning to unified hierarchy and in general
      we don't want and won't support granularity finer than processes.
      
      Mark "tasks" with CFTYPE_INSANE.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: cgroups@vger.kernel.org
      Cc: Vivek Goyal <vgoyal@redhat.com>
  3. 24 May, 2013: 4 commits
    • cgroup: update iterators to use cgroup_next_sibling() · 75501a6d
      Committed by Tejun Heo
      This patch converts cgroup_for_each_child(),
      cgroup_next_descendant_pre/post() and thus
      cgroup_for_each_descendant_pre/post() to use cgroup_next_sibling()
      instead of manually dereferencing ->sibling.next.
      
      The only reason the iterators couldn't allow dropping RCU read lock
      while iteration is in progress was because they couldn't determine the
      next sibling safely once RCU read lock is dropped.  Using
      cgroup_next_sibling() removes that problem and enables all iterators
      to allow dropping RCU read lock in the middle.  Comments are updated
      accordingly.
      
      This makes the iterators easier to use and will simplify controllers.
      
      Note that the @cgroup argument is renamed to @cgrp in
      cgroup_for_each_child() because it conflicts with "struct cgroup" used
      in the new macro body.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
    • cgroup: add cgroup->serial_nr and implement cgroup_next_sibling() · 53fa5261
      Committed by Tejun Heo
      Currently, there's no easy way to find out the next sibling cgroup
      unless it's known that the current cgroup is accessed from the
      parent's children list in a single RCU critical section.  This in turn
      forces all iterators to require whole iteration to be enclosed in a
      single RCU critical section, which sometimes is too restrictive.  This
      patch implements cgroup_next_sibling() which can reliably determine
      the next sibling regardless of the state of the current cgroup as long
      as it's accessible.
      
      It currently is impossible to determine the next sibling after
      dropping RCU read lock because the cgroup being iterated could be
      removed at any time and, if the RCU read lock is dropped, nothing guarantees
      its ->sibling.next pointer is accessible.  A removed cgroup would
      continue to point to its next sibling for RCU accesses but stop
      receiving updates from the sibling.  IOW, the next sibling could be
      removed and then complete its grace period while RCU read lock is
      dropped, making it unsafe to dereference ->sibling.next after dropping
      and re-acquiring RCU read lock.
      
      This can be solved by adding a way to traverse to the next sibling
      without dereferencing ->sibling.next.  This patch adds a monotonically
      increasing cgroup serial number, cgroup->serial_nr, which guarantees
      that all cgroup->children lists are kept in increasing serial_nr
      order.  A new function, cgroup_next_sibling(), is implemented, which,
      if CGRP_REMOVED is not set on the current cgroup, follows
      ->sibling.next; otherwise, traverses the parent's ->children list
      until it sees a sibling with higher ->serial_nr.
      
      This allows the function to always return the next sibling regardless
      of the state of the current cgroup without adding overhead in the fast
      path.
      
      Further patches will update the iterators to use cgroup_next_sibling()
      so that they allow dropping RCU read lock and blocking while iteration
      is in progress which in turn will be used to simplify controllers.
      
      v2: Typo fix as per Serge.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
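      The serial_nr idea above can be sketched in userspace C as follows.
      This is a simplified model, not the kernel implementation: RCU is
      ignored, children sit on a singly linked list, and a "dead" flag plays
      the role of CGRP_REMOVED.  All ex_* names are hypothetical.  A removed
      node keeps its old (stale) next pointer, so the slow path rescans the
      parent's list for the first sibling with a higher serial number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cgroup, reduced to what the walk needs. */
struct ex_cg {
	unsigned long long serial_nr;  /* monotonically increasing */
	bool dead;                     /* plays the role of CGRP_REMOVED */
	struct ex_cg *parent;
	struct ex_cg *first_child;     /* children kept in serial_nr order */
	struct ex_cg *next_sibling;    /* left stale once @dead is set */
};

static unsigned long long ex_serial_counter;

/* Appending with a fresh serial number keeps the children list sorted. */
static void ex_add_child(struct ex_cg *parent, struct ex_cg *child)
{
	struct ex_cg **pp = &parent->first_child;

	child->serial_nr = ++ex_serial_counter;
	child->dead = false;
	child->parent = parent;
	child->first_child = NULL;
	child->next_sibling = NULL;
	while (*pp)
		pp = &(*pp)->next_sibling;
	*pp = child;
}

/* Unlink @child but leave its own ->next_sibling untouched (stale). */
static void ex_remove_child(struct ex_cg *child)
{
	struct ex_cg **pp = &child->parent->first_child;

	assert(!child->dead);
	while (*pp != child)
		pp = &(*pp)->next_sibling;
	*pp = child->next_sibling;
	child->dead = true;
}

static struct ex_cg *ex_next_sibling(struct ex_cg *pos)
{
	struct ex_cg *c;

	if (!pos->dead)
		return pos->next_sibling;  /* fast path: pointer is current */

	/* slow path: first live sibling with a higher serial number */
	for (c = pos->parent->first_child; c; c = c->next_sibling)
		if (c->serial_nr > pos->serial_nr)
			return c;
	return NULL;
}
```

      For example, with children a, b, c created in that order, removing a
      and then b leaves a's next pointer aimed at b, which is dead, yet the
      serial_nr scan from a still lands on c.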
    • cgroup: make cgroup_is_removed() static · bdc7119f
      Committed by Tejun Heo
      cgroup_is_removed() no longer has external users and it shouldn't grow
      any - controllers should deal with cgroup_subsys_state on/offline
      state instead of cgroup removal state.  Make it static.
      
      While at it, make it return bool.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: fix a subtle bug in descendant pre-order walk · 7805d000
      Committed by Tejun Heo
      When cgroup_next_descendant_pre() initiates a walk, it checks whether
      the subtree root has any children and, if not, returns NULL.
      Later code assumes that the subtree isn't empty.  This is broken
      because the subtree may become empty in between, which can lead to the
      traversal escaping the subtree by walking to the sibling of the
      subtree root.
      
      There's no reason to have the early exit path.  Remove it along with
      the later assumption that the subtree isn't empty.  This simplifies
      the code a bit and fixes the subtle bug.
      
      While at it, fix the comment of cgroup_for_each_descendant_pre() which
      was incorrectly referring to ->css_offline() instead of
      ->css_online().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Cc: stable@vger.kernel.org
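      The fixed walk can be sketched in userspace C as below, assuming a
      plain first-child / next-sibling tree (RCU and the real struct cgroup
      are left out; the ex_* names are hypothetical).  There is no early
      "root has no children" exit: an empty subtree simply falls through to
      the ancestor loop, and that loop stops at the subtree root, so the
      walk can never step to the root's own sibling.

```c
#include <assert.h>
#include <stddef.h>

struct ex_node {
	struct ex_node *parent;
	struct ex_node *first_child;
	struct ex_node *next_sibling;
};

/*
 * Return the node after @pos in a pre-order walk of the subtree rooted at
 * @root (which itself is not visited), or NULL when the walk is done.
 * Pass pos == NULL to start the walk.
 */
static struct ex_node *ex_next_descendant_pre(struct ex_node *pos,
					      struct ex_node *root)
{
	assert(root != NULL);
	if (!pos)
		pos = root;               /* pretend @root was just visited */

	if (pos->first_child)
		return pos->first_child;  /* descend first */

	/* no child: next sibling of @pos or of the closest ancestor */
	while (pos != root) {             /* never climb past @root */
		if (pos->next_sibling)
			return pos->next_sibling;
		pos = pos->parent;
	}
	return NULL;
}
```

      With root r having children a and b, and a having child a1, walking r
      visits a, a1, b; walking the subtree of a visits only a1 and stops
      instead of escaping to b.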
  4. 15 May, 2013: 4 commits
    • cgroup: implement task_cgroup_path_from_hierarchy() · 857a2beb
      Committed by Tejun Heo
      kdbus folks want a sane way to determine the cgroup path that a given
      task belongs to on a given hierarchy, which is a reasonable thing to
      expect from cgroup core.
      
      Implement task_cgroup_path_from_hierarchy().
      
      v2: Dropped unnecessary NULL check on the return value of
          task_cgroup_from_root() as suggested by Li Zefan.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Greg Kroah-Hartman <greg@kroah.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Lennart Poettering <lennart@poettering.net>
      Cc: Daniel Mack <daniel@zonque.org>
    • cgroup: make hierarchy_id use cyclic idr · 1a574231
      Committed by Tejun Heo
      We want to be able to look up a hierarchy from its id, and cyclic
      allocation is a whole lot simpler with idr.  Convert to idr and use
      idr_alloc_cyclic().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
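      What "cyclic" allocation means can be shown with a toy userspace
      allocator.  This is only an illustration of the policy and is not the
      idr API: it uses a plain bool array, a hypothetical ID range of 1..7,
      and hypothetical ex_* names.  Each allocation starts searching just
      past the last ID handed out and wraps around once, so a freed ID is
      not reused until the cursor comes back round to it.

```c
#include <assert.h>
#include <stdbool.h>

#define EX_ID_LIMIT 8  /* hypothetical: valid IDs are 1 .. EX_ID_LIMIT - 1 */

struct ex_ids {
	bool used[EX_ID_LIMIT];
	int next;  /* where the next search starts; 0 means "from 1" */
};

/* Take the first free ID at or after ->next, wrapping at most once. */
static int ex_alloc_cyclic(struct ex_ids *ids)
{
	const int span = EX_ID_LIMIT - 1;  /* ID 0 is never used here */
	int start = ids->next < 1 ? 1 : ids->next;

	for (int i = 0; i < span; i++) {
		int id = 1 + (start - 1 + i) % span;

		if (!ids->used[id]) {
			ids->used[id] = true;
			ids->next = id + 1;  /* may equal EX_ID_LIMIT; wraps above */
			return id;
		}
	}
	return -1;  /* every ID is in use */
}

static void ex_free_id(struct ex_ids *ids, int id)
{
	assert(id > 0 && id < EX_ID_LIMIT && ids->used[id]);
	ids->used[id] = false;
}
```

      Allocating 1 and 2, freeing 1, and allocating again yields 3, not 1;
      ID 1 comes back only after the cursor has passed the top of the range.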
    • cgroup: drop hierarchy_id_lock · 54e7b4eb
      Committed by Tejun Heo
      Now that hierarchy_id alloc / free are protected by the cgroup
      mutexes, there's no need for this separate lock.  Drop it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: refactor hierarchy_id handling · fa3ca07e
      Committed by Tejun Heo
      We're planning to convert hierarchy_ida to an idr and use it to
      look up hierarchy from its id.  As we want the mapping to happen
      atomically with cgroupfs_root registration, this patch refactors
      hierarchy_id init / exit so that ida operations happen inside
      cgroup_[root_]mutex.
      
      * s/init_root_id()/cgroup_init_root_id()/ and make it return 0 or
        -errno like a normal function.
      
      * Move hierarchy_id initialization from cgroup_root_from_opts() into
        cgroup_mount() block where the root is confirmed to be used and
        being registered while holding both mutexes.
      
      * Split cgroup_drop_id() into cgroup_exit_root_id() and
        cgroup_free_root(), so that ID release can happen before dropping
        the mutexes in cgroup_kill_sb().  The latter expects hierarchy_id to
        be exited before being invoked.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
  5. 14 May, 2013: 1 commit
  6. 02 May, 2013: 1 commit
  7. 30 Apr, 2013: 1 commit
  8. 27 Apr, 2013: 2 commits
  9. 19 Apr, 2013: 1 commit
    • cgroup: fix broken file xattrs · 712317ad
      Committed by Li Zefan
      We should store file xattrs in struct cfent instead of struct cftype,
      because cftype is a type while cfent is an object instance of cftype.
      
      For example, each cgroup has a tasks file, and each tasks file is
      associated with a unique cfent, but all those files share the same
      struct cftype.
      
      Alexey Kodanev reported a crash, which can be reproduced as follows:
      
        # mount -t cgroup -o xattr /sys/fs/cgroup
        # mkdir /sys/fs/cgroup/test
        # setfattr -n trusted.value -v test_value /sys/fs/cgroup/tasks
        # rmdir /sys/fs/cgroup/test
        # umount /sys/fs/cgroup
        oops!
      
      In this case, simple_xattrs_free() will free the same struct simple_xattrs
      twice.
      
      tj: Dropped unused local variable @cft from cgroup_diput().
      
      Cc: <stable@vger.kernel.org> # 3.8.x
      Reported-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  10. 15 Apr, 2013: 5 commits
    • cgroup: remove cgrp->top_cgroup · 05fb22ec
      Committed by Li Zefan
      It's not used, and it can be retrieved via cgrp->root->top_cgroup.
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: introduce sane_behavior mount option · 873fe09e
      Committed by Tejun Heo
      It's a sad fact that at this point various cgroup controllers are
      carrying so many idiosyncrasies and pure insanities that it simply
      isn't possible to reach any sort of sane, consistent behavior while
      staying fully compatible with what has already been exposed to
      userland.
      
      As we can't break exposed userland interface, transitioning to sane
      behaviors can only be done in steps while maintaining backwards
      compatibility.  This patch introduces a new mount option -
      __DEVEL__sane_behavior - which disables crazy features and enforces
      consistent behaviors in cgroup core proper and various controllers.
      As exactly which behaviors it changes is still being determined, the
      mount option, at this point, is useful only for development of the new
      behaviors.  As such, the mount option is prefixed with __DEVEL__ and
      generates a warning message when used.
      
      Eventually, once we get to the point where all controllers' behaviors
      are consistent enough to implement unified hierarchy, the __DEVEL__
      prefix will be dropped, and more importantly, unified-hierarchy will
      enforce sane_behavior by default.  Maybe we'll be able to completely drop
      the crazy stuff after a while, maybe not, but we at least have a
      strategy to move on to saner behaviors.
      
      This patch introduces the mount option and changes the following
      behaviors in cgroup core.
      
      * Mount options "noprefix" and "clone_children" are disallowed.  Also,
        cgroupfs file cgroup.clone_children is not created.
      
      * When mounting an existing superblock, mount options should match.
        This is currently pretty crazy.  If one mounts a cgroup, creates a
        subdirectory, unmounts it and then mounts it again with different
        options, it looks like the new options are applied but they aren't.
      
      * Remount is disallowed.
      
      The behavior changes are documented in the comment above
      CGRP_ROOT_SANE_BEHAVIOR enum and will be expanded as different
      controllers are converted and planned improvements progress.
      
      v2: Dropped unnecessary explicit file permission setting on the
          sane_behavior cftype entry as suggested by Li Zefan.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vivek Goyal <vgoyal@redhat.com>
    • move cgroupfs_root to include/linux/cgroup.h · 25a7e684
      Committed by Tejun Heo
      While controllers shouldn't be accessing cgroupfs_root directly, it
      being hidden inside kernel/cgroup.c makes some things pretty silly.  This
      makes routing hierarchy-wide settings which need to be visible to
      controllers cumbersome.
      
      We're gonna add another hierarchy-wide setting which needs to be
      accessed from controllers.  Move cgroupfs_root and its flags to the
      header file so that we can access root settings with inline helpers.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: convert cgroupfs_root flag bits to masks and add CGRP_ prefix · 93438629
      Committed by Tejun Heo
      There's no reason to be using bitops, which tends to be more
      cumbersome, to handle root flags.  Convert them to masks.  Also, as
      they'll be moved to include/linux/cgroup.h and it's generally a good
      idea, add CGRP_ prefix.
      
      Note that flags are assigned from (1 << 1).  The first bit will be
      used by a flag which will be added soon.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
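      The difference between bit numbers and masks can be sketched in
      userspace C.  The flag names here are hypothetical EX_ stand-ins, not
      the kernel's CGRP_ definitions; only the "start at (1 << 1) and leave
      bit 0 free" detail comes from the commit.  With masks, plain bitwise
      operators replace set_bit()/test_bit(), and several flags can be
      tested at once by OR-ing them together.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical mask-style root flags.  Assignment starts at (1 << 1)
 * because bit 0 is held back for a flag that is added later.
 */
enum {
	EX_ROOT_NOPREFIX = (1 << 1),
	EX_ROOT_XATTR    = (1 << 2),
};

struct ex_root { unsigned long flags; };

static void ex_root_set(struct ex_root *root, unsigned long mask)
{
	root->flags |= mask;
}

static void ex_root_clear(struct ex_root *root, unsigned long mask)
{
	root->flags &= ~mask;
}

/* True only if every bit in @mask is set. */
static bool ex_root_has(const struct ex_root *root, unsigned long mask)
{
	assert(mask != 0);
	return (root->flags & mask) == mask;
}
```

      A mask is also a plain value, which makes it easy to expose through
      inline helpers in a header, as the following commit needs.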
    • cgroup: make cgroup_path() not print double slashes · da1f296f
      Committed by Tejun Heo
      While reimplementing cgroup_path(), 65dff759 ("cgroup: fix
      cgroup_path() vs rename() race") introduced a bug where the path of a
      non-root cgroup would have two slashes at the beginning, which is
      caused by treating the root cgroup, which has the name '/', like
      non-root cgroups.
      
       $ grep systemd /proc/self/cgroup
       1:name=systemd://user/root/1
      
      Fix it by special casing root cgroup case and not looping over it in
      the normal path.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Li Zefan <lizefan@huawei.com>
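      The fix (special-case the root, and stop the leaf-to-root loop before
      reaching it) can be sketched in userspace C.  This is an
      illustration, not the kernel's cgroup_path(): the ex_* names are
      hypothetical, and returning -1 on a short buffer stands in for
      whatever error the real function reports.  The path is built
      backwards from the end of the buffer and then moved to the front.

```c
#include <assert.h>
#include <string.h>

struct ex_cgroup {
	const char *name;          /* the root's name is "/" */
	struct ex_cgroup *parent;  /* NULL for the root */
};

/* Write the path of @cgrp into @buf; returns 0, or -1 if @buf is too small. */
static int ex_cgroup_path(const struct ex_cgroup *cgrp, char *buf, size_t buflen)
{
	char *start = buf + buflen;

	assert(buflen > 0);
	if (!cgrp->parent) {                   /* root: just "/" */
		if (buflen < 2)
			return -1;
		strcpy(buf, "/");
		return 0;
	}

	*--start = '\0';
	/* prepend "/name" for each ancestor, stopping before the root */
	for (; cgrp->parent; cgrp = cgrp->parent) {
		size_t len = strlen(cgrp->name);

		if ((size_t)(start - buf) < len + 1)
			return -1;
		start -= len;
		memcpy(start, cgrp->name, len);
		*--start = '/';
	}
	memmove(buf, start, (size_t)(buf + buflen - start));
	return 0;
}
```

      With this structure, the cgroup shown in the commit as
      "//user/root/1" prints as "/user/root/1", and the root prints as "/".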
  11. 13 Apr, 2013: 1 commit
  12. 11 Apr, 2013: 3 commits
  13. 10 Apr, 2013: 1 commit
  14. 08 Apr, 2013: 5 commits
  15. 04 Apr, 2013: 1 commit