1. 24 May, 2013 4 commits
    • cgroup: update iterators to use cgroup_next_sibling() · 75501a6d
      Tejun Heo committed
      This patch converts cgroup_for_each_child(),
      cgroup_next_descendant_pre/post() and thus
      cgroup_for_each_descendant_pre/post() to use cgroup_next_sibling()
      instead of manually dereferencing ->sibling.next.
      
      The only reason the iterators couldn't allow dropping RCU read lock
      while iteration is in progress was because they couldn't determine the
      next sibling safely once RCU read lock is dropped.  Using
      cgroup_next_sibling() removes that problem and enables all iterators
      to allow dropping RCU read lock in the middle.  Comments are updated
      accordingly.
      
      This makes the iterators easier to use and will simplify controllers.
      
      Note that @cgroup argument is renamed to @cgrp in
      cgroup_for_each_child() because it conflicts with "struct cgroup" used
      in the new macro body.
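
      The rename matters because the preprocessor substitutes a macro
      parameter everywhere it appears in the body, including inside a
      "struct cgroup" type name.  A minimal userspace sketch of that clash
      follows; the structure layout, NEXT_SIBLING() and count_from() are
      hypothetical and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel structure: just an id and a link. */
struct cgroup {
	int id;
	struct cgroup *next;
};

/*
 * If this parameter were named "cgroup", the preprocessor would also replace
 * the tag in "struct cgroup" below with the caller's argument, and the cast
 * would no longer compile.  Naming the parameter "cgrp" avoids the clash.
 */
#define NEXT_SIBLING(cgrp) ((struct cgroup *)(cgrp)->next)

/* Count the nodes reachable from @first by following the sibling link. */
static int count_from(struct cgroup *first)
{
	int n = 0;

	for (struct cgroup *pos = first; pos; pos = NEXT_SIBLING(pos))
		n++;
	return n;
}
```

      Calling count_from() on the head of a three-element chain returns 3.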
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
    • cgroup: add cgroup->serial_nr and implement cgroup_next_sibling() · 53fa5261
      Tejun Heo committed
      Currently, there's no easy way to find out the next sibling cgroup
      unless it's known that the current cgroup is accessed from the
      parent's children list in a single RCU critical section.  This in turn
      forces all iterators to require whole iteration to be enclosed in a
      single RCU critical section, which sometimes is too restrictive.  This
      patch implements cgroup_next_sibling() which can reliably determine
      the next sibling regardless of the state of the current cgroup as long
      as it's accessible.
      
      It currently is impossible to determine the next sibling after
      dropping RCU read lock because the cgroup being iterated could be
      removed anytime and if RCU read lock is dropped, nothing guarantees
      its ->sibling.next pointer is accessible.  A removed cgroup would
      continue to point to its next sibling for RCU accesses but stop
      receiving updates from the sibling.  IOW, the next sibling could be
      removed and then complete its grace period while RCU read lock is
      dropped, making it unsafe to dereference ->sibling.next after dropping
      and re-acquiring RCU read lock.
      
      This can be solved by adding a way to traverse to the next sibling
      without dereferencing ->sibling.next.  This patch adds a monotonically
      increasing cgroup serial number, cgroup->serial_nr, which guarantees
      that all cgroup->children lists are kept in increasing serial_nr
      order.  A new function, cgroup_next_sibling(), is implemented, which,
      if CGRP_REMOVED is not set on the current cgroup, follows
      ->sibling.next; otherwise, traverses the parent's ->children list
      until it sees a sibling with higher ->serial_nr.
      
      This allows the function to always return the next sibling regardless
      of the state of the current cgroup without adding overhead in the fast
      path.
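
      The lookup described above can be sketched in userspace C.  This is
      only an illustration under simplifying assumptions: it uses a plain
      singly linked list, ignores RCU and locking entirely, and every name
      in it (struct node, add_child(), remove_child(), next_sibling()) is
      hypothetical rather than taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct node {
	uint64_t serial_nr;     /* monotonically increasing creation stamp */
	bool removed;           /* stands in for CGRP_REMOVED */
	struct node *next;      /* stands in for ->sibling.next */
	struct node *parent;
	struct node *children;  /* list head, kept in increasing serial_nr order */
};

static uint64_t next_serial;

/* Appending at the tail keeps the children list sorted by serial_nr. */
static void add_child(struct node *parent, struct node *child)
{
	struct node **link = &parent->children;

	child->serial_nr = next_serial++;
	child->removed = false;
	child->parent = parent;
	child->next = NULL;
	child->children = NULL;
	while (*link)
		link = &(*link)->next;
	*link = child;
}

/* Unlink @child; its own ->next is deliberately left stale. */
static void remove_child(struct node *child)
{
	struct node **link = &child->parent->children;

	while (*link != child)
		link = &(*link)->next;
	*link = child->next;
	child->removed = true;
}

static struct node *next_sibling(struct node *pos)
{
	struct node *n;

	if (!pos->removed)
		return pos->next;       /* fast path: the link is still maintained */

	/* Slow path: first live sibling that was created after @pos. */
	for (n = pos->parent->children; n; n = n->next)
		if (n->serial_nr > pos->serial_nr)
			return n;
	return NULL;
}
```

      The slow path runs only for a node that has already been removed, so
      in this sketch the common case stays a single pointer dereference.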
      
      Further patches will update the iterators to use cgroup_next_sibling()
      so that they allow dropping RCU read lock and blocking while iteration
      is in progress which in turn will be used to simplify controllers.
      
      v2: Typo fix as per Serge.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
    • cgroup: make cgroup_is_removed() static · bdc7119f
      Tejun Heo committed
      cgroup_is_removed() no longer has external users and it shouldn't grow
      any - controllers should deal with cgroup_subsys_state on/offline
      state instead of cgroup removal state.  Make it static.
      
      While at it, make it return bool.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: fix a subtle bug in descendant pre-order walk · 7805d000
      Tejun Heo committed
      When cgroup_next_descendant_pre() initiates a walk, it checks whether
      the subtree root has any children and, if not, returns NULL.  Later
      code assumes that the subtree isn't empty.  This is broken because the
      subtree may become empty in between, which can lead to the traversal
      escaping the subtree by walking to the sibling of the subtree root.
      
      There's no reason to have the early exit path.  Remove it along with
      the later assumption that the subtree isn't empty.  This simplifies
      the code a bit and fixes the subtle bug.
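
      A pre-order step without the early exit can be sketched as follows.
      This is a userspace illustration with a hypothetical struct node and
      plain pointers instead of RCU-protected lists; it is not the kernel
      function itself.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	struct node *parent;
	struct node *first_child;
	struct node *next_sibling;
};

/*
 * Return the node visited after @pos in a pre-order walk of the subtree
 * rooted at @root, or NULL when the walk is finished.  Pass NULL as @pos
 * to start a walk.
 */
static struct node *next_descendant_pre(struct node *pos, struct node *root)
{
	if (!pos)
		pos = root;     /* first call: pretend @root was just visited */

	if (pos->first_child)
		return pos->first_child;

	/*
	 * No child: move to the next sibling of @pos or of its closest
	 * ancestor, but never look at the siblings of @root itself.
	 */
	while (pos != root) {
		if (pos->next_sibling)
			return pos->next_sibling;
		pos = pos->parent;
	}
	return NULL;
}
```

      Because the loop condition is checked before any sibling is read, an
      empty subtree simply yields NULL and the walk cannot step onto a
      sibling of the subtree root.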
      
      While at it, fix the comment of cgroup_for_each_descendant_pre() which
      was incorrectly referring to ->css_offline() instead of
      ->css_online().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Cc: stable@vger.kernel.org
  2. 15 May, 2013 4 commits
    • cgroup: implement task_cgroup_path_from_hierarchy() · 857a2beb
      Tejun Heo committed
      kdbus folks want a sane way to determine the cgroup path that a given
      task belongs to on a given hierarchy, which is a reasonable thing to
      expect from cgroup core.
      
      Implement task_cgroup_path_from_hierarchy().
      
      v2: Dropped unnecessary NULL check on the return value of
          task_cgroup_from_root() as suggested by Li Zefan.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Greg Kroah-Hartman <greg@kroah.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Lennart Poettering <lennart@poettering.net>
      Cc: Daniel Mack <daniel@zonque.org>
    • cgroup: make hierarchy_id use cyclic idr · 1a574231
      Tejun Heo committed
      We want to be able to look up a hierarchy from its id and cyclic
      allocation is a whole lot simpler with idr.  Convert to idr and use
      idr_alloc_cyclic().
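
      Cyclic allocation means the search for a free ID resumes after the
      most recently allocated one and wraps around, so a freed ID is not
      reused immediately.  The sketch below shows the idea over a small
      fixed-size table; it does not use or imitate the kernel idr API, and
      MAX_IDS, struct cyclic_ids, alloc_cyclic() and free_id() are
      hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_IDS 8       /* hypothetical, deliberately tiny ID space */

struct cyclic_ids {
	bool used[MAX_IDS];
	int next;           /* where the next search starts */
};

/* Return the allocated ID, or -1 if every ID is in use. */
static int alloc_cyclic(struct cyclic_ids *ids)
{
	for (int i = 0; i < MAX_IDS; i++) {
		int id = (ids->next + i) % MAX_IDS;

		if (!ids->used[id]) {
			ids->used[id] = true;
			ids->next = (id + 1) % MAX_IDS;
			return id;
		}
	}
	return -1;
}

static void free_id(struct cyclic_ids *ids, int id)
{
	assert(id >= 0 && id < MAX_IDS);
	ids->used[id] = false;
}
```

      With IDs 0 and 1 allocated and 0 then freed, the next allocation in
      this sketch returns 2; ID 0 is handed out again only after the search
      wraps around.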
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: drop hierarchy_id_lock · 54e7b4eb
      Tejun Heo committed
      Now that hierarchy_id alloc / free are protected by the cgroup
      mutexes, there's no need for this separate lock.  Drop it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: refactor hierarchy_id handling · fa3ca07e
      Tejun Heo committed
      We're planning to convert hierarchy_ida to an idr and use it to
      look up hierarchy from its id.  As we want the mapping to happen
      atomically with cgroupfs_root registration, this patch refactors
      hierarchy_id init / exit so that ida operations happen inside
      cgroup_[root_]mutex.
      
      * s/init_root_id()/cgroup_init_root_id()/ and make it return 0 or
        -errno like a normal function.
      
      * Move hierarchy_id initialization from cgroup_root_from_opts() into
        cgroup_mount() block where the root is confirmed to be used and
        being registered while holding both mutexes.
      
      * Split cgroup_drop_id() into cgroup_exit_root_id() and
        cgroup_free_root(), so that ID release can happen before dropping
        the mutexes in cgroup_kill_sb().  The latter expects hierarchy_id to
        be exited before being invoked.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
  3. 14 May, 2013 1 commit
  4. 02 May, 2013 1 commit
  5. 30 April, 2013 1 commit
  6. 27 April, 2013 2 commits
  7. 19 April, 2013 1 commit
    • cgroup: fix broken file xattrs · 712317ad
      Li Zefan committed
      We should store file xattrs in struct cfent instead of struct cftype,
      because cftype is a type while cfent is an object instance of cftype.

      For example, each cgroup has a tasks file, and each tasks file is
      associated with a unique cfent, but all those files share the same
      struct cftype.
      
      Alexey Kodanev reported a crash, which can be reproduced:
      
        # mount -t cgroup -o xattr /sys/fs/cgroup
        # mkdir /sys/fs/cgroup/test
        # setfattr -n trusted.value -v test_value /sys/fs/cgroup/tasks
        # rmdir /sys/fs/cgroup/test
        # umount /sys/fs/cgroup
        oops!
      
      In this case, simple_xattrs_free() will free the same struct simple_xattrs
      twice.
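
      The type-versus-instance distinction can be sketched in userspace C.
      The structures below only borrow the names cftype and cfent for
      readability; their fields, cfent_create() and cfent_destroy() are
      hypothetical and much simpler than the kernel's.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical xattr store: a single heap-allocated value. */
struct xattrs {
	char *value;
};

/* Shared description: one per kind of file, common to all cgroups. */
struct cftype {
	const char *name;
};

/* Per-file object: one per cgroup directory.  It owns its own xattrs. */
struct cfent {
	const struct cftype *type;
	struct xattrs xattrs;
};

static struct cfent *cfent_create(const struct cftype *type)
{
	struct cfent *cfe = calloc(1, sizeof(*cfe));

	if (cfe)
		cfe->type = type;
	return cfe;
}

/* Each instance frees only the store it owns, so nothing is freed twice. */
static void cfent_destroy(struct cfent *cfe)
{
	free(cfe->xattrs.value);
	free(cfe);
}
```

      Had the store lived in the shared struct cftype instead, destroying
      two files of the same type would release the same store twice, which
      is the double free described above.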
      
      tj: Dropped unused local variable @cft from cgroup_diput().
      
      Cc: <stable@vger.kernel.org> # 3.8.x
      Reported-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  8. 15 April, 2013 5 commits
    • cgroup: remove cgrp->top_cgroup · 05fb22ec
      Li Zefan committed
      It's not used, and it can be retrieved via cgrp->root->top_cgroup.
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: introduce sane_behavior mount option · 873fe09e
      Tejun Heo committed
      It's a sad fact that at this point various cgroup controllers are
      carrying so many idiosyncrasies and pure insanities that it simply
      isn't possible to reach any sort of sane consistent behavior while
      staying fully compatible with what already has been exposed to
      userland.
      
      As we can't break exposed userland interface, transitioning to sane
      behaviors can only be done in steps while maintaining backwards
      compatibility.  This patch introduces a new mount option -
      __DEVEL__sane_behavior - which disables crazy features and enforces
      consistent behaviors in cgroup core proper and various controllers.
      As exactly which behaviors it changes are still being determined, the
      mount option, at this point, is useful only for development of the new
      behaviors.  As such, the mount option is prefixed with __DEVEL__ and
      generates a warning message when used.
      
      Eventually, once we get to the point where all controllers' behaviors
      are consistent enough to implement unified hierarchy, the __DEVEL__
      prefix will be dropped, and more importantly, unified-hierarchy will
      enforce sane_behavior by default.  Maybe we'll be able to completely
      drop the crazy stuff after a while, maybe not, but we at least have a
      strategy to move on to saner behaviors.
      
      This patch introduces the mount option and changes the following
      behaviors in cgroup core.
      
      * Mount options "noprefix" and "clone_children" are disallowed.  Also,
        cgroupfs file cgroup.clone_children is not created.
      
      * When mounting an existing superblock, mount options should match.
        This is currently pretty crazy.  If one mounts a cgroup, creates a
        subdirectory, unmounts it and then mounts it again with different
        options, it looks like the new options are applied but they aren't.
      
      * Remount is disallowed.
      
      The behavior changes are documented in the comment above
      CGRP_ROOT_SANE_BEHAVIOR enum and will be expanded as different
      controllers are converted and planned improvements progress.
      
      v2: Dropped unnecessary explicit file permission setting from the
          sane_behavior cftype entry as suggested by Li Zefan.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vivek Goyal <vgoyal@redhat.com>
    • move cgroupfs_root to include/linux/cgroup.h · 25a7e684
      Tejun Heo committed
      While controllers shouldn't be accessing cgroupfs_root directly, it
      being hidden inside kernel/cgroup.c makes some things pretty silly.
      This makes routing hierarchy-wide settings which need to be visible to
      controllers cumbersome.
      
      We're gonna add another hierarchy-wide setting which needs to be
      accessed from controllers.  Move cgroupfs_root and its flags to the
      header file so that we can access root settings with inline helpers.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: convert cgroupfs_root flag bits to masks and add CGRP_ prefix · 93438629
      Tejun Heo committed
      There's no reason to be using bitops, which tends to be more
      cumbersome, to handle root flags.  Convert them to masks.  Also, as
      they'll be moved to include/linux/cgroup.h and it's generally a good
      idea, add CGRP_ prefix.
      
      Note that flags are assigned from (1 << 1).  The first bit will be
      used by a flag which will be added soon.
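
      With masks, flags are set, cleared and tested using ordinary bitwise
      operators instead of bit-number helpers.  A minimal sketch follows;
      the SKETCH_ROOT_* names, struct root and root_has() are hypothetical
      and only mirror the "start from (1 << 1)" numbering described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag masks; bit 0 is left unassigned on purpose. */
enum {
	SKETCH_ROOT_NOPREFIX = (1 << 1),
	SKETCH_ROOT_XATTR    = (1 << 2),
};

struct root {
	unsigned long flags;
};

/* True if any bit in @mask is set in @r->flags. */
static bool root_has(const struct root *r, unsigned long mask)
{
	return (r->flags & mask) != 0;
}
```

      Setting a flag is then "r.flags |= SKETCH_ROOT_XATTR" and clearing it
      is "r.flags &= ~SKETCH_ROOT_XATTR", and several flags can be tested
      at once by OR-ing their masks.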
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: make cgroup_path() not print double slashes · da1f296f
      Tejun Heo committed
      While reimplementing cgroup_path(), 65dff759 ("cgroup: fix
      cgroup_path() vs rename() race") introduced a bug where the path of a
      non-root cgroup would have two slashes at the beginning, which is
      caused by treating the root cgroup which has the name '/' like
      non-root cgroups.
      
       $ grep systemd /proc/self/cgroup
       1:name=systemd://user/root/1
      
      Fix it by special casing root cgroup case and not looping over it in
      the normal path.
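
      One way to build such a path without the doubled slash is sketched
      below: the root is handled up front, and the loop over ancestors
      stops before reaching it.  This is a userspace illustration; struct
      node and node_path() are hypothetical and the kernel code differs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct node {
	const char *name;       /* "/" for the root */
	struct node *parent;    /* NULL for the root */
};

/* Write the path of @n into @buf.  Returns 0, or -1 if @buf is too small. */
static int node_path(const struct node *n, char *buf, size_t buflen)
{
	char *start;

	if (buflen < 2)
		return -1;

	if (!n->parent) {               /* root: its name is the whole path */
		strcpy(buf, "/");
		return 0;
	}

	/* Build the path backwards from the end of the buffer. */
	start = buf + buflen - 1;
	*start = '\0';
	for (; n->parent; n = n->parent) {      /* stop before the root */
		size_t len = strlen(n->name);

		if ((size_t)(start - buf) < len + 1)
			return -1;
		start -= len;
		memcpy(start, n->name, len);
		*--start = '/';
	}
	memmove(buf, start, strlen(start) + 1);
	return 0;
}
```

      For a node "1" under "user" under the root, this sketch produces
      "/user/1"; looping over the root as well would prepend its "/" name
      after another separator and give "//user/1".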
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Li Zefan <lizefan@huawei.com>
  9. 13 April, 2013 1 commit
  10. 11 April, 2013 3 commits
  11. 10 April, 2013 1 commit
  12. 08 April, 2013 5 commits
  13. 04 April, 2013 1 commit
  14. 20 March, 2013 3 commits
    • cgroup: consolidate cgroup_attach_task() and cgroup_attach_proc() · 081aa458
      Li Zefan committed
      These two functions share most of the code.
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: fix an off-by-one bug which may trigger BUG_ON() · 3ac1707a
      Li Zefan committed
      The 3rd parameter of flex_array_prealloc() is the number of elements,
      not the index of the last element.
      
      The effect of the bug is, when opening cgroup.procs, a flex array will
      be allocated and all elements of the array are allocated with the
      GFP_KERNEL flag, but the last one is allocated with GFP_ATOMIC, and if
      we fail to allocate memory for it, it'll trigger a BUG_ON().
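
      The count-versus-index confusion can be shown with a small userspace
      model.  It does not use the kernel flex_array API; SLOTS, struct
      slot_array, prealloc() and ready_count() are hypothetical, and
      prealloc() merely takes a start position and a number of elements in
      the same spirit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SLOTS 8u        /* hypothetical capacity of the sketch array */

struct slot_array {
	bool ready[SLOTS];
};

/* Prepare @nr slots starting at @start.  @nr is a count, not a last index. */
static int prealloc(struct slot_array *a, size_t start, size_t nr)
{
	if (start > SLOTS || nr > SLOTS - start)
		return -1;
	for (size_t i = start; i < start + nr; i++)
		a->ready[i] = true;
	return 0;
}

/* Number of slots that have been prepared. */
static size_t ready_count(const struct slot_array *a)
{
	size_t n = 0;

	for (size_t i = 0; i < SLOTS; i++)
		n += a->ready[i];
	return n;
}
```

      To prepare five elements, passing 4 (the last index) leaves the fifth
      slot unprepared, whereas passing 5 (the count) covers all of them.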
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
    • sched: replace PF_THREAD_BOUND with PF_NO_SETAFFINITY · 14a40ffc
      Tejun Heo committed
      PF_THREAD_BOUND was originally used to mark kernel threads which were
      bound to a specific CPU using kthread_bind() and a task with the flag
      set allows cpus_allowed modifications only to itself.  Workqueue is
      currently abusing it to prevent userland from meddling with
      cpus_allowed of workqueue workers.
      
      What we need is a flag to prevent userland from messing with
      cpus_allowed of certain kernel tasks.  In kernel, anyone can
      (incorrectly) squash the flag, and, for worker-type usages,
      restricting cpus_allowed modification to the task itself doesn't
      provide meaningful extra protection as other tasks can inject work
      items to the task anyway.
      
      This patch replaces PF_THREAD_BOUND with PF_NO_SETAFFINITY.
      sched_setaffinity() checks the flag and returns -EINVAL if set.
      set_cpus_allowed_ptr() is no longer affected by the flag.
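
      The split between the userland-facing check and the in-kernel setter
      can be modeled as follows.  This is a userspace sketch only: struct
      task, SKETCH_NO_SETAFFINITY, set_cpus_allowed() and
      user_setaffinity() are hypothetical stand-ins, and the flag's bit
      value is arbitrary.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_NO_SETAFFINITY (1u << 0)   /* hypothetical bit value */

struct task {
	unsigned int flags;
	unsigned long cpus_allowed;     /* one bit per allowed CPU */
};

/* In-kernel style setter: not affected by the flag. */
static int set_cpus_allowed(struct task *t, unsigned long mask)
{
	if (!mask)
		return -EINVAL;         /* an empty CPU set is never valid */
	t->cpus_allowed = mask;
	return 0;
}

/* Userland-facing entry point: refuses tasks that carry the flag. */
static int user_setaffinity(struct task *t, unsigned long mask)
{
	if (t->flags & SKETCH_NO_SETAFFINITY)
		return -EINVAL;
	return set_cpus_allowed(t, mask);
}
```

      In this model a worker marked with the flag keeps its CPU mask when
      user_setaffinity() is called on it, while set_cpus_allowed() can
      still change the mask.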
      
      This will allow simplifying workqueue worker CPU affinity management.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
  15. 13 March, 2013 5 commits
  16. 06 March, 2013 1 commit
  17. 05 March, 2013 1 commit
    • cgroup: no need to check css refs for release notification · f50daa70
      Li Zefan committed
      We no longer fail rmdir() when there're still css refs, so we don't
      need to check css refs in check_for_release().
      
      This also avoids a bug.  cgroup_has_css_refs() accesses subsys[i]
      without cgroup_mutex, so it can race with cgroup_unload_subsys().
      
      cgroup_has_css_refs()
      ...
        if (ss == NULL || ss->root != cgrp->root)
      
      If ss points to net_cls_subsys, and the cls_cgroup module is unloaded
      right after the former check but before the latter, the memory in which
      net_cls_subsys resides will have become invalid.
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>