1. 15 5月, 2013 2 次提交
    • T
      cgroup: drop hierarchy_id_lock · 54e7b4eb
      Tejun Heo 提交于
      Now that hierarchy_id alloc / free are protected by the cgroup
      mutexes, there's no need for this separate lock.  Drop it.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      54e7b4eb
    • T
      cgroup: refactor hierarchy_id handling · fa3ca07e
      Tejun Heo 提交于
      We're planning to converting hierarchy_ida to an idr and use it to
      look up hierarchy from its id.  As we want the mapping to happen
      atomically with cgroupfs_root registration, this patch refactors
      hierarchy_id init / exit so that ida operations happen inside
      cgroup_[root_]mutex.
      
      * s/init_root_id()/cgroup_init_root_id()/ and make it return 0 or
        -errno like a normal function.
      
      * Move hierarchy_id initialization from cgroup_root_from_opts() into
        cgroup_mount() block where the root is confirmed to be used and
        being registered while holding both mutexes.
      
      * Split cgroup_drop_id() into cgroup_exit_root_id() and
        cgroup_free_root(), so that ID release can happen before dropping
        the mutexes in cgroup_kill_sb().  The latter expects hierarchy_id to
        be exited before being invoked.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      fa3ca07e
  2. 02 5月, 2013 1 次提交
  3. 30 4月, 2013 1 次提交
  4. 27 4月, 2013 2 次提交
  5. 19 4月, 2013 1 次提交
    • L
      cgroup: fix broken file xattrs · 712317ad
      Li Zefan 提交于
      We should store file xattrs in struct cfent instead of struct cftype,
      because cftype is a type while cfent is object instance of cftype.
      
      For example each cgroup has a tasks file, and each tasks file is
      associated with a uniq cfent, but all those files share the same
      struct cftype.
      
      Alexey Kodanev reported a crash, which can be reproduced:
      
        # mount -t cgroup -o xattr /sys/fs/cgroup
        # mkdir /sys/fs/cgroup/test
        # setfattr -n trusted.value -v test_value /sys/fs/cgroup/tasks
        # rmdir /sys/fs/cgroup/test
        # umount /sys/fs/cgroup
        oops!
      
      In this case, simple_xattrs_free() will free the same struct simple_xattrs
      twice.
      
      tj: Dropped unused local variable @cft from cgroup_diput().
      
      Cc: <stable@vger.kernel.org> # 3.8.x
      Reported-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      712317ad
  6. 15 4月, 2013 5 次提交
    • L
      cgroup: remove cgrp->top_cgroup · 05fb22ec
      Li Zefan 提交于
      It's not used, and it can be retrieved via cgrp->root->top_cgroup.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      05fb22ec
    • T
      cgroup: introduce sane_behavior mount option · 873fe09e
      Tejun Heo 提交于
      It's a sad fact that at this point various cgroup controllers are
      carrying so many idiosyncrasies and pure insanities that it simply
      isn't possible to reach any sort of sane consistent behavior while
      maintaining staying fully compatible with what already has been
      exposed to userland.
      
      As we can't break exposed userland interface, transitioning to sane
      behaviors can only be done in steps while maintaining backwards
      compatibility.  This patch introduces a new mount option -
      __DEVEL__sane_behavior - which disables crazy features and enforces
      consistent behaviors in cgroup core proper and various controllers.
      As exactly which behaviors it changes are still being determined, the
      mount option, at this point, is useful only for development of the new
      behaviors.  As such, the mount option is prefixed with __DEVEL__ and
      generates a warning message when used.
      
      Eventually, once we get to the point where all controller's behaviors
      are consistent enough to implement unified hierarchy, the __DEVEL__
      prefix will be dropped, and more importantly, unified-hierarchy will
      enforce sane_behavior by default.  Maybe we'll able to completely drop
      the crazy stuff after a while, maybe not, but we at least have a
      strategy to move on to saner behaviors.
      
      This patch introduces the mount option and changes the following
      behaviors in cgroup core.
      
      * Mount options "noprefix" and "clone_children" are disallowed.  Also,
        cgroupfs file cgroup.clone_children is not created.
      
      * When mounting an existing superblock, mount options should match.
        This is currently pretty crazy.  If one mounts a cgroup, creates a
        subdirectory, unmounts it and then mount it again with different
        option, it looks like the new options are applied but they aren't.
      
      * Remount is disallowed.
      
      The behaviors changes are documented in the comment above
      CGRP_ROOT_SANE_BEHAVIOR enum and will be expanded as different
      controllers are converted and planned improvements progress.
      
      v2: Dropped unnecessary explicit file permission setting sane_behavior
          cftype entry as suggested by Li Zefan.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NSerge E. Hallyn <serge.hallyn@ubuntu.com>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      873fe09e
    • T
      move cgroupfs_root to include/linux/cgroup.h · 25a7e684
      Tejun Heo 提交于
      While controllers shouldn't be accessing cgroupfs_root directly, it
      being hidden inside kern/cgroup.c makes somethings pretty silly.  This
      makes routing hierarchy-wide settings which need to be visible to
      controllers cumbersome.
      
      We're gonna add another hierarchy-wide setting which needs to be
      accessed from controllers.  Move cgroupfs_root and its flags to the
      header file so that we can access root settings with inline helpers.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NSerge E. Hallyn <serge.hallyn@ubuntu.com>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      25a7e684
    • T
      cgroup: convert cgroupfs_root flag bits to masks and add CGRP_ prefix · 93438629
      Tejun Heo 提交于
      There's no reason to be using bitops, which tends to be more
      cumbersome, to handle root flags.  Convert them to masks.  Also, as
      they'll be moved to include/linux/cgroup.h and it's generally a good
      idea, add CGRP_ prefix.
      
      Note that flags are assigned from (1 << 1).  The first bit will be
      used by a flag which will be added soon.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NSerge E. Hallyn <serge.hallyn@ubuntu.com>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      93438629
    • T
      cgroup: make cgroup_path() not print double slashes · da1f296f
      Tejun Heo 提交于
      While reimplementing cgroup_path(), 65dff759 ("cgroup: fix
      cgroup_path() vs rename() race") introduced a bug where the path of a
      non-root cgroup would have two slahses at the beginning, which is
      caused by treating the root cgroup which has the name '/' like
      non-root cgroups.
      
       $ grep systemd /proc/self/cgroup
       1:name=systemd://user/root/1
      
      Fix it by special casing root cgroup case and not looping over it in
      the normal path.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Li Zefan <lizefan@huawei.com>
      da1f296f
  7. 13 4月, 2013 1 次提交
  8. 11 4月, 2013 3 次提交
  9. 10 4月, 2013 1 次提交
  10. 08 4月, 2013 5 次提交
  11. 04 4月, 2013 1 次提交
  12. 20 3月, 2013 3 次提交
    • L
      cgroup: consolidate cgroup_attach_task() and cgroup_attach_proc() · 081aa458
      Li Zefan 提交于
      These two functions share most of the code.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      081aa458
    • L
      cgroup: fix an off-by-one bug which may trigger BUG_ON() · 3ac1707a
      Li Zefan 提交于
      The 3rd parameter of flex_array_prealloc() is the number of elements,
      not the index of the last element.
      
      The effect of the bug is, when opening cgroup.procs, a flex array will
      be allocated and all elements of the array is allocated with
      GFP_KERNEL flag, but the last one is GFP_ATOMIC, and if we fail to
      allocate memory for it, it'll trigger a BUG_ON().
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
      3ac1707a
    • T
      sched: replace PF_THREAD_BOUND with PF_NO_SETAFFINITY · 14a40ffc
      Tejun Heo 提交于
      PF_THREAD_BOUND was originally used to mark kernel threads which were
      bound to a specific CPU using kthread_bind() and a task with the flag
      set allows cpus_allowed modifications only to itself.  Workqueue is
      currently abusing it to prevent userland from meddling with
      cpus_allowed of workqueue workers.
      
      What we need is a flag to prevent userland from messing with
      cpus_allowed of certain kernel tasks.  In kernel, anyone can
      (incorrectly) squash the flag, and, for worker-type usages,
      restricting cpus_allowed modification to the task itself doesn't
      provide meaningful extra proection as other tasks can inject work
      items to the task anyway.
      
      This patch replaces PF_THREAD_BOUND with PF_NO_SETAFFINITY.
      sched_setaffinity() checks the flag and return -EINVAL if set.
      set_cpus_allowed_ptr() is no longer affected by the flag.
      
      This will allow simplifying workqueue worker CPU affinity management.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Reviewed-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      14a40ffc
  13. 13 3月, 2013 5 次提交
  14. 06 3月, 2013 1 次提交
  15. 05 3月, 2013 2 次提交
    • L
      cgroup: no need to check css refs for release notification · f50daa70
      Li Zefan 提交于
      We no longer fail rmdir() when there're still css refs, so we don't
      need to check css refs in check_for_release().
      
      This also voids a bug. cgroup_has_css_refs() accesses subsys[i]
      without cgroup_mutex, so it can race with cgroup_unload_subsys().
      
      cgroup_has_css_refs()
      ...
        if (ss == NULL || ss->root != cgrp->root)
      
      if ss pointers to net_cls_subsys, and cls_cgroup module is unloaded
      right after the former check but before the latter, the memory that
      net_cls_subsys resides has become invalid.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      f50daa70
    • L
      cgroup: fix cgroup_path() vs rename() race · 65dff759
      Li Zefan 提交于
      rename() will change dentry->d_name. The result of this race can
      be worse than seeing partially rewritten name, but we might access
      a stale pointer because rename() will re-allocate memory to hold
      a longer name.
      
      As accessing dentry->name must be protected by dentry->d_lock or
      parent inode's i_mutex, while on the other hand cgroup-path() can
      be called with some irq-safe spinlocks held, we can't generate
      cgroup path using dentry->d_name.
      
      Alternatively we make a copy of dentry->d_name and save it in
      cgrp->name when a cgroup is created, and update cgrp->name at
      rename().
      
      v5: use flexible array instead of zero-size array.
      v4: - allocate root_cgroup_name and all root_cgroup->name points to it.
          - add cgroup_name() wrapper.
      v3: use kfree_rcu() instead of synchronize_rcu() in user-visible path.
      v2: make cgrp->name RCU safe.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      65dff759
  16. 28 2月, 2013 3 次提交
    • S
      hlist: drop the node parameter from iterators · b67bfe0d
      Sasha Levin 提交于
      I'm not sure why, but the hlist for each entry iterators were conceived
      
              list_for_each_entry(pos, head, member)
      
      The hlist ones were greedy and wanted an extra parameter:
      
              hlist_for_each_entry(tpos, pos, head, member)
      
      Why did they need an extra pos parameter? I'm not quite sure. Not only
      they don't really need it, it also prevents the iterator from looking
      exactly like the list iterator, which is unfortunate.
      
      Besides the semantic patch, there was some manual work required:
      
       - Fix up the actual hlist iterators in linux/list.h
       - Fix up the declaration of other iterators based on the hlist ones.
       - A very small amount of places were using the 'node' parameter, this
       was modified to use 'obj->member' instead.
       - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
       properly, so those had to be fixed up manually.
      
      The semantic patch which is mostly the work of Peter Senna Tschudin is here:
      
      @@
      iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
      
      type T;
      expression a,c,d,e;
      identifier b;
      statement S;
      @@
      
      -T b;
          <+... when != b
      (
      hlist_for_each_entry(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue(a,
      - b,
      c) S
      |
      hlist_for_each_entry_from(a,
      - b,
      c) S
      |
      hlist_for_each_entry_rcu(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_rcu_bh(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue_rcu_bh(a,
      - b,
      c) S
      |
      for_each_busy_worker(a, c,
      - b,
      d) S
      |
      ax25_uid_for_each(a,
      - b,
      c) S
      |
      ax25_for_each(a,
      - b,
      c) S
      |
      inet_bind_bucket_for_each(a,
      - b,
      c) S
      |
      sctp_for_each_hentry(a,
      - b,
      c) S
      |
      sk_for_each(a,
      - b,
      c) S
      |
      sk_for_each_rcu(a,
      - b,
      c) S
      |
      sk_for_each_from
      -(a, b)
      +(a)
      S
      + sk_for_each_from(a) S
      |
      sk_for_each_safe(a,
      - b,
      c, d) S
      |
      sk_for_each_bound(a,
      - b,
      c) S
      |
      hlist_for_each_entry_safe(a,
      - b,
      c, d, e) S
      |
      hlist_for_each_entry_continue_rcu(a,
      - b,
      c) S
      |
      nr_neigh_for_each(a,
      - b,
      c) S
      |
      nr_neigh_for_each_safe(a,
      - b,
      c, d) S
      |
      nr_node_for_each(a,
      - b,
      c) S
      |
      nr_node_for_each_safe(a,
      - b,
      c, d) S
      |
      - for_each_gfn_sp(a, c, d, b) S
      + for_each_gfn_sp(a, c, d) S
      |
      - for_each_gfn_indirect_valid_sp(a, c, d, b) S
      + for_each_gfn_indirect_valid_sp(a, c, d) S
      |
      for_each_host(a,
      - b,
      c) S
      |
      for_each_host_safe(a,
      - b,
      c, d) S
      |
      for_each_mesh_entry(a,
      - b,
      c, d) S
      )
          ...+>
      
      [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
      [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: fix warnings]
      [akpm@linux-foudnation.org: redo intrusive kvm changes]
      Tested-by: NPeter Senna Tschudin <peter.senna@gmail.com>
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b67bfe0d
    • T
      cgroup: convert to idr_alloc() · d228d9ec
      Tejun Heo 提交于
      Convert to the much saner new idr interface.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d228d9ec
    • T
      cgroup: don't use idr_remove_all() · c897ff68
      Tejun Heo 提交于
      idr_destroy() can destroy idr by itself and idr_remove_all() is being
      deprecated.  Drop its usage.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c897ff68
  17. 23 2月, 2013 1 次提交
  18. 19 2月, 2013 2 次提交