1. 15 4月, 2013 4 次提交
    • T
      cgroup: introduce sane_behavior mount option · 873fe09e
      Tejun Heo 提交于
      It's a sad fact that at this point various cgroup controllers are
      carrying so many idiosyncrasies and pure insanities that it simply
      isn't possible to reach any sort of sane consistent behavior while
      maintaining staying fully compatible with what already has been
      exposed to userland.
      
      As we can't break exposed userland interface, transitioning to sane
      behaviors can only be done in steps while maintaining backwards
      compatibility.  This patch introduces a new mount option -
      __DEVEL__sane_behavior - which disables crazy features and enforces
      consistent behaviors in cgroup core proper and various controllers.
      As exactly which behaviors it changes are still being determined, the
      mount option, at this point, is useful only for development of the new
      behaviors.  As such, the mount option is prefixed with __DEVEL__ and
      generates a warning message when used.
      
      Eventually, once we get to the point where all controller's behaviors
      are consistent enough to implement unified hierarchy, the __DEVEL__
      prefix will be dropped, and more importantly, unified-hierarchy will
      enforce sane_behavior by default.  Maybe we'll able to completely drop
      the crazy stuff after a while, maybe not, but we at least have a
      strategy to move on to saner behaviors.
      
      This patch introduces the mount option and changes the following
      behaviors in cgroup core.
      
      * Mount options "noprefix" and "clone_children" are disallowed.  Also,
        cgroupfs file cgroup.clone_children is not created.
      
      * When mounting an existing superblock, mount options should match.
        This is currently pretty crazy.  If one mounts a cgroup, creates a
        subdirectory, unmounts it and then mount it again with different
        option, it looks like the new options are applied but they aren't.
      
      * Remount is disallowed.
      
      The behaviors changes are documented in the comment above
      CGRP_ROOT_SANE_BEHAVIOR enum and will be expanded as different
      controllers are converted and planned improvements progress.
      
      v2: Dropped unnecessary explicit file permission setting sane_behavior
          cftype entry as suggested by Li Zefan.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NSerge E. Hallyn <serge.hallyn@ubuntu.com>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      873fe09e
    • T
      move cgroupfs_root to include/linux/cgroup.h · 25a7e684
      Tejun Heo 提交于
      While controllers shouldn't be accessing cgroupfs_root directly, it
      being hidden inside kern/cgroup.c makes somethings pretty silly.  This
      makes routing hierarchy-wide settings which need to be visible to
      controllers cumbersome.
      
      We're gonna add another hierarchy-wide setting which needs to be
      accessed from controllers.  Move cgroupfs_root and its flags to the
      header file so that we can access root settings with inline helpers.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NSerge E. Hallyn <serge.hallyn@ubuntu.com>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      25a7e684
    • T
      cgroup: convert cgroupfs_root flag bits to masks and add CGRP_ prefix · 93438629
      Tejun Heo 提交于
      There's no reason to be using bitops, which tends to be more
      cumbersome, to handle root flags.  Convert them to masks.  Also, as
      they'll be moved to include/linux/cgroup.h and it's generally a good
      idea, add CGRP_ prefix.
      
      Note that flags are assigned from (1 << 1).  The first bit will be
      used by a flag which will be added soon.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NSerge E. Hallyn <serge.hallyn@ubuntu.com>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      93438629
    • T
      cgroup: make cgroup_path() not print double slashes · da1f296f
      Tejun Heo 提交于
      While reimplementing cgroup_path(), 65dff759 ("cgroup: fix
      cgroup_path() vs rename() race") introduced a bug where the path of a
      non-root cgroup would have two slahses at the beginning, which is
      caused by treating the root cgroup which has the name '/' like
      non-root cgroups.
      
       $ grep systemd /proc/self/cgroup
       1:name=systemd://user/root/1
      
      Fix it by special casing root cgroup case and not looping over it in
      the normal path.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Li Zefan <lizefan@huawei.com>
      da1f296f
  2. 13 4月, 2013 1 次提交
  3. 11 4月, 2013 3 次提交
  4. 08 4月, 2013 5 次提交
  5. 04 4月, 2013 1 次提交
  6. 20 3月, 2013 2 次提交
  7. 13 3月, 2013 5 次提交
  8. 06 3月, 2013 1 次提交
  9. 05 3月, 2013 2 次提交
    • L
      cgroup: no need to check css refs for release notification · f50daa70
      Li Zefan 提交于
      We no longer fail rmdir() when there're still css refs, so we don't
      need to check css refs in check_for_release().
      
      This also voids a bug. cgroup_has_css_refs() accesses subsys[i]
      without cgroup_mutex, so it can race with cgroup_unload_subsys().
      
      cgroup_has_css_refs()
      ...
        if (ss == NULL || ss->root != cgrp->root)
      
      if ss pointers to net_cls_subsys, and cls_cgroup module is unloaded
      right after the former check but before the latter, the memory that
      net_cls_subsys resides has become invalid.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      f50daa70
    • L
      cgroup: fix cgroup_path() vs rename() race · 65dff759
      Li Zefan 提交于
      rename() will change dentry->d_name. The result of this race can
      be worse than seeing partially rewritten name, but we might access
      a stale pointer because rename() will re-allocate memory to hold
      a longer name.
      
      As accessing dentry->name must be protected by dentry->d_lock or
      parent inode's i_mutex, while on the other hand cgroup-path() can
      be called with some irq-safe spinlocks held, we can't generate
      cgroup path using dentry->d_name.
      
      Alternatively we make a copy of dentry->d_name and save it in
      cgrp->name when a cgroup is created, and update cgrp->name at
      rename().
      
      v5: use flexible array instead of zero-size array.
      v4: - allocate root_cgroup_name and all root_cgroup->name points to it.
          - add cgroup_name() wrapper.
      v3: use kfree_rcu() instead of synchronize_rcu() in user-visible path.
      v2: make cgrp->name RCU safe.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      65dff759
  10. 28 2月, 2013 3 次提交
    • S
      hlist: drop the node parameter from iterators · b67bfe0d
      Sasha Levin 提交于
      I'm not sure why, but the hlist for each entry iterators were conceived
      
              list_for_each_entry(pos, head, member)
      
      The hlist ones were greedy and wanted an extra parameter:
      
              hlist_for_each_entry(tpos, pos, head, member)
      
      Why did they need an extra pos parameter? I'm not quite sure. Not only
      they don't really need it, it also prevents the iterator from looking
      exactly like the list iterator, which is unfortunate.
      
      Besides the semantic patch, there was some manual work required:
      
       - Fix up the actual hlist iterators in linux/list.h
       - Fix up the declaration of other iterators based on the hlist ones.
       - A very small amount of places were using the 'node' parameter, this
       was modified to use 'obj->member' instead.
       - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
       properly, so those had to be fixed up manually.
      
      The semantic patch which is mostly the work of Peter Senna Tschudin is here:
      
      @@
      iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
      
      type T;
      expression a,c,d,e;
      identifier b;
      statement S;
      @@
      
      -T b;
          <+... when != b
      (
      hlist_for_each_entry(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue(a,
      - b,
      c) S
      |
      hlist_for_each_entry_from(a,
      - b,
      c) S
      |
      hlist_for_each_entry_rcu(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_rcu_bh(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue_rcu_bh(a,
      - b,
      c) S
      |
      for_each_busy_worker(a, c,
      - b,
      d) S
      |
      ax25_uid_for_each(a,
      - b,
      c) S
      |
      ax25_for_each(a,
      - b,
      c) S
      |
      inet_bind_bucket_for_each(a,
      - b,
      c) S
      |
      sctp_for_each_hentry(a,
      - b,
      c) S
      |
      sk_for_each(a,
      - b,
      c) S
      |
      sk_for_each_rcu(a,
      - b,
      c) S
      |
      sk_for_each_from
      -(a, b)
      +(a)
      S
      + sk_for_each_from(a) S
      |
      sk_for_each_safe(a,
      - b,
      c, d) S
      |
      sk_for_each_bound(a,
      - b,
      c) S
      |
      hlist_for_each_entry_safe(a,
      - b,
      c, d, e) S
      |
      hlist_for_each_entry_continue_rcu(a,
      - b,
      c) S
      |
      nr_neigh_for_each(a,
      - b,
      c) S
      |
      nr_neigh_for_each_safe(a,
      - b,
      c, d) S
      |
      nr_node_for_each(a,
      - b,
      c) S
      |
      nr_node_for_each_safe(a,
      - b,
      c, d) S
      |
      - for_each_gfn_sp(a, c, d, b) S
      + for_each_gfn_sp(a, c, d) S
      |
      - for_each_gfn_indirect_valid_sp(a, c, d, b) S
      + for_each_gfn_indirect_valid_sp(a, c, d) S
      |
      for_each_host(a,
      - b,
      c) S
      |
      for_each_host_safe(a,
      - b,
      c, d) S
      |
      for_each_mesh_entry(a,
      - b,
      c, d) S
      )
          ...+>
      
      [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
      [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: fix warnings]
      [akpm@linux-foudnation.org: redo intrusive kvm changes]
      Tested-by: NPeter Senna Tschudin <peter.senna@gmail.com>
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b67bfe0d
    • T
      cgroup: convert to idr_alloc() · d228d9ec
      Tejun Heo 提交于
      Convert to the much saner new idr interface.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d228d9ec
    • T
      cgroup: don't use idr_remove_all() · c897ff68
      Tejun Heo 提交于
      idr_destroy() can destroy idr by itself and idr_remove_all() is being
      deprecated.  Drop its usage.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c897ff68
  11. 23 2月, 2013 1 次提交
  12. 19 2月, 2013 3 次提交
  13. 25 1月, 2013 5 次提交
  14. 23 1月, 2013 1 次提交
    • L
      cgroup: fix bogus kernel warnings when cgroup_create() failed · 2739d3cc
      Li Zefan 提交于
      If cgroup_create() failed and cgroup_destroy_locked() is called to
      do cleanup, we'll see a bunch of warnings:
      
      cgroup_addrm_files: failed to remove 2MB.limit_in_bytes, err=-2
      cgroup_addrm_files: failed to remove 2MB.usage_in_bytes, err=-2
      cgroup_addrm_files: failed to remove 2MB.max_usage_in_bytes, err=-2
      cgroup_addrm_files: failed to remove 2MB.failcnt, err=-2
      cgroup_addrm_files: failed to remove prioidx, err=-2
      cgroup_addrm_files: failed to remove ifpriomap, err=-2
      ...
      
      We failed to remove those files, because cgroup_create() has failed
      before creating those cgroup files.
      
      To fix this, we simply don't warn if cgroup_rm_file() can't find the
      cft entry.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      2739d3cc
  15. 15 1月, 2013 2 次提交
    • L
      cgroup: remove synchronize_rcu() from rebind_subsystems() · 130e3695
      Li Zefan 提交于
      Nothing's protected by RCU in rebind_subsystems(), and I can't think
      of a reason why it is needed.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      130e3695
    • L
      cgroup: remove synchronize_rcu() from cgroup_attach_{task|proc}() · 5d65bc0c
      Li Zefan 提交于
      These 2 syncronize_rcu()s make attaching a task to a cgroup
      quite slow, and it can't be ignored in some situations.
      
      A real case from Colin Cross: Android uses cgroups heavily to
      manage thread priorities, putting threads in a background group
      with reduced cpu.shares when they are not visible to the user,
      and in a foreground group when they are. Some RPCs from foreground
      threads to background threads will temporarily move the background
      thread into the foreground group for the duration of the RPC.
      This results in many calls to cgroup_attach_task.
      
      In cgroup_attach_task() it's task->cgroups that is protected by RCU,
      and put_css_set() calls kfree_rcu() to free it.
      
      If we remove this synchronize_rcu(), there can be threads in RCU-read
      sections accessing their old cgroup via current->cgroups with
      concurrent rmdir operation, but this is safe.
      
       # time for ((i=0; i<50; i++)) { echo $$ > /mnt/sub/tasks; echo $$ > /mnt/tasks; }
      
      real    0m2.524s
      user    0m0.008s
      sys     0m0.004s
      
      With this patch:
      
      real    0m0.004s
      user    0m0.004s
      sys     0m0.000s
      
      tj: These synchronize_rcu()s are utterly confused.  synchornize_rcu()
          necessarily has to come between two operations to guarantee that
          the changes made by the former operation are visible to all rcu
          readers before proceeding to the latter operation.  Here,
          synchornize_rcu() are at the end of attach operations with nothing
          beyond it.  Its only effect would be delaying completion of
          write(2) to sysfs tasks/procs files until all rcu readers see the
          change, which doesn't mean anything.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NColin Cross <ccross@google.com>
      5d65bc0c
  16. 11 1月, 2013 1 次提交