1. 28 May, 2012 1 commit
    • cgroup: superblock can't be released with active dentries · fa980ca8
      Tejun Heo committed
      48ddbe19 "cgroup: make css->refcnt clearing on cgroup removal
      optional" allowed a css to linger after the associated cgroup is
      removed.  As a css holds a reference on the cgroup's dentry, it means
      that cgroup dentries may linger for a while.
      
      cgroup_create() does grab an active reference on the superblock to
      prevent it from going away while there are !root cgroups; however, the
      reference is put from cgroup_diput() which is invoked on cgroup
      removal, so cgroup dentries which are removed but persisting due to
      lingering csses already have released their superblock active refs
      allowing superblock to be killed while those dentries are around.
      
      Given the right condition, this makes cgroup_kill_sb() call
      kill_litter_super() with dentries with non-zero d_count leading to
      BUG() in shrink_dcache_for_umount_subtree().
      
      Fix it by adding cgroup_dops->d_release() operation and moving
      deactivate_super() to it.  cgroup_diput() now marks dentry->d_fsdata
      with itself if superblock should be deactivated and cgroup_d_release()
      deactivates the superblock on dentry release.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Sasha Levin <levinsasha928@gmail.com>
      Tested-by: Sasha Levin <levinsasha928@gmail.com>
      LKML-Reference: <CA+1xoqe5hMuxzCRhMy7J0XchDk2ZnuxOHJKikROk1-ReAzcT6g@mail.gmail.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
  2. 24 April, 2012 1 commit
  3. 12 April, 2012 1 commit
  4. 02 April, 2012 12 commits
    • cgroup: make css->refcnt clearing on cgroup removal optional · 48ddbe19
      Tejun Heo committed
      Currently, cgroup removal tries to drain all css references.  If there
      are active css references, the removal logic waits and retries
      ->pre_destroy() until either all refs drop to zero or removal is
      cancelled.
      
      These semantics are unusual and add non-trivial complexity to cgroup
      core and IMHO are fundamentally misguided in that they couple internal
      implementation details (references to an internal data structure) with
      an externally visible operation (rmdir).  To userland, this is a behavior
      peculiarity which is unnecessary and difficult to anticipate (css refs are
      otherwise invisible from userland), and, to policy implementations,
      this is an unnecessary restriction (e.g. blkcg wants to hold css refs
      for caching purposes but can't as that becomes visible as an rmdir hang).
      
      Unfortunately, memcg currently depends on ->pre_destroy() retrials and
      cgroup removal vetoing and can't be immediately switched to the new
      behavior.  This patch introduces the new behavior of not waiting for
      css refs to drain and maintains the old behavior for subsystems which
      have __DEPRECATED_clear_css_refs set.
      
      Once memcg is updated, we can drop the code paths for the old
      behavior as proposed in the following patch.  Note that the following
      patch is incorrect in that the dput work item is in cgroup and may lose
      some of the dputs when multiple csses are released back-to-back, and
      __css_put() triggers check_for_release() when refcnt reaches 0 instead
      of 1; however, it shows what part can be removed.
      
        http://thread.gmane.org/gmane.linux.kernel.containers/22559/focus=75251
      
      Note that, in the not-too-distant future, cgroup core will start emitting
      warning messages for subsystems which require the old behavior, so please
      get moving.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    • cgroup: use negative bias on css->refcnt to block css_tryget() · 28b4c27b
      Tejun Heo committed
      When a cgroup is about to be removed, cgroup_clear_css_refs() is
      called to check and ensure that there are no active css references.
      
      This is currently achieved by dropping the refcnt to zero iff it has
      only the base ref.  If all css refs could be dropped to zero, ref
      clearing is successful and CSS_REMOVED is set on all css.  If not, the
      base ref is restored.  While css ref is zero w/o CSS_REMOVED set, any
      css_tryget() attempt on it busy loops so that they are atomic
      w.r.t. the whole css ref clearing.
      
      This does work but dropping and re-instating the base ref is somewhat
      hairy and makes it difficult to add more logic to the put path as
      there are two of them - the regular css_put() and the reversible base
      ref clearing.
      
      This patch updates css ref clearing such that blocking new
      css_tryget() and putting the base ref are separate operations.
      CSS_DEACT_BIAS, defined as INT_MIN, is added to css->refcnt and
      css_tryget() busy loops while refcnt is negative.  After all css refs
      are deactivated, if they were all one, ref clearing succeeded and
      CSS_REMOVED is set and the base ref is put using the regular
      css_put(); otherwise, CSS_DEACT_BIAS is subtracted from the refcnts
      and the original positive values are restored.
      
      css_refcnt() accessor which always returns the unbiased positive
      reference counts is added and used to simplify refcnt usages.  While
      at it, relocate and reformat comments in cgroup_has_css_refs().
      
      This separates css->refcnt deactivation and putting the base ref,
      which enables the next patch to make ref clearing optional.
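
The biasing scheme described above can be illustrated with a small user-space sketch. It is not the kernel code: `struct css_sketch`, `css_init()`, `css_deactivate()`, `css_clear_ref()` and the `tryget_result` enum are hypothetical names made up for this example, it handles a single css rather than all csses of a cgroup, and where the kernel busy loops on a negative count this sketch simply reports "busy". Only the idea of adding `CSS_DEACT_BIAS` (`INT_MIN`) and reading back an unbiased count comes from the commit message.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define CSS_DEACT_BIAS INT_MIN

/* Hypothetical stand-in for a css; not the kernel structure. */
struct css_sketch {
    atomic_int refcnt;  /* 1 == only the base reference */
    bool removed;       /* stands in for the CSS_REMOVED flag */
};

enum tryget_result { TRYGET_OK, TRYGET_BUSY, TRYGET_REMOVED };

static void css_init(struct css_sketch *css)
{
    atomic_init(&css->refcnt, 1);
    css->removed = false;
}

/* Always returns the unbiased, non-negative reference count. */
static int css_refcnt(struct css_sketch *css)
{
    int v = atomic_load(&css->refcnt);
    return v >= 0 ? v : v - CSS_DEACT_BIAS;
}

/* Take a reference unless the css is deactivated or already removed. */
static enum tryget_result css_tryget(struct css_sketch *css)
{
    int v = atomic_load(&css->refcnt);

    if (css->removed)
        return TRYGET_REMOVED;
    if (v < 0)
        return TRYGET_BUSY;  /* biased: the kernel would spin here */
    if (atomic_compare_exchange_strong(&css->refcnt, &v, v + 1))
        return TRYGET_OK;
    return TRYGET_BUSY;      /* lost a race: caller may retry */
}

static void css_put(struct css_sketch *css)
{
    atomic_fetch_sub(&css->refcnt, 1);
}

/* Step 1: block new css_tryget() callers by adding the bias. */
static void css_deactivate(struct css_sketch *css)
{
    atomic_fetch_add(&css->refcnt, CSS_DEACT_BIAS);
}

/* Step 2: either commit the removal or undo the bias. */
static bool css_clear_ref(struct css_sketch *css)
{
    css_deactivate(css);
    if (css_refcnt(css) == 1) {
        css->removed = true;
        css_put(css);        /* base ref goes through the regular put */
        return true;
    }
    /* Someone still holds a ref: restore the original positive value. */
    atomic_fetch_sub(&css->refcnt, CSS_DEACT_BIAS);
    return false;
}
```

Because deactivation and the final put are separate steps, the put path stays a single code path, which is the property the commit message highlights.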
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
    • cgroup: implement cgroup_rm_cftypes() · 79578621
      Tejun Heo committed
      Implement cgroup_rm_cftypes() which removes an array of cftypes from a
      subsystem.  It can be called whether the target subsys is attached or
      not.  cgroup core will remove the specified file from all existing
      cgroups.
      
      This will be used to improve sub-subsys modularity and will be helpful
      for unified hierarchy.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
    • cgroup: introduce struct cfent · 05ef1d7c
      Tejun Heo committed
      This patch adds cfent (cgroup file entry) which is the association
      between a cgroup and a file.  This is the in-cgroup representation of
      files under a cgroup directory.  This simplifies walking
      cgroup files and thus cgroup_clear_directory(), which is now
      implemented in two parts - cgroup_rm_file() and a loop around it.
      
      cgroup_rm_file() will be used to implement cftype removal and cfent is
      scheduled to serve cgroup specific per-file data (e.g. for sysfs-like
      "sever" semantics).
      
      v2: - cfe was freed from cgroup_rm_file() which led to use-after-free
            if the file had openers at the time of removal.  Moved to
            cgroup_diput().
      
          - cgroup_clear_directory() triggered WARN_ON_ONCE() if d_subdirs
            wasn't empty after removing all files.  This triggered
            spuriously if some files were open during directory clearing.
            Removed.
      
      v3: - In cgroup_diput(), WARN_ONCE(!list_empty(&cfe->node)) could be
            spuriously triggered for root cgroups because they don't go
            through cgroup_clear_directory() on unmount.  Don't trigger WARN
            for root cgroups.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Glauber Costa <glommer@parallels.com>
    • cgroup: relocate __d_cgrp() and __d_cft() · f6ea9372
      Tejun Heo committed
      Move the two macros upwards as they'll be used earlier in the file.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
    • cgroup: remove cgroup_add_file[s]() · db0416b6
      Tejun Heo committed
      No controller is using cgroup_add_file[s]().  Unexport them, and
      convert cgroup_add_files() to handle a NULL-entry-terminated array
      instead of taking an explicit count, and to continue creation on
      failure, for internal use.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
    • cgroup: convert all non-memcg controllers to the new cftype interface · 4baf6e33
      Tejun Heo committed
      Convert debug, freezer, cpuset, cpu_cgroup, cpuacct, net_prio, blkio,
      net_cls and device controllers to use the new cftype based interface.
      Termination entry is added to cftype arrays and populate callbacks are
      replaced with cgroup_subsys->base_cftypes initializations.
      
      This is a functionally identical transformation.  There shouldn't be any
      visible behavior change.
      
      memcg is rather special and will be converted separately.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <paul@paulmenage.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Vivek Goyal <vgoyal@redhat.com>
    • cgroup: merge cft_release_agent cftype array into the base files array · 6e6ff25b
      Tejun Heo committed
      Now that cftype can express whether a file should only be on root,
      cft_release_agent can be merged into the base files cftypes array.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
    • cgroup: implement cgroup_add_cftypes() and friends · 8e3f6541
      Tejun Heo committed
      Currently, cgroup directories are populated by the subsys->populate()
      callback explicitly creating files on each cgroup creation.  This
      level of flexibility isn't needed or desirable.  It provides largely
      unused flexibility which invites abuse while severely limiting what
      the core layer can do through the lack of structure and conventions.
      
      For each cgroup file type, the only distinction that cgroup users are
      making is whether a cgroup is root or not, which can easily be
      expressed with flags.
      
      This patch introduces cgroup_add_cftypes() and friends.  These deal
      with cftypes instead of individual files - controllers indicate that
      certain types of files exist for a certain subsystem.  Newly added
      CFTYPE_*_ON_ROOT flags indicate whether a cftype should be excluded
      from or created only on the root cgroup.
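
As a rough illustration of the flag-based approach, the following user-space sketch walks a terminated array of file types and decides which ones belong on a given cgroup. Everything here is an assumption made for the example: `struct cftype_sketch`, the `SK_CFTYPE_*` flag names, `cft_wanted()`, `cft_populate()`, the `demo_files` array (including the hypothetical "limit" entry) and the use of a NULL name as the terminator are not taken from the kernel sources.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags modelled on the CFTYPE_*_ON_ROOT family. */
#define SK_CFTYPE_ONLY_ON_ROOT (1u << 0)  /* create only on the root cgroup */
#define SK_CFTYPE_NOT_ON_ROOT  (1u << 1)  /* never create on the root cgroup */

/* Hypothetical, much-reduced file type descriptor. */
struct cftype_sketch {
    const char *name;    /* assumption: a NULL name terminates the array */
    unsigned int flags;
};

/* Should this file type exist on a root or non-root cgroup? */
static bool cft_wanted(const struct cftype_sketch *cft, bool is_root)
{
    if ((cft->flags & SK_CFTYPE_ONLY_ON_ROOT) && !is_root)
        return false;
    if ((cft->flags & SK_CFTYPE_NOT_ON_ROOT) && is_root)
        return false;
    return true;
}

/*
 * Walk a terminated array and collect the names that would be created
 * on one cgroup.  Returns the number of names stored in @out.
 */
static size_t cft_populate(const struct cftype_sketch *cfts, bool is_root,
                           const char **out, size_t cap)
{
    size_t n = 0;

    for (const struct cftype_sketch *cft = cfts; cft->name != NULL; cft++) {
        if (!cft_wanted(cft, is_root))
            continue;
        if (n == cap)
            break;
        out[n++] = cft->name;
    }
    return n;
}

/* Example array; "limit" is an invented controller file. */
static const struct cftype_sketch demo_files[] = {
    { "tasks",         0 },
    { "release_agent", SK_CFTYPE_ONLY_ON_ROOT },
    { "limit",         SK_CFTYPE_NOT_ON_ROOT },
    { NULL,            0 },  /* terminator */
};
```

With a declarative array like this, the core can decide on its own when to create or remove files, which a free-form populate callback would not allow.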
      
      cgroup_add_cftypes() can be called any time whether the target
      subsystem is currently attached or not.  cgroup core will create files
      on the existing cgroups as necessary.
      
      Also, cgroup_subsys->base_cftypes is added to ease registration of the
      base files for the subsystem.  If non-NULL on subsys init, the cftypes
      pointed to by ->base_cftypes are automatically registered on subsys
      init / load.
      
      Further patches will convert the existing users and remove the file
      based interface.  Note that this interface allows dynamic addition of
      files to an active controller.  This will be used for sub-controller
      modularity and unified hierarchy in the longer term.
      
      This patch implements the new mechanism but doesn't apply it to any
      user.
      
      v2: replaced DECLARE_CGROUP_CFTYPES[_COND]() with
          cgroup_subsys->base_cftypes, which works better for cgroup_subsys
          which is loaded as module.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
    • cgroup: build list of all cgroups under a given cgroupfs_root · b0ca5a84
      Tejun Heo committed
      Build a list of all cgroups anchored at cgroupfs_root->allcg_list and
      going through cgroup->allcg_node.  The list is protected by
      cgroup_mutex and will be used to improve cgroup file handling.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
    • cgroup: move cgroup_clear_directory() call out of cgroup_populate_dir() · ff4c8d50
      Tejun Heo committed
      cgroup_populate_dir() currently clears all files and then repopulates
      the directory; however, the clearing part is only useful when it's
      called from cgroup_remount().  Relocate the invocation to
      cgroup_remount().
      
      This is to prepare for further cgroup file handling updates.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
    • cgroup: deprecate remount option changes · 8b5a5a9d
      Tejun Heo committed
      This patch marks the following features for deprecation.
      
      * Rebinding subsys by remount: Never reached useful state - only works
        on empty hierarchies.
      
      * release_agent update by remount: release_agent itself will be
        replaced with conventional fsnotify notification.
      
      v2: Lennart pointed out that "name=" is necessary for mounts w/o any
          controller attached.  Drop "name=" deprecation.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Lennart Poettering <mzxreary@0pointer.de>
  5. 30 March, 2012 1 commit
    • cgroup: cgroup_attach_task() could return -errno after success · 8f121918
      Tejun Heo committed
      61d1d219 "cgroup: remove extra calls to find_existing_css_set" made
      cgroup_task_migrate() return void.  An unfortunate side effect was
      that cgroup_attach_task() was depending on that function's return
      value to clear its @retval on the success path.  On cgroup mounts
      without any subsystem with ->can_attach() callback,
      cgroup_attach_task() ended up returning @retval without initializing
      it on success.
      
      For some reason, gcc failed to warn about it and it didn't cause
      cgroup_attach_task() to return non-zero value in many cases, probably
      due to difference in register allocation.  When the problem
      materializes, systemd fails to populate /systemd cgroup mount and
      fails to boot.
      
      Fix it by initializing @retval to zero on declaration.
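
The shape of the bug and of the fix can be shown with a reduced user-space sketch. The names `attach_task_sketch()`, `can_attach_fn`, `allow_attach()` and `deny_attach()` are hypothetical and the control flow is heavily simplified; the only point taken from the commit message is that `retval` must start at zero because nothing on the success path assigns it when no subsystem provides a `->can_attach()` callback.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type standing in for ->can_attach(). */
typedef int (*can_attach_fn)(void);

static int allow_attach(void) { return 0; }
static int deny_attach(void)  { return -EBUSY; }

/*
 * Simplified attach path.  Each slot in @cbs is one subsystem; a NULL
 * slot means that subsystem has no can_attach callback.
 */
static int attach_task_sketch(const can_attach_fn *cbs, size_t n)
{
    int retval = 0;  /* the fix: initialize on declaration */

    for (size_t i = 0; i < n; i++) {
        if (!cbs[i])
            continue;          /* nothing assigns retval for this slot */
        retval = cbs[i]();
        if (retval)
            return retval;     /* a subsystem vetoed the attach */
    }

    /*
     * The migration step would run here.  It returns void, so it can
     * no longer reset retval to zero on the success path.
     */
    return retval;
}
```

Without the initializer, a call where every slot is NULL would return whatever happened to be in the variable, which matches the intermittent failure described above.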
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Jiri Kosina <jkosina@suse.cz>
      LKML-Reference: <alpine.LNX.2.00.1203282354440.25526@pobox.suse.cz>
      Reviewed-by: Mandeep Singh Baines <msb@chromium.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
  6. 22 March, 2012 2 commits
  7. 21 March, 2012 1 commit
  8. 22 February, 2012 2 commits
    • cgroup: Walk task list under tasklist_lock in cgroup_enable_task_cg_list · 3ce3230a
      Frederic Weisbecker committed
      Walking through the tasklist in cgroup_enable_task_cg_list() inside
      an RCU read side critical section is not enough because:
      
      - RCU is not (yet) safe against while_each_thread()
      
      - If we use only RCU, a forking task that has passed cgroup_post_fork()
        without seeing use_task_css_set_links == 1 is not guaranteed to have
        its child immediately visible in the tasklist if we walk through it
        remotely with RCU. In this case it will be missing in its css_set's
        task list.
      
      Thus we need to traverse the list (unfortunately) under the
      tasklist_lock. It makes us safe against while_each_thread() and also
      makes sure we see all forked tasks that have been added to the tasklist.
      
      As a secondary effect, reading and writing use_task_css_set_links are
      now well ordered against tasklist traversing and modification. The new
      layout is:
      
      CPU 0                                     CPU 1
      
      use_task_css_set_links = 1                write_lock(tasklist_lock)
      read_lock(tasklist_lock)                  add task to tasklist
      do_each_thread() {                        write_unlock(tasklist_lock)
          add thread to css set links           if (use_task_css_set_links)
      } while_each_thread()                         add thread to css set links
      read_unlock(tasklist_lock)
      
      If CPU 0 traverses the list after the task has been added to the tasklist,
      then the task is correctly added to the css set links. OTOH, if CPU 0
      traverses the tasklist before the new task had the opportunity to be added
      to the tasklist because it was too early in the fork process, then CPU 1
      catches up and adds the task to the css set links after it has added the
      task to the tasklist. The right value of use_task_css_set_links is
      guaranteed to be visible from CPU 1 due to the implicit barrier properties
      of LOCK/UNLOCK: the read_unlock on CPU 0 makes the write to
      use_task_css_set_links visible, and the write_lock on CPU 1 makes the
      subsequent read of use_task_css_set_links return the correct value.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Mandeep Singh Baines <msb@chromium.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • cgroup: Remove wrong comment on cgroup_enable_task_cg_list() · 9a4b4304
      Frederic Weisbecker committed
      Remove the stale comment about RCU protection. Many callers
      (all of them?) of cgroup_enable_task_cg_list() don't seem
      to be in an RCU read side critical section. Besides, RCU is
      not helpful to protect against while_each_thread().
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Mandeep Singh Baines <msb@chromium.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  9. 03 February, 2012 1 commit
    • cgroup: remove cgroup_subsys argument from callbacks · 761b3ef5
      Li Zefan committed
      The argument is not used at all, and it's not necessary, because
      a specific callback handler of course knows which subsys it
      belongs to.
      
      Now only ->populate() takes this argument, because the handlers of
      this callback always call cgroup_add_file()/cgroup_add_files().
      
      So we reduce a few lines of code, though the shrinking of object size
      is minimal.
      
       16 files changed, 113 insertions(+), 162 deletions(-)
      
         text    data     bss     dec     hex filename
      5486240  656987 7039960 13183187         c928d3 vmlinux.o.orig
      5486170  656987 7039960 13183117         c9288d vmlinux.o
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  10. 31 January, 2012 1 commit
    • cgroup: remove extra calls to find_existing_css_set · 61d1d219
      Mandeep Singh Baines committed
      In cgroup_attach_proc, we indirectly call find_existing_css_set 3
      times. It is an expensive call so we want to call it as few times
      as possible. This patch only calls it once and stores the result so
      that it can be used later on when we call cgroup_task_migrate.
      
      This required modifying cgroup_task_migrate to take the new css_set
      (which we obtained from find_css_set) as a parameter. The nice side
      effect of this is that cgroup_task_migrate is now identical for
      cgroup_attach_task and cgroup_attach_proc. It also now returns
      void since it can never fail.
      
      Changes in V5:
      * https://lkml.org/lkml/2012/1/20/344 (Tejun Heo)
        * Remove css_set_refs
      Changes in V4:
      * https://lkml.org/lkml/2011/12/22/421 (Li Zefan)
        * Avoid GFP_KERNEL (sleep) in rcu_read_lock by getting css_set in
          a separate loop not under an rcu_read_lock
      Changes in V3:
      * https://lkml.org/lkml/2011/12/22/13 (Li Zefan)
        * Fixed earlier bug by creating a separate patch to remove tasklist_lock
      Changes in V2:
      * https://lkml.org/lkml/2011/12/20/372 (Tejun Heo)
        * Move find_css_set call into loop which creates the flex array
      * Author
        * Kill css_set_refs and use group_size instead
        * Fix an off-by-one error in counting css_set refs
        * Add a retval check in out_list_teardown
      Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: containers@lists.linux-foundation.org
      Cc: cgroups@vger.kernel.org
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul Menage <paul@paulmenage.org>
  11. 21 January, 2012 3 commits
  12. 07 January, 2012 1 commit
  13. 06 January, 2012 1 commit
    • cgroup: fix to allow mounting a hierarchy by name · 0d19ea86
      Li Zefan committed
      If we mount a hierarchy with a specified name, the name is unique,
      and we can use it to mount the hierarchy without specifying its
      set of subsystem names. This feature is documented in
      Documentation/cgroups/cgroups.txt, section 2.3.
      
      Here's an example:
      
      	# mount -t cgroup -o cpuset,name=myhier xxx /cgroup1
      	# mount -t cgroup -o name=myhier xxx /cgroup2
      
      But it was broken by commit 32a8cf23
      (cgroup: make the mount options parsing more accurate)
      
      This fixes the regression.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
  14. 04 January, 2012 3 commits
  15. 28 December, 2011 3 commits
  16. 22 December, 2011 5 commits
  17. 20 December, 2011 1 commit