1. 15 Sep, 2012 2 commits
    • cgroup: Wrap subsystem selection macro · 5fc0b025
      Daniel Wagner committed
      Before we are able to define all subsystem ids at compile time we need
      finer-grained control over what gets defined when we include
      cgroup_subsys.h. For example, we define the enums for the subsystems or
      declare struct cgroup_subsys (builtin subsystems) by including
      cgroup_subsys.h and defining SUBSYS accordingly.
      
      Currently, whether a subsys is used is decided inside the
      header by testing whether CONFIG_*=y. By moving this test outside
      of cgroup_subsys.h we are able to control it at the include level.
      
      This is done by introducing IS_SUBSYS_ENABLED, which is then defined
      according to the task, e.g. as a test for CONFIG_*=y or CONFIG_*=m.
      Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Cc: Gao feng <gaofeng@cn.fujitsu.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: netdev@vger.kernel.org
      Cc: cgroups@vger.kernel.org
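
The include-level selection described in the commit above can be sketched in a single file. This is only an illustration under stated assumptions, not the kernel header: the CONFIG_* values, the `sketch_` names and both definitions of IS_SUBSYS_ENABLED are chosen for the example, and the guarded list is written out twice where the kernel would include cgroup_subsys.h twice.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed configuration: cpuset is =y, net_cls is =m, freezer is off. */
#define CONFIG_CPUSETS 1
#define CONFIG_NET_CLS_CGROUP_MODULE 1

/* Pass 1: give an id to every subsystem that is =y or =m. */
#define IS_SUBSYS_ENABLED(opt) (opt || opt##_MODULE)
enum sketch_subsys_id {
#if IS_SUBSYS_ENABLED(CONFIG_CPUSETS)
	sketch_cpuset_id,
#endif
#if IS_SUBSYS_ENABLED(CONFIG_NET_CLS_CGROUP)
	sketch_net_cls_id,
#endif
#if IS_SUBSYS_ENABLED(CONFIG_CGROUP_FREEZER)
	sketch_freezer_id,
#endif
	SKETCH_SUBSYS_COUNT
};
#undef IS_SUBSYS_ENABLED

/* Pass 2: list only the subsystems that are built in (=y). */
#define IS_SUBSYS_ENABLED(opt) (opt)
static const char *const sketch_builtin_names[] = {
#if IS_SUBSYS_ENABLED(CONFIG_CPUSETS)
	"cpuset",
#endif
#if IS_SUBSYS_ENABLED(CONFIG_NET_CLS_CGROUP)
	"net_cls",
#endif
#if IS_SUBSYS_ENABLED(CONFIG_CGROUP_FREEZER)
	"freezer",
#endif
	NULL	/* terminator, also keeps the array non-empty */
};
#undef IS_SUBSYS_ENABLED

/* Count the builtin entries up to the NULL terminator. */
size_t sketch_builtin_count(void)
{
	size_t n = 0;

	while (sketch_builtin_names[n] != NULL)
		n++;
	/* Builtin subsystems are a subset of the enabled ones. */
	assert(n <= (size_t)SKETCH_SUBSYS_COUNT);
	return n;
}
```

With the assumed configuration, two ids are defined (cpuset and net_cls) while only cpuset shows up in the builtin list, because the same list is expanded under two different meanings of IS_SUBSYS_ENABLED.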
    • cgroup: Remove CGROUP_BUILTIN_SUBSYS_COUNT · be45c900
      Daniel Wagner committed
      CGROUP_BUILTIN_SUBSYS_COUNT is used as start index or stop index when
      looping over the subsys array looking either at the builtin or the
      module subsystems. Since all the builtin subsystems have an id which
      is lower than CGROUP_BUILTIN_SUBSYS_COUNT we know that any module will
      have an id larger than CGROUP_BUILTIN_SUBSYS_COUNT. In short the ids
      are sorted.
      
      We are about to change id assignment to happen only at compile time
      later in this series. That means we can't rely on the above trick
      since all ids will always be defined at compile time. Furthermore,
      ordering the builtin subsystems and the module subsystems is not
      really necessary.
      
      So we need a different way to know which subsystem is a builtin or a
      module one. We can use the subsys[]->module pointer for this. Any
      place where we need to know if a subsys is a module we just check for
      the pointer. If it is NULL then the subsystem is a builtin one.
      
      With this we are able to drop the CGROUP_BUILTIN_SUBSYS_COUNT
      enum. Though we need to introduce a temporary placeholder so that we
      don't get a compilation error when only CONFIG_CGROUP is selected and
      not a single controller. An empty enum definition is not valid. Later in
      this series we are able to remove the placeholder again.
      
      And with this change we get a fix for this:
      
      kernel/cgroup.c: In function ‘cgroup_load_subsys’:
      kernel/cgroup.c:4326:38: warning: array subscript is below array bounds [-Warray-bounds]
      
      when CONFIG_CGROUP=y and no built in controller was enabled.
      Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Cc: Gao feng <gaofeng@cn.fujitsu.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: netdev@vger.kernel.org
      Cc: cgroups@vger.kernel.org
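
The module-pointer check described in the commit above can be sketched as follows. The struct layouts and all `sketch_` names are assumptions made for the illustration; only the idea (a NULL module pointer means builtin, and the array no longer needs to be sorted) comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct module and struct cgroup_subsys. */
struct sketch_module {
	const char *name;
};

struct sketch_subsys {
	const char *name;
	struct sketch_module *module;	/* NULL for a builtin subsystem */
};

/* A subsystem is builtin exactly when it has no owning module. */
int sketch_is_builtin(const struct sketch_subsys *ss)
{
	return ss->module == NULL;
}

/*
 * Count builtin subsystems without relying on any id ordering:
 * every slot is visited and classified by its module pointer.
 */
size_t sketch_count_builtin(struct sketch_subsys *const subsys[], size_t n)
{
	size_t i, count = 0;

	assert(subsys != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (subsys[i] == NULL)
			continue;	/* unused slot */
		if (sketch_is_builtin(subsys[i]))
			count++;
	}
	return count;
}
```

Because classification happens per entry, builtin and module subsystems may be interleaved in the array in any order.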
  2. 25 Aug, 2012 3 commits
    • cgroup: rename subsys_bits to subsys_mask · a1a71b45
      Aristeu Rozanski committed
      In a previous discussion, Tejun Heo suggested renaming references to
      subsys_bits (added_bits, removed_bits, etc.) to something more meaningful.
      
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Lennart Poettering <lpoetter@redhat.com>
      Signed-off-by: Aristeu Rozanski <aris@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: add xattr support · 03b1cde6
      Aristeu Rozanski committed
      This is one of the items in the plumber's wish list.
      
      For use cases:
      
      >> What would the use case be for this?
      >
      > Attaching meta information to services, in an easily discoverable
      > way. For example, in systemd we create one cgroup for each service, and
      > could then store data like the main pid of the specific service as an
      > xattr on the cgroup itself. That way we'd have almost all service state
      > in the cgroupfs, which would make it possible to terminate systemd and
      > later restart it without losing any state information. But there's more:
      > for example, some very peculiar services cannot be terminated on
      > shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
      > services in question could just mark that on their cgroup, by setting an
      > xattr. On the more desktopy side of things there are other
      > possibilities: for example there are plans defining what an application
      > is along the lines of a cgroup (i.e. an app being a collection of
      > processes). With xattrs one could then attach an icon or human readable
      > program name on the cgroup.
      >
      > The key idea is that this would allow attaching runtime meta information
      > to cgroups and everything they model (services, apps, vms), that doesn't
      > need any complex userspace infrastructure, has good access control
      > (i.e. because the file system enforces that anyway, and there's the
      > "trusted." xattr namespace), notifications (inotify), and can easily be
      > shared among applications.
      >
      > Lennart
      
      v7:
      - no changes
      v6:
      - remove user xattr namespace, only allow trusted and security
      v5:
      - check for capabilities before setting/removing xattrs
      v4:
      - no changes
      v3:
      - instead of config option, use mount option to enable xattr support
      Original-patch-by: Li Zefan <lizefan@huawei.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Lennart Poettering <lpoetter@redhat.com>
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Aristeu Rozanski <aris@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
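
The v6 note above (only the trusted and security namespaces are allowed) amounts to a prefix check on the attribute name. The following is a userspace sketch of such a check, not the kernel code: the function and macro names are hypothetical, and rejecting a bare prefix with nothing after it is an assumption of the example.

```c
#include <assert.h>
#include <string.h>

/* The two namespaces the changelog says remain allowed. */
#define SKETCH_TRUSTED_PREFIX  "trusted."
#define SKETCH_SECURITY_PREFIX "security."

/* Return 1 if name starts with prefix and has a non-empty remainder. */
static int sketch_has_prefix(const char *name, const char *prefix)
{
	size_t len = strlen(prefix);

	return strncmp(name, prefix, len) == 0 && name[len] != '\0';
}

/* Return 1 for names in the trusted or security namespace, else 0. */
int sketch_xattr_name_allowed(const char *name)
{
	assert(name != NULL);
	return sketch_has_prefix(name, SKETCH_TRUSTED_PREFIX) ||
	       sketch_has_prefix(name, SKETCH_SECURITY_PREFIX);
}
```

Under this check a name such as "trusted.main_pid" (an invented attribute in the spirit of the systemd use case quoted above) passes, while anything in the "user." namespace is refused.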
    • cgroup: revise how we re-populate root directory · 13af07df
      Aristeu Rozanski committed
      When remounting cgroupfs with some subsystems added to it and some
      removed, cgroup will remove all the files in root directory and then
      re-populate it.
      
      What I'm doing here is, only remove files which belong to subsystems that
      are to be unbound, and only create files for newly-added subsystems.
      The purpose is to have all other files untouched.
      
      This is a preparation for cgroup xattr support.
      
      v7:
      - checkpatch warnings fixed
      v6:
      - no changes
      v5:
      - no changes
      v4:
      - refactored cgroup_clear_directory() to not use cgroup_rm_file()
      - instead of going thru the list of files, get the file list using the
        subsystems
      - use 'subsys_mask' instead of {added,removed}_bits and made
        cgroup_populate_dir() to match the parameters with cgroup_clear_directory()
      v3:
      - refresh patches after recent refactoring
      Original-patch-by: Li Zefan <lizefan@huawei.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Lennart Poettering <lpoetter@redhat.com>
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Aristeu Rozanski <aris@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
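
The remount behavior described in the commit above comes down to two bit operations on the old and new subsystem masks. A minimal sketch, assuming one bit per subsystem id; the struct and function names are made up for the example:

```c
#include <assert.h>

/* Which subsystems need files created or removed on remount. */
struct sketch_mask_delta {
	unsigned long added;	/* in the new mask only: create files */
	unsigned long removed;	/* in the old mask only: remove files */
};

struct sketch_mask_delta sketch_remount_delta(unsigned long old_mask,
					      unsigned long new_mask)
{
	struct sketch_mask_delta d;

	d.added = new_mask & ~old_mask;
	d.removed = old_mask & ~new_mask;
	/* A subsystem cannot be both added and removed. */
	assert((d.added & d.removed) == 0);
	return d;
}
```

Subsystems present in both masks appear in neither field, so their files are left untouched, which is the stated purpose of the change.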
  3. 14 Jul, 2012 2 commits
  4. 10 Jul, 2012 1 commit
  5. 08 Jul, 2012 2 commits
    • cgroup: fix cgroup hierarchy umount race · 5db9a4d9
      Tejun Heo committed
      48ddbe19 "cgroup: make css->refcnt clearing on cgroup removal
      optional" allowed a css to linger after the associated cgroup is
      removed.  As a css holds a reference on the cgroup's dentry, it means
      that cgroup dentries may linger for a while.
      
      Destroying a superblock which has dentries with positive refcnts is a
      critical bug and triggers BUG() in vfs code.  As each cgroup dentry
      holds an s_active reference, any lingering cgroup has both its dentry
      and the superblock pinned, which prevents premature release of the
      superblock.
      
      Unfortunately, after 48ddbe19, there's a small window while
      releasing a cgroup which is directly under the root of the hierarchy.
      When a cgroup directory is released, vfs layer first deletes the
      corresponding dentry and then invokes dput() on the parent, which may
      recurse further, so when a cgroup directly below root cgroup is
      released, the cgroup is first destroyed - which releases the s_active
      it was holding - and then the dentry for the root cgroup is dput().
      
      This creates a window where the root dentry's refcnt isn't zero but
      superblock's s_active is.  If umount happens before or during this
      window, vfs will see the root dentry with non-zero refcnt and trigger
      BUG().
      
      Before 48ddbe19, this problem didn't exist because the last dentry
      reference was guaranteed to be put synchronously from rmdir(2)
      invocation which holds s_active around the whole process.
      
      Fix it by holding an extra superblock->s_active reference across
      dput() from css release, which is the dput() path added by 48ddbe19
      and the only one which doesn't hold an extra s_active ref across the
      final cgroup dput().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      LKML-Reference: <4FEEA5CB.8070809@huawei.com>
      Reported-by: shyju pv <shyju.pv@huawei.com>
      Tested-by: shyju pv <shyju.pv@huawei.com>
      Cc: Sasha Levin <levinsasha928@gmail.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • Revert "cgroup: superblock can't be released with active dentries" · 7db5b3ca
      Tejun Heo committed
      This reverts commit fa980ca8.  The
      commit was an attempt to fix a race condition where a cgroup hierarchy
      may be unmounted with positive dentry reference on root cgroup.  While
      the commit made the race condition slightly more difficult to trigger,
      the race was still there and could be reliably triggered using a
      different test case.
      
      Revert the incorrect fix.  The next commit will describe the race and
      fix it correctly.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      LKML-Reference: <4FEEA5CB.8070809@huawei.com>
      Reported-by: shyju pv <shyju.pv@huawei.com>
      Cc: Sasha Levin <levinsasha928@gmail.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
  6. 19 Jun, 2012 1 commit
  7. 07 Jun, 2012 2 commits
  8. 30 May, 2012 1 commit
  9. 28 May, 2012 1 commit
    • cgroup: superblock can't be released with active dentries · fa980ca8
      Tejun Heo committed
      48ddbe19 "cgroup: make css->refcnt clearing on cgroup removal
      optional" allowed a css to linger after the associated cgroup is
      removed.  As a css holds a reference on the cgroup's dentry, it means
      that cgroup dentries may linger for a while.
      
      cgroup_create() does grab an active reference on the superblock to
      prevent it from going away while there are !root cgroups; however, the
      reference is put from cgroup_diput() which is invoked on cgroup
      removal, so cgroup dentries which are removed but persisting due to
      lingering csses already have released their superblock active refs
      allowing superblock to be killed while those dentries are around.
      
      Given the right condition, this makes cgroup_kill_sb() call
      kill_litter_super() with dentries with non-zero d_count leading to
      BUG() in shrink_dcache_for_umount_subtree().
      
      Fix it by adding cgroup_dops->d_release() operation and moving
      deactivate_super() to it.  cgroup_diput() now marks dentry->d_fsdata
      with itself if superblock should be deactivated and cgroup_d_release()
      deactivates the superblock on dentry release.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Sasha Levin <levinsasha928@gmail.com>
      Tested-by: Sasha Levin <levinsasha928@gmail.com>
      LKML-Reference: <CA+1xoqe5hMuxzCRhMy7J0XchDk2ZnuxOHJKikROk1-ReAzcT6g@mail.gmail.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
  10. 16 May, 2012 1 commit
  11. 24 Apr, 2012 1 commit
  12. 12 Apr, 2012 1 commit
  13. 02 Apr, 2012 12 commits
    • cgroup: make css->refcnt clearing on cgroup removal optional · 48ddbe19
      Tejun Heo committed
      Currently, cgroup removal tries to drain all css references.  If there
      are active css references, the removal logic waits and retries
      ->pre_destroy() until either all refs drop to zero or removal is
      cancelled.
      
      This semantics is unusual and adds non-trivial complexity to cgroup
      core and IMHO is fundamentally misguided in that it couples internal
      implementation details (references to internal data structure) with
      externally visible operation (rmdir).  To userland, this is a behavior
      peculiarity which is unnecessary and difficult to expect (css refs is
      otherwise invisible from userland), and, to policy implementations,
      this is an unnecessary restriction (e.g. blkcg wants to hold css refs
      for caching purposes but can't as that becomes visible as rmdir hang).
      
      Unfortunately, memcg currently depends on ->pre_destroy() retrials and
      cgroup removal vetoing and can't be immediately switched to the new
      behavior.  This patch introduces the new behavior of not waiting for
      css refs to drain and maintains the old behavior for subsystems which
      have __DEPRECATED_clear_css_refs set.
      
      Once memcg is updated, we can drop the code paths for the old
      behavior as proposed in the following patch.  Note that the following
      patch is incorrect in that dput work item is in cgroup and may lose
      some of dputs when multiple csses are released back-to-back, and
      __css_put() triggers check_for_release() when refcnt reaches 0 instead
      of 1; however, it shows what part can be removed.
      
        http://thread.gmane.org/gmane.linux.kernel.containers/22559/focus=75251
      
      Note that, in not-too-distant future, cgroup core will start emitting
      warning messages for subsys which require the old behavior, so please
      get moving.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    • cgroup: use negative bias on css->refcnt to block css_tryget() · 28b4c27b
      Tejun Heo committed
      When a cgroup is about to be removed, cgroup_clear_css_refs() is
      called to check and ensure that there are no active css references.
      
      This is currently achieved by dropping the refcnt to zero iff it has
      only the base ref.  If all css refs could be dropped to zero, ref
      clearing is successful and CSS_REMOVED is set on all css.  If not, the
      base ref is restored.  While css ref is zero w/o CSS_REMOVED set, any
      css_tryget() attempt on it busy loops so that they are atomic
      w.r.t. the whole css ref clearing.
      
      This does work but dropping and re-instating the base ref is somewhat
      hairy and makes it difficult to add more logic to the put path as
      there are two of them - the regular css_put() and the reversible base
      ref clearing.
      
      This patch updates css ref clearing such that blocking new
      css_tryget() and putting the base ref are separate operations.
      CSS_DEACT_BIAS, defined as INT_MIN, is added to css->refcnt and
      css_tryget() busy loops while refcnt is negative.  After all css refs
      are deactivated, if they were all one, ref clearing succeeded and
      CSS_REMOVED is set and the base ref is put using the regular
      css_put(); otherwise, CSS_DEACT_BIAS is subtracted from the refcnts
      and the original positive values are restored.
      
      css_refcnt() accessor which always returns the unbiased positive
      reference counts is added and used to simplify refcnt usages.  While
      at it, relocate and reformat comments in cgroup_has_css_refs().
      
      This separates css->refcnt deactivation and putting the base ref,
      which enables the next patch to make ref clearing optional.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
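
The negative-bias scheme described in the commit above can be modeled with a plain integer. This is a single-threaded sketch under stated assumptions: the real counter is atomic and css_tryget() busy loops while the count is negative, whereas here the tryget simply reports failure so the caller could retry. All `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Bias added to the counter while it is deactivated. */
#define SKETCH_DEACT_BIAS INT_MIN

struct sketch_css {
	int refcnt;	/* an atomic_t in the kernel, a plain int here */
};

/* Return the unbiased, non-negative reference count. */
int sketch_css_refcnt(const struct sketch_css *css)
{
	int v = css->refcnt;

	return v >= 0 ? v : v - SKETCH_DEACT_BIAS;
}

/* Block new trygets by making the stored value negative. */
void sketch_css_deactivate(struct sketch_css *css)
{
	assert(css->refcnt >= 0);
	css->refcnt += SKETCH_DEACT_BIAS;
}

/* Undo the bias and restore the original positive value. */
void sketch_css_reactivate(struct sketch_css *css)
{
	assert(css->refcnt < 0);
	css->refcnt -= SKETCH_DEACT_BIAS;
}

/*
 * Return 1 and take a reference if the css is active, 0 otherwise.
 * The kernel would spin instead of returning 0 while deactivated.
 */
int sketch_css_tryget(struct sketch_css *css)
{
	if (css->refcnt <= 0)
		return 0;
	css->refcnt++;
	return 1;
}
```

The point of the bias is that deactivation no longer touches the base reference: the count stays recoverable through the accessor, and putting the base ref can go through the regular put path.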
    • cgroup: implement cgroup_rm_cftypes() · 79578621
      Tejun Heo committed
      Implement cgroup_rm_cftypes() which removes an array of cftypes from a
      subsystem.  It can be called whether the target subsys is attached or
      not.  cgroup core will remove the specified file from all existing
      cgroups.
      
      This will be used to improve sub-subsys modularity and will be helpful
      for unified hierarchy.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
    • cgroup: introduce struct cfent · 05ef1d7c
      Tejun Heo committed
      This patch adds cfent (cgroup file entry) which is the association
      between a cgroup and a file.  This is the in-cgroup representation of
      files under a cgroup directory.  This simplifies walking
      cgroup files and thus cgroup_clear_directory(), which is now
      implemented in two parts - cgroup_rm_file() and a loop around it.
      
      cgroup_rm_file() will be used to implement cftype removal and cfent is
      scheduled to serve cgroup specific per-file data (e.g. for sysfs-like
      "sever" semantics).
      
      v2: - cfe was freed from cgroup_rm_file() which led to use-after-free
            if the file had openers at the time of removal.  Moved to
            cgroup_diput().
      
          - cgroup_clear_directory() triggered WARN_ON_ONCE() if d_subdirs
            wasn't empty after removing all files.  This triggered
            spuriously if some files were open during directory clearing.
            Removed.
      
      v3: - In cgroup_diput(), WARN_ONCE(!list_empty(&cfe->node)) could be
            spuriously triggered for root cgroups because they don't go
            through cgroup_clear_directory() on unmount.  Don't trigger WARN
            for root cgroups.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Glauber Costa <glommer@parallels.com>
    • cgroup: relocate __d_cgrp() and __d_cft() · f6ea9372
      Tejun Heo committed
      Move the two macros upwards as they'll be used earlier in the file.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
    • cgroup: remove cgroup_add_file[s]() · db0416b6
      Tejun Heo committed
      No controller is using cgroup_add_file[s]().  Unexport them, and
      convert cgroup_add_files() to handle a NULL-entry-terminated array
      instead of taking a count explicitly, and to continue creation on
      failure, for internal use.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
    • cgroup: convert all non-memcg controllers to the new cftype interface · 4baf6e33
      Tejun Heo committed
      Convert debug, freezer, cpuset, cpu_cgroup, cpuacct, net_prio, blkio,
      net_cls and device controllers to use the new cftype based interface.
      Termination entry is added to cftype arrays and populate callbacks are
      replaced with cgroup_subsys->base_cftypes initializations.
      
      This is functionally identical transformation.  There shouldn't be any
      visible behavior change.
      
      memcg is rather special and will be converted separately.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <paul@paulmenage.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Vivek Goyal <vgoyal@redhat.com>
    • cgroup: merge cft_release_agent cftype array into the base files array · 6e6ff25b
      Tejun Heo committed
      Now that cftype can express whether a file should only be on root,
      cft_release_agent can be merged into the base files cftypes array.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
    • cgroup: implement cgroup_add_cftypes() and friends · 8e3f6541
      Tejun Heo committed
      Currently, cgroup directories are populated by subsys->populate()
      callback explicitly creating files on each cgroup creation.  This
      level of flexibility isn't needed or desirable.  It provides largely
      unused flexibility which invites abuse while severely limiting what
      the core layer can do through the lack of structure and conventions.
      
      For each cgroup file type, the only distinction that cgroup users are
      making is whether a cgroup is root or not, which can easily be
      expressed with flags.
      
      This patch introduces cgroup_add_cftypes().  These deal with cftypes
      instead of individual files - controllers indicate that certain types
      of files exist for certain subsystem.  Newly added CFTYPE_*_ON_ROOT
      flags indicate whether a cftype should be excluded or created only on
      the root cgroup.
      
      cgroup_add_cftypes() can be called any time whether the target
      subsystem is currently attached or not.  cgroup core will create files
      on the existing cgroups as necessary.
      
      Also, cgroup_subsys->base_cftypes is added to ease registration of the
      base files for the subsystem.  If non-NULL on subsys init, the cftypes
      pointed to by ->base_cftypes are automatically registered on subsys
      init / load.
      
      Further patches will convert the existing users and remove the file
      based interface.  Note that this interface allows dynamic addition of
      files to an active controller.  This will be used for sub-controller
      modularity and unified hierarchy in the longer term.
      
      This patch implements the new mechanism but doesn't apply it to any
      user.
      
      v2: replaced DECLARE_CGROUP_CFTYPES[_COND]() with
          cgroup_subsys->base_cftypes, which works better for cgroup_subsys
          which is loaded as module.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
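
The flag-based selection described in the commit above can be sketched as a walk over a terminated array of file types. This is an illustration only: the struct, the flag values and the use of a NULL name as terminator are assumptions for the example, and the `sketch_` names are hypothetical stand-ins for the CFTYPE_*_ON_ROOT flags mentioned in the message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counterparts of the CFTYPE_*_ON_ROOT flags. */
#define SKETCH_CFTYPE_ONLY_ON_ROOT (1U << 0)
#define SKETCH_CFTYPE_NOT_ON_ROOT  (1U << 1)

struct sketch_cftype {
	const char *name;	/* NULL marks the end of the array here */
	unsigned int flags;
};

/* Decide whether one file type applies to a root or non-root cgroup. */
static int sketch_cft_applies(const struct sketch_cftype *cft, int is_root)
{
	if ((cft->flags & SKETCH_CFTYPE_ONLY_ON_ROOT) && !is_root)
		return 0;
	if ((cft->flags & SKETCH_CFTYPE_NOT_ON_ROOT) && is_root)
		return 0;
	return 1;
}

/* Count the files that would be created for a cgroup. */
size_t sketch_count_files(const struct sketch_cftype *cfts, int is_root)
{
	size_t n = 0;

	assert(cfts != NULL);
	for (; cfts->name != NULL; cfts++)
		if (sketch_cft_applies(cfts, is_root))
			n++;
	return n;
}
```

Since the array carries its own terminator and per-entry flags, the core can populate any cgroup from the same table without a per-subsystem callback deciding which files to create.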
    • cgroup: build list of all cgroups under a given cgroupfs_root · b0ca5a84
      Tejun Heo committed
      Build a list of all cgroups anchored at cgroupfs_root->allcg_list and
      going through cgroup->allcg_node.  The list is protected by
      cgroup_mutex and will be used to improve cgroup file handling.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
    • cgroup: move cgroup_clear_directory() call out of cgroup_populate_dir() · ff4c8d50
      Tejun Heo committed
      cgroup_populate_dir() currently clears all files and then repopulates
      the directory; however, the clearing part is only useful when it's
      called from cgroup_remount().  Relocate the invocation to
      cgroup_remount().
      
      This is to prepare for further cgroup file handling updates.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
    • cgroup: deprecate remount option changes · 8b5a5a9d
      Tejun Heo committed
      This patch marks the following features for deprecation.
      
      * Rebinding subsys by remount: Never reached useful state - only works
        on empty hierarchies.
      
      * release_agent update by remount: release_agent itself will be
        replaced with conventional fsnotify notification.
      
      v2: Lennart pointed out that "name=" is necessary for mounts w/o any
          controller attached.  Drop "name=" deprecation.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Lennart Poettering <mzxreary@0pointer.de>
  14. 30 Mar, 2012 1 commit
    • cgroup: cgroup_attach_task() could return -errno after success · 8f121918
      Tejun Heo committed
      61d1d219 "cgroup: remove extra calls to find_existing_css_set" made
      cgroup_task_migrate() return void.  An unfortunate side effect was
      that cgroup_attach_task() was depending on that function's return
      value to clear its @retval on the success path.  On cgroup mounts
      without any subsystem with ->can_attach() callback,
      cgroup_attach_task() ended up returning @retval without initializing
      it on success.
      
      For some reason, gcc failed to warn about it and it didn't cause
      cgroup_attach_task() to return non-zero value in many cases, probably
      due to difference in register allocation.  When the problem
      materializes, systemd fails to populate /systemd cgroup mount and
      fails to boot.
      
      Fix it by initializing @retval to zero on declaration.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Jiri Kosina <jkosina@suse.cz>
      LKML-Reference: <alpine.LNX.2.00.1203282354440.25526@pobox.suse.cz>
      Reviewed-by: Mandeep Singh Baines <msb@chromium.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
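
The bug class fixed by the commit above can be shown with a small stand-alone function. This is a sketch, not cgroup_attach_task() itself: the callback type, the helper callbacks and the `sketch_` names are invented for the example. The point is that the return value must already be defined when no subsystem supplies a callback.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a subsystem's ->can_attach() callback. */
typedef int (*sketch_can_attach_fn)(void);

/* Example callbacks: one permits the attach, one vetoes it. */
int sketch_allow(void)
{
	return 0;
}

int sketch_deny(void)
{
	return -EBUSY;
}

/*
 * Ask every subsystem that has a callback.  Initializing retval at
 * declaration mirrors the fix: without it, a mount where no subsystem
 * has a callback would return an indeterminate value on success.
 */
int sketch_attach(const sketch_can_attach_fn *callbacks, size_t n)
{
	int retval = 0;
	size_t i;

	assert(callbacks != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (callbacks[i] == NULL)
			continue;	/* subsystem without a callback */
		retval = callbacks[i]();
		if (retval)
			return retval;	/* first veto wins */
	}
	return retval;
}
```

With no callbacks at all the loop body never assigns retval, which is exactly the path where the initializer matters.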
  15. 22 Mar, 2012 2 commits
  16. 21 Mar, 2012 1 commit
  17. 22 Feb, 2012 2 commits
    • cgroup: Walk task list under tasklist_lock in cgroup_enable_task_cg_list · 3ce3230a
      Frederic Weisbecker committed
      Walking through the tasklist in cgroup_enable_task_cg_list() inside
      an RCU read side critical section is not enough because:
      
      - RCU is not (yet) safe against while_each_thread()
      
      - If we use only RCU, a forking task that has passed cgroup_post_fork()
        without seeing use_task_css_set_links == 1 is not guaranteed to have
        its child immediately visible in the tasklist if we walk through it
        remotely with RCU. In this case it will be missing in its css_set's
        task list.
      
      Thus we need to traverse the list (unfortunately) under the
      tasklist_lock. It makes us safe against while_each_thread() and also
      makes sure we see all forked tasks that have been added to the tasklist.
      
      As a secondary effect, reading and writing use_task_css_set_links are
      now well ordered against tasklist traversing and modification. The new
      layout is:
      
      CPU 0                                      CPU 1
      
      use_task_css_set_links = 1                write_lock(tasklist_lock)
      read_lock(tasklist_lock)                  add task to tasklist
      do_each_thread() {                        write_unlock(tasklist_lock)
      	add thread to css set links       if (use_task_css_set_links)
      } while_each_thread()                         add thread to css set links
      read_unlock(tasklist_lock)
      
      If CPU 0 traverses the list after the task has been added to the tasklist
      then it is correctly added to the css set links. OTOH if CPU 0 traverses
      the tasklist before the new task had the opportunity to be added to the
      tasklist because it was too early in the fork process, then CPU 1
      catches up and adds the task to the css set links after it added the task
      to the tasklist. The right value of use_task_css_set_links is guaranteed
      to be visible from CPU 1 due to the LOCK/UNLOCK implicit barrier properties:
      the read_unlock on CPU 0 ensures the write to use_task_css_set_links has
      happened, and the write_lock on CPU 1 makes the subsequent read of
      use_task_css_set_links return the correct value.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Mandeep Singh Baines <msb@chromium.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • cgroup: Remove wrong comment on cgroup_enable_task_cg_list() · 9a4b4304
      Frederic Weisbecker committed
      Remove the stale comment about RCU protection. Many callers
      (all of them?) of cgroup_enable_task_cg_list() don't seem
      to be in an RCU read side critical section. Besides, RCU is
      not helpful to protect against while_each_thread().
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Mandeep Singh Baines <msb@chromium.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  18. 03 Feb, 2012 1 commit
    • cgroup: remove cgroup_subsys argument from callbacks · 761b3ef5
      Li Zefan committed
      The argument is not used at all, and it's not necessary, because
      a specific callback handler of course knows which subsys it
      belongs to.
      
      Now only ->populate() takes this argument, because the handlers of
      this callback always call cgroup_add_file()/cgroup_add_files().
      
      So we reduce a few lines of code, though the shrinking of object size
      is minimal.
      
       16 files changed, 113 insertions(+), 162 deletions(-)
      
         text    data     bss     dec     hex filename
      5486240  656987 7039960 13183187         c928d3 vmlinux.o.orig
      5486170  656987 7039960 13183117         c9288d vmlinux.o
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  19. 31 Jan, 2012 1 commit
    • cgroup: remove extra calls to find_existing_css_set · 61d1d219
      Mandeep Singh Baines committed
      In cgroup_attach_proc, we indirectly call find_existing_css_set 3
      times. It is an expensive call so we want to call it as few times
      as possible. This patch only calls it once and stores the result so
      that it can be used later on when we call cgroup_task_migrate.
      
      This required modifying cgroup_task_migrate to take the new css_set
      (which we obtained from find_css_set) as a parameter. The nice side
      effect of this is that cgroup_task_migrate is now identical for
      cgroup_attach_task and cgroup_attach_proc. It also now returns a
      void since it can never fail.
      
      Changes in V5:
      * https://lkml.org/lkml/2012/1/20/344 (Tejun Heo)
        * Remove css_set_refs
      Changes in V4:
      * https://lkml.org/lkml/2011/12/22/421 (Li Zefan)
        * Avoid GFP_KERNEL (sleep) in rcu_read_lock by getting css_set in
          a separate loop not under an rcu_read_lock
      Changes in V3:
      * https://lkml.org/lkml/2011/12/22/13 (Li Zefan)
        * Fixed earlier bug by creating a separate patch to remove tasklist_lock
      Changes in V2:
      * https://lkml.org/lkml/2011/12/20/372 (Tejun Heo)
        * Move find_css_set call into loop which creates the flex array
      * Author
        * Kill css_set_refs and use group_size instead
        * Fix an off-by-one error in counting css_set refs
        * Add a retval check in out_list_teardown
      Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: containers@lists.linux-foundation.org
      Cc: cgroups@vger.kernel.org
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul Menage <paul@paulmenage.org>
  20. 21 Jan, 2012 2 commits