1. 13 Feb, 2014 4 commits
  2. 12 Feb, 2014 11 commits
    • T
      cgroup: remove cgroupfs_root->refcnt · 776f02fa
      Tejun Heo authored
      Currently, cgroupfs_root and its ->top_cgroup are separately reference
      counted and the latter's is ignored.  There's no reason to do this
      separately.  This patch removes cgroupfs_root->refcnt and destroys
      cgroupfs_root when the top_cgroup is released.
      
      * cgroup_put() updated to ignore cgroup_is_dead() test for top
        cgroups.  cgroup_free_fn() updated to handle root destruction when
        releasing a top cgroup.
      
      * As root destruction is now bounced through cgroup destruction, it is
        asynchronous.  Update cgroup_mount() so that it waits for pending
        release which is currently implemented using msleep().  Converting
        this to proper wait_queue isn't hard but likely unnecessary.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      776f02fa
    • T
      cgroup: rename cgroupfs_root->number_of_cgroups to ->nr_cgrps and make it atomic_t · 3c9c825b
      Tejun Heo authored
      root->number_of_cgroups is currently an integer protected with
      cgroup_mutex.  Except for sanity checks and proc reporting, the only
      place it's used is to check whether the root has any child during
      remount; however, this is a bit flawed as the counter is not
      decremented when the cgroup is unlinked but when it's released,
      meaning that there could be an extended period where all cgroups are
      removed but remount is still not allowed because some internal objects
      are lingering.  While not perfect either, it'd be better to use
      emptiness test on root->top_cgroup.children.
      
      This patch updates cgroup_remount() to test top_cgroup's children
      instead, which leaves statistics printing in proc, implemented in
      proc_cgroupstats_show(), as the only actual use of number_of_cgroups.
      Let's shorten its name and make it an atomic_t so that we don't have
      to worry about its synchronization.  It's purely auxiliary at this
      point.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      3c9c825b
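The split described in this commit, an emptiness test on the top cgroup's children for remount plus a lock-free counter kept only for statistics, can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel code: the struct layouts and the names `root_may_remount()`, `root_account_create()` and `root_account_release()` are hypothetical, and C11 `atomic_int` stands in for the kernel's `atomic_t`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct cgroup {
    struct cgroup *first_child;   /* NULL when no child is linked */
};

struct cgroupfs_root {
    struct cgroup top_cgroup;
    atomic_int nr_cgrps;          /* statistics only, needs no mutex */
};

/* Remount is allowed only while the top cgroup has no linked children. */
static bool root_may_remount(const struct cgroupfs_root *root)
{
    assert(root != NULL);
    return root->top_cgroup.first_child == NULL;
}

/* Bump the counter when a cgroup object is created. */
static void root_account_create(struct cgroupfs_root *root)
{
    atomic_fetch_add(&root->nr_cgrps, 1);
}

/* Drop the counter when a cgroup object is finally released,
 * which may happen long after it was unlinked. */
static void root_account_release(struct cgroupfs_root *root)
{
    atomic_fetch_sub(&root->nr_cgrps, 1);
}
```

The point of the sketch is that the remount decision no longer reads the counter at all, so a cgroup that is unlinked but not yet released does not block remount.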
    • T
      cgroup: remove cgroup->name · e61734c5
      Tejun Heo authored
      cgroup->name handling became quite complicated over time involving
      dedicated struct cgroup_name for RCU protection.  Now that cgroup is
      on kernfs, we can drop all of it and simply use kernfs_name/path() and
      friends.  Replace cgroup->name and all related code with kernfs
      name/path constructs.
      
      * Reimplement cgroup_name() and cgroup_path() as thin wrappers on top
        of kernfs counterparts, which involves semantic changes.
        pr_cont_cgroup_name() and pr_cont_cgroup_path() added.
      
      * cgroup->name handling dropped from cgroup_rename().
      
      * All users of cgroup_name/path() updated to the new semantics.  Users
        which were formatting the string just to printk them are converted
        to use pr_cont_cgroup_name/path() instead, which simplifies things
        quite a bit.  As cgroup_name() no longer requires RCU read lock
        around it, RCU lockings which were protecting only cgroup_name() are
        removed.
      
      v2: Comment above oom_info_lock updated as suggested by Michal.
      
      v3: dummy_top doesn't have a kn associated and
          pr_cont_cgroup_name/path() ended up calling the matching kernfs
          functions with NULL kn leading to oops.  Test for NULL kn and
          print "/" if so.  This issue was reported by Fengguang Wu.
      
      v4: Rebased on top of 0ab02ca8 ("cgroup: protect modifications to
          cgroup_idr with cgroup_mutex").
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      e61734c5
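The v3 note in this commit (print "/" when the cgroup has no associated kn) amounts to a NULL check before delegating to the node's name. A minimal user-space sketch, assuming hypothetical `kn_sketch` and `cgroup_sketch` types in place of `kernfs_node` and `cgroup`, and a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: a node that owns the name, and a cgroup that
 * may or may not have a node attached (dummy_top has none). */
struct kn_sketch {
    const char *name;
};

struct cgroup_sketch {
    struct kn_sketch *kn;
};

/* Return the node's name, or "/" when no node is associated. */
static const char *cgroup_name_or_root(const struct cgroup_sketch *cgrp)
{
    assert(cgrp != NULL);
    return cgrp->kn ? cgrp->kn->name : "/";
}
```

Callers that only want to print the name can pass the result straight to a format string, with no separate buffer to fill.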
    • T
      cgroup: remove cftype_set · 0adb0704
      Tejun Heo authored
      cftype_set was added primarily to allow registering the same cftype
      array more than once for different subsystems.  Nobody uses or needs
      such thing and it's already broken because each cftype has ->ss
      pointer which is initialized during registration.
      
      Let's add list_head ->node to cftype and use the first cftype entry in
      the array to link them instead of allocating separate cftype_set.
      While at it, trigger WARN if cft seems previously initialized during
      registration.
      
      This simplifies cftype handling a bit.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      0adb0704
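The idea of linking registered arrays through their first entry, instead of allocating a separate set object, can be sketched as below. Everything here is a simplification for illustration: the types and `sketch_add_cftypes()` are hypothetical, a plain pointer replaces the kernel's `list_head`, a NULL name terminates the array, and the sketch rejects a double registration where the commit only says a WARN is triggered.

```c
#include <assert.h>
#include <stddef.h>

struct cft_sketch {
    const char *name;             /* NULL name terminates the array */
    const void *ss;               /* owning subsystem, set at registration */
    struct cft_sketch *next_set;  /* link, meaningful only in entry [0] */
};

struct subsys_sketch {
    struct cft_sketch *cfts_head; /* first entry of each registered array */
};

/* Register an array with a subsystem.  Returns -1 when any entry already
 * carries an ->ss pointer, i.e. the array looks previously registered. */
static int sketch_add_cftypes(struct subsys_sketch *ss, struct cft_sketch *cfts)
{
    struct cft_sketch *cft;

    assert(ss != NULL && cfts != NULL);

    for (cft = cfts; cft->name; cft++)
        if (cft->ss)
            return -1;

    for (cft = cfts; cft->name; cft++)
        cft->ss = ss;

    /* Link the array through its first entry; no extra allocation. */
    cfts[0].next_set = ss->cfts_head;
    ss->cfts_head = cfts;
    return 0;
}
```

Because each entry stores a single owner, registering the same array for two subsystems cannot work, which is the breakage the commit message points out.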
    • T
      cgroup: warn if "xattr" is specified with "sane_behavior" · 86bf4b68
      Tejun Heo authored
      Mount option "xattr" is no longer necessary as it's enabled by default
      on kernfs.  Warn if "xattr" is specified with "sane_behavior" so that
      the option can be removed in the future.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      86bf4b68
    • T
      cgroup: convert to kernfs · 2bd59d48
      Tejun Heo authored
      cgroup filesystem code was derived from the original sysfs
      implementation which was heavily intertwined with vfs objects and
      locking with the goal of re-using the existing vfs infrastructure.
      That experiment turned out rather disastrous and sysfs switched, a
      long time ago, to distributed filesystem model where a separate
      representation is maintained which is queried by vfs.  Unfortunately,
      cgroup stuck with the failed experiment all these years and
      accumulated even more problems over time.
      
      Locking and object lifetime management being entangled with vfs is
      probably the most egregious.  vfs was never designed to be misused like
      this and cgroup ends up jumping through various convoluted hoops to
      make things work.  Even then, operations across multiple cgroups can't
      be done safely as it'll deadlock with rename locking.
      
      Recently, kernfs was separated out from sysfs so that it can be used by
      users other than sysfs.  This patch converts cgroup to use kernfs,
      which will bring the following benefits.
      
      * Separation from vfs internals.  Locking and object lifetime
        management is contained in cgroup proper making things a lot
        simpler.  This removes significant amount of locking convolutions,
        hairy object lifetime rules and the restriction on multi-cgroup
        operations.
      
      * Can drop a lot of code to implement filesystem interface as most are
        provided by kernfs.
      
      * Proper "severing" semantics, which allows controllers to not worry
        about lingering file accesses after offline.
      
      While the preceding patches did as much as possible to make the
      transition less painful, large part of the conversion has to be one
      discrete step making this patch rather large.  The rest of the commit
      message lists notable changes in different areas.
      
      Overall
      -------
      
      * vfs constructs replaced with kernfs ones.  cgroup->dentry w/ ->kn,
        cgroupfs_root->sb w/ ->kf_root.
      
      * All dentry accessors are removed.  Helpers to map from kernfs
        constructs are added.
      
      * All vfs plumbing around dentry, inode and bdi removed.
      
      * cgroup_mount() now directly looks for matching root and then
        proceeds to create a new one if not found.
      
      Synchronization and object lifetime
      -----------------------------------
      
      * vfs inode locking removed.  Among other things, this removes the
        need for the convolution in cgroup_cfts_commit().  Future patches
        will further simplify it.
      
      * vfs refcnting replaced with cgroup internal ones.  cgroup->refcnt,
        cgroupfs_root->refcnt added.  cgroup_put_root() now directly puts
        root->refcnt and when it reaches zero proceeds to destroy it thus
        merging cgroup_put_root() and the former cgroup_kill_sb().
        Similarly, cgroup_put() now directly schedules cgroup_free_rcu()
        when refcnt reaches zero.
      
      * Unlike before, kernfs objects don't hold onto cgroup objects.  When
        cgroup destroys a kernfs node, all existing operations are drained
        and the association is broken immediately.  The same for
        cgroupfs_roots and mounts.
      
      * All operations which come through kernfs guarantee that the
        associated cgroup is and stays valid for the duration of operation;
        however, there are two paths which need to find out the associated
        cgroup from dentry without going through kernfs -
        css_tryget_from_dir() and cgroupstats_build().  For these two,
        kernfs_node->priv is RCU managed so that they can dereference it
        under RCU read lock.
      
      File and directory handling
      ---------------------------
      
      * File and directory operations converted to kernfs_ops and
        kernfs_syscall_ops.
      
      * xattrs is implicitly supported by kernfs.  No need to worry about it
        from cgroup.  This means that "xattr" mount option is no longer
        necessary.  A future patch will add a deprecated warning message
        when sane_behavior.
      
      * When cftype->max_write_len > PAGE_SIZE, it's necessary to make a
        private copy of one of the kernfs_ops to set its atomic_write_len.
        cftype->kf_ops is added and cgroup_init/exit_cftypes() are updated
        to handle it.
      
      * cftype->lockdep_key added so that kernfs lockdep annotation can be
        per cftype.
      
      * Individual file entries and open states are now managed by kernfs.
        No need to worry about them from cgroup.  cfent, cgroup_open_file
        and their friends are removed.
      
      * kernfs_nodes are created deactivated and kernfs_activate()
        invocations added to places where creation of new nodes is
        committed.
      
      * cgroup_rmdir() uses kernfs_[un]break_active_protection() for
        self-removal.
      
      v2: - Li pointed out in an earlier patch that specifying "name="
            during mount without subsystem specification should succeed if
            there's an existing hierarchy with a matching name although it
            should fail with -EINVAL if a new hierarchy should be created.
            Prior to the conversion, this used to be handled by deferring
            failure from NULL return from cgroup_root_from_opts(), which was
            necessary because root was being created before checking for
            existing ones.  Note that cgroup_root_from_opts() returned an
            ERR_PTR() value for error conditions which require immediate
            mount failure.
      
            As we now have separate search and creation steps, deferring
            failure from cgroup_root_from_opts() is no longer necessary.
            cgroup_root_from_opts() is updated to always return ERR_PTR()
            value on failure.
      
          - The logic to match existing roots is updated so that a mount
            attempt with a matching name but different subsys_mask is
            rejected.  This was handled by a separate matching loop under
            the comment "Check for name clashes with existing mounts" but
            got lost during conversion.  Merge the check into the main
            search loop.
      
          - Add __rcu __force casting in RCU_INIT_POINTER() in
            cgroup_destroy_locked() to avoid the sparse address space
            warning reported by kbuild test bot.  Maybe we want an explicit
            interface to use kn->priv as RCU protected pointer?
      
      v3: Make CONFIG_CGROUPS select CONFIG_KERNFS.
      
      v4: Rebased on top of 0ab02ca8 ("cgroup: protect modifications to
          cgroup_idr with cgroup_mutex").
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: kbuild test robot <fengguang.wu@intel.com>
      2bd59d48
    • T
      cgroup: misc preps for kernfs conversion · 59f5296b
      Tejun Heo authored
      * Un-inline seq_css().  After kernfs conversion, the function will
        need to dereference internal data structures.
      
      * Add cgroup_get/put_root() and replace direct super_block->s_active
        manipulations with them.  These will be converted to kernfs_root
        refcnting.
      
      * Add cgroup_get/put() and replace dget/put() on cgrp->dentry with
        them.  These will be converted to kernfs refcnting.
      
      * Update current_css_set_cg_links_read() to use cgroup_name() instead
        of reaching into the dentry name.  The end result is the same.
      
      These changes don't make functional differences but will make
      transition to kernfs easier.
      
      v2: Rebased on top of 0ab02ca8 ("cgroup: protect modifications to
          cgroup_idr with cgroup_mutex").
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      59f5296b
    • T
      cgroup: introduce cgroup_ino() · b1664924
      Tejun Heo authored
      mm/memory-failure.c::hwpoison_filter_task() has been reaching into
      cgroup to extract the associated ino to be used as a filtering
      criterion.  This is an implementation detail which shouldn't be
      depended upon from outside cgroup proper and is about to change with
      the scheduled kernfs conversion.
      
      This patch introduces a proper interface to determine the associated
      ino, cgroup_ino(), and updates hwpoison_filter_task() to use it
      instead of reaching directly into cgroup.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      b1664924
    • T
      cgroup: update the meaning of cftype->max_write_len · 5f469907
      Tejun Heo authored
      cftype->max_write_len is used to extend the maximum size of writes.
      It's interpreted in such a way that the actual maximum size is one
      less than the specified value.  The default size is defined by
      CGROUP_LOCAL_BUFFER_SIZE.  Its interpretation is quite confusing - its
      value is decremented by 1 and then compared for equality with max
      size, which means that the actual default size is
      CGROUP_LOCAL_BUFFER_SIZE - 2, which is 62 chars.
      
      There's no point in having a limit that low.  Update its definition so
      that it means the actual string length sans termination and anything
      below PAGE_SIZE-1 is treated as PAGE_SIZE-1.
      
      .max_write_len for "release_agent" is updated to PATH_MAX-1 and
      cgroup_release_agent_write() is updated so that the redundant strlen()
      check is removed and it uses strlcpy() instead of strcpy().
      .max_write_len initializations in blk-throttle.c and cfq-iosched.c are
      no longer necessary and removed.  The one in cpuset is kept unchanged
      as it's an approximated value to begin with.
      
      This will also make transition to kernfs smoother.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      5f469907
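The new meaning of max_write_len (the string length without the terminator, with PAGE_SIZE-1 as a floor) reduces to a simple clamp. The sketch below follows only the wording of the commit message; the 4096-byte page size and both function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages.  The kernel uses its own PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* Anything below PAGE_SIZE-1 is treated as PAGE_SIZE-1. */
static size_t effective_max_write_len(size_t max_write_len)
{
    size_t floor = SKETCH_PAGE_SIZE - 1;

    return max_write_len > floor ? max_write_len : floor;
}

/* A write of nbytes (excluding the terminator) fits when it does not
 * exceed the effective limit. */
static int write_len_allowed(size_t max_write_len, size_t nbytes)
{
    return nbytes <= effective_max_write_len(max_write_len);
}
```

Under this reading a cftype that leaves max_write_len at zero still accepts writes of up to one page minus the terminator, so only files that need more have to set the field.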
    • T
      cgroup: make cgroup_subsys->base_cftypes use cgroup_add_cftypes() · de00ffa5
      Tejun Heo authored
      Currently, cgroup_subsys->base_cftypes registration is different from
      dynamic cftypes registration.  Instead of going through
      cgroup_add_cftypes(), cgroup_init_subsys() invokes
      cgroup_init_cftsets() which makes use of cgroup_subsys->base_cftset
      which doesn't involve dynamic allocation.
      
      While avoiding dynamic allocation is somewhat nice, having two
      separate paths for cftypes registration is nasty, especially as we're
      planning to add more operations during cftypes registration.
      
      This patch drops cgroup_init_cftsets() and cgroup_subsys->base_cftset
      and registers base_cftypes using cgroup_add_cftypes().  This is done
      as a separate step in cgroup_init() instead of a part of
      cgroup_init_subsys().  This is because cgroup_init_subsys() can be
      called very early during boot when kmalloc() isn't available yet.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      de00ffa5
    • T
      cgroup: improve css_from_dir() into css_tryget_from_dir() · 5a17f543
      Tejun Heo authored
      css_from_dir() returns the matching css (cgroup_subsys_state) given a
      dentry and subsystem.  The function doesn't pin the css before
      returning and requires the caller to be holding RCU read lock or
      cgroup_mutex and handling pinning on the caller side.
      
      Given that users of the function are likely to want to pin the
      returned css (both existing users do) and that getting and putting
      css's are very cheap, there's no reason for the interface to be tricky
      like this.
      
      Rename css_from_dir() to css_tryget_from_dir() and make it try to pin
      the found css and return it only if pinning succeeded.  The callers
      are updated so that they no longer do RCU locking and pinning around
      the function and just use the returned css.
      
      This will also ease converting cgroup to kernfs.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      5a17f543
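The "look up and pin in one call" interface described in this commit can be illustrated with a generic tryget pattern: take a reference only if the count has not already dropped to zero, and hand the object back only when that succeeded. This is a user-space sketch under assumptions, not the kernel implementation: a plain C11 atomic counter stands in for the css reference count, and the table lookup replaces the dentry-based lookup; all names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct css_sketch {
    atomic_int refcnt;            /* 0 means the object is being destroyed */
};

/* Take a reference unless the count already reached zero. */
static bool css_sketch_tryget(struct css_sketch *css)
{
    int old = atomic_load(&css->refcnt);

    while (old > 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&css->refcnt, &old, old + 1))
            return true;
    }
    return false;
}

static void css_sketch_put(struct css_sketch *css)
{
    atomic_fetch_sub(&css->refcnt, 1);
}

/* Find the entry and return it only if pinning succeeded; the caller
 * owns one reference on a non-NULL result and must put it later. */
static struct css_sketch *css_sketch_tryget_from_table(struct css_sketch **table,
                                                       size_t n, size_t idx)
{
    if (idx >= n || table[idx] == NULL)
        return NULL;
    return css_sketch_tryget(table[idx]) ? table[idx] : NULL;
}
```

With this shape the caller no longer wraps the lookup in its own locking and pinning; it checks for NULL, uses the object, and drops the reference when done.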
  3. 11 Feb, 2014 1 commit
    • L
      cgroup: protect modifications to cgroup_idr with cgroup_mutex · 0ab02ca8
      Li Zefan authored
      Setup cgroupfs like this:
        # mount -t cgroup -o cpuacct xxx /cgroup
        # mkdir /cgroup/sub1
        # mkdir /cgroup/sub2
      
      Then run these two commands:
        # for ((; ;)) { mkdir /cgroup/sub1/tmp && rmdir /cgroup/sub1/tmp; } &
        # for ((; ;)) { mkdir /cgroup/sub2/tmp && rmdir /cgroup/sub2/tmp; } &
      
      After a few seconds you may see this warning:
      
      ------------[ cut here ]------------
      WARNING: CPU: 1 PID: 25243 at lib/idr.c:527 sub_remove+0x87/0x1b0()
      idr_remove called for id=6 which is not allocated.
      ...
      Call Trace:
       [<ffffffff8156063c>] dump_stack+0x7a/0x96
       [<ffffffff810591ac>] warn_slowpath_common+0x8c/0xc0
       [<ffffffff81059296>] warn_slowpath_fmt+0x46/0x50
       [<ffffffff81300aa7>] sub_remove+0x87/0x1b0
       [<ffffffff810f3f02>] ? css_killed_work_fn+0x32/0x1b0
       [<ffffffff81300bf5>] idr_remove+0x25/0xd0
       [<ffffffff810f2bab>] cgroup_destroy_css_killed+0x5b/0xc0
       [<ffffffff810f4000>] css_killed_work_fn+0x130/0x1b0
       [<ffffffff8107cdbc>] process_one_work+0x26c/0x550
       [<ffffffff8107eefe>] worker_thread+0x12e/0x3b0
       [<ffffffff81085f96>] kthread+0xe6/0xf0
       [<ffffffff81570bac>] ret_from_fork+0x7c/0xb0
      ---[ end trace 2d1577ec10cf80d0 ]---
      
      It's because allocating/removing cgroup ID is not properly synchronized.
      
      The bug was introduced when we converted cgroup_ida to cgroup_idr.
      While synchronization is already done inside ida_simple_{get,remove}(),
      users are responsible for concurrent calls to idr_{alloc,remove}().
      
      tj: Refreshed on top of b58c8998 ("cgroup: fix error return from
      cgroup_create()").
      
      Fixes: 4e96ee8e ("cgroup: convert cgroup_ida to cgroup_idr")
      Cc: <stable@vger.kernel.org> #3.12+
      Reported-by: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      0ab02ca8
  4. 08 Feb, 2014 3 commits
    • T
      cgroup: rename cgroup_subsys->subsys_id to ->id · aec25020
      Tejun Heo authored
      It's no longer referenced outside cgroup core, so renaming is easy.
      Let's rename it for consistency & brevity.
      
      This patch is pure rename.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      aec25020
    • T
      cgroup: clean up cgroup_subsys names and initialization · 073219e9
      Tejun Heo authored
      cgroup_subsys is a bit messier than it needs to be.
      
      * The name of a subsys can be different from its internal identifier
        defined in cgroup_subsys.h.  Most subsystems use the matching name
        but three - cpu, memory and perf_event - use different ones.
      
      * cgroup_subsys_id enums are postfixed with _subsys_id and each
        cgroup_subsys is postfixed with _subsys.  cgroup.h is widely
        included throughout various subsystems, it doesn't and shouldn't
        have claim on such generic names which don't have any qualifier
        indicating that they belong to cgroup.
      
      * cgroup_subsys->subsys_id should always equal the matching
        cgroup_subsys_id enum; however, we require each controller to
        initialize it and then BUG if they don't match, which is a bit
        silly.
      
      This patch cleans up cgroup_subsys names and initialization by doing
      the following.
      
      * cgroup_subsys_id enums are now postfixed with _cgrp_id, and each
        cgroup_subsys with _cgrp_subsys.
      
      * With the above, renaming subsys identifiers to match the userland
        visible names doesn't cause any naming conflicts.  All non-matching
        identifiers are renamed to match the official names.
      
        cpu_cgroup -> cpu
        mem_cgroup -> memory
        perf -> perf_event
      
      * controllers no longer need to initialize ->subsys_id and ->name.
        They're generated in cgroup core and set automatically during boot.
      
      * Redundant cgroup_subsys declarations removed.
      
      * While updating BUG_ON()s in cgroup_init_early(), convert them to
        WARN()s.  BUGging that early during boot is stupid - the kernel
        can't print anything, even through serial console and the trap
        handler doesn't even link stack frame properly for back-tracing.
      
      This patch doesn't introduce any behavior changes.
      
      v2: Rebased on top of fe1217c4 ("net: net_cls: move cgroupfs
          classid handling into core").
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: "David S. Miller" <davem@davemloft.net>
      Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Acked-by: Aristeu Rozanski <aris@redhat.com>
      Acked-by: Ingo Molnar <mingo@redhat.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Serge E. Hallyn <serue@us.ibm.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      073219e9
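Generating both the `_cgrp_id` enums and the user-visible names from one list, so that controllers no longer initialize them by hand, can be sketched with an X-macro. This is an illustrative variant only: the kernel builds these from cgroup_subsys.h, while the list macro, the three example entries and the array name below are assumptions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical single source of truth for the subsystem list. */
#define SKETCH_SUBSYS_LIST(X) \
    X(cpu)                    \
    X(memory)                 \
    X(perf_event)

/* Expands to cpu_cgrp_id, memory_cgrp_id, perf_event_cgrp_id. */
enum sketch_subsys_id {
#define X(name) name##_cgrp_id,
    SKETCH_SUBSYS_LIST(X)
#undef X
    SKETCH_SUBSYS_COUNT,
};

/* Names derived from the same list, indexed by the generated ids. */
static const char *const sketch_subsys_name[] = {
#define X(name) [name##_cgrp_id] = #name,
    SKETCH_SUBSYS_LIST(X)
#undef X
};
```

Since the id and the name come from the same token, they cannot drift apart, which removes the need for a runtime check that a controller initialized its id correctly.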
    • T
      cgroup: drop module support · 3ed80a62
      Tejun Heo authored
      With module support dropped from net_prio, no controller is using
      cgroup module support.  None of actual resource controllers can be
      built as a module and we aren't gonna add new controllers which don't
      control resources.  This patch drops module support from cgroup.
      
      * cgroup_[un]load_subsys() and cgroup_subsys->module removed.
      
      * As there's no point in distinguishing IS_BUILTIN() and IS_MODULE(),
        cgroup_subsys.h now uses IS_ENABLED() directly.
      
      * enum cgroup_subsys_id now exactly matches the list of enabled
        controllers as ordered in cgroup_subsys.h.
      
      * cgroup_subsys[] is now a contiguously occupied array.  Size
        specification is no longer necessary and dropped.
      
      * for_each_builtin_subsys() is removed and for_each_subsys() is
        updated to not require any locking.
      
      * module ref handling is removed from rebind_subsystems().
      
      * Module related comments dropped.
      
      v2: Rebased on top of fe1217c4 ("net: net_cls: move cgroupfs
          classid handling into core").
      
      v3: Added {} around the if (need_forkexit_callback) block in
          cgroup_post_fork() for readability as suggested by Li.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      3ed80a62
  5. 13 Jan, 2014 1 commit
  6. 07 Dec, 2013 1 commit
    • T
      cgroup: remove for_each_root_subsys() · b85d2040
      Tejun Heo authored
      After the previous patch which introduced for_each_css(),
      for_each_root_subsys() only has two users left.  This patch replaces
      it with for_each_subsys() + explicit subsys_mask testing and removes
      for_each_root_subsys() along with cgroupfs_root->subsys_list handling.
      
      This patch doesn't introduce any behavior changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      b85d2040
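Replacing a per-root subsystem list with "iterate over all subsystems and test the root's mask" looks roughly like the loop below. The subsystem count, the function name and the output-array interface are assumptions chosen to keep the sketch self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: four subsystems exist in this sketch. */
#define SKETCH_NR_SUBSYS 4

/* Collect the ids of subsystems whose bit is set in subsys_mask.
 * Writes at most cap ids into out and returns how many were written. */
static size_t attached_subsys_ids(unsigned long subsys_mask, int *out, size_t cap)
{
    size_t n = 0;

    assert(out != NULL || cap == 0);

    for (int ssid = 0; ssid < SKETCH_NR_SUBSYS; ssid++) {
        if (!(subsys_mask & (1UL << ssid)))
            continue;             /* not attached to this root */
        if (n == cap)
            break;
        out[n++] = ssid;
    }
    return n;
}
```

The mask is the only per-root state the loop needs, so no separate list has to be kept in sync when subsystems are bound or unbound.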
  7. 06 Dec, 2013 4 commits
    • T
      cgroup: unify pidlist and other file handling · 6612f05b
      Tejun Heo authored
      In preparation of conversion to kernfs, cgroup file handling is
      updated so that it can be easily mapped to kernfs.  With the previous
      changes, the difference between pidlist and other files is very
      small.  Both are served by seq_file in a pretty standard way with the
      only difference being !pidlist files use single_open().
      
      This patch adds cftype->seq_start(), ->seq_next and ->seq_stop() and
      implements the matching cgroup_seqfile_start/next/stop() which either
      emulates single_open() behavior or invokes cftype->seq_*() operations
      if specified.  This allows using single seq_operations for both
      pidlist and other files and makes cgroup_pidlist_operations and
      cgroup_pidlist_open() no longer necessary.  As cgroup_pidlist_open()
      was the only user of cftype->open(), the method is dropped together.
      
      This brings cftype file interface very close to kernfs interface and
      mapping shouldn't be too difficult.  Once converted to kernfs, most of
      the plumbing code including cgroup_seqfile_*() will be removed as
      kernfs provides those facilities.
      
      This patch does not introduce any behavior changes.
      
      v2: Refreshed on top of the updated "cgroup: introduce struct
          cgroup_pidlist_open_file".
      
      v3: Refreshed on top of the updated "cgroup: attach cgroup_open_file
          to all cgroup files".
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      6612f05b
    • T
      cgroup: replace cftype->read_seq_string() with cftype->seq_show() · 2da8ca82
      Tejun Heo authored
      In preparation of conversion to kernfs, cgroup file handling is
      updated so that it can be easily mapped to kernfs.  This patch
      replaces cftype->read_seq_string() with cftype->seq_show() which is
      not limited to single_open() operation and will map directly to
      kernfs seq_file interface.
      
      The conversions are mechanical.  As ->seq_show() doesn't have @css and
      @cft, the functions which make use of them are converted to use
      seq_css() and seq_cft() respectively.  On several occasions, e.g. if
      it has seq_string in its name, the function name is updated to fit the
      new method better.
      
      This patch does not introduce any behavior changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Aristeu Rozanski <arozansk@redhat.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      2da8ca82
    • T
      cgroup: attach cgroup_open_file to all cgroup files · 7da11279
      Tejun Heo authored
      In preparation of conversion to kernfs, cgroup file handling is
      updated so that it can be easily mapped to kernfs.  This patch
      attaches cgroup_open_file, which used to be attached to pidlist files,
      to all cgroup files, introduces seq_css/cft() accessors to determine
      the cgroup_subsys_state and cftype associated with a given cgroup
      seq_file, and exports them as public interface.
      
      This doesn't cause any behavior changes but unifies cgroup file
      handling across different file types and will help converting them to
      kernfs seq_show() interface.
      
      v2: Li pointed out that the original patch was using
          single_open_size() incorrectly assuming that the size param is
          private data size.  Fix it by allocating @of separately and
          passing it to single_open() and explicitly freeing it in the
          release path.  This isn't the prettiest but this path is gonna be
          restructured by the following patches pretty soon.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      7da11279
    • T
      cgroup: remove cftype->read(), ->read_map() and ->write() · 6e0755b0
      Tejun Heo authored
      In preparation of conversion to kernfs, cgroup file handling is being
      consolidated so that it can be easily mapped to the seq_file based
      interface of kernfs.
      
      After recent updates, ->read() and ->read_map() don't have any user
      left and ->write() never had any user.  Remove them.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      6e0755b0
  8. 29 Nov, 2013 2 commits
  9. 23 Nov, 2013 3 commits
    • T
      cgroup: unexport cgroup_css() and remove __file_cft() · b36824c7
      Tejun Heo authored
      Now that cgroup_event is made memcg specific, the temporarily exported
      functions are no longer necessary.  Unexport cgroup_css() and remove
      __file_cft() which doesn't have any user left.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      b36824c7
    • T
      cgroup, memcg: move cgroup->event_list[_lock] and event callbacks into memcg · fba94807
      Tejun Heo authored
      cgroup_event is being moved from cgroup core to memcg and the
      implementation is already moved by the previous patch.  This patch
      moves the data fields and callbacks.
      
      * cgroup->event_list[_lock] are moved to mem_cgroup.
      
      * cftype->[un]register_event() are moved to cgroup_event.  This makes
        it impossible for individual cftype definitions to specify their
        event callbacks.  This is worked around by simply hard-coding
        filename to event callback mapping in cgroup_write_event_control().
        This is awkward and inflexible, which is actually desirable given
        that we don't want to grow more usages of this feature.
      
      * eventfd_ctx declaration is removed from cgroup.h, which makes
        vmpressure.h miss eventfd_ctx declaration.  Include eventfd.h from
        vmpressure.h.
      
      v2: Use file name from dentry instead of cftype.  This will allow
          removing all cftype handling in the function.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      fba94807
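The hard-coded mapping from file name to event callbacks described in the commit above can be pictured as a plain chain of string comparisons. The following is a minimal userspace sketch, not the kernel code: `event_kind` and `event_kind_for_file()` are hypothetical names invented for illustration, the kernel selects register/unregister callbacks rather than returning an enum, and the exact set of file names the patch handles is an assumption here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical event kinds standing in for the real callback pairs. */
enum event_kind {
    EVENT_NONE = 0,          /* file does not support events */
    EVENT_USAGE_THRESHOLD,
    EVENT_OOM,
    EVENT_PRESSURE,
};

/*
 * Choose the event type from the control file's name alone.
 * Adding a new event user means editing this function, which is the
 * deliberate inflexibility the commit message mentions.
 */
static enum event_kind event_kind_for_file(const char *name)
{
    if (name == NULL)
        return EVENT_NONE;
    if (strcmp(name, "memory.usage_in_bytes") == 0)
        return EVENT_USAGE_THRESHOLD;
    if (strcmp(name, "memory.oom_control") == 0)
        return EVENT_OOM;
    if (strcmp(name, "memory.pressure_level") == 0)
        return EVENT_PRESSURE;
    return EVENT_NONE;
}
```

Because the lookup is keyed on the name taken from the dentry, no per-cftype callback fields are needed, which matches the v2 note in the message.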
    • T
      cgroup, memcg: move cgroup_event implementation to memcg · 79bd9814
      Tejun Heo committed
      cgroup_event is way over-designed and tries to build a generic
      flexible event mechanism into cgroup - fully customizable event
      specification for each user of the interface.  This is utterly
      unnecessary and overboard especially in the light of the planned
      unified hierarchy as there's gonna be a single agent.  Simply generating
      events at fixed points, or if that's too restrictive, a configurable
      cadence or a single set of configurable points should be enough.
      
      Thankfully, memcg is the only user and gets to keep it.  Replacing it
      with something simpler on sane_behavior is strongly recommended.
      
      This patch moves cgroup_event and "cgroup.event_control"
      implementation to mm/memcontrol.c.  Clearing of events on cgroup
      destruction is moved from cgroup_destroy_locked() to
      mem_cgroup_css_offline(), which shouldn't make any noticeable
      difference.
      
      cgroup_css() and __file_cft() are exported to enable the move;
      however, this will soon be reverted once the event code is updated to
      be memcg specific.
      
      Note that "cgroup.event_control" will now exist only on the hierarchy
      with memcg attached to it.  While this change is visible to userland,
      it is unlikely to be noticeable as the file has never been meaningful
      outside memcg.
      
      Aside from the above change, this is pure code relocation.
      
      v2: Per Li Zefan's comments, init/Kconfig updated accordingly and
          poll.h inclusion moved from cgroup.c to memcontrol.c.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      79bd9814
  10. 24 Sep, 2013 1 commit
  11. 27 Aug, 2013 2 commits
    • T
      cgroup: implement CFTYPE_NO_PREFIX · 9fa4db33
      Tejun Heo committed
      When cgroup files are created, cgroup core automatically prepends the
      name of the subsystem as prefix.  This patch adds CFTYPE_NO_PREFIX which
      disables the automatic prefix.  This is to work around historical
      baggage and shouldn't be used for new files.
      
      This will be used to move "cgroup.event_control" from cgroup core to
      memcg.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Glauber Costa <glommer@gmail.com>
      9fa4db33
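The effect of the flag on file naming can be illustrated with a small userspace sketch. It is not the kernel implementation: `cgroup_file_name_sketch()` and `CFTYPE_NO_PREFIX_SKETCH` are hypothetical names, and the bit value chosen for the flag is an assumption made only so the example compiles.

```c
#include <stdio.h>

/* Stand-in for CFTYPE_NO_PREFIX; the bit value is assumed for this sketch. */
#define CFTYPE_NO_PREFIX_SKETCH (1U << 3)

/*
 * Build the visible file name for a control file.
 * By default the subsystem name is prepended ("memory" + "usage_in_bytes"
 * becomes "memory.usage_in_bytes"); with the no-prefix flag the cftype
 * name is used verbatim.
 * Returns what snprintf() returns.
 */
static int cgroup_file_name_sketch(char *buf, size_t len,
                                   const char *subsys_name,
                                   const char *cft_name,
                                   unsigned int flags)
{
    if (subsys_name != NULL && !(flags & CFTYPE_NO_PREFIX_SKETCH))
        return snprintf(buf, len, "%s.%s", subsys_name, cft_name);
    return snprintf(buf, len, "%s", cft_name);
}
```

With the flag set, a file owned by memcg can keep the historical name "cgroup.event_control" instead of becoming "memory.cgroup.event_control", which is the use case the message names.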
    • T
      cgroup: rename cgroup_css_from_dir() to css_from_dir() and update its syntax · 35cf0836
      Tejun Heo committed
      cgroup_css_from_dir() will grow another user.  In preparation, make
      the following changes.
      
      * All css functions are prefixed with just "css_", rename it to
        css_from_dir().
      
      * Take dentry * instead of file * as dentry is what ultimately
        identifies a cgroup and file may not always be available.  Note that
        the function now checks whether @dentry->d_inode is NULL as the
        caller now may specify a negative dentry.
      
      * Make it take cgroup_subsys * instead of integer subsys_id.  This
        simplifies the function and allows specifying no subsystem for
        cgroup->dummy_css.
      
      * Make return section a bit less verbose.
      
      This patch doesn't introduce any behavior changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      35cf0836
  12. 19 Aug, 2013 1 commit
  13. 14 Aug, 2013 3 commits
    • T
      cgroup: RCU protect each cgroup_subsys_state release · 0c21ead1
      Tejun Heo committed
      With the planned unified hierarchy, individual css's will be created
      and destroyed dynamically across the lifetime of a cgroup.  To enable
      such usages, css destruction is being decoupled from cgroup
      destruction.  Most of the destruction path has been decoupled but the
      actual free of css still depends on cgroup free path.
      
      When all css refs are drained, css_release() kicks off
      css_free_work_fn() which puts the cgroup.  When the cgroup refcnt
      reaches zero, cgroup_diput() is invoked which in turn schedules RCU
      free of the cgroup.  After a grace period, all css's are freed along
      with the cgroup itself.
      
      This patch moves the RCU grace period and css freeing from cgroup
      release path to css release path.  css_release(), instead of kicking
      off css_free_work_fn() directly, schedules RCU callback
      css_free_rcu_fn() which in turn kicks off css_free_work_fn() after a
      RCU grace period.  css_free_work_fn() is updated to free the css
      directly.
      
      The five-way punting - percpu ref kill confirmation, a work item,
      percpu ref release, RCU grace period, and again a work item - is quite
      hairy but the work items are there only to provide process context and
      the actual sequence is kill confirm -> release -> RCU free, which
      isn't simple but not too crazy.
      
      This removes cgroup_css() usage after offline_css() allowing clearing
      cgroup->subsys[] from offline_css(), which makes it consistent with
      online_css() and brings it closer to proper lifetime management for
      individual css's.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      0c21ead1
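The five-way punting listed in the commit above, and the claim that it collapses to three logical steps once the work items are set aside, can be written down as a tiny state sequence. This is only a bookkeeping sketch in plain C: the enum and function names are hypothetical, and nothing here performs real reference counting, RCU, or workqueue operations.

```c
#include <stdbool.h>

/* The five punts named in the message, followed by the terminal state. */
enum css_stage {
    CSS_STAGE_KILL_CONFIRMED, /* percpu ref kill confirmation */
    CSS_STAGE_KILLED_WORK,    /* work item: process context only */
    CSS_STAGE_REF_RELEASED,   /* percpu ref release */
    CSS_STAGE_RCU_GRACE,      /* RCU grace period elapsed */
    CSS_STAGE_FREE_WORK,      /* work item: process context only */
    CSS_STAGE_FREED,
};

/* Advance to the next stage; the freed state is terminal. */
static enum css_stage css_next_stage(enum css_stage s)
{
    return s < CSS_STAGE_FREED ? (enum css_stage)(s + 1) : CSS_STAGE_FREED;
}

/* Work items exist only to provide process context. */
static bool css_stage_is_work_item(enum css_stage s)
{
    return s == CSS_STAGE_KILLED_WORK || s == CSS_STAGE_FREE_WORK;
}

/* Count the stages that remain once the work items are ignored. */
static int css_logical_steps(void)
{
    int steps = 0;
    enum css_stage s;

    for (s = CSS_STAGE_KILL_CONFIRMED; s != CSS_STAGE_FREED; s = css_next_stage(s))
        if (!css_stage_is_work_item(s))
            steps++;
    return steps;
}
```

Ignoring the two work items leaves kill confirm, release, and RCU free, the sequence the message describes.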
    • T
      cgroup: decouple cgroup_subsys_state destruction from cgroup destruction · 09a503ea
      Tejun Heo committed
      Currently, css (cgroup_subsys_state) lifetime is tied to that of the
      associated cgroup.  css's are created when the associated cgroup is
      created and destroyed when it gets destroyed.  Also, individual css's
      aren't RCU protected but the whole cgroup is.  With the planned
      unified hierarchy, css's will need to be dynamically created and
      destroyed within the lifetime of a cgroup.
      
      To enable such usages, this patch decouples css destruction from
      cgroup destruction - offline_css() invocation and the final css_put()
      are moved from cgroup_destroy_css_killed() to css_killed_work_fn().
      Now each css is individually offlined and put as its reference count
      is killed instead of waiting for all css's attached to the cgroup to
      finish refcnt killing and then proceeding to offlining and putting
      them together.
      
      While this changes the order of destruction operations, the changes
      shouldn't be noticeable to cgroup subsystems or userland.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      09a503ea
    • T
      cgroup: replace cgroup->css_kill_cnt with ->nr_css · f20104de
      Tejun Heo committed
      Currently, css (cgroup_subsys_state) lifetime is tied to that of the
      associated cgroup.  With the planned unified hierarchy, css's will be
      dynamically created and destroyed within the lifetime of a cgroup.  To
      enable such usages, css's will be individually RCU protected instead
      of being tied to the cgroup.
      
      cgroup->css_kill_cnt is used during cgroup destruction to wait for css
      reference count disable; however, this model doesn't work once css's
      lifetimes are managed separately from cgroup's.  This patch replaces
      it with cgroup->nr_css which is a cgroup_mutex protected integer
      counting the number of attached css's.  The count is incremented from
      online_css() and decremented after refcnt kill is confirmed.  If the
      count reaches zero and the cgroup is marked dead, the second stage of
      cgroup destruction is kicked off.  If a cgroup doesn't have any css
      attached at the time of rmdir, cgroup_destroy_locked() now invokes the
      second stage directly as no css kill confirmation would happen.
      
      cgroup_offline_fn() - the second step of cgroup destruction - is
      renamed to cgroup_destroy_css_killed() and now expects to be called
      with cgroup_mutex held.
      
      While this patch changes how css destruction is punted to work items,
      it shouldn't change any visible behavior.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      f20104de
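The counting rule in the commit above (increment on online, decrement on kill confirmation, start the second stage when the count hits zero on a dead cgroup, or immediately at rmdir if nothing is attached) can be sketched in a few lines. This is a single-threaded model with hypothetical names; the real counter is protected by cgroup_mutex, which is omitted here, and the second stage is reduced to setting a flag.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant parts of struct cgroup. */
struct cgroup_nr_sketch {
    int nr_css;                /* number of attached css's */
    bool dead;                 /* set once rmdir has begun */
    bool second_stage_started; /* models cgroup_destroy_css_killed() running */
};

static void sketch_start_second_stage(struct cgroup_nr_sketch *cgrp)
{
    cgrp->second_stage_started = true;
}

/* A css coming online is counted. */
static void sketch_online_css(struct cgroup_nr_sketch *cgrp)
{
    cgrp->nr_css++;
}

/* Called once per css after its refcnt kill is confirmed. */
static void sketch_css_kill_confirmed(struct cgroup_nr_sketch *cgrp)
{
    if (--cgrp->nr_css == 0 && cgrp->dead)
        sketch_start_second_stage(cgrp);
}

/* rmdir path: with no css attached, no confirmation will ever arrive. */
static void sketch_destroy_locked(struct cgroup_nr_sketch *cgrp)
{
    cgrp->dead = true;
    if (cgrp->nr_css == 0)
        sketch_start_second_stage(cgrp);
}
```

The direct call in the rmdir path covers the case the message singles out: a cgroup without any css would otherwise never reach the second stage.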
  14. 13 Aug, 2013 3 commits
    • T
      cgroup: add __rcu modifier to cgroup->subsys[] · 73e80ed8
      Tejun Heo committed
      For the planned unified hierarchy, each css (cgroup_subsys_state) will
      be RCU protected so that it can be created and destroyed individually
      while allowing RCU accesses.  Previous changes ensured that all
      cgroup->subsys[] accesses use the cgroup_css() accessor.  This patch
      adds __rcu modifier to cgroup->subsys[], adds matching RCU dereference
      in cgroup_css() and converts all assignments to either
      rcu_assign_pointer() or RCU_INIT_POINTER().
      
      This change prepares for the actual RCUfication of css's and doesn't
      introduce any visible behavior change.  The conversion is verified
      with sparse and all accesses are properly RCU annotated.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      73e80ed8
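Outside the kernel, the publish/read pairing that the __rcu annotation enforces can be approximated with C11 atomics: rcu_assign_pointer() behaves roughly like a release store and rcu_dereference() like a dependency-ordered load (modelled here, conservatively, as an acquire load). The sketch below shows only that pairing. All type and function names are hypothetical, the array size is arbitrary, and there is no grace-period handling, so it is not a substitute for real RCU.

```c
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_SUBSYS_COUNT 4 /* arbitrary size for the example */

struct css_rcu_sketch {
    int id;
};

/* Stand-in for struct cgroup with an annotated subsys[] array. */
struct cgroup_rcu_sketch {
    _Atomic(struct css_rcu_sketch *) subsys[SKETCH_SUBSYS_COUNT];
};

/* Plain initialization before the object is visible to readers,
 * comparable in spirit to RCU_INIT_POINTER(). */
static void sketch_cgroup_init(struct cgroup_rcu_sketch *cgrp)
{
    for (size_t i = 0; i < SKETCH_SUBSYS_COUNT; i++)
        atomic_init(&cgrp->subsys[i], NULL);
}

/* Publish a css: release ordering makes its fields visible first. */
static void sketch_assign_css(struct cgroup_rcu_sketch *cgrp, size_t idx,
                              struct css_rcu_sketch *css)
{
    atomic_store_explicit(&cgrp->subsys[idx], css, memory_order_release);
}

/* Single accessor, mirroring how all reads go through cgroup_css(). */
static struct css_rcu_sketch *sketch_cgroup_css(struct cgroup_rcu_sketch *cgrp,
                                                size_t idx)
{
    return atomic_load_explicit(&cgrp->subsys[idx], memory_order_acquire);
}
```

Funnelling every read through one accessor is what made the annotation practical: only that function and the assignment sites needed converting.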
    • T
      cgroup: add cgroup_subsys_state->parent · 0ae78e0b
      Tejun Heo committed
      With the planned unified hierarchy, css's (cgroup_subsys_state) will
      be RCU protected and allowed to be attached and detached dynamically
      over the course of a cgroup's lifetime.  This means that a css will
      stay accessible after being detached from its cgroup - the matching
      pointer in cgroup->subsys[] cleared - for ref draining and RCU grace
      period.
      
      cgroup core still wants to guarantee that the parent css is never
      destroyed before its children and css_parent() always returns the
      parent regardless of the state of the child css as long as it's
      accessible.
      
      This patch makes css's hold onto their parents and adds css->parent so
      that the parent css is never destroyed before its children and can be
      determined without consulting the cgroups.
      
      cgroup->dummy_css is also updated to point to the parent dummy_css;
      however, it doesn't need to worry about object lifetime as the parent
      cgroup is already pinned by the child.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      0ae78e0b
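The guarantee described in the commit above, that a parent css outlives its children because each child holds a reference on it, can be modelled with a plain counter. This is a single-threaded sketch with hypothetical names: the kernel uses percpu refcounts and defers the actual free, whereas here "freeing" just sets a flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for cgroup_subsys_state with a parent link. */
struct css_parent_sketch {
    struct css_parent_sketch *parent;
    int refcnt;
    bool freed;
};

/* A new css starts with one base reference and pins its parent. */
static void css_sketch_init(struct css_parent_sketch *css,
                            struct css_parent_sketch *parent)
{
    css->parent = parent;
    css->refcnt = 1;
    css->freed = false;
    if (parent != NULL)
        parent->refcnt++;
}

/*
 * Drop one reference.  When a css reaches zero it is released and the
 * reference it held on its parent is dropped in turn, so release can
 * only ever propagate upwards, from child to parent.
 */
static void css_sketch_put(struct css_parent_sketch *css)
{
    while (css != NULL && --css->refcnt == 0) {
        struct css_parent_sketch *parent = css->parent;

        css->freed = true;
        css = parent;
    }
}
```

Dropping the parent's own base reference first leaves it alive until the last child is released, and css->parent stays valid for the child's whole lifetime without consulting the cgroup.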
    • T
      cgroup: rename cgroup_subsys_state->dput_work and its callback function · 35ef10da
      Tejun Heo committed
      css (cgroup_subsys_state) will become RCU protected and there will be
      two stages which require punting to work item during release.  To
      prepare for using the work item multiple times, rename
      css->dput_work to css->destroy_work and css_dput_fn() to
      css_free_work_fn() and move work item initialization from css init to
      right before the actual usage.
      
      This reorganization doesn't introduce any behavior change.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      35ef10da