1. 02 Jul 2016: 1 commit
  2. 17 Feb 2016: 1 commit
    • cgroup: introduce cgroup namespaces · a79a908f
      Aditya Kali committed
      Introduce the ability to create new cgroup namespace. The newly created
      cgroup namespace remembers the cgroup of the process at the point
      of creation of the cgroup namespace (referred to as cgroupns-root).
      The main purpose of cgroup namespace is to virtualize the contents
      of /proc/self/cgroup file. Processes inside a cgroup namespace
      are only able to see paths relative to their namespace root
      (unless they are moved outside of their cgroupns-root, at which point
      they will see a relative path from their cgroupns-root).
      For a correctly set-up container this enables container-tools
      (like libcontainer, lxc, lmctfy, etc.) to create completely virtualized
      containers without leaking system level cgroup hierarchy to the task.
      This patch only implements the 'unshare' part of the cgroupns.
      Signed-off-by: Aditya Kali <adityakali@google.com>
      Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
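To make the path virtualization concrete, here is a minimal userspace sketch of how a cgroup path could be rendered relative to a namespace root. `ns_relative_path()` and `split_path()` are hypothetical helpers, not kernel functions; the sketch assumes absolute, already-normalized paths, and the `/..` form used for cgroups outside the root is our assumption about how the relative path is spelled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 32
#define MAX_NAME 256

/* Split an absolute path such as "/a/b" into its components.
 * Returns the number of components, or -1 if too deep or too long. */
static int split_path(const char *path, char comps[][MAX_NAME], int max)
{
	int n = 0;
	const char *p = path;

	while (*p) {
		while (*p == '/')
			p++;
		if (!*p)
			break;
		const char *start = p;
		while (*p && *p != '/')
			p++;
		size_t len = (size_t)(p - start);
		if (n >= max || len >= MAX_NAME)
			return -1;
		memcpy(comps[n], start, len);
		comps[n][len] = '\0';
		n++;
	}
	return n;
}

/* Hypothetical helper: render cgroup path @cg as seen from a namespace
 * rooted at @root.  Descendants lose the root prefix; cgroups outside the
 * root get one "/.." per level to climb.  Returns 0, or -1 on error. */
static int ns_relative_path(const char *root, const char *cg,
			    char *buf, size_t size)
{
	char rc[MAX_DEPTH][MAX_NAME], cc[MAX_DEPTH][MAX_NAME];
	int nr = split_path(root, rc, MAX_DEPTH);
	int nc = split_path(cg, cc, MAX_DEPTH);
	size_t off = 0;
	int common = 0, w;

	if (nr < 0 || nc < 0 || size < 2)
		return -1;

	/* Count the leading components shared by both paths. */
	while (common < nr && common < nc && strcmp(rc[common], cc[common]) == 0)
		common++;

	buf[0] = '\0';
	/* Climb out of the part of the root that @cg does not share... */
	for (int i = common; i < nr; i++) {
		w = snprintf(buf + off, size - off, "/..");
		if (w < 0 || (size_t)w >= size - off)
			return -1;
		off += (size_t)w;
	}
	/* ...then descend into the remainder of @cg. */
	for (int i = common; i < nc; i++) {
		w = snprintf(buf + off, size - off, "/%s", cc[i]);
		if (w < 0 || (size_t)w >= size - off)
			return -1;
		off += (size_t)w;
	}
	/* The namespace root itself is shown as "/". */
	if (off == 0)
		strcpy(buf, "/");
	return 0;
}
```

With a namespace rooted at `/a`, a task in `/a/b/c` would see `/b/c`, and a task sitting exactly at the root would see `/`.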
  3. 09 Dec 2015: 1 commit
    • sock, cgroup: add sock->sk_cgroup · bd1060a1
      Tejun Heo committed
      In cgroup v1, dealing with cgroup membership was difficult because the
      number of membership associations was unbounded.  As a result, cgroup v1
      grew several controllers whose primary purpose is either tagging
      membership or pulling in configuration knobs from other subsystems so
      that cgroup membership tests can be avoided.
      
      net_cls and net_prio controllers are examples of the latter.  They
      allow configuring network-specific attributes from cgroup side so that
      network subsystem can avoid testing cgroup membership; unfortunately,
      these are not only cumbersome but also problematic.
      
      Both net_cls and net_prio aren't properly hierarchical.  Both inherit
      configuration from the parent on creation but there's no interaction
      afterwards.  An ancestor doesn't restrict the behavior in its subtree
      in any way and configuration changes aren't propagated downwards.
      Especially when combined with cgroup delegation, this is problematic
      because delegatees can mess up whatever network configuration is
      implemented at the system level.  net_prio would allow the delegatees
      to set whatever priority value regardless of CAP_NET_ADMIN, and net_cls
      would do the same for classid.
      
      While it is possible to solve these issues from controller side by
      implementing hierarchical allowable ranges in both controllers, it
      would involve quite a bit of complexity in the controllers and further
      obfuscate network configuration as it becomes even more difficult to
      tell what's actually being configured looking from the network side.
      While not much can be done for v1 at this point, as membership
      handling is sane on cgroup v2, it'd be better to make cgroup matching
      behave like other network matches and classifiers than to introduce
      further complications.
      
      In preparation, this patch updates sock->sk_cgrp_data handling so that
      it points to the v2 cgroup that sock was created in until either
      net_prio or net_cls is used.  Once either of the two is used,
      sock->sk_cgrp_data reverts to its previous role of carrying prioidx
      and classid.  This is to avoid adding yet another cgroup related field
      to struct sock.
      
      As the mode switching can happen at most once per boot, the switching
      mechanism is aimed at lowering hot path overhead.  It may leak a
      finite, likely small, number of cgroup refs and report spurious
      prioidx or classid on switching; however, dynamic updates of prioidx
      and classid have always been racy and lossy - socks between creation
      and fd installation are never updated, config changes don't update
      existing sockets at all, and prioidx may index with dead and recycled
      cgroup IDs.  Non-critical inaccuracies from small race windows won't
      make any noticeable difference.
      
      This patch doesn't make use of the pointer yet.  The following patch
      will implement netfilter match for cgroup2 membership.
      
      v2: Use sock_cgroup_data to avoid inflating struct sock w/ another
          cgroup specific field.
      
      v3: Add comments explaining why sock_data_prioidx() and
          sock_data_classid() use different fallback values.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Daniel Wagner <daniel.wagner@bmw-carit.de>
      CC: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
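The dual role of `sock->sk_cgrp_data` can be pictured as a single tagged word. The sketch below is a simplified model, not the kernel's `struct sock_cgroup_data`: `struct skcd_sketch` and its helpers are hypothetical, the bit layout is our assumption, and the fallback values (1 for prioidx, 0 for classid) are assumed to reflect the v3 note that the two accessors use different fallbacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cgroup; only its address matters. */
struct cgroup_stub {
	int id;
};

/* One 64-bit word holding either a cgroup pointer (low bit clear)
 * or packed net_prio/net_cls data (low bit set). */
struct skcd_sketch {
	uint64_t val;
};

static bool skcd_is_data(const struct skcd_sketch *d)
{
	return d->val & 1;
}

/* Pointer mode: alignment guarantees the low bit is free for the tag. */
static void skcd_set_cgroup(struct skcd_sketch *d, struct cgroup_stub *cg)
{
	uintptr_t p = (uintptr_t)cg;

	assert((p & 1) == 0);
	d->val = (uint64_t)p;
}

/* Data mode: tag in bit 0, prioidx in bits 16..31, classid in bits 32..63. */
static void skcd_set_data(struct skcd_sketch *d, uint16_t prioidx,
			  uint32_t classid)
{
	d->val = ((uint64_t)classid << 32) | ((uint64_t)prioidx << 16) | 1;
}

/* The cgroup pointer is only valid while no net_prio/net_cls data is set. */
static struct cgroup_stub *skcd_cgroup(const struct skcd_sketch *d)
{
	return skcd_is_data(d) ? NULL : (struct cgroup_stub *)(uintptr_t)d->val;
}

/* Fallbacks differ on purpose: 1 for prioidx, 0 for classid (assumed). */
static uint16_t skcd_prioidx(const struct skcd_sketch *d)
{
	return skcd_is_data(d) ? (uint16_t)(d->val >> 16) : 1;
}

static uint32_t skcd_classid(const struct skcd_sketch *d)
{
	return skcd_is_data(d) ? (uint32_t)(d->val >> 32) : 0;
}
```

Because aligned pointers always have a clear low bit, one tag bit is enough to distinguish the two modes without adding another field to the socket.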
  4. 03 Dec 2015: 2 commits
    • cgroup: kill cgrp_ss_priv[CGROUP_CANFORK_COUNT] and friends · b53202e6
      Oleg Nesterov committed
      Now that nobody uses the "priv" arg passed to can_fork/cancel_fork/fork, we can
      kill CGROUP_CANFORK_COUNT/SUBSYS_TAG/etc and cgrp_ss_priv[] in copy_process().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: fix handling of multi-destination migration from subtree_control enabling · 1f7dd3e5
      Tejun Heo committed
      Consider the following v2 hierarchy.
      
        P0 (+memory) --- P1 (-memory) --- A
                                       \- B
             
      P0 has memory enabled in its subtree_control while P1 doesn't.  If
      both A and B contain processes, they would belong to the memory css of
      P1.  Now if memory is enabled on P1's subtree_control, memory csses
      should be created on both A and B and A's processes should be moved to
      the former and B's processes the latter.  IOW, enabling controllers
      can cause atomic migrations into different csses.
      
      The core cgroup migration logic has been updated accordingly but the
      controller migration methods haven't and still assume that all tasks
      migrate to a single target css; furthermore, the methods were fed the
      css in which subtree_control was updated, which is the parent of the
      target csses.  The pids controller depends on the migration methods to
      move charges, and this made the controller attribute charges to the
      wrong csses, often triggering the following warning by driving a
      counter negative.
      
       WARNING: CPU: 1 PID: 1 at kernel/cgroup_pids.c:97 pids_cancel.constprop.6+0x31/0x40()
       Modules linked in:
       CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #29
       ...
        ffffffff81f65382 ffff88007c043b90 ffffffff81551ffc 0000000000000000
        ffff88007c043bc8 ffffffff810de202 ffff88007a752000 ffff88007a29ab00
        ffff88007c043c80 ffff88007a1d8400 0000000000000001 ffff88007c043bd8
       Call Trace:
        [<ffffffff81551ffc>] dump_stack+0x4e/0x82
        [<ffffffff810de202>] warn_slowpath_common+0x82/0xc0
        [<ffffffff810de2fa>] warn_slowpath_null+0x1a/0x20
        [<ffffffff8118e031>] pids_cancel.constprop.6+0x31/0x40
        [<ffffffff8118e0fd>] pids_can_attach+0x6d/0xf0
        [<ffffffff81188a4c>] cgroup_taskset_migrate+0x6c/0x330
        [<ffffffff81188e05>] cgroup_migrate+0xf5/0x190
        [<ffffffff81189016>] cgroup_attach_task+0x176/0x200
        [<ffffffff8118949d>] __cgroup_procs_write+0x2ad/0x460
        [<ffffffff81189684>] cgroup_procs_write+0x14/0x20
        [<ffffffff811854e5>] cgroup_file_write+0x35/0x1c0
        [<ffffffff812e26f1>] kernfs_fop_write+0x141/0x190
        [<ffffffff81265f88>] __vfs_write+0x28/0xe0
        [<ffffffff812666fc>] vfs_write+0xac/0x1a0
        [<ffffffff81267019>] SyS_write+0x49/0xb0
        [<ffffffff81bcef32>] entry_SYSCALL_64_fastpath+0x12/0x76
      
      This patch fixes the bug by removing the @css parameter from the three
      migration methods, ->can_attach(), ->cancel_attach() and ->attach(), and
      updating the cgroup_taskset iteration helpers to also return the destination
      css in addition to the task being migrated.  All controllers are
      updated accordingly.
      
      * Controllers which don't care whether there are one or multiple
        target csses can be converted trivially.  cpu, io, freezer, perf,
        netclassid and netprio fall in this category.
      
      * cpuset's current implementation assumes that there's a single source
        and destination and thus doesn't support v2 hierarchy already.  The
        only change made by this patchset is how that single destination css
        is obtained.
      
      * memory migration path already doesn't do anything on v2.  How the
        single destination css is obtained is updated and the prep stage of
        mem_cgroup_can_attach() is reordered to accommodate the change.
      
      * pids is the only controller which was affected by this bug.  It now
        correctly handles multi-destination migrations and no longer causes
        counter underflow from incorrect accounting.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-and-tested-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Cc: Aleksa Sarai <cyphar@cyphar.com>
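A stripped-down model of the fix: each taskset entry carries its own destination css, and charges move per task instead of toward one css for the whole set. `struct pids_css`, `struct taskset_entry` and `pids_attach_sketch()` are hypothetical; the model keeps one flat counter per css and ignores the hierarchical charge propagation the real pids controller performs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified pids css: a single flat counter. */
struct pids_css {
	long counter;
};

/* One migrating task paired with its own source and destination css. */
struct taskset_entry {
	int pid;
	struct pids_css *src;
	struct pids_css *dst;
};

/* Move one charge per task from its source css to its own destination
 * css, rather than assuming a single target css for the whole set. */
static void pids_attach_sketch(struct taskset_entry *ts, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		ts[i].dst->counter++;
		ts[i].src->counter--;
		/* Mis-attributed charges would surface here as an underflow. */
		assert(ts[i].src->counter >= 0);
	}
}
```

For the P1 example in the commit, one task moving to A and another to B leaves P1 at zero and A and B at one charge each.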
  5. 21 Nov 2015: 2 commits
    • cgroup: implement cgroup_get_from_path() and expose cgroup_put() · 16af4396
      Tejun Heo committed
      Implement cgroup_get_from_path() using kernfs_walk_and_get() which
      obtains a default hierarchy cgroup from its path.  This will be used
      to allow cgroup path based matching from outside cgroup proper -
      e.g. networking and perf.
      
      v2: Add EXPORT_SYMBOL_GPL(cgroup_get_from_path).
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: record ancestor IDs and reimplement cgroup_is_descendant() using it · b11cfb58
      Tejun Heo committed
      cgroup_is_descendant() currently walks up the hierarchy and compares
      each ancestor to the cgroup in question.  While enough for cgroup core
      usages, this can't be used in hot paths to test cgroup membership.
      This patch adds cgroup->ancestor_ids[] which records the IDs of all
      ancestors including self and cgroup->level for the nesting level.
      
      This allows testing whether a given cgroup is a descendant of another
      in three finite steps - testing whether the two belong to the same
      hierarchy, whether the descendant candidate is at the same or a higher
      level than the ancestor and comparing the recorded ancestor_id at the
      matching level.  cgroup_is_descendant() is accordingly reimplemented
      and made inline.
      Signed-off-by: Tejun Heo <tj@kernel.org>
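The three-step descendant test can be sketched as follows. `struct cgroup_sketch`, `cgroup_sketch_init()` and `is_descendant()` are hypothetical names, and the fixed `MAX_LEVEL` bound is an assumption made to keep the array static; the point is that the check costs the same regardless of how deep the cgroups are nested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LEVEL 8 /* hypothetical depth limit for this sketch */

struct root_stub {
	int unused;
};

struct cgroup_sketch {
	struct root_stub *root;          /* hierarchy this cgroup belongs to */
	int id;
	int level;                       /* 0 for the root cgroup */
	int ancestor_ids[MAX_LEVEL + 1]; /* [0] = root's id ... [level] = own id */
};

/* Initialize @cg as a child of @parent, or as a root if @parent is NULL. */
static void cgroup_sketch_init(struct cgroup_sketch *cg,
			       const struct cgroup_sketch *parent,
			       struct root_stub *root, int id)
{
	cg->root = root;
	cg->id = id;
	cg->level = parent ? parent->level + 1 : 0;
	assert(cg->level <= MAX_LEVEL);
	/* Inherit the parent's ancestor chain, then append our own id. */
	for (int i = 0; i < cg->level; i++)
		cg->ancestor_ids[i] = parent->ancestor_ids[i];
	cg->ancestor_ids[cg->level] = id;
}

/* Same hierarchy, deep enough, and matching id at the ancestor's level. */
static bool is_descendant(const struct cgroup_sketch *cg,
			  const struct cgroup_sketch *ancestor)
{
	if (cg->root != ancestor->root)
		return false;
	if (cg->level < ancestor->level)
		return false;
	return cg->ancestor_ids[ancestor->level] == ancestor->id;
}
```

As in the commit's description, a cgroup counts as its own descendant, since its own id is recorded at its own level.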
  6. 16 Nov 2015: 1 commit
    • cgroup: fix cftype->file_offset handling · 34c06254
      Tejun Heo committed
      6f60eade ("cgroup: generalize obtaining the handles of and
      notifying cgroup files") introduced cftype->file_offset so that the
      handles for per-css file instances can be recorded.  These handles
      then can be used, for example, to generate file modified
      notifications.
      
      Unfortunately, it made the wrong assumption that files are created
      once for a given css and removed on its destruction.  Due to the
      dependencies among subsystems, a css may be hidden from userland and
      then later shown again.  This is implemented by removing and
      re-creating the affected files, so the associated kernfs_node for a
      given cgroup file may change over time.  This incorrect assumption led
      to the corruption of css->files lists.
      
      Reimplement cftype->file_offset handling so that cgroup_file->kn is
      protected by a lock and updated as files are created and destroyed.
      This also makes keeping them on a per-cgroup list unnecessary.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: James Sedgwick <jsedgwick@fb.com>
      Fixes: 6f60eade ("cgroup: generalize obtaining the handles of and notifying cgroup files")
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Zefan Li <lizefan@huawei.com>
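A rough model of the reworked `file_offset` handling: the core locates the `cgroup_file` embedded in a css via the recorded offset and updates its `kn` under a lock every time the file is created or removed, so a notifier never sees a stale node. All names here are hypothetical stand-ins, and a C11 `atomic_flag` spinlock substitutes for whatever lock the kernel actually uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernfs_node, cgroup_file and a controller css. */
struct kn_stub {
	int notified;
};

struct cgroup_file_sketch {
	struct kn_stub *kn; /* current file instance, NULL while hidden */
};

struct my_css {
	int some_state;
	struct cgroup_file_sketch events_file;
};

struct cftype_sketch {
	size_t file_offset; /* where the css embeds its cgroup_file */
};

static atomic_flag file_lock = ATOMIC_FLAG_INIT;

static void sketch_lock(void)
{
	while (atomic_flag_test_and_set(&file_lock)) {
		/* spin */
	}
}

static void sketch_unlock(void)
{
	atomic_flag_clear(&file_lock);
}

static struct cgroup_file_sketch *cfile_of(void *css,
					    const struct cftype_sketch *cft)
{
	return (struct cgroup_file_sketch *)((char *)css + cft->file_offset);
}

/* Called each time the file is (re)created, e.g. when a css is shown again. */
static void file_created(void *css, const struct cftype_sketch *cft,
			 struct kn_stub *kn)
{
	sketch_lock();
	cfile_of(css, cft)->kn = kn;
	sketch_unlock();
}

/* Called each time the file is removed, e.g. when a css is hidden. */
static void file_removed(void *css, const struct cftype_sketch *cft)
{
	sketch_lock();
	cfile_of(css, cft)->kn = NULL;
	sketch_unlock();
}

/* Notify only if the file currently exists; returns 1 if notified. */
static int file_notify(struct cgroup_file_sketch *cfile)
{
	int done = 0;

	sketch_lock();
	if (cfile->kn) {
		cfile->kn->notified++;
		done = 1;
	}
	sketch_unlock();
	return done;
}
```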
  7. 16 Oct 2015: 4 commits
    • cgroup: keep zombies associated with their original cgroups · 2e91fa7f
      Tejun Heo committed
      cgroup_exit() is called when a task exits and disassociates the
      exiting task from its cgroups and half-attaches it to the root cgroup.
      This is unnecessary and undesirable.
      
      No controller actually needs an exiting task to be disassociated from
      non-root cgroups.  Both cpu and perf_event controllers update the
      association to the root cgroup from their exit callbacks just to keep
      consistent with the cgroup core behavior.
      
      Also, this disassociation makes it difficult to track resources held
      by zombies or determine where the zombies came from.  Currently, pids
      controller is completely broken as it uncharges on exit and zombies
      always escape the resource restriction.  With cgroup association being
      reset on exit, fixing it is pretty painful.
      
      There's no reason to reset cgroup membership on exit.  The zombie can
      be removed from its css_set so that it doesn't show up on
      "cgroup.procs" and thus can't be migrated or interfere with cgroup
      removal.  It can still pin and point to the css_set so that its cgroup
      membership is maintained.  This patch makes cgroup core keep zombies
      associated with their cgroups at the time of exit.
      
      * Previous patches decoupled populated_cnt tracking from css_set
        lifetime, so a dying task can be simply unlinked from its css_set
        while pinning and pointing to the css_set.  This keeps css_set
        association from task side alive while hiding it from "cgroup.procs"
        and populated_cnt tracking.  The css_set reference is dropped when
        the task_struct is freed.
      
      * ->exit() callback no longer needs the css arguments as the
        associated css never changes once PF_EXITING is set.  Removed.
      
      * cpu and perf_events controllers no longer need ->exit() callbacks.
        There's no reason to explicitly switch away on exit.  The final
        schedule out is enough.  The callbacks are removed.
      
      * On traditional hierarchies, nothing changes.  "/proc/PID/cgroup"
        still reports "/" for all zombies.  On the default hierarchy,
        "/proc/PID/cgroup" keeps reporting the cgroup that the task belonged
        to at the time of exit.  If the cgroup gets removed before the task
        is reaped, " (deleted)" is appended.
      
      v2: Build breakage due to missing dummy cgroup_free() when
          !CONFIG_CGROUP fixed.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
    • cgroup: make css_set_rwsem a spinlock and rename it to css_set_lock · f0d9a5f1
      Tejun Heo committed
      css_set_rwsem is the inner lock protecting css_sets and is accessed
      from hot paths such as fork and exit.  Internally, it has no reason to
      be a rwsem or even mutex.  There are no internal blocking operations
      while holding it.  This was rwsem because css task iteration used to
      expose it to external iterator users.  As the previous patch updated
      css task iteration such that the locking is not leaked to its users,
      there's no reason to keep it a rwsem.
      
      This patch converts css_set_rwsem to a spinlock and renames it to
      css_set_lock.  It uses bh-safe operations as a planned usage needs to
      access it from RCU callback context.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: don't hold css_set_rwsem across css task iteration · ed27b9f7
      Tejun Heo committed
      css_sets are synchronized through css_set_rwsem but the locking scheme
      is kinda bizarre.  The hot paths - fork and exit - have to write lock
      the rwsem making the rw part pointless; furthermore, many readers
      already hold cgroup_mutex.
      
      One of the readers is css task iteration.  It read locks the rwsem
      over the entire duration of iteration.  This leads to silly locking
      behavior.  When cpuset tries to migrate processes of a cgroup to a
      different NUMA node, css_set_rwsem is held across the entire migration
      attempt which can take a long time locking out forking, exiting and
      other cgroup operations.
      
      This patch updates css task iteration so that it locks css_set_rwsem
      only while the iterator is being advanced.  css task iteration
      involves two levels - css_set and task iteration.  As css_sets in use
      are practically immutable, simply pinning the current one is enough
      for resuming iteration afterwards.  Task iteration is tricky as tasks
      may leave their css_set while iteration is in progress.  This is
      solved by keeping track of active iterators and advancing them if
      their next task leaves its css_set.
      
      v2: put_task_struct() in css_task_iter_next() moved outside
          css_set_rwsem.  A later patch will add cgroup operations to
          task_struct free path which may grab the same lock and this avoids
          deadlock possibilities.
      
          css_set_move_task() updated to use list_for_each_entry_safe() when
          walking task_iters and advancing them.  This is necessary as
          advancing an iter may remove it from the list.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: replace cgroup_has_tasks() with cgroup_is_populated() · 27bd4dbb
      Tejun Heo committed
      Currently, cgroup_has_tasks() tests whether the target cgroup has any
      css_set linked to it.  This works because a css_set's refcnt converges
      with the number of tasks linked to it and thus there's no css_set
      linked to a cgroup if it doesn't have any live tasks.
      
      To help tracking resource usage of zombie tasks, putting the ref of
      css_set will be separated from disassociating the task from the
      css_set which means that a cgroup may have css_sets linked to it even
      when it doesn't have any live tasks.
      
      This patch replaces cgroup_has_tasks() with cgroup_is_populated()
      which instead tests cgroup->nr_populated, a local count of the
      number of populated css_sets.  Unlike cgroup_has_tasks(),
      cgroup_is_populated() is recursive - if any of the descendants is
      populated, the cgroup is populated too.  While this changes the
      meaning of the test, all the existing users are okay with the change.
      
      While at it, replace the open-coded ->populated_cnt test in
      cgroup_events_show() with cgroup_is_populated().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
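The recursive meaning of "populated" can be maintained with one counter per cgroup that only propagates upward on 0-to-1 and 1-to-0 transitions. This sketch assumes the counter covers both populated css_sets linked to the cgroup and populated children; `struct cg_sketch` and `cg_update_populated()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct cg_sketch {
	struct cg_sketch *parent;
	/* populated css_sets linked here plus populated children */
	int populated_cnt;
};

static bool cg_is_populated(const struct cg_sketch *cg)
{
	return cg->populated_cnt > 0;
}

/* Record that one source of population (a css_set gaining its first task,
 * or a child becoming populated) appeared or went away, and walk upward
 * only while the change flips a cgroup's populated state. */
static void cg_update_populated(struct cg_sketch *cg, bool populated)
{
	while (cg) {
		bool flipped;

		if (populated)
			flipped = cg->populated_cnt++ == 0;
		else
			flipped = --cg->populated_cnt == 0;
		assert(cg->populated_cnt >= 0);
		if (!flipped)
			break;
		cg = cg->parent;
	}
}
```

Stopping at the first non-flipping level keeps the common case cheap: a second populated css_set in an already populated cgroup touches only that cgroup.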
  8. 23 Sep 2015: 1 commit
    • cgroup, memcg, cpuset: implement cgroup_taskset_for_each_leader() · 4530eddb
      Tejun Heo committed
      It wasn't explicitly documented but, when a process is being migrated,
      cpuset and memcg depend on cgroup_taskset_first() returning the
      threadgroup leader; however, this approach is somewhat hacky and
      would no longer work for the planned multi-process migration.
      
      This patch introduces explicit cgroup_taskset_for_each_leader() which
      iterates over only the threadgroup leaders and replaces
      cgroup_taskset_first() usages for accessing the leader with it.
      
      This prepares both memcg and cpuset for multi-process migration.  This
      patch also updates the documentation for cgroup_taskset_for_each() to
      clarify the iteration rules and removes comments mentioning task
      ordering in tasksets.
      
      v2: A previous patch which added threadgroup leader test was dropped.
          Patch updated accordingly.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Zefan Li <lizefan@huawei.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
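A minimal sketch of leader-only iteration over a taskset in which the leaders are not necessarily first. The task layout (`pid == tgid` marks a leader) and the `taskset_for_each_leader()` macro are illustrative assumptions, not the kernel's `cgroup_taskset_for_each_leader()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task record: a thread is its group's leader when pid == tgid. */
struct task_sketch {
	int pid;
	int tgid;
};

struct taskset_sketch {
	struct task_sketch *tasks;
	size_t nr;
};

/* Return the index of the next threadgroup leader at or after @i, or ts->nr. */
static size_t next_leader(const struct taskset_sketch *ts, size_t i)
{
	while (i < ts->nr && ts->tasks[i].pid != ts->tasks[i].tgid)
		i++;
	return i;
}

/* Visit only threadgroup leaders, wherever they sit in the set. */
#define taskset_for_each_leader(task, i, ts)                         \
	for ((i) = next_leader((ts), 0);                             \
	     (i) < (ts)->nr && ((task) = &(ts)->tasks[(i)], 1);      \
	     (i) = next_leader((ts), (i) + 1))
```

Because a leader may appear after its threads, taking the first entry of the set is not a reliable way to find it once several processes migrate together.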
  9. 19 Sep 2015: 1 commit
    • cgroup: generalize obtaining the handles of and notifying cgroup files · 6f60eade
      Tejun Heo committed
      cgroup core handles creations and removals of cgroup interface files
      as described by cftypes.  There are cases where the handle for a given
      file instance is necessary, for example, to generate a file modified
      event.  Currently, this is handled by explicitly matching the callback
      method pointer and storing the file handle manually in
      cgroup_add_file().  While this simple approach works for cgroup core
      files, it can't for controller interface files.
      
      This patch generalizes cgroup interface file handle handling.  struct
      cgroup_file is defined and each cftype can optionally tell cgroup core
      to store the file handle by setting ->file_offset.  A file handle
      remains accessible as long as the containing css is accessible.
      
      Both "cgroup.procs" and "cgroup.events" are converted to use the new
      generic mechanism instead of hooking directly into cgroup_add_file().
      Also, cgroup_file_notify() which takes a struct cgroup_file and
      generates a file modified event on it is added and replaces explicit
      kernfs_notify() invocations.
      
      This generalizes cgroup file handle handling and allows controllers to
      generate file modified notifications.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
  10. 18 Sep 2015: 2 commits
  11. 05 Aug 2015: 1 commit
    • cgroup: define controller file conventions · 6abc8ca1
      Tejun Heo committed
      Traditionally, each cgroup controller implemented whatever interface
      it wanted leading to interfaces which are widely inconsistent.
      Examining the requirements of the controllers readily shows that there
      are only a few control schemes shared among all.
      
      Two major controllers already had to implement a new interface for the
      unified hierarchy due to significant structural changes.  Let's take
      the chance to establish common conventions throughout all controllers.
      
      This patch defines CGROUP_WEIGHT_MIN/DFL/MAX to be used on all weight
      based control knobs and documents the conventions that controllers
      should follow on the unified hierarchy.  Except for io.weight knob,
      all existing unified hierarchy knobs are already compliant.  A
      follow-up patch will update io.weight.
      
      v2: Added descriptions of min, low and high knobs.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
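For illustration, a weight knob following this convention could validate writes against the shared bounds. The constant values below (1, 100 and 10000) are what we believe the kernel defines, but treat them as assumptions; `parse_weight()` and its choice of error codes are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Assumed values of the constants named in the commit. */
#define CGROUP_WEIGHT_MIN 1
#define CGROUP_WEIGHT_DFL 100
#define CGROUP_WEIGHT_MAX 10000

/* Hypothetical parser for a weight knob write such as "250\n": accepts a
 * decimal integer in [MIN, MAX], with an optional trailing newline. */
static int parse_weight(const char *buf, unsigned long *weight)
{
	char *end;
	unsigned long v;

	errno = 0;
	v = strtoul(buf, &end, 10);
	if (end == buf || errno)
		return -EINVAL;
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;
	if (v < CGROUP_WEIGHT_MIN || v > CGROUP_WEIGHT_MAX)
		return -ERANGE;
	*weight = v;
	return 0;
}
```

Sharing one range and one default across controllers means a value such as 100 carries the same meaning whichever weight-based knob it is written to.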
  12. 15 Jul 2015: 1 commit
    • cgroup: allow a cgroup subsystem to reject a fork · 7e47682e
      Aleksa Sarai committed
      Add a new cgroup subsystem callback can_fork that conditionally
      states whether or not the fork is accepted or rejected by a cgroup
      policy. In addition, add a cancel_fork callback so that if an error
      occurs later in the forking process, any state modified by can_fork can
      be reverted.
      
      Allow for a private opaque pointer to be passed from cgroup_can_fork to
      cgroup_post_fork, allowing for the fork state to be stored by each
      subsystem separately.
      
      Also add a tagging system for cgroup_subsys.h to allow for CGROUP_<TAG>
      enumerations to be defined and used. In addition, explicitly add a
      CGROUP_CANFORK_COUNT macro to make arrays easier to define.
      
      This is in preparation for implementing the pids cgroup subsystem.
      Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
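The accept-or-revert flow of the two callbacks can be sketched like this: ask each subsystem in turn and, if one refuses, call `cancel_fork` on the subsystems before it in reverse order. The private opaque pointer is deliberately left out (commit b53202e6 above later removed it); `struct subsys_sketch`, the limit-based hooks and `cgroup_can_fork_sketch()` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical subsystem descriptor carrying the two fork hooks. */
struct subsys_sketch {
	int (*can_fork)(struct subsys_sketch *ss);
	void (*cancel_fork)(struct subsys_sketch *ss);
	int charged; /* state modified by can_fork */
	int limit;   /* refuse forks once charged reaches this */
};

/* Example hooks in the spirit of a pids-style limit. */
static int limited_can_fork(struct subsys_sketch *ss)
{
	if (ss->charged >= ss->limit)
		return -EAGAIN;
	ss->charged++;
	return 0;
}

static void limited_cancel_fork(struct subsys_sketch *ss)
{
	ss->charged--;
}

/* Ask every subsystem; on the first refusal, let the earlier ones undo
 * their state in reverse order and report the refusal. */
static int cgroup_can_fork_sketch(struct subsys_sketch **ss, size_t n)
{
	size_t i;
	int ret = 0;

	for (i = 0; i < n; i++) {
		if (!ss[i]->can_fork)
			continue;
		ret = ss[i]->can_fork(ss[i]);
		if (ret)
			goto out_revert;
	}
	return 0;

out_revert:
	while (i-- > 0) {
		if (ss[i]->cancel_fork)
			ss[i]->cancel_fork(ss[i]);
	}
	return ret;
}
```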
  13. 02 Jun 2015: 1 commit
    • cgroup, block: implement task_get_css() and use it in bio_associate_current() · ec438699
      Tejun Heo committed
      bio_associate_current() currently open codes task_css() and
      css_tryget_online() to find and pin $current's blkcg css.  Abstract it
      into task_get_css() which is implemented from cgroup side.  As a task
      is always associated with an online css for every subsystem except
      while the css_set update is propagating, task_get_css() retries till
      css_tryget_online() succeeds.
      
      This is a cleanup and shouldn't lead to noticeable behavior changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
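The retry loop can be modeled with a "get unless zero" reference count. In this hypothetical sketch a css whose count has dropped to zero stands in for an offline css, the RCU protection the kernel needs around the pointer read is omitted, and `task_get_css_sketch()` only imitates the shape of `task_get_css()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical css: a count of zero means it is going away. */
struct css_sketch {
	atomic_int refcnt;
};

/* Take a reference unless the count has already dropped to zero. */
static bool css_tryget_sketch(struct css_sketch *css)
{
	int old = atomic_load(&css->refcnt);

	while (old > 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&css->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

struct task_stub {
	_Atomic(struct css_sketch *) css; /* may be switched concurrently */
};

/* Keep retrying until we pin whichever css the task currently points to;
 * a failed tryget means the task's pointer is about to be updated. */
static struct css_sketch *task_get_css_sketch(struct task_stub *t)
{
	for (;;) {
		struct css_sketch *css = atomic_load(&t->css);

		if (css_tryget_sketch(css))
			return css;
	}
}
```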
  14. 19 May 2015: 2 commits
    • cgroup: reorganize include/linux/cgroup.h · c326aa2b
      Tejun Heo committed
      From c4d440938b5e2015c70594fe6666a099c844f929 Mon Sep 17 00:00:00 2001
      From: Tejun Heo <tj@kernel.org>
      Date: Wed, 13 May 2015 16:21:40 -0400
      
      Over time, cgroup.h grew organically and doesn't have much logical
      structure at this point.  Separation of cgroup-defs.h in the previous
      patch gives us a good chance for reorganizing cgroup.h as changes to
      the header are likely to cause conflicts anyway.
      
      This patch reorganizes cgroup.h so that it has consistent logical
      grouping.
      
      This is pure reorganization.
      
      v2: Relocating #ifdef CONFIG_CGROUPS caused build failure when cgroup
          is disabled.  Dropped.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: separate out include/linux/cgroup-defs.h · b4a04ab7
      Tejun Heo committed
      From 2d728f74bfc071df06773e2fd7577dd5dab6425d Mon Sep 17 00:00:00 2001
      From: Tejun Heo <tj@kernel.org>
      Date: Wed, 13 May 2015 15:37:01 -0400
      
      This patch separates out cgroup-defs.h from cgroup.h which has grown a
      lot of dependencies.  cgroup-defs.h currently only contains constant
      and type definitions and can be used to break circular include
      dependency.  While moving, definitions are reordered so that
      cgroup-defs.h has consistent logical structure.
      
      This patch is pure reorganization.
      Signed-off-by: Tejun Heo <tj@kernel.org>
  15. 07 Jan 2015: 1 commit
  16. 11 Dec 2014: 1 commit
  17. 20 Nov 2014: 1 commit
  18. 18 Nov 2014: 3 commits
  19. 19 Sep 2014: 4 commits
  20. 18 Sep 2014: 1 commit
  21. 15 Jul 2014: 4 commits
    • cgroup: make CFTYPE_ONLY_ON_DFL and CFTYPE_NO_ internal to cgroup core · 05ebb6e6
      Tejun Heo committed
      cgroup now distinguishes cftypes for the default and legacy
      hierarchies more explicitly by using separate arrays and
      CFTYPE_ONLY_ON_DFL and CFTYPE_INSANE should be and are used only
      inside cgroup core proper.  Let's make it clear that the flags are
      internal by prefixing them with double underscores.
      
      CFTYPE_INSANE is renamed to __CFTYPE_NOT_ON_DFL for consistency.  The
      two flags are also collected and assigned bits >= 16 so that they
      aren't mixed with the published flags.
      
      v2: Convert the extra ones in cgroup_exit_cftypes() which are added by
          revision to the previous patch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: distinguish the default and legacy hierarchies when handling cftypes · a8ddc821
      Tejun Heo committed
      Until now, cftype arrays carried files for both the default and legacy
      hierarchies and the files which needed to be used on only one of them
      were flagged with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE.  This
      gets confusing very quickly and we may end up exposing interface files
      to the default hierarchy without thinking it through.
      
      This patch makes cgroup core provide separate sets of interfaces for
      cftype handling so that the cftypes for the default and legacy
      hierarchies are clearly distinguished.  The previous two patches
      renamed the existing ones so that they clearly indicate that they're
      for the legacy hierarchies.  This patch adds the interfaces for the
      default hierarchy and applies them selectively depending on the
      hierarchy type.
      
      * cftypes added through cgroup_subsys->dfl_cftypes and
        cgroup_add_dfl_cftypes() only show up on the default hierarchy.
      
      * cftypes added through cgroup_subsys->legacy_cftypes and
        cgroup_add_legacy_cftypes() only show up on the legacy hierarchies.
      
      * cgroup_subsys->dfl_cftypes and ->legacy_cftypes can point to the
        same array for the cases where the interface files are identical on
        both types of hierarchies.
      
      * This makes all the existing subsystem interface files legacy-only by
        default and all subsystems will have no interface file created when
        enabled on the default hierarchy.  Each subsystem should explicitly
        review and compose the interface for the default hierarchy.
      
      * A boot param "cgroup__DEVEL__legacy_files_on_dfl" is added which
        makes subsystems which haven't decided on the interface files for the
        default hierarchy present the legacy files on the default
        hierarchy so that its behavior on the default hierarchy can be
        tested.  As the awkward name suggests, this is for development only.
      
      * memcg's CFTYPE_INSANE on "use_hierarchy" is a noop now as the whole
        array isn't used on the default hierarchy.  The flag is removed.
      
      v2: Updated documentation for cgroup__DEVEL__legacy_files_on_dfl.
      
      v3: Clear CFTYPE_ONLY_ON_DFL and CFTYPE_INSANE when cfts are removed
          as suggested by Li.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Aristeu Rozanski <aris@redhat.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    • cgroup: replace cgroup_add_cftypes() with cgroup_add_legacy_cftypes() · 2cf669a5
      Tejun Heo committed
      Currently, cftypes added by cgroup_add_cftypes() are used for both the
      unified default hierarchy and legacy ones and subsystems can mark each
      file with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE if it has to
      appear only on one of them.  This is quite hairy and error-prone.
      Also, we may end up exposing interface files to the default hierarchy
      without thinking it through.
      
      cgroup_subsys will grow two separate cftype addition functions and
      apply each only on the hierarchies of the matching type.  This will
      allow organizing cftypes in a much clearer way and encourage subsystems
      to scrutinize the interface which is being exposed in the new default
      hierarchy.
      
      In preparation, this patch adds cgroup_add_legacy_cftypes() which
      currently is a simple wrapper around cgroup_add_cftypes() and replaces
      all cgroup_add_cftypes() usages with it.
      
      While at it, this patch drops a completely spurious return from
      __hugetlb_cgroup_file_init().
      
      This patch doesn't introduce any functional differences.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    • cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes · 5577964e
      Tejun Heo committed
      Currently, cgroup_subsys->base_cftypes is used for both the unified
      default hierarchy and legacy ones, and subsystems can mark each file
      with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE if it has to appear
      only on one of them.  This is quite hairy and error-prone.  Also, we
      may end up exposing interface files to the default hierarchy without
      thinking it through.
      
      cgroup_subsys will grow two separate cftype arrays and apply each only
      to the hierarchies of the matching type.  This will allow organizing
      cftypes in a much clearer way and encourage subsystems to scrutinize
      the interface which is being exposed in the new default hierarchy.
      
      In preparation, this patch renames cgroup_subsys->base_cftypes to
      cgroup_subsys->legacy_cftypes.  This patch is a pure rename.
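      The planned split into two arrays can be sketched in C as below.  This
      is a hypothetical model, not the kernel's code: the dfl_cftypes field,
      files_for() and count_files() are illustrative names, and the
      hierarchy type is passed in as a plain bool.  With one array per
      hierarchy type, choosing which files to create becomes a single
      branch, with no per-file CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in; a NULL name terminates an array. */
struct cftype {
	const char *name;
};

/* Hypothetical model: one cftype array per hierarchy type. */
struct cgroup_subsys {
	const struct cftype *dfl_cftypes;     /* default hierarchy only */
	const struct cftype *legacy_cftypes;  /* legacy hierarchies only */
};

/* Pick the interface files for a hierarchy of the given type. */
static const struct cftype *files_for(const struct cgroup_subsys *ss,
				      bool on_dfl)
{
	return on_dfl ? ss->dfl_cftypes : ss->legacy_cftypes;
}

/* Count entries up to the NULL-name terminator. */
static size_t count_files(const struct cftype *cfts)
{
	size_t n = 0;

	while (cfts && cfts[n].name)
		n++;
	return n;
}
```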
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Aristeu Rozanski <aris@redhat.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
  22. 09 Jul, 2014 4 commits
    • cgroup: remove sane_behavior support on non-default hierarchies · aa6ec29b
      Tejun Heo committed
      sane_behavior has been used as a development vehicle for the default
      unified hierarchy.  Now that the default hierarchy is in place, the
      flag has become redundant and confusing, as its usage is allowed on all
      hierarchies.  There will be either the default hierarchy or legacy
      ones.  Let's make that clear by removing sane_behavior support on
      non-default hierarchies.
      
      This patch replaces cgroup_sane_behavior() with cgroup_on_dfl().  The
      comment on top of CGRP_ROOT_SANE_BEHAVIOR is moved to the top of
      cgroup_on_dfl(), with the sane_behavior-specific part dropped.
      
      On the default hierarchy and on legacy hierarchies without
      sane_behavior, this shouldn't cause any behavior differences.
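      A minimal C sketch of the idea behind cgroup_on_dfl(), assuming that
      hierarchy identity is decided by comparing a cgroup's root against a
      single default root object (cgrp_dfl_root and both structs are
      simplified stand-ins here).  The check asks which hierarchy a cgroup
      belongs to, rather than whether a mount option was passed.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel structures. */
struct cgroup_root {
	int hierarchy_id;
};

struct cgroup {
	struct cgroup_root *root;  /* hierarchy this cgroup belongs to */
};

/* Model of the single default (unified) hierarchy root. */
static struct cgroup_root cgrp_dfl_root = { 0 };

/* True iff the cgroup lives on the default hierarchy. */
static bool cgroup_on_dfl(const struct cgroup *cgrp)
{
	return cgrp->root == &cgrp_dfl_root;
}
```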
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
    • cgroup: remove CGRP_ROOT_OPTION_MASK · 7450e90b
      Tejun Heo committed
      cgroup_root->flags only contains CGRP_ROOT_* flags and there's no
      reason to mask the flags.  Remove CGRP_ROOT_OPTION_MASK.
      
      This doesn't cause any behavior differences.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: implement cgroup_subsys->depends_on · af0ba678
      Tejun Heo committed
      Currently, the blkio subsystem attributes all writeback IOs to the
      root.  One of the issues is that there's no way to tell who originated
      a writeback IO from the block layer.  Those IOs are usually issued
      asynchronously from a task which didn't have anything to do with
      actually generating the dirty pages.  The memory subsystem, when
      enabled, already keeps track of the ownership of each dirty page and
      it's desirable for blkio to piggyback instead of adding its own
      per-page tag.
      
      blkio piggybacking on memory is an implementation detail which
      preferably should be handled automatically without requiring explicit
      userland action.  To achieve that, this patch implements
      cgroup_subsys->depends_on which contains the mask of subsystems which
      should be enabled together when the subsystem is enabled.
      
      The previous patches already implemented support for enabled but
      invisible subsystems and cgroup_subsys->depends_on can be easily
      implemented by updating cgroup_refresh_child_subsys_mask() so that it
      calculates cgroup->child_subsys_mask considering
      cgroup_subsys->depends_on of the explicitly enabled subsystems.
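      The dependency expansion described above can be modeled as a
      fixed-point computation over bitmasks.  This C sketch is hypothetical:
      the subsystem IDs, the depends_on table and calc_child_subsys_mask()
      are illustrative names, not the kernel's code, and the blkio-on-memory
      entry only mirrors the motivating example in this message.  Subsystems
      that appear in the result but not in the explicit mask are the enabled
      but invisible ones.

```c
#include <assert.h>

/* Hypothetical subsystem IDs for the example. */
enum { SS_CPU, SS_MEMORY, SS_BLKIO, NR_SUBSYS };

/* depends_on[i]: mask of subsystems that must be enabled along with i. */
static const unsigned int depends_on[NR_SUBSYS] = {
	[SS_BLKIO] = 1u << SS_MEMORY,  /* blkio piggybacks on memory */
};

/*
 * Expand the explicitly enabled mask with dependencies until no new
 * bits appear, so that dependency chains are followed to the end.
 */
static unsigned int calc_child_subsys_mask(unsigned int explicit_mask)
{
	unsigned int mask = explicit_mask;
	unsigned int old;

	do {
		old = mask;
		for (int ss = 0; ss < NR_SUBSYS; ss++)
			if (old & (1u << ss))
				mask |= depends_on[ss];
	} while (mask != old);

	return mask;
}
```

      The outer loop only matters for chains (a needs b, b needs c): bits
      added in one pass have their own dependencies pulled in on the next.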
      
      Documentation/cgroups/unified-hierarchy.txt is updated to explain that
      subsystems may not become immediately available after they stop being
      used from userland and that dependencies could be a factor in it.  As
      subsystems may already keep residual references, this doesn't
      significantly change how subsystem rebinding can be used.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    • cgroup: implement cgroup_subsys->css_reset() · b4536f0c
      Tejun Heo committed
      cgroup is implementing support for subsystem dependency which would
      require a way to enable a subsystem even when it's not directly
      configured through "cgroup.subtree_control".
      
      The previous patches added support for explicitly and implicitly
      enabled subsystems and showing/hiding their interface files.  An
      explicitly enabled subsystem may become implicitly enabled if it's
      turned off through "cgroup.subtree_control" but there are subsystems
      depending on it.  In such cases, the subsystem, as it's turned off
      when seen from userland, shouldn't enforce any resource control.
      Also, the subsystem may later be explicitly turned on again, and its
      interface files should be as close to the initial state as possible.
      
      This patch adds cgroup_subsys->css_reset() which is invoked when a css
      is hidden.  The callback should disable resource control and reset the
      state to the vanilla state.
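      A minimal C sketch of the hide-and-reset flow, assuming a hypothetical
      controller with a single limit knob.  struct demo_css, struct
      subsys_ops, demo_css_reset() and css_set_visible() are illustrative
      names, not kernel API, and in this model the reset callback is
      optional.  Hiding a css drops it back to its vanilla state instead of
      destroying it, so turning it on again later starts from that state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-cgroup state of one controller. */
struct demo_css {
	long limit;   /* user-configured limit; -1 means unlimited */
	int visible;  /* are the interface files shown to userland? */
};

struct subsys_ops {
	/* Optional here: called when a css becomes hidden but stays alive. */
	void (*css_reset)(struct demo_css *css);
};

/* Stop enforcing anything: back to the vanilla state. */
static void demo_css_reset(struct demo_css *css)
{
	css->limit = -1;
}

/* Change visibility; hiding a live css resets it rather than freeing it. */
static void css_set_visible(const struct subsys_ops *ops,
			    struct demo_css *css, int visible)
{
	if (css->visible && !visible && ops->css_reset)
		ops->css_reset(css);
	css->visible = visible;
}
```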
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>