1. 27 Apr, 2018 9 commits
    • cgroup: Make cgroup_rstat_updated() ready for root cgroup usage · c43c5ea7
      Committed by Tejun Heo
      cgroup_rstat_updated() ensures that the cgroup's rstat is linked to
      the parent.  If there's no parent, it never gets linked and the
      function ends up grabbing and releasing the cgroup_rstat_lock each
      time for no reason which can be expensive.
      
      This hasn't been a problem till now because nobody was calling the
      function for the root cgroup but rstat is gonna be exposed to
      controllers and use cases, so let's get ready.  Make
      cgroup_rstat_updated() a no-op for the root cgroup.
      Signed-off-by: Tejun Heo <tj@kernel.org>
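The early return described in this commit can be sketched in user space as follows. This is a minimal illustration, not the kernel code: `struct cg`, `cg_rstat_updated()` and the `lock_acquisitions` counter are hypothetical stand-ins, the per-cpu dimension is dropped, and the lock is only simulated by a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct cgroup. */
struct cg {
    struct cg *parent;  /* NULL for the root cgroup */
    bool linked;        /* already on the parent's updated list? */
};

/* Counts how often the (simulated) rstat lock had to be taken. */
static int lock_acquisitions;

/* Returns true only if the call had to take the simulated lock. */
static bool cg_rstat_updated(struct cg *cgrp)
{
    assert(cgrp != NULL);

    /* Root has no parent to link to, so bail out before locking. */
    if (cgrp->parent == NULL)
        return false;

    /* Fast path: already linked, nothing to do. */
    if (cgrp->linked)
        return false;

    lock_acquisitions++;    /* stands in for lock + unlock */
    cgrp->linked = true;
    return true;
}
```

Without the parent check, a root cgroup would never become linked and so would take the lock on every call, which is the cost the commit message describes.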
    • cgroup: Add memory barriers to plug cgroup_rstat_updated() race window · 9a9e97b2
      Committed by Tejun Heo
      cgroup_rstat_updated() has a small race window where an update
      signal can race with a flush and be lost until the next update.
      This wasn't a problem for the existing usages, but we plan to use
      rstat to track counters which need to be accurate.
      
      This patch plugs the race window by synchronizing
      cgroup_rstat_updated() and flush path with memory barriers around
      cgroup_rstat_cpu->updated_next pointer.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: Add cgroup_subsys->css_rstat_flush() · 8f53470b
      Committed by Tejun Heo
      This patch adds cgroup_subsys->css_rstat_flush().  If a subsystem has
      this callback, its csses are linked on cgrp->css_rstat_list and rstat
      will call the function whenever the associated cgroup is flushed.
      Flush is also performed when such csses are released so that residual
      counts aren't lost.
      
      Combined with the rstat API previous patches factored out, this allows
      controllers to plug into rstat to manage their statistics in a
      scalable way.
      Signed-off-by: Tejun Heo <tj@kernel.org>
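As a rough illustration of the callback mechanism this commit describes, the sketch below links only those csses whose subsystem provides a flush callback and invokes it on flush. All types and helpers here (`struct subsys`, `struct css`, `struct cg`, `cg_link_css()`, `cg_rstat_flush_cpu()`, `demo_flush()`) are simplified, hypothetical stand-ins; the real kernel structures, locking and per-cpu handling are omitted.

```c
#include <assert.h>
#include <stddef.h>

struct css;

/* Hypothetical stand-in for struct cgroup_subsys. */
struct subsys {
    /* Optional: called when the owning cgroup is flushed. */
    void (*css_rstat_flush)(struct css *css, int cpu);
};

/* Hypothetical stand-in for a css with one toy counter. */
struct css {
    struct subsys *ss;
    struct css *rstat_next; /* link on the cgroup's flush list */
    long pending;           /* counts not yet folded into total */
    long total;
};

/* Hypothetical stand-in for struct cgroup. */
struct cg {
    struct css *rstat_list; /* csses whose subsystem wants flushes */
};

/* Link a css only if its subsystem implements the callback. */
static void cg_link_css(struct cg *cgrp, struct css *css)
{
    assert(cgrp != NULL && css != NULL && css->ss != NULL);
    if (css->ss->css_rstat_flush == NULL)
        return;
    css->rstat_next = cgrp->rstat_list;
    cgrp->rstat_list = css;
}

/* Flush one cpu: call every linked css's callback. */
static void cg_rstat_flush_cpu(struct cg *cgrp, int cpu)
{
    for (struct css *css = cgrp->rstat_list; css; css = css->rstat_next)
        css->ss->css_rstat_flush(css, cpu);
}

/* Example callback: fold pending counts into the total. */
static void demo_flush(struct css *css, int cpu)
{
    (void)cpu;  /* the toy counter is not per-cpu */
    css->total += css->pending;
    css->pending = 0;
}
```

A css whose subsystem leaves the callback unset is never placed on the list, so flushing costs nothing for controllers that do not opt in.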
    • cgroup: Replace cgroup_rstat_mutex with a spinlock · 0fa294fb
      Committed by Tejun Heo
      Currently, rstat flush path is protected with a mutex which is fine as
      all the existing users are from interface file show path.  However,
      rstat is being generalized for use by controllers and flushing from
      atomic contexts will be necessary.
      
      This patch replaces cgroup_rstat_mutex with a spinlock and adds an
      irq-safe flush function - cgroup_rstat_flush_irqsafe().  Explicit
      yield handling is added to the flush path so that other flush
      functions can yield to other threads and flushers.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: Factor out and expose cgroup_rstat_*() interface functions · 6162cef0
      Committed by Tejun Heo
      cgroup_rstat is being generalized so that controllers can use it too.
      This patch factors out and exposes the following interface functions.
      
      * cgroup_rstat_updated(): Renamed from cgroup_rstat_cpu_updated() for
        consistency.
      
      * cgroup_rstat_flush_hold/release(): Factored out from base stat
        implementation.
      
      * cgroup_rstat_flush(): Verbatim expose.
      
      While at it, drop assert on cgroup_rstat_mutex in
      cgroup_base_stat_flush() as it crosses layers and make a minor comment
      update.
      
      v2: Added EXPORT_SYMBOL_GPL(cgroup_rstat_updated) to fix a build bug.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: Reorganize kernel/cgroup/rstat.c · a17556f8
      Committed by Tejun Heo
      Currently, rstat.c has rstat and base stat implementations intermixed.
      Collect base stat implementation at the end of the file.  Also,
      reorder the prototypes.
      
      This patch doesn't make any functional changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: Distinguish base resource stat implementation from rstat · d4ff749b
      Committed by Tejun Heo
      Base resource stat accounts universal (not specific to any
      controller) resource consumptions on top of rstat.  Currently, its
      implementation is intermixed with rstat implementation making the code
      confusing to follow.
      
      This patch clarifies the distinction by doing the following.
      
      * Encapsulate base resource stat counters, currently only cputime, in
        struct cgroup_base_stat.
      
      * Move prev_cputime into struct cgroup and initialize it with cgroup.
      
      * Rename the related functions so that they start with cgroup_base_stat.
      
      * Prefix the related variables and field names with b.
      
      This patch doesn't make any functional changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: Rename stat to rstat · c58632b3
      Committed by Tejun Heo
      stat is too generic a name and ends up causing subtle confusions.
      It'll be made generic so that controllers can plug into it, which will
      make the problem worse.  Let's rename it to something more specific -
      cgroup_rstat for cgroup recursive stat.
      
      This patch does the following renames.  No other changes.
      
      * cpu_stat	-> rstat_cpu
      * stat		-> rstat
      * ?cstat	-> ?rstatc
      
      Note that the renames are selective.  The unrenamed are the ones which
      implement basic resource statistics on top of rstat.  This will be
      further cleaned up in the following patches.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: Rename kernel/cgroup/stat.c to kernel/cgroup/rstat.c · a5c2b93f
      Committed by Tejun Heo
      stat is too generic a name and ends up causing subtle confusions.
      It'll be made generic so that controllers can plug into it, which will
      make the problem worse.  Let's rename it to something more specific -
      cgroup_rstat for cgroup recursive stat.
      
      First, rename kernel/cgroup/stat.c to kernel/cgroup/rstat.c.  No
      content changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
  2. 28 Nov, 2017 1 commit
  3. 27 Oct, 2017 1 commit
    • cgroup, sched: Move basic cpu stats from cgroup.stat to cpu.stat · d41bf8c9
      Committed by Tejun Heo
      The basic cpu stat is currently shown with "cpu." prefix in
      cgroup.stat, and the same information is duplicated in cpu.stat when
      cpu controller is enabled.  This is ugly and not very scalable as we
      want to expand the coverage of stat information which is always
      available.
      
      This patch makes cgroup core always create "cpu.stat" file and show
      the basic cpu stat there and calls the cpu controller to show the
      extra stats when enabled.  This ensures that the same information
      isn't presented in multiple places and makes future expansion of basic
      stats easier.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
  4. 25 Sep, 2017 1 commit
    • cgroup: Implement cgroup2 basic CPU usage accounting · 041cd640
      Committed by Tejun Heo
      In cgroup1, while cpuacct isn't actually controlling any resources, it
      is a separate controller due to combination of two factors -
      1. enabling cpu controller has significant side effects, and 2. we
      have to pick one of the hierarchies to account CPU usages on.  cpuacct
      controller is effectively used to designate a hierarchy to track CPU
      usages on.
      
      cgroup2's unified hierarchy removes the second reason and we can
      account basic CPU usages by default.  While we can use cpuacct for
      this purpose, both its interface and implementation leave a lot to be
      desired - it collects and exposes two sources of truth which don't
      agree with each other and some of the exposed statistics don't make
      much sense.  Also, it propagates all the way up the hierarchy on each
      accounting event which is unnecessary.
      
      This patch adds basic resource accounting mechanism to cgroup2's
      unified hierarchy and accounts CPU usages using it.
      
      * All accountings are done per-cpu and don't propagate immediately.
        It just bumps the per-cgroup per-cpu counters and links to the
        parent's updated list if not already on it.
      
      * On a read, the per-cpu counters are collected into the global ones
        and then propagated upwards.  Only the per-cpu counters which have
        changed since the last read are propagated.
      
      * CPU usage stats are collected and shown in "cgroup.stat" with "cpu."
        prefix.  Total usage is collected from scheduling events.  User/sys
        breakdown is sourced from tick sampling and adjusted to the usage
        using cputime_adjust().
      
      This keeps the accounting side hot path O(1) and per-cpu and the read
      side O(nr_updated_since_last_read).
      
      v2: Minor changes and documentation updates as suggested by Waiman and
          Roman.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Roman Gushchin <guro@fb.com>
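The accounting scheme in this commit (bump a per-cpu counter, link onto the parent's updated list, propagate only on read) can be sketched in single-threaded user-space C as below. This is a toy model under stated assumptions, not the kernel implementation: `struct cg`, `cg_account()`, `cg_flush_cpu()`, `cg_read()` and the fixed `NR_CPUS` value are hypothetical, and all locking and memory ordering is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 2   /* assumption: tiny fixed cpu count for the sketch */

/* Hypothetical, simplified stand-in for struct cgroup. */
struct cg {
    struct cg *parent;
    long delta[NR_CPUS];                  /* per-cpu usage not yet flushed */
    struct cg *updated_children[NR_CPUS]; /* per-cpu list of updated children */
    struct cg *updated_next[NR_CPUS];     /* sibling link on the parent's list */
    bool linked[NR_CPUS];                 /* on the parent's list for this cpu? */
    long pending;                         /* flushed here, not yet seen by parent */
    long total;                           /* flushed usage incl. descendants */
};

/* Hot path: bump the per-cpu counter and link upwards if needed. */
static void cg_account(struct cg *cgrp, int cpu, long amount)
{
    assert(cgrp != NULL && cpu >= 0 && cpu < NR_CPUS);
    cgrp->delta[cpu] += amount;

    /* Stop at the root or at the first ancestor that is already linked. */
    for (struct cg *c = cgrp; c->parent && !c->linked[cpu]; c = c->parent) {
        c->updated_next[cpu] = c->parent->updated_children[cpu];
        c->parent->updated_children[cpu] = c;
        c->linked[cpu] = true;
    }
}

/* Fold one cpu's updates from cgrp's subtree into cgrp. */
static void cg_flush_cpu(struct cg *cgrp, int cpu)
{
    long sum = cgrp->delta[cpu];
    cgrp->delta[cpu] = 0;

    /* Visit only the children that were updated since the last flush. */
    struct cg *child = cgrp->updated_children[cpu];
    cgrp->updated_children[cpu] = NULL;
    while (child) {
        struct cg *next = child->updated_next[cpu];
        child->updated_next[cpu] = NULL;
        child->linked[cpu] = false;
        cg_flush_cpu(child, cpu);
        sum += child->pending;  /* take what the child has not handed up yet */
        child->pending = 0;
        child = next;
    }

    cgrp->total += sum;
    if (cgrp->parent)
        cgrp->pending += sum;   /* the parent collects this on its own flush */
}

/* Read side: flush every cpu, then report the aggregate. */
static long cg_read(struct cg *cgrp)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        cg_flush_cpu(cgrp, cpu);
    return cgrp->total;
}
```

In this model, accounting touches only the local counter plus a short upward walk that ends at the first already-linked ancestor, while a read visits only the cgroups that changed since the previous read. The `pending` field is a device of this sketch so that reading a non-root cgroup does not hide its usage from ancestors.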