  1. 23 Oct 2012, 1 commit
    • blkcg: Fix use-after-free of q->root_blkg and q->root_rl.blkg · 65635cbc
      Committed by Jun'ichi Nomura
      blk_put_rl() does not call blkg_put() for q->root_rl because we
      don't take request list reference on q->root_blkg.
      However, if root_blkg is attached and then detached (freed),
      blk_put_rl() is confused by the bogus pointer left in q->root_blkg.
      
      For example, with !CONFIG_BLK_DEV_THROTTLING &&
      CONFIG_CFQ_GROUP_IOSCHED,
      switching the IO scheduler from cfq to deadline causes a system stall
      after the following warning on 3.6:
      
      > WARNING: at /work/build/linux/block/blk-cgroup.h:250
      > blk_put_rl+0x4d/0x95()
      > Modules linked in: bridge stp llc sunrpc acpi_cpufreq freq_table mperf
      > ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
      > Pid: 0, comm: swapper/0 Not tainted 3.6.0 #1
      > Call Trace:
      >  <IRQ>  [<ffffffff810453bd>] warn_slowpath_common+0x85/0x9d
      >  [<ffffffff810453ef>] warn_slowpath_null+0x1a/0x1c
      >  [<ffffffff811d5f8d>] blk_put_rl+0x4d/0x95
      >  [<ffffffff811d614a>] __blk_put_request+0xc3/0xcb
      >  [<ffffffff811d71a3>] blk_finish_request+0x232/0x23f
      >  [<ffffffff811d76c3>] ? blk_end_bidi_request+0x34/0x5d
      >  [<ffffffff811d76d1>] blk_end_bidi_request+0x42/0x5d
      >  [<ffffffff811d7728>] blk_end_request+0x10/0x12
      >  [<ffffffff812cdf16>] scsi_io_completion+0x207/0x4d5
      >  [<ffffffff812c6fcf>] scsi_finish_command+0xfa/0x103
      >  [<ffffffff812ce2f8>] scsi_softirq_done+0xff/0x108
      >  [<ffffffff811dcea5>] blk_done_softirq+0x8d/0xa1
      >  [<ffffffff810915d5>] ?
      >  generic_smp_call_function_single_interrupt+0x9f/0xd7
      >  [<ffffffff8104cf5b>] __do_softirq+0x102/0x213
      >  [<ffffffff8108a5ec>] ? lock_release_holdtime+0xb6/0xbb
      >  [<ffffffff8104d2b4>] ? raise_softirq_irqoff+0x9/0x3d
      >  [<ffffffff81424dfc>] call_softirq+0x1c/0x30
      >  [<ffffffff81011beb>] do_softirq+0x4b/0xa3
      >  [<ffffffff8104cdb0>] irq_exit+0x53/0xd5
      >  [<ffffffff8102d865>] smp_call_function_single_interrupt+0x34/0x36
      >  [<ffffffff8142486f>] call_function_single_interrupt+0x6f/0x80
      >  <EOI>  [<ffffffff8101800b>] ? mwait_idle+0x94/0xcd
      >  [<ffffffff81018002>] ? mwait_idle+0x8b/0xcd
      >  [<ffffffff81017811>] cpu_idle+0xbb/0x114
      >  [<ffffffff81401fbd>] rest_init+0xc1/0xc8
      >  [<ffffffff81401efc>] ? csum_partial_copy_generic+0x16c/0x16c
      >  [<ffffffff81cdbd3d>] start_kernel+0x3d4/0x3e1
      >  [<ffffffff81cdb79e>] ? kernel_init+0x1f7/0x1f7
      >  [<ffffffff81cdb2dd>] x86_64_start_reservations+0xb8/0xbd
      >  [<ffffffff81cdb3e3>] x86_64_start_kernel+0x101/0x110
      
      This patch clears q->root_blkg and q->root_rl.blkg when root blkg
      is destroyed.
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      65635cbc
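
      The fix above boils down to "never leave a cached pointer dangling
      after the object it names is freed".  Below is a minimal user-space
      sketch of that pattern, not the kernel code; every name in it
      (toy_queue, toy_group, put_request_group, ...) is invented for
      illustration.

        /* Illustrative sketch of "clear the cached pointer on destroy";
         * not the kernel implementation, all names are made up. */
        #include <assert.h>
        #include <stdio.h>
        #include <stdlib.h>

        struct toy_group {
            int refcnt;
        };

        struct toy_queue {
            struct toy_group *root_grp;   /* cached pointer, no reference held */
        };

        static struct toy_group *group_create(void)
        {
            struct toy_group *g = calloc(1, sizeof(*g));

            g->refcnt = 1;
            return g;
        }

        static void group_put(struct toy_group *g)
        {
            if (--g->refcnt == 0)
                free(g);
        }

        /* Mirrors blk_put_rl() in spirit: the root group is special-cased by
         * comparing against the queue's cached pointer, so that pointer must
         * never be left dangling. */
        static void put_request_group(struct toy_queue *q, struct toy_group *g)
        {
            if (g != q->root_grp)         /* non-root groups are reference-counted */
                group_put(g);
        }

        static void destroy_root_group(struct toy_queue *q)
        {
            struct toy_group *g = q->root_grp;

            q->root_grp = NULL;           /* the fix: clear the cache before freeing */
            group_put(g);
        }

        int main(void)
        {
            struct toy_queue q = { .root_grp = group_create() };
            struct toy_group *other = group_create();

            put_request_group(&q, other); /* non-root: reference dropped, freed */
            destroy_root_group(&q);       /* root: cache cleared, then freed */
            assert(q.root_grp == NULL);   /* later comparisons cannot hit freed memory */
            printf("cached root pointer cleared on destroy\n");
            return 0;
        }

      With the cache cleared, the comparison in the put path can no longer
      match memory that has already been returned to the allocator.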
  2. 15 Sep 2012, 1 commit
    • cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them · 8c7f6edb
      Committed by Tejun Heo
      Currently, cgroup hierarchy support is a mess.  cpu related subsystems
      behave correctly - configuration, accounting and control on a parent
      properly cover its children.  blkio and freezer completely ignore
      hierarchy and treat all cgroups as if they're directly under the root
      cgroup.  Others show yet different behaviors.
      
      These differing interpretations of the cgroup hierarchy make using
      cgroups confusing and make it impossible to co-mount controllers into
      the same hierarchy and obtain sane behavior.
      
      Eventually, we want full hierarchy support from all subsystems and
      probably a unified hierarchy.  Having users rely on separate
      hierarchies and expect completely different behaviors depending on the
      mounted subsystem is detrimental to making any progress on this front.
      
      This patch adds cgroup_subsys.broken_hierarchy and sets it to %true
      for controllers which are lacking in hierarchy support.  The goal of
      this patch is two-fold.
      
      * Move users away from using hierarchy on currently non-hierarchical
        subsystems, so that implementing proper hierarchy support on those
        doesn't surprise them.
      
      * Keep track of which controllers are broken and in what way, and
        nudge the subsystems to implement proper hierarchy support.
      
      For now, start with a single warning message.  We can whine louder
      later on.
      
      v2: Fixed a typo spotted by Michal. Warning message updated.
      
      v3: Updated memcg part so that it doesn't generate warning in the
          cases where .use_hierarchy=false doesn't make the behavior
          different from root.use_hierarchy=true.  Fixed a typo spotted by
          Glauber.
      
      v4: Check ->broken_hierarchy after cgroup creation is complete so that
          ->create() can affect the result per Michal.  Dropped unnecessary
          memcg root handling per Michal.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Serge E. Hallyn <serue@us.ibm.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Turner <pjt@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      8c7f6edb
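
      The mechanism described in 8c7f6edb is a per-subsystem flag plus a
      one-time warning when a cgroup is created below the first level.  The
      sketch below shows that flag-plus-whine-once pattern in plain C; the
      struct and function names (toy_subsys, maybe_warn_broken_hierarchy)
      are hypothetical, and the real check runs after ->create() so the
      callback can still influence the outcome, per the v4 note.

        /* Illustrative sketch of "mark broken hierarchy support and whine
         * once"; names are hypothetical, not the kernel's. */
        #include <stdbool.h>
        #include <stdio.h>

        struct toy_subsys {
            const char *name;
            bool broken_hierarchy;         /* set by subsystems lacking support */
            bool warned_broken_hierarchy;  /* so we only whine once per subsystem */
        };

        struct toy_cgroup {
            struct toy_cgroup *parent;
        };

        /* Called after creation succeeds so the create callback can still
         * affect the result. */
        static void maybe_warn_broken_hierarchy(struct toy_subsys *ss,
                                                struct toy_cgroup *cgrp)
        {
            bool nested = cgrp->parent && cgrp->parent->parent; /* parent is not root */

            if (nested && ss->broken_hierarchy && !ss->warned_broken_hierarchy) {
                fprintf(stderr,
                        "warning: %s does not support nested hierarchies; "
                        "behavior may change in the future\n", ss->name);
                ss->warned_broken_hierarchy = true;
            }
        }

        int main(void)
        {
            struct toy_subsys blkio = { .name = "blkio", .broken_hierarchy = true };
            struct toy_cgroup root = { .parent = NULL };
            struct toy_cgroup parent = { .parent = &root };
            struct toy_cgroup child = { .parent = &parent };

            maybe_warn_broken_hierarchy(&blkio, &child);  /* warns */
            maybe_warn_broken_hierarchy(&blkio, &child);  /* silent: already warned */
            return 0;
        }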
  3. 27 Jun 2012, 1 commit
    • blkcg: implement per-blkg request allocation · a051661c
      Committed by Tejun Heo
      Currently, request_queue has one request_list to allocate requests
      from regardless of blkcg of the IO being issued.  When the unified
      request pool is used up, cfq proportional IO limits become meaningless
      - whoever grabs the next request being freed wins the race regardless
      of the configured weights.
      
      This can be easily demonstrated by creating a blkio cgroup w/ very low
      weight, put a program which can issue a lot of random direct IOs there
      and running a sequential IO from a different cgroup.  As soon as the
      request pool is used up, the sequential IO bandwidth crashes.
      
      This patch implements per-blkg request_list.  Each blkg has its own
      request_list and any IO allocates its request from the matching blkg
      making blkcgs completely isolated in terms of request allocation.
      
      * Root blkcg uses the request_list embedded in each request_queue,
        which was renamed to @q->root_rl from @q->rq.  While making blkcg rl
        handling a bit hairier, this enables avoiding most overhead for root
        blkcg.
      
      * Queue fullness is properly per request_list but bdi isn't blkcg
        aware yet, so congestion state currently just follows the root
        blkcg.  As writeback isn't aware of blkcg yet, this works okay for
        async congestion but readahead may get the wrong signals.  It's
        better than blkcg completely collapsing with shared request_list but
        needs to be improved with future changes.
      
      * After this change, each block cgroup gets a full request pool making
        resource consumption of each cgroup higher.  This makes allowing
        non-root users to create cgroups less desirable; however, note that
        allowing non-root users to directly manage cgroups is already
        severely broken regardless of this patch - each block cgroup
        consumes kernel memory and skews IO weight (IO weights are not
        hierarchical).
      
      v2: queue-sysfs.txt updated and patch description updated as suggested
          by Vivek.
      
      v3: blk_get_rl() wasn't checking error return from
          blkg_lookup_create() and may cause oops on lookup failure.  Fix it
          by falling back to root_rl on blkg lookup failures.  This problem
          was spotted by Rakesh Iyer <rni@google.com>.
      
      v4: Updated to accommodate 458f27a9 "block: Avoid missed wakeup in
          request waitqueue".  blk_drain_queue() now wakes up waiters on all
          blkg->rl on the target queue.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      a051661c
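
      The allocation policy a051661c describes can be summarized as: each
      group owns its own request list, the root group reuses the list
      embedded in the queue, and a failed group lookup falls back to the
      root list (the v3 note).  Here is a self-contained user-space sketch
      of that selection logic; all names and sizes are invented.

        /* Illustrative sketch of per-group request lists with a root
         * fallback; hypothetical names, not the kernel API. */
        #include <stdio.h>

        #define MAX_GROUPS 4

        struct request_list_sketch {
            int free_requests;            /* stand-in for the per-list request pool */
        };

        struct toy_queue {
            struct request_list_sketch root_rl;               /* embedded, used by root */
            struct request_list_sketch *group_rl[MAX_GROUPS]; /* NULL: no group yet */
        };

        /* Pick the request list for a group id: the embedded root list for
         * id 0, the group's own list otherwise, and the root list when the
         * lookup fails. */
        static struct request_list_sketch *get_rl(struct toy_queue *q, int group_id)
        {
            if (group_id == 0)
                return &q->root_rl;
            if (group_id < MAX_GROUPS && q->group_rl[group_id])
                return q->group_rl[group_id];
            return &q->root_rl;           /* fallback on lookup failure */
        }

        int main(void)
        {
            struct request_list_sketch cg1 = { .free_requests = 128 };
            struct toy_queue q = { .root_rl = { .free_requests = 128 } };

            q.group_rl[1] = &cg1;

            printf("group 1 pool: %d\n", get_rl(&q, 1)->free_requests);
            printf("unknown group falls back to root: %s\n",
                   get_rl(&q, 3) == &q.root_rl ? "yes" : "no");
            return 0;
        }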
  4. 25 Jun 2012, 3 commits
  5. 06 Jun 2012, 1 commit
  6. 04 Jun 2012, 1 commit
  7. 20 Apr 2012, 14 commits
    • blkcg: use radix tree to index blkgs from blkcg · a637120e
      Committed by Tejun Heo
      blkg lookup is currently performed by traversing a linked list anchored
      at blkcg->blkg_list.  This is very unscalable and with blk-throttle
      enabled and enough request queues on the system, this can get very
      ugly quickly (blk-throttle performs look up on every bio submission).
      
      This patch makes blkcg use radix tree to index blkgs combined with
      simple last-looked-up hint.  This is mostly identical to how icqs are
      indexed from ioc.
      
      Note that because __blkg_lookup() may be invoked without holding queue
      lock, hint is only updated from __blkg_lookup_create().  Due to cfq's
      cfqq caching, this makes hint updates overly lazy.  This will be
      improved with scheduled blkcg aware request allocation.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      a637120e
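
      The lookup pattern in a637120e is: check a "last looked up" hint
      first, fall back to the index, and refresh the hint only on the slower
      create path.  A minimal sketch follows, with a plain array standing in
      for the radix tree; every name is invented.

        /* Sketch of "index + last-looked-up hint"; an array stands in for
         * the radix tree and the names are hypothetical. */
        #include <stddef.h>
        #include <stdio.h>

        #define MAX_QUEUES 16

        struct toy_blkg {
            int queue_id;
        };

        struct toy_blkcg {
            struct toy_blkg *index[MAX_QUEUES]; /* radix-tree stand-in, keyed by queue */
            struct toy_blkg *hint;              /* last successfully looked up group */
        };

        /* Fast path: a hint hit avoids touching the index at all. */
        static struct toy_blkg *lookup(struct toy_blkcg *cg, int queue_id)
        {
            struct toy_blkg *blkg = cg->hint;

            if (blkg && blkg->queue_id == queue_id)
                return blkg;
            return queue_id < MAX_QUEUES ? cg->index[queue_id] : NULL;
        }

        /* Slow path (create): insert into the index and refresh the hint,
         * the only place the hint is updated in this sketch. */
        static void insert(struct toy_blkcg *cg, struct toy_blkg *blkg)
        {
            cg->index[blkg->queue_id] = blkg;
            cg->hint = blkg;
        }

        int main(void)
        {
            static struct toy_blkcg cg;
            struct toy_blkg a = { .queue_id = 3 }, b = { .queue_id = 7 };

            insert(&cg, &a);
            insert(&cg, &b);
            printf("lookup(3) found: %s\n", lookup(&cg, 3) == &a ? "yes" : "no");
            printf("hint now points at queue %d\n", cg.hint->queue_id);
            return 0;
        }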
    • blkcg: fix blkcg->css ref leak in __blkg_lookup_create() · 496fb780
      Committed by Tejun Heo
      __blkg_lookup_create() leaked blkcg->css ref if blkg allocation
      failed.  Fix it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      496fb780
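
      The pattern behind this one-liner is the classic error-path rule:
      drop any reference you took once a later step fails, so failure leaves
      the refcount balanced.  A small sketch with made-up names:

        /* Sketch of "undo the reference you took when a later step fails";
         * hypothetical names. */
        #include <stdio.h>
        #include <stdlib.h>

        struct toy_css {
            int refcnt;
        };

        static void css_get(struct toy_css *css) { css->refcnt++; }
        static void css_put(struct toy_css *css) { css->refcnt--; }

        static void *create_group(struct toy_css *css, int simulate_failure)
        {
            void *grp;

            css_get(css);                      /* pin the owner for the new group */

            grp = simulate_failure ? NULL : malloc(64);
            if (!grp) {
                css_put(css);                  /* the fix: drop the ref on failure */
                return NULL;
            }
            return grp;
        }

        int main(void)
        {
            struct toy_css css = { .refcnt = 1 };

            free(create_group(&css, 0));       /* success path keeps one extra ref */
            create_group(&css, 1);             /* failure path must not leak a ref */
            printf("refcnt after success + failure: %d (expected 2)\n", css.refcnt);
            return 0;
        }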
    • blkcg: collapse blkcg_policy_ops into blkcg_policy · f9fcc2d3
      Committed by Tejun Heo
      There's no reason to keep blkcg_policy_ops separate.  Collapse it into
      blkcg_policy.
      
      This patch doesn't introduce any functional change.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      f9fcc2d3
    • blkcg: embed struct blkg_policy_data in policy specific data · f95a04af
      Committed by Tejun Heo
      Currently, blkg_policy_data carries policy-specific data as a char flex
      array instead of being embedded in the policy-specific data.  This was
      forced by oddities around blkg allocation which are all gone now.
      
      This patch makes blkg_policy_data embedded in policy specific data -
      throtl_grp and cfq_group so that it's more conventional and consistent
      with how io_cq is handled.
      
      * blkcg_policy->pdata_size is renamed to ->pd_size.
      
      * Functions which used to take void *pdata now takes struct
        blkg_policy_data *pd.
      
      * blkg_to_pdata/pdata_to_blkg() updated to blkg_to_pd/pd_to_blkg().
      
      * Dummy struct blkg_policy_data definition added.  Dummy
        pdata_to_blkg() definition was unused and inconsistent with the
        non-dummy version - correct dummy pd_to_blkg() added.
      
      * throtl and cfq updated accordingly.
      
      * As dummy blkg_to_pd/pd_to_blkg() are provided,
        blkg_to_cfqg/cfqg_to_blkg() don't need to be ifdef'd.  Moved outside
        ifdef block.
      
      This patch doesn't introduce any functional change.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      f95a04af
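
      The embedding scheme f95a04af switches to is the usual kernel idiom:
      the generic per-group struct lives inside the policy-specific struct,
      and blkg_to_pd/pd_to_blkg-style helpers convert between the two with
      container_of-style pointer arithmetic.  The struct names in this
      sketch are illustrative, and container_of is defined locally for the
      user-space build.

        /* Sketch of embedding a generic per-group struct inside policy data
         * and converting between them; names are illustrative. */
        #include <stddef.h>
        #include <stdio.h>

        #define container_of(ptr, type, member) \
            ((type *)((char *)(ptr) - offsetof(type, member)))

        struct toy_blkg;                 /* the generic group object */

        struct toy_policy_data {         /* generic part, embedded in policy data */
            struct toy_blkg *blkg;
        };

        struct toy_cfq_group {           /* policy-specific data, e.g. cfq_group */
            struct toy_policy_data pd;   /* embedded member, not a flex array */
            int weight;
        };

        /* The two conversions the commit message talks about. */
        static struct toy_cfq_group *pd_to_cfqg(struct toy_policy_data *pd)
        {
            return container_of(pd, struct toy_cfq_group, pd);
        }

        static struct toy_policy_data *cfqg_to_pd(struct toy_cfq_group *cfqg)
        {
            return &cfqg->pd;
        }

        int main(void)
        {
            struct toy_cfq_group cfqg = { .weight = 500 };
            struct toy_policy_data *pd = cfqg_to_pd(&cfqg);

            printf("round trip weight: %d\n", pd_to_cfqg(pd)->weight);
            return 0;
        }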
    • blkcg: mass rename of blkcg API · 3c798398
      Committed by Tejun Heo
      During the recent blkcg cleanup, most of the blkcg API changed to such
      an extent that a mass renaming wouldn't cause any noticeable pain.
      Take the chance and clean up the naming.
      
      * Rename blkio_cgroup to blkcg.
      
      * Drop blkio / blkiocg prefixes and consistently use blkcg.
      
      * Rename blkio_group to blkcg_gq, which is consistent with io_cq but
        keep the blkg prefix / variable name.
      
      * Rename policy method type and field names to signify they're dealing
        with policy data.
      
      * Rename blkio_policy_type to blkcg_policy.
      
      This patch doesn't cause any functional change.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3c798398
    • blkcg: remove blkio_group->path[] · 54e7ed12
      Committed by Tejun Heo
      blkio_group->path[] stores the path of the associated cgroup and is
      used only for debug messages.  Just format the path from blkg->cgroup
      when printing debug messages.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      54e7ed12
    • blkcg: shoot down blkgs if all policies are deactivated · 6d18b008
      Committed by Tejun Heo
      There's no reason to keep blkgs around if no policy is activated for
      the queue.  This patch moves queue locking out of blkg_destroy_all()
      and calls it from blkg_deactivate_policy() on deactivation of the last
      policy on the queue.
      
      This change was suggested by Vivek.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      6d18b008
    • blkcg: drop stuff unused after per-queue policy activation update · 3c96cb32
      Committed by Tejun Heo
      * All_q_list is unused.  Drop all_q_{mutex|list}.
      
      * @for_root of blkg_lookup_create() is always %false when called from
        outside blk-cgroup.c proper.  Factor out __blkg_lookup_create() so
        that it doesn't check whether @q is bypassing and use the
        underscored version for the @for_root callsite.
      
      * blkg_destroy_all() is used only from blkcg proper and @destroy_root
        is always %true.  Make it static and drop @destroy_root.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3c96cb32
    • blkcg: implement per-queue policy activation · a2b1693b
      Committed by Tejun Heo
      All blkcg policies were assumed to be enabled on all request_queues.
      Due to various implementation obstacles, during the recent blkcg core
      updates, this was temporarily implemented as shooting down all !root
      blkgs on elevator switch and policy [de]registration combined with
      half-broken in-place root blkg updates.  In addition to being buggy
      and racy, this meant losing all blkcg configurations across those
      events.
      
      Now that blkcg is cleaned up enough, this patch replaces the temporary
      implementation with proper per-queue policy activation.  Each blkcg
      policy should call the new blkcg_[de]activate_policy() to enable and
      disable the policy on a specific queue.  blkcg_activate_policy()
      allocates and installs policy data for the policy for all existing
      blkgs.  blkcg_deactivate_policy() does the reverse.  If a policy is
      not enabled for a given queue, blkg printing / config functions skip
      the respective blkg for the queue.
      
      blkcg_activate_policy() also takes care of root blkg creation, and
      cfq_init_queue() and blk_throtl_init() are updated accordingly.
      
      This makes blkcg_bypass_{start|end}() and update_root_blkg_pd()
      unnecessary.  Both are dropped.
      
      v2: cfq_init_queue() was returning uninitialized @ret on root_group
          alloc failure if !CONFIG_CFQ_GROUP_IOSCHED.  Fixed.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      a2b1693b
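
      The activation model in a2b1693b: activating a policy on a queue
      allocates that policy's per-group data for every existing group,
      deactivating frees it, and code that walks the groups skips the ones
      whose slot is empty.  A compressed user-space sketch follows; the
      array sizes and all names are hypothetical, and the error unwinding a
      real version would need is only noted in a comment.

        /* Sketch of per-queue policy activation: allocate policy data for
         * all existing groups on activate, free it on deactivate, and skip
         * empty slots elsewhere.  Names and sizes are hypothetical. */
        #include <stdio.h>
        #include <stdlib.h>

        #define MAX_POLS   2
        #define MAX_GROUPS 3

        struct toy_blkg {
            void *pd[MAX_POLS];          /* per-policy data, NULL if not activated */
        };

        struct toy_queue {
            struct toy_blkg groups[MAX_GROUPS];
        };

        static int activate_policy(struct toy_queue *q, int plid, size_t pd_size)
        {
            for (int i = 0; i < MAX_GROUPS; i++) {
                q->groups[i].pd[plid] = calloc(1, pd_size);
                if (!q->groups[i].pd[plid])
                    return -1;           /* a real version would unwind here */
            }
            return 0;
        }

        static void deactivate_policy(struct toy_queue *q, int plid)
        {
            for (int i = 0; i < MAX_GROUPS; i++) {
                free(q->groups[i].pd[plid]);
                q->groups[i].pd[plid] = NULL;
            }
        }

        static void print_groups(struct toy_queue *q, int plid)
        {
            for (int i = 0; i < MAX_GROUPS; i++) {
                if (!q->groups[i].pd[plid])
                    continue;            /* policy not enabled for this group */
                printf("group %d has data for policy %d\n", i, plid);
            }
        }

        int main(void)
        {
            static struct toy_queue q;

            if (activate_policy(&q, 0, 32) == 0) {
                print_groups(&q, 0);     /* all groups listed */
                print_groups(&q, 1);     /* nothing: policy 1 was never activated */
            }
            deactivate_policy(&q, 0);
            return 0;
        }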
    • blkcg: make sure blkg_lookup() returns %NULL if @q is bypassing · 80fd9979
      Committed by Tejun Heo
      Currently, blkg_lookup() doesn't check @q bypass state.  This patch
      updates blk_queue_bypass_start() to do synchronize_rcu() before
      returning and updates blkg_lookup() to check blk_queue_bypass() and
      return %NULL if bypassing.  This ensures blkg_lookup() returns %NULL
      if @q is bypassing.
      
      This is to guarantee that nobody is accessing policy data while @q is
      bypassing, which is necessary to allow replacing blkio_cgroup->pd[] in
      place on policy [de]activation.
      
      v2: Added more comments explaining bypass guarantees as suggested by
          Vivek.
      
      v3: Added more comments explaining why there's no synchronize_rcu() in
          blk_cleanup_queue() as suggested by Vivek.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      80fd9979
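
      The user-visible rule from 80fd9979 is simple: once a queue is marked
      bypassing, the lookup returns NULL.  The sketch below shows only that
      flag check with invented names; the part that makes it race-free in
      the kernel, the synchronize_rcu() in blk_queue_bypass_start(), is
      deliberately not reproduced here.

        /* Sketch of "lookup returns NULL while the queue is bypassing".
         * The RCU grace-period handling is omitted; names are hypothetical. */
        #include <stdbool.h>
        #include <stdio.h>

        struct toy_blkg { int id; };

        struct toy_queue {
            bool bypassing;
            struct toy_blkg *blkg;
        };

        static struct toy_blkg *blkg_lookup_sketch(struct toy_queue *q)
        {
            if (q->bypassing)            /* nobody may touch policy data now */
                return NULL;
            return q->blkg;
        }

        int main(void)
        {
            struct toy_blkg g = { .id = 1 };
            struct toy_queue q = { .blkg = &g };

            printf("normal lookup: %p\n", (void *)blkg_lookup_sketch(&q));
            q.bypassing = true;          /* e.g. during policy [de]activation */
            printf("bypassing lookup: %p\n", (void *)blkg_lookup_sketch(&q));
            return 0;
        }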
    • blkcg: make blkg_conf_prep() take @pol and return with queue lock held · da8b0662
      Committed by Tejun Heo
      Add @pol to blkg_conf_prep() and let it return with queue lock held
      (to be released by blkg_conf_finish()).  Note that @pol isn't used
      yet.
      
      This is to prepare for per-queue policy activation and doesn't cause
      any visible difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      da8b0662
    • blkcg: remove static policy ID enums · 8bd435b3
      Committed by Tejun Heo
      Remove BLKIO_POLICY_* enums and let blkio_policy_register() allocate
      @pol->plid dynamically on registration.  The maximum number of blkcg
      policies which can be registered at the same time is defined by
      BLKCG_MAX_POLS constant added to include/linux/blkdev.h.
      
      Note that blkio_policy_register() now may fail.  Policy init functions
      updated accordingly and unnecessary ifdefs removed from cfq_init().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      8bd435b3
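
      Registration-time ID allocation with a compile-time ceiling, as
      described in 8bd435b3, amounts to scanning a fixed-size slot array,
      handing out the first free index, and failing when every slot is
      taken.  The constant and the names in this sketch are placeholders,
      not the kernel's.

        /* Sketch of allocating a policy ID dynamically at registration
         * time, bounded by a fixed maximum; names are placeholders. */
        #include <stdio.h>

        #define TOY_MAX_POLS 2

        struct toy_policy {
            const char *name;
            int plid;                    /* assigned on successful registration */
        };

        static struct toy_policy *registered[TOY_MAX_POLS];

        /* Returns 0 on success, -1 if every slot is taken: registration can
         * now fail, as the commit message points out. */
        static int policy_register(struct toy_policy *pol)
        {
            for (int i = 0; i < TOY_MAX_POLS; i++) {
                if (!registered[i]) {
                    registered[i] = pol;
                    pol->plid = i;
                    return 0;
                }
            }
            return -1;
        }

        int main(void)
        {
            struct toy_policy throtl = { .name = "throttle" };
            struct toy_policy cfq = { .name = "cfq" };
            struct toy_policy extra = { .name = "extra" };

            printf("throttle: %d\n", policy_register(&throtl));
            printf("cfq: %d (plid %d)\n", policy_register(&cfq), cfq.plid);
            printf("extra: %d (table full)\n", policy_register(&extra));
            return 0;
        }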
    • blkcg: use @pol instead of @plid in update_root_blkg_pd() and blkcg_print_blkgs() · ec399347
      Committed by Tejun Heo
      The two functions were taking "enum blkio_policy_id plid".  Make them
      take "const struct blkio_policy_type *pol" instead.
      
      This is to prepare for per-queue policy activation and doesn't cause
      any functional difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ec399347
    • blkcg: kill blkio_list and replace blkio_list_lock with a mutex · bc0d6501
      Committed by Tejun Heo
      With blkio_policy[], blkio_list is redundant and hinders per-queue
      policy activation.  Remove it.  Also, replace
      blkio_list_lock with a mutex blkcg_pol_mutex and let it protect the
      whole [un]registration.
      
      This is to prepare for per-queue policy activation and doesn't cause
      any functional difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      bc0d6501
  8. 02 Apr 2012, 18 commits
    • blkcg: drop BLKCG_STAT_{PRIV|POL|OFF} macros · 5bc4afb1
      Committed by Tejun Heo
      Now that all stat handling code lives in policy implementations,
      there's no need to encode policy ID in cft->private.
      
      * Export blkcg_prfill_[rw]stat() from blkcg, remove
        blkcg_print_[rw]stat(), and implement cfqg_print_[rw]stat() which
        hard-code BLKIO_POLICY_PROP.
      
      * Use cft->private for offset of the target field directly and drop
        BLKCG_STAT_{PRIV|POL|OFF}().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      5bc4afb1
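
      Storing the field offset directly in the file's private value, as
      5bc4afb1 does with cft->private, lets one generic print helper serve
      many counters without switching on a policy or counter ID.  The
      sketch below illustrates the offset idea; the struct and function
      names are invented.

        /* Sketch of using a stored field offset (cft->private in the
         * commit) so one generic print helper serves many counters. */
        #include <stddef.h>
        #include <stdint.h>
        #include <stdio.h>

        struct toy_cfqg_stats {
            uint64_t serviced;
            uint64_t service_time;
        };

        struct toy_cftype {
            const char *name;
            size_t private_off;          /* offset of the target field in the stats */
        };

        static uint64_t prfill_u64(const void *stats, const struct toy_cftype *cft)
        {
            return *(const uint64_t *)((const char *)stats + cft->private_off);
        }

        int main(void)
        {
            struct toy_cfqg_stats stats = { .serviced = 42, .service_time = 7 };
            struct toy_cftype files[] = {
                { "serviced",     offsetof(struct toy_cfqg_stats, serviced) },
                { "service_time", offsetof(struct toy_cfqg_stats, service_time) },
            };

            for (size_t i = 0; i < sizeof(files) / sizeof(files[0]); i++)
                printf("%s %llu\n", files[i].name,
                       (unsigned long long)prfill_u64(&stats, &files[i]));
            return 0;
        }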
    • blkcg: pass around pd->pdata instead of pd itself in prfill functions · d366e7ec
      Committed by Tejun Heo
      Now that all conf and stat fields are moved into policy specific
      blkio_policy_data->pdata areas, there's no reason to use
      blkio_policy_data itself in prfill functions.  Pass around @pd->pdata
      instead of @pd.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      d366e7ec
    • blkcg: move blkio_group_conf->weight to cfq · 3381cb8d
      Committed by Tejun Heo
      blkio_group_conf->weight is owned by cfq and has no reason to be
      defined in blkcg core.  Replace it with cfq_group->dev_weight and let
      conf setting functions directly set it.  If dev_weight is zero, the
      cfqg doesn't have device specific weight configured.
      
      Also, rename BLKIO_WEIGHT_* constants to CFQ_WEIGHT_* and rename
      blkio_cgroup->weight to blkio_cgroup->cfq_weight.  We eventually want
      per-policy storage in blkio_cgroup but just mark the ownership of the
      field for now.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      3381cb8d
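
      The "zero means no device-specific weight" convention introduced here
      reduces to one ternary: use the per-device value when it is set,
      otherwise the cgroup-wide default.  Field names in the tiny sketch
      below are illustrative.

        /* Sketch of "dev_weight == 0 means not configured"; fall back to
         * the cgroup-wide default.  Names are illustrative. */
        #include <stdio.h>

        struct toy_blkcg   { unsigned int cfq_weight; };   /* cgroup-wide default */
        struct toy_cfq_grp { unsigned int dev_weight; };   /* 0 = not configured  */

        static unsigned int effective_weight(const struct toy_cfq_grp *cfqg,
                                             const struct toy_blkcg *blkcg)
        {
            return cfqg->dev_weight ? cfqg->dev_weight : blkcg->cfq_weight;
        }

        int main(void)
        {
            struct toy_blkcg blkcg = { .cfq_weight = 500 };
            struct toy_cfq_grp plain = { .dev_weight = 0 };
            struct toy_cfq_grp tuned = { .dev_weight = 100 };

            printf("default:  %u\n", effective_weight(&plain, &blkcg));
            printf("override: %u\n", effective_weight(&tuned, &blkcg));
            return 0;
        }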
    • blkcg: move blkio_group_stats_cpu and friends to blk-throttle.c · 8a3d2615
      Committed by Tejun Heo
      blkio_group_stats_cpu is used only by blk-throtl and has no reason to
      be defined in blkcg core.
      
      * Move blkio_group_stats_cpu to blk-throttle.c and rename it to
        tg_stats_cpu.
      
      * blkg_policy_data->stats_cpu is replaced with throtl_grp->stats_cpu.
        prfill functions updated accordingly.
      
      * All related macros / functions are renamed so that they have tg_
        prefix and the unnecessary @pol arguments are dropped.
      
      * Per-cpu stats allocation code is also moved from blk-cgroup.c to
        blk-throttle.c and gets simplified to only deal with
        BLKIO_POLICY_THROTL.  percpu stat free is performed by the exit
        method throtl_exit_blkio_group().
      
      * throtl_reset_group_stats() implemented for
        blkio_reset_group_stats_fn method so that tg->stats_cpu can be
        reset.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      8a3d2615
    • blkcg: move blkio_group_stats to cfq-iosched.c · 155fead9
      Committed by Tejun Heo
      blkio_group_stats contains only fields used by cfq and has no reason
      to be defined in blkcg core.
      
      * Move blkio_group_stats to cfq-iosched.c and rename it to cfqg_stats.
      
      * blkg_policy_data->stats is replaced with cfq_group->stats.
        blkg_prfill_[rw]stat() are updated to use offset against pd->pdata
        instead.
      
      * All related macros / functions are renamed so that they have cfqg_
        prefix and the unnecessary @pol arguments are dropped.
      
      * All stat functions now take cfq_group * instead of blkio_group *.
      
      * lockdep assertion on queue lock dropped.  Elevator runs under queue
        lock by default.  There isn't much to be gained by adding lockdep
        assertions at stat function level.
      
      * cfqg_stats_reset() implemented for blkio_reset_group_stats_fn method
        so that cfqg->stats can be reset.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      155fead9
    • blkcg: add blkio_policy_ops operations for exit and stat reset · 9ade5ea4
      Committed by Tejun Heo
      Add blkio_policy_ops->blkio_exit_group_fn() and
      ->blkio_reset_group_stats_fn().  These will be used to further
      modularize blkcg policy implementation.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      9ade5ea4
    • blkcg: cfq doesn't need per-cpu dispatch stats · 41b38b6d
      Committed by Tejun Heo
      blkio_group_stats_cpu is used to count dispatch stats using per-cpu
      counters.  This is used by both blk-throtl and cfq-iosched but the
      sharing is rather silly.
      
      * cfq-iosched doesn't need per-cpu dispatch stats.  cfq always updates
        those stats while holding queue_lock.
      
      * blk-throtl needs per-cpu dispatch stats but only service_bytes and
        serviced.  It doesn't make use of sectors.
      
      This patch makes cfq add and use global stats for service_bytes,
      serviced and sectors, removes per-cpu sectors counter and moves
      per-cpu stat printing code to blk-throttle.c.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      41b38b6d
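
      For the per-cpu dispatch stats that stay with blk-throttle, the basic
      idea is cheap per-CPU increments on the hot path and a fold across all
      CPUs only when the value is read.  The approximation below uses an
      array index as a stand-in for the executing CPU; all names are
      invented and no synchronization is shown.

        /* Sketch of per-CPU counters that are cheap to update and summed on
         * read; the array index stands in for the executing CPU. */
        #include <stdint.h>
        #include <stdio.h>

        #define NR_CPUS 4

        struct toy_tg_stats_cpu {
            uint64_t service_bytes;
            uint64_t serviced;
        };

        static struct toy_tg_stats_cpu stats[NR_CPUS];

        /* Hot path: touch only this CPU's slot, no shared cacheline, no lock. */
        static void account_dispatch(int cpu, uint64_t bytes)
        {
            stats[cpu].service_bytes += bytes;
            stats[cpu].serviced++;
        }

        /* Read path: fold all per-CPU slots together. */
        static uint64_t total_service_bytes(void)
        {
            uint64_t sum = 0;

            for (int cpu = 0; cpu < NR_CPUS; cpu++)
                sum += stats[cpu].service_bytes;
            return sum;
        }

        int main(void)
        {
            account_dispatch(0, 4096);
            account_dispatch(2, 512);
            printf("total bytes: %llu\n",
                   (unsigned long long)total_service_bytes());
            return 0;
        }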
    • blkcg: move statistics update code to policies · 629ed0b1
      Committed by Tejun Heo
      As with conf/stats file handling code, there's no reason for stat
      update code to live in blkcg core with policies calling into update
      them.  The current organization is both inflexible and complex.
      
      This patch moves stat update code to specific policies.  All
      blkiocg_update_*_stats() functions which deal with BLKIO_POLICY_PROP
      stats are collapsed into their cfq_blkiocg_update_*_stats()
      counterparts.  blkiocg_update_dispatch_stats() is used by both
      policies and duplicated as throtl_update_dispatch_stats() and
      cfq_blkiocg_update_dispatch_stats().  This will be cleaned up later.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      629ed0b1
    • blkcg: move conf/stat file handling code to policies · 60c2bc2d
      Committed by Tejun Heo
      blkcg conf/stat handling is convoluted in that details which belong to
      specific policy implementations are all out in blkcg core and then
      policies hook into core layer to access and manipulate confs and
      stats.  This sadly achieves both inflexibility (confs/stats can't be
      modified without messing with blkcg core) and complexity (all the
      call-ins and call-backs).
      
      The previous patches restructured conf and stat handling code such
      that they can be separated out.  This patch relocates the file
      handling part.  All conf/stat file handling code which belongs to
      BLKIO_POLICY_PROP is moved to cfq-iosched.c and all
      BLKIO_POLICY_THROTL code to blk-throtl.c.
      
      The move is verbatim except for blkio_update_group_{weight|bps|iops}()
      callbacks which relay conf changes to policies.  The configuration
      settings are handled in policies themselves so the relaying isn't
      necessary.  Conf setting functions are modified to directly call
      per-policy update functions and the relaying mechanism is dropped.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      60c2bc2d
    • blkcg: implement blkio_policy_type->cftypes · 44ea53de
      Committed by Tejun Heo
      Add blkiop->cftypes which is added and removed together with the
      policy.  This will be used to move conf/stat handling to the policies.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      44ea53de
    • blkcg: export conf/stat helpers to prepare for reorganization · 829fdb50
      Committed by Tejun Heo
      conf/stat handling is about to be moved to policy implementation from
      blkcg core.  Export conf/stat helpers from blkcg core so that
      blk-throttle and cfq-iosched can use them.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      829fdb50
    • blkcg: simplify blkg_conf_prep() · 726fa694
      Committed by Tejun Heo
      blkg_conf_prep() implements "MAJ:MIN VAL" parsing manually, which is
      unnecessary.  Just use sscanf("%u:%u %llu").  This might not reject
      some malformed input (extra input at the end) but we don't care.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      726fa694
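
      The simplification is literally the sscanf() call quoted in the
      message.  A standalone snippet of that parsing approach follows; the
      wrapper function name is invented for the example.

        /* Parsing "MAJ:MIN VAL" with sscanf(), as the commit suggests; the
         * wrapper function is invented for the example. */
        #include <stdio.h>

        static int parse_conf(const char *input, unsigned int *major,
                              unsigned int *minor, unsigned long long *val)
        {
            /* Trailing garbage is not rejected, matching the "we don't care" note. */
            if (sscanf(input, "%u:%u %llu", major, minor, val) != 3)
                return -1;
            return 0;
        }

        int main(void)
        {
            unsigned int maj, min;
            unsigned long long val;

            if (parse_conf("8:16 1048576", &maj, &min, &val) == 0)
                printf("dev %u:%u -> %llu\n", maj, min, val);
            if (parse_conf("not a config", &maj, &min, &val) != 0)
                printf("malformed input rejected\n");
            return 0;
        }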
    • blkcg: restructure blkio_group configuration setting · 3a8b31d3
      Committed by Tejun Heo
      As part of userland interface restructuring, this patch updates
      per-blkio_group configuration setting.  Instead of funneling
      everything through a master function which has hard-coded cases for
      each config file it may handle, the common part is factored into
      blkg_conf_prep() and blkg_conf_finish() and different configuration
      setters are implemented using the helpers.
      
      While this doesn't result in immediate LOC reduction, this enables
      further cleanups and more modular implementation.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      3a8b31d3
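
      The prep/finish factoring can be pictured as: a shared prep helper
      parses the device and resolves the group, each setter then applies
      only the field it owns, and a shared finish helper does the common
      cleanup.  Everything in the sketch below is a made-up stand-in for
      blkg_conf_prep()/blkg_conf_finish(); the real helpers also deal with
      locking and reference counting, which is only hinted at in comments.

        /* Sketch of factoring per-group config setters into shared
         * prep/finish helpers; made-up stand-ins for the real API. */
        #include <stdio.h>

        struct toy_group {
            unsigned int weight;
            unsigned long long bps;
        };

        struct toy_conf_ctx {
            struct toy_group *grp;       /* group resolved from "MAJ:MIN" */
            unsigned long long value;    /* value parsed from the input */
        };

        static struct toy_group the_group;   /* stands in for the blkg lookup */

        static int conf_prep(const char *input, struct toy_conf_ctx *ctx)
        {
            unsigned int maj, min;

            if (sscanf(input, "%u:%u %llu", &maj, &min, &ctx->value) != 3)
                return -1;
            ctx->grp = &the_group;       /* a real version would look up maj:min */
            return 0;                    /* real code would also return locked */
        }

        static void conf_finish(struct toy_conf_ctx *ctx)
        {
            ctx->grp = NULL;             /* real code would unlock / drop refs here */
        }

        /* Each setter only supplies the field it owns. */
        static int set_weight(const char *input)
        {
            struct toy_conf_ctx ctx;

            if (conf_prep(input, &ctx))
                return -1;
            ctx.grp->weight = (unsigned int)ctx.value;
            conf_finish(&ctx);
            return 0;
        }

        int main(void)
        {
            set_weight("8:0 300");
            printf("weight now %u\n", the_group.weight);
            return 0;
        }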
    • blkcg: restructure configuration printing · c4682aec
      Committed by Tejun Heo
      Similarly to the previous stat restructuring, this patch restructures
      conf printing code such that,
      
      * Conf printing uses the same helpers as stat.
      
      * The printing function doesn't require hardcoded switching on the config
        being printed.  Note that this isn't complete yet for throttle
        confs.  The next patch will convert setting for these confs and will
        complete the transition.
      
      * Printing uses read_seq_string callback (other methods will be phased
        out).
      
      Note that blkio_group_conf.iops[2] is changed to u64 so that they can
      be manipulated with the same functions.  This is transitional and will
      go away later.
      
      After this patch, per-device configurations - weight, bps and iops -
      use __blkg_prfill_u64() for printing which uses white space as
      delimiter instead of tab.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      c4682aec
    • blkcg: drop blkiocg_file_write_u64() · 627f29f4
      Committed by Tejun Heo
      blkiocg_file_write_u64() has a single switch case.  Drop
      blkiocg_file_write_u64(), rename blkio_weight_write() to
      blkcg_set_weight() and use it directly for .write_u64 callback.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      627f29f4
    • blkcg: restructure statistics printing · d3d32e69
      Committed by Tejun Heo
      blkcg stats handling is a mess.  None of the stats has much to do with
      blkcg core but they are all implemented in blkcg core.  Code sharing
      is achieved by mixing common code with hard-coded cases for each stat
      counter.
      
      This patch restructures statistics printing such that
      
      * Common logic exists as helper functions and specific print functions
        use the helpers to implement specific cases.
      
      * Printing functions serving multiple counters don't require hardcoded
        switching on specific counters.
      
      * Printing uses read_seq_string callback (other methods will be phased
        out).
      
      This change enables further cleanups and relocating stats code to the
      policy implementation it belongs to.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      d3d32e69
    • blkcg: introduce blkg_stat and blkg_rwstat · edcb0722
      Committed by Tejun Heo
      blkcg uses u64_stats_sync to avoid reading wrong u64 statistic values
      on 32bit archs and some stat counters have subtypes to distinguish
      read/writes and sync/async IOs.  The stat code paths are confusing and
      involve a lot of going back and forth between blkcg core and specific
      policy implementations, and synchronization and subtype handling are
      open coded in blkcg core.
      
      This patch introduces struct blkg_stat and blkg_rwstat which, with
      accompanying operations, encapsulate stat updating and accessing with
      proper synchronization.
      
      blkg_stat is a simple u64 counter with 64bit read-access protection.
      blkg_rwstat is the one with rw and [a]sync subcounters and takes @rw
      flags to distinguish IO subtypes (%REQ_WRITE and %REQ_SYNC) and
      replaces stat_sub_type indexed arrays.
      
      All counters in blkio_group_stats and blkio_group_stats_cpu are
      replaced with either blkg_stat or blkg_rwstat along with all users.
      
      This does add one u64_stats_sync per counter and increase stats_sync
      operations but they're empty/noops on 64bit archs and blkcg doesn't
      have too many counters, especially with DEBUG_BLK_CGROUP off.
      
      While the currently resulting code isn't necessarily simpler at the
      moment, this will enable further clean up of blkcg stats code.
      
      - BLKIO_STAT_{READ|WRITE|SYNC|ASYNC|TOTAL} renamed to
        BLKG_RWSTAT_{READ|WRITE|SYNC|ASYNC|TOTAL}.
      
      - blkg_stat_add() replaces blkio_add_stat() and
        blkio_check_and_dec_stat().  Note that BUG_ON() on underflow in the
        latter function no longer exists.  It's *way* better to have
        underflowed stat counters than oopsing.
      
      - blkio_group_stats->dequeue is now a proper u64 stat counter instead
        of ulong.
      
      - reset_stats() updated to clear each stat counters individually and
        BLKG_STATS_DEBUG_CLEAR_{START|SIZE} are removed.
      
      - Some functions reconstruct rw flags from direction and sync
        booleans.  This will be removed by future patches.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      edcb0722
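
      The rwstat idea in edcb0722: one counter object keeps four
      subcounters (read/write crossed with sync/async), updates pick the
      slots from the request flags, and the total is derived on read.  The
      flag values and names in the sketch below are illustrative, and the
      64-bit read protection (u64_stats_sync) that matters on 32-bit
      kernels is omitted.

        /* Sketch of a blkg_rwstat-like counter: read/write and sync/async
         * subcounters selected by request flags.  Flag values and names
         * are illustrative; the 32-bit u64 read protection is omitted. */
        #include <stdint.h>
        #include <stdio.h>

        enum { TOY_WRITE = 1 << 0, TOY_SYNC = 1 << 1 };

        enum { RW_READ, RW_WRITE, RW_SYNC, RW_ASYNC, RW_NR };

        struct toy_rwstat {
            uint64_t cnt[RW_NR];
        };

        static void rwstat_add(struct toy_rwstat *s, int rw_flags, uint64_t val)
        {
            s->cnt[(rw_flags & TOY_WRITE) ? RW_WRITE : RW_READ] += val;
            s->cnt[(rw_flags & TOY_SYNC) ? RW_SYNC : RW_ASYNC] += val;
        }

        static uint64_t rwstat_total(const struct toy_rwstat *s)
        {
            /* read + write covers every update exactly once */
            return s->cnt[RW_READ] + s->cnt[RW_WRITE];
        }

        int main(void)
        {
            struct toy_rwstat bytes = { { 0 } };

            rwstat_add(&bytes, TOY_WRITE | TOY_SYNC, 4096);  /* sync write */
            rwstat_add(&bytes, 0, 512);                      /* async read  */

            printf("read=%llu write=%llu sync=%llu async=%llu total=%llu\n",
                   (unsigned long long)bytes.cnt[RW_READ],
                   (unsigned long long)bytes.cnt[RW_WRITE],
                   (unsigned long long)bytes.cnt[RW_SYNC],
                   (unsigned long long)bytes.cnt[RW_ASYNC],
                   (unsigned long long)rwstat_total(&bytes));
            return 0;
        }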
    • blkcg: remove unused @pol and @plid parameters · aaec55a0
      Committed by Tejun Heo
      @pol to blkg_to_pdata() and @plid to blkg_lookup_create() are no
      longer necessary.  Drop them.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      aaec55a0