提交 · 41cad6ab2cb9ccb3b11546ad56b8b285e47c6279 · openeuler / Kernel

10 1月, 2013 10 次提交

cfq-iosched: convert cfq_group_slice() to use cfqg->vfraction · 41cad6ab

由 Tejun Heo 提交于 1月 09, 2013

cfq_group_slice() calculates slice by taking a fraction of
cfq_target_latency according to the ratio of cfqg->weight against
service_tree->total_weight.  This currently works only because all
cfqgs are treated to be at the same level.

To prepare for proper hierarchy support, convert cfq_group_slice() to
base the calculation on cfqg->vfraction.  As cfqg->vfraction is always
a fraction of 1 and represents the fraction allocated to the cfqg with
hierarchy considered, the slice can be simply calculated by
multiplying cfqg->vfraction to cfq_target_latency (with fixed point
shift factored in).

As vfraction calculation currently treats all non-root cfqgs as
children of the root cfqg, this patch doesn't introduce noticeable
behavior difference.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NVivek Goyal <vgoyal@redhat.com>

41cad6ab

cfq-iosched: implement hierarchy-ready cfq_group charge scaling · 1d3650f7

由 Tejun Heo 提交于 1月 09, 2013

Currently, cfqg charges are scaled directly according to cfqg->weight.
Regardless of the number of active cfqgs or the amount of active
weights, a given weight value always scales charge the same way.  This
works fine as long as all cfqgs are treated equally regardless of
their positions in the hierarchy, which is what cfq currently
implements.  It can't work in hierarchical settings because the
interpretation of a given weight value depends on where the weight is
located in the hierarchy.

This patch reimplements cfqg charge scaling so that it can be used to
support hierarchy properly.  The scheme is fairly simple and
light-weight.

* When a cfqg is added to the service tree, v(disktime)weight is
  calculated.  It walks up the tree to root calculating the fraction
  it has in the hierarchy.  At each level, the fraction can be
  calculated as

    cfqg->weight / parent->level_weight

  By compounding these, the global fraction of vdisktime the cfqg has
  claim to - vfraction - can be determined.

* When the cfqg needs to be charged, the charge is scaled inversely
  proportionally to the vfraction.

The new scaling scheme uses the same CFQ_SERVICE_SHIFT for fixed point
representation as before; however, the smallest scaling factor is now
1 (ie. 1 << CFQ_SERVICE_SHIFT).  This is different from before where 1
was for CFQ_WEIGHT_DEFAULT and higher weight would result in smaller
scaling factor.

While this shifts the global scale of vdisktime a bit, it doesn't
change the relative relationships among cfqgs and the scheduling
result isn't different.

cfq_group_notify_queue_add uses fixed CFQ_IDLE_DELAY when appending
new cfqg to the service tree.  The specific value of CFQ_IDLE_DELAY
didn't have any relevance to vdisktime before and is unlikely to cause
any visible behavior difference now especially as the scale shift
isn't that large.

As the new scheme now makes proper distinction between cfqg->weight
and ->leaf_weight, reverse the weight aliasing for root cfqgs.  For
root, both weights are now mapped to ->leaf_weight instead of the
other way around.

Because we're still using cfqg_flat_parent(), this patch shouldn't
change the scheduling behavior in any noticeable way.

v2: Beefed up comments on vfraction as requested by Vivek.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NVivek Goyal <vgoyal@redhat.com>

1d3650f7

cfq-iosched: implement cfq_group->nr_active and ->children_weight · 7918ffb5

由 Tejun Heo 提交于 1月 09, 2013

To prepare for blkcg hierarchy support, add cfqg->nr_active and
->children_weight.  cfqg->nr_active counts the number of active cfqgs
at the cfqg's level and ->children_weight is sum of weights of those
cfqgs.  The level covers itself (cfqg->leaf_weight) and immediate
children.

The two values are updated when a cfqg enters and leaves the group
service tree.  Unless the hierarchy is very deep, the added overhead
should be negligible.

Currently, the parent is determined using cfqg_flat_parent() which
makes the root cfqg the parent of all other cfqgs.  This is to make
the transition to hierarchy-aware scheduling gradual.  Scheduling
logic will be converted to use cfqg->children_weight without actually
changing the behavior.  When everything is ready,
blkcg_weight_parent() will be replaced with proper parent function.

This patch doesn't introduce any behavior chagne.

v2: s/cfqg->level_weight/cfqg->children_weight/ as per Vivek.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NVivek Goyal <vgoyal@redhat.com>

7918ffb5

cfq-iosched: add leaf_weight · e71357e1

由 Tejun Heo 提交于 1月 09, 2013

cfq blkcg is about to grow proper hierarchy handling, where a child
blkg's weight would nest inside the parent's.  This makes tasks in a
blkg to compete against both tasks in the sibling blkgs and the tasks
of child blkgs.

We're gonna use the existing weight as the group weight which decides
the blkg's weight against its siblings.  This patch introduces a new
weight - leaf_weight - which decides the weight of a blkg against the
child blkgs.

It's named leaf_weight because another way to look at it is that each
internal blkg nodes have a hidden child leaf node which contains all
its tasks and leaf_weight is the weight of the leaf node and handled
the same as the weight of the child blkgs.

This patch only adds leaf_weight fields and exposes it to userland.
The new weight isn't actually used anywhere yet.  Note that
cfq-iosched currently offcially supports only single level hierarchy
and root blkgs compete with the first level blkgs - ie. root weight is
basically being used as leaf_weight.  For root blkgs, the two weights
are kept in sync for backward compatibility.

v2: cfqd->root_group->leaf_weight initialization was missing from
    cfq_init_queue() causing divide by zero when
    !CONFIG_CFQ_GROUP_SCHED.  Fix it.  Reported by Fengguang.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>

e71357e1

cfq-iosched: Print sync-noidle information in blktrace messages · b226e5c4

由 Vivek Goyal 提交于 10月 03, 2012

Currently we attach a character "S" or "A" to the cfqq<pid>, to represent
whether queues is sync or async. Add one more character "N" to represent
whether it is sync-noidle queue or sync queue. So now three different
type of queues will look as follows.

cfq1234S   --> sync queus
cfq1234SN  --> sync noidle queue
cfq1234A   --> Async queue

Previously S/A classification was being printed only if group scheduling
was enabled. This patch also makes sure that this classification is
displayed even if group idling is disabled.
Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
Acked-by: NJeff Moyer <jmoyer@redhat.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

b226e5c4

cfq-iosched: Get rid of unnecessary local variable · 1f23f121

由 Vivek Goyal 提交于 10月 03, 2012

Use of local varibale "n" seems to be unnecessary. Remove it. This brings
it inline with function __cfq_group_st_add(), which is also doing the
similar operation of adding a group to a rb tree.

No functionality change here.
Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
Acked-by: NJeff Moyer <jmoyer@redhat.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

1f23f121

cfq-iosched: Rename few functions related to selecting workload · 6d816ec7

由 Vivek Goyal 提交于 10月 03, 2012

choose_service_tree() selects/sets both wl_class and wl_type.  Rename it to
choose_wl_class_and_type() to make it very clear.

cfq_choose_wl() only selects and sets wl_type. It is easy to confuse
it with choose_st(). So rename it to cfq_choose_wl_type() to make
it clear what does it do.

Just renaming. No functionality change.
Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
Acked-by: NJeff Moyer <jmoyer@redhat.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

6d816ec7

cfq-iosched: Rename "service_tree" to "st" at some places · 34b98d03

由 Vivek Goyal 提交于 10月 03, 2012

At quite a few places we use the keyword "service_tree". At some places,
especially local variables, I have abbreviated it to "st".

Also at couple of places moved binary operator "+" from beginning of line
to end of previous line, as per Tejun's feedback.

v2:
 Reverted most of the service tree name change based on Jeff Moyer's feedback.
Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

34b98d03

cfq-iosched: More renaming to better represent wl_class and wl_type · 4d2ceea4

由 Vivek Goyal 提交于 10月 03, 2012

Some more renaming. Again making the code uniform w.r.t use of
wl_class/class to represent IO class (RT, BE, IDLE) and using
wl_type/type to represent subclass (SYNC, SYNC-IDLE, ASYNC).

At places this patch shortens the string "workload" to "wl".
Renamed "saved_workload" to "saved_wl_type". Renamed
"saved_serving_class" to "saved_wl_class".

For uniformity with "saved_wl_*" variables, renamed "serving_class"
to "serving_wl_class" and renamed "serving_type" to "serving_wl_type".

Again, just trying to improve upon code uniformity and improve
readability. No functional change.

v2:
- Restored the usage of keyword "service" based on Jeff Moyer's feedback.
Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

4d2ceea4

cfq-iosched: Properly name all references to IO class · 3bf10fea

由 Vivek Goyal 提交于 10月 03, 2012

Currently CFQ has three IO classes, RT, BE and IDLE. At many a places we
are calling workloads belonging to these classes as "prio". This gets
very confusing as one starts to associate it with ioprio.

So this patch just does bunch of renaming so that reading code becomes
easier. All reference to RT, BE and IDLE workload are done using keyword
"class" and all references to subclass, SYNC, SYNC-IDLE, ASYNC are made
using keyword "type".

This makes me feel much better while I am reading the code. There is no
functionality change due to this patch.
Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
Acked-by: NJeff Moyer <jmoyer@redhat.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NTejun Heo <tj@kernel.org>

3bf10fea

06 11月, 2012 1 次提交

block CFQ: avoid moving request to different queue · 3d106fba

由 Shaohua Li 提交于 11月 06, 2012

request is queued in cfqq->fifo list. Looks it's possible we are moving a
request from one cfqq to another in request merge case. In such case, adjusting
the fifo list order doesn't make sense and is impossible if we don't iterate
the whole fifo list.

My test does hit one case the two cfqq are different, but didn't cause kernel
crash, maybe it's because fifo list isn't used frequently. Anyway, from the
code logic, this is buggy.

I thought we can re-enable the recusive merge logic after this is fixed.
Signed-off-by: NShaohua Li <shli@fusionio.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

3d106fba

04 6月, 2012 2 次提交

block: blkcg_policy_cfq shouldn't be used if !CONFIG_CFQ_GROUP_IOSCHED · ffea73fc

由 Tejun Heo 提交于 6月 04, 2012

cfq may be built w/ or w/o blkcg support depending on
CONFIG_CFQ_CGROUP_IOSCHED.  If blkcg support is disabled, most of
related code is ifdef'd out but some part is left dangling -
blkcg_policy_cfq is left zero-filled and blkcg_policy_[un]register()
calls are made on it.

Feeding zero filled policy to blkcg_policy_register() is incorrect and
triggers the following WARN_ON() if CONFIG_BLK_CGROUP &&
!CONFIG_CFQ_GROUP_IOSCHED.

 ------------[ cut here ]------------
 WARNING: at block/blk-cgroup.c:867
 Modules linked in:
 Modules linked in:
 CPU: 3 Not tainted 3.4.0-09547-gfb21affa #1
 Process swapper/0 (pid: 1, task: 000000003ff80000, ksp: 000000003ff7f8b8)
 Krnl PSW : 0704100180000000 00000000003d76ca (blkcg_policy_register+0xca/0xe0)
	    R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0 EA:3
 Krnl GPRS: 0000000000000000 00000000014b85ec 00000000014b85b0 0000000000000000
	    000000000096fb60 0000000000000000 00000000009a8e78 0000000000000048
	    000000000099c070 0000000000b6f000 0000000000000000 000000000099c0b8
	    00000000014b85b0 0000000000667580 000000003ff7fd98 000000003ff7fd70
 Krnl Code: 00000000003d76be: a7280001           lhi     %r2,1
	    00000000003d76c2: a7f4ffdf           brc     15,3d7680
	   #00000000003d76c6: a7f40001           brc     15,3d76c8
	   >00000000003d76ca: a7c8ffea           lhi     %r12,-22
	    00000000003d76ce: a7f4ffce           brc     15,3d766a
	    00000000003d76d2: a7f40001           brc     15,3d76d4
	    00000000003d76d6: a7c80000           lhi     %r12,0
	    00000000003d76da: a7f4ffc2           brc     15,3d765e
 Call Trace:
 ([<0000000000b6f000>] initcall_debug+0x0/0x4)
  [<0000000000989e8a>] cfq_init+0x62/0xd4
  [<00000000001000ba>] do_one_initcall+0x3a/0x170
  [<000000000096fb60>] kernel_init+0x214/0x2bc
  [<0000000000623202>] kernel_thread_starter+0x6/0xc
  [<00000000006231fc>] kernel_thread_starter+0x0/0xc
 no locks held by swapper/0/1.
 Last Breaking-Event-Address:
  [<00000000003d76c6>] blkcg_policy_register+0xc6/0xe0
 ---[ end trace b8ef4903fcbf9dd3 ]---

This patch fixes the problem by ensuring all blkcg support code is
inside CONFIG_CFQ_GROUP_IOSCHED.

* blkcg_policy_cfq declaration and blkg_to_cfqg() definition are moved
  inside the first CONFIG_CFQ_GROUP_IOSCHED block.  __maybe_unused is
  dropped from blkcg_policy_cfq decl.

* blkcg_deactivate_poilcy() invocation is moved inside ifdef.  This
  also makes the activation logic match cfq_init_queue().

* All blkcg_policy_[un]register() invocations are moved inside ifdef.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
LKML-Reference: <20120601112954.GC3535@osiris.boeblingen.de.ibm.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ffea73fc

block: fix return value on cfq_init() failure · fd794956

由 Tejun Heo 提交于 6月 04, 2012

cfq_init() would return zero after kmem cache creation failure.  Fix
so that it returns -ENOMEM.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

fd794956

20 4月, 2012 11 次提交

blkcg: collapse blkcg_policy_ops into blkcg_policy · f9fcc2d3

由 Tejun Heo 提交于 4月 16, 2012

There's no reason to keep blkcg_policy_ops separate.  Collapse it into
blkcg_policy.

This patch doesn't introduce any functional change.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f9fcc2d3

blkcg: embed struct blkg_policy_data in policy specific data · f95a04af

由 Tejun Heo 提交于 4月 16, 2012

Currently blkg_policy_data carries policy specific data as char flex
array instead of being embedded in policy specific data.  This was
forced by oddities around blkg allocation which are all gone now.

This patch makes blkg_policy_data embedded in policy specific data -
throtl_grp and cfq_group so that it's more conventional and consistent
with how io_cq is handled.

* blkcg_policy->pdata_size is renamed to ->pd_size.

* Functions which used to take void *pdata now takes struct
  blkg_policy_data *pd.

* blkg_to_pdata/pdata_to_blkg() updated to blkg_to_pd/pd_to_blkg().

* Dummy struct blkg_policy_data definition added.  Dummy
  pdata_to_blkg() definition was unused and inconsistent with the
  non-dummy version - correct dummy pd_to_blkg() added.

* throtl and cfq updated accordingly.

* As dummy blkg_to_pd/pd_to_blkg() are provided,
  blkg_to_cfqg/cfqg_to_blkg() don't need to be ifdef'd.  Moved outside
  ifdef block.

This patch doesn't introduce any functional change.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f95a04af

blkcg: mass rename of blkcg API · 3c798398

由 Tejun Heo 提交于 4月 16, 2012

During the recent blkcg cleanup, most of blkcg API has changed to such
extent that mass renaming wouldn't cause any noticeable pain.  Take
the chance and cleanup the naming.

* Rename blkio_cgroup to blkcg.

* Drop blkio / blkiocg prefixes and consistently use blkcg.

* Rename blkio_group to blkcg_gq, which is consistent with io_cq but
  keep the blkg prefix / variable name.

* Rename policy method type and field names to signify they're dealing
  with policy data.

* Rename blkio_policy_type to blkcg_policy.

This patch doesn't cause any functional change.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

3c798398

blkcg: remove blkio_group->path[] · 54e7ed12

由 Tejun Heo 提交于 4月 16, 2012

blkio_group->path[] stores the path of the associated cgroup and is
used only for debug messages.  Just format the path from blkg->cgroup
when printing debug messages.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

54e7ed12

blkcg: drop stuff unused after per-queue policy activation update · 3c96cb32

由 Tejun Heo 提交于 4月 13, 2012

* All_q_list is unused.  Drop all_q_{mutex|list}.

* @for_root of blkg_lookup_create() is always %false when called from
  outside blk-cgroup.c proper.  Factor out __blkg_lookup_create() so
  that it doesn't check whether @q is bypassing and use the
  underscored version for the @for_root callsite.

* blkg_destroy_all() is used only from blkcg proper and @destroy_root
  is always %true.  Make it static and drop @destroy_root.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

3c96cb32

blkcg: implement per-queue policy activation · a2b1693b

由 Tejun Heo 提交于 4月 13, 2012

All blkcg policies were assumed to be enabled on all request_queues.
Due to various implementation obstacles, during the recent blkcg core
updates, this was temporarily implemented as shooting down all !root
blkgs on elevator switch and policy [de]registration combined with
half-broken in-place root blkg updates.  In addition to being buggy
and racy, this meant losing all blkcg configurations across those
events.

Now that blkcg is cleaned up enough, this patch replaces the temporary
implementation with proper per-queue policy activation.  Each blkcg
policy should call the new blkcg_[de]activate_policy() to enable and
disable the policy on a specific queue.  blkcg_activate_policy()
allocates and installs policy data for the policy for all existing
blkgs.  blkcg_deactivate_policy() does the reverse.  If a policy is
not enabled for a given queue, blkg printing / config functions skip
the respective blkg for the queue.

blkcg_activate_policy() also takes care of root blkg creation, and
cfq_init_queue() and blk_throtl_init() are updated accordingly.

This replaces blkcg_bypass_{start|end}() and update_root_blkg_pd()
unnecessary.  Dropped.

v2: cfq_init_queue() was returning uninitialized @ret on root_group
    alloc failure if !CONFIG_CFQ_GROUP_IOSCHED.  Fixed.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

a2b1693b

blkcg: add request_queue->root_blkg · 03d8e111

由 Tejun Heo 提交于 4月 13, 2012

With per-queue policy activation, root blkg creation will be moved to
blkcg core.  Add q->root_blkg in preparation.  For blk-throtl, this
replaces throtl_data->root_tg; however, cfq needs to keep
cfqd->root_group for !CONFIG_CFQ_GROUP_IOSCHED.

This is to prepare for per-queue policy activation and doesn't cause
any functional difference.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

03d8e111

blkcg: make blkg_conf_prep() take @pol and return with queue lock held · da8b0662

由 Tejun Heo 提交于 4月 13, 2012

Add @pol to blkg_conf_prep() and let it return with queue lock held
(to be released by blkg_conf_finish()).  Note that @pol isn't used
yet.

This is to prepare for per-queue policy activation and doesn't cause
any visible difference.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

da8b0662

blkcg: remove static policy ID enums · 8bd435b3

由 Tejun Heo 提交于 4月 13, 2012

Remove BLKIO_POLICY_* enums and let blkio_policy_register() allocate
@pol->plid dynamically on registration.  The maximum number of blkcg
policies which can be registered at the same time is defined by
BLKCG_MAX_POLS constant added to include/linux/blkdev.h.

Note that blkio_policy_register() now may fail.  Policy init functions
updated accordingly and unnecessary ifdefs removed from cfq_init().
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

8bd435b3

blkcg: use @pol instead of @plid in update_root_blkg_pd() and blkcg_print_blkgs() · ec399347

由 Tejun Heo 提交于 4月 13, 2012

The two functions were taking "enum blkio_policy_id plid".  Make them
take "const struct blkio_policy_type *pol" instead.

This is to prepare for per-queue policy activation and doesn't cause
any functional difference.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ec399347

cfq: fix build breakage & warnings · f48ec1d7

由 Tejun Heo 提交于 4月 13, 2012

* CFQ_WEIGHT_* defined inside CONFIG_BLK_CGROUP causes cfq-iosched.c
  compile failure when the config is disabled.  Move it outside the
  ifdef block.

* Dummy cfqg_stats_*() definitions were lacking inline modifiers
  causing unused functions warning if !CONFIG_CFQ_GROUP_IOSCHED.  Add
  them.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f48ec1d7

02 4月, 2012 10 次提交

blkcg: drop BLKCG_STAT_{PRIV|POL|OFF} macros · 5bc4afb1

由 Tejun Heo 提交于 4月 01, 2012

Now that all stat handling code lives in policy implementations,
there's no need to encode policy ID in cft->private.

* Export blkcg_prfill_[rw]stat() from blkcg, remove
  blkcg_print_[rw]stat(), and implement cfqg_print_[rw]stat() which
  use hard-code BLKIO_POLICY_PROP.

* Use cft->private for offset of the target field directly and drop
  BLKCG_STAT_{PRIV|POL|OFF}().
Signed-off-by: NTejun Heo <tj@kernel.org>

5bc4afb1

blkcg: pass around pd->pdata instead of pd itself in prfill functions · d366e7ec

由 Tejun Heo 提交于 4月 01, 2012

Now that all conf and stat fields are moved into policy specific
blkio_policy_data->pdata areas, there's no reason to use
blkio_policy_data itself in prfill functions.  Pass around @pd->pdata
instead of @pd.
Signed-off-by: NTejun Heo <tj@kernel.org>

d366e7ec

blkcg: move blkio_group_conf->weight to cfq · 3381cb8d

由 Tejun Heo 提交于 4月 01, 2012

blkio_group_conf->weight is owned by cfq and has no reason to be
defined in blkcg core.  Replace it with cfq_group->dev_weight and let
conf setting functions directly set it.  If dev_weight is zero, the
cfqg doesn't have device specific weight configured.

Also, rename BLKIO_WEIGHT_* constants to CFQ_WEIGHT_* and rename
blkio_cgroup->weight to blkio_cgroup->cfq_weight.  We eventually want
per-policy storage in blkio_cgroup but just mark the ownership of the
field for now.
Signed-off-by: NTejun Heo <tj@kernel.org>

3381cb8d

blkcg: move blkio_group_stats to cfq-iosched.c · 155fead9

由 Tejun Heo 提交于 4月 01, 2012

blkio_group_stats contains only fields used by cfq and has no reason
to be defined in blkcg core.

* Move blkio_group_stats to cfq-iosched.c and rename it to cfqg_stats.

* blkg_policy_data->stats is replaced with cfq_group->stats.
  blkg_prfill_[rw]stat() are updated to use offset against pd->pdata
  instead.

* All related macros / functions are renamed so that they have cfqg_
  prefix and the unnecessary @pol arguments are dropped.

* All stat functions now take cfq_group * instead of blkio_group *.

* lockdep assertion on queue lock dropped.  Elevator runs under queue
  lock by default.  There isn't much to be gained by adding lockdep
  assertions at stat function level.

* cfqg_stats_reset() implemented for blkio_reset_group_stats_fn method
  so that cfqg->stats can be reset.
Signed-off-by: NTejun Heo <tj@kernel.org>

155fead9

blkcg: cfq doesn't need per-cpu dispatch stats · 41b38b6d

由 Tejun Heo 提交于 4月 01, 2012

blkio_group_stats_cpu is used to count dispatch stats using per-cpu
counters.  This is used by both blk-throtl and cfq-iosched but the
sharing is rather silly.

* cfq-iosched doesn't need per-cpu dispatch stats.  cfq always updates
  those stats while holding queue_lock.

* blk-throtl needs per-cpu dispatch stats but only service_bytes and
  serviced.  It doesn't make use of sectors.

This patch makes cfq add and use global stats for service_bytes,
serviced and sectors, removes per-cpu sectors counter and moves
per-cpu stat printing code to blk-throttle.c.
Signed-off-by: NTejun Heo <tj@kernel.org>

41b38b6d

blkcg: move statistics update code to policies · 629ed0b1

由 Tejun Heo 提交于 4月 01, 2012

As with conf/stats file handling code, there's no reason for stat
update code to live in blkcg core with policies calling into update
them. The current organization is both inflexible and complex.

This patch moves stat update code to specific policies. All
blkiocg_update_*_stats() functions which deal with BLKIO_POLICY_PROP
stats are collapsed into their cfq_blkiocg_update_*_stats()
counterparts. blkiocg_update_dispatch_stats() is used by both
policies and duplicated as throtl_update_dispatch_stats() and
cfq_blkiocg_update_dispatch_stats(). This will be cleaned up later.
Signed-off-by: NTejun Heo <tj@kernel.org>

629ed0b1

cfq: collapse cfq.h into cfq-iosched.c · 2ce4d50f

由 Tejun Heo 提交于 4月 01, 2012

block/cfq.h contains some functions which interact with blkcg;
however, this is only part of it and cfq-iosched.c already has quite
some #ifdef CONFIG_CFQ_GROUP_IOSCHED. With conf/stat handling being
moved to specific policies, having these relay functions isolated in
cfq.h doesn't make much sense. Collapse cfq.h into cfq-iosched.c for
now. Let's split blkcg support properly later if necessary.
Signed-off-by: NTejun Heo <tj@kernel.org>

2ce4d50f

blkcg: move conf/stat file handling code to policies · 60c2bc2d

由 Tejun Heo 提交于 4月 01, 2012

blkcg conf/stat handling is convoluted in that details which belong to
specific policy implementations are all out in blkcg core and then
policies hook into core layer to access and manipulate confs and
stats.  This sadly achieves both inflexibility (confs/stats can't be
modified without messing with blkcg core) and complexity (all the
call-ins and call-backs).

The previous patches restructured conf and stat handling code such
that they can be separated out.  This patch relocates the file
handling part.  All conf/stat file handling code which belongs to
BLKIO_POLICY_PROP is moved to cfq-iosched.c and all
BKLIO_POLICY_THROTL code to blk-throtl.c.

The move is verbatim except for blkio_update_group_{weight|bps|iops}()
callbacks which relays conf changes to policies.  The configuration
settings are handled in policies themselves so the relaying isn't
necessary.  Conf setting functions are modified to directly call
per-policy update functions and the relaying mechanism is dropped.
Signed-off-by: NTejun Heo <tj@kernel.org>

60c2bc2d

blkcg: remove unused @pol and @plid parameters · aaec55a0

由 Tejun Heo 提交于 4月 01, 2012

@pol to blkg_to_pdata() and @plid to blkg_lookup_create() are no
longer necessary.  Drop them.
Signed-off-by: NTejun Heo <tj@kernel.org>

aaec55a0

block: Make cfq_target_latency tunable through sysfs. · 5bf14c07

由 Tao Ma 提交于 4月 01, 2012

In cfq, when we calculate a time slice for a process(or a cfqq to
be precise), we have to consider the cfq_target_latency so that all the
sync request have an estimated latency(300ms) and it is controlled by
cfq_target_latency. But in some hadoop test, we have found that if
there are many processes doing sequential read(24 for example), the
throughput is bad because every process can only work for about 25ms
and the cfqq is switched. That leads to a higher disk seek. We can
achive the good throughput by setting low_latency=0, but then some
read's latency is too much for the application.

So this patch makes cfq_target_latency tunable through sysfs so that
we can tune it and find some magic number which is not bad for both
the throughput and the read latency.

Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: NTao Ma <boyu.mt@taobao.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

5bf14c07

23 3月, 2012 1 次提交

cfq: fix cfqg ref handling when BLK_CGROUP && !CFQ_GROUP_IOSCHED · eb7d8c07

由 Tejun Heo 提交于 3月 23, 2012

When BLK_CGROUP is enabled but CFQ_GROUP_IOSCHED is, cfq ends up
calling blkg_get/put() on dummy cfqg leading to the following crash.

  BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0
  IP: [<ffffffff813d44d8>] cfq_init_queue+0x258/0x430
  PGD 0
  Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  CPU 0
  Modules linked in:

  Pid: 1, comm: swapper/0 Not tainted 3.3.0-rc6-work+ #125 Bochs Bochs
  RIP: 0010:[<ffffffff813d44d8>]  [<ffffffff813d44d8>] cfq_init_queue+0x258/0x430
  RSP: 0018:ffff88001f9dfd80  EFLAGS: 00010046
  RAX: ffff88001aefbbf0 RBX: ffff88001aeedbf0 RCX: 0000000000000100
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff820ffd40
  RBP: ffff88001f9dfdd0 R08: 0000000000000000 R09: 0000000000000001
  R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
  R13: 0000000000000009 R14: ffff88001aefbc30 R15: 0000000000000003
  FS:  0000000000000000(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 00000000000000b0 CR3: 000000000206f000 CR4: 00000000000006f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process swapper/0 (pid: 1, threadinfo ffff88001f9de000, task ffff88001f9dc040)
  Stack:
   ffff88001aeedbf0 ffff88001aefbdb0 ffff88001aef1548 ffff88001aefbbf0
   ffff88001f9dfdd0 ffff88001aef1548 ffffffff820d6320 ffffffff8165ce30
   ffffffff82c555e0 ffff88001aeebbf0 ffff88001f9dfe00 ffffffff813b0507
  Call Trace:
   [<ffffffff813b0507>] elevator_init+0xd7/0x140
   [<ffffffff813b83d5>] blk_init_allocated_queue+0x125/0x150
   [<ffffffff813b94d3>] blk_init_queue_node+0x43/0x80
   [<ffffffff813b9523>] blk_init_queue+0x13/0x20
   [<ffffffff821aec00>] floppy_init+0x82/0xec7
   [<ffffffff810001d2>] do_one_initcall+0x42/0x170
   [<ffffffff821835fc>] kernel_init+0xcb/0x14f
   [<ffffffff81b40b24>] kernel_thread_helper+0x4/0x10
  Code: 00 e8 1d 9e 76 00 48 8b 43 48 48 85 c0 48 89 83 28 03 00 00 74 07 4c 8b a0 10 ff ff ff 8b 15 b0 2e d0 00 85 d2 0f 85 49 01 00 00 <41> 8b 84 24 b0 00 00 00 85 c0 0f 8e 8c 01 00 00 83 e8 01 85 c0
  RIP  [<ffffffff813d44d8>] cfq_init_queue+0x258/0x430

Because cfq's blkcg support has a on/off switch, CFQ_GROUP_IOSCHED,
separate from BLK_CGROUP, blkg access through cfqg needs to be
conditioned on it.

* Make blkg_to_cfqg() and cfqg_to_blkg() conditioned on
  CFQ_GROUP_IOSCHED.  If disabled, they always return %NULL.

* Introduce cfqg_get() and cfqg_put() conditioned on
  CFQ_GROUP_IOSCHED.  If disabled, they are noops.
Reported-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

eb7d8c07

20 3月, 2012 2 次提交

cfq: don't use icq_get_changed() · 598971bf

由 Tejun Heo 提交于 3月 19, 2012

cfq caches the associated cfqq's for a given cic.  The cache needs to
be flushed if the cic's ioprio or blkcg has changed.  It is currently
done by requiring the changing action to set the respective
ICQ_*_CHANGED bit in the icq and testing it from cfq_set_request(),
which involves iterating through all the affected icqs.

All cfq wants to know is whether ioprio and/or blkcg have changed
since the last flush and can be easily achieved by just remembering
the current ioprio and blkcg ID in cic.

This patch adds cic->{ioprio|blkcg_id}, updates all ioprio users to
use the remembered value instead, and updates cfq_set_request() path
such that, instead of using icq_get_changed(), the current values are
compared against the remembered ones and trigger appropriate flush
action if not.  Condition tests are moved inside both _changed
functions which are now named check_ioprio_changed() and
check_blkcg_changed().

ioprio.h::task_ioprio*() can't be used anymore and replaced with
open-coded IOPRIO_CLASS_NONE case in cfq_async_queue_prio().
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

598971bf

cfq: pass around cfq_io_cq instead of io_context · abede6da

由 Tejun Heo 提交于 3月 19, 2012

Now that io_cq is managed by block core and guaranteed to exist for
any in-flight request, it is easier and carries more information to
pass around cfq_io_cq than io_context.

This patch updates cfq_init_prio_data(), cfq_find_alloc_queue() and
cfq_get_queue() to take @cic instead of @ioc.  This change removes a
duplicate cfq_cic_lookup() from cfq_find_alloc_queue().

This change enables the use of cic-cached ioprio in the next patch.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

abede6da

07 3月, 2012 3 次提交

block: make block cgroup policies follow bio task association · 4f85cb96

由 Tejun Heo 提交于 3月 05, 2012

Implement bio_blkio_cgroup() which returns the blkcg associated with
the bio if exists or %current's blkcg, and use it in blk-throttle and
cfq-iosched propio.  This makes both cgroup policies honor task
association for the bio instead of always assuming %current.

As nobody is using bio_set_task() yet, this doesn't introduce any
behavior change.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

4f85cb96

block: implement bio_associate_current() · 852c788f

由 Tejun Heo 提交于 3月 05, 2012

IO scheduling and cgroup are tied to the issuing task via io_context
and cgroup of %current.  Unfortunately, there are cases where IOs need
to be routed via a different task which makes scheduling and cgroup
limit enforcement applied completely incorrectly.

For example, all bios delayed by blk-throttle end up being issued by a
delayed work item and get assigned the io_context of the worker task
which happens to serve the work item and dumped to the default block
cgroup.  This is double confusing as bios which aren't delayed end up
in the correct cgroup and makes using blk-throttle and cfq propio
together impossible.

Any code which punts IO issuing to another task is affected which is
getting more and more common (e.g. btrfs).  As both io_context and
cgroup are firmly tied to task including userland visible APIs to
manipulate them, it makes a lot of sense to match up tasks to bios.

This patch implements bio_associate_current() which associates the
specified bio with %current.  The bio will record the associated ioc
and blkcg at that point and block layer will use the recorded ones
regardless of which task actually ends up issuing the bio.  bio
release puts the associated ioc and blkcg.

It grabs and remembers ioc and blkcg instead of the task itself
because task may already be dead by the time the bio is issued making
ioc and blkcg inaccessible and those are all block layer cares about.

elevator_set_req_fn() is updated such that the bio elvdata is being
allocated for is available to the elevator.

This doesn't update block cgroup policies yet.  Further patches will
implement the support.

-v2: #ifdef CONFIG_BLK_CGROUP added around bio->bi_ioc dereference in
     rq_ioc() to fix build breakage.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Kent Overstreet <koverstreet@google.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

852c788f

block: add io_context->active_ref · f6e8d01b

由 Tejun Heo 提交于 3月 05, 2012

Currently ioc->nr_tasks is used to decide two things - whether an ioc
is done issuing IOs and whether it's shared by multiple tasks.  This
patch separate out the first into ioc->active_ref, which is acquired
and released using {get|put}_io_context_active() respectively.

This will be used to associate bio's with a given task.  This patch
doesn't introduce any visible behavior change.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f6e8d01b

openeuler / Kernel 大约 1 年 前同步成功

openeuler / Kernel
大约 1 年前同步成功