1. 11 May 2009, 2 commits
    • block: implement and enforce request peek/start/fetch · 9934c8c0
      Tejun Heo authored
      Until now the block layer has allowed two separate modes of request
      execution.  A request is always acquired from the request queue via
      elv_next_request(); after that, drivers are free either to dequeue it
      or to process it without dequeueing.  Dequeueing allows
      elv_next_request() to return the next request, so that multiple
      requests can be in flight.
      
      Executing requests without dequeueing has its merits, mainly that it
      allows drivers for simpler devices which can't do scatter-gather to
      deal with segments only, without considering request boundaries.
      However, the benefit this brings is dubious and declining, while the
      cost of the API ambiguity keeps increasing.  Segment-based drivers
      are usually for very old or limited devices, and since converting
      them to the dequeueing model isn't difficult, the mode doesn't
      justify the API overhead it puts on the block layer and its more
      modern users.
      
      Previous patches converted all block low-level drivers to the
      dequeueing model.  This patch completes the API transition by...
      
      * renaming elv_next_request() to blk_peek_request()
      
      * renaming blkdev_dequeue_request() to blk_start_request()
      
      * adding blk_fetch_request() which is a combination of peek and start
      
      * disallowing completion of queued (not started) requests
      
      * applying new API to all LLDs
      
      The renames are for consistency and to break out-of-tree code, so
      that it's apparent that out-of-tree drivers need updating; a sketch
      of the resulting driver issue loop is included below.
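      
      For illustration only, here is a minimal sketch of a driver issue
      loop under the new API.  blk_peek_request(), blk_start_request() and
      blk_fetch_request() are the functions introduced above; the mydrv_*
      helpers are hypothetical.
      
          static void mydrv_request_fn(struct request_queue *q)
          {
                  struct request *rq;
      
                  /* Peek at the head request without dequeueing it. */
                  while ((rq = blk_peek_request(q)) != NULL) {
                          if (!mydrv_can_issue(rq))
                                  break;  /* leave it queued, retry later */
      
                          /* Dequeue and mark the request started; only
                           * started requests may be completed. */
                          blk_start_request(rq);
                          mydrv_issue(rq);
                  }
          }
      
      When no peek-time check is needed, blk_fetch_request() combines the
      two steps into a single call.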
      
      [ Impact: block request issue API cleanup, no functional change ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Mike Miller <mike.miller@hp.com>
      Cc: unsik Kim <donari75@gmail.com>
      Cc: Paul Clements <paul.clements@steeleye.com>
      Cc: Tim Waugh <tim@cyberelk.net>
      Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Laurent Vivier <Laurent@lvivier.info>
      Cc: Jeff Garzik <jgarzik@pobox.com>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Grant Likely <grant.likely@secretlab.ca>
      Cc: Adrian McMenamin <adrian@mcmen.demon.co.uk>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      Cc: Borislav Petkov <petkovbb@googlemail.com>
      Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
      Cc: Alex Dubov <oakad@yahoo.com>
      Cc: Pierre Ossman <drzeus@drzeus.cx>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Markus Lidel <Markus.Lidel@shadowconnect.com>
      Cc: Stefan Weinhuber <wein@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Pete Zaitcev <zaitcev@redhat.com>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: drop request->hard_* and *nr_sectors · 2e46e8b2
      Tejun Heo authored
      struct request has had a few different ways to represent some
      properties of a request.  The ->hard_* fields represent the block
      layer's view of request progress (the completion cursor), while the
      ones without the prefix are supposed to represent the issue cursor
      and may be updated as necessary by the low-level drivers.  The
      trouble is that, since the block layer supports partial completion,
      the two cursors really aren't necessary and only cause confusion.
      In addition, manual management of request details from the low-level
      drivers is cumbersome and error-prone at the very least.
      
      Another interesting set of duplicate fields is rq->[hard_]nr_sectors
      and rq->{hard_cur|current}_nr_sectors versus rq->data_len and
      rq->bio->bi_size.  This is even more convoluted than the hard_ case.
      
      rq->[hard_]nr_sectors are initialized for requests with a bio, but
      blk_rq_bytes() uses them only for !pc requests.  rq->data_len is
      initialized for all requests, but blk_rq_bytes() uses it only for pc
      requests.  This causes a good amount of confusion throughout the
      block layer and its drivers, and determining the request length has
      been a bit of black magic which may or may not work depending on
      circumstances and on what the specific LLD is actually doing.
      
      rq->{hard_cur|current}_nr_sectors represent the number of sectors in
      the contiguous data area at the front.  This is mainly used by
      drivers which transfer data by walking the request segment by
      segment.  The value always equals rq->bio->bi_size >> 9.  However,
      the data length for pc requests may not be a multiple of 512 bytes,
      and using this field becomes a bit confusing.
      
      In general, having multiple fields to represent the same property
      leads only to confusion and subtle bugs.  With recent block low-level
      driver cleanups, no driver is accessing or manipulating these
      duplicate fields directly.  Drop all the duplicates.  Now rq->sector
      means the current sector, rq->data_len the current total length and
      rq->bio->bi_size the current segment length.  Everything else is
      defined in terms of these three and available only through
      accessors, as sketched below.
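      
      As a sketch of that accessor layer (the helper names are the ones
      this patch discusses; the bodies are the obvious definitions implied
      by the description above and may differ from the in-tree ones in
      detail):
      
          static inline sector_t blk_rq_pos(const struct request *rq)
          {
                  return rq->sector;                      /* current sector */
          }
      
          static inline unsigned int blk_rq_bytes(const struct request *rq)
          {
                  return rq->data_len;                    /* total length */
          }
      
          static inline unsigned int blk_rq_cur_bytes(const struct request *rq)
          {
                  return rq->bio ? rq->bio->bi_size : 0;  /* segment length */
          }
      
          static inline unsigned int blk_rq_sectors(const struct request *rq)
          {
                  return blk_rq_bytes(rq) >> 9;
          }
      
          static inline unsigned int blk_rq_cur_sectors(const struct request *rq)
          {
                  return blk_rq_cur_bytes(rq) >> 9;
          }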
      
      * blk_recalc_rq_sectors() is collapsed into blk_update_request() and
        now handles pc and fs requests equally, apart from the rq->sector
        update.  This means that pc requests can now use partial completion
        too (no in-kernel user yet, though).
      
      * bio_cur_sectors() is replaced with bio_cur_bytes() as block layer
        now uses byte count as the primary data length.
      
      * blk_rq_pos() is now guaranteed to be always correct.  In-block
        users converted.
      
      * blk_rq_bytes() is now guaranteed to be always valid as is
        blk_rq_sectors().  In-block users converted.
      
      * blk_rq_sectors() is now guaranteed to equal blk_rq_bytes() >> 9.
        Whichever is more convenient is used.
      
      * blk_rq_bytes() and blk_rq_cur_bytes() are now inlined and take const
        pointer to request.
      
      [ Impact: API cleanup, single way to represent one property of a request ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  2. 28 Apr 2009, 3 commits
  3. 24 Apr 2009, 1 commit
  4. 15 Apr 2009, 1 commit
  5. 07 Apr 2009, 2 commits
  6. 13 Mar 2009, 1 commit
  7. 02 Feb 2009, 1 commit
  8. 26 Dec 2008, 1 commit
  9. 17 Oct 2008, 1 commit
  10. 09 Oct 2008, 3 commits
    • block: add fault injection mechanism for faking request timeouts · 581d4e28
      Jens Axboe authored
      Only works for the generic request timer handling. Allows one to
      sporadically ignore request completions, thus exercising the timeout
      handling.
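      
      As a rough sketch of how such a mechanism can be wired in (assuming
      a fault-injection predicate along the lines of the
      blk_should_fake_timeout() helper; the exact plumbing is
      illustrative), the completion path can simply drop completions the
      injector selects, so that the request timer fires instead:
      
          void blk_complete_request(struct request *req)
          {
                  /* Sporadically pretend the completion never happened;
                   * the per-queue request timer will expire and the
                   * timeout path will run instead. */
                  if (unlikely(blk_should_fake_timeout(req->q)))
                          return;
                  __blk_complete_request(req);
          }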
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: unify request timeout handling · 242f9dcb
      Jens Axboe authored
      Right now SCSI and others do their own command timeout handling.
      Move those bits to the block layer.
      
      Instead of having a timer per command, we try to be a bit more clever
      and simply have one per queue.  This avoids the overhead of having to
      tear down and set up a timer for each command, so it will result in a
      lot less timer fiddling.
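      
      A minimal sketch of a driver hooking into the unified handling
      (assuming setters along the lines of blk_queue_rq_timeout() and
      blk_queue_rq_timed_out() plus the BLK_EH_* return codes; the mydrv_*
      names are hypothetical):
      
          static enum blk_eh_timer_return mydrv_timed_out(struct request *rq)
          {
                  if (mydrv_cmd_in_flight(rq))
                          return BLK_EH_RESET_TIMER;  /* re-arm, keep waiting */
                  return BLK_EH_NOT_HANDLED;          /* escalate to error handling */
          }
      
          /* at queue setup time */
          blk_queue_rq_timeout(q, 30 * HZ);           /* per-request deadline */
          blk_queue_rq_timed_out(q, mydrv_timed_out); /* called on expiry */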
      Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: add support for IO CPU affinity · c7c22e4d
      Jens Axboe authored
      This patch adds support for controlling the IO completion CPU of
      either all requests on a queue, or on a per-request basis.  We export
      a sysfs variable (rq_affinity) which, if set, migrates completions
      of requests to the CPU that originally submitted them.  A bio helper
      (bio_set_completion_cpu()) is also added, so that queuers can ask
      for completion on a specific CPU.
      
      In testing, this has been shown to cut the system time by as much
      as 20-40% on synthetic workloads where CPU affinity is desired.
      
      This requires a little help from the architecture, so it'll only
      work as designed for archs that are using the new generic smp
      helper infrastructure.
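      
      For the per-request side, a sketch using the bio_set_completion_cpu()
      helper named above (the surrounding submission code is illustrative):
      
          struct bio *bio = bio_alloc(GFP_KERNEL, 1);
          /* ... set bi_bdev/bi_sector and add the data pages ... */
      
          /* Ask for this bio to be completed on the submitting CPU. */
          bio_set_completion_cpu(bio, raw_smp_processor_id());
          submit_bio(WRITE, bio);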
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  11. 03 Jul 2008, 1 commit
  12. 29 Apr 2008, 1 commit
  13. 04 Mar 2008, 1 commit
  14. 30 Jan 2008, 3 commits