Commits · ec24751a6b57e1373a12361e581b2458bc9bb791 · OpenHarmony / kernel_linux

28 Apr, 2009 13 commits

block: implement and use [__]blk_end_request_all() · 40cbbb78

Committed by Tejun Heo on Apr 23, 2009

There are many [__]blk_end_request() call sites which call it with
full request length and expect full completion.  Many of them ensure
that the request actually completes by doing BUG_ON() the return
value, which is awkward and error-prone.

This patch adds [__]blk_end_request_all() which takes @rq and @error
and fully completes the request.  BUG_ON() is added to ensure that
this actually happens.
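
Note: the pattern described above can be sketched in user space as follows. This is a minimal illustration, not the kernel code: struct mock_request and the mock_* functions are hypothetical stand-ins, and assert() stands in for BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for struct request; not the kernel definition. */
struct mock_request {
	unsigned int bytes_left;	/* bytes not yet completed */
	int error;			/* last error passed to completion */
};

/*
 * Mock of blk_end_request(): completes up to nr_bytes and returns true
 * while the request still has data pending, false once it is finished.
 */
static bool mock_end_request(struct mock_request *rq, int error,
			     unsigned int nr_bytes)
{
	rq->error = error;
	if (nr_bytes > rq->bytes_left)
		nr_bytes = rq->bytes_left;
	rq->bytes_left -= nr_bytes;
	return rq->bytes_left != 0;
}

/* Mock of blk_end_request_all(): complete everything, check nothing is left. */
static void mock_end_request_all(struct mock_request *rq, int error)
{
	bool pending = mock_end_request(rq, error, rq->bytes_left);

	assert(!pending);	/* stands in for the kernel's BUG_ON() */
}
```

With such a helper, a call site no longer has to pass the full length and check the return value itself.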

Most conversions are simple but there are a few noteworthy ones.

* cdrom/viocd: viocd_end_request() replaced with direct calls to
  __blk_end_request_all().

* s390/block/dasd: dasd_end_request() replaced with direct calls to
  __blk_end_request_all().

* s390/char/tape_block: tapeblock_end_request() replaced with direct
  calls to blk_end_request_all().

[ Impact: cleanup ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Mike Miller <mike.miller@hp.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>


block: move rq->start_time initialization to blk_rq_init() · b243ddcb

Committed by Tejun Heo on Apr 23, 2009

rq->start_time was initialized in init_request_from_bio() so special
requests didn't have start_time set.  This has been okay as start_time
has been used only for fs requests; however, there is no indication of
whether this actually is the case.  Set rq->start_time in blk_rq_init()
and guarantee that all initialized rq's have their start_time set.  This
improves consistency at virtually no cost and future changes will make
use of the timestamp for !bio requests.

[ Impact: rq->start_time is valid for all requests ]
Signed-off-by: Tejun Heo <tj@kernel.org>

block: clean up request completion API · 2e60e022

Committed by Tejun Heo on Apr 23, 2009

Request completion has gone through several changes and has become a bit
messy over time.  Clean it up.

1. end_that_request_data() is a thin wrapper around
   end_that_request_data_first() which checks whether bio is NULL
   before doing anything and handles bidi completion.
   blk_update_request() is a thin wrapper around
   end_that_request_data() which clears nr_sectors on the last
   iteration but doesn't use the bidi completion.

   Clean it up by moving the initial bio NULL check and nr_sectors
   clearing on the last iteration into end_that_request_data() and
   renaming it to blk_update_request(), which makes blk_end_io() the
   only user of end_that_request_data().  Collapse
   end_that_request_data() into blk_end_io().

2. There are four visible completion variants - blk_end_request(),
   __blk_end_request(), blk_end_bidi_request() and end_request().
   blk_end_request() and blk_end_bidi_request() use blk_end_io()
   as the backend but __blk_end_request() and end_request() use
   a separate implementation in __blk_end_request() due to different
   locking rules.

   blk_end_bidi_request() is identical to blk_end_io().  Collapse
   blk_end_io() into blk_end_bidi_request(), separate out request
   update into internal helper blk_update_bidi_request() and add
   __blk_end_bidi_request().  Redefine [__]blk_end_request() as thin
   inline wrappers around [__]blk_end_bidi_request().

3. As the whole request issue/completion usages are about to be
   modified and audited, it's a good chance to convert completion
   functions to return bool, which better indicates the intended
   meaning of return values.

4. The function name end_that_request_last() is from the days when it
   was a public interface and is slightly confusing.  Give it a proper
   internal name - blk_finish_request().

5. Add a description explaining that blk_end_bidi_request() can be safely
   used for uni requests as suggested by Boaz Harrosh.

The only visible behavior change is from #1.  nr_sectors counts are
cleared after the final iteration no matter which function is used to
complete the request.  I couldn't find any place where the code
assumes those nr_sectors counters contain the values for the last
segment and this change is good as it makes the API much more
consistent as the end result is now same whether a request is
completed using [__]blk_end_request() alone or in combination with
blk_update_request().
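
Note: the behavior change from #1 can be illustrated with a small user-space sketch. The struct and mock_update_request() below are hypothetical simplifications (one length field and one sector counter, 512-byte sectors assumed); they only show that the counters end up cleared after the final iteration and that the bool return means "still pending".

```c
#include <stdbool.h>

/* Simplified stand-in for struct request; not the kernel definition. */
struct mock_request {
	unsigned int data_len;		/* bytes still to transfer */
	unsigned int nr_sectors;	/* sectors still to transfer */
};

/*
 * Mock of blk_update_request(): account nr_bytes of progress.
 * Returns true if the request still has pending data, false when done.
 */
static bool mock_update_request(struct mock_request *rq, unsigned int nr_bytes)
{
	if (nr_bytes >= rq->data_len) {
		/* Final iteration: clear the counters unconditionally. */
		rq->data_len = 0;
		rq->nr_sectors = 0;
		return false;
	}
	rq->data_len -= nr_bytes;
	rq->nr_sectors = rq->data_len >> 9;	/* 512-byte sectors */
	return true;
}
```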

API further cleaned up per Christoph's suggestion.

[ Impact: cleanup, rq->*nr_sectors always updated after req completion ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: Christoph Hellwig <hch@infradead.org>


block: kill blk_end_request_callback() · 0b302d5a

Committed by Tejun Heo on Apr 23, 2009

With recent IDE updates, blk_end_request_callback() doesn't have any
user now.  Kill it.

[ Impact: removal of unused convoluted interface ]
Signed-off-by: Tejun Heo <tj@kernel.org>

block: reorganize request fetching functions · 158dbda0

Committed by Tejun Heo on Apr 23, 2009

Impact: code reorganization

elv_next_request() and elv_dequeue_request() are public block layer
interfaces rather than actual elevator implementation.  They mostly deal
with how requests interact with the block layer and low level drivers at
the beginning of request processing whereas __elv_next_request() is the
actual elevator request fetching interface.

Move the two functions to blk-core.c.  This prepares for further
interface cleanup.
Signed-off-by: Tejun Heo <tj@kernel.org>

block: reorder request completion functions · 5efccd17

Committed by Tejun Heo on Apr 23, 2009

Reorder request completion functions such that

* All request completion functions are located together.

* Functions which are used by only one caller are put right above the
  caller.

* end_request() is put after other completion functions but before
  blk_update_request().

This change is for completion function cleanup which will follow.

[ Impact: cleanup, code reorganization ]
Signed-off-by: Tejun Heo <tj@kernel.org>

block: clean up misc stuff after block layer timeout conversion · 2eef33e4

Committed by Tejun Heo on Apr 23, 2009

* In blk_rq_timed_out_timer(), else { if } to else if

* In blk_add_timer(), simplify if/else block

[ Impact: cleanup ]
Signed-off-by: Tejun Heo <tj@kernel.org>

block: cleanup REQ_SOFTBARRIER usages · 10732f56

Committed by Tejun Heo on Apr 23, 2009

blk_insert_request() doesn't need to worry about REQ_SOFTBARRIER.
Don't set it.  Combined with recent IDE updates, REQ_SOFTBARRIER is
now only used in elevator proper and for discard requests.

[ Impact: cleanup ]
Signed-off-by: Tejun Heo <tj@kernel.org>

block: don't set REQ_NOMERGE unnecessarily · e4025f6c

Committed by Tejun Heo on Apr 23, 2009

RQ_NOMERGE_FLAGS already defines which REQ flags aren't
mergeable.  There is no reason to specify it superfluously.  It only
adds to confusion.  Don't set REQ_NOMERGE for barriers and requests
with specific queueing directive.  REQ_NOMERGE is now exclusively used
by the merging code.
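
Note: a minimal sketch of the mask-based check this message relies on. The MOCK_* bit values and the exact set of flags in the mask are assumptions made for illustration; the point is only that a barrier flag already makes a request non-mergeable without also setting a NOMERGE bit.

```c
#include <stdbool.h>

/* Hypothetical flag bits; the real values live in the kernel headers. */
#define MOCK_REQ_NOMERGE	(1u << 0)
#define MOCK_REQ_STARTED	(1u << 1)
#define MOCK_REQ_HARDBARRIER	(1u << 2)
#define MOCK_REQ_SOFTBARRIER	(1u << 3)

/* Every flag that on its own rules out merging (assumed membership). */
#define MOCK_RQ_NOMERGE_FLAGS \
	(MOCK_REQ_NOMERGE | MOCK_REQ_STARTED | \
	 MOCK_REQ_HARDBARRIER | MOCK_REQ_SOFTBARRIER)

/* A request is mergeable only if none of the mask bits is set. */
static bool mock_rq_mergeable(unsigned int cmd_flags)
{
	return !(cmd_flags & MOCK_RQ_NOMERGE_FLAGS);
}
```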

[ Impact: cleanup ]
Signed-off-by: Tejun Heo <tj@kernel.org>

block: kill blk_start_queueing() · a7f55792

Committed by Tejun Heo on Apr 23, 2009

blk_start_queueing() is identical to __blk_run_queue() except that it
doesn't check for recursion.  None of the current users depends on
blk_start_queueing() running request_fn directly.  Replace usages of
blk_start_queueing() with [__]blk_run_queue() and kill it.
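
Note: the recursion check mentioned above can be sketched as a simple re-entrancy guard. Everything here is hypothetical (struct mock_queue, the flag names, and the choice to record a deferred run); it only shows why a guarded run function does not call request_fn recursively.

```c
#include <stdbool.h>

/* Simplified stand-in for a request queue; not the kernel definition. */
struct mock_queue {
	bool in_request_fn;	/* set while request_fn is running */
	bool deferred_run;	/* a re-entrant run was pushed to later */
	bool reenter;		/* test knob: request_fn re-runs the queue */
	int runs;		/* how many times request_fn actually ran */
};

static void mock_run_queue(struct mock_queue *q);

/* Mock request_fn: a driver may try to run the queue again from here. */
static void mock_request_fn(struct mock_queue *q)
{
	q->runs++;
	if (q->reenter)
		mock_run_queue(q);
}

/* Guarded run: never invokes request_fn recursively. */
static void mock_run_queue(struct mock_queue *q)
{
	if (q->in_request_fn) {
		q->deferred_run = true;	/* assumed policy: defer instead */
		return;
	}
	q->in_request_fn = true;
	mock_request_fn(q);
	q->in_request_fn = false;
}
```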

[ Impact: removal of mostly duplicate interface function ]
Signed-off-by: Tejun Heo <tj@kernel.org>

block: merge blk_invoke_request_fn() into __blk_run_queue() · a538cd03

Committed by Tejun Heo on Apr 23, 2009

__blk_run_queue() wraps blk_invoke_request_fn() such that it
additionally removes plug and bails out early if the queue is empty.
Both extra operations have their own pending mechanisms and don't
cause any harm correctness-wise when they are done superfluously.

The only user of blk_invoke_request_fn() being blk_start_queue(),
there isn't much reason to keep both functions around.  Merge
blk_invoke_request_fn() into __blk_run_queue() and make
blk_start_queue() use __blk_run_queue() instead.

[ Impact: merge two subtly different internal functions ]
Signed-off-by: Tejun Heo <tj@kernel.org>

block: enable by default support for large devices and files on 32-bit archs · db29a6b4

Committed by Bartlomiej Zolnierkiewicz on Apr 21, 2009

Enable by default support for large devices and files (CONFIG_LBD):

- With 1TB disks being commodity hardware it is quite easy to hit the 2TB
  limitation while building RAIDs etc. and many distros have been using
  CONFIG_LBD=y by default already (at least Fedora 10 and openSUSE 11.1).

- This should also prevent a subtle ext4 filesystem compatibility issue:
  mke2fs.ext4 defaults to creating filesystems with huge_files feature
  enabled and such filesystems cannot be later mounted read-write on
  machines with CONFIG_LBD=n (it should be quite easy to hit this issue
  when trying to use filesystem created using distro kernel on system
  running the self-build kernel, think about USB disk enclosures & co.).
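
Note: the 2TB figure follows from a 32-bit sector index combined with 512-byte sectors, which is the assumption made in this small sketch (max_device_bytes() is a hypothetical helper, valid for sector widths up to 54 bits so the result fits in 64 bits).

```c
#include <stdint.h>

#define MOCK_SECTOR_SIZE 512u	/* bytes per sector (assumed) */

/*
 * Largest device size, in bytes, addressable with a sector index of
 * the given width.  Only meaningful for sector_bits <= 54.
 */
static uint64_t max_device_bytes(unsigned int sector_bits)
{
	return ((uint64_t)1 << sector_bits) * MOCK_SECTOR_SIZE;
}
```

For a 32-bit index this gives 2^32 * 512 = 2^41 bytes, i.e. 2 TiB.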

While at it:

- Clarify config option help text w.r.t. mounting ext4 filesystems
  (they can be mounted with CONFIG_LBD=n but in the read-only mode).

Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: clear req->errors on bio completion only for fs requests · 924cec77

Committed by Tejun Heo on Apr 19, 2009

Impact: subtle behavior change

For fs requests, rq is only carrier of bios and rq error status as a
whole doesn't mean much.  This is the reason why rq->errors is being
cleared on each partial completion of a request as on each partial
completion the error status is transferred to the respective bios.

For pc requests, rq->errors is used to carry error status to the
issuer and thus __end_that_request_first() doesn't clear it on such
cases.

The condition was fine till now as only fs and pc requests have used
bio and thus the bio completion path.  However, future changes will
unify data accesses to bio and all non fs users care about rq error
status.  Clear rq->errors on bio completion only for fs requests.
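
Note: a minimal sketch of the resulting rule, with hypothetical type and field names (enum mock_cmd_type, struct mock_request). It only shows the condition, not the surrounding completion path.

```c
/* Hypothetical request kinds; the kernel uses its own cmd_type values. */
enum mock_cmd_type {
	MOCK_TYPE_FS,		/* filesystem request, carrier of bios */
	MOCK_TYPE_BLOCK_PC,	/* packet command, issuer reads rq errors */
	MOCK_TYPE_SPECIAL	/* other non-fs request */
};

struct mock_request {
	enum mock_cmd_type cmd_type;
	int errors;
};

/* Called on each (partial) bio completion of the request. */
static void mock_bio_completion(struct mock_request *rq)
{
	/* Only fs requests hand their error status over to the bios. */
	if (rq->cmd_type == MOCK_TYPE_FS)
		rq->errors = 0;
}
```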

In general, the implicit clearing is a bit too subtle especially as
the meaning of rq->errors is completely dependent on low level
drivers.  Unifying / cleaning up rq->errors usage and letting llds
manage it would be better.  TODO comment added.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Jens Axboe <axboe@kernel.dk>

24 Apr, 2009 5 commits

cfq-iosched: cache prio_tree root in cfqq->p_root · f2d1f0ae

Committed by Jens Axboe on Apr 23, 2009

Currently we look it up from ->ioprio, but ->ioprio can change if
either the process gets its IO priority changed explicitly, or if
cfq decides to temporarily boost it. So if we are unlucky, we can
end up attempting to remove a node from a different rbtree root than
where it was added.

Fix this by using ->org_ioprio as the prio_tree index, since that
will only change for explicit IO priority settings (not for a boost).
Additionally cache the rbtree root inside the cfqq, then we don't have
to add code to reinsert the cfqq in the prio_tree if IO priority changes.
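
Note: the idea of caching the root can be sketched as below. The trees are reduced to node counters and all mock_* names are hypothetical; the sketch only shows that removal uses the cached root, so a temporary ->ioprio boost between add and remove cannot send the removal to a different tree.

```c
#include <stddef.h>

#define MOCK_NR_PRIO 8

/* A tree reduced to a node count, for illustration only. */
struct mock_tree {
	int nr_nodes;
};

struct mock_cfqq {
	int ioprio;		/* may be boosted temporarily */
	int org_ioprio;		/* changes only on explicit settings */
	struct mock_tree *p_root; /* root this queue was inserted into */
};

static struct mock_tree mock_prio_trees[MOCK_NR_PRIO];

/* Index by org_ioprio and remember which root was used. */
static void mock_prio_tree_add(struct mock_cfqq *q)
{
	q->p_root = &mock_prio_trees[q->org_ioprio];
	q->p_root->nr_nodes++;
}

/* Remove from the cached root, whatever ioprio says by now. */
static void mock_prio_tree_remove(struct mock_cfqq *q)
{
	if (q->p_root) {
		q->p_root->nr_nodes--;
		q->p_root = NULL;
	}
}
```
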
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

cfq-iosched: fix bug with aliased request and cooperation detection · 3ac6c9f8

Committed by Jens Axboe on Apr 23, 2009

cfq_prio_tree_lookup() should return the direct match, yet it always
returns zero. Fix that.

cfq_prio_tree_add() assumes that we don't get a direct match, while
it is very possible that we do. Using O_DIRECT, you can have different
cfqq with matching requests, since you don't have the page cache
to serialize things for you. Fix this bug by only adding the cfqq if
there isn't an existing match.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

cfq-iosched: clear ->prio_trees[] on cfqd alloc · 26a2ac00

Committed by Jens Axboe on Apr 23, 2009

Not strictly needed, but we should make it clear that we init the
rbtree roots here.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: fix intermittent dm timeout based oops · 17d5c8ca

Committed by Hannes Reinecke on Apr 23, 2009

Very rarely under stress testing of dm, oopses are occurring as
something tampers with an old stack frame.  This has been traced back
to blk_abort_queue() leaving a timeout_list pointing to the stack.
The reason is that sometimes blk_abort_request() won't delete the
timer (if the request is marked as complete but before the timer has
been removed, a small race window).  Fix this by splicing back from
the usually empty list to the q->timeout_list.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: simplify I/O stat accounting · 42dad764

Committed by Jerome Marchand on Apr 22, 2009

This simplifies I/O stat accounting switching code and separates it
completely from I/O scheduler switch code.

Requests are accounted according to the state of their request queue
at the time of the request allocation. There is no need anymore to
flush the request queue when switching I/O accounting state.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

22 Apr, 2009 6 commits

cfq-iosched: use the default seek distance when there aren't enough seek samples · 04dc6e71

Committed by Jeff Moyer on Apr 21, 2009

If the cfq io context doesn't have enough samples yet to provide a mean
seek distance, then use the default threshold we have for seeky IO instead
of defaulting to 0.
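
Note: a minimal sketch of the fallback described above. The sample threshold (80) and the default seek threshold (8 * 1024 sectors) are assumed values chosen for illustration, and struct mock_io_context is a hypothetical stand-in for the cfq io context.

```c
#include <stdint.h>

#define MOCK_MIN_SEEK_SAMPLES	80		/* assumed sample threshold */
#define MOCK_DEFAULT_SEEK_THR	(8 * 1024)	/* assumed default, in sectors */

struct mock_io_context {
	unsigned int seek_samples;	/* number of seek samples gathered */
	uint64_t seek_mean;		/* mean seek distance, in sectors */
};

/* Use the measured mean only once enough samples back it up. */
static uint64_t mock_seek_distance(const struct mock_io_context *cic)
{
	if (cic->seek_samples > MOCK_MIN_SEEK_SAMPLES)
		return cic->seek_mean;
	return MOCK_DEFAULT_SEEK_THR;
}
```
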
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

cfq-iosched: make seek_mean converge more quickly · 4d00aa47

Committed by Jeff Moyer on Apr 21, 2009

Right now, depending on the first sector to which a process issues I/O,
the seek time may start out way out of whack. So make sure we start
with 0 sectors in seek, instead of the offset of the first request
issued.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: make blk_abort_queue() ignore non-request based devices · b7591134

Committed by Jens Axboe on Apr 17, 2009

There's nothing to do for those devices, since the timeout handling is
based on requests.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: include empty disks in /proc/diskstats · 71982a40

Committed by Tejun Heo on Apr 17, 2009

/proc/diskstats used to show stats for all disks whether they're
zero-sized or not and their non-zero partitions.  Commit 074a7aca
accidentally changed the behavior such that it doesn't print out
zero-sized disks.  This patch implements the DISK_PITER_INCL_EMPTY_PART0
flag for the partition iterator and uses it in diskstats_show() such
that empty part0 is shown in /proc/diskstats.

Reported and bisected by Daniel Collins.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Daniel Collins <solemnwarning@solemnwarning.no-ip.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: fix queue bounce limit setting · cd0aca2d

Committed by Tejun Heo on Apr 15, 2009

Impact: don't set GFP_DMA in q->bounce_gfp unnecessarily

All DMA address limits are expressed in terms of the last addressable
unit (byte or page) instead of one plus that.  However, when
determining bounce_gfp for 64bit machines in blk_queue_bounce_limit(),
it compares the specified limit against 0x100000000UL to determine
whether it's below 4G ending up falsely setting GFP_DMA in
q->bounce_gfp.
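
Note: the off-by-one can be shown with a small sketch that compares page frame numbers. The function names are hypothetical, a 4 KiB page size is assumed, and the sketch ignores any other terms the real comparison may involve; it contrasts a limit written as "one plus the last address" with one written as "the last address".

```c
#include <stdbool.h>
#include <stdint.h>

#define MOCK_PAGE_SHIFT 12	/* 4 KiB pages (assumed) */

/* Old check: compares against one plus the last byte below 4G. */
static bool needs_gfp_dma_old(uint64_t dma_mask)
{
	uint64_t b_pfn = dma_mask >> MOCK_PAGE_SHIFT;

	return b_pfn < (UINT64_C(0x100000000) >> MOCK_PAGE_SHIFT);
}

/* Fixed check: compares against the last addressable byte below 4G. */
static bool needs_gfp_dma_fixed(uint64_t dma_mask)
{
	uint64_t b_pfn = dma_mask >> MOCK_PAGE_SHIFT;

	return b_pfn < (UINT64_C(0xffffffff) >> MOCK_PAGE_SHIFT);
}
```

With a full 32-bit mask (0xffffffff) the old check still asks for the DMA zone, while the fixed one does not; a narrower mask such as 0x00ffffff asks for it in both.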

As DMA zone is very small on x86_64, this makes larger SG_IO transfers
very eager to trigger OOM killer.  Fix it.  While at it, rename the
parameter to @dma_mask for clarity and convert comment to proper
winged style.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: fix SG_IO vector request data length handling · 25636e28

Committed by Tejun Heo on Apr 15, 2009

Impact: fix SG_IO behavior such that it matches the documentation

SG_IO howto says that if ->dxfer_len and sum of iovec disagree, the
shorter one wins.  However, the current implementation returns -EINVAL
for such cases.  Trim iovec if it's longer than ->dxfer_len.

This patch uses iov_*() helpers which take struct iovec * by casting
struct sg_iovec * to it.  sg_iovec is always identical to iovec and
this will be further cleaned up with later patches.
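
Note: the "shorter one wins" trimming can be sketched in user space with struct iovec from <sys/uio.h>. trim_iovec() is a hypothetical helper written for this note, not one of the kernel's iov_*() helpers.

```c
#include <stddef.h>
#include <sys/uio.h>

/*
 * Trim iov[0..count) so that it describes at most max_len bytes.
 * Returns the number of entries that remain in use.
 */
static int trim_iovec(struct iovec *iov, int count, size_t max_len)
{
	size_t total = 0;
	int i;

	for (i = 0; i < count; i++) {
		if (total + iov[i].iov_len >= max_len) {
			/* This entry reaches the limit: shorten it. */
			iov[i].iov_len = max_len - total;
			return iov[i].iov_len ? i + 1 : i;
		}
		total += iov[i].iov_len;
	}
	return count;	/* the vector was already short enough */
}
```
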
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

15 Apr, 2009 11 commits

cfq-iosched: add close cooperator code · a36e71f9

Committed by Jens Axboe on Apr 15, 2009

If we have processes that are working in close proximity to each
other on disk, we don't want to idle wait. Instead allow the close
process to issue a request, getting better aggregate bandwidth.
The anticipatory scheduler has similar checks, noop and deadline do
not need it since they don't care about process <-> io mappings.

The code for CFQ is a little more involved though, since we split
request queues into per-process contexts.

This fixes a performance problem with e.g. dump(8), since it uses
several processes in some silly attempt to speed IO up. Even if
dump(8) isn't really a valid case (it should be fixed by using
CLONE_IO), there are other cases where we see close processes
and where idling ends up hurting performance.

Credit goes to Jeff Moyer <jmoyer@redhat.com> for writing the
initial implementation.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

cfq-iosched: log responsible 'cfqq' in idle timer arm · 9481ffdc

Committed by Jens Axboe on Apr 15, 2009

Makes it easier to read the traces.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

cfq-iosched: tweak kick logic a bit more · 2d870722

Committed by Jens Axboe on Apr 15, 2009

We only kick the dispatch for an idling queue, if we think it's a
(somewhat) fully merged request. Also allow a kick if we have other
busy queues in the system, since we don't want to risk waiting for
a potential merge in that case. It's better to get some work done and
proceed.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

cfq-iosched: no need to save interrupts in cfq_kick_queue() · 40bb54d1

Committed by Jens Axboe on Apr 15, 2009

It's called from the workqueue handlers from process context, so
we always have irqs enabled when entered.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: Remove code handling bio_alloc failure with __GFP_WAIT · 15afd1cc

Committed by Nikanth Karthikesan on Apr 15, 2009

Remove code handling bio_alloc failure with __GFP_WAIT.
GFP_KERNEL implies __GFP_WAIT.
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: fix SG_IO to return a proper error value · 91e463c8

Committed by FUJITA Tomonori on Apr 13, 2009

blk_rq_unmap_user() returns -EFAULT if a program passes an invalid
address to the kernel. The SG_IO path needs to pass the returned value
to user space instead of ignoring it.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

cfq-iosched: don't delay queue kick for a merged request · d6ceb25e

Committed by Jens Axboe on Apr 14, 2009

"Zhang, Yanmin" <yanmin_zhang@linux.intel.com> reports that commit
b029195d introduced a regression of about 50% with sequential threaded
read workloads. The test case is:

tiotest -k0 -k1 -k3 -f 80 -t 32

which starts 32 threads each reading an 80MB file. Twiddle the kick
queue logic so that we do start IO immediately, if it appears to be
a fully merged request. We can't really detect that, so just check
if the request is bigger than a page or not. The assumption is that
since single bio issues will first queue a single request with just
one page attached and then later do merges on that, if we already
have more than a page worth of data in the request, then the request
is most likely good to go.
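
Note: the heuristic described above amounts to a single size comparison. The sketch below assumes a 4 KiB page and uses a hypothetical function name; it is not the CFQ code itself.

```c
#include <stdbool.h>

#define MOCK_PAGE_SIZE 4096u	/* assumed page size */

/*
 * A request that already carries more than one page of data has most
 * likely been merged, so start IO right away instead of waiting.
 */
static bool mock_should_kick_now(unsigned int rq_bytes)
{
	return rq_bytes > MOCK_PAGE_SIZE;
}
```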

Verified that this doesn't cause a regression with the test case that
commit b029195d was fixing. It does not, we still see maximum sized
requests for the queue-then-merge cases.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

as-iosched: get rid of private REQ_SYNC/REQ_ASYNC defines · 1d6bfbdf

Committed by Jens Axboe on Apr 08, 2009

We can just use the block layer BLK_RW_SYNC/ASYNC defines now.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

cfq-iosched: get rid of private SYNC/ASYNC defines · ff6657c6

Committed by Jens Axboe on Apr 08, 2009

We can just use the block layer BLK_RW_SYNC/ASYNC defines now.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

cfq-iosched: use rw_is_sync() to see if rw flags are sync or not · b0b78f81

Committed by Jens Axboe on Apr 08, 2009

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: fix bad spelling of quiesce · f600abe2

Committed by Jens Axboe on Apr 08, 2009

Credit goes to Andrew Morton for spotting this one.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

07 Apr, 2009 5 commits

cfq-iosched: don't let idling interfere with plugging · b029195d

Committed by Jens Axboe on Apr 07, 2009

When CFQ is waiting for a new request from a process, currently it'll
immediately restart queuing when it sees such a request. This doesn't
work very well with streamed IO, since we then end up splitting IO
that would otherwise have been merged nicely. For a simple dd test,
this causes 10x as many requests to be issued as we should have.
Normally this goes unnoticed due to the low overhead of requests
at the device side, but some hardware is very sensitive to request
sizes and there it can cause big slowdowns.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: remove unused REQ_UNPLUG · 23853277

Committed by Jens Axboe on Apr 07, 2009

The request inherits the unplug flag from the bio, but it isn't actually
used. The bio flag stops at __make_request(), which tells it to unplug
after submission. Passing it on to the request doesn't make any sense.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

cfq-iosched: kill two unused cfqq flags · 75e50984

Committed by Jens Axboe on Apr 07, 2009

We only manipulate the must_dispatch and queue_new flags, they are not
tested anymore. So get rid of them.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

cfq-iosched: change dispatch logic to deal with single requests at the time · 2f5cb738

Committed by Jens Axboe on Apr 07, 2009

The IO scheduler core calls into the IO scheduler dispatch_request hook
to move requests from the IO scheduler and into the driver dispatch
list. It only does so when the dispatch list is empty. CFQ moves several
requests to the dispatch list, which can cause higher latencies if we
suddenly have to switch to some important sync IO. Change the logic to
move one request at a time instead.

This should almost be functionally equivalent to what we did before,
except that we now honor 'quantum' as the maximum queue depth at the
device side from any single cfqq. If there's just a single active
cfqq, we allow up to 4 times the normal quantum.
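
Note: the depth limit described above can be sketched as one predicate. The function and parameter names are hypothetical; the sketch only encodes "quantum per cfqq, up to 4 times quantum when a single cfqq is active".

```c
#include <stdbool.h>

/*
 * May a cfqq that already has `dispatched` requests at the device
 * dispatch one more?  `busy_queues` counts active cfqqs.
 */
static bool mock_may_dispatch(unsigned int dispatched, unsigned int quantum,
			      unsigned int busy_queues)
{
	unsigned int limit = quantum;

	/* A lone active queue is allowed a deeper device queue. */
	if (busy_queues <= 1)
		limit = 4 * quantum;

	return dispatched < limit;
}
```
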
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: fix inconsistency in I/O stat accounting code · 26308eab

Committed by Jerome Marchand on Mar 27, 2009

This forces in_flight to be zero when turning off or on the I/O stat
accounting and stops updating I/O stats in attempt_merge() when
accounting is turned off.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

OpenHarmony / kernel_linux, last synced more than 3 years ago
