1. 28 April 2009, 3 commits
    • block: merge blk_invoke_request_fn() into __blk_run_queue() · a538cd03
      Tejun Heo committed
      __blk_run_queue() wraps blk_invoke_request_fn() such that it
      additionally removes the plug and bails out early if the queue is
      empty.  Both extra operations have their own pending mechanisms and
      cause no correctness problems when performed superfluously.
      
      Since blk_start_queue() is the only user of blk_invoke_request_fn(),
      there isn't much reason to keep both functions around.  Merge
      blk_invoke_request_fn() into __blk_run_queue() and make
      blk_start_queue() use __blk_run_queue() instead.
      
      [ Impact: merge two subtly different internal functions ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      a538cd03
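      A minimal sketch of the merged helper, assuming the queue API of that era
      (blk_remove_plug(), elv_queue_empty(), blk_queue_stopped(), q->request_fn());
      this illustrates the idea rather than the literal post-merge kernel code, and
      the *_sketch names are hypothetical:

      #include <linux/blkdev.h>

      /* Illustrative sketch only -- simplified from the description above. */
      void __blk_run_queue_sketch(struct request_queue *q)
      {
              blk_remove_plug(q);                 /* extra op folded in */

              if (unlikely(blk_queue_stopped(q)))
                      return;

              if (elv_queue_empty(q))             /* bail out early if idle */
                      return;

              q->request_fn(q);                   /* what blk_invoke_request_fn() did */
      }

      /* blk_start_queue() then just clears the stopped flag and runs the queue
       * (called with the queue lock held, as before). */
      void blk_start_queue_sketch(struct request_queue *q)
      {
              queue_flag_clear(QUEUE_FLAG_STOPPED, q);
              __blk_run_queue_sketch(q);
      }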
    • block: enable by default support for large devices and files on 32-bit archs · db29a6b4
      Bartlomiej Zolnierkiewicz committed
      Enable by default support for large devices and files (CONFIG_LBD):
      
      - With 1TB disks being commodity hardware, it is quite easy to hit the
        2TB limitation while building RAIDs etc., and many distros have been
        using CONFIG_LBD=y by default already (at least Fedora 10 and
        openSUSE 11.1).
      
      - This should also prevent a subtle ext4 filesystem compatibility issue:
        mke2fs.ext4 defaults to creating filesystems with the huge_files
        feature enabled, and such filesystems cannot later be mounted
        read-write on machines with CONFIG_LBD=n (it is quite easy to hit
        this issue when trying to use a filesystem created with a distro
        kernel on a system running a self-built kernel, think of USB disk
        enclosures & co.).
      
      While at it:
      
      - Clarify the config option help text w.r.t. mounting ext4 filesystems
        (they can be mounted with CONFIG_LBD=n, but only read-only).
      
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      db29a6b4
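      For context, a rough sketch of why CONFIG_LBD matters on 32-bit: in kernels
      of that era the width of sector_t (and blkcnt_t) depended on it, roughly as
      below (written from memory, not a verbatim copy of include/linux/types.h):

      /* Approximate shape of the type definitions gated by CONFIG_LBD. */
      #ifdef CONFIG_LBD
      typedef u64 sector_t;              /* 64-bit sector numbers: > 2TB devices OK */
      typedef u64 blkcnt_t;              /* 64-bit block counts: huge files OK */
      #else
      typedef unsigned long sector_t;    /* 32 bits on 32-bit archs: ~2TB ceiling */
      typedef unsigned long blkcnt_t;
      #endif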
    • block: clear req->errors on bio completion only for fs requests · 924cec77
      Tejun Heo committed
      Impact: subtle behavior change
      
      For fs requests, rq is only a carrier of bios, and the rq error status
      as a whole doesn't mean much.  This is why rq->errors is cleared on
      each partial completion of a request: at each partial completion the
      error status is transferred to the respective bios.
      
      For pc requests, rq->errors is used to carry the error status to the
      issuer, and thus __end_that_request_first() doesn't clear it in such
      cases.
      
      This distinction was fine till now, as only fs and pc requests have
      used bios and thus the bio completion path.  However, future changes
      will unify data access through bios, and all non-fs users care about
      the rq error status.  Clear rq->errors on bio completion only for fs
      requests.
      
      In general, the implicit clearing is a bit too subtle, especially as
      the meaning of rq->errors is completely dependent on the low level
      drivers.  Unifying / cleaning up rq->errors usage and letting the llds
      manage it would be better.  A TODO comment is added.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Jens Axboe <axboe@kernel.dk>
      924cec77
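      A minimal sketch of the check described above, as it might sit in the
      bio completion path (blk_fs_request() is the request-type test of that
      era; the surrounding function is elided and the snippet is illustrative,
      not the literal patch):

      /* Inside the per-bio completion loop: */
      if (blk_fs_request(rq))
              rq->errors = 0;   /* fs request: error already moved to the bios */
      /* pc requests keep rq->errors so the issuer can read the status. */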
2. 24 April 2009, 5 commits
    • cfq-iosched: cache prio_tree root in cfqq->p_root · f2d1f0ae
      Jens Axboe committed
      Currently we look up the prio_tree root from ->ioprio, but ->ioprio
      can change if either the process gets its IO priority changed
      explicitly, or if cfq decides to temporarily boost it.  So if we are
      unlucky, we can end up attempting to remove a node from a different
      rbtree root than the one it was added to.
      
      Fix this by using ->org_ioprio as the prio_tree index, since that
      only changes for explicit IO priority settings (not for a boost).
      Additionally, cache the rbtree root inside the cfqq, so we don't have
      to add code to reinsert the cfqq in the prio_tree if the IO priority
      changes.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      f2d1f0ae
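      A hedged sketch of the two ideas (index by ->org_ioprio, cache the root in
      the cfqq); field and helper names follow cfq-iosched.c of that time, but the
      code below is illustrative, not the actual patch:

      /* On insert: pick the root from the stable org_ioprio and remember it. */
      static void cfq_prio_tree_add_sketch(struct cfq_data *cfqd,
                                           struct cfq_queue *cfqq)
      {
              struct rb_root *root = &cfqd->prio_trees[cfqq->org_ioprio];

              /* ... rb_link_node() / rb_insert_color() of cfqq->p_node ... */
              cfqq->p_root = root;                /* cache for later removal */
      }

      /* On removal: erase from the cached root, not one derived from ->ioprio. */
      static void cfq_prio_tree_remove_sketch(struct cfq_queue *cfqq)
      {
              if (cfqq->p_root) {
                      rb_erase(&cfqq->p_node, cfqq->p_root);
                      cfqq->p_root = NULL;
              }
      }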
    • cfq-iosched: fix bug with aliased request and cooperation detection · 3ac6c9f8
      Jens Axboe committed
      cfq_prio_tree_lookup() should return the direct match, yet it always
      returns zero.  Fix that.
      
      cfq_prio_tree_add() assumes that we don't get a direct match, while it
      is very possible that we do.  Using O_DIRECT, you can have different
      cfqqs with matching requests, since you don't have the page cache to
      serialize things for you.  Fix this bug by only adding the cfqq if
      there isn't an existing match.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      3ac6c9f8
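      A sketch of the fixed add path: the lookup now returns the direct match and
      the add bails out instead of inserting an alias (root, parent, p and __cfqq
      are locals of cfq_prio_tree_add(); the lookup signature is approximate):

      /* Sketch of the relevant part of cfq_prio_tree_add() after the fix: */
      __cfqq = cfq_prio_tree_lookup(cfqd, root, cfqq->next_rq->sector,
                                    &parent, &p);
      if (__cfqq)
              return;          /* direct (aliased) match already in the tree */

      rb_link_node(&cfqq->p_node, parent, p);
      rb_insert_color(&cfqq->p_node, root);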
    • cfq-iosched: clear ->prio_trees[] on cfqd alloc · 26a2ac00
      Jens Axboe committed
      Not strictly needed, but we should make it clear that we init the
      rbtree roots here.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      26a2ac00
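      The corresponding initialization is tiny; a sketch of what it might look
      like in cfq_init_queue() (CFQ_PRIO_LISTS and RB_ROOT are the existing kernel
      names; illustrative only):

      /* Explicitly reset every per-priority prio_tree root at alloc time. */
      for (i = 0; i < CFQ_PRIO_LISTS; i++)
              cfqd->prio_trees[i] = RB_ROOT;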
    • block: fix intermittent dm timeout based oops · 17d5c8ca
      Hannes Reinecke committed
      Very rarely under stress testing of dm, oopses are occurring as
      something tampers with an old stack frame.  This has been traced back
      to blk_abort_queue() leaving a timeout_list pointing to the stack.
      The reason is that sometimes blk_abort_request() won't delete the
      timer (a small race window where the request is marked complete before
      the timer has been removed).  Fix this by splicing the usually empty
      local list back onto q->timeout_list.
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      17d5c8ca
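      A simplified sketch of the flow in blk_abort_queue() with the fix applied:
      entries are spliced to a local list for iteration, and whatever remains is
      spliced back so no request is left pointing at the on-stack list head
      (illustrative, not the exact function body):

      void blk_abort_queue_sketch(struct request_queue *q)
      {
              unsigned long flags;
              struct request *rq, *tmp;
              LIST_HEAD(list);                     /* on-stack list head */

              spin_lock_irqsave(q->queue_lock, flags);

              elv_abort_queue(q);
              list_splice_init(&q->timeout_list, &list);

              list_for_each_entry_safe(rq, tmp, &list, timeout_list)
                      blk_abort_request(rq);       /* may race and leave an entry */

              /* The fix: put any leftovers back so they don't reference the stack. */
              list_splice(&list, &q->timeout_list);

              spin_unlock_irqrestore(q->queue_lock, flags);
      }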
    • block: simplify I/O stat accounting · 42dad764
      Jerome Marchand committed
      This simplifies the I/O stat accounting switching code and separates
      it completely from the I/O scheduler switch code.
      
      Requests are accounted according to the state of their request queue
      at the time of request allocation.  There is no longer any need to
      flush the request queue when switching the I/O accounting state.
      Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      42dad764
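      A sketch of the scheme: the queue's accounting state is copied into the
      request's flags at allocation, and accounting then only consults the request
      (REQ_IO_STAT and blk_queue_io_stat() are the kernel names of that era; the
      *_sketch helper name is hypothetical):

      /* At request allocation time, snapshot the queue's accounting state: */
      if (blk_queue_io_stat(q))
              rq->cmd_flags |= REQ_IO_STAT;

      /* Later checks test the request, not the queue, so flipping the iostats
       * sysfs knob never requires flushing in-flight requests: */
      static inline int blk_do_io_stat_sketch(struct request *rq)
      {
              return rq->cmd_flags & REQ_IO_STAT;
      }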
3. 22 April 2009, 6 commits
4. 15 April 2009, 11 commits
5. 07 April 2009, 7 commits
6. 06 April 2009, 3 commits
7. 03 April 2009, 1 commit
    • blktrace: fix pdu_len when tracing packet command requests · e2494e1b
      Li Zefan committed
      Impact: output all of the packet command, not just the first 4 / 8 bytes
      
      Since commit d7e3c324 ("block: add large command support"), struct
      request->cmd has been changed from unsigned char cmd[BLK_MAX_CDB] to
      unsigned char *cmd.
      
      v1 -> v2, by FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>:
      
      - make sure rq->cmd_len is always initialized, and then we can use
        rq->cmd_len instead of BLK_MAX_CDB.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      LKML-Reference: <49D4507E.2060602@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e2494e1b
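      A before/after sketch of the pdu length passed to the trace for packet
      commands (argument lists are abbreviated to the last two, pdu_len and
      pdu_data, which are the ones that matter here):

      /* Before: sizeof(rq->cmd) was BLK_MAX_CDB with the old fixed array,
       * but shrank to sizeof(void *) once ->cmd became a pointer:
       *
       *     __blk_add_trace(..., sizeof(rq->cmd), rq->cmd);
       *
       * After (sketch): use the always-initialized rq->cmd_len instead:
       *
       *     __blk_add_trace(..., rq->cmd_len, rq->cmd);
       */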
8. 26 March 2009, 2 commits
9. 24 March 2009, 2 commits
    • bsg: add support for tail queuing · 05378940
      Boaz Harrosh committed
      Currently, as inherited from sg.c, bsg submits asynchronous requests
      at the head of the queue (using "at_head" set in the call to
      blk_execute_rq_nowait()).  This is bad when the queues are full:
      requests will execute out of order and can starve the first submitted
      requests.
      
      A bit in the sg_io_v4->flags member is allocated to denote Q_AT_TAIL.
      Zero means queue at_head as before, to stay compatible with old code
      on the write/read path.  The SG_IO code path behavior was changed to
      match the write/read behavior.  SG_IO was very rarely used, so
      breaking compatibility with it is OK at this stage.
      
      sg_io_hdr in sg.h also has a flags member and uses 3 bits from the
      first nibble and one bit from the last nibble.  Even though none of
      these bits are supported by bsg, the second nibble is reserved for use
      by bsg, just in case.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      CC: Douglas Gilbert <dgilbert@interlog.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      05378940
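      A sketch of the submission-side check, assuming the new bit is named
      BSG_FLAG_Q_AT_TAIL in sg_io_v4->flags (the exact macro name is an
      assumption based on the commit text; the surrounding bsg.c code is
      elided and the snippet is illustrative):

      int at_head = 1;                          /* default: head of queue, as before */

      if (hdr->flags & BSG_FLAG_Q_AT_TAIL)      /* userspace asked for tail queuing */
              at_head = 0;

      blk_execute_rq_nowait(q, NULL, rq, at_head, bsg_rq_end_io);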
    • block: get rid of unused blkdev_free_rq() define · 50e17493
      Jens Axboe committed
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      50e17493