Commits · 93dbb393503d53cd226e5e1f0088fe8f4dbaa2b8 · openanolis / cloud-kernel

18 Feb, 2009 2 commits

block: fix bad definition of BIO_RW_SYNC · 93dbb393

Committed by Jens Axboe on Feb 16, 2009

We can't OR shift values, so get rid of BIO_RW_SYNC and use BIO_RW_SYNCIO
and BIO_RW_UNPLUG explicitly. This brings back the behaviour from before
213d9417.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

93dbb393
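
The "can't OR shift values" problem in the commit above can be illustrated with a small sketch. The names and bit numbers here are hypothetical stand-ins, not the kernel's definitions; the point is only that ORing two bit numbers produces another bit number, not a mask covering both bits.

```c
/* Hypothetical bit numbers for illustration; not the kernel's real values. */
enum {
	EX_RW_SYNCIO = 4,
	EX_RW_UNPLUG = 5,
};

/* Wrong: ORs the bit numbers themselves, 4 | 5 == 5, which is just a bit number. */
#define EX_RW_SYNC_BAD (EX_RW_SYNCIO | EX_RW_UNPLUG)

/* Right: turn each bit number into a mask first, then OR the masks. */
static unsigned long ex_sync_mask(void)
{
	return (1UL << EX_RW_SYNCIO) | (1UL << EX_RW_UNPLUG);
}

/* What a caller ends up with when it shifts by the bad combined constant. */
static unsigned long ex_bad_mask(void)
{
	return 1UL << EX_RW_SYNC_BAD;
}
```

With these illustrative values the bad constant sets only one of the two intended bits, which is why naming both flags explicitly is the safer form.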

bsg: Fix sense buffer bug in SG_IO · c1c20120

Committed by Boaz Harrosh on Feb 03, 2009

When submitting requests via SG_IO, which does synchronous I/O, a
bsg_command is not allocated, so an in-kernel sense_buffer was not
set. However, when blk_execute_rq() is called with no sense buffer,
one is provided from the stack. Later, in blk_complete_sgv4_hdr_rq(),
bsg would check rq->sense_len and, if sg_io_v4 requested sense data,
copy rq->sense back to user space, but by then it pointed at
already-mangled stack memory.

I have fixed that by forcing a sense_buffer when calling bsg_map_hdr().
The bsg_command->sense is provided in the write/read path like before,
and on-the-stack buffer is provided when doing SG_IO.

I have also fixed a dprintk message to print rq->errors in hex because
of the scsi bit-field use of this member. For other block devices it
does not matter anyway.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

c1c20120

02 Feb, 2009 1 commit

block: fix oops in blk_queue_io_stat() · fb8ec18c

Committed by Jens Axboe on Feb 02, 2009

Some initial probe requests don't have disk->queue mapped yet, so we
can't rely on a non-NULL queue in blk_queue_io_stat(). Wrap it in
blk_do_io_stat().
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

fb8ec18c

30 Jan, 2009 8 commits

cfq-iosched: Allow RT requests to pre-empt ongoing BE timeslice · 3a9a3f6c

Committed by Divyesh Shah on Jan 30, 2009

This patch adds the ability to pre-empt an ongoing BE timeslice when a RT
request is waiting for the current timeslice to complete. This reduces the
wait time to disk for RT requests from an upper bound of 4 (current value
of cfq_quantum) to 1 disk request.

Applied Jens' suggested changes to avoid the rb lookup and use !cfq_class_rt()
and retested.

Latency(secs) for the RT task when doing sequential reads from 10G file.
                       | only RT | RT + BE | RT + BE + this patch
small (512 byte) reads | 143     | 163     | 145
large (1Mb) reads      | 142     | 158     | 146
Signed-off-by: Divyesh Shah <dpshah@google.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

3a9a3f6c

block: add sysfs file for controlling io stats accounting · bc58ba94

Committed by Jens Axboe on Jan 23, 2009

This allows us to turn off disk stat accounting completely, for the cases
where the 0.5-1% reduction in system time is important.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

bc58ba94

block: silently error an unsupported barrier bio · cec0707e

Committed by Jens Axboe on Jan 13, 2009

This fixes a "regression" from 2.6.28, where the barrier probes that file
systems may do would trigger additional end request warnings in dmesg.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

cec0707e

block: Fix documentation for blkdev_issue_flush() · dbdac9b7

Committed by Theodore Ts'o on Jan 13, 2009

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

dbdac9b7

block: seperate bio/request unplug and sync bits · 213d9417

Committed by Jens Axboe on Jan 06, 2009

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

213d9417

block: export SSD/non-rotational queue flag through sysfs · 1308835f

Committed by Bartlomiej Zolnierkiewicz on Jan 07, 2009

For some devices (e.g. CFA ATA) we can't reliably detect whether
the device is of rotational or non-rotational type, so we need to
leave the final decision about this setting to user space.

As a bonus do a minor CodingStyle fixup in queue_nomerges_store().
Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

1308835f

block: get rid of the manual directory counting in blktrace · f48fc4d3

Committed by Jens Axboe on Jan 05, 2009

It can result in a stuck blktrace system, if --kill is used.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

f48fc4d3

block: Allow empty integrity profile · 32231638

Committed by Martin K. Petersen on Jan 04, 2009

Allow a block device to allocate and register an integrity profile
without providing a template.  This allows DM to preallocate a profile
to avoid deadlocks during table reconfiguration.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

32231638

07 Jan, 2009 2 commits

block: Add Kconfig help which notes that ext4 needs CONFIG_LBD · 4d783b09

Committed by Theodore Ts'o on Jan 06, 2009

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jens Axboe <jens.axboe@oracle.com>

4d783b09

block: struct device - replace bus_id with dev_name(), dev_set_name() · 3ada8b7e

Committed by Kay Sievers on Jan 06, 2009

Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

3ada8b7e

03 Jan, 2009 2 commits

[SCSI] block: make blk_rq_map_user take a NULL user-space buffer for WRITE · 97ae77a1

Committed by FUJITA Tomonori on Dec 18, 2008

The commit 81882766 (block: make
blk_rq_map_user take a NULL user-space buffer) extended
blk_rq_map_user to accept a NULL user-space buffer with a READ
command. It was necessary to convert sg to use the block layer mapping
API.

This patch extends blk_rq_map_user again for a WRITE command. It is
necessary to convert st and osst drivers to use the block layer
mapping API.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

97ae77a1

[SCSI] block: fix the partial mappings with struct rq_map_data · 56c451f4

Committed by FUJITA Tomonori on Dec 18, 2008

This fixes bio_copy_user_iov to properly handle the partial mappings
with struct rq_map_data (which only sg uses for now but st and osst
will shortly). It adds the offset member to struct rq_map_data and
changes blk_rq_map_user to update it so that bio_copy_user_iov can add
an appropriate page frame via bio_add_pc_page().
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

56c451f4

29 Dec, 2008 22 commits

cfq-iosched: fix race between exiting queue and exiting task · 62c1fe9d

Committed by Jens Axboe on Dec 15, 2008

Original patch from Nikanth Karthikesan <knikanth@suse.de>

When a queue exits the queue lock is taken and cfq_exit_queue() would free all
the cic's associated with the queue.

But when a task exits, cfq_exit_io_context() gets cic one by one and then
locks the associated queue to call __cfq_exit_single_io_context. It looks like
between getting a cic from the ioc and locking the queue, the queue might have
exited on another cpu.

Fix this by rechecking the cfq_io_context queue key inside the queue lock
again, and not calling into __cfq_exit_single_io_context() if somebody
beat us to it.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

62c1fe9d

Get rid of CONFIG_LSF · b3a6ffe1

Committed by Jens Axboe on Dec 12, 2008

We have two separate config entries for large devices/files. One
is CONFIG_LBD that guards just the devices, the other is CONFIG_LSF
that handles large files. This doesn't make a lot of sense; you typically
want both or none. So get rid of CONFIG_LSF and change CONFIG_LBD wording
to indicate that it covers both.
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

b3a6ffe1

block: make blk_softirq_init() static · 3c18ce71

Committed by Roel Kluin on Dec 10, 2008

Sparse asked whether these could be static.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

3c18ce71

block: use min_not_zero in blk_queue_stack_limits · 18af8b2c

Committed by FUJITA Tomonori on Dec 04, 2008

Zero is invalid for max_phys_segments, max_hw_segments, and
max_segment_size. It's better to use min_not_zero instead of
min. min() works though (because the commit 0e435ac2 makes sure that
these values are set to the default values, non zero, if a queue is
initialized properly).

With this patch, blk_queue_stack_limits does almost the same thing
that dm's combine_restrictions_low() does. I think that it's easy to
remove dm's combine_restrictions_low.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

18af8b2c
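
The difference between min() and a "min, but ignore zero" helper can be sketched as follows. This is a stand-alone illustration under the assumption that zero means "limit not set"; the function name is hypothetical and this is not the kernel's min_not_zero macro itself.

```c
/*
 * Return the smaller of two limits, treating 0 as "not set" rather than
 * as a real value. Hypothetical helper for illustration only.
 */
static unsigned int ex_min_not_zero(unsigned int a, unsigned int b)
{
	if (a == 0)
		return b;	/* only b carries a real limit (or both are unset) */
	if (b == 0)
		return a;	/* only a carries a real limit */
	return a < b ? a : b;	/* both set: the stricter limit wins */
}
```

When stacking two queues' limits, a plain minimum would let an unset (zero) limit override a valid one, while this form keeps the valid value.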

block: add one-hit cache for disk partition lookup · a6f23657

Committed by Jens Axboe on Oct 24, 2008

disk_map_sector_rcu() returns a partition from a sector offset,
which we use for IO statistics on a per-partition basis. The
lookup itself is an O(N) list lookup, where N is the number of
partitions. This actually hurts performance quite a bit, even
on the lower end partitions. On higher numbered partitions,
it can get pretty bad.

Solve this by adding a one-hit cache for partition lookup.
This makes the lookup O(1) for the case where we do most IO to
one partition. Even for mixed partition workloads, amortized cost
is pretty close to O(1) since the natural IO batching makes the
one-hit cache last for lots of IOs.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

a6f23657
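
The one-hit cache idea described above can be sketched in plain C. All structure and function names here are hypothetical, and the sketch leaves out the RCU protection the real lookup relies on; it only shows the "check the last match first, then fall back to the linear scan" pattern.

```c
#include <stddef.h>

/* Hypothetical partition descriptor: a start sector and a length. */
struct ex_part {
	unsigned long start;
	unsigned long nr_sects;
};

/* Hypothetical disk: a partition array plus a one-entry lookup cache. */
struct ex_disk {
	struct ex_part *parts;
	size_t nr_parts;
	struct ex_part *last_lookup;	/* most recent match, or NULL */
};

static int ex_in_range(const struct ex_part *p, unsigned long sector)
{
	return sector >= p->start && sector - p->start < p->nr_sects;
}

/* Map a sector to its partition: O(1) on a cache hit, O(N) otherwise. */
static struct ex_part *ex_map_sector(struct ex_disk *d, unsigned long sector)
{
	size_t i;

	if (d->last_lookup && ex_in_range(d->last_lookup, sector))
		return d->last_lookup;

	for (i = 0; i < d->nr_parts; i++) {
		if (ex_in_range(&d->parts[i], sector)) {
			d->last_lookup = &d->parts[i];	/* remember for next time */
			return d->last_lookup;
		}
	}
	return NULL;	/* sector is outside every partition */
}
```

Because consecutive I/Os tend to target the same partition, most calls return on the first check without touching the list.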

cfq-iosched: remove limit of dispatch depth of max 4 times quantum · 30e0dc28

Committed by Jens Axboe on Oct 20, 2008

This basically limits the hardware queue depth to 4*quantum at any
point in time, which is 16 with the default settings. As CFQ uses
other means to shrink the hardware queue when necessary in the first
place, there's really no need for this extra heuristic. Additionally,
it ends up hurting performance in some cases.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

30e0dc28

block: get rid of elevator_t typedef · b374d18a

Committed by Jens Axboe on Oct 31, 2008

Just use struct elevator_queue everywhere instead.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

b374d18a

block: don't use plugging on SSD devices · a31a9738

Committed by Jens Axboe on Oct 17, 2008

We just want to hand the first bits of IO to the device as fast
as possible. Gains a few percent on the IOPS rate.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

a31a9738

block: fix empty barrier on write-through w/ ordered tag · a185eb4b

Committed by Tejun Heo on Nov 28, 2008

Empty barrier on write-through (or no cache) w/ ordered tag has no
command to execute and without any command to execute ordered tag is
never issued to the device and the ordering is never achieved.  Force
draining for such cases.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

a185eb4b

block: simplify empty barrier implementation · 58eea927

Committed by Tejun Heo on Nov 28, 2008

Empty barrier required special handling in __elv_next_request() to
complete it without letting the low level driver see it.

With previous changes, barrier code is now flexible enough to skip the
BAR step using the same barrier sequence selection mechanism.  Drop
the special handling and mask off q->ordered from start_ordered().

Remove blk_empty_barrier() test which now has no user.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

58eea927

block: make barrier completion more robust · 8f11b3e9

Committed by Tejun Heo on Nov 28, 2008

Barrier completion had the following assumptions.

* start_ordered() couldn't finish the whole sequence properly.  If all
  actions are to be skipped, q->ordseq is set correctly but the actual
  completion was never triggered thus hanging the barrier request.

* Drain completion in elv_complete_request() assumed that there's
  always at least one request in the queue when drain completes.

Both assumptions are true but these assumptions need to be removed to
improve empty barrier implementation.  This patch makes the following
changes.

* Make start_ordered() use blk_ordered_complete_seq() to mark skipped
  steps complete and notify __elv_next_request() that it should fetch
  the next request if the whole barrier has completed inside
  start_ordered().

* Make drain completion path in elv_complete_request() check whether
  the queue is empty.  Empty queue also indicates drain completion.

* While at it, convert 0/1 return from blk_do_ordered() to false/true.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

8f11b3e9

block: make every barrier action optional · f671620e

Committed by Tejun Heo on Nov 28, 2008

In all barrier sequences, the barrier write itself was always assumed
to be issued and thus didn't have a corresponding control flag.  This
patch adds QUEUE_ORDERED_DO_BAR and unifies action mask handling in
start_ordered() such that any barrier action can be skipped.

This patch doesn't introduce any visible behavior changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

f671620e

block: remove duplicate or unused barrier/discard error paths · a7384677

Committed by Tejun Heo on Nov 28, 2008

* Because barrier mode can be changed dynamically, whether barrier is
  supported or not can be determined only when actually issuing the
  barrier and there is no point in checking it earlier.  Drop barrier
  support check in generic_make_request() and __make_request(), and
  update comment around the support check in blk_do_ordered().

* There is no reason to check discard support in both
  generic_make_request() and __make_request().  Drop the check in
  __make_request().  While at it, move error action block to the end
  of the function and add unlikely() to q existence test.

* Barrier request, be it empty or not, is never passed to low level
  driver and thus it's meaningless to try to copy back req->sector to
  bio->bi_sector on error.  In addition, the notion of failed sector
  doesn't make any sense for empty barrier to begin with.  Drop the
  code block from __end_that_request_first().
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

a7384677

block: reorganize QUEUE_ORDERED_* constants · 313e4299

Committed by Tejun Heo on Nov 28, 2008

Separate out ordering type (drain,) and action masks (preflush,
postflush, fua) from visible ordering mode selectors
(QUEUE_ORDERED_*).  Ordering types are now named QUEUE_ORDERED_BY_*
while action masks are named QUEUE_ORDERED_DO_*.

This change is necessary to add QUEUE_ORDERED_DO_BAR and make it
optional to improve empty barrier implementation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

313e4299

block: use cancel_work_sync() instead of kblockd_flush_work() · 64d01dc9

Committed by Cheng Renquan on Dec 03, 2008

After many improvements on kblockd_flush_work, it is now identical to
cancel_work_sync, so a direct call to cancel_work_sync is suggested.

The only difference is that cancel_work_sync is a GPL symbol,
so non-GPL modules can no longer use it.
Signed-off-by: Cheng Renquan <crquan@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

64d01dc9

block: Supress Buffer I/O errors when SCSI REQ_QUIET flag set · 08bafc03

Committed by Keith Mannthey on Nov 25, 2008

Allow the scsi request REQ_QUIET flag to be propagated to the buffer
file system layer. The basic idea is to pass the flag from the scsi
request to the bio (block IO) and then to the buffer layer.  The buffer
layer can then suppress needless printks.

This patch declutters the kernel log by removing the 40-50 (per lun)
buffer io error messages seen during a boot in my multipath setup. There
is a good chance any real errors will be missed in the "noise" in the
logs without this patch.

During boot I see blocks of messages like
"
__ratelimit: 211 callbacks suppressed
Buffer I/O error on device sdm, logical block 5242879
Buffer I/O error on device sdm, logical block 5242879
Buffer I/O error on device sdm, logical block 5242847
Buffer I/O error on device sdm, logical block 1
Buffer I/O error on device sdm, logical block 5242878
Buffer I/O error on device sdm, logical block 5242879
Buffer I/O error on device sdm, logical block 5242879
Buffer I/O error on device sdm, logical block 5242879
Buffer I/O error on device sdm, logical block 5242879
Buffer I/O error on device sdm, logical block 5242872
"
in my logs.

My disk environment is multipath fiber channel using the SCSI_DH_RDAC
code and multipathd.  This topology includes an "active" and "ghost"
path for each lun. IOs to the "ghost" path will never complete, and the
SCSI layer, via the scsi device handler rdac code, quickly returns the IOs
to these paths and sets the REQ_QUIET scsi flag to suppress the scsi
layer messages.

I want to extend the QUIET behavior to include the buffer file
system layer to deal with these errors as well. I have been running this
patch for a while now on several boxes without issue.  A few runs of
bonnie++ show no noticeable difference in performance in my setup.

Thanks to John Stultz for the quiet_error finalization.
Submitted-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

08bafc03

block: don't take lock on changing ra_pages · 7c239517

Committed by Wu Fengguang on Nov 25, 2008

There's no need to take queue_lock or kernel_lock when modifying
bdi->ra_pages. So remove them. Also remove the out-of-date comment for
queue_max_sectors_store().
Signed-off-by: Wu Fengguang <wfg@linux.intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

7c239517

block/blk-tag.c: cleanup kernel-doc · c6a06f70

Committed by Qinghuang Feng on Nov 24, 2008

There is no argument named @tags in blk_init_tags,
remove its comment.
Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

c6a06f70

scsi-ioctl: use clock_t <> jiffies · 2b91bafc

Committed by Milton Miller on Nov 17, 2008

Convert the timeout ioctl scaling to use the clock_t functions
which are much more accurate with some USER_HZ vs HZ combinations.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

2b91bafc
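
Why a precomputed HZ / USER_HZ ratio can be inaccurate is easy to see with integer arithmetic. The sketch below uses hypothetical tick rates and hypothetical function names; it is not the kernel's clock_t conversion code, only an illustration of the truncation that arises when the two rates do not divide evenly.

```c
/* Hypothetical tick rates chosen so that the ratio is not an integer. */
#define EX_HZ		250UL
#define EX_USER_HZ	100UL

/* Naive scaling: the ratio 250 / 100 is truncated to 2 before it is used. */
static unsigned long ex_naive_to_jiffies(unsigned long user_ticks)
{
	return user_ticks * (EX_HZ / EX_USER_HZ);
}

/* Multiply first and divide last, so the 2.5x ratio is preserved. */
static unsigned long ex_scaled_to_jiffies(unsigned long user_ticks)
{
	return user_ticks * EX_HZ / EX_USER_HZ;
}
```

With these rates, a 10 second timeout expressed as 1000 user ticks comes out 20% short under the naive scaling.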

block: leave the request timeout timer running even on an empty list · 70ed28b9

Committed by Jens Axboe on Nov 19, 2008

For sync IO, we'll often do them serialized. This means we'll be touching
the queue timer for every IO, as opposed to only occasionally like we
do for queued IO. Instead of deleting the timer when the last request
is removed, just let it continue running. If a new request comes up soon
we then don't have to re-add the timer again. If no new requests arrive,
the timer will expire without side effect later.

This improves high iops sync IO by ~1%.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

70ed28b9

block: add comment in blk_rq_timed_out() about why next can not be 0 · 65d3618c

Committed by Jens Axboe on Oct 30, 2008

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

65d3618c

block: optimizations in blk_rq_timed_out_timer() · 565e411d

Committed by malahal@us.ibm.com on Oct 30, 2008

Now the rq->deadline can't be zero if the request is in the
timeout_list, so there is no need to have next_set. There is no need to
access a request's deadline field if blk_rq_timed_out is called on it.
Signed-off-by: Malahal Naineni <malahal@us.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

565e411d

26 Dec, 2008 1 commit

cpumask: Replace cpu_coregroup_map with cpu_coregroup_mask · be4d638c

Committed by Rusty Russell on Dec 26, 2008

cpu_coregroup_map returned a cpumask_t: it's going away.

(Note, the sched part of this patch won't apply meaningfully to the
sched tree, but I'm posting it to show the goal).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ingo Molnar <mingo@redhat.com>

be4d638c

06 Dec, 2008 1 commit

Enforce a minimum SG_IO timeout · f2f1fa78

Committed by Linus Torvalds on Dec 05, 2008

There's no point in having too short SG_IO timeouts, since if the
command does end up timing out, we'll end up through the reset sequence
that is several seconds long in order to abort the command that timed
out.

As a result, shorter timeouts than a few seconds simply do not make
sense, as the recovery would be longer than the timeout itself.

Add a BLK_MIN_SG_TIMEOUT to match the existing BLK_DEFAULT_SG_TIMEOUT.
Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
f2f1fa78
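
Enforcing a floor on a user-supplied timeout amounts to a simple clamp. The constants and function below are hypothetical; in particular the tick rate and the 7 second floor are illustrative assumptions, not values taken from the commit text.

```c
/* Hypothetical tick rate and floor, for illustration only. */
#define EX_TICK_HZ		100UL
#define EX_MIN_SG_TIMEOUT	(7 * EX_TICK_HZ)

/* Raise any timeout shorter than the floor up to the floor. */
static unsigned long ex_clamp_sg_timeout(unsigned long timeout)
{
	if (timeout < EX_MIN_SG_TIMEOUT)
		return EX_MIN_SG_TIMEOUT;
	return timeout;
}
```

Longer timeouts pass through unchanged; only requests below the recovery time are adjusted.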

04 Dec, 2008 1 commit

[PATCH 1/2] kill FMODE_NDELAY_NOW · fd4ce1ac

Committed by Christoph Hellwig on Nov 05, 2008

Update FMODE_NDELAY before each ioctl call so that we can kill the
magic FMODE_NDELAY_NOW.  It would be even better to do this directly
in setfl(), but for that we'd need to have FMODE_NDELAY for all files,
not just block special files.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fd4ce1ac

openanolis / cloud-kernel synced successfully more than 1 year ago