Commits · 27898988174bb211fd962ea73b9c6dc09f888705 · openeuler / Kernel

01 Jul, 2008 1 commit

block: Fix the starving writes bug in the anticipatory IO scheduler · d585d0b9

Authored by Divyesh Shah on Jun 16, 2008

The AS scheduler alternates between issuing read and write batches. It does
the batch switch only after all requests from the previous batch are
completed.

When switching to a write batch, if there is an on-going read request,
it waits for its completion and indicates its intention of switching by
setting ad->changed_batch and the new direction, but it does not update the
batch_expire_time for the new write batch, which it does when there are
no previous pending requests.
On completion of the read request, it sees that we were waiting for the
switch, schedules work for kblockd right away and resets the
ad->changed_batch flag.
Now when kblockd enters dispatch_request, where it is expected to pick
up a write request, it instead ends the write batch because
batch_expire_time was not updated and still shows the expire timestamp of
the previous batch.

This results in write starvation in all cases where there is
the intention of switching to a write batch but there is a previous
in-flight read request, because the batch gets reverted to a read batch
right away.

This also holds true in the reverse case (switching from a write batch
to a read batch with an in-flight write request).

I've checked that this bug exists on 2.6.11, 2.6.18, 2.6.24 and
linux-2.6-block git HEAD. I've tested the fix on x86 platforms with
SCSI drives where the driver asks for the next request while a current
request is in-flight.

This patch is based off linux-2.6-block git HEAD.

Bug reproduction:
A simple scenario which reproduces this bug is:
- dd if=/dev/hda3 of=/dev/null &
- lilo
   The lilo takes forever to complete.

This can also be reproduced fairly easily with the earlier dd and
another test program doing msync().

The example test program below should print out a message after every
iteration, but it simply hangs forever. With this bugfix it makes forward
progress.

====
Example test program using msync() (thanks to suleiman AT google DOT com)

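#define _GNU_SOURCE     /* for O_NOATIME */
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <err.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>

static inline uint64_t
rdtsc(void)
{
         uint32_t lo, hi;

         /* Read both halves explicitly so this works on 32- and 64-bit x86. */
         __asm __volatile("rdtsc" : "=a" (lo), "=d" (hi));
         return (((uint64_t)hi << 32) | lo);
}

int
main(int argc, char **argv)
{
         struct stat st;
         uint64_t e, s, t;
         char *p;
         long i;
         int fd;

         if (argc < 2) {
                 printf("Usage: %s <file>\n", argv[0]);
                 return (1);
         }

         if ((fd = open(argv[1], O_RDWR | O_NOATIME)) < 0)
                 err(1, "open");

         if (fstat(fd, &st) < 0)
                 err(1, "fstat");

         p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                  MAP_SHARED, fd, 0);
         if (p == MAP_FAILED)
                 err(1, "mmap");

         t = 0;
         for (i = 0; i < 1000; i++) {
                 /* Dirty the first page and force it out to disk. */
                 *p = 0;
                 msync(p, 4096, MS_SYNC);
                 s = rdtsc();
                 /* Time how long the next store to the page takes. */
                 *p = 0;
                 __asm __volatile(""::: "memory");
                 e = rdtsc();
                 if (argc > 2)
                         printf("%ld: %lld cycles %jd %jd\n",
                                i, (long long)(e - s), (intmax_t)s, (intmax_t)e);
                 t += e - s;
         }
         printf("average time: %lld cycles\n", (long long)(t / 1000));
         return (0);
}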

Cc: <stable@kernel.org>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

d585d0b9
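
The failure sequence described in the commit message above can be replayed in a small user-space model. This is only a toy sketch, not the kernel code: `struct as_model`, its fields, `BATCH_LEN` and all helper names are hypothetical, time is a plain integer tick, and `fix_applied` stands in for the patch, which is assumed here to refresh the expiry time when the deferred switch completes.

```c
#include <stdbool.h>

enum dir { DIR_READ, DIR_WRITE };

/* Toy model of the scheduler state; all names are illustrative. */
struct as_model {
    enum dir batch_dir;        /* direction of the current batch */
    long batch_expire_time;    /* tick at which the current batch expires */
    bool changed_batch;        /* a switch is pending on in-flight I/O */
    int in_flight;             /* requests of the previous batch still running */
    bool fix_applied;          /* refresh the expiry when the switch completes */
};

#define BATCH_LEN 10           /* arbitrary batch length in ticks */

/* Request a switch to direction 'to' at time 'now'. */
static void switch_batch(struct as_model *m, enum dir to, long now)
{
    m->batch_dir = to;
    if (m->in_flight > 0) {
        m->changed_batch = true;   /* expiry time is left stale here */
        return;
    }
    m->batch_expire_time = now + BATCH_LEN;
}

/* Completion of a request that belonged to the previous batch. */
static void complete_request(struct as_model *m, long now)
{
    m->in_flight--;
    if (m->changed_batch && m->in_flight == 0) {
        m->changed_batch = false;
        if (m->fix_applied)
            m->batch_expire_time = now + BATCH_LEN;
    }
}

/* True if the current batch has not expired at time 'now'. */
static bool can_dispatch(const struct as_model *m, long now)
{
    return now < m->batch_expire_time;
}

/* Replay the scenario: a read is in flight when the write batch is due. */
static bool write_batch_survives(bool fix_applied)
{
    struct as_model m = {
        .batch_dir = DIR_READ,
        .batch_expire_time = 10,   /* read batch started at tick 0 */
        .changed_batch = false,
        .in_flight = 1,            /* one read still in flight */
        .fix_applied = fix_applied,
    };

    switch_batch(&m, DIR_WRITE, 10);   /* read batch has just expired */
    complete_request(&m, 12);          /* the in-flight read finishes */
    return can_dispatch(&m, 12);       /* the dispatcher tries a write */
}
```

In this model `write_batch_survives(false)` reports that the write batch is ended immediately, because the stale expiry of the read batch is still in place, while `write_batch_survives(true)` lets the write go out.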

13 Jun, 2008 1 commit

block: disable IRQs until data is written to relay channel · 14a73f54

Authored by Carl Henrik Lunde on Jun 12, 2008

As we may run relay_reserve from interrupt context we must always disable
IRQs.  This is because a call to relay_reserve may expose previously written
data to user space.

Updated new message code and an old but related comment.
Signed-off-by: Carl Henrik Lunde <chlunde@ping.uio.no>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

14a73f54

10 Jun, 2008 1 commit

Fix invalid access errors in blk_lookup_devt · d5791d13

Authored by Linus Torvalds on Jun 09, 2008

Commit 30f2f0eb ("block: do_mounts -
accept root=<non-existant partition>") extended blk_lookup_devt() to be
able to look up partitions that had not yet been registered, but in the
process made the assumption that the '&block_class.devices' list only
contains disk devices and that you can do 'dev_to_disk(dev)' on them.

That isn't actually true.  The block_class device list also contains the
partitions we've discovered so far, and you can't just do a
'dev_to_disk()' on those.

So make sure to only work on devices that block/genhd.c has registered
itself, something we can test by checking the 'dev->type' member.  This
makes the loop in blk_lookup_devt() match the other such loops in this
file.

[ We may want to do an alternate version that knows to handle _either_
  whole-disk devices or partitions, but for now this is the minimal fix
  for a series of crashes reported by Mariusz Kozlowski in

	http://lkml.org/lkml/2008/5/25/25

  and Ingo in

	http://lkml.org/lkml/2008/6/9/39 ]
Reported-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Reported-by: Ingo Molnar <mingo@elte.hu>
Cc: Neil Brown <neilb@suse.de>
Cc: Joao Luis Meloni Assirati <assirati@nonada.if.usp.br>
Acked-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d5791d13
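
The idea of the fix above, filtering a mixed device list by type before treating an entry as a disk, might look roughly like the following user-space sketch. The structures are hypothetical stand-ins and not the kernel definitions; `find_disk`, `disk_type` and `part_type` are names chosen purely for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel structures. */
struct device_type { const char *name; };

struct device {
    const struct device_type *type;   /* what kind of device this entry is */
    const char *name;
};

static const struct device_type disk_type = { "disk" };
static const struct device_type part_type = { "partition" };

/*
 * Return the index of the whole-disk device called 'name', or -1.
 * Entries that are not disks are skipped before any disk-only
 * handling is applied to them.
 */
static int find_disk(const struct device *devs, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (devs[i].type != &disk_type)
            continue;   /* e.g. a partition: must not be treated as a disk */
        if (strcmp(devs[i].name, name) == 0)
            return (int)i;
    }
    return -1;
}
```

With a list holding a partition entry followed by a disk entry of the same name, only the disk entry is ever matched; partition entries are never inspected as disks.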

28 May, 2008 6 commits

cfq-iosched: fix RCU problem in cfq_cic_lookup() · d6de8be7

Authored by Jens Axboe on May 28, 2008

cfq_cic_lookup() needs to properly protect ioc->ioc_data before
dereferencing it and also exclude updaters of ioc->ioc_data as well.

Also add a number of comments documenting why the existing RCU usage
is OK.

Thanks a lot to "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> for
review and comments!
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

d6de8be7

block: make blktrace use per-cpu buffers for message notes · 64565911

Authored by Jens Axboe on May 28, 2008

Currently it uses a single static char array, but that risks
being corrupted when multiple users issue message notes at the
same time. Make the buffers dynamically allocated when the trace
is set up and make them per-cpu instead.

The default max message size of 1k is also very large; the
interface is mainly for small text notes. So shrink it to 128 bytes.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

64565911

Added in elevator switch message to blktrace stream · 4722dc52

Authored by Alan D. Brunelle on May 27, 2008

Signed-off-by: Alan D. Brunelle <alan.brunelle@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

4722dc52

Added in MESSAGE notes for blktraces · 9d5f09a4

Authored by Alan D. Brunelle on May 27, 2008

Allows messages to be inserted into blktrace streams.
Signed-off-by: Alan D. Brunelle <alan.brunelle@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

9d5f09a4

block: reorder cfq_queue to save space on 64bit builds · be754d2c

Authored by Richard Kennedy on May 23, 2008

Saves 8 bytes of padding and increases objects/slab from 30 to 32 on my
AMD64 config.
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

be754d2c
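
How reordering members can remove padding on a 64-bit build can be seen with two small structures. These are illustrative layouts only, not the real cfq_queue members, and the sizes in the comments assume a typical LP64 ABI with 8-byte pointers and natural alignment.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative layout: a pointer sandwiched between two 32-bit members. */
struct padded {
    uint32_t a;    /* 4 bytes, followed by 4 bytes of padding on LP64 */
    void    *p;    /* 8 bytes on LP64, must be 8-byte aligned */
    uint32_t b;    /* 4 bytes, followed by 4 bytes of tail padding */
};

/* Same members, with the two 32-bit fields placed next to each other. */
struct reordered {
    void    *p;
    uint32_t a;
    uint32_t b;    /* 'a' and 'b' now share one 8-byte slot */
};

/* Bytes saved by the reordering on the ABI this is compiled for. */
static size_t bytes_saved(void)
{
    return sizeof(struct padded) - sizeof(struct reordered);
}
```

On such an ABI `bytes_saved()` is expected to be 8; on a 32-bit build both layouts are usually the same size and nothing is gained. A smaller object is what allows more objects to fit into each slab.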

block: Move the second call to get_request to the end of the loop · 05caf8db

Authored by Zhang, Yanmin on May 22, 2008

In function get_request_wait, the second call to get_request could be
moved to the end of the while loop, because if the first call to
get_request fails, the second call will fail without sleep.
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

05caf8db

15 May, 2008 2 commits

Remove blkdev warning triggered by using md · e7e72bf6

Authored by Neil Brown on May 14, 2008

As setting and clearing queue flags now requires that we hold a spinlock
on the queue, and as blk_queue_stack_limits is called without that lock,
get the lock inside blk_queue_stack_limits.

For blk_queue_stack_limits to be able to find the right lock, each md
personality needs to set q->queue_lock to point to the appropriate lock.
Those personalities which didn't previously use a spin_lock, use
q->__queue_lock.  So always initialise that lock when allocated.

With this in place, setting/clearing of the QUEUE_FLAG_PLUGGED bit will no
longer cause warnings as it will be clear that the proper lock is held.

Thanks to Dan Williams for review and fixing the silly bugs.
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Alistair John Strachan <alistair@devzero.co.uk>
Cc: Nick Piggin <npiggin@suse.de>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Jacek Luczak <difrost.kernel@gmail.com>
Cc: Prakash Punnoor <prakash@punnoor.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e7e72bf6

block: do_mounts - accept root=<non-existant partition> · 30f2f0eb

Authored by Kay Sievers on May 06, 2008

Some devices, like md, may create partitions only at first access,
so allow root= to be set to a valid non-existant partition of an
existing disk. This applies only to non-initramfs root mounting.

This fixes a regression from 2.6.24, which did allow this to happen; the
regression broke some users' machines :(
Acked-by: Neil Brown <neilb@suse.de>
Tested-by: Joao Luis Meloni Assirati <assirati@nonada.if.usp.br>
Cc: stable <stable@kernel.org>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

30f2f0eb

13 May, 2008 1 commit

Fix misuses of bdevname() · f36f21ec

Authored by Jean Delvare on May 12, 2008

bdevname() fills the buffer that it is given as a parameter, so calling
strcpy() or snprintf() on the returned value is redundant (and probably not
guaranteed to work - I don't think strcpy and snprintf support overlapping
buffers.)
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f36f21ec
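
The calling pattern described in the commit above can be illustrated with a user-space analogue. `toy_devname()` below is a hypothetical function that, like the bdevname() behaviour described in the message, fills the caller's buffer and returns a pointer to that same buffer; `NAME_SIZE` and the "disk%d" naming are assumptions made for the example.

```c
#include <stdio.h>
#include <string.h>

#define NAME_SIZE 32   /* illustrative buffer size */

/* Hypothetical analogue: fills 'buf' and returns 'buf' itself. */
static const char *toy_devname(int index, char *buf)
{
    snprintf(buf, NAME_SIZE, "disk%d", index);
    return buf;
}

/* Correct use: the call alone is enough, 'buf' already holds the name. */
static size_t name_length(int index, char *buf)
{
    toy_devname(index, buf);
    return strlen(buf);
}
```

Because the return value aliases the buffer, a call such as `strcpy(buf, toy_devname(0, buf))` would copy a buffer onto itself, which is both redundant and an overlapping copy that `strcpy` does not define.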

07 May, 2008 7 commits

block: avoid duplicate calls to get_part() in disk stat code · 28f13702

Authored by Jens Axboe on May 07, 2008

get_part() is fairly expensive, as it loops over the partitions in O(N)
time to find the right one. To make matters even worse, in lots of normal
IO paths we end up looking up the partition twice. Change the
stat add code to accept a passed in partition instead.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

28f13702

cfq-iosched: make io priorities inherit CPU scheduling class as well as nice · 6d63c275

Authored by Jens Axboe on May 07, 2008

We currently set all processes to the best-effort scheduling class,
regardless of what CPU scheduling class they belong to. Improve that
so that we correctly track idle and rt scheduling classes as well.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

6d63c275

block: optimize generic_unplug_device() · dbaf2c00

Authored by Jens Axboe on May 07, 2008

Original patch from Mikulas Patocka <mpatocka@redhat.com>

Mike Anderson was doing an OLTP benchmark on a computer with 48 physical
disks mapped to one logical device via device mapper.

He found that there was a slowdown on request_queue->lock in function
generic_unplug_device. The slowdown is caused by the fact that when some
code calls unplug on the device mapper, device mapper calls unplug on all
physical disks. These unplug calls take the lock, find that the queue is
already unplugged, release the lock and exit.

With the below patch, performance of the benchmark was increased by 18%
(the whole OLTP application, not just block layer microbenchmarks).

So I'm submitting this patch for upstream. I think the patch is correct,
because when several threads call plug and unplug simultaneously, it is
unspecified whether the queue is plugged or not (so the patch can't make
this worse). And the caller that plugged the queue should unplug it
anyway (if it doesn't, there's a 3ms timeout).
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

dbaf2c00
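
The optimization described above, looking at the plugged state before taking the lock, can be sketched with a toy queue. Everything here is hypothetical: the "lock" is only a counter that records how often it would have been taken, and `struct toy_queue` and the function names are not kernel APIs.

```c
#include <stdbool.h>

/* Toy queue: 'lock_acquisitions' counts how often the simulated lock is taken. */
struct toy_queue {
    bool plugged;
    unsigned long lock_acquisitions;
};

static void toy_lock(struct toy_queue *q)   { q->lock_acquisitions++; }
static void toy_unlock(struct toy_queue *q) { (void)q; }

/* Before: always take the lock, even if the queue is already unplugged. */
static void unplug_always_lock(struct toy_queue *q)
{
    toy_lock(q);
    if (q->plugged)
        q->plugged = false;
    toy_unlock(q);
}

/* After: peek at the flag first and skip the lock when there is nothing to do. */
static void unplug_check_first(struct toy_queue *q)
{
    if (!q->plugged)
        return;
    toy_lock(q);
    if (q->plugged)          /* re-check under the lock */
        q->plugged = false;
    toy_unlock(q);
}
```

Calling both variants once for each of 48 already-unplugged queues, as in the benchmark setup the message describes, takes the simulated lock 48 times in the first variant and not at all in the second.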

block: get rid of likely/unlikely predictions in merge logic · 2cdf79ca

Authored by Jens Axboe on May 07, 2008

They tend to depend a lot on the workload, so not a clear-cut
likely or unlikely fit.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

2cdf79ca

cfq-iosched: fix RCU race in the cfq io_context destructor handling · 07416d29

Authored by Jens Axboe on May 07, 2008

put_io_context() drops the RCU read lock before calling into cfq_dtor(),
however we need to hold off freeing there before grabbing and
dereferencing the first object on the list.

So extend the rcu_read_lock() scope to cover the calling of cfq_dtor(),
and optimize cfq_free_io_context() to use a new variant for
call_for_each_cic() that assumes the RCU read lock is already held.

Hit in the wild by Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

07416d29

block: adjust tagging function queue bit locking · aa94b537

Authored by Jens Axboe on May 07, 2008

For most initialization purposes, calling blk_queue_init_tags() without
the queue lock held is OK. Only if called for resizing an existing map
must the lock be held. Ditto for tag cleanup, the maps are reference
counted.

So switch the general queue flag setting to the unlocked variant, but
retain the locked variant for resizing.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

aa94b537

block: sysfs store function needs to grab queue_lock and use queue_flag_*() · bf0f9702

Authored by Jens Axboe on May 07, 2008

Concurrency isn't a big deal here since we have requests in flight
at this point, but do the locked variant to set a better example.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

bf0f9702

03 May, 2008 1 commit

[SCSI] add support for variable length extended commands · db4742dd

Authored by Boaz Harrosh on Apr 30, 2008

Add support for variable-length, extended, and vendor-specific
CDBs to scsi-ml. It is now possible for initiators and ULDs
to issue these types of commands. LLDs need not change much.
All they need is to raise the .max_cmd_len to the longest command
they support (see iscsi patch).

- clean up some code paths that did not expect commands to be
  larger than 16 bytes, and change the type of the cmd_len members
  to short, as char is not enough.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

db4742dd

02 May, 2008 1 commit

[SCSI] bsg: add large command support · 9f5de6b1

Authored by FUJITA Tomonori on Apr 30, 2008

This enables bsg to handle the request length larger than BLK_MAX_CDB
(mainly for the variable length CDB format).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

9f5de6b1

01 May, 2008 1 commit

block: remove remaining __FUNCTION__ occurrences · 24c03d47

Authored by Harvey Harrison on May 01, 2008

__FUNCTION__ is gcc-specific; use __func__ instead.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

24c03d47

30 Apr, 2008 1 commit

mm: bdi: export BDI attributes in sysfs · cf0ca9fe

Authored by Peter Zijlstra on Apr 30, 2008

Provide a place in sysfs (/sys/class/bdi) for the backing_dev_info object.
This allows us to see and set the various BDI specific variables.

In particular this properly exposes the read-ahead window for all relevant
users and /sys/block/<block>/queue/read_ahead_kb should be deprecated.

With patient help from Kay Sievers and Greg KH

[mszeredi@suse.cz]

 - split off NFS and FUSE changes into separate patches
 - document new sysfs attributes under Documentation/ABI
 - do bdi_class_init as a core_initcall, otherwise the "default" BDI
   won't be initialized
 - remove bdi_init_fmt macro, it's not used very much

[akpm@linux-foundation.org: fix ia64 warning]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Acked-by: Greg KH <greg@kroah.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cf0ca9fe

29 Apr, 2008 11 commits

block: Skip I/O merges when disabled · ac9fafa1

Authored by Alan D. Brunelle on Apr 29, 2008

The block I/O + elevator + I/O scheduler code spends a lot of time trying
to merge I/Os -- rightfully so under "normal" circumstances. However,
if one knows that the incoming I/O stream is /very/ random in
nature, the cycles are wasted.

This patch adds a per-request_queue tunable that (when set) disables
merge attempts (beyond the simple one-hit cache check), thus freeing up
a non-trivial amount of CPU cycles.
Signed-off-by: Alan D. Brunelle <alan.brunelle@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

ac9fafa1

block: add large command support · d7e3c324

Authored by FUJITA Tomonori on Apr 29, 2008

This patch changes rq->cmd from the static array to a pointer to
support large commands.

We rarely handle large commands. So for optimization, a struct request
still has a static array for a command. rq_init sets rq->cmd pointer
to the static array.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

d7e3c324
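
The layout described in the commit above, a command pointer that normally refers to an array embedded in the same structure, can be sketched as follows. This is a much reduced, hypothetical structure: `struct toy_request`, `TOY_MAX_CDB`, `inline_cmd` and the helper names are illustrative and do not mirror the real struct request.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX_CDB 16   /* size of the embedded buffer, chosen for illustration */

/* Hypothetical, much reduced request structure. */
struct toy_request {
    unsigned char  inline_cmd[TOY_MAX_CDB]; /* embedded storage for the common case */
    unsigned char *cmd;                     /* refers to inline_cmd unless replaced */
    unsigned short cmd_len;
};

/* Initialize the request so that 'cmd' refers to the embedded array. */
static void toy_rq_init(struct toy_request *rq)
{
    memset(rq, 0, sizeof(*rq));
    rq->cmd = rq->inline_cmd;   /* a bare memset() would leave this NULL */
}

/* Attach a caller-owned buffer for a command longer than TOY_MAX_CDB. */
static void toy_rq_set_large_cmd(struct toy_request *rq,
                                 unsigned char *buf, unsigned short len)
{
    rq->cmd = buf;
    rq->cmd_len = len;
}
```

One consequence of such a layout is that `sizeof(rq->cmd)` yields the size of a pointer rather than the size of the command buffer, so code that needs the buffer size has to use a constant such as `TOY_MAX_CDB` instead.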

block: replace sizeof(rq->cmd) with BLK_MAX_CDB · d34c87e4

Authored by FUJITA Tomonori on Apr 29, 2008

This is a preparation for changing rq->cmd from the static array to a
pointer.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Boaz Harrosh <bharrosh@panasas.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

d34c87e4

block: rename and export rq_init() · 2a4aa30c

Authored by FUJITA Tomonori on Apr 29, 2008

This renames rq_init() to blk_rq_init() and exports it. Any path that hands
the request to the block layer needs to call it to initialize the
request.

This is a preparation for large command support, which needs to
initialize the request in a proper way (that is, just doing a memset()
will not work).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

2a4aa30c

block: no need to initialize rq->cmd with blk_get_request · 992b5bce

Authored by FUJITA Tomonori on Apr 29, 2008

blk_get_request initializes rq->cmd (rq_init does) so the users don't
need to do that.

The purpose of this patch is to remove sizeof(rq->cmd) and &rq->cmd,
as a preparation for large command support, which changes rq->cmd from
the static array to a pointer. sizeof(rq->cmd) will not make sense and
&rq->cmd won't work.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

992b5bce

block/blk-barrier.c:blk_ordered_cur_seq() mustn't be inline · 6f6a036e

Authored by Adrian Bunk on Apr 29, 2008

This patch fixes the following build error with UML and gcc 4.3:

<--  snip  -->

...
  CC      block/blk-barrier.o
/home/bunk/linux/kernel-2.6/git/linux-2.6/block/blk-barrier.c: In function ‘blk_do_ordered’:
/home/bunk/linux/kernel-2.6/git/linux-2.6/block/blk-barrier.c:57: sorry, unimplemented: inlining failed in call to ‘blk_ordered_cur_seq’: function body not available
/home/bunk/linux/kernel-2.6/git/linux-2.6/block/blk-barrier.c:252: sorry, unimplemented: called from here
/home/bunk/linux/kernel-2.6/git/linux-2.6/block/blk-barrier.c:57: sorry, unimplemented: inlining failed in call to ‘blk_ordered_cur_seq’: function body not available
/home/bunk/linux/kernel-2.6/git/linux-2.6/block/blk-barrier.c:253: sorry, unimplemented: called from here
make[2]: *** [block/blk-barrier.o] Error 1

<--  snip  -->
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

6f6a036e

block/elevator.c:elv_rq_merge_ok() mustn't be inline · 72ed0bf6

Authored by Adrian Bunk on Apr 29, 2008

This patch fixes the following build error with UML and gcc 4.3:

<--  snip  -->

...
  CC      block/elevator.o
/home/bunk/linux/kernel-2.6/git/linux-2.6/block/elevator.c: In function ‘elv_merge’:
/home/bunk/linux/kernel-2.6/git/linux-2.6/block/elevator.c:73: sorry, unimplemented: inlining failed in call to ‘elv_rq_merge_ok’: function body not available
/home/bunk/linux/kernel-2.6/git/linux-2.6/block/elevator.c:103: sorry, unimplemented: called from here
/home/bunk/linux/kernel-2.6/git/linux-2.6/block/elevator.c:73: sorry, unimplemented: inlining failed in call to ‘elv_rq_merge_ok’: function body not available
/home/bunk/linux/kernel-2.6/git/linux-2.6/block/elevator.c:495: sorry, unimplemented: called from here
make[2]: *** [block/elevator.o] Error 1
make[1]: *** [block] Error 2

<--  snip  -->
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

72ed0bf6

block: make queue flags non-atomic · 75ad23bc

Authored by Nick Piggin on Apr 29, 2008

We can save some atomic ops in the IO path, if we clearly define
the rules of how to modify the queue flags.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

75ad23bc

block: add dma alignment and padding support to blk_rq_map_kern · 68154e90

Authored by FUJITA Tomonori on Apr 25, 2008

This patch adds bio_copy_kern similar to
bio_copy_user. blk_rq_map_kern uses bio_copy_kern instead of
bio_map_kern if necessary.

bio_copy_kern uses temporary pages and the bi_end_io callback frees
these pages. bio_copy_kern saves the original kernel buffer at
bio->bi_private; it doesn't use something like struct bio_map_data to
store the information about the caller.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

68154e90

unexport blk_max_pfn · 657e93be

Authored by Adrian Bunk on Apr 25, 2008

blk_max_pfn can now be unexported.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

657e93be

block: make rq_init() do a full memset() · 1afb20f3

Authored by FUJITA Tomonori on Apr 25, 2008

This requires moving rq_init() from get_request() to blk_alloc_request().
The upside is that we can now require an rq_init() from any path that
wishes to hand the request to the block layer.

rq_init() will be exported for the code that uses struct request
without blk_get_request.

This is a preparation for large command support, which needs to
initialize struct request in a proper way (that is, just doing a
memset() will not work).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

1afb20f3

23 Apr, 2008 1 commit

[SCSI] bsg: add release callback support · 97f46ae4

Authored by FUJITA Tomonori on Apr 19, 2008

This patch adds release callback support, which is called when a bsg
device goes away. bsg_register_queue() takes a pointer to a callback
function. This feature is useful for stuff like sas_host that can't
use the release callback in struct device.

If a caller doesn't need bsg's release callback, it can call
bsg_register_queue() with NULL pointer (e.g. scsi devices can use
release callback in struct device so they don't need bsg's callback).

With this patch, bsg uses kref for refcounts on bsg devices instead of
get/put_device in fops->open/release. bsg calls put_device and the
caller's release callback (if it was registered) in kref_put's
release.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

97f46ae4

21 Apr, 2008 4 commits

block: fix blk_register_queue() return value · fb199746

Authored by Akinobu Mita on Apr 21, 2008

blk_register_queue() returns -ENXIO when queue->request_fn is NULL.  But there
are some block drivers that call blk_register_queue() via add_disk() with
queue->request_fn == NULL.  (For example, brd, loop)

Although no one checks the return value of blk_register_queue(), this patch makes
it return 0 instead of -ENXIO when queue->request_fn is NULL.

This patch also adds a warning when blk_register_queue() and
blk_unregister_queue() are called with queue == NULL, rather than silently
ignoring the invalid usage.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

fb199746

Kconfig: clean up block/Kconfig help descriptions · ee86418d

Authored by Nick Andrew on Apr 21, 2008

Modify the help descriptions of block/Kconfig for clarity, accuracy and
consistency.

Refactor the BLOCK description a bit.  The wording "This permits ...  to be
removed" isn't quite right; the block layer is removed when the option is
disabled, whereas most descriptions talk about what happens when the option is
enabled.  Reformat the list of what is affected by disabling the block layer.

Add more examples of large block devices to LBD and strive for technical
accuracy; block devices of size _exactly_ 2TB require CONFIG_LBD, not only
"bigger than 2TB".  Also try to say (perhaps not very clearly) that the config
option is only needed when you want to have individual block devices of size
>= 2TB, for example if you had 3 x 1TB disks in your computer you'd have a
total storage size of 3TB but you wouldn't need the option unless you want to
aggregate those disks into a RAID or LVM.

Improve terminology and grammar on BLK_DEV_IO_TRACE.

I also added the boilerplate "If unsure, say N" to most options.

Precisely say "2TB and larger" for LSF.

Indent the help text for BLK_DEV_BSG by 2 spaces in accordance with the
standard.
Signed-off-by: Nick Andrew <nick@nick-andrew.net>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

ee86418d
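
The "exactly 2TB" remark in the commit above can be checked with a little arithmetic. The sketch below assumes, and these are assumptions rather than statements from the commit message, that sector counts are held in 32 bits when the option is disabled, that a sector is 512 bytes, and that "2TB" means 2^41 bytes; `fits_in_32bit_sectors` is a hypothetical helper.

```c
#include <stdint.h>

#define SECTOR_SIZE 512ULL   /* assumed sector size in bytes */

/* Returns 1 if a device of 'bytes' bytes has a sector count that fits in 32 bits. */
static int fits_in_32bit_sectors(uint64_t bytes)
{
    uint64_t sectors = bytes / SECTOR_SIZE;

    return sectors <= UINT32_MAX;
}
```

Under these assumptions a device of exactly 2^41 bytes has 2^32 sectors, one more than a 32-bit count can represent, while a device one sector smaller still fits. That is why "2TB and larger" is the more precise wording than "bigger than 2TB".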

block: move the padding adjustment to blk_rq_map_sg · f18573ab

Authored by FUJITA Tomonori on Apr 11, 2008

blk_rq_map_user adjusts bi_size of the last bio. It breaks the rule
that req->data_len (the true data length) is equal to sum(bio). It
broke the scsi command completion code.

Commit e97a294e was introduced to fix the above issue. However, the
partial completion code doesn't work with it. The commit is also a
layer violation (the scsi mid-layer should not know about the block
layer's padding).

This patch moves the padding adjustment to blk_rq_map_sg (suggested by
James). The padding works like the drain buffer. This patch breaks the
rule that req->data_len is equal to sum(sg); however, the drain buffer
already broke it. So this patch just restores the rule that
req->data_len is equal to sum(bio) without breaking anything new.

Now when a low level driver needs padding, blk_rq_map_user and
blk_rq_map_user_iov guarantee there's enough room for padding.
blk_rq_map_sg can safely extend the last entry of a scatter list.

blk_rq_map_sg must extend the last entry of a scatter list only for a
request that got through bio_copy_user_iov. This patch introduces a
new REQ_COPY_USER flag.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

f18573ab

block: add bio_copy_user_iov support to blk_rq_map_user_iov · afdc1a78

Authored by FUJITA Tomonori on Apr 11, 2008

With this patch, blk_rq_map_user_iov uses bio_copy_user_iov when a low
level driver needs padding or a buffer in sg_iovec isn't aligned. That
is, it uses temporary kernel buffers instead of mapping user pages
directly.

When a LLD needs padding, later blk_rq_map_sg needs to extend the last
entry of a scatter list. bio_copy_user_iov guarantees that there is
enough space for padding by using temporary kernel buffers instead of
user pages.

blk_rq_map_user_iov needs buffers in sg_iovec to be aligned. The
comment in blk_rq_map_user_iov indicates that drivers/scsi/sg.c also
needs buffers in sg_iovec to be aligned. Actually, drivers/scsi/sg.c
works with unaligned buffers in sg_iovec (it always uses temporary
kernel buffers).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

afdc1a78

20 Apr, 2008 1 commit

SCSI: convert struct class_device to struct device · ee959b00

Authored by Tony Jones on Feb 22, 2008

It's big, but there doesn't seem to be a way to split it up smaller...
Signed-off-by: Tony Jones <tonyj@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ee959b00

openeuler / Kernel: synced successfully about 1 year ago