Commits · b3a6ffe16b5cc48abe7db8d04882dc45280eb693 · openeuler / Kernel

Dec 29, 2008 · 40 commits

Committed by Jens Axboe on Dec 12, 2008

We have two separate config entries for large devices/files. One
is CONFIG_LBD, which guards just the devices; the other is CONFIG_LSF,
which handles large files. This doesn't make a lot of sense: you
typically want both or neither. So get rid of CONFIG_LSF and change
the CONFIG_LBD wording to indicate that it covers both.
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

b3a6ffe1
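
As a rough illustration of why one switch is enough, the sketch below models the two types that CONFIG_LBD now controls together on a 32-bit build. It is not kernel code. MODEL_CONFIG_LBD, model_sector_t, model_blkcnt_t and max_addressable_bytes() are hypothetical names, and the sketch assumes 512-byte sectors.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for CONFIG_LBD on a 32-bit build. A single option
 * selects the width for both device offsets and file block counts, so the
 * two types can no longer disagree.
 */
#ifndef MODEL_CONFIG_LBD
#define MODEL_CONFIG_LBD 1
#endif

#if MODEL_CONFIG_LBD
typedef uint64_t model_sector_t;  /* device offsets, in 512-byte sectors */
typedef uint64_t model_blkcnt_t;  /* file sizes, in 512-byte blocks */
#else
typedef uint32_t model_sector_t;  /* like a 32-bit unsigned long */
typedef uint32_t model_blkcnt_t;
#endif

/* Bytes addressable with a sector counter of the given width, saturating at UINT64_MAX. */
static uint64_t max_addressable_bytes(unsigned int sector_bits)
{
    if (sector_bits >= 55)           /* 2^55 * 512 = 2^64 would overflow */
        return UINT64_MAX;
    return ((uint64_t)1 << sector_bits) * 512u;
}
```

With 32-bit counters, max_addressable_bytes(32) is 2^41 bytes (2 TiB). Large devices and large files therefore hit the same ceiling, which is one way to see why you typically want both or neither.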

block: make blk_softirq_init() static · 3c18ce71

Committed by Roel Kluin on Dec 10, 2008

Sparse asked whether these could be static.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: use min_not_zero in blk_queue_stack_limits · 18af8b2c

Committed by FUJITA Tomonori on Dec 4, 2008

Zero is invalid for max_phys_segments, max_hw_segments, and
max_segment_size, so it's better to use min_not_zero() instead of
min(). min() does work, though, because commit 0e435ac2 makes sure
these values are set to non-zero defaults if a queue is initialized
properly.

With this patch, blk_queue_stack_limits() does almost the same thing
that dm's combine_restrictions_low() does, so I think it should be
easy to remove dm's combine_restrictions_low().
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
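
Here is a small sketch of the min_not_zero() semantics described above. It assumes unsigned limits where 0 means "not set". The struct and helper names are hypothetical stand-ins, not the real request_queue fields or the blk_queue_stack_limits() code. The kernel's min_not_zero() is a macro, modeled here as a plain function.

```c
/* Smaller of two limits, treating 0 as "unset" rather than as a real limit. */
static unsigned int min_not_zero_u(unsigned int x, unsigned int y)
{
    if (x == 0)
        return y;
    if (y == 0)
        return x;
    return x < y ? x : y;
}

/* Hypothetical subset of queue limits; field names follow the commit text. */
struct model_limits {
    unsigned int max_phys_segments;
    unsigned int max_hw_segments;
    unsigned int max_segment_size;
};

/* Stack bottom-device limits onto top, keeping the most restrictive non-zero value. */
static void model_stack_limits(struct model_limits *t, const struct model_limits *b)
{
    t->max_phys_segments = min_not_zero_u(t->max_phys_segments, b->max_phys_segments);
    t->max_hw_segments   = min_not_zero_u(t->max_hw_segments, b->max_hw_segments);
    t->max_segment_size  = min_not_zero_u(t->max_segment_size, b->max_segment_size);
}
```

With plain min(), a zero left in either queue would win and stack to an invalid limit of zero. min_not_zero() lets the other queue's real value through instead.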

block: add one-hit cache for disk partition lookup · a6f23657

Committed by Jens Axboe on Oct 24, 2008

disk_map_sector_rcu() returns a partition from a sector offset,
which we use for per-partition IO statistics. The lookup itself
is an O(N) list walk, where N is the number of partitions. This
hurts performance quite a bit, even for low-numbered partitions,
and on higher-numbered partitions it can get pretty bad.

Solve this by adding a one-hit cache for partition lookup.
This makes the lookup O(1) when most IO goes to one partition.
Even for mixed-partition workloads, the amortized cost is close
to O(1), since natural IO batching keeps the one-hit cache valid
for many IOs in a row.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
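
The one-hit cache can be sketched as follows. This is a single-threaded model with hypothetical types. The real disk_map_sector_rcu() works under RCU, and any real cache also has to be invalidated when partitions go away, which this sketch leaves out.

```c
#include <stddef.h>

/* Hypothetical partition descriptor: a contiguous sector range. */
struct model_part {
    unsigned long long start;     /* first sector */
    unsigned long long nr_sects;  /* length in sectors */
};

struct model_disk {
    struct model_part *parts;
    size_t nr_parts;
    struct model_part *last_lookup;  /* one-hit cache; NULL when empty */
    unsigned long scans;             /* counts O(N) walks, for illustration */
};

static int part_contains(const struct model_part *p, unsigned long long sector)
{
    return sector >= p->start && sector - p->start < p->nr_sects;
}

/* Map a sector to its partition, checking the cached hit before scanning. */
static struct model_part *model_map_sector(struct model_disk *d, unsigned long long sector)
{
    struct model_part *p = d->last_lookup;

    if (p && part_contains(p, sector))
        return p;                          /* O(1) fast path */

    d->scans++;
    for (size_t i = 0; i < d->nr_parts; i++) {
        if (part_contains(&d->parts[i], sector)) {
            d->last_lookup = &d->parts[i]; /* remember for the next IO */
            return &d->parts[i];
        }
    }
    return NULL;                           /* sector not in any partition */
}
```

The scans counter shows the effect: after the first lookup, repeated IO to the same partition never touches the linear walk.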

cfq-iosched: remove limit of dispatch depth of max 4 times quantum · 30e0dc28

Committed by Jens Axboe on Oct 20, 2008

This limit basically caps the hardware queue depth at 4*quantum at
any point in time, which is 16 with the default settings. CFQ already
uses other means to shrink the hardware queue when necessary, so
there's really no need for this extra heuristic. Additionally, it
ends up hurting performance in some cases.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

nbd: tell the block layer that it is not a rotational device · 31dcfab0

Committed by Jens Axboe on Oct 31, 2008

Then we can get rid of that manual elevator type fiddling.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: get rid of elevator_t typedef · b374d18a

Committed by Jens Axboe on Oct 31, 2008

Just use struct elevator_queue everywhere instead.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

aio: make the lookup_ioctx() lockless · abf137dd

Committed by Jens Axboe on Dec 9, 2008

The mm->ioctx_list is currently protected by a reader-writer lock,
so we always grab that lock on the read side for doing ioctx
lookups. As the workload is extremely reader-biased, turn this into
an RCU hlist so we can make lookup_ioctx() lockless. Get rid of
the rwlock and use a spinlock for providing update-side exclusion.

There's usually only one entry on this list, so it doesn't make
sense to look into fancier data structures.
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

bio: add support for inlining a number of bio_vecs inside the bio · 392ddc32

Committed by Jens Axboe on Dec 23, 2008

When we allocate a bio for IO, we actually do two allocations:
one for the bio itself, and one for the bi_io_vec array that holds
the actual pages we are interested in.

This feature inlines a configurable number of bio_vecs inside the
bio itself, so we eliminate the bio_vec array allocation for IOs up
to a certain size. It defaults to 4 vecs, which is typically 16k
of IO.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
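
This is a simplified model of the inline bio_vec idea. It uses a C99 flexible array member at the end of the struct, so the bio and a few vecs come from one allocation. The types, MODEL_INLINE_VECS and the helper functions are hypothetical. The 16k figure assumes 4 KiB pages (4 × 4 KiB).

```c
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-ins for struct bio_vec and struct bio. */
struct model_bio_vec {
    void *page;
    unsigned int len;
    unsigned int offset;
};

#define MODEL_INLINE_VECS 4   /* the commit's default: 4 vecs */

struct model_bio {
    unsigned short max_vecs;
    int vecs_inline;                      /* 1 if io_vec points at inline_vecs */
    struct model_bio_vec *io_vec;         /* inline storage or a separate array */
    struct model_bio_vec inline_vecs[];   /* C99 flexible array member */
};

/* Allocate a bio; small requests use the inline vecs and need only one allocation. */
static struct model_bio *model_bio_alloc(unsigned short nr_vecs)
{
    struct model_bio *bio =
        malloc(sizeof(*bio) + MODEL_INLINE_VECS * sizeof(struct model_bio_vec));

    if (!bio)
        return NULL;
    bio->max_vecs = nr_vecs;
    if (nr_vecs <= MODEL_INLINE_VECS) {
        bio->io_vec = bio->inline_vecs;   /* one allocation total */
        bio->vecs_inline = 1;
    } else {
        /* larger IO still needs a second allocation for the vec array */
        bio->io_vec = calloc(nr_vecs, sizeof(struct model_bio_vec));
        bio->vecs_inline = 0;
        if (!bio->io_vec) {
            free(bio);
            return NULL;
        }
    }
    return bio;
}

static void model_bio_free(struct model_bio *bio)
{
    if (!bio->vecs_inline)
        free(bio->io_vec);
    free(bio);
}
```

This sketch always reserves the inline space, even when a larger request falls back to a separate array. That keeps every bio the same size, which suits a fixed-size slab.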

bio: allow individual slabs in the bio_set · bb799ca0

Committed by Jens Axboe on Dec 10, 2008

Instead of having a global bio slab cache, add a reference to one
in each bio_set that is created. This allows for personalized slabs
in each bio_set, so that they can have bios of different sizes.

This means we can personalize the bios we return. File systems may
want to embed the bio inside another structure, to avoid allocating
more items (and stuffing them in ->bi_private) after they get a bio.
Or we may want to embed a number of bio_vecs directly at the end
of a bio, to avoid doing two allocations to return a bio. Both are
now possible.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
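
The "embed the bio inside another structure" case relies on the same pointer arithmetic as the kernel's container_of(). In this hedged sketch, struct fs_io and the helpers are hypothetical, and calloc() stands in for a bio_set slab sized for the larger structure.

```c
#include <stddef.h>
#include <stdlib.h>

/* Same idea as the kernel's container_of(): step back from a member to its enclosing struct. */
#define model_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct model_bio {                  /* hypothetical, trimmed bio */
    unsigned long long sector;
    void *private_data;             /* what ->bi_private would otherwise carry */
};

/* Hypothetical filesystem IO descriptor with the bio embedded in it. */
struct fs_io {
    int fs_cookie;                  /* per-IO filesystem state */
    struct model_bio bio;           /* embedded: allocated together with fs_io */
};

/* One allocation yields both the fs state and the bio handed to the block layer. */
static struct model_bio *fs_alloc_bio(int cookie)
{
    struct fs_io *io = calloc(1, sizeof(*io));

    if (!io)
        return NULL;
    io->fs_cookie = cookie;
    return &io->bio;
}

/* In a completion handler, recover the fs state from the bio without ->bi_private. */
static struct fs_io *fs_io_from_bio(struct model_bio *bio)
{
    return model_container_of(bio, struct fs_io, bio);
}
```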

bio: move the slab pointer inside the bio_set · 1b434498

Committed by Jens Axboe on Oct 22, 2008

In preparation for adding differently sized bios.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

bio: only mempool back the largest bio_vec slab cache · 7ff9345f

Committed by Jens Axboe on Dec 11, 2008

We only very rarely need the mempool backing, so it makes sense to
get rid of all but one of the mempools in a bio_set. Keep only the
mempool for the largest bio_vec count, so we can always honor the
largest allocation, and "upgrade" callers whose allocations fail to
that size.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
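
This sketch shows the size-class choice and the "upgrade" fallback. The class sizes are an assumption: 1, 4, 16, 64, 128 and 256 vecs, with 256 playing the role of BIO_MAX_PAGES. The slab_ok argument simply simulates whether the ordinary slab allocation succeeded. None of these names are the kernel's.

```c
#include <stddef.h>

/* Assumed size classes for bio_vec arrays; the last one is the only mempool-backed class. */
static const unsigned int bvec_classes[] = { 1, 4, 16, 64, 128, 256 };
#define NR_BVEC_CLASSES (sizeof(bvec_classes) / sizeof(bvec_classes[0]))

/* Pick the smallest class that fits nr_vecs; returns -1 if the request is too large. */
static int bvec_class_index(unsigned int nr_vecs)
{
    for (size_t i = 0; i < NR_BVEC_CLASSES; i++)
        if (nr_vecs <= bvec_classes[i])
            return (int)i;
    return -1;
}

/*
 * Model of the allocation decision: try the matching slab first; if that
 * fails, "upgrade" to the largest class, which has the mempool reserve.
 */
static int bvec_alloc_class(unsigned int nr_vecs, int slab_ok)
{
    int idx = bvec_class_index(nr_vecs);

    if (idx < 0)
        return -1;
    if (slab_ok)
        return idx;
    return (int)NR_BVEC_CLASSES - 1;   /* fall back to the mempool-backed class */
}
```

A 5-vec request normally uses the 16-vec class. If that allocation fails, the request is served from the largest, mempool-backed class instead, which can satisfy any request.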

block: don't use plugging on SSD devices · a31a9738

Committed by Jens Axboe on Oct 17, 2008

We just want to hand the first bits of IO to the device as fast
as possible. This gains a few percent in IOPS.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: fix empty barrier on write-through w/ ordered tag · a185eb4b

Committed by Tejun Heo on Nov 28, 2008

An empty barrier on a write-through (or cacheless) device with
ordered tags has no command to execute, and without any command to
execute, the ordered tag is never issued to the device, so the
ordering is never achieved.  Force draining in such cases.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: simplify empty barrier implementation · 58eea927

Committed by Tejun Heo on Nov 28, 2008

An empty barrier required special handling in __elv_next_request()
to complete it without letting the low-level driver see it.

With previous changes, the barrier code is now flexible enough to
skip the BAR step using the same barrier sequence selection
mechanism.  Drop the special handling and mask off the unneeded
steps in q->ordered from start_ordered().

Remove the blk_empty_barrier() test, which now has no users.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
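
The mask handling from this commit and its neighboring barrier commits can be modeled with plain bit operations. The constant names imitate QUEUE_ORDERED_BY_* and QUEUE_ORDERED_DO_*, but the values are made up. Dropping POSTFLUSH together with BAR is this sketch's own reasoning (with no data written there is nothing to flush afterwards), not a quote of start_ordered().

```c
/* Illustrative bit values; names mirror the commits, values are made up for this sketch. */
enum {
    MODEL_ORDERED_BY_DRAIN     = 0x01,
    MODEL_ORDERED_BY_TAG       = 0x02,
    MODEL_ORDERED_DO_PREFLUSH  = 0x10,
    MODEL_ORDERED_DO_BAR       = 0x20,
    MODEL_ORDERED_DO_POSTFLUSH = 0x40,
    MODEL_ORDERED_DO_FUA       = 0x80,
};

#define MODEL_ORDERED_DO_MASK (MODEL_ORDERED_DO_PREFLUSH | MODEL_ORDERED_DO_BAR | \
                               MODEL_ORDERED_DO_POSTFLUSH | MODEL_ORDERED_DO_FUA)

/* Adjust an ordering mode for an empty barrier by masking off steps that have nothing to do. */
static unsigned int model_adjust_for_empty(unsigned int ordered)
{
    /* No data means no BAR write, and with nothing written there is nothing to post-flush. */
    ordered &= ~(unsigned int)(MODEL_ORDERED_DO_BAR | MODEL_ORDERED_DO_POSTFLUSH);

    /* Tag ordering needs a command to carry the tag; with none left, drain instead. */
    if ((ordered & MODEL_ORDERED_BY_TAG) && !(ordered & MODEL_ORDERED_DO_PREFLUSH)) {
        ordered &= ~(unsigned int)MODEL_ORDERED_BY_TAG;
        ordered |= MODEL_ORDERED_BY_DRAIN;
    }
    return ordered;
}

/* True when every action was skipped, so the sequence must complete immediately. */
static int model_nothing_to_do(unsigned int ordered)
{
    return (ordered & MODEL_ORDERED_DO_MASK) == 0;
}
```

For a write-through queue ordered by tag, an empty barrier ends up as a drain with no actions left. The whole sequence can then complete at once, which is the case the force-drain and completion-robustness commits deal with.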

block: make barrier completion more robust · 8f11b3e9

Committed by Tejun Heo on Nov 28, 2008

Barrier completion made the following assumptions.

* start_ordered() couldn't finish the whole sequence properly.  If all
  actions are to be skipped, q->ordseq is set correctly but the actual
  completion is never triggered, so the barrier request hangs.

* Drain completion in elv_complete_request() assumed that there's
  always at least one request in the queue when drain completes.

Both assumptions hold today, but they need to be removed to improve
the empty barrier implementation.  This patch makes the following
changes.

* Make start_ordered() use blk_ordered_complete_seq() to mark skipped
  steps complete and notify __elv_next_request() that it should fetch
  the next request if the whole barrier has completed inside
  start_ordered().

* Make the drain completion path in elv_complete_request() check
  whether the queue is empty.  An empty queue also indicates drain
  completion.

* While at it, convert the 0/1 return values of blk_do_ordered() to
  false/true.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: make every barrier action optional · f671620e

Committed by Tejun Heo on Nov 28, 2008

In all barrier sequences, the barrier write itself was always assumed
to be issued and thus didn't have a corresponding control flag.  This
patch adds QUEUE_ORDERED_DO_BAR and unifies action mask handling in
start_ordered() such that any barrier action can be skipped.

This patch doesn't introduce any visible behavior changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: remove duplicate or unused barrier/discard error paths · a7384677

Committed by Tejun Heo on Nov 28, 2008

* Because barrier mode can be changed dynamically, whether barriers
  are supported or not can be determined only when actually issuing
  the barrier, and there is no point in checking it earlier.  Drop the
  barrier support check in generic_make_request() and __make_request(),
  and update the comment around the support check in blk_do_ordered().

* There is no reason to check discard support in both
  generic_make_request() and __make_request().  Drop the check in
  __make_request().  While at it, move the error action block to the
  end of the function and add unlikely() to the q existence test.

* A barrier request, be it empty or not, is never passed to the
  low-level driver, so it's meaningless to try to copy req->sector
  back to bio->bi_sector on error.  In addition, the notion of a
  failed sector doesn't make any sense for an empty barrier to begin
  with.  Drop the code block from __end_that_request_first().
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: reorganize QUEUE_ORDERED_* constants · 313e4299

Committed by Tejun Heo on Nov 28, 2008

Separate out ordering type (drain,) and action masks (preflush,
postflush, fua) from visible ordering mode selectors
(QUEUE_ORDERED_*).  Ordering types are now named QUEUE_ORDERED_BY_*
while action masks are named QUEUE_ORDERED_DO_*.

This change is necessary to add QUEUE_ORDERED_DO_BAR and make it
optional, in order to improve the empty barrier implementation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: reorder struct bio to remove padding on 64bit · ba744d5e

Committed by Richard Kennedy on Dec 3, 2008

Remove 8 bytes of padding from struct bio, which also removes 16
bytes from struct bio_pair, bringing it to 248 bytes.  bio_pair then
fits into one fewer cache line and into a smaller slab.
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
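
The padding comes from alignment rules. This sketch assumes a common LP64 ABI (8-byte pointers, 4-byte ints, as on x86-64). It uses two hypothetical structs rather than the real struct bio layout, to show how member order alone changes the size.

```c
#include <stddef.h>

/* Hypothetical structs (not the real struct bio): same members, different order. */
struct padded {
    void *a;
    int   x;      /* 4 bytes, then 4 bytes of padding on LP64 to align b */
    void *b;
    int   y;      /* 4 bytes, then 4 bytes of tail padding */
};

struct packed_by_order {
    void *a;
    void *b;
    int   x;      /* the two ints now share one 8-byte slot */
    int   y;
};
```

Placing the two ints next to each other saves 8 bytes per object without changing any member. Tools such as pahole report these holes in real structures.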

block: use cancel_work_sync() instead of kblockd_flush_work() · 64d01dc9

Committed by Cheng Renquan on Dec 3, 2008

After many improvements to kblockd_flush_work(), it is now identical
to cancel_work_sync(), so call cancel_work_sync() directly instead.

The only difference is that cancel_work_sync() is a GPL-only symbol,
so non-GPL modules can no longer use it.
Signed-off-by: Cheng Renquan <crquan@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: Suppress Buffer I/O errors when SCSI REQ_QUIET flag set · 08bafc03

Committed by Keith Mannthey on Nov 25, 2008

Allow the SCSI request REQ_QUIET flag to be propagated to the buffer
file system layer. The basic idea is to pass the flag from the SCSI
request to the bio (block IO) and then to the buffer layer.  The buffer
layer can then suppress needless printks.

This patch declutters the kernel log by removing the 40-50 (per LUN)
buffer I/O error messages seen during boot in my multipath setup.
Without this patch, there is a good chance that real errors will be
missed in the "noise" in the logs.

During boot I see blocks of messages like
"
__ratelimit: 211 callbacks suppressed
Buffer I/O error on device sdm, logical block 5242879
Buffer I/O error on device sdm, logical block 5242879
Buffer I/O error on device sdm, logical block 5242847
Buffer I/O error on device sdm, logical block 1
Buffer I/O error on device sdm, logical block 5242878
Buffer I/O error on device sdm, logical block 5242879
Buffer I/O error on device sdm, logical block 5242879
Buffer I/O error on device sdm, logical block 5242879
Buffer I/O error on device sdm, logical block 5242879
Buffer I/O error on device sdm, logical block 5242872
"
in my logs.

My disk environment is multipath Fibre Channel using the SCSI_DH_RDAC
code and multipathd.  This topology includes an "active" and a "ghost"
path for each LUN. IOs to the "ghost" path will never complete, and the
SCSI layer, via the SCSI device handler rdac code, quickly returns the
IOs to these paths and sets the REQ_QUIET SCSI flag to suppress the
SCSI layer messages.

I want to extend the QUIET behavior to the buffer file system layer
so that it deals with these errors as well. I have been running this
patch for a while now on several boxes without issue.  A few runs of
bonnie++ show no noticeable difference in performance in my setup.

Thanks to John Stultz for the quiet_error finalization.
Submitted-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: don't take lock on changing ra_pages · 7c239517

Committed by Wu Fengguang on Nov 25, 2008

There's no need to take queue_lock or kernel_lock when modifying
bdi->ra_pages, so remove them. Also remove an out-of-date comment
for queue_max_sectors_store().
Signed-off-by: Wu Fengguang <wfg@linux.intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

Documentation: remove reference to ll_rw_blk.c and moved drivers/block/elevator.c · 42364690

Committed by Nikanth Karthikesan on Nov 24, 2008

drivers/block/ll_rw_blk.c has been split up and reorganized into the
block/ directory, and drivers/block/elevator.c has also been moved to
block/. Update Documentation/block/biodoc.txt accordingly.
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block/blk-tag.c: cleanup kernel-doc · c6a06f70

Committed by Qinghuang Feng on Nov 24, 2008

There is no argument named @tags in blk_init_tags(), so remove its
comment.
Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

cciss: switch to using hlist for command list management · 8a3173de

Committed by Jens Axboe on Nov 20, 2008

This both cleans up the code and helps detect the spurious case of
an attempt to remove a command from a queue it doesn't belong to.
Acked-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
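
The detection works because an hlist node records where it is linked from. This is a minimal userspace model of the hlist layout: a next pointer plus a pprev pointer-to-pointer. The names are hypothetical. The sketch only catches removal of a node that is not currently linked anywhere; it cannot tell which list a linked node belongs to.

```c
#include <stddef.h>

/* Minimal userspace re-creation of the kernel's hlist shape, for illustration only. */
struct hnode {
    struct hnode *next;
    struct hnode **pprev;   /* address of the pointer that points at us; NULL if unlinked */
};

struct hhead {
    struct hnode *first;
};

static void hnode_init(struct hnode *n)
{
    n->next = NULL;
    n->pprev = NULL;
}

static int hnode_unhashed(const struct hnode *n)
{
    return n->pprev == NULL;
}

static void hlist_add_head_model(struct hnode *n, struct hhead *h)
{
    n->next = h->first;
    if (h->first)
        h->first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Unlink n and reinitialize it; returns -1 if n was not on any list (a spurious removal). */
static int hlist_del_init_model(struct hnode *n)
{
    if (hnode_unhashed(n))
        return -1;
    *n->pprev = n->next;
    if (n->next)
        n->next->pprev = n->pprev;
    hnode_init(n);
    return 0;
}
```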

Do not free io context when taking recursive faults in do_exit · 7c0990c7

Committed by Nikanth Karthikesan on Nov 19, 2008

When taking recursive faults in do_exit(), exit_io_context() is called
if the io_context is not NULL, but it might decrement the refcount
more than once. It is better to leave this task alone.
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

cdrom: reduce stack usage of mmc_ioctl_dvd_read_struct · d194139c

Committed by Marcin Slusarz on Nov 16, 2008

1. kmalloc 192 bytes in dvd_read_bca (which is inlined into dvd_read_struct)
2. Pass struct packet_command to all dvd_read_* functions.

Checkstack output:
Before: mmc_ioctl_dvd_read_struct:         280
After:  mmc_ioctl_dvd_read_struct:         56
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

cdrom: split mmc_ioctl to lower stack usage · 3147c531

Committed by Marcin Slusarz on Nov 16, 2008

Checkstack output:

Before:
mmc_ioctl:                  584

After:
mmc_ioctl_dvd_read_struct:  280
mmc_ioctl_cdrom_subchannel: 152
mmc_ioctl_cdrom_read_data:  120
mmc_ioctl_cdrom_volume:     104
mmc_ioctl_cdrom_read_audio: 104
(mmc_ioctl is inlined into cdrom_ioctl - 104 bytes)
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

scsi-ioctl: use clock_t <> jiffies · 2b91bafc

Committed by Milton Miller on Nov 17, 2008

Convert the timeout ioctl scaling to use the clock_t functions,
which are much more accurate with some USER_HZ vs HZ combinations.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
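
The accuracy problem is easiest to see with concrete numbers. Assume a kernel built with HZ=250 and USER_HZ=100 (example values, not taken from the commit). Any conversion that first reduces HZ/USER_HZ to an integer then divides by 2 instead of 2.5. The functions below are hypothetical models of the arithmetic, not the kernel's jiffies_to_clock_t()/clock_t_to_jiffies().

```c
/* Example configuration (an assumption): HZ=250 kernel ticks, USER_HZ=100 user ticks. */
#define MODEL_HZ      250UL
#define MODEL_USER_HZ 100UL

/* Naive scaling: HZ / USER_HZ truncates 2.5 to 2, so results are off by 25%. */
static unsigned long naive_jiffies_to_clock_t(unsigned long j)
{
    return j / (MODEL_HZ / MODEL_USER_HZ);
}

/* Scale by the full ratio; widen to 64 bits first so 32-bit builds don't overflow. */
static unsigned long model_jiffies_to_clock_t(unsigned long j)
{
    return (unsigned long)((unsigned long long)j * MODEL_USER_HZ / MODEL_HZ);
}

static unsigned long model_clock_t_to_jiffies(unsigned long t)
{
    return (unsigned long)((unsigned long long)t * MODEL_HZ / MODEL_USER_HZ);
}
```

1000 jiffies at HZ=250 is 4 seconds, i.e. 400 USER_HZ ticks; the truncated ratio reports 500.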

block: leave the request timeout timer running even on an empty list · 70ed28b9

Committed by Jens Axboe on Nov 19, 2008

Sync IO is often issued serially. This means we'll be touching the
queue timer for every IO, as opposed to only occasionally as we do
for queued IO. Instead of deleting the timer when the last request
is removed, just let it continue running. If a new request comes in
soon, we then don't have to re-add the timer. If no new requests
arrive, the timer will expire later without side effects.

This improves high-IOPS sync IO by ~1%.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: add comment in blk_rq_timed_out() about why next can not be 0 · 65d3618c

Committed by Jens Axboe on Oct 30, 2008

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: optimizations in blk_rq_timed_out_timer() · 565e411d

Committed by malahal@us.ibm.com on Oct 30, 2008

rq->deadline can no longer be zero while the request is on the
timeout_list, so there is no need for next_set. There is also no
need to access a request's deadline field if blk_rq_timed_out() is
called on it.
Signed-off-by: Malahal Naineni <malahal@us.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
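
This sketch shows the timer scan these commits describe. Because a queued request's deadline is never zero, next == 0 can mean "nothing pending" without a separate next_set flag. The comparisons mimic the kernel's wrap-safe time_after() style, but the function and its array-based request list are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Wrap-safe comparisons in the style of the kernel's time_after()/time_after_eq(). */
static bool model_time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

static bool model_time_after_eq(unsigned long a, unsigned long b)
{
    return (long)(a - b) >= 0;
}

/*
 * One pass of a timeout-timer handler: count expired requests and return the
 * earliest deadline still pending, or 0 if none. Deadlines are assumed non-zero,
 * so next == 0 doubles as "not set".
 */
static unsigned long model_scan_timeouts(const unsigned long *deadline, size_t n,
                                         unsigned long now, size_t *expired)
{
    unsigned long next = 0;

    *expired = 0;
    for (size_t i = 0; i < n; i++) {
        if (model_time_after_eq(now, deadline[i])) {
            (*expired)++;              /* would be handed to the timeout handler */
        } else if (!next || model_time_after(next, deadline[i])) {
            next = deadline[i];        /* earliest pending deadline so far */
        }
    }
    return next;                       /* caller re-arms the timer only if non-zero */
}
```

Called with an empty list (n == 0), the scan simply returns 0 and nothing is re-armed. This matches the "expire without side effect" behavior that comes from leaving the timer running.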

xen-blkfront: set queue paravirt flag · 66d352e1

Committed by Fernando Luis Vázquez Cao on Oct 27, 2008

Xen's blkfront sets noop as the default I/O scheduler at initialization
time to avoid elevator overheads such as idling, but with the advent of
basic disk profiling capabilities this is not necessary anymore. We
should just tell the block layer that we are a paravirt front-end driver
and the elevator will automatically make the necessary adjustments.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

virtio_blk: set queue paravirt flag · 7d116b62

Committed by Fernando Luis Vázquez Cao on Oct 27, 2008

As a paravirt front-end driver, virtio_blk is not a rotational device, so
we want to avoid idling in AS/CFQ. Tell the block layer about this.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: add queue flag for paravirt frontend drivers · 88e740f1

Committed by Fernando Luis Vázquez Cao on Oct 27, 2008

As is the case with SSD devices, we do not want to idle in AS/CFQ when
the block device is a paravirt front-end driver. This patch adds a flag
(QUEUE_FLAG_VIRT) which should be used by front-end drivers such as
virtio_blk and xen-blkfront to indicate a paravirtualized device.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
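
Here is a hedged sketch of how an elevator might consult such a flag. The bit values and model_should_idle() are hypothetical. Only the idea comes from these commits: skip idling for non-rotational or paravirt queues.

```c
#include <stdbool.h>

/* Illustrative flag bits; the names echo this commit's QUEUE_FLAG_VIRT, the values are made up. */
enum {
    MODEL_QUEUE_NONROT = 1u << 0,  /* SSD-like: no seek penalty */
    MODEL_QUEUE_VIRT   = 1u << 1,  /* paravirt front-end such as virtio_blk or xen-blkfront */
};

/*
 * Should an idling scheduler (AS/CFQ-style) wait for more IO from the same
 * process? Idling only pays off when seeks are expensive, so skip it for
 * both non-rotational and paravirt queues.
 */
static bool model_should_idle(unsigned int queue_flags)
{
    return !(queue_flags & (MODEL_QUEUE_NONROT | MODEL_QUEUE_VIRT));
}
```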

Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc · 3c92ec8a

Committed by Linus Torvalds on Dec 28, 2008

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (144 commits)
  powerpc/44x: Support 16K/64K base page sizes on 44x
  powerpc: Force memory size to be a multiple of PAGE_SIZE
  powerpc/32: Wire up the trampoline code for kdump
  powerpc/32: Add the ability for a classic ppc kernel to be loaded at 32M
  powerpc/32: Allow __ioremap on RAM addresses for kdump kernel
  powerpc/32: Setup OF properties for kdump
  powerpc/32/kdump: Implement crash_setup_regs() using ppc_save_regs()
  powerpc: Prepare xmon_save_regs for use with kdump
  powerpc: Remove default kexec/crash_kernel ops assignments
  powerpc: Make default kexec/crash_kernel ops implicit
  powerpc: Setup OF properties for ppc32 kexec
  powerpc/pseries: Fix cpu hotplug
  powerpc: Fix KVM build on ppc440
  powerpc/cell: add QPACE as a separate Cell platform
  powerpc/cell: fix build breakage with CONFIG_SPUFS disabled
  powerpc/mpc5200: fix error paths in PSC UART probe function
  powerpc/mpc5200: add rts/cts handling in PSC UART driver
  powerpc/mpc5200: Make PSC UART driver update serial errors counters
  powerpc/mpc5200: Remove obsolete code from mpc5200 MDIO driver
  powerpc/mpc5200: Add MDMA/UDMA support to MPC5200 ATA driver
  ...

Fix trivial conflict in drivers/char/Makefile as per Paul's directions

net: ehea NAPI interface cleanup fix · c4c9f018

Committed by Stephen Rothwell on Dec 29, 2008

Commit 908a7a16 ("net: Remove unused
netdev arg from some NAPI interfaces") missed two spots.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cifs: update for new IP4/6 address printing · bf66542b

Committed by Stephen Rothwell on Dec 3, 2008

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'for-linus' of... · d05a788f

Committed by Linus Torvalds on Dec 28, 2008

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
  smackfs: check for allocation failures in smk_set_access()