1. May 24, 2007 (1 commit)
  2. May 16, 2007 (1 commit)
  3. May 11, 2007 (1 commit)
    • When stacked block devices are in-use (e.g. md or dm), the recursive calls · d89d8796
      Committed by Neil Brown
      to generic_make_request can use up a lot of space, and we would rather they
      didn't.
      
      As generic_make_request is a void function, and as it is generally not
      expected that it will have any effect immediately, it is safe to delay any
      call to generic_make_request until there is sufficient stack space
      available.
      
      As ->bi_next is reserved for the driver to use, it can have no valid
      value when generic_make_request is called, and as __make_request
      implicitly assumes it will be NULL (in the ELEVATOR_BACK_MERGE branch
      of the switch) we can be certain that all callers set it to NULL.  We
      can therefore safely use
      bi_next to link pending requests together, providing we clear it before
      making the real call.
      
      So, we choose to allow each thread to only be active in one
      generic_make_request at a time.  If a subsequent (recursive) call is made,
      the bio is linked into a per-thread list, and is handled when the active
      call completes.
      
      As the list of pending bios is per-thread, there are no locking issues to
      worry about.
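
      A minimal sketch of that mechanism, in the spirit of the description
      above (treat the task_struct fields bio_list/bio_tail and the
      __generic_make_request helper as illustrative names, not necessarily
      the exact patch):

      	void generic_make_request(struct bio *bio)
      	{
      		if (current->bio_tail) {
      			/* A make_request_fn is already active on this
      			 * thread: park the bio on the per-task list. */
      			*(current->bio_tail) = bio;
      			bio->bi_next = NULL;
      			current->bio_tail = &bio->bi_next;
      			return;
      		}
      		/* Top-level call: submit this bio, then drain whatever
      		 * the recursive calls queued up, one at a time. */
      		do {
      			current->bio_list = bio->bi_next;
      			if (bio->bi_next == NULL)
      				current->bio_tail = &current->bio_list;
      			else
      				bio->bi_next = NULL;	/* ours; clear it */
      			__generic_make_request(bio);	/* real submission */
      			bio = current->bio_list;
      		} while (bio);
      		current->bio_tail = NULL;	/* no longer active */
      	}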
      
      I say above that it is "safe to delay any call...".  There are, however,
      some behaviours of a make_request_fn which would make it unsafe.  These
      include any behaviour that assumes anything will have changed after a
      recursive call to generic_make_request.
      
      These could include:
       - waiting for that call to finish and call its bi_end_io function.
         md used to sometimes do this (marking the superblock dirty before
         completing a write) but doesn't any more
       - inspecting the bio for fields that generic_make_request might
         change, such as bi_sector or bi_bdev.  It is hard to see a good
         reason for this, and I don't think anyone actually does it.
       - inspecting the queue to see if, e.g. it is 'full' yet.  Again, I
         think this is very unlikely to be useful, or to be done.
      Signed-off-by: Neil Brown <neilb@suse.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: <dm-devel@redhat.com>
      
      Alasdair G Kergon <agk@redhat.com> said:
      
       I can see nothing wrong with this in principle.
      
       For device-mapper at the moment though it's essential that, while the bio
       mappings may now get delayed, they still get processed in exactly
       the same order as they were passed to generic_make_request().
      
       My main concern is whether the timing changes implicit in this patch
       will make the rare data-corrupting races in the existing snapshot code
       more likely. (I'm working on a fix for these races, but the unfinished
       patch is already several hundred lines long.)
      
       It would be helpful if some people on this mailing list would test
       this patch in various scenarios and report back.
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  4. May 10, 2007 (4 commits)
  5. May 09, 2007 (3 commits)
  6. May 08, 2007 (2 commits)
  7. May 03, 2007 (1 commit)
  8. April 30, 2007 (19 commits)
  9. April 25, 2007 (1 commit)
    • cfq-iosched: fix alias + front merge bug · 5044eed4
      Committed by Jens Axboe
      There's a really rare and obscure bug in CFQ that causes a crash in
      cfq_dispatch_insert() due to rq == NULL.  One example of the resulting
      oops is seen here:
      
      	http://lkml.org/lkml/2007/4/15/41
      
      Neil correctly diagnosed how this can happen: if two concurrent
      requests arrive with the exact same sector number (due to direct I/O,
      or aliasing between MD and raw device access), the alias handling
      will add the request to the sortlist, but next_rq remains NULL.
      
      Read the more complete analysis at:
      
      	http://lkml.org/lkml/2007/4/25/57
      
      This looks like it requires md to trigger, even though it should
      potentially be possible to do with O_DIRECT (at least if you edit the
      kernel and doctor some of the unplug calls).
      
      The fix is to move the ->next_rq update to where we add a request to
      the rbtree.  That removes the possibility of a request existing in the
      rbtree without ->next_rq being correctly updated.
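
      An illustrative sketch of the fix's shape (helper names follow
      cfq-iosched.c from memory, not the exact diff): recompute ->next_rq
      at the single point where a request enters the sort tree, so an alias
      insertion can never leave it NULL.

      	static void cfq_add_rq_rb(struct cfq_queue *cfqq, struct request *rq)
      	{
      		/* insert into the per-queue sort tree... */
      		elv_rb_add(&cfqq->sort_list, rq);
      		/* ...and keep the dispatch hint coherent right here */
      		cfqq->next_rq = cfq_choose_req(cfqq->cfqd, cfqq->next_rq, rq);
      	}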
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. April 21, 2007 (1 commit)
    • cfq-iosched: fix sequential write regression · a9938006
      Committed by Jens Axboe
      We have a 10-15% performance regression for sequential writes on TCQ/NCQ
      enabled drives in 2.6.21-rcX after the CFQ update went in.  It has been
      reported by Valerie Clement <valerie.clement@bull.net> and the Intel
      testing folks.  The regression is due to CFQ's now more aggressive
      queue control, which limits the depth available to the device.
      
      This patch fixes that regression by allowing a greater depth when only
      one queue is busy.  It has been tested not to impact sync-vs-async
      workloads too much - we still do a lot better than 2.6.20.
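
      A sketch of the idea only (the helper and the relaxation factor are
      assumptions for illustration, not the actual patch): with a single
      busy queue there is nobody to protect from starvation, so the
      dispatch depth may grow toward the drive's TCQ/NCQ capability.

      	static int cfq_max_dispatch(struct cfq_data *cfqd)
      	{
      		if (cfqd->busy_queues == 1)
      			/* assumed relaxation: let the device queue fill */
      			return 4 * cfqd->cfq_quantum;
      		return cfqd->cfq_quantum;	/* usual per-round limit */
      	}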
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. April 18, 2007 (1 commit)
    • [SCSI] sg: cap reserved_size values at max_sectors · 44ec9542
      Committed by Alan Stern
      This patch (as857) modifies the SG_GET_RESERVED_SIZE and
      SG_SET_RESERVED_SIZE ioctls in the sg driver, capping the values at
      the device's request_queue's max_sectors value.  This will permit
      cdrecord to obtain a legal value for the maximum transfer length,
      fixing Bugzilla #7026.
      
      The patch also caps the initial reserved_size value.  There's no
      reason to have a reserved buffer larger than max_sectors, since it
      would be impossible to use the extra space.
      
      The corresponding ioctls in the block layer are modified similarly,
      and the initial value for the reserved_size is set as large as
      possible.  This will effectively make it default to max_sectors.
      Note that the actual value is meaningless anyway, since block devices
      don't have a reserved buffer.
      
      Finally, the BLKSECTGET ioctl is added to sg, so that there will be a
      uniform way for users to determine the actual max_sectors value for
      any raw SCSI transport.
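
      A minimal userspace sketch of that uniform query, shown against a
      block device node (assuming, as the block layer's BLKSECTGET does,
      that the limit is reported in 512-byte sectors through an unsigned
      short; the device path is just an example):

      	#include <fcntl.h>
      	#include <stdio.h>
      	#include <sys/ioctl.h>
      	#include <linux/fs.h>		/* BLKSECTGET */

      	int main(int argc, char **argv)
      	{
      		unsigned short max_sectors = 0;
      		int fd = open(argc > 1 ? argv[1] : "/dev/sda", O_RDONLY);

      		if (fd < 0 || ioctl(fd, BLKSECTGET, &max_sectors) != 0) {
      			perror("BLKSECTGET");
      			return 1;
      		}
      		printf("max transfer: %u sectors (%u bytes)\n",
      		       max_sectors, max_sectors * 512u);
      		return 0;
      	}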
      Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
      Acked-by: Jens Axboe <jens.axboe@oracle.com>
      Acked-by: Douglas Gilbert <dougg@torque.net>
      Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
  12. April 05, 2007 (1 commit)
  13. March 27, 2007 (2 commits)
    • make elv_register() output atomic · 1ffb96c5
      Committed by Thibaut VARENE
      Booting 2.6.21-rc3-g45592145, I noticed the following in the boot log
      on one of my machines:
      
      io scheduler noop registered<6>Time: jiffies clocksource has been installed.
      
      io scheduler deadline registered (default)
      
      Looking at block/elevator.c, it appears that elv_register() uses two
      consecutive printks in a non-atomic way, leading to the above glitch.
      The attached trivial patch fixes this issue by using a single printk.
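
      The shape of the fix, as a sketch (is_default stands in for the
      existing default-scheduler check; not the exact patch): build the
      whole line and emit it with one printk() call, so another CPU's
      message cannot land between the scheduler name and the "(default)"
      tag.

      	printk(KERN_INFO "io scheduler %s registered%s\n",
      	       e->elevator_name, is_default ? " (default)" : "");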
      Signed-off-by: Thibaut VARENE <varenet@parisc-linux.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: blk_max_pfn is sometimes wrong · f772b3d9
      Committed by Vasily Tarasov
      There is a small problem in the handling of page bouncing.

      At the moment blk_max_pfn equals max_pfn, which is in fact not the
      maximum possible _number_ of a page frame, but the _count_ of page
      frames.  For example, on a 32-bit x86 node with 4GB RAM,
      max_pfn = 0x100000, while the highest valid frame number is 0xFFFFF.
      
      The request_queue structure has a member q->bounce_pfn, and the queue
      needs bounce pages for pages _above_ this limit.  This is handled by
      blk_queue_bounce(), which performs the following check:
      
      	if (q->bounce_pfn >= blk_max_pfn)
      		return;
      
      Assume a driver has set q->bounce_pfn to 0xFFFFF, but blk_max_pfn
      equals 0x100000.  In that situation the check above fails, and for
      every bio we always fall through to iterating over the pages tied to
      the bio.
      
      Note that for quite a wide range of device drivers (ide, md, ...)
      this problem doesn't occur, because they use BLK_BOUNCE_ANY for
      bounce_pfn.  BLK_BOUNCE_ANY is defined as blk_max_pfn << PAGE_SHIFT,
      so the check above doesn't fail.  But for drivers that obtain the
      required value from the device, it does fail.  For example, sata_nv
      uses ATA_DMA_MASK or dev->dma_mask.
      
      I propose using (max_pfn - 1) for blk_max_pfn, and the same for
      blk_max_low_pfn.  The patch also cleans up some checks related to
      bounce_pfn.
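
      The essence of the proposal, as a sketch (simplified; the real patch
      also adjusts the related bounce_pfn checks): a "max pfn" compared
      with >= against q->bounce_pfn must be the highest valid frame number,
      i.e. the frame count minus one.

      	blk_max_low_pfn = max_low_pfn - 1;
      	blk_max_pfn = max_pfn - 1;

      With that, a driver covering all of a 4GB machine sets
      q->bounce_pfn = 0xFFFFF, the comparison 0xFFFFF >= 0xFFFFF holds, and
      blk_queue_bounce() returns early as intended.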
      Signed-off-by: Vasily Tarasov <vtaras@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  14. February 21, 2007 (2 commits)
    • [PATCH] lockdep: annotate BLKPG_DEL_PARTITION · 6d740cd5
      Committed by Peter Zijlstra
      >=============================================
      >[ INFO: possible recursive locking detected ]
      >2.6.19-1.2909.fc7 #1
      >---------------------------------------------
      >anaconda/587 is trying to acquire lock:
      > (&bdev->bd_mutex){--..}, at: [<c05fb380>] mutex_lock+0x21/0x24
      >
      >but task is already holding lock:
      > (&bdev->bd_mutex){--..}, at: [<c05fb380>] mutex_lock+0x21/0x24
      >
      >other info that might help us debug this:
      >1 lock held by anaconda/587:
      > #0:  (&bdev->bd_mutex){--..}, at: [<c05fb380>] mutex_lock+0x21/0x24
      >
      >stack backtrace:
      > [<c0405812>] show_trace_log_lvl+0x1a/0x2f
      > [<c0405db2>] show_trace+0x12/0x14
      > [<c0405e36>] dump_stack+0x16/0x18
      > [<c043bd84>] __lock_acquire+0x116/0xa09
      > [<c043c960>] lock_acquire+0x56/0x6f
      > [<c05fb1fa>] __mutex_lock_slowpath+0xe5/0x24a
      > [<c05fb380>] mutex_lock+0x21/0x24
      > [<c04d82fb>] blkdev_ioctl+0x600/0x76d
      > [<c04946b1>] block_ioctl+0x1b/0x1f
      > [<c047ed5a>] do_ioctl+0x22/0x68
      > [<c047eff2>] vfs_ioctl+0x252/0x265
      > [<c047f04e>] sys_ioctl+0x49/0x63
      > [<c0404070>] syscall_call+0x7/0xb
      
      Annotate BLKPG_DEL_PARTITION's bd_mutex locking and add a little
      comment clarifying the bd_mutex lock ordering, because I confused
      myself and initially thought the lock order was wrong too.
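
      Illustratively, the annotation amounts to taking the partition's
      bd_mutex with a distinct lockdep subclass (the variable name here is
      assumed), since it legitimately nests inside the whole disk's
      bd_mutex:

      	/* partition lock nests inside the whole-disk lock */
      	mutex_lock_nested(&bdevp->bd_mutex, 1);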
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] rework reserved major handling · b446b60e
      Committed by Andrew Morton
      Several people have reported failures in dynamic major device number
      handling due to the recent changes there to avoid handing out the
      local/experimental majors.
      
      Rolf reports that this is due to a gcc-4.1.0 bug.
      
      The patch refactors that code a lot in an attempt to provoke the compiler into
      behaving.
      
      Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>