Commits · f48d1915b86f06a943087e5f9b29542a1ef4cd4d · openeuler / raspberrypi-kernel

27 Jul, 2011 1 commit

fail_make_request: cleanup should_fail_request · b2c9cd37

Authored by Akinobu Mita on Jul 26, 2011

This changes should_fail_request() into a more usable wrapper function
around should_fail().  It avoids putting #ifdef CONFIG_FAIL_MAKE_REQUEST in
the middle of a function.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b2c9cd37
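
The wrapper pattern described in the commit above can be sketched in userspace C. This is only an illustration of the idea, not the kernel code: `fault_attr_sketch`, `should_fail_sketch()` and `submit_sketch()` are hypothetical names, and the "fail every Nth call" policy is an assumption made to keep the example small.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a fault-injection attribute. */
struct fault_attr_sketch {
	unsigned int fail_every; /* fail every Nth call; 0 means never */
	unsigned int calls;      /* calls seen so far */
};

#ifdef CONFIG_FAIL_MAKE_REQUEST
static struct fault_attr_sketch fail_make_request = { .fail_every = 3 };

/* Generic decision helper, standing in for should_fail(). */
static bool should_fail_sketch(struct fault_attr_sketch *attr, size_t bytes)
{
	(void)bytes;
	if (!attr->fail_every)
		return false;
	return ++attr->calls % attr->fail_every == 0;
}

static bool should_fail_request(bool make_it_fail, size_t bytes)
{
	return make_it_fail && should_fail_sketch(&fail_make_request, bytes);
}
#else
/* Feature compiled out: the wrapper collapses to a constant. */
static inline bool should_fail_request(bool make_it_fail, size_t bytes)
{
	(void)make_it_fail;
	(void)bytes;
	return false;
}
#endif

/* The caller needs no #ifdef of its own. */
static int submit_sketch(bool make_it_fail, size_t bytes)
{
	if (should_fail_request(make_it_fail, bytes))
		return -EIO;
	return 0;
}
```

Keeping the conditional compilation at the definition site means every caller reads the same whether or not the option is enabled.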

26 Jul, 2011 1 commit

block: fix warning with calling smp_processor_id() in preemptible section · 11ccf116

Authored by Jens Axboe on Jul 26, 2011

Commit 5757a6d7 introduced an unsafe call to smp_processor_id().
With preempt debugging turned on, we spew a lot of:

BUG: using smp_processor_id() in preemptible [00000000] code: kjournald/514
caller is __make_request+0x1b8/0x308
[<c0019f44>] (unwind_backtrace+0x0/0xe8) from [<c024b4cc>] (debug_smp_processor_id+0xbc/0xf0)
[<c024b4cc>] (debug_smp_processor_id+0xbc/0xf0) from [<c0223d14>] (__make_request+0x1b8/0x308)
[<c0223d14>] (__make_request+0x1b8/0x308) from [<c02215ac>] (generic_make_request+0x4dc/0x558)
[<c02215ac>] (generic_make_request+0x4dc/0x558) from [<c022173c>] (submit_bio+0x114/0x138)
[<c022173c>] (submit_bio+0x114/0x138) from [<c011f504>] (submit_bh+0x148/0x16c)
[<c011f504>] (submit_bh+0x148/0x16c) from [<c0121ed8>] (__sync_dirty_buffer+0x88/0xd8)
[<c0121ed8>] (__sync_dirty_buffer+0x88/0xd8) from [<c01aff78>] (journal_commit_transaction+0x1198/0x1688)
[<c01aff78>] (journal_commit_transaction+0x1198/0x1688) from [<c01b4034>] (kjournald+0xb4/0x224)
[<c01b4034>] (kjournald+0xb4/0x224) from [<c0069ea0>] (kthread+0x8c/0x94)
[<c0069ea0>] (kthread+0x8c/0x94) from [<c00137f8>] (kernel_thread_exit+0x0/0x8)

Fix this by just using raw_smp_processor_id(); it's just a hint
after all. There's no pinning of the CPU or accessing per-cpu
structures involved.
Reported-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

11ccf116

24 Jul, 2011 2 commits

block: strict rq_affinity · 5757a6d7

Authored by Dan Williams on Jul 23, 2011

Some systems benefit from completions always being steered to the strict
requester cpu rather than the looser "per-socket" steering that
blk_cpu_to_group() attempts by default. This is because the first
CPU in the group mask ends up being completely overloaded with work,
while the others (including the original submitter) have power left
to spare.

Allow the strict mode to be set by writing '2' to the sysfs control
file. This is identical to the scheme used for the nomerges file,
where '2' is a more aggressive setting than just being turned on.

echo 2 > /sys/block/<bdev>/queue/rq_affinity

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Roland Dreier <roland@purestorage.com>
Tested-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

5757a6d7
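
The three rq_affinity settings in the commit above can be modelled in a few lines of userspace C. This is a simplified sketch, not the kernel implementation: `queue_sketch`, `cpu_to_group()` and `completion_cpu()` are hypothetical names, and sockets are assumed to be contiguous blocks of `cpus_per_socket` CPUs represented by their first CPU.

```c
#include <stdbool.h>

/* Hypothetical model of the two behaviours selected through rq_affinity. */
struct queue_sketch {
	bool same_comp;  /* steer completions toward the submitter */
	bool same_force; /* strict mode: always the exact submitting CPU */
};

/* Models writing 0, 1 or 2 to the sysfs control file. */
static void rq_affinity_store(struct queue_sketch *q, int val)
{
	q->same_comp = (val == 1 || val == 2);
	q->same_force = (val == 2);
}

/* Assumption: a group is a socket, represented by its first CPU. */
static int cpu_to_group(int cpu, int cpus_per_socket)
{
	return (cpu / cpus_per_socket) * cpus_per_socket;
}

/* Pick the CPU that should run the completion for a request. */
static int completion_cpu(const struct queue_sketch *q, int submit_cpu,
			  int irq_cpu, int cpus_per_socket)
{
	if (!q->same_comp)
		return irq_cpu;    /* '0': no steering */
	if (q->same_force)
		return submit_cpu; /* '2': strict requester CPU */
	/* '1': stay put if the interrupt landed in the submitter's group. */
	if (cpu_to_group(irq_cpu, cpus_per_socket) ==
	    cpu_to_group(submit_cpu, cpus_per_socket))
		return irq_cpu;
	return cpu_to_group(submit_cpu, cpus_per_socket);
}
```

In this model the default mode funnels cross-socket completions onto the first CPU of the submitter's group, which is the overload the commit message describes; strict mode sends them back to the submitter itself.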

block: fix patch import error in max_discard_sectors check · 4c64500e

Authored by Jens Axboe on Jul 23, 2011

A '!' snuck in before the unlikely, rendering it useless.
Reported-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

4c64500e

22 Jul, 2011 1 commit

[SCSI] fix crash in scsi_dispatch_cmd() · bfe159a5

Authored by James Bottomley on Jul 07, 2011

USB surprise removal of sr is triggering an oops in
scsi_dispatch_cmd().  What seems to be happening is that USB is
hanging on to a queue reference until the last close of the upper
device, so the crash is caused by surprise remove of a mounted CD
followed by attempted unmount.

The problem is that USB doesn't issue its final commands as part of
the SCSI teardown path, but on last close when the block queue is long
gone.  The long term fix is probably to make sr do the teardown in the
same way as sd (so remove all the lower bits on ejection, but keep the
upper disk alive until last close of user space).  However, the
current oops can be simply fixed by not allowing any commands to be
sent to a dead queue.

Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

bfe159a5

21 Jul, 2011 1 commit

block,rcu: Convert call_rcu(disk_free_ptbl_rcu_cb) to kfree_rcu() · 57bdfbf9

Authored by Lai Jiangshan on Mar 18, 2011

The RCU callback disk_free_ptbl_rcu_cb() just calls kfree(),
so we use kfree_rcu() instead of call_rcu(disk_free_ptbl_rcu_cb).
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

57bdfbf9

12 Jul, 2011 4 commits

CFQ: add think time check for group · 7700fc4f

Authored by Shaohua Li on Jul 12, 2011

Currently, when the last queue of a group has no request, we don't expire
the queue, in the hope that a request from the group comes soon, so the group
doesn't miss its share. But if the think time is big, that assumption isn't
correct and we just waste bandwidth. In such a case, we don't idle.

[global]
runtime=30
direct=1

[test1]
cgroup=test1
cgroup_weight=1000
rw=randread
ioengine=libaio
size=500m
runtime=30
directory=/mnt
filename=file1
thinktime=9000

[test2]
cgroup=test2
cgroup_weight=1000
rw=randread
ioengine=libaio
size=500m
runtime=30
directory=/mnt
filename=file2

	patched		base
test1	64k		39k
test2	548k		540k
total	604k		578k

group1 gets much better throughput because it waits less time.

To check whether the patch changes the behavior of a queue without think time,
I also tried giving test1 a 2ms think time or no think time. The test result
is stable. The throughput doesn't change with/without the patch.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

7700fc4f

CFQ: add think time check for service tree · f5f2b6ce

Authored by Shaohua Li on Jul 12, 2011

Currently, when the last queue of a service tree has no request, we don't
expire the queue, in the hope that a request from the service tree comes soon,
so the service tree doesn't miss its share. But if the think time is big, that
assumption isn't correct and we just waste bandwidth. In such a case, we
don't idle.

[global]
runtime=10
direct=1

[test1]
rw=randread
ioengine=libaio
size=500m
directory=/mnt
filename=file1
thinktime=9000

[test2]
rw=read
ioengine=libaio
size=1G
directory=/mnt
filename=file2

	patched		base
test1	41k/s		33k/s
test2	15868k/s	15789k/s
total	15902k/s	15817k/s

A slight improvement.

To check whether the patch changes the behavior of a queue without think time,
I also tried giving test1 a 2ms think time or no think time. The test has
variation even without the patch, but the average throughput doesn't change
with/without the patch.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

f5f2b6ce

CFQ: move think time check variables to a separate struct · 383cd721

Authored by Shaohua Li on Jul 12, 2011

Move the variables used for the think time check to a separate struct. This
prepares for adding a think time check for service tree and group. No
functional change.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

383cd721
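
A think time check of the kind these CFQ commits rely on can be sketched as a small struct plus a decaying average. The code below is a userspace approximation under stated assumptions: the struct and function names are hypothetical, and the 7/8 decay factor, the 256 scaling and the "more than 80 samples" validity cutoff are assumptions loosely modelled on CFQ rather than values taken from these commits.

```c
#include <stdbool.h>

/* Think time bookkeeping gathered in one struct so that a queue,
 * a service tree or a group could each embed a copy. */
struct ttime_sketch {
	unsigned long last_end_request; /* when the last request completed */
	unsigned long total;            /* decayed sum, scaled by 256 */
	unsigned long samples;          /* decayed sample count, scaled by 256 */
	unsigned long mean;             /* current think time estimate */
};

/* Fold the gap between the last completion and `now` into the estimate. */
static void ttime_update(struct ttime_sketch *t, unsigned long now,
			 unsigned long slice_idle)
{
	unsigned long elapsed = now - t->last_end_request;
	/* Cap each sample so one long pause cannot dominate the mean. */
	unsigned long sample = elapsed < 2 * slice_idle ? elapsed : 2 * slice_idle;

	t->samples = (7 * t->samples + 256) / 8;
	t->total = (7 * t->total + 256 * sample) / 8;
	t->mean = (t->total + 128) / t->samples;
}

/* Idling is pointless once we have enough samples and the owner
 * usually thinks for longer than the idle window. */
static bool worth_idling(const struct ttime_sketch *t, unsigned long slice_idle)
{
	return !(t->samples > 80 && t->mean > slice_idle);
}

/* Simulate n requests, each issued `gap` ticks after the previous one. */
static void ttime_feed(struct ttime_sketch *t, unsigned int n,
		       unsigned long gap, unsigned long slice_idle)
{
	unsigned long now = t->last_end_request;

	for (unsigned int i = 0; i < n; i++) {
		now += gap;
		ttime_update(t, now, slice_idle);
		t->last_end_request = now; /* pretend it completes instantly */
	}
}
```

With an idle window of 8 ticks, a submitter that pauses 1 tick between requests stays worth idling for, while one that pauses 900 ticks does not.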

fixlet: Remove fs_excl from struct task. · 4aede84b

Authored by Justin TerAvest on Jul 12, 2011

fs_excl is a poor man's priority inheritance for filesystems to hint to
the block layer that an operation is important. It was never clearly
specified, not widely adopted, and will not prevent starvation in many
cases (like across cgroups).

fs_excl was introduced with the time sliced CFQ IO scheduler, to
indicate when a process held FS exclusive resources and thus needed
a boost.

It doesn't cover all file systems, and it was never fully complete.
Let's kill it.
Signed-off-by: Justin TerAvest <teravest@google.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

4aede84b

11 Jul, 2011 1 commit

cfq: Remove special treatment for metadata rqs. · a07405b7

Authored by Justin TerAvest on Jul 10, 2011

There is no consistency among filesystems as to which bios (or requests)
are marked as being metadata. It's interesting to expose this in traces,
but we shouldn't schedule the requests differently based on whether or
not they're marked as being metadata.
Signed-off-by: Justin TerAvest <teravest@google.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

a07405b7

08 Jul, 2011 1 commit

block: avoid building too big plug list · 55c022bb

Authored by Shaohua Li on Jul 08, 2011

When I tested a fio script with a big I/O depth, I found that the total
throughput drops compared to a relatively small I/O depth. The reason is that
the thread accumulates a large number of requests in its plug list, which
causes some delays (surely this depends on CPU speed).
I thought we'd better have a threshold for requests. When the threshold is
reached, it means there is no request merging and queue lock contention isn't
severe when pushing per-task requests to the queue, so the main advantages of
blk plug don't exist. We can force a plug list flush in this case.
With this, my test throughput actually increases and almost equals that of the
small I/O depth. Another side effect is that irq off time decreases in
blk_flush_plug_list() for big I/O depth.
BLK_MAX_REQUEST_COUNT is chosen arbitrarily, but in my tests 16 is effective
at reducing lock contention. I'm open here, though; 32 is OK in my test too.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

55c022bb
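
The threshold idea from the commit above can be illustrated with a tiny userspace model of a per-task plug list. Only the constant name and its value of 16 come from the commit message; `plug_sketch`, `plug_add()` and `plug_flush()` are hypothetical names, and the "flush" here merely counts requests instead of dispatching them.

```c
#include <stddef.h>

#define BLK_MAX_REQUEST_COUNT 16

/* Hypothetical per-task plug: requests are held back until flushed. */
struct plug_sketch {
	int pending[BLK_MAX_REQUEST_COUNT]; /* ids of held-back requests */
	size_t count;                       /* requests currently held */
	size_t flushes;                     /* number of flushes so far */
	size_t dispatched;                  /* requests pushed to the queue */
};

/* Push everything held so far to the (imaginary) device queue. */
static void plug_flush(struct plug_sketch *plug)
{
	if (!plug->count)
		return;
	plug->dispatched += plug->count;
	plug->count = 0;
	plug->flushes++;
}

/* Add a request; force a flush once the list reaches the threshold. */
static void plug_add(struct plug_sketch *plug, int rq)
{
	plug->pending[plug->count++] = rq;
	if (plug->count >= BLK_MAX_REQUEST_COUNT)
		plug_flush(plug);
}
```

In this model, submitting 40 requests triggers two forced flushes and leaves 8 requests waiting for the normal unplug.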

07 Jul, 2011 1 commit

block: eliminate potential for infinite loop in blkdev_issue_discard · 0f799603

Authored by Mike Snitzer on Jul 06, 2011

Due to the recently identified overflow in read_capacity_16() it was
possible for max_discard_sectors to be zero but still have discards
enabled on the associated device's queue.

Eliminate the possibility for blkdev_issue_discard to infinitely loop.

Interestingly this issue wasn't identified until a device, whose
discard_granularity was 0 due to read_capacity_16 overflow, was consumed
by blk_stack_limits() to construct limits for a higher-level DM
multipath device. The multipath device's resulting limits never had the
discard limits stacked because blk_stack_limits() will only do so if
the bottom device's discard_granularity != 0. This resulted in the
multipath device's limits.max_discard_sectors being 0.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

0f799603
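
Why a zero max_discard_sectors leads to an endless loop, and how an early check removes it, can be seen in a stripped-down model. This is a sketch under assumptions, not the kernel function: `issue_discard_sketch()` is a hypothetical name, it only counts the chunks it would submit, and the choice of -EOPNOTSUPP as the error is an assumption.

```c
#include <errno.h>
#include <stdint.h>

/* Split a discard of nr_sects sectors into chunks of at most
 * max_discard_sectors and count how many chunks would be issued. */
static int issue_discard_sketch(uint64_t nr_sects,
				uint32_t max_discard_sectors,
				unsigned int *chunks_issued)
{
	unsigned int issued = 0;

	/* Without this check a zero limit makes every chunk empty, so
	 * nr_sects never shrinks and the loop below never terminates. */
	if (max_discard_sectors == 0)
		return -EOPNOTSUPP;

	while (nr_sects) {
		uint64_t chunk = nr_sects < max_discard_sectors ?
				 nr_sects : max_discard_sectors;

		nr_sects -= chunk;
		issued++;
	}

	*chunks_issued = issued;
	return 0;
}
```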

02 Jul, 2011 1 commit

compat_ioctl: fix warning caused by qemu · 390192b3

Authored by Johannes Stezenbach on Jul 01, 2011

On Linux x86_64 host with 32bit userspace, running
qemu or even just "qemu-img create -f qcow2 some.img 1G"
causes a kernel warning:

ioctl32(qemu-img:5296): Unknown cmd fd(3) cmd(00005326){t:'S';sz:0} arg(7fffffff) on some.img
ioctl32(qemu-img:5296): Unknown cmd fd(3) cmd(801c0204){t:02;sz:28} arg(fff77350) on some.img

ioctl 00005326 is CDROM_DRIVE_STATUS,
ioctl 801c0204 is FDGETPRM.

The warning appears because the Linux compat-ioctl handler for these
ioctls only applies to block devices, while qemu also uses the ioctls on
plain files.
Signed-off-by: Johannes Stezenbach <js@sig21.net>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

390192b3

01 Jul, 2011 1 commit

block: flush MEDIA_CHANGE from drivers on close(2) · 85ef06d1

Authored by Tejun Heo on Jul 01, 2011

Currently, only open(2) is defined as the 'clearing' point.  It has
two roles - first, it's an acknowledgement from userland indicating
that the event has been received and kernel can clear pending states
and proceed to generate more events.  Secondly, it's passed on to
device drivers as a hint indicating that a synchronization point has
been reached and it might want to take a deeper look at the device.

The latter currently is only used by sr which uses two different
mechanisms - GET_EVENT_MEDIA_STATUS_NOTIFICATION and TEST_UNIT_READY
to discover events, where the former is lighter weight and safe to be
used repeatedly but may not provide full coverage.  Among other
things, GET_EVENT can't detect media removal while TUR can.

This patch makes close(2) - blkdev_put() - indicate clearing hint for
MEDIA_CHANGE to drivers.  disk_check_events() is renamed to
disk_flush_events() and updated to take @mask for events to flush
which is or'd to ev->clearing and will be passed to the driver on the
next ->check_events() invocation.

This change makes sr generate MEDIA_CHANGE when media is ejected from
userland - e.g. with eject(1).

Note: Given the current usage, it seems @clearing hint is needlessly
complex.  disk_clear_events() can simply clear all events and the hint
can be boolean @flush.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

85ef06d1
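
The way a flush mask is or'd into a pending "clearing" hint and then handed over on the next event check can be shown with a minimal model. The names below are hypothetical (`disk_events_sketch`, `flush_events_sketch()`, `take_clearing_sketch()`), and the bit values of the two event flags are assumptions for the sake of the example.

```c
/* Assumed bit values for the two disk event types. */
#define DISK_EVENT_MEDIA_CHANGE  (1u << 0)
#define DISK_EVENT_EJECT_REQUEST (1u << 1)

/* Hypothetical slice of the per-disk event state. */
struct disk_events_sketch {
	unsigned int clearing; /* events the driver is asked to clear */
};

/* Called on a synchronization point such as close(2): remember which
 * events should be flushed on the next check. */
static void flush_events_sketch(struct disk_events_sketch *ev,
				unsigned int mask)
{
	ev->clearing |= mask;
}

/* Called at the start of an event check: hand the accumulated hint to
 * the driver exactly once. */
static unsigned int take_clearing_sketch(struct disk_events_sketch *ev)
{
	unsigned int clearing = ev->clearing;

	ev->clearing = 0;
	return clearing;
}
```

Because the mask is accumulated with OR, several flush requests that arrive before the next check are merged rather than lost.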

27 Jun, 2011 2 commits

cfq-iosched: make code consistent · 726e99ab

Authored by Shaohua Li on Jun 27, 2011

ioc->ioc_data is RCU protected, so use the correct API to access it.
This doesn't change any behavior; it just makes the code consistent.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: stable@kernel.org # after ab4bd22d
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

726e99ab

cfq-iosched: fix a rcu warning · 3181faa8

Authored by Shaohua Li on Jun 27, 2011

I got an RCU warning at boot. ioc->ioc_data is rcu_dereferenced, but
rcu_read_lock isn't held.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: stable@kernel.org # after ab4bd22d
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

3181faa8

20 Jun, 2011 3 commits

bsg: fix address space warning from sparse · 2b727c63

Authored by Namhyung Kim on Jun 20, 2011

copy_from/to_user() and blk_rq_map_user() want a __user pointer.
This patch fixes the following warnings from sparse:

   CHECK   block/bsg.c
 block/bsg.c:185:38: warning: incorrect type in argument 2 (different address spaces)
 block/bsg.c:185:38:    expected void const [noderef] <asn:1>*from
 block/bsg.c:185:38:    got void *<noident>
 block/bsg.c:295:58: warning: incorrect type in argument 4 (different address spaces)
 block/bsg.c:295:58:    expected void [noderef] <asn:1>*<noident>
 block/bsg.c:295:58:    got void *[assigned] dxferp
 block/bsg.c:311:52: warning: incorrect type in argument 4 (different address spaces)
 block/bsg.c:311:52:    expected void [noderef] <asn:1>*<noident>
 block/bsg.c:311:52:    got void *[assigned] dxferp
 block/bsg.c:448:37: warning: incorrect type in argument 1 (different address spaces)
 block/bsg.c:448:37:    expected void [noderef] <asn:1>*dst
 block/bsg.c:448:37:    got void *<noident>
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

2b727c63

bsg: remove unnecessary conditional expressions · 44194e3e

Authored by Namhyung Kim on Jun 20, 2011

Evaluating the second condition in the OR always implies that the first
condition is false, thus the bytes_read test in the second is not needed.
The same goes for bytes_written.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

44194e3e
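
The simplification in the commit above rests on the identity `!a || (a && b)` being equal to `!a || b`. A small sketch makes the two forms explicit; `other_cond` is a placeholder for whatever the second operand tests in the real code, and both function names are hypothetical.

```c
#include <stdbool.h>

/* Shape of the condition before the cleanup. */
static bool original_form(int bytes_read, bool other_cond)
{
	return !bytes_read || (bytes_read && other_cond);
}

/* Shape after the cleanup: the right-hand side of || only runs when
 * bytes_read is non-zero, so testing it again adds nothing. */
static bool simplified_form(int bytes_read, bool other_cond)
{
	return !bytes_read || other_cond;
}
```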

bsg: fix bsg_poll() to return POLLOUT properly · 80ceb057

Authored by Namhyung Kim on Jun 20, 2011

POLLOUT should be returned only if bd->queued_cmds < bd->max_queue
so that bsg_alloc_command() can proceed.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

80ceb057
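
The poll rule stated in the commit above can be written down as a small mask computation. This is a userspace sketch: `bsg_sketch` and `bsg_poll_mask()` are hypothetical, only the `queued_cmds` and `max_queue` field names come from the commit message, and the `done_cmds` field used for readability is an assumption.

```c
#include <poll.h>

/* Hypothetical slice of a bsg device's state. */
struct bsg_sketch {
	int queued_cmds; /* commands currently in flight */
	int max_queue;   /* limit on in-flight commands */
	int done_cmds;   /* completed commands waiting to be read */
};

static short bsg_poll_mask(const struct bsg_sketch *bd)
{
	short mask = 0;

	if (bd->done_cmds > 0)
		mask |= POLLIN;  /* a completion can be read */
	if (bd->queued_cmds < bd->max_queue)
		mask |= POLLOUT; /* room to queue another command */
	return mask;
}
```

Reporting POLLOUT only while there is room means a writer woken by poll(2) can expect command allocation to succeed instead of immediately hitting the queue limit.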

14 Jun, 2011 2 commits

blk-throttle: Make total_nr_queued unsigned · d2f31a5f

Authored by Joe Perches on Jun 13, 2011

The total of two unsigned values should also be unsigned.

Update throtl_log output to unsigned.
Update total_nr_queued test to non-zero to be the
same as the other total_nr_queued tests.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

d2f31a5f

block: Add __attribute__((format(printf...) and fix fallout · fd16d263

Authored by Joe Perches on Jun 13, 2011

Use the compiler to verify format strings and arguments.

Fix fallout.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

fd16d263
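
How the format attribute lets the compiler check a printf-style helper can be shown with a short GCC/Clang example. `log_sketch()` is a hypothetical function, not one of the kernel helpers touched by the commit.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Argument 3 is the format string and the variadic arguments start at
 * position 4, so the compiler can type-check callers like printf(). */
static int log_sketch(char *buf, size_t len, const char *fmt, ...)
	__attribute__((format(printf, 3, 4)));

static int log_sketch(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return n;
}
```

With the attribute in place, a call such as `log_sketch(buf, sizeof(buf), "%s", 3u)` draws a format warning when built with `-Wall`, which is the kind of fallout the commit then fixes.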

13 Jun, 2011 2 commits

block:remove some spare spaces in genhd.c · 9f5e4865

Authored by Wanlong Gao on Jun 13, 2011

Remove the end-of-line spaces in genhd.c.
Signed-off-by: Wanlong Gao <wanlong.gao@gmail.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

9f5e4865

block: Add __attribute__((format(printf...) and fix fallout · 08e8138a

Authored by Joe Perches on Jun 13, 2011

Use the compiler to verify format strings and arguments.

Fix fallout.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

08e8138a

10 Jun, 2011 3 commits

block: make disk_block_events() properly wait for work cancellation · fdd514e1

Authored by Tejun Heo on Jun 09, 2011

disk_block_events() should guarantee that the event work is not in
flight on return and once blocked it shouldn't issue further
cancellations.

Because there was no synchronization between the first blocker doing
cancel_delayed_work_sync() and the following blockers, the following
blockers could finish before cancellation was complete, which broke
both guarantees - event work could be in flight and cancellation could
happen after return.

This bug triggered WARN_ON_ONCE() in disk_clear_events() reported in
bug#34662.

  https://bugzilla.kernel.org/show_bug.cgi?id=34662

Fix it by adding an outer mutex which protects both block count
manipulation and work cancellation.

-v2: Use outer mutex instead of bit waitqueue per Linus.
Signed-off-by: Tejun Heo <tj@kernel.org>
Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Reported-by: Borislav Petkov <bp@alien8.de>
Reported-by: Meelis Roos <mroos@linux.ee>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

fdd514e1
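
The outer-mutex fix described in the commit above can be approximated in userspace with a pthread mutex. This is a sketch under assumptions: `events_sketch`, `block_events()` and `cancel_work_sync_sketch()` are hypothetical names, and the synchronous cancel is reduced to clearing a flag.

```c
#include <pthread.h>

/* Hypothetical slice of the per-disk event state. */
struct events_sketch {
	pthread_mutex_t block_mutex; /* outer mutex around count + cancel */
	int block;                   /* event blocking depth */
	int work_in_flight;          /* 1 while the polling work may run */
	int cancels;                 /* how many cancellations were issued */
};

/* Stand-in for a synchronous cancel: returns only once the work is idle. */
static void cancel_work_sync_sketch(struct events_sketch *ev)
{
	ev->work_in_flight = 0;
	ev->cancels++;
}

/* Only the first blocker cancels, and later blockers cannot return
 * until that cancellation has finished, because they wait on the mutex. */
static void block_events(struct events_sketch *ev)
{
	pthread_mutex_lock(&ev->block_mutex);
	if (ev->block++ == 0)
		cancel_work_sync_sketch(ev);
	pthread_mutex_unlock(&ev->block_mutex);
}
```

Holding one mutex across both the count update and the cancellation is what restores the two guarantees: no work in flight on return, and no further cancellations once blocked.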

block: remove non-syncing __disk_block_events() and fold it into disk_block_events() · c3af54af

Authored by Tejun Heo on Jun 09, 2011

After the previous update to disk_check_events(), nobody is using
non-syncing __disk_block_events().  Remove @sync and, as this makes
__disk_block_events() virtually identical to disk_block_events(),
remove the underscore prefixed version.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

c3af54af

block: don't use non-syncing event blocking in disk_check_events() · a9dce2a3

Authored by Tejun Heo on Jun 09, 2011

This patch is part of fix for triggering of WARN_ON_ONCE() in
disk_clear_events() reported in bug#34662.

  https://bugzilla.kernel.org/show_bug.cgi?id=34662

disk_clear_events() blocks events, schedules and flushes the event
work.  It expects the work to have started execution on schedule and
finished on return from flush.  WARN_ON_ONCE() triggers if the event
work hasn't executed as expected.  This problem happens because
__disk_block_events() fails to guarantee that the event work item is
not in flight on return from the function in race-free manner.  The
problem is two-fold and this patch addresses one of them.

When __disk_block_events() is called with @sync == %false, it bumps
event block count, calls cancel_delayed_work() and return.  This makes
it impossible to guarantee that event polling is not in flight on
return from syncing __disk_block_events() - if the first blocker was
non-syncing, polling could still be in progress and later syncing ones
would assume that the first blocker already canceled it.

Making __disk_block_events() cancel_sync regardless of block count
isn't feasible either as it may race with forced event checking in
disk_clear_events().

As disk_check_events() is the only user of non-syncing
__disk_block_events(), updating it to directly cancel and schedule
event work is the easiest way to solve the issue.

Note that there's another bug in __disk_block_events() and this patch
doesn't fix the issue completely.  Later patch will fix the other bug.
Signed-off-by: Tejun Heo <tj@kernel.org>
Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Reported-by: Borislav Petkov <bp@alien8.de>
Reported-by: Meelis Roos <mroos@linux.ee>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

a9dce2a3

06 Jun, 2011 4 commits

block: rename the return of two functions · df415656

Authored by Paul Bolle on Jun 06, 2011

If we rename the return of alloc_io_context() and get_io_context() from
"ret" to "ioc" the code gets (a bit) more readable and (a lot) more
greppable.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

df415656

CFQ: make two functions static · 8aea4545

Authored by Paul Bolle on Jun 06, 2011

Correctly suggested by sparse.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

8aea4545

cfq-iosched: fix locking around ioc->ioc_data assignment · 9b50902d

Authored by Jens Axboe on Jun 05, 2011

Since we are modifying this RCU pointer, we need to hold
the lock protecting it around it.

This fixes a potential reuse and double free of a cfq
io_context structure. The bug has been in CFQ for a long
time, it hit very few people but those it did hit seemed
to see it a lot.

Tracked in RH bugzilla here:

https://bugzilla.redhat.com/show_bug.cgi?id=577968

Credit goes to Paul Bolle for figuring out that the issue
was around the one-hit ioc->ioc_data cache. Thanks to his
hard work the issue is now fixed.

Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

9b50902d

cfq-iosched: fix locking around ioc->ioc_data assignment · ab4bd22d

Authored by Jens Axboe on Jun 05, 2011

Since we are modifying this RCU pointer, we need to hold
the lock protecting it around it.

This fixes a potential reuse and double free of a cfq
io_context structure. The bug has been in CFQ for a long
time, it hit very few people but those it did hit seemed
to see it a lot.

Tracked in RH bugzilla here:

https://bugzilla.redhat.com/show_bug.cgi?id=577968

Credit goes to Paul Bolle for figuring out that the issue
was around the one-hit ioc->ioc_data cache. Thanks to his
hard work the issue is now fixed.

Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

ab4bd22d

03 Jun, 2011 1 commit

iosched: prevent aliased requests from starving other I/O · 796d5116

Authored by Jeff Moyer on Jun 02, 2011

Hi, Jens,

If you recall, I posted an RFC patch for this back in July of last year:
http://lkml.org/lkml/2010/7/13/279

The basic problem is that a process can issue a never-ending stream of
async direct I/Os to the same sector on a device, thus starving out
other I/O in the system (due to the way the alias handling works in both
cfq and deadline).  The solution I proposed back then was to start
dispatching from the fifo after a certain number of aliases had been
dispatched.  Vivek asked why we had to treat aliases differently at all,
and I never had a good answer.  So, I put together a simple patch which
allows aliases to be added to the rb tree (it adds them to the right,
though that doesn't matter as the order isn't guaranteed anyway).  I
think this is the preferred solution, as it doesn't break up time slices
in CFQ or batches in deadline.  I've tested it, and it does solve the
starvation issue.  Let me know what you think.

Cheers,
Jeff
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

796d5116
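
Allowing aliases into the sorted tree, as the commit above proposes, amounts to sending keys that compare equal down the right branch instead of rejecting them. The sketch below uses a plain unbalanced binary search tree as a stand-in for the kernel's red-black tree; `node_sketch`, `tree_add()` and `tree_count()` are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

/* One queued request, keyed by its starting sector. */
struct node_sketch {
	uint64_t sector;
	struct node_sketch *left, *right;
};

/* Insert a request. A request for a sector already present (an alias)
 * is not refused: anything not strictly smaller goes to the right. */
static struct node_sketch *tree_add(struct node_sketch **root, uint64_t sector)
{
	struct node_sketch **p = root;
	struct node_sketch *n = calloc(1, sizeof(*n));

	if (!n)
		return NULL;
	n->sector = sector;

	while (*p)
		p = sector < (*p)->sector ? &(*p)->left : &(*p)->right;
	*p = n;
	return n;
}

/* Count the requests held in the tree. */
static size_t tree_count(const struct node_sketch *n)
{
	return n ? 1 + tree_count(n->left) + tree_count(n->right) : 0;
}
```

Since aliases live in the tree like any other request, they are dispatched in the normal sorted order and no longer need a separate path that could starve other I/O.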

02 Jun, 2011 2 commits

block: Use hlist_entry() for io_context.cic_list.first · e2bd9678

Authored by Paul Bolle on Jun 02, 2011

list_entry() and hlist_entry() are both simply aliases for
container_of(), but since io_context.cic_list.first is an hlist_node one
should at least use the correct alias.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

e2bd9678
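
The point that both macros are aliases of container_of() can be demonstrated in userspace. The macro definitions below are simplified re-creations (the kernel versions add type checking), and `hlist_node_sketch`, `cic_sketch` and `first_cic()` are hypothetical names.

```c
#include <stddef.h>

/* Simplified container_of(): step back from a member to its parent. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Both list flavours resolve to the same pointer arithmetic. */
#define list_entry(ptr, type, member)  container_of(ptr, type, member)
#define hlist_entry(ptr, type, member) container_of(ptr, type, member)

struct hlist_node_sketch {
	struct hlist_node_sketch *next, **pprev;
};

/* Hypothetical object that is linked through an hlist node. */
struct cic_sketch {
	int key;
	struct hlist_node_sketch cic_list;
};

/* Using the hlist flavour documents what kind of node `first` is. */
static struct cic_sketch *first_cic(struct hlist_node_sketch *first)
{
	return hlist_entry(first, struct cic_sketch, cic_list);
}
```

The generated code is identical either way; the choice of alias only tells the reader which list type the node belongs to.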

cfq-iosched: Remove bogus check in queue_fail path · 28304f48

Authored by Paul Bolle on Jun 02, 2011

queue_fail can only be reached if cic is NULL, so its check for cic must
be bogus.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

28304f48

01 Jun, 2011 1 commit

CFQ: Fix typo and remove unnecessary semicolon · 4495a7d4

Authored by Kyungmin Park on May 31, 2011

Fix a comment typo and remove an unnecessary semicolon in a macro.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

4495a7d4

27 May, 2011 4 commits

block: export blk_{get,put}_queue() · d86e0e83

Authored by Jens Axboe on May 27, 2011

We need them in SCSI to fix a bug, but currently they are not
exported to modules. Export them.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

d86e0e83

cgroups: add per-thread subsystem callbacks · f780bdb7

Authored by Ben Blum on May 26, 2011

Add cgroup subsystem callbacks for per-thread attachment in atomic contexts

Add can_attach_task(), pre_attach(), and attach_task() as new callbacks
for the cgroup subsystem interface.  Unlike can_attach and attach, these
are for per-thread operations, to be called potentially many times when
attaching an entire threadgroup.

Also, the old "bool threadgroup" interface is removed, as replaced by
this.  All subsystems are modified for the new interface - of note is
cpuset, which requires from/to nodemasks for attach to be globally scoped
(though per-cpuset would work too) to persist from its pre_attach to
attach_task and attach.

This is a pre-patch for cgroup-procs-writable.patch.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Reviewed-by: Paul Menage <menage@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f780bdb7

block: remove unused variable in bio_attempt_front_merge() · 700c4f33

Authored by Luca Tettamanti on May 26, 2011

sector is never read inside the function.
Signed-off-by: Luca Tettamanti <kronos.it@gmail.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

700c4f33

block: always allocate genhd->ev if check_events is implemented · 75e3f3ee

Authored by Tejun Heo on May 26, 2011

9fd097b1 (block: unexport DISK_EVENT_MEDIA_CHANGE for legacy/fringe
drivers) removed DISK_EVENT_MEDIA_CHANGE from legacy/fringe block
drivers which have inadequate ->check_events().  Combined with earlier
change 7c88a168 (block: don't propagate unlisted DISK_EVENTs to
userland), this enables using ->check_events() for internal processing
while avoiding enabling in-kernel block event polling which can lead
to infinite event loop.

Unfortunately, this left many drivers, including floppy, without any bit
set in disk->events and ->async_events, in which case disk_add_events()
simply skipped allocation of disk->ev, which disables event handling
entirely.  As ->check_events() is still used during open processing
for revalidation, this can lead to open failure.

This patch always allocates disk->ev if ->check_events is implemented.
In the long term, it would make sense to simply include the event
structure inline into genhd as it's now used by virtually all block
devices.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ondrej Zary <linux@rainbow-software.org>
Reported-by: Alex Villacis Lasso <avillaci@ceibo.fiec.espol.edu.ec>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

75e3f3ee

24 May, 2011 1 commit

cfq-iosched: free cic_index if cfqd allocation fails · 1547010e

Authored by Namhyung Kim on May 24, 2011

When struct cfq_data allocation fails, cic_index needs to be freed.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

1547010e