Commits · 74814b1c5569f5503727cf3052a52e2349818409 · openanolis / cloud-kernel

03 May, 2014 1 commit

blk-mq: remove extra requeue trace · 74814b1c

Authored by Jens Axboe on May 02, 2014

We already issue a blktrace requeue event in
__blk_mq_requeue_request(), don't do it from the original caller
as well.
Signed-off-by: Jens Axboe <axboe@fb.com>

74814b1c

01 May, 2014 3 commits

block: Fix format string mismatch in cfq-iosched.c · 176167ad

Authored by Masanari Iida on Apr 28, 2014

Fix format string mismatch in cfq_var_show()
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

176167ad

blk-mq: refactor request insertion/merging · c6d600c6

Authored by Jens Axboe on Apr 30, 2014

Refactor the logic around adding a new bio to a software queue,
so we nest the ctx->lock where we really need it (merge and
insertion) and don't hold it when we don't (init and IO start
accounting).
Signed-off-by: Jens Axboe <axboe@fb.com>

c6d600c6

blk-mq remove debug BUG_ON() when draining software queues · 98bc1f27

Authored by Jens Axboe on Apr 30, 2014

It's never been of any use, let's get rid of it.
Signed-off-by: Jens Axboe <axboe@fb.com>

98bc1f27

30 Apr, 2014 1 commit

blk-mq: fix waiting for reserved tags · 5810d903

Authored by Jens Axboe on Apr 29, 2014

blk_mq_wait_for_tags() is only able to wait for "normal" tags,
not reserved tags. Pass in which one we should attempt to get
a tag for, so that waiting for reserved tags will work.

Reserved tags are used for internal commands, which are usually
serialized. Hence no waiting generally takes place, but we should
ensure that it actually works if users need that functionality.
Signed-off-by: Jens Axboe <axboe@fb.com>

5810d903

25 Apr, 2014 2 commits

block: fold __blk_add_timer into blk_add_timer · c4a634f4

Authored by Christoph Hellwig on Apr 25, 2014

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

c4a634f4

blk-mq: respect rq_affinity · 38535201

Authored by Christoph Hellwig on Apr 25, 2014

The blk-mq code is using its own version of the I/O completion affinity
tunables, which causes a few issues:

 - the rq_affinity sysfs file doesn't work for blk-mq devices, even if it
   still is present, thus breaking existing tuning setups.
 - the rq_affinity = 1 mode, which is the default for legacy request based
   drivers, isn't implemented at all.
 - blk-mq drivers don't implement any completion affinity with the default
   flag settings.

This patch removes the blk-mq ipi_redirect flag and sysfs file, as well
as the internal BLK_MQ_F_SHOULD_IPI flag, and replaces them with code that
respects the queue-wide rq_affinity flags and also implements the
rq_affinity = 1 mode.

This means I/O completion affinity can now only be tuned block-queue wide
instead of per context, which seems more sensible to me anyway.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

38535201

24 Apr, 2014 3 commits

blk-mq: fix race with timeouts and requeue events · 87ee7b11

Authored by Jens Axboe on Apr 24, 2014

If a requeue event races with a timeout, we can get into the
situation where we attempt to complete a request from the
timeout handler when it's not started anymore. This causes a crash.
So have the timeout handler check that REQ_ATOM_STARTED is still
set on the request - if not, we ignore the event. If this happens,
the request has now been marked as complete. As a consequence, we
need to ensure that we clear REQ_ATOM_COMPLETE in blk_mq_start_request(),
so as to maintain proper request state.
Signed-off-by: Jens Axboe <axboe@fb.com>

87ee7b11

Revert "blk-mq: initialize req->q in allocation" · 70ab0b2d

Authored by Jens Axboe on Apr 24, 2014

This reverts commit 6a3c8a3a.

We need selective clearing of the request to make the init-at-free
time completely safe. Otherwise we end up stomping on
rq->atomic_flags, which we don't want to do.

70ab0b2d

blk-mq: fix leak of set->tags · 981bd189

Authored by Ming Lei on Apr 24, 2014

set->tags should be freed in blk_mq_free_tag_set().
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

981bd189

22 Apr, 2014 5 commits

block/blk-throttle.c: add static to blk_throtl_dispatch_work_fn · 8876e140

Authored by Fabian Frederick on Apr 17, 2014

blk_throtl_dispatch_work_fn is only used in blk-throttle.c

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jens Axboe <axboe@fb.com>

8876e140

blk-mq: initialize req->q in allocation · 6a3c8a3a

Authored by Ming Lei on Apr 19, 2014

This patch basically reverts the patch "blk-mq: initialize request
on allocation" in Jens's tree (already in -next), and only
initializes req->q in allocation, for two reasons:

	- presumed cache hotness on completion
	- blk_rq_tagged(rq) depends on reset of req->mq_ctx
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

6a3c8a3a

blk-mq: use (1 << order) to implement order_to_size() · 4ca08500

Authored by Ming Lei on Apr 19, 2014

Cc: Jörg-Volker Peetz <jvpeetz@web.de>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

4ca08500
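
The commit above carries no body, so the following is only a sketch of what its title describes: deriving an allocation size from a page order with a single shift. `SKETCH_PAGE_SIZE` and `sketch_order_to_size` are hypothetical names used for illustration, and the 4096-byte page size is an assumption; the kernel helper may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for illustration; kernel code would use PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL

/* Size in bytes of a block of 2^order contiguous pages. */
static size_t sketch_order_to_size(unsigned int order)
{
	/* Guard against shifting the 2^12 page size past the width of size_t. */
	assert(order < sizeof(size_t) * 8 - 12);
	return (size_t)SKETCH_PAGE_SIZE << order;
}
```

Under these assumptions an order of 0 yields one page and each increment of the order doubles the size.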

blk-mq: fix allocation of set->tags · 48479005

Authored by Ming Lei on Apr 19, 2014

The type of set->tags is struct blk_mq_tags **.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

48479005

blk-mq: free hctx->ctx_map when init failed · 11471e0d

Authored by Ming Lei on Apr 19, 2014

Avoid memory leak in the failure path.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

11471e0d

17 Apr, 2014 11 commits

block: relax when to modify the timeout timer · f793aa53

Authored by Jens Axboe on Apr 16, 2014

Since we are now, by default, applying timer slack to expiry times,
the logic for when to modify a timer in the block code is suboptimal.
The block layer keeps a forward rolling timer per queue for all
requests, and modifies this timer if a request has a shorter timeout
than what the current expiry time is. However, this breaks down
when our rounded timer values get applied slack. Then each new
request ends up modifying the timer, since we're still a little
in front of the timer + slack.

Fix this by allowing a tolerance of HZ / 2; the timeout handling
doesn't need to be very precise. This drastically cuts down
the number of timer modifications we have to make.
Signed-off-by: Jens Axboe <axboe@fb.com>

f793aa53
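
The tolerance described in this commit can be sketched as a small decision function. This is a user-space illustration under stated assumptions, not the kernel code: `SKETCH_HZ`, `sketch_time_after` and `sketch_should_mod_timer` are hypothetical names, and the tick rate of 1000 is assumed.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's HZ (timer ticks per second). */
#define SKETCH_HZ 1000UL

/* Wraparound-safe "a is after b", modeled on the kernel's time_after(). */
static bool sketch_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Decide whether the per-queue timer should be re-armed for a request
 * expiring at `expiry`, given the timer's current state.
 */
static bool sketch_should_mod_timer(bool timer_pending,
				    unsigned long timer_expires,
				    unsigned long expiry)
{
	/* No timer armed yet: always arm it. */
	if (!timer_pending)
		return true;

	/* Only pull the timer forward if the gain is at least HZ / 2. */
	if (sketch_time_after(timer_expires, expiry))
		return timer_expires - expiry >= SKETCH_HZ / 2;

	/* The new request expires no earlier than the timer: leave it alone. */
	return false;
}
```

With this shape, a new request whose expiry is only slightly ahead of the armed timer (for example because of slack rounding) no longer triggers a timer modification.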

block: export blk_finish_request · 12120077

Authored by Christoph Hellwig on Apr 16, 2014

This allows mirroring the blk-mq code flow for a more readable I/O
completion handler in SCSI.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

12120077

blk-mq: rename mq_flush_work struct request member · f88a164b

Authored by Christoph Hellwig on Apr 16, 2014

We will use this work_struct to requeue scsi commands from the
completion handler as well, so give it a more generic name.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

f88a164b

blk-mq: add blk_mq_requeue_request · ed0791b2

Authored by Christoph Hellwig on Apr 16, 2014

This allows requeueing a request that has been accepted by ->queue_rq
earlier.  This is needed by the SCSI layer in various error conditions.

The existing internal blk_mq_requeue_request is renamed to
__blk_mq_requeue_request as it is a lower level building block for this
functionality.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

ed0791b2

blk-mq: add blk_mq_start_hw_queues · 2f268556

Authored by Christoph Hellwig on Apr 16, 2014

Add a helper to unconditionally kick contexts of a queue.  This will
be needed by the SCSI layer to provide fair queueing between multiple
devices on a single host.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

2f268556

blk-mq: add blk_mq_delay_queue · 70f4db63

Authored by Christoph Hellwig on Apr 16, 2014

Add a blk-mq equivalent to blk_delay_queue so that the scsi layer can ask
to be kicked again after a delay.
Signed-off-by: Christoph Hellwig <hch@lst.de>

Modified by me to kill the unnecessary preempt disable/enable
in the delayed workqueue handler.
Signed-off-by: Jens Axboe <axboe@fb.com>

70f4db63

blk-mq: add async parameter to blk_mq_start_stopped_hw_queues · 1b4a3258

Authored by Christoph Hellwig on Apr 16, 2014

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

1b4a3258

blk-mq: bidi support · 91b63639

Authored by Christoph Hellwig on Apr 16, 2014

Add two unlikely branches to make sure the resid is initialized correctly
for bidi request pairs, and the second request gets properly freed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

91b63639

blk-mq: allow drivers to hook into I/O completion · 63151a44

Authored by Christoph Hellwig on Apr 16, 2014

Split out the bottom half of blk_mq_end_io so that drivers can perform
work when they know a request has been completed, but before it has been
freed.  This also obsoletes blk_mq_end_io_partial as drivers can now
pass any value to blk_update_request directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

63151a44

blk-mq: kill preempt disable/enable in blk_mq_work_fn() · 6700a678

Authored by Jens Axboe on Apr 16, 2014

blk_mq_work_fn() is always invoked off the bounded workqueues,
so it can happily preempt among the queues in that set without
causing any issues for blk-mq.
Signed-off-by: Jens Axboe <axboe@fb.com>

6700a678

blk-mq: don't use preempt_count() to check for right CPU · fd1270d5

Authored by Jens Axboe on Apr 16, 2014

UP or CONFIG_PREEMPT_NONE will return 0, and what we really
want to check is whether or not we are on the right CPU.
So don't make PREEMPT part of this, just test the CPU in
the mask directly.
Signed-off-by: Jens Axboe <axboe@fb.com>

fd1270d5
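
Testing "the CPU in the mask directly" can be illustrated with a plain bitmask. This is a self-contained sketch, assuming at most 64 CPUs and a mask held in one 64-bit word; `sketch_cpu_in_mask` is a hypothetical helper, and kernel code would go through its cpumask helpers rather than a raw integer.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if `cpu` is one of the CPUs set in `mask`.
 * Bit n of the mask represents CPU n (assumption of this sketch).
 */
static bool sketch_cpu_in_mask(uint64_t mask, unsigned int cpu)
{
	if (cpu >= 64)
		return false;
	return ((mask >> cpu) & 1u) != 0;
}
```

The point of the check is that it depends only on which CPU is running, not on the preemption configuration of the build.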

16 Apr, 2014 7 commits

blk-mq: split out tag initialization, support shared tags · 24d2f903

Authored by Christoph Hellwig on Apr 15, 2014

Add a new blk_mq_tag_set structure that gets set up before we initialize
the queue.  A single blk_mq_tag_set structure can be shared by multiple
queues.
Signed-off-by: Christoph Hellwig <hch@lst.de>

Modular export of blk_mq_{alloc,free}_tagset added by me.
Signed-off-by: Jens Axboe <axboe@fb.com>

24d2f903

blk-mq: initialize request on allocation · ed44832d

Authored by Christoph Hellwig on Apr 14, 2014

If we want to share tag and request allocation between queues we cannot
initialize the request at init/free time, but need to initialize it
at allocation time as it might get used for different queues over its
lifetime.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

ed44832d

blk-mq: add ->init_request and ->exit_request methods · e9b267d9

Authored by Christoph Hellwig on Apr 15, 2014

The current blk_mq_init_commands/blk_mq_free_commands interface has
two problems:

 1) Because only the constructor is passed to blk_mq_init_commands there
    is no easy way to clean up when a command initialization failed.  The
    current code simply leaks the allocations done in the constructor.

 2) There is no good place to call blk_mq_free_commands: before
    blk_cleanup_queue there is no guarantee that all outstanding
    commands have completed, so we can't free them yet.  After
    blk_cleanup_queue the queue has usually been freed.  This can be
    worked around by grabbing an unconditional reference before calling
    blk_cleanup_queue and dropping it after blk_mq_free_commands is
    done, although that's not exactly pretty and driver writers are
    guaranteed to get it wrong sooner or later.

Both issues are easily fixed by making the request constructor and
destructor normal blk_mq_ops methods.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

e9b267d9

blk-mq: make ->flush_rq fully transparent to drivers · 8727af4b

Authored by Christoph Hellwig on Apr 14, 2014

Drivers shouldn't have to care about the block layer setting aside a
request to implement the flush state machine.  We already override the
mq context and tag to make it more transparent, but so far haven't dealt
with the driver private data in the request.  Make sure to override this
as well, and while we're at it add a proper helper sitting in blk-mq.c
that implements the full impersonation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

8727af4b

blk-mq: do not initialize req->special · 9d74e257

Authored by Christoph Hellwig on Apr 14, 2014

Drivers can reach their private data easily using the blk_mq_rq_to_pdu
helper and don't need req->special.  By not initializing it, code can
be simplified nicely, and we also shave off a few more instructions from
the I/O path.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

9d74e257

blk-mq: initialize resid_len · 742ee69b

Authored by Christoph Hellwig on Apr 14, 2014

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

742ee69b

block: remove struct request buffer member · b4f42e28

Authored by Jens Axboe on Apr 10, 2014

This was used in the olden days, back when onions were proper
yellow. Basically it mapped to the current buffer to be
transferred. With highmem being added more than a decade ago,
most drivers map pages out of a bio, and rq->buffer isn't
pointing at anything valid.

Convert old style drivers to just use bio_data().

For the discard payload use case, just reference the page
in the bio.
Signed-off-by: Jens Axboe <axboe@fb.com>

b4f42e28

11 Apr, 2014 1 commit

block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO · 21f9fcd8

Authored by Duan Jiong on Apr 11, 2014

This patch fixes a coccinelle error regarding usage of IS_ERR and
PTR_ERR instead of PTR_ERR_OR_ZERO.
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

21f9fcd8
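
For readers unfamiliar with the idiom, here is a user-space model of the error-pointer helpers involved. The `sketch_` functions are hypothetical re-creations for illustration, not the kernel's definitions; the 4095 limit on error codes is assumed to mirror the kernel's convention of encoding small negative errno values in the top of the address space.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed upper bound on errno values that can be encoded in a pointer. */
#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *sketch_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode the errno value carried by an error pointer. */
static long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if the pointer falls in the range reserved for error codes. */
static int sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* One call replacing the "if (IS_ERR(p)) return PTR_ERR(p); return 0;" pattern. */
static long sketch_ptr_err_or_zero(const void *ptr)
{
	return sketch_is_err(ptr) ? sketch_ptr_err(ptr) : 0;
}
```

The combined helper expresses the two-step check in a single expression, which is the simplification the coccinelle script suggests.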

10 Apr, 2014 5 commits

block: fix regression with block enabled tagging · 360f92c2

Authored by Jens Axboe on Apr 09, 2014

Martin reported that his test system would not boot with
current git, it oopsed with this:

BUG: unable to handle kernel paging request at ffff88046c6c9e80
IP: [<ffffffff812971e0>] blk_queue_start_tag+0x90/0x150
PGD 1ddf067 PUD 1de2067 PMD 47fc7d067 PTE 800000046c6c9060
Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: sd_mod lpfc(+) scsi_transport_fc scsi_tgt oracleasm
rpcsec_gss_krb5 ipv6 igb dca i2c_algo_bit i2c_core hwmon
CPU: 3 PID: 87 Comm: kworker/u17:1 Not tainted 3.14.0+ #246
Hardware name: Supermicro X9DRX+-F/X9DRX+-F, BIOS 3.00 07/09/2013
Workqueue: events_unbound async_run_entry_fn
task: ffff8802743c2150 ti: ffff880273d02000 task.ti: ffff880273d02000
RIP: 0010:[<ffffffff812971e0>]  [<ffffffff812971e0>]
blk_queue_start_tag+0x90/0x150
RSP: 0018:ffff880273d03a58  EFLAGS: 00010092
RAX: ffff88046c6c9e78 RBX: ffff880077208e78 RCX: 00000000fffc8da6
RDX: 00000000fffc186d RSI: 0000000000000009 RDI: 00000000fffc8d9d
RBP: ffff880273d03a88 R08: 0000000000000001 R09: ffff8800021c2410
R10: 0000000000000005 R11: 0000000000015b30 R12: ffff88046c5bb8a0
R13: ffff88046c5c0890 R14: 000000000000001e R15: 000000000000001e
FS:  0000000000000000(0000) GS:ffff880277b00000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88046c6c9e80 CR3: 00000000018f6000 CR4: 00000000000407e0
Stack:
 ffff880273d03a98 ffff880474b18800 0000000000000000 ffff880474157000
 ffff88046c5c0890 ffff880077208e78 ffff880273d03ae8 ffffffff813b9e62
 ffff880200000010 ffff880474b18968 ffff880474b18848 ffff88046c5c0cd8
Call Trace:
 [<ffffffff813b9e62>] scsi_request_fn+0xf2/0x510
 [<ffffffff81293167>] __blk_run_queue+0x37/0x50
 [<ffffffff8129ac43>] blk_execute_rq_nowait+0xb3/0x130
 [<ffffffff8129ad24>] blk_execute_rq+0x64/0xf0
 [<ffffffff8108d2b0>] ? bit_waitqueue+0xd0/0xd0
 [<ffffffff813bba35>] scsi_execute+0xe5/0x180
 [<ffffffff813bbe4a>] scsi_execute_req_flags+0x9a/0x110
 [<ffffffffa01b1304>] sd_spinup_disk+0x94/0x460 [sd_mod]
 [<ffffffff81160000>] ? __unmap_hugepage_range+0x200/0x2f0
 [<ffffffffa01b2b9a>] sd_revalidate_disk+0xaa/0x3f0 [sd_mod]
 [<ffffffffa01b2fb8>] sd_probe_async+0xd8/0x200 [sd_mod]
 [<ffffffff8107703f>] async_run_entry_fn+0x3f/0x140
 [<ffffffff8106a1c5>] process_one_work+0x175/0x410
 [<ffffffff8106b373>] worker_thread+0x123/0x400
 [<ffffffff8106b250>] ? manage_workers+0x160/0x160
 [<ffffffff8107104e>] kthread+0xce/0xf0
 [<ffffffff81070f80>] ? kthread_freezable_should_stop+0x70/0x70
 [<ffffffff815f0bac>] ret_from_fork+0x7c/0xb0
 [<ffffffff81070f80>] ? kthread_freezable_should_stop+0x70/0x70
Code: 48 0f ab 11 72 db 48 81 4b 40 00 00 10 00 89 83 08 01 00 00 48 89
df 49 8b 04 24 48 89 1c d0 e8 f7 a8 ff ff 49 8b 85 28 05 00 00 <48> 89
58 08 48 89 03 49 8d 85 28 05 00 00 48 89 43 08 49 89 9d
RIP  [<ffffffff812971e0>] blk_queue_start_tag+0x90/0x150
 RSP <ffff880273d03a58>
CR2: ffff88046c6c9e80

Martin bisected and found this to be the problem patch;

	commit 6d113398
	Author: Jan Kara <jack@suse.cz>
	Date:   Mon Feb 24 16:39:54 2014 +0100

	    block: Stop abusing rq->csd.list in blk-softirq

and the problem was immediately apparent. The patch states that
it is safe to reuse queuelist at completion time, since it is
no longer used. However, that is not true if a device is using
block enabled tagging. If that is the case, then the queuelist
is reused to keep track of busy tags. If a device also ended
up using softirq completions, we'd reuse ->queuelist for the
IPI handling while block tagging was still using it. Boom.

Fix this by adding a new ipi_list list head, and share the
memory used with the request hash table. The hash table is
never used after the request is moved to the dispatch list,
which happens long before any potential completion of the
request. Add a new request bit for this, so we don't have
cases that check rq->hash while it could potentially have
been reused for the IPI completion.
Reported-by: Martin K. Petersen <martin.petersen@oracle.com>
Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jens Axboe <axboe@fb.com>

360f92c2
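
The memory-sharing fix described in this commit can be pictured with a simplified structure. This is a sketch only: the `sketch_` types, the field layout, and the `SKETCH_REQ_HASHED` bit are hypothetical stand-ins, not the kernel's struct request.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel's list types. */
struct sketch_list_head { struct sketch_list_head *next, *prev; };
struct sketch_hlist_node { struct sketch_hlist_node *next, **pprev; };

/* Hypothetical flag bit: set only while the request is in the hash table. */
#define SKETCH_REQ_HASHED (1UL << 0)

struct sketch_request {
	/* Left alone at completion time, since block tagging may still use it. */
	struct sketch_list_head queuelist;
	union {
		/* Used for merge lookups, before the request is dispatched. */
		struct sketch_hlist_node hash;
		/* Reuses the same memory for IPI completion, after dispatch. */
		struct sketch_list_head ipi_list;
	};
	unsigned long flags;
};

/* Callers consult the flag rather than inspecting the shared hash field. */
static bool sketch_rq_hashed(const struct sketch_request *rq)
{
	return (rq->flags & SKETCH_REQ_HASHED) != 0;
}
```

Because the two union members have disjoint lifetimes, sharing costs no extra space, and the flag bit avoids reading `hash` after its memory has been repurposed.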

blk-mq: simplify blk_mq_hw_sysfs_cpus_show() · cb2da43e

Authored by Jens Axboe on Apr 09, 2014

Now that we have a cpu mask of CPUs that are mapped to
a specific hardware queue, we can just iterate that to
display the sysfs num-hw-queue/cpu_list file.
Signed-off-by: Jens Axboe <axboe@fb.com>

cb2da43e

blk-mq: ensure that hardware queues are always run on the mapped CPUs · e4043dcf

Authored by Jens Axboe on Apr 09, 2014

Instead of providing soft mappings with no guarantees on hardware
queues always being run on the right CPU, switch to a hard mapping
guarantee that ensures that we always run the hardware queue on
(one of, if more) the mapped CPU.
Signed-off-by: Jens Axboe <axboe@fb.com>

e4043dcf

block: add kblockd_schedule_delayed_work_on() · 8ab14595

Authored by Jens Axboe on Apr 08, 2014

Same function as kblockd_schedule_delayed_work(), but allow the
caller to pass in a CPU that the work should be executed on. This
just directly extends and maps into the workqueue API, and will
be used to make the blk-mq mappings more strict.
Signed-off-by: Jens Axboe <axboe@fb.com>

8ab14595

block: remove 'q' parameter from kblockd_schedule_*_work() · 59c3d45e

Authored by Jens Axboe on Apr 08, 2014

The queue parameter is never used, just get rid of it.
Signed-off-by: Jens Axboe <axboe@fb.com>

59c3d45e

07 Apr, 2014 1 commit

blk-mq: fix potential stall during CPU unplug with IO pending · bccb5f7c

Authored by Jens Axboe on Apr 04, 2014

When a CPU is unplugged, we move the blk_mq_ctx request entries
to the current queue. The current code forgets to remap the
blk_mq_hw_ctx before marking the software context pending,
which breaks if old-cpu and new-cpu don't map to the same
hardware queue.

Additionally, if we mark entries as pending in the new
hardware queue, then make sure we schedule it for running.
Otherwise requests could be sitting there until someone else
queues IO for that hardware queue.
Signed-off-by: Jens Axboe <axboe@fb.com>

bccb5f7c

openanolis / cloud-kernel synced successfully about 1 year ago