- 30 11月, 2018 2 次提交
-
-
由 Jens Axboe 提交于
blk-mq passes information to the hardware about any given request being the last that we will issue in this sequence. The point is that hardware can defer costly doorbell type writes to the last request. But if we run into errors issuing a sequence of requests, we may never send the request with bd->last == true set. For that case, we need a hook that tells the hardware that nothing else is coming right now. For failures returned by the drivers ->queue_rq() hook, the driver is responsible for flushing pending requests, if it uses bd->last to optimize that part. This works like before, no changes there. Reviewed-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NMing Lei <ming.lei@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Only do it if we have requests for multiple queues in the same plug. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 29 11月, 2018 3 次提交
-
-
由 Dan Carpenter 提交于
I recently found some code which called blk_mq_free_map_and_requests() with a NULL set->tags pointer. I fixed the caller, but it seems like a good idea to add a NULL check here as well. Now we can call: blk_mq_free_tag_set(set); blk_mq_free_tag_set(set); twice in a row and it's harmless. Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Weiping Zhang 提交于
Give a interface to adjust io timeout(ms) by device. Signed-off-by: NWeiping Zhang <zhangweiping@didiglobal.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Yufen Yu 提交于
We recently got a stack by syzkaller like this: BUG: sleeping function called from invalid context at mm/slab.h:361 in_atomic(): 1, irqs_disabled(): 0, pid: 6644, name: blkid INFO: lockdep is turned off. CPU: 1 PID: 6644 Comm: blkid Not tainted 4.4.163-514.55.6.9.x86_64+ #76 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 0000000000000000 5ba6a6b879e50c00 ffff8801f6b07b10 ffffffff81cb2194 0000000041b58ab3 ffffffff833c7745 ffffffff81cb2080 5ba6a6b879e50c00 0000000000000000 0000000000000001 0000000000000004 0000000000000000 Call Trace: <IRQ> [<ffffffff81cb2194>] __dump_stack lib/dump_stack.c:15 [inline] <IRQ> [<ffffffff81cb2194>] dump_stack+0x114/0x1a0 lib/dump_stack.c:51 [<ffffffff8129a981>] ___might_sleep+0x291/0x490 kernel/sched/core.c:7675 [<ffffffff8129ac33>] __might_sleep+0xb3/0x270 kernel/sched/core.c:7637 [<ffffffff81794c13>] slab_pre_alloc_hook mm/slab.h:361 [inline] [<ffffffff81794c13>] slab_alloc_node mm/slub.c:2610 [inline] [<ffffffff81794c13>] slab_alloc mm/slub.c:2692 [inline] [<ffffffff81794c13>] kmem_cache_alloc_trace+0x2c3/0x5c0 mm/slub.c:2709 [<ffffffff81cbe9a7>] kmalloc include/linux/slab.h:479 [inline] [<ffffffff81cbe9a7>] kzalloc include/linux/slab.h:623 [inline] [<ffffffff81cbe9a7>] kobject_uevent_env+0x2c7/0x1150 lib/kobject_uevent.c:227 [<ffffffff81cbf84f>] kobject_uevent+0x1f/0x30 lib/kobject_uevent.c:374 [<ffffffff81cbb5b9>] kobject_cleanup lib/kobject.c:633 [inline] [<ffffffff81cbb5b9>] kobject_release+0x229/0x440 lib/kobject.c:675 [<ffffffff81cbb0a2>] kref_sub include/linux/kref.h:73 [inline] [<ffffffff81cbb0a2>] kref_put include/linux/kref.h:98 [inline] [<ffffffff81cbb0a2>] kobject_put+0x72/0xd0 lib/kobject.c:692 [<ffffffff8216f095>] put_device+0x25/0x30 drivers/base/core.c:1237 [<ffffffff81c4cc34>] delete_partition_rcu_cb+0x1d4/0x2f0 block/partition-generic.c:232 [<ffffffff813c08bc>] __rcu_reclaim kernel/rcu/rcu.h:118 [inline] [<ffffffff813c08bc>] rcu_do_batch kernel/rcu/tree.c:2705 [inline] [<ffffffff813c08bc>] invoke_rcu_callbacks kernel/rcu/tree.c:2973 [inline] [<ffffffff813c08bc>] __rcu_process_callbacks kernel/rcu/tree.c:2940 [inline] [<ffffffff813c08bc>] rcu_process_callbacks+0x59c/0x1c70 kernel/rcu/tree.c:2957 [<ffffffff8120f509>] __do_softirq+0x299/0xe20 kernel/softirq.c:273 [<ffffffff81210496>] invoke_softirq kernel/softirq.c:350 [inline] [<ffffffff81210496>] irq_exit+0x216/0x2c0 kernel/softirq.c:391 [<ffffffff82c2cd7b>] exiting_irq arch/x86/include/asm/apic.h:652 [inline] [<ffffffff82c2cd7b>] smp_apic_timer_interrupt+0x8b/0xc0 arch/x86/kernel/apic/apic.c:926 [<ffffffff82c2bc25>] apic_timer_interrupt+0xa5/0xb0 arch/x86/entry/entry_64.S:746 <EOI> [<ffffffff814cbf40>] ? audit_kill_trees+0x180/0x180 [<ffffffff8187d2f7>] fd_install+0x57/0x80 fs/file.c:626 [<ffffffff8180989e>] do_sys_open+0x45e/0x550 fs/open.c:1043 [<ffffffff818099c2>] SYSC_open fs/open.c:1055 [inline] [<ffffffff818099c2>] SyS_open+0x32/0x40 fs/open.c:1050 [<ffffffff82c299e1>] entry_SYSCALL_64_fastpath+0x1e/0x9a In softirq context, we call rcu callback function delete_partition_rcu_cb(), which may allocate memory by kzalloc with GFP_KERNEL flag. If the allocation cannot be satisfied, it may sleep. However, That is not allowed in softirq contex. Although we found this problem on linux 4.4, the latest kernel version seems to have this problem as well. And it is very similar to the previous one: https://lkml.org/lkml/2018/7/9/391 Fix it by using RCU workqueue, which allows sleep. Reviewed-by: NPaul E. McKenney <paulmck@linux.ibm.com> Signed-off-by: NYufen Yu <yuyufen@huawei.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 28 11月, 2018 1 次提交
-
-
由 Jens Axboe 提交于
If we yank a 'same_queue_rq' request off the plug list, we should also decrement the cached request count. Fixes: 5f0ed774 ("block: sum requests in the plug structure") Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 27 11月, 2018 3 次提交
-
-
由 Jens Axboe 提交于
This isn't exactly the same as the previous count, as it includes requests for all devices. But that really doesn't matter, if we have more than the threshold (16) queued up, flush it. It's not worth it to have an expensive list loop for this. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Keith Busch 提交于
There are no more users relying on blk-mq request states to prevent double completions, so replace the relatively expensive cmpxchg operation with WRITE_ONCE. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Keith Busch 提交于
A driver may have internal state to cleanup if we're pretending a request didn't complete. Return 'false' if the command wasn't actually completed due to the timeout error injection, and true otherwise. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 26 11月, 2018 5 次提交
-
-
由 Jens Axboe 提交于
It's pointless to do so, we are by definition on the CPU we want/need to be, as that's the one waiting for a completion event. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Right now we immediately bail if need_resched() is true, but we need to do at least one loop in case we have entries waiting. So just invert the need_resched() check, putting it at the bottom of the loop. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
blk_poll() has always kept spinning until it found an IO. This is fine for SYNC polling, since we need to find one request we have pending, but in preparation for ASYNC polling it can be beneficial to just check if we have any entries available or not. Existing callers are converted to pass in 'spin == true', to retain the old behavior. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We always pass in -1 now and none of the callers use the tag value, remove the parameter. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
If we want to support async IO polling, then we have to allow finding completions that aren't just for the one we are looking for. Always pass in -1 to the mq_ops->poll() helper, and have that return how many events were found in this poll loop. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 21 11月, 2018 2 次提交
-
-
由 Ming Lei 提交于
Even though .mq_kobj, ctx->kobj and q->kobj share same lifetime from block layer's view, actually they don't because userspace may grab one kobject anytime via sysfs. This patch fixes the issue by the following approach: 1) introduce 'struct blk_mq_ctxs' for holding .mq_kobj and managing all ctxs 2) free all allocated ctxs and the 'blk_mq_ctxs' instance in release handler of .mq_kobj 3) grab one ref of .mq_kobj before initializing each ctx->kobj, so that .mq_kobj is always released after all ctxs are freed. This patch fixes kernel panic issue during booting when DEBUG_KOBJECT_RELEASE is enabled. Reported-by: NGuenter Roeck <linux@roeck-us.net> Cc: "jianchao.wang" <jianchao.w.wang@oracle.com> Tested-by: NGuenter Roeck <linux@roeck-us.net> Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
If the first request allocated and issued by a process is a passhthrough request, we don't set up an IO context for it. Ensure that blk_mq_sched_assign_ioc() ignores a NULL io_context. Fixes: e2b3fa5a ("block: Remove bio->bi_ioc") Reported-by: NMing Lei <ming.lei@redhat.com> Tested-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 20 11月, 2018 4 次提交
-
-
由 Damien Le Moal 提交于
For the synchronous I/O path case (read(), write() etc system calls), a BIO I/O priority is not initialized until the execution of blk_init_request_from_bio() when the BIO is submitted and a request initialized for the BIO execution. This is due to the ki_ioprio field of the struct kiocb defined on stack being always initialized to IOPRIO_CLASS_NONE, regardless of the calling process I/O context ioprio value set with ioprio_set(). This late initialization can result in the BIO being merged to pending requests even when the I/O priorities differ. Fix this by initializing the ki_iopriority field of on stack struct kiocb using the get_current_ioprio() helper, ensuring that all BIOs allocated and submitted for the system call execution see the correct intended I/O priority early. With this, since a BIO I/O priority is always set to the intended effective value for both the sync and async path, blk_init_request_from_bio() can be simplified. Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAdam Manzanares <adam.manzanares@wdc.com> Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Damien Le Moal 提交于
Growing in size a high priority request by merging it with a lower priority BIO or request will increase the request execution time. This is the opposite result of the desired effect of high I/O priorities, namely getting low I/O latencies. Prevent merging of requests and BIOs that have different I/O priorities to fix this. Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Damien Le Moal 提交于
Define get_current_ioprio() as an inline helper to obtain the caller I/O priority from its task I/O context. Use this helper in blk_init_request_from_bio() to set a request ioprio. Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Damien Le Moal 提交于
bio->bi_ioc is never set so always NULL. Remove references to it in bio_disassociate_task() and in rq_ioc() and delete this field from struct bio. With this change, rq_ioc() always returns current->io_context without the need for a bio argument. Further simplify the code and make it more readable by also removing this helper, which also allows to simplify blk_mq_sched_assign_ioc() by removing its bio argument. Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Reviewed-by: NAdam Manzanares <adam.manzanares@wdc.com> Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 19 11月, 2018 2 次提交
-
-
由 Jens Axboe 提交于
We currently only really support sync poll, ie poll with 1 IO in flight. This prepares us for supporting async poll. Note that the returned value isn't necessarily 100% accurate. If poll races with IRQ completion, we assume that the fact that the task is now runnable means we found at least one entry. In reality it could be more than 1, or not even 1. This is fine, the caller will just need to take this into account. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
For the core poll helper, the task state setting don't need to imply any atomics, as it's the current task itself that is being modified and we're not going to sleep. For IRQ driven, the wakeup path have the necessary barriers to not need us using the heavy handed version of the task state setting. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 16 11月, 2018 15 次提交
-
-
由 Jens Axboe 提交于
Put the short code in the fast path, where we don't have any functions attached to the queue. This minimizes the impact on the hot path in the core code. Cc: Josef Bacik <josef@toxicpanda.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Various spots check for q->mq_ops being non-NULL, but provide a helper to do this instead. Where the ->mq_ops != NULL check is redundant, remove it. Since mq == rq-based now that legacy is gone, get rid of the queue_is_rq_based() and just use queue_is_mq() everywhere. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
This isn't unused, if BFQ is modular we get into trouble. Fixes: b6676f65 ("block: remove a few unused exports") Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
With the legacy request path gone there is no good reason to keep queue_lock as a pointer, we can always use the embedded lock now. Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Fixed floppy and blk-cgroup missing conversions and half done edits. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
With the legacy request path gone there is no real need to override the queue_lock. Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Use a goto label to merge two identical pieces of error handling code. Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Only the mq locking is left in the flush state machine. Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Unused now that the legacy request path is gone. Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
The only remaining user unconditionally drops and reacquires the lock, which means we really don't need any additional (conditional) annotation. Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
->queue_flags is generally not set or cleared in the fast path, and also generally set or cleared one flag at a time. Make use of the normal atomic bitops for it so that we don't need to take the queue_lock, which is otherwise mostly unused in the core block layer now. Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
There is nothing it could synchronize against, so don't go through the pains of acquiring the lock. Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
No users left since the removal of the legacy request interface, we can remove all the magic bit stealing now and make it a normal field. But use WRITE_ONCE/READ_ONCE on the new deadline field, given that we don't seem to have any mechanism to guarantee a new value actually gets seen by other threads. Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Unused since the removal of the legacy request code. Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 15 11月, 2018 2 次提交
-
-
由 Eric Biggers 提交于
blk_try_req_merge() is only used in block/blk-merge.c, so make it static. This addresses a gcc warning when -Wmissing-prototypes is enabled. Signed-off-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Colin Ian King 提交于
The boolean next_sorted is set to false and is never changed, hence the code that checks if it is true is dead code and can now be removed. This dead code occurred from a previous commit that cleaned up the elevator and removed the setting of next_sorted to true. Detected by CoverityScan, CID#1475401 ("'Constant' variable guards dead code") Fixes: a1ce35fa ("block: remove dead elevator code") Signed-off-by: NColin Ian King <colin.king@canonical.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 14 11月, 2018 1 次提交
-
-
由 Ming Lei 提交于
c2856ae2 ("blk-mq: quiesce queue before freeing queue") has already fixed this race, however the implied synchronize_rcu() in blk_mq_quiesce_queue() can slow down LUN probe a lot, so caused performance regression. Then 1311326c ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()") tried to quiesce queue for avoiding unnecessary synchronize_rcu() only when queue initialization is done, because it is usual to see lots of inexistent LUNs which need to be probed. However, turns out it isn't safe to quiesce queue only when queue initialization is done. Because when one SCSI command is completed, the user of sending command can be waken up immediately, then the scsi device may be removed, meantime the run queue in scsi_end_request() is still in-progress, so kernel panic can be caused. In Red Hat QE lab, there are several reports about this kind of kernel panic triggered during kernel booting. This patch tries to address the issue by grabing one queue usage counter during freeing one request and the following run queue. Fixes: 1311326c ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()") Cc: Andrew Jones <drjones@redhat.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com> Cc: stable <stable@vger.kernel.org> Cc: jianchao.wang <jianchao.w.wang@oracle.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-