- 16 9月, 2019 1 次提交
-
-
由 Ming Lei 提交于
[ Upstream commit c7e2d94b3d1634988a95ac4d77a72dc7487ece06 ] Once blk_cleanup_queue() returns, tags shouldn't be used any more, because blk_mq_free_tag_set() may be called. Commit 45a9c9d9 ("blk-mq: Fix a use-after-free") fixes this issue exactly. However, that commit introduces another issue. Before 45a9c9d9, we are allowed to run queue during cleaning up queue if the queue's kobj refcount is held. After that commit, queue can't be run during queue cleaning up, otherwise oops can be triggered easily because some fields of hctx are freed by blk_mq_free_queue() in blk_cleanup_queue(). We have invented ways for addressing this kind of issue before, such as: 8dc765d438f1 ("SCSI: fix queue cleanup race before queue initialization is done") c2856ae2 ("blk-mq: quiesce queue before freeing queue") But still can't cover all cases, recently James reports another such kind of issue: https://marc.info/?l=linux-scsi&m=155389088124782&w=2 This issue can be quite hard to address by previous way, given scsi_run_queue() may run requeues for other LUNs. Fixes the above issue by freeing hctx's resources in its release handler, and this way is safe becasue tags isn't needed for freeing such hctx resource. This approach follows typical design pattern wrt. kobject's release handler. Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: James Smart <james.smart@broadcom.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen <martin.petersen@oracle.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>, Reported-by: NJames Smart <james.smart@broadcom.com> Fixes: 45a9c9d9 ("blk-mq: Fix a use-after-free") Cc: stable@vger.kernel.org Reviewed-by: NHannes Reinecke <hare@suse.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Tested-by: NJames Smart <james.smart@broadcom.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 04 8月, 2019 1 次提交
-
-
由 Bart Van Assche 提交于
commit cd84a62e0078dce09f4ed349bec84f86c9d54b30 upstream. The RQF_PREEMPT flag is used for three purposes: - In the SCSI core, for making sure that power management requests are executed even if a device is in the "quiesced" state. - For domain validation by SCSI drivers that use the parallel port. - In the IDE driver, for IDE preempt requests. Rename "preempt-only" into "pm-only" because the primary purpose of this mode is power management. Since the power management core may but does not have to resume a runtime suspended device before performing system-wide suspend and since a later patch will set "pm-only" mode as long as a block device is runtime suspended, make it possible to set "pm-only" mode from more than one context. Since with this change scsi_device_quiesce() is no longer idempotent, make that function return early if it is called for a quiesced queue. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NMartin K. Petersen <martin.petersen@oracle.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMing Lei <ming.lei@redhat.com> Cc: Jianchao Wang <jianchao.w.wang@oracle.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 31 7月, 2019 1 次提交
-
-
由 Josef Bacik 提交于
[ Upstream commit b554db147feea39617b533ab6bca247c91c6198a ] We discovered a problem in newer kernels where a disconnect of a NBD device while the flush request was pending would result in a hang. This is because the blk mq timeout handler does if (!refcount_inc_not_zero(&rq->ref)) return true; to determine if it's ok to run the timeout handler for the request. Flush_rq's don't have a ref count set, so we'd skip running the timeout handler for this request and it would just sit there in limbo forever. Fix this by always setting the refcount of any request going through blk_init_rq() to 1. I tested this with a nbd-server that dropped flush requests to verify that it hung, and then tested with this patch to verify I got the timeout as expected and the error handling kicked in. Thanks, Signed-off-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 10 7月, 2019 1 次提交
-
-
由 Guilherme G. Piccoli 提交于
```-------------------------------------------------------------- This patch is not on mainline and is meant to 4.19 stable *only*. After the patch description there's a reasoning about that. ``` -------------------------------------------------------------- Commit 37f9579f ("blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash") introduced a NULL pointer dereference in generic_make_request(). The patch sets q to NULL and enter_succeeded to false; right after, there's an 'if (enter_succeeded)' which is not taken, and then the 'else' will dereference q in blk_queue_dying(q). This patch just moves the 'q = NULL' to a point in which it won't trigger the oops, although the semantics of this NULLification remains untouched. A simple test case/reproducer is as follows: a) Build kernel v4.19.56-stable with CONFIG_BLK_CGROUP=n. b) Create a raid0 md array with 2 NVMe devices as members, and mount it with an ext4 filesystem. c) Run the following oneliner (supposing the raid0 is mounted in /mnt): (dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3; echo 1 > /sys/block/nvme1n1/device/device/remove (whereas nvme1n1 is the 2nd array member) This will trigger the following oops: BUG: unable to handle kernel NULL pointer dereference at 0000000000000078 PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI RIP: 0010:generic_make_request+0x32b/0x400 Call Trace: submit_bio+0x73/0x140 ext4_io_submit+0x4d/0x60 ext4_writepages+0x626/0xe90 do_writepages+0x4b/0xe0 [...] This patch has no functional changes and preserves the md/raid0 behavior when a member is removed before kernel v4.17. ---------------------------- Why this is not on mainline? ---------------------------- The patch was originally submitted upstream in linux-raid and linux-block mailing-lists - it was initially accepted by Song Liu, but Christoph Hellwig[0] observed that there was a clean-up series ready to be accepted from Ming Lei[1] that fixed the same issue. The accepted patches from Ming's series in upstream are: commit 47cdee29ef9d ("block: move blk_exit_queue into __blk_release_queue") and commit fe2008640ae3 ("block: don't protect generic_make_request_checks with blk_queue_enter"). Those patches basically do a clean-up in the block layer involving: 1) Putting back blk_exit_queue() logic into __blk_release_queue(); that path was changed in the past and the logic from blk_exit_queue() was added to blk_cleanup_queue(). 2) Removing the guard/protection in generic_make_request_checks() with blk_queue_enter(). The problem with Ming's series for -stable is that it relies in the legacy request IO path removal. So it's "backport-able" to v5.0+, but doing that for early versions (like 4.19) would incur in complex code changes. Hence, it was suggested by Christoph and Song Liu that this patch was submitted to stable only; otherwise merging it upstream would add code to fix a path removed in a subsequent commit. [0] lore.kernel.org/linux-block/20190521172258.GA32702@infradead.org [1] lore.kernel.org/linux-block/20190515030310.20393-1-ming.lei@redhat.com Cc: Christoph Hellwig <hch@lst.de> Cc: Jens Axboe <axboe@kernel.dk> Reviewed-by: NBart Van Assche <bvanassche@acm.org> Reviewed-by: NMing Lei <ming.lei@redhat.com> Tested-by: NEric Ren <renzhengeek@gmail.com> Fixes: 37f9579f ("blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash") Signed-off-by: NGuilherme G. Piccoli <gpiccoli@canonical.com> Acked-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 15 6月, 2019 1 次提交
-
-
由 Ming Lei 提交于
[ Upstream commit fbc2a15e3433058582e5635aabe48a3011a644a8 ] With holding queue's kobject refcount, it is safe for driver to schedule requeue. However, blk_mq_kick_requeue_list() may be called after blk_sync_queue() is done because of concurrent requeue activities, then requeue work may not be completed when freeing queue, and kernel oops is triggered. So moving the cancel of requeue_work into blk_mq_release() for avoiding race between requeue and freeing queue. Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: James Smart <james.smart@broadcom.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen <martin.petersen@oracle.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>, Reviewed-by: NBart Van Assche <bvanassche@acm.org> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Reviewed-by: NHannes Reinecke <hare@suse.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Tested-by: NJames Smart <james.smart@broadcom.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 08 5月, 2019 1 次提交
-
-
由 Tetsuo Handa 提交于
[ Upstream commit 2e3c18d0ada16f145087b2687afcad1748c0827c ] syzbot is hitting flush_work() warning caused by commit 4d43d395fed12463 ("workqueue: Try to catch flush_work() without INIT_WORK().") [1]. Although that commit did not expect INIT_WORK(NULL) case, calling flush_work() without setting a valid callback should be avoided anyway. Fix this problem by setting a no-op callback instead of NULL. [1] https://syzkaller.appspot.com/bug?id=e390366bc48bc82a7c668326e0663be3b91cbd29Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-and-tested-by: Nsyzbot <syzbot+ba2a929dcf8e704c180e@syzkaller.appspotmail.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk> [sl: rename blk_timeout_work] Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 21 11月, 2018 1 次提交
-
-
由 Ming Lei 提交于
commit 8dc765d438f1e42b3e8227b3b09fad7d73f4ec9a upstream. c2856ae2 ("blk-mq: quiesce queue before freeing queue") has already fixed this race, however the implied synchronize_rcu() in blk_mq_quiesce_queue() can slow down LUN probe a lot, so caused performance regression. Then 1311326c ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()") tried to quiesce queue for avoiding unnecessary synchronize_rcu() only when queue initialization is done, because it is usual to see lots of inexistent LUNs which need to be probed. However, turns out it isn't safe to quiesce queue only when queue initialization is done. Because when one SCSI command is completed, the user of sending command can be waken up immediately, then the scsi device may be removed, meantime the run queue in scsi_end_request() is still in-progress, so kernel panic can be caused. In Red Hat QE lab, there are several reports about this kind of kernel panic triggered during kernel booting. This patch tries to address the issue by grabing one queue usage counter during freeing one request and the following run queue. Fixes: 1311326c ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()") Cc: Andrew Jones <drjones@redhat.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com> Cc: stable <stable@vger.kernel.org> Cc: jianchao.wang <jianchao.w.wang@oracle.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 22 9月, 2018 1 次提交
-
-
由 Omar Sandoval 提交于
Klaus Kusche reported that the I/O busy time in /proc/diskstats was not updating properly on 4.18. This is because we started using ktime to track elapsed time, and we convert nanoseconds to jiffies when we update the partition counter. However, this gets rounded down, so any I/Os that take less than a jiffy are not accounted for. Previously in this case, the value of jiffies would sometimes increment while we were doing I/O, so at least some I/Os were accounted for. Let's convert the stats to use nanoseconds internally. We still report milliseconds as before, now more accurately than ever. The value is still truncated to 32 bits for backwards compatibility. Fixes: 522a7775 ("block: consolidate struct request timestamp fields") Cc: stable@vger.kernel.org Reported-by: NKlaus Kusche <klaus.kusche@computerix.info> Signed-off-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 06 9月, 2018 1 次提交
-
-
由 Mikulas Patocka 提交于
It is possible to call fsync on a read-only handle (for example, fsck.ext2 does it when doing read-only check), and this call results in kernel warning. The patch b089cfd9 ("block: don't warn for flush on read-only device") attempted to disable the warning, but it is buggy and it doesn't (op_is_flush tests flags, but bio_op strips off the flags). Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Fixes: 721c7fc7 ("block: fail op_is_write() requests to read-only partitions") Cc: stable@vger.kernel.org # 4.18 Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 18 8月, 2018 1 次提交
-
-
由 Chaitanya Kulkarni 提交于
This patch removes the duplicate initialization of q->queue_head in the blk_alloc_queue_node(). This removes the 2nd initialization so that we preserve the initialization order same as declaration present in struct request_queue. Reviewed-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 15 8月, 2018 1 次提交
-
-
由 Jens Axboe 提交于
Don't warn for a flush issued to a read-only device. It's not strictly a writable command, as it doesn't change any on-media data by itself. Reported-by: NStefan Agner <stefan@agner.ch> Fixes: 721c7fc7 ("block: fail op_is_write() requests to read-only partitions") Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 09 8月, 2018 1 次提交
-
-
由 Bart Van Assche 提交于
This patch does not change any functionality. Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Alexandru Moise <00moses.alexander00@gmail.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: <stable@vger.kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 05 8月, 2018 1 次提交
-
-
由 Linus Torvalds 提交于
It turns out that commit 721c7fc7 ("block: fail op_is_write() requests to read-only partitions"), while obviously correct, causes problems for some older lvm2 installations. The reason is that the lvm snapshotting will continue to write to the snapshow COW volume, even after the volume has been marked read-only. End result: snapshot failure. This has actually been fixed in newer version of the lvm2 tool, but the old tools still exist, and the breakage was reported both in the kernel bugzilla and in the Debian bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200439 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900442 The lvm2 fix is here https://sourceware.org/git/?p=lvm2.git;a=commit;h=a6fdb9d9d70f51c49ad11a87ab4243344e6701a3 but until everybody has updated to recent versions, we'll have to weaken the "never write to read-only partitions" check. It now allows the write to happen, but causes a warning, something like this: generic_make_request: Trying to write to read-only block-device dm-3 (partno X) Modules linked in: nf_tables xt_cgroup xt_owner kvm_intel iwlmvm kvm irqbypass iwlwifi CPU: 1 PID: 77 Comm: kworker/1:1 Not tainted 4.17.9-gentoo #3 Hardware name: LENOVO 20B6A019RT/20B6A019RT, BIOS GJET91WW (2.41 ) 09/21/2016 Workqueue: ksnaphd do_metadata RIP: 0010:generic_make_request_checks+0x4ac/0x600 ... Call Trace: generic_make_request+0x64/0x400 submit_bio+0x6c/0x140 dispatch_io+0x287/0x430 sync_io+0xc3/0x120 dm_io+0x1f8/0x220 do_metadata+0x1d/0x30 process_one_work+0x1b9/0x3e0 worker_thread+0x2b/0x3c0 kthread+0x113/0x130 ret_from_fork+0x35/0x40 Note that this is a "revert" in behavior only. I'm leaving alone the actual code cleanups in commit 721c7fc7, but letting the previously uncaught request go through with a warning instead of stopping it. Fixes: 721c7fc7 ("block: fail op_is_write() requests to read-only partitions") Reported-and-tested-by: NWGH <wgh@torlan.ru> Acked-by: NMike Snitzer <snitzer@redhat.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Zdenek Kabelac <zkabelac@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 8月, 2018 1 次提交
-
-
由 Ming Lei 提交于
Runtime PM isn't ready for blk-mq yet, and commit 765e40b6 ("block: disable runtime-pm for blk-mq") tried to disable it. Unfortunately, it can't take effect in that way since user space still can switch it on via 'echo auto > /sys/block/sdN/device/power/control'. This patch disables runtime-pm for blk-mq really by pm_runtime_disable() and fixes all kinds of PM related kernel crash. Cc: Tomas Janousek <tomi@nomi.cz> Cc: Przemek Socha <soprwa@gmail.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: <stable@vger.kernel.org> Reviewed-by: NBart Van Assche <bart.vanassche@wdc.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Tested-by: NPatrick Steinhardt <ps@pks.im> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 30 7月, 2018 1 次提交
-
-
由 xiao jin 提交于
We find the memory use-after-free issue in __blk_drain_queue() on the kernel 4.14. After read the latest kernel 4.18-rc6 we think it has the same problem. Memory is allocated for q->fq in the blk_init_allocated_queue(). If the elevator init function called with error return, it will run into the fail case to free the q->fq. Then the __blk_drain_queue() uses the same memory after the free of the q->fq, it will lead to the unpredictable event. The patch is to set q->fq as NULL in the fail case of blk_init_allocated_queue(). Fixes: commit 7c94e1c1 ("block: introduce blk_flush_queue to drive flush machinery") Cc: <stable@vger.kernel.org> Reviewed-by: NMing Lei <ming.lei@redhat.com> Reviewed-by: NBart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Nxiao jin <jin.xiao@intel.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 18 7月, 2018 1 次提交
-
-
由 Michael Callahan 提交于
Add and use a new op_stat_group() function for indexing partition stat fields rather than indexing them by rq_data_dir() or bio_data_dir(). This function works similarly to op_is_sync() in that it takes the request::cmd_flags or bio::bi_opf flags and determines which stats should et updated. In addition, the second parameter to generic_start_io_acct() and generic_end_io_acct() is now a REQ_OP rather than simply a read or write bit and it uses op_stat_group() on the parameter to determine the stat group. Note that the partition in_flight counts are not part of the per-cpu statistics and as such are not indexed via this function. It's now indexed by op_is_write(). tj: Refreshed on top of v4.17. Updated to pass around REQ_OP. Signed-off-by: NMichael Callahan <michaelcallahan@fb.com> Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Matias Bjorling <mb@lightnvm.io> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Alasdair Kergon <agk@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 09 7月, 2018 5 次提交
-
-
由 Geert Uytterhoeven 提交于
With gcc 4.9.0 and 7.3.0: block/blk-core.c: In function 'blk_pm_allow_request': block/blk-core.c:2747:2: warning: enumeration value 'RPM_ACTIVE' not handled in switch [-Wswitch] switch (rq->q->rpm_status) { ^ Convert the return statement below the switch() block into a default case to fix this. Fixes: e4f36b24 ("block: fix peeking requests during PM") Signed-off-by: NGeert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Josef Bacik 提交于
We don't really need to save this stuff in the core block code, we can just pass the bio back into the helpers later on to derive the same flags and update the rq->wbt_flags appropriately. Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Josef Bacik 提交于
blkcg-qos is going to do essentially what wbt does, only on a cgroup basis. Break out the common code that will be shared between blkcg-qos and wbt into blk-rq-qos.* so they can both utilize the same infrastructure. Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Bart Van Assche 提交于
The payload of struct request is stored in the request.bio chain if the RQF_SPECIAL_PAYLOAD flag is not set and in request.special_vec if RQF_SPECIAL_PAYLOAD has been set. However, blk_update_request() iterates over req->bio whether or not RQF_SPECIAL_PAYLOAD has been set. Additionally, the RQF_SPECIAL_PAYLOAD flag is ignored by blk_rq_bytes() which means that the value returned by that function is incorrect if the RQF_SPECIAL_PAYLOAD flag has been set. It is not clear to me whether this is an oversight or whether this happened on purpose. Anyway, document that it is known that both functions ignore RQF_SPECIAL_PAYLOAD. See also commit f9d03f96 ("block: improve handling of the magic discard payload"). Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
SCSI probing may synchronously create and destroy a lot of request_queues for non-existent devices. Any synchronize_rcu() in queue creation or destroy path may introduce long latency during booting, see detailed description in comment of blk_register_queue(). This patch removes one synchronize_rcu() inside blk_cleanup_queue() for this case, commit c2856ae2(blk-mq: quiesce queue before freeing queue) needs synchronize_rcu() for implementing blk_mq_quiesce_queue(), but when queue isn't initialized, it isn't necessary to do that since only pass-through requests are involved, no original issue in scsi_execute() at all. Without this patch and previous one, it may take more 20+ seconds for virtio-scsi to complete disk probe. With the two patches, the time becomes less than 100ms. Fixes: c2856ae2 ("blk-mq: quiesce queue before freeing queue") Reported-by: NAndrew Jones <drjones@redhat.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Tested-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 28 6月, 2018 1 次提交
-
-
由 Bart Van Assche 提交于
This patch avoids that removing a path controlled by the dm-mpath driver while mkfs is running triggers the following kernel bug: kernel BUG at block/blk-core.c:3347! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 20 PID: 24369 Comm: mkfs.ext4 Not tainted 4.18.0-rc1-dbg+ #2 RIP: 0010:blk_end_request_all+0x68/0x70 Call Trace: <IRQ> dm_softirq_done+0x326/0x3d0 [dm_mod] blk_done_softirq+0x19b/0x1e0 __do_softirq+0x128/0x60d irq_exit+0x100/0x110 smp_call_function_single_interrupt+0x90/0x330 call_function_single_interrupt+0xf/0x20 </IRQ> Fixes: f9d03f96 ("block: improve handling of the magic discard payload") Reviewed-by: NMing Lei <ming.lei@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Acked-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 20 6月, 2018 1 次提交
-
-
由 Bart Van Assche 提交于
Commit 0ba99ca4 ("block: Add warning for bi_next not NULL in bio_endio()") breaks the dm driver. end_clone_bio() detects whether or not a bio is the last bio associated with a request by checking the .bi_next field. Commit 0ba99ca4 clears that field before end_clone_bio() has had a chance to inspect that field. Hence revert commit 0ba99ca4. This patch avoids that KASAN reports the following complaint when running the srp-test software (srp-test/run_tests -c -d -r 10 -t 02-mq): ================================================================== BUG: KASAN: use-after-free in bio_advance+0x11b/0x1d0 Read of size 4 at addr ffff8801300e06d0 by task ksoftirqd/0/9 CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 4.18.0-rc1-dbg+ #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0xa4/0xf5 print_address_description+0x6f/0x270 kasan_report+0x241/0x360 __asan_load4+0x78/0x80 bio_advance+0x11b/0x1d0 blk_update_request+0xa7/0x5b0 scsi_end_request+0x56/0x320 [scsi_mod] scsi_io_completion+0x7d6/0xb20 [scsi_mod] scsi_finish_command+0x1c0/0x280 [scsi_mod] scsi_softirq_done+0x19a/0x230 [scsi_mod] blk_mq_complete_request+0x160/0x240 scsi_mq_done+0x50/0x1a0 [scsi_mod] srp_recv_done+0x515/0x1330 [ib_srp] __ib_process_cq+0xa0/0xf0 [ib_core] ib_poll_handler+0x38/0xa0 [ib_core] irq_poll_softirq+0xe8/0x1f0 __do_softirq+0x128/0x60d run_ksoftirqd+0x3f/0x60 smpboot_thread_fn+0x352/0x460 kthread+0x1c1/0x1e0 ret_from_fork+0x24/0x30 Allocated by task 1918: save_stack+0x43/0xd0 kasan_kmalloc+0xad/0xe0 kasan_slab_alloc+0x11/0x20 kmem_cache_alloc+0xfe/0x350 mempool_alloc_slab+0x15/0x20 mempool_alloc+0xfb/0x270 bio_alloc_bioset+0x244/0x350 submit_bh_wbc+0x9c/0x2f0 __block_write_full_page+0x299/0x5a0 block_write_full_page+0x16b/0x180 blkdev_writepage+0x18/0x20 __writepage+0x42/0x80 write_cache_pages+0x376/0x8a0 generic_writepages+0xbe/0x110 blkdev_writepages+0xe/0x10 do_writepages+0x9b/0x180 __filemap_fdatawrite_range+0x178/0x1c0 file_write_and_wait_range+0x59/0xc0 blkdev_fsync+0x46/0x80 vfs_fsync_range+0x66/0x100 do_fsync+0x3d/0x70 __x64_sys_fsync+0x21/0x30 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 9: save_stack+0x43/0xd0 __kasan_slab_free+0x137/0x190 kasan_slab_free+0xe/0x10 kmem_cache_free+0xd3/0x380 mempool_free_slab+0x17/0x20 mempool_free+0x63/0x160 bio_free+0x81/0xa0 bio_put+0x59/0x60 end_bio_bh_io_sync+0x5d/0x70 bio_endio+0x1a7/0x360 blk_update_request+0xd0/0x5b0 end_clone_bio+0xa3/0xd0 [dm_mod] bio_endio+0x1a7/0x360 blk_update_request+0xd0/0x5b0 scsi_end_request+0x56/0x320 [scsi_mod] scsi_io_completion+0x7d6/0xb20 [scsi_mod] scsi_finish_command+0x1c0/0x280 [scsi_mod] scsi_softirq_done+0x19a/0x230 [scsi_mod] blk_mq_complete_request+0x160/0x240 scsi_mq_done+0x50/0x1a0 [scsi_mod] srp_recv_done+0x515/0x1330 [ib_srp] __ib_process_cq+0xa0/0xf0 [ib_core] ib_poll_handler+0x38/0xa0 [ib_core] irq_poll_softirq+0xe8/0x1f0 __do_softirq+0x128/0x60d The buggy address belongs to the object at ffff8801300e0640 which belongs to the cache bio-0 of size 200 The buggy address is located 144 bytes inside of 200-byte region [ffff8801300e0640, ffff8801300e0708) The buggy address belongs to the page: page:ffffea0004c03800 count:1 mapcount:0 mapping:ffff88015a563a00 index:0x0 compound_mapcount: 0 flags: 0x8000000000008100(slab|head) raw: 8000000000008100 dead000000000100 dead000000000200 ffff88015a563a00 raw: 0000000000000000 0000000000330033 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8801300e0580: fb fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc ffff8801300e0600: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb >ffff8801300e0680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8801300e0700: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8801300e0780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ================================================================== Cc: Kent Overstreet <kent.overstreet@gmail.com> Fixes: 0ba99ca4 ("block: Add warning for bi_next not NULL in bio_endio()") Acked-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 07 6月, 2018 1 次提交
-
-
由 Hannes Reinecke 提交于
blk_partition_remap() will only clear bi_partno if an actual remapping has happened. But flush request et al don't have an actual size, so the remapping doesn't happen and bi_partno is never cleared. So for stacked devices blk_partition_remap() will be called on each level. If (as is the case for native nvme multipathing) one of the lower-level devices do _not_support partitioning a spurious I/O error is generated. Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 03 6月, 2018 1 次提交
-
-
由 Jens Axboe 提交于
If we end up splitting a bio and the queue goes away between the initial submission and the later split submission, then we can block forever in blk_queue_enter() waiting for the reference to drop to zero. This will never happen, since we already hold a reference. Mark a split bio as already having entered the queue, so we can just use the live non-blocking queue enter variant. Thanks to Tetsuo Handa for the analysis. Reported-by: syzbot+c4f9cebf9d651f6e54de@syzkaller.appspotmail.com Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 01 6月, 2018 3 次提交
-
-
由 Christoph Hellwig 提交于
Both callers take just around so function call, so move it in. Also remove the now pointless blk_mq_sched_init wrapper. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com> Tested-by: NDamien Le Moal <damien.lemoal@wdc.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Reported-by: NDamien Le Moal <Damien.LeMoal@wdc.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com> Tested-by: NDamien Le Moal <damien.lemoal@wdc.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
No point in doing this in elevator_init. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reported-by: NDamien Le Moal <Damien.LeMoal@wdc.com> Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com> Tested-by: NDamien Le Moal <damien.lemoal@wdc.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 31 5月, 2018 1 次提交
-
-
由 Kent Overstreet 提交于
Convert the core block functionality to embedded bio sets. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NKent Overstreet <kent.overstreet@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 29 5月, 2018 1 次提交
-
-
由 Keith Busch 提交于
This patch simplifies the timeout handling by relying on the request reference counting to ensure the iterator is operating on an inflight and truly timed out request. Since the reference counting prevents the tag from being reallocated, the block layer no longer needs to prevent drivers from completing their requests while the timeout handler is operating on it: a driver completing a request is allowed to proceed to the next state without additional syncronization with the block layer. This also removes any need for generation sequence numbers since the request lifetime is prevented from being reallocated as a new sequence while timeout handling is operating on it. To enables this a refcount is added to struct request so that request users can be sure they're operating on the same request without it changing while they're processing it. The request's tag won't be released for reuse until both the timeout handler and the completion are done with it. Signed-off-by: NKeith Busch <keith.busch@intel.com> [hch: slight cleanups, added back submission side hctx lock, use cmpxchg for completions] Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 15 5月, 2018 2 次提交
-
-
由 Kent Overstreet 提交于
Recently found a bug where a driver left bi_next not NULL and then called bio_endio(), and then the submitter of the bio used bio_copy_data() which was treating src and dst as lists of bios. Fixed that bug by splitting out bio_list_copy_data(), but in case other things are depending on bi_next in weird ways, add a warning to help avoid more bugs like that in the future. Signed-off-by: NKent Overstreet <kent.overstreet@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Kent Overstreet 提交于
Minor optimization - remove a pointer indirection when using fs_bio_set. Signed-off-by: NKent Overstreet <kent.overstreet@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 14 5月, 2018 4 次提交
-
-
由 Christoph Hellwig 提交于
We just can't do I/O when doing block layer requests allocations, so use GFP_NOIO instead of the even more limited __GFP_DIRECT_RECLAIM. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
blk_old_get_request already has it at hand, and in blk_queue_bio, which is the fast path, it is constant. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Switch everyone to blk_get_request_flags, and then rename blk_get_request_flags to blk_get_request. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 09 5月, 2018 3 次提交
-
-
由 Omar Sandoval 提交于
Currently, struct request has four timestamp fields: - A start time, set at get_request time, in jiffies, used for iostats - An I/O start time, set at start_request time, in ktime nanoseconds, used for blk-stats (i.e., wbt, kyber, hybrid polling) - Another start time and another I/O start time, used for cfq and bfq These can all be consolidated into one start time and one I/O start time, both in ktime nanoseconds, shaving off up to 16 bytes from struct request depending on the kernel config. Signed-off-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Omar Sandoval 提交于
struct blk_issue_stat squashes three things into one u64: - The time the driver started working on a request - The original size of the request (for the io.low controller) - Flags for writeback throttling It turns out that on x86_64, we have a 4 byte hole in struct request which we can fill with the non-timestamp fields from blk_issue_stat, simplifying things quite a bit. Signed-off-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Omar Sandoval 提交于
issue_stat is going to go away, so first make writeback throttling take the containing request, update the internal wbt helpers accordingly, and change rwb->sync_cookie to be the request pointer instead of the issue_stat pointer. No functional change. Signed-off-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 08 5月, 2018 1 次提交
-
-
由 Thomas Gleixner 提交于
Commit 9c40cef2 ("sched: Move blk_schedule_flush_plug() out of __schedule()") moved the blk_schedule_flush_plug() call out of the interrupt/preempt disabled region in the scheduler. This allows to replace local_irq_save/restore(flags) by local_irq_disable/enable() in blk_flush_plug_list(). But it makes more sense to disable interrupts explicitly when the request queue is locked end reenable them when the request to is unlocked. This shortens the interrupt disabled section which is important when the plug list contains requests for more than one queue. The comment which claims that disabling interrupts around the loop is misleading as the called functions can reenable interrupts unconditionally anyway and obfuscates the scope badly: local_irq_save(flags); spin_lock(q->queue_lock); ... queue_unplugged(q...); scsi_request_fn(); spin_unlock_irq(q->queue_lock); -------------------^^^ ???? spin_lock_irq(q->queue_lock); spin_unlock(q->queue_lock); local_irq_restore(flags); Aside of that the detached interrupt disabling is a constant pain for PREEMPT_RT as it requires patching and special casing when RT is enabled while with the spin_*_irq() variants this happens automatically. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20110622174919.025446432@linutronix.deSigned-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-