- 23 2月, 2016 2 次提交
-
-
由 Mike Snitzer 提交于
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Mike Snitzer 提交于
Request-based DM's blk-mq support (dm-mq) was reported to be 50% slower than if an underlying null_blk device were used directly. One of the reasons for this drop in performance is that blk_insert_clone_request() was calling blk_mq_insert_request() with @async=true. This forced the use of kblockd_schedule_delayed_work_on() to run the blk-mq hw queues which ushered in ping-ponging between process context (fio in this case) and kblockd's kworker to submit the cloned request. The ftrace function_graph tracer showed: kworker-2013 => fio-12190 fio-12190 => kworker-2013 ... kworker-2013 => fio-12190 fio-12190 => kworker-2013 ... Fixing blk_insert_clone_request()'s blk_mq_insert_request() call to _not_ use kblockd to submit the cloned requests isn't enough to eliminate the observed context switches. In addition to this dm-mq specific blk-core fix, there are 2 DM core fixes to dm-mq that (when paired with the blk-core fix) completely eliminate the observed context switching: 1) don't blk_mq_run_hw_queues in blk-mq request completion Motivated by desire to reduce overhead of dm-mq, punting to kblockd just increases context switches. In my testing against a really fast null_blk device there was no benefit to running blk_mq_run_hw_queues() on completion (and no other blk-mq driver does this). So hopefully this change doesn't induce the need for yet another revert like commit 621739b0 ! 2) use blk_mq_complete_request() in dm_complete_request() blk_complete_request() doesn't offer the traditional q->mq_ops vs .request_fn branching pattern that other historic block interfaces do (e.g. blk_get_request). Using blk_mq_complete_request() for blk-mq requests is important for performance. It should be noted that, like blk_complete_request(), blk_mq_complete_request() doesn't natively handle partial completions -- but the request-based DM-multipath target does provide the required partial completion support by dm.c:end_clone_bio() triggering requeueing of the request via dm-mpath.c:multipath_end_io()'s return of DM_ENDIO_REQUEUE. dm-mq fix #2 is _much_ more important than #1 for eliminating the context switches. Before: cpu : usr=15.10%, sys=59.39%, ctx=7905181, majf=0, minf=475 After: cpu : usr=20.60%, sys=79.35%, ctx=2008, majf=0, minf=472 With these changes multithreaded async read IOPs improved from ~950K to ~1350K for this dm-mq stacked on null_blk test-case. The raw read IOPs of the underlying null_blk device for the same workload is ~1950K. Fixes: 7fb4898e ("block: add blk-mq support to blk_insert_cloned_request()") Fixes: bfebd1cd ("dm: add full blk-mq support to request-based DM") Cc: stable@vger.kernel.org # 4.1+ Reported-by: NSagi Grimberg <sagig@dev.mellanox.co.il> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Acked-by: NJens Axboe <axboe@kernel.dk>
-
- 22 2月, 2016 3 次提交
-
-
由 Mike Snitzer 提交于
Rename dm_get_live_table_for_ioctl to dm_grab_bdev_for_ioctl and have it do the dm_{get,put}_live_table() rather than split those operations. The dm_grab_bdev_for_ioctl() callers only care about the block_device associated with a singleton DM device so there isn't any need to retain a reference to the live DM table. It is sufficient to: 1) dm_get_live_table() 2) bdgrab() the bdev associated with the singleton table's target 3) dm_put_live_table() 4) bdput() the bdev Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Mike Snitzer 提交于
None of the callers actually used the returned target. Also, just reuse bdev pointer passed to dm_blk_ioctl(). Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Mike Snitzer 提交于
Using request-based DM mpath configured with the following stacking (.request_fn DM mpath ontop of scsi-mq paths): echo Y > /sys/module/scsi_mod/parameters/use_blk_mq echo N > /sys/module/dm_mod/parameters/use_blk_mq 'struct dm_rq_target_io' would leak if a request is requeued before a blk-mq clone is allocated (or fails to allocate). free_rq_tio() wasn't being called. kmemleak reported: unreferenced object 0xffff8800b90b98c0 (size 112): comm "kworker/7:1H", pid 5692, jiffies 4295056109 (age 78.589s) hex dump (first 32 bytes): 00 d0 5c 2c 03 88 ff ff 40 00 bf 01 00 c9 ff ff ..\,....@....... e0 d9 b1 34 00 88 ff ff 00 00 00 00 00 00 00 00 ...4............ backtrace: [<ffffffff81672b6e>] kmemleak_alloc+0x4e/0xb0 [<ffffffff811dbb63>] kmem_cache_alloc+0xc3/0x1e0 [<ffffffff8117eae5>] mempool_alloc_slab+0x15/0x20 [<ffffffff8117ec1e>] mempool_alloc+0x6e/0x170 [<ffffffffa00029ac>] dm_old_prep_fn+0x3c/0x180 [dm_mod] [<ffffffff812fbd78>] blk_peek_request+0x168/0x290 [<ffffffffa0003e62>] dm_request_fn+0xb2/0x1b0 [dm_mod] [<ffffffff812f66e3>] __blk_run_queue+0x33/0x40 [<ffffffff812f9585>] blk_delay_work+0x25/0x40 [<ffffffff81096fff>] process_one_work+0x14f/0x3d0 [<ffffffff81097715>] worker_thread+0x125/0x4b0 [<ffffffff8109ce88>] kthread+0xd8/0xf0 [<ffffffff8167cb8f>] ret_from_fork+0x3f/0x70 [<ffffffffffffffff>] 0xffffffffffffffff crash> struct -o dm_rq_target_io struct dm_rq_target_io { ... } SIZE: 112 Fixes: e5863d9a ("dm: allocate requests in target when stacking on blk-mq devices") Cc: stable@vger.kernel.org # 4.0+ Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
- 18 11月, 2015 2 次提交
-
-
由 Mike Snitzer 提交于
(Ab)using the @bdev passed to dm_blk_ioctl() opens the potential for targets' .prepare_ioctl to fail if they go on to check the bdev for !NULL. Fixes: e56f81e0 ("dm: refactor ioctl handling") Reported-by: NJunichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Junichi Nomura 提交于
dm-mpath retries ioctl, when no path is readily available and the device is configured to queue I/O in such a case. If you want to stop the retry before multipathd decides to turn off queueing mode, you could send signal for the process to exit from the loop. However the check of fatal signal has not carried along when commit 6c182cd8 ("dm mpath: fix ioctl deadlock when no paths") moved the loop from dm-mpath to dm core. As a result, we can't terminate such a process in the retry loop. Easy reproducer of the situation is: # dmsetup create mp --table '0 1024 multipath 0 0 0 0' # dmsetup message mp 0 'queue_if_no_path' # sg_inq /dev/mapper/mp then you should be able to terminate sg_inq by pressing Ctrl+C. Fixes: 6c182cd8 ("dm mpath: fix ioctl deadlock when no paths") Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
-
- 08 11月, 2015 1 次提交
-
-
由 Jens Axboe 提交于
No functional changes in this patch, but it prepares us for returning a more useful cookie related to the IO that was queued up. Signed-off-by: NJens Axboe <axboe@fb.com> Acked-by: NChristoph Hellwig <hch@lst.de> Acked-by: NKeith Busch <keith.busch@intel.com>
-
- 01 11月, 2015 4 次提交
-
-
由 Mikulas Patocka 提交于
Commit 54efd50b ("block: make generic_make_request handle arbitrarily sized bios") makes it possible for block devices to process large bios. In doing so that commit allocates a new queue->bio_split bioset for each block device, this bioset is used for allocating bios when the driver needs to split large bios. Each bioset allocates a workqueue process, thus the above commit increases the number of processes allocated per block device. DM doesn't need the queue->bio_split bioset, thus we can deallocate it. This reduces the number of allocated processes per bio-based DM device from 3 to 2. Also remove the call to blk_queue_split(), it is not needed because DM does its own splitting. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Julia Lawall 提交于
Remove DM's unneeded NULL tests before calling these destroy functions, now that they check for NULL, thanks to these v4.3 commits: 3942d299 ("mm/slab_common: allow NULL cache pointer in kmem_cache_destroy()") 4e3ca3e0 ("mm/mempool: allow NULL `pool' pointer in mempool_destroy()") The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression x; @@ -if (x != NULL) \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x); // </smpl> Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Christoph Hellwig 提交于
This adds support to pass through persistent reservation requests similar to the existing ioctl handling, and with the same limitations, e.g. devices may only have a single target attached. This is mostly intended for multipathing. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Christoph Hellwig 提交于
This moves the call to blkdev_ioctl and the argument checking to DM core code, and only leaves a callout to find the block device to operate on in the targets. This simplifies the code and allows us to pass through ioctl-like command using other methods in the next patch. Also split out a helper around calling the prepare_ioctl method that will be reused for persistent reservation handling. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
- 30 10月, 2015 1 次提交
-
-
由 Mikulas Patocka 提交于
Commit bfebd1cd ("dm: add full blk-mq support to request-based DM") moves the initialization of the fields backing_dev_info.congested_fn, backing_dev_info.congested_data and queuedata from the function dm_init_md_queue (that is called when the device is created) to dm_init_old_md_queue (that is called after the device type is determined). There is no locking when accessing these variables, thus it is possible for other parts of the kernel to briefly see this data in a transient state (e.g. queue->backing_dev_info.congested_fn initialized and md->queue->backing_dev_info.congested_data uninitialized, resulting in passing an incorrect parameter to the function dm_any_congested). This queue data is left initialized for blk-mq devices even though they that don't use it. Fixes: bfebd1cd ("dm: add full blk-mq support to request-based DM") Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # v4.1+
-
- 22 10月, 2015 1 次提交
-
-
由 Dan Williams 提交于
Now that the integrity profile is statically allocated there is no work to do when shutting down an integrity enabled block device. Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: James Bottomley <JBottomley@Odin.com> Acked-by: NNeilBrown <neilb@suse.com> Acked-by: NKeith Busch <keith.busch@intel.com> Acked-by: NVishal Verma <vishal.l.verma@intel.com> Tested-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 06 10月, 2015 1 次提交
-
-
由 Junichi Nomura 提交于
end_clone_bio() is a endio callback for clone bio and should check and save the clone's bi_error for error reporting. However, 4246a0b6 ("block: add a bi_error field to struct bio") changed the function to check the original bio's bi_error, which is 0. Without this fix, clone's error is ignored and reported to the original request as success. Thus data corruption will be observed. Fixes: 4246a0b6 ("block: add a bi_error field to struct bio") Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
- 01 10月, 2015 1 次提交
-
-
由 Junichi Nomura 提交于
__dm_destroy() takes io_barrier SRCU lock (dm_get_live_table) and suspend_lock in reverse order. Doing so can cause AB-BA deadlock: __dm_destroy dm_swap_table --------------------------------------------------- mutex_lock(suspend_lock) dm_get_live_table() srcu_read_lock(io_barrier) dm_sync_table() synchronize_srcu(io_barrier) .. waiting for dm_put_live_table() mutex_lock(suspend_lock) .. waiting for suspend_lock Fix this by taking the locks in proper order. Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Fixes: ab7c7bb6 ("dm: hold suspend_lock while suspending device during device deletion") Acked-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
-
- 14 8月, 2015 2 次提交
-
-
由 Kent Overstreet 提交于
As generic_make_request() is now able to handle arbitrarily sized bios, it's no longer necessary for each individual block driver to define its own ->merge_bvec_fn() callback. Remove every invocation completely. Cc: Jens Axboe <axboe@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: drbd-user@lists.linbit.com Cc: Jiri Kosina <jkosina@suse.cz> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@kernel.org> Cc: ceph-devel@vger.kernel.org Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Neil Brown <neilb@suse.de> Cc: linux-raid@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: NeilBrown <neilb@suse.de> (for the 'md' bits) Acked-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NKent Overstreet <kent.overstreet@gmail.com> [dpark: also remove ->merge_bvec_fn() in dm-thin as well as dm-era-target, and resolve merge conflicts] Signed-off-by: NDongsu Park <dpark@posteo.net> Signed-off-by: NMing Lin <ming.l@ssi.samsung.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Kent Overstreet 提交于
The way the block layer is currently written, it goes to great lengths to avoid having to split bios; upper layer code (such as bio_add_page()) checks what the underlying device can handle and tries to always create bios that don't need to be split. But this approach becomes unwieldy and eventually breaks down with stacked devices and devices with dynamic limits, and it adds a lot of complexity. If the block layer could split bios as needed, we could eliminate a lot of complexity elsewhere - particularly in stacked drivers. Code that creates bios can then create whatever size bios are convenient, and more importantly stacked drivers don't have to deal with both their own bio size limitations and the limitations of the (potentially multiple) devices underneath them. In the future this will let us delete merge_bvec_fn and a bunch of other code. We do this by adding calls to blk_queue_split() to the various make_request functions that need it - a few can already handle arbitrary size bios. Note that we add the call _after_ any call to blk_queue_bounce(); this means that blk_queue_split() and blk_recalc_rq_segments() don't need to be concerned with bouncing affecting segment merging. Some make_request_fn() callbacks were simple enough to audit and verify they don't need blk_queue_split() calls. The skipped ones are: * nfhd_make_request (arch/m68k/emu/nfblock.c) * axon_ram_make_request (arch/powerpc/sysdev/axonram.c) * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c) * brd_make_request (ramdisk - drivers/block/brd.c) * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c) * loop_make_request * null_queue_bio * bcache's make_request fns Some others are almost certainly safe to remove now, but will be left for future patches. Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ming Lei <ming.lei@canonical.com> Cc: Neil Brown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: drbd-user@lists.linbit.com Cc: Jiri Kosina <jkosina@suse.cz> Cc: Geoff Levand <geoff@infradead.org> Cc: Jim Paris <jim@jtan.com> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Oleg Drokin <oleg.drokin@intel.com> Cc: Andreas Dilger <andreas.dilger@intel.com> Acked-by: NeilBrown <neilb@suse.de> (for the 'md/md.c' bits) Acked-by: NMike Snitzer <snitzer@redhat.com> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NKent Overstreet <kent.overstreet@gmail.com> [dpark: skip more mq-based drivers, resolve merge conflicts, etc.] Signed-off-by: NDongsu Park <dpark@posteo.net> Signed-off-by: NMing Lin <ming.l@ssi.samsung.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 12 8月, 2015 1 次提交
-
-
由 Mikulas Patocka 提交于
In properly written code we should not assume that DM_MAPIO_SUBMITTED is zero. We should test the return value for DM_MAPIO_SUBMITTED rather than testing it for zero. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
- 04 8月, 2015 1 次提交
-
-
由 Mike Snitzer 提交于
A DM regression on 32 bit systems was reported against v4.2-rc3 here: https://lkml.org/lkml/2015/7/29/401 Fix this by reverting both commit 1c220c69 ("dm: fix casting bug in dm_merge_bvec()") and 148e51ba ("dm: improve documentation and code clarity in dm_merge_bvec"). This combined revert is done to eliminate the possibility of a partial revert in stable@ kernels. In hindsight the correct fix, at the time 1c220c69 was applied to fix the regression that 148e51ba introduced, should've been to simply revert 148e51ba. Reported-by: NJosh Boyer <jwboyer@fedoraproject.org> Tested-by: NAdam Williamson <awilliam@redhat.com> Acked-by: NJoe Thornber <ejt@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.19+
-
- 29 7月, 2015 1 次提交
-
-
由 Christoph Hellwig 提交于
Currently we have two different ways to signal an I/O error on a BIO: (1) by clearing the BIO_UPTODATE flag (2) by returning a Linux errno value to the bi_end_io callback The first one has the drawback of only communicating a single possible error (-EIO), and the second one has the drawback of not beeing persistent when bios are queued up, and are not passed along from child to parent bio in the ever more popular chaining scenario. Having both mechanisms available has the additional drawback of utterly confusing driver authors and introducing bugs where various I/O submitters only deal with one of them, and the others have to add boilerplate code to deal with both kinds of error returns. So add a new bi_error field to store an errno value directly in struct bio and remove the existing mechanisms to clean all this up. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 13 7月, 2015 1 次提交
-
-
由 Mikulas Patocka 提交于
Linux 4.2-rc1 Commit 0f20972f ("dm: factor out a common cleanup_mapped_device()") moved a common cleanup code to a separate function. Unfortunately, that commit incorrectly changed the order of cleanup, so that it destroys the mapped_device's srcu structure 'io_barrier' before destroying its workqueue. The function that is executed on the workqueue (dm_wq_work) uses the srcu structure, thus it may use it after being freed. That results in a crash in the LVM test suite's mirror-vgreduce-removemissing.sh test. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Fixes: 0f20972f ("dm: factor out a common cleanup_mapped_device()") Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
- 09 7月, 2015 1 次提交
-
-
由 Mike Snitzer 提交于
This reverts commit 9a0e609e. (Resolved a conflict during revert due to commit bfebd1cd that came after) This revert is motivated by a couple failure reports on request-based DM multipath testbeds: 1) Netapp reported that their multipath fault injection test under heavy IO load can stall longer than 300 seconds. 2) IBM reported elevated lock contention in their testbed (likely due to increased back pressure due to IO not being dispatched as quickly): https://www.redhat.com/archives/dm-devel/2015-July/msg00057.htmlSigned-off-by: NMike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 4.1+
-
- 26 6月, 2015 2 次提交
-
-
由 Mike Snitzer 提交于
This reverts commit 5f1b670d. Justification for revert as reported in this dm-devel post: https://www.redhat.com/archives/dm-devel/2015-June/msg00160.html this change should not be pushed to mainline yet. Firstly, Christoph has a newer version of the patch that fixes silent data corruption problem: https://www.redhat.com/archives/dm-devel/2015-May/msg00229.html And the new version still depends on LLDDs to always complete requests to the end when error happens, while block API doesn't enforce such a requirement. If the assumption is ever broken, the inconsistency between request and bio (e.g. rq->__sector and rq->bio) will cause silent data corruption: https://www.redhat.com/archives/dm-devel/2015-June/msg00022.htmlReported-by: NJunichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Mike Snitzer 提交于
This reverts commit cbc4e3c1. Reported-by: NJunichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
- 18 6月, 2015 1 次提交
-
-
由 Mikulas Patocka 提交于
This makes it possible to use dm stats with DM multipath. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
- 02 6月, 2015 1 次提交
-
-
由 Tejun Heo 提交于
Currently, a bdi (backing_dev_info) embeds single wb (bdi_writeback) and the role of the separation is unclear. For cgroup support for writeback IOs, a bdi will be updated to host multiple wb's where each wb serves writeback IOs of a different cgroup on the bdi. To achieve that, a wb should carry all states necessary for servicing writeback IOs for a cgroup independently. This patch moves bdi->state into wb. * enum bdi_state is renamed to wb_state and the prefix of all enums is changed from BDI_ to WB_. * Explicit zeroing of bdi->state is removed without adding zeoring of wb->state as the whole data structure is zeroed on init anyway. * As there's still only one bdi_writeback per backing_dev_info, all uses of bdi->state are mechanically replaced with bdi->wb.state introducing no behavior changes. Signed-off-by: NTejun Heo <tj@kernel.org> Reviewed-by: NJan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: drbd-dev@lists.linbit.com Cc: Neil Brown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 30 5月, 2015 4 次提交
-
-
由 Mike Snitzer 提交于
Introduce a single common method for cleaning up a DM device's mapped_device. No functional change, just eliminates duplication of delicate mapped_device cleanup code. Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Mike Snitzer 提交于
More often than not a request that is requeued _is_ mapped (meaning the clone request is allocated and clone->q is initialized). Rename dm_requeue_unmapped_original_request() to avoid potential confusion due to function name containing "unmapped". Also, remove dm_requeue_unmapped_request() since callers can easily call the dm_requeue_original_request() directly. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Mike Snitzer 提交于
Do not allocate the io_pool mempool for blk-mq request-based DM (DM_TYPE_MQ_REQUEST_BASED) in dm_alloc_rq_mempools(). Also refine __bind_mempools() to have more precise awareness of which mempools each type of DM device uses -- avoids mempool churn when reloading DM tables (particularly for DM_TYPE_REQUEST_BASED). Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Joe Thornber 提交于
dm_merge_bvec() was originally added in f6fccb ("dm: introduce merge_bvec_fn"). In that commit a value in sectors is converted to bytes using << 9, and then assigned to an int. This code made assumptions about the value of BIO_MAX_SECTORS. A later commit 148e51 ("dm: improve documentation and code clarity in dm_merge_bvec") was meant to have no functional change but it removed the use of BIO_MAX_SECTORS in favor of using queue_max_sectors(). At this point the cast from sector_t to int resulted in a zero value. The fallout being dm_merge_bvec() would only allow a single page to be added to a bio. This interim fix is minimal for the benefit of stable@ because the more comprehensive cleanup of passing a sector_t to all DM targets' merge function will impact quite a few DM targets. Signed-off-by: NJoe Thornber <ejt@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.19+
-
- 29 5月, 2015 1 次提交
-
-
由 Mike Snitzer 提交于
When stacking request-based dm device on non blk-mq device and device-mapper target could not map the request (error target is used, multipath target with all paths down, etc), the WARN_ON_ONCE() in free_rq_clone() will trigger when it shouldn't. The warning was added by commit aa6df8dd ("dm: fix free_rq_clone() NULL pointer when requeueing unmapped request"). But free_rq_clone() with clone->q == NULL is valid usage for the case where dm_kill_unmapped_request() initiates request cleanup. Fix this false warning by just removing the WARN_ON -- it only generated false positives and was never useful in catching the intended case (completing clone request not being mapped e.g. clone->q being NULL). Fixes: aa6df8dd ("dm: fix free_rq_clone() NULL pointer when requeueing unmapped request") Reported-by: NBart Van Assche <bart.vanassche@sandisk.com> Reported-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
- 28 5月, 2015 1 次提交
-
-
由 Mike Snitzer 提交于
Use BLK_MQ_RQ_QUEUE_BUSY to requeue a blk-mq request directly from the DM blk-mq device's .queue_rq. This cleans up the previous convoluted handling of request requeueing that would return BLK_MQ_RQ_QUEUE_OK (even though it wasn't) and then run blk_mq_requeue_request() followed by blk_mq_kick_requeue_list(). Also, document that DM blk-mq ontop of old request_fn devices cannot fail in clone_rq() since the clone request is preallocated as part of the pdu. Reported-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
- 27 5月, 2015 1 次提交
-
-
由 Junichi Nomura 提交于
When stacking request-based DM on blk_mq device, request cloning and remapping are done in a single call to target's clone_and_map_rq(). The clone is allocated and valid only if clone_and_map_rq() returns DM_MAPIO_REMAPPED. The "IS_ERR(clone)" check in map_request() does not cover all the !DM_MAPIO_REMAPPED cases that are possible (E.g. if underlying devices are not ready or unavailable, clone_and_map_rq() may return DM_MAPIO_REQUEUE without ever having established an ERR_PTR). Fix this by explicitly checking for a return that is not DM_MAPIO_REMAPPED in map_request(). Without this fix, DM core may call setup_clone() for a NULL clone and oops like this: BUG: unable to handle kernel NULL pointer dereference at 0000000000000068 IP: [<ffffffff81227525>] blk_rq_prep_clone+0x7d/0x137 ... CPU: 2 PID: 5793 Comm: kdmwork-253:3 Not tainted 4.0.0-nm #1 ... Call Trace: [<ffffffffa01d1c09>] map_tio_request+0xa9/0x258 [dm_mod] [<ffffffff81071de9>] kthread_worker_fn+0xfd/0x150 [<ffffffff81071cec>] ? kthread_parkme+0x24/0x24 [<ffffffff81071cec>] ? kthread_parkme+0x24/0x24 [<ffffffff81071fdd>] kthread+0xe6/0xee [<ffffffff81093a59>] ? put_lock_stats+0xe/0x20 [<ffffffff81071ef7>] ? __init_kthread_worker+0x5b/0x5b [<ffffffff814c2d98>] ret_from_fork+0x58/0x90 [<ffffffff81071ef7>] ? __init_kthread_worker+0x5b/0x5b Fixes: e5863d9a ("dm: allocate requests in target when stacking on blk-mq devices") Reported-by: NBart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 4.0+
-
- 26 5月, 2015 1 次提交
-
-
由 Junichi Nomura 提交于
Without kicking queue, requeued request may stay forever in the queue if there are no other I/O activities to the device. The original error had been in v2.6.39 with commit 7eaceacc ("block: remove per-queue plugging"), which replaced conditional plugging by periodic runqueue. Commit 9d1deb83 in v4.1-rc1 removed the periodic runqueue and the problem started to manifest. Fixes: 9d1deb83 ("dm: don't schedule delayed run of the queue if nothing to do") Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
- 22 5月, 2015 1 次提交
-
-
由 Christoph Hellwig 提交于
Currently dm-multipath has to clone the bios for every request sent to the lower devices, which wastes cpu cycles and ties down memory. This patch instead adds a new REQ_CLONE flag that instructs req_bio_endio to not complete bios attached to a request, which we set on clone requests similar to bios in a flush sequence. With this change I/O errors on a path failure only get propagated to dm-multipath, which can then either resubmit the I/O or complete the bios on the original request. I've done some basic testing of this on a Linux target with ALUA support, and it survives path failures during I/O nicely. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 30 4月, 2015 2 次提交
-
-
由 Mike Snitzer 提交于
Commit 02233342 ("dm: optimize dm_mq_queue_rq to _not_ use kthread if using pure blk-mq") mistakenly removed free_rq_clone()'s clone->q check before testing clone->q->mq_ops. It was an oversight to discontinue that check for 1 of the 2 use-cases for free_rq_clone(): 1) free_rq_clone() called when an unmapped original request is requeued 2) free_rq_clone() called in the request-based IO completion path The clone->q check made sense for case #1 but not for #2. However, we cannot just reinstate the check as it'd mask a serious bug in the IO completion case #2 -- no in-flight request should have an uninitialized request_queue (basic block layer refcounting _should_ ensure this). The NULL pointer seen for case #1 is detailed here: https://www.redhat.com/archives/dm-devel/2015-April/msg00160.html Fix this free_rq_clone() NULL pointer by simply checking if the mapped_device's type is DM_TYPE_MQ_REQUEST_BASED (clone's queue is blk-mq) rather than checking clone->q->mq_ops. This avoids the need to dereference clone->q, but a WARN_ON_ONCE is added to let us know if an uninitialized clone request is being completed. Reported-by: NBart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Christoph Hellwig 提交于
Commit bfebd1cd ("dm: add full blk-mq support to request-based DM") didn't properly account for the need to short-circuit re-initializing DM's blk-mq request_queue if it was already initialized. Otherwise, reloading a blk-mq request-based DM table (either manually or via multipathd) resulted in errors, see: https://www.redhat.com/archives/dm-devel/2015-April/msg00132.html Fix is to only initialize the request_queue on the initial table load (when the mapped_device type is assigned). This is better than having dm_init_request_based_blk_mq_queue() return early if the queue was already initialized because it elevates the constraint to a more meaningful location in DM core. As such the pre-existing early return in dm_init_request_based_queue() can now be removed. Fixes: bfebd1cd ("dm: add full blk-mq support to request-based DM") Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
- 16 4月, 2015 2 次提交
-
-
由 Sami Tolvanen 提交于
Add device specific modes to dm-verity to specify how corrupted blocks should be handled. The following modes are defined: - DM_VERITY_MODE_EIO is the default behavior, where reading a corrupted block results in -EIO. - DM_VERITY_MODE_LOGGING only logs corrupted blocks, but does not block the read. - DM_VERITY_MODE_RESTART calls kernel_restart when a corrupted block is discovered. In addition, each mode sends a uevent to notify userspace of corruption and to allow further recovery actions. The driver defaults to previous behavior (DM_VERITY_MODE_EIO) and other modes can be enabled with an additional parameter to the verity table. Signed-off-by: NSami Tolvanen <samitolvanen@google.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Mike Snitzer 提交于
Request-based DM's blk-mq support defaults to off; but a user can easily change the default using the dm_mod.use_blk_mq module/boot option. Also, you can check what mode a given request-based DM device is using with: cat /sys/block/dm-X/dm/use_blk_mq This change enabled further cleanup and reduced work (e.g. the md->io_pool and md->rq_pool isn't created if using blk-mq). Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-