- 22 5月, 2015 1 次提交
-
-
由 Christoph Hellwig 提交于
Currently dm-multipath has to clone the bios for every request sent to the lower devices, which wastes cpu cycles and ties down memory. This patch instead adds a new REQ_CLONE flag that instructs req_bio_endio to not complete bios attached to a request, which we set on clone requests similar to bios in a flush sequence. With this change I/O errors on a path failure only get propagated to dm-multipath, which can then either resubmit the I/O or complete the bios on the original request. I've done some basic testing of this on a Linux target with ALUA support, and it survives path failures during I/O nicely. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 16 4月, 2015 2 次提交
-
-
由 Mike Snitzer 提交于
Request-based DM's blk-mq support defaults to off; but a user can easily change the default using the dm_mod.use_blk_mq module/boot option. Also, you can check what mode a given request-based DM device is using with: cat /sys/block/dm-X/dm/use_blk_mq This change enabled further cleanup and reduced work (e.g. the md->io_pool and md->rq_pool isn't created if using blk-mq). Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Mike Snitzer 提交于
Otherwise, for sequential workloads, the dm_request_fn can allow excessive request merging at the expense of increased service time. Add a per-device sysfs attribute to allow the user to control how long a request, that is a reasonable merge candidate, can be queued on the request queue. The resolution of this request dispatch deadline is in microseconds (ranging from 1 to 100000 usecs), to set a 20us deadline: echo 20 > /sys/block/dm-7/dm/rq_based_seq_io_merge_deadline The dm_request_fn's merge heuristic and associated extra accounting is disabled by default (rq_based_seq_io_merge_deadline is 0). This sysfs attribute is not applicable to bio-based DM devices so it will only ever report 0 for them. By allowing a request to remain on the queue it will block others requests on the queue. But introducing a short dequeue delay has proven very effective at enabling certain sequential IO workloads on really fast, yet IOPS constrained, devices to build up slightly larger IOs -- yielding 90+% throughput improvements. Having precise control over the time taken to wait for larger requests to build affords control beyond that of waiting for certain IO sizes to accumulate (which would require a deadline anyway). This knob will only ever make sense with sequential IO workloads and the particular value used is storage configuration specific. Given the expected niche use-case for when this knob is useful it has been deemed acceptable to expose this relatively crude method for crafting optimal IO on specific storage -- especially given the solution is simple yet effective. In the context of DM multipath, it is advisable to tune this sysfs attribute to a value that offers the best performance for the common case (e.g. if 4 paths are expected active, tune for that; if paths fail then performance may be slightly reduced). Alternatives were explored to have request-based DM autotune this value (e.g. if/when paths fail) but they were quickly deemed too fragile and complex to warrant further design and development time. If this problem proves more common as faster storage emerges we'll have to look at elevating a generic solution into the block core. Tested-by: NShiva Krishna Merla <shivakrishna.merla@netapp.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
- 01 4月, 2015 1 次提交
-
-
由 Mike Snitzer 提交于
DM multipath is the only caller of blk_lld_busy() -- which calls a queue's lld_busy_fn hook. Request-based DM doesn't support stacking multipath devices so there is no reason to register the lld_busy_fn hook on a multipath device's queue using blk_queue_lld_busy(). As such, remove functions dm_lld_busy and dm_table_any_busy_target. Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
- 10 2月, 2015 2 次提交
-
-
由 Mike Snitzer 提交于
Otherwise replacing the multipath target with the error target fails: device-mapper: ioctl: can't change device type after initial table load. The error target was mistakenly considered to be target type DM_TYPE_REQUEST_BASED rather than DM_TYPE_MQ_REQUEST_BASED even if the target it was to replace was of type DM_TYPE_MQ_REQUEST_BASED. Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
由 Mike Snitzer 提交于
For blk-mq request-based DM the responsibility of allocating a cloned request is transfered from DM core to the target type. Doing so enables the cloned request to be allocated from the appropriate blk-mq request_queue's pool (only the DM target, e.g. multipath, can know which block device to send a given cloned request to). Care was taken to preserve compatibility with old-style block request completion that requires request-based DM _not_ acquire the clone request's queue lock in the completion path. As such, there are now 2 different request-based DM target_type interfaces: 1) the original .map_rq() interface will continue to be used for non-blk-mq devices -- the preallocated clone request is passed in from DM core. 2) a new .clone_and_map_rq() and .release_clone_rq() will be used for blk-mq devices -- blk_get_request() and blk_put_request() are used respectively from these hooks. dm_table_set_type() was updated to detect if the request-based target is being stacked on blk-mq devices, if so DM_TYPE_MQ_REQUEST_BASED is set. DM core disallows switching the DM table's type after it is set. This means that there is no mixing of non-blk-mq and blk-mq devices within the same request-based DM table. [This patch was started by Keith and later heavily modified by Mike] Tested-by: NBart Van Assche <bvanassche@acm.org> Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
- 20 11月, 2014 2 次提交
-
-
由 Mike Snitzer 提交于
Rename dm_internal_{suspend,resume} to dm_internal_{suspend,resume}_fast -- dm-stats will continue using these methods to avoid all the extra suspend/resume logic that is not needed in order to quickly flush IO. Introduce dm_internal_suspend_noflush() variant that actually calls the mapped_device's target callbacks -- otherwise target-specific hooks are avoided (e.g. dm-thin's thin_presuspend and thin_postsuspend). Common code between dm_internal_{suspend_noflush,resume} and dm_{suspend,resume} was factored out as __dm_{suspend,resume}. Update dm_internal_{suspend_noflush,resume} to always take and release the mapped_device's suspend_lock. Also update dm_{suspend,resume} to be aware of potential for DM_INTERNAL_SUSPEND_FLAG to be set and respond accordingly by interruptibly waiting for the DM_INTERNAL_SUSPEND_FLAG to be cleared. Add lockdep annotation to dm_suspend() and dm_resume(). The existing DM_SUSPEND_FLAG remains unchanged. DM_INTERNAL_SUSPEND_FLAG is set by dm_internal_suspend_noflush() and cleared by dm_internal_resume(). Both DM_SUSPEND_FLAG and DM_INTERNAL_SUSPEND_FLAG may be set if a device was already suspended when dm_internal_suspend_noflush() was called -- this can be thought of as a "nested suspend". A "nested suspend" can occur with legacy userspace dm-thin code that might suspend all active thin volumes before suspending the pool for resize. But otherwise, in the normal dm-thin-pool suspend case moving forward: the thin-pool will have DM_SUSPEND_FLAG set and all active thins from that thin-pool will have DM_INTERNAL_SUSPEND_FLAG set. Also add DM_INTERNAL_SUSPEND_FLAG to status report. This new DM_INTERNAL_SUSPEND_FLAG state is being reported to assist with debugging (e.g. 'dmsetup info' will report an internally suspended device accordingly). Signed-off-by: NMike Snitzer <snitzer@redhat.com> Acked-by: NJoe Thornber <ejt@redhat.com>
-
由 Mike Snitzer 提交于
The DM thin-pool target now must undo the changes performed during pool_presuspend() so introduce presuspend_undo hook in target_type. Signed-off-by: NMike Snitzer <snitzer@redhat.com> Acked-by: NJoe Thornber <ejt@redhat.com>
-
- 06 10月, 2014 1 次提交
-
-
由 Benjamin Marzinski 提交于
Until this change, when loading a new DM table, DM core would re-open all of the devices in the DM table. Now, DM core will avoid redundant device opens (and closes when destroying the old table) if the old table already has a device open using the same mode. This is achieved by managing reference counts on the table_devices that DM core now stores in the mapped_device structure (rather than in the dm_table structure). So a mapped_device's active and inactive dm_tables' dm_dev lists now just point to the dm_devs stored in the mapped_device's table_devices list. This improvement in DM core's device reference counting has the side-effect of fixing a long-standing limitation of the multipath target: a DM multipath table couldn't include any paths that were unusable (failed). For example: if all paths have failed and you add a new, working, path to the table; you can't use it since the table load would fail due to it still containing failed paths. Now a re-load of a multipath table can include failed devices and when those devices become active again they can be used instantly. The device list code in dm.c isn't a straight copy/paste from the code in dm-table.c, but it's very close (aside from some variable renames). One subtle difference is that find_table_device for the tables_devices list will only match devices with the same name and mode. This is because we don't want to upgrade a device's mode in the active table when an inactive table is loaded. Access to the mapped_device structure's tables_devices list requires a mutex (tables_devices_lock), so that tables cannot be created and destroyed concurrently. Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
- 02 8月, 2014 1 次提交
-
-
由 Mikulas Patocka 提交于
The function dm_table_supports_discards is only called from dm-table.c:dm_table_set_restrictions(). So move it above dm_table_set_restrictions and make it static. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
- 28 3月, 2014 2 次提交
-
-
由 Mike Snitzer 提交于
Introduce dm_table_run_md_queue_async() to run the request_queue of the mapped_device associated with a request-based DM table. Also add dm_md_get_queue() wrapper to extract the request_queue from a mapped_device. Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
-
由 Mikulas Patocka 提交于
Make the function dm_table_alloc_md_mempools static because it is not called from another file. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
- 15 1月, 2014 1 次提交
-
-
由 Mikulas Patocka 提交于
This reverts commit be35f486 ("dm: wait until embedded kobject is released before destroying a device") and provides an improved fix. The kobject release code that calls the completion must be placed in a non-module file, otherwise there is a module unload race (if the process calling dm_kobject_release is preempted and the DM module unloaded after the completion is triggered, but before dm_kobject_release returns). To fix this race, this patch moves the completion code to dm-builtin.c which is always compiled directly into the kernel if BLK_DEV_DM is selected. The patch introduces a new dm_kobject_holder structure, its purpose is to keep the completion and kobject in one place, so that it can be accessed from non-module code without the need to export the layout of struct mapped_device to that code. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
-
- 08 1月, 2014 1 次提交
-
-
由 Mikulas Patocka 提交于
There may be other parts of the kernel holding a reference on the dm kobject. We must wait until all references are dropped before deallocating the mapped_device structure. The dm_kobject_release method signals that all references are dropped via completion. But dm_kobject_release doesn't free the kobject (which is embedded in the mapped_device structure). This is the sequence of operations: * when destroying a DM device, call kobject_put from dm_sysfs_exit * wait until all users stop using the kobject, when it happens the release method is called * the release method signals the completion and should return without delay * the dm device removal code that waits on the completion continues * the dm device removal code drops the dm_mod reference the device had * the dm device removal code frees the mapped_device structure that contains the kobject Using kobject this way should avoid the module unload race that was mentioned at the beginning of this thread: https://lkml.org/lkml/2014/1/4/83Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
-
- 10 11月, 2013 1 次提交
-
-
由 Mikulas Patocka 提交于
This patch allows the removal of an open device to be deferred until it is closed. (Previously such a removal attempt would fail.) The deferred remove functionality is enabled by setting the flag DM_DEFERRED_REMOVE in the ioctl structure on DM_DEV_REMOVE or DM_REMOVE_ALL ioctl. On return from DM_DEV_REMOVE, the flag DM_DEFERRED_REMOVE indicates if the device was removed immediately or flagged to be removed on close - if the flag is clear, the device was removed. On return from DM_DEV_STATUS and other ioctls, the flag DM_DEFERRED_REMOVE is set if the device is scheduled to be removed on closure. A device that is scheduled to be deleted can be revived using the message "@cancel_deferred_remove". This message clears the DMF_DEFERRED_REMOVE flag so that the device won't be deleted on close. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com>
-
- 23 9月, 2013 2 次提交
-
-
由 Mike Snitzer 提交于
Allow user to change the number of IOs that are reserved by bio-based DM's mempools by writing to this file: /sys/module/dm_mod/parameters/reserved_bio_based_ios The default value is RESERVED_BIO_BASED_IOS (16). The maximum allowed value is RESERVED_MAX_IOS (1024). Export dm_get_reserved_bio_based_ios() for use by DM targets and core code. Switch to sizing dm-io's mempool and bioset using DM core's configurable 'reserved_bio_based_ios'. Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NFrank Mayhar <fmayhar@google.com>
-
由 Mike Snitzer 提交于
Allow user to change the number of IOs that are reserved by request-based DM's mempools by writing to this file: /sys/module/dm_mod/parameters/reserved_rq_based_ios The default value is RESERVED_REQUEST_BASED_IOS (256). The maximum allowed value is RESERVED_MAX_IOS (1024). Export dm_get_reserved_rq_based_ios() for use by DM targets and core code. Switch to sizing dm-mpath's mempool using DM core's configurable 'reserved_rq_based_ios'. Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NFrank Mayhar <fmayhar@google.com> Acked-by: NMikulas Patocka <mpatocka@redhat.com>
-
- 06 9月, 2013 2 次提交
-
-
由 Mikulas Patocka 提交于
Support the collection of I/O statistics on user-defined regions of a DM device. If no regions are defined no statistics are collected so there isn't any performance impact. Only bio-based DM devices are currently supported. Each user-defined region specifies a starting sector, length and step. Individual statistics will be collected for each step-sized area within the range specified. The I/O statistics counters for each step-sized area of a region are in the same format as /sys/block/*/stat or /proc/diskstats but extra counters (12 and 13) are provided: total time spent reading and writing in milliseconds. All these counters may be accessed by sending the @stats_print message to the appropriate DM device via dmsetup. The creation of DM statistics will allocate memory via kmalloc or fallback to using vmalloc space. At most, 1/4 of the overall system memory may be allocated by DM statistics. The admin can see how much memory is used by reading /sys/module/dm_mod/parameters/stats_current_allocated_bytes See Documentation/device-mapper/statistics.txt for more details. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mike Snitzer 提交于
It may be useful to switch a request-based table to the "error" target. Enhance the DM core to allow a hybrid target_type which is capable of handling either bios (via .map) or requests (via .map_rq). Add a request-based map function (.map_rq) to the "error" target_type; making it DM's first hybrid target. Train dm_table_set_type() to prefer the mapped device's established type (request-based or bio-based). If the mapped device doesn't have an established type default to making the table with the hybrid target(s) bio-based. Tested 'dmsetup wipe_table' to work on both bio-based and request-based devices. Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NJoe Jin <joe.jin@oracle.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
- 22 12月, 2012 1 次提交
-
-
由 Mikulas Patocka 提交于
Introduce a field per_bio_data_size in struct dm_target. Targets can set this field in the constructor. If a target sets this field to a non-zero value, "per_bio_data_size" bytes of auxiliary data are allocated for each bio submitted to the target. These data can be used for any purpose by the target and help us improve performance by removing some per-target mempools. Per-bio data is accessed with dm_per_bio_data. The argument data_size must be the same as the value per_bio_data_size in dm_target. If the target has a pointer to per_bio_data, it can get a pointer to the bio with dm_bio_from_per_bio_data() function (data_size must be the same as the value passed to dm_per_bio_data). Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
- 27 9月, 2012 1 次提交
-
-
由 Mike Snitzer 提交于
Add a safety net that will re-use the DM device's existing limits in the event that DM device has a temporary table that doesn't have any component devices. This is to reduce the chance that requests not respecting the hardware limits will reach the device. DM recalculates queue limits based only on devices which currently exist in the table. This creates a problem in the event all devices are temporarily removed such as all paths being lost in multipath. DM will reset the limits to the maximum permissible, which can then assemble requests which exceed the limits of the paths when the paths are restored. The request will fail the blk_rq_check_limits() test when sent to a path with lower limits, and will be retried without end by multipath. This became a much bigger issue after v3.6 commit fe86cdce ("block: do not artificially constrain max_sectors for stacking drivers"). Reported-by: NDavid Jeffery <djeffery@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
- 27 7月, 2012 1 次提交
-
-
由 Alasdair G Kergon 提交于
Commit outstanding metadata before returning the status for a dm thin pool so that the numbers reported are as up-to-date as possible. The commit is not performed if the device is suspended or if the DM_NOFLUSH_FLAG is supplied by userspace and passed to the target through a new 'status_flags' parameter in the target's dm_status_fn. The userspace dmsetup tool will support the --noflush flag with the 'dmsetup status' and 'dmsetup wait' commands from version 1.02.76 onwards. Tested-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
- 01 11月, 2011 1 次提交
-
-
由 Alasdair G Kergon 提交于
Introduce DM_TARGET_IMMUTABLE to indicate that the target type cannot be mixed with any other target type, and once loaded into a device, it cannot be replaced with a table containing a different type. The thin provisioning pool device will use this. Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
- 02 8月, 2011 1 次提交
-
-
由 Mikulas Patocka 提交于
Add a new flag DMF_MERGE_IS_OPTIONAL to struct mapped_device to indicate whether the device can accept bios larger than the size its merge function returns. When set, use this to send large bios to snapshots which can split them if necessary. Snapshot I/O may be significantly fragmented and this approach seems to improve peformance. Before the patch, dm_set_device_limits restricted bio size to page size if the underlying device had a merge function and the target didn't provide a merge function. After the patch, dm_set_device_limits restricts bio size to page size if the underlying device has a merge function, doesn't have DMF_MERGE_IS_OPTIONAL flag and the target doesn't provide a merge function. The snapshot target can't provide a merge function because when the merge function is called, it is impossible to determine where the bio will be remapped. Previously this led us to impose a 4k limit, which we can now remove if the snapshot store is located on a device without a merge function. Together with another patch for optimizing full chunk writes, it improves performance from 29MB/s to 40MB/s when writing to the filesystem on snapshot store. If the snapshot store is placed on a non-dm device with a merge function (such as md-raid), device mapper still limits all bios to page size. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
- 17 3月, 2011 1 次提交
-
-
由 Martin K. Petersen 提交于
MD and DM create a new bio_set for every metadevice. Each bio_set has an integrity mempool attached regardless of whether the metadevice is capable of passing integrity metadata. This is a waste of memory. Instead we defer the allocation decision to MD and DM since we know at metadevice creation time whether integrity passthrough is needed or not. Automatic integrity mempool allocation can then be removed from bioset_create() and we make an explicit integrity allocation for the fs_bio_set. Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Reported-by: NZdenek Kabelac <zkabelac@redhat.com> Acked-by: NMike Snitzer <snizer@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 12 8月, 2010 5 次提交
-
-
由 Mike Snitzer 提交于
Allow discards to be passed through to linear mappings if at least one underlying device supports it. Discards will be forwarded only to devices that support them. A target that supports discards should set num_discard_requests to indicate how many times each discard request must be submitted to it. Verify table's underlying devices support discards prior to setting the associated DM device as capable of discards (via QUEUE_FLAG_DISCARD). Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Reviewed-by: NJoe Thornber <thornber@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Will Drewry 提交于
This change unifies the various checks and finalization that occurs on a table prior to use. By doing so, it allows table construction without traversing the dm-ioctl interface. Signed-off-by: NWill Drewry <wad@chromium.org> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mike Snitzer 提交于
Change bio-based mapped devices no longer to have a fully initialized request_queue (request_fn, elevator, etc). This means bio-based DM devices no longer register elevator sysfs attributes ('iosched/' tree or 'scheduler' other than "none"). In contrast, a request-based DM device will continue to have a full request_queue and will register elevator sysfs attributes. Therefore a user can determine a DM device's type by checking if elevator sysfs attributes exist. First allocate a minimalist request_queue structure for a DM device (needed for both bio and request-based DM). Initialization of a full request_queue is deferred until it is known that the DM device is request-based, at the end of the table load sequence. Factor DM device's request_queue initialization: - common to both request-based and bio-based into dm_init_md_queue(). - specific to request-based into dm_init_request_based_queue(). The md->type_lock mutex is used to protect md->queue, in addition to md->type, during table_load(). A DM device's first table_load will establish the immutable md->type. But md->queue initialization, based on md->type, may fail at that time (because blk_init_allocated_queue cannot allocate memory). Therefore any subsequent table_load must (re)try dm_setup_md_queue independently of establishing md->type. Signed-off-by: NMike Snitzer <snitzer@redhat.com> Acked-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mike Snitzer 提交于
Determine whether a mapped device is bio-based or request-based when loading its first (inactive) table and don't allow that to be changed later. This patch performs different device initialisation in each of the two cases. (We don't think it's necessary to add code to support changing between the two types.) Allowed md->type transitions: DM_TYPE_NONE to DM_TYPE_BIO_BASED DM_TYPE_NONE to DM_TYPE_REQUEST_BASED We now prevent table_load from replacing the inactive table with a conflicting type of table even after an explicit table_clear. Introduce 'type_lock' into the struct mapped_device to protect md->type and to prepare for the next patch that will change the queue initialization and allocate memory while md->type_lock is held. Signed-off-by: NMike Snitzer <snitzer@redhat.com> Acked-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com> drivers/md/dm-ioctl.c | 15 +++++++++++++++ drivers/md/dm.c | 37 ++++++++++++++++++++++++++++++------- drivers/md/dm.h | 5 +++++ include/linux/dm-ioctl.h | 4 ++-- 4 files changed, 52 insertions(+), 9 deletions(-)
-
由 Kiyoshi Ueda 提交于
This patch separates the device deletion code from dm_put() to make sure the deletion happens in the process context. By this patch, device deletion always occurs in an ioctl (process) context and dm_put() can be called in interrupt context. As a result, the request-based dm's bad dm_put() usage pointed out by Mikulas below disappears. http://marc.info/?l=dm-devel&m=126699981019735&w=2 Without this patch, I confirmed there is a case to crash the system: dm_put() => dm_table_destroy() => vfree() => BUG_ON(in_interrupt()) Some more backgrounds and details: In request-based dm, a device opener can remove a mapped_device while the last request is still completing, because bios in the last request complete first and then the device opener can close and remove the mapped_device before the last request completes: CPU0 CPU1 ================================================================= <<INTERRUPT>> blk_end_request_all(clone_rq) blk_update_request(clone_rq) bio_endio(clone_bio) == end_clone_bio blk_update_request(orig_rq) bio_endio(orig_bio) <<I/O completed>> dm_blk_close() dev_remove() dm_put(md) <<Free md>> blk_finish_request(clone_rq) .... dm_end_request(clone_rq) free_rq_clone(clone_rq) blk_end_request_all(orig_rq) rq_completed(md) So request-based dm used dm_get()/dm_put() to hold md for each I/O until its request completion handling is fully done. However, the final dm_put() can call the device deletion code which must not be run in interrupt context and may cause kernel panic. To solve the problem, this patch moves the device deletion code, dm_destroy(), to predetermined places that is actually deleting the mapped_device in ioctl (process) context, and changes dm_put() just to decrement the reference count of the mapped_device. By this change, dm_put() can be used in any context and the symmetric model below is introduced: dm_create(): create a mapped_device dm_destroy(): destroy a mapped_device dm_get(): increment the reference count of a mapped_device dm_put(): decrement the reference count of a mapped_device dm_destroy() waits for all references of the mapped_device to disappear, then deletes the mapped_device. dm_destroy() uses active waiting with msleep(1), since deleting the mapped_device isn't performance-critical task. And since at this point, nobody opens the mapped_device and no new reference will be taken, the pending counts are just for racing completing activity and will eventually decrease to zero. For the unlikely case of the forced module unload, dm_destroy_immediate(), which doesn't wait and forcibly deletes the mapped_device, is also introduced and used in dm_hash_remove_all(). Otherwise, "rmmod -f" may be stuck and never return. And now, because the mapped_device is deleted at this point, subsequent accesses to the mapped_device may cause NULL pointer references. Cc: stable@kernel.org Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
- 06 3月, 2010 1 次提交
-
-
由 Peter Rajnoha 提交于
Set a new DM_UEVENT_GENERATED_FLAG when returning from ioctls to indicate that a uevent was actually generated. This tells the userspace caller that it may need to wait for the event to be processed. Signed-off-by: NPeter Rajnoha <prajnoha@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
- 11 12月, 2009 3 次提交
-
-
由 Kiyoshi Ueda 提交于
This patch renames dm_suspended() to dm_suspended_md() and keeps it internal to dm. No functional change. Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Mike Anderson <andmike@linux.vnet.ibm.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mike Anderson 提交于
Add dm_deleting_md to check whether or not a given mapped device is currently being deleted. Signed-off-by: NMike Anderson <andmike@linux.vnet.ibm.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Allocate "struct io" from a slab. This patch changes dm-io, so that "struct io" is allocated from a slab cache. It used to be allocated with kmalloc. Allocating from a slab will be needed for the next patch, because it requires a special alignment of "struct io" and kmalloc cannot meet this alignment. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
- 24 7月, 2009 1 次提交
-
-
由 Mike Snitzer 提交于
This patch removes DM's bio-based vs request-based conditional setting of next_ordered. For bio-based DM the next_ordered check is no longer a concern (as that check is now in the __make_request path). For request-based DM the default of QUEUE_ORDERED_NONE is now appropriate. bio-based DM was changed to work-around the previously misplaced next_ordered check with this commit: 99360b4c request-based DM does not yet support barriers but reacted to the above bio-based DM change with this commit: 5d67aa23 The above changes are no longer needed given Neil Brown's recent fix to put the next_ordered check in the __make_request path: db64f680Signed-off-by: NMike Snitzer <snitzer@redhat.com> Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: NeilBrown <neilb@suse.de> Acked-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Acked-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
- 22 6月, 2009 5 次提交
-
-
由 Kiyoshi Ueda 提交于
Request-based dm doesn't have barrier support yet. So we need to set QUEUE_ORDERED_DRAIN only for bio-based dm. Since the device type is decided at the first table loading time, the flag set is deferred until then. Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Kiyoshi Ueda 提交于
This patch enables request-based dm. o Request-based dm and bio-based dm coexist, since there are some target drivers which are more fitting to bio-based dm. Also, there are other bio-based devices in the kernel (e.g. md, loop). Since bio-based device can't receive struct request, there are some limitations on device stacking between bio-based and request-based. type of underlying device bio-based request-based ---------------------------------------------- bio-based OK OK request-based -- OK The device type is recognized by the queue flag in the kernel, so dm follows that. o The type of a dm device is decided at the first table binding time. Once the type of a dm device is decided, the type can't be changed. o Mempool allocations are deferred to at the table loading time, since mempools for request-based dm are different from those for bio-based dm and needed mempool type is fixed by the type of table. o Currently, request-based dm supports only tables that have a single target. To support multiple targets, we need to support request splitting or prevent bio/request from spanning multiple targets. The former needs lots of changes in the block layer, and the latter needs that all target drivers support merge() function. Both will take a time. Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Kiyoshi Ueda 提交于
This patch adds core functions for request-based dm. When struct mapped device (md) is initialized, md->queue has an I/O scheduler and the following functions are used for request-based dm as the queue functions: make_request_fn: dm_make_request() pref_fn: dm_prep_fn() request_fn: dm_request_fn() softirq_done_fn: dm_softirq_done() lld_busy_fn: dm_lld_busy() Actual initializations are done in another patch (PATCH 2). Below is a brief summary of how request-based dm behaves, including: - making request from bio - cloning, mapping and dispatching request - completing request and bio - suspending md - resuming md bio to request ============== md->queue->make_request_fn() (dm_make_request()) calls __make_request() for a bio submitted to the md. Then, the bio is kept in the queue as a new request or merged into another request in the queue if possible. Cloning and Mapping =================== Cloning and mapping are done in md->queue->request_fn() (dm_request_fn()), when requests are dispatched after they are sorted by the I/O scheduler. dm_request_fn() checks busy state of underlying devices using target's busy() function and stops dispatching requests to keep them on the dm device's queue if busy. It helps better I/O merging, since no merge is done for a request once it is dispatched to underlying devices. Actual cloning and mapping are done in dm_prep_fn() and map_request() called from dm_request_fn(). dm_prep_fn() clones not only request but also bios of the request so that dm can hold bio completion in error cases and prevent the bio submitter from noticing the error. (See the "Completion" section below for details.) After the cloning, the clone is mapped by target's map_rq() function and inserted to underlying device's queue using blk_insert_cloned_request(). Completion ========== Request completion can be hooked by rq->end_io(), but then, all bios in the request will have been completed even error cases, and the bio submitter will have noticed the error. To prevent the bio completion in error cases, request-based dm clones both bio and request and hooks both bio->bi_end_io() and rq->end_io(): bio->bi_end_io(): end_clone_bio() rq->end_io(): end_clone_request() Summary of the request completion flow is below: blk_end_request() for a clone request => blk_update_request() => bio->bi_end_io() == end_clone_bio() for each clone bio => Free the clone bio => Success: Complete the original bio (blk_update_request()) Error: Don't complete the original bio => blk_finish_request() => rq->end_io() == end_clone_request() => blk_complete_request() => dm_softirq_done() => Free the clone request => Success: Complete the original request (blk_end_request()) Error: Requeue the original request end_clone_bio() completes the original request on the size of the original bio in successful cases. Even if all bios in the original request are completed by that completion, the original request must not be completed yet to keep the ordering of request completion for the stacking. So end_clone_bio() uses blk_update_request() instead of blk_end_request(). In error cases, end_clone_bio() doesn't complete the original bio. It just frees the cloned bio and gives over the error handling to end_clone_request(). end_clone_request(), which is called with queue lock held, completes the clone request and the original request in a softirq context (dm_softirq_done()), which has no queue lock, to avoid a deadlock issue on submission of another request during the completion: - The submitted request may be mapped to the same device - Request submission requires queue lock, but the queue lock has been held by itself and it doesn't know that The clone request has no clone bio when dm_softirq_done() is called. So target drivers can't resubmit it again even error cases. Instead, they can ask dm core for requeueing and remapping the original request in that cases. suspend ======= Request-based dm uses stopping md->queue as suspend of the md. For noflush suspend, just stops md->queue. For flush suspend, inserts a marker request to the tail of md->queue. And dispatches all requests in md->queue until the marker comes to the front of md->queue. Then, stops dispatching request and waits for the all dispatched requests to complete. After that, completes the marker request, stops md->queue and wake up the waiter on the suspend queue, md->wait. resume ====== Starts md->queue. Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mike Snitzer 提交于
Currently, device-mapper maintains a separate instance of 'struct queue_limits' for each table of each device. When the configuration of a device is to be changed, first its table is loaded and this structure is populated, then the device is 'resumed' and the calculated queue_limits are applied. This places restrictions on how userspace may process related devices, where it is often advantageous to 'load' tables for several devices at once before 'resuming' them together. As the new queue_limits only take effect after the 'resume', if they are changing and one device uses another, the latter must be 'resumed' before the former may be 'loaded'. This patch moves the calculation of these queue_limits out of the 'load' operation into 'resume'. Since we are no longer pre-calculating this struct, we no longer need to maintain copies within our dm structs. dm_set_device_limits() now passes the 'start' of the device's data area (aka pe_start) as the 'offset' to blk_stack_limits(). init_valid_queue_limits() is replaced by blk_set_default_limits(). Signed-off-by: NMike Snitzer <snitzer@redhat.com> Cc: martin.petersen@oracle.com Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Milan Broz 提交于
Add support for passing a 32 bit "cookie" into the kernel with the DM_SUSPEND, DM_DEV_RENAME and DM_DEV_REMOVE ioctls. The (unsigned) value of this cookie is returned to userspace alongside the uevents issued by these ioctls in the variable DM_COOKIE. This means the userspace process issuing these ioctls can be notified by udev after udev has completed any actions triggered. To minimise the interface extension, we pass the cookie into the kernel in the event_nr field which is otherwise unused when calling these ioctls. Incrementing the version number allows userspace to determine in advance whether or not the kernel supports the cookie. If the kernel does support this but userspace does not, there should be no impact as the new variable will just get ignored. Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-