- 11 12月, 2009 17 次提交
-
-
由 Kiyoshi Ueda 提交于
This patch renames dm_suspended() to dm_suspended_md() and keeps it internal to dm. No functional change. Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Mike Anderson <andmike@linux.vnet.ibm.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Kiyoshi Ueda 提交于
This patch moves DMF_SUSPENDED flag set before postsuspend. No one should care about the ordering, because the flag set and the postsuspend are protected by a single lock, md->suspend_lock, and all strict flag-checkers take the lock. Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Mike Anderson <andmike@linux.vnet.ibm.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Jun'ichi Nomura 提交于
This patch adds a remapping trace to request-based dm. BIO-based dm already has the equivalent tracepoint. For example, under this dm stack (linear LV on multipath): # dmsetup ls --tree -o ascii vg-lv0 (253:1) `-mpath0 (253:0) |- (8:160) |- (66:80) |- (65:176) `- (65:160) Trace of 'dd of=/dev/vg/lv0 bs=128k count=1 oflag=direct' looks like this: without the patch: dd-6674 [000] 539.727384: block_bio_queue: 253,1 WS 0 + 256 [dd] dd-6674 [000] 539.727392: block_remap: 253,0 WS 384 + 256 <- (253,1) 0 dd-6674 [000] 539.727394: block_bio_queue: 253,0 WS 384 + 256 [dd] dd-6674 [000] 539.727405: block_getrq: 253,0 WS 384 + 256 [dd] dd-6674 [000] 539.727409: block_plug: [dd] dd-6674 [000] 539.727410: block_rq_insert: 253,0 W 0 () 384 + 256 [dd] dd-6674 [000] 539.727416: block_rq_issue: 253,0 W 0 () 384 + 256 [dd] dd-6674 [000] 539.727426: block_rq_insert: 65,176 W 0 () 384 + 256 [dd] dd-6674 [000] 539.727427: block_rq_issue: 65,176 W 0 () 384 + 256 [dd] ... and with the patch: (the line with '**' is the trace added by this patch) dd-6617 [002] 162.914301: block_bio_queue: 253,1 WS 0 + 256 [dd] dd-6617 [002] 162.914314: block_remap: 253,0 WS 384 + 256 <- (253,1) 0 dd-6617 [002] 162.914316: block_bio_queue: 253,0 WS 384 + 256 [dd] dd-6617 [002] 162.914331: block_getrq: 253,0 WS 384 + 256 [dd] dd-6617 [002] 162.914335: block_plug: [dd] dd-6617 [002] 162.914337: block_rq_insert: 253,0 W 0 () 384 + 256 [dd] dd-6617 [002] 162.914347: block_rq_issue: 253,0 W 0 () 384 + 256 [dd] **dd-6617 [002] 162.914356: block_rq_remap: 65,176 W 384 + 256 <- (253,0) 384 dd-6617 [002] 162.914358: block_rq_insert: 65,176 W 0 () 384 + 256 [dd] dd-6617 [002] 162.914359: block_rq_issue: 65,176 W 0 () 384 + 256 [dd] ... Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Alasdair G Kergon 提交于
When swapping a new table into place, retain the old table until its replacement is in place. An old check for an empty table is removed because this is enforced in populate_table(). __unbind() becomes redundant when followed by __bind(). Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Alasdair G Kergon 提交于
When replacing a mapped device's table during a 'resume', delay the destruction of the old table until the new one is successfully in place. This will make it easier for a later patch to transfer internal state information from the old table to the new one (something we do not currently support) while giving us more options for reversion if a later part of the operation fails. Devices are always in the suspended state during dm_swap_table(). This patch reinforces the requirement that all I/O must have been flushed from the table targets while in this state (including any in workqueues). In the case of 'noflush' suspending, unprocessed I/O should have been 'pushed back' to the dm core prior to this point, for resubmission after the new table is in place. Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mike Anderson 提交于
Add dm_deleting_md to check whether or not a given mapped device is currently being deleted. Signed-off-by: NMike Anderson <andmike@linux.vnet.ibm.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Alasdair G Kergon 提交于
Rename dm_get_table to dm_get_live_table. Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Kiyoshi Ueda 提交于
This patch adds barrier support for request-based dm. CORE DESIGN The design is basically same as bio-based dm, which emulates barrier by mapping empty barrier bios before/after a barrier I/O. But request-based dm has been using struct request_queue for I/O queueing, so the block-layer's barrier mechanism can be used. o Summary of the block-layer's behavior (which is depended by dm-core) Request-based dm uses QUEUE_ORDERED_DRAIN_FLUSH ordered mode for I/O barrier. It means that when an I/O requiring barrier is found in the request_queue, the block-layer makes pre-flush request and post-flush request just before and just after the I/O respectively. After the ordered sequence starts, the block-layer waits for all in-flight I/Os to complete, then gives drivers the pre-flush request, the barrier I/O and the post-flush request one by one. It means that the request_queue is stopped automatically by the block-layer until drivers complete each sequence. o dm-core For the barrier I/O, treats it as a normal I/O, so no additional code is needed. For the pre/post-flush request, flushes caches by the followings: 1. Make the number of empty barrier requests required by target's num_flush_requests, and map them (dm_rq_barrier()). 2. Waits for the mapped barriers to complete (dm_rq_barrier()). If error has occurred, save the error value to md->barrier_error (dm_end_request()). (*) Basically, the first reported error is taken. But -EOPNOTSUPP supersedes any error and DM_ENDIO_REQUEUE follows. 3. Requeue the pre/post-flush request if the error value is DM_ENDIO_REQUEUE. Otherwise, completes with the error value (dm_rq_barrier_work()). The pre/post-flush work above is done in the kernel thread (kdmflush) context, since memory allocation which might sleep is needed in dm_rq_barrier() but sleep is not allowed in dm_request_fn(), which is an irq-disabled context. Also, clones of the pre/post-flush request share an original, so such clones can't be completed using the softirq context. Instead, complete them in the context of underlying device drivers. It should be safe since there is no I/O dispatching during the completion of such clones. For suspend, the workqueue of kdmflush needs to be flushed after the request_queue has been stopped. Otherwise, the next flush work can be kicked even after the suspend completes. TARGET INTERFACE No new interface is added. Just use the existing num_flush_requests in struct target_type as same as bio-based dm. Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Kiyoshi Ueda 提交于
This patch moves dm_end_request() to make the next patch more readable. No functional change. Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Kiyoshi Ueda 提交于
This patch factors out the clone completion code, dm_done(), from dm_softirq_done() in preparation for a subsequent patch. No functional change. dm_done() will be used in barrier completion, which can't use and doesn't need softirq. The softirq_done callback needs to get a clone from an original request but it can't in the case of barrier, where an original request is shared by multiple clones. On the other hand, the completion of barrier clones doesn't involve re-submitting requests, which was the primary reason of the need for softirq. Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Kiyoshi Ueda 提交于
This patch changes the counter for the number of in_flight I/Os to md->pending from q->in_flight in preparation for a later patch. No functional change. Request-based dm used q->in_flight to count the number of in-flight clones assuming the counter is always incremented for an in-flight original request and original:clone is 1:1 relationship. However, it this no longer true for barrier requests. So use md->pending to count the number of in-flight clones. Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Kiyoshi Ueda 提交于
The semantics of bio-based dm were changed recently in the case of suspend with "--nolockfs" but without "--noflush". Before 2.6.30, I/Os submitted before the suspend invocation were always flushed. From 2.6.30 onwards, I/Os submitted before the suspend invocation might not be flushed. (For details, see http://marc.info/?t=123994433400003&r=1&w=2) This patch brings the behaviour of request-based dm into line with bio-based dm, simplifying the code and preparing for a subsequent patch that will wait for all in_flight I/Os to complete without stopping request_queue and use dm_wait_for_completion() for it. This change in semantics simplifies the suspend code as follows: o Suspend is implemented as stopping request_queue in request-based dm, and all I/Os are queued in the request_queue even after suspend is invoked. o In the old semantics, we had to track whether I/Os were queued before or after the suspend invocation, so a special barrier-like request called 'suspend marker' was introduced. o With the new semantics, we don't need to flush any I/O so we can remove the marker and the code related to the marker handling and I/O flushing. After removing this codes, the suspend sequence is now: 1. Flush all I/Os by lock_fs() if needed. 2. Stop dispatching any I/O by stopping the request_queue. 3. Wait for all in-flight I/Os to be completed or requeued. Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Kiyoshi Ueda 提交于
This patch factors out the request cloning code in dm_prep_fn() as clone_rq(). No functional change. This patch is a preparation for a later patch in this series which needs to make clones from an original barrier request. Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Kiyoshi Ueda 提交于
This patch adds the gfp_mask argument to alloc_rq_tio(). No functional change. This patch is a preparation for a later patch in this series which needs to allocate tio (for barrier I/O) with different allocation flag (GFP_NOIO) from the one in the normal I/O code path. Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Kiyoshi Ueda 提交于
This patch changes the argument of map_request() to clone request from original request. No functional change. This patch is a preparation for PATCH 9, which needs to use map_request() for clones sharing an original barrier request. Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Kiyoshi Ueda 提交于
This patch adds md_in_flight() to get the number of in_flight I/Os. No functional change. This patch is a preparation for a later patch in this series, which changes I/O counter to md->pending from q->in_flight in request-based dm. Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Allocate "struct io" from a slab. This patch changes dm-io, so that "struct io" is allocated from a slab cache. It used to be allocated with kmalloc. Allocating from a slab will be needed for the next patch, because it requires a special alignment of "struct io" and kmalloc cannot meet this alignment. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
- 17 10月, 2009 2 次提交
-
-
由 Kiyoshi Ueda 提交于
Multiple instances of dec_pending() can run concurrently so a lock is needed when it saves the first error code. I have never experienced actual problem without locking and just found this during code inspection while implementing the barrier support patch for request-based dm. This patch adds the locking. I've done compile, boot and basic I/O testings. Cc: stable@kernel.org Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Zdenek Kabelac 提交于
Add missing del_gendisk() to error path when creation of workqueue fails. Otherwice there is a resource leak and following warning is shown: WARNING: at fs/sysfs/dir.c:487 sysfs_add_one+0xc5/0x160() sysfs: cannot create duplicate filename '/devices/virtual/block/dm-0' Cc: stable@kernel.org Signed-off-by: NZdenek Kabelac <zkabelac@redhat.com> Reviewed-by: NJonathan Brassow <jbrassow@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
- 07 10月, 2009 1 次提交
-
-
由 Nikanth Karthikesan 提交于
Commit a9327cac added seperate read and write statistics of in_flight requests. And exported the number of read and write requests in progress seperately through sysfs. But Corrado Zoccolo <czoccolo@gmail.com> reported getting strange output from "iostat -kx 2". Global values for service time and utilization were garbage. For interval values, utilization was always 100%, and service time is higher than normal. So this was reverted by commit 0f78ab98 The problem was in part_round_stats_single(), I missed the following: if (now == part->stamp) return; - if (part->in_flight) { + if (part_in_flight(part)) { __part_stat_add(cpu, part, time_in_queue, part_in_flight(part) * (now - part->stamp)); __part_stat_add(cpu, part, io_ticks, (now - part->stamp)); With this chunk included, the reported regression gets fixed. Signed-off-by: NNikanth Karthikesan <knikanth@suse.de> -- Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 05 10月, 2009 1 次提交
-
-
由 Jens Axboe 提交于
This reverts commit a9327cac. Corrado Zoccolo <czoccolo@gmail.com> reports: "with 2.6.32-rc1 I started getting the following strange output from "iostat -kx 2": Linux 2.6.31bisect (et2) 04/10/2009 _i686_ (2 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 10,70 0,00 3,16 15,75 0,00 70,38 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 18,22 0,00 0,67 0,01 14,77 0,02 43,94 0,01 10,53 39043915,03 2629219,87 sdb 60,89 9,68 50,79 3,04 1724,43 50,52 65,95 0,70 13,06 488437,47 2629219,87 avg-cpu: %user %nice %system %iowait %steal %idle 2,72 0,00 0,74 0,00 0,00 96,53 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00 sdb 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00 avg-cpu: %user %nice %system %iowait %steal %idle 6,68 0,00 0,99 0,00 0,00 92,33 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00 sdb 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00 avg-cpu: %user %nice %system %iowait %steal %idle 4,40 0,00 0,73 1,47 0,00 93,40 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00 sdb 0,00 4,00 0,00 3,00 0,00 28,00 18,67 0,06 19,50 333,33 100,00 Global values for service time and utilization are garbage. For interval values, utilization is always 100%, and service time is higher than normal. I bisected it down to: [a9327cac] Seperate read and write statistics of in_flight requests and verified that reverting just that commit indeed solves the issue on 2.6.32-rc1." So until this is debugged, revert the bad commit. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 22 9月, 2009 1 次提交
-
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 9月, 2009 1 次提交
-
-
由 Nikanth Karthikesan 提交于
Currently, there is a single in_flight counter measuring the number of requests in the request_queue. But some monitoring tools would like to know how many read requests and write requests are in progress. Split the current in_flight counter into two seperate counters for read and write. This information is exported as a sysfs attribute, as changing the currently available stat files would break the existing tools. Signed-off-by: NNikanth Karthikesan <knikanth@suse.de> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 11 9月, 2009 1 次提交
-
-
由 Jens Axboe 提交于
Get rid of any functions that test for these bits and make callers use bio_rw_flagged() directly. Then it is at least directly apparent what variable and flag they check. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 05 9月, 2009 1 次提交
-
-
由 Kiyoshi Ueda 提交于
The patch posted at http://marc.info/?l=dm-devel&m=124539787228784&w=2 which was merged into cec47e3d ("dm: prepare for request based option") introduced a regression in request-based dm. If map_request() calls dm_kill_unmapped_request() to complete a cloned bio without dispatching it, clone->bio is still set when dm_end_request() is called and the BUG_ON(clone->bio) is incorrect. The patch fixes this bug by freeing bio in dm_end_request() if the clone has bio. I've redone my tests to cover all I/O paths and confirmed there's no other regression. Here is the oops I hit in request-based dm when I do I/O to a multipath device which doesn't have any active path nor queue_if_no_path setting: ------------[ cut here ]------------ kernel BUG at /root/2.6.31-rc4.rqdm/drivers/md/dm.c:828! invalid opcode: 0000 [#1] SMP last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map CPU 1 Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq dm_mirror dm_region_hash dm_log dm_service_time dm_multipath scsi_dh dm_mod video output sbs sbshc battery ac sg sr_mod e1000e button cdrom serio_raw rtc_cmos rtc_core rtc_lib piix lpfc scsi_transport_fc ata_piix libata megaraid_sas sd_mod scsi_mod crc_t10dif ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode] Pid: 7, comm: ksoftirqd/1 Not tainted 2.6.31-rc4.rqdm #1 Express5800/120Lj [N8100-1417] RIP: 0010:[<ffffffffa023629d>] [<ffffffffa023629d>] dm_softirq_done+0xbd/0x100 [dm_mod] RSP: 0018:ffff8800280a1f08 EFLAGS: 00010282 RAX: ffffffffa02544e0 RBX: ffff8802aa1111d0 RCX: ffff8802aa1111e0 RDX: ffff8802ab913e70 RSI: 0000000000000000 RDI: ffff8802ab913e70 RBP: ffff8800280a1f28 R08: ffffc90005457040 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: 00000000fffffffb R13: ffff8802ab913e88 R14: ffff8802ab9c1438 R15: 0000000000000100 FS: 0000000000000000(0000) GS:ffff88002809e000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000003d54a98640 CR3: 000000029f0a1000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process ksoftirqd/1 (pid: 7, threadinfo ffff8802ae50e000, task ffff8802ae4f8040) Stack: ffff8800280a1f38 0000000000000020 ffffffff814f30a0 0000000000000004 <0> ffff8800280a1f58 ffffffff8116b245 ffff8800280a1f38 ffff8800280a1f38 <0> ffff8800280a1f58 0000000000000001 ffff8800280a1fa8 ffffffff810477bc Call Trace: <IRQ> [<ffffffff8116b245>] blk_done_softirq+0x75/0x90 [<ffffffff810477bc>] __do_softirq+0xcc/0x210 [<ffffffff81047170>] ? ksoftirqd+0x0/0x110 [<ffffffff8100ce7c>] call_softirq+0x1c/0x50 <EOI> [<ffffffff8100e785>] do_softirq+0x65/0xa0 [<ffffffff81047170>] ? ksoftirqd+0x0/0x110 [<ffffffff810471e0>] ksoftirqd+0x70/0x110 [<ffffffff81059559>] kthread+0x99/0xb0 [<ffffffff8100cd7a>] child_rip+0xa/0x20 [<ffffffff8100c73c>] ? restore_args+0x0/0x30 [<ffffffff810594c0>] ? kthread+0x0/0xb0 [<ffffffff8100cd70>] ? child_rip+0x0/0x20 Code: 44 89 e6 48 89 df e8 23 fb f2 e0 be 01 00 00 00 4c 89 f7 e8 f6 fd ff ff 5b 41 5c 41 5d 41 5e c9 c3 4c 89 ef e8 85 fe ff ff eb ed <0f> 0b eb fe 41 8b 85 dc 00 00 00 48 83 bb 10 01 00 00 00 89 83 RIP [<ffffffffa023629d>] dm_softirq_done+0xbd/0x100 [dm_mod] RSP <ffff8800280a1f08> ---[ end trace 16af0a1d8542da55 ]--- Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
- 24 7月, 2009 1 次提交
-
-
由 Mike Snitzer 提交于
This patch removes DM's bio-based vs request-based conditional setting of next_ordered. For bio-based DM the next_ordered check is no longer a concern (as that check is now in the __make_request path). For request-based DM the default of QUEUE_ORDERED_NONE is now appropriate. bio-based DM was changed to work-around the previously misplaced next_ordered check with this commit: 99360b4c request-based DM does not yet support barriers but reacted to the above bio-based DM change with this commit: 5d67aa23 The above changes are no longer needed given Neil Brown's recent fix to put the next_ordered check in the __make_request path: db64f680Signed-off-by: NMike Snitzer <snitzer@redhat.com> Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: NeilBrown <neilb@suse.de> Acked-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Acked-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
- 01 7月, 2009 1 次提交
-
-
由 Martin K. Petersen 提交于
This patch restores stacking ability to the block layer integrity infrastructure by creating a set of dedicated bip slabs. Each bip slab has an embedded bio_vec array at the end. This cuts down on memory allocations and also simplifies the code compared to the original bvec version. Only the largest bip slab is backed by a mempool. The pool is contained in the bio_set so stacking drivers can ensure forward progress. Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NJens Axboe <axboe@carl.(none)>
-
- 22 6月, 2009 13 次提交
-
-
由 Kiyoshi Ueda 提交于
This patch disables interrupt when taking map_lock to avoid lockdep warnings in request-based dm. request-based dm takes map_lock after taking queue_lock with disabling interrupt: spin_lock_irqsave(queue_lock) q->request_fn() == dm_request_fn() => dm_get_table() => read_lock(map_lock) while queue_lock could be (but isn't) taken in interrupt context. Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: NChristof Schmitt <christof.schmitt@de.ibm.com> Acked-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Kiyoshi Ueda 提交于
Request-based dm doesn't have barrier support yet. So we need to set QUEUE_ORDERED_DRAIN only for bio-based dm. Since the device type is decided at the first table loading time, the flag set is deferred until then. Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Kiyoshi Ueda 提交于
This patch enables request-based dm. o Request-based dm and bio-based dm coexist, since there are some target drivers which are more fitting to bio-based dm. Also, there are other bio-based devices in the kernel (e.g. md, loop). Since bio-based device can't receive struct request, there are some limitations on device stacking between bio-based and request-based. type of underlying device bio-based request-based ---------------------------------------------- bio-based OK OK request-based -- OK The device type is recognized by the queue flag in the kernel, so dm follows that. o The type of a dm device is decided at the first table binding time. Once the type of a dm device is decided, the type can't be changed. o Mempool allocations are deferred to at the table loading time, since mempools for request-based dm are different from those for bio-based dm and needed mempool type is fixed by the type of table. o Currently, request-based dm supports only tables that have a single target. To support multiple targets, we need to support request splitting or prevent bio/request from spanning multiple targets. The former needs lots of changes in the block layer, and the latter needs that all target drivers support merge() function. Both will take a time. Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Kiyoshi Ueda 提交于
This patch adds core functions for request-based dm. When struct mapped device (md) is initialized, md->queue has an I/O scheduler and the following functions are used for request-based dm as the queue functions: make_request_fn: dm_make_request() pref_fn: dm_prep_fn() request_fn: dm_request_fn() softirq_done_fn: dm_softirq_done() lld_busy_fn: dm_lld_busy() Actual initializations are done in another patch (PATCH 2). Below is a brief summary of how request-based dm behaves, including: - making request from bio - cloning, mapping and dispatching request - completing request and bio - suspending md - resuming md bio to request ============== md->queue->make_request_fn() (dm_make_request()) calls __make_request() for a bio submitted to the md. Then, the bio is kept in the queue as a new request or merged into another request in the queue if possible. Cloning and Mapping =================== Cloning and mapping are done in md->queue->request_fn() (dm_request_fn()), when requests are dispatched after they are sorted by the I/O scheduler. dm_request_fn() checks busy state of underlying devices using target's busy() function and stops dispatching requests to keep them on the dm device's queue if busy. It helps better I/O merging, since no merge is done for a request once it is dispatched to underlying devices. Actual cloning and mapping are done in dm_prep_fn() and map_request() called from dm_request_fn(). dm_prep_fn() clones not only request but also bios of the request so that dm can hold bio completion in error cases and prevent the bio submitter from noticing the error. (See the "Completion" section below for details.) After the cloning, the clone is mapped by target's map_rq() function and inserted to underlying device's queue using blk_insert_cloned_request(). Completion ========== Request completion can be hooked by rq->end_io(), but then, all bios in the request will have been completed even error cases, and the bio submitter will have noticed the error. To prevent the bio completion in error cases, request-based dm clones both bio and request and hooks both bio->bi_end_io() and rq->end_io(): bio->bi_end_io(): end_clone_bio() rq->end_io(): end_clone_request() Summary of the request completion flow is below: blk_end_request() for a clone request => blk_update_request() => bio->bi_end_io() == end_clone_bio() for each clone bio => Free the clone bio => Success: Complete the original bio (blk_update_request()) Error: Don't complete the original bio => blk_finish_request() => rq->end_io() == end_clone_request() => blk_complete_request() => dm_softirq_done() => Free the clone request => Success: Complete the original request (blk_end_request()) Error: Requeue the original request end_clone_bio() completes the original request on the size of the original bio in successful cases. Even if all bios in the original request are completed by that completion, the original request must not be completed yet to keep the ordering of request completion for the stacking. So end_clone_bio() uses blk_update_request() instead of blk_end_request(). In error cases, end_clone_bio() doesn't complete the original bio. It just frees the cloned bio and gives over the error handling to end_clone_request(). end_clone_request(), which is called with queue lock held, completes the clone request and the original request in a softirq context (dm_softirq_done()), which has no queue lock, to avoid a deadlock issue on submission of another request during the completion: - The submitted request may be mapped to the same device - Request submission requires queue lock, but the queue lock has been held by itself and it doesn't know that The clone request has no clone bio when dm_softirq_done() is called. So target drivers can't resubmit it again even error cases. Instead, they can ask dm core for requeueing and remapping the original request in that cases. suspend ======= Request-based dm uses stopping md->queue as suspend of the md. For noflush suspend, just stops md->queue. For flush suspend, inserts a marker request to the tail of md->queue. And dispatches all requests in md->queue until the marker comes to the front of md->queue. Then, stops dispatching request and waits for the all dispatched requests to complete. After that, completes the marker request, stops md->queue and wake up the waiter on the suspend queue, md->wait. resume ====== Starts md->queue. Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mike Snitzer 提交于
Currently, device-mapper maintains a separate instance of 'struct queue_limits' for each table of each device. When the configuration of a device is to be changed, first its table is loaded and this structure is populated, then the device is 'resumed' and the calculated queue_limits are applied. This places restrictions on how userspace may process related devices, where it is often advantageous to 'load' tables for several devices at once before 'resuming' them together. As the new queue_limits only take effect after the 'resume', if they are changing and one device uses another, the latter must be 'resumed' before the former may be 'loaded'. This patch moves the calculation of these queue_limits out of the 'load' operation into 'resume'. Since we are no longer pre-calculating this struct, we no longer need to maintain copies within our dm structs. dm_set_device_limits() now passes the 'start' of the device's data area (aka pe_start) as the 'offset' to blk_stack_limits(). init_valid_queue_limits() is replaced by blk_set_default_limits(). Signed-off-by: NMike Snitzer <snitzer@redhat.com> Cc: martin.petersen@oracle.com Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Milan Broz 提交于
Add support for passing a 32 bit "cookie" into the kernel with the DM_SUSPEND, DM_DEV_RENAME and DM_DEV_REMOVE ioctls. The (unsigned) value of this cookie is returned to userspace alongside the uevents issued by these ioctls in the variable DM_COOKIE. This means the userspace process issuing these ioctls can be notified by udev after udev has completed any actions triggered. To minimise the interface extension, we pass the cookie into the kernel in the event_nr field which is otherwise unused when calling these ioctls. Incrementing the version number allows userspace to determine in advance whether or not the kernel supports the cookie. If the kernel does support this but userspace does not, there should be no impact as the new variable will just get ignored. Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Pass empty barrier flushes to the targets in dm_flush(). Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Alasdair G Kergon 提交于
Move repeated dm_target_io initialisation inside alloc_tio(). Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Introduce num_flush_requests for a target to set to say how many flush instructions (empty barriers) it wants to receive. These are sent by __clone_and_map_empty_barrier with map_info->flush_request going from 0 to (num_flush_requests - 1). Old targets without flush support won't receive any flush requests. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Remove the check that the size of the cloned bio is not zero because a subsequent patch needs to send zero-sized barriers down this path. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
If the underlying device doesn't support barriers and dm receives a barrier, it waits until all requests on that device drain so it no longer needs to report -EOPNOTSUPP to the caller. This patch deals with the confusing situation when moving a volume from one physical device to another triggers an EOPNOTSUPP on a volume that didn't report it before. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
With the following patches, more than one error can occur during processing. Change md->barrier_error so that only the first one is recorded and returned to the caller. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
If barrier request was returned with DM_ENDIO_REQUEUE, requeue it in dm_wq_work instead of dec_pending. This allows us to correctly handle a situation when some targets are asking for a requeue and other targets signal an error. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-