- 16 10月, 2009 4 次提交
-
-
由 NeilBrown 提交于
Both raid1 and raid10 create a mempool during startup. If the 'alloc' function for this mempool fails, unplug_slaves is called. If that happens when the pool is being initialised, unplug_slaves will try to use the 'conf' structure that isn't filled in yet, and badness will happen. So ensure that unplug_slaves doesn't get called unless we know that the conf structure if fully initialised. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Dan Williams 提交于
Deallocating a raid5_conf_t structure requires taking 'device_lock'. Ensure it is initialized before it is used, i.e. initialize the lock before attempting any further initializations that might fail. Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
During 'check' of a raid1 or raid10 it is possible for the management thread to spend a lot of time running 'memcmp' on blocks from different devices, so make sure the thread has a chance to schedule. raid5d already has a cond_resched (in process_stripe). Reported-By: NLee Howard <faxguy@howardsilvan.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
This reverts commit df10cfbc. This patch was based on a misunderstanding and risks introducing a busy-wait loop. So revert it. Acked-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 23 9月, 2009 6 次提交
-
-
由 Dmitry Monakhov 提交于
Recently Jens has changed bio_rw_flagged() logic by following commit 1f98a13f. Now it returns bool instead of int. This broke raid1/raid10 RW bits manipulation logic. One of visible result is BUG_ON triggering due to empty barrier here scsi_lib.c:1108 scsi_setup_fs_cmnd() Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Recent commit bbba809e replaced mempool_create_kzalloc_pool with mempool_create_kmalloc_pool plus a memset. This memset is not needed (and we didn't need kzalloc in the first place). Ever field of the allocated structure (struct multipath_bh) is initialised immediately except retry_list, and memset does not initial a list_head anyway. To remove the memset. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
This should writeback from coming when the device is temporarily suspended. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
The management thread for raid4,5,6 arrays are all called mdX_raid5, independent of the actual raid level, which is wrong and can be confusion. So change md_register_thread to use the name from the personality unless no alternate name (like 'resync' or 'reshape') is given. This is simpler and more correct. Cc: Jinzc <zhenchengjin@gmail.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
There was a real error here on a failure path where we incorrectly call rcu_read_unlock. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Rename some variable and remove some duplicate definitions to avoid there warnings. None of them are actual errors. Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 22 9月, 2009 2 次提交
-
-
由 Sage Weil 提交于
The kzalloc mempool does not re-zero items that have been used and then returned to the pool. Manually zero the allocated multipath_bh instead. Acked-by: NNeil Brown <neilb@suse.de> Signed-off-by: NSage Weil <sage@newdream.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 21 9月, 2009 1 次提交
-
-
由 Anand Gadiyar 提交于
trivial: fix typo "for for" in multiple files Signed-off-by: NAnand Gadiyar <gadiyar@ti.com> Signed-off-by: NJiri Kosina <jkosina@suse.cz>
-
- 20 9月, 2009 1 次提交
-
-
由 Kay Sievers 提交于
This allows subsytems to provide devtmpfs with non-default permissions for the device node. Instead of the default mode of 0600, null, zero, random, urandom, full, tty, ptmx now have a mode of 0666, which allows non-privileged processes to access standard device nodes in case no other userspace process applies the expected permissions. This also fixes a wrong assignment in pktcdvd and a checkpatch.pl complain. Signed-off-by: NKay Sievers <kay.sievers@vrfy.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
- 17 9月, 2009 2 次提交
-
-
由 Dan Williams 提交于
Neil says: "It is correct as it stands, but the fact that every branch in the 'if' part ends with a 'return' isn't immediately obvious, so it is clearer if we are explicit about the if / then / else structure." Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Dan Williams 提交于
As pointed out by Neil it should be possible to build a driver with all BUG_ON statements deleted. It's bad form to have a BUG_ON with a side effect. Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 14 9月, 2009 2 次提交
-
-
由 Martin K. Petersen 提交于
Implement blk_limits_io_opt() and make blk_queue_io_opt() a wrapper around it. DM needs this to avoid poking at the queue_limits directly. Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Nikanth Karthikesan 提交于
Currently, there is a single in_flight counter measuring the number of requests in the request_queue. But some monitoring tools would like to know how many read requests and write requests are in progress. Split the current in_flight counter into two seperate counters for read and write. This information is exported as a sysfs attribute, as changing the currently available stat files would break the existing tools. Signed-off-by: NNikanth Karthikesan <knikanth@suse.de> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 11 9月, 2009 2 次提交
-
-
由 Jens Axboe 提交于
Get rid of any functions that test for these bits and make callers use bio_rw_flagged() directly. Then it is at least directly apparent what variable and flag they check. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Geert Uytterhoeven 提交于
Commit b8313b6d ("dm log: remove incorrect field from userspace table output") added a call to strstr() with a single-character "needle" string parameter. Unfortunately some versions of gcc replace such calls to strstr() by calls to strchr() behind our back. This causes linking errors if strchr() is defined as an inline function in <asm/string.h> (e.g. on m68k): | WARNING: "strchr" [drivers/md/dm-log-userspace.ko] undefined! Avoid this by explicitly calling strchr() instead. Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org> Cc: stable@kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 9月, 2009 1 次提交
-
-
由 Dan Williams 提交于
Some engines optimize operation by reading ahead in the descriptor chain such that descriptor2 may start execution before descriptor1 completes. If descriptor2 depends on the result from descriptor1 then a fence is required (on descriptor2) to disable this optimization. The async_tx api could implicitly identify dependencies via the 'depend_tx' parameter, but that would constrain cases where the dependency chain only specifies a completion order rather than a data dependency. So, provide an ASYNC_TX_FENCE to explicitly identify data dependencies. Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 05 9月, 2009 13 次提交
-
-
由 Mikulas Patocka 提交于
Fix some problems seen in the chunk size processing when activating a pre-existing snapshot. For a new snapshot, the chunk size can either be supplied by the creator or a default value can be used. For an existing snapshot, the chunk size in the snapshot header on disk should always be used. If someone attempts to load an existing snapshot and has the 'default chunk size' option set, the kernel uses its default value even when it is incorrect for the snapshot being loaded. This patch ensures the correct on-disk value is always used. Secondly, when the code does use the chunk size stored on the disk it is prudent to revalidate it, so the code can exit cleanly if it got corrupted as happened in https://bugzilla.redhat.com/show_bug.cgi?id=461506 . Cc: stable@kernel.org Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Break the function set_chunk_size to two functions in preparation for the fix in the following patch. Cc: stable@kernel.org Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
If a persistent snapshot fills up, a race can corrupt the on-disk header which causes a crash on any future attempt to activate the snapshot (typically while booting). This patch fixes the race. When the snapshot overflows, __invalidate_snapshot is called, which calls snapshot store method drop_snapshot. It goes to persistent_drop_snapshot that calls write_header. write_header constructs the new header in the "area" location. Concurrently, an existing kcopyd job may finish, call copy_callback and commit_exception method, that goes to persistent_commit_exception. persistent_commit_exception doesn't do locking, relying on the fact that callbacks are single-threaded, but it can race with snapshot invalidation and overwrite the header that is just being written while the snapshot is being invalidated. The result of this race is a corrupted header being written that can lead to a crash on further reactivation (if chunk_size is zero in the corrupted header). The fix is to use separate memory areas for each. See the bug: https://bugzilla.redhat.com/show_bug.cgi?id=461506 Cc: stable@kernel.org Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Refactor chunk_io to prepare for the fix in the following patch. Pass an area pointer to chunk_io and simplify zero_disk_area to use chunk_io. No functional change. Cc: stable@kernel.org Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Jonathan Brassow 提交于
Device-mapper userspace logs (like the clustered log) are identified by a universally unique identifier (UUID). This identifier is used to associate requests from the kernel to a specific log in userspace. The UUID must be unique everywhere, since multiple machines may use this identifier when communicating about a particular log, as is the case for cluster logs. Sometimes, device-mapper/LVM may re-use a UUID. This is the case during pvmoves, when moving from one segment of an LV to another, or when resizing a mirror, etc. In these cases, a new log is created with the same UUID and loaded in the "inactive" slot. When a device-mapper "resume" is issued, the "live" table is deactivated and the new "inactive" table becomes "live". (The "inactive" table can also be removed via a device-mapper 'clear' command.) The above two issues were colliding. More than one log was being created with the same UUID, and there was no way to distinguish between them. So, sometimes the wrong log would be swapped out during the exchange. The solution is to create a locally unique identifier, 'luid', to go along with the UUID. This new identifier is used to determine exactly which log is being referenced by the kernel when the log exchange is made. The identifier is not universally safe, but it does not need to be, since create/destroy/suspend/resume operations are bound to a specific machine; and these are the operations that make up the exchange. Signed-off-by: NJonathan Brassow <jbrassow@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Jonathan Brassow 提交于
This patch fixes a bug which was triggering a case where the primary leg could not be changed on failure even when the mirror was in-sync. The case involves the failure of the primary device along with the transient failure of the log device. The problem is that bios can be put on the 'failures' list (due to log failure) before 'fail_mirror' is called due to the primary device failure. Normally, this is fine, but if the log device failure is transient, a subsequent iteration of the work thread, 'do_mirror', will reset 'log_failure'. The 'do_failures' function then resets the 'in_sync' variable when processing bios on the failures list. The 'in_sync' variable is what is used to determine if the primary device can be switched in the event of a failure. Since this has been reset, the primary device is incorrectly assumed to be not switchable. The case has been seen in the cluster mirror context, where one machine realizes the log device is dead before the other machines. As the responsibilities of the server migrate from one node to another (because the mirror is being reconfigured due to the failure), the new server may think for a moment that the log device is fine - thus resetting the 'log_failure' variable. In any case, it is inappropiate for us to reset the 'log_failure' variable. The above bug simply illustrates that it can actually hurt us. Cc: stable@kernel.org Signed-off-by: NJonathan Brassow <jbrassow@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Jonathan Brassow 提交于
The output of 'dmsetup table' includes an internal field that should not be there. This patch removes it. To make the fix simpler, we first reorder a constructor argument The 'device size' argument is generated internally. Currently it is placed as the last space-separated word of the constructor string. However, we need to use a version of the string without this word, so we move it to the beginning instead so it is trivial to skip past it. We keep a copy of the arguments passed to userspace for creating a log, just in case we need to resend them. These are the same arguments that are desired in the STATUSTYPE_TABLE request, except for one. When creating the userspace log, the userspace daemon must know the size of the mirror, so that is added to the arguments given in the constructor table. We were printing this extra argument out as well, which is a mistake. Signed-off-by: NJonathan Brassow <jbrassow@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Jonathan Brassow 提交于
Fix 'dmsetup table' output. There is a missing ' ' at the end of the string causing two words to run together. Signed-off-by: NJonathan Brassow <jbrassow@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mike Snitzer 提交于
Set sensible I/O hints for striped DM devices in the topology infrastructure added for 2.6.31 for userspace tools to obtain via sysfs. Add .io_hints to 'struct target_type' to allow the I/O hints portion (io_min and io_opt) of the 'struct queue_limits' to be set by each target and implement this for dm-stripe. Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mike Snitzer 提交于
A couple of recent warning messages make it difficult for the reader to determine exactly what is wrong. This patch adds more information to those messages. The messages were added by these commits: 5dea271b ("dm table: pass correct dev area size to device_area_is_valid") ea9df47c ("dm table: fix blk_stack_limits arg to use bytes not sectors") The patch also corrects references to logical_block_size in printk format strings from %hu to %u. Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
The logic to check for valid device areas is inverted relative to proper use with iterate_devices. The iterate_devices method calls its callback for every underlying device in the target. If any callback returns non-zero, iterate_devices exits immediately. But the callback device_area_is_valid() returns 0 on error and 1 on success. The overall effect without is that an error is issued only if every device is invalid. This patch renames device_area_is_valid to device_area_is_invalid and inverts the logic so that one invalid device is sufficient to raise an error. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mike Snitzer 提交于
Implement the .iterate_devices for the origin and snapshot targets. dm-snapshot's lack of .iterate_devices resulted in the inability to properly establish queue_limits for both targets. With 4K sector drives: an unfortunate side-effect of not establishing proper limits in either targets' DM device was that IO to the devices would fail even though both had been created without error. Commit af4874e0 ("dm target:s introduce iterate devices fn") in 2.6.31-rc1 should have implemented .iterate_devices for dm-snap.c's origin and snapshot targets. Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Kiyoshi Ueda 提交于
The patch posted at http://marc.info/?l=dm-devel&m=124539787228784&w=2 which was merged into cec47e3d ("dm: prepare for request based option") introduced a regression in request-based dm. If map_request() calls dm_kill_unmapped_request() to complete a cloned bio without dispatching it, clone->bio is still set when dm_end_request() is called and the BUG_ON(clone->bio) is incorrect. The patch fixes this bug by freeing bio in dm_end_request() if the clone has bio. I've redone my tests to cover all I/O paths and confirmed there's no other regression. Here is the oops I hit in request-based dm when I do I/O to a multipath device which doesn't have any active path nor queue_if_no_path setting: ------------[ cut here ]------------ kernel BUG at /root/2.6.31-rc4.rqdm/drivers/md/dm.c:828! invalid opcode: 0000 [#1] SMP last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map CPU 1 Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq dm_mirror dm_region_hash dm_log dm_service_time dm_multipath scsi_dh dm_mod video output sbs sbshc battery ac sg sr_mod e1000e button cdrom serio_raw rtc_cmos rtc_core rtc_lib piix lpfc scsi_transport_fc ata_piix libata megaraid_sas sd_mod scsi_mod crc_t10dif ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode] Pid: 7, comm: ksoftirqd/1 Not tainted 2.6.31-rc4.rqdm #1 Express5800/120Lj [N8100-1417] RIP: 0010:[<ffffffffa023629d>] [<ffffffffa023629d>] dm_softirq_done+0xbd/0x100 [dm_mod] RSP: 0018:ffff8800280a1f08 EFLAGS: 00010282 RAX: ffffffffa02544e0 RBX: ffff8802aa1111d0 RCX: ffff8802aa1111e0 RDX: ffff8802ab913e70 RSI: 0000000000000000 RDI: ffff8802ab913e70 RBP: ffff8800280a1f28 R08: ffffc90005457040 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: 00000000fffffffb R13: ffff8802ab913e88 R14: ffff8802ab9c1438 R15: 0000000000000100 FS: 0000000000000000(0000) GS:ffff88002809e000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000003d54a98640 CR3: 000000029f0a1000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process ksoftirqd/1 (pid: 7, threadinfo ffff8802ae50e000, task ffff8802ae4f8040) Stack: ffff8800280a1f38 0000000000000020 ffffffff814f30a0 0000000000000004 <0> ffff8800280a1f58 ffffffff8116b245 ffff8800280a1f38 ffff8800280a1f38 <0> ffff8800280a1f58 0000000000000001 ffff8800280a1fa8 ffffffff810477bc Call Trace: <IRQ> [<ffffffff8116b245>] blk_done_softirq+0x75/0x90 [<ffffffff810477bc>] __do_softirq+0xcc/0x210 [<ffffffff81047170>] ? ksoftirqd+0x0/0x110 [<ffffffff8100ce7c>] call_softirq+0x1c/0x50 <EOI> [<ffffffff8100e785>] do_softirq+0x65/0xa0 [<ffffffff81047170>] ? ksoftirqd+0x0/0x110 [<ffffffff810471e0>] ksoftirqd+0x70/0x110 [<ffffffff81059559>] kthread+0x99/0xb0 [<ffffffff8100cd7a>] child_rip+0xa/0x20 [<ffffffff8100c73c>] ? restore_args+0x0/0x30 [<ffffffff810594c0>] ? kthread+0x0/0xb0 [<ffffffff8100cd70>] ? child_rip+0x0/0x20 Code: 44 89 e6 48 89 df e8 23 fb f2 e0 be 01 00 00 00 4c 89 f7 e8 f6 fd ff ff 5b 41 5c 41 5d 41 5e c9 c3 4c 89 ef e8 85 fe ff ff eb ed <0f> 0b eb fe 41 8b 85 dc 00 00 00 48 83 bb 10 01 00 00 00 89 83 RIP [<ffffffffa023629d>] dm_softirq_done+0xbd/0x100 [dm_mod] RSP <ffff8800280a1f08> ---[ end trace 16af0a1d8542da55 ]--- Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
- 30 8月, 2009 6 次提交
-
-
由 Dan Williams 提交于
Now that the resources to handle stripe_head operations are allocated percpu it is possible for raid5d to distribute stripe handling over multiple cores. This conversion also adds a call to cond_resched() in the non-multicore case to prevent one core from getting monopolized for raid operations. Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Yuri Tikhonov 提交于
These routines have been replaced by there asynchronous counterparts. Signed-off-by: NYuri Tikhonov <yur@emcraft.com> Signed-off-by: NIlya Yanok <yanok@emcraft.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Yuri Tikhonov 提交于
1/ Use STRIPE_OP_BIOFILL to offload completion of read requests to raid_run_ops 2/ Implement a handler for sh->reconstruct_state similar to the raid5 case (adds handling of Q parity) 3/ Prevent handle_parity_checks6 from running concurrently with 'compute' operations 4/ Hook up raid_run_ops Signed-off-by: NYuri Tikhonov <yur@emcraft.com> Signed-off-by: NIlya Yanok <yanok@emcraft.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Dan Williams 提交于
[ Based on an original patch by Yuri Tikhonov ] Implement the state machine for handling the RAID-6 parities check and repair functionality. Note that the raid6 case does not need to check for new failures, like raid5, as it will always writeback the correct disks. The raid5 case can be updated to check zero_sum_result to avoid getting confused by new failures rather than retrying the entire check operation. Signed-off-by: NYuri Tikhonov <yur@emcraft.com> Signed-off-by: NIlya Yanok <yanok@emcraft.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Yuri Tikhonov 提交于
In the synchronous implementation of stripe dirtying we processed a degraded stripe with one call to handle_stripe_dirtying6(). I.e. compute the missing blocks from the other drives, then copy in the new data and reconstruct the parities. In the asynchronous case we do not perform stripe operations directly. Instead, operations are scheduled with flags to be later serviced by raid_run_ops. So, for the degraded case the final reconstruction step can only be carried out after all blocks have been brought up to date by being read, or computed. Like the raid5 case schedule_reconstruction() sets STRIPE_OP_RECONSTRUCT to request a parity generation pass and through operation chaining can handle compute and reconstruct in a single raid_run_ops pass. [dan.j.williams@intel.com: fixup handle_stripe_dirtying6 gating] Signed-off-by: NYuri Tikhonov <yur@emcraft.com> Signed-off-by: NIlya Yanok <yanok@emcraft.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Yuri Tikhonov 提交于
Modify handle_stripe_fill6 to work asynchronously by introducing fetch_block6 as the raid6 analog of fetch_block5 (schedule compute operations for missing/out-of-sync disks). [dan.j.williams@intel.com: compute D+Q in one pass] Signed-off-by: NYuri Tikhonov <yur@emcraft.com> Signed-off-by: NIlya Yanok <yanok@emcraft.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-