- 11 12月, 2009 27 次提交
-
-
由 Mikulas Patocka 提交于
Explicitly initialize bio lists instead of relying on kzalloc. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Reviewed-by: NTakahiro Yasui <tyasui@redhat.com> Tested-by: NTakahiro Yasui <tyasui@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Hold all write bios when leg fails and errors are handled When using a userspace daemon such as dmeventd to handle errors, we must delay completing bios until it has done its job. This patch prevents the following race: - primary leg fails - write "1" fail, the write is held, secondary leg is set default - write "2" goes straight to the secondary leg Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Reviewed-by: NTakahiro Yasui <tyasui@redhat.com> Tested-by: NTakahiro Yasui <tyasui@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Hold all write bios when errors are handled. Previously the failures list was used only when handling errors with a userspace daemon such as dmeventd. Now, it is always used for all bios. The regions where some writes failed must be marked as nosync. This can only be done in process context (i.e. in raid1 workqueue), not in the write_callback function. Previously the write would succeed if writing to at least one leg succeeded. This is wrong because data from the failed leg may be replicated to the correct leg. Now, if using a userspace daemon, the write with some failures will be held until the daemon has done its job and reconfigured the array. If not using a daemon, the write still succeeds if at least one leg succeeds. This is bad, but it is consistent with current behavior. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Reviewed-by: NTakahiro Yasui <tyasui@redhat.com> Tested-by: NTakahiro Yasui <tyasui@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Move bio completion out of dm_rh_mark_nosync in preparation for the next patch. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Reviewed-by: NTakahiro Yasui <tyasui@redhat.com> Tested-by: NTakahiro Yasui <tyasui@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Move the logic to get a valid mirror leg into a function for re-use in a later patch. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Reviewed-by: NTakahiro Yasui <tyasui@redhat.com> Tested-by: NTakahiro Yasui <tyasui@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Use the hold framework in do_failures. This patch doesn't change the bio processing logic, it just simplifies failure handling and avoids periodically polling the failures list. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Reviewed-by: NTakahiro Yasui <tyasui@redhat.com> Tested-by: NTakahiro Yasui <tyasui@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Add framework to delay bios until a suspend and then resubmit them with either DM_ENDIO_REQUEUE (if the suspend was noflush) or complete them with -EIO. I/O barrier support will use this. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Reviewed-by: NTakahiro Yasui <tyasui@redhat.com> Tested-by: NTakahiro Yasui <tyasui@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Report flush errors as 'F' instead of 'D' for log and mirror devices. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Implement flush callee. It uses dm_io to send zero-size barrier synchronously and concurrently to all the mirror legs. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Call the flush callback from the log. If flush failed, we have no alternative but to mark the whole log as dirty. Also we set the variable flush_failed to prevent any bits ever being marked as clean again. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Introduce a callback pointer from the log to dm-raid1 layer. Before some region is set as "in-sync", we need to flush hardware cache on all the disks. But the log module doesn't have access to the mirror_set structure. So it will use this callback. So far the callback is unused, it will be used in further patches. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Introduce "flush failed" variable. When a flush before clearing a bit in the log fails, we don't know anything about which which regions are in-sync and which not. So we need to set all regions as not-in-sync and set the variable "flush_failed" to prevent setting the in-sync bit in the future. A target reload is the only way to get out of this situation. The variable will be set in following patches. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Introduce flush_header and use it to flush the log device. Note that we don't have to flush if all the regions transition from "dirty" to "clean" state. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Split the variable "touched" into two, "touched_dirtied" and "touched_cleaned", set when some region was dirtied or cleaned. This will be used to optimize flushes. After a transition from "dirty" to "clean" state we don't have flush hardware cache on the log device. After a transition from "clean" to "dirty" the cache must be flushed. Before a transition from "clean" to "dirty" state we don't have to flush all the raid legs. Before a transition from "dirty" to "clean" we must flush all the legs to make sure that they are really in sync. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Flush support for dm-raid1. When it receives an empty barrier, submit it to all the devices via dm-io. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Remove the hack where we allocate an extra bi_io_vec to store additional private data. This hack prevents us from supporting barriers in dm-raid1 without first making another little block layer change. Instead of doing that, this patch eliminates the bi_io_vec abuse by storing the region number directly in the low bits of bi_private. We need to store two things for each bio, the pointer to the main io structure and, if parallel writes were requested, an index indicating which of these writes this bio belongs to. There can be at most BITS_PER_LONG regions - 32 or 64. The index (region number) was stored in the last (hidden) bio vector and the pointer to struct io was stored in bi_private. This patch now aligns "struct io" on BITS_PER_LONG bytes and stores the region number in the low BITS_PER_LONG bits of bi_private. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Allocate "struct io" from a slab. This patch changes dm-io, so that "struct io" is allocated from a slab cache. It used to be allocated with kmalloc. Allocating from a slab will be needed for the next patch, because it requires a special alignment of "struct io" and kmalloc cannot meet this alignment. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Milan Broz 提交于
The "wipe key" message is used to wipe the volume key from memory temporarily, for example when suspending to RAM. But the initialisation vector in ESSIV mode is calculated from the hashed volume key, so the wipe message should wipe this IV key too and reinitialise it when the volume key is reinstated. This patch adds an IV wipe method called from a wipe message callback. ESSIV is then reinitialised using the init function added by the last patch. Cc: stable@kernel.org Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Milan Broz 提交于
This patch separates the construction of IV from its initialisation. (For ESSIV it is a hash calculation based on volume key.) Constructor code now preallocates hash tfm and salt array and saves it in a private IV structure. The next patch requires this to reinitialise the wiped IV without reallocating memory when resuming a suspended device. Cc: stable@kernel.org Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Milan Broz 提交于
Use kzfree for salt deallocation because it is derived from the volume key. Use a common error path in ESSIV constructor. Required by a later patch which fixes the way key material is wiped from memory. Cc: stable@kernel.org Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Milan Broz 提交于
Define private structures for IV so it's easy to add further attributes in a following patch which fixes the way key material is wiped from memory. Also move ESSIV destructor and remove unnecessary 'status' operation. There are no functional changes in this patch. Cc: stable@kernel.org Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Milan Broz 提交于
The "wipe key" message is used to wipe a volume key from memory temporarily, for example when suspending to RAM. There are two instances of the key in memory (inside crypto tfm) but only one got wiped. This patch wipes them both. Cc: stable@kernel.org Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Under some special conditions the snapshot hash_size is calculated as zero. This patch instead sets a minimum value of 64, the same as for the pending exception table. rounddown_pow_of_two(0) is an undefined operation (it expands to shift by -1). init_exception_table with an argument of 0 would fail with -ENOMEM. The way to trigger the problem is to create a snapshot with a chunk size that is larger than the origin device. Cc: stable@kernel.org Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Take snapshot lock only for STATUSTYPE_INFO, not STATUSTYPE_TABLE. Commit 4c6fff44 (dm-snapshot-lock-snapshot-while-supplying-status.patch) introduced this use of the lock, but userspace applications using libdevmapper have been found to request STATUSTYPE_TABLE while the device is suspended and the lock is already held, leading to deadlock. Since the lock is not necessary in this case, don't try to take it. Cc: stable@kernel.org Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Milan Broz 提交于
This patch just removes an unnecessary warning: kobject: 'dm': does not have a release() function, it is broken and must be fixed. The kobject is embedded in mapped device struct, so code does not need to release memory explicitly here. Cc: stable@kernel.org Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Julia Lawall 提交于
Error handling code following a kmalloc should free the allocated data. Cc: stable@kernel.org Signed-off-by: NJulia Lawall <julia@diku.dk> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Fix a reported deadlock if there are still unprocessed multipath events on a device that is being removed. _hash_lock is held during dev_remove while trying to send the outstanding events. Sending the events requests the _hash_lock again in dm_copy_name_and_uuid. This patch introduces a separate lock around regions that modify the link to the hash table (dm_set_mdptr) or the name or uuid so that dm_copy_name_and_uuid no longer needs _hash_lock. Additionally, dm_copy_name_and_uuid can only be called if md exists so we can drop the dm_get() and dm_put() which can lead to a BUG() while md is being freed. The deadlock: #0 [ffff8106298dfb48] schedule at ffffffff80063035 #1 [ffff8106298dfc20] __down_read at ffffffff8006475d #2 [ffff8106298dfc60] dm_copy_name_and_uuid at ffffffff8824f740 #3 [ffff8106298dfc90] dm_send_uevents at ffffffff88252685 #4 [ffff8106298dfcd0] event_callback at ffffffff8824c678 #5 [ffff8106298dfd00] dm_table_event at ffffffff8824dd01 #6 [ffff8106298dfd10] __hash_remove at ffffffff882507ad #7 [ffff8106298dfd30] dev_remove at ffffffff88250865 #8 [ffff8106298dfd60] ctl_ioctl at ffffffff88250d80 #9 [ffff8106298dfee0] do_ioctl at ffffffff800418c4 #10 [ffff8106298dff00] vfs_ioctl at ffffffff8002fab9 #11 [ffff8106298dff40] sys_ioctl at ffffffff8004bdaf #12 [ffff8106298dff80] tracesys at ffffffff8005d28d (via system_call) Cc: stable@kernel.org Reported-by: Nguy keren <choo@actcom.co.il> Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
- 05 12月, 2009 1 次提交
-
-
由 Chandra Seetharaman 提交于
Make scsi_dh_activate() function asynchronous, by taking in two additional parameters, one is the callback function and the other is the data to call the callback function with. Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: NJames Bottomley <James.Bottomley@suse.de>
-
- 01 12月, 2009 1 次提交
-
-
由 NeilBrown 提交于
commit 4706b349 was a forward port of a fix that was needed for SLES10. But in fact it is not needed in mainline because the earlier commit dd00a99e fixes the same problem in a better way. Further, this commit introduces a bug in the way it interacts with the automatic read-error-correction. If, after a read error is successfully corrected, the same disk is chosen to re-read - the re-read won't be attempted but an error will be returned instead. After reverting that commit, there is the possibility that a read error on a read-only array (where read errors cannot be corrected as that requires a write) will repeatedly read the same device and continue to get an error. So in the "Array is readonly" case, fail the drive immediately on a read error. Signed-off-by: NNeilBrown <neilb@suse.de> Cc: stable@kernel.org
-
- 19 11月, 2009 1 次提交
-
-
由 Eric W. Biederman 提交于
For consistency drop & in front of every proc_handler. Explicity taking the address is unnecessary and it prevents optimizations like stubbing the proc_handlers to NULL. Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Joe Perches <joe@perches.com> Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
- 13 11月, 2009 3 次提交
-
-
由 NeilBrown 提交于
Normally is it not safe to allow a raid5 that is both dirty and degraded to be assembled without explicit request from that admin, as it can cause hidden data corruption. This is because 'dirty' means that the parity cannot be trusted, and 'degraded' means that the parity needs to be used. However, if the device that is missing contains only parity, then there is no issue and assembly can continue. This particularly applies when a RAID5 is being converted to a RAID6 and there is an unclean shutdown while the conversion is happening. So check for whether the degraded space only contains parity, and in that case, allow the assembly. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
When a reshape finds that it can add spare devices into the array, those devices might already be 'in_sync' if they are beyond the old size of the array, or they might not if they are within the array. The first case happens when we change an N-drive RAID5 to an N+1-drive RAID5. The second happens when we convert an N-drive RAID5 to an N+1-drive RAID6. So set the flag more carefully. Also, ->recovery_offset is only meaningful when the flag is clear, so only set it in that case. This change needs the preceding two to ensure that the non-in_sync device doesn't get evicted from the array when it is stopped, in the case where v0.90 metadata is used. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
This is a combination that didn't really make sense before. However when a reshape is converting e.g. raid5 -> raid6, the extra device is not fully in-sync, but is certainly active and contains important data. So allow that start to be meaningful and in particular get the 'recovery_offset' value (which is needed for any non-in-sync active device) from the reshape_position. Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 12 11月, 2009 2 次提交
-
-
由 Eric W. Biederman 提交于
Now that sys_sysctl is a wrapper around /proc/sys all of the binary sysctl support elsewhere in the tree is dead code. Cc: Jens Axboe <axboe@kernel.dk> Cc: Corey Minyard <minyard@acm.org> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Matt Mackall <mpm@selenic.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Neil Brown <neilb@suse.de> Cc: "James E.J. Bottomley" <James.Bottomley@suse.de> Acked-by: Clemens Ladisch <clemens@ladisch.de> for drivers/char/hpet.c Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
由 NeilBrown 提交于
Each device has its own 'recovery_offset' showing how far recovery has progressed on the device. As the only real significance of this is that fact that it can be stored in the metadata and recovered at restart, and as only 1.x metadata can do this, we were only updating 'recovery_offset' to 'curr_resync_completed' when updating v1.x metadata. But this is wrong, and we will shortly make limited use of this field in v0.90 metadata. So move the update into common code. Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 09 11月, 2009 1 次提交
-
-
由 Dirk Hohndel 提交于
something-bility is spelled as something-blity so a grep for 'blit' would find these lines this is so trivial that I didn't split it by subsystem / copy additional maintainers - all changes are to comments The only purpose is to get fewer false positives when grepping around the kernel sources. Signed-off-by: NDirk Hohndel <hohndel@infradead.org> Signed-off-by: NJiri Kosina <jkosina@suse.cz>
-
- 06 11月, 2009 2 次提交
-
-
由 NeilBrown 提交于
This value is visible through sysfs and is used by mdadm when it manages a reshape (backing up data that is about to be rearranged). So it is important that it is always correct. Current it does not get updated properly when a reshape starts which can cause problems when assembling an array that is in the middle of being reshaped. This is suitable for 2.6.31.y stable kernels. Cc: stable@kernel.org Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
If a 'sync_max' has been set (via sysfs), it is wrong to clear it until a resync (or reshape or recovery ...) actually reached that point. So if a resync is interrupted (e.g. by device failure), leave 'resync_max' unchanged. This is particularly important for 'reshape' operations that do not change the size of the array. For such operations mdadm needs to monitor the reshape taking rolling backups of the section being reshaped. If resync_max gets cleared, the reshape can get ahead of mdadm and then the backups that mdadm creates are useless. This is suitable for 2.6.31.y stable kernels. Cc: stable@kernel.org Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 20 10月, 2009 1 次提交
-
-
由 Dan Williams 提交于
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 17 10月, 2009 1 次提交
-
-
由 Mikulas Patocka 提交于
Allow the snapshot chunk size to be smaller than the page size The code is now capable of handling this due to some previous fixes and enhancements. As the page size varies between computers, prior to this patch, the chunk size of a snapshot dictated which machines could read it: Snapshots created on one machine might not be readable on another. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Reviewed-by: NMike Snitzer <snitzer@redhat.com> Reviewed-by: NJonathan Brassow <jbrassow@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-