- 11 12月, 2009 17 次提交
-
-
由 Mikulas Patocka 提交于
Introduce a callback pointer from the log to dm-raid1 layer. Before some region is set as "in-sync", we need to flush hardware cache on all the disks. But the log module doesn't have access to the mirror_set structure. So it will use this callback. So far the callback is unused, it will be used in further patches. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Introduce "flush failed" variable. When a flush before clearing a bit in the log fails, we don't know anything about which which regions are in-sync and which not. So we need to set all regions as not-in-sync and set the variable "flush_failed" to prevent setting the in-sync bit in the future. A target reload is the only way to get out of this situation. The variable will be set in following patches. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Introduce flush_header and use it to flush the log device. Note that we don't have to flush if all the regions transition from "dirty" to "clean" state. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Split the variable "touched" into two, "touched_dirtied" and "touched_cleaned", set when some region was dirtied or cleaned. This will be used to optimize flushes. After a transition from "dirty" to "clean" state we don't have flush hardware cache on the log device. After a transition from "clean" to "dirty" the cache must be flushed. Before a transition from "clean" to "dirty" state we don't have to flush all the raid legs. Before a transition from "dirty" to "clean" we must flush all the legs to make sure that they are really in sync. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Flush support for dm-raid1. When it receives an empty barrier, submit it to all the devices via dm-io. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Remove the hack where we allocate an extra bi_io_vec to store additional private data. This hack prevents us from supporting barriers in dm-raid1 without first making another little block layer change. Instead of doing that, this patch eliminates the bi_io_vec abuse by storing the region number directly in the low bits of bi_private. We need to store two things for each bio, the pointer to the main io structure and, if parallel writes were requested, an index indicating which of these writes this bio belongs to. There can be at most BITS_PER_LONG regions - 32 or 64. The index (region number) was stored in the last (hidden) bio vector and the pointer to struct io was stored in bi_private. This patch now aligns "struct io" on BITS_PER_LONG bytes and stores the region number in the low BITS_PER_LONG bits of bi_private. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Allocate "struct io" from a slab. This patch changes dm-io, so that "struct io" is allocated from a slab cache. It used to be allocated with kmalloc. Allocating from a slab will be needed for the next patch, because it requires a special alignment of "struct io" and kmalloc cannot meet this alignment. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Milan Broz 提交于
The "wipe key" message is used to wipe the volume key from memory temporarily, for example when suspending to RAM. But the initialisation vector in ESSIV mode is calculated from the hashed volume key, so the wipe message should wipe this IV key too and reinitialise it when the volume key is reinstated. This patch adds an IV wipe method called from a wipe message callback. ESSIV is then reinitialised using the init function added by the last patch. Cc: stable@kernel.org Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Milan Broz 提交于
This patch separates the construction of IV from its initialisation. (For ESSIV it is a hash calculation based on volume key.) Constructor code now preallocates hash tfm and salt array and saves it in a private IV structure. The next patch requires this to reinitialise the wiped IV without reallocating memory when resuming a suspended device. Cc: stable@kernel.org Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Milan Broz 提交于
Use kzfree for salt deallocation because it is derived from the volume key. Use a common error path in ESSIV constructor. Required by a later patch which fixes the way key material is wiped from memory. Cc: stable@kernel.org Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Milan Broz 提交于
Define private structures for IV so it's easy to add further attributes in a following patch which fixes the way key material is wiped from memory. Also move ESSIV destructor and remove unnecessary 'status' operation. There are no functional changes in this patch. Cc: stable@kernel.org Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Milan Broz 提交于
The "wipe key" message is used to wipe a volume key from memory temporarily, for example when suspending to RAM. There are two instances of the key in memory (inside crypto tfm) but only one got wiped. This patch wipes them both. Cc: stable@kernel.org Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Under some special conditions the snapshot hash_size is calculated as zero. This patch instead sets a minimum value of 64, the same as for the pending exception table. rounddown_pow_of_two(0) is an undefined operation (it expands to shift by -1). init_exception_table with an argument of 0 would fail with -ENOMEM. The way to trigger the problem is to create a snapshot with a chunk size that is larger than the origin device. Cc: stable@kernel.org Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Take snapshot lock only for STATUSTYPE_INFO, not STATUSTYPE_TABLE. Commit 4c6fff44 (dm-snapshot-lock-snapshot-while-supplying-status.patch) introduced this use of the lock, but userspace applications using libdevmapper have been found to request STATUSTYPE_TABLE while the device is suspended and the lock is already held, leading to deadlock. Since the lock is not necessary in this case, don't try to take it. Cc: stable@kernel.org Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Milan Broz 提交于
This patch just removes an unnecessary warning: kobject: 'dm': does not have a release() function, it is broken and must be fixed. The kobject is embedded in mapped device struct, so code does not need to release memory explicitly here. Cc: stable@kernel.org Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Julia Lawall 提交于
Error handling code following a kmalloc should free the allocated data. Cc: stable@kernel.org Signed-off-by: NJulia Lawall <julia@diku.dk> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Fix a reported deadlock if there are still unprocessed multipath events on a device that is being removed. _hash_lock is held during dev_remove while trying to send the outstanding events. Sending the events requests the _hash_lock again in dm_copy_name_and_uuid. This patch introduces a separate lock around regions that modify the link to the hash table (dm_set_mdptr) or the name or uuid so that dm_copy_name_and_uuid no longer needs _hash_lock. Additionally, dm_copy_name_and_uuid can only be called if md exists so we can drop the dm_get() and dm_put() which can lead to a BUG() while md is being freed. The deadlock: #0 [ffff8106298dfb48] schedule at ffffffff80063035 #1 [ffff8106298dfc20] __down_read at ffffffff8006475d #2 [ffff8106298dfc60] dm_copy_name_and_uuid at ffffffff8824f740 #3 [ffff8106298dfc90] dm_send_uevents at ffffffff88252685 #4 [ffff8106298dfcd0] event_callback at ffffffff8824c678 #5 [ffff8106298dfd00] dm_table_event at ffffffff8824dd01 #6 [ffff8106298dfd10] __hash_remove at ffffffff882507ad #7 [ffff8106298dfd30] dev_remove at ffffffff88250865 #8 [ffff8106298dfd60] ctl_ioctl at ffffffff88250d80 #9 [ffff8106298dfee0] do_ioctl at ffffffff800418c4 #10 [ffff8106298dff00] vfs_ioctl at ffffffff8002fab9 #11 [ffff8106298dff40] sys_ioctl at ffffffff8004bdaf #12 [ffff8106298dff80] tracesys at ffffffff8005d28d (via system_call) Cc: stable@kernel.org Reported-by: Nguy keren <choo@actcom.co.il> Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
- 05 12月, 2009 1 次提交
-
-
由 Chandra Seetharaman 提交于
Make scsi_dh_activate() function asynchronous, by taking in two additional parameters, one is the callback function and the other is the data to call the callback function with. Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: NJames Bottomley <James.Bottomley@suse.de>
-
- 01 12月, 2009 1 次提交
-
-
由 NeilBrown 提交于
commit 4706b349 was a forward port of a fix that was needed for SLES10. But in fact it is not needed in mainline because the earlier commit dd00a99e fixes the same problem in a better way. Further, this commit introduces a bug in the way it interacts with the automatic read-error-correction. If, after a read error is successfully corrected, the same disk is chosen to re-read - the re-read won't be attempted but an error will be returned instead. After reverting that commit, there is the possibility that a read error on a read-only array (where read errors cannot be corrected as that requires a write) will repeatedly read the same device and continue to get an error. So in the "Array is readonly" case, fail the drive immediately on a read error. Signed-off-by: NNeilBrown <neilb@suse.de> Cc: stable@kernel.org
-
- 19 11月, 2009 1 次提交
-
-
由 Eric W. Biederman 提交于
For consistency drop & in front of every proc_handler. Explicity taking the address is unnecessary and it prevents optimizations like stubbing the proc_handlers to NULL. Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Joe Perches <joe@perches.com> Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
- 13 11月, 2009 3 次提交
-
-
由 NeilBrown 提交于
Normally is it not safe to allow a raid5 that is both dirty and degraded to be assembled without explicit request from that admin, as it can cause hidden data corruption. This is because 'dirty' means that the parity cannot be trusted, and 'degraded' means that the parity needs to be used. However, if the device that is missing contains only parity, then there is no issue and assembly can continue. This particularly applies when a RAID5 is being converted to a RAID6 and there is an unclean shutdown while the conversion is happening. So check for whether the degraded space only contains parity, and in that case, allow the assembly. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
When a reshape finds that it can add spare devices into the array, those devices might already be 'in_sync' if they are beyond the old size of the array, or they might not if they are within the array. The first case happens when we change an N-drive RAID5 to an N+1-drive RAID5. The second happens when we convert an N-drive RAID5 to an N+1-drive RAID6. So set the flag more carefully. Also, ->recovery_offset is only meaningful when the flag is clear, so only set it in that case. This change needs the preceding two to ensure that the non-in_sync device doesn't get evicted from the array when it is stopped, in the case where v0.90 metadata is used. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
This is a combination that didn't really make sense before. However when a reshape is converting e.g. raid5 -> raid6, the extra device is not fully in-sync, but is certainly active and contains important data. So allow that start to be meaningful and in particular get the 'recovery_offset' value (which is needed for any non-in-sync active device) from the reshape_position. Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 12 11月, 2009 2 次提交
-
-
由 Eric W. Biederman 提交于
Now that sys_sysctl is a wrapper around /proc/sys all of the binary sysctl support elsewhere in the tree is dead code. Cc: Jens Axboe <axboe@kernel.dk> Cc: Corey Minyard <minyard@acm.org> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Matt Mackall <mpm@selenic.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Neil Brown <neilb@suse.de> Cc: "James E.J. Bottomley" <James.Bottomley@suse.de> Acked-by: Clemens Ladisch <clemens@ladisch.de> for drivers/char/hpet.c Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
由 NeilBrown 提交于
Each device has its own 'recovery_offset' showing how far recovery has progressed on the device. As the only real significance of this is that fact that it can be stored in the metadata and recovered at restart, and as only 1.x metadata can do this, we were only updating 'recovery_offset' to 'curr_resync_completed' when updating v1.x metadata. But this is wrong, and we will shortly make limited use of this field in v0.90 metadata. So move the update into common code. Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 09 11月, 2009 1 次提交
-
-
由 Dirk Hohndel 提交于
something-bility is spelled as something-blity so a grep for 'blit' would find these lines this is so trivial that I didn't split it by subsystem / copy additional maintainers - all changes are to comments The only purpose is to get fewer false positives when grepping around the kernel sources. Signed-off-by: NDirk Hohndel <hohndel@infradead.org> Signed-off-by: NJiri Kosina <jkosina@suse.cz>
-
- 06 11月, 2009 2 次提交
-
-
由 NeilBrown 提交于
This value is visible through sysfs and is used by mdadm when it manages a reshape (backing up data that is about to be rearranged). So it is important that it is always correct. Current it does not get updated properly when a reshape starts which can cause problems when assembling an array that is in the middle of being reshaped. This is suitable for 2.6.31.y stable kernels. Cc: stable@kernel.org Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
If a 'sync_max' has been set (via sysfs), it is wrong to clear it until a resync (or reshape or recovery ...) actually reached that point. So if a resync is interrupted (e.g. by device failure), leave 'resync_max' unchanged. This is particularly important for 'reshape' operations that do not change the size of the array. For such operations mdadm needs to monitor the reshape taking rolling backups of the section being reshaped. If resync_max gets cleared, the reshape can get ahead of mdadm and then the backups that mdadm creates are useless. This is suitable for 2.6.31.y stable kernels. Cc: stable@kernel.org Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 20 10月, 2009 1 次提交
-
-
由 Dan Williams 提交于
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 17 10月, 2009 10 次提交
-
-
由 Mikulas Patocka 提交于
Allow the snapshot chunk size to be smaller than the page size The code is now capable of handling this due to some previous fixes and enhancements. As the page size varies between computers, prior to this patch, the chunk size of a snapshot dictated which machines could read it: Snapshots created on one machine might not be readable on another. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Reviewed-by: NMike Snitzer <snitzer@redhat.com> Reviewed-by: NJonathan Brassow <jbrassow@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Use unsigned integer chunk size. Maximum chunk size is 512kB, there won't ever be need to use 4GB chunk size, so the number can be 32-bit. This fixes compiler failure on 32-bit systems with large block devices. Cc: stable@kernel.org Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Reviewed-by: NJonathan Brassow <jbrassow@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
This patch locks the snapshot when returning status. It fixes a race when it could return an invalid number of free chunks if someone was simultaneously modifying it. Cc: stable@kernel.org Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Properly close the device if failing because of an invalid chunk size. Cc: stable@kernel.org Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
If we are creating snapshot with memory-stored exception store, fail if the user didn't specify chunk size. Zero chunk size would probably crash a lot of places in the rest of snapshot code. Cc: stable@kernel.org Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Reviewed-by: NJonathan Brassow <jbrassow@redhat.com> Reviewed-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Kiyoshi Ueda 提交于
Multiple instances of dec_pending() can run concurrently so a lock is needed when it saves the first error code. I have never experienced actual problem without locking and just found this during code inspection while implementing the barrier support patch for request-based dm. This patch adds the locking. I've done compile, boot and basic I/O testings. Cc: stable@kernel.org Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Zdenek Kabelac 提交于
Add missing del_gendisk() to error path when creation of workqueue fails. Otherwice there is a resource leak and following warning is shown: WARNING: at fs/sysfs/dir.c:487 sysfs_add_one+0xc5/0x160() sysfs: cannot create duplicate filename '/devices/virtual/block/dm-0' Cc: stable@kernel.org Signed-off-by: NZdenek Kabelac <zkabelac@redhat.com> Reviewed-by: NJonathan Brassow <jbrassow@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Andrew Morton 提交于
mips: drivers/md/dm-log-userspace-base.c: In function `userspace_ctr': drivers/md/dm-log-userspace-base.c:159: warning: cast from pointer to integer of different size Cc: stable@kernel.org Cc: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Jonathan Brassow 提交于
While initializing the snapshot module, if we fail to register the snapshot target then we must back-out the exception store module initialization. Cc: stable@kernel.org Signed-off-by: NJonathan Brassow <jbrassow@redhat.com> Reviewed-by: NMikulas Patocka <mpatocka@redhat.com> Reviewed-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Avoid a race causing corruption when snapshots of the same origin have different chunk sizes by sorting the internal list of snapshots by chunk size, largest first. https://bugzilla.redhat.com/show_bug.cgi?id=182659 For example, let's have two snapshots with different chunk sizes. The first snapshot (1) has small chunk size and the second snapshot (2) has large chunk size. Let's have chunks A, B, C in these snapshots: snapshot1: ====A==== ====B==== snapshot2: ==========C========== (Chunk size is a power of 2. Chunks are aligned.) A write to the origin at a position within A and C comes along. It triggers reallocation of A, then reallocation of C and links them together using A as the 'primary' exception. Then another write to the origin comes along at a position within B and C. It creates pending exception for B. C already has a reallocation in progress and it already has a primary exception (A), so nothing is done to it: B and C are not linked. If the reallocation of B finishes before the reallocation of C, because there is no link with the pending exception for C it does not know to wait for it and, the second write is dispatched to the origin and causes data corruption in the chunk C in snapshot2. To avoid this situation, we maintain snapshots sorted in descending order of chunk size. This leads to a guaranteed ordering on the links between the pending exceptions and avoids the problem explained above - both A and B now get linked to C. Cc: stable@kernel.org Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
- 16 10月, 2009 1 次提交
-
-
由 NeilBrown 提交于
md/raid6 passes a list of 'struct page *' to the async_tx routines, which then either DMA map them for offload, or take the page_address for CPU based calculations. For RAID6 we sometime leave 'blanks' in the list of pages. For CPU based calcs, we want to treat theses as a page of zeros. For offloaded calculations, we simply don't pass a page to the hardware. Currently the 'blanks' are encoded as a pointer to raid6_empty_zero_page. This is a 4096 byte memory region, not a 'struct page'. This is mostly handled correctly but is rather ugly. So change the code to pass and expect a NULL pointer for the blanks. When taking page_address of a page, we need to check for a NULL and in that case use raid6_empty_zero_page. Signed-off-by: NNeilBrown <neilb@suse.de>
-