- 11 3月, 2008 1 次提交
-
-
由 NeilBrown 提交于
If an md array is "auto-read-only", then this appears in /proc/mdstat as /dev/md0: active(auto-read-only) whereas if it is truely readonly, it appears as /dev/md0: active (read-only) The difference being a space. One program known to parse this file expects the space and gets badly confused. It will be fixed, but it would be best if what the kernel generates is more consistent too. Signed-off-by: NNeil Brown <neilb@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 3月, 2008 9 次提交
-
-
由 K.Tanaka 提交于
This message describes another issue about md RAID10 found by testing the 2.6.24 md RAID10 using new scsi fault injection framework. Abstract: When a scsi error results in disabling a disk during RAID10 recovery, the resync threads of md RAID10 could stall. This case, the raid array has already been broken and it may not matter. But I think stall is not preferable. If it occurs, even shutdown or reboot will fail because of resource busy. The deadlock mechanism: The r10bio_s structure has a "remaining" member to keep track of BIOs yet to be handled when recovering. The "remaining" counter is incremented when building a BIO in sync_request() and is decremented when finish a BIO in end_sync_write(). If building a BIO fails for some reasons in sync_request(), the "remaining" should be decremented if it has already been incremented. I found a case where this decrement is forgotten. This causes a md_do_sync() deadlock because md_do_sync() waits for md_done_sync() called by end_sync_write(), but end_sync_write() never calls md_done_sync() because of the "remaining" counter mismatch. For example, this problem would be reproduced in the following case: Personalities : [raid10] md0 : active raid10 sdf1[4] sde1[5](F) sdd1[2] sdc1[1] sdb1[6](F) 3919616 blocks 64K chunks 2 near-copies [4/2] [_UU_] [>....................] recovery = 2.2% (45376/1959808) finish=0.7min speed=45376K/sec This case, sdf1 is recovering, sdb1 and sde1 are disabled. An additional error with detaching sdd will cause a deadlock. md0 : active raid10 sdf1[4] sde1[5](F) sdd1[6](F) sdc1[1] sdb1[7](F) 3919616 blocks 64K chunks 2 near-copies [4/1] [_U__] [=>...................] recovery = 5.0% (99520/1959808) finish=5.9min speed=5237K/sec 2739 ? S< 0:17 [md0_raid10] 28608 ? D< 0:00 [md0_resync] 28629 pts/1 Ss 0:00 bash 28830 pts/1 R+ 0:00 ps ax 31819 ? D< 0:00 [kjournald] The resync thread keeps working, but actually it is deadlocked. Patch: By this patch, the remaining counter will be decremented if needed. Signed-off-by: NNeil Brown <neilb@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 NeilBrown 提交于
Thanks to K.Tanaka and the scsi fault injection framework, here is a fix for another possible deadlock in raid1/raid10 error handing. If a read request returns an error while a resync is happening and a resync request is pending, the attempt to fix the error will block until the resync progresses, and the resync will block until the read request completes. Thus a deadlock. This patch fixes the problem. Cc: "K.Tanaka" <k-tanaka@ce.jp.nec.com> Signed-off-by: NNeil Brown <neilb@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Keld Simonsen 提交于
This patch changes the disk to be read for layout "far > 1" to always be the disk with the lowest block address. Thus the chunks to be read will always be (for a fully functioning array) from the first band of stripes, and the raid will then work as a raid0 consisting of the first band of stripes. Some advantages: The fastest part which is the outer sectors of the disks involved will be used. The outer blocks of a disk may be as much as 100 % faster than the inner blocks. Average seek time will be smaller, as seeks will always be confined to the first part of the disks. Mixed disks with different performance characteristics will work better, as they will work as raid0, the sequential read rate will be number of disks involved times the IO rate of the slowest disk. If a disk is malfunctioning, the first disk which is working, and has the lowest block address for the logical block will be used. Signed-off-by: NKeld Simonsen <keld@dkuug.dk> Signed-off-by: NNeil Brown <neilb@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 NeilBrown 提交于
When we access attributes of an rdev (component device on an md array) through sysfs, we really need to lock the array against concurrent changes. We currently do that when we change an attribute, but not when we read an attribute. We need to lock when reading as well else rdev->mddev could become NULL while we are accessing it. So add appropriate locking (mddev_lock) to rdev_attr_show. rdev_size_store requires some extra care as well as it needs to unlock the mddev while scanning other mddevs for overlapping regions. We currently assume that rdev->mddev will still be unchanged after the scan, but that cannot be certain. So take a copy of rdev->mddev for use at the end of the function. Signed-off-by: NNeil Brown <neilb@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 NeilBrown 提交于
A resync/reshape/recovery thread will refuse to progress when the array is marked read-only. So whenever it mark it not read-only, it is important to wake up thread resync thread. There is one place we didn't do this. The problem manifests if the start_ro module parameters is set, and a raid5 array that is in the middle of a reshape (restripe) is started. The array will initially be semi-read-only (meaning it acts like it is readonly until the first write). So the reshape will not proceed. On the first write, the array will become read-write, but the reshape will not be started, and there is no event which will ever restart that thread. Signed-off-by: NNeil Brown <neilb@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 NeilBrown 提交于
When a raid1 array is stopped, all components currently get added to the list for auto-detection. However we should really only add components that were found by autodetection in the first place. So add a flag to record that information, and use it. Signed-off-by: NNeil Brown <neilb@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 NeilBrown 提交于
Make sure the data doesn't start before the end of the superblock when the superblock is at the start of the device. Signed-off-by: NNeil Brown <neilb@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 NeilBrown 提交于
On an md array with a write-intent bitmap, a thread wakes up every few seconds and scans the bitmap looking for work to do. If the array is idle, there will be no work to do, but a lot of scanning is done to discover this. So cache the fact that the bitmap is completely clean, and avoid scanning the whole bitmap when the cache is known to be clean. Signed-off-by: NNeil Brown <neilb@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 NeilBrown 提交于
When handling a read error, we freeze the array to stop any other IO while attempting to over-write with correct data. This is done in the raid1d(raid10d) thread and must wait for all submitted IO to complete (except for requests that failed and are sitting in the retry queue - these are counted in ->nr_queue and will stay there during a freeze). However write requests need attention from raid1d as bitmap updates might be required. This can cause a deadlock as raid1 is waiting for requests to finish that themselves need attention from raid1d. So we create a new function 'flush_pending_writes' to give that attention, and call it in freeze_array to be sure that we aren't waiting on raid1d. Thanks to "K.Tanaka" <k-tanaka@ce.jp.nec.com> for finding and reporting this problem. Cc: "K.Tanaka" <k-tanaka@ce.jp.nec.com> Signed-off-by: NNeil Brown <neilb@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 20 2月, 2008 1 次提交
-
-
由 Adrian Bunk 提交于
This patch fixes two NULL dereferences introduced by commit 06386bbf and spotted by the Coverity checker. Signed-off-by: NAdrian Bunk <bunk@kernel.org> Signed-off-by: NAlasdair G Kergon <agk@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 2月, 2008 4 次提交
-
-
由 Jan Blunck 提交于
d_path() is used on a <dentry,vfsmount> pair. Lets use a struct path to reflect this. [akpm@linux-foundation.org: fix build in mm/memory.c] Signed-off-by: NJan Blunck <jblunck@suse.de> Acked-by: NBryan Wu <bryan.wu@analog.com> Acked-by: NChristoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Blunck 提交于
seq_path() is always called with a dentry and a vfsmount from a struct path. Make seq_path() take it directly as an argument. Signed-off-by: NJan Blunck <jblunck@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Blunck 提交于
* Add path_put() functions for releasing a reference to the dentry and vfsmount of a struct path in the right order * Switch from path_release(nd) to path_put(&nd->path) * Rename dput_path() to path_put_conditional() [akpm@linux-foundation.org: fix cifs] Signed-off-by: NJan Blunck <jblunck@suse.de> Signed-off-by: NAndreas Gruenbacher <agruen@suse.de> Acked-by: NChristoph Hellwig <hch@lst.de> Cc: <linux-fsdevel@vger.kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Blunck 提交于
This is the central patch of a cleanup series. In most cases there is no good reason why someone would want to use a dentry for itself. This series reflects that fact and embeds a struct path into nameidata. Together with the other patches of this series - it enforced the correct order of getting/releasing the reference count on <dentry,vfsmount> pairs - it prepares the VFS for stacking support since it is essential to have a struct path in every place where the stack can be traversed - it reduces the overall code size: without patch series: text data bss dec hex filename 5321639 858418 715768 6895825 6938d1 vmlinux with patch series: text data bss dec hex filename 5320026 858418 715768 6894212 693284 vmlinux This patch: Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix cifs] [akpm@linux-foundation.org: fix smack] Signed-off-by: NJan Blunck <jblunck@suse.de> Signed-off-by: NAndreas Gruenbacher <agruen@suse.de> Acked-by: NChristoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 2月, 2008 1 次提交
-
-
由 Al Viro 提交于
test_and_set_bit() on address of uint32_t is a Bad Idea(tm)... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 2月, 2008 24 次提交
-
-
由 Jonathan Brassow 提交于
This patch adds extra information to the mirror status output, so that it can be determined which device(s) have failed. For each mirror device, a character is printed indicating the most severe error encountered. The characters are: * A => Alive - No failures * D => Dead - A write failure occurred leaving mirror out-of-sync * S => Sync - A sychronization failure occurred, mirror out-of-sync * R => Read - A read failure occurred, mirror data unaffected This allows userspace to properly reconfigure the mirror set. Signed-off-by: NJonathan Brassow <jbrassow@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Jonathan Brassow 提交于
This patch gives the ability to respond-to/record device failures that happen during read operations. It also adds the ability to read from mirror devices that are not the primary if they are in-sync. There are essentially two read paths in mirroring; the direct path and the queued path. When a read request is mapped, if the region is 'in-sync' the direct path is taken; otherwise the queued path is taken. If the direct path is taken, we must record bio information so that if the read fails we can retry it. We then discover the status of a direct read through mirror_end_io. If the read has failed, we will mark the device from which the read was attempted as failed (so we don't try to read from it again), restore the bio and try again. If the queued path is taken, we discover the results of the read from 'read_callback'. If the device failed, we will mark the device as failed and attempt the read again if there is another device where this region is known to be 'in-sync'. Signed-off-by: NJonathan Brassow <jbrassow@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Jonathan Brassow 提交于
This patch adds the ability to requeue write I/O to core device-mapper when there is a log device failure. If a write to the log produces and error, the pending writes are put on the "failures" list. Since the log is marked as failed, they will stay on the failures list until a suspend happens. Suspends come in two phases, presuspend and postsuspend. We must make sure that all the writes on the failures list are requeued in the presuspend phase (a requirement of dm core). This means that recovery must be complete (because writes may be delayed behind it) and the failures list must be requeued before we return from presuspend. The mechanisms to ensure recovery is complete (or stopped) was already in place, but needed to be moved from postsuspend to presuspend. We rely on 'flush_workqueue' to ensure that the mirror thread is complete and therefore, has requeued all writes in the failures list. Because we are using flush_workqueue, we must ensure that no additional 'queue_work' calls will produce additional I/O that we need to requeue (because once we return from presuspend, we are unable to do anything about it). 'queue_work' is called in response to the following functions: - complete_resync_work = NA, recovery is stopped - rh_dec (mirror_end_io) = NA, only calls 'queue_work' if it is ready to recover the region (recovery is stopped) or it needs to clear the region in the log* **this doesn't get called while suspending** - rh_recovery_end = NA, recovery is stopped - rh_recovery_start = NA, recovery is stopped - write_callback = 1) Writes w/o failures simply call bio_endio -> mirror_end_io -> rh_dec (see rh_dec above) 2) Writes with failures are put on the failures list and queue_work is called** ** write_callbacks don't happen during suspend ** - do_failures = NA, 'queue_work' not called if suspending - add_mirror (initialization) = NA, only done on mirror creation - queue_bio = NA, 1) delayed I/O scheduled before flush_workqueue is called. 2) No more I/Os are being issued. 3) Re-attempted READs can still be handled. (Write completions are handled through rh_dec/ write_callback - mention above - and do not use queue_bio.) Signed-off-by: NJonathan Brassow <jbrassow@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Jonathan Brassow 提交于
This patch adds the calls to 'fail_mirror' if an error occurs during mirror recovery (aka resynchronization). 'fail_mirror' is responsible for recording the type of error by mirror device and ensuring an event gets raised for the purpose of notifying userspace. Signed-off-by: NJonathan Brassow <jbrassow@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Jonathan Brassow 提交于
This patch gives mirror the ability to handle device failures during normal write operations. The 'write_callback' function is called when a write completes. If all the writes failed or succeeded, we report failure or success respectively. If some of the writes failed, we call fail_mirror; which increments the error count for the device, notes the type of error encountered (DM_RAID1_WRITE_ERROR), and selects a new primary (if necessary). Note that the primary device can never change while the mirror is not in-sync (IOW, while recovery is happening.) This means that the scenario where a failed write changes the primary and gives recovery_complete a chance to misread the primary never happens. The fact that the primary can change has necessitated the change to the default_mirror field. We need to protect against reading garbage while the primary changes. We then add the bio to a new list in the mirror set, 'failures'. For every bio in the 'failures' list, we call a new function, '__bio_mark_nosync', where we mark the region 'not-in-sync' in the log and properly set the region state as, RH_NOSYNC. Userspace must also be notified of the failure. This is done by 'raising an event' (dm_table_event()). If fail_mirror is called in process context the event can be raised right away. If in interrupt context, the event is deferred to the kmirrord thread - which raises the event if 'event_waiting' is set. Backwards compatibility is maintained by ignoring errors if the DM_FEATURES_HANDLE_ERRORS flag is not present. Signed-off-by: NJonathan Brassow <jbrassow@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Milan Broz 提交于
Provided sector_t is 64 bits, reduce the in-memory footprint of the snapshot exception table by the simple method of using unused bits of the chunk number to combine consecutive entries. Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Brian Wood 提交于
This patch adds additional information to the status line. It is added at the end of the returned text so it will not interfere with existing implementations using this data. The addition of this information will allow for a common return interface to match that returned with the dm-raid1.c status line (with Jonathan Brassow's patches). Here is a sample of what is returned with a mirror "status" call: isw_eeaaabgfg_mirror: 0 488390920 mirror 2 8:16 8:32 3727/3727 1 AA 1 core Here's what's returned with this patch for a stripe "status" call: isw_dheeijjdej_stripe: 0 976783872 striped 2 8:16 8:32 1 AA Signed-off-by: NBrian Wood <brian.j.wood@intel.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Brian Wood 提交于
This patch adds the stripe_end_io function to process errors that might occur after an IO operation. As part of this there are a number of enhancements made to record and trigger events: - New atomic variable in struct stripe to record the number of errors each stripe volume device has experienced (could be used later with uevents to report back directly to userspace) - New workqueue/work struct setup to process the trigger_event function - New end_io function. It is here that testing for BIO error conditions take place. It determines the exact stripe that cause the error, records this in the new atomic variable, and calls the queue_work() function - New trigger_event function to process failure events. This calls dm_table_event() Signed-off-by: NBrian Wood <brian.j.wood@intel.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Jonathan Brassow 提交于
If the log type is not recognised, attempt to load the module 'dm-log-<type>.ko'. Signed-off-by: NJonathan Brassow <jbrassow@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Milan Broz 提交于
Add a single-thread workqueue for each mapped device and move flushing of the lists of pushback and deferred bios to this new workqueue. Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Milan Broz 提交于
dm-crypt: Use crypto ablkcipher interface Move encrypt/decrypt core to async crypto call. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Milan Broz 提交于
dm-crypt: Use crypto ablkcipher interface Prepare callback function for async crypto operation. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Milan Broz 提交于
dm-crypt: Use crypto ablkcipher interface Prepare completion for async crypto request. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Milan Broz 提交于
dm-crypt: Use crypto ablkcipher interface Introduce mempool for async crypto requests. cc->req is used mainly during synchronous operations (to prevent allocation and deallocation of the same object). Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Milan Broz 提交于
dm-crypt: Use crypto ablkcipher interface Move scatterlists to separate dm_crypt_struct and pick out block processing from crypt_convert. Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Milan Broz 提交于
Make io reference counting more obvious. Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Milan Broz 提交于
Introduce crypt_write_io_loop(). Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Milan Broz 提交于
Process write request in separate function and queue final bio through io workqueue. Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Milan Broz 提交于
Add sector into dm_crypt_io instead of using local variable. Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Alasdair G Kergon 提交于
Reorder kcryptd functions for clarity. Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Milan Broz 提交于
Rename functions to follow calling convention. Prepare write io error processing function skeleton. Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Milan Broz 提交于
Simplify crypt_endio function. Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Milan Broz 提交于
Move error code setting outside of crypt_dec_pending function. Use -EIO if crypt_convert_scatterlist() fails. Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Milan Broz 提交于
Remove write attribute from convert_context and use bio flag instead. Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-