- 28 7月, 2011 4 次提交
-
-
由 NeilBrown 提交于
raid10d() is too big and is about to get bigger, so split handle_read_error() out as a separate function. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
When a loop ends with a large if, it can be neater to change the if to invert the condition and just 'continue'. Then the body of the if can be indented to a lower level. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
It is only safe to choose not to write to a bad block if that bad block is safely recorded in metadata - i.e. if it has been 'acknowledged'. If it hasn't we need to wait for the acknowledgement. We support that using rdev->blocked wait and md_wait_for_blocked_rdev by introducing a new device flag 'BlockedBadBlock'. This flag is only advisory. It is cleared whenever we acknowledge a bad block, so that a waiter can re-check the particular bad blocks that it is interested it. It should be set by a caller when they find they need to wait. This (set after test) is inherently racy, but as md_wait_for_blocked_rdev already has a timeout, losing the race will have minimal impact. When we clear "Blocked" was also clear "BlockedBadBlocks" incase it was set incorrectly (see above race). We also modify the way we manage 'Blocked' to fit better with the new handling of 'BlockedBadBlocks' and to make it consistent between externally managed and internally managed metadata. This requires that each raidXd loop checks if the metadata needs to be written and triggers a write (md_check_recovery) if needed. Otherwise a queued write request might cause raidXd to wait for the metadata to write, and only that thread can write it. Before writing metadata, we set FaultRecorded for all devices that are Faulty, then after writing the metadata we clear Blocked for any device for which the Fault was certainly Recorded. The 'faulty' device flag now appears in sysfs if the device is faulty *or* it has unacknowledged bad blocks. So user-space which does not understand bad blocks can continue to function correctly. User space which does, should not assume a device is faulty until it sees the 'faulty' flag, and then sees the list of unacknowledged bad blocks is empty. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
As no personality understand bad block lists yet, we must reject any device that is known to contain bad blocks. As the personalities get taught, these tests can be removed. This only applies to raid1/raid5/raid10. For linear/raid0/multipath/faulty the whole concept of bad blocks doesn't mean anything so there is no point adding the checks. Signed-off-by: NNeilBrown <neilb@suse.de> Reviewed-by: NNamhyung Kim <namhyung@gmail.com>
-
- 27 7月, 2011 4 次提交
-
-
由 Namhyung Kim 提交于
Read errors are considered to corrected if write-back and re-read cycle is finished without further problems. Thus moving the rdev-> corrected_errors counting after the re-reading looks more reasonable IMHO. Signed-off-by: NNamhyung Kim <namhyung@gmail.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Normally we would fail a device with a READ error. However if doing so causes the array to fail, it is better to leave the device in place and just return the read error to the caller. The current test for decide if the array will fail is overly simplistic. We have a function 'enough' which can tell if the array is failed or not, so use it to guide the decision. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
When we get a read error during recovery, RAID10 previously arranged for the recovering device to appear to fail so that the recovery stops and doesn't restart. This is misleading and wrong. Instead, make use of the new recovery_disabled handling and mark the target device and having recovery disabled. Add appropriate checks in add_disk and remove_disk so that devices are removed and not re-added when recovery is disabled. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Christian Dietrich 提交于
As per printk_ratelimit comment, it should not be used. Signed-off-by: NChristian Dietrich <christian.dietrich@informatik.uni-erlangen.de> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 18 7月, 2011 3 次提交
-
-
由 Namhyung Kim 提交于
When performing a recovery, only first 2 slots in r10_bio are in use, for read and write respectively. However all of pages in the write bio are never used and just replaced to read bio's when the read completes. Get rid of those unused pages and share read pages properly. Signed-off-by: NNamhyung Kim <namhyung@gmail.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Namhyung Kim 提交于
When normal-write and sync-read/write bio completes, we should find out the disk number the bio belongs to. Factor those common code out to a separate function. Signed-off-by: NNamhyung Kim <namhyung@gmail.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Namhyung Kim 提交于
Variable 'first' is initialized to zero and updated to @rdev->raid_disk only if it is greater than 0. Thus condition '>= first' always implies '>= 0' so the latter is not needed. Signed-off-by: NNamhyung Kim <namhyung@gmail.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 11 5月, 2011 5 次提交
-
-
由 NeilBrown 提交于
When a loop ends with an 'if' with a large body, it is neater to make the if 'continue' on the inverse condition, and then the body is indented less. Apply this pattern 3 times, and wrap some other long lines. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
This variable 'disk' is never used - how odd. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Now that we have a 'slot' variable, make better use of it to simplify some code a little. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Currently the rdev on which a read error happened could be removed before we perform the fix_error handling. This requires extra tests for NULL. So delay the rdev_dec_pending call until after the call to fix_read_error so that we can be sure that the rdev still exists. This allows an 'if' clause to be removed so the body gets re-indented back one level. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
raid10 read balance has two different loop for looking through possible devices to chose the best. Collapse those into one loop and generally make the code more readable. Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 18 4月, 2011 2 次提交
-
-
由 NeilBrown 提交于
We just need to make sure that an unplug event wakes up the md thread, which is exactly what mddev_check_plugged does. Also remove some plug-related code that is no longer needed. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
md/raid submits a lot of IO from the various raid threads. So adding start/finish plug calls to those so that some plugging happens. Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 31 3月, 2011 1 次提交
-
-
由 Lucas De Marchi 提交于
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: NLucas De Marchi <lucas.demarchi@profusion.mobi>
-
- 17 3月, 2011 1 次提交
-
-
由 Martin K. Petersen 提交于
MD and DM create a new bio_set for every metadevice. Each bio_set has an integrity mempool attached regardless of whether the metadevice is capable of passing integrity metadata. This is a waste of memory. Instead we defer the allocation decision to MD and DM since we know at metadevice creation time whether integrity passthrough is needed or not. Automatic integrity mempool allocation can then be removed from bioset_create() and we make an explicit integrity allocation for the fs_bio_set. Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Reported-by: NZdenek Kabelac <zkabelac@redhat.com> Acked-by: NMike Snitzer <snizer@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 10 3月, 2011 1 次提交
-
-
由 Jens Axboe 提交于
Code has been converted over to the new explicit on-stack plugging, and delay users have been converted to use the new API for that. So lets kill off the old plugging along with aops->sync_page(). Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 21 2月, 2011 1 次提交
-
-
由 NeilBrown 提交于
blk_throtl_exit assumes that ->queue_lock still exists, so make sure that it does. To do this, we stop redirecting ->queue_lock to conf->device_lock and leave it pointing where it is initialised - __queue_lock. As the blk_plug functions check the ->queue_lock is held, we now take that spin_lock explicitly around the plug functions. We don't need the locking, just the warning removal. This is needed for any kernel with the blk_throtl code, which is which is 2.6.37 and later. Cc: stable@kernel.org Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 08 2月, 2011 1 次提交
-
-
由 Krzysztof Wojcik 提交于
Following symptoms were observed: 1. After raid0->raid10 takeover operation we have array with 2 missing disks. When we add disk for rebuild, recovery process starts as expected but it does not finish- it stops at about 90%, md126_resync process hangs in "D" state. 2. Similar behavior is when we have mounted raid0 array and we execute takeover to raid10. After this when we try to unmount array- it causes process umount hangs in "D" In scenarios above processes hang at the same function- wait_barrier in raid10.c. Process waits in macro "wait_event_lock_irq" until the "!conf->barrier" condition will be true. In scenarios above it never happens. Reason was that at the end of level_store, after calling pers->run, we call mddev_resume. This calls pers->quiesce(mddev, 0) with RAID10, that calls lower_barrier. However raise_barrier hadn't been called on that 'conf' yet, so conf->barrier becomes negative, which is bad. This patch introduces setting conf->barrier=1 after takeover operation. It prevents to become barrier negative after call lower_barrier(). Signed-off-by: NKrzysztof Wojcik <krzysztof.wojcik@intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 14 1月, 2011 2 次提交
-
-
由 Jonathan Brassow 提交于
Add new parameter to 'sync_page_io'. The new parameter allows us to distinguish between metadata and data operations. This becomes important later when we add the ability to use separate devices for data and metadata. Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
-
由 Joe Perches 提交于
Noticed-by: NRussell King <linux@arm.linux.org.uk> Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 09 12月, 2010 1 次提交
-
-
由 NeilBrown 提交于
When we fail to start a raid10 for some reason, we call md_unregister_thread to kill the thread that was created. Unfortunately md_thread() will then make one call into the handler (raid10d) even though md_wakeup_thread has not been called. This is not safe and as md_unregister_thread is called after mddev->private has been set to NULL, it will definitely cause a NULL dereference. So fix this at both ends: - md_thread should only call the handler if THREAD_WAKEUP has been set. - raid10 should call md_unregister_thread before setting things to NULL just like all the other raid modules do. This is applicable to 2.6.35 and later. Cc: stable@kernel.org Reported-by: N"Citizen" <citizen_lee@thecus.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 28 10月, 2010 5 次提交
-
-
由 NeilBrown 提交于
bio_clone and bio_alloc allocate from a common bio pool. If an md device is stacked with other devices that use this pool, or under something like swap which uses the pool, then the multiple calls on the pool can cause deadlocks. So allocate a local bio pool for each md array and use that rather than the common pool. This pool is used both for regular IO and metadata updates. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Currently sync_page_io takes a 'bdev'. Every caller passes 'rdev->bdev'. We will soon want another field out of the rdev in sync_page_io, So just pass the rdev instead of the bdev out of it. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
bio_alloc can never fail (as it uses a mempool) but an block indefinitely, especially if the caller is holding a reference to a previously allocated bio. So these to places which both handle failure and hold multiple bios should not use bio_alloc, they should use bio_kmalloc. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
It is not safe to allocate from a mempool while holding an item previously allocated from that mempool as that can deadlock when the mempool is close to exhaustion. So don't use a bio list to collect the bios to write to multiple devices in raid1 and raid10. Instead queue each bio as it becomes available so an unplug will activate all previously allocated bios and so a new bio has a chance of being allocated. This means we must set the 'remaining' count to '1' before submitting any requests, then when all are submitted, decrement 'remaining' and possible handle the write completion at that point. Reported-by: NTorsten Kaiser <just.for.lkml@googlemail.com> Tested-by: NTorsten Kaiser <just.for.lkml@googlemail.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
bitmap_get_counter returns the number of sectors covered by the counter in a pass-by-reference variable. In some cases this can be very large, so make it a sector_t for safety. Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 10 9月, 2010 1 次提交
-
-
由 Tejun Heo 提交于
This patch converts md to support REQ_FLUSH/FUA instead of now deprecated REQ_HARDBARRIER. In the core part (md.c), the following changes are notable. * Unlike REQ_HARDBARRIER, REQ_FLUSH/FUA don't interfere with processing of other requests and thus there is no reason to mark the queue congested while FLUSH/FUA is in progress. * REQ_FLUSH/FUA failures are final and its users don't need retry logic. Retry logic is removed. * Preflush needs to be issued to all member devices but FUA writes can be handled the same way as other writes - their processing can be deferred to request_queue of member devices. md_barrier_request() is renamed to md_flush_request() and simplified accordingly. For linear, raid0 and multipath, the core changes are enough. raid1, 5 and 10 need the following conversions. * raid1: Handling of FLUSH/FUA bio's can simply be deferred to request_queues of member devices. Barrier related logic removed. * raid5: Queue draining logic dropped. FUA bit is propagated through biodrain and stripe resconstruction such that all the updated parts of the stripe are written out with FUA writes if any of the dirtying writes was FUA. preread_active_stripes handling in make_request() is updated as suggested by Neil Brown. * raid10: FUA bit needs to be propagated to write clones. linear, raid0, 1, 5 and 10 tested. Signed-off-by: NTejun Heo <tj@kernel.org> Reviewed-by: NNeil Brown <neilb@suse.de> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 18 8月, 2010 3 次提交
-
-
由 NeilBrown 提交于
commit 7b6d91da changed the behaviour of a few variables in raid1 and raid10 from flags to bit-sets, but left them as type 'bool' so they did not work. Change them (back) to unsigned long. (historical note: see 1ef04fef) Signed-off-by: NNeilBrown <neilb@suse.de> Reported-by: Jiri Slaby <jslaby@suse.cz> and many others
-
由 NeilBrown 提交于
md_check_recovery expects ->spare_active to return 'true' if any spares were activated, but none of them do, so the consequent change in 'degraded' is not notified through sysfs. So count the number of spares activated, subtract it from 'degraded' just once, and return it. Reported-by: NAdrian Drzewiecki <adriand@vmware.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Adrian Drzewiecki 提交于
When RAID1 is done syncing disks, it'll update the state of synced rdevs to In_sync. But it neglected to notify sysfs that the attribute changed. So any programs that are waiting for an rdev's state to change will not be woken. (raid5/raid10 added by neilb) Signed-off-by: NAdrian Drzewiecki <adriand@vmware.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 08 8月, 2010 1 次提交
-
-
由 Christoph Hellwig 提交于
Remove the current bio flags and reuse the request flags for the bio, too. This allows to more easily trace the type of I/O from the filesystem down to the block driver. There were two flags in the bio that were missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've renamed two request flags that had a superflous RW in them. Note that the flags are in bio.h despite having the REQ_ name - as blkdev.h includes bio.h that is the only way to go for now. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 07 8月, 2010 1 次提交
-
-
由 NeilBrown 提交于
If the 'bio_split' path in raid10-read is used while resync/recovery is happening it is possible to deadlock. Fix this be elevating ->nr_waiting for the duration of both parts of the split request. This fixes a bug that has been present since 2.6.22 but has only started manifesting recently for unknown reasons. It is suitable for and -stable since then. Reported-by: NJustin Bronder <jsbronder@gentoo.org> Tested-by: NJustin Bronder <jsbronder@gentoo.org> Signed-off-by: NNeilBrown <neilb@suse.de> Cc: stable@kernel.org
-
- 24 6月, 2010 3 次提交
-
-
由 Maciej Trela 提交于
Use mddev->new_layout in setup_conf. Also use new_chunk, and don't set ->degraded in takeover(). That gets set in run() Signed-off-by: NMaciej Trela <maciej.trela@intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Most array level changes leave the list of devices largely unchanged, possibly causing one at the end to become redundant. However conversions between RAID0 and RAID10 need to renumber all devices (except 0). This renumbering is currently being done in the ->run method when the new personality takes over. However this is too late as the common code in md.c might already have invalidated some of the devices if they had a ->raid_disk number that appeared to high. Moving it into the ->takeover method is too early as the array is still active at that time and wrong ->raid_disk numbers could cause confusion. So add a ->new_raid_disk field to mdk_rdev_s and use it to communicate the new raid_disk number. Now the common code knows exactly which devices need to be renumbered, and which can be invalidated, and can do it all at a convenient time when the array is suspend. It can also update some symlinks in sysfs which previously were not be updated correctly. Reported-by: NMaciej Trela <maciej.trela@intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Prasanna S. Panchamukhi 提交于
Such NULL pointer dereference can occur when the driver was fixing the read errors/bad blocks and the disk was physically removed causing a system crash. This patch check if the rcu_dereference() returns valid rdev before accessing it in fix_read_error(). Cc: stable@kernel.org Signed-off-by: NPrasanna S. Panchamukhi <prasanna.panchamukhi@riverbed.com> Signed-off-by: NRob Becker <rbecker@riverbed.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-