- 03 7月, 2012 4 次提交
-
-
由 NeilBrown 提交于
commit 43220aa0 md/raid5: fix a hang on device failure. fixed a hang, but introduced a refcounting in-balance so that if the presence of bad-blocks ever caused an rdev to be 'blocked' we would increment the refcount on the rdev and never decrement it. So added the needed rdev_dec_pending when md_wait_for_blocked_rdev is not called. Reported-by: Nmajianpeng <majianpeng@gmail.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 majianpeng 提交于
In ops_run_io(), the call to md_wait_for_blocked_rdev will decrement nr_pending so we lose the reference we hold on the rdev. So atomic_inc it first to maintain the reference. This bug was introduced by commit 73e92e51 md/raid5. Don't write to known bad block on doubtful devices. which appeared in 3.0, so patch is suitable for stable kernels since then. Cc: stable@vger.kernel.org Signed-off-by: Nmajianpeng <majianpeng@gmail.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 majianpeng 提交于
In chunk_aligned_read() we are adding data_offset before calling is_badblock. But is_badblock also adds data_offset, so that is bad. So move the addition of data_offset to after the call to is_badblock. This bug was introduced by commit 31c176ec md/raid5: avoid reading from known bad blocks. which first appeared in 3.0. So that patch is suitable for any -stable kernel from 3.0.y onwards. However it will need minor revision for most of those (as the comment didn't appear until recently). Cc: stable@vger.kernel.org Signed-off-by: Nmajianpeng <majianpeng@gmail.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
If a RAID5 has both a failed device and a device marked as 'WantReplacement', then we should preferentially replace the failed device. However the current code replaces whichever is found first. So split into 2 loops, check fail failed/missing first, and only check for WantReplacement if nothing is failed or missing. Reported-by: Nmajianpeng <majianpeng@gmail.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 22 5月, 2012 5 次提交
-
-
由 NeilBrown 提交于
After a reshape which reduced the number of devices we need to disconnect the extra devices. The code for this doesn't currently handle 'replacement' devices. It is very unlikely that such devices will be present, but it is safest to handle them anyway. So simplify the handling. Just clear In_sync and leave it to remove_and_add_spaces (which will be called soon) to do the real works. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
We always should have allowed this. A raid5 reshape doesn't change the size of the bitmap, so not need to restrict it. Also add a test to make sure we don't try to start a reshape on a failed array. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Now that bitmaps can be resized, we can allow an array to be resized while the bitmap is present. This only covers resizing that involves changing the effective size of member devices, not resizing that changes the number of devices. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Shaohua Li 提交于
REQ_SYNC is ignored in current raid5 code. Block layer does use it to do policy, for example ioscheduler. This patch adds it. Signed-off-by: NShaohua Li <shli@fusionio.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Shaohua Li 提交于
The two variables are useless. Signed-off-by: NShaohua Li <shli@fusionio.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 21 5月, 2012 4 次提交
-
-
由 NeilBrown 提交于
The important issue here is incorporating the different in data_offset into calculations concerning when we might need to over-write data that is still thought to be valid. To this end we find the minimum offset difference across all devices and add that where appropriate. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
As there can now be two different data_offsets - an 'old' and a 'new' - we need to carefully choose between them. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
When reshaping we can avoid costly intermediate backup by changing the 'start' address of the array on the device (if there is enough room). So as a first step, allow such a change to be requested through sysfs, and recorded in v1.x metadata. (As we didn't previous check that all 'pad' fields were zero, we need a new FEATURE flag for this. A (belatedly) check that all remaining 'pad' fields are zero to avoid a repeat of this) The new data offset must be requested separately for each device. This allows each to have a different change in the data offset. This is not likely to be used often but as data_offset can be set per-device, new_data_offset should be too. This patch also removes the 'acknowledged' arg to rdev_set_badblocks as it is never used and never will be. At the same time we add a new arg ('in_new') which is currently always zero but will be used more soon. When a reshape finishes we will need to update the data_offset and rdev->sectors. So provide an exported function to do that. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Currently a reshape operation always progresses from the start of the array to the end unless the number of devices is being reduced, in which case it progressed in the opposite direction. To reverse a partial reshape which changes the number of devices you can stop the array and re-assemble with the raid-disks numbers reversed and it will undo. However for a reshape that does not change the number of devices it is not possible to reverse the reshape in the middle - you have to wait until it completes. So add a 'reshape_direction' attribute with is either 'forwards' or 'backwards' and can be explicitly set when delta_disks is zero. This will become more important when we allow the data_offset to change in a reshape. Then the explicit statement of what direction is being used will be more useful. This can be enabled in raid5 trivially as it already supports reverse reshape and just needs to use a different trigger to request it. Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 03 4月, 2012 2 次提交
-
-
由 majianpeng 提交于
When create a raid5 using assume-clean and echo check or repair to sync_action.Then component disks did not operated IO but the raid check/resync faster than normal. Because the judgement in function analyse_stripe(): if (do_recovery || sh->sector >= conf->mddev->recovery_cp) s->syncing = 1; else s->replacing = 1; When check or repair,the recovery_cp == MaxSectore,so syncing equal zero not one. This bug was introduced by commit 9a3e1101 md/raid5: detect and handle replacements during recovery. so this patch is suitable for 3.3-stable. Cc: stable@vger.kernel.org Signed-off-by: Nmajianpeng <majianpeng@gmail.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
1/ We can only treat a known-bad-block like a read-error if we have the data that belongs in that block. So fix that test. 2/ If we cannot recovery a stripe due to insufficient data, don't tell "md_done_sync" that the sync failed unless we really did fail something. If we successfully record bad blocks, that is success. Reported-by: N"majianpeng" <majianpeng@gmail.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 19 3月, 2012 2 次提交
-
-
由 NeilBrown 提交于
md.h has an 'rdev_for_each()' macro for iterating the rdevs in an mddev. However it uses the 'safe' version of list_for_each_entry, and so requires the extra variable, but doesn't include 'safe' in the name, which is useful documentation. Consequently some places use this safe version without needing it, and many use an explicity list_for_each entry. So: - rename rdev_for_each to rdev_for_each_safe - create a new rdev_for_each which uses the plain list_for_each_entry, - use the 'safe' version only where needed, and convert all other list_for_each_entry calls to use rdev_for_each. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
When an array is failed (some data inaccessible) then there is no point attempting to add a spare as it could not possibly be recovered. However that may be value in re-adding a recently removed device. e.g. if there is a write-intent-bitmap and it is clear, then access to the data could be restored by this action. So don't reject a re-add to a failed array for RAID10 and RAID5 (the only arrays types that check for a failed array). Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 13 3月, 2012 3 次提交
-
-
由 majianpeng 提交于
Signed-off-by: Nmajianpeng <majianpeng@gmail.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
commit 908f4fbd removed the last user of this variable, so we should discard it completely. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Leaving a valid reshape_position value in place could be confusing. Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 23 12月, 2011 13 次提交
-
-
由 NeilBrown 提交于
Now that WantReplacement drives are replaced cleanly, mark a drive as WantReplacement when we see a write error. It might get failed soon so the WantReplacement flag is irrelevant, but if the write error is recorded in the bad block log, we still want to activate any spare that might be available. Reviewed-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
When attempting to add a spare to a RAID[456] array, also consider adding it as a replacement for a want_replacement device. This requires that common md code attempt hot_add even when the array is not formally degraded. Reviewed-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
If a Replacement is seen, file it as such. If we see two replacements (or two normal devices) for the one slot, abort. Reviewed-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
When recovery completes - as reported by a call to ->spare_active, we clear In_sync on the original and set it on the replacement. Then when the original gets removed we move the replacement from 'replacement' to 'rdev'. This could race with other code that is looking at these pointers, so we use memory barriers and careful ordering to ensure that a reader might see one device twice, but never no devices. Then the readers guard against using both devices, which could only happen when writing. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
During recovery we want to write to the replacement but not the original. So we have two new flags - R5_NeedReplace if this stripe has a replacement that needs to be written at some stage - R5_WantReplace if NeedReplace, and the data is available, and a 'sync' has been requested on this stripe. We also distinguish between 'sync and replace' which need to read all other devices, and 'replace' which only needs to read the devices being replaced. Note that during resync we always write to any replacement device. It might not need to be written to, but as we don't read to compare, we have to write to be sure. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
When writing, we need to submit two writes, one to the original, and one to the replacement - if there is a replacement. If the write to the replacement results in a write error, we just fail the device. We only try to record write errors to the original. When writing for recovery, we shouldn't write to the original. This will be addressed in a subsequent patch that generally addresses recovery. Reviewed-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Enhance raid5_remove_disk to be able to remove ->replacement as well as ->rdev. Reviewed-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
If a replacement device is present and has been recovered far enough, then use it for reading into the stripe cache. If we get an error we don't try to repair it, we just fail the device. A replacement device that gives errors does not sound sensible. This requires removing the setting of R5_ReadError when we get a read error during a read that bypasses the cache. It was probably a bad idea anyway as we don't know that every block in the read caused an error, and it could cause ReadError to be set for the replacement device, which is bad. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
We current initialise some fields of a bio when preparing a stripe_head, and again just before submitting the request. Remove the duplication by only setting the fields that lower level devices don't touch in raid5_build_block, and only set the changeable fields in ops_run_io. Reviewed-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Just enhance data structures to record a second device per slot to be used as a 'replacement' device, replacing the original. We also have a second bio in each slot in each stripe_head. This will only be used when writing to the array - we need to write to both the original and the replacement at the same time, so will need two bios. For now, only try using the replacement drive for aligned-reads. In this case, we prefer the replacement if it has been recovered far enough, otherwise use the original. This includes a small enhancement. Previously we would only do aligned reads if the target device was fully recovered. Now we also do them if it has recovered far enough. Reviewed-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Soon an array will be able to have multiple devices with the same raid_disk number (an original and a replacement). So removing a device based on the number won't work. So pass the actual device handle instead. Reviewed-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
When an array is being reshaped to change the number of devices, the two halves can be differently degraded. e.g. one could be missing a device and the other not. So we need to be more careful about calculating the 'degraded' attribute. Instead of just inc/dec at appropriate times, perform a full re-calculation examining both possible cases. This doesn't happen often so it not a big cost, and we already have most of the code to do it. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
While reshaping a degraded array (as when reshaping a RAID0 by first converting it to a degraded RAID4) we currently get confused about which devices are in_sync. In most cases we get it right, but in the region that is being reshaped we need to treat non-failed devices as in-sync when we have the data but haven't actually written it out yet. Reported-by: NAdam Kwolek <adam.kwolek@intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 09 12月, 2011 1 次提交
-
-
由 Adam Kwolek 提交于
NULL pointer access causes crash in raid5 module. Signed-off-by: NAdam Kwolek <adam.kwolek@intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 08 12月, 2011 1 次提交
-
-
由 NeilBrown 提交于
Once a device is failed we really want to completely ignore it. It should go away soon anyway. In particular the presence of bad blocks on it should not cause us to block as we won't be trying to write there anyway. So as soon as we can check if a device is Faulty, do so and pretend that it is already gone if it is Faulty. Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 08 11月, 2011 2 次提交
-
-
由 Dan Williams 提交于
All updates that occur under STRIPE_ACTIVE should be globally visible when STRIPE_ACTIVE clears. test_and_set_bit() implies a barrier, but clear_bit() does not. This is suitable for 3.1-stable. Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NNeilBrown <neilb@suse.de> Cc: stable@kernel.org
-
由 NeilBrown 提交于
When the number of failed devices exceeds the allowed number we must abort any active parity operations (checks or updates) as they are no longer meaningful, and can lead to a BUG_ON in handle_parity_checks6. This bug was introduce by commit 6c0069c0 in 2.6.29. Reported-by: NManish Katiyar <mkatiyar@gmail.com> Tested-by: NManish Katiyar <mkatiyar@gmail.com> Acked-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NNeilBrown <neilb@suse.de> Cc: stable@kernel.org
-
- 01 11月, 2011 1 次提交
-
-
由 Paul Gortmaker 提交于
A pending cleanup will mean that module.h won't be implicitly everywhere anymore. Make sure the modular drivers in md dir are actually calling out for <module.h> explicitly in advance. Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
-
- 26 10月, 2011 2 次提交
-
-
由 NeilBrown 提交于
In 3.0 we changed the way recovery_disabled was handle so that instead of testing against zero, we test an mddev-> value against a conf-> value. Two problems: 1/ one place in raid1 was missed and still sets to '1'. 2/ We didn't explicitly set the conf-> value at array creation time. It defaulted to '0' just like the mddev value does so they could appear equal and thus disable recovery. This did not affect normal 'md' as it calls bind_rdev_to_array which changes the mddev value. However the dmraid interface doesn't call this and so doesn't change ->recovery_disabled; so at array start all recovery is incorrectly disabled. So initialise the 'conf' value to one less that the mddev value, so the will only be the same when explicitly set that way. Reported-by: NJonathan Brassow <jbrassow@redhat.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
This bug was introduced in 415e72d0 which was in 2.6.36. There is a small window of time between when a device fails and when it is removed from the array. During this time we might still read from it, but we won't write to it - so it is possible that we could read stale data. We didn't need the test of 'Faulty' before because the test on In_sync is sufficient. Since we started allowing reads from the early part of non-In_sync devices we need a test on Faulty too. This is suitable for any kernel from 2.6.36 onwards, though the patch might need a bit of tweaking in 3.0 and earlier. Cc: stable@kernel.org Signed-off-by: NNeilBrown <neilb@suse.de>
-