- 12 Oct, 2015 5 commits
-
-
Committed by Goldwyn Rodrigues
In a clustered environment, a change such as marking a device faulty can be recorded by any of the nodes. The change is communicated to all the nodes, so re-recording it is unnecessary and quite often pretty disruptive. With this patch, just before the update, we check for the changes; if they are already in the superblock, we abort the update after clearing all the flags.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
-
Committed by Goldwyn Rodrigues
md_reload_sb is too simplistic and it explicitly needs to determine the changes made by the writing node. However, there are multiple areas where a simple reload could fail.

Instead, read the superblock of one of the "good" rdevs and update the necessary information:

- read the superblock into a newly allocated page, by temporarily swapping out rdev->sb_page and calling ->load_super.
- if that fails, return.
- if it succeeds, call check_sb_changes, which:
  1. iterates over the list of active devices and checks the matching dev_roles[] value.
     If that is 'faulty', the device must be marked as faulty:
     - call md_error to mark the device as faulty. Make sure not to set CHANGE_DEVS and wake up mddev->thread, or else it would initiate a resync process, which is the responsibility of the "primary" node.
     - clear the Blocked bit.
     - call remove_and_add_spares() to hot remove the device.
     If the device is 'spare':
     - call remove_and_add_spares() to get the number of spares added in this operation.
     - reduce mddev->degraded to mark the array as not degraded.
  2. resets recovery_cp.
- read the rest of the rdevs to update recovery_offset. If recovery_offset is equal to MaxSector, call spare_active() to set it In_sync.

This required that recovery_offset be initialized to MaxSector, as opposed to zero, so as to communicate the end of sync for an rdev.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
-
Committed by Goldwyn Rodrigues
remove_and_add_spares() checks all devices when activating spares. Change it to activate a specific device if a non-null rdev argument is passed. remove_and_add_spares() can be used to activate spares in slot_store() as well. For hot_remove_disk(), check if rdev->raid_disk == -1 before calling remove_and_add_spares().

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
-
Committed by Goldwyn Rodrigues
Suspending the entire device for resync could take too long. Resync in small chunks.

The cluster's resync window (32M) is maintained in r1conf as cluster_sync_low and cluster_sync_high and processed in raid1's sync_request(). If the current resync is outside the cluster resync window:

1. Set cluster_sync_low to curr_resync_completed.
2. Check if the sync will fit in the new window; if not, issue a wait_barrier() and set cluster_sync_low to sector_nr.
3. Set cluster_sync_high to cluster_sync_low + resync_window.
4. Send a message to all nodes so they may add it to their suspension list.

bitmap_cond_end_sync is modified to allow forcing a sync in order to bring curr_resync_completed up to date with the sector passed.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
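The four window steps above can be sketched in plain C. This is only an illustration, not the raid1 code: `struct resync_window`, `advance_window()` and the 512-byte sector unit are assumptions made for the example, and the barrier wait and cluster message are reduced to comments.

```c
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical stand-in for the two r1conf fields named in the message. */
struct resync_window {
    sector_t low;   /* plays the role of cluster_sync_low */
    sector_t high;  /* plays the role of cluster_sync_high */
};

/* 32M expressed in 512-byte sectors (the unit is an assumption). */
#define RESYNC_WINDOW_SECTORS ((sector_t)(32u * 1024u * 1024u / 512u))

/*
 * Move the window if [sector_nr, sector_nr + nr_sectors) is not inside it.
 * Returns 1 when the window moved (the caller would then notify the other
 * nodes), 0 when the request already fits.
 */
int advance_window(struct resync_window *w, sector_t completed,
                   sector_t sector_nr, sector_t nr_sectors)
{
    if (sector_nr >= w->low && sector_nr + nr_sectors <= w->high)
        return 0;

    w->low = completed;                                  /* step 1 */
    if (sector_nr + nr_sectors > w->low + RESYNC_WINDOW_SECTORS)
        w->low = sector_nr;                              /* step 2, after waiting on the barrier */
    w->high = w->low + RESYNC_WINDOW_SECTORS;            /* step 3 */
    return 1;                                            /* step 4 is left to the caller */
}
```

Keeping the window small bounds how much of the device the other nodes have to treat as suspended at any one time.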
-
Committed by Goldwyn Rodrigues
Add BITMAP_MAJOR_CLUSTERED as 5, in order to prevent older kernels from assembling a clustered device. In order to maximize compatibility, the major version is set to BITMAP_MAJOR_CLUSTERED *only* if the bitmap is clustered.

Added MD_FEATURE_CLUSTERED in order to return an error for older kernels which would assemble MD even if the bitmap is corrupted.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
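A minimal sketch of the version-gating idea follows. Only the value 5 for BITMAP_MAJOR_CLUSTERED comes from the message; `OLD_MAX_MAJOR` and both helper functions are hypothetical names used to model an older kernel that rejects any major version it does not know.

```c
#include <stdbool.h>

#define BITMAP_MAJOR_CLUSTERED 5  /* value stated in the commit message */
#define OLD_MAX_MAJOR 4           /* assumption: highest major an older kernel understands */

/* Pick the on-disk major: the new value is used only for clustered bitmaps. */
int choose_bitmap_major(bool clustered)
{
    return clustered ? BITMAP_MAJOR_CLUSTERED : OLD_MAX_MAJOR;
}

/* Model of an older kernel's check: unknown (newer) majors are refused. */
bool old_kernel_accepts(int major)
{
    return major <= OLD_MAX_MAJOR;
}
```

Under these assumptions a non-clustered bitmap stays readable by older kernels, while a clustered one is refused by them.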
-
- 02 Oct, 2015 2 commits
-
-
Committed by Shaohua Li
If the number of faulty disks in an array exceeds the allowed degraded number, the array enters error handling. It will be marked as read-only with MD_CHANGE_PENDING/RECOVERY_NEEDED set. But currently recovery doesn't clear the CHANGE_PENDING bit for a read-only array. If MD_CHANGE_PENDING is set for a raid5 array, all returned IO is held on a list until the bit is cleared. Since recovery never clears this bit, the IO stays pending and never finishes. This has bad effects: the upper layer can't get an IO error and the array can't be stopped.

Fixes: c3cce6cd ("md/raid5: ensure device failure recorded before write request returns.")
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by NeilBrown
If a superblock update is pending, wait for it to complete before letting md_set_readonly() switch to readonly. Otherwise we might lose important information about a device having failed.

For external arrays, waiting for superblock updates can wait on user-space, so in that case, just return an error.

Reported-and-tested-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
-
- 01 Sep, 2015 9 commits
-
-
Committed by NeilBrown
When a write to one of the legs of a RAID1 fails, the failure is recorded in the metadata of the other leg(s) so that after a restart the data on the failed drive won't be trusted even if that drive seems to be working again (maybe a cable was unplugged).

Similarly, when we record a bad-block in response to a write failure, we must not let the write complete until the bad-block update is safe.

Currently there is no interlock between the write request completing and the metadata update. So it is possible that the write will complete, the app will confirm success in some way, and then the machine will crash before the metadata update completes.

This is an extremely small hole for a race to fit in, but it is theoretically possible and so should be closed.

So:
- set MD_CHANGE_PENDING when requesting a metadata update for a failed device, so we can know with certainty when it completes
- queue requests that experienced an error on a new queue which is only processed after the metadata update completes
- call raid_end_bio_io() on bios in that queue when the time comes.

Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by NeilBrown
This code looks racy. The only possible race is if two modules try to register at the same time, and that won't happen. But make the code look safe anyway.

Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by Guoqing Jiang
When node A stops an array while the array is doing a resync, we need to let another node B take over the resync task.

To achieve this, node A must send an explicit BITMAP_NEEDS_SYNC message to the cluster. Node B, on receiving that message, will invoke __recover_slot to do the resync.

Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by Sasha Levin
We used to set up the safemode_timer timer in md_run. If md_run failed before the timer was set up, we'd end up trying to modify a timer that doesn't have a callback function when we access safe_delay_store, which would trigger a BUG.

neilb: delete init_timer() call as setup_timer() does that.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by NeilBrown
There can be a small window between the moment that recovery actually writes the last block and the time when various sysfs and /proc/mdstat attributes report that it has finished. During this time, 'sync_completed' can have the wrong value. This can confuse monitoring software.

So:
- don't set curr_resync_completed beyond the end of the devices,
- set it correctly when resync/recovery has completed.

Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by NeilBrown
While it generally shouldn't happen, it is not impossible for curr_resync_completed to exceed resync_max. This can particularly happen when reshaping RAID5 - the current status isn't copied to curr_resync_completed promptly, so when it is, it can exceed resync_max. This happens when the reshape is 'frozen', resync_max is set low, and reshape is re-enabled.

Taking a difference between two unsigned numbers is always dangerous anyway, so add a test to behave correctly if curr_resync_completed > resync_max.

Signed-off-by: NeilBrown <neilb@suse.com>
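The hazard with unsigned differences can be shown with a small standalone sketch. The two helpers below are hypothetical and are not the md code; they only contrast a bare subtraction with one guarded by the kind of test the message describes.

```c
#include <stdint.h>

typedef uint64_t sector_t;

/* Bare subtraction: wraps to a huge value when completed > resync_max. */
sector_t remaining_naive(sector_t resync_max, sector_t completed)
{
    return resync_max - completed;
}

/* Guarded version: treat "already past the limit" as nothing remaining. */
sector_t remaining_guarded(sector_t resync_max, sector_t completed)
{
    if (completed > resync_max)
        return 0;
    return resync_max - completed;
}
```

With resync_max = 10 and completed = 20, the naive helper yields a value close to the maximum of the type, which any later comparison would misread as a very large amount of work left.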
-
Committed by NeilBrown
This ensures that 'sync_action' will show 'recover' as soon as the array is started. If there is no spare, the status will change to 'idle' once that is detected.

Clear MD_RECOVERY_RECOVER for a read-only array to ensure this change happens.

This allows scripts which monitor status not to get confused - particularly my test scripts.

Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by NeilBrown
When checking sync_action in a script, we want to be sure it is as accurate as possible. As resync/reshape etc. doesn't always start immediately (a separate thread is scheduled to do it), it is best if 'action_show' checks if MD_RECOVERY_NEEDED is set (which it does) and in that case reports what is likely to start soon (which it only sometimes does).

So:
- report 'reshape' if reshape_position suggests one might start.
- set MD_RECOVERY_RECOVER in raid1_reshape(), because that is very likely to happen next.

Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by NeilBrown
Currently when a recovery completes, mdstat shows that it has finished before the new device is marked as a full member. Because of this it can appear to a script that the recovery finished but the array isn't in sync.

So while MD_RECOVERY_DONE is still set, keep mdstat reporting "recovery". Once md_reap_sync_thread() completes, the spare will be active and then MD_RECOVERY_DONE will be cleared.

To ensure this is race-free, set MD_RECOVERY_DONE before clearing curr_resync.

Signed-off-by: NeilBrown <neilb@suse.com>
-
- 14 Aug, 2015 2 commits
-
-
Committed by Kent Overstreet
As generic_make_request() is now able to handle arbitrarily sized bios, it's no longer necessary for each individual block driver to define its own ->merge_bvec_fn() callback. Remove every invocation completely.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: drbd-user@lists.linbit.com
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Yehuda Sadeh <yehuda@inktank.com>
Cc: Sage Weil <sage@inktank.com>
Cc: Alex Elder <elder@kernel.org>
Cc: ceph-devel@vger.kernel.org
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: Neil Brown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: NeilBrown <neilb@suse.de> (for the 'md' bits)
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
[dpark: also remove ->merge_bvec_fn() in dm-thin as well as dm-era-target, and resolve merge conflicts]
Signed-off-by: Dongsu Park <dpark@posteo.net>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Kent Overstreet
The way the block layer is currently written, it goes to great lengths to avoid having to split bios; upper layer code (such as bio_add_page()) checks what the underlying device can handle and tries to always create bios that don't need to be split.

But this approach becomes unwieldy and eventually breaks down with stacked devices and devices with dynamic limits, and it adds a lot of complexity. If the block layer could split bios as needed, we could eliminate a lot of complexity elsewhere - particularly in stacked drivers. Code that creates bios can then create whatever size bios are convenient, and more importantly stacked drivers don't have to deal with both their own bio size limitations and the limitations of the (potentially multiple) devices underneath them. In the future this will let us delete merge_bvec_fn and a bunch of other code.

We do this by adding calls to blk_queue_split() to the various make_request functions that need it - a few can already handle arbitrary size bios. Note that we add the call _after_ any call to blk_queue_bounce(); this means that blk_queue_split() and blk_recalc_rq_segments() don't need to be concerned with bouncing affecting segment merging.

Some make_request_fn() callbacks were simple enough to audit and verify they don't need blk_queue_split() calls. The skipped ones are:

* nfhd_make_request (arch/m68k/emu/nfblock.c)
* axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
* simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
* brd_make_request (ramdisk - drivers/block/brd.c)
* mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
* loop_make_request
* null_queue_bio
* bcache's make_request fns

Some others are almost certainly safe to remove now, but will be left for future patches.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: drbd-user@lists.linbit.com
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Jim Paris <jim@jtan.com>
Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Acked-by: NeilBrown <neilb@suse.de> (for the 'md/md.c' bits)
Acked-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
[dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
Signed-off-by: Dongsu Park <dpark@posteo.net>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 03 Aug, 2015 2 commits
-
-
Committed by Benjamin Randazzo
There is no point assigning '\0' to file->pathname[0] as file is now zeroed out, so remove that branch and simplify the code.

[Original patch combined this with the change to use kzalloc. I split the two so that the change to kzalloc is easier to backport. - neilb]

Signed-off-by: Benjamin Randazzo <benjamin@randazzo.fr>
Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by Benjamin Randazzo
In drivers/md/md.c, get_bitmap_file() uses kmalloc() to create a mdu_bitmap_file_t called "file".

    file = kmalloc(sizeof(*file), GFP_NOIO);
    if (!file)
            return -ENOMEM;

This structure is copied to user space at the end of the function.

    if (err == 0 &&
        copy_to_user(arg, file, sizeof(*file)))
            err = -EFAULT

But if the bitmap is disabled, only the first byte of "file" is initialized with zero, so it's possible to read some bytes (up to 4095) of kernel space memory from user space. This is an information leak.

    /* bitmap disabled, zero the first byte and copy out */
    if (!mddev->bitmap_info.file)
            file->pathname[0] = '\0';

Signed-off-by: Benjamin Randazzo <benjamin@randazzo.fr>
Signed-off-by: NeilBrown <neilb@suse.com>
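The leak can be modelled in user space without touching uninitialized memory by pre-filling a buffer with a marker byte that stands in for stale heap content. Everything below is a hypothetical sketch: `struct bitmap_file_reply` only imitates mdu_bitmap_file_t, and the 4096-byte size is an assumption chosen to match the "up to 4095" bytes mentioned in the message.

```c
#include <stddef.h>
#include <string.h>

#define PATHNAME_LEN 4096  /* assumed size, consistent with "up to 4095" leaked bytes */

/* Hypothetical stand-in for the structure copied out to user space. */
struct bitmap_file_reply {
    char pathname[PATHNAME_LEN];
};

/* Count bytes that still hold whatever was in the buffer beforehand. */
size_t stale_bytes(const struct bitmap_file_reply *r)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof(r->pathname); i++)
        if (r->pathname[i] != 0)
            n++;
    return n;
}

/* Pre-fix behaviour: only the first byte is cleared. */
void fill_first_byte_only(struct bitmap_file_reply *r)
{
    r->pathname[0] = '\0';
}

/* Post-fix behaviour, modelled on a zeroing allocation. */
void fill_zeroed(struct bitmap_file_reply *r)
{
    memset(r, 0, sizeof(*r));
}
```

Copying the whole structure after `fill_first_byte_only()` would hand 4095 stale bytes to the reader, which is why a zeroing allocation closes the hole.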
-
- 29 Jul, 2015 1 commit
-
-
Committed by Christoph Hellwig
Currently we have two different ways to signal an I/O error on a BIO:

 (1) by clearing the BIO_UPTODATE flag
 (2) by returning a Linux errno value to the bi_end_io callback

The first one has the drawback of only communicating a single possible error (-EIO), and the second one has the drawback of not being persistent when bios are queued up, and of not being passed along from child to parent bio in the ever more popular chaining scenario. Having both mechanisms available has the additional drawback of utterly confusing driver authors and introducing bugs where various I/O submitters only deal with one of them, and the others have to add boilerplate code to deal with both kinds of error returns.

So add a new bi_error field to store an errno value directly in struct bio and remove the existing mechanisms to clean all this up.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
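Why a per-bio error field helps in the chaining case can be sketched with a toy structure. `struct mini_bio` and `mini_bio_complete()` are hypothetical and far simpler than struct bio; the "first error wins" policy is an assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical miniature of a bio: just an error slot and a parent link. */
struct mini_bio {
    int error;                /* plays the role of bi_error: 0 or a negative errno */
    struct mini_bio *parent;  /* NULL when the bio is not chained */
};

/*
 * Completion of a chained child: the error stored in the child is copied
 * to the parent unless the parent already recorded one.
 */
void mini_bio_complete(struct mini_bio *bio)
{
    struct mini_bio *parent = bio->parent;

    if (parent && bio->error && !parent->error)
        parent->error = bio->error;
}
```

Because the value lives in the object itself, it survives queuing and can carry any errno, which neither of the two older mechanisms could do on its own.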
-
- 24 Jul, 2015 1 commit
-
-
Committed by Goldwyn Rodrigues
request_module() can return 256 (process exited) in some cases, which is not as specified in the documentation before the request_module() definition. Convert the error to -ENOENT.

The positive error number results in bitmap_create() returning a value that is meant to be an error but doesn't look like one, so it is dereferenced as a pointer and causes a crash.

(not needed for stable as this is "experimental" code)

Fixes: edb39c9d ("Introduce md_cluster_operations to handle cluster functions")
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
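A user-space sketch of why a positive return value is dangerous when errors are encoded in pointers. `looks_like_error()` imitates the kernel's convention that only the top 4095 values of the address space count as errors; both function names here are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Imitation of an error-pointer check: only values in [-MAX_ERRNO, -1] qualify. */
int looks_like_error(intptr_t value)
{
    return (uintptr_t)value >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical normalisation: any positive result becomes -ENOENT. */
int normalize_module_result(int ret)
{
    return ret > 0 ? -ENOENT : ret;
}
```

A raw 256 passes through the check as if it were a valid pointer, whereas the normalised value is recognised as an error.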
-
- 22 Jul, 2015 1 commit
-
-
Committed by NeilBrown
The 'event_work' worker used by dm-raid may still be running when the array is stopped. This can result in an oops.

So flush the workqueue on which it is run after detaching and before destroying the device.

Reported-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Cc: stable@vger.kernel.org (2.6.38+ please delay 2 weeks after -final release)
Fixes: 9d09e663 ("dm: raid456 basic support")
-
- 26 Jun, 2015 1 commit
-
-
Committed by Rasmus Villemoes
There's no point in starting over when we meet a '/'. This also eliminates a stack variable and a little .text.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 25 Jun, 2015 3 commits
-
-
Committed by Neil Brown
The Blocked flag indicates that a device has failed but that this fact hasn't been recorded in the metadata yet. Writes to such devices cannot be allowed until the metadata has been updated.

On a read-only array, the Blocked flag will never be cleared. This prevents the device being removed from the array.

If the metadata is being handled by the kernel (i.e. !mddev->external), then we can be sure that if the array is switched to writable, a metadata update will happen and will record the failure. So we don't need the flag set.

If metadata is externally managed, it is up to the external manager to clear the 'blocked' flag.

Reported-by: XiaoNi <xni@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
This error path returns while still holding the lock - bad.

Fixes: 6791875e ("md: make reconfig_mutex optional for writes to md sysfs files.")
Cc: stable@vger.kernel.org (v4.0+)
Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by NeilBrown
If ->private is set when ->run is called, it is assumed to be a 'config' prepared as part of 'reshape'.

So it is important when we free that config, that we also clear ->private. This is not often a problem as the mddev will normally be discarded shortly after the config is freed. However, if an 'assemble' races with a final close, the assemble can use the old mddev which has a stale ->private. This leads to any of various sorts of crashes.

So clear ->private after calling ->free().

Reported-by: Nate Clark <nate@neworld.us>
Cc: stable@vger.kernel.org (v4.0+)
Fixes: afa0f557 ("md: rename ->stop to ->free")
Signed-off-by: NeilBrown <neilb@suse.com>
-
- 24 Jun, 2015 1 commit
-
-
Committed by Miklos Szeredi
Turn

    d_path(&file->f_path, ...);

into

    file_path(file, ...);

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 17 Jun, 2015 2 commits
-
-
Committed by Firo Yang
Warning like this:

    drivers/md/md.c: In function "update_array_info":
    drivers/md/md.c:6394:26: warning: logical not is only applied
    to the left hand side of comparison [-Wlogical-not-parentheses]
        !mddev->persistent != info->not_persistent||

Fix it as Neil Brown said:

    mddev->persistent != !info->not_persistent ||

Signed-off-by: Firo Yang <firogm@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
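The two forms of the comparison can be put side by side in a standalone sketch. The function names are hypothetical, and the explicit parentheses in the first one only make the original grouping visible (and keep the compiler quiet); they do not change its meaning.

```c
#include <stdbool.h>

/* Original grouping: '!' applies to 'persistent' only, then the compare. */
bool mismatch_as_written(int persistent, int not_persistent)
{
    return (!persistent) != not_persistent;
}

/* Suggested form: negate 'not_persistent' so both sides are 0 or 1
 * whenever 'persistent' itself is 0 or 1. */
bool mismatch_suggested(int persistent, int not_persistent)
{
    return persistent != !not_persistent;
}
```

For inputs restricted to 0 and 1 the two agree. They would only diverge if `not_persistent` ever carried another non-zero value such as 2, which the suggested form treats as "true" and the original form compares numerically.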
-
Committed by Alexey Dobriyan
Convert away from deprecated simple_strto*() functions.

Add "fit into sector_t" checks.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
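A user-space analogue of strict parsing with a "fits into the target type" check, using the standard strtoull() rather than any kernel helper. `small_sector_t` and `parse_sectors()` are hypothetical; a 32-bit type is chosen only so that the range check is easy to exercise.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t small_sector_t;  /* hypothetical narrow sector type */

/*
 * Parse a decimal sector count. Rejects empty input, trailing garbage
 * (a single trailing newline is tolerated) and values that do not fit
 * into small_sector_t. Returns 0 on success or -EINVAL.
 */
int parse_sectors(const char *s, small_sector_t *out)
{
    char *end;
    unsigned long long v;

    errno = 0;
    v = strtoull(s, &end, 10);
    if (errno != 0 || end == s)
        return -EINVAL;
    if (*end == '\n')
        end++;
    if (*end != '\0')
        return -EINVAL;
    if (v != (small_sector_t)v)   /* the "fit into sector_t" style check */
        return -EINVAL;

    *out = (small_sector_t)v;
    return 0;
}
```

Unlike a lenient parser, this refuses input such as "12abc" instead of silently returning 12.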
-
- 12 Jun, 2015 3 commits
-
-
Committed by NeilBrown
MD_RECOVERY_DONE is normally cleared by md_check_recovery after a resync etc. has finished. However, it is possible for raid5_start_reshape to race and start a reshape before MD_RECOVERY_DONE is cleared. This can lead to multiple reshapes running at the same time, which isn't good.

So make sure it is cleared before starting a reshape, and also clear it when reaping a thread, just to be safe.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
Checking ->sync_thread without holding the mddev_lock() isn't really safe, even after flushing the workqueue which ensures md_start_sync() has been run.

While this code is waiting for the lock, md_check_recovery could reap the thread itself, and then start another thread (e.g. recovery might finish, then reshape starts). When this thread gets the lock, md_start_sync() hasn't run so it doesn't get reaped, but MD_RECOVERY_RUNNING gets cleared. This allows two threads to start, which leads to confusion.

So don't bother if MD_RECOVERY_RUNNING isn't set, but if it is, do the flush and the test and the reap all under the mddev_lock to avoid any race with md_check_recovery.

Signed-off-by: NeilBrown <neilb@suse.de>
Fixes: 6791875e ("md: make reconfig_mutex optional for writes to md sysfs files.")
Cc: stable@vger.kernel.org (v4.0+)
-
Committed by NeilBrown
Returning zero from a 'store' function is bad. The return value should be either the length of the string ('len') or an error.

So use 'len' if 'err' is zero.

Fixes: 6791875e ("md: make reconfig_mutex optional for writes to md sysfs files.")
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@vger.kernel (v4.0+)
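The return convention can be written as a one-line helper. This is a hypothetical standalone sketch (`store_result()` is not a kernel function), and it assumes that a zero return makes a typical writer believe nothing was consumed and try again.

```c
#include <stddef.h>
#include <sys/types.h>  /* ssize_t */

/*
 * Result of a 'store'-style handler: a negative error code if the
 * operation failed, otherwise the full number of bytes consumed.
 * It never returns 0 for a non-empty write.
 */
ssize_t store_result(int err, size_t len)
{
    return err ? (ssize_t)err : (ssize_t)len;
}
```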
-
- 28 May, 2015 1 commit
-
-
Committed by NeilBrown
A recent change removed the need for locking around writing to "sync_action" (and various other places), but introduced a subtle race. When e.g. setting 'reshape' on a 'frozen' array, the 'frozen' flag is cleared before 'reshape' is set, so the md thread can get in and start trying recovery - which isn't wanted.

So instead of clearing MD_RECOVERY_FROZEN for any command except 'frozen', only clear it when each specific command is parsed. This allows the handling of 'reshape' to clear the bit while a lock is held.

Also remove some places where we set MD_RECOVERY_NEEDED, as it is always set on non-error exit of the function.

Signed-off-by: NeilBrown <neilb@suse.de>
Fixes: 6791875e ("md: make reconfig_mutex optional for writes to md sysfs files.")
-
- 28 Apr, 2015 1 commit
-
-
Committed by NeilBrown
Because of the peculiar way that md devices are created (automatically when the device node is opened), a new device can be created and registered immediately after the

    blk_unregister_region(disk_devt(disk), disk->minors);

call in del_gendisk().

Therefore it is important that all visible artifacts of the previous device are removed before this call. In particular, the 'bdi'.

Since:

    commit c4db59d3
    Author: Christoph Hellwig <hch@lst.de>
        fs: don't reassign dirty inodes to default_backing_dev_info

moved the

    device_unregister(bdi->dev);

call from bdi_unregister() to bdi_destroy(), it has been quite easy to lose a race and have a new (e.g.) "md127" be created after the blk_unregister_region() call and before bdi_destroy() is ultimately called by the final 'put_disk', which must come after del_gendisk().

The new device finds that the bdi name is already registered in sysfs and complains

> [ 9627.630029] WARNING: CPU: 18 PID: 3330 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5a/0x70()
> [ 9627.630032] sysfs: cannot create duplicate filename '/devices/virtual/bdi/9:127'

We can fix this by moving the bdi_destroy() call out of blk_release_queue() (which can happen very late when a refcount reaches zero) and into blk_cleanup_queue() - which happens exactly when the md device driver calls it.

Then it is only necessary for md to call blk_cleanup_queue() before del_gendisk(). As loop.c devices are also created on demand by opening the device node, we make the same change there.

Fixes: c4db59d3
Reported-by: Azat Khuzhin <a3at.mail@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org (v4.0)
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 22 Apr, 2015 5 commits
-
-
Committed by NeilBrown
When md notices non-sync IO happening while it is trying to resync (or reshape or recover), it slows down to the set minimum.

The default minimum might have made sense many years ago, but drives have become faster. Changing the default to match the times isn't really a long term solution.

This patch changes the code so that instead of waiting until the speed has dropped to the target, it just waits until pending requests have completed. This means that the delay inserted is a function of the speed of the devices.

Testing shows that:
- for some loads, the resync speed is unchanged. For those loads, increasing the minimum doesn't change the speed either. So this is a good result. To increase resync speed under such loads we would probably need to increase the resync window size.
- for other loads, resync speed does increase to a reasonable fraction (e.g. 20%) of the maximum possible, and throughput of the load only drops a little bit (e.g. 10%).
- for other loads, throughput of the non-sync load drops quite a bit more. These seem to be latency-sensitive loads.

So it isn't a perfect solution, but it is mostly an improvement.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
This option is not well justified and testing suggests that it hardly ever makes any difference.

The comment suggests there might be a need to wait for non-resync activity indicated by ->nr_waiting; however, raise_barrier() already waits for all of that.

So just remove it to simplify reasoning about speed limiting.

This allows us to remove a 'FIXME' comment from raid5.c as that never used the flag.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
There is really no need for sync_min to be a multiple of chunk_size, and values read from here often aren't. That means you cannot read a value and expect to be able to write it back later.

So remove the chunk_size check, and round down to a multiple of 4K, to be sure everything works with 4K-sector devices.

Signed-off-by: NeilBrown <neilb@suse.de>
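Rounding down to a 4K boundary is a single mask operation when the value is counted in sectors. The sketch below assumes 512-byte sector units, so 4K corresponds to 8 sectors; `round_down_4k()` is a hypothetical helper, not the md code.

```c
#include <stdint.h>

typedef uint64_t sector_t;

#define SECTORS_PER_4K 8u  /* 4096 / 512, assuming 512-byte sector units */

/* Clear the low three bits so the result is a multiple of 8 sectors. */
sector_t round_down_4k(sector_t sectors)
{
    return sectors & ~(sector_t)(SECTORS_PER_4K - 1);
}
```

Any value a user reads back and writes again is accepted this way; at worst it is lowered by up to 7 sectors.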
-
Committed by Goldwyn Rodrigues
When "re-add" is written to /sys/block/mdXX/md/dev-YYY/state, the clustered md:

1. Sends a RE_ADD message with the desc_nr. Nodes receiving the message clear the Faulty bit in their respective rdev->flags.
2. The node initiating re-add gathers the bitmaps of all nodes and copies them into the local bitmap. It does not clear the bitmap from which it is copying.
3. The initiating node schedules an md recovery to sync the devices.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by Goldwyn Rodrigues
This adds the capability of re-adding a failed disk by writing "re-add" to /sys/block/mdXX/md/dev-YYY/state.

This facilitates adding disks which have encountered a temporary error such as a network disconnection/hiccup in an iSCSI device, or a SAN cable disconnection which has been restored. In such a situation, you do not need to remove and re-add the device. Writing re-add to the failed device's state would add it again to the array and perform the recovery of only the blocks which were written after the device failed.

This works for generic md, and is not related to clustering. However, this patch is to ease re-add operations listed above in clustering environments.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
-