- 24 Apr, 2013 5 commits
-
-
Committed by NeilBrown
Read-only arrays should stay that way as much as possible. Updating the metadata - which could be triggered by a re-add while assembling the array metadata - should be avoided.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
When assembling an array incrementally we might want to make the device available when "enough" devices are present, but maybe not "all" devices are present. If the remaining devices appear before the array is actually used, they should be added transparently.

We do this by using the "read-auto" mode where the array acts like it is read-only until a write request arrives.

Currently an add-device request switches a read-auto array to active. This means that only one device can be added after the array is first made read-auto. This isn't a problem for RAID5, but is not ideal for RAID6 or RAID10.

Also we don't really want to switch the array out of read-auto at all when re-adding a device as this doesn't really imply any change.

So:
  - Remove the "md_update_sb()" call from add_new_disk(). This isn't really needed as just adding a disk doesn't require a metadata update. Instead, just set MD_CHANGE_DEVS. This will effect a metadata update soon enough, once the array is not read-only.
  - Allow the ADD_NEW_DISK ioctl to succeed without activating a read-auto array, providing the MD_DISK_SYNC flag is set. In this case, the device will be rejected if it cannot be added with the correct device number, or has an incorrect event count.
  - Teach remove_and_add_spares() to be careful about adding spares when the array is read-only (or read-mostly): only add devices that are thought to be in-sync, and only do it if the array is in-sync itself.
  - In md_check_recovery, use remove_and_add_spares in the read-only case, rather than open coding just the 'remove' part of it.

Reported-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
If a failed device or a spare is removed from an array, there is no need to make the array 'active'. If/when the array does become active for some other reason the metadata will be updated to reflect the removal. If that never happens and the array is stopped while still read-auto, then there is no loss in forgetting that the device had 'failed'.

A read-only array will leave failed devices attached to the array personality, so we need to explicitly call remove_and_add_spares() to free it (clearing Blocked just like we do in slot_store()).

Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
slot_store and remove_and_add_spares both call ->hot_remove_disk(), but with slightly different tests and consequences, which is at least untidy and might be buggy.

So modify remove_and_add_spares() so that it can be asked to remove a specific device, and call it from slot_store().

We also clear the Blocked flag to ensure that doesn't prevent removal. The purpose of Blocked is to prevent automatic removal by the kernel before an error is acknowledged. If the array is read/write then user-space would have no reason to remove a device unless it was known to be 'spare' or 'faulty', in which case it would have already cleared the Blocked flag. If the array is read-only, the flag might still be set, but there is no harm in clearing the flag for read-only arrays.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
Normally we don't even try to update the metadata if the array is read-only. However future patches will increase the number of things that can happen on a read-only array, so it is safest to explicitly disable this.

Every time that mddev->ro is set to 0, either
  - md_update_sb will be called again (at least if MD_CHANGE_DEVS is set), or
  - the mddev->thread is scheduled, which will also run md_update_sb if needed.

So this is safe: if the array ever becomes read-write the metadata will be updated.

Signed-off-by: NeilBrown <neilb@suse.de>
-
- 20 Mar, 2013 1 commit
-
-
Committed by Jonathan Brassow
MD: Prevent sysfs operations on uninitialized kobjects

Device-mapper does not use sysfs; but when device-mapper is leveraging MD's RAID personalities, MD sometimes attempts to update sysfs. This patch adds checks for 'mddev->kobj.sd' in sysfs_[un]link_rdev to ensure it is about to operate on something valid. This patch also checks for 'mddev->kobj.sd' before calling 'sysfs_notify' in 'remove_and_add_spares'. Although 'sysfs_notify' already makes this check, doing so in 'remove_and_add_spares' prevents an additional mutex operation.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
-
- 28 Feb, 2013 1 commit
-
-
Committed by NeilBrown
If something has failed while the array was read-auto, then when we switch to 'active' we need to update the metadata. This will happen anyway but it is good to expedite it, and also to ensure any failed device has been released by the underlying device before we try to action the ioctl which caused us to switch to 'active' mode.

Reported-by: Joe Lawrence <Joe.Lawrence@stratus.com>
Signed-off-by: NeilBrown <neilb@suse.de>
-
- 26 Feb, 2013 1 commit
-
-
Committed by NeilBrown
You cannot resize a RAID0 array (in terms of making the devices bigger), but the code doesn't entirely stop you. So:

Disable setting of the available size on each device for RAID0 and Linear devices. This must not change as doing so can change the effective layout of data.

Make sure that the size that raid0_size() reports is accurate, by rounding device sizes to chunk sizes. As the device sizes cannot change now, this isn't so important, but it is best to be safe.

Without this change:
  mdadm --grow /dev/md0 -z max
  mdadm --grow /dev/md0 -Z max
then read to the end of the array
can cause a BUG in a RAID0 array.

These bugs have been present ever since it became possible to resize any device, which is a long time. So the fix is suitable for any -stable kernel.

Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
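The rounding described in this message can be modelled in a few lines of userspace Python. This is only a sketch and not the kernel's raid0_size(): the function name `raid0_array_sectors` and its parameters are hypothetical, and it assumes that "rounding to chunk sizes" means rounding each member down to a whole number of chunks.

```python
def raid0_array_sectors(dev_sectors, chunk_sectors):
    """Sum the member sizes, each rounded down to a whole number of chunks.

    dev_sectors: iterable of member device sizes, in sectors (illustrative).
    chunk_sectors: chunk size in sectors (illustrative).
    """
    return sum((size // chunk_sectors) * chunk_sectors for size in dev_sectors)
```

For example, two members of 1000 and 1030 sectors with a 128-sector chunk contribute 896 and 1024 sectors, so the model reports 1920 sectors rather than 2030.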
-
- 21 Feb, 2013 1 commit
-
-
Committed by Sebastian Riemer
If an fsync occurs on a read-only array, we need to send a completion for the IO and may not increment the active IO count. Otherwise, we hit a bug trace and can't stop the MD array anymore.

By advice of Christoph Hellwig we return success upon a flush request but we return -EROFS for other writes. We detect flush requests by checking if the bio has zero sectors.

This patch is suitable for any -stable kernel to which it applies.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: NeilBrown <neilb@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Paul Menzel <paulepanter@users.sourceforge.net>
Signed-off-by: NeilBrown <neilb@suse.de>
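The decision described here (success for a flush, -EROFS for any other write) can be sketched as a small userspace model. It is not the kernel code: `handle_bio_on_readonly_array` and its parameters are hypothetical names, and a bio is reduced to a sector count and a write flag.

```python
import errno


def handle_bio_on_readonly_array(sectors, is_write):
    """Return 0 on success or a negative errno for a bio sent to a read-only array."""
    if not is_write:
        return 0  # reads are unaffected by the read-only state
    if sectors == 0:
        return 0  # a zero-sector write is treated as a flush and completes successfully
    return -errno.EROFS  # any write that carries data is rejected
```

An fsync on the array would therefore be modelled as `handle_bio_on_readonly_array(0, True)`, which completes with success instead of being counted as active IO.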
-
- 13 Dec, 2012 3 commits
-
-
Committed by majianpeng
If a resync is aborted cleanly, ->curr_resync is a reliable record of where we got up to. If there was an error it is less reliable but we always know that ->curr_resync_completed is safe.

So add a flag MD_RECOVERY_ERROR to differentiate between these cases and set recovery_cp accordingly.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
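The choice between the two positions can be written as a tiny model. This is a sketch under the assumptions stated in the message only; `recovery_checkpoint` is a hypothetical name and `had_error` stands in for the MD_RECOVERY_ERROR flag.

```python
def recovery_checkpoint(curr_resync, curr_resync_completed, had_error):
    """Pick the position to record as recovery_cp when a resync is aborted."""
    if had_error:
        # After an error only the completed position is known to be safe.
        return curr_resync_completed
    # A clean abort leaves curr_resync as a reliable record of progress.
    return curr_resync
```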
-
Committed by majianpeng
md will currently only checkpoint recovery or resync every 1/16th of the device size. As devices get larger this can become a long time and so a lot of work that might need to be duplicated after a shutdown.

So add a time-based checkpoint. Every 5 minutes limits the amount of duplicated effort to at most 5 minutes, and has almost zero impact on performance.

[changelog entry re-written by NeilBrown]

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
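The combined rule (a checkpoint after 1/16th of the device or after 5 minutes, whichever comes first) might look like the following userspace model. The function and parameter names are hypothetical, and the real kernel bookkeeping differs; only the two thresholds come from the message.

```python
CHECKPOINT_FRACTION = 16        # from the message: every 1/16th of the device size
CHECKPOINT_INTERVAL_S = 5 * 60  # from the message: or every 5 minutes


def should_checkpoint(done_sectors, last_cp_sectors, total_sectors,
                      now_s, last_cp_time_s):
    """Return True if resync progress should be recorded now."""
    # Size-based rule: enough of the device was processed since the last checkpoint.
    if done_sectors - last_cp_sectors >= total_sectors // CHECKPOINT_FRACTION:
        return True
    # Time-based rule: bounds the work repeated after a shutdown to about 5 minutes.
    return now_s - last_cp_time_s >= CHECKPOINT_INTERVAL_S
```

On a large device the time-based rule fires long before 1/16th of the device has been processed, which is the case the message is concerned with.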
-
Committed by kernelmail
During a resync, recovery_cp is only updated when the resync is aborted or completes. But many places in the md driver rely on it to make decisions, so add another place to update it.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
-
- 11 Dec, 2012 3 commits
-
-
Committed by NeilBrown
Indent was unnecessarily deep. Also change one 'switch' which has a single case element, into an 'if'.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
When we remove a device from an md array, the final removal of the "dev-XX" sysfs entry is run asynchronously. If we then re-add that device immediately before the worker thread gets to run, we can end up trying to add the "dev-XX" sysfs entry back before it has been removed.

So in both places where we add a device, call
  flush_workqueue(md_misc_wq);
before taking the md lock (as holding the md lock can prevent the removal from completing).

Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
'i' is unused.

Signed-off-by: NeilBrown <neilb@suse.de>
-
- 30 Nov, 2012 1 commit
-
-
Committed by Lukas Czerner
New wait_event{_interruptible}_lock_irq{_cmd} macros added. This commit moves the private wait_event_lock_irq() macro from MD to the regular wait includes, introduces a new macro wait_event_lock_irq_cmd() to replace the old, ugly method of omitting the cmd parameter, and makes use of the new macros in MD. It also introduces the _interruptible_ variant.

The new interface is for use when one has a special lock to protect data structures used in the condition, or when one also needs to invoke "cmd" before putting it to sleep.

All new macros are expected to be called with the lock taken. The lock is released before sleep and is reacquired afterwards. We will leave the macro with the lock held.

Note to DM: IMO this should also fix a theoretical race on the waitqueue while simultaneously using wait_event_lock_irq() and wait_event(), because of the lack of locking around current state setting and wait queue removal.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Cc: Neil Brown <neilb@suse.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 20 Nov, 2012 3 commits
-
-
Committed by NeilBrown
md_stop() would stop an array, but not free various attached data structures.

For internal arrays, these are freed later in do_md_stop() or mddev_put(), but they don't apply for dm-raid arrays.

So get md_stop() to free them, and only call it from dm-raid. For internal arrays we now call __md_stop.

Reported-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by majianpeng
If read_seqretry returned true and bbp was changed, it will write to an invalid address, which can cause serious problems.

This bug was introduced by commit v3.0-rc7-130-g2699b672. So the fix is suitable for 3.0.y thru 3.6.y.

Reported-by: zhuwenfeng@kedacom.com
Tested-by: zhuwenfeng@kedacom.com
Cc: stable@vger.kernel.org
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by majianpeng
This bug was introduced by commit v3.0-rc7-126-g2230dfe4. So the fix is suitable for 3.0.y thru 3.6.y.

Cc: stable@vger.kernel.org
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
-
- 30 Oct, 2012 1 commit
-
-
Committed by Masanari Iida
Correct spelling typo in drivers/md.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
-
- 11 Oct, 2012 7 commits
-
-
Committed by NeilBrown
If 'resync_max' is set to 0 (as is often done when starting a reshape, so that mdadm can remain in control during a sensitive period), and if the reshape request is initially delayed because another array using the same device is resyncing or reshaping etc, then user-space cannot easily tell when the delay changes from being due to a conflicting reshape, to being due to resync_max = 0.

So introduce a new state: (curr_resync == 3) to reflect this, make sure it is visible both via /proc/mdstat and via the "sync_completed" sysfs attribute, and ensure that the event transition from one delay state to the other is properly notified.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
If you make an array bigger but suppress resync of the new region with
  mdadm --grow /dev/mdX --size=max --assume-clean
then stop the array before anything is written to it, the effect of the "--assume-clean" is lost and the array will resync the new space when restarted. So ensure that we update the metadata in that case.

Reported-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>
Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
In some cases arrays are started in the 'read-auto' state, wherein nothing gets written to any device until the array is written to. The purpose of this is to make accidental auto-assembly of the wrong arrays less of a risk, and to allow arrays to be started to read suspend-to-disk images without actually changing anything (as might happen if the array were dirty and a resync seemed necessary).

Explicitly writing the 'sync_action' for a read-auto array currently doesn't clear the read-auto state, so the sync action doesn't happen, which can be confusing.

So allow any successful write to sync_action to clear any read-auto state.

Reported-by: Alexander Kühn <alexander.kuehn@nagilum.de>
Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by Jianpeng Ma
Now that multiple threads can handle stripes, it is safer to use an atomic64_t for resync_mismatches, to avoid update races.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by Jonathan Brassow
MD RAID10: Fix a couple potential kernel panics if RAID10 is used by dm-raid

When device-mapper uses the RAID10 personality through dm-raid.c, there is no 'gendisk' structure in mddev and some sysfs information is also not populated. This patch avoids touching those non-existent structures.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
Some ioctls don't need to take the mutex and doing so can cause a delay as it is held during super-block update. So move those ioctls out of the mutex and rely on rcu locking to ensure we don't access stale data.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by Shaohua Li
Change the thread parameter, so the thread can carry extra info. The next patch will use it.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>
-
- 19 Sep, 2012 1 commit
-
-
Committed by NeilBrown
It isn't always necessary to update the metadata when spares are removed as the presence-or-not of a spare isn't really important to the integrity of an array. Also activating a spare doesn't always require updating the metadata as the update on 'recovery-completed' is usually sufficient.

However the introduction of 'replacement' devices has made these transitions sometimes more important. For example the 'Replacement' flag isn't cleared until the original device is removed, so we need to ensure a metadata update after that 'spare' is removed.

So set MD_CHANGE_DEVS whenever a spare is activated or removed, to complement the current situation where it is set when a spare is added or a device is failed (or a number of other less common situations).

This is suitable for -stable as out-of-date metadata could lead to data corruption. This is only relevant for 3.3 and later (when 'replacement' was introduced).

Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
-
- 09 Sep, 2012 3 commits
-
-
Committed by Kent Overstreet
Previously, there was bio_clone() but it only allocated from the fs bio set; as a result various users were open coding it and using __bio_clone().

This changes bio_clone() to become bio_clone_bioset(), and then we add bio_clone() and bio_clone_kmalloc() as wrappers around it, making use of the functionality the last patch added.

This will also help in a later patch changing how bio cloning works.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: NeilBrown <neilb@suse.de>
CC: Alasdair Kergon <agk@redhat.com>
CC: Boaz Harrosh <bharrosh@panasas.com>
CC: Jeff Garzik <jeff@garzik.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Kent Overstreet
Now that bios keep track of where they were allocated from, bio_integrity_alloc_bioset() becomes redundant.

Remove bio_integrity_alloc_bioset() and drop the bio_set argument from the related functions and make them use bio->bi_pool.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Kent Overstreet
With the old code, when you allocate a bio from a bio pool you have to implement your own destructor that knows how to find the bio pool the bio was originally allocated from.

This adds a new field to struct bio (bi_pool) and changes bio_alloc_bioset() to use it. This makes various bio destructors unnecessary, so they're then deleted.

v6: Explain the temporary if statement in bio_put

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: NeilBrown <neilb@suse.de>
CC: Alasdair Kergon <agk@redhat.com>
CC: Nicholas Bellinger <nab@linux-iscsi.org>
CC: Lars Ellenberg <lars.ellenberg@linbit.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 16 Aug, 2012 1 commit
-
-
Committed by NeilBrown
commit 27a7b260
  md: Fix handling for devices from 2TB to 4TB in 0.90 metadata.
changed 0.90 metadata handling to truncate the size to 4TB as that is all that 0.90 can record. However for RAID0 and Linear, 0.90 doesn't need to record the size, so this truncation is not needed and causes working arrays to become too small.

So avoid the truncation for RAID0 and Linear.

This bug was introduced in 3.1 and is suitable for any stable kernels from then onwards. As the offending commit was tagged for 'stable', any stable kernel that it was applied to should also get this patch. That includes at least 2.6.32, 2.6.33 and 3.0. (Thanks to Ben Hutchings for providing that list).

Cc: stable@vger.kernel.org
Signed-off-by: Neil Brown <neilb@suse.de>
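The intended behaviour can be illustrated with a small userspace model. This is a sketch, not the kernel code: `usable_sectors_090`, `LIMIT_090_SECTORS` and the level strings are hypothetical, and the limit is assumed to be exactly 4 TiB expressed in 512-byte sectors.

```python
# Assumption: the 0.90 limit is 4 TiB, counted in 512-byte sectors.
LIMIT_090_SECTORS = (4 * 1024**4) // 512


def usable_sectors_090(dev_sectors, level):
    """Return the device size that a 0.90-metadata array would use."""
    if level in ("raid0", "linear"):
        # 0.90 does not need to record the size for these levels,
        # so the full device size is kept.
        return dev_sectors
    # Other levels are capped at what 0.90 metadata can record.
    return min(dev_sectors, LIMIT_090_SECTORS)
```

A 5 TiB member of a RAID0 array keeps its full size in this model, while the same device in a RAID5 array is capped at the limit.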
-
- 31 Jul, 2012 4 commits
-
-
Committed by NeilBrown
This will allow md/raid to know why the unplug was called, and to act accordingly - if !from_schedule it is safe to perform tasks which could themselves schedule.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by NeilBrown
Both md and umem have similar code for getting notified on a blk_finish_plug event. Centralize this code in block/ and allow each driver to provide its distinctive difference.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by NeilBrown
This seemed like a good idea at the time, but after further thought I cannot see it making a difference other than very occasionally, and testing to try to exercise the case it is most likely to help did not show any performance difference by removing it.

So remove the counting of active plugs and allow 'pending writes' to be activated at any time, not just when no plugs are active.

This is only relevant when there is a write-intent bitmap, and the updating of the bitmap will likely introduce enough delay that the single-threading of bitmap updates will be enough to collect large numbers of updates together.

Removing this will make it easier to centralise the unplug code, and will clear the way for other unplug enhancements which have a measurable effect.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by NeilBrown
do_md_stop tests mddev->openers while holding ->open_mutex, and fails if this count is too high. So callers do not need to check mddev->openers and doing so isn't very meaningful as they don't hold ->open_mutex so the number could change.

So remove the unnecessary tests on mddev->openers. These are not called often enough for there to be any gain in an early test on ->open_mutex to avoid the need for a slightly more costly mutex_lock call.

Signed-off-by: NeilBrown <neilb@suse.de>
-
- 19 Jul, 2012 2 commits
-
-
Committed by NeilBrown
md will refuse to stop an array if any other fd (or mounted fs) is using it. When any fs is unmounted or when the last open fd is closed all pending IO will be flushed (e.g. sync_blockdev call in __blkdev_put) so there will be no pending IO to worry about when the array is stopped.

However in order to send the STOP_ARRAY ioctl to stop the array one must first get an open fd on the block device. If some fd is being used to write to the block device and it is closed after mdadm opens the block device, but before mdadm issues the STOP_ARRAY ioctl, then there will be no last-close on the md device so __blkdev_put will not call sync_blockdev.

If this happens, then IO can still be in-flight while md tears down the array and bad things can happen (use-after-free and subsequent havoc).

So in the case where do_md_stop is being called from an open file descriptor, call sync_blockdev after taking the mutex to ensure there will be no new openers.

This is needed when setting a read-write device to read-only too.

Cc: stable@vger.kernel.org
Reported-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
commit c6563a8c
  md: add possibility to change data-offset for devices.
introduced a 'new_data_offset' attribute which should normally be the same as 'data_offset', but can be explicitly set to a different value to allow a reshape operation to move the data.

Unfortunately when the 'data_offset' is explicitly set through sysfs, the new_data_offset is not also set, so the two would become out-of-sync incorrectly.

One result of this is that trying to set the 'size' after the 'data_offset' would fail because it is not permitted to set the size when the 'data_offset' and 'new_data_offset' are different - as that can be confusing. Consequently when mdadm tried to do this while assembling an IMSM array it would fail.

This bug was introduced in 3.5-rc1.

Reported-by: Brian Downing <bdowning@lavos.net>
Bisected-by: Brian Downing <bdowning@lavos.net>
Tested-by: Brian Downing <bdowning@lavos.net>
Signed-off-by: NeilBrown <neilb@suse.de>
-
- 03 Jul, 2012 2 commits
-
-
Committed by NeilBrown
We currently only allow a device to be re-added if it appears to be in-sync. This is overly restrictive as it may be desirable to re-add a device that is in the middle of recovery.

So remove the test for "InSync" - the test on rdev->raid_disk is sufficient to ensure that the re-add will succeed.

Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Tested-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
Having the 'name' arg optional and defaulting to the current personality name is not necessary and leads to errors, as when changing the level of an array we can end up using the name of the old level instead of the new one.

So make it non-optional and always explicitly pass the name of the level that the array will be.

Reported-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
-