- 13 6月, 2013 1 次提交
-
-
由 NeilBrown 提交于
__md_stop_writes() will currently sometimes freeze recovery. So any caller must be ready for that to happen, and indeed they are. However if __md_stop_writes() doesn't freeze_recovery, then a recovery could start before mddev_suspend() is called, which could be awkward. This can particularly cause problems or dm-raid. So change __md_stop_writes() to always freeze recovery. This is safe and more predicatable. Reported-by: NBrassow Jonathan <jbrassow@redhat.com> Tested-by: NBrassow Jonathan <jbrassow@redhat.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 30 4月, 2013 3 次提交
-
-
由 Shaohua Li 提交于
In SSD/hard disk hybid storage, discard request should be ignored for hard disk. We used to be doing this way, but the unplug path forgets it. This is suitable for stable tree since v3.6. Cc: stable@vger.kernel.org Reported-and-tested-by: NMarkus <M4rkusXXL@web.de> Signed-off-by: NShaohua Li <shli@fusionio.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Maintenance of a bad-block-list currently defaults to 'enabled' and is then disabled when it cannot be supported. This is backwards and causes problem for dm-raid which didn't know to disable it. So fix the defaults, and only enabled for v1.x metadata which explicitly has bad blocks enabled. The problem with dm-raid has been present since badblock support was added in v3.1, so this patch is suitable for any -stable from 3.1 onwards. Cc: stable@vger.kernel.org (3.1+) Reported-by: NJonathan Brassow <jbrassow@redhat.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Hirokazu Takahashi 提交于
Hi. Raid1 and raid10 devices leak memory every time they stop. This is a patch for linux-3.9.0-rc7 to fix this problem. Thanks, Hirokazu Takahashi. Signed-off-by: NHirokazu Takahashi <taka@valinux.co.jp> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 24 4月, 2013 11 次提交
-
-
由 Jonathan Brassow 提交于
DM RAID: Add message/status support for changing sync action This patch adds a message interface to dm-raid to allow the user to more finely control the sync actions being performed by the MD driver. This gives the user the ability to initiate "check" and "repair" (i.e. scrubbing). Two additional fields have been appended to the status output to provide more information about the type of sync action occurring and the results of those actions, specifically: <sync_action> and <mismatch_cnt>. These new fields will always be populated. This is essentially the device-mapper way of doing what MD controls through the 'sync_action' sysfs file and shows through the 'mismatch_cnt' sysfs file. Signed-off-by: NJonathan Brassow <jbrassow@redhat.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Jonathan Brassow 提交于
MD: Export 'md_reap_sync_thread' function Make 'md_reap_sync_thread' available to other files, specifically dm-raid.c. - rename reap_sync_thread to md_reap_sync_thread - move the fn after md_check_recovery to match md.h declaration placement - export md_reap_sync_thread Signed-off-by: NJonathan Brassow <jbrassow@redhat.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
read-only arrays should stay that way as much as possible. Updating the metadata - which could be triggered by a re-add while assembling the array metadata - should be avoided. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
When assembling an array incrementally we might want to make it device available when "enough" devices are present, but maybe not "all" devices are present. If the remaining devices appear before the array is actually used, they should be added transparently. We do this by using the "read-auto" mode where the array acts like it is read-only until a write request arrives. Current an add-device request switches a read-auto array to active. This means that only one device can be added after the array is first made read-auto. This isn't a problem for RAID5, but is not ideal for RAID6 or RAID10. Also we don't really want to switch the array to read-auto at all when re-adding a device as this doesn't really imply any change. So: - remove the "md_update_sb()" call from add_new_disk(). This isn't really needed as just adding a disk doesn't require a metadata update. Instead, just set MD_CHANGE_DEVS. This will effect a metadata update soon enough, once the array is not read-only. - Allow the ADD_NEW_DISK ioctl to succeed without activating a read-auto array, providing the MD_DISK_SYNC flag is set. In this case, the device will be rejected if it cannot be added with the correct device number, or has an incorrect event count. - Teach remove_and_add_spares() to be careful about adding spares when the array is read-only (or read-mostly) - only add devices that are thought to be in-sync, and only do it if the array is in-sync itself. - In md_check_recovery, use remove_and_add_spares in the read-only case, rather than open coding just the 'remove' part of it. Reported-by: NMartin Wilck <mwilck@arcor.de> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Martin Wilck 提交于
When an array is assembled incrementally with mdadm -I -R and the array switches to "active" mode, md starts a recovery. If the array was clean, the "fullsync" flag will be 0. Skip the full recovery in this case, as RAID1 does (the code was actually copied from the sync_request() method of RAID1). Signed-off-by: NMartin Wilck <mwilck@arcor.de> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
If we write to a known-bad-block it will be flags as having a ReadError by analyse_stripe, but the write will proceed anyway (as it should). Then the read-error handling will kick in an write again, then re-read. We don't need that 'write-again', so set R5_ReWrite so it looks like it has already been done. Then we will just get the re-read, which we want. Reported-by: Nmajianpeng <majianpeng@gmail.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 majianpeng 提交于
As the function call is the most expensive of these tests it should be done later in the chain so that it can be avoided in some cases. Signed-off-by: NJianpeng Ma <majianpeng@gmail.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Akinobu Mita 提交于
The value returned by test_and_set_bit_le() drivers/md/bitmap.c is not used. So just use set_bit_le(). The same goes for test_and_clear_bit_le(). Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: Neil Brown <neilb@suse.de> Cc: linux-raid@vger.kernel.org Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
If a fail device or a spare is removed from an array, there is not need to make the array 'active'. If/when the array does become active for some other reason the metadata will be update to reflect the removal. If that never happens and the array is stopped while still read-auto, then there is no loss in forgetting the that the device had 'failed'. A read-only array will leave failed devices attached to the array personality, so we need to explicitly call remove_and_add_spares() to free it (clearing Blocked just like we do in store_slot()). Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
slot_store and remove_and_add_spares both call ->hot_remove_disk(), but with slightly different tests and consequences, which is at least untidy and might be buggy. So modify remove_and_add_spaces() so that it can be asked to remove a specific device, and call it from slot_store(). We also clear the Blocked flag to ensure that doesn't prevent removal. The purpose of Blocked is to prevent automatic removal by the kernel before an error is acknowledged. If the array is read/write then user-space would have not reason to remove a device unless it was known to be 'spare' or 'faulty' in which it would have already cleared the Blocked flag. If the array is read-only, the flag might still be blocked, but there is no harm in clearing the flag for read-only arrays. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Normally we don't even try to update the metadata if the array is read-only. However future patches will increase the number of things that can happen on a read-only array, so it is safest to explicitly disable this. Every time that mddev->ro is set to 0, either - md_update_sb will be called again (at least if MD_CHANGE_DEVS is set) or - the mddev->thread is scheduled, which will also run md_update_sb if needed. So this is safe: if the array ever become read-write the metadata will be updated. Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 19 4月, 2013 1 次提交
-
-
由 Linus Torvalds 提交于
This reverts commit 3a366e61. Wanlong Gao reports that it causes a kernel panic on his machine several minutes after boot. Reverting it removes the panic. Jens says: "It's not quite clear why that is yet, so I think we should just revert the commit for 3.9 final (which I'm assuming is pretty close). The wifi is crap at the LSF hotel, so sending this email instead of queueing up a revert and pull request." Reported-by: NWanlong Gao <gaowanlong@cn.fujitsu.com> Requested-by: NJens Axboe <axboe@kernel.dk> Cc: Tejun Heo <tj@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 4月, 2013 2 次提交
-
-
由 Mike Snitzer 提交于
A recent patch to fix the dm cache target's writethrough mode extended the bio's front_pad to include a 1056-byte struct dm_bio_details. Writeback mode doesn't need this, so this patch reduces the per_bio_data_size to 16 bytes in this case instead of 1096. The dm_bio_details structure was added in "dm cache: fix writes to cache device in writethrough mode" which fixed commit e2e74d61 ("dm cache: fix race in writethrough implementation"). In writeback mode we avoid allocating the writethrough-specific members of the per_bio_data structure (the dm_bio_details structure included). Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Darrick J. Wong 提交于
The dm-cache writethrough strategy introduced by commit e2e74d61 ("dm cache: fix race in writethrough implementation") issues a bio to the origin device, remaps and then issues the bio to the cache device. This more conservative in-series approach was selected to favor correctness over performance (of the previous parallel writethrough). However, this in-series implementation that reuses the same bio to write both the origin and cache device didn't take into account that the block layer's req_bio_endio() modifies a completing bio's bi_sector and bi_size. So the new writethrough strategy needs to preserve these bio fields, and restore them before submission to the cache device, otherwise nothing gets written to the cache (because bi_size is 0). This patch adds a struct dm_bio_details field to struct per_bio_data, and uses dm_bio_record() and dm_bio_restore() to ensure the bio is restored before reissuing to the cache device. Adding such a large structure to the per_bio_data is not ideal but we can improve this later, for now correctness is the important thing. This problem initially went unnoticed because the dm-cache test-suite uses a linear DM device for the dm-cache device's origin device. Writethrough worked as expected because DM submits a *clone* of the original bio, so the original bio which was reused for the cache was never touched. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NJoe Thornber <ejt@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
- 21 3月, 2013 10 次提交
-
-
由 Mike Snitzer 提交于
When reading the dm cache metadata from disk, ignore the policy hints unless they were generated by the same major version number of the same policy module. The hints are considered to be private data belonging to the specific module that generated them and there is no requirement for them to make sense to different versions of the policy that generated them. Policy modules are all required to work fine if no previous hints are supplied (or if existing hints are lost). Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mike Snitzer 提交于
Separate dm cache policy version string into 3 unsigned numbers corresponding to major, minor and patchlevel and store them at the end of the on-disk metadata so we know which version of the policy generated the hints in case a future version wants to use them differently. Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Joe Thornber 提交于
We have found a race in the optimisation used in the dm cache writethrough implementation. Currently, dm core sends the cache target two bios, one for the origin device and one for the cache device and these are processed in parallel. This patch avoids the race by changing the code back to a simpler (slower) implementation which processes the two writes in series, one after the other, until we can develop a complete fix for the problem. When the cache is in writethrough mode it needs to send WRITE bios to both the origin and cache devices. Previously we've been implementing this by having dm core query the cache target on every write to find out how many copies of the bio it wants. The cache will ask for two bios if the block is in the cache, and one otherwise. Then main problem with this is it's racey. At the time this check is made the bio hasn't yet been submitted and so isn't being taken into account when quiescing a block for migration (promotion or demotion). This means a single bio may be submitted when two were needed because the block has since been promoted to the cache (catastrophic), or two bios where only one is needed (harmless). I really don't want to start entering bios into the quiescing system (deferred_set) in the get_num_write_bios callback. Instead this patch simplifies things; only one bio is submitted by the core, this is first written to the origin and then the cache device in series. Obviously this will have a latency impact. deferred_writethrough_bios is introduced to record bios that must be later issued to the cache device from the worker thread. This deferred submission, after the origin bio completes, is required given that we're in interrupt context (writethrough_endio). Signed-off-by: NJoe Thornber <ejt@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Joe Thornber 提交于
When writing the dirty bitset to the metadata device on a clean shutdown, clear the dirty bits. Previously they were left indicating the cache was dirty. This led to confusion about whether there really was dirty data in the cache or not. (This was a harmless bug.) Reported-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NJoe Thornber <ejt@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Heinz Mauelshagen 提交于
If the cache policy's config values are not able to be set we must set the policy to NULL after destroying it in create_cache_policy() so we don't attempt to destroy it a second time later. Signed-off-by: NHeinz Mauelshagen <heinzm@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Heinz Mauelshagen 提交于
Return error if cache_create() fails. A missing return check made cache_ctr continue even after an error in cache_create() resulting in the cache object being destroyed. So a simple failure like an odd number of cache policy config value arguments would result in an oops. Signed-off-by: NHeinz Mauelshagen <heinzm@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Joe Thornber 提交于
Squash various 32bit link errors. >> on i386: >> drivers/built-in.o: In function `is_discarded_oblock': >> dm-cache-target.c:(.text+0x1ea28e): undefined reference to `__udivdi3' ... Reported-by: NRandy Dunlap <rdunlap@infradead.org> Signed-off-by: NJoe Thornber <ejt@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
A deadlock was found in the prefetch code in the dm verity map function. This patch fixes this by transferring the prefetch to a worker thread and skipping it completely if kmalloc fails. If generic_make_request is called recursively, it queues the I/O request on the current->bio_list without making the I/O request and returns. The routine making the recursive call cannot wait for the I/O to complete. The deadlock occurs when one thread grabs the bufio_client mutex and waits for an I/O to complete but the I/O is queued on another thread's current->bio_list and is waiting to get the mutex held by the first thread. The fix recognises that prefetching is not essential. If memory can be allocated, it queues the prefetch request to the worker thread, but if not, it does nothing. Signed-off-by: NPaul Taysom <taysom@chromium.org> Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com> Cc: stable@kernel.org
-
由 Joe Thornber 提交于
Fix a discard granularity calculation to work for non power of 2 block sizes. In order for thinp to passdown discard bios to the underlying data device, the data device must have a discard granularity that is a factor of the thinp block size. Originally this check was done by using bitops since the block_size was known to be a power of two. Introduced by commit f13945d7 ("dm thin: support a non power of 2 discard_granularity"). Signed-off-by: NJoe Thornber <ejt@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Joe Thornber 提交于
Fix a bug in dm_btree_remove that could leave leaf values with incorrect reference counts. The effect of this was that removal of a shared block could result in the space maps thinking the block was no longer used. More concretely, if you have a thin device and a snapshot of it, sending a discard to a shared region of the thin could corrupt the snapshot. Thinp uses a 2-level nested btree to store it's mappings. This first level is indexed by thin device, and the second level by logical block. Often when we're removing an entry in this mapping tree we need to rebalance nodes, which can involve shadowing them, possibly creating a copy if the block is shared. If we do create a copy then children of that node need to have their reference counts incremented. In this way reference counts percolate down the tree as shared trees diverge. The rebalance functions were incrementing the children at the appropriate time, but they were always assuming the children were internal nodes. This meant the leaf values (in our case packed block/flags entries) were not being incremented. Cc: stable@vger.kernel.org Signed-off-by: NJoe Thornber <ejt@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
- 20 3月, 2013 5 次提交
-
-
由 Paul Bolle 提交于
Once instance of this Kconfig macro remained after commit 51acbcec ("md: remove CONFIG_MULTICORE_RAID456"). Remove that one too. And, while we're at it, also remove it from the defconfig files that carry it. Signed-off-by: NPaul Bolle <pebolle@tiscali.nl> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
A number of problems can occur due to races between resync/recovery and discard. - if sync_request calls handle_stripe() while a discard is happening on the stripe, it might call handle_stripe_clean_event before all of the individual discard requests have completed (so some devices are still locked, but not all). Since commit ca64cae9 md/raid5: Make sure we clear R5_Discard when discard is finished. this will cause R5_Discard to be cleared for the parity device, so handle_stripe_clean_event() will not be called when the other devices do become unlocked, so their ->written will not be cleared. This ultimately leads to a WARN_ON in init_stripe and a lock-up. - If handle_stripe_clean_event() does clear R5_UPTODATE at an awkward time for resync, it can lead to s->uptodate being less than disks in handle_parity_checks5(), which triggers a BUG (because it is one). So: - keep R5_Discard on the parity device until all other devices have completed their discard request - make sure we don't try to have a 'discard' and a 'sync' action at the same time. This involves a new stripe flag to we know when a 'discard' is happening, and the use of R5_Overlap on the parity disk so when a discard is wanted while a sync is active, so we know to wake up the discard at the appropriate time. Discard support for RAID5 was added in 3.7, so this is suitable for any -stable kernel since 3.7. Cc: stable@vger.kernel.org (v3.7+) Reported-by: NJes Sorensen <Jes.Sorensen@redhat.com> Tested-by: NJes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Jonathan Brassow 提交于
MD: Prevent sysfs operations on uninitialized kobjects Device-mapper does not use sysfs; but when device-mapper is leveraging MD's RAID personalities, MD sometimes attempts to update sysfs. This patch adds checks for 'mddev-kobj.sd' in sysfs_[un]link_rdev to ensure it is about to operate on something valid. This patch also checks for 'mddev->kobj.sd' before calling 'sysfs_notify' in 'remove_and_add_spares'. Although 'sysfs_notify' already makes this check, doing so in 'remove_and_add_spares' prevents an additional mutex operation. Signed-off-by: NJonathan Brassow <jbrassow@redhat.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Jonathan Brassow 提交于
MD RAID5: Fix kernel oops when RAID4/5/6 is used via device-mapper Commit a9add5d9 (v3.8-rc1) added blktrace calls to the RAID4/5/6 driver. However, when device-mapper is used to create RAID4/5/6 arrays, the mddev->gendisk and mddev->queue fields are not setup. Therefore, calling things like trace_block_bio_remap will cause a kernel oops. This patch conditionalizes those calls on whether the proper fields exist to make the calls. (Device-mapper will call trace_block_bio_remap on its own.) This patch is suitable for the 3.8.y stable kernel. Cc: stable@vger.kernel.org (v3.8+) Signed-off-by: NJonathan Brassow <jbrassow@redhat.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Since commit 1ed850f3 md/raid5: make sure to_read and to_write never go negative. It has been possible for handle_stripe_dirtying to be called when there isn't actually any work to do. It then calls schedule_reconstruction() which will set R5_LOCKED on the parity block(s) even when nothing else is happening. This then causes problems in do_release_stripe(). So add checks to schedule_reconstruction() so that if it doesn't find anything to do, it just aborts. This bug was introduced in v3.7, so the patch is suitable for -stable kernels since then. Cc: stable@vger.kernel.org (v3.7+) Reported-by: Nmajianpeng <majianpeng@gmail.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 02 3月, 2013 7 次提交
-
-
由 Heinz Mauelshagen 提交于
A simple cache policy that writes back all data to the origin. This is used to decommission a dm cache by emptying it. Signed-off-by: NHeinz Mauelshagen <mauelshagen@redhat.com> Signed-off-by: NJoe Thornber <ejt@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Joe Thornber 提交于
A cache policy that uses a multiqueue ordered by recent hit count to select which blocks should be promoted and demoted. This is meant to be a general purpose policy. It prioritises reads over writes. Signed-off-by: NJoe Thornber <ejt@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Joe Thornber 提交于
Add a target that allows a fast device such as an SSD to be used as a cache for a slower device such as a disk. A plug-in architecture was chosen so that the decisions about which data to migrate and when are delegated to interchangeable tunable policy modules. The first general purpose module we have developed, called "mq" (multiqueue), follows in the next patch. Other modules are under development. Signed-off-by: NJoe Thornber <ejt@redhat.com> Signed-off-by: NHeinz Mauelshagen <mauelshagen@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Joe Thornber 提交于
Add a persistent bitset as a wrapper around dm-array. Signed-off-by: NJoe Thornber <ejt@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Joe Thornber 提交于
Add a transactional array. Signed-off-by: NJoe Thornber <ejt@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Joe Thornber 提交于
This patch takes advantage of the new bio-prison interface where the memory is now passed in rather than using a mempool in bio-prison. This allows the map function to avoid performing potentially-blocking allocations that could lead to deadlocks: We want to avoid the cell allocation that is done in bio_detain. (The potential for mempool deadlocks still remains in other functions that use bio_detain.) Signed-off-by: NJoe Thornber <ejt@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Joe Thornber 提交于
Change the dm_bio_prison interface so that instead of allocating memory internally, dm_bio_detain is supplied with a pre-allocated cell each time it is called. This enables a subsequent patch to move the allocation of the struct dm_bio_prison_cell outside the thin target's mapping function so it can no longer block there. Signed-off-by: NJoe Thornber <ejt@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-