- 08 Jun, 2011 1 commit
-
-
Committed by Jonathan Brassow
Don't attempt md_integrity_register if there is no gendisk struct available. When MD arrays are built via device-mapper, the gendisk structure is not available via mddev. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
- 11 May, 2011 3 commits
-
-
Committed by NeilBrown
The sysfs attribute 'resync_start' (known internally as recovery_cp) records where a resync is up to. A value of 0 means the array is not known to be in-sync at all. A value of MaxSector means the array is believed to be fully in-sync. When the size of member devices of an array (RAID1, RAID4/5/6) is increased, the array can be increased to match. This process sets resync_start to the old end-of-device offset so that the new part of the array gets resynced. However with RAID1 (and RAID6) a resync is not technically necessary and may be undesirable. So it would be good if the implied resync after the array is resized could be avoided. So: change 'resync_start' so the value can be changed while the array is active, and as a precaution only allow it to be changed while resync/recovery is 'frozen'. Changing it once resync has started is not going to be useful anyway. This allows the array to be resized without a resync by: writing 'frozen' to 'sync_action'; writing the new size to 'component_size' (this will set resync_start); writing 'none' to 'resync_start'; writing 'idle' to 'sync_action'. Also slightly improve some tests on recovery_cp when resizing raid1/raid5. Now that an arbitrary value could be set we should be more careful in our tests. Signed-off-by: NeilBrown <neilb@suse.de>
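
The precaution in this commit (resync_start may only be changed on an active array while resync/recovery is frozen) can be modelled in user space. This is a minimal sketch, not the kernel code: `struct md_model` and the `model_*` functions are hypothetical names, returning `-EBUSY` for a rejected write is an assumption, and the model treats writing 'none' as storing the MaxSector stand-in.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the kernel's MaxSector: "believed to be fully in-sync". */
#define MODEL_MAX_SECTOR UINT64_MAX

/* Hypothetical user-space model of the relevant array state. */
struct md_model {
    int active;            /* array is running */
    int frozen;            /* 'frozen' was written to sync_action */
    uint64_t resync_start; /* known internally as recovery_cp */
};

/* Model of writing 'frozen' or 'idle' to sync_action. */
int model_set_sync_action(struct md_model *md, const char *action)
{
    if (strcmp(action, "frozen") == 0)
        md->frozen = 1;
    else if (strcmp(action, "idle") == 0)
        md->frozen = 0;
    else
        return -EINVAL;
    return 0;
}

/* Model of writing resync_start: refused on an active array unless frozen. */
int model_set_resync_start(struct md_model *md, uint64_t sector)
{
    if (md->active && !md->frozen)
        return -EBUSY; /* assumed error code */
    md->resync_start = sector;
    return 0;
}
```

In this model, a write to resync_start on an active, unfrozen array is rejected and leaves the old value in place; the same write succeeds once 'frozen' has been set.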
-
Committed by NeilBrown
The 'add_new_disk' ioctl can be used to add a device either as a spare, or as an active disk that just needs to be resynced based on write-intent-bitmap information (re-add). Currently if a re-add is requested but fails we add as a spare instead. This makes it impossible for user-space to check for failure. So change to require that a re-add attempt will either succeed or completely fail. User-space can then decide what to do next. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
There is a race when creating an md device by opening /dev/mdXX. If two processes do this at much the same time they will follow the call path __blkdev_get -> get_gendisk -> kobj_lookup. The first will call -> md_probe -> md_alloc -> add_disk -> blk_register_region, and the race happens when the second gets to kobj_lookup after add_disk has called blk_register_region but before it returns to md_alloc. In that case the second will not call md_probe (as the probe is already done) but will get a handle on the gendisk and return to __blkdev_get, which will then call md_open (via the ->open pointer). As mddev->gendisk hasn't been set yet, md_open will think something is wrong and return with ERESTARTSYS. This can loop endlessly while the first thread makes no progress through add_disk. Nothing is blocking it, but due to scheduler behaviour it doesn't get a turn. So this is essentially a live-lock. We fix this by simply moving the assignment to mddev->gendisk before the call to add_disk() so md_open doesn't get confused. Also move blk_queue_flush earlier because add_disk should be as late as possible. To make sure that md_open doesn't complete until md_alloc has done all that is needed, we take mddev->open_mutex during the last part of md_alloc. md_open will wait for this. This can cause a lock-up on boot so Cc:ing for stable. For 2.6.36 and earlier a different patch will be needed as the 'blk_queue_flush' call isn't there. Signed-off-by: NeilBrown <neilb@suse.de> Reported-by: Thomas Jarosch <thomas.jarosch@intra2net.com> Tested-by: Thomas Jarosch <thomas.jarosch@intra2net.com> Cc: stable@kernel.org
-
- 20 Apr, 2011 1 commit
-
-
Committed by Krzysztof Wojcik
Problem: After a raid4->raid0 takeover operation, another takeover operation (e.g. raid0->raid10) results in a kernel oops. Root cause: The variable 'degraded' in the mddev structure is not cleared on raid45->raid0 takeover. This patch resets this variable. Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
- 18 Apr, 2011 2 commits
-
-
Committed by NeilBrown
When an md device adds a request to a queue, it can call mddev_check_plugged. If this succeeds then we know that the md thread will be woken up shortly, and ->plug_cnt will be non-zero until then, so some processing can be delayed. If it fails, then no unplug callback is expected and the make_request function needs to do whatever is required to make the request happen. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
md has some plugging infrastructure for RAID5 to use because the normal plugging infrastructure required a 'request_queue', and when called from dm, RAID5 doesn't have one of those available. This relied on the ->unplug_fn callback which doesn't exist any more. So remove all of that code, both in md and raid5. Subsequent patches will restore the plugging functionality. Signed-off-by: NeilBrown <neilb@suse.de>
-
- 31 Mar, 2011 1 commit
-
-
Committed by Lucas De Marchi
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
-
- 29 Mar, 2011 1 commit
-
-
Committed by Martin K. Petersen
We incorrectly returned -EINVAL when none of the devices in the array had an integrity profile. This in turn prevented mdadm from starting the metadevice. Fix this so we only return errors on mismatched profiles and memory allocation failures. Reported-by: Giacomo Catenazzi <cate@cateee.net> Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 17 Mar, 2011 1 commit
-
-
Committed by Martin K. Petersen
MD and DM create a new bio_set for every metadevice. Each bio_set has an integrity mempool attached regardless of whether the metadevice is capable of passing integrity metadata. This is a waste of memory. Instead we defer the allocation decision to MD and DM since we know at metadevice creation time whether integrity passthrough is needed or not. Automatic integrity mempool allocation can then be removed from bioset_create() and we make an explicit integrity allocation for the fs_bio_set. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Acked-by: Mike Snitzer <snizer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
- 10 Mar, 2011 2 commits
-
-
Committed by Jens Axboe
With the plugging now being explicitly controlled by the submitter, callers need not pass down unplugging hints to the block layer. If they want to unplug, it's because they manually plugged on their own - in which case, they should just unplug at will. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
Committed by Jens Axboe
Code has been converted over to the new explicit on-stack plugging, and delay users have been converted to use the new API for that. So let's kill off the old plugging along with aops->sync_page(). Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
- 24 Feb, 2011 1 commit
-
-
Committed by NeilBrown
Revert b821eaa5 and f3b99be1. When I wrote the first of these I had a wrong idea about the lifetime of 'struct block_device'. It can disappear at any time that the block device is not open if it falls out of the inode cache. So relying on the 'size' recorded with it to detect when the device size has changed, and so when we need to revalidate, is wrong. Rather, we really do need the 'changed' attribute stored directly in the mddev and set/tested as appropriate. Without this patch, a sequence of: mknod / open / close / unlink (which can cause a block_device to be created and then destroyed) will result in a rescan of the partition table and consequent removal and addition of partitions. Several of these in a row can get udev racing to create and unlink, and other code can get confused. With the patch, the rescan is only performed when needed and so there are no races. This is suitable for any stable kernel from 2.6.35. Reported-by: "Wojcik, Krzysztof" <krzysztof.wojcik@intel.com> Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@kernel.org
-
- 16 Feb, 2011 2 commits
-
-
Committed by NeilBrown
'mdp' devices are md devices with preallocated device numbers for partitions. As such it is possible to mknod and open a partition before opening the whole device. This causes md_probe() to be called with a device number of a partition, which in turn calls mddev_find with such a number. However mddev_find expects the number of a 'whole device' and does the wrong thing with partition numbers. So add code to mddev_find to remove the 'partition' part of a device number and just work with the 'whole device'. This patch addresses https://bugzilla.kernel.org/show_bug.cgi?id=28652 Reported-by: hkmaly@bigfoot.com Signed-off-by: NeilBrown <neilb@suse.de> Cc: <stable@kernel.org>
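
Removing the 'partition' part of a device number amounts to masking off the low bits of the minor number. The sketch below illustrates that masking in user space; `model_whole_device_minor` is a hypothetical name, and the shift of 6 (64 minors per device) is an assumption made for the example rather than something stated in the commit message.

```c
/* Assumed number of minor bits reserved per 'mdp' device for partitions. */
#define MODEL_MDP_MINOR_SHIFT 6

/* Clear the partition bits of a minor, leaving the whole-device minor. */
unsigned int model_whole_device_minor(unsigned int minor)
{
    return minor & ~((1u << MODEL_MDP_MINOR_SHIFT) - 1u);
}
```

Under this assumption, minors 64 through 127 all map back to whole-device minor 64, so a lookup keyed on the masked value finds the same array whichever partition was opened first.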
-
Committed by NeilBrown
If the desired size of an array is set (via sysfs) before the array is active (which is the normal sequence), we currently call set_capacity immediately. This means that a subsequent 'open' (as can be caused by some udev-triggered program) will notice the new size and try to probe for partitions. However as the array isn't quite ready yet the read will fail. Then when the array is ready, as the size doesn't change again we don't try to re-probe. So when setting array size via sysfs, only call set_capacity if the array is already active. Signed-off-by: NeilBrown <neilb@suse.de>
-
- 08 Feb, 2011 1 commit
-
-
Committed by Chris Mason
md_make_request was calling bio_sectors() for part_stat_add after it was calling the make_request function. This is bad because the make_request function can free the bio and because the bi_size field can change around. The fix here was suggested by Jens Axboe. It saves the sector count before the make_request call. I hit this with CONFIG_DEBUG_PAGEALLOC turned on while trying to break his pretty fusionio card. Cc: <stable@kernel.org> Signed-off-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: NeilBrown <neilb@suse.de>
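
The ordering problem can be shown with a small user-space model: read the sector count first, then hand the bio to a handler that is free to destroy it. This is a sketch under stated assumptions, not the kernel code; `struct model_bio`, `model_consume_bio` and `model_submit_and_count` are hypothetical names, and only the `bi_size` field is modelled.

```c
#include <stdlib.h>

/* Minimal stand-in for a bio: only the byte count matters here. */
struct model_bio {
    unsigned int bi_size;
};

/* A handler that consumes the bio, as a make_request function may. */
void model_consume_bio(struct model_bio *bio)
{
    free(bio);
}

/*
 * Read the sector count before handing the bio over. After the call the
 * bio may already be freed or resized, so it must not be touched again.
 */
unsigned int model_submit_and_count(struct model_bio *bio,
                                    void (*make_request)(struct model_bio *))
{
    unsigned int sectors = bio->bi_size >> 9; /* 512-byte sectors */

    make_request(bio);
    return sectors; /* safe: uses the saved copy, not the bio */
}
```

Reversing the two statements in `model_submit_and_count` would read freed memory whenever the handler consumes the bio, which is the class of bug the commit describes.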
-
- 02 Feb, 2011 1 commit
-
-
Committed by NeilBrown
Activating a spare in an array while resync/recovery is already happening can lead to that spare being marked in-sync when it isn't really. So don't allow the 'slot' to be set (thus activating the device) while resync/recovery is happening. Signed-off-by: NeilBrown <neilb@suse.de>
-
- 31 Jan, 2011 4 commits
-
-
Committed by NeilBrown
There is no need to set this to zero at this point. It will be set to zero by remove_and_add_spares or at the start of md_do_sync at the latest. And setting it to zero before MD_RECOVERY_RUNNING is cleared can make a 'zero' appear briefly in the 'sync_completed' sysfs attribute just as resync is finishing. So simply remove this setting to zero. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
remove_and_add_spares is called in two places where the needs really are very different. remove_and_add_spares should not be called on an array which is about to be reshaped as some extra devices might have been manually added and that would remove them. However if the array is 'read-auto', that will currently happen, which is bad. So in the 'ro != 0' case don't call remove_and_add_spares but simply remove the failed devices as the comment suggests is needed. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
This flag is not needed and is used badly. Devices that are included in a native-metadata array are reserved exclusively for that array - and currently have AllReserved set. They all are bd_claimed for the rdev and so cannot be shared. Devices that are included in external-metadata arrays can be shared among multiple arrays - providing there is no overlap. These are bd_claimed for md in general - not for a particular rdev. When changing the amount of a device that is used in an array we need to check for overlap. This currently includes a check on AllReserved. So even without overlap, sharing with an AllReserved device is not allowed. However the bd_claim usage already precludes sharing with these devices, so the test on AllReserved is not needed. And in fact it is wrong. As this is the only use of AllReserved, simply remove all usage and definition of AllReserved. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
If we try to update_raid_disks and it fails, we should put 'delta_disks' back to zero. This is important because some code, such as slot_store, assumes that delta_disks has been validated. Signed-off-by: NeilBrown <neilb@suse.de>
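
The pattern is "set the tentative value, validate, roll back on failure". A minimal user-space sketch follows; the struct, the `model_*` functions and the disk-count cap `MODEL_MAX_DISKS` are all hypothetical, and the validation rule is invented purely so the example has something to reject.

```c
#include <errno.h>

#define MODEL_MAX_DISKS 16 /* hypothetical cap, for illustration only */

struct model_mddev {
    int raid_disks;
    int delta_disks;
};

/* Hypothetical validation step standing in for the personality's check. */
int model_check_reshape(const struct model_mddev *md)
{
    int target = md->raid_disks + md->delta_disks;

    return (target < 1 || target > MODEL_MAX_DISKS) ? -EINVAL : 0;
}

/* Record the requested change, and undo it if validation fails. */
int model_update_raid_disks(struct model_mddev *md, int raid_disks)
{
    int rv;

    md->delta_disks = raid_disks - md->raid_disks;
    rv = model_check_reshape(md);
    if (rv < 0)
        md->delta_disks = 0; /* never leave an unvalidated delta behind */
    return rv;
}
```

With the rollback in place, any later reader of `delta_disks` sees either zero or a value that passed validation.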
-
- 25 Jan, 2011 1 commit
-
-
Committed by Tejun Heo
WQ_RESCUER is now an internal flag and should only be used in the workqueue implementation proper. Use WQ_MEM_RECLAIM instead. This doesn't introduce any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: dm-devel@redhat.com Cc: Neil Brown <neilb@suse.de>
-
- 15 Jan, 2011 1 commit
-
-
Committed by Tejun Heo
Commit e09b457b (block: simplify holder symlink handling) incorrectly assumed that there is only one link at maximum. dm may use multiple links and expects block layer to track reference count for each link, which is different from and unrelated to the exclusive device holder identified by @holder when the device is opened. Remove the single holder assumption and automatic removal of the link and revive the per-link reference count tracking. The code essentially behaves the same as before commit e09b457b sans the unnecessary kobject reference count dancing. While at it, note that this facility should not be used by anyone other than the current users. Sysfs symlinks shouldn't be abused like this and the whole thing doesn't belong in the block layer at all. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Milan Broz <mbroz@redhat.com> Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Neil Brown <neilb@suse.de> Cc: linux-raid@vger.kernel.org Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
- 14 Jan, 2011 12 commits
-
-
Committed by NeilBrown
When a RAID6 is converted to a RAID5, the extra drive should be discarded. However it isn't due to a typo in a comparison. This bug was introduced in commit e93f68a1 in 2.6.35-rc4 and is suitable for any -stable since then. As the extra drive is not removed, the 'degraded' counter is wrong and so the RAID5 will not respond correctly to a subsequent failure. Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
When adding a spare to an active array, we should check the slot number, but allow it to be larger than raid_disks if a reshape is being prepared. Apply the same test when adding a device to an array-under-construction. It already had most of the test in place, but not quite all. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by Rémi Rérolle
The values exported in the sync_completed file are unsigned long, which overflows with very large drives, resulting in wrong values reported. Since sync_completed uses sectors as unit, we'll start getting wrong values with components larger than 2TB. This patch simply replaces the use of unsigned long by unsigned long long. Signed-off-by: Rémi Rérolle <rrerolle@lacie.com> Signed-off-by: NeilBrown <neilb@suse.de>
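
The 2TB threshold follows from simple arithmetic, assuming a platform where `unsigned long` is 32 bits wide: 2 TiB is 2^41 bytes, which is 2^32 sectors of 512 bytes, one more than a 32-bit counter can represent. The sketch below works through that calculation; the function names are hypothetical and `uint32_t` stands in for the 32-bit `unsigned long`.

```c
#include <stdint.h>

/* Convert a size in bytes to 512-byte sectors. */
uint64_t model_bytes_to_sectors(uint64_t bytes)
{
    return bytes >> 9;
}

/* What a 32-bit 'unsigned long' would hold: the value modulo 2^32. */
uint32_t model_as_ulong32(uint64_t sectors)
{
    return (uint32_t)sectors;
}
```

A component of exactly 2 TiB therefore wraps to 0 in the narrow type, while a 64-bit `unsigned long long` keeps the full sector count.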
-
Committed by NeilBrown
The sysfs attributes 'suspend_lo' and 'suspend_hi' describe a region to which reads/writes are suspended so that the underlying data can be manipulated without user-space noticing. Currently the window they describe can only move forwards along the device. However this is an unnecessary restriction which will cause problems with planned developments. So relax this restriction and allow these endpoints to move arbitrarily. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
mddev->curr_resync has artificial values of '1' and '2' which are used by the code which ensures only one resync is happening at a time on any given device. These values are internal and should never be exposed to user-space (except when translated appropriately as in the 'pending' status in /proc/mdstat). Unfortunately they are, as ->curr_resync is assigned to ->curr_resync_completed and that value is directly visible through sysfs. So change the assignments to ->curr_resync_completed to get the same value from elsewhere in a form that doesn't have the magic '1' or '2' values. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by Jonathan Brassow
Allow the metadata to be on a separate device from the data. This doesn't mean the data and metadata will be on separate physical devices - it simply gives device-mapper and userspace tools more flexibility. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by Jonathan Brassow
Add new parameter to 'sync_page_io'. The new parameter allows us to distinguish between metadata and data operations. This becomes important later when we add the ability to use separate devices for data and metadata. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
-
Committed by Jonathan Brassow
When we allow for separate devices for data and metadata in a later patch, we will need to be able to calculate the superblock offset based on more than the bdev. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
-
Committed by NeilBrown
Setting ->recovery to 0 is generally not a good idea as it could clear bits that shouldn't be cleared. In particular, MD_RECOVERY_FROZEN should only be cleared on explicit request from user-space. So when we need to clear things, just clear the bits that need clearing. As there are a few different places which reap a resync process - and some do an incomplete job - factor out the code for doing this from md_check_recovery and call that function instead of open coding part of it. Signed-off-by: NeilBrown <neilb@suse.de> Reported-by: Jonathan Brassow <jbrassow@redhat.com>
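
The difference between zeroing a flags word and clearing selected bits can be sketched as follows. This is an illustration only: the `MODEL_RECOVERY_*` names mirror the flag names in the message, but their bit positions and the exact set of bits cleared are assumptions, and `model_reap_sync` is a hypothetical helper.

```c
/* Bit positions are illustrative, not the kernel's actual values. */
#define MODEL_RECOVERY_RUNNING (1UL << 0)
#define MODEL_RECOVERY_SYNC    (1UL << 1)
#define MODEL_RECOVERY_DONE    (1UL << 2)
#define MODEL_RECOVERY_FROZEN  (1UL << 3)

/* Clear only the bits owned by a finished resync; leave FROZEN untouched. */
unsigned long model_reap_sync(unsigned long recovery)
{
    return recovery & ~(MODEL_RECOVERY_RUNNING |
                        MODEL_RECOVERY_SYNC |
                        MODEL_RECOVERY_DONE);
}
```

Assigning 0 instead would also drop the frozen bit, silently undoing a request that only user-space is supposed to cancel.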
-
Committed by NeilBrown
As md_stop_writes manipulates the sync_thread and calls md_update_sb, it needs to be called with mddev_lock held. In all internal cases it is, but the symbol is exported for dm-raid to call and in that case the lock won't be held. So make an exported version which takes the lock, and an internal version which does not. Signed-off-by: NeilBrown <neilb@suse.de>
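
The split into a lock-taking exported function and a lock-assuming internal one is a common pattern. The sketch below models it in user space with a plain flag in place of a real lock; every name here is hypothetical, and the flag is only a stand-in for mddev_lock, not a working mutual-exclusion primitive.

```c
#include <assert.h>

struct model_mddev {
    int locked;         /* stand-in for mddev_lock being held */
    int writes_stopped;
};

/* Internal version: the caller must already hold the lock. */
void model_stop_writes_locked(struct model_mddev *md)
{
    assert(md->locked); /* documents and checks the precondition */
    md->writes_stopped = 1;
}

/* Exported version: takes the lock, does the work, drops the lock. */
void model_stop_writes(struct model_mddev *md)
{
    md->locked = 1; /* lock acquisition in the real code */
    model_stop_writes_locked(md);
    md->locked = 0; /* lock release */
}
```

Internal callers that already hold the lock use the first function directly; external callers such as dm-raid would go through the second, so the work always runs under the lock.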
-
Committed by NeilBrown
When an md device is in the process of coming on line it is possible for an IO request (typically a partition table probe) to get through before the array is fully initialised, which can cause unexpected behaviour (e.g. a crash). So explicitly record when the array is ready for IO and don't allow IO through until then. There is no possibility for a similar problem when the array is going off-line as there must only be one 'open' at that time, and it is busy off-lining the array and so cannot send IO requests. So no memory barrier is needed in md_stop(). This has been a bug since commit 409c57f3 in 2.6.30 which introduced md_make_request. Before then, each personality would register its own make_request_fn when it was ready. This is suitable for any stable kernel from 2.6.30.y onwards. Cc: <stable@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Reported-by: "Hawrylewicz Czarnowski, Przemyslaw" <przemyslaw.hawrylewicz.czarnowski@intel.com>
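
The "record readiness, refuse IO until then" idea reduces to a guard at the top of the request path. A minimal single-threaded sketch, with hypothetical names and an assumed `-EIO` result for rejected requests (it deliberately ignores the memory-ordering concerns a real implementation has to consider):

```c
#include <errno.h>

struct model_mddev {
    int ready;   /* set once the array is fully initialised */
    int handled; /* number of requests passed to the personality */
};

/* Refuse requests that arrive before the array is ready for IO. */
int model_make_request(struct model_mddev *md)
{
    if (!md->ready)
        return -EIO; /* assumed error: fail rather than touch half-built state */
    md->handled++;
    return 0;
}
```

An early partition-table probe is then failed cleanly instead of reaching code that expects a fully initialised array.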
-
Committed by NeilBrown
Commit 589a594b (2.6.37-rc4) fixed a problem where md_thread would sometimes call the ->run function at a bad time. If an error is detected during array start up after the md_thread has been started, the md_thread is killed. This resulted in the ->run function being called once. However the array may not be in a state in which it is safe to call ->run. However the fix imposed meant that ->run was not called on a timeout. This means that when an array goes idle, bitmap bits do not get cleared promptly. While the array is busy the bits will still be cleared when appropriate so this is not very serious. There is no risk to data. Change the test so that we only avoid calling ->run when the thread is being stopped. This more explicitly addresses the problem situation. This is suitable for 2.6.37-stable and any -stable kernel to which 589a594b was applied. Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
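
The two policies can be compared as pure decision functions. This is a sketch of the behaviour as the message describes it, not of the kernel source; the function names and the two boolean inputs (`woken` for an explicit wakeup, `stopping` for a thread being torn down) are hypothetical.

```c
/* Policy after 589a594b, as described: run only on an explicit wakeup. */
int model_run_if_woken(int woken, int stopping)
{
    (void)stopping; /* not consulted by this policy */
    return woken;
}

/* Policy after this change: run on wakeup or timeout, unless stopping. */
int model_run_unless_stopping(int woken, int stopping)
{
    (void)woken; /* a timeout counts as a reason to run too */
    return !stopping;
}
```

The timeout case (not woken, not stopping) is where the two differ: the earlier policy skips the handler, so bitmap bits linger on an idle array, while the later one runs it. Both skip the handler when the thread is being stopped without a wakeup.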
-
- 12 Jan, 2011 1 commit
-
-
Committed by NeilBrown
Commit 1a855a06 (2.6.37-rc4) fixed a problem where devices were re-added when they shouldn't be but caused a regression in a less common case that means sometimes devices cannot be re-added when they should be. In particular, when re-adding a device to an array without metadata we should always access the device, but after the above commit we didn't. This patch sets the In_sync flag in that case so that the re-add succeeds. This patch is suitable for any -stable kernel to which 1a855a06 was applied. Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
-
- 17 Dec, 2010 1 commit
-
-
Committed by Martin K. Petersen
When stacking devices, a request_queue is not always available. This forced us to have a no_cluster flag in the queue_limits that could be used as a carrier until the request_queue had been set up for a metadevice. There were several problems with that approach. First of all it was up to the stacking device to remember to set the queue flag after stacking had completed. Also, the queue flag and the queue limits had to be kept in sync at all times. We got that wrong, which could lead to us issuing commands that went beyond the max scatterlist limit set by the driver. The proper fix is to avoid having two flags for tracking the same thing. We deprecate QUEUE_FLAG_CLUSTER and use the queue limit directly in the block layer merging functions. The queue_limit 'no_cluster' is turned into 'cluster' to avoid double negatives and to ease stacking. Clustering defaults to being enabled as before. The queue flag logic is removed from the stacking function, and explicitly setting the cluster flag is no longer necessary in DM and MD. Reported-by: Ed Lin <ed.lin@promise.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
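
One way to see why a positive 'cluster' limit eases stacking: with clustering enabled by default, combining limits becomes a single AND, so any bottom device that disables clustering disables it for the whole stack. The following is a user-space sketch under that assumption; `struct model_limits` and `model_stack_limits` are hypothetical and model only this one field.

```c
/* Reduced stand-in for queue_limits: only the clustering flag is modelled. */
struct model_limits {
    unsigned char cluster; /* 1 = segments may be merged (the default) */
};

/* Stack a bottom device's limits onto the top device. */
void model_stack_limits(struct model_limits *top,
                        const struct model_limits *bottom)
{
    top->cluster &= bottom->cluster;
}
```

Because the result lives in the limits themselves, there is no separate queue flag to keep in sync after stacking completes.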
-
- 09 Dec, 2010 3 commits
-
-
Committed by NeilBrown
When we fail to start a raid10 for some reason, we call md_unregister_thread to kill the thread that was created. Unfortunately md_thread() will then make one call into the handler (raid10d) even though md_wakeup_thread has not been called. This is not safe and as md_unregister_thread is called after mddev->private has been set to NULL, it will definitely cause a NULL dereference. So fix this at both ends: md_thread should only call the handler if THREAD_WAKEUP has been set, and raid10 should call md_unregister_thread before setting things to NULL, just like all the other raid modules do. This is applicable to 2.6.35 and later. Cc: stable@kernel.org Reported-by: "Citizen" <citizen_lee@thecus.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
With v0.90 metadata, a hot-spare does not become a full member of the array until recovery is complete. So if we re-add such a device to the array, we know that all of it is as up-to-date as the event count would suggest, and so a bitmap-based recovery is possible. However with v1.x metadata, the hot-spare immediately becomes a full member of the array, but it records how much of the device has been recovered. If the array is stopped and re-assembled, recovery starts from this point. When such a device is hot-added to an array we currently lose the 'how much is recovered' information and incorrectly include it as a full in-sync member (after bitmap-based fixup). This is wrong and unsafe and could corrupt data. So be more careful about setting saved_raid_disk - which is what guides the re-adding of devices back into an array. The new code matches the code in slot_store which does a similar thing, which is encouraging. This is suitable for any -stable kernel. Reported-by: "Dailey, Nate" <Nate.Dailey@stratus.com> Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
As recorded in https://bugzilla.kernel.org/show_bug.cgi?id=24012 it is possible for a flush request through md to hang. This is due to an interaction between the recursion avoidance in generic_make_request, the insistence in md of only having one flush active at a time, and the possibility of dm (or md) submitting two flush requests to a device from the one generic_make_request. If a generic_make_request call into dm causes two flush requests to be queued (as happens if the dm table has two targets - they get one each), these two will be queued inside generic_make_request. Assume they are for the same md device. The first is processed and causes 1 or more flush requests to be sent to lower devices. These get queued within generic_make_request too. Then the second flush to the md device gets handled and it blocks waiting for the first flush to complete. But it won't complete until the two lower-device requests complete, and they haven't even been submitted yet as they are on the generic_make_request queue. The deadlock can be broken by using a separate thread to submit the requests to lower devices. md has such a thread readily available: md_wq. So use it to submit these requests. Reported-by: Giacomo Catenazzi <cate@cateee.net> Tested-by: Giacomo Catenazzi <cate@cateee.net> Signed-off-by: NeilBrown <neilb@suse.de>
-