- 26 7月, 2010 13 次提交
-
-
由 NeilBrown 提交于
1/ use md_unplug in bitmap.c as we will soon be using bitmaps under arrays with no queue attached. 2/ Don't bother plugging the queue when we set a bit in the bitmap. The reason for this was to encourage as many bits as possible to get set before we unplug and write stuff out. However every personality already plugs the queue after bitmap_startwrite either directly (raid1/raid10) or be setting STRIPE_BIT_DELAY which causes the queue to be plugged later (raid5). Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
For dm-raid45 we will want to use bitmaps in dm-targets which don't have entries in sysfs, so cope with the mddev not living in sysfs. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Fixes some whitespace problems Fixed some checkpatch.pl complaints. Replaced kmalloc ... memset(0), with kzalloc Fixed an unlikely memory leak on an error path. Reformatted a number of 'if/else' sets, sometimes replacing goto with an else clause. Removed some old comments and commented-out code. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Also remove remaining accesses to ->queue and ->gendisk when ->queue is NULL (As it is in a DM target). Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
If an array doesn't have a 'queue' then md_do_sync cannot unplug it. In that case it will have a 'plugger', so make that available to the mddev, and use it to unplug the array if needed. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
md/raid5 uses the plugging infrastructure provided by the block layer and 'struct request_queue'. However when we plug raid5 under dm there is no request queue so we cannot use that. So create a similar infrastructure that is much lighter weight and use it for raid5. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
the dm module will need this for dm-raid45. Also only access ->queue->backing_dev_info->congested_fn if ->queue actually exists. It won't in a dm target. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
dm-raid456 does not provide a 'queue' for raid5 to use, so we must make raid5 stop depending on the queue. First: read_ahead dm handles read-ahead adjustment fully in userspace, so simply don't do any readahead adjustments if there is no queue. Also re-arrange code slightly so all the accesses to ->queue are together. Finally, move the blk_queue_merge_bvec function into the 'if' as the ->split_io setting in dm-raid456 has the same effect. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
dm uses scheduled work to raise events to user-space. So allow md device to have work_structs and schedule them on an error. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
export entry points for starting and stopping md arrays. This will be used by a module to make md/raid5 work under dm. Also stop calling md_stop_writes from md_stop, as that won't work well with dm - it will want to call the two separately. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
This functionality will be needed separately in a subsequent patch, so split it into it's own exported function. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
When MD_CHANGE_CLEAN is set we might block in md_write_start. So we should only set it when fairly sure that something will clear it. There are two places where it is set so as to encourage a metadata update to record the progress of resync/recovery. This should only be done if the internal metadata update mechanisms are in use, which can be tested by by inspecting '->persistent'. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
We will shortly allow md devices with no gendisk (they are attached to a dm-target instead). That will cause mdname() to return 'mdX'. There is one place where mdname really needs to be unique: when creating the name for a slab cache. So in that case, if there is no gendisk, you the address of the mddev formatted in HEX to provide a unique name. Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 21 7月, 2010 2 次提交
-
-
由 NeilBrown 提交于
Separate the actual 'change' code from the sysfs interface so that it can eventually be called internally. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
We will want md devices to live as dm targets where sysfs is not visible. So allow md to not connect to sysfs. Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 24 6月, 2010 12 次提交
-
-
由 NeilBrown 提交于
There are few situations where it would make any sense to add a spare when reducing the number of devices in an array, but it is conceivable: A 6 drive RAID6 with two missing devices could be reshaped to a 5 drive RAID6, and a spare could become available just in time for the reshape, but not early enough to have been recovered first. 'freezing' recovery can make this easy to do without any races. However doing such a thing is a bad idea. md will not record the partially-recovered state of the 'spare' and when the reshape finished it will think that the spare is still spare. Easiest way to avoid this confusion is to simply disallow it. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
As the comment says, the tail of this loop only applies to devices that are not fully in sync, so if In_sync was set, we should avoid the rest of the loop. This bug will hardly ever cause an actual problem. The worst it can do is allow an array to be assembled that is dirty and degraded, which is not generally a good idea (without warning the sysadmin first). This will only happen if the array is RAID4 or a RAID5/6 in an intermediate state during a reshape and so has one drive that is all 'parity' - no data - while some other device has failed. This is certainly possible, but not at all common. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
During a recovery of reshape the early part of some devices might be in-sync while the later parts are not. We we know we are looking at an early part it is good to treat that part as in-sync for stripe calculations. This is particularly important for a reshape which suffers device failure. Treating the data as in-sync can mean the difference between data-safety and data-loss. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
When we are reshaping an array, the device failure combinations that cause us to decide that the array as failed are more subtle. In particular, any 'spare' will be fully in-sync in the section of the array that has already been reshaped, thus failures that affect only that section are less critical. So encode this subtlety in a new function and call it as appropriate. The case that showed this problem was a 4 drive RAID5 to 8 drive RAID6 conversion where the last two devices failed. This resulted in: good good good good incomplete good good failed failed while converting a 5-drive RAID6 to 8 drive RAID5 The incomplete device causes the whole array to look bad, bad as it was actually good for the section that had been converted to 8-drives, all the data was actually safe. Reported-by: NTerry Morris <tbmorris@tbmorris.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
When an array is reshaped to have fewer devices, the reshape proceeds from the end of the devices to the beginning. If a device happens to be non-In_sync (which is possible but rare) we would normally update the ->recovery_offset as the reshape progresses. However that would be wrong as the recover_offset records that the early part of the device is in_sync, while in fact it would only be the later part that is in_sync, and in any case the offset number would be measured from the wrong end of the device. Relatedly, if after a reshape a spare is discovered to not be recoverred all the way to the end, not allow spare_active to incorporate it in the array. This becomes relevant in the following sample scenario: A 4 drive RAID5 is converted to a 6 drive RAID6 in a combined operation. The RAID5->RAID6 conversion will cause a 5 drive to be included as a spare, then the 5drive -> 6drive reshape will effectively rebuild that spare as it progresses. The 6th drive is treated as in_sync the whole time as there is never any case that we might consider reading from it, but must not because there is no valid data. If we interrupt this reshape part-way through and reverse it to return to a 5-drive RAID6 (or event a 4-drive RAID5), we don't want to update the recovery_offset - as that would be wrong - and we don't want to include that spare as active in the 5-drive RAID6 when the reversed reshape completed and it will be mostly out-of-sync still. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
The entries in the stripe_cache maintained by raid5 are enlarged when we increased the number of devices in the array, but not shrunk when we reduce the number of devices. So if entries are added after reducing the number of devices, we much ensure to initialise the whole entry, not just the part that is currently relevant. Otherwise if we enlarge the array again, we will reference uninitialised values. As grow_buffers/shrink_buffer now want to use a count that is stored explicity in the raid_conf, they should get it from there rather than being passed it as a parameter. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Maciej Trela 提交于
Only level 5 with layout=PARITY_N can be taken over to raid0 now. Lets allow level 4 either. Signed-off-by: NMaciej Trela <maciej.trela@intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Maciej Trela 提交于
After takeover from raid5/10 -> raid0 mddev->layout is not cleared. Signed-off-by: NMaciej Trela <maciej.trela@intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Maciej Trela 提交于
Use mddev->new_layout in setup_conf. Also use new_chunk, and don't set ->degraded in takeover(). That gets set in run() Signed-off-by: NMaciej Trela <maciej.trela@intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Most array level changes leave the list of devices largely unchanged, possibly causing one at the end to become redundant. However conversions between RAID0 and RAID10 need to renumber all devices (except 0). This renumbering is currently being done in the ->run method when the new personality takes over. However this is too late as the common code in md.c might already have invalidated some of the devices if they had a ->raid_disk number that appeared to high. Moving it into the ->takeover method is too early as the array is still active at that time and wrong ->raid_disk numbers could cause confusion. So add a ->new_raid_disk field to mdk_rdev_s and use it to communicate the new raid_disk number. Now the common code knows exactly which devices need to be renumbered, and which can be invalidated, and can do it all at a convenient time when the array is suspend. It can also update some symlinks in sysfs which previously were not be updated correctly. Reported-by: NMaciej Trela <maciej.trela@intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Prasanna S. Panchamukhi 提交于
Such NULL pointer dereference can occur when the driver was fixing the read errors/bad blocks and the disk was physically removed causing a system crash. This patch check if the rcu_dereference() returns valid rdev before accessing it in fix_read_error(). Cc: stable@kernel.org Signed-off-by: NPrasanna S. Panchamukhi <prasanna.panchamukhi@riverbed.com> Signed-off-by: NRob Becker <rbecker@riverbed.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Commit b821eaa5 broke partition detection for md arrays. The logic was almost right. However if revalidate_disk is called when the device is not yet open, bdev->bd_disk won't be set, so the flush_disk() Call will not set bd_invalidated. So when md_open is called we still need to ensure that ->bd_invalidated gets set. This is easily done with a call to check_disk_size_change in the place where the offending commit removed check_disk_change. At the important times, the size will have changed from 0 to non-zero, so check_disk_size_change will set bd_invalidated. Tested-by: NDuncan <1i5t5.duncan@cox.net> Reported-by: NDuncan <1i5t5.duncan@cox.net> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 28 5月, 2010 1 次提交
-
-
由 Akinobu Mita 提交于
By the previous modification, the cpu notifier can return encapsulate errno value. This converts the cpu notifiers for raid5. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: Neil Brown <neilb@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 22 5月, 2010 2 次提交
-
-
由 Christoph Hellwig 提交于
Now that the last user passing a NULL file pointer is gone we can remove the redundant dentry argument and associated hacks inside vfs_fsynmc_range. The next step will be removig the dentry argument from ->fsync, but given the luck with the last round of method prototype changes I'd rather defer this until after the main merge window. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Eric W. Biederman 提交于
The problem. When implementing a network namespace I need to be able to have multiple network devices with the same name. Currently this is a problem for /sys/class/net/*, /sys/devices/virtual/net/*, and potentially a few other directories of the form /sys/ ... /net/*. What this patch does is to add an additional tag field to the sysfs dirent structure. For directories that should show different contents depending on the context such as /sys/class/net/, and /sys/devices/virtual/net/ this tag field is used to specify the context in which those directories should be visible. Effectively this is the same as creating multiple distinct directories with the same name but internally to sysfs the result is nicer. I am calling the concept of a single directory that looks like multiple directories all at the same path in the filesystem tagged directories. For the networking namespace the set of directories whose contents I need to filter with tags can depend on the presence or absence of hotplug hardware or which modules are currently loaded. Which means I need a simple race free way to setup those directories as tagged. To achieve a reace free design all tagged directories are created and managed by sysfs itself. Users of this interface: - define a type in the sysfs_tag_type enumeration. - call sysfs_register_ns_types with the type and it's operations - sysfs_exit_ns when an individual tag is no longer valid - Implement mount_ns() which returns the ns of the calling process so we can attach it to a sysfs superblock. - Implement ktype.namespace() which returns the ns of a syfs kobject. Everything else is left up to sysfs and the driver layer. For the network namespace mount_ns and namespace() are essentially one line functions, and look to remain that. Tags are currently represented a const void * pointers as that is both generic, prevides enough information for equality comparisons, and is trivial to create for current users, as it is just the existing namespace pointer. The work needed in sysfs is more extensive. At each directory or symlink creating I need to check if the directory it is being created in is a tagged directory and if so generate the appropriate tag to place on the sysfs_dirent. Likewise at each symlink or directory removal I need to check if the sysfs directory it is being removed from is a tagged directory and if so figure out which tag goes along with the name I am deleting. Currently only directories which hold kobjects, and symlinks are supported. There is not enough information in the current file attribute interfaces to give us anything to discriminate on which makes it useless, and there are no potential users which makes it an uninteresting problem to solve. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Signed-off-by: NBenjamin Thery <benjamin.thery@bull.net> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
- 18 5月, 2010 10 次提交
-
-
由 NeilBrown 提交于
Devices which know that they are spares do not really need to have an event count that matches the rest of the array, so there are no data-in-sync issues. It is enough that the uuid matches. So remove the requirement that the event count is up-to-date. We currently still write out and event count on spares, but this allows us in a year or 3 to stop doing that completely. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
When updating the event count for a simple clean <-> dirty transition, we try to avoid updating the spares so they can safely spin-down. As the event_counts across an array must be +/- 1, this means decrementing the event_count on a dirty->clean transition. This is not always safe and we have to avoid the unsafe time. We current do this with a misguided idea about it being safe or not depending on whether the event_count is odd or even. This approach only works reliably in a few common instances, but easily falls down. So instead, simply keep internal state concerning whether it is safe or not, and always assume it is not safe when an array is first assembled. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Gabriele A. Trombetti 提交于
Fix: Raid-6 was not trying to correct a read-error when in singly-degraded state and was instead dropping one more device, going to doubly-degraded state. This patch fixes this behaviour. Tested-by: NJanos Haar <janos.haar@netcenter.hu> Signed-off-by: NGabriele A. Trombetti <g.trombetti.lkrnl1213@logicschema.com> Reported-by: NJanos Haar <janos.haar@netcenter.hu> Signed-off-by: NNeilBrown <neilb@suse.de> Cc: stable@kernel.org
-
由 NeilBrown 提交于
Some time ago we stopped the clean/active metadata updates from being written to a 'spare' device in most cases so that it could spin down and say spun down. Device failure/removal etc are still recorded on spares. However commit 51d5668c broke this 50% of the time, depending on whether the event count is even or odd. The change log entry said: This means that the alignment between 'odd/even' and 'clean/dirty' might take a little longer to attain, how ever the code makes no attempt to create that alignment, so it could take arbitrarily long. So when we find that clean/dirty is not aligned with odd/even, force a second metadata-update immediately. There are already cases where a second metadata-update is needed immediately (e.g. when a device fails during the metadata update). We just piggy-back on that. Reported-by: NJoe Bryant <tenminjoe@yahoo.com> Signed-off-by: NNeilBrown <neilb@suse.de> Cc: stable@kernel.org
-
由 NeilBrown 提交于
read_balance uses a "unsigned long" for a sector number which will get truncated beyond 2TB. This will cause read-balancing to be non-optimal, and can cause data to be read from the 'wrong' branch during a resync. This has a very small chance of returning wrong data. Reported-by: NJordan Russell <jr-list-2010@quo.to> Cc: stable@kernel.org Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
md/linear:mdname: Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
All messages now start md/raid0:md-device-name: Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
All raid10 printk messages now start md/raid10:md-device-name: Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Make sure the array name is included in a uniform way in all printk messages. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Many 'printk' messages from the raid456 module mention 'raid5' even though it may be a 'raid6' or even 'raid4' array. This can cause confusion. Also the actual array name is not always reported and when it is it is not reported consistently. So change all the messages to start: md/raid:%s: where '%s' becomes e.g. md3 to identify the particular array. Signed-off-by: NNeilBrown <neilb@suse.de>
-