- 22 5月, 2010 1 次提交
-
-
由 Eric W. Biederman 提交于
The problem. When implementing a network namespace I need to be able to have multiple network devices with the same name. Currently this is a problem for /sys/class/net/*, /sys/devices/virtual/net/*, and potentially a few other directories of the form /sys/ ... /net/*. What this patch does is to add an additional tag field to the sysfs dirent structure. For directories that should show different contents depending on the context such as /sys/class/net/, and /sys/devices/virtual/net/ this tag field is used to specify the context in which those directories should be visible. Effectively this is the same as creating multiple distinct directories with the same name but internally to sysfs the result is nicer. I am calling the concept of a single directory that looks like multiple directories all at the same path in the filesystem tagged directories. For the networking namespace the set of directories whose contents I need to filter with tags can depend on the presence or absence of hotplug hardware or which modules are currently loaded. Which means I need a simple race free way to setup those directories as tagged. To achieve a reace free design all tagged directories are created and managed by sysfs itself. Users of this interface: - define a type in the sysfs_tag_type enumeration. - call sysfs_register_ns_types with the type and it's operations - sysfs_exit_ns when an individual tag is no longer valid - Implement mount_ns() which returns the ns of the calling process so we can attach it to a sysfs superblock. - Implement ktype.namespace() which returns the ns of a syfs kobject. Everything else is left up to sysfs and the driver layer. For the network namespace mount_ns and namespace() are essentially one line functions, and look to remain that. Tags are currently represented a const void * pointers as that is both generic, prevides enough information for equality comparisons, and is trivial to create for current users, as it is just the existing namespace pointer. The work needed in sysfs is more extensive. At each directory or symlink creating I need to check if the directory it is being created in is a tagged directory and if so generate the appropriate tag to place on the sysfs_dirent. Likewise at each symlink or directory removal I need to check if the sysfs directory it is being removed from is a tagged directory and if so figure out which tag goes along with the name I am deleting. Currently only directories which hold kobjects, and symlinks are supported. There is not enough information in the current file attribute interfaces to give us anything to discriminate on which makes it useless, and there are no potential users which makes it an uninteresting problem to solve. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Signed-off-by: NBenjamin Thery <benjamin.thery@bull.net> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
- 18 5月, 2010 39 次提交
-
-
由 NeilBrown 提交于
Devices which know that they are spares do not really need to have an event count that matches the rest of the array, so there are no data-in-sync issues. It is enough that the uuid matches. So remove the requirement that the event count is up-to-date. We currently still write out and event count on spares, but this allows us in a year or 3 to stop doing that completely. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
When updating the event count for a simple clean <-> dirty transition, we try to avoid updating the spares so they can safely spin-down. As the event_counts across an array must be +/- 1, this means decrementing the event_count on a dirty->clean transition. This is not always safe and we have to avoid the unsafe time. We current do this with a misguided idea about it being safe or not depending on whether the event_count is odd or even. This approach only works reliably in a few common instances, but easily falls down. So instead, simply keep internal state concerning whether it is safe or not, and always assume it is not safe when an array is first assembled. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Gabriele A. Trombetti 提交于
Fix: Raid-6 was not trying to correct a read-error when in singly-degraded state and was instead dropping one more device, going to doubly-degraded state. This patch fixes this behaviour. Tested-by: NJanos Haar <janos.haar@netcenter.hu> Signed-off-by: NGabriele A. Trombetti <g.trombetti.lkrnl1213@logicschema.com> Reported-by: NJanos Haar <janos.haar@netcenter.hu> Signed-off-by: NNeilBrown <neilb@suse.de> Cc: stable@kernel.org
-
由 NeilBrown 提交于
Some time ago we stopped the clean/active metadata updates from being written to a 'spare' device in most cases so that it could spin down and say spun down. Device failure/removal etc are still recorded on spares. However commit 51d5668c broke this 50% of the time, depending on whether the event count is even or odd. The change log entry said: This means that the alignment between 'odd/even' and 'clean/dirty' might take a little longer to attain, how ever the code makes no attempt to create that alignment, so it could take arbitrarily long. So when we find that clean/dirty is not aligned with odd/even, force a second metadata-update immediately. There are already cases where a second metadata-update is needed immediately (e.g. when a device fails during the metadata update). We just piggy-back on that. Reported-by: NJoe Bryant <tenminjoe@yahoo.com> Signed-off-by: NNeilBrown <neilb@suse.de> Cc: stable@kernel.org
-
由 NeilBrown 提交于
read_balance uses a "unsigned long" for a sector number which will get truncated beyond 2TB. This will cause read-balancing to be non-optimal, and can cause data to be read from the 'wrong' branch during a resync. This has a very small chance of returning wrong data. Reported-by: NJordan Russell <jr-list-2010@quo.to> Cc: stable@kernel.org Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
md/linear:mdname: Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
All messages now start md/raid0:md-device-name: Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
All raid10 printk messages now start md/raid10:md-device-name: Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Make sure the array name is included in a uniform way in all printk messages. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Many 'printk' messages from the raid456 module mention 'raid5' even though it may be a 'raid6' or even 'raid4' array. This can cause confusion. Also the actual array name is not always reported and when it is it is not reported consistently. So change all the messages to start: md/raid:%s: where '%s' becomes e.g. md3 to identify the particular array. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
RAID10 has been available for quite a while now and is quite well tested, so we can remove the EXPERIMENTAL designation. Reported-by: NEric MSP Veith <eveith@wwweb-library.net> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Dan Williams 提交于
e.g. allow md to interpret 'echo 4 > md/level' as a request for raid4. Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Dan Williams 提交于
Level modifications change the output of mdstat. The mdmon manager thread is interested in these events for external metadata management. Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Dan Williams 提交于
For consistency allow raid4 to takeover raid0 in addition to raid5 (with a raid4 layout). Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 NeilBrown 提交于
When a raid1 array is configured to support write-behind on some devices, it normally only reads from other devices. If all devices are write-behind (because the rest have failed) it is possible for a read request to be serviced before a behind-write request, which would appear as data corruption. So when forced to read from a WriteMostly device, wait for any write-behind to complete, and don't start any more behind-writes. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
This message seems to suggest the named device is the one on which a read failed, however it is actually the device that the read will be redirected to. So make the message a little clearer. Reported-by: NTim Burgess <ozburgess@gmail.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
This is - unnecessary because mddev_suspend is always followed by a call to ->stop, and each ->stop unregisters the thread, and - a problem as it makes it awkwards to suspend and then resume a device as we will want later. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
This is a simple factorisation that makes mddev_find easier to read. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
We used to pass the personality make_request function direct to the block layer so the first argument had to be a queue. But now we have the intermediary md_make_request so it makes at lot more sense to pass a struct mddev_s. It makes it possible to have an mddev without its own queue too. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
This moves the call to the other side of set_readonly, but that should not be an issue. This encapsulates in 'md_stop' all of the functionality for internally stopping the array, leaving all the interactions with externalities (sysfs, request_queue, gendisk) in do_md_stop. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Using do_md_stop to set an array to read-only is a little confusing. Now most of the common code has been factored out, split md_set_readonly off in to a separate function. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Further refactoring of do_md_stop. This one requires some explanation as it takes code from different places in do_md_stop, so some re-ordering happens. We only get into this part of do_md_stop if there are no active opens of the device, so no writes can be happening and the device must have been flushed. In md_stop_writes we want to stop any internal sources of writes - i.e. resync - and flush out the metadata. The only code that was previously before some of this code is code to clean up the queue, the mddev, the gendisk, or sysfs, all of which is probably better after code that makes active changes (i.e. triggers writes). Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
do_md_stop is large and clunky, so hard to understand. This is a first step of refactoring, pulling two simple sub-functions out. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
As part of relaxing the binding between an mddev and gendisk, we separate do_md_run into two functions. md_run does all the work internal to md do_md_run calls md_run and makes and changes to gendisk that are required. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
We set ->changed to 1 and call check_disk_change at the end of md_open so that bd_invalidated would be set and thus partition rescan would happen appropriately. Now that we call revalidate_disk directly, which sets bd_invalidates, that indirection is no longer needed and can be removed. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Using ->array_sectors rather than get_capacity() is more direct and is a step towards relaxing the tight connection between mddev and gendisk. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
While I generally prefer letting personalities do as much as possible, given that we have a central md_make_request anyway we may as well use it to simplify code. Also this centralises knowledge of ->gendisk which will help later. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Diving through ->queue to find mddev is unnecessarily complex - there is an easier path to finding mddev, so use that. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
This is unlikely to be wanted, but we may as well provide it for completeness. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Maciej Trela 提交于
Level changes can be very significant, so make sure to notify them via sysfs. Signed-off-by: NMaciej Trela <maciej.trela@intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
When metadata is being managed by user-space, md doesn't know what the maximum number of devices allowed in an array is so ->max_disks is 0. In this case we should allow any (+ve) number of disks. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Maciej Trela 提交于
Writing "none" to "../md/dev-xx/slot" removes that device from being an active part of the array, but it didn't set ->raid_disk to -1 to record this fact. Signed-off-by: NMaciej Trela <Maciej.Trela@intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Trela, Maciej 提交于
Signed-off-by: NMaciej Trela <maciej.trela@intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Trela, Maciej 提交于
Signed-off-by: NMaciej Trela <maciej.trela@intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Trela Maciej 提交于
Signed-off-by: NMaciej Trela <maciej.trela@intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
In a subsequent patch we will make it possible to change mddev->raid_disks while a RAID0 or RAID10 array is active. This is part of the process of reshaping such an array. This means that we cannot use this value while processes requests (it is OK to use it during initialisation as we are locked against changes then). Both RAID0 and RAID10 have the same value stored in the private data structure, so use that value instead. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
This was needed when sysfs files could only be 'notified' from process context. Now that we have sys_notify_direct, we can call it directly from an interrupt. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 H Hartley Sweeten 提交于
void pointers do not need to be cast to other pointer types. Signed-off-by: NH Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Paul Clements 提交于
Keep track of the maximum number of concurrent write-behind requests for an md array and exposed this number in sysfs at md/bitmap/max_backlog_used Writing any value to this file will clear it. This allows userspace to be involved in tuning bitmap/backlog. Signed-off-by: NPaul Clements <paul.clements@steeleye.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-