- 07 10月, 2010 1 次提交
-
-
由 Vasiliy Kulikov 提交于
Function read_sb_page may return ERR_PTR(...). Check for it. Signed-off-by: NVasiliy Kulikov <segooon@gmail.com> Cc: Neil Brown <neilb@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 30 8月, 2010 1 次提交
-
-
由 NeilBrown 提交于
MD_CHANGE_CLEAN is used for two different purposes and this leads to confusion. One of the purposes is largely mirrored by MD_CHANGE_PENDING which is not used for anything else, so have MD_CHANGE_PENDING take over that purpose fully. The two purposes are: 1/ tell md_update_sb that an update is needed and that it is just a clean/dirty transition. 2/ tell user-space that an transition from clean to dirty is pending (something wants to write), and tell te kernel (by clearin the flag) that the transition is OK. The first purpose remains wit MD_CHANGE_CLEAN, the second is moved fully to MD_CHANGE_PENDING. This means that various places which conditionally set or cleared MD_CHANGE_CLEAN no longer need to be conditional. Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 26 7月, 2010 7 次提交
-
-
由 NeilBrown 提交于
dm makes this distinction between ->ctr and ->resume, so we need to too. Also get the new bitmap_load to clear out the bitmap first, as this is most consistent with the dm suspend/resume approach Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
This allows md/raid5 to fully work as a dm target. Normally md uses a 'filemap' which contains a list of pages of bits each of which may be written separately. dm-log uses and all-or-nothing approach to writing the log, so when using a dm-log, ->filemap is NULL and the flags normally stored in filemap_attr are stored in ->logattrs instead. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
A bitmap is stored as one page per 2048 bits. If none of the bits are set, the page is not allocated. When bitmap_get_counter finds that a page isn't allocate, it just reports that one bit work of space isn't flagged, rather than reporting that 2048 bits worth of space are unflagged. This can cause searches for flagged bits (e.g. bitmap_close_sync) to do more work than is really necessary. So change bitmap_get_counter (when creating) to report a number of blocks that more accurately reports the range of the device for which no counter currently exists. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
1/ use md_unplug in bitmap.c as we will soon be using bitmaps under arrays with no queue attached. 2/ Don't bother plugging the queue when we set a bit in the bitmap. The reason for this was to encourage as many bits as possible to get set before we unplug and write stuff out. However every personality already plugs the queue after bitmap_startwrite either directly (raid1/raid10) or be setting STRIPE_BIT_DELAY which causes the queue to be plugged later (raid5). Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
For dm-raid45 we will want to use bitmaps in dm-targets which don't have entries in sysfs, so cope with the mddev not living in sysfs. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Fixes some whitespace problems Fixed some checkpatch.pl complaints. Replaced kmalloc ... memset(0), with kzalloc Fixed an unlikely memory leak on an error path. Reformatted a number of 'if/else' sets, sometimes replacing goto with an else clause. Removed some old comments and commented-out code. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
When MD_CHANGE_CLEAN is set we might block in md_write_start. So we should only set it when fairly sure that something will clear it. There are two places where it is set so as to encourage a metadata update to record the progress of resync/recovery. This should only be done if the internal metadata update mechanisms are in use, which can be tested by by inspecting '->persistent'. Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 22 5月, 2010 2 次提交
-
-
由 Christoph Hellwig 提交于
Now that the last user passing a NULL file pointer is gone we can remove the redundant dentry argument and associated hacks inside vfs_fsynmc_range. The next step will be removig the dentry argument from ->fsync, but given the luck with the last round of method prototype changes I'd rather defer this until after the main merge window. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Eric W. Biederman 提交于
The problem. When implementing a network namespace I need to be able to have multiple network devices with the same name. Currently this is a problem for /sys/class/net/*, /sys/devices/virtual/net/*, and potentially a few other directories of the form /sys/ ... /net/*. What this patch does is to add an additional tag field to the sysfs dirent structure. For directories that should show different contents depending on the context such as /sys/class/net/, and /sys/devices/virtual/net/ this tag field is used to specify the context in which those directories should be visible. Effectively this is the same as creating multiple distinct directories with the same name but internally to sysfs the result is nicer. I am calling the concept of a single directory that looks like multiple directories all at the same path in the filesystem tagged directories. For the networking namespace the set of directories whose contents I need to filter with tags can depend on the presence or absence of hotplug hardware or which modules are currently loaded. Which means I need a simple race free way to setup those directories as tagged. To achieve a reace free design all tagged directories are created and managed by sysfs itself. Users of this interface: - define a type in the sysfs_tag_type enumeration. - call sysfs_register_ns_types with the type and it's operations - sysfs_exit_ns when an individual tag is no longer valid - Implement mount_ns() which returns the ns of the calling process so we can attach it to a sysfs superblock. - Implement ktype.namespace() which returns the ns of a syfs kobject. Everything else is left up to sysfs and the driver layer. For the network namespace mount_ns and namespace() are essentially one line functions, and look to remain that. Tags are currently represented a const void * pointers as that is both generic, prevides enough information for equality comparisons, and is trivial to create for current users, as it is just the existing namespace pointer. The work needed in sysfs is more extensive. At each directory or symlink creating I need to check if the directory it is being created in is a tagged directory and if so generate the appropriate tag to place on the sysfs_dirent. Likewise at each symlink or directory removal I need to check if the sysfs directory it is being removed from is a tagged directory and if so figure out which tag goes along with the name I am deleting. Currently only directories which hold kobjects, and symlinks are supported. There is not enough information in the current file attribute interfaces to give us anything to discriminate on which makes it useless, and there are no potential users which makes it an uninteresting problem to solve. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Signed-off-by: NBenjamin Thery <benjamin.thery@bull.net> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
- 18 5月, 2010 3 次提交
-
-
由 NeilBrown 提交于
When a raid1 array is configured to support write-behind on some devices, it normally only reads from other devices. If all devices are write-behind (because the rest have failed) it is possible for a read request to be serviced before a behind-write request, which would appear as data corruption. So when forced to read from a WriteMostly device, wait for any write-behind to complete, and don't start any more behind-writes. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 H Hartley Sweeten 提交于
void pointers do not need to be cast to other pointer types. Signed-off-by: NH Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Paul Clements 提交于
Keep track of the maximum number of concurrent write-behind requests for an md array and exposed this number in sysfs at md/bitmap/max_backlog_used Writing any value to this file will clear it. This allows userspace to be involved in tuning bitmap/backlog. Signed-off-by: NPaul Clements <paul.clements@steeleye.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 14 12月, 2009 10 次提交
-
-
由 NeilBrown 提交于
There is a sysfs file which allows bits in the write-intent bitmap to be explicit set - indicating that the block is thought to be 'dirty'. When this happens we should really set recovery_cp backwards to include the block to reflect this dirtiness. In particular, a 'resync' process will refuse to start if recovery_cp is beyond the end of the array, so this is needed to allow a resync to be triggered. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
In this case, the metadata needs to not be in the same sector as the bitmap. md will not read/write any bitmap metadata. Config must be done via sysfs and when a recovery makes the array non-degraded again, writing 'true' to 'bitmap/can_clear' will allow bits in the bitmap to be cleared again. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Setting daemon_lastrun really has nothing to do with reading the bitmap superblock, it just happens to be needed at the same time. bitmap_read_sb is about to become options, so move that code out to after the call to bitmap_read_sb. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
A new attribute directory 'bitmap' in 'md' is created which contains files for configuring the bitmap. 'location' identifies where the bitmap is, either 'none', or 'file' or 'sector offset from metadata'. Writing 'location' can create or remove a bitmap. Adding a 'file' bitmap this way is not yet supported. 'chunksize' and 'time_base' must be set before 'location' can be set. 'chunksize' can be set before creating a bitmap, but is currently always over-ridden by the bitmap superblock. 'time_base' and 'backlog' can be updated at any time. Signed-off-by: NNeilBrown <neilb@suse.de> Reviewed-by: NAndre Noll <maan@systemlinux.org>
-
由 NeilBrown 提交于
For md arrays were metadata is managed externally, the kernel does not know about a superblock so the superblock offset is 0. If we want to have a write-intent-bitmap near the end of the devices of such an array, we should support sector_t sized offset. We need offset be possibly negative for when the bitmap is before the metadata, so use loff_t instead. Also add sanity check that bitmap does not overlap with data. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
As bitmap_create and bitmap_destroy already set thread->timeout as appropriate, there is no need to do it in raid10_quiesce. There is a possible need to wake the thread after the timeout has been set low, but it is better to do that where the timeout is actually set low, in bitmap_create. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
This removes a lot of multiplications by HZ. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
... and into bitmap_info. These are all configuration parameters that need to be set before the bitmap is created. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
In preparation for making bitmap fields configurable via sysfs, start tidying up by making a single structure to contain the configuration fields. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
A write intent bitmap can be removed from an array while the array is active. When this happens, all IO is suspended and flushed before the bitmap is removed. However it is possible that bitmap_daemon_work is still running to clear old bits from the bitmap. If it is, it can dereference the bitmap after it has been freed. So introduce a new mutex to protect bitmap_daemon_work and get it before destroying a bitmap. This is suitable for any current -stable kernel. Signed-off-by: NNeilBrown <neilb@suse.de> Cc: stable@kernel.org
-
- 16 10月, 2009 1 次提交
-
-
由 NeilBrown 提交于
and replace with vfs_fsync which is much neater (but wasn't exported, or even in existence at the time the code was written). Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 23 9月, 2009 1 次提交
-
-
由 NeilBrown 提交于
There was a real error here on a failure path where we incorrectly call rcu_read_unlock. Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 26 5月, 2009 1 次提交
-
-
由 NeilBrown 提交于
The code for checking which bits in the bitmap can be cleared has 2 problems: 1/ it repeatedly takes and drops a spinlock, where it would make more sense to just hold on to it most of the time. 2/ it doesn't make use of some opportunities to skip large sections of the bitmap This patch fixes those. It will only affect CPU consumption, not correctness. Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 23 5月, 2009 1 次提交
-
-
由 Martin K. Petersen 提交于
Until now we have had a 1:1 mapping between storage device physical block size and the logical block sized used when addressing the device. With SATA 4KB drives coming out that will no longer be the case. The sector size will be 4KB but the logical block size will remain 512-bytes. Hence we need to distinguish between the physical block size and the logical ditto. This patch renames hardsect_size to logical_block_size. Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 07 5月, 2009 2 次提交
-
-
由 NeilBrown 提交于
If a write intent bitmap covers more than 2TB, we sometimes work with values beyond 32bit, so these need to be sector_t. This patches add the required casts to some unsigned longs that are being shifted up. This will affect any raid10 larger than 2TB, or any raid1/4/5/6 with member devices that are larger than 2TB. Signed-off-by: NNeilBrown <neilb@suse.de> Reported-by: N"Mario 'BitKoenig' Holbe" <Mario.Holbe@TU-Ilmenau.DE> Cc: stable@kernel.org
-
由 NeilBrown 提交于
When md is loading a bitmap which it knows is out of date, it fills each page with 1s and writes it back out again. However the write_page call makes used of bitmap->file_pages and bitmap->last_page_size which haven't been set correctly yet. So this can sometimes fail. Move the setting of file_pages and last_page_size to before the call to write_page. This bug can cause the assembly on an array to fail, thus making the data inaccessible. Hence I think it is a suitable candidate for -stable. Cc: stable@kernel.org Reported-by: NVojtech Pavlik <vojtech@suse.cz> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 20 4月, 2009 1 次提交
-
-
由 NeilBrown 提交于
.. and other arrays with components larger than 2 terabytes. We use a "long" rather than a "sector_t" in part of the bitmap size calculations, which is sad. Reported-by: N"Mario 'BitKoenig' Holbe" <Mario.Holbe@TU-Ilmenau.DE> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 14 4月, 2009 1 次提交
-
-
由 NeilBrown 提交于
The sync_completed file reports how much of a resync (or recovery or reshape) has been completed. However due to the possibility of out-of-order completion of writes, it is not certain to be accurate. We have an internal value - mddev->curr_resync_completed - which is an accurate value (though it might not always be quite so uptodate). So: - make curr_resync_completed be uptodate a little more often, particularly when raid5 reshape updates status in the metadata - report curr_resync_completed in the sysfs file - allow poll/select to report all updates to md/sync_completed. This makes sync_completed completed usable by any external metadata handler that wants to record this status information in its metadata. Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 31 3月, 2009 8 次提交
-
-
由 Andre Noll 提交于
This patch renames the "size" field of struct mddev_s to "dev_sectors" and stores the number of 512-byte sectors instead of the number of 1K-blocks in it. All users of that field, including raid levels 1,4-6,10, are adjusted accordingly. This simplifies the code a bit because it allows to get rid of a couple of divisions/multiplications by two. In order to make checkpatch happy, some minor coding style issues have also been addressed. In particular, size_store() now uses strict_strtoull() instead of simple_strtoull(). Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Version 1.x metadata has the ability to record the status of a partially completed drive recovery. However we only update that record on a clean shutdown. It would be nice to update it on unclean shutdowns too, particularly when using a bitmap that removes much to the 'sync' effort after an unclean shutdown. One complication with checkpointing recovery is that we only know where we are up to in terms of IO requests started, not which ones have completed. And we need to know what has completed to record how much is recovered. So occasionally pause the recovery until all submitted requests are completed, then update the record of where we are up to. When we have a bitmap, we already do that pause occasionally to keep the bitmap up-to-date. So enhance that code to record the recovery offset and schedule a superblock update. And when there is no bitmap, just pause 16 times during the resync to do a checkpoint. '16' is a fairly arbitrary number. But we don't really have any good way to judge how often is acceptable, and it seems like a reasonable number for now. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
It really is nicer to keep related code together.. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
This makes the includes more explicit, and is preparation for moving md_k.h to drivers/md/md.h Remove include/raid/md.h as its only remaining use was to #include other files. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Christoph Hellwig 提交于
Move the headers with the local structures for the disciplines and bitmap.h into drivers/md/ so that they are more easily grepable for hacking and not far away. md.h is left where it is for now as there are some uses from the outside. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
When we add some spares to an array and start recovery, and we have a bitmap which is stored 'internally' on all devices, we call bitmap_write_all to make sure the bitmap is correct on the new device(s). However that doesn't work as write_sb_page only writes to 'In_sync' devices, and devices undergoing recovery are not 'In_sync' until recovery finishes. So extend write_sb_page (actually next_active_rdev) to include devices that are under recovery. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
It is safe to clear a bit from the write-intent bitmap for a raid1 if we know the data has been written to all devices, which is what the current test does. But it is not always safe to update the 'events_cleared' counter in that case. This is because one request could complete successfully after some other request has partially failed. So simply disable the clearing and updating of events_cleared whenever the array is degraded. This might end up not clearing some bits that could safely be cleared, but it is safest approach. Note that the bug fixed here did not risk corrupting data by letting the array get out-of-sync. Rather it meant that when a device is removed and re-added to the array, it might incorrectly require a full recovery rather than just recovering based on the bitmap. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
md currently insists that the chunk size used for write-intent bitmaps (the amount of data that corresponds to one chunk) be at least one page. The reason for this restriction is lost in the mists of time, but a review of the code (and a vague memory) suggests that the only problem would be related to resync. Resync tries very hard to work in multiples of a page, but also needs to sync with units of a bitmap_chunk too. This connection comes out in the bitmap_start_sync call. So change bitmap_start_sync to always work in multiples of a page. If the bitmap chunk size is less that one page, we flag multiple chunks as 'syncing' and generally make them all appear to the resync routines like one chunk. All other code either already works with data ranges that could span multiple chunks, or explicitly only cares about a single chunk. Signed-off-by: NNeil Brown <neilb@suse.de>
-