- 14 Dec, 2009 4 commits
-
-
Committed by NeilBrown
Suggested by Oren Held <orenhe@il.ibm.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
... and into bitmap_info. These are all configuration parameters that need to be set before the bitmap is created. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
A 2-device raid5 array can now be converted to raid1. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
This will allow us to stop writeout to portions of the array while they are resynced by someone else, e.g. another node in a cluster. Signed-off-by: NeilBrown <neilb@suse.de>
-
- 01 Dec, 2009 1 commit
-
-
Committed by NeilBrown
commit 4706b349 was a forward port of a fix that was needed for SLES10. But in fact it is not needed in mainline because the earlier commit dd00a99e fixes the same problem in a better way. Further, this commit introduces a bug in the way it interacts with the automatic read-error-correction. If, after a read error is successfully corrected, the same disk is chosen for the re-read, the re-read won't be attempted but an error will be returned instead. After reverting that commit, there is the possibility that a read error on a read-only array (where read errors cannot be corrected as that requires a write) will repeatedly read the same device and continue to get an error. So in the "Array is readonly" case, fail the drive immediately on a read error. Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@kernel.org
-
- 16 Oct, 2009 2 commits
-
-
Committed by NeilBrown
Both raid1 and raid10 create a mempool during startup. If the 'alloc' function for this mempool fails, unplug_slaves is called. If that happens when the pool is being initialised, unplug_slaves will try to use the 'conf' structure that isn't filled in yet, and badness will happen. So ensure that unplug_slaves doesn't get called unless we know that the conf structure is fully initialised. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
During 'check' of a raid1 or raid10 it is possible for the management thread to spend a lot of time running 'memcmp' on blocks from different devices, so make sure the thread has a chance to schedule. raid5d already has a cond_resched (in process_stripe). Reported-By: Lee Howard <faxguy@howardsilvan.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
- 23 Sep, 2009 3 commits
-
-
Committed by Dmitry Monakhov
Jens recently changed the bio_rw_flagged() logic in commit 1f98a13f: it now returns bool instead of int. This broke the raid1/raid10 RW bit manipulation logic. One visible result is a BUG_ON triggering due to an empty barrier at scsi_lib.c:1108, in scsi_setup_fs_cmnd(). Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: NeilBrown <neilb@suse.de>
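To illustrate why a helper that switches from returning the masked bit (int) to returning bool can break callers that OR the result straight into a flags word, here is a minimal userspace sketch. All names and bit positions (`DEMO_RW_WRITE`, `DEMO_RW_SYNCIO`, the `demo_*` functions) are hypothetical and chosen for illustration; they are not the kernel's definitions and this is not the actual fix from the commit.

```c
#include <stdbool.h>

/* Hypothetical flag layout: bit 0 is the write bit, bit 3 a sync hint. */
enum { DEMO_RW_WRITE = 0, DEMO_RW_SYNCIO = 3 };

/* Old-style helper: returns the masked bit itself (0 or 1 << bit). */
int demo_flag_as_int(unsigned long rw, int bit)
{
	return (int)(rw & (1UL << bit));
}

/* New-style helper: returns only true or false. */
bool demo_flag_as_bool(unsigned long rw, int bit)
{
	return (rw & (1UL << bit)) != 0;
}

/* Broken caller: ORs the bool in directly, so "true" lands on bit 0. */
unsigned long demo_build_rw_broken(unsigned long rw)
{
	bool do_sync = demo_flag_as_bool(rw, DEMO_RW_SYNCIO);

	return (1UL << DEMO_RW_WRITE) | do_sync;
}

/* Repaired caller: shifts the bool back to the flag's bit position. */
unsigned long demo_build_rw_fixed(unsigned long rw)
{
	unsigned long do_sync = demo_flag_as_bool(rw, DEMO_RW_SYNCIO);

	return (1UL << DEMO_RW_WRITE) | (do_sync << DEMO_RW_SYNCIO);
}
```

With the sync bit set in the input, the broken variant silently drops it from the rebuilt flags word, while the repaired variant preserves it.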
-
Committed by NeilBrown
This should stop writeback from coming when the device is temporarily suspended. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
The management threads for raid4,5,6 arrays are all called mdX_raid5, independent of the actual raid level, which is wrong and can be confusing. So change md_register_thread to use the name from the personality unless an alternate name (like 'resync' or 'reshape') is given. This is simpler and more correct. Cc: Jinzc <zhenchengjin@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
- 11 Sep, 2009 1 commit
-
-
Committed by Jens Axboe
Get rid of any functions that test for these bits and make callers use bio_rw_flagged() directly. Then it is at least directly apparent what variable and flag they check. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
- 03 Aug, 2009 2 commits
-
-
Committed by NeilBrown
As revalidate_disk calls check_disk_size_change, it will cause any capacity change of a gendisk to be propagated to the blockdev inode. So use that instead of mucking about with locks and i_size_write. Also add a call to revalidate_disk in do_md_run and a few other places where the gendisk capacity is changed. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by Andre Noll
This patch replaces md_integrity_check() by two new public functions: md_integrity_register() and md_integrity_add_rdev() which are both personality-independent. md_integrity_register() is called from the ->run and ->hot_remove methods of all personalities that support data integrity. The function iterates over the component devices of the array and determines if all active devices are integrity capable and if their profiles match. If this is the case, the common profile is registered for the mddev via blk_integrity_register(). The second new function, md_integrity_add_rdev() is called from the ->hot_add_disk methods, i.e. whenever a new device is being added to a raid array. If the new device does not support data integrity, or has a profile different from the one already registered, data integrity for the mddev is disabled. For raid0 and linear, only the call to md_integrity_register() from the ->run method is necessary. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
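The "all active devices are integrity capable and their profiles match" check can be sketched in userspace as follows. This is only an illustration of the decision logic under simplifying assumptions: `struct demo_member`, `demo_common_profile()`, and the representation of a profile as a string are hypothetical and do not mirror the kernel's data structures.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical component device: profile is NULL if integrity is unsupported. */
struct demo_member {
	int active;
	const char *profile;
};

/*
 * Return the profile shared by every active member, or NULL when any
 * active member lacks integrity support or the profiles differ.
 * Inactive members are ignored.
 */
const char *demo_common_profile(const struct demo_member *devs, size_t n)
{
	const char *common = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!devs[i].active)
			continue;
		if (devs[i].profile == NULL)
			return NULL;            /* not integrity capable */
		if (common == NULL)
			common = devs[i].profile;
		else if (strcmp(common, devs[i].profile) != 0)
			return NULL;            /* profile mismatch */
	}
	return common;
}
```

A caller would register the returned profile when it is non-NULL and leave integrity disabled otherwise.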
-
- 01 Jul, 2009 1 commit
-
-
Committed by Martin K. Petersen
Switch MD over to the new disk_stack_limits() function which checks for alignment and adjusts preferred I/O sizes when stacking. Also indicate preferred I/O sizes where applicable. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
-
- 18 Jun, 2009 3 commits
-
-
Committed by Andre Noll
Currently, the md layer checks in analyze_sbs() if the raid level supports reconstruction (mddev->level >= 1) and if reconstruction is in progress (mddev->recovery_cp != MaxSector). Move that printk into the personality code of those raid levels that care (levels 1, 4, 5, 6, 10). Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by Andre Noll
A straightforward conversion which gets rid of some multiplications/divisions/shifts. The patch also introduces a couple of new ones, most of which are due to conf->chunk_size still being represented in bytes. This will be cleaned up in subsequent patches. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by Andre Noll
This patch renames the chunk_size field to chunk_sectors with the implied change of semantics. Since is_power_of_2(chunk_size) = is_power_of_2(chunk_sectors << 9) = is_power_of_2(chunk_sectors) these bits don't need an adjustment for the shift. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
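The identity used above (shifting left by 9 does not change whether a value is a power of two) can be checked with a small userspace sketch. `demo_is_power_of_2()` is a stand-in written for this example using the usual single-bit test; the sketch assumes the shifted value does not overflow `unsigned long`.

```c
#include <stdbool.h>

/* True when exactly one bit is set; zero is not a power of two. */
bool demo_is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Compare the test on a chunk size in sectors with the same test on
 * its size in bytes (sectors << 9). A shift only moves the set bits,
 * so the number of set bits, and hence the answer, is unchanged as
 * long as nothing is shifted out of the word.
 */
bool demo_shift_preserves_pow2(unsigned long chunk_sectors)
{
	unsigned long chunk_bytes = chunk_sectors << 9;

	return demo_is_power_of_2(chunk_bytes) ==
	       demo_is_power_of_2(chunk_sectors);
}
```

For example, 128 sectors (65536 bytes) passes the test in both units, while 96 sectors (49152 bytes) fails it in both.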
-
- 16 Jun, 2009 1 commit
-
-
Committed by NeilBrown
Having a macro just to cast a void* isn't really helpful. I would much rather see that we are simply de-referencing ->private, than have to know what the macro does. So open code the macro everywhere and remove the pointless cast. Signed-off-by: NeilBrown <neilb@suse.de>
-
- 23 May, 2009 1 commit
-
-
Committed by Martin K. Petersen
Convert all external users of queue limits to using wrapper functions instead of poking the request queue variables directly. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
- 15 Apr, 2009 1 commit
-
-
Committed by Christoph Hellwig
It's used by DM and MD and generally useful, so move the bio list helpers into bio.h. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
- 07 Apr, 2009 1 commit
-
-
Committed by Alexander Beregalov
Fix this build error: drivers/md/raid1.c: In function 'raid1_congested': drivers/md/raid1.c:589: error: 'BDI_write_congested' undeclared. BDI_write_congested was changed in commit 1faa16d2 ("block: change the request allocation/congestion logic to be sync/async based"). Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com> Cc: Neil Brown <neilb@suse.de> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 06 Apr, 2009 1 commit
-
-
Committed by NeilBrown
Since commit d3f76110 newly allocated bvecs aren't initialised to NULL, so we have to be more careful about freeing a bio which only managed to get a few pages allocated to it. Otherwise the resync process crashes. This patch is appropriate for 2.6.29-stable. Cc: stable@kernel.org Cc: "Jens Axboe" <jens.axboe@oracle.com> Reported-by: Gabriele Tozzi <gabriele@tozzi.eu> Signed-off-by: NeilBrown <neilb@suse.de>
-
- 31 Mar, 2009 8 commits
-
-
Committed by Dan Williams
Allow userspace to set the size of the array according to the following semantics:
1/ size must be <= the size returned by mddev->pers->size(mddev, 0, 0)
   a) If size is set before the array is running, do_md_run will fail if size is greater than the default size
   b) A reshape attempt that reduces the default size to less than the set array size should be blocked
2/ once userspace sets the size the kernel will not change it
3/ writing 'default' to this attribute returns control of the size to the kernel and reverts to the size reported by the personality
Also, convert locations that need to know the default size from directly reading ->array_sectors to <pers>_size. Resync/reshape operations always follow the default size. Finally, fix up other locations that read a number of 1k-blocks from userspace to use strict_blocks_to_sectors(), which checks for unsigned long long to sector_t overflow and blocks to sectors overflow. Reviewed-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
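The two overflow checks mentioned for the 1k-blocks to sectors conversion can be sketched in userspace like this. It is an illustration only, not the kernel function: `demo_blocks_to_sectors()` and `demo_sector_t` are hypothetical names, the sector type is deliberately made 32 bits wide so the narrowing check is easy to trigger, and parsing uses the C library's `strtoull()`.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical narrow sector type, so the truncation check is visible. */
typedef uint32_t demo_sector_t;

/* Parse a decimal count of 1K blocks and convert it to 512-byte sectors. */
int demo_blocks_to_sectors(const char *buf, demo_sector_t *sectors)
{
	char *end;
	unsigned long long blocks;
	demo_sector_t narrowed;

	/* Be strict: digits only, no sign, no whitespace, no trailing junk. */
	if (buf[0] < '0' || buf[0] > '9')
		return -EINVAL;
	errno = 0;
	blocks = strtoull(buf, &end, 10);
	if (errno != 0 || *end != '\0')
		return -EINVAL;

	/* Top bit set: blocks * 2 would overflow unsigned long long. */
	if (blocks & (1ULL << (8 * sizeof(blocks) - 1)))
		return -EINVAL;

	/* Narrowing check: the doubled value must survive the cast. */
	narrowed = (demo_sector_t)(blocks * 2);
	if (narrowed != blocks * 2)
		return -EINVAL;

	*sectors = narrowed;
	return 0;
}
```

With a 32-bit sector type, "1024" converts to 2048 sectors, while "2147483648" is rejected because doubling it no longer fits.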
-
Committed by Dan Williams
Get personalities out of the business of directly modifying ->array_sectors. Lays groundwork to introduce policy on when ->array_sectors can be modified. Reviewed-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
Committed by Dan Williams
In preparation for giving userspace control over ->array_sectors we need to be able to retrieve the 'default' size, and the 'anticipated' size when a reshape is requested. For personalities that do not reshape, emit a warning if anything but the default size is requested. In the raid5 case we need to update ->previous_raid_disks to make the new 'default' size available. Reviewed-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
Committed by NeilBrown
To be able to change the 'level' of an md/raid array, we need to suspend the device so that no requests are active, then move some pointers around etc. The code already keeps counts of active requests and the ->quiesce function can be used to wait until those counts hit zero. However the quiesce function blocks new requests once they are already 'inside' the personality module, and that is too late if we want to replace the personality modules. So make all md requests come in through a common md_make_request function that keeps track of how many requests have entered the modules but may not yet be on the internal reference counts. Allow md_make_request to be blocked when we want to suspend the device, and make it possible to wait for all those in-transit requests to be added to internal lists so that ->quiesce can wait for them. There is still a problem that when a request completes, we drop the ref count inside the personality code so there is a short time between when the refcount hits zero, and when the personality code is no longer being used. The personality code never blocks (schedule or spinlock) between dropping the refcount and exiting the routine, so this should be safe (as put_module calls synchronize_sched() before unmapping the module code). Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by Andre Noll
This patch renames the "size" field of struct mddev_s to "dev_sectors" and stores the number of 512-byte sectors instead of the number of 1K-blocks in it. All users of that field, including raid levels 1,4-6,10, are adjusted accordingly. This simplifies the code a bit because it allows getting rid of a couple of divisions/multiplications by two. In order to make checkpatch happy, some minor coding style issues have also been addressed. In particular, size_store() now uses strict_strtoull() instead of simple_strtoull(). Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
It really is nicer to keep related code together. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown
This makes the includes more explicit, and is preparation for moving md_k.h to drivers/md/md.h. Remove include/raid/md.h as its only remaining use was to #include other files. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by Christoph Hellwig
Move the headers with the local structures for the disciplines and bitmap.h into drivers/md/ so that they are more easily grepable for hacking and not far away. md.h is left where it is for now as there are some uses from the outside. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: NeilBrown <neilb@suse.de>
-
- 25 Feb, 2009 1 commit
-
-
Committed by NeilBrown
There has been a race in raid10 and raid1 for a long time which has only recently started showing up due to a scheduler change. When a sync_read request finishes, as soon as reschedule_retry is called, another thread can mark the resync request as having completed, so md_do_sync can finish, ->stop can be called, and ->conf can be freed. So using conf after reschedule_retry is not safe. Similarly, when finishing a sync_write, calling md_done_sync must be the last thing we do, as it allows a chain of events which will free conf and other data structures. The first of these requires action in raid10.c. The second requires action in raid1.c and raid10.c. Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
-
- 06 Feb, 2009 1 commit
-
-
Committed by NeilBrown
If a raid1 only has a single working device and gets a read error, we choose to simply return that error up to the filesystem (or whatever) rather than failing the whole array. However the code doesn't quite do that. We attempt a read_balance which allocates the same drive, so we retry the read indefinitely. Instead: if read_balance in the error case chooses the same drive that just failed, treat it as a failure and don't retry. Signed-off-by: NeilBrown <neilb@suse.de>
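The retry rule described above can be sketched as a small userspace model. Everything here is hypothetical (`struct demo_mirror_set`, `demo_read_balance()`, `demo_handle_read_error()`), and the model additionally assumes that a failing drive is marked unusable whenever at least one other working drive remains; that assumption is mine, made so the single-drive case stands out.

```c
/* Hypothetical, simplified mirror set; not the kernel's conf structure. */
struct demo_mirror_set {
	int nr_disks;
	int *working;   /* working[i] != 0 if disk i is usable */
};

/* Stand-in for read balancing: pick the first usable disk, or -1. */
int demo_read_balance(const struct demo_mirror_set *set)
{
	for (int i = 0; i < set->nr_disks; i++)
		if (set->working[i])
			return i;
	return -1;
}

/*
 * Decide where to retry a failed read. Returns the disk to retry on,
 * or -1 when the error should be returned to the caller instead.
 */
int demo_handle_read_error(struct demo_mirror_set *set, int failed_disk)
{
	int working = 0;
	int disk;

	for (int i = 0; i < set->nr_disks; i++)
		working += set->working[i] != 0;

	/* Assumed policy: never fail the last working drive. */
	if (working > 1)
		set->working[failed_disk] = 0;

	disk = demo_read_balance(set);
	if (disk == failed_disk)
		return -1;      /* same drive chosen again: give up, no retry */
	return disk;
}
```

The key line is the comparison against `failed_disk`: without it, a set with one working drive would keep selecting that drive and retry forever.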
-
- 09 Jan, 2009 2 commits
-
-
Committed by NeilBrown
If a raid1 has only one working drive and it has a sector which gives an error on read, then an attempt to recover onto a spare will fail, but as the single remaining drive is not removed from the array, the recovery will be immediately re-attempted, resulting in an infinite recovery loop. So detect this situation and don't retry recovery once an error on the lone remaining drive is detected. Allow recovery to be retried once every time a spare is added in case the problem wasn't actually a media error. Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by Cheng Renquan
The rdev_for_each macro defined in <linux/raid/md_k.h> is identical to list_for_each_entry_safe from <linux/list.h>; it should be defined in terms of list_for_each_entry_safe instead of reinventing the wheel. But some callers don't really need the safe version, and a plain list_for_each_entry is enough, which saves a temporary variable (tmp) in every function that used rdev_for_each. In this patch, most rdev_for_each loops are replaced by list_for_each_entry, saving many tmp variables; the safe version is used only in the remaining places that call list_del to delete an entry. Signed-off-by: Cheng Renquan <crquan@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
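The difference between the plain and the "safe" iteration can be sketched with a tiny userspace intrusive list. The `demo_*` macros and types below are simplified stand-ins written for this example (they take the entry type as an explicit argument), not the definitions from <linux/list.h>.

```c
#include <stddef.h>

struct demo_list_head {
	struct demo_list_head *next, *prev;
};

/* Recover the containing structure from a pointer to its list member. */
#define demo_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Plain iteration: fine as long as the body never unlinks 'pos'. */
#define demo_for_each_entry(pos, head, type, member)              \
	for (pos = demo_entry((head)->next, type, member);            \
	     &pos->member != (head);                                  \
	     pos = demo_entry(pos->member.next, type, member))

/* Safe iteration: 'tmp' caches the next entry, so 'pos' may be unlinked. */
#define demo_for_each_entry_safe(pos, tmp, head, type, member)    \
	for (pos = demo_entry((head)->next, type, member),            \
	     tmp = demo_entry(pos->member.next, type, member);        \
	     &pos->member != (head);                                  \
	     pos = tmp, tmp = demo_entry(tmp->member.next, type, member))

struct demo_rdev {
	int nr;
	struct demo_list_head same_set;
};

void demo_list_init(struct demo_list_head *h)
{
	h->next = h->prev = h;
}

void demo_list_add_tail(struct demo_list_head *n, struct demo_list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

void demo_list_del(struct demo_list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->next = e->prev = e;  /* self-link, so a stale walk cannot escape */
}

/* Read-only walk: no temporary variable needed. */
int demo_count(struct demo_list_head *head)
{
	struct demo_rdev *rdev;
	int n = 0;

	demo_for_each_entry(rdev, head, struct demo_rdev, same_set)
		n++;
	return n;
}

/* Walk that deletes entries: needs the safe variant and its 'tmp'. */
void demo_remove_odd(struct demo_list_head *head)
{
	struct demo_rdev *rdev, *tmp;

	demo_for_each_entry_safe(rdev, tmp, head, struct demo_rdev, same_set)
		if (rdev->nr % 2)
			demo_list_del(&rdev->same_set);
}
```

Only `demo_remove_odd()` pays for the extra `tmp` variable, which is the saving the commit message describes for loops that never delete.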
-
- 15 Oct, 2008 1 commit
-
-
Committed by Stephen Rothwell
Today's linux-next build (powerpc ppc64_defconfig) failed like this:
  drivers/md/raid1.c: In function 'sync_request':
  drivers/md/raid1.c:1759: error: implicit declaration of function 'msleep_interruptible'
  make[3]: *** [drivers/md/raid1.o] Error 1
  make[3]: *** Waiting for unfinished jobs....
  drivers/md/raid10.c: In function 'sync_request':
  drivers/md/raid10.c:1749: error: implicit declaration of function 'msleep_interruptible'
  make[3]: *** [drivers/md/raid10.o] Error 1
  drivers/md/md.c: In function 'md_do_sync':
  drivers/md/md.c:5915: error: implicit declaration of function 'msleep'
Caused by commit 6caa3b0bbdb474647f6bdd8a958ffc46f78d8d58 ("md: Remove unnecessary #includes, #defines, and function declarations"). I added the following patch. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NeilBrown <neilb@suse.de>
-
- 09 Oct, 2008 4 commits
-
-
Committed by Tejun Heo
Move stats related fields - stamp, in_flight, dkstats - from disk to part0 and unify stat handling such that...
* part_stat_*() now updates part0 together if the specified partition is not part0, i.e. part_stat_*() are now essentially all_stat_*().
* {disk|all}_stat_*() are gone.
* part_round_stats() is updated similarly. It handles part0 stats automatically and disk_round_stats() is killed.
* part_{inc|dec}_in_flight() is implemented which automatically updates part0 stats for parts other than part0.
* disk_map_sector_rcu() is updated to return part0 if no part matches. Combined with the above changes, this makes NULL special case handling in callers unnecessary.
* Separate stats show code paths for disk are collapsed into part stats show code paths.
* Rename disk_stat_lock/unlock() to part_stat_lock/unlock()
While at it, reposition stat handling macros a bit and add missing parentheses around macro parameters. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Committed by Tejun Heo
There are two variants of stat functions - ones prefixed with double underbars which don't care about preemption and ones without which disable preemption before manipulating per-cpu counters. It's unclear whether the underbarred ones assume that preemption is disabled on entry as some callers don't do that. This patch unifies diskstats access by implementing disk_stat_lock() and disk_stat_unlock() which take care of both RCU (for partition access) and preemption (for per-cpu counter access). diskstats access should always be enclosed between the two functions. As such, there's no need for the versions which disable preemption. They're removed and the double underbar ones are renamed to drop the underbars. As an extra argument is added, there's no danger of using the old version unconverted. disk_stat_lock() uses get_cpu() and returns the cpu index, and all diskstat functions which access per-cpu counters now have a @cpu argument to help RT. This change adds RCU or preemption operations at some places but also collapses several preemption ops into one at others. Overall, the performance difference should be negligible as all involved ops are very lightweight per-cpu ones. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Committed by Jens Axboe
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Committed by Mikulas Patocka
Remove hw_segments field from struct bio and struct request. Without virtual merge accounting they have no purpose. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
- 21 Jul, 2008 1 commit
-
-
Committed by Andre Noll
This patch renames the array_size field of struct mddev_s to array_sectors and converts all instances to use units of 512 byte sectors instead of 1k blocks. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
-