- 16 3月, 2010 1 次提交
-
-
由 NeilBrown 提交于
If a component device has a merge_bvec_fn then as we never call it we must ensure we never need to. Currently this is done by setting max_sector to 1 PAGE, however this does not stop a bio being created with several sub-page iovecs that would violate the merge_bvec_fn. So instead set max_segments to 1 and set the segment boundary to the same as a page boundary to ensure there is only ever one single-page segment of IO requested at a time. This can particularly be an issue when 'xen' is used as it is known to submit multiple small buffers in a single bio. Signed-off-by: NNeilBrown <neilb@suse.de> Cc: stable@kernel.org
-
- 26 2月, 2010 1 次提交
-
-
由 Martin K. Petersen 提交于
The block layer calling convention is blk_queue_<limit name>. blk_queue_max_sectors predates this practice, leading to some confusion. Rename the function to appropriately reflect that its intended use is to set max_hw_sectors. Also introduce a temporary wrapper for backwards compability. This can be removed after the merge window is closed. Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 14 12月, 2009 2 次提交
-
-
由 NeilBrown 提交于
Suggested by Oren Held <orenhe@il.ibm.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Previously barriers were only supported on RAID1. This is because other levels requires synchronisation across all devices and so needed a different approach. Here is that approach. When a barrier arrives, we send a zero-length barrier to every active device. When that completes - and if the original request was not empty - we submit the barrier request itself (with the barrier flag cleared) and then submit a fresh load of zero length barriers. The barrier request itself is asynchronous, but any subsequent request will block until the barrier completes. The reason for clearing the barrier flag is that a barrier request is allowed to fail. If we pass a non-empty barrier through a striping raid level it is conceivable that part of it could succeed and part could fail. That would be way too hard to deal with. So if the first run of zero length barriers succeed, we assume all is sufficiently well that we send the request and ignore errors in the second run of barriers. RAID5 needs extra care as write requests may not have been submitted to the underlying devices yet. So we flush the stripe cache before proceeding with the barrier. Note that the second set of zero-length barriers are submitted immediately after the original request is submitted. Thus when a personality finds mddev->barrier to be set during make_request, it should not return from make_request until the corresponding per-device request(s) have been queued. That will be done in later patches. Signed-off-by: NNeilBrown <neilb@suse.de> Reviewed-by: NAndre Noll <maan@systemlinux.org>
-
- 23 9月, 2009 1 次提交
-
-
由 NeilBrown 提交于
This should writeback from coming when the device is temporarily suspended. Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 11 9月, 2009 1 次提交
-
-
由 Jens Axboe 提交于
Get rid of any functions that test for these bits and make callers use bio_rw_flagged() directly. Then it is at least directly apparent what variable and flag they check. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 03 8月, 2009 2 次提交
-
-
由 NeilBrown 提交于
As revalidate_disk calls check_disk_size_change, it will cause any capacity change of a gendisk to be propagated to the blockdev inode. So use that instead of mucking about with locks and i_size_write. Also add a call to revalidate_disk in do_md_run and a few other places where the gendisk capacity is changed. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Andre Noll 提交于
This patch replaces md_integrity_check() by two new public functions: md_integrity_register() and md_integrity_add_rdev() which are both personality-independent. md_integrity_register() is called from the ->run and ->hot_remove methods of all personalities that support data integrity. The function iterates over the component devices of the array and determines if all active devices are integrity capable and if their profiles match. If this is the case, the common profile is registered for the mddev via blk_integrity_register(). The second new function, md_integrity_add_rdev() is called from the ->hot_add_disk methods, i.e. whenever a new device is being added to a raid array. If the new device does not support data integrity, or has a profile different from the one already registered, data integrity for the mddev is disabled. For raid0 and linear, only the call to md_integrity_register() from the ->run method is necessary. Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 01 7月, 2009 1 次提交
-
-
由 Martin K. Petersen 提交于
Switch MD over to the new disk_stack_limits() function which checks for aligment and adjusts preferred I/O sizes when stacking. Also indicate preferred I/O sizes where applicable. Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 18 6月, 2009 5 次提交
-
-
由 NeilBrown 提交于
Current, when we update the 'conf' structure, when adding a drive to a linear array, we keep the old version around until the array is finally stopped, as it is not safe to free it immediately. Now that we have rcu protection on all accesses to 'conf', we can use call_rcu to free it more promptly. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 SandeepKsinha 提交于
Due to the lack of memory ordering guarantees, we may have races around mddev->conf. In particular, the correct contents of the structure we get from dereferencing ->private might not be visible to this CPU yet, and they might not be correct w.r.t mddev->raid_disks. This patch addresses the problem using rcu protection to avoid such race conditions. Signed-off-by: NSandeepKsinha <sandeepksinha@gmail.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Andre Noll 提交于
If the superblock of a component device indicates the presence of a bitmap but the corresponding raid personality does not support bitmaps (raid0, linear, multipath, faulty), then something is seriously wrong and we'd better refuse to run such an array. Currently, this check is performed while the superblocks are examined, i.e. before entering personality code. Therefore the generic md layer must know which raid levels support bitmaps and which do not. This patch avoids this layer violation without adding identical code to various personalities. This is accomplished by introducing a new public function to md.c, md_check_no_bitmap(), which replaces the hard-coded checks in the superblock loading functions. A call to md_check_no_bitmap() is added to the ->run method of each personality which does not support bitmaps and assembly is aborted if at least one component device contains a bitmap. Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
This is currently ensured by common code, but it is more reliable to ensure it where it is needed in personality code. All the other personalities that care already round the size to the chunk_size. raid0 and linear are the only hold-outs. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Andre Noll 提交于
This patch renames the chunk_size field to chunk_sectors with the implied change of semantics. Since is_power_of_2(chunk_size) = is_power_of_2(chunk_sectors << 9) = is_power_of_2(chunk_sectors) these bits don't need an adjustment for the shift. Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 16 6月, 2009 4 次提交
-
-
由 Sandeep K Sinha 提交于
Replace the linear search with binary search in which_dev. Signed-off-by: NSandeep K Sinha <sandeepksinha@gmail.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Sandeep K Sinha 提交于
Remove num_sectors from dev_info and replace start_sector with end_sector. This makes a lot of comparisons much simpler. Signed-off-by: NSandeep K Sinha <sandeepksinha@gmail.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Sandeep K Sinha 提交于
Get rid of sector_div and hash table for linear raid and replace with a linear search in which_dev. The hash table adds a lot of complexity for little if any gain. Ultimately a binary search will be used which will have smaller cache foot print, a similar number of memory access, and no divisions. Signed-off-by: NSandeep K Sinha <sandeepksinha@gmail.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Having a macro just to cast a void* isn't really helpful. I would must rather see that we are simply de-referencing ->private, than have to know what the macro does. So open code the macro everywhere and remove the pointless cast. Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 23 5月, 2009 1 次提交
-
-
由 Martin K. Petersen 提交于
Convert all external users of queue limits to using wrapper functions instead of poking the request queue variables directly. Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 31 3月, 2009 6 次提交
-
-
由 Dan Williams 提交于
Get personalities out of the business of directly modifying ->array_sectors. Lays groundwork to introduce policy on when ->array_sectors can be modified. Reviewed-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Dan Williams 提交于
In preparation for giving userspace control over ->array_sectors we need to be able to retrieve the 'default' size, and the 'anticipated' size when a reshape is requested. For personalities that do not reshape emit a warning if anything but the default size is requested. In the raid5 case we need to update ->previous_raid_disks to make the new 'default' size available. Reviewed-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Andre Noll 提交于
This patch renames the "size" field of struct mdk_rdev_s to "sectors" and changes this field to store sectors instead of blocks. All users of this field, linear.c, raid0.c and md.c, are fixed up accordingly which gets rid of many multiplications and divisions. Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
It really is nicer to keep related code together.. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
This makes the includes more explicit, and is preparation for moving md_k.h to drivers/md/md.h Remove include/raid/md.h as its only remaining use was to #include other files. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Christoph Hellwig 提交于
Move the headers with the local structures for the disciplines and bitmap.h into drivers/md/ so that they are more easily grepable for hacking and not far away. md.h is left where it is for now as there are some uses from the outside. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 06 2月, 2009 1 次提交
-
-
由 Andre Noll 提交于
ab5bd5cb introduced the following bug in linear software raid for large arrays on 32 bit machines: which_dev() computes the device holding a given sector by shifting down the sector number to a 32 bit range, dividing by the array spacing and looking up the resulting index in the hash table of the array. Because the computed index might be slightly too small, a loop at the end of which_dev() increases the index until the given sector actually falls into the range of the device associated with that index. The changes of the above mentioned commit caused this loop to check whether the _index_ rather than the sector number is small enough, effectively bypassing the loop and thus possibly returning the wrong device. As reported by Simon Kirby, this leads to errors such as linear_make_request: Sector 2340486136 out of bounds on dev sdi: 156301312 sectors, offset 2109870464 Fix this bug by introducing a local variable for the index so that the variable containing the passed sector is left unchanged. Cc: stable@kernel.org Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 09 1月, 2009 1 次提交
-
-
由 Cheng Renquan 提交于
The rdev_for_each macro defined in <linux/raid/md_k.h> is identical to list_for_each_entry_safe, from <linux/list.h>, it should be defined to use list_for_each_entry_safe, instead of reinventing the wheel. But some calls to each_entry_safe don't really need a safe version, just a direct list_for_each_entry is enough, this could save a temp variable (tmp) in every function that used rdev_for_each. In this patch, most rdev_for_each loops are replaced by list_for_each_entry, totally save many tmp vars; and only in the other situations that will call list_del to delete an entry, the safe version is used. Signed-off-by: NCheng Renquan <crquan@gmail.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 06 11月, 2008 1 次提交
-
-
由 Andre Noll 提交于
We currently oops with a divide error on starting a linear software raid array consisting of at least two very small (< 500K) devices. The bug is caused by the calculation of the hash table size which tries to compute sector_div(sz, base) with "base" being zero due to the small size of the component devices of the array. Fix this by requiring the hash spacing to be at least one which implies that also "base" is non-zero. This bug has existed since about 2.6.14. Cc: stable@kernel.org Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 13 10月, 2008 7 次提交
-
-
由 NeilBrown 提交于
A lot of cruft has gathered over the years. Time to remove it. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Andre Noll 提交于
This patch renames hash_spacing and preshift to spacing and sector_shift respectively with the following change of semantics: Case 1: (sizeof(sector_t) <= sizeof(u32)). ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In this case, we have sector_shift = preshift = 0 and spacing = 2 * hash_spacing. Hence, the index for the hash table which is computed by the new code in which_dev() as sector / spacing equals the old value which was (sector/2) / hash_spacing. Note also that the value of nb_zone stays the same because both sz and base double. Case 2: (sizeof(sector_t) > sizeof(u32)). ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ (aka the shifting dance case). Here we have sector_shift = preshift + 1 and spacing = 2 * hash_spacing during the computation of nb_zone and curr_sector, but spacing = hash_spacing in which_dev() because in the last hunk of the patch for linear.c we shift down conf->spacing (= 2 * hash_spacing) by one more bit than in the old code. Hence in the computation of nb_zone, sz and base have the same value as before, so nb_zone is not affected. Also curr_sector in the next hunk stays the same. In which_dev() the hash table index is computed as (sector >> sector_shift) / spacing In view of sector_shift = preshift + 1 and spacing = hash_spacing, this equals ((sector/2) >> preshift) / hash_spacing which is the value computed by the old code. Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Andre Noll 提交于
This is a preparation for representing also the remaining fields of struct linear_private_data as sectors. Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Andre Noll 提交于
Rename them to num_sectors and start_sector which is more descriptive. Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Andre Noll 提交于
conf->smallest_size is undefined since day one of the git repo.. Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Andre Noll 提交于
Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Andre Noll 提交于
Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 09 10月, 2008 3 次提交
-
-
由 Denis ChengRq 提交于
Since all bio_split calls refer the same single bio_split_pool, the bio_split function can use bio_split_pool directly instead of the mempool_t parameter; then the mempool_t parameter can be removed from bio_split param list, and bio_split_pool is only referred in fs/bio.c file, can be marked static. Signed-off-by: NDenis ChengRq <crquan@gmail.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Tejun Heo 提交于
Move stats related fields - stamp, in_flight, dkstats - from disk to part0 and unify stat handling such that... * part_stat_*() now updates part0 together if the specified partition is not part0. ie. part_stat_*() are now essentially all_stat_*(). * {disk|all}_stat_*() are gone. * part_round_stats() is updated similary. It handles part0 stats automatically and disk_round_stats() is killed. * part_{inc|dec}_in_fligh() is implemented which automatically updates part0 stats for parts other than part0. * disk_map_sector_rcu() is updated to return part0 if no part matches. Combined with the above changes, this makes NULL special case handling in callers unnecessary. * Separate stats show code paths for disk are collapsed into part stats show code paths. * Rename disk_stat_lock/unlock() to part_stat_lock/unlock() While at it, reposition stat handling macros a bit and add missing parentheses around macro parameters. Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Tejun Heo 提交于
There are two variants of stat functions - ones prefixed with double underbars which don't care about preemption and ones without which disable preemption before manipulating per-cpu counters. It's unclear whether the underbarred ones assume that preemtion is disabled on entry as some callers don't do that. This patch unifies diskstats access by implementing disk_stat_lock() and disk_stat_unlock() which take care of both RCU (for partition access) and preemption (for per-cpu counter access). diskstats access should always be enclosed between the two functions. As such, there's no need for the versions which disables preemption. They're removed and double underbars ones are renamed to drop the underbars. As an extra argument is added, there's no danger of using the old version unconverted. disk_stat_lock() uses get_cpu() and returns the cpu index and all diskstat functions which access per-cpu counters now has @cpu argument to help RT. This change adds RCU or preemption operations at some places but also collapses several preemption ops into one at others. Overall, the performance difference should be negligible as all involved ops are very lightweight per-cpu ones. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 21 7月, 2008 2 次提交
-
-
由 Andre Noll 提交于
Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Andre Noll 提交于
This patch renames the array_size field of struct mddev_s to array_sectors and converts all instances to use units of 512 byte sectors instead of 1k blocks. Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeilBrown <neilb@suse.de>
-