- 21 7月, 2008 7 次提交
-
-
由 NeilBrown 提交于
All modifications and most access to the mddev->disks list are made under the reconfig_mutex lock. However there are three places where the list is walked without any locking. If a reconfig happens at this time, havoc (and oops) can ensue. So use RCU to protect these accesses: - wrap them in rcu_read_{,un}lock() - use list_for_each_entry_rcu - add to the list with list_add_rcu - delete from the list with list_del_rcu - delay the 'free' with call_rcu rather than schedule_work Note that export_rdev did a list_del_init on this list. In almost all cases the entry was not in the list anymore so it was a no-op and so safe. It is no longer safe as after list_del_rcu we may not touch the list_head. An audit shows that export_rdev is called: - after unbind_rdev_from_array, in which case the delete has already been done, - after bind_rdev_to_array fails, in which case the delete isn't needed. - before the device has been put on a list at all (e.g. in add_new_disk where reading the superblock fails). - and in autorun devices after a failure when the device is on a different list. So remove the list_del_init call from export_rdev, and add it back immediately before the called to export_rdev for that last case. Note also that ->same_set is sometimes used for lists other than mddev->list (e.g. candidates). In these cases rcu is not needed. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Open isn't the only thing that increments ->active. e.g. reading /proc/mdstat will increment it briefly. So to avoid false positives in testing for concurrent access, introduce a new counter that counts just the number of times the md device it open. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Andre Noll 提交于
Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Andre Noll 提交于
This patch renames the array_size field of struct mddev_s to array_sectors and converts all instances to use units of 512 byte sectors instead of 1k blocks. Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Andre Noll 提交于
Also, change the type of the size parameter from unsigned long long to sector_t and rename it to num_sectors. Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Andre Noll 提交于
The checks in overlaps() expect all parameters either in block-based or sector-based quantities. However, its single caller passes two rdev->data_offset arguments as well as two rdev->size arguments, the former being sector counts while the latter are measured in 1K blocks. This could cause rdev_size_store() to accept an invalid size from user space. Fix it by passing only sector-based quantities to overlaps(). Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Neil Brown 提交于
- used strict_strtoull in place of simple_strtoull - use my_mddev in place of rdev->mddev (they have the same value) and more significantly, - don't adjust mddev->size to fit, rather reject changes which make rdev->size smaller than mddev->size Adjusting mddev->size is a hangover from bind_rdev_to_array which does a similar thing. But it really is a better design to insist that mddev->size is set as required, then the rdev->sizes are set to allow for that. The previous way invites confusion. Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 15 7月, 2008 1 次提交
-
-
由 Chandra Seetharaman 提交于
Do not automatically "select" SCSI_DH for dm-multipath. If SCSI_DH doesn't exist,just do not allow hardware handlers to be used. Handle SCSI_DH being a module also. Make sure it doesn't allow DM_MULTIPATH to be compiled in when SCSI_DH is a module. [jejb: added comment for Kconfig syntax] Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com> Reported-by: NRandy Dunlap <randy.dunlap@oracle.com> Reported-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NJames Bottomley <James.Bottomley@HansenPartnership.com>
-
- 11 7月, 2008 10 次提交
-
-
由 Andre Noll 提交于
Rename it to sb_start to make sure all users have been converted. Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeil Brown <neilb@suse.de>
-
由 Andre Noll 提交于
As BLOCK_SIZE_BITS is 10 and MD_NEW_SIZE_SECTORS(2 * x) = 2 * NEW_SIZE_BLOCKS(x), the return value of calc_dev_sboffset() doubles. Fix up all three callers accordingly. Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeil Brown <neilb@suse.de>
-
由 Andre Noll 提交于
Number of sectors is the preferred unit for sizes of raid devices, so change calc_dev_size() so that it returns this unit instead of the number of 1K blocks. Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeil Brown <neilb@suse.de>
-
由 Andre Noll 提交于
Changing the internal representations of sizes of raid devices from 1K blocks to sector counts (512B units) is desirable because it allows to get rid of many divisions/multiplications and unnecessary casts that are present in the current code. This patch is a first step in this direction. It replaces the old 1K-based "size" argument of update_size() by "num_sectors" and fixes up its two callers. Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeil Brown <neilb@suse.de>
-
由 Neil Brown 提交于
do_md_stop check the number of active users before allowing the array to be stopped. Two problems: 1/ it assumes the request is coming through an open file descriptor (via ioctl) so it allows for that. This is not always the case. 2/ it doesn't do the check it the array hasn't been activated. This is not good for cases when we use an inactive array to hold some devices in a container. Signed-off-by: NNeil Brown <neilb@suse.de>
-
由 Andre Noll 提交于
The current code copies a signed int from user space, converts it to unsigned and passes the unsigned value to find_rdev_nr() which expects a signed value. Simply pass the signed value from user space directly. Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeil Brown <neilb@suse.de>
-
由 Andre Noll 提交于
Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeil Brown <neilb@suse.de>
-
由 Andre Noll 提交于
If alloc_page() fails, ENOMEM is a more suitable error value than EINVAL. Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeil Brown <neilb@suse.de>
-
由 Andre Noll 提交于
The only caller of sb_equal() tests the return value against zero, so it's OK to return the negated return value of memcmp(). Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeil Brown <neilb@suse.de>
-
由 Andre Noll 提交于
Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeil Brown <neilb@suse.de>
-
- 10 7月, 2008 1 次提交
-
-
由 Dan Williams 提交于
Remove the dubious attempt to prefer 'compute' over 'read'. Not only is it wrong given commit c337869d (md: do not compute parity unless it is on a failed drive), but it can trigger a BUG_ON in handle_parity_checks5(). Cc: <stable@kernel.org> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NNeil Brown <neilb@suse.de>
-
- 08 7月, 2008 7 次提交
-
-
由 Andre Noll 提交于
Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeil Brown <neilb@suse.de>
-
由 Andre Noll 提交于
Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeil Brown <neilb@suse.de>
-
由 Andre Noll 提交于
Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeil Brown <neilb@suse.de>
-
由 Andre Noll 提交于
- Remove superfluous parentheses. - Make format string match the type of the variable that is printed. Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeil Brown <neilb@suse.de>
-
由 Andre Noll 提交于
In case pers->run() succeeds but creating the bitmap fails, we print an error message stating that pers->run() has failed. Print this message only if pers->run() really failed. Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeil Brown <neilb@suse.de>
-
由 Andre Noll 提交于
Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeil Brown <neilb@suse.de>
-
由 Andre Noll 提交于
Signed-off-by: NAndre Noll <maan@systemlinux.org> Signed-off-by: NNeil Brown <neilb@suse.de>
-
- 03 7月, 2008 1 次提交
-
-
由 Alasdair G Kergon 提交于
When devices are stacked, one device's merge_bvec_fn may need to perform the mapping and then call one or more functions for its underlying devices. The following bio fields are used: bio->bi_sector bio->bi_bdev bio->bi_size bio->bi_rw using bio_data_dir() This patch creates a new struct bvec_merge_data holding a copy of those fields to avoid having to change them directly in the struct bio when going down the stack only to have to change them back again on the way back up. (And then when the bio gets mapped for real, the whole exercise gets repeated, but that's a problem for another day...) Signed-off-by: NAlasdair G Kergon <agk@redhat.com> Cc: Neil Brown <neilb@suse.de> Cc: Milan Broz <mbroz@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 02 7月, 2008 1 次提交
-
-
由 Milan Broz 提交于
Add cond_resched() to prevent monopolising CPU when processing large bios. dm-crypt processes encryption of bios in sector units. If the bio request is big it can spend a long time in the encryption call. Signed-off-by: NMilan Broz <mbroz@redhat.com> Tested-by: NYan Li <elliot.li.tech@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
- 01 7月, 2008 1 次提交
-
-
由 Dan Williams 提交于
md_allow_write() marks the metadata dirty while holding mddev->lock and then waits for the write to complete. For externally managed metadata this causes a deadlock as userspace needs to take the lock to communicate that the metadata update has completed. Change md_allow_write() in the 'external' case to start the 'mark active' operation and then return -EAGAIN. The expected side effects while waiting for userspace to write 'active' to 'array_state' are holding off reshape (code currently handles -ENOMEM), cause some 'stripe_cache_size' change requests to fail, cause some GET_BITMAP_FILE ioctl requests to fall back to GFP_NOIO, and cause updates to 'raid_disks' to fail. Except for 'stripe_cache_size' changes these failures can be mitigated by coordinating with mdmon. md_write_start() still prevents writes from occurring until the metadata handler has had a chance to take action as it unconditionally waits for MD_CHANGE_CLEAN to be cleared. [neilb@suse.de: return -EAGAIN, try GFP_NOIO] Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 28 6月, 2008 11 次提交
-
-
由 Dan Williams 提交于
From: Dan Williams <dan.j.williams@intel.com> Commit a4456856 refactored some of the deep code paths in raid5.c into separate functions. The names chosen at the time do not consistently indicate what is going to happen to the stripe. So, update the names, and since a stripe is a cache element use cache semantics like fill, dirty, and clean. (also, fix up the indentation in fetch_block5) Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NNeil Brown <neilb@suse.de>
-
由 Dan Williams 提交于
From: Dan Williams <dan.j.williams@intel.com> Neil said: > At the end of ops_run_compute5 you have: > /* ack now if postxor is not set to be run */ > if (tx && !test_bit(STRIPE_OP_POSTXOR, &s->ops_run)) > async_tx_ack(tx); > > It looks odd having that test there. Would it fit in raid5_run_ops > better? The intended global interpretation is that raid5_run_ops can build a chain of xor and memcpy operations. When MD registers the compute-xor it tells async_tx to keep the operation handle around so that another item in the dependency chain can be submitted. If we are just computing a block to satisfy a read then we can terminate the chain immediately. raid5_run_ops gives a better context for this test since it cares about the entire chain. Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NNeil Brown <neilb@suse.de>
-
由 Dan Williams 提交于
From: Dan Williams <dan.j.williams@intel.com> Currently ops_run_biodrain and other locations have extra logic to determine which blocks are processed in the prexor and non-prexor cases. This can be eliminated if handle_write_operations5 flags the blocks to be processed in all cases via R5_Wantdrain. The presence of the prexor operation is tracked in sh->reconstruct_state. Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NNeil Brown <neilb@suse.de>
-
由 Dan Williams 提交于
From: Dan Williams <dan.j.williams@intel.com> Track the state of reconstruct operations (recalculating the parity block usually due to incoming writes, or as part of array expansion) Reduces the scope of the STRIPE_OP_{BIODRAIN,PREXOR,POSTXOR} flags to only tracking whether a reconstruct operation has been requested via the ops_request field of struct stripe_head_state. This is the final step in the removal of ops.{pending,ack,complete,count}, i.e. the STRIPE_OP_{BIODRAIN,PREXOR,POSTXOR} flags only request an operation and do not track the state of the operation. Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NNeil Brown <neilb@suse.de>
-
由 Dan Williams 提交于
From: Dan Williams <dan.j.williams@intel.com> Track the state of compute operations (recalculating a block from all the other blocks in a stripe) with a state flag. Reduces the scope of the STRIPE_OP_COMPUTE_BLK flag to only tracking whether a compute operation has been requested via the ops_request field of struct stripe_head_state. Note, the compute operation that is performed in the course of doing a 'repair' operation (check the parity block, recalculate it and write it back if the check result is not zero) is tracked separately with the 'check_state' variable. Compute operations are held off while a 'check' is in progress, and moving this check out to handle_issuing_new_read_requests5 the helper routine __handle_issuing_new_read_requests5 can be simplified. This is another step towards the removal of ops.{pending,ack,complete,count}, i.e. STRIPE_OP_COMPUTE_BLK only requests an operation and does not track the state of the operation. Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NNeil Brown <neilb@suse.de>
-
由 Dan Williams 提交于
From: Dan Williams <dan.j.williams@intel.com> Track the state of read operations (copying data from the stripe cache to bio buffers outside the lock) with a state flag. Reduce the scope of the STRIPE_OP_BIOFILL flag to only tracking whether a biofill operation has been requested via the ops_request field of struct stripe_head_state. This is another step towards the removal of ops.{pending,ack,complete,count}, i.e. STRIPE_OP_BIOFILL only requests an operation and does not track the state of the operation. Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NNeil Brown <neilb@suse.de>
-
由 Dan Williams 提交于
From: Dan Williams <dan.j.williams@intel.com> The STRIPE_OP_* flags record the state of stripe operations which are performed outside the stripe lock. Their use in indicating which operations need to be run is straightforward; however, interpolating what the next state of the stripe should be based on a given combination of these flags is not straightforward, and has led to bugs. An easier to read implementation with minimal degrees of freedom is needed. Towards this goal, this patch introduces explicit states to replace what was previously interpolated from the STRIPE_OP_* flags. For now this only converts the handle_parity_checks5 path, removing a user of the ops.{pending,ack,complete,count} fields of struct stripe_operations. This conversion also found a remaining issue with the current code. There is a small window for a drive to fail between when we schedule a repair and when the parity calculation for that repair completes. When this happens we will writeback to 'failed_num' when we really want to write back to 'pd_idx'. Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NNeil Brown <neilb@suse.de>
-
由 Dan Williams 提交于
From: Dan Williams <dan.j.williams@intel.com> Let the raid6 path call ops_run_io to get pending i/o submitted. Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NNeil Brown <neilb@suse.de>
-
由 Dan Williams 提交于
From: Dan Williams <dan.j.williams@intel.com> In handle_stripe after taking sh->lock we sample some bits into 's' (struct stripe_head_state): s.syncing = test_bit(STRIPE_SYNCING, &sh->state); s.expanding = test_bit(STRIPE_EXPAND_SOURCE, &sh->state); s.expanded = test_bit(STRIPE_EXPAND_READY, &sh->state); Use these values from 's' in ops_run_io() rather than re-sampling the bits. This ensures a consistent snapshot (as seen under sh->lock) is used. Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NNeil Brown <neilb@suse.de>
-
由 Dan Williams 提交于
From: Dan Williams <dan.j.williams@intel.com> The R5_Want{Read,Write} flags already gate i/o. So, this flag is superfluous and we can unconditionally call ops_run_io(). Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NNeil Brown <neilb@suse.de>
-
由 Dan Williams 提交于
From: Dan Williams <dan.j.williams@intel.com> This micro-optimization allowed the raid code to skip a re-read of the parity block after checking parity. It took advantage of the fact that xor-offload-engines have their own internal result buffer and can check parity without writing to memory. Remove it for the following reasons: 1/ It is a layering violation for MD to need to manage the DMA and non-DMA paths within async_xor_zero_sum 2/ Bad precedent to toggle the 'ops' flags outside the lock 3/ Hard to realize a performance gain as reads will not need an updated parity block and writes will dirty it anyways. Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NNeil Brown <neilb@suse.de>
-