Commits · 271f5a9b8f8ae0db95de72779d115c9d0b9d3cc5 · openeuler / Kernel

01 Sep, 2008 1 commit

Remove invalidate_partition call from do_md_stop. · 271f5a9b

Committed by NeilBrown on Sep 01, 2008

When stopping an md array, or just switching to read-only, we
currently call invalidate_partition while holding the mddev lock.
The main reason for this is probably to ensure all dirty buffers
are flushed (invalidate_partition calls fsync_bdev).

However, if any dirty buffers are found, it will almost certainly cause
a deadlock, as starting writeout will require an update to the
superblock, and performing that update requires taking the mddev
lock - which is already held.

This deadlock can be demonstrated by running "reboot -f -n" with
a root filesystem on md/raid, and some dirty buffers in memory.

All other calls to stop an array should already happen after a flush.
The normal sequence is to stop using the array (e.g. umount) which
will cause __blkdev_put to call sync_blockdev.  Then open the
array and issue the STOP_ARRAY ioctl while the buffers are all still
clean.

So this invalidate_partition is normally a no-op, except for one case
where it will cause a deadlock.

So remove it.

This patch possibly addresses the regression recorded in
   http://bugzilla.kernel.org/show_bug.cgi?id=11460
and
   http://bugzilla.kernel.org/show_bug.cgi?id=11452

though it isn't yet clear how it ever worked.

Signed-off-by: NeilBrown <neilb@suse.de>

271f5a9b

08 Aug, 2008 1 commit

md: cancel check/repair requests when recovery is needed · 56ac36d7

Committed by Dan Williams on Aug 07, 2008

If a 'repair' is requested when an array is in a position to 'recover', raid1
will perform the repair while md believes a recovery is happening.  Address
this at both ends, i.e. cancel check/repair requests upon detecting a
recover condition and do not call ->spare_active after completing a
check/repair.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

56ac36d7

05 Aug, 2008 6 commits

Allow raid10 resync to happening in larger chunks. · 0310fa21

Committed by NeilBrown on Aug 05, 2008

The raid10 resync/recovery code currently limits the amount of
in-flight resync IO to 2Meg.  This was copied from raid1 where
it seems quite adequate.  However for raid10, some layouts require
a bit of seeking to perform a resync, and allowing a larger buffer
size means that the seeking can be significantly reduced.

There is probably no real need to limit the amount of in-flight
IO at all.  Any shortage of memory will naturally reduce the
amount of buffer space available down to a set minimum, and any
concurrent normal IO will quickly cause resync IO to back off.

The only problem would be that normal IO has to wait for all resync IO
to finish, so a very large amount of resync IO could cause unpleasant
latency when normal IO starts up.

So: increase RESYNC_DEPTH to allow 32Meg of buffer (if memory is
available) which seems to be a good amount.  Also reduce the amount
of memory reserved as there is no need to keep 2Meg just for resync if
memory is tight.

Thanks to Keld for the suggestion.

Cc: Keld Jørn Simonsen <keld@dkuug.dk>
Signed-off-by: NeilBrown <neilb@suse.de>

0310fa21

Allow faulty devices to be removed from a readonly array. · c89a8eee

Committed by NeilBrown on Aug 05, 2008

Removing faulty devices from an array is a two stage process.
First the device is moved from being a part of the active array
to being similar to a spare device.  Then it can be removed
by a request from user space.

The first step is currently not performed for read-only arrays,
so the second step can never succeed.

So allow readonly arrays to remove failed devices (which aren't
blocked).

Signed-off-by: NeilBrown <neilb@suse.de>

c89a8eee

Don't let a blocked_rdev interfere with read request in raid5/6 · ac4090d2

Committed by NeilBrown on Aug 05, 2008

When we have externally managed metadata, we need to mark a failed
device as 'Blocked' and not allow any writes until that device
has been marked as faulty in the metadata and the Blocked flag has
been removed.

However it is perfectly OK to allow read requests when there is a
Blocked device, and with a readonly array, there may not be any
metadata-handler watching for blocked devices.

So in raid5/raid6 only allow a Blocked device to interfere with
write requests or resync.  Read requests go through untouched.

raid1 and raid10 already differentiate between read and write
properly.

Signed-off-by: NeilBrown <neilb@suse.de>

ac4090d2

Fail safely when trying to grow an array with a write-intent bitmap. · dba034ee

Committed by NeilBrown on Aug 05, 2008

We cannot currently change the size of a write-intent bitmap.
So if we change the size of an array which has such a bitmap, it
tries to set bits beyond the end of the bitmap.

For now, simply reject any request to change the size of an array
which has a bitmap.  mdadm can remove the bitmap and add a new one
after the array has changed size.

Signed-off-by: NeilBrown <neilb@suse.de>

dba034ee

Restore force switch of md array to readonly at reboot time. · 2b25000b

Committed by NeilBrown on Aug 05, 2008

A recent patch allowed do_md_stop to know whether it was being called
via an ioctl or not, and thus whether to allow for an extra open file
descriptor when checking if it is in use.
This broke the switch to readonly performed by the shutdown notifier,
which needs to work even when the array is still (apparently) active
(as md doesn't get told when the filesystem becomes readonly).

So restore this feature by pretending that there can be lots of
file descriptors open, but we still want do_md_stop to switch to
readonly.

Signed-off-by: NeilBrown <neilb@suse.de>

2b25000b

Make writes to md/safe_mode_delay immediately effective. · 19052c0e

Committed by NeilBrown on Aug 05, 2008

If we reduce the 'safe_mode_delay', it could still wait for the old
delay to completely expire before doing anything about safe_mode.
Thus the effect of the change is delayed.

To make the effect more immediate, run the timeout function
immediately if the delay was reduced.  This may cause it to run
slightly earlier than required, but that is the safer option.

Signed-off-by: NeilBrown <neilb@suse.de>

19052c0e

02 Aug, 2008 1 commit

md: the bitmap code needs to use blk_plug_device_unlocked() · 93769f58

Committed by Jens Axboe on Aug 01, 2008

It doesn't hold the queue lock, so it's both racy on the queue flags
and spews a warning.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

93769f58

01 Aug, 2008 2 commits

[PATCH] switch mtd and dm-table to lookup_bdev() · d5686b44

Committed by Al Viro on Aug 01, 2008

No need to open-code it...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d5686b44

md: raid10: wake up frozen array · 388667be

Committed by Arthur Jones on Jul 25, 2008

When rescheduling a bio in raid10, we wake up
the md thread, but if the array is frozen, this
will have no effect.  This causes the array to
remain frozen for eternity.  We add a wake_up
to allow the array to de-freeze.  This code is
nearly identical to the raid1 code, which has
this fix already.

Signed-off-by: Arthur Jones <ajones@riverbed.com>
Signed-off-by: NeilBrown <neilb@suse.de>

388667be

29 Jul, 2008 2 commits

md: do not count blocked devices as spares · e5427135

Committed by Dan Williams on Jul 28, 2008

remove_and_add_spares() assumes that failed devices have been hot-removed
from the array.  Removal is skipped in the 'blocked' case so do not count a
device in this state as 'spare'.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

e5427135

md: do not progress the resync process if the stripe was blocked · df10cfbc

Committed by Dan Williams on Jul 28, 2008

handle_stripe will take no action on a stripe when waiting for userspace
to unblock the array, so do not report completed sectors.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

df10cfbc

27 Jul, 2008 1 commit

[SCSI] scsi_dh: attach to hardware handler from dm-mpath · ae11b1b3

Committed by Hannes Reinecke on Jul 17, 2008

multipath keeps a separate device table which may be
more current than the built-in one.
So we should make sure to always call ->attach whenever
a multipath map with hardware handler is instantiated.
And we should call ->detach on removal, too.

[sekharan: update as per comments from agk]
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

ae11b1b3

24 Jul, 2008 3 commits

md: delay notification of 'active_idle' to the recovery thread · d8e64406

Committed by Dan Williams on Jul 23, 2008

sysfs_notify might sleep, so do not call it from md_safemode_timeout.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

d8e64406

md: fix merge error · 23397883

Committed by Dan Williams on Jul 23, 2008

The original STRIPE_OP_IO removal patch had the following hunk:

-               for (i = conf->raid_disks; i--; ) {
+               for (i = conf->raid_disks; i--; )
                        set_bit(R5_Wantwrite, &sh->dev[i].flags);
-                       if (!test_and_set_bit(STRIPE_OP_IO, &sh->ops.pending))
-                               sh->ops.count++;
-               }

However it appears the hunk became broken after merging:

-               for (i = conf->raid_disks; i--; ) {
+               for (i = conf->raid_disks; i--; )
                        set_bit(R5_Wantwrite, &sh->dev[i].flags);
                        set_bit(R5_LOCKED, &dev->flags);
                        s.locked++;
-                       if (!test_and_set_bit(STRIPE_OP_IO, &sh->ops.pending))
-                               sh->ops.count++;
-               }

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

23397883

md: move async_tx_issue_pending_all outside spin_lock_irq · c9f21aaf

Committed by Dan Williams on Jul 23, 2008

Some dma drivers need to call spin_lock_bh in their device_issue_pending
routines.  This change avoids:

WARNING: at kernel/softirq.c:136 local_bh_enable_ip+0x3a/0x85()

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

c9f21aaf

21 Jul, 2008 19 commits

dm crypt: add merge · d41e26b9

Committed by Milan Broz on Jul 21, 2008

This patch implements a biovec merge function for the crypt target.

If the underlying device has a merge function defined, call it.
If not, keep the precomputed value.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

d41e26b9

dm table: remove merge_bvec sector restriction · 9980c638

Committed by Milan Broz on Jul 21, 2008

Remove the max_sector restriction - the merge function replaced it.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

9980c638

dm: linear add merge · 7bc3447b

Committed by Milan Broz on Jul 21, 2008

This patch implements a biovec merge function for the linear target.

If the underlying device has a merge function defined, call it.
If not, keep the precomputed value.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

7bc3447b

dm: introduce merge_bvec_fn · f6fccb12

Committed by Milan Broz on Jul 21, 2008

Introduce a bvec merge function for device mapper devices
for dynamic size restrictions.

This code ensures the requested biovec lies within a single
target and then calls a target-specific function to check
against any constraints imposed by underlying devices.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

f6fccb12
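
The two-step check described in f6fccb12 can be sketched in user-space C as follows. This is a simplified model, not the device-mapper code: `struct target`, `merge_limit` and `cap_at_2k` are hypothetical names, and the callback signature is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of a mapped target: a contiguous range
 * of sectors plus an optional callback that reports how many bytes the
 * underlying device would accept. */
struct target {
	uint64_t begin;	/* first sector covered by this target */
	uint64_t len;	/* number of sectors covered */
	/* Returns the number of bytes (<= max_bytes) the lower device
	 * allows; NULL means the target adds no extra limit. */
	uint32_t (*merge)(const struct target *t, uint64_t sector,
			  uint32_t max_bytes);
};

/* Example callback (an assumption): the lower device takes at most 2 KiB. */
static uint32_t cap_at_2k(const struct target *t, uint64_t sector,
			  uint32_t max_bytes)
{
	(void)t;
	(void)sector;
	return max_bytes < 2048 ? max_bytes : 2048;
}

/* How many of the `want_bytes` requested at `sector` may be accepted? */
static uint32_t merge_limit(const struct target *t, uint64_t sector,
			    uint32_t want_bytes)
{
	assert(sector >= t->begin && sector < t->begin + t->len);

	/* Step 1: never cross the end of the target. */
	uint64_t room = (t->begin + t->len - sector) * 512;
	uint32_t max_bytes = room < want_bytes ? (uint32_t)room : want_bytes;

	/* Step 2: let the target apply limits of the underlying device. */
	if (t->merge)
		max_bytes = t->merge(t, sector, max_bytes);

	return max_bytes;
}
```

With a target covering sectors 100 to 107 and no callback, a request starting at sector 106 is limited to the two remaining sectors, regardless of how much was asked for.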

dm snapshot: use per device mempools · 92e86812

Committed by Mikulas Patocka on Jul 21, 2008

Change snapshot per-module mempool to per-device mempool.

Per-module mempools could cause a deadlock if multiple
snapshot devices are stacked above each other.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

92e86812

dm snapshot: fix race during exception creation · a8d41b59

Committed by Mikulas Patocka on Jul 21, 2008

Fix a race condition that returns incorrect data when a write causes an
exception to be allocated whilst a read is still in flight.

The race condition happens as follows:
* A read to a non-reallocated sector in the snapshot is submitted so that the
  read is routed to the original device.
* A write to the original device is submitted. The write causes an exception
  that reallocates the block.  The write proceeds.
* The original read is dequeued and reads the wrong data.

This race can be triggered with the CFQ scheduler and one thread writing and
multiple threads reading simultaneously.

(This patch relies upon the earlier dm-kcopyd-per-device.patch to avoid a
deadlock.)

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

a8d41b59

dm snapshot: track snapshot reads · cd45daff

Committed by Mikulas Patocka on Jul 21, 2008

Whenever a snapshot read gets mapped through to the origin, track it in
a per-snapshot hash table indexed by chunk number, using memory allocated
from a new per-snapshot mempool.

We need to track these reads to avoid race conditions which will be fixed
by patches that follow.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

cd45daff

dm mpath: fix test for reinstate_path · def052d2

Committed by Alasdair G Kergon on Jul 21, 2008

Fix test for reinstate_path method before attempting to use it.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: Julia Lawall <julia@diku.dk>

def052d2

dm mpath: return parameter error · 148acff6

Committed by Mikulas Patocka on Jul 21, 2008

Return a specific error message if there is an invalid number of multipath
arguments.

This invalid command returns an "Unknown error" because the ti->error field is
not set:

dmsetup create --table '0 2 multipath 0 0 1 1 round-robin 0 1 1 /dev/sdh' mpath0

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

148acff6

dm io: remove struct padding · 6ae2fa67

Committed by Richard Kennedy on Jul 21, 2008

Rearrange struct dm_io.
Shrinks its size from 40 to 32 bytes, allowing more objects per slab.

Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

6ae2fa67
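
The effect described in 6ae2fa67 can be reproduced with a small sketch. The field names and order below are hypothetical and are not claimed to match the real struct dm_io; they only show how grouping two 4-byte members removes padding. The 40 and 32 byte sizes hold on a typical LP64 platform (8-byte pointers and longs) and may differ elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: each int is followed by an 8-byte-aligned member,
 * so the compiler inserts 4 bytes of padding after each one on LP64. */
struct io_padded {
	void *owner;
	int error;		/* 4 bytes + 4 bytes padding */
	void *bio;
	int io_count;		/* 4 bytes + 4 bytes padding */
	unsigned long start_time;
};

/* Same members, with the two ints adjacent so they share one 8-byte slot. */
struct io_packed {
	void *owner;
	int error;
	int io_count;
	void *bio;
	unsigned long start_time;
};

/* Bytes saved per object by the reordering on the current platform. */
static size_t bytes_saved(void)
{
	return sizeof(struct io_padded) - sizeof(struct io_packed);
}
```

Since slab caches pack fixed-size objects into pages, a smaller object size directly raises the number of objects that fit in each slab.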

dm log: make dm_dirty_log init and exit static · c8da2f8d

Committed by Adrian Bunk on Jul 21, 2008

dm_dirty_log_{init,exit}() can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

c8da2f8d

dm mpath: free path selector on invalid args · 371b2e34

Committed by Mikulas Patocka on Jul 21, 2008

Free path selector if the arguments are invalid.

This command (note that it is invalid) causes a reference leak on module
"dm_round_robin" and prevents the module from being removed.

dmsetup create --table '0 2 multipath 0 0 1 1 round-robin /dev/sdh' mpath0

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

371b2e34

md: Protect access to mddev->disks list using RCU · 4b80991c

Committed by NeilBrown on Jul 21, 2008

All modifications and most access to the mddev->disks list are made
under the reconfig_mutex lock.  However there are three places where
the list is walked without any locking.  If a reconfig happens at this
time, havoc (and oops) can ensue.

So use RCU to protect these accesses:
  - wrap them in rcu_read_{,un}lock()
  - use list_for_each_entry_rcu
  - add to the list with list_add_rcu
  - delete from the list with list_del_rcu
  - delay the 'free' with call_rcu rather than schedule_work

Note that export_rdev did a list_del_init on this list.  In almost all
cases the entry was not in the list anymore so it was a no-op and so
safe.  It is no longer safe as after list_del_rcu we may not touch
the list_head.
An audit shows that export_rdev is called:
  - after unbind_rdev_from_array, in which case the delete has
     already been done,
  - after bind_rdev_to_array fails, in which case the delete isn't needed.
  - before the device has been put on a list at all (e.g. in
      add_new_disk where reading the superblock fails).
  - and in autorun devices after a failure when the device is on a
      different list.

So remove the list_del_init call from export_rdev, and add it back
immediately before the call to export_rdev for that last case.

Note also that ->same_set is sometimes used for lists other than
mddev->list (e.g. candidates).  In these cases rcu is not needed.

Signed-off-by: NeilBrown <neilb@suse.de>

4b80991c

md: only count actual openers as access which prevent a 'stop' · f2ea68cf

Committed by NeilBrown on Jul 21, 2008

Open isn't the only thing that increments ->active.  e.g. reading
/proc/mdstat will increment it briefly.  So to avoid false positives
in testing for concurrent access, introduce a new counter that counts
just the number of times the md device is open.

Signed-off-by: NeilBrown <neilb@suse.de>

f2ea68cf
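
A rough user-space sketch of the idea in f2ea68cf: keep one counter for every temporary reference and a second one for real opens, and base the "busy" test on the second. The struct and function names are hypothetical. The `openers > 1` threshold is an assumption, chosen because the caller issuing the stop request is taken to hold one open itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-array state; the field names echo
 * the commit message but this is not the kernel structure. */
struct array_state {
	int active;	/* any reference, e.g. briefly reading a status file */
	int openers;	/* only real opens of the block device */
};

static void array_get(struct array_state *a)
{
	a->active++;
}

static void array_put(struct array_state *a)
{
	assert(a->active > 0);
	a->active--;
}

static void array_open(struct array_state *a)
{
	array_get(a);
	a->openers++;
}

static void array_release(struct array_state *a)
{
	assert(a->openers > 0);
	a->openers--;
	array_put(a);
}

/* Assumption: the requester holds one open, so the array is busy only
 * if at least one other opener exists. */
static bool array_busy_for_stop(const struct array_state *a)
{
	return a->openers > 1;
}
```

A transient `array_get()` raises `active` but leaves `openers` alone, so a status reader no longer makes the array look busy.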

md: linear: Make array_size sector-based and rename it to array_sectors. · d6e22150

Committed by Andre Noll on Jul 21, 2008

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>

d6e22150

md: Make mddev->array_size sector-based. · f233ea5c

Committed by Andre Noll on Jul 21, 2008

This patch renames the array_size field of struct mddev_s to array_sectors
and converts all instances to use units of 512 byte sectors instead of 1k
blocks.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>

f233ea5c
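
The unit change in f233ea5c amounts to a factor of two. A minimal sketch, assuming 512-byte sectors and 1 KiB blocks as stated in the message; the helper names are hypothetical and do not come from the md code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not taken from drivers/md. */

/* One 1 KiB block holds exactly two 512-byte sectors. */
static uint64_t blocks_to_sectors(uint64_t blocks)
{
	assert(blocks <= UINT64_MAX / 2);	/* guard against overflow */
	return blocks * 2;
}

/* Converting back rounds down: an odd trailing sector is dropped. */
static uint64_t sectors_to_blocks(uint64_t sectors)
{
	return sectors / 2;
}
```

The round-down in `sectors_to_blocks` is one reason to store sizes in sectors: a sector count can represent every block-aligned size, but not the other way round.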

md: Make super_type->rdev_size_change() take sector-based sizes. · 15f4a5fd

Committed by Andre Noll on Jul 21, 2008

Also, change the type of the size parameter from unsigned long long to
sector_t and rename it to num_sectors.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
15f4a5fd

md: Fix check for overlapping devices. · d07bd3bc

Committed by Andre Noll on Jul 21, 2008

The checks in overlaps() expect all parameters either in block-based
or sector-based quantities. However, its single caller passes two
rdev->data_offset arguments as well as two rdev->size arguments, the
former being sector counts while the latter are measured in 1K blocks.

This could cause rdev_size_store() to accept an invalid size from user
space. Fix it by passing only sector-based quantities to overlaps().

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>

d07bd3bc
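
To see why the mixed units in d07bd3bc matter, here is a small sketch of a range-overlap test and a wrapper that converts sizes to sectors first. It is modelled on the description above, not copied from md.c; `struct dev_extent` and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two [start, start + len) ranges overlap unless one ends at or before
 * the point where the other begins. All four arguments must be in the
 * same unit. */
static bool ranges_overlap(uint64_t s1, uint64_t l1, uint64_t s2, uint64_t l2)
{
	if (s1 + l1 <= s2)
		return false;
	if (s2 + l2 <= s1)
		return false;
	return true;
}

/* Hypothetical device description with the mixed units the commit
 * message mentions: offset in sectors, size in 1 KiB blocks. */
struct dev_extent {
	uint64_t data_offset_sectors;
	uint64_t size_kib_blocks;
};

/* Convert each size to 512-byte sectors before comparing. */
static bool extents_overlap(const struct dev_extent *a,
			    const struct dev_extent *b)
{
	return ranges_overlap(a->data_offset_sectors, a->size_kib_blocks * 2,
			      b->data_offset_sectors, b->size_kib_blocks * 2);
}
```

For a device at offset 0 with 100 blocks (200 sectors) and another at offset 150, passing the raw block count would treat the first range as ending at sector 100 and miss the overlap; converting first detects it.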

md: Tidy up rdev_size_store a bit: · d7027458

Committed by Neil Brown on Jul 12, 2008

 - use strict_strtoull in place of simple_strtoull
 - use my_mddev in place of rdev->mddev (they have the same value)
and more significantly,
 - don't adjust mddev->size to fit, rather reject changes which make
   rdev->size smaller than mddev->size

Adjusting mddev->size is a hangover from bind_rdev_to_array which
does a similar thing.  But it really is a better design to insist that
mddev->size is set as required, then the rdev->sizes are set to allow
for that.  The previous way invites confusion.

Signed-off-by: NeilBrown <neilb@suse.de>

d7027458

15 Jul, 2008 1 commit

[SCSI] scsi_dh: fix kconfig related build errors · fe9233fb

Committed by Chandra Seetharaman on May 23, 2008

Do not automatically "select" SCSI_DH for dm-multipath. If SCSI_DH
doesn't exist, just do not allow hardware handlers to be used.

Handle SCSI_DH being a module also. Make sure it doesn't allow DM_MULTIPATH
to be compiled in when SCSI_DH is a module.

[jejb: added comment for Kconfig syntax]
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

fe9233fb

11 Jul, 2008 3 commits

md: Turn rdev->sb_offset into a sector-based quantity. · 0f420358

Committed by Andre Noll on Jul 11, 2008

Rename it to sb_start to make sure all users have been converted.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

0f420358

md: Make calc_dev_sboffset() return a sector count. · b73df2d3

Committed by Andre Noll on Jul 11, 2008

As BLOCK_SIZE_BITS is 10 and

	MD_NEW_SIZE_SECTORS(2 * x) = 2 * NEW_SIZE_BLOCKS(x),

the return value of calc_dev_sboffset() doubles. Fix up all three
callers accordingly.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

b73df2d3
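
The identity quoted in b73df2d3 can be checked numerically. The sketch below assumes the superblock offset is the device size rounded down to a 64 KiB boundary, minus a 64 KiB reserved area; the macro and function names are illustrative stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 64 KiB reserved at the end of the device for the
 * superblock. Names are illustrative, not the kernel's. */
#define RESERVED_BYTES   (64 * 1024)
#define RESERVED_SECTORS (RESERVED_BYTES / 512)		/* 128 */
#define RESERVED_BLOCKS  (RESERVED_BYTES / 1024)	/* 64 */

/* Superblock offset in 512-byte sectors for a device of dev_sectors. */
static uint64_t sb_offset_sectors(uint64_t dev_sectors)
{
	assert(dev_sectors >= RESERVED_SECTORS);
	return (dev_sectors & ~(uint64_t)(RESERVED_SECTORS - 1))
		- RESERVED_SECTORS;
}

/* Superblock offset in 1 KiB blocks for a device of dev_blocks. */
static uint64_t sb_offset_blocks(uint64_t dev_blocks)
{
	assert(dev_blocks >= RESERVED_BLOCKS);
	return (dev_blocks & ~(uint64_t)(RESERVED_BLOCKS - 1))
		- RESERVED_BLOCKS;
}
```

Doubling a block count shifts it left by one bit, so masking off the low 7 bits of `2 * x` gives twice the result of masking off the low 6 bits of `x`; the reserved area doubles as well (128 versus 64). Hence the sector-based result is exactly twice the block-based one, which is why every caller needs adjusting.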

md: Replace calc_dev_size() by calc_num_sectors(). · e7debaa4

Committed by Andre Noll on Jul 11, 2008

Number of sectors is the preferred unit for sizes of raid devices,
so change calc_dev_size() so that it returns this unit instead of
the number of 1K blocks.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

e7debaa4
