Commits · 80268ee9270ebe4847365a7426de91d179e870d0 · openeuler / raspberrypi-kernel

13 Oct, 2008 3 commits

md: Don't try to set an array to 'read-auto' if it is already in that state. · 80268ee9

Committed by NeilBrown on Oct 13, 2008

'read-auto' is a variant of 'readonly' which will switch to writable
on the first write attempt.

Calling do_md_stop to set the array readonly when it is already readonly
returns an error.  So make sure not to do that.

Signed-off-by: NeilBrown <neilb@suse.de>

80268ee9

md: Allow metadata_version to be updated for externally managed metadata. · ea43ddd8

Committed by NeilBrown on Oct 13, 2008

For externally managed metadata, the 'metadata_version' sysfs
attribute is really just a channel for user-space programs to
communicate about how the array is being managed.
It can be useful for this to be changed while the array is active.

Normally changes to metadata_version are not permitted while the array
is active.  Change that so that if the metadata is externally managed,
the metadata_version can be changed to a different flavour of external
management.

Signed-off-by: NeilBrown <neilb@suse.de>

ea43ddd8

md: Fix rdev_size_store with size == 0 · 7d3c6f87

Committed by Chris Webb on Oct 13, 2008


Fix rdev_size_store with size == 0.
size == 0 means to use the largest size allowed by the
underlying device and is used when modifying an active array.

This fixes a regression introduced by
 commit d7027458

Cc: <stable@kernel.org>
Signed-off-by: Chris Webb <chris@arachsys.com>
Signed-off-by: NeilBrown <neilb@suse.de>

7d3c6f87

09 Oct, 2008 3 commits

block: move stats from disk to part0 · 074a7aca

Committed by Tejun Heo on Aug 25, 2008

Move stats related fields - stamp, in_flight, dkstats - from disk to
part0 and unify stat handling such that...

* part_stat_*() now updates part0 together if the specified partition
  is not part0.  ie. part_stat_*() are now essentially all_stat_*().

* {disk|all}_stat_*() are gone.

* part_round_stats() is updated similarly.  It handles part0 stats
  automatically and disk_round_stats() is killed.

* part_{inc|dec}_in_flight() is implemented which automatically updates
  part0 stats for parts other than part0.

* disk_map_sector_rcu() is updated to return part0 if no part matches.
  Combined with the above changes, this makes NULL special case
  handling in callers unnecessary.

* Separate stats show code paths for disk are collapsed into part
  stats show code paths.

* Rename disk_stat_lock/unlock() to part_stat_lock/unlock()

While at it, reposition stat handling macros a bit and add missing
parentheses around macro parameters.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

074a7aca

block: always set bdev->bd_part · 0762b8bd

Committed by Tejun Heo on Aug 25, 2008

Till now, bdev->bd_part is set only if the bdev was for parts other
than part0.  This patch makes bdev->bd_part always set so that code
paths don't have to differentiate common handling.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

0762b8bd

block: implement and use {disk|part}_to_dev() · ed9e1982

Committed by Tejun Heo on Aug 25, 2008

Implement {disk|part}_to_dev() and use them to access generic device
instead of directly dereferencing {disk|part}->dev.  To make sure no
user is left behind, rename the generic device fields to __dev.

This is in preparation of unifying partition 0 handling with other
partitions.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

ed9e1982

19 Sep, 2008 1 commit

md: Don't wait UNINTERRUPTIBLE for other resync to finish · 9744197c

Committed by NeilBrown on Sep 19, 2008

When two md arrays share some block device (e.g. each uses different
partitions on the one device), a resync of one array will wait for
the resync on the other to finish.

This can be a long time and as it currently waits TASK_UNINTERRUPTIBLE,
the softlockup code notices and complains.

So use TASK_INTERRUPTIBLE instead and make sure to flush signals
before calling schedule.

Signed-off-by: NeilBrown <neilb@suse.de>

9744197c

01 Sep, 2008 1 commit

Remove invalidate_partition call from do_md_stop. · 271f5a9b

Committed by NeilBrown on Sep 01, 2008

When stopping an md array, or just switching to read-only, we
currently call invalidate_partition while holding the mddev lock.
The main reason for this is probably to ensure all dirty buffers
are flushed (invalidate_partition calls fsync_bdev).

However if any dirty buffers are found, it will almost certainly cause
a deadlock as starting writeout will require an update to the
superblock, and performing that update requires taking the mddev
lock - which is already held.

This deadlock can be demonstrated by running "reboot -f -n" with
a root filesystem on md/raid, and some dirty buffers in memory.

All other calls to stop an array should already happen after a flush.
The normal sequence is to stop using the array (e.g. umount) which
will cause __blkdev_put to call sync_blockdev.  Then open the
array and issue the STOP_ARRAY ioctl while the buffers are all still
clean.

So this invalidate_partition is normally a no-op, except for one case
where it will cause a deadlock.

So remove it.

This patch possibly addresses the regression recorded in
   http://bugzilla.kernel.org/show_bug.cgi?id=11460
and
   http://bugzilla.kernel.org/show_bug.cgi?id=11452

though it isn't yet clear how it ever worked.

Signed-off-by: NeilBrown <neilb@suse.de>

271f5a9b

08 Aug, 2008 1 commit

md: cancel check/repair requests when recovery is needed · 56ac36d7

Committed by Dan Williams on Aug 07, 2008

If a 'repair' is requested when an array is in a position to 'recover', raid1
will perform the repair while md believes a recovery is happening.  Address
this at both ends, i.e. cancel check/repair requests upon detecting a
recover condition and do not call ->spare_active after completing a
check/repair.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

56ac36d7

05 Aug, 2008 4 commits

Allow faulty devices to be removed from a readonly array. · c89a8eee

Committed by NeilBrown on Aug 05, 2008

Removing faulty devices from an array is a two stage process.
First the device is moved from being a part of the active array
to being similar to a spare device.  Then it can be removed
by a request from user space.

The first step is currently not performed for read-only arrays,
so the second step can never succeed.

So allow readonly arrays to remove failed devices (which aren't
blocked).

Signed-off-by: NeilBrown <neilb@suse.de>

c89a8eee

Fail safely when trying to grow an array with a write-intent bitmap. · dba034ee

Committed by NeilBrown on Aug 05, 2008

We cannot currently change the size of a write-intent bitmap.
So if we change the size of an array which has such a bitmap, it
tries to set bits beyond the end of the bitmap.

For now, simply reject any request to change the size of an array
which has a bitmap.  mdadm can remove the bitmap and add a new one
after the array has changed size.

Signed-off-by: NeilBrown <neilb@suse.de>

dba034ee

Restore force switch of md array to readonly at reboot time. · 2b25000b

Committed by NeilBrown on Aug 05, 2008

A recent patch allowed do_md_stop to know whether it was being called
via an ioctl or not, and thus whether to allow for an extra open file
descriptor when checking if it is in use.
This broke the switch to readonly performed by the shutdown notifier,
which needs to work even when the array is still (apparently) active
(as md doesn't get told when the filesystem becomes readonly).

So restore this feature by pretending that there can be lots of
file descriptors open, but we still want do_md_stop to switch to
readonly.

Signed-off-by: NeilBrown <neilb@suse.de>

2b25000b

Make writes to md/safe_mode_delay immediately effective. · 19052c0e

Committed by NeilBrown on Aug 05, 2008

If we reduce the 'safe_mode_delay', it could still wait for the old
delay to completely expire before doing anything about safe_mode.
Thus the effect of the change is delayed.

To make the effect more immediate, run the timeout function
immediately if the delay was reduced.  This may cause it to run
slightly earlier than required, but that is the safer option.

Signed-off-by: NeilBrown <neilb@suse.de>

19052c0e

29 Jul, 2008 1 commit

md: do not count blocked devices as spares · e5427135

Committed by Dan Williams on Jul 28, 2008

remove_and_add_spares() assumes that failed devices have been hot-removed
from the array.  Removal is skipped in the 'blocked' case so do not count a
device in this state as 'spare'.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

e5427135

24 Jul, 2008 1 commit

md: delay notification of 'active_idle' to the recovery thread · d8e64406

Committed by Dan Williams on Jul 23, 2008

sysfs_notify might sleep, so do not call it from md_safemode_timeout.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

d8e64406

21 Jul, 2008 6 commits

md: Protect access to mddev->disks list using RCU · 4b80991c

Committed by NeilBrown on Jul 21, 2008

All modifications and most access to the mddev->disks list are made
under the reconfig_mutex lock.  However there are three places where
the list is walked without any locking.  If a reconfig happens at this
time, havoc (and oops) can ensue.

So use RCU to protect these accesses:
  - wrap them in rcu_read_{,un}lock()
  - use list_for_each_entry_rcu
  - add to the list with list_add_rcu
  - delete from the list with list_del_rcu
  - delay the 'free' with call_rcu rather than schedule_work

Note that export_rdev did a list_del_init on this list.  In almost all
cases the entry was not in the list anymore so it was a no-op and so
safe.  It is no longer safe as after list_del_rcu we may not touch
the list_head.
An audit shows that export_rdev is called:
  - after unbind_rdev_from_array, in which case the delete has
     already been done,
  - after bind_rdev_to_array fails, in which case the delete isn't needed.
  - before the device has been put on a list at all (e.g. in
      add_new_disk where reading the superblock fails).
  - and in autorun devices after a failure when the device is on a
      different list.

So remove the list_del_init call from export_rdev, and add it back
immediately before the call to export_rdev for that last case.

Note also that ->same_set is sometimes used for lists other than
mddev->disks (e.g. candidates).  In these cases rcu is not needed.

Signed-off-by: NeilBrown <neilb@suse.de>

4b80991c

md: only count actual openers as access which prevent a 'stop' · f2ea68cf

Committed by NeilBrown on Jul 21, 2008

Open isn't the only thing that increments ->active.  e.g. reading
/proc/mdstat will increment it briefly.  So to avoid false positives
in testing for concurrent access, introduce a new counter that counts
just the number of times the md device is open.

Signed-off-by: NeilBrown <neilb@suse.de>

f2ea68cf

md: Make mddev->array_size sector-based. · f233ea5c

Committed by Andre Noll on Jul 21, 2008

This patch renames the array_size field of struct mddev_s to array_sectors
and converts all instances to use units of 512 byte sectors instead of 1k
blocks.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>

f233ea5c

md: Make super_type->rdev_size_change() take sector-based sizes. · 15f4a5fd

Committed by Andre Noll on Jul 21, 2008

Also, change the type of the size parameter from unsigned long long to
sector_t and rename it to num_sectors.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>

15f4a5fd

md: Fix check for overlapping devices. · d07bd3bc

Committed by Andre Noll on Jul 21, 2008

The checks in overlaps() expect all parameters either in block-based
or sector-based quantities. However, its single caller passes two
rdev->data_offset arguments as well as two rdev->size arguments, the
former being sector counts while the latter are measured in 1K blocks.

This could cause rdev_size_store() to accept an invalid size from user
space. Fix it by passing only sector-based quantities to overlaps().

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>

d07bd3bc
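
The unit mismatch described in d07bd3bc can be reproduced outside the kernel. What follows is a minimal user-space sketch, not the kernel source: `overlaps()` is modelled on the helper named in the commit message, `sector_t` is a stand-in typedef, and `blocks_to_sectors()` is a hypothetical helper added only for illustration. It shows that an interval check gives a meaningful answer only when offsets and lengths share one unit.

```c
#include <stdint.h>

/* Stand-in for the kernel's sector_t; an assumption for this sketch. */
typedef uint64_t sector_t;

/* Hypothetical helper: convert a count of 1K blocks to 512-byte sectors. */
static sector_t blocks_to_sectors(uint64_t blocks)
{
	return (sector_t)blocks * 2;
}

/*
 * Return 1 if the ranges [s1, s1 + l1) and [s2, s2 + l2) intersect,
 * 0 otherwise.  All four arguments must be expressed in the same unit.
 */
static int overlaps(sector_t s1, sector_t l1, sector_t s2, sector_t l2)
{
	if (s1 + l1 <= s2)
		return 0;
	if (s2 + l2 <= s1)
		return 0;
	return 1;
}
```

Take two devices of 1024 blocks each, at sector offsets 0 and 1536. Passing the sizes in blocks, `overlaps(0, 1024, 1536, 1024)`, reports no overlap, while passing them in sectors via `blocks_to_sectors(1024)` correctly reports that the ranges collide.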

md: Tidy up rdev_size_store a bit: · d7027458

Committed by Neil Brown on Jul 12, 2008

 - use strict_strtoull in place of simple_strtoull
 - use my_mddev in place of rdev->mddev (they have the same value)
and more significantly,
 - don't adjust mddev->size to fit, rather reject changes which make
   rdev->size smaller than mddev->size

Adjusting mddev->size is a hangover from bind_rdev_to_array which
does a similar thing.  But it really is a better design to insist that
mddev->size is set as required, then the rdev->sizes are set to allow
for that.  The previous way invites confusion.

Signed-off-by: NeilBrown <neilb@suse.de>

d7027458

11 Jul, 2008 10 commits

md: Turn rdev->sb_offset into a sector-based quantity. · 0f420358

Committed by Andre Noll on Jul 11, 2008

Rename it to sb_start to make sure all users have been converted.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

0f420358

md: Make calc_dev_sboffset() return a sector count. · b73df2d3

Committed by Andre Noll on Jul 11, 2008

As BLOCK_SIZE_BITS is 10 and

	MD_NEW_SIZE_SECTORS(2 * x) = 2 * MD_NEW_SIZE_BLOCKS(x),

the return value of calc_dev_sboffset() doubles. Fix up all three
callers accordingly.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

b73df2d3
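
The identity quoted in b73df2d3 can be checked numerically. The sketch below is a user-space illustration, not the kernel header: it assumes a 64 KiB reserved area (128 sectors, 64 blocks of 1K) and uses the hypothetical functions `new_size_sectors()` and `new_size_blocks()` in place of the macros. Each one rounds its argument down to a multiple of the reserved size and then subtracts one reserved unit.

```c
#include <stdint.h>

/* Assumed reserved area of 64 KiB, expressed in both units. */
#define RESERVED_BYTES		(64 * 1024)
#define RESERVED_SECTORS	(RESERVED_BYTES / 512)	/* 128 */
#define RESERVED_BLOCKS		(RESERVED_BYTES / 1024)	/* 64 */

/* Round down to a multiple of the reserved size, then subtract it (sectors). */
static uint64_t new_size_sectors(uint64_t x)
{
	return (x & ~(uint64_t)(RESERVED_SECTORS - 1)) - RESERVED_SECTORS;
}

/* Same computation in units of 1K blocks. */
static uint64_t new_size_blocks(uint64_t x)
{
	return (x & ~(uint64_t)(RESERVED_BLOCKS - 1)) - RESERVED_BLOCKS;
}

/* Return 1 if new_size_sectors(2x) == 2 * new_size_blocks(x) on [lo, hi). */
static int identity_holds(uint64_t lo, uint64_t hi)
{
	uint64_t x;

	for (x = lo; x < hi; x++)
		if (new_size_sectors(2 * x) != 2 * new_size_blocks(x))
			return 0;
	return 1;
}
```

For a 1000-block device, `new_size_blocks(1000)` is 896 and `new_size_sectors(2000)` is 1792, exactly double, which is why callers that used to receive a block offset have to be adjusted once the function returns sectors.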

md: Replace calc_dev_size() by calc_num_sectors(). · e7debaa4

Committed by Andre Noll on Jul 11, 2008

Number of sectors is the preferred unit for sizes of raid devices,
so change calc_dev_size() so that it returns this unit instead of
the number of 1K blocks.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

e7debaa4

md: Make update_size() take the number of sectors. · d71f9f88

Committed by Andre Noll on Jul 11, 2008

Changing the internal representations of sizes of raid devices
from 1K blocks to sector counts (512B units) is desirable because
it allows us to get rid of many divisions/multiplications and unnecessary
casts that are present in the current code.

This patch is a first step in this direction. It replaces the old
1K-based "size" argument of update_size() by "num_sectors" and
fixes up its two callers.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

d71f9f88

md: Better control of when do_md_stop is allowed to stop the array. · df5b20cf

Committed by Neil Brown on Jul 11, 2008

do_md_stop checks the number of active users before allowing the array
to be stopped.
Two problems:
  1/ it assumes the request is coming through an open file descriptor
     (via ioctl) so it allows for that.  This is not always the case.
  2/ it doesn't do the check if the array hasn't been activated.
     This is not good for cases when we use an inactive array to hold
     some devices in a container.

Signed-off-by: Neil Brown <neilb@suse.de>

df5b20cf

md: get_disk_info(): Don't convert between signed and unsigned and back. · 26ef379f

Committed by Andre Noll on Jul 11, 2008

The current code copies a signed int from user space, converts it to
unsigned and passes the unsigned value to find_rdev_nr() which expects
a signed value. Simply pass the signed value from user space directly.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

26ef379f

md: Simplify restart_array(). · 80fab1d7

Committed by Andre Noll on Jul 11, 2008

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

80fab1d7

md: alloc_disk_sb(): Return proper error value. · ebc24337

Committed by Andre Noll on Jul 11, 2008

If alloc_page() fails, ENOMEM is a more suitable error value
than EINVAL.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

ebc24337

md: Simplify sb_equal(). · ce0c8e05

Committed by Andre Noll on Jul 11, 2008

The only caller of sb_equal() tests the return value against
zero, so it's OK to return the negated return value of memcmp().

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

ce0c8e05
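
The simplification in ce0c8e05 relies on `memcmp()` returning zero exactly when two buffers match, so negating the result yields a boolean "equal" value. A minimal user-space sketch of that idiom follows. `struct fake_sb` and `fake_sb_equal()` are hypothetical stand-ins, and the choice of which superblock fields the real function compares is not modelled here.

```c
#include <string.h>

/* Hypothetical stand-in for a superblock; not the kernel layout. */
struct fake_sb {
	unsigned int magic;
	unsigned int level;
	unsigned int raid_disks;
};

/* Return 1 if the two structures are byte-identical, 0 otherwise. */
static int fake_sb_equal(const struct fake_sb *a, const struct fake_sb *b)
{
	return !memcmp(a, b, sizeof(*a));
}
```

Because the caller only tests the result against zero, collapsing the comparison into a single negated `memcmp()` preserves behaviour while removing the intermediate return-value bookkeeping.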

md: Simplify uuid_equal(). · 05710466

Committed by Andre Noll on Jul 11, 2008

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

05710466

08 Jul, 2008 7 commits

md: sb_equal(): Fix misleading printk. · 35020f1a

Committed by Andre Noll on Mar 23, 2008

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

35020f1a

md: Fix a typo in the comment to cmd_match(). · 7f6ce769

Committed by Andre Noll on Mar 23, 2008

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

7f6ce769

md: Fix typo in array_state comment. · 910d8cb3

Committed by Andre Noll on Mar 25, 2008

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

910d8cb3

md: sync_speed_show(): Trivial cleanups. · 9687a60c

Committed by Andre Noll on Mar 25, 2008

- Remove superfluous parentheses.
- Make format string match the type of the variable that is printed.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

9687a60c

md: do_md_run(): Fix misleading error message. · 13e53df3

Committed by Andre Noll on Mar 26, 2008

In case pers->run() succeeds but creating the bitmap fails, we
print an error message stating that pers->run() has failed.

Print this message only if pers->run() really failed.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

13e53df3

md: md_getgeo(): Move comment to proper position. · 2f9618ce

Committed by Andre Noll on Apr 25, 2008

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

2f9618ce

md: md_ioctl(): Fix misleading indentation. · bb57fc64

Committed by Andre Noll on Apr 25, 2008

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

bb57fc64

01 Jul, 2008 1 commit

md: resolve external metadata handling deadlock in md_allow_write · b5470dc5

Committed by Dan Williams on Jun 27, 2008

md_allow_write() marks the metadata dirty while holding mddev->lock and then
waits for the write to complete.  For externally managed metadata this causes a
deadlock as userspace needs to take the lock to communicate that the metadata
update has completed.

Change md_allow_write() in the 'external' case to start the 'mark active'
operation and then return -EAGAIN.  The expected side effects while waiting for
userspace to write 'active' to 'array_state' are holding off reshape (code
currently handles -ENOMEM), cause some 'stripe_cache_size' change requests to
fail, cause some GET_BITMAP_FILE ioctl requests to fall back to GFP_NOIO, and
cause updates to 'raid_disks' to fail.  Except for 'stripe_cache_size' changes
these failures can be mitigated by coordinating with mdmon.

md_write_start() still prevents writes from occurring until the metadata
handler has had a chance to take action as it unconditionally waits for
MD_CHANGE_CLEAN to be cleared.

[neilb@suse.de: return -EAGAIN, try GFP_NOIO]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

b5470dc5

28 Jun, 2008 1 commit

Support changing rdev size on running arrays. · 0cd17fec

Committed by Chris Webb on Jun 28, 2008

From: Chris Webb <chris@arachsys.com>

Allow /sys/block/mdX/md/rdY/size to change on running arrays, moving the
superblock if necessary for this metadata version. We prevent the available
space from shrinking to less than the used size, and allow it to be set to zero
to fill all the available space on the underlying device.

Signed-off-by: Chris Webb <chris@arachsys.com>
Signed-off-by: Neil Brown <neilb@suse.de>

0cd17fec