- 14 Dec 2009, 27 commits
-
Committed by Dan Williams

Enable external metadata arrays to manage rebuild checkpointing via an md/dev-XXX/recovery_start attribute which reflects rdev->recovery_offset. Also update resync_start_store to allow 'none' to be written, for consistency.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
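As a rough illustration of how a userspace metadata manager might drive this attribute, here is a minimal sketch; the device names ("md0", "dev-sda") and the checkpoint value are assumptions, only the attribute name comes from the patch:

```c
/* Minimal sketch: checkpoint a rebuild from userspace via sysfs.
 * "md0" and "dev-sda" are assumed names, not part of the patch. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/sys/block/md0/md/dev-sda/recovery_start", "w");

    if (!f) {
        perror("fopen");
        return 1;
    }
    /* Record that recovery has reached sector 4096; per the commit,
     * 'none' is also accepted (marking recovery as not needed). */
    fprintf(f, "%llu\n", 4096ULL);
    return fclose(f) ? 1 : 0;
}
```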
-
Committed by Dan Williams

Other walks of this list are either under rcu_read_lock() or the list mutation lock (mddev_lock()). This protects against the improbable case of a disk being removed from the array at the start of md_do_sync().

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
Committed by NeilBrown

As v1.x metadata can record that a member of the array is not completely recovered, it makes sense to record that a spare has become a regular member of the array at the earliest opportunity. So remove the tests on "recovery_offset > 0" in super_1_sync, as they really aren't needed, and schedule a metadata update immediately after adding spares to a degraded array. This means that if a crash happens immediately after a recovery starts, the new device will be included in the array and recovery will continue from wherever it was up to. Previously this didn't happen unless recovery was at least 1/16 of the way through.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by Arnd Bergmann

The RAID ioctls are only implemented in md.c, so the handling for them should also be moved there from fs/compat_ioctl.c.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Neil Brown <neilb@suse.de>
Cc: Andre Noll <maan@systemlinux.org>
Cc: linux-raid@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown

Make it clear in the config message that MD_MULTIPATH is not under active development.

Cc: Oren Held <orenhe@il.ibm.com>
Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown

Suggested by Oren Held <orenhe@il.ibm.com>

Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by Robert Becker

We've noticed severe lasting performance degradation of our raid arrays when we have drives that yield large amounts of media errors. The raid10 module will queue each failed read for retry, and will also attempt to call fix_read_error() to perform the read recovery. Read recovery is performed while the array is frozen, so repeated recovery attempts can degrade the performance of the array for extended periods of time.

With this patch I propose adding a per-md-device maximum number of corrected read attempts. Each rdev will maintain a count of read correction attempts in the rdev->read_errors field (not used currently for raid10). When we enter fix_read_error() we'll check to see when the last read error occurred, and divide the read error count by 2 for every hour since the last read error. If at that point our read error count exceeds the read error threshold, we'll fail the raid device.

In addition, this patch adds sysfs nodes (get/set) for the per-md max_read_errors attribute and the rdev->read_errors attribute, and some printk's to indicate when fix_read_error fails to repair an rdev. For testing I used debugfs->fail_make_request to inject IO errors to the rdev while doing IO to the raid array.

Signed-off-by: Robert Becker <Rob.Becker@riverbed.com>
Signed-off-by: NeilBrown <neilb@suse.de>
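The halving-per-hour decay lends itself to a tiny standalone model; the sketch below (plain userspace C with assumed names, not the raid10 code) shows the arithmetic described above:

```c
/* Sketch of the decay: halve the stored read-error count once per
 * hour of error-free operation, then compare against the maximum. */
#include <stdio.h>
#include <time.h>

static int check_decayed_read_errors(unsigned int *read_errors,
                                     time_t *last_error,
                                     unsigned int max_read_errors)
{
    time_t now = time(NULL);
    unsigned long hours = (unsigned long)(now - *last_error) / 3600;

    /* Each error-free hour halves the accumulated count. */
    if (hours >= sizeof(*read_errors) * 8)
        *read_errors = 0;
    else
        *read_errors >>= hours;

    *last_error = now;
    (*read_errors)++;

    return *read_errors > max_read_errors; /* nonzero: fail the rdev */
}

int main(void)
{
    unsigned int errors = 40;
    time_t last = time(NULL) - 2 * 3600; /* last error two hours ago */

    if (check_decayed_read_errors(&errors, &last, 20))
        printf("device would be failed (count=%u)\n", errors);
    else
        printf("device survives (count=%u)\n", errors); /* 40>>2 +1 = 11 */
    return 0;
}
```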
-
Committed by Robert Becker

When we get a read error on a device in a RAID10, and attempting to repair the error fails, print more useful messages about why it failed.

Signed-off-by: Robert Becker <Rob.Becker@riverbed.com>
Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown

There is a sysfs file which allows bits in the write-intent bitmap to be explicitly set, indicating that the block is thought to be 'dirty'. When this happens we should really set recovery_cp backwards to include the block, to reflect this dirtiness. In particular, a 'resync' process will refuse to start if recovery_cp is beyond the end of the array, so this is needed to allow a resync to be triggered.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown

In this case, the metadata needs to not be in the same sector as the bitmap. md will not read/write any bitmap metadata. Configuration must be done via sysfs, and when a recovery makes the array non-degraded again, writing 'true' to 'bitmap/can_clear' will allow bits in the bitmap to be cleared again.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown

Setting daemon_lastrun really has nothing to do with reading the bitmap superblock; it just happens to be needed at the same time. bitmap_read_sb is about to become optional, so move that code out to after the call to bitmap_read_sb.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown

A new attribute directory 'bitmap' in 'md' is created which contains files for configuring the bitmap. 'location' identifies where the bitmap is, either 'none', 'file', or a sector offset from the metadata. Writing 'location' can create or remove a bitmap. Adding a 'file' bitmap this way is not yet supported. 'chunksize' and 'time_base' must be set before 'location' can be set. 'chunksize' can be set before creating a bitmap, but is currently always overridden by the bitmap superblock. 'time_base' and 'backlog' can be updated at any time.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Andre Noll <maan@systemlinux.org>
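A hedged sketch of driving these files from userspace, honouring the rule that 'chunksize' and 'time_base' precede 'location'; the array name "md0" and the values written are assumptions:

```c
/* Sketch: configure an internal bitmap through the new sysfs files.
 * "md0" and the example values are assumptions, not from the patch. */
#include <stdio.h>

static int write_attr(const char *attr, const char *val)
{
    char path[128];
    FILE *f;

    snprintf(path, sizeof(path), "/sys/block/md0/md/bitmap/%s", attr);
    f = fopen(path, "w");
    if (!f)
        return -1;
    fprintf(f, "%s\n", val);
    return fclose(f);
}

int main(void)
{
    write_attr("chunksize", "65536"); /* must precede 'location' */
    write_attr("time_base", "5");     /* must precede 'location' */
    write_attr("location", "+8");     /* assumed form: sector offset
                                         from the metadata */
    return 0;
}
```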
-
Committed by NeilBrown

safe_delay_store can parse fixed-point numbers (for fractions of a second). We will want to do that for another sysfs file soon, so factor out the code.

Signed-off-by: NeilBrown <neilb@suse.de>
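The kind of parsing being factored out looks roughly like this standalone sketch (a simplified stand-in, not the kernel's helper), converting a decimal string such as "1.25" seconds into milliseconds:

```c
/* Simplified stand-in for a fixed-point parse; keeps up to three
 * fractional digits and rejects anything but digits and one dot. */
#include <ctype.h>
#include <stdio.h>

static long parse_fixed_point_msec(const char *buf)
{
    long whole = 0, frac = 0, scale = 1000;
    int dot = 0;

    for (; *buf && *buf != '\n'; buf++) {
        if (*buf == '.' && !dot) {
            dot = 1;
        } else if (isdigit((unsigned char)*buf)) {
            if (!dot) {
                whole = whole * 10 + (*buf - '0');
            } else if (scale > 1) {
                scale /= 10;
                frac = frac * 10 + (*buf - '0');
            }
        } else {
            return -1;
        }
    }
    return whole * 1000 + frac * scale;
}

int main(void)
{
    printf("%ld\n", parse_fixed_point_msec("1.25")); /* prints 1250 */
    return 0;
}
```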
-
Committed by NeilBrown

For md arrays where metadata is managed externally, the kernel does not know about a superblock, so the superblock offset is 0. If we want to have a write-intent bitmap near the end of the devices of such an array, we should support a sector_t-sized offset. We need the offset to be possibly negative, for when the bitmap is before the metadata, so use loff_t instead. Also add a sanity check that the bitmap does not overlap with the data.

Signed-off-by: NeilBrown <neilb@suse.de>
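A minimal model of the new sanity check, assuming a signed sector offset relative to a superblock at 0; the names and interval arithmetic are illustrative, not the kernel's exact test:

```c
/* Illustrative overlap check: a signed (loff_t-like) bitmap offset
 * against a data region, with the superblock offset taken as 0. */
#include <stdio.h>

typedef long long loff_t_like;

static int bitmap_overlaps_data(loff_t_like offset,
                                unsigned long long bitmap_sectors,
                                unsigned long long data_start,
                                unsigned long long data_sectors)
{
    loff_t_like bm_start = offset;
    loff_t_like bm_end = offset + (loff_t_like)bitmap_sectors;

    return bm_end > (loff_t_like)data_start &&
           bm_start < (loff_t_like)(data_start + data_sectors);
}

int main(void)
{
    /* bitmap before the metadata (negative offset): no overlap */
    printf("%d\n", bitmap_overlaps_data(-16, 16, 0, 1024)); /* 0 */
    /* bitmap inside the data region: overlap */
    printf("%d\n", bitmap_overlaps_data(64, 16, 0, 1024));  /* 1 */
    return 0;
}
```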
-
Committed by NeilBrown

As bitmap_create and bitmap_destroy already set thread->timeout as appropriate, there is no need to do it in raid10_quiesce. There is a possible need to wake the thread after the timeout has been set low, but it is better to do that where the timeout is actually set low, in bitmap_create.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown

This removes a lot of multiplications by HZ.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown

... and into bitmap_info. These are all configuration parameters that need to be set before the bitmap is created.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown

In preparation for making bitmap fields configurable via sysfs, start tidying up by making a single structure to contain the configuration fields.

Signed-off-by: NeilBrown <neilb@suse.de>
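A sketch of the shape such a container might take; the field names and types are assumptions patterned on the parameters these patches discuss, not the actual kernel struct:

```c
/* Illustrative only: plausible shape of a bitmap configuration
 * container. Field names and types are assumptions. */
struct bitmap_info_sketch {
    long long     offset;           /* sectors from superblock; may be negative */
    unsigned long chunksize;        /* bytes of data covered per bitmap bit */
    unsigned long daemon_sleep;     /* delay between bitmap daemon runs */
    unsigned long max_write_behind; /* write-behind backlog limit */
    int           external;         /* bitmap metadata managed outside md? */
};
```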
-
Committed by NeilBrown

A 2-device raid5 array can now be converted to raid1.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown

This will allow us to stop writeout to portions of the array while they are resynced by someone else, e.g. another node in a cluster.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown

The post-barrier flush is sent by md as soon as make_request on the barrier write completes. For raid5, the data might not be in the per-device queues yet. So for barrier requests, wait for any pre-reading to be done so that the request will be in the per-device queues. We use the 'preread_active' count to check that nothing is still in the preread phase, and delay the decrement of this count until after write requests have been submitted to the underlying devices.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown

Previously barriers were only supported on RAID1. This is because other levels require synchronisation across all devices, and so needed a different approach. Here is that approach.

When a barrier arrives, we send a zero-length barrier to every active device. When that completes, and if the original request was not empty, we submit the barrier request itself (with the barrier flag cleared) and then submit a fresh load of zero-length barriers. The barrier request itself is asynchronous, but any subsequent request will block until the barrier completes.

The reason for clearing the barrier flag is that a barrier request is allowed to fail. If we pass a non-empty barrier through a striping raid level, it is conceivable that part of it could succeed and part could fail. That would be way too hard to deal with. So if the first run of zero-length barriers succeeds, we assume all is sufficiently well that we send the request and ignore errors in the second run of barriers.

RAID5 needs extra care, as write requests may not have been submitted to the underlying devices yet. So we flush the stripe cache before proceeding with the barrier.

Note that the second set of zero-length barriers is submitted immediately after the original request is submitted. Thus when a personality finds mddev->barrier to be set during make_request, it should not return from make_request until the corresponding per-device request(s) have been queued. That will be done in later patches.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Andre Noll <maan@systemlinux.org>
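The sequence reads naturally as three steps; the compilable sketch below uses stand-in stubs (all names assumed) purely to lay out the control flow described above:

```c
/* Compilable sketch of the barrier sequence; every device operation
 * is a stand-in stub, not the md implementation. */
#include <stdio.h>

struct req { int has_data; };

static void empty_barrier_all_devices(void) { puts("zero-length barrier -> every active device"); }
static void wait_barriers_done(void)        { puts("wait for those barriers to complete"); }
static void submit_unflagged(struct req *r) { (void)r; puts("submit payload, barrier flag cleared"); }

static void handle_barrier(struct req *r)
{
    empty_barrier_all_devices();  /* 1. synchronise all members */
    wait_barriers_done();
    if (r->has_data)              /* 2. payload with flag cleared: a
                                   *    flagged barrier may partially
                                   *    fail across a stripe, which
                                   *    would be unrecoverable */
        submit_unflagged(r);
    empty_barrier_all_devices();  /* 3. fence later writes; errors here
                                   *    are ignored, as round 1 proved
                                   *    the devices accept barriers */
}

int main(void)
{
    struct req r = { .has_data = 1 };
    handle_barrier(&r);
    return 0;
}
```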
-
Committed by NeilBrown

If a resync/recovery/check/repair is interrupted for some reason, it can be useful to know exactly where it got up to. So in that case, do not clear curr_resync_completed. Initialise it when starting a resync/recovery/... instead.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown

When a 'check' or 'repair' finishes, we should clear resync_min so that a future check/repair will cover the whole array (by default). However if it is interrupted, we should update resync_min to where we got up to, so that when the check/repair continues it just does the remainder of the array.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown

Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown

qd_idx was previously declared and given exactly the same value!

Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown

A write-intent bitmap can be removed from an array while the array is active. When this happens, all IO is suspended and flushed before the bitmap is removed. However it is possible that bitmap_daemon_work is still running to clear old bits from the bitmap. If it is, it can dereference the bitmap after it has been freed. So introduce a new mutex to protect bitmap_daemon_work, and take it before destroying a bitmap. This is suitable for any current -stable kernel.

Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@kernel.org
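The locking pattern of the fix, transplanted into a small POSIX-threads sketch (names assumed; the kernel uses its own mutex API, not pthreads):

```c
/* Sketch of the pattern: the periodic worker and the teardown path
 * serialise on one mutex, so the worker can never touch a bitmap
 * that was freed underneath it. */
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t bitmap_mutex = PTHREAD_MUTEX_INITIALIZER;
static int *bitmap; /* stand-in for the real bitmap structure */

void daemon_work(void)
{
    pthread_mutex_lock(&bitmap_mutex);
    if (bitmap)          /* may have been destroyed since last run */
        *bitmap &= ~0x1; /* "clear old bits" placeholder */
    pthread_mutex_unlock(&bitmap_mutex);
}

void bitmap_destroy(void)
{
    pthread_mutex_lock(&bitmap_mutex); /* wait out any running worker */
    free(bitmap);
    bitmap = NULL;
    pthread_mutex_unlock(&bitmap_mutex);
}

int main(void)
{
    bitmap = calloc(1, sizeof(*bitmap));
    daemon_work();
    bitmap_destroy();
    daemon_work(); /* safe: sees NULL instead of freed memory */
    return 0;
}
```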
-
- 05 Dec 2009, 1 commit
-
Committed by Chandra Seetharaman

Make the scsi_dh_activate() function asynchronous by taking two additional parameters: a callback function, and the data to call the callback function with.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
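In outline, the conversion swaps a blocking return value for a completion callback plus opaque data; here is a generic, hedged sketch (names assumed, not the actual SCSI device-handler signatures):

```c
/* Generic illustration of making a call asynchronous via a
 * completion callback plus opaque data. Not the real API. */
#include <stdio.h>

typedef void (*activate_complete_fn)(void *data, int err);

/* Before: the caller blocked and read the error from the return
 * value. After: the call returns promptly and the result is
 * delivered through the callback. */
static void dh_activate_async(activate_complete_fn done, void *data)
{
    int err = 0; /* pretend the hardware handshake succeeded */
    done(data, err);
}

static void on_activated(void *data, int err)
{
    printf("path %s activated, err=%d\n", (const char *)data, err);
}

int main(void)
{
    dh_activate_async(on_activated, (void *)"sdb");
    return 0;
}
```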
-
- 01 Dec 2009, 1 commit
-
Committed by NeilBrown

commit 4706b349 was a forward port of a fix that was needed for SLES10. But in fact it is not needed in mainline because the earlier commit dd00a99e fixes the same problem in a better way. Further, this commit introduces a bug in the way it interacts with the automatic read-error correction: if, after a read error is successfully corrected, the same disk is chosen for the re-read, the re-read won't be attempted and an error will be returned instead.

After reverting that commit, there is the possibility that a read error on a read-only array (where read errors cannot be corrected, as that requires a write) will repeatedly read the same device and continue to get an error. So in the "Array is readonly" case, fail the drive immediately on a read error.

Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@kernel.org
-
- 19 Nov 2009, 1 commit
-
Committed by Eric W. Biederman

For consistency, drop the & in front of every proc_handler. Explicitly taking the address is unnecessary, and it prevents optimizations like stubbing the proc_handlers to NULL.

Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
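The change is mechanical; a minimal before/after sketch (with a simplified stand-in for ctl_table, not the kernel definition) shows why both forms are equivalent, since a function name already decays to a pointer:

```c
/* Simplified stand-in for ctl_table; not the kernel definition. */
struct ctl_table_sketch {
    const char *procname;
    int (*proc_handler)(void);
};

static int proc_dointvec_sketch(void) { return 0; }

/* Before: .proc_handler = &proc_dointvec_sketch,
 * After: the explicit '&' is dropped; the initializer is identical
 * at the ABI level, but the plain name is easier to transform. */
static struct ctl_table_sketch entry = {
    .procname     = "speed_limit_min",
    .proc_handler = proc_dointvec_sketch,
};
```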
-
- 13 Nov 2009, 3 commits
-
Committed by NeilBrown

Normally it is not safe to allow a raid5 that is both dirty and degraded to be assembled without an explicit request from the admin, as it can cause hidden data corruption. This is because 'dirty' means that the parity cannot be trusted, and 'degraded' means that the parity needs to be used. However, if the device that is missing contains only parity, then there is no issue and assembly can continue. This particularly applies when a RAID5 is being converted to a RAID6 and there is an unclean shutdown while the conversion is happening. So check whether the degraded space only contains parity, and in that case allow the assembly.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown

When a reshape finds that it can add spare devices into the array, those devices might already be 'in_sync' if they are beyond the old size of the array, or they might not be if they are within the array. The first case happens when we change an N-drive RAID5 to an N+1-drive RAID5. The second happens when we convert an N-drive RAID5 to an N+1-drive RAID6. So set the flag more carefully. Also, ->recovery_offset is only meaningful when the flag is clear, so only set it in that case. This change needs the preceding two to ensure that the non-in_sync device doesn't get evicted from the array when it is stopped, in the case where v0.90 metadata is used.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown

This is a combination that didn't really make sense before. However, when a reshape is converting e.g. raid5 -> raid6, the extra device is not fully in-sync, but is certainly active and contains important data. So allow that state to be meaningful, and in particular get the 'recovery_offset' value (which is needed for any non-in-sync active device) from the reshape_position.

Signed-off-by: NeilBrown <neilb@suse.de>
-
- 12 Nov 2009, 2 commits
-
Committed by Eric W. Biederman

Now that sys_sysctl is a wrapper around /proc/sys, all of the binary sysctl support elsewhere in the tree is dead code.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Corey Minyard <minyard@acm.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Neil Brown <neilb@suse.de>
Cc: "James E.J. Bottomley" <James.Bottomley@suse.de>
Acked-by: Clemens Ladisch <clemens@ladisch.de> for drivers/char/hpet.c
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
-
Committed by NeilBrown

Each device has its own 'recovery_offset' showing how far recovery has progressed on the device. As the only real significance of this is the fact that it can be stored in the metadata and recovered at restart, and as only 1.x metadata can do this, we were only updating 'recovery_offset' to 'curr_resync_completed' when updating v1.x metadata. But this is wrong, and we will shortly make limited use of this field in v0.90 metadata. So move the update into common code.

Signed-off-by: NeilBrown <neilb@suse.de>
-
- 09 Nov 2009, 1 commit
-
Committed by Dirk Hohndel

something-bility is spelled as something-blity, so a grep for 'blit' would find these lines. This is so trivial that I didn't split it by subsystem or copy additional maintainers; all changes are to comments. The only purpose is to get fewer false positives when grepping around the kernel sources.

Signed-off-by: Dirk Hohndel <hohndel@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
-
- 06 Nov 2009, 2 commits
-
Committed by NeilBrown

This value is visible through sysfs and is used by mdadm when it manages a reshape (backing up data that is about to be rearranged), so it is important that it is always correct. Currently it does not get updated properly when a reshape starts, which can cause problems when assembling an array that is in the middle of being reshaped. This is suitable for 2.6.31.y stable kernels.

Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
-
Committed by NeilBrown

If a 'sync_max' has been set (via sysfs), it is wrong to clear it until a resync (or reshape or recovery ...) actually reaches that point. So if a resync is interrupted (e.g. by device failure), leave 'resync_max' unchanged.

This is particularly important for 'reshape' operations that do not change the size of the array. For such operations mdadm needs to monitor the reshape, taking rolling backups of the section being reshaped. If resync_max gets cleared, the reshape can get ahead of mdadm, and then the backups that mdadm creates are useless.

This is suitable for 2.6.31.y stable kernels.

Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
-
- 20 Oct 2009, 1 commit
-
Committed by Dan Williams

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 17 Oct 2009, 1 commit
-
Committed by Mikulas Patocka

Allow the snapshot chunk size to be smaller than the page size. The code is now capable of handling this due to some previous fixes and enhancements. As the page size varies between computers, prior to this patch the chunk size of a snapshot dictated which machines could read it: snapshots created on one machine might not be readable on another.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
-