1. 25 Apr 2008 (7 commits)
  2. 22 Apr 2008 (1 commit)
  3. 11 Apr 2008 (1 commit)
  4. 29 Mar 2008 (2 commits)
  5. 20 Mar 2008 (2 commits)
  6. 11 Mar 2008 (2 commits)
  7. 05 Mar 2008 (9 commits)
    • md: the md RAID10 resync thread could cause a md RAID10 array deadlock · a07e6ab4
      Committed by K.Tanaka
      This message describes another issue with md RAID10, found by testing the
      2.6.24 md RAID10 using the new scsi fault injection framework.
      
      Abstract:
      
      When a scsi error results in disabling a disk during RAID10 recovery, the
      resync threads of md RAID10 could stall.
      
      In this case, the raid array has already been broken and it may not matter.  But
      I think a stall is not preferable.  If it occurs, even a shutdown or reboot will
      fail because the resources are busy.
      
      The deadlock mechanism:
      
      The r10bio_s structure has a "remaining" member to keep track of BIOs yet to
      be handled when recovering.  The "remaining" counter is incremented when
      building a BIO in sync_request() and is decremented when a BIO is finished in
      end_sync_write().
      
      If building a BIO fails for some reason in sync_request(), the "remaining"
      should be decremented if it has already been incremented.  I found a case
      where this decrement is forgotten.  This causes a md_do_sync() deadlock
      because md_do_sync() waits for md_done_sync() called by end_sync_write(), but
      end_sync_write() never calls md_done_sync() because of the "remaining" counter
      mismatch.
      
      For example, this problem would be reproduced in the following case:
      
      Personalities : [raid10]
      md0 : active raid10 sdf1[4] sde1[5](F) sdd1[2] sdc1[1] sdb1[6](F)
            3919616 blocks 64K chunks 2 near-copies [4/2] [_UU_]
            [>....................]  recovery =  2.2% (45376/1959808) finish=0.7min speed=45376K/sec
      
      In this case, sdf1 is recovering, and sdb1 and sde1 are disabled.
      An additional error that detaches sdd will cause a deadlock.
      
      md0 : active raid10 sdf1[4] sde1[5](F) sdd1[6](F) sdc1[1] sdb1[7](F)
            3919616 blocks 64K chunks 2 near-copies [4/1] [_U__]
            [=>...................]  recovery =  5.0% (99520/1959808) finish=5.9min speed=5237K/sec
      
       2739 ?        S<     0:17 [md0_raid10]
      28608 ?        D<     0:00 [md0_resync]
      28629 pts/1    Ss     0:00 bash
      28830 pts/1    R+     0:00 ps ax
      31819 ?        D<     0:00 [kjournald]
      
      The resync thread appears to keep running, but it is actually deadlocked.
      
      Patch:
      With this patch, the "remaining" counter is decremented when needed.
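      A minimal sketch of the idea, assuming the field and helper names of
      drivers/md/raid10.c (an illustration, not the literal patch):

        /* Sketch: if a resync BIO cannot be built after "remaining" was
         * already incremented, undo the increment so the counter can
         * still reach zero and md_done_sync() gets called. */
        if (atomic_dec_and_test(&r10_bio->remaining)) {
                md_done_sync(mddev, r10_bio->sectors, 0);
                put_buf(r10_bio);
        }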
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a07e6ab4
    • md: fix possible raid1/raid10 deadlock on read error during resync · 1c830532
      Committed by NeilBrown
      Thanks to K.Tanaka and the scsi fault injection framework, here is a fix for
      another possible deadlock in raid1/raid10 error handling.
      
      If a read request returns an error while a resync is happening and a resync
      request is pending, the attempt to fix the error will block until the resync
      progresses, and the resync will block until the read request completes.  Thus
      a deadlock.
      
      This patch fixes the problem.
      
      Cc: "K.Tanaka" <k-tanaka@ce.jp.nec.com>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1c830532
    • md: don't attempt read-balancing for raid10 'far' layouts · 8ed3a195
      Committed by Keld Simonsen
      This patch changes the disk to be read for layout "far > 1" to always be the
      disk with the lowest block address.
      
      Thus the chunks to be read will always be (for a fully functioning array) from
      the first band of stripes, and the raid will then work as a raid0 consisting
      of the first band of stripes.
      
      Some advantages:
      
      The fastest part of the disks involved, the outer sectors, will be used.  The
      outer blocks of a disk may be as much as 100% faster than the inner blocks.
      
      Average seek time will be smaller, as seeks will always be confined to the
      first part of the disks.
      
      Mixed disks with different performance characteristics will work better, as
      they will work as raid0; the sequential read rate will be the number of disks
      involved times the IO rate of the slowest disk.
      
      If a disk is malfunctioning, the first working disk with the lowest block
      address for the logical block will be used.
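      A rough sketch of the selection rule, assuming the data structures of
      drivers/md/raid10.c (the real read_balance() also handles missing devices
      and the near/offset layouts):

        /* Sketch: for 'far' layouts, pick the working copy with the
         * lowest device address so reads stay in the outer band. */
        int best_slot = -1;
        sector_t best_addr = MaxSector;
        int slot;

        for (slot = 0; slot < conf->copies; slot++) {
                int disk = r10_bio->devs[slot].devnum;
                mdk_rdev_t *rdev = rcu_dereference(conf->mirrors[disk].rdev);

                if (!rdev || test_bit(Faulty, &rdev->flags))
                        continue;
                if (r10_bio->devs[slot].addr < best_addr) {
                        best_addr = r10_bio->devs[slot].addr;
                        best_slot = slot;
                }
        }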
      Signed-off-by: Keld Simonsen <keld@dkuug.dk>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8ed3a195
    • md: lock access to rdev attributes properly · 27c529bb
      Committed by NeilBrown
      When we access attributes of an rdev (component device on an md array) through
      sysfs, we really need to lock the array against concurrent changes.  We
      currently do that when we change an attribute, but not when we read an
      attribute.  We need to lock when reading as well, otherwise rdev->mddev could become
      NULL while we are accessing it.
      
      So add appropriate locking (mddev_lock) to rdev_attr_show.
      
      rdev_size_store requires some extra care as well, since it needs to unlock the
      mddev while scanning other mddevs for overlapping regions.  We currently
      assume that rdev->mddev will still be unchanged after the scan, but that
      cannot be certain.  So take a copy of rdev->mddev for use at the end of the
      function.
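      A simplified sketch of the show path (condensed from what the patch does in
      drivers/md/md.c; error handling trimmed):

        static ssize_t
        rdev_attr_show(struct kobject *kobj, struct attribute *attr, char *page)
        {
                struct rdev_sysfs_entry *entry =
                        container_of(attr, struct rdev_sysfs_entry, attr);
                mdk_rdev_t *rdev = container_of(kobj, mdk_rdev_t, kobj);
                mddev_t *mddev = rdev->mddev;
                ssize_t rv;

                if (!entry->show)
                        return -EIO;
                /* lock the array so rdev->mddev cannot become NULL under us */
                rv = mddev ? mddev_lock(mddev) : -EBUSY;
                if (!rv) {
                        rv = entry->show(rdev, page);
                        mddev_unlock(mddev);
                }
                return rv;
        }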
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      27c529bb
    • md: make sure a reshape is started when device switches to read-write · 25156198
      Committed by NeilBrown
      A resync/reshape/recovery thread will refuse to progress when the array is
      marked read-only.  So whenever we mark the array not read-only, it is important
      to wake up the resync thread.  There is one place where we didn't do this.
      
      The problem manifests if the start_ro module parameter is set, and a raid5
      array that is in the middle of a reshape (restripe) is started.  The array
      will initially be semi-read-only (meaning it acts like it is readonly until
      the first write).  So the reshape will not proceed.
      
      On the first write, the array will become read-write, but the reshape will not
      be started, and there is no event which will ever restart that thread.
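      The missing piece amounts to something like this at the point where the
      semi-read-only state is cleared (a sketch, not the exact hunk):

        if (mddev->ro == 2) {
                mddev->ro = 0;   /* first write: leave semi-read-only mode */
                /* kick the recovery thread so a pending reshape resumes */
                set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
                md_wakeup_thread(mddev->thread);
        }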
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      25156198
    • md: clean up irregularity with raid autodetect · d0fae18f
      Committed by NeilBrown
      When a raid1 array is stopped, all components currently get added to the list
      for auto-detection.  However we should really only add components that were
      found by autodetection in the first place.  So add a flag to record that
      information, and use it.
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d0fae18f
    • md: guard against possible bad array geometry in v1 metadata · a1801f85
      Committed by NeilBrown
      Make sure the data doesn't start before the end of the superblock when the
      superblock is at the start of the device.
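      In super_1_load() terms the guard is roughly the following (a sketch only;
      'sb_start' and 'sb_sectors' are illustrative names for the superblock's
      location and the space it occupies):

        /* Sketch: for v1.1/v1.2 layouts the superblock sits at the start
         * of the device, so the data area must begin after it. */
        if (le64_to_cpu(sb->data_offset) < sb_start + sb_sectors)
                return -EINVAL;   /* data would overlap the superblock */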
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a1801f85
    • md: reduce CPU wastage on idle md array with a write-intent bitmap · 8311c29d
      Committed by NeilBrown
      On an md array with a write-intent bitmap, a thread wakes up every few seconds
      and scans the bitmap looking for work to do.  If the array is idle, there will
      be no work to do, but a lot of scanning is done to discover this.
      
      So cache the fact that the bitmap is completely clean, and avoid scanning the
      whole bitmap when the cache is known to be clean.
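      Conceptually the change looks like this (a sketch; the patch adds an
      'allclean' style flag to struct bitmap in drivers/md/bitmap.c):

        void bitmap_daemon_work(struct bitmap *bitmap)
        {
                /* Sketch: nothing was dirtied since the last pass, so
                 * skip the page-by-page scan entirely. */
                if (bitmap->allclean)
                        return;

                bitmap->allclean = 1;   /* assume clean; any dirty bit found
                                         * during the scan clears this again */
                /* ... existing scan of the bitmap pages ... */
        }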
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8311c29d
    • md: fix deadlock in md/raid1 and md/raid10 when handling a read error · a35e63ef
      Committed by NeilBrown
      When handling a read error, we freeze the array to stop any other IO while
      attempting to over-write with correct data.
      
      This is done in the raid1d(raid10d) thread and must wait for all submitted IO
      to complete (except for requests that failed and are sitting in the retry
      queue - these are counted in ->nr_queued and will stay there during a freeze).
      
      However write requests need attention from raid1d as bitmap updates might be
      required.  This can cause a deadlock as raid1 is waiting for requests to
      finish that themselves need attention from raid1d.
      
      So we create a new function 'flush_pending_writes' to give that attention, and
      call it in freeze_array to be sure that we aren't waiting on raid1d.
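      A simplified sketch of the raid1.c side (raid10.c gets the same treatment;
      field names follow conf_t in that era of the driver):

        static void freeze_array(conf_t *conf)
        {
                spin_lock_irq(&conf->resync_lock);
                conf->barrier++;
                conf->nr_waiting++;
                /* While waiting for pending IO to drain, keep pushing out
                 * queued bitmap-dependent writes so raid1d isn't needed. */
                wait_event_lock_irq(conf->wait_barrier,
                                    conf->nr_pending == conf->nr_queued + 1,
                                    conf->resync_lock,
                                    flush_pending_writes(conf));
                spin_unlock_irq(&conf->resync_lock);
        }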
      
      Thanks to "K.Tanaka" <k-tanaka@ce.jp.nec.com> for finding and reporting this
      problem.
      
      Cc: "K.Tanaka" <k-tanaka@ce.jp.nec.com>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a35e63ef
  8. 20 Feb 2008 (1 commit)
  9. 15 Feb 2008 (4 commits)
  10. 14 Feb 2008 (1 commit)
  11. 08 Feb 2008 (10 commits)
    • dm raid1: report fault status · af195ac8
      Committed by Jonathan Brassow
      This patch adds extra information to the mirror status output, so that
      it can be determined which device(s) have failed.  For each mirror device,
      a character is printed indicating the most severe error encountered.  The
      characters are:
       *    A => Alive - No failures
       *    D => Dead - A write failure occurred leaving mirror out-of-sync
       *    S => Sync - A synchronization failure occurred, mirror out-of-sync
       *    R => Read - A read failure occurred, mirror data unaffected
      This allows userspace to properly reconfigure the mirror set.
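      For example, a two-leg mirror whose second device saw a write failure would
      report something like the following (hypothetical devices, same format as
      the mirror status sample later in this log):

        isw_eeaaabgfg_mirror: 0 488390920 mirror 2 8:16 8:32 3727/3727 1 AD 1 core

      Here 'A' marks 8:16 as alive and 'D' marks 8:32 as having had a write
      failure that left the mirror out-of-sync.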
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      af195ac8
    • dm raid1: handle read failures · 06386bbf
      Committed by Jonathan Brassow
      This patch gives the ability to respond-to/record device failures
      that happen during read operations.  It also adds the ability to
      read from mirror devices that are not the primary if they are
      in-sync.
      
      There are essentially two read paths in mirroring; the direct path
      and the queued path.  When a read request is mapped, if the region
      is 'in-sync' the direct path is taken; otherwise the queued path
      is taken.
      
      If the direct path is taken, we must record bio information so that
      if the read fails we can retry it.  We then discover the status of
      a direct read through mirror_end_io.  If the read has failed, we will
      mark the device from which the read was attempted as failed (so we
      don't try to read from it again), restore the bio and try again.
      
      If the queued path is taken, we discover the results of the read
      from 'read_callback'.  If the device failed, we will mark the device
      as failed and attempt the read again if there is another device
      where this region is known to be 'in-sync'.
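      A stripped-down sketch of the direct-path handling described above (names
      approximate dm-raid1.c; the real code also restores the bio's sector and
      size before retrying):

        static int mirror_end_io(struct dm_target *ti, struct bio *bio,
                                 int error, union map_info *map_context)
        {
                struct mirror_set *ms = ti->private;
                struct mirror *m;

                if (bio_rw(bio) == READ && error == -EIO) {
                        /* remember which leg the direct read used */
                        m = bio_get_m(bio);
                        fail_mirror(m, DM_RAID1_READ_ERROR);

                        /* hand the bio back to kmirrord to retry it on
                         * another in-sync mirror */
                        queue_bio(ms, bio, bio_rw(bio));
                        return DM_ENDIO_INCOMPLETE;
                }

                return error;
        }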
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      06386bbf
    • dm raid1: fix EIO after log failure · b80aa7a0
      Committed by Jonathan Brassow
      This patch adds the ability to requeue write I/O to
      core device-mapper when there is a log device failure.
      
      If a write to the log produces an error, the pending writes are
      put on the "failures" list.  Since the log is marked as failed,
      they will stay on the failures list until a suspend happens.
      
      Suspends come in two phases, presuspend and postsuspend.  We must
      make sure that all the writes on the failures list are requeued
      in the presuspend phase (a requirement of dm core).  This means
      that recovery must be complete (because writes may be delayed
      behind it) and the failures list must be requeued before we
      return from presuspend.
      
      The mechanisms to ensure recovery is complete (or stopped) were
      already in place, but needed to be moved from postsuspend to
      presuspend.  We rely on 'flush_workqueue' to ensure that the
      mirror thread is complete and therefore, has requeued all writes
      in the failures list.
      
      Because we are using flush_workqueue, we must ensure that no
      additional 'queue_work' calls will produce additional I/O
      that we need to requeue (because once we return from
      presuspend, we are unable to do anything about it).  'queue_work'
      is called in response to the following functions:
      - complete_resync_work = NA, recovery is stopped
      - rh_dec (mirror_end_io) = NA, only calls 'queue_work' if it
                                 is ready to recover the region
                                 (recovery is stopped) or it needs
                                 to clear the region in the log*
                                 **this doesn't get called while
                                 suspending**
      - rh_recovery_end = NA, recovery is stopped
      - rh_recovery_start = NA, recovery is stopped
      - write_callback = 1) Writes w/o failures simply call
                         bio_endio -> mirror_end_io -> rh_dec
                         (see rh_dec above)
                         2) Writes with failures are put on
                         the failures list and queue_work is
                         called**
                         ** write_callbacks don't happen
                         during suspend **
      - do_failures = NA, 'queue_work' not called if suspending
      - add_mirror (initialization) = NA, only done on mirror creation
      - queue_bio = NA, 1) delayed I/O scheduled before flush_workqueue
                    is called.  2) No more I/Os are being issued.
                    3) Re-attempted READs can still be handled.
                    (Write completions are handled through rh_dec/
                    write_callback - mentioned above - and do not
                    use queue_bio.)
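      The resulting presuspend ordering looks roughly like this (a sketch; the
      real mirror_presuspend() also waits for recovery to quiesce before
      returning):

        static void mirror_presuspend(struct dm_target *ti)
        {
                struct mirror_set *ms = ti->private;

                atomic_set(&ms->suspend, 1);

                /* stop recovery first: failed writes may be delayed behind it */
                rh_stop_recovery(&ms->rh);

                /* drain the mirror thread; this requeues every bio on the
                 * 'failures' list back to dm core before presuspend returns */
                flush_workqueue(ms->kmirrord_wq);
        }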
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      b80aa7a0
    • dm raid1: handle recovery failures · 8f0205b7
      Committed by Jonathan Brassow
      This patch adds the calls to 'fail_mirror' if an error occurs during
      mirror recovery (aka resynchronization).  'fail_mirror' is responsible
      for recording the type of error by mirror device and ensuring an event
      gets raised for the purpose of notifying userspace.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      8f0205b7
    • dm raid1: handle write failures · 72f4b314
      Committed by Jonathan Brassow
      This patch gives mirror the ability to handle device failures
      during normal write operations.
      
      The 'write_callback' function is called when a write completes.
      If all the writes failed or succeeded, we report failure or
      success respectively.  If some of the writes failed, we call
      fail_mirror, which increments the error count for the device, notes
      the type of error encountered (DM_RAID1_WRITE_ERROR),  and
      selects a new primary (if necessary).  Note that the primary
      device can never change while the mirror is not in-sync (IOW,
      while recovery is happening.)  This means that the scenario
      where a failed write changes the primary and gives
      recovery_complete a chance to misread the primary never happens.
      The fact that the primary can change has necessitated the change
      to the default_mirror field.  We need to protect against reading
      garbage while the primary changes.  We then add the bio to a new
      list in the mirror set, 'failures'.  For every bio in the 'failures'
      list, we call a new function, '__bio_mark_nosync', where we mark
      the region 'not-in-sync' in the log and properly set the region
      state to RH_NOSYNC.  Userspace must also be notified of the
      failure.  This is done by 'raising an event' (dm_table_event()).
      If fail_mirror is called in process context the event can be raised
      right away.  If in interrupt context, the event is deferred to the
      kmirrord thread - which raises the event if 'event_waiting' is set.
      
      Backwards compatibility is maintained by ignoring errors if
      the DM_FEATURES_HANDLE_ERRORS flag is not present.
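      A condensed sketch of that write completion path (helper names are
      approximate; 'wake_kmirrord' stands in for however the mirror thread is
      actually woken, and the all-legs-failed case is elided):

        static void write_callback(unsigned long error, void *context)
        {
                struct bio *bio = (struct bio *) context;
                struct mirror_set *ms = bio_get_ms(bio);
                unsigned int i;

                if (!error) {                   /* every leg succeeded */
                        bio_endio(bio, 0);
                        return;
                }

                /* note the error on each failed leg; this may also choose
                 * a new primary when the mirror is in-sync */
                for (i = 0; i < ms->nr_mirrors; i++)
                        if (test_bit(i, &error))
                                fail_mirror(ms->mirror + i, DM_RAID1_WRITE_ERROR);

                /* at least one leg succeeded: defer to kmirrord so the
                 * region can be marked not-in-sync and an event raised */
                spin_lock_irq(&ms->lock);
                bio_list_add(&ms->failures, bio);
                spin_unlock_irq(&ms->lock);
                wake_kmirrord(ms);              /* hypothetical wrapper */
        }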
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      72f4b314
    • dm snapshot: combine consecutive exceptions in memory · d74f81f8
      Committed by Milan Broz
      Provided sector_t is 64 bits, reduce the in-memory footprint of the
      snapshot exception table by the simple method of using unused bits of
      the chunk number to combine consecutive entries.
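      The trick relies on sector_t (and hence the chunk number) being 64 bits
      wide, so the top bits can carry a run length. A sketch of the encoding,
      with illustrative constants (see dm-snap.h for the real definitions):

        #define DM_CHUNK_CONSECUTIVE_BITS 8
        #define DM_CHUNK_NUMBER_BITS      56

        /* low 56 bits: chunk number; high 8 bits: how many consecutive
         * chunks this exception also covers */
        static inline chunk_t dm_chunk_number(chunk_t chunk)
        {
                return chunk & (chunk_t)((1ULL << DM_CHUNK_NUMBER_BITS) - 1);
        }

        static inline unsigned dm_consecutive_chunk_count(struct dm_snap_exception *e)
        {
                return e->new_chunk >> DM_CHUNK_NUMBER_BITS;
        }

        static inline void dm_consecutive_chunk_count_inc(struct dm_snap_exception *e)
        {
                e->new_chunk += (1ULL << DM_CHUNK_NUMBER_BITS);
        }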
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      d74f81f8
    • dm: stripe enhanced status return · 4f7f5c67
      Committed by Brian Wood
      This patch adds additional information to the status line. It is added at the
      end of the returned text so it will not interfere with existing
      implementations using this data. The addition of this information will allow
      for a common return interface to match that returned with the dm-raid1.c
      status line (with Jonathan Brassow's patches).
      
      Here is a sample of what is returned with a mirror "status" call:
      isw_eeaaabgfg_mirror: 0 488390920 mirror 2 8:16 8:32 3727/3727 1 AA 1 core
      
      Here's what's returned with this patch for a stripe "status" call:
      isw_dheeijjdej_stripe: 0 976783872 striped 2 8:16 8:32 1 AA
      Signed-off-by: Brian Wood <brian.j.wood@intel.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      4f7f5c67
    • dm: stripe trigger event on failure · a25eb944
      Committed by Brian Wood
      This patch adds the stripe_end_io function to process errors that might
      occur after an IO operation. As part of this there are a number of
      enhancements made to record and trigger events:
      
      - New atomic variable in struct stripe to record the number of
      errors each stripe volume device has experienced (could be used
      later with uevents to report back directly to userspace)
      
      - New workqueue/work struct setup to process the trigger_event function
      
      - New end_io function. It is here that testing for BIO error conditions
      takes place. It determines the exact stripe that caused the error,
      records this in the new atomic variable, and calls the queue_work() function
      
      - New trigger_event function to process failure events. This
      calls dm_table_event()
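      A rough sketch of how those pieces fit together (simplified; 'which_stripe'
      is a hypothetical helper standing in for the device lookup the real code
      performs, and the workqueue details are reduced to schedule_work):

        static void trigger_event(struct work_struct *work)
        {
                struct stripe_c *sc = container_of(work, struct stripe_c, trigger_event);

                dm_table_event(sc->ti->table);          /* notify userspace */
        }

        static int stripe_end_io(struct dm_target *ti, struct bio *bio,
                                 int error, union map_info *map_context)
        {
                struct stripe_c *sc = ti->private;
                unsigned int i;

                if (!error)
                        return 0;

                i = which_stripe(sc, bio);              /* hypothetical helper */
                atomic_inc(&sc->stripe[i].error_count); /* per-device error count */
                schedule_work(&sc->trigger_event);
                return error;
        }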
      Signed-off-by: Brian Wood <brian.j.wood@intel.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      a25eb944
    • dm log: auto load modules · fb8b2848
      Committed by Jonathan Brassow
      If the log type is not recognised, attempt to load the module
      'dm-log-<type>.ko'.
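      The mechanism is the usual request_module() pattern, roughly (a sketch;
      '_get_type' stands for the existing lookup of registered log types):

        static struct dirty_log_type *get_type(const char *type_name)
        {
                struct dirty_log_type *type;

                type = _get_type(type_name);    /* already registered? */
                if (type)
                        return type;

                /* not registered: try to load dm-log-<type>.ko, then retry */
                if (request_module("dm-log-%s", type_name))
                        return NULL;

                return _get_type(type_name);
        }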
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      fb8b2848
    • dm: move deferred bio flushing to workqueue · 304f3f6a
      Committed by Milan Broz
      Add a single-thread workqueue for each mapped device
      and move flushing of the lists of pushback and deferred bios
      to this new workqueue.
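      Schematically, per mapped device (a sketch; names such as 'dm_wq_work',
      'flush_deferred_io' and the queue name are illustrative rather than the
      exact ones in dm.c):

        /* work function: flush pushback and deferred bios for this device */
        static void dm_wq_work(struct work_struct *work)
        {
                struct mapped_device *md =
                        container_of(work, struct mapped_device, work);

                flush_deferred_io(md);          /* illustrative name */
        }

        static int dm_create_wq(struct mapped_device *md)
        {
                INIT_WORK(&md->work, dm_wq_work);
                md->wq = create_singlethread_workqueue("kdmflush");
                return md->wq ? 0 : -ENOMEM;
        }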
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      304f3f6a