- 08 Mar 2012, 6 commits
-
-
Submitted by Joe Thornber

Correct the number of mapped sectors shown on a thin device's status line by decrementing td->mapped_blocks in __remove() each time a block is removed.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
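As a hedged sketch, the fix lands in __remove() in drivers/md/dm-thin-metadata.c; the function shape below follows that file, but treat it as an illustration rather than the verbatim patch:

    static int __remove(struct dm_thin_device *td, dm_block_t block)
    {
            int r;
            struct dm_pool_metadata *pmd = td->pmd;
            dm_block_t keys[2] = { td->id, block };

            r = dm_btree_remove(&pmd->info, pmd->root, keys, &pmd->root);
            if (r)
                    return r;

            /* the fix: keep the per-device block count in step with the
             * btree so the status line reports accurate mapped sectors */
            td->mapped_blocks--;
            td->changed = 1;

            return 0;
    }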
-
Submitted by Joe Thornber

If dm_sm_disk_create() fails, the superblock must be unlocked.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
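A minimal sketch of the corrected error path, assuming the names sblock, tm and nr_blocks from the dm-thin metadata code (dm_sm_disk_create() returns an ERR_PTR on failure):

    data_sm = dm_sm_disk_create(tm, nr_blocks);
    if (IS_ERR(data_sm)) {
            r = PTR_ERR(data_sm);
            dm_bm_unlock(sblock);   /* the fix: release the superblock
                                     * lock before bailing out */
            goto bad;
    }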
-
Submitted by Mike Snitzer

The __open_device() error paths in __create_thin() and __create_snap() incorrectly call __close_device() even if td was not initialized by __open_device(). Remove these calls. Also document __open_device()'s return values, remove a redundant td->changed = 1 in __create_thin(), and add an additional safeguard against creating an already-existing device.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
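Roughly, the corrected __create_thin() error path looks like this (a sketch under assumed names; the point is simply that td is never touched when __open_device() fails):

    r = __open_device(pmd, dev, 1, &td);
    if (r) {
            /*
             * __open_device() failed, so td was never initialised;
             * the old code called __close_device(td) here and walked
             * off an invalid pointer.  Just undo the btree insert.
             */
            dm_btree_remove(&pmd->tl_info, pmd->root, &key, &pmd->root);
            return r;
    }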
-
Submitted by Mike Snitzer

The following BUG is hit on the first read submitted to a dm-flakey test device while the device is "down", if the corrupt_bio_byte feature wasn't requested when the device's table was loaded.

Example DM table that will hit this BUG:
0 2097152 flakey 8:0 2048 0 30

This bug was introduced by commit a3998799 (dm flakey: add corrupt_bio_byte feature) in v3.1-rc1.

BUG: unable to handle kernel paging request at ffff8801cfce3fff
IP: [<ffffffffa008c233>] corrupt_bio_data+0x6e/0xae [dm_flakey]
PGD 1606063 PUD 0
Oops: 0002 [#1] SMP
...
Call Trace:
<IRQ>
[<ffffffffa008c2b5>] flakey_end_io+0x42/0x48 [dm_flakey]
[<ffffffffa00dca98>] clone_endio+0x54/0xb6 [dm_mod]
[<ffffffff81130587>] bio_endio+0x2d/0x2f
[<ffffffff811c819a>] req_bio_endio+0x96/0x9f
[<ffffffff811c94b9>] blk_update_request+0x1dc/0x3a9
[<ffffffff812f5ee2>] ? rcu_read_unlock+0x21/0x23
[<ffffffff811c96a6>] blk_update_bidi_request+0x20/0x6e
[<ffffffff811c9713>] blk_end_bidi_request+0x1f/0x5d
[<ffffffff811c978d>] blk_end_request+0x10/0x12
[<ffffffff8128f450>] scsi_io_completion+0x1e5/0x4b1
[<ffffffff812882a9>] scsi_finish_command+0xec/0xf5
[<ffffffff8128f830>] scsi_softirq_done+0xff/0x108
[<ffffffff811ce284>] blk_done_softirq+0x84/0x98
[<ffffffff81048d19>] __do_softirq+0xe3/0x1d5
[<ffffffff8138f83f>] ? _raw_spin_lock+0x62/0x69
[<ffffffff810997cf>] ? handle_irq_event+0x4c/0x61
[<ffffffff8139833c>] call_softirq+0x1c/0x30
[<ffffffff81003b37>] do_softirq+0x4b/0xa3
[<ffffffff81048a39>] irq_exit+0x53/0xca
[<ffffffff81398acd>] do_IRQ+0x9d/0xb4
[<ffffffff81390333>] common_interrupt+0x73/0x73
...

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.1+
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
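The fix amounts to not touching the bio payload unless the feature was actually configured. A sketch of the guarded end_io path (simplified; see drivers/md/dm-flakey.c for the full set of conditions):

    static int flakey_end_io(struct dm_target *ti, struct bio *bio,
                             int error, union map_info *map_context)
    {
            struct flakey_c *fc = ti->private;

            /*
             * the fix: only corrupt successful READs if corrupt_bio_byte
             * was requested in the table; previously this ran even when
             * the feature was absent and scribbled past the bio payload
             */
            if (fc->corrupt_bio_byte && !error &&
                bio_data_dir(bio) == READ && fc->corrupt_bio_rw == READ)
                    corrupt_bio_data(bio, fc);

            return error;
    }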
-
Submitted by Milan Broz

This patch fixes a crash by recognising discards in dm_io. Currently dm_mirror can send REQ_DISCARD bios when running over a discard-enabled device, and without support in dm_io the system crashes badly.

BUG: unable to handle kernel paging request at 00800000
IP: __bio_add_page.part.17+0xf5/0x1e0
...
bio_add_page+0x56/0x70
dispatch_io+0x1cf/0x240 [dm_mod]
? km_get_page+0x50/0x50 [dm_mod]
? vm_next_page+0x20/0x20 [dm_mod]
? mirror_flush+0x130/0x130 [dm_mirror]
dm_io+0xdc/0x2b0 [dm_mod]
...

Introduced in 2.6.38-rc1 by commit 5fc2ffea (dm raid1: support discard).

Signed-off-by: Milan Broz <mbroz@redhat.com>
Cc: stable@kernel.org
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
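The crux is that REQ_DISCARD bios carry no data pages, so dm-io has to size the bio directly instead of looping over bio_add_page(). A sketch of the special case inside do_region() (assumed shape, not the verbatim patch):

    /* inside do_region(), before the usual page-adding loop */
    if (rw & REQ_DISCARD) {
            /* discards carry no data pages: set the bio size directly
             * instead of calling bio_add_page() on the page iterator */
            num_sectors = min_t(sector_t,
                                q->limits.max_discard_sectors, remaining);
            bio->bi_size = num_sectors << SECTOR_SHIFT;
            remaining -= num_sectors;
    }
    /* non-discard I/O continues into the bio_add_page() loop as before */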
-
Submitted by Jesper Juhl

If 'argc' is zero we jump to the 'out:' label, but this leaks the (unused) memory that 'dm_split_args()' allocated for 'argv' if the string being split consisted entirely of whitespace. Jump to the 'out_argv:' label instead to free that memory.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
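A sketch of the corrected flow (label names from the description; the surrounding code is assumed, with the real function in drivers/md/dm-ioctl.c):

    r = dm_split_args(&argc, &argv, dmi->message);
    if (r) {
            DMWARN("Failed to split message arguments");
            goto out;
    }

    if (!argc) {
            DMWARN("Empty message received.");
            goto out_argv;          /* the fix: "goto out" here leaked
                                     * the argv allocation */
    }

    /* ...dispatch the message to the target... */

    out_argv:
        kfree(argv);
    out:
        return r;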
-
- 07 Feb 2012, 1 commit
-
-
Submitted by NeilBrown

1/ If a resync is aborted, we should record how far we got (recovery_cp) as the last request that we know has completed (->curr_resync_completed), rather than the last request that was submitted (->curr_resync).

2/ When a resync aborts we still want to update the metadata with any changes, so set MD_CHANGE_DEVS even if we 'skip'.

Signed-off-by: NeilBrown <neilb@suse.de>
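Both changes live in md's resync path (md_do_sync()); roughly, as a sketch rather than the exact diff:

    /* 1/ on abort, record the last request known to have completed,
     *    not the last one submitted */
    if (test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
        mddev->curr_resync > 2)
            mddev->recovery_cp = mddev->curr_resync_completed;

    /* 2/ make sure the metadata gets updated even when we skip */
    set_bit(MD_CHANGE_DEVS, &mddev->flags);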
-
- 31 Jan 2012, 1 commit
-
-
Submitted by Jonathan Brassow

The life cycle of a device-mapper target is:
1) create
2) resume
3) suspend
*) possibly repeat from 2
4) destroy

The dm-raid target is unconditionally calling MD's bitmap_load function upon every resume. If steps 2 and 3 above are repeated, bitmap_load is called multiple times. It is only written to be called once; otherwise, it allocates new memory for the bitmap (without freeing the old) and increments the number of pages it thinks it has without zeroing first. This ultimately leads to accesses beyond the allocated memory and to leaked memory.

Simply avoiding the bitmap_load call upon resume is not sufficient. If the target was suspended while the initial recovery was only partially complete, that recovery needs to be restarted when the target is resumed. This is why 'md_wakeup_thread' is called before issuing the 'mddev_resume'.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
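One way to picture the fix, as a sketch of raid_resume() (the bitmap_loaded guard flag is an assumption; bitmap_load(), md_wakeup_thread() and mddev_resume() are the md interfaces named above):

    static void raid_resume(struct dm_target *ti)
    {
            struct raid_set *rs = ti->private;

            if (!rs->bitmap_loaded) {
                    /* load the bitmap once, on the first resume only */
                    bitmap_load(&rs->md);
                    rs->bitmap_loaded = 1;
            }

            /* restart any recovery that suspend interrupted, before
             * letting I/O flow again */
            md_wakeup_thread(rs->md.thread);
            mddev_resume(&rs->md);
    }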
-
- 15 Jan 2012, 1 commit
-
-
Submitted by Paolo Bonzini

A logical volume can map to just part of an underlying physical volume. In this case, it must be treated like a partition.

Based on a patch from Alasdair G Kergon.

Cc: Alasdair G Kergon <agk@redhat.com>
Cc: dm-devel@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
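For dm this comes down to only passing SCSI ioctls straight through when the single target spans the whole underlying device. A sketch of the check (scsi_verify_blk_ioctl() comes from the same patch series; its exact placement in dm_blk_ioctl() is assumed):

    /*
     * Only pass the ioctl through unfiltered if the target maps the
     * whole underlying device; a partial mapping must be treated
     * like a partition.
     */
    if (!r && ti->len != i_size_read(bdev->bd_inode) >> SECTOR_SHIFT)
            r = scsi_verify_blk_ioctl(NULL, cmd);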
-
- 11 Jan 2012, 3 commits
-
-
Submitted by Martin K. Petersen

Stacking driver queue limits are typically bounded exclusively by the capabilities of the low-level devices, not by the stacking driver itself. This patch introduces blk_set_stacking_limits(), which has more liberal metrics than the default queue limits function. This allows us to inherit topology parameters from bottom devices without manually tweaking the default limits in each driver prior to calling the stacking function.

Since there is now a clear distinction between stacking and low-level devices, blk_set_default_limits() has been modified to carry the more conservative values that we used to set manually in blk_queue_make_request().

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
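Usage from a stacking driver then looks roughly like this (a sketch; blk_set_stacking_limits() and blk_stack_limits() are the block-layer helpers this patch touches, while the surrounding variables are assumed):

    struct queue_limits *limits = &t->limits;

    /* start from permissive stacking defaults instead of the
     * conservative low-level defaults */
    blk_set_stacking_limits(limits);

    /* then, for each component device, fold in its limits;
     * 'start' is the component's offset in sectors */
    if (blk_stack_limits(limits, &bdev_get_queue(bdev)->limits, start))
            pr_warn("device limits are inconsistent across the stack\n");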
-
Submitted by NeilBrown

We normally try to avoid reading from write-mostly devices, but when we do, we really have to check for bad blocks and be sure not to try reading them. With the current code, best_good_sectors might not get set, and that causes zero-length read requests to be sent down, which is very confusing.

This bug was introduced in commit d2eb35ac, so the patch is suitable for 3.1.x and 3.2.x.

Reported-and-tested-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Reported-and-tested-by: Art -kwaak- van Breemen <ard@telegraafnet.nl>
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@vger.kernel.org
-
Submitted by NeilBrown

We currently only 'notify' changes to the 'degraded' attribute when it decreases, not when it increases. Notifying on failure is a little awkward as it happens in interrupt context. So instead, notify when we remove the failed device from the array, which happens very soon afterwards.

Reported-and-tested-by: Mikhail Balabin <mbalabin@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
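The notification itself is a one-liner; the patch is about where it runs. A sketch (md's sysfs attribute really is called "degraded"; the call site is the device-removal path described above):

    /* runs from the process-context path that removes the failed
     * device, not from the interrupt-context error handler */
    sysfs_notify(&mddev->kobj, NULL, "degraded");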
-
- 04 Jan 2012, 1 commit
-
-
Submitted by Al Viro

Move invalidate_bdev and block_sync_page into fs/block_dev.c. Export kill_bdev as well, so brd doesn't have to open-code it. Reduce the buffer_head.h requirement accordingly.

Removed a rather large comment from invalidate_bdev, as it looked a bit too obsolete to bother moving. The small comment replacing it says enough.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 23 Dec 2011, 27 commits
-
-
Submitted by NeilBrown

Now that WantReplacement drives are replaced cleanly, mark a drive as want_replacement when we see a write error. It might get failed soon, making the WantReplacement flag irrelevant, but if the write error is recorded in the bad block log, we still want to activate any spare that might be available.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by NeilBrown

When attempting to add a spare to a RAID1 array, also consider adding it as a replacement for a want_replacement device.

Signed-off-by: NeilBrown <neilb@suse.de>
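A sketch of what the hot-add path gains (WantReplacement, Replacement and In_sync are the flag names this series uses; p is assumed to point at the slot being considered, with replacements stored raid_disks entries further on):

    if (test_bit(WantReplacement, &p->rdev->flags) &&
        p[conf->raid_disks].rdev == NULL) {
            /* install the spare as a replacement rather than as a
             * rebuild target for an empty slot */
            clear_bit(In_sync, &rdev->flags);
            set_bit(Replacement, &rdev->flags);
            rdev->raid_disk = mirror;
            conf->fullsync = 1;
            rcu_assign_pointer(p[conf->raid_disks].rdev, rdev);
            err = 0;
    }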
-
Submitted by NeilBrown

If a Replacement is seen, file it as such. If we see two replacements (or two normal devices) for the same slot, abort.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by NeilBrown

When recovery completes, ->spare_active is called. This checks if the replacement is ready and, if so, fails the original.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by NeilBrown

Replacement devices are stored at a different offset, so look there too.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by NeilBrown

In RAID1, a replacement is much like a normal device, so we just double the size of the relevant arrays and look at all possible devices for reads and writes. This means that the array looks like it is now double its real size in some ways, and we need to be careful about that. In particular, when checking whether the array is still degraded while creating a recovery request, we must consider only the first 'half', i.e. the real (non-replacement) devices.

Signed-off-by: NeilBrown <neilb@suse.de>
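The consequence for iteration, as a sketch: device loops walk both halves of conf->mirrors[], while degraded accounting looks only at the first half (do_something() is a hypothetical stand-in for the per-device work):

    int i, degraded = 0;

    /* reads/writes consider originals and replacements alike */
    for (i = 0; i < conf->raid_disks * 2; i++) {
            struct md_rdev *rdev = conf->mirrors[i].rdev;

            if (rdev)
                    do_something(rdev);     /* hypothetical per-device work */
    }

    /* degraded accounting only looks at the real devices */
    for (i = 0; i < conf->raid_disks; i++)
            if (!conf->mirrors[i].rdev)
                    degraded++;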
-
Submitted by NeilBrown

In general, mddev->raid_disks can change unexpectedly, while conf->raid_disks will only change in a very controlled way. So change some uses of one to the other. The use of mddev->raid_disks will not actually cause problems, but this way is more consistent and safer in the long term.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by NeilBrown

When attempting to add a spare to a RAID10 array, also consider adding it as a replacement for a want_replacement device.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by NeilBrown

If a Replacement is seen, file it as such. If we see two replacements (or two normal devices) for the same slot, abort.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by NeilBrown

When recovery finishes and spare_active is called, check for a replacement that might have just become fully synced, mark it as such, and mark the original as failed. Then when the original is removed, move the replacement into its position.

This means that 'replacement' can spontaneously become NULL in some situations. Make sure we check for those. It also means that 'rdev' and 'replacement' could appear to be identical - check for that too.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by NeilBrown

If there is a replacement device, then recover to it, reading from any drives - maybe the one being replaced, maybe not.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by NeilBrown

If we need to resync an array which has replacement devices, we always write any block checked to every replacement.

If the resync was a bitmap-based resync, we will then complete the replacement normally. If it was a full resync, we mark the replacements as fully recovered when the resync finishes, so no further recovery is needed.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by NeilBrown

When writing, we need to submit two writes: one to the original, and one to the replacement - if there is a replacement.

If the write to the replacement results in a write error, we just fail the device. We only try to record write errors to the original.

This only handles writing new data. Writing for resync/recovery will come later.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by NeilBrown

Enhance raid10_remove_disk to be able to remove ->replacement as well as ->rdev.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by NeilBrown

When reading (for array reads, not for recovery etc.), we read from the replacement device if it has recovered far enough. This requires storing the chosen rdev in the 'r10_bio' so we can make sure to drop the reference on the right device when the read finishes.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by NeilBrown

It makes more sense to return an rdev than just an index, as read_balance() gets a reference to the rdev, so returning the pointer makes this more idiomatic. This will be needed in a future patch, when we might return a 'replacement' rdev instead of the main rdev.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by NeilBrown

Allow each slot in the RAID10 to have 2 devices: the want_replacement device and the replacement. Also allow an r10bio to have 2 bios, and for resync/recovery allocate the second bio if there are any replacement devices.

Signed-off-by: NeilBrown <neilb@suse.de>
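The per-slot layout this describes, sketched in the style of raid10.h (the fields shown are the ones the text names; the other bookkeeping fields of r10bio are elided):

    struct r10bio {
            /* ...counters, sector, mddev, state... */
            struct r10dev {
                    struct bio      *bio;
                    struct bio      *repl_bio;      /* the second bio: used
                                                     * when a replacement
                                                     * device exists */
                    sector_t        addr;
                    int             devnum;
            } devs[0];
    };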
-
Submitted by NeilBrown

Now that WantReplacement drives are replaced cleanly, mark a drive as WantReplacement when we see a write error. It might get failed soon, making the WantReplacement flag irrelevant, but if the write error is recorded in the bad block log, we still want to activate any spare that might be available.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
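The write-error hook is small; a sketch of the pattern (WantReplacement and MD_RECOVERY_NEEDED are the real md flag names; placement in the raid5 write-completion path is assumed):

    /* on a failed write, mark the device as wanting replacement and
     * poke md so that an available spare can be activated */
    if (!test_and_set_bit(WantReplacement, &rdev->flags))
            set_bit(MD_RECOVERY_NEEDED, &rdev->mddev->recovery);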
-
Submitted by NeilBrown

When attempting to add a spare to a RAID[456] array, also consider adding it as a replacement for a want_replacement device. This requires that common md code attempt hot_add even when the array is not formally degraded.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by NeilBrown

If a Replacement is seen, file it as such. If we see two replacements (or two normal devices) for the same slot, abort.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by NeilBrown

When recovery completes - as reported by a call to ->spare_active - we clear In_sync on the original and set it on the replacement. Then, when the original gets removed, we move the replacement from 'replacement' to 'rdev'.

This could race with other code that is looking at these pointers, so we use memory barriers and careful ordering to ensure that a reader might see one device twice, but never no devices. The readers then guard against using both devices, which could only happen when writing.

Signed-off-by: NeilBrown <neilb@suse.de>
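The ordering trick described above, sketched (the shape follows the raid5 remove-disk path; the comments spell out the invariant):

    /* the failed original has just been removed: promote the
     * replacement.  A racing reader may briefly see the same device
     * through both pointers, but will never see neither. */
    p->rdev = p->replacement;
    clear_bit(Replacement, &p->replacement->flags);
    smp_mb();       /* make sure the new ->rdev is visible before
                     * ->replacement goes away */
    p->replacement = NULL;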
-
Submitted by NeilBrown

During recovery we want to write to the replacement but not the original. So we have two new flags:
- R5_NeedReplace if this stripe has a replacement that needs to be written at some stage
- R5_WantReplace if NeedReplace, and the data is available, and a 'sync' has been requested on this stripe

We also distinguish between 'sync and replace', which needs to read all other devices, and 'replace', which only needs to read the devices being replaced.

Note that during resync we always write to any replacement device. It might not need to be written to, but as we don't read to compare, we have to write to be sure.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by NeilBrown

When writing, we need to submit two writes: one to the original, and one to the replacement - if there is a replacement.

If the write to the replacement results in a write error, we just fail the device. We only try to record write errors to the original.

When writing for recovery, we shouldn't write to the original. This will be addressed in a subsequent patch that generally addresses recovery.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
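In terms of the raid5 I/O submission path, the double write looks roughly like this (the bi/rbi bio names are assumed; both device pointers are fetched under RCU in the raid5 style):

    struct md_rdev *rdev, *rrdev;

    rcu_read_lock();
    rdev  = rcu_dereference(conf->disks[i].rdev);
    rrdev = rcu_dereference(conf->disks[i].replacement);
    if (rdev)
            atomic_inc(&rdev->nr_pending);
    if (rrdev)
            atomic_inc(&rrdev->nr_pending);
    rcu_read_unlock();

    if (rdev)
            generic_make_request(bi);       /* original: errors go to the
                                             * bad block log */
    if (rrdev)
            generic_make_request(rbi);      /* replacement: a write error
                                             * just fails the device */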
-
Submitted by NeilBrown

Enhance raid5_remove_disk to be able to remove ->replacement as well as ->rdev.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by NeilBrown

If a replacement device is present and has been recovered far enough, then use it for reading into the stripe cache. If we get an error, we don't try to repair it; we just fail the device - a replacement device that gives errors does not sound sensible.

This requires removing the setting of R5_ReadError when we get a read error during a read that bypasses the cache. It was probably a bad idea anyway, as we don't know that every block in the read caused an error, and it could cause ReadError to be set on the replacement device, which is bad.

Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by NeilBrown

We currently initialise some fields of a bio when preparing a stripe_head, and again just before submitting the request. Remove the duplication by only setting the fields that lower-level devices don't touch in raid5_build_block, and only setting the changeable fields in ops_run_io.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
-
Submitted by NeilBrown

Remove some #defines that are no longer used, and replace some others with an enum. Also remove an unused field.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
-