1. 19 Mar 2012, 12 commits
    • md/bitmap: change a 'goto' to a normal 'if' construct. · 278c1ca2
      Committed by NeilBrown
      The use of a goto makes the control flow more obscure here.
      
      So make it a normal:
        if (x) {
           Y;
        }
      
      No functional change.
      Signed-off-by: NeilBrown <neilb@suse.de>
      278c1ca2
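
      A minimal standalone illustration of the change described above
      (write_superblock() is a hypothetical stand-in, not the kernel helper):

        /* Before: the cleanup path hides behind a goto. */
        static void update_goto(int dirty)
        {
                if (!dirty)
                        goto out;
                write_superblock();     /* hypothetical helper */
        out:
                return;
        }

        /* After: the same control flow as a plain 'if'; no functional change. */
        static void update_if(int dirty)
        {
                if (dirty)
                        write_superblock();
        }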
    • md/bitmap: move printing of bitmap status to bitmap.c · 57148964
      Committed by NeilBrown
      The part of /proc/mdstat which describes the bitmap should really
      be generated by code in bitmap.c.  So move it there.
      Signed-off-by: NeilBrown <neilb@suse.de>
      57148964
    • md/bitmap: remove some unused noise from bitmap.h · 4ba97dff
      Committed by NeilBrown
      Signed-off-by: NeilBrown <neilb@suse.de>
      4ba97dff
    • md/raid10 - support resizing some RAID10 arrays. · 006a09a0
      Committed by NeilBrown
      'resizing' an array in this context means making use of extra
      space that has become available in component devices, not adding new
      devices.
      It also includes shrinking the array to take up less space on the
      component devices.
      
      This is not supported for arrays with a 'far' layout.  However
      for 'near' and 'offset' layout arrays, adding and removing space at
      the end of the devices is easy to support, and this patch provides
      that support.
      Signed-off-by: NeilBrown <neilb@suse.de>
      006a09a0
    • md/raid1: handle merge_bvec_fn in member devices. · 6b740b8d
      Committed by NeilBrown
      Currently we don't honour merge_bvec_fn in member devices so if there
      is one, we force all requests to be single-page at most.
      This is not ideal.
      
      So create a raid1 merge_bvec_fn to check that function in children
      as well.
      
      This introduces a small problem.  There is no locking around calls
      to ->merge_bvec_fn and subsequent calls to ->make_request.  So a
      device added between these could end up getting a request which
      violates its merge_bvec_fn.
      
      Currently the best we can do is synchronize_sched().  This will work
      providing no preemption happens.  If there is preemption, we just
      have to hope that new devices are largely consistent with old devices.
      Signed-off-by: NeilBrown <neilb@suse.de>
      6b740b8d
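
      A self-contained sketch of the idea, not the kernel code (the types and
      names below are invented stand-ins): the array-level merge_bvec_fn asks
      each member how much of a proposed bio it will accept and returns the
      most restrictive answer.

        struct bvec_query { unsigned long long sector; int max_bytes; };
        struct member { int (*merge_bvec_fn)(struct bvec_query *q); };

        /* Return the smallest limit any member imposes at this offset. */
        static int array_mergeable_bvec(struct member *m, int n,
                                        struct bvec_query *q)
        {
                int max = q->max_bytes;
                int i;

                for (i = 0; i < n; i++)
                        if (m[i].merge_bvec_fn) {
                                int lim = m[i].merge_bvec_fn(q);
                                if (lim < max)
                                        max = lim;
                        }
                return max;
        }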
    • md/raid10: handle merge_bvec_fn in member devices. · 050b6615
      Committed by NeilBrown
      Currently we don't honour merge_bvec_fn in member devices so if there
      is one, we force all requests to be single-page at most.
      This is not ideal.
      
      So enhance the raid10 merge_bvec_fn to check that function in children
      as well.
      
      This introduces a small problem.  There is no locking around calls
      to ->merge_bvec_fn and subsequent calls to ->make_request.  So a
      device added between these could end up getting a request which
      violates its merge_bvec_fn.
      
      Currently the best we can do is synchronize_sched().  This will work
      providing no preemption happens.  If there is preemption, we just
      have to hope that new devices are largely consistent with old devices.
      Signed-off-by: NeilBrown <neilb@suse.de>
      050b6615
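
      The synchronize_sched() point made in both commits above, sketched
      (synchronize_sched() was a real RCU primitive in kernels of this era;
      the surrounding two lines are an assumed shape, not the actual diff):

        conf->mirrors[i].rdev = rdev;   /* publish the newly added member */
        synchronize_sched();            /* wait out in-flight, non-preempted
                                         * code that may have consulted
                                         * ->merge_bvec_fn on the old setup */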
    • md: add proper merge_bvec handling to RAID0 and Linear. · ba13da47
      Committed by NeilBrown
      These personalities currently set a max request size of one page
      when any member device has a merge_bvec_fn because they don't
      bother to call that function.
      
      This causes extra work in splitting and combining requests.
      
      So make the extra effort to call the merge_bvec_fn when it exists
      so that we end up with larger requests out the bottom.
      Signed-off-by: NeilBrown <neilb@suse.de>
      ba13da47
    • md: tidy up rdev_for_each usage. · dafb20fa
      Committed by NeilBrown
      md.h has an 'rdev_for_each()' macro for iterating the rdevs in an
      mddev.  However it uses the 'safe' version of list_for_each_entry,
      and so requires the extra variable, but doesn't include 'safe' in the
      name, which is useful documentation.
      
      Consequently some places use this safe version without needing it, and
      many use an explicit list_for_each_entry.
      
      So:
       - rename rdev_for_each to rdev_for_each_safe
       - create a new rdev_for_each which uses the plain
         list_for_each_entry,
       - use the 'safe' version only where needed, and convert all other
         list_for_each_entry calls to use rdev_for_each.
      Signed-off-by: NeilBrown <neilb@suse.de>
      dafb20fa
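
      The resulting macro pair, in the shape the commit message implies (the
      'same_set' list field name is recalled from md.h of this era, not
      verified):

        /* plain iteration - the common case */
        #define rdev_for_each(rdev, mddev) \
                list_for_each_entry(rdev, &((mddev)->disks), same_set)

        /* 'safe' iteration - only where rdevs may be removed inside the loop */
        #define rdev_for_each_safe(rdev, tmp, mddev) \
                list_for_each_entry_safe(rdev, tmp, &((mddev)->disks), same_set)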
    • md/raid1,raid10: avoid deadlock during resync/recovery. · d6b42dcb
      Committed by NeilBrown
      If RAID1 or RAID10 is used under LVM or some other stacking
      block device, it is possible to enter a deadlock during
      resync or recovery.
      This can happen if the upper level block device creates
      two requests to the RAID1 or RAID10.  The first request gets
      processed, blocks recovery, and queues requests for the underlying
      devices on current->bio_list.  A resync request then starts; it
      waits for those queued requests and blocks new IO.
      
      But then the second request to the RAID1/10 will be attempted
      and it cannot progress until the resync request completes,
      which cannot progress until the underlying device requests complete,
      which are on a queue behind that second request.
      
      So allow that second request to proceed even though there is
      a resync request about to start.
      
      This is suitable for any -stable kernel.
      
      Cc: stable@vger.kernel.org
      Reported-by: Ray Morris <support@bettercgi.com>
      Tested-by: Ray Morris <support@bettercgi.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      d6b42dcb
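
      A sketch of the relaxed barrier wait this implies (identifiers recalled
      from raid1.c of this era, and the macro arity varied between kernels):
      a request may pass a pending resync barrier if the caller already holds
      queued bios on current->bio_list, because blocking it would deadlock as
      described above.

        /* inside wait_barrier(), under conf->resync_lock */
        wait_event_lock_irq(conf->wait_barrier,
                            !conf->barrier ||
                            (conf->nr_pending &&
                             current->bio_list &&
                             !bio_list_empty(current->bio_list)),
                            conf->resync_lock);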
    • md/bitmap: ensure to load bitmap when creating via sysfs. · 4474ca42
      Committed by NeilBrown
      When commit 69e51b44 (md/bitmap:  separate out loading a bitmap...)
      created bitmap_load, it missed calling it after bitmap_create when a
      bitmap is created through the sysfs interface.
      So if a bitmap is added this way, we don't allocate memory properly
      and can crash.
      
      This is suitable for any -stable release since 2.6.35.
      Cc: stable@vger.kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
      4474ca42
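
      A sketch of the corrected sysfs path (assumed shape, not the literal
      diff): creating the bitmap must be followed by loading it, exactly as
      the normal array-start path does.

        rv = bitmap_create(mddev);
        if (rv == 0)
                rv = bitmap_load(mddev);        /* previously missing */
        if (rv)
                bitmap_destroy(mddev);          /* unwind on failure */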
    • md: don't set md arrays to readonly on shutdown. · c744a65c
      Committed by NeilBrown
      It seems that with recent kernels, writeback can still be in progress
      while shutdown is happening, and consequently data can be written
      after the md reboot notifier switches all arrays to read-only.
      This causes a BUG.
      
      So don't switch them to read-only - just mark them clean and
      set 'safemode' to '2', which means that immediately after any
      write the array will be switched back to 'clean'.
      
      This could result in the shutdown happening while the array is marked
      dirty, thus forcing a resync on reboot.  However if you reboot
      without performing a "sync" first, you get to keep both halves.
      
      This is suitable for any stable kernel (though there might be some
      conflicts with obvious fixes in earlier kernels).
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
      c744a65c
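
      The essence of the new shutdown behaviour, sketched (__md_stop_writes()
      is assumed from kernels of this era; treat the shape as illustrative):

        /* in the md reboot notifier, for each array */
        if (mddev_trylock(mddev)) {
                if (mddev->pers)
                        __md_stop_writes(mddev);  /* flush and mark clean */
                mddev->safemode = 2;    /* any later write flips the array
                                         * straight back to 'clean' */
                mddev_unlock(mddev);
        }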
    • md: allow re-add to failed arrays. · dc10c643
      Committed by NeilBrown
      When an array has failed (some data is inaccessible), there is no
      point attempting to add a spare as it could not possibly be recovered.
      
      However there may be value in re-adding a recently removed device:
      e.g. if there is a write-intent bitmap and it is clear, then access
      to the data could be restored by this action.
      
      So don't reject a re-add to a failed array for RAID10 and RAID5 (the
      only array types that check for a failed array).
      Signed-off-by: NeilBrown <neilb@suse.de>
      dc10c643
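
      The shape of the relaxed check, as the commit message describes it (a
      sketch; has_failed() matches raid5.c, the raid10 test differs): only
      devices with no previous slot in the array - true spares - are still
      rejected.

        /* in the personality's hot-add path */
        if (rdev->saved_raid_disk < 0 && has_failed(conf))
                return -EINVAL; /* a new spare on a failed array is pointless */
        /* a re-added device (saved_raid_disk >= 0) is allowed through */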
  2. 13 Mar 2012, 4 commits
  3. 08 Mar 2012, 8 commits
    • dm raid: fix flush support · 0ca93de9
      Committed by Jonathan E Brassow
      Fix dm-raid flush support.
      
      Both md and dm have support for flush, but the dm-raid target
      forgot to set the flag to indicate that flushes should be
      passed on.  (Important for data integrity e.g. with writeback cache
      enabled.)
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      0ca93de9
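
      The fix amounts to a one-line declaration in the target constructor;
      the field was called num_flush_requests in kernels of this era (treat
      the exact name as recalled, not verified):

        /* in raid_ctr(): tell DM this target passes flushes down */
        ti->num_flush_requests = 1;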
    • dm raid: set MD_CHANGE_DEVS when rebuilding · 3aa3b2b2
      Committed by Jonathan E Brassow
      The 'rebuild' parameter is used to rebuild individual devices in an
      array (e.g. resynchronize a RAID1 device or recalculate a parity device
      in higher RAID levels).  The MD_CHANGE_DEVS flag must be set when this
      parameter is given in order to write out the superblocks and make the
      change take immediate effect.  The code that handles new devices in
      super_load already sets MD_CHANGE_DEVS and 'FirstUse'.  (The 'FirstUse'
      flag was being set as a special case for rebuilds in
      super_init_validation.)
      
      Add a condition for rebuilds in super_load to take care of both flags
      without the special case in 'super_init_validation'.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      3aa3b2b2
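
      A sketch of the unified condition in super_load (the two trigger names
      are invented for illustration; MD_CHANGE_DEVS and FirstUse come from
      the commit message):

        /* treat a device being rebuilt like a brand-new device */
        if (device_is_new || rebuild_requested) {
                set_bit(FirstUse, &rdev->flags);
                set_bit(MD_CHANGE_DEVS, &rdev->mddev->flags);
        }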
    • dm thin metadata: decrement counter after removing mapped block · af63bcb8
      Committed by Joe Thornber
      Correct the number of mapped sectors shown on a thin device's
      status line by decrementing td->mapped_blocks in __remove() each time
      a block is removed.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      af63bcb8
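
      The essence of the fix, sketched (delete_mapping() is a hypothetical
      stand-in for the real btree removal):

        static int __remove(struct dm_thin_device *td, unsigned long long block)
        {
                int r = delete_mapping(td, block);      /* hypothetical */

                if (r)
                        return r;

                td->mapped_blocks--;    /* previously missing, so the status
                                         * line over-counted mapped sectors */
                td->changed = 1;
                return 0;
        }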
    • dm thin metadata: unlock superblock in init_pmd error path · 4469a5f3
      Committed by Joe Thornber
      If dm_sm_disk_create() fails the superblock must be unlocked.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      4469a5f3
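
      A sketch of the corrected error path (dm_sm_disk_create() and
      dm_bm_unlock() are real persistent-data interfaces of this era; the
      surrounding shape is assumed):

        data_sm = dm_sm_disk_create(tm, nr_blocks);
        if (IS_ERR(data_sm)) {
                r = PTR_ERR(data_sm);
                dm_bm_unlock(sblock);   /* previously leaked on this path */
                return r;
        }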
    • dm thin metadata: remove incorrect close_device on creation error paths · 1f3db25d
      Committed by Mike Snitzer
      The __open_device() error paths in __create_thin() and __create_snap()
      incorrectly call __close_device() even if td was not initialized by
      __open_device().  Remove this.
      
      Also document __open_device() return values, remove a redundant
      td->changed = 1 in __create_thin(), and insert an additional
      safeguard against creating an already-existing device.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      1f3db25d
    • dm flakey: fix crash on read when corrupt_bio_byte not set · 1212268f
      Committed by Mike Snitzer
      The following BUG is hit on the first read that is submitted to a dm
      flakey test device while the device is "down" if the corrupt_bio_byte
      feature wasn't requested when the device's table was loaded.
      
      Example DM table that will hit this BUG:
      0 2097152 flakey 8:0 2048 0 30
      
      This bug was introduced by commit a3998799
      (dm flakey: add corrupt_bio_byte feature) in v3.1-rc1.
      
      BUG: unable to handle kernel paging request at ffff8801cfce3fff
      IP: [<ffffffffa008c233>] corrupt_bio_data+0x6e/0xae [dm_flakey]
      PGD 1606063 PUD 0
      Oops: 0002 [#1] SMP
      ...
      Call Trace:
       <IRQ>
       [<ffffffffa008c2b5>] flakey_end_io+0x42/0x48 [dm_flakey]
       [<ffffffffa00dca98>] clone_endio+0x54/0xb6 [dm_mod]
       [<ffffffff81130587>] bio_endio+0x2d/0x2f
       [<ffffffff811c819a>] req_bio_endio+0x96/0x9f
       [<ffffffff811c94b9>] blk_update_request+0x1dc/0x3a9
       [<ffffffff812f5ee2>] ? rcu_read_unlock+0x21/0x23
       [<ffffffff811c96a6>] blk_update_bidi_request+0x20/0x6e
       [<ffffffff811c9713>] blk_end_bidi_request+0x1f/0x5d
       [<ffffffff811c978d>] blk_end_request+0x10/0x12
       [<ffffffff8128f450>] scsi_io_completion+0x1e5/0x4b1
       [<ffffffff812882a9>] scsi_finish_command+0xec/0xf5
       [<ffffffff8128f830>] scsi_softirq_done+0xff/0x108
       [<ffffffff811ce284>] blk_done_softirq+0x84/0x98
       [<ffffffff81048d19>] __do_softirq+0xe3/0x1d5
       [<ffffffff8138f83f>] ? _raw_spin_lock+0x62/0x69
       [<ffffffff810997cf>] ? handle_irq_event+0x4c/0x61
       [<ffffffff8139833c>] call_softirq+0x1c/0x30
       [<ffffffff81003b37>] do_softirq+0x4b/0xa3
       [<ffffffff81048a39>] irq_exit+0x53/0xca
       [<ffffffff81398acd>] do_IRQ+0x9d/0xb4
       [<ffffffff81390333>] common_interrupt+0x73/0x73
      ...
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.1+
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      1212268f
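
      The guard the fix adds, sketched from the description above (the
      condition's exact shape is recalled, not verified): a corrupt_bio_byte
      of 0 means the feature was never configured, so read completions must
      not be touched.

        /* in flakey_end_io(), for bios that passed through while "down" */
        if (fc->corrupt_bio_byte &&
            fc->corrupt_bio_rw == READ &&
            all_corrupt_bio_flags_match(bio, fc))
                corrupt_bio_data(bio, fc);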
    • dm io: fix discard support · 0c535e0d
      Committed by Milan Broz
      This patch fixes a crash by recognising discards in dm_io.
      
      Currently dm_mirror can send REQ_DISCARD bios when running over a
      discard-enabled device, and without support in dm_io the system
      crashes badly.
      
      BUG: unable to handle kernel paging request at 00800000
      IP:  __bio_add_page.part.17+0xf5/0x1e0
      ...
       bio_add_page+0x56/0x70
       dispatch_io+0x1cf/0x240 [dm_mod]
       ? km_get_page+0x50/0x50 [dm_mod]
       ? vm_next_page+0x20/0x20 [dm_mod]
       ? mirror_flush+0x130/0x130 [dm_mirror]
       dm_io+0xdc/0x2b0 [dm_mod]
      ...
      
      Introduced in 2.6.38-rc1 by commit 5fc2ffea
      (dm raid1: support discard).
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Cc: stable@kernel.org
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      0c535e0d
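
      A sketch of the discard branch in dm-io's region dispatch (bi_size was
      the bio size field in this era; the exact code is not reproduced): a
      discard bio carries no data pages, so its size is set directly instead
      of via bio_add_page().

        if (rw & REQ_DISCARD) {
                /* no payload: just describe the range */
                bio->bi_size = num_sectors << SECTOR_SHIFT;
                remaining -= num_sectors;
        } else {
                /* normal I/O keeps the old path: bio_add_page() per page */
        }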
    • dm ioctl: do not leak argv if target message only contains whitespace · 902c6a96
      Committed by Jesper Juhl
      If 'argc' is zero we jump to the 'out:' label, but this leaks the
      (unused) memory that 'dm_split_args()' allocated for 'argv' if the
      string being split consisted entirely of whitespace.  Jump to the
      'out_argv:' label instead to free up that memory.
      Signed-off-by: Jesper Juhl <jj@chaosbits.net>
      Cc: stable@kernel.org
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      902c6a96
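
      The corrected flow, sketched around the labels named in the message:

        r = dm_split_args(&argc, &argv, tmsg->message);
        if (r)
                goto out;

        if (!argc) {
                DMWARN("Empty message received.");
                goto out_argv;          /* was 'goto out;', leaking argv */
        }

        /* ... dispatch the message ... */

        out_argv:
                kfree(argv);
        out:
                return r;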
  4. 06 Mar 2012, 1 commit
  5. 14 Feb 2012, 1 commit
    • md/raid10: fix handling of error on last working device in array. · fae8cc5e
      Committed by NeilBrown
      If we get a read error on the last working device in a RAID10 which
      contains the target block, then we don't fail the device (which is
      good) but we don't abort retries, which is wrong.
      We end up in an infinite loop retrying the read on the one device.
      
      This patch fixes the problem in two places:
      1/ in raid10_end_read_request we don't even ask for a retry if this
         was the last usable device.  This is efficient but a little racy
         and will sometimes retry when it should not.
      
      2/ in handle_read_error we are careful to exclude any device from
         retry which we tried to mark as faulty (that might have failed if
         it was the last device).  This is race-free but less efficient.
      Signed-off-by: NeilBrown <neilb@suse.de>
      fae8cc5e
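
      A sketch of check 1/ (enough() is a real raid10.c helper that reports
      whether every block still has a working copy elsewhere; the surrounding
      shape is assumed):

        /* in raid10_end_read_request(), after a read error */
        if (!enough(conf, rdev->raid_disk))
                uptodate = 1;   /* last usable device: complete the bio (it
                                 * still reports failure) instead of queueing
                                 * an endless retry */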
  6. 13 Feb 2012, 1 commit
  7. 07 Feb 2012, 1 commit
    • md: two small fixes to the handling of an interrupted resync. · db91ff55
      Committed by NeilBrown
      1/ If a resync is aborted we should record, as how far we got
       (recovery_cp), the last request that we know has completed
       (->curr_resync_completed) rather than the last request that was
       submitted (->curr_resync).
      
      2/ When a resync aborts we still want to update the metadata with
       any changes, so set MD_CHANGE_DEVS even if we 'skip'.
      Signed-off-by: NeilBrown <neilb@suse.de>
      db91ff55
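
      Both fixes sketched together (field and flag names are taken from the
      commit message itself):

        /* 1/ on abort, trust only what is known to be complete */
        if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
                mddev->recovery_cp = mddev->curr_resync_completed;

        /* 2/ write the metadata out even when the resync is skipped */
        set_bit(MD_CHANGE_DEVS, &mddev->flags);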
  8. 31 Jan 2012, 1 commit
    • Prevent DM RAID from loading bitmap twice. · 34f8ac6d
      Committed by Jonathan Brassow
      The life cycle of a device-mapper target is:
      1) create
      2) resume
      3) suspend
      *) possibly repeat from 2
      4) destroy
      
      The dm-raid target is unconditionally calling MD's bitmap_load function upon
      every resume.  If steps 2 & 3 above are repeated, bitmap_load is called
      multiple times.  It is only written to be called once; otherwise, it
      allocates new memory for the bitmap (without freeing the old) and
      increments the number of pages it thinks it has without zeroing them
      first.  This ultimately leads to accesses beyond the allocated memory
      and to leaked memory.
      
      Simply avoiding the bitmap_load call upon resume is not sufficient.  If the
      target was suspended while the initial recovery was only partially complete,
      it needs to be restarted when the target is resumed.  This is why
      'md_wakeup_thread' is called before issuing the 'mddev_resume'.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      34f8ac6d
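
      A sketch of the resume handler after the fix (the bitmap_loaded flag is
      implied by the description; the exact code is not reproduced):

        static void raid_resume(struct dm_target *ti)
        {
                struct raid_set *rs = ti->private;

                if (!rs->bitmap_loaded) {
                        bitmap_load(&rs->md);
                        rs->bitmap_loaded = 1;
                } else
                        /* restart a partially completed initial recovery */
                        md_wakeup_thread(rs->md.thread);

                mddev_resume(&rs->md);
        }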
  9. 15 Jan 2012, 1 commit
  10. 11 Jan 2012, 3 commits
  11. 04 Jan 2012, 1 commit
  12. 23 Dec 2011, 6 commits