Commits · 6e0d2d0312fb20c1edac1b2c849068c1c7944abf · openanolis / cloud-kernel

28 Jul, 2011 4 commits

md: add documentation for bad block log · 6e0d2d03

Committed by Namhyung Kim on Jul 28, 2011

Previous patch in the bad block series added new sysfs interfaces
([unacknowledged_]bad_blocks) for each rdev without documentation.
Add it.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/bad-block-log: add sysfs interface for accessing bad-block-log. · 16c791a5

Committed by NeilBrown on Jul 28, 2011

This can show the log (providing it fits in one page) and
allows bad blocks to be 'acknowledged' meaning that they
have safely been recorded in metadata.

Clearing bad blocks is not allowed via sysfs (except for
code testing).  A bad block can only be cleared when
a write to the block succeeds.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

md: beginnings of bad block management. · 2230dfe4

Committed by NeilBrown on Jul 28, 2011

This is the first step in allowing md to track bad-blocks per-device so
that we can fail individual blocks rather than the whole device.

This patch just adds a data structure for recording bad blocks, with
routines to add, remove, search the list.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>
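
As a rough illustration of "a data structure for recording bad blocks, with routines to add, remove, search the list", here is a minimal user-space sketch. It is not the kernel's implementation: the names (demo_badblocks, demo_bb_add, demo_bb_is_bad, demo_bb_remove), the fixed capacity and the unsorted layout are all assumptions made for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_BAD 16  /* hypothetical capacity, not the kernel's */

/* One bad range: 'len' sectors starting at sector 'start'. */
struct demo_bad_range {
    uint64_t start;
    uint64_t len;
};

/* Hypothetical per-device list of bad ranges (unsorted, fixed size). */
struct demo_badblocks {
    struct demo_bad_range r[DEMO_MAX_BAD];
    size_t count;
};

/* Add a range; returns false if it is empty or the table is full. */
bool demo_bb_add(struct demo_badblocks *bb, uint64_t start, uint64_t len)
{
    if (len == 0 || bb->count == DEMO_MAX_BAD)
        return false;
    bb->r[bb->count].start = start;
    bb->r[bb->count].len = len;
    bb->count++;
    return true;
}

/* Search: does [start, start + len) overlap any recorded range? */
bool demo_bb_is_bad(const struct demo_badblocks *bb, uint64_t start, uint64_t len)
{
    for (size_t i = 0; i < bb->count; i++) {
        const struct demo_bad_range *e = &bb->r[i];
        if (e->start < start + len && start < e->start + e->len)
            return true;
    }
    return false;
}

/* Remove an exactly matching range; returns false if none was found. */
bool demo_bb_remove(struct demo_badblocks *bb, uint64_t start, uint64_t len)
{
    for (size_t i = 0; i < bb->count; i++) {
        if (bb->r[i].start == start && bb->r[i].len == len) {
            bb->r[i] = bb->r[--bb->count];  /* fill the hole with the last entry */
            return true;
        }
    }
    return false;
}
```

With such a list, a read path could consult demo_bb_is_bad() for the affected sector range and fail only that request instead of the whole device, which is the goal the commit message states.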

md: remove suspicious size_of() · a519b26d

Committed by NeilBrown on Jul 28, 2011

When calling bioset_create we pass the size of the front_pad as
   sizeof(mddev)
which looks suspicious as mddev is a pointer and so it looks like a
common mistake where
   sizeof(*mddev)
was intended.
The size is actually correct as we want to store a pointer in the
front padding of the bios created by the bioset, so make the intent
more explicit by using
   sizeof(mddev_t *)
Reported-by: Zdenek Kabelac <zdenek.kabelac@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
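
The distinction between the three sizeof spellings can be checked in plain C. The sketch below uses a hypothetical struct demo_mddev as a stand-in (its fields are invented for illustration and do not match the kernel's structure); only the sizeof behaviour is the point.

```c
#include <stddef.h>

/* Hypothetical stand-in for the array structure; not the kernel's layout. */
struct demo_mddev {
    char name[64];
    int level;
};

/* What sizeof(mddev) yields when mddev is a pointer: the pointer's size. */
size_t demo_size_of_pointer_variable(void)
{
    struct demo_mddev *mddev = NULL;
    return sizeof(mddev);
}

/* Same value, spelled so the intent (room for one pointer) is explicit. */
size_t demo_size_of_pointer_type(void)
{
    return sizeof(struct demo_mddev *);
}

/* What sizeof(*mddev) would have reserved: the whole structure. */
size_t demo_size_of_struct(void)
{
    struct demo_mddev *mddev = NULL;
    return sizeof(*mddev);  /* the operand is not evaluated, so NULL is safe here */
}
```

The first two functions return the same value, which is why the original code was correct even though it looked like the common sizeof(pointer) mistake.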

27 Jul, 2011 24 commits

MD: generate an event when array sync is complete · 768e587e

Committed by Jonathan Brassow on Jul 27, 2011

This patch causes MD to generate an event (for device-mapper) when the
synchronization thread is reaped. This is expected behavior for device-mapper.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

MD bitmap: Revert DM dirty log hooks · 3520fa4d

Committed by Jonathan Brassow on Jul 27, 2011


Revert most of commit e384e585
  md/bitmap: prepare for storing write-intent-bitmap via dm-dirty-log.

MD should not need to use DM's dirty log - we decided to use md's
bitmaps instead.

Keeping the DIV_ROUND_UP clean-ups that were part of commit
e384e585, however.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

MD: raid1 s/sysfs_notify_dirent/sysfs_notify_dirent_safe · 654e8b5a

Committed by Jonathan Brassow on Jul 27, 2011

If device-mapper creates a RAID1 array that includes devices to
be rebuilt, it will deref a NULL pointer when finished because
sysfs is not used by device-mapper instantiated RAID devices.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: Avoid BUG caused by multiple failures. · 8cfa7b0f

Committed by NeilBrown on Jul 27, 2011

While preparing to write a stripe we keep the parity block or blocks
locked (R5_LOCKED) - towards the end of schedule_reconstruction.

If the array is discovered to have failed before this write completes
we can leave those blocks LOCKED, and init_stripe will notice that a
free stripe still has a locked block and will complain.

So clear the R5_LOCKED flag in handle_failed_stripe, and demote the
'BUG' to a 'WARN_ON'.
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid10: move rdev->corrected_errors counting · cbea2170

Committed by Namhyung Kim on Jul 27, 2011

Read errors are considered to be corrected if the write-back and re-read
cycle finishes without further problems. Thus moving the
rdev->corrected_errors counting after the re-reading looks more reasonable
IMHO.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: move rdev->corrected_errors counting · ddd5115f

Committed by Namhyung Kim on Jul 27, 2011

Read errors are considered to be corrected if the write-back and re-read
cycle finishes without further problems. Thus moving the
rdev->corrected_errors counting after the re-reading looks more reasonable
IMHO.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid1: move rdev->corrected_errors counting · 9d3d8011

Committed by Namhyung Kim on Jul 27, 2011

Read errors are considered to be corrected if the write-back and re-read
cycle finishes without further problems. Thus moving the
rdev->corrected_errors counting after the re-reading looks more reasonable
IMHO. Also included a couple of whitespace fixes on sync_page_io().
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: get rid of unnecessary casts on page_address() · 65a06f06

Committed by Namhyung Kim on Jul 27, 2011

page_address() returns void pointer, so the casts can be removed.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid10: Improve decision on whether to fail a device with a read error. · 700c7213

Committed by NeilBrown on Jul 27, 2011

Normally we would fail a device with a READ error.  However if doing
so causes the array to fail, it is better to leave the device
in place and just return the read error to the caller.

The current test to decide whether the array will fail is overly
simplistic.
We have a function 'enough' which can tell if the array is failed or
not, so use it to guide the decision.
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid10: Make use of new recovery_disabled handling · 2bb77736

Committed by NeilBrown on Jul 27, 2011

When we get a read error during recovery, RAID10 previously
arranged for the recovering device to appear to fail so that
the recovery stops and doesn't restart.  This is misleading and wrong.

Instead, make use of the new recovery_disabled handling and mark
the target device as having recovery disabled.

Add appropriate checks in add_disk and remove_disk so that devices
are removed and not re-added when recovery is disabled.
Signed-off-by: NeilBrown <neilb@suse.de>

md: change managed of recovery_disabled. · 5389042f

Committed by NeilBrown on Jul 27, 2011

If we hit a read error while recovering a mirror, we want to abort the
recovery without necessarily failing the disk - as having a disk with
a read error is better than not having an array at all.

Currently this is managed with a per-array flag "recovery_disabled"
and is only implemented for RAID1.  For RAID10 we will need finer
grained control as we might want to disable recovery for individual
devices separately.

So push more of the decision making into the personality.
'recovery_disabled' is now a 'cookie' which is copied when the
personality wants to disable recovery and is changed when a device is
added to the array as this is used as a trigger to 'try recovery
again'.

This will allow RAID10 to get the control that it needs.
Signed-off-by: NeilBrown <neilb@suse.de>
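
The 'cookie' idea described in this message can be sketched in a few lines of user-space C. All names below (demo_array, demo_personality and the demo_* functions) are hypothetical, and the choice to initialise the personality's copy to "cookie - 1" is an assumption of this sketch rather than something the commit message states.

```c
#include <stdbool.h>

/* Hypothetical array-wide state: the cookie changes whenever a device is added. */
struct demo_array {
    unsigned int recovery_cookie;
};

/* Hypothetical per-personality (or per-device) state: the cookie value
 * that was current when recovery was last disabled. */
struct demo_personality {
    unsigned int disabled_cookie;
};

/* Start with a value that differs from the array's cookie, so recovery is allowed. */
void demo_personality_init(struct demo_personality *p, const struct demo_array *a)
{
    p->disabled_cookie = a->recovery_cookie - 1;  /* unsigned wrap-around is well defined */
}

/* Disable recovery by copying the array's current cookie. */
void demo_disable_recovery(struct demo_personality *p, const struct demo_array *a)
{
    p->disabled_cookie = a->recovery_cookie;
}

/* Recovery stays disabled for as long as the two cookies still match. */
bool demo_recovery_disabled(const struct demo_personality *p, const struct demo_array *a)
{
    return p->disabled_cookie == a->recovery_cookie;
}

/* Adding a device changes the cookie, which acts as "try recovery again". */
void demo_device_added(struct demo_array *a)
{
    a->recovery_cookie++;
}
```

Because each holder keeps its own copy of the cookie, a personality could keep one copy per device, which is the finer-grained control the message says RAID10 will need.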

md: remove ro check in md_check_recovery() · a478a069

Committed by Namhyung Kim on Jul 27, 2011

Commit c89a8eee ("Allow faulty devices to be removed from a
readonly array.") added some work on ro array in the function,
but it couldn't be done since we didn't allow the ro array to be
handled from the beginning. Fix it.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: introduce link/unlink_rdev() helpers · 36fad858

Committed by Namhyung Kim on Jul 27, 2011

There are places where sysfs links to rdev are handled
in the same way. Add helper functions to consolidate
them.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid: use printk_ratelimited instead of printk_ratelimit · 8bda470e

Committed by Christian Dietrich on Jul 27, 2011

As per printk_ratelimit comment, it should not be used.
Signed-off-by: Christian Dietrich <christian.dietrich@informatik.uni-erlangen.de>
Signed-off-by: NeilBrown <neilb@suse.de>

md: use proper little-endian bitops · a0a02a7a

Committed by Akinobu Mita on Jul 27, 2011

Calls to __test_and_{set,clear}_bit_le() that ignore the return value
can be replaced with __{set,clear}_bit_le().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: NeilBrown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
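
To make the difference concrete, here is a user-space sketch of the two kinds of helper. These demo_* functions are hypothetical re-implementations over a byte array written for this example; they are not the kernel's bitops, only an illustration of why the test-and-set form does extra work when its result is discarded.

```c
#include <stdbool.h>
#include <stddef.h>

/* Little-endian bit numbering over a byte array: bit n lives in byte n / 8,
 * at position n % 8. These helpers are non-atomic. */

/* Set bit n and report whether it was already set. */
bool demo_test_and_set_bit_le(size_t n, unsigned char *map)
{
    unsigned char mask = (unsigned char)(1u << (n % 8));
    bool old = (map[n / 8] & mask) != 0;
    map[n / 8] |= mask;
    return old;
}

/* Set bit n; nothing is returned, so no old value needs to be computed. */
void demo_set_bit_le(size_t n, unsigned char *map)
{
    map[n / 8] |= (unsigned char)(1u << (n % 8));
}

/* Clear bit n, again without reporting the old value. */
void demo_clear_bit_le(size_t n, unsigned char *map)
{
    map[n / 8] &= (unsigned char)~(1u << (n % 8));
}
```

When the caller never looks at the old value, the plain set and clear forms leave the bitmap in exactly the same state and state the intent more directly.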

md/raid5: finalise new merged handle_stripe. · acfe726b

Committed by NeilBrown on Jul 27, 2011

handle_stripe5() and handle_stripe6() are now virtually identical.
So discard one and rename the other to 'analyse_stripe()'.

It always returns 0, so change it to 'void' and remove the 'done'
variable in handle_stripe().
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

md/raid5: move some more common code into handle_stripe · 474af965

Committed by NeilBrown on Jul 27, 2011

The RAID6 version of this code is usable for RAID5 providing:
  - we test "conf->max_degraded" rather than "2" as appropriate
  - we make sure s->failed_num[1] is meaningful (and not '-1')
    when s->failed > 1

The 'return 1' must become 'goto finish' in the new location.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

md/raid5: move more common code into handle_stripe · 84789554

Committed by NeilBrown on Jul 27, 2011

Apart from 'prexor' which can only be set for RAID5, and
'qd_idx' which can only be meaningful for RAID6, these two
chunks of code are nearly the same.

So combine them into one adding a test to call either
handle_parity_checks5 or handle_parity_checks6 as appropriate.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

md/raid5: unite handle_stripe_dirtying5 and handle_stripe_dirtying6 · c8ac1803

Committed by NeilBrown on Jul 27, 2011

RAID6 is only allowed to choose 'reconstruct-write' while RAID5 is
also allowed 'read-modify-write'.
Apart from this difference, handle_stripe_dirtying[56] are nearly
identical.  So resolve these differences and create just one function.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

md/raid5: unite fetch_block5 and fetch_block6 · 93b3dbce

Committed by NeilBrown on Jul 27, 2011

Provided that ->failed_num[1] is not a valid device number (which is
easily achieved) fetch_block6 provides all the functionality of
fetch_block5.

So remove the latter and rename the former to simply "fetch_block".

Then handle_stripe_fill5 and handle_stripe_fill6 become the same and
can similarly be united.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

md/raid5: rearrange a test in fetch_block6. · 5d35e09c

Committed by NeilBrown on Jul 27, 2011

Next patch will unite fetch_block5 and fetch_block6.
First I want to make the differences a little more clear.

For RAID6 if we are writing at all and there is a failed device, then
we need to load or compute every block so we can do a
reconstruct-write.
This case isn't needed for RAID5 - we will do a read-modify-write in
that case.
So make that test a separate test in fetch_block6 rather than merged
with two other tests.

Make a similar change in fetch_block5 so the one bit that is not
needed for RAID6 is clearly separate.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

md/raid5: move more code into common handle_stripe · c5a31000

Committed by NeilBrown on Jul 27, 2011

The difference between the RAID5 and RAID6 code here is easily
resolved using conf->max_degraded.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

md/raid5: Move code for finishing a reconstruction into handle_stripe. · 3687c061

Committed by NeilBrown on Jul 27, 2011

Prior to commit ab69ae12 the code in handle_stripe5 and
handle_stripe6 to "Finish reconstruct operations initiated by the
expansion process" was identical.
That commit added an identical stanza of code to each function, but in
different places.  That was careless.

The raid5 code was correct, so move that out into handle_stripe and
remove raid6 version.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

md/raid5: Remove stripe_head_state arg from handle_stripe_expansion. · 86c374ba

Committed by NeilBrown on Jul 27, 2011

This arg is only used to differentiate between RAID5 and RAID6 but
that is not needed.  For RAID5, raid5_compute_sector will set qd_idx
to "~0" so j will certainly not equal qd_idx, so there is no need
for a guard on that condition.

So remove the guard and remove the arg from the declaration and
callers of handle_stripe_expansion.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

26 Jul, 2011 7 commits

md/raid5: move stripe_head_state and more code into handle_stripe. · cc94015a

Committed by NeilBrown on Jul 26, 2011

By defining the 'stripe_head_state' in 'handle_stripe', we can move
some common code out of handle_stripe[56]() and into handle_stripe.

This means that all accesses to stripe_head_state in handle_stripe[56]
need to be 's->' instead of 's.', but the compiler should inline
those functions and just use a direct stack reference, and future
patches will hoist most of this code up into handle_stripe()
so we will revert to "s.".
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

md/raid5: add some more fields to stripe_head_state · c5709ef6

Committed by NeilBrown on Jul 26, 2011

Adding these three fields will allow more common code to be moved
to handle_stripe()

struct field rearrangement by Namhyung Kim.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

md/raid5: unify stripe_head_state and r6_state · f2b3b44d

Committed by NeilBrown on Jul 26, 2011

'struct stripe_head_state' stores state about the 'current' stripe
that is passed around while handling the stripe.
For RAID6 there is an extension structure: r6_state, which is also
passed around.
There is no value in keeping these separate, so move the fields from
the latter into the former.

This means that all code now needs to treat s->failed_num as a small
array, but this is a small cost.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

md/raid5: move common code into handle_stripe · 82e5a171

Committed by NeilBrown on Jul 26, 2011

There is common code at the start of handle_stripe5 and
handle_stripe6.  Move it into handle_stripe.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

md/raid5: replace sh->lock with an 'active' flag. · c4c1663b

Committed by NeilBrown on Jul 26, 2011

sh->lock is now mainly used to ensure that two threads aren't running
in the locked part of handle_stripe[56] at the same time.

That can more neatly be achieved with an 'active' flag which we set
while running handle_stripe.  If we find the flag is set, we simply
requeue the stripe for later by setting STRIPE_HANDLE.

For safety we take ->device_lock while examining the state of the
stripe and creating a summary in 'stripe_head_state / r6_state'.
This possibly isn't needed but as shared fields like ->toread,
->towrite are checked it is safer for now at least.

We leave the label after the old 'unlock' called "unlock" because it
will disappear in a few patches, so renaming seems pointless.

This leaves the stripe 'locked' for longer as we clear STRIPE_ACTIVE
later, but that is not a problem.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>
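
The "active flag instead of a lock" pattern can be sketched with C11 atomics. This is a user-space illustration under stated assumptions: struct demo_stripe, its fields and demo_handle_stripe() are hypothetical, and C11 atomic_bool stands in for whatever flag operations the kernel code actually uses.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stripe with an 'active' flag and a "handle me later" flag. */
struct demo_stripe {
    atomic_bool active;            /* set while a handler is running */
    atomic_bool handle_requested;  /* stands in for STRIPE_HANDLE */
    int handled;                   /* number of completed handler passes */
};

void demo_stripe_init(struct demo_stripe *sh)
{
    atomic_init(&sh->active, false);
    atomic_init(&sh->handle_requested, false);
    sh->handled = 0;
}

/* Returns true if this call did the work, false if the stripe was busy
 * and has only been marked for handling later. */
bool demo_handle_stripe(struct demo_stripe *sh)
{
    /* Atomically claim the stripe; a previous value of true means busy. */
    if (atomic_exchange(&sh->active, true)) {
        atomic_store(&sh->handle_requested, true);  /* requeue for later */
        return false;
    }
    atomic_store(&sh->handle_requested, false);
    sh->handled++;                       /* the real work would go here */
    atomic_store(&sh->active, false);    /* release the stripe */
    return true;
}
```

Unlike a lock, a second caller never waits here: it records that another pass is needed and returns, which matches the "simply requeue the stripe for later" behaviour described above.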

md/raid5: Protect some more code with ->device_lock. · cbe47ec5

Committed by NeilBrown on Jul 26, 2011

Other places that change or follow dev->towrite and dev->written take
the device_lock as well as the sh->lock.
So it should really be held in these places too.
Also, doing so will allow sh->lock to be discarded.

with merged fixes by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

md/raid5: Remove use of sh->lock in sync_request · 83206d66

Committed by NeilBrown on Jul 26, 2011

This is the start of a series of patches to remove sh->lock.

sync_request takes sh->lock before setting STRIPE_SYNCING to ensure
there is no race with testing it in handle_stripe[56].

Instead, use a new flag STRIPE_SYNC_REQUESTED and test it early
in handle_stripe[56] (after getting the same lock) and perform the
same set/clear operations if it was set.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

18 Jul, 2011 5 commits

md/raid5: get rid of duplicated call to bio_data_dir() · ffd96e35

Committed by Namhyung Kim on Jul 18, 2011

In raid5::make_request(), once bio_data_dir(@bi) has been determined
it never changes (and cannot). Always use the stored result.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: use kmem_cache_zalloc() · 6ce32846

Committed by Namhyung Kim on Jul 18, 2011

Replace kmem_cache_alloc + memset(,0,) with kmem_cache_zalloc.
I think it's not harmful since @conf->slab_cache already knows
actual size of struct stripe_head.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
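
The same "allocate, then zero" versus "allocate zeroed" substitution exists in user space, which makes for an easy analogy. The sketch below uses malloc/memset and calloc purely as stand-ins; it does not use the kernel slab API, and struct demo_stripe_head is a hypothetical placeholder type.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical placeholder; the real struct stripe_head is much larger. */
struct demo_stripe_head {
    int state;
    void *pages[4];
};

/* Two-step form: allocate, then clear by hand. */
struct demo_stripe_head *demo_alloc_then_clear(void)
{
    struct demo_stripe_head *sh = malloc(sizeof(*sh));
    if (sh)
        memset(sh, 0, sizeof(*sh));
    return sh;
}

/* One-step form: the allocator hands back zeroed memory. */
struct demo_stripe_head *demo_zalloc(void)
{
    return calloc(1, sizeof(struct demo_stripe_head));
}
```

Both functions return an object whose bytes are all zero; the one-step form just removes the chance of clearing the wrong size.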

md/raid10: share pages between read and write bio's during recovery · c65060ad

Committed by Namhyung Kim on Jul 18, 2011

When performing a recovery, only the first 2 slots in r10_bio are in use,
for read and write respectively. However, the pages in the write bio
are never used and are just replaced with the read bio's when the read completes.

Get rid of those unused pages and share read pages properly.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid10: factor out common bio handling code · 778ca018

Committed by Namhyung Kim on Jul 18, 2011

When a normal-write or sync-read/write bio completes, we should
find out the disk number the bio belongs to. Factor that common
code out into a separate function.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid10: get rid of duplicated conditional expression · 2c4193df

Committed by Namhyung Kim on Jul 18, 2011

Variable 'first' is initialized to zero and updated to @rdev->raid_disk
only if it is greater than 0. Thus condition '>= first' always implies
'>= 0' so the latter is not needed.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

openanolis / cloud-kernel · synced successfully more than 1 year ago