提交 · 64590f45ddc7147fa1968147a1f5b5c436b728fe · openeuler / Kernel

04 2月, 2015 3 次提交

md: make merge_bvec_fn more robust in face of personality changes. · 64590f45

由 NeilBrown 提交于 12月 15, 2014

There is no locking around calls to merge_bvec_fn(), so
it is possible that calls which coincide with a level (or personality)
change could go wrong.

So create a central dispatch point for these functions and use
rcu_read_lock().
If the array is suspended, reject any merge that can be rejected.
If not, we know it is safe to call the function.
Signed-off-by: NNeilBrown <neilb@suse.de>

64590f45

md: make ->congested robust against personality changes. · 5c675f83

由 NeilBrown 提交于 12月 15, 2014

There is currently no locking around calls to the 'congested'
bdi function.  If called at an awkward time while an array is
being converted from one level (or personality) to another, there
is a tiny chance of running code in an unreferenced module etc.

So add a 'congested' function to the md_personality operations
structure, and call it with appropriate locking from a central
'mddev_congested'.

When the array personality is changing the array will be 'suspended'
so no IO is processed.
If mddev_congested detects this, it simply reports that the
array is congested, which is a safe guess.
As mddev_suspend calls synchronize_rcu(), mddev_congested can
avoid races by included the whole call inside an rcu_read_lock()
region.
This require that the congested functions for all subordinate devices
can be run under rcu_lock.  Fortunately this is the case.
Signed-off-by: NNeilBrown <neilb@suse.de>

5c675f83

md: rename mddev->write_lock to mddev->lock · 85572d7c

由 NeilBrown 提交于 12月 15, 2014

This lock is used for (slightly) more than helping with writing
superblocks, and it will soon be extended further.  So the
name is inappropriate.

Also, the _irq variant hasn't been needed since 2.6.37 as it is
never taking from interrupt or bh context.

So:
  -rename write_lock to lock
  -document what it protects
  -remove _irq ... except in md_flush_request() as there
     is no wait_event_lock() (with no _irq).  This can be
     cleaned up after appropriate changes to wait.h.
Signed-off-by: NNeilBrown <neilb@suse.de>

85572d7c

14 10月, 2014 1 次提交
- N
  md: remove unwanted white space from md.c · f72ffdd6
  由 NeilBrown 提交于 9月 30, 2014
```
My editor shows much of this is RED.
Signed-off-by: NNeilBrown <neilb@suse.de>
```
  f72ffdd6
09 4月, 2014 1 次提交

md/bitmap: don't abuse i_writecount for bitmap files. · 035328c2

由 NeilBrown 提交于 4月 09, 2014

md bitmap code currently tries to use i_writecount to stop any other
process from writing to out bitmap file.  But that is really an abuse
and has bit-rotted so locking is all wrong.

So discard that - root should be allowed to shoot self in foot.

Still use it in a much less intrusive way to stop the same file being
used as bitmap on two different array, and apply other checks to
ensure the file is at least vaguely usable for bitmap storage
(is regular, is open for write.  Support for ->bmap is already checked
elsewhere).
Reported-by: NAl Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: NNeilBrown <neilb@suse.de>

035328c2

14 1月, 2014 1 次提交

md: fix problem when adding device to read-only array with bitmap. · 8313b8e5

由 NeilBrown 提交于 12月 12, 2013

If an array is started degraded, and then the missing device
is found it can be re-added and a minimal bitmap-based recovery
will bring it fully up-to-date.

If the array is read-only a recovery would not be allowed.
But also if the array is read-only and the missing device was
present very recently, then there could be no need for any
recovery at all, so we simply include the device in the read-only
array without any recovery.

However... if the missing device was removed a little longer ago
it could be missing some updates, but if a bitmap is present it will
be conditionally accepted pending a bitmap-based update.  We don't
currently detect this case properly and will include that old
device into the read-only array with no recovery even though it really
needs a recovery.

This patch keeps track of whether a bitmap-based-recovery is really
needed or not in the new Bitmap_sync rdev flag.  If that is set,
then the device will not be added to a read-only array.

Cc: Andrei Warkentin <andreiw@vmware.com>
Fixes: d70ed2e4
Cc: stable@vger.kernel.org (3.2+)
Signed-off-by: NNeilBrown <neilb@suse.de>

8313b8e5

12 12月, 2013 1 次提交

kernfs: s/sysfs_dirent/kernfs_node/ and rename its friends accordingly · 324a56e1

由 Tejun Heo 提交于 12月 11, 2013

kernfs has just been separated out from sysfs and we're already in
full conflict mode.  Nothing can make the situation any worse.  Let's
take the chance to name things properly.

This patch performs the following renames.

* s/sysfs_elem_dir/kernfs_elem_dir/
* s/sysfs_elem_symlink/kernfs_elem_symlink/
* s/sysfs_elem_attr/kernfs_elem_file/
* s/sysfs_dirent/kernfs_node/
* s/sd/kn/ in kernfs proper
* s/parent_sd/parent/
* s/target_sd/target/
* s/dir_sd/parent/
* s/to_sysfs_dirent()/rb_to_kn()/
* misc renames of local vars when they conflict with the above

Because md, mic and gpio dig into sysfs details, this patch ends up
modifying them.  All are sysfs_dirent renames and trivial.  While we
can avoid these by introducing a dummy wrapping struct sysfs_dirent
around kernfs_node, given the limited usage outside kernfs and sysfs
proper, I don't think such workaround is called for.

This patch is strictly rename only and doesn't introduce any
functional difference.

- mic / gpio renames were missing.  Spotted by kbuild test robot.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

324a56e1

09 11月, 2013 1 次提交

block: Consolidate duplicated bio_trim() implementations · 6678d83f

由 Kent Overstreet 提交于 8月 07, 2013

Someone cut and pasted md's md_trim_bio() into xen-blkfront.c. Come on,
we should know better than this.
Signed-off-by: NKent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Neil Brown <neilb@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

6678d83f

27 9月, 2013 1 次提交

sysfs: clean up sysfs_get_dirent() · 388975cc

由 Tejun Heo 提交于 9月 11, 2013

The pre-existing sysfs interfaces which take explicit namespace
argument are weird in that they place the optional @ns in front of
@name which is contrary to the established convention.  For example,
we end up forcing vast majority of sysfs_get_dirent() users to do
sysfs_get_dirent(parent, NULL, name), which is silly and error-prone
especially as @ns and @name may be interchanged without causing
compilation warning.

This renames sysfs_get_dirent() to sysfs_get_dirent_ns() and swap the
positions of @name and @ns, and sysfs_get_dirent() is now a wrapper
around sysfs_get_dirent_ns().  This makes confusions a lot less
likely.

There are other interfaces which take @ns before @name.  They'll be
updated by following patches.

This patch doesn't introduce any functional changes.

v2: EXPORT_SYMBOL_GPL() wasn't updated leading to undefined symbol
    error on module builds.  Reported by build test robot.  Fixed.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

388975cc

27 8月, 2013 2 次提交

md: avoid deadlock when dirty buffers during md_stop. · 260fa034

由 NeilBrown 提交于 8月 27, 2013

When the last process closes /dev/mdX sync_blockdev will be called so
that all buffers get flushed.
So if it is then opened for the STOP_ARRAY ioctl to be sent there will
be nothing to flush.

However if we open /dev/mdX in order to send the STOP_ARRAY ioctl just
moments before some other process which was writing closes their file
descriptor, then there won't be a 'last close' and the buffers might
not get flushed.

So do_md_stop() calls sync_blockdev().  However at this point it is
holding ->reconfig_mutex.  So if the array is currently 'clean' then
the writes from sync_blockdev() will not complete until the array
can be marked dirty and that won't happen until some other thread
can get ->reconfig_mutex.  So we deadlock.

We need to move the sync_blockdev() call to before we take
->reconfig_mutex.
However then some other thread could open /dev/mdX and write to it
after we call sync_blockdev() and before we actually stop the array.
This can leave dirty data in the page cache which is awkward.

So introduce new flag MD_STILL_CLOSED.  Set it before calling
sync_blockdev(), clear it if anyone does open the file, and abort the
STOP_ARRAY attempt if it gets set before we lock against further
opens.

It is still possible to get problems if you open /dev/mdX, write to
it, then issue the STOP_ARRAY ioctl.  Just don't do that.
Signed-off-by: NNeilBrown <neilb@suse.de>

260fa034

md: Don't test all of mddev->flags at once. · 7a0a5355

由 NeilBrown 提交于 8月 27, 2013

mddev->flags is mostly used to record if an update of the
metadata is needed.  Sometimes the whole field is tested
instead of just the important bits.  This makes it difficult
to introduce more state bits.

So replace all bare tests of mddev->flags with tests for the bits
that actually need testing.
Signed-off-by: NNeilBrown <neilb@suse.de>

7a0a5355

26 6月, 2013 1 次提交

MD: Remember the last sync operation that was performed · c4a39551

由 Jonathan Brassow 提交于 6月 25, 2013

MD:  Remember the last sync operation that was performed

This patch adds a field to the mddev structure to track the last
sync operation that was performed.  This is especially useful when
it comes to what is recorded in mismatch_cnt in sysfs.  If the
last operation was "data-check", then it reports the number of
descrepancies found by the user-initiated check.  If it was a
"repair" operation, then it is reporting the number of
descrepancies repaired.  etc.
Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NNeilBrown <neilb@suse.de>

c4a39551

24 4月, 2013 1 次提交

MD: Export 'md_reap_sync_thread' function · a91d5ac0

由 Jonathan Brassow 提交于 4月 24, 2013

MD: Export 'md_reap_sync_thread' function

Make 'md_reap_sync_thread' available to other files, specifically dm-raid.c.
- rename reap_sync_thread to md_reap_sync_thread
- move the fn after md_check_recovery to match md.h declaration placement
- export md_reap_sync_thread
Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NNeilBrown <neilb@suse.de>

a91d5ac0

20 3月, 2013 1 次提交

MD: Prevent sysfs operations on uninitialized kobjects · 90584fc9

由 Jonathan Brassow 提交于 3月 07, 2013

MD: Prevent sysfs operations on uninitialized kobjects

Device-mapper does not use sysfs; but when device-mapper is leveraging
MD's RAID personalities, MD sometimes attempts to update sysfs.  This
patch adds checks for 'mddev-kobj.sd' in sysfs_[un]link_rdev to ensure
it is about to operate on something valid.  This patch also checks for
'mddev->kobj.sd' before calling 'sysfs_notify' in 'remove_and_add_spares'.
Although 'sysfs_notify' already makes this check, doing so in
'remove_and_add_spares' prevents an additional mutex operation.
Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NNeilBrown <neilb@suse.de>

90584fc9

13 12月, 2012 1 次提交

md: Use ->curr_resync as last completed request when cleanly aborting resync. · 0a19caab

由 majianpeng 提交于 11月 19, 2012

If a resync is aborted cleanly, ->curr_resync is a reliable
record of where we got up to.
If there was an error it is less reliable but we always know that
->curr_resync_completed is safe.

So add a flag MD_RECOVERY_ERROR to differentiate between these cases
and set recovery_cp accordingly.
Signed-off-by: NJianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NNeilBrown <neilb@suse.de>

0a19caab

30 11月, 2012 1 次提交

wait: add wait_event_lock_irq() interface · eed8c02e

由 Lukas Czerner 提交于 11月 30, 2012

New wait_event{_interruptible}_lock_irq{_cmd} macros added. This commit
moves the private wait_event_lock_irq() macro from MD to regular wait
includes, introduces new macro wait_event_lock_irq_cmd() instead of using
the old method with omitting cmd parameter which is ugly and makes a use
of new macros in the MD. It also introduces the _interruptible_ variant.

The use of new interface is when one have a special lock to protect data
structures used in the condition, or one also needs to invoke "cmd"
before putting it to sleep.

All new macros are expected to be called with the lock taken. The lock
is released before sleep and is reacquired afterwards. We will leave the
macro with the lock held.

Note to DM: IMO this should also fix theoretical race on waitqueue while
using simultaneously wait_event_lock_irq() and wait_event() because of
lack of locking around current state setting and wait queue removal.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Cc: Neil Brown <neilb@suse.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

eed8c02e

11 10月, 2012 2 次提交

Subject: [PATCH] md:change resync_mismatches to atomic64_t to avoid races · 7f7583d4

由 Jianpeng Ma 提交于 10月 11, 2012

Now that multiple threads can handle stripes, it is safer to
use an atomic64_t for resync_mismatches, to avoid update races.
Signed-off-by: NJianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NNeilBrown <neilb@suse.de>

7f7583d4

MD: change the parameter of md thread · 4ed8731d

由 Shaohua Li 提交于 10月 11, 2012

Change the thread parameter, so the thread can carry extra info. Next patch
will use it.
Signed-off-by: NShaohua Li <shli@fusionio.com>
Signed-off-by: NNeilBrown <neilb@suse.de>

4ed8731d

31 7月, 2012 3 次提交

blk: pass from_schedule to non-request unplug functions. · 74018dc3

由 NeilBrown 提交于 7月 31, 2012

This will allow md/raid to know why the unplug was called,
and will be able to act according - if !from_schedule it
is safe to perform tasks which could themselves schedule.
Signed-off-by: NNeilBrown <neilb@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

74018dc3

blk: centralize non-request unplug handling. · 9cbb1750

由 NeilBrown 提交于 7月 31, 2012

Both md and umem has similar code for getting notified on an
blk_finish_plug event.
Centralize this code in block/ and allow each driver to
provide its distinctive difference.
Signed-off-by: NNeilBrown <neilb@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

9cbb1750

md: remove plug_cnt feature of plugging. · 0021b7bc

由 NeilBrown 提交于 7月 31, 2012

This seemed like a good idea at the time, but after further thought I
cannot see it making a difference other than very occasionally and
testing to try to exercise the case it is most likely to help did not
show any performance difference by removing it.

So remove the counting of active plugs and allow 'pending writes' to
be activated at any time, not just when no plugs are active.

This is only relevant when there is a write-intent bitmap, and the
updating of the bitmap will likely introduce enough delay that
the single-threading of bitmap updates will be enough to collect large
numbers of updates together.

Removing this will make it easier to centralise the unplug code, and
will clear the other for other unplug enhancements which have a
measurable effect.
Signed-off-by: NNeilBrown <neilb@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

0021b7bc

22 5月, 2012 2 次提交

md/bitmap: add new 'space' attribute for bitmaps. · 6409bb05

由 NeilBrown 提交于 5月 22, 2012

If we are to allow bitmaps to be resized when the array is resized,
we need to know how much space there is.

So create an attribute to store this information and set appropriate
defaults.

It can be set more precisely via sysfs, or future metadata extensions
may allow it to be recorded.
Signed-off-by: NNeilBrown <neilb@suse.de>

6409bb05

md: dm-raid should call helper function to clear rdev. · 545c8795

由 NeilBrown 提交于 5月 22, 2012

dm-raid currently open-codes the freeing of some members of
and rdev.  It is more maintainable to have it call common code
from md.c which does this for all call-sites.

So remove free_disk_sb to md_rdev_clear, export it, and use it in
dm-raid.c
Signed-off-by: NNeilBrown <neilb@suse.de>

545c8795

21 5月, 2012 2 次提交

md: add possibility to change data-offset for devices. · c6563a8c

由 NeilBrown 提交于 5月 21, 2012

When reshaping we can avoid costly intermediate backup by
changing the 'start' address of the array on the device
(if there is enough room).

So as a first step, allow such a change to be requested
through sysfs, and recorded in v1.x metadata.

(As we didn't previous check that all 'pad' fields were zero,
 we need a new FEATURE flag for this.
 A (belatedly) check that all remaining 'pad' fields are
 zero to avoid a repeat of this)

The new data offset must be requested separately for each device.
This allows each to have a different change in the data offset.
This is not likely to be used often but as data_offset can be
set per-device, new_data_offset should be too.

This patch also removes the 'acknowledged' arg to rdev_set_badblocks as
it is never used and never will be.  At the same time we add a new
arg ('in_new') which is currently always zero but will be used more
soon.

When a reshape finishes we will need to update the data_offset
and rdev->sectors.  So provide an exported function to do that.
Signed-off-by: NNeilBrown <neilb@suse.de>

c6563a8c

md: allow a reshape operation to be reversed. · 2c810cdd

由 NeilBrown 提交于 5月 21, 2012

Currently a reshape operation always progresses from the start
of the array to the end unless the number of devices is being
reduced, in which case it progressed in the opposite direction.

To reverse a partial reshape which changes the number of devices
you can stop the array and re-assemble with the raid-disks numbers
reversed and it will undo.

However for a reshape that does not change the number of devices
it is not possible to reverse the reshape in the middle - you have to
wait until it completes.

So add a 'reshape_direction' attribute with is either 'forwards' or
'backwards' and can be explicitly set when delta_disks is zero.

This will become more important when we allow the data_offset to
change in a reshape.  Then the explicit statement of what direction is
being used will be more useful.

This can be enabled in raid5 trivially as it already supports
reverse reshape and just needs to use a different trigger to request it.
Signed-off-by: NNeilBrown <neilb@suse.de>

2c810cdd

19 3月, 2012 2 次提交

md/raid10: handle merge_bvec_fn in member devices. · 050b6615

由 NeilBrown 提交于 3月 19, 2012

Currently we don't honour merge_bvec_fn in member devices so if there
is one, we force all requests to be single-page at most.
This is not ideal.

So enhance the raid10 merge_bvec_fn to check that function in children
as well.

This introduces a small problem.  There is no locking around calls
the ->merge_bvec_fn and subsequent calls to ->make_request.  So a
device added between these could end up getting a request which
violates its merge_bvec_fn.

Currently the best we can do is synchronize_sched().  This will work
providing no preemption happens.  If there is preemption, we just
have to hope that new devices are largely consistent with old devices.
Signed-off-by: NNeilBrown <neilb@suse.de>

050b6615

md: tidy up rdev_for_each usage. · dafb20fa

由 NeilBrown 提交于 3月 19, 2012

md.h has an 'rdev_for_each()' macro for iterating the rdevs in an
mddev.  However it uses the 'safe' version of list_for_each_entry,
and so requires the extra variable, but doesn't include 'safe' in the
name, which is useful documentation.

Consequently some places use this safe version without needing it, and
many use an explicity list_for_each entry.

So:
 - rename rdev_for_each to rdev_for_each_safe
 - create a new rdev_for_each which uses the plain
   list_for_each_entry,
 - use the 'safe' version only where needed, and convert all other
   list_for_each_entry calls to use rdev_for_each.
Signed-off-by: NNeilBrown <neilb@suse.de>

dafb20fa

23 12月, 2011 2 次提交

md: create externally visible flags for supporting hot-replace. · 2d78f8c4

由 NeilBrown 提交于 12月 23, 2011

hot-replace is a feature being added to md which will allow a
device to be replaced without removing it from the array first.

With hot-replace a spare can be activated and recovery can start while
the original device is still in place, thus allowing a transition from
an unreliable device to a reliable device without leaving the array
degraded during the transition.  It can also be use when the original
device is still reliable but it not wanted for some reason.

This will eventually be supported in RAID4/5/6 and RAID10.

This patch adds a super-block flag to distinguish the replacement
device.  If an old kernel sees this flag it will reject the device.

It also adds two per-device flags which are viewable and settable via
sysfs.
   "want_replacement" can be set to request that a device be replaced.
   "replacement" is set to show that this device is replacing another
   device.

The "rd%d" links in /sys/block/mdXx/md only apply to the original
device, not the replacement.  We currently don't make links for the
replacement - there doesn't seem to be a need.
Signed-off-by: NNeilBrown <neilb@suse.de>

2d78f8c4

md: change hot_remove_disk to take an rdev rather than a number. · b8321b68

由 NeilBrown 提交于 12月 23, 2011

Soon an array will be able to have multiple devices with the
same raid_disk number (an original and a replacement).  So removing
a device based on the number won't work.  So pass the actual device
handle instead.
Reviewed-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NNeilBrown <neilb@suse.de>

b8321b68

11 10月, 2011 4 次提交
- N
  md: rename "mdk_personality" to "md_personality" · 84fc4b56
  由 NeilBrown 提交于 10月 11, 2011
```
"mdk" doesn't mean anything any more.
Signed-off-by: NNeilBrown <neilb@suse.de>
```
  84fc4b56
- N
  md: remove typedefs: mdk_thread_t -> struct md_thread · 2b8bf345
  由 NeilBrown 提交于 10月 11, 2011
```
Signed-off-by: NNeilBrown <neilb@suse.de>
```
  2b8bf345
- N
  md: remove typedefs: mddev_t -> struct mddev · fd01b88c
  由 NeilBrown 提交于 10月 11, 2011
```
Having mddev_t and 'struct mddev_s' is ugly and not preferred
Signed-off-by: NNeilBrown <neilb@suse.de>
```
  fd01b88c
- N
  md: removing typedefs: mdk_rdev_t -> struct md_rdev · 3cb03002
  由 NeilBrown 提交于 10月 11, 2011
```
The typedefs are just annoying. 'mdk' probably refers to 'md_k.h'
which used to be an include file that defined this thing.
Signed-off-by: NNeilBrown <neilb@suse.de>
```
  3cb03002
21 9月, 2011 2 次提交

W
trival: md_k.h should be md.h in the beginning comment of file md.h · 7e841526
由 Wang Sheng-Hui 提交于 9月 21, 2011
```
Signed-off-by: NWang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: NNeilBrown <neilb@suse.de>
```
7e841526

md: Avoid waking up a thread after it has been freed. · 01f96c0a

由 NeilBrown 提交于 9月 21, 2011

Two related problems:

1/ some error paths call "md_unregister_thread(mddev->thread)"
   without subsequently clearing ->thread.  A subsequent call
   to mddev_unlock will try to wake the thread, and crash.

2/ Most calls to md_wakeup_thread are protected against the thread
   disappeared either by:
      - holding the ->mutex
      - having an active request, so something else must be keeping
        the array active.
   However mddev_unlock calls md_wakeup_thread after dropping the
   mutex and without any certainty of an active request, so the
   ->thread could theoretically disappear.
   So we need a spinlock to provide some protections.

So change md_unregister_thread to take a pointer to the thread
pointer, and ensure that it always does the required locking, and
clears the pointer properly.
Reported-by: N"Moshe Melnikov" <moshe@zadarastorage.com>
Signed-off-by: NNeilBrown <neilb@suse.de>
cc: stable@kernel.org

01f96c0a

12 9月, 2011 1 次提交

block: remove support for bio remapping from ->make_request · 5a7bbad2

由 Christoph Hellwig 提交于 9月 12, 2011

There is very little benefit in allowing to let a ->make_request
instance update the bios device and sector and loop around it in
__generic_make_request when we can archive the same through calling
generic_make_request from the driver and letting the loop in
generic_make_request handle it.

Note that various drivers got the return value from ->make_request and
returned non-zero values for errors.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NNeilBrown <neilb@suse.de>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

5a7bbad2

28 7月, 2011 4 次提交

md: make it easier to wait for bad blocks to be acknowledged. · de393cde

由 NeilBrown 提交于 7月 28, 2011

It is only safe to choose not to write to a bad block if that bad
block is safely recorded in metadata - i.e. if it has been
'acknowledged'.

If it hasn't we need to wait for the acknowledgement.

We support that using rdev->blocked wait and
md_wait_for_blocked_rdev by introducing a new device flag
'BlockedBadBlock'.

This flag is only advisory.
It is cleared whenever we acknowledge a bad block, so that a waiter
can re-check the particular bad blocks that it is interested it.

It should be set by a caller when they find they need to wait.
This (set after test) is inherently racy, but as
md_wait_for_blocked_rdev already has a timeout, losing the race will
have minimal impact.

When we clear "Blocked" was also clear "BlockedBadBlocks" incase it
was set incorrectly (see above race).

We also modify the way we manage 'Blocked' to fit better with the new
handling of 'BlockedBadBlocks' and to make it consistent between
externally managed and internally managed metadata.   This requires
that each raidXd loop checks if the metadata needs to be written and
triggers a write (md_check_recovery) if needed.  Otherwise a queued
write request might cause raidXd to wait for the metadata to write,
and only that thread can write it.

Before writing metadata, we set FaultRecorded for all devices that
are Faulty, then after writing the metadata we clear Blocked for any
device for which the Fault was certainly Recorded.

The 'faulty' device flag now appears in sysfs if the device is faulty
*or* it has unacknowledged bad blocks.  So user-space which does not
understand bad blocks can continue to function correctly.
User space which does, should not assume a device is faulty until it
sees the 'faulty' flag, and then sees the list of unacknowledged bad
blocks is empty.
Signed-off-by: NNeilBrown <neilb@suse.de>

de393cde

md: add 'write_error' flag to component devices. · d7a9d443

由 NeilBrown 提交于 7月 28, 2011

If a device has ever seen a write error, we will want to handle
known-bad-blocks differently.
So create an appropriate state flag and export it via sysfs.
Signed-off-by: NNeilBrown <neilb@suse.de>
Reviewed-by: NNamhyung Kim <namhyung@gmail.com>

d7a9d443

md/raid1: avoid reading from known bad blocks. · d2eb35ac

由 NeilBrown 提交于 7月 28, 2011

Now that we have a bad block list, we should not read from those
blocks.
There are several main parts to this:
  1/ read_balance needs to check for bad blocks, and return not only
     the chosen device, but also how many good blocks are available
     there.
  2/ fix_read_error needs to avoid trying to read from bad blocks.
  3/ read submission must be ready to issue multiple reads to
     different devices as different bad blocks on different devices
     could mean that a single large read cannot be served by any one
     device, but can still be served by the array.
     This requires keeping count of the number of outstanding requests
     per bio.  This count is stored in 'bi_phys_segments'
  4/ retrying a read needs to also be ready to submit a smaller read
     and queue another request for the rest.

This does not yet handle bad blocks when reading to perform resync,
recovery, or check.

'md_trim_bio' will also be used for RAID10, so put it in md.c and
export it.
Signed-off-by: NNeilBrown <neilb@suse.de>

d2eb35ac

md: load/store badblock list from v1.x metadata · 2699b672

由 NeilBrown 提交于 7月 28, 2011

Space must have been allocated when array was created.
A feature flag is set when the badblock list is non-empty, to
ensure old kernels don't load and trust the whole device.

We only update the on-disk badblocklist when it has changed.
If the badblocklist (or other metadata) is stored on a bad block, we
don't cope very well.

If metadata has no room for bad block, flag bad-blocks as disabled,
and do the same for 0.90 metadata.
Signed-off-by: NNeilBrown <neilb@suse.de>

2699b672

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功