提交 · b03e0ccb5ab9df3efbe51c87843a1ffbecbafa1f · openeuler / Kernel

02 11月, 2017 2 次提交

md: remove special meaning of ->quiesce(.., 2) · b03e0ccb

由 NeilBrown 提交于 10月 19, 2017

The '2' argument means "wake up anything that is waiting".
This is an inelegant part of the design and was added
to help support management of suspend_lo/suspend_hi setting.
Now that suspend_lo/hi is managed in mddev_suspend/resume,
that need is gone.
These is still a couple of places where we call 'quiesce'
with an argument of '2', but they can safely be changed to
call ->quiesce(.., 1); ->quiesce(.., 0) which
achieve the same result at the small cost of pausing IO
briefly.

This removes a small "optimization" from suspend_{hi,lo}_store,
but it isn't clear that optimization served a useful purpose.
The code now is a lot clearer.
Suggested-by: NShaohua Li <shli@kernel.org>
Signed-off-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NShaohua Li <shli@fb.com>

b03e0ccb

md: move suspend_hi/lo handling into core md code · b3143b9a

由 NeilBrown 提交于 10月 17, 2017

responding to ->suspend_lo and ->suspend_hi is similar
to responding to ->suspended.  It is best to wait in
the common core code without incrementing ->active_io.
This allows mddev_suspend()/mddev_resume() to work while
requests are waiting for suspend_lo/hi to change.
This is will be important after a subsequent patch
which uses mddev_suspend() to synchronize updating for
suspend_lo/hi.

So move the code for testing suspend_lo/hi out of raid1.c
and raid5.c, and place it in md.c
Signed-off-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NShaohua Li <shli@fb.com>

b3143b9a

17 10月, 2017 2 次提交

md: rename some drivers/md/ files to have an "md-" prefix · 935fe098

由 Mike Snitzer 提交于 10月 10, 2017

Motivated by the desire to illiminate the imprecise nature of
DM-specific patches being unnecessarily sent to both the MD maintainer
and mailing-list.  Which is born out of the fact that DM files also
reside in drivers/md/

Now all MD-specific files in drivers/md/ start with either "raid" or
"md-" and the MAINTAINERS file has been updated accordingly.

Shaohua: don't change module name
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NShaohua Li <shli@fb.com>

935fe098

md-cluster: fix wrong condition check in raid1_write_request · 385f4d7f

由 Guoqing Jiang 提交于 9月 29, 2017

The check used here is to avoid conflict between write and
resync, however we used the wrong logic, it should be the
inverse of the checking inside "if".

Fixes: 589a1c49 ("Suspend writes in RAID1 if within range")
Signed-off-by: NGuoqing Jiang <gqjiang@suse.com>
Signed-off-by: NShaohua Li <shli@fb.com>

385f4d7f

28 8月, 2017 1 次提交

md: Runtime support for multiple ppls · ddc08823

由 Pawel Baldysiak 提交于 8月 16, 2017

Increase PPL area to 1MB and use it as circular buffer to store PPL. The
entry with highest generation number is the latest one. If PPL to be
written is larger then space left in a buffer, rewind the buffer to the
start (don't wrap it).
Signed-off-by: NPawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NArtur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NShaohua Li <shli@fb.com>

ddc08823

26 8月, 2017 1 次提交

md/raid1/10: reset bio allocated from mempool · 208410b5

由 Shaohua Li 提交于 8月 24, 2017

Data allocated from mempool doesn't always get initialized, this happens when
the data is reused instead of fresh allocation. In the raid1/10 case, we must
reinitialize the bios.
Reported-by: NJonathan G. Underwood <jonathan.underwood@gmail.com>
Fixes: f0250618(md: raid10: don't use bio's vec table to manage resync pages)
Fixes: 98d30c58(md: raid1: don't use bio's vec table to manage resync pages)
Cc: stable@vger.kernel.org (4.12+)
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: NShaohua Li <shli@fb.com>

208410b5

24 8月, 2017 1 次提交

block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992

由 Christoph Hellwig 提交于 8月 23, 2017

This way we don't need a block_device structure to submit I/O.  The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open.  Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).

For the actual I/O path all that we need is the gendisk, which exists
once per block device.  But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.

Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

74d46992

22 7月, 2017 5 次提交

md: simplify code with bio_io_error · 6308d8e3

由 Guoqing Jiang 提交于 7月 21, 2017

Since bio_io_error sets bi_status to BLK_STS_IOERR,
and calls bio_endio, so we can use it directly.

And as mentioned by Shaohua, there are also two
places in raid5.c can use bio_io_error either.
Signed-off-by: NGuoqing Jiang <gqjiang@suse.com>
Signed-off-by: NShaohua Li <shli@fb.com>

6308d8e3

md/raid1: fix writebehind bio clone · 16d56e2f

由 Shaohua Li 提交于 7月 17, 2017

After bio is submitted, we should not clone it as its bi_iter might be
invalid by driver. This is the case of behind_master_bio. In certain
situration, we could dispatch behind_master_bio immediately for the
first disk and then clone it for other disks.

https://bugzilla.kernel.org/show_bug.cgi?id=196383Reported-and-tested-by: NMarkus <m4rkusxxl@web.de>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Fix: 841c1316(md: raid1: improve write behind)
Cc: stable@vger.kernel.org (4.12+)
Signed-off-by: NShaohua Li <shli@fb.com>

16d56e2f

md: raid1-10: move raid1/raid10 common code into raid1-10.c · be453e77

由 Ming Lei 提交于 7月 14, 2017

No function change, just move 'struct resync_pages' and related
helpers into raid1-10.c
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NShaohua Li <shli@fb.com>

be453e77

md: raid1/raid10: initialize bvec table via bio_add_page() · fb0eb5df

由 Ming Lei 提交于 7月 14, 2017

We will support multipage bvec soon, so initialize bvec
table using the standardy way instead of writing the
talbe directly. Otherwise it won't work any more once
multipage bvec is enabled.
Acked-by: NGuoqing Jiang <gqjiang@suse.com>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NShaohua Li <shli@fb.com>

fb0eb5df

md: remove 'idx' from 'struct resync_pages' · 022e510f

由 Ming Lei 提交于 7月 14, 2017

bio_add_page() won't fail for resync bio, and the page index for each
bio is same, so remove it.

More importantly the 'idx' of 'struct resync_pages' is initialized in
mempool allocator function, the current way is wrong since mempool is
only responsible for allocation, we can't use that for initialization.
Suggested-by: NNeilBrown <neilb@suse.com>
Reported-by: NNeilBrown <neilb@suse.com>
Reported-and-tested-by: NPatrick <dto@gmx.net>
Fixes: f0250618(md: raid10: don't use bio's vec table to manage resync pages)
Fixes: 98d30c58(md: raid1: don't use bio's vec table to manage resync pages)
Cc: stable@vger.kernel.org (4.12+)
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NShaohua Li <shli@fb.com>

022e510f

19 6月, 2017 1 次提交

blk: replace bioset_create_nobvec() with a flags arg to bioset_create() · 011067b0

由 NeilBrown 提交于 6月 18, 2017

"flags" arguments are often seen as good API design as they allow
easy extensibility.
bioset_create_nobvec() is implemented internally as a variation in
flags passed to __bioset_create().

To support future extension, make the internal structure part of the
API.
i.e. add a 'flags' argument to bioset_create() and discard
bioset_create_nobvec().

Note that the bio_split allocations in drivers/md/raid* do not need
the bvec mempool - they should have used bioset_create_nobvec().
Suggested-by: NChristoph Hellwig <hch@infradead.org>
Reviewed-by: NChristoph Hellwig <hch@infradead.org>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

011067b0

17 6月, 2017 1 次提交

md/raid1: remove unused bio in sync_request_write · 037d2ff6

由 Guoqing Jiang 提交于 6月 15, 2017

The "bio" is not used in sync_request_write after commit a68e5870
("md/raid1: split out two sub-functions from sync_request_write").
Signed-off-by: NGuoqing Jiang <gqjiang@suse.com>
Signed-off-by: NShaohua Li <shli@fb.com>

037d2ff6

14 6月, 2017 2 次提交

md: don't use flush_signals in userspace processes · f9c79bc0

由 Mikulas Patocka 提交于 6月 07, 2017

The function flush_signals clears all pending signals for the process. It
may be used by kernel threads when we need to prepare a kernel thread for
responding to signals. However using this function for an userspaces
processes is incorrect - clearing signals without the program expecting it
can cause misbehavior.

The raid1 and raid5 code uses flush_signals in its request routine because
it wants to prepare for an interruptible wait. This patch drops
flush_signals and uses sigprocmask instead to block all signals (including
SIGKILL) around the schedule() call. The signals are not lost, but the
schedule() call won't respond to them.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Acked-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NShaohua Li <shli@fb.com>

f9c79bc0

md: fix deadlock between mddev_suspend() and md_write_start() · cc27b0c7

由 NeilBrown 提交于 6月 05, 2017

If mddev_suspend() races with md_write_start() we can deadlock
with mddev_suspend() waiting for the request that is currently
in md_write_start() to complete the ->make_request() call,
and md_write_start() waiting for the metadata to be updated
to mark the array as 'dirty'.
As metadata updates done by md_check_recovery() only happen then
the mddev_lock() can be claimed, and as mddev_suspend() is often
called with the lock held, these threads wait indefinitely for each
other.

We fix this by having md_write_start() abort if mddev_suspend()
is happening, and ->make_request() aborts if md_write_start()
aborted.
md_make_request() can detect this abort, decrease the ->active_io
count, and wait for mddev_suspend().
Reported-by: NNix <nix@esperi.org.uk>
Fix: 68866e42(MD: no sync IO while suspended)
Cc: stable@vger.kernel.org
Signed-off-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NShaohua Li <shli@fb.com>

cc27b0c7

09 6月, 2017 1 次提交

block: switch bios to blk_status_t · 4e4cbee9

由 Christoph Hellwig 提交于 6月 03, 2017

Replace bi_error with a new bi_status to allow for a clear conversion.
Note that device mapper overloaded bi_error with a private value, which
we'll have to keep arround at least for now and thus propagate to a
proper blk_status_t value.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

4e4cbee9

06 6月, 2017 1 次提交

md: initialise ->writes_pending in personality modules. · a415c0f1

由 NeilBrown 提交于 6月 05, 2017

The new per-cpu counter for writes_pending is initialised in
md_alloc(), which is not called by dm-raid.
So dm-raid fails when md_write_start() is called.

Move the initialization to the personality modules
that need it.  This way it is always initialised when needed,
but isn't unnecessarily initialized (requiring memory allocation)
when the personality doesn't use writes_pending.
Reported-by: NHeinz Mauelshagen <heinzm@redhat.com>
Fixes: 4ad23a97 ("MD: use per-cpu counter for writes_pending")
Signed-off-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NShaohua Li <shli@fb.com>

a415c0f1

13 5月, 2017 1 次提交

raid1: prefer disk without bad blocks · d82dd0e3

由 Tomasz Majchrzak 提交于 5月 12, 2017

If an array consists of two drives and the first drive has the bad
block, the read request to the region overlapping the bad block chooses
the same disk (with bad block) as device to read from over and over and
the request gets stuck. If the first disk only partially overlaps with
bad block, it becomes a candidate ("best disk") for shorter range of
sectors. The second disk is capable of reading the entire requested
range and it is updated accordingly, however it is not recorded as a
best device for the request. In the end the request is sent to the first
disk to read entire range of sectors. It fails and is re-tried in a
moment but with the same outcome.

Actually it is quite likely scenario but it had little exposure in my
test until commit 715d40b93b10 ("md/raid1: add failfast handling for
reads.") removed preference for idle disk. Such scenario had been
passing as second disk was always chosen when idle.

Reset a candidate ("best disk") to read from if disk can read entire
range. Do it only if other disk has already been chosen as a candidate
for a smaller range. The head position / disk type logic will select
the best disk to read from - it is fine as disk with bad block won't be
considered for it.
Signed-off-by: NTomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: NShaohua Li <shli@fb.com>

d82dd0e3

12 5月, 2017 1 次提交

md/raid1/10: avoid unnecessary locking · 23b245c0

由 Shaohua Li 提交于 5月 10, 2017

If we add bios to block plugging list, locking is unnecessry, since the block
unplug is guaranteed not to run at that time.
Reviewed-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NShaohua Li <shli@fb.com>

23b245c0

09 5月, 2017 1 次提交

md: don't return -EAGAIN in md_allow_write for external metadata arrays · 2214c260

由 Artur Paszkiewicz 提交于 5月 08, 2017

This essentially reverts commit b5470dc5 ("md: resolve external
metadata handling deadlock in md_allow_write") with some adjustments.

Since commit 6791875e ("md: make reconfig_mutex optional for writes
to md sysfs files.") changing array_state to 'active' does not use
mddev_lock() and will not cause a deadlock with md_allow_write(). This
revert simplifies userspace tools that write to sysfs attributes like
"stripe_cache_size" or "consistency_policy" because it removes the need
for special handling for external metadata arrays, checking for EAGAIN
and retrying the write.
Signed-off-by: NArtur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NShaohua Li <shli@fb.com>

2214c260

28 4月, 2017 1 次提交

md/raid1: Use a new variable to count flighting sync requests · 43ac9b84

由 Xiao Ni 提交于 4月 27, 2017

In new barrier codes, raise_barrier waits if conf->nr_pending[idx] is not zero.
After all the conditions are true, the resync request can go on be handled. But
it adds conf->nr_pending[idx] again. The next resync request hit the same bucket
idx need to wait the resync request which is submitted before. The performance
of resync/recovery is degraded.
So we should use a new variable to count sync requests which are in flight.

I did a simple test:
1. Without the patch, create a raid1 with two disks. The resync speed:
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdb 0.00 0.00 166.00 0.00 10.38 0.00 128.00 0.03 0.20 0.20 0.00 0.19 3.20
sdc 0.00 0.00 0.00 166.00 0.00 10.38 128.00 0.96 5.77 0.00 5.77 5.75 95.50
2. With the patch, the result is:
sdb 2214.00 0.00 766.00 0.00 185.69 0.00 496.46 2.80 3.66 3.66 0.00 1.03 79.10
sdc 0.00 2205.00 0.00 769.00 0.00 186.44 496.52 5.25 6.84 0.00 6.84 1.30 100.10
Suggested-by: NShaohua Li <shli@kernel.org>
Signed-off-by: NXiao Ni <xni@redhat.com>
Acked-by: NColy Li <colyli@suse.de>
Signed-off-by: NShaohua Li <shli@fb.com>

43ac9b84

26 4月, 2017 1 次提交

md: clear WantReplacement once disk is removed · e5bc9c3c

由 Guoqing Jiang 提交于 4月 24, 2017

We can clear 'WantReplacement' flag directly no
matter it's replacement existed or not since the
semantic is same as before.

Also since the disk is removed from array, then
it is straightforward to remove 'WantReplacement'
flag and the comments in raid10/5 can be removed
as well.
Signed-off-by: NGuoqing Jiang <gqjiang@suse.com>
Signed-off-by: NShaohua Li <shli@fb.com>

e5bc9c3c

24 4月, 2017 1 次提交

md/raid1/10: remove unused queue · 29661758

由 Lidong Zhong 提交于 4月 21, 2017

A queue is declared and get from the disk of the array, but it's not
used anywhere. So removing it from the source.
Signed-off-by: NLidong Zhong <lzhong@suse.com>
Acted-by: NGuoqing Jiang <gqjiang@suse.com>
Signed-off-by: NShaohua Li <shli@fb.com>

29661758

12 4月, 2017 4 次提交

md/raid1: factor out flush_bio_list() · 673ca68d

由 NeilBrown 提交于 4月 05, 2017

flush_pending_writes() and raid1_unplug() each contain identical
copies of a fairly large slab of code.  So factor that out into
new flush_bio_list() to simplify maintenance.
Signed-off-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NShaohua Li <shli@fb.com>

673ca68d

md/raid1: simplify handle_read_error(). · 689389a0

由 NeilBrown 提交于 4月 05, 2017

handle_read_error() duplicates a lot of the work that raid1_read_request()
does, so it makes sense to just use that function.
This doesn't quite work as handle_read_error() relies on the same r1bio
being re-used so that, in the case of a read-only array, setting
IO_BLOCKED in r1bio->bios[] ensures read_balance() won't re-use
that device.
So we need to allow a r1bio to be passed to raid1_read_request(), and to
have that function mostly initialise the r1bio, but leave the bios[]
array untouched.

Two parts of handle_read_error() that need to be preserved are the warning
message it prints, so they are conditionally added to raid1_read_request().

Note that this highlights a minor bug on alloc_r1bio().  It doesn't
initalise the bios[] array, so it is possible that old content is there,
which might cause read_balance() to ignore some devices with no good reason.

With this change, we no longer need inc_pending(), or the sectors_handled
arg to alloc_r1bio().

As handle_read_error() is called from raid1d() and allocates memory,
there is tiny chance of a deadlock.  All element of various pools
could be queued waiting for raid1 to handle them, and there may be no
extra memory free.
Achieving guaranteed forward progress would probably require a second
thread and another mempool.  Instead of that complexity, add
__GFP_HIGH to any allocations when read1_read_request() is called
from raid1d.
Signed-off-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NShaohua Li <shli@fb.com>

689389a0

md/raid1: simplify alloc_behind_master_bio() · cb83efcf

由 NeilBrown 提交于 4月 05, 2017

Now that we always always pass an offset of 0 and a size
that matches the bio to alloc_behind_master_bio(),
we can remove the offset/size args and simplify the code.

We could probably remove bio_copy_data_partial() too.
Signed-off-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NShaohua Li <shli@fb.com>

cb83efcf

md/raid1: simplify the splitting of requests. · c230e7e5

由 NeilBrown 提交于 4月 05, 2017

raid1 currently splits requests in two different ways for
two different reasons.

First, bio_split() is used to ensure the bio fits within a
resync accounting region.
Second, multiple r1bios are allocated for each bio to handle
the possiblity of known bad blocks on some devices.

This can be simplified to just use bio_split() once, and not
use multiple r1bios.
We delay the split until we know a maximum bio size that can
be handled with a single r1bio, and then split the bio and
queue the remainder for later handling.

This avoids all loops inside raid1.c request handling.  Just
a single read, or a single set of writes, is submitted to
lower-level devices for each bio that comes from
generic_make_request().

When the bio needs to be split, generic_make_request() will
do the necessary looping and call md_make_request() multiple
times.

raid1_make_request() no longer queues request for raid1 to handle,
so we can remove that branch from the 'if'.

This patch also creates a new private bio_set
(conf->bio_split) for splitting bios.  Using fs_bio_set
is wrong, as it is meant to be used by filesystems, not
block devices.  Using it inside md can lead to deadlocks
under high memory pressure.

Delete unused variable in raid1_write_request() (Shaohua)
Signed-off-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NShaohua Li <shli@fb.com>

c230e7e5

11 4月, 2017 1 次提交

md/raid1: avoid reusing a resync bio after error handling. · 0c9d5b12

由 NeilBrown 提交于 4月 06, 2017

fix_sync_read_error() modifies a bio on a newly faulty
device by setting bi_end_io to end_sync_write.
This ensure that put_buf() will still call rdev_dec_pending()
as required, but makes sure that subsequent code in
fix_sync_read_error() doesn't try to read from the device.

Unfortunately this interacts badly with sync_request_write()
which assumes that any bio with bi_end_io set to non-NULL
other than end_sync_read is safe to write to.

As the device is now faulty it doesn't make sense to write.
As the bio was recently used for a read, it is "dirty"
and not suitable for immediate submission.
In particular, ->bi_next might be non-NULL, which will cause
generic_make_request() to complain.

Break this interaction by refusing to write to devices
which are marked as Faulty.
Reported-and-tested-by: NMichael Wang <yun.wang@profitbricks.com>
Fixes: 2e52d449 ("md/raid1: add failfast handling for reads.")
Cc: stable@vger.kernel.org (v4.10+)
Signed-off-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NShaohua Li <shli@fb.com>

0c9d5b12

09 4月, 2017 1 次提交

md: support REQ_OP_WRITE_ZEROES · 3deff1a7

由 Christoph Hellwig 提交于 4月 05, 2017

Copy & paste from the REQ_OP_WRITE_SAME code.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

3deff1a7

28 3月, 2017 1 次提交

md: raid1: kill warning on powerpc_pseries · 8fc04e6e

由 Ming Lei 提交于 3月 28, 2017

This patch kills the warning reported on powerpc_pseries,
and actually we don't need the initialization.

	After merging the md tree, today's linux-next build (powerpc
	pseries_le_defconfig) produced this warning:

	drivers/md/raid1.c: In function 'raid1d':
	drivers/md/raid1.c:2172:9: warning: 'page_len$' may be used uninitialized in this function [-Wmaybe-uninitialized]
	     if (memcmp(page_address(ppages[j]),
	         ^
	drivers/md/raid1.c:2160:7: note: 'page_len$' was declared here
	   int page_len[RESYNC_PAGES];
       ^
Signed-off-by: NMing Lei <tom.leiming@gmail.com>
Signed-off-by: NShaohua Li <shli@fb.com>

8fc04e6e

26 3月, 2017 1 次提交

md/raid1: skip data copy for behind io for discard request · 41743c1f

由 Shaohua Li 提交于 3月 24, 2017

discard request doesn't have data attached, so it's meaningless to
allocate memory and copy from original bio for behind IO. And the copy
is bogus because bio_copy_data_partial can't handle discard request.

We don't support writesame/writezeros request so far.
Reviewed-by: NMing Lei <tom.leiming@gmail.com>
Signed-off-by: NShaohua Li <shli@fb.com>

41743c1f

25 3月, 2017 8 次提交

md: raid1: improve write behind · 841c1316

由 Ming Lei 提交于 3月 17, 2017

This patch improve handling of write behind in the following ways:

- introduce behind master bio to hold all write behind pages
- fast clone bios from behind master bio
- avoid to change bvec table directly
- use bio_copy_data() and make code more clean
Suggested-by: NShaohua Li <shli@fb.com>
Signed-off-by: NMing Lei <tom.leiming@gmail.com>
Signed-off-by: NShaohua Li <shli@fb.com>

841c1316

md: raid1: move 'offset' out of loop · d8c84c4f

由 Ming Lei 提交于 3月 17, 2017

The 'offset' local variable can't be changed inside the loop, so
move it out.
Signed-off-by: NMing Lei <tom.leiming@gmail.com>
Signed-off-by: NShaohua Li <shli@fb.com>

d8c84c4f

md: raid1: use bio helper in process_checks() · 60928a91

由 Ming Lei 提交于 3月 17, 2017

Avoid to direct access to bvec table.
Signed-off-by: NMing Lei <tom.leiming@gmail.com>
Signed-off-by: NShaohua Li <shli@fb.com>

60928a91

md: raid1: retrieve page from pre-allocated resync page array · 44cf0f4d

由 Ming Lei 提交于 3月 17, 2017

Now one page array is allocated for each resync bio, and we can
retrieve page from this table directly.
Signed-off-by: NMing Lei <tom.leiming@gmail.com>
Signed-off-by: NShaohua Li <shli@fb.com>

44cf0f4d

md: raid1: don't use bio's vec table to manage resync pages · 98d30c58

由 Ming Lei 提交于 3月 17, 2017

Now we allocate one page array for managing resync pages, instead
of using bio's vec table to do that, and the old way is very hacky
and won't work any more if multipage bvec is enabled.

The introduced cost is that we need to allocate (128 + 16) * raid_disks
bytes per r1_bio, and it is fine because the inflight r1_bio for
resync shouldn't be much, as pointed by Shaohua.

Also the bio_reset() in raid1_sync_request() is removed because
all bios are freshly new now and not necessary to reset any more.

This patch can be thought as a cleanup too
Suggested-by: NShaohua Li <shli@kernel.org>
Signed-off-by: NMing Lei <tom.leiming@gmail.com>
Signed-off-by: NShaohua Li <shli@fb.com>

98d30c58

md: raid1: simplify r1buf_pool_free() · a7234234

由 Ming Lei 提交于 3月 17, 2017

This patch gets each page's reference of each bio for resync,
then r1buf_pool_free() gets simplified a lot.

The same policy has been taken in raid10's buf pool allocation/free
too.
Signed-off-by: NMing Lei <tom.leiming@gmail.com>
Signed-off-by: NShaohua Li <shli@fb.com>

a7234234

md: move two macros into md.h · d8e29fbc

由 Ming Lei 提交于 3月 17, 2017

Both raid1 and raid10 share common resync
block size and page count, so move them into md.h.
Signed-off-by: NMing Lei <tom.leiming@gmail.com>
Signed-off-by: NShaohua Li <shli@fb.com>

d8e29fbc

md: raid1/raid10: don't handle failure of bio_add_page() · c85ba149

由 Ming Lei 提交于 3月 17, 2017

All bio_add_page() is for adding one page into resync bio,
which is big enough to hold RESYNC_PAGES pages, and
the current bio_add_page() doesn't check queue limit any more,
so it won't fail at all.

remove unused label (shaohua)
Signed-off-by: NMing Lei <tom.leiming@gmail.com>
Signed-off-by: NShaohua Li <shli@fb.com>

c85ba149

openeuler / Kernel 大约 1 年 前同步成功

openeuler / Kernel
大约 1 年前同步成功