1. 06 Sep 2017, 10 commits
    • bcache: silence static checker warning · da22f0ee
      Dan Carpenter authored
      In olden times, closure_return() used to have a hidden return built in.
      We removed the hidden return but forgot to add a new return here.  If
      "c" were NULL we would oops on the next line, but fortunately "c" is
      never NULL.  Let's just remove the if statement.
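      A hypothetical reconstruction of the pattern (illustrative only, not
      the exact bcache source):

          /* "c" is never NULL here, so the guard is dead code -- and since
           * closure_return() no longer returns implicitly, a NULL "c" would
           * fall through and oops on the next dereference anyway */
          if (c)
              closure_return(cl);

      The fix drops the if statement and calls closure_return(cl)
      unconditionally.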
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: fix for gc and write-back race · 9baf3097
      Tang Junhui authored
      gc and write-back race with each other (see my earlier email "bcache
      get stucked"):
      gc thread                               write-back thread
      |                                       |bch_writeback_thread()
      |bch_gc_thread()                        |
      |                                       |==>read_dirty()
      |==>bch_btree_gc()                      |
      |==>btree_root() //get btree root       |
      |                //node write lock      |
      |==>bch_btree_gc_root()                 |
      |                                       |==>read_dirty_submit()
      |                                       |==>write_dirty()
      |                                       |==>continue_at(cl,
      |                                       |               write_dirty_finish,
      |                                       |               system_wq);
      |                                       |==>write_dirty_finish()//executes
      |                                       |               //in system_wq
      |                                       |==>bch_btree_insert()
      |                                       |==>bch_btree_map_leaf_nodes()
      |                                       |==>__bch_btree_map_nodes()
      |                                       |==>btree_root //try to get btree
      |                                       |              //root node read
      |                                       |              //lock
      |                                       |-----stuck here
      |==>bch_btree_set_root()
      |==>bch_journal_meta()
      |==>bch_journal()
      |==>journal_try_write()
      |==>journal_write_unlocked() //journal_full(&c->journal)
      |                            //condition satisfied
      |==>continue_at(cl, journal_write, system_wq); //try to execute
      |                               //journal_write in system_wq,
      |                               //but the work queue is executing
      |                               //write_dirty_finish()
      |==>closure_sync(); //wait for journal_write to
      |                   //finish and wake up gc
      |-------------stuck here
      |==>release root node write lock
      
      This patch allocates a separate workqueue for the write-back thread to
      avoid this race.
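      A minimal sketch of the idea (the field name writeback_write_wq is an
      assumption; details may differ from the merged patch):

          /* at writeback init time: a dedicated queue for write-back work */
          dc->writeback_write_wq = alloc_workqueue("bcache_writeback_wq",
                                                   WQ_MEM_RECLAIM, 0);
          if (!dc->writeback_write_wq)
              return -ENOMEM;

          /* in write_dirty(): complete on the private queue instead of
           * system_wq, so it can never be stuck behind journal_write */
          continue_at(cl, write_dirty_finish, dc->writeback_write_wq);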
      
      (Commit log re-organized by Coly Li to pass checkpatch.pl checking)
      Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
      Acked-by: Coly Li <colyli@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: increase the number of open buckets · 89b1fc54
      Tang Junhui authored
      Currently we allocate only 6 open buckets for each cache set, but in
      practice about 10 backend devices are attached to a cache set, and
      each bcache device is accessed by about 10 threads in the application
      layer on top. So 6 open buckets are far too few: writes from a single
      thread land in different buckets, which makes write-back inefficient,
      wastes bucket space, and makes it very easy to run out of buckets.
      
      I added a debug message in bch_open_buckets_alloc() to print bucket
      allocation info, and tested with ten bcache devices on one cache set,
      each bcache device accessed by ten threads.

      The debug output shows that, after this modification, a bucket is more
      likely to be assigned to the same thread, and data from one thread is
      more likely to be written to the same bucket. Since a given thread
      usually reads and writes the same backend device, this helps
      write-back and improves the usage efficiency of buckets.
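      The change itself is essentially one knob; a sketch follows (the
      macro name and the new count are assumptions):

          #define MAX_OPEN_BUCKETS    128    /* was a hard-coded 6 */

          for (i = 0; i < MAX_OPEN_BUCKETS; i++) {
              struct open_bucket *b = kzalloc(sizeof(*b), GFP_KERNEL);

              if (!b)
                  return -ENOMEM;
              list_add(&b->list, &c->data_buckets);
          }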
      Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
      Reviewed-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: Correct return value for sysfs attach errors · 77fa100f
      Tony Asleson authored
      If bch_cached_dev_attach() fails, it returns a negative error code.
      The variable 'v' that stores the result is unsigned, so user space
      sees a very large "bytes written" value instead, which can cause
      incorrect user-space behavior. Use a single signed variable throughout
      the function so the error return is preserved.
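      A sketch of the fix in the sysfs STORE handler (argument list
      simplified):

          /* before: "size_t v;" -- a negative errno stored there reaches
           * user space as a huge positive byte count */
          ssize_t v = bch_cached_dev_attach(dc, c);

          if (v < 0)
              return v;     /* propagate the error code */
          return size;      /* bytes consumed on success */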
      Signed-off-by: Tony Asleson <tasleson@redhat.com>
      Acked-by: Coly Li <colyli@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: correct cache_dirty_target in __update_writeback_rate() · a8394090
      Tang Junhui authored
      __update_writeback_rate() uses a Proportional-Derivative (PD)
      controller to regulate the writeback rate, and a dirty-target number
      is one of the controller's inputs. A larger target number makes the
      writeback rate smaller; conversely, a smaller target number makes the
      writeback rate larger.

      bcache calculates the target number in the following steps:
      1) cache_sectors = all-buckets-of-cache-set * bucket-size
      2) cache_dirty_target = cache_sectors * cached-device-writeback_percent
      3) target = cache_dirty_target *
      (sectors-of-cached-device/sectors-of-all-cached-devices-of-this-cache-set)
      
      The calculation of cache_sectors at step 1) is incorrect: it does not
      account for the dirty blocks occupied by flash-only volumes.

      A flash-only volume can be regarded as a bcache device without a
      cached device behind it. All data sectors allocated for it are
      persistent on the cache device and marked dirty; they are never
      touched by the bcache write-back or garbage collection code. So the
      data blocks of flash-only volumes should be ignored when calculating
      cache_sectors for the cache set.

      The current code does not subtract the dirty sectors of flash-only
      volumes, which yields a larger target number from the three steps
      above. In consequence the cache device's writeback rate is smaller
      than the correct value, and writeback is slower on all cached devices.

      This patch fixes the incorrectly slow writeback rate by subtracting
      the dirty sectors of flash-only volumes in __update_writeback_rate().
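      A sketch of the corrected step 1) (the counter name
      flash_dev_dirty_sectors is an assumption):

          /* step 1): exclude dirty data owned by flash-only volumes */
          int64_t cache_sectors = c->nbuckets * c->sb.bucket_size -
                      atomic_long_read(&c->flash_dev_dirty_sectors);

          /* steps 2) and 3) are unchanged */
          int64_t cache_dirty_target =
              div_u64(cache_sectors * dc->writeback_percent, 100);
          int64_t target = div64_u64(cache_dirty_target *
                         bdev_sectors(dc->bdev), c->cached_dev_sectors);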
      
      (Commit log composed by Coly Li to pass checkpatch.pl checking)
      Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
      Reviewed-by: Coly Li <colyli@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: gc does not work when triggering by manual command · 0b43f49d
      Tang Junhui authored
      I tried to execute the following command to trigger the gc thread:
      [root@localhost internal]# echo 1 > trigger_gc
      But it did not work. Debugging gc_should_run() shows that gc runs
      only while invalidating or when sectors_to_gc < 0, so set
      sectors_to_gc to -1 to satisfy that condition when gc is triggered by
      the manual command.
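      A sketch of the fix (the helper name is an assumption):

          /* make the gc_should_run() condition "sectors_to_gc < 0" hold
           * whenever gc is requested explicitly via trigger_gc */
          static void force_wake_up_gc(struct cache_set *c)
          {
              atomic_set(&c->sectors_to_gc, -1);
              wake_up_gc(c);
          }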
      
      (Code comments added by Coly Li)
      Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
      Reviewed-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: Don't reinvent the wheel but use existing llist API · 09b3efec
      Byungchul Park authored
      Although llist provides proper APIs for lock-less list handling,
      bcache open-codes the list walking instead of using them. Switch to
      the llist helpers.
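      A sketch of the kind of conversion involved (the surrounding names
      are illustrative):

          #include <linux/llist.h>

          /* detach everything on the lock-less list in one shot */
          struct llist_node *pending = llist_del_all(&wait_list->list);
          struct closure *cl, *t;

          /* walk entries embedded at member "list"; the _safe variant
           * allows each entry to be handed off inside the loop body */
          llist_for_each_entry_safe(cl, t, pending, list)
              resume_closure(cl);    /* hypothetical per-entry handler */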
      Signed-off-by: Byungchul Park <byungchul.park@lge.com>
      Acked-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: do not subtract sectors_to_gc for bypassed IO · 69daf03a
      Tang Junhui authored
      Bypassed IOs consume no bucket, so do not subtract their sectors from
      sectors_to_gc when deciding whether to trigger the gc thread.
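      A simplified sketch of the reordering in the request path
      (should_bypass() stands in for the existing checks):

          /* decide bypass first: a bypassed IO consumes no bucket */
          if (should_bypass(dc, bio))
              return true;

          /* only non-bypassed sectors count toward triggering gc */
          if (atomic_sub_return(bio_sectors(bio), &c->sectors_to_gc) < 0)
              wake_up_gc(c);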
      Signed-off-by: tang.junhui <tang.junhui@zte.com.cn>
      Acked-by: Coly Li <colyli@suse.de>
      Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: fix sequential large write IO bypass · c81ffa32
      Tang Junhui authored
      Sequential write IOs were issued with bs=1M by FIO in writeback cache
      mode. These IOs were expected to be bypassed, but actually they were
      not. Debugging the code, we find this in check_should_bypass():
          if (!congested &&
              mode == CACHE_MODE_WRITEBACK &&
              op_is_write(bio_op(bio)) &&
              (bio->bi_opf & REQ_SYNC))
              goto rescale;
      That means a write IO with the REQ_SYNC flag is never bypassed in
      writeback mode, even when it is a large sequential IO. That is not
      the correct behavior, so this patch removes these lines.
      Signed-off-by: tang.junhui <tang.junhui@zte.com.cn>
      Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com>
      Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: Fix leak of bdev reference · 4b758df2
      Jan Kara authored
      If blkdev_get_by_path() in register_bcache() fails, we look up the
      block device with lookup_bdev() to detect which situation we are in
      and report the error properly. However we never drop the reference
      lookup_bdev() returned to us. Fix that.
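      A sketch of the fix in register_bcache()'s error path (simplified; at
      the time lookup_bdev() returned a referenced block_device):

          bdev = lookup_bdev(strim(path));
          if (!IS_ERR(bdev)) {
              bool open = bch_is_open(bdev);  /* the "why did it fail" check */
              bdput(bdev);                    /* the previously missing drop */
              err = open ? "device already registered" : "device busy";
          }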
      Signed-off-by: Jan Kara <jack@suse.cz>
      Acked-by: Coly Li <colyli@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 24 Aug 2017, 2 commits
    • block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Christoph Hellwig authored
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different lifetime rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
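      A sketch of what callers see after the conversion (bio_set_dev() is
      the helper this patch introduces):

          /* before */
          bio->bi_bdev = bdev;

          /* after: the gendisk plus a partition index travel in the bio */
          bio_set_dev(bio, bdev);    /* bio->bi_disk   = bdev->bd_disk;
                                      * bio->bi_partno = bdev->bd_partno; */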
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • raid5: remove a call to get_start_sect · 10433d04
      Christoph Hellwig authored
      The block layer always remaps partitions before calling into the
      ->make_request methods of drivers.  Thus the call to get_start_sect in
      in_chunk_boundary will always return 0 and can be removed.
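      Roughly, the simplified check looks like this (a sketch; details may
      differ from the merged version):

          static int in_chunk_boundary(struct mddev *mddev, struct bio *bio)
          {
              sector_t sector = bio->bi_iter.bi_sector;
              unsigned int chunk_sectors = min(mddev->chunk_sectors,
                                               mddev->new_chunk_sectors);

              /* no "+ get_start_sect(...)" term: bi_sector has already
               * been remapped past any partition offset */
              return chunk_sectors >=
                     ((sector & (chunk_sectors - 1)) + bio_sectors(bio));
          }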
      Reviewed-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 12 Aug 2017, 1 commit
  4. 10 Aug 2017, 2 commits
  5. 08 Aug 2017, 4 commits
    • md/r5cache: fix io_unit handling in r5l_log_endio() · a9501d74
      Song Liu authored
      In r5l_log_endio(), once log->io_list_lock is released, the io unit
      may be accessed (or even freed) by other threads. Current code
      doesn't handle the io_unit properly, which leads to potential race
      conditions.
      
      This patch solves this race condition by:
      
      1. Add a pending_stripe count to flush_payload; multiple
         flush_payloads are counted as only one pending_stripe. A
         has_flush_payload flag is added to record whether the io unit has
         a flush_payload.
      2. In r5l_log_endio(), check the has_null_flush and has_flush_payload
         flags with log->io_list_lock held. After the lock is released, the
         io unit is only accessed when we know its pending_stripe counter
         cannot be zeroed by other threads (see the sketch below).
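      A rough sketch of the locking discipline in r5l_log_endio()
      (simplified):

          bool has_null_flush, has_flush_payload;
          unsigned long flags;

          spin_lock_irqsave(&log->io_list_lock, flags);
          /* snapshot the flags while the io unit is guaranteed alive */
          has_null_flush = io->has_null_flush;
          has_flush_payload = io->has_flush_payload;
          spin_unlock_irqrestore(&log->io_list_lock, flags);

          /* from here on "io" may be freed by another thread: use only
           * the local snapshots unless the pending_stripe counter is
           * known to be nonzero */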
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md/r5cache: call mddev_lock/unlock() in r5c_journal_mode_set · b44886c5
      Song Liu authored
      In r5c_journal_mode_set(), it is necessary to call mddev_lock()
      before accessing conf and conf->log. Otherwise conf->log may change
      under us (and become NULL).
      
      Shaohua: fix unlock in failure cases
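      A sketch of the shape of the fix ("mode" stands for the parsed
      journal mode, and the field name r5c_journal_mode is an assumption;
      mddev_lock() can fail, hence the error-path handling Shaohua
      mentions):

          int r5c_journal_mode_set(struct mddev *mddev, int mode)
          {
              struct r5conf *conf;
              int err = mddev_lock(mddev);  /* interruptible; may fail */

              if (err)
                  return err;               /* failure path must NOT unlock */
              conf = mddev->private;
              if (!conf || !conf->log) {    /* safe: cannot change under us */
                  mddev_unlock(mddev);
                  return -ENODEV;
              }
              conf->log->r5c_journal_mode = mode;
              mddev_unlock(mddev);
              return 0;
          }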
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md: fix test in md_write_start() · 81fe48e9
      NeilBrown authored
      md_write_start() needs to clear the in_sync flag if it is set, or if
      there might be a race with set_in_sync() such that the latter will
      set it very soon.  In the latter case it is sufficient to take the
      spinlock to synchronize with set_in_sync(), and then set the flag
      if needed.
      
      The current test is incorrect.
      It should be:
        if "flag is set" or "race is possible"
      
      "flag is set" is trivially "mddev->in_sync".
      "race is possible" should be tested by "mddev->sync_checkers".
      
      If sync_checkers is 0, then there can be no race.  set_in_sync() will
      wait in percpu_ref_switch_to_atomic_sync() for an RCU grace period,
      and as md_write_start() holds the rcu_read_lock(), set_in_sync() will
      be sure to see the update to writes_pending.
      
      If sync_checkers is > 0, there could be a race.  If md_write_start()
      happened entirely between
      		if (!mddev->in_sync &&
      		    percpu_ref_is_zero(&mddev->writes_pending)) {
      and
      			mddev->in_sync = 1;
      in set_in_sync(), then it would not see that in_sync had been set,
      and set_in_sync() would not see that writes_pending had been
      incremented.
      
      This bug means that in_sync is sometimes not set when it should be.
      Consequently there is a small chance that the array will be marked as
      "clean" when in fact it is inconsistent.
      
      Fixes: 4ad23a97 ("MD: use per-cpu counter for writes_pending")
      cc: stable@vger.kernel.org (v4.12+)
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md: always clear ->safemode when md_check_recovery gets the mddev lock. · 33182d15
      NeilBrown authored
      If ->safemode == 1, md_check_recovery() will try to get the mddev lock
      and perform various other checks.
      If mddev->in_sync is zero, it will call set_in_sync, and clear
      ->safemode.  However if mddev->in_sync is not zero, ->safemode will not
      be cleared.
      
      When md_check_recovery() drops the mddev lock, the thread is woken
      up again.  Normally it would just check if there was anything else to
      do, find nothing, and go to sleep.  However as ->safemode was not
      cleared, it will take the mddev lock again, then wake itself up
      when unlocking.
      
      This results in an infinite loop, repeatedly calling
      md_check_recovery(), which RCU or the soft-lockup detector
      will eventually complain about.
      
      Prior to commit 4ad23a97 ("MD: use per-cpu counter for
      writes_pending"), safemode would only be set to one when the
      writes_pending counter reached zero, and would be cleared again
      when writes_pending is incremented.  Since that patch, safemode
      is set more freely, but is not reliably cleared.
      
      So in md_check_recovery() clear ->safemode before checking ->in_sync.
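      A simplified sketch of the change inside md_check_recovery():

          if (mddev_trylock(mddev)) {
              /* clear a transient safemode == 1 unconditionally, before
               * the in_sync check, so dropping the lock below cannot
               * re-trigger an immediate wakeup loop */
              if (mddev->safemode == 1)
                  mddev->safemode = 0;

              if (!mddev->in_sync) {
                  spin_lock(&mddev->lock);
                  set_in_sync(mddev);
                  spin_unlock(&mddev->lock);
              }
              mddev_unlock(mddev);
          }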
      
      Fixes: 4ad23a97 ("MD: use per-cpu counter for writes_pending")
      Cc: stable@vger.kernel.org (4.12+)
      Reported-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Reported-by: David R <david@unsolicited.net>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
  6. 27 Jul 2017, 3 commits
  7. 26 Jul 2017, 6 commits
  8. 25 Jul 2017, 3 commits
  9. 24 Jul 2017, 1 commit
  10. 22 Jul 2017, 5 commits
  11. 20 Jul 2017, 2 commits
  12. 13 Jul 2017, 1 commit