1. 12 Aug 2017, 1 commit
  2. 08 Aug 2017, 4 commits
    • md/r5cache: fix io_unit handling in r5l_log_endio() · a9501d74
      Song Liu authored
      In r5l_log_endio(), once log->io_list_lock is released, the io unit
      may be accessed (or even freed) by other threads. The current code
      does not handle the io unit properly, which leads to potential race
      conditions.
      
      This patch solves this race condition by:
      
      1. Add a pending_stripe count for flush_payload. Multiple
         flush_payloads are counted as only one pending_stripe. A flag
         has_flush_payload is added to show whether the io unit has a
         flush_payload;
      2. In r5l_log_endio(), check the flags has_null_flush and
         has_flush_payload with log->io_list_lock held. After the lock
         is released, the io unit is only accessed when we know the
         pending_stripe counter cannot be zeroed by other threads
         (a sketch of this pattern follows below).
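
      A minimal sketch of that endio pattern, assuming the flag and field
      names used in this message (illustrative only, not the verbatim
      mainline code):

          unsigned long flags;
          bool has_null_flush, has_flush_payload;

          spin_lock_irqsave(&log->io_list_lock, flags);
          /* ... run the io_unit state machine while the lock is held ... */
          has_null_flush = io->has_null_flush;        /* sample under the lock */
          has_flush_payload = io->has_flush_payload;
          spin_unlock_irqrestore(&log->io_list_lock, flags);

          /*
           * From here on, io is only dereferenced on paths where the sampled
           * flags guarantee that a pending_stripe reference is still held, so
           * another thread cannot free the io unit underneath us.
           */
          if (has_null_flush) {
                  /* complete the dependent flush bios */
          }
          if (has_flush_payload) {
                  /* drop the single pending_stripe taken for all flush payloads */
          }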
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md/r5cache: call mddev_lock/unlock() in r5c_journal_mode_set · b44886c5
      Song Liu authored
      In r5c_journal_mode_set(), it is necessary to call mddev_lock()
      before accessing conf and conf->log. Otherwise, conf->log may
      change (and even become NULL).
      
      Shaohua: fix unlock in failure cases
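
      A sketch of the locking pattern this describes, wrapped in a
      hypothetical helper so the fragment is self-contained (not the
      verbatim patch):

          /* hypothetical wrapper showing the mddev_lock()/unlock() pairing */
          static int r5c_set_journal_mode_locked(struct mddev *mddev, int mode)
          {
                  struct r5conf *conf;
                  int err = mddev_lock(mddev);

                  if (err)
                          return err;
                  conf = mddev->private;
                  if (!conf || !conf->log) {
                          mddev_unlock(mddev);    /* unlock on the failure path too */
                          return -ENODEV;
                  }
                  /* ... switch conf->log to the requested journal mode ... */
                  mddev_unlock(mddev);
                  return 0;
          }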
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md: fix test in md_write_start() · 81fe48e9
      NeilBrown authored
      md_write_start() needs to clear the in_sync flag if it is set, or if
      there might be a race with set_in_sync() such that the latter will
      set it very soon.  In the latter case it is sufficient to take the
      spinlock to synchronize with set_in_sync(), and then set the flag
      if needed.
      
      The current test is incorrect.
      It should be:
        if "flag is set" or "race is possible"
      
      "flag is set" is trivially "mddev->in_sync".
      "race is possible" should be tested by "mddev->sync_checkers".
      
      If sync_checkers is 0, then there can be no race.  set_in_sync() will
      wait in percpu_ref_switch_to_atomic_sync() for an RCU grace period,
      and as md_write_start() holds the rcu_read_lock(), set_in_sync() will
      be sure to see the update to writes_pending.
      
      If sync_checkers is > 0, there could be a race.  If md_write_start()
      happened entirely between
      		if (!mddev->in_sync &&
      		    percpu_ref_is_zero(&mddev->writes_pending)) {
      and
      			mddev->in_sync = 1;
      in set_in_sync(), then it would not see that in_sync had been set,
      and set_in_sync() would not see that writes_pending had been
      incremented.
      
      This bug means that in_sync is sometimes not set when it should be.
      Consequently there is a small chance that the array will be marked as
      "clean" when in fact it is inconsistent.
      
      Fixes: 4ad23a97 ("MD: use per-cpu counter for writes_pending")
      Cc: stable@vger.kernel.org (v4.12+)
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md: always clear ->safemode when md_check_recovery gets the mddev lock. · 33182d15
      NeilBrown authored
      If ->safemode == 1, md_check_recovery() will try to get the mddev lock
      and perform various other checks.
      If mddev->in_sync is zero, it will call set_in_sync, and clear
      ->safemode.  However if mddev->in_sync is not zero, ->safemode will not
      be cleared.
      
      When md_check_recovery() drops the mddev lock, the thread is woken
      up again.  Normally it would just check if there was anything else to
      do, find nothing, and go to sleep.  However as ->safemode was not
      cleared, it will take the mddev lock again, then wake itself up
      when unlocking.
      
      This results in an infinite loop, repeatedly calling
      md_check_recovery(), which RCU or the soft-lockup detector
      will eventually complain about.
      
      Prior to commit 4ad23a97 ("MD: use per-cpu counter for
      writes_pending"), safemode would only be set to one when the
      writes_pending counter reached zero, and would be cleared again
      when writes_pending is incremented.  Since that patch, safemode
      is set more freely, but is not reliably cleared.
      
      So in md_check_recovery() clear ->safemode before checking ->in_sync.
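
      An illustrative fragment of that ordering inside md_check_recovery(),
      once the mddev lock is held (a sketch, not the verbatim patch):

          if (mddev->safemode == 1)
                  mddev->safemode = 0;    /* clear it unconditionally ... */
          if (mddev->in_sync == 0)
                  set_in_sync(mddev);     /* ... before looking at ->in_sync */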
      
      Fixes: 4ad23a97 ("MD: use per-cpu counter for writes_pending")
      Cc: stable@vger.kernel.org (4.12+)
      Reported-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Reported-by: David R <david@unsolicited.net>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
  3. 27 Jul 2017, 3 commits
  4. 26 Jul 2017, 6 commits
  5. 25 Jul 2017, 3 commits
  6. 24 Jul 2017, 1 commit
  7. 22 Jul 2017, 5 commits
  8. 20 Jul 2017, 2 commits
  9. 13 Jul 2017, 1 commit
  10. 11 Jul 2017, 2 commits
    • Raid5 should update rdev->sectors after reshape · b5d27718
      Xiao Ni authored
      A raid5 md device can be created from disks without using their full
      size. For example, each device is 5G but only 3G of it is used to
      build the raid5 array. If the chunk size is then changed and the
      reshape is allowed to finish, stopping the array and assembling it
      again fails:
      mdadm -CR /dev/md0 -l5 -n3 /dev/loop[0-2] --size=3G --chunk=32 --assume-clean
      mdadm /dev/md0 --grow --chunk=64
      (wait for the reshape to finish)
      mdadm -S /dev/md0
      mdadm -As
      The error messages:
      [197519.814302] md: loop1 does not have a valid v1.2 superblock, not importing!
      [197519.821686] md: md_import_device returned -22
      
      After the reshape the data offset has changed (the reshape selects
      the backwards direction in this case). In super_1_load() the
      available space on the underlying device is compared with
      sb->data_size, and because the new data offset is larger after the
      reshape, super_1_load() returns -EINVAL. rdev->sectors is updated in
      md_finish_reshape(), and sb->data_size is then set in super_1_sync()
      based on rdev->sectors. So call md_finish_reshape() in end_reshape().
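
      A sketch of where that call lands (illustrative; the surrounding
      raid5 bookkeeping is elided):

          static void end_reshape(struct r5conf *conf)
          {
                  if (!test_bit(MD_RECOVERY_INTR, &conf->mddev->recovery)) {
                          /* ... existing end-of-reshape bookkeeping ... */
                          md_finish_reshape(conf->mddev); /* refreshes rdev->sectors */
                  }
          }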
      Signed-off-by: Xiao Ni <xni@redhat.com>
      Acked-by: Guoqing Jiang <gqjiang@suse.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md/bitmap: don't read page from device with Bitmap_sync · 4aaf7694
      Guoqing Jiang authored
      A device that owns the Bitmap_sync flag needs recovery to become
      in sync, and a bitmap page read from such a device could contain
      stale state.
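
      A sketch of the kind of check this implies when choosing a device to
      read a bitmap page from (illustrative; the surrounding loop is an
      assumption, not the verbatim patch):

          rdev_for_each(rdev, mddev) {
                  if (!test_bit(In_sync, &rdev->flags) ||
                      test_bit(Faulty, &rdev->flags) ||
                      test_bit(Bitmap_sync, &rdev->flags))
                          continue;  /* bitmap may be stale until recovery completes */
                  /* ... read the page from this rdev ... */
                  break;
          }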
      
      Also add comments for the Bitmap_sync bit, per the suggestion from
      Shaohua and Neil.
      
      Previous discussion can be found here:
      https://marc.info/?t=149760428900004&r=1&w=2
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
  11. 06 Jul 2017, 1 commit
  12. 04 Jul 2017, 2 commits
  13. 30 Jun 2017, 1 commit
  14. 28 Jun 2017, 2 commits
    • dm thin: do not queue freed thin mapping for next stage processing · 00a0ea33
      Vallish Vaidyeshwara authored
      process_prepared_discard_passdown_pt1() should clean up the
      dm_thin_new_mapping in error cases.
      
      dm_pool_inc_data_range() can fail trying to get a block reference:
      
      metadata operation 'dm_pool_inc_data_range' failed: error = -61
      
      When dm_pool_inc_data_range() fails, dm thin aborts the current
      metadata transaction and marks the pool as PM_READ_ONLY. The memory
      for the thin mapping is released as well. However, the current thin
      mapping is still queued onto the next stage via queue_passdown_pt2()
      or passdown_endio(). When this dangling, freed thin mapping is
      processed and accessed in the next stage, device mapper crashes.
      
      Code flow without fix:
      -> process_prepared_discard_passdown_pt1(m)
         -> dm_thin_remove_range()
         -> discard passdown
            --> passdown_endio(m) queues m onto next stage
         -> dm_pool_inc_data_range() fails, frees memory m
                  but does not remove it from next stage queue
      
      -> process_prepared_discard_passdown_pt2(m)
         -> processes freed memory m and crashes
      
      One such stack:
      
      Call Trace:
      [<ffffffffa037a46f>] dm_cell_release_no_holder+0x2f/0x70 [dm_bio_prison]
      [<ffffffffa039b6dc>] cell_defer_no_holder+0x3c/0x80 [dm_thin_pool]
      [<ffffffffa039b88b>] process_prepared_discard_passdown_pt2+0x4b/0x90 [dm_thin_pool]
      [<ffffffffa0399611>] process_prepared+0x81/0xa0 [dm_thin_pool]
      [<ffffffffa039e735>] do_worker+0xc5/0x820 [dm_thin_pool]
      [<ffffffff8152bf54>] ? __schedule+0x244/0x680
      [<ffffffff81087e72>] ? pwq_activate_delayed_work+0x42/0xb0
      [<ffffffff81089f53>] process_one_work+0x153/0x3f0
      [<ffffffff8108a71b>] worker_thread+0x12b/0x4b0
      [<ffffffff8108a5f0>] ? rescuer_thread+0x350/0x350
      [<ffffffff8108fd6a>] kthread+0xca/0xe0
      [<ffffffff8108fca0>] ? kthread_park+0x60/0x60
      [<ffffffff81530b45>] ret_from_fork+0x25/0x30
      
      The fix is to first take the block ref count for the discarded block
      and only then do the passdown discard of that block. If taking the
      block ref count fails, bail out: abort the current metadata
      transaction, mark the pool as PM_READ_ONLY and free the current thin
      mapping memory (the existing error handling code) without queueing
      this thin mapping onto the next stage of processing. If taking the
      block ref count succeeds, do the passdown discard of the block; the
      discard completion callback, passdown_endio(), will queue this thin
      mapping onto the next stage of processing.
      
      Code flow with fix:
      -> process_prepared_discard_passdown_pt1(m)
         -> dm_thin_remove_range()
         -> dm_pool_inc_data_range()
            --> if fails, free memory m and bail out
         -> discard passdown
            --> passdown_endio(m) queues m onto next stage
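
      A sketch of the reordered pt1 step (field and helper names follow
      dm-thin conventions but the fragment is illustrative, not the
      verbatim patch):

          dm_block_t data_end = m->data_block + (m->virt_end - m->virt_begin);
          int r = dm_pool_inc_data_range(pool->pmd, m->data_block, data_end);

          if (r) {
                  metadata_operation_failed(pool, "dm_pool_inc_data_range", r);
                  /* free m here; it must never reach the pt2 queue */
                  mempool_free(m, pool->mapping_pool);
                  return;
          }
          /* only a successful ref count reaches the passdown discard; its
           * completion, passdown_endio(), queues m for pt2 processing */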
      
      Cc: stable <stable@vger.kernel.org> # v4.9+
      Reviewed-by: Eduardo Valentin <eduval@amazon.com>
      Reviewed-by: Cristian Gafton <gafton@amazon.com>
      Reviewed-by: Anchal Agarwal <anchalag@amazon.com>
      Signed-off-by: Vallish Vaidyeshwara <vallish@amazon.com>
      Reviewed-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: don't set bounce limit · 41341afa
      Christoph Hellwig authored
      Now that all queue allocators come without a bounce limit by
      default, dm doesn't have to override this anymore.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  15. 24 Jun 2017, 2 commits
  16. 22 Jun 2017, 2 commits
    • md: use a separate bio_set for synchronous IO. · 5a85071c
      NeilBrown authored
      md devices allocate a bio_set and use it for two
      distinct purposes.
      mddev->bio_set is used to clone bios as part of sending
      upper level requests down to lower level devices,
      and it is also used for synchronous IO such as superblock
      and bitmap updates, and for correcting read errors.
      
      This multiple usage can lead to deadlocks.  Cloned bios are likely
      to be queued for write while waiting for a metadata update before
      the write can be permitted.
      If the cloning has exhausted mddev->bio_set, the metadata update
      may not be able to proceed.
      
      This scenario has been seen during heavy testing, with lots of IO and
      lots of memory pressure.
      
      Address this by adding a new bio_set specifically for synchronous IO.
      All synchronous IO goes directly to the underlying device and is not
      queued at the md level, so requests using entries from the new
      mddev->sync_set will complete in a timely fashion.
      Requests that use mddev->bio_set will sometimes need to wait
      for synchronous IO, but will no longer risk deadlocking that IO.
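
      A sketch of how the second pool is meant to be used when building a
      synchronous bio (illustrative; the fallback to mddev->bio_set for
      arrays that have no sync_set yet is an assumption):

          struct bio *bio = bio_alloc_bioset(GFP_NOIO, 1,
                          mddev->sync_set ? mddev->sync_set : mddev->bio_set);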
      
      Also: small simplification in mddev_put(): there is no need to
      wait until the spinlock is released before calling bioset_free().
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • dm io: fix duplicate bio completion due to missing ref count · feb7695f
      Mike Snitzer authored
      If only a subset of the devices associated with multiple regions
      supports a given special operation (e.g. DISCARD), then the code
      path that uses dec_count() to set the error for that region must
      first increment io->count.

      Otherwise, when dec_count() is called it can cause the dm-io
      caller's bio to be completed multiple times, as was reported against
      a dm-mirror target whose mirror legs had a mix of discard
      capabilities.
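
      A sketch of the pairing this implies (the surrounding condition and
      error value are hypothetical; the point is taking the reference that
      dec_count() will drop):

          if (special_op_unsupported_on_this_device) {    /* hypothetical test */
                  atomic_inc(&io->count);          /* ref that dec_count() drops */
                  dec_count(io, region, error);    /* hypothetical "not supported" error */
                  return;
          }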
      
      Bug: https://bugzilla.kernel.org/show_bug.cgi?id=196077
      Reported-by: Zhang Yi <yizhan@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  17. 21 Jun 2017, 1 commit
  18. 20 Jun 2017, 1 commit
    • sched/wait: Rename wait_queue_t => wait_queue_entry_t · ac6424b9
      Ingo Molnar authored
      Rename:
      
      	wait_queue_t		=>	wait_queue_entry_t
      
      'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
      but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
      which had to carry the name.
      
      Start sorting this out by renaming it to 'wait_queue_entry_t'.
      
      This also allows the real structure name 'struct __wait_queue' to
      lose its double underscore and become 'struct wait_queue_entry',
      which is the more canonical nomenclature for such data types.
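
      As a quick reference, the resulting names (a sketch based on the
      description above):

          /* before */  typedef struct __wait_queue       wait_queue_t;
          /* after  */  typedef struct wait_queue_entry   wait_queue_entry_t;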
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>