1. 05 May 2016 (7 commits)
  2. 30 Apr 2016 (1 commit)
  3. 26 Apr 2016 (1 commit)
  4. 25 Apr 2016 (1 commit)
  5. 17 Apr 2016 (1 commit)
  6. 15 Apr 2016 (2 commits)
  7. 11 Apr 2016 (1 commit)
  8. 05 Apr 2016 (1 commit)
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Committed by Kirill A. Shutemov
      The PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long*
      time ago with the promise that one day it would be possible to implement
      the page cache with bigger chunks than PAGE_SIZE.
      
      That promise never materialized, and it is unlikely it ever will.
      
      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE, and it's a constant source of confusion as to whether the
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in the page cache are special.  They
      are not.
      
      The changes are pretty straightforward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      the script below.  For some reason, coccinelle doesn't patch header
      files, so I've called spatch on them manually.
      
      The only adjustment after coccinelle is a revert of the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.
      
      There are a few places in the code that coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation will
      also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  9. 02 Apr 2016 (1 commit)
  10. 01 Apr 2016 (3 commits)
    • MD: add rdev reference for super write · ed3b98c7
      Committed by Shaohua Li
      Xiao Ni reported the crash below:
      [26396.335146] BUG: unable to handle kernel NULL pointer dereference at 00000000000002a8
      [26396.342990] IP: [<ffffffffa0425b00>] super_written+0x20/0x80 [md_mod]
      [26396.349449] PGD 0
      [26396.351468] Oops: 0002 [#1] SMP
      [26396.354898] Modules linked in: ext4 mbcache jbd2 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_td
      [26396.408404] CPU: 5 PID: 3261 Comm: loop0 Not tainted 4.5.0 #1
      [26396.414140] Hardware name: Dell Inc. PowerEdge R715/0G2DP3, BIOS 3.2.2 09/15/2014
      [26396.421608] task: ffff8808339be680 ti: ffff8808365f4000 task.ti: ffff8808365f4000
      [26396.429074] RIP: 0010:[<ffffffffa0425b00>]  [<ffffffffa0425b00>] super_written+0x20/0x80 [md_mod]
      [26396.437952] RSP: 0018:ffff8808365f7c38  EFLAGS: 00010046
      [26396.443252] RAX: ffffffffa0425ae0 RBX: ffff8804336a7900 RCX: ffffe8f9f7b41198
      [26396.450371] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8804336a7900
      [26396.457489] RBP: ffff8808365f7c50 R08: 0000000000000005 R09: 00001801e02ce3d7
      [26396.464608] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
      [26396.471728] R13: ffff8808338d9a00 R14: 0000000000000000 R15: ffff880833f9fe00
      [26396.478849] FS:  00007f9e5066d740(0000) GS:ffff880237b40000(0000) knlGS:0000000000000000
      [26396.486922] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [26396.492656] CR2: 00000000000002a8 CR3: 00000000019ea000 CR4: 00000000000006e0
      [26396.499775] Stack:
      [26396.501781]  ffff8804336a7900 0000000000000000 0000000000000000 ffff8808365f7c68
      [26396.509199]  ffffffff81308cd0 ffff8804336a7900 ffff8808365f7ca8 ffffffff81310637
      [26396.516618]  00000000a0233a00 ffff880833f9fe00 0000000000000000 ffff880833fb0000
      [26396.524038] Call Trace:
      [26396.526485]  [<ffffffff81308cd0>] bio_endio+0x40/0x60
      [26396.531529]  [<ffffffff81310637>] blk_update_request+0x87/0x320
      [26396.537439]  [<ffffffff8131a20a>] blk_mq_end_request+0x1a/0x70
      [26396.543261]  [<ffffffff81313889>] blk_flush_complete_seq+0xd9/0x2a0
      [26396.549517]  [<ffffffff81313ccf>] flush_end_io+0x15f/0x240
      [26396.554993]  [<ffffffff8131a22a>] blk_mq_end_request+0x3a/0x70
      [26396.560815]  [<ffffffff8131a314>] __blk_mq_complete_request+0xb4/0xe0
      [26396.567246]  [<ffffffff8131a35c>] blk_mq_complete_request+0x1c/0x20
      [26396.573506]  [<ffffffffa04182df>] loop_queue_work+0x6f/0x72c [loop]
      [26396.579764]  [<ffffffff81697844>] ? __schedule+0x2b4/0x8f0
      [26396.585242]  [<ffffffff810a7812>] kthread_worker_fn+0x52/0x170
      [26396.591065]  [<ffffffff810a77c0>] ? kthread_create_on_node+0x1a0/0x1a0
      [26396.597582]  [<ffffffff810a7238>] kthread+0xd8/0xf0
      [26396.602453]  [<ffffffff810a7160>] ? kthread_park+0x60/0x60
      [26396.607929]  [<ffffffff8169bdcf>] ret_from_fork+0x3f/0x70
      [26396.613319]  [<ffffffff810a7160>] ? kthread_park+0x60/0x60
      
      md_super_write() and the corresponding md_super_wait() are generally
      called with reconfig_mutex held, which prevents the disk from
      disappearing. There is one case where this rule is broken: write_sb_page
      of bitmap.c doesn't hold the mutex. next_active_rdev does increase the
      rdev reference, but it decreases the reference too early (i.e., before
      the IO finishes), so the disk can disappear in that window. We
      unconditionally increase the rdev reference in md_super_write() to avoid
      the race.
      Reported-and-tested-by: Xiao Ni <xni@redhat.com>
      Reviewed-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Shaohua Li <shli@fb.com>
      ed3b98c7
    • md: fix a trivial typo in comments · 466ad292
      Committed by Wei Fang
      Fix a trivial typo in md_ioctl().
      Signed-off-by: Wei Fang <fangwei1@huawei.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      466ad292
    • md:raid1: fix a dead loop when read from a WriteMostly disk · 816b0acf
      Committed by Wei Fang
      If first_bad == this_sector when we get the WriteMostly disk in
      read_balance(), a valid disk will be returned with zero max_sectors.
      This leads to a dead loop in make_request(), and OOM happens because
      of the endless allocation of struct bio.
      
      Since we can't get data from this disk in this case, continue on to
      another disk.
      Signed-off-by: Wei Fang <fangwei1@huawei.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      816b0acf
  11. 18 Mar 2016 (3 commits)
    • md/raid5: Cleanup cpu hotplug notifier · 1d034e68
      Committed by Anna-Maria Gleixner
      The raid456_cpu_notify() hotplug callback lacks handling of the
      CPU_UP_CANCELED case. That means if CPU_UP_PREPARE fails, the scratch
      buffer is leaked.
      
      Add handling for CPU_UP_CANCELED[_FROZEN] hotplug notifier transitions
      to free the scratch buffer.
      
      CC: Shaohua Li <shli@kernel.org>
      CC: linux-raid@vger.kernel.org
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Signed-off-by: Shaohua Li <shli@fb.com>
      1d034e68
    • raid10: include bio_end_io_list in nr_queued to prevent freeze_array hang · 23ddba80
      Committed by Shaohua Li
      This is the raid10 counterpart of the bug fixed by Nate
      (raid1: include bio_end_io_list in nr_queued to prevent freeze_array hang)
      
      Fixes: 95af587e ("md/raid10: ensure device failure recorded before write request returns")
      Cc: stable@vger.kernel.org (v4.3+)
      Cc: Nate Dailey <nate.dailey@stratus.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      23ddba80
    • raid1: include bio_end_io_list in nr_queued to prevent freeze_array hang · ccfc7bf1
      Committed by Nate Dailey
      If raid1d is handling a mix of read and write errors, handle_read_error's
      call to freeze_array can get stuck.
      
      This can happen because, though the bio_end_io_list is initially drained,
      writes can be added to it via handle_write_finished as the retry_list
      is processed. These writes contribute to nr_pending but are not included
      in nr_queued.
      
      If a later entry on the retry_list triggers a call to handle_read_error,
      freeze_array hangs waiting for nr_pending == nr_queued + extra. The
      writes on the bio_end_io_list aren't included in nr_queued, so the
      condition will never be satisfied.
      
      To prevent the hang, include bio_end_io_list writes in nr_queued.
      
      There's probably a better way to handle decrementing nr_queued, but this
      seemed like the safest way to avoid breaking surrounding code.
      
      I'm happy to supply the script I used to repro this hang.
      
      Fixes: 55ce74d4 ("md/raid1: ensure device failure recorded before write request returns.")
      Cc: stable@vger.kernel.org (v4.3+)
      Signed-off-by: Nate Dailey <nate.dailey@stratus.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      ccfc7bf1
  12. 15 Mar 2016 (5 commits)
  13. 12 Mar 2016 (1 commit)
    • dm thin: consistently return -ENOSPC if pool has run out of data space · c3667cc6
      Committed by Mike Snitzer
      Commit 0a927c2f ("dm thin: return -ENOSPC when erroring retry list due
      to out of data space") was a step in the right direction but didn't go
      far enough.
      
      Add a new 'out_of_data_space' flag to 'struct pool' and set it if/when
      the pool runs out of data space.  This fixes cell_error() and
      error_retry_list() to not blindly return -EIO.
      
      We cannot rely on the 'error_if_no_space' feature flag since it is
      transient (in that it can be reset once space is added, plus it only
      controls whether errors are issued, it doesn't reflect whether the
      pool is actually out of space).
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      c3667cc6
  14. 11 Mar 2016 (12 commits)