1. 22 June 2017, 1 commit
    • md: use a separate bio_set for synchronous IO. · 5a85071c
      Committed by NeilBrown
      md devices allocate a bio_set and use it for two
      distinct purposes.
      mddev->bio_set is used to clone bios as part of sending
      upper level requests down to lower level devices,
      and it is also used for synchronous IO such as superblock
      and bitmap updates, and for correcting read errors.
      
      This multiple usage can lead to deadlocks.  Cloned bios are likely
      to be queued for write while waiting for a metadata update
      before the write can be permitted.
      If the cloning has exhausted mddev->bio_set, the metadata update
      may not be able to proceed.
      
      This scenario has been seen during heavy testing, with lots of IO and
      lots of memory pressure.
      
      Address this by adding a new bio_set specifically for synchronous IO.
      All synchronous IO goes directly to the underlying device and is not
      queued at the md level, so requests using entries from the new
      mddev->sync_set will complete in a timely fashion.
      Requests that use mddev->bio_set will sometimes need to wait
      for synchronous IO, but will no longer risk deadlocking that IO.
      
      Also: small simplification in mddev_put(): there is no need to
      wait until the spinlock is released before calling bioset_free().
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      5a85071c
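      The idea, sketched below in simplified form (the struct and helper names
      are illustrative, not the actual md code; bio_clone_fast() and
      bio_alloc_bioset() are real block-layer helpers of that era): keep two
      mempool-backed bio_sets so that synchronous metadata IO never competes
      with cloned request bios for the same pool.

          #include <linux/bio.h>
          #include <linux/gfp.h>

          /* Illustrative container mirroring mddev->bio_set / mddev->sync_set. */
          struct md_pools_sketch {
                  struct bio_set *bio_set;   /* clones of upper-level requests */
                  struct bio_set *sync_set;  /* superblock/bitmap IO, read-error fixup */
          };

          /* Cloned request bios may be queued at the md level and end up
           * waiting for a metadata update, so they draw from bio_set. */
          static struct bio *clone_request_bio(struct bio *src, struct md_pools_sketch *p)
          {
                  return bio_clone_fast(src, GFP_NOIO, p->bio_set);
          }

          /* Synchronous IO goes straight to the underlying device and never
           * queues behind cloned writes, so its pool cannot be exhausted by
           * them - this is what breaks the deadlock described above. */
          static struct bio *alloc_sync_bio(struct md_pools_sketch *p)
          {
                  return bio_alloc_bioset(GFP_NOIO, 1, p->sync_set);
          }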
  2. 17 June 2017, 3 commits
  3. 14 June 2017, 2 commits
    • md: don't use flush_signals in userspace processes · f9c79bc0
      Committed by Mikulas Patocka
      The function flush_signals clears all pending signals for the process. It
      may be used by kernel threads when we need to prepare a kernel thread for
      responding to signals. However, using this function for userspace
      processes is incorrect - clearing signals without the program expecting it
      can cause misbehavior.
      
      The raid1 and raid5 code uses flush_signals in its request routine because
      it wants to prepare for an interruptible wait. This patch drops
      flush_signals and uses sigprocmask instead to block all signals (including
      SIGKILL) around the schedule() call. The signals are not lost, but the
      schedule() call won't respond to them.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org
      Acked-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      f9c79bc0
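      A sketch of the pattern described (the function below is illustrative, not
      the literal raid1/raid5 change; sigfillset(), sigprocmask(),
      prepare_to_wait() and schedule() are the real kernel primitives involved):

          #include <linux/sched.h>
          #include <linux/signal.h>
          #include <linux/wait.h>

          /* Block every signal (including SIGKILL) around the schedule() call of
           * an interruptible wait instead of calling flush_signals().  Pending
           * signals are preserved and become deliverable again once the old
           * mask is restored. */
          static void sleep_with_signals_blocked(wait_queue_head_t *wq)
          {
                  DEFINE_WAIT(w);
                  sigset_t full, old;

                  sigfillset(&full);
                  sigprocmask(SIG_BLOCK, &full, &old);    /* defer, don't discard */
                  prepare_to_wait(wq, &w, TASK_INTERRUPTIBLE);
                  schedule();                             /* won't respond to signals here */
                  finish_wait(wq, &w);
                  sigprocmask(SIG_SETMASK, &old, NULL);   /* restore the caller's mask */
          }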
    • md: fix deadlock between mddev_suspend() and md_write_start() · cc27b0c7
      Committed by NeilBrown
      If mddev_suspend() races with md_write_start() we can deadlock
      with mddev_suspend() waiting for the request that is currently
      in md_write_start() to complete the ->make_request() call,
      and md_write_start() waiting for the metadata to be updated
      to mark the array as 'dirty'.
      As metadata updates done by md_check_recovery() only happen when
      the mddev_lock() can be claimed, and as mddev_suspend() is often
      called with the lock held, these threads wait indefinitely for each
      other.
      
      We fix this by having md_write_start() abort if mddev_suspend()
      is happening, and having ->make_request() abort if md_write_start()
      aborted.
      md_make_request() can detect this abort, decrease the ->active_io
      count, and wait for mddev_suspend().
      Reported-by: Nix <nix@esperi.org.uk>
      Fixes: 68866e42 ("MD: no sync IO while suspended")
      Cc: stable@vger.kernel.org
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      cc27b0c7
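      A hedged sketch of the resulting control flow (the structure and helpers
      below are illustrative stand-ins; only the ->active_io accounting and the
      bool return from md_write_start() reflect the description above):

          #include <linux/atomic.h>
          #include <linux/bio.h>
          #include <linux/types.h>

          struct mddev_sketch {
                  bool suspended;
                  atomic_t active_io;
          };

          bool personality_make_request(struct mddev_sketch *mddev, struct bio *bio);
          void wait_until_resumed(struct mddev_sketch *mddev);

          /* md_write_start() gives up instead of blocking when a suspend is in
           * progress, so the metadata update it would wait for can never
           * deadlock against mddev_suspend(). */
          static bool md_write_start_sketch(struct mddev_sketch *mddev)
          {
                  if (mddev->suspended)
                          return false;   /* caller must unwind */
                  /* ... mark the array 'dirty' and wait for the metadata update ... */
                  return true;
          }

          /* The request entry point detects the abort, drops its ->active_io
           * reference so the suspend can complete, waits, and retries. */
          static void md_make_request_sketch(struct mddev_sketch *mddev, struct bio *bio)
          {
                  atomic_inc(&mddev->active_io);
                  while (!personality_make_request(mddev, bio)) {
                          atomic_dec(&mddev->active_io);
                          wait_until_resumed(mddev);
                          atomic_inc(&mddev->active_io);
                  }
                  atomic_dec(&mddev->active_io);
          }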
  4. 06 June 2017, 1 commit
  5. 01 June 2017, 1 commit
    • md: Make flush bios explicitly sync · 5a8948f8
      Committed by Jan Kara
      Commit b685d3d6 "block: treat REQ_FUA and REQ_PREFLUSH as
      synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...}
      definitions.  generic_make_request_checks() however strips REQ_FUA and
      REQ_PREFLUSH flags from a bio when the storage doesn't report volatile
      write cache and thus write effectively becomes asynchronous which can
      lead to performance regressions.
      
      Fix the problem by making sure all bios which are synchronous are
      properly marked with REQ_SYNC.
      
      CC: linux-raid@vger.kernel.org
      CC: Shaohua Li <shli@kernel.org>
      Fixes: b685d3d6 ("block: treat REQ_FUA and REQ_PREFLUSH as synchronous")
      CC: stable@vger.kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Shaohua Li <shli@fb.com>
      5a8948f8
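      A minimal sketch of the kind of change described (REQ_OP_WRITE,
      REQ_PREFLUSH and REQ_SYNC are real block-layer flags; the function is an
      illustration, not the actual md flush path):

          #include <linux/bio.h>
          #include <linux/blk_types.h>

          /* Mark a flush bio explicitly synchronous: even if the device reports
           * no volatile write cache and REQ_PREFLUSH/REQ_FUA get stripped by
           * generic_make_request_checks(), the write is still treated as sync. */
          static void prepare_flush_bio(struct bio *bio)
          {
                  bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC;
          }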
  6. 31 May 2017, 1 commit
    • dm: make flush bios explicitly sync · ff0361b3
      Committed by Jan Kara
      Commit b685d3d6 ("block: treat REQ_FUA and REQ_PREFLUSH as
      synchronous") removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...}
      definitions.  generic_make_request_checks() however strips REQ_FUA and
      REQ_PREFLUSH flags from a bio when the storage doesn't report volatile
      write cache and thus write effectively becomes asynchronous which can
      lead to performance regressions.
      
      Fix the problem by making sure all bios which are synchronous are
      properly marked with REQ_SYNC.
      
      Fixes: b685d3d6 ("block: treat REQ_FUA and REQ_PREFLUSH as synchronous")
      Cc: stable@vger.kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      ff0361b3
  7. 25 May 2017, 2 commits
  8. 23 May 2017, 3 commits
  9. 22 May 2017, 1 commit
  10. 17 May 2017, 2 commits
  11. 16 May 2017, 5 commits
  12. 15 May 2017, 8 commits
  13. 13 May 2017, 1 commit
    • raid1: prefer disk without bad blocks · d82dd0e3
      Committed by Tomasz Majchrzak
      If an array consists of two drives and the first drive has a bad
      block, a read request to the region overlapping the bad block chooses
      the same disk (with the bad block) as the device to read from over and
      over, and the request gets stuck. If the first disk only partially
      overlaps with the bad block, it becomes a candidate ("best disk") for a
      shorter range of sectors. The second disk is capable of reading the
      entire requested range and it is updated accordingly, however it is not
      recorded as the best device for the request. In the end the request is
      sent to the first disk to read the entire range of sectors. It fails and
      is re-tried in a moment but with the same outcome.
      
      This is actually quite a likely scenario, but it had little exposure in
      my testing until commit 715d40b93b10 ("md/raid1: add failfast handling
      for reads.") removed the preference for an idle disk. Such a scenario
      had been passing as the second disk was always chosen when idle.
      
      Reset the candidate ("best disk") to read from if a disk can read the
      entire range. Do it only if another disk has already been chosen as a
      candidate for a smaller range. The head position / disk type logic will
      then select the best disk to read from - this is fine, as the disk with
      the bad block won't be considered for it.
      Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      d82dd0e3
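      A hedged sketch of the selection rule described above (all names are
      illustrative; the real logic lives in raid1's read_balance()):

          #include <linux/types.h>

          /* good_sectors[d] = sectors disk d can serve before hitting a bad block. */
          static int choose_read_disk_sketch(int ndisks, const sector_t *good_sectors,
                                             sector_t request_sectors)
          {
                  sector_t best_good = 0;
                  int best_disk = -1;
                  int d;

                  for (d = 0; d < ndisks; d++) {
                          sector_t sectors = good_sectors[d];

                          if (sectors < request_sectors) {
                                  /* Overlaps a bad block: candidate for a shorter
                                   * range only. */
                                  if (sectors > best_good) {
                                          best_good = sectors;
                                          best_disk = d;
                                  }
                          } else {
                                  /* This disk covers the whole request.  Drop a
                                   * candidate chosen only for a shorter range, so
                                   * the head position / disk type heuristics pick
                                   * among disks that can read everything. */
                                  if (best_disk >= 0 && best_good < request_sectors)
                                          best_disk = -1;
                                  best_good = request_sectors;
                                  if (best_disk < 0)
                                          best_disk = d;
                          }
                  }
                  return best_disk;
          }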
  14. 12 May 2017, 3 commits
    • md/r5cache: handle sync with data in write back cache · 5ddf0440
      Committed by Song Liu
      Currently, sync of a raid456 array cannot make progress when hitting
      data in writeback r5cache.
      
      This patch fixes this issue by flushing cached data of the stripe
      before processing the sync request. This is achieved by:
      
      1. In handle_stripe(), do not set STRIPE_SYNCING if the stripe is
         in write back cache;
      2. In r5c_try_caching_write(), handle the stripe in sync with write
         through;
      3. In do_release_stripe(), make stripe in sync write out and send
         it to the state machine.
      
      Shaohua: explicitly set STRIPE_HANDLE after write out completed
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      5ddf0440
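      A hedged sketch of step 1 above (don't begin a sync while the stripe still
      has data in the write-back cache); the structure and flag bits are
      simplified stand-ins, not the raid5 definitions:

          #include <linux/bitops.h>

          struct stripe_sketch {
                  unsigned long state;    /* stand-in for STRIPE_* bits */
                  int injournal;          /* data blocks still cached in the journal */
                  bool sync_requested;
          };

          enum { SKETCH_STRIPE_SYNCING, SKETCH_STRIPE_HANDLE };

          void write_out_cached_data(struct stripe_sketch *sh);

          static void handle_stripe_sync_sketch(struct stripe_sketch *sh)
          {
                  if (!sh->sync_requested)
                          return;

                  if (sh->injournal > 0) {
                          /* Steps 2-3: switch this stripe to write-through and push
                           * its cached data out, then re-run the state machine. */
                          write_out_cached_data(sh);
                          set_bit(SKETCH_STRIPE_HANDLE, &sh->state);
                          return;
                  }

                  set_bit(SKETCH_STRIPE_SYNCING, &sh->state);
                  /* ... normal resync/recovery handling ... */
          }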
    • md/r5cache: gracefully handle journal device errors for writeback mode · 70d466f7
      Committed by Song Liu
      For raid456 with a writeback cache, when the journal device fails during
      normal operation, it is still possible to persist all data, as all
      pending data is still in the stripe cache. However, it is necessary to
      handle the journal failure gracefully.
      
      During journal failures, the following logic handles the graceful shutdown
      of the journal:
      1. raid5_error() marks the device as Faulty and schedules async work
         log->disable_writeback_work;
      2. In disable_writeback_work (r5c_disable_writeback_async), the mddev is
         suspended, set to write through, and then resumed. mddev_suspend()
         flushes all cached stripes;
      3. All cached stripes need to be flushed carefully to the RAID array.
      
      This patch fixes issues within the process above:
      1. In r5c_update_on_rdev_error(), schedule disable_writeback_work for
         journal failures;
      2. In r5c_disable_writeback_async(), wait for MD_SB_CHANGE_PENDING,
         since raid5_error() updates the superblock;
      3. In handle_stripe(), allow stripes with data in the journal (s.injournal > 0)
         to make progress during log_failed;
      4. In delay_towrite(), if the log failed, only process data in the cache (skip
         new writes in dev->towrite);
      5. In __get_priority_stripe(), process loprio_list during journal device
         failures;
      6. In raid5_remove_disk(), wait until all cached stripes are flushed before
         calling log_exit().
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      70d466f7
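      A hedged sketch of items 1-2 above (schedule_work(), wait_event(),
      container_of() and test_bit() are real kernel primitives; the structures,
      bit number and function names are illustrative):

          #include <linux/wait.h>
          #include <linux/workqueue.h>
          #include <linux/bitops.h>
          #include <linux/kernel.h>

          #define SKETCH_SB_CHANGE_PENDING 0      /* stand-in for MD_SB_CHANGE_PENDING */

          struct mddev_sketch {
                  wait_queue_head_t sb_wait;
                  unsigned long sb_flags;
          };

          struct r5l_log_sketch {
                  struct mddev_sketch *mddev;
                  struct work_struct disable_writeback_work;
          };

          /* Item 1: on a journal device error, defer the heavier write-back
           * disable to a work item. */
          static void on_journal_error_sketch(struct r5l_log_sketch *log)
          {
                  schedule_work(&log->disable_writeback_work);
          }

          /* Item 2: wait for the superblock update triggered by raid5_error()
           * before suspending and switching the log to write-through. */
          static void disable_writeback_work_fn(struct work_struct *work)
          {
                  struct r5l_log_sketch *log =
                          container_of(work, struct r5l_log_sketch, disable_writeback_work);
                  struct mddev_sketch *mddev = log->mddev;

                  wait_event(mddev->sb_wait,
                             !test_bit(SKETCH_SB_CHANGE_PENDING, &mddev->sb_flags));
                  /* ... mddev_suspend(); set write-through; mddev_resume(); ... */
          }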
    • md/raid1/10: avoid unnecessary locking · 23b245c0
      Committed by Shaohua Li
      If we add bios to the block plugging list, locking is unnecessary, since the
      block unplug is guaranteed not to run at that time.
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      23b245c0
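      A hedged sketch of the plugging pattern (blk_check_plugged(), bio_list_add()
      and bio_list_pop() are real block-layer helpers; the plug structure and the
      submit/fallback paths are illustrative):

          #include <linux/blkdev.h>
          #include <linux/bio.h>

          struct raid_plug_sketch {
                  struct blk_plug_cb cb;
                  struct bio_list pending;
          };

          void submit_one_bio_sketch(struct bio *bio);
          void queue_locked_fallback_sketch(void *conf, struct bio *bio);

          /* Runs when the submitting task releases its plug, so nothing else can
           * be appending to plug->pending at the same time. */
          static void raid_unplug_sketch(struct blk_plug_cb *cb, bool from_schedule)
          {
                  struct raid_plug_sketch *plug =
                          container_of(cb, struct raid_plug_sketch, cb);
                  struct bio *bio;

                  while ((bio = bio_list_pop(&plug->pending)) != NULL)
                          submit_one_bio_sketch(bio);
          }

          static void queue_write_sketch(void *conf, struct bio *bio)
          {
                  /* blk_check_plugged() allocates the callback zeroed, so the
                   * 'pending' bio_list starts out empty. */
                  struct blk_plug_cb *cb =
                          blk_check_plugged(raid_unplug_sketch, conf,
                                            sizeof(struct raid_plug_sketch));

                  if (cb) {
                          struct raid_plug_sketch *plug =
                                  container_of(cb, struct raid_plug_sketch, cb);
                          /* No spinlock: the unplug callback cannot run while this
                           * task is still adding to its own plug list. */
                          bio_list_add(&plug->pending, bio);
                          return;
                  }
                  /* No plug active: fall back to the locked pending list. */
                  queue_locked_fallback_sketch(conf, bio);
          }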
  15. 11 May 2017, 1 commit
  16. 09 May 2017, 5 commits