Commits · d19a55ccad15a486ffe03030570744e5d5bd9f8e · openeuler / raspberrypi-kernel

03 Feb, 2017 1 commit
- dm mpath: cleanup -Wbool-operation warning in choose_pgpath() · d19a55cc
  Committed by Mike Snitzer on Jan 06, 2017
```
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
```
25 Jan, 2017 6 commits

md/r5cache: disable write back for degraded array · 2e38a37f

Committed by Song Liu on Jan 24, 2017

write-back cache in degraded mode introduces corner cases to the array.
Although we try to cover all these corner cases, it is safer to just
disable write-back cache when the array is in degraded mode.

In this patch, we disable writeback cache for degraded mode:
1. On device failure, if the array enters degraded mode, raid5_error()
   will submit async job r5c_disable_writeback_async to disable
   writeback;
2. In r5c_journal_mode_store(), it is invalid to enable writeback in
   degraded mode;
3. In r5c_try_caching_write(), stripes with s->failed>0 will be handled
   in write-through mode.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>

md/r5cache: shift complex rmw from read path to write path · 07e83364

Committed by Song Liu on Jan 23, 2017

Write back cache requires a complex RMW mechanism, where old data is
read into dev->orig_page for prexor, and then xor is done with
dev->page. This logic is already implemented in the write path.

However, the current read path is not aware of this requirement. When
the array is optimal, RMW is not required, as the data are
read from the raid disks. However, when the target stripe is degraded,
complex RMW is required to generate the right data.

To keep read path as clean as possible, we handle read path by
flushing degraded, in-journal stripes before processing reads to
missing dev.

Specifically, when there are read requests to a degraded stripe
with data in journal, handle_stripe_fill() calls
r5c_make_stripe_write_out() and exits. Then handle_stripe_dirtying()
will do the complex RMW and flush the stripe to RAID disks. After
that, read requests are handled.

There is one more corner case when there is a non-overwrite bio for
the missing (or out of sync) dev. handle_stripe_dirtying() will not
be able to process the non-overwrite bios without constructing the
data in handle_stripe_fill(). This is fixed by delaying non-overwrite
bios in handle_stripe_dirtying(). So handle_stripe_fill() works on
these bios after the stripe is flushed to raid disks.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>

md/r5cache: flush data only stripes in r5l_recovery_log() · a85dd7b8

Committed by Song Liu on Jan 23, 2017

For safer operation, all arrays start in write-through mode, which has been
better tested and is more mature. In fact, the write-through/write-back mode
is not persistent across array restarts, so we always start the array in
write-through mode. However, if recovery finds data-only stripes left over from
before the shutdown (from a previous write-back mode), it is not safe to start
the array in write-through mode, as write-through mode cannot handle stripes
with data in the write-back cache. To solve this problem, we flush all
data-only stripes in r5l_recovery_log(). When r5l_recovery_log() returns, the
array starts with an empty cache in write-through mode.

This logic is implemented in r5c_recovery_flush_data_only_stripes():

1. enable write back cache
2. flush all stripes
3. wake up conf->mddev->thread
4. wait for all stripes to get flushed (reuse wait_for_quiescent)
5. disable write back cache

The wait in step 4 will be woken up in release_inactive_stripe_list()
when conf->active_stripes reaches 0.

It is safe to wake up mddev->thread here because all the resources
required for the thread have been initialized.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>

md/raid5: move comment of fetch_block to right location · ba02684d
Committed by Song Liu on Jan 12, 2017
```
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
```

md/r5cache: read data into orig_page for prexor of cached data · 86aa1397

Committed by Song Liu on Jan 12, 2017

With write back cache, we use orig_page to do prexor. This patch
makes sure we read data into orig_page for it.

Flag R5_OrigPageUPTDODATE is added to show whether orig_page
has the latest data from raid disk.

We introduce a helper function uptodate_for_rmw() to simplify
a couple of conditions in handle_stripe_dirtying().
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>

md/raid5-cache: delete meaningless code · d46d29f0

Committed by Shaohua Li on Jan 11, 2017

sector_t is unsigned long, so it is never < 0.
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Shaohua Li <shli@fb.com>

10 Jan, 2017 1 commit

md/raid5: Use correct IS_ERR() variation on pointer check · 32cd7cbb

Committed by Jes Sorensen on Jan 06, 2017

This fixes a build error on certain architectures, such as ppc64.

Fixes: 6995f0b2("md: takeover should clear unrelated bits")
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Shaohua Li <shli@fb.com>

06 Jan, 2017 5 commits

md: cleanup mddev flag clear for takeover · 394ed8e4

Committed by Shaohua Li on Jan 04, 2017

Commit 6995f0b2 (md: takeover should clear unrelated bits) clears
unrelated bits, but it's quite fragile. To avoid errors in the future,
define a macro for unsupported mddev flags for each raid type and use it
to clear unsupported mddev flags. This should be less error-prone.
Suggested-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
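
The per-personality mask idea in the commit above can be sketched roughly as follows. This is a minimal userspace illustration, not the kernel code: the flag names, the mask names and `takeover_clear_flags()` are all hypothetical stand-ins chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits, named for illustration only. */
#define FLAG_HAS_JOURNAL (1UL << 0)
#define FLAG_FAILFAST    (1UL << 1)
#define FLAG_CLUSTERED   (1UL << 2)

/* One mask per personality listing the bits it cannot honour. */
#define LEVEL1_UNSUPPORTED_FLAGS (FLAG_HAS_JOURNAL)
#define LEVEL5_UNSUPPORTED_FLAGS (FLAG_FAILFAST | FLAG_CLUSTERED)

/* On takeover, clear every bit the new level does not support. */
static void takeover_clear_flags(unsigned long *flags, unsigned long unsupported)
{
    assert(flags != NULL);
    *flags &= ~unsupported;
}
```

Keeping the mask next to each personality means a newly added flag only has to be listed once per level, instead of being remembered at every takeover call site.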

md/r5cache: fix spelling mistake on "recoverying" · 99f17890

Committed by Colin Ian King on Dec 23, 2016

Trivial fix to spelling mistake "recoverying" to "recovering" in
pr_dbg message.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Shaohua Li <shli@fb.com>

md/r5cache: assign conf->log before r5l_load_log() · d2250f10

Committed by Song Liu on Dec 14, 2016

r5l_load_log() calls functions that require a proper conf->log,
for example, r5c_is_writeback(). Therefore, we should set
conf->log before calling r5l_load_log(). If r5l_load_log() fails,
conf->log is set back to NULL.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>

md/r5cache: simplify handling of sh->log_start in recovery · 3c66abba

Committed by Song Liu on Dec 14, 2016

We only need to update sh->log_start at the end of recovery,
which is r5c_recovery_rewrite_data_only_stripes(), so it is not
necessary to set it before that. In this patch, log_start is
removed from r5c_recovery_alloc_stripe().

After updating all sh->log_start, rewrite_data_only_stripes()
also updates log->next_checkpoints to the last sh->log_start.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>

md/raid5-cache: removes unnecessary write-through mode judgments · 28ca833e

Committed by JackieLiu on Dec 13, 2016

The function already returns early in write-through mode, so there is
no need to check for it again.
Signed-off-by: JackieLiu <liuyun01@kylinos.cn>
Reviewed-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>

04 Jan, 2017 2 commits

md/raid10: Refactor raid10_make_request · bb5f1ed7

Committed by Robert LeBlanc on Dec 05, 2016

Refactor raid10_make_request into separate read and write functions to
clean up the code.

Shaohua: add the recovery check back to read path
Signed-off-by: Robert LeBlanc <robert@leblancnet.us>
Signed-off-by: Shaohua Li <shli@fb.com>

md/raid1: Refactor raid1_make_request · 3b046a97

Committed by Robert LeBlanc on Dec 05, 2016

Refactor raid1_make_request to move the read and write code into their own
functions to clean up the code.
Signed-off-by: Robert LeBlanc <robert@leblancnet.us>
Signed-off-by: Shaohua Li <shli@fb.com>

25 Dec, 2016 1 commit

Replace <asm/uaccess.h> with <linux/uaccess.h> globally · 7c0f6ba6

Committed by Linus Torvalds on Dec 24, 2016

This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

18 Dec, 2016 2 commits
- bcache: partition support: add 16 minors per bcacheN device · b8c0d911
  Committed by Eric Wheeler on Oct 23, 2016
```
Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net>
Tested-by: Wido den Hollander <wido@widodh.nl>
```
- bcache: Make gc wakeup sane, remove set_task_state() · be628be0
  Committed by Kent Overstreet on Oct 26, 2016
```
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
```
16 Dec, 2016 1 commit

linux: drop __bitwise__ everywhere · 9efeccac

Committed by Michael S. Tsirkin on Dec 11, 2016

__bitwise__ used to mean "yes, please enable sparse checks
unconditionally", but now that we dropped __CHECK_ENDIAN__,
__bitwise is exactly the same.
There aren't many users, so replace it with __bitwise everywhere.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Stefan Schmidt <stefan@osg.samsung.com>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Lee Duncan <lduncan@suse.com>

14 Dec, 2016 1 commit

dm flakey: introduce "error_writes" feature · ef548c55

Committed by Mike Snitzer on Dec 13, 2016

Recent dm-flakey fixes, to have reads error out during the "down"
interval, made it so that the previous read behaviour is no longer
available.

It is useful to have reads complete like normal but have writes error
out, so make it possible again with a new "error_writes" feature.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

09 Dec, 2016 20 commits

md: separate flags for superblock changes · 2953079c

Committed by Shaohua Li on Dec 08, 2016

The mddev->flags are used for different purposes. There are a lot of
places where we check/change the flags without masking unrelated flags, so we
could check/change unrelated flags. These usages are mostly for superblock
writes, so separate the superblock-related flags. This should make the code
clearer and also fix real bugs.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

md: MD_RECOVERY_NEEDED is set for mddev->recovery · 82a301cb

Committed by Shaohua Li on Dec 08, 2016

Fixes: 90f5f7ad ("md: Wait for md_check_recovery before attempting device removal.")
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

md: takeover should clear unrelated bits · 6995f0b2

Committed by Shaohua Li on Dec 08, 2016

When we change level from raid1 to raid5, the MD_FAILFAST_SUPPORTED bit
will be accidentally set, but raid5 doesn't support it. The same is true
for the MD_HAS_JOURNAL bit.

Fix: 46533ff7 (md: Use REQ_FAILFAST_* on metadata writes where appropriate)
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

dm cache policy smq: use hash_32() instead of hash_32_generic() · e99dda8f

Committed by Mike Snitzer on Dec 08, 2016

Switch to using hash_32() because hash_32_generic() should only be used
by the kernel's selftests.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm crypt: reject key strings containing whitespace chars · 027c431c

Committed by Ondrej Kozina on Dec 01, 2016

Unfortunately key_string may theoretically contain whitespace even after
it's processed by dm_split_args(). The reason for this is DM core
supports escaping of almost all chars including any whitespace.

If userspace passes a key to the kernel in format ":32:logon:my_prefix:my\ key"
dm-crypt will look up key "my_prefix:my key" in kernel keyring service.
So far everything's fine.

Unfortunately, if userspace later calls the DM_TABLE_STATUS ioctl, it will not
receive back the expected ":32:logon:my_prefix:my\ key" but the unescaped version
instead. Also, userspace (most notably cryptsetup) is not ready to parse a
single target argument containing (even escaped) whitespace chars, and any
whitespace is simply taken as the delimiter of another argument.

This effect is mitigated by the fact that libdevmapper currently performs
double escaping of the '\' char. Any user input in format "x\ x" is
transformed into "x\\ x" before being passed to the kernel. Nonetheless
dm-crypt may be used without libdevmapper. Therefore the near-term
solution to this is to reject any key string containing whitespace.
Signed-off-by: Ondrej Kozina <okozina@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
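
A rejection check of the kind described above might look like the sketch below. It is a userspace illustration only: `contains_whitespace()` and `check_key_string()` are hypothetical names, and the return convention (0 or -1) is an assumption made for the example rather than what dm-crypt returns.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the string contains any whitespace character. */
static bool contains_whitespace(const char *str)
{
    assert(str != NULL);
    for (; *str != '\0'; str++)
        if (isspace((unsigned char)*str))
            return true;
    return false;
}

/* Hypothetical validator: 0 if acceptable, -1 if the key string must be rejected. */
static int check_key_string(const char *key_string)
{
    return contains_whitespace(key_string) ? -1 : 0;
}
```

The check runs on the already unescaped string, which is the form the kernel would otherwise look up and later report back through the status ioctl.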

dm space map: always set ev if sm_ll_mutate() succeeds · b446396b

Committed by Benjamin Marzinski on Dec 04, 2016

If no block was allocated or freed, sm_ll_mutate() wasn't setting
*ev, leaving the variable uninitialized. sm_ll_insert(),
sm_disk_inc_block(), and sm_disk_new_block() all check ev to see
if there was an allocation event in sm_ll_mutate(), possibly
reading uninitialized data.

If no allocation event occurred, sm_ll_mutate() should set *ev
to SM_NONE.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
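
The pattern behind this fix, writing the out-parameter on every successful path, can be sketched as follows. The enum values and `mutate_refcount()` are hypothetical stand-ins for illustration; only the "no event" default mirrors what the commit message describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event codes; only the "none" case is named in the text. */
enum alloc_event { EV_NONE, EV_ALLOC, EV_FREE };

/* Updates a reference count and reports through *ev whether the block
 * became allocated or free as a result. */
static int mutate_refcount(unsigned *count, unsigned new_count, enum alloc_event *ev)
{
    assert(count != NULL && ev != NULL);
    unsigned old = *count;

    *ev = EV_NONE;                 /* default, written on every successful path */
    if (old == 0 && new_count > 0)
        *ev = EV_ALLOC;
    else if (old > 0 && new_count == 0)
        *ev = EV_FREE;

    *count = new_count;
    return 0;
}
```

Callers can then branch on `*ev` without first initializing it themselves, which is the situation the three callers named above were in.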

dm space map metadata: skip useless memcpy in metadata_ll_init_index() · 0c79ce0b

Committed by Benjamin Marzinski on Dec 02, 2016

When metadata_ll_init_index() is called by sm_ll_new_metadata(),
ll->mi_le hasn't been initialized yet. So, when
metadata_ll_init_index() copies the contents of ll->mi_le into the
newly allocated bitmap_root, it is just copying garbage. ll->mi_le
will be allocated later in sm_ll_extend() and copied into the
bitmap_root, in sm_ll_commit().
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm space map metadata: fix 'struct sm_metadata' leak on failed create · 314c25c5

Committed by Benjamin Marzinski on Nov 30, 2016

In dm_sm_metadata_create() we temporarily change the dm_space_map
operations from 'ops' (whose .destroy function deallocates the
sm_metadata) to 'bootstrap_ops' (whose .destroy function doesn't).

If dm_sm_metadata_create() fails in sm_ll_new_metadata() or
sm_ll_extend(), it exits back to dm_tm_create_internal(), which calls
dm_sm_destroy() with the intention of freeing the sm_metadata, but it
doesn't (because the dm_space_map operations are still set to
'bootstrap_ops').

Fix this by setting the dm_space_map operations back to 'ops' if
dm_sm_metadata_create() fails when it is set to 'bootstrap_ops'.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
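
The leak described above comes from a caller invoking `destroy` through whichever operations table happens to be installed. A minimal userspace sketch of the idea, with hypothetical types and a hypothetical `sm_create()` standing in for the real create path, might look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct space_map;
struct sm_ops { void (*destroy)(struct space_map *sm); };

struct space_map {
    const struct sm_ops *ops;
    bool *freed;              /* test hook: records that the real destroy ran */
};

static void real_destroy(struct space_map *sm)      { *sm->freed = true; free(sm); }
static void bootstrap_destroy(struct space_map *sm) { (void)sm; /* deliberately frees nothing */ }

static const struct sm_ops ops = { real_destroy };
static const struct sm_ops bootstrap_ops = { bootstrap_destroy };

/* Hypothetical create step: on failure the caller calls sm->ops->destroy(). */
static int sm_create(struct space_map *sm, bool setup_fails)
{
    assert(sm != NULL);
    sm->ops = &bootstrap_ops;          /* temporary ops while bootstrapping */
    int r = setup_fails ? -1 : 0;      /* stands in for the real setup calls */
    sm->ops = &ops;                    /* restore on success and on failure alike */
    return r;
}
```

Without the restore on the failure path, the caller's `destroy` call would reach `bootstrap_destroy()` and the structure would never be freed.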

dm raid: fix discard support regression · 11e29684

Committed by Heinz Mauelshagen on Nov 29, 2016

Commit ecbfb9f1 ("dm raid: add raid level takeover support") moved the
configure_discard_support() call from raid_ctr() to raid_preresume().

Enabling/disabling discard _must_ happen during table load (through the
.ctr hook).  Fix this regression by moving the
configure_discard_support() call back to raid_ctr().

Fixes: ecbfb9f1 ("dm raid: add raid level takeover support")
Cc: stable@vger.kernel.org # 4.8+
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm raid: don't allow "write behind" with raid4/5/6 · affa9d28

Committed by Heinz Mauelshagen on Nov 24, 2016

Remove CTR_FLAG_MAX_WRITE_BEHIND from raid4/5/6's valid ctr flags.

Only the md raid1 personality supports setting a maximum number
of "write behind" write IOs on any legs set to "write mostly".
"write mostly" enhances throughput with slow links/disks.

Technically the "write behind" value is a write intent bitmap
property only being respected by the raid1 personality.  It allows a
maximum number of "write behind" writes to any "write mostly" raid1
mirror legs to be delayed and avoids reads from such legs.

No other MD personalities supported via dm-raid make use of "write
behind", thus setting this property is superfluous; it wouldn't cause
harm but it is correct to reject it.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm mpath: use hw_handler_params if attached hw_handler is same as requested · 54cd640d

Committed by tang.junhui on Nov 24, 2016

Let the requested m->hw_handler_params be used if the attached hardware
handler is the same handler as requested with m->hw_handler_name.
Signed-off-by: tang.junhui <tang.junhui@zte.com.cn>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm crypt: add ability to use keys from the kernel key retention service · c538f6ec

Committed by Ondrej Kozina on Nov 21, 2016

The kernel key service is a generic way to store keys for the use of
other subsystems. Currently there is no way to use kernel keys in dm-crypt.
This patch aims to fix that. Instead of a key, userspace may pass a key
description with a preceding ':'. So the message that constructs an encryption
mapping now looks like this:

  <cipher> [<key>|:<key_string>] <iv_offset> <dev_path> <start> [<#opt_params> <opt_params>]

where <key_string> is in format: <key_size>:<key_type>:<key_description>

Currently we only support two elementary key types: 'user' and 'logon'.
Keys may be loaded into dm-crypt either via <key_string> or by using the
classical method of passing the key in hex representation directly.

The key of a dm-crypt device initialised with a key passed in hex
representation may be replaced with a key passed in key_string format and
vice versa.

(Based on original work by Andrey Ryabinin)
Signed-off-by: Ondrej Kozina <okozina@redhat.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
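
To make the `:<key_size>:<key_type>:<key_description>` layout concrete, here is a small userspace parsing sketch. `struct key_ref` and `parse_key_string()` are hypothetical names, the 16-byte type buffer is an arbitrary choice, and the code assumes the description is everything after the second separator (so it may itself contain ':').

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

struct key_ref {
    unsigned long size;
    char type[16];
    const char *description;   /* points into the caller's string */
};

/* Parses ":<key_size>:<key_type>:<key_description>".
 * Returns 0 on success, -1 on malformed input. */
static int parse_key_string(const char *arg, struct key_ref *out)
{
    assert(arg != NULL && out != NULL);
    if (arg[0] != ':' || !isdigit((unsigned char)arg[1]))
        return -1;                     /* not a key_string; a caller could fall back to hex */

    char *end;
    out->size = strtoul(arg + 1, &end, 10);
    if (*end != ':')
        return -1;

    const char *type = end + 1;
    const char *sep = strchr(type, ':');
    if (sep == NULL || sep == type || (size_t)(sep - type) >= sizeof(out->type))
        return -1;
    memcpy(out->type, type, (size_t)(sep - type));
    out->type[sep - type] = '\0';

    /* Only the two key types named in the commit message are accepted. */
    if (strcmp(out->type, "user") != 0 && strcmp(out->type, "logon") != 0)
        return -1;

    out->description = sep + 1;
    return out->description[0] != '\0' ? 0 : -1;
}
```

A string without the leading ':' is reported as malformed here; in the scheme described above it would instead be treated as a key in hex representation.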

dm array: remove a dead assignment in populate_ablock_with_values() · 0637018d

Committed by Bart Van Assche on Nov 18, 2016

A value is assigned to 'nr_entries' but is never used, so remove the assignment.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm ioctl: use offsetof() instead of open-coding it · 6080758d

Committed by Bart Van Assche on Nov 18, 2016

Subtracting sizes is a fragile approach because the result is only
correct if the compiler has not added any padding at the end of the
structure. Hence use offsetof() instead of size subtraction. An
additional advantage of offsetof() is that it makes the intent more
clear.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
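
The padding problem is easy to reproduce in userspace. In the sketch below, `struct msg` is a hypothetical layout chosen so that the compiler has to pad the end of the structure; it is not the structure the commit touches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header followed by a small inline payload. */
struct msg {
    uint64_t id;
    char data[7];
};

/* Fragile: counts any trailing padding as part of the header. */
static size_t header_size_by_subtraction(void)
{
    return sizeof(struct msg) - sizeof(((struct msg *)0)->data);
}

/* Robust: the byte offset at which the payload actually starts. */
static size_t header_size_by_offsetof(void)
{
    return offsetof(struct msg, data);
}

/* The subtraction can only overestimate the header, never underestimate it. */
static void check_sizes(void)
{
    assert(header_size_by_subtraction() >= header_size_by_offsetof());
}
```

On common ABIs the structure is padded from 15 to 16 bytes, so the subtraction yields one byte more than `offsetof()`; the exact amount is implementation-defined.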

dm rq: simplify use_blk_mq initialization · b23df0d0

Committed by Bart Van Assche on Nov 18, 2016

Use a single statement to declare and initialize 'use_blk_mq' instead
of two statements.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm: use blk_set_queue_dying() in __dm_destroy() · 2e91c369

Committed by Bart Van Assche on Nov 18, 2016

After QUEUE_FLAG_DYING has been set any code that is waiting in
get_request() should be woken up.  But to get this behaviour
blk_set_queue_dying() must be used instead of only setting
QUEUE_FLAG_DYING.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm bufio: drop the lock when doing GFP_NOIO allocation · 41c73a49

Committed by Mikulas Patocka on Nov 23, 2016

If the first allocation attempt using GFP_NOWAIT fails, drop the lock
and retry using GFP_NOIO allocation (lock is dropped because the
allocation can take some time).

Note that we won't do GFP_NOIO allocation when we loop for the second
time, because the lock shouldn't be dropped between __wait_for_free_buffer
and __get_unclaimed_buffer.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
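
The first-pass behaviour described above (try without sleeping, then drop the lock before the slower allocation) can be traced with a toy model. Everything here is a hypothetical stand-in: a boolean replaces the real mutex, `alloc_buf()` with `may_block` false/true plays the role of the GFP_NOWAIT/GFP_NOIO attempts, and the retry loop with its second-pass restriction is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy client: 'locked' stands in for the real mutex so the flow can be traced. */
struct client {
    bool locked;
    int fast_failures_left;   /* non-blocking attempts that fail before one succeeds */
    bool slept_while_locked;  /* set if a blocking allocation ever ran under the lock */
};

static void client_lock(struct client *c)   { assert(!c->locked); c->locked = true; }
static void client_unlock(struct client *c) { assert(c->locked);  c->locked = false; }

/* Stand-in allocator: the non-blocking flavour may fail, the blocking one may sleep. */
static void *alloc_buf(struct client *c, bool may_block)
{
    if (may_block) {
        if (c->locked)
            c->slept_while_locked = true;
        return malloc(64);
    }
    if (c->fast_failures_left > 0) {
        c->fast_failures_left--;
        return NULL;
    }
    return malloc(64);
}

/* Called with the lock held; returns with the lock held. */
static void *alloc_buffer_first_pass(struct client *c)
{
    void *b = alloc_buf(c, false);     /* first try: never sleeps */
    if (b == NULL) {
        client_unlock(c);              /* do not hold the lock across a slow allocation */
        b = alloc_buf(c, true);
        client_lock(c);
    }
    return b;
}
```

Because the lock is released in the middle, any state read before the call has to be revalidated afterwards, which is one reason the commit restricts where the blocking attempt may happen.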

dm bufio: don't take the lock in dm_bufio_shrink_count · d12067f4

Committed by Mikulas Patocka on Nov 23, 2016

dm_bufio_shrink_count() is called from do_shrink_slab to find out how many
freeable objects there are. The reported value doesn't have to be precise,
so we don't need to take the dm-bufio lock.
Suggested-by: David Rientjes <rientjes@google.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm bufio: avoid sleeping while holding the dm_bufio lock · 9ea61cac

Committed by Douglas Anderson on Nov 17, 2016

We've seen in-field reports showing _lots_ (18 in one case, 41 in
another) of tasks all sitting there blocked on:

  mutex_lock+0x4c/0x68
  dm_bufio_shrink_count+0x38/0x78
  shrink_slab.part.54.constprop.65+0x100/0x464
  shrink_zone+0xa8/0x198

In the two cases analyzed, we see one task that looks like this:

  Workqueue: kverityd verity_prefetch_io

  __switch_to+0x9c/0xa8
  __schedule+0x440/0x6d8
  schedule+0x94/0xb4
  schedule_timeout+0x204/0x27c
  schedule_timeout_uninterruptible+0x44/0x50
  wait_iff_congested+0x9c/0x1f0
  shrink_inactive_list+0x3a0/0x4cc
  shrink_lruvec+0x418/0x5cc
  shrink_zone+0x88/0x198
  try_to_free_pages+0x51c/0x588
  __alloc_pages_nodemask+0x648/0xa88
  __get_free_pages+0x34/0x7c
  alloc_buffer+0xa4/0x144
  __bufio_new+0x84/0x278
  dm_bufio_prefetch+0x9c/0x154
  verity_prefetch_io+0xe8/0x10c
  process_one_work+0x240/0x424
  worker_thread+0x2fc/0x424
  kthread+0x10c/0x114

...and that looks to be the one holding the mutex.

The problem has been reproduced fairly easily:
0. Be running Chrome OS w/ verity enabled on the root filesystem
1. Pick test patch: http://crosreview.com/412360
2. Install launchBalloons.sh and balloon.arm from
     http://crbug.com/468342
   ...that's just a memory stress test app.
3. On a 4GB rk3399 machine, run
     nice ./launchBalloons.sh 4 900 100000
   ...that tries to eat 4 * 900 MB of memory and keep accessing.
4. Login to the Chrome web browser and restore many tabs

With that, I've seen printouts like:
  DOUG: long bufio 90758 ms
...and the stack trace always shows we're in dm_bufio_prefetch().

The problem is that we try to allocate memory with GFP_NOIO while
we're holding the dm_bufio lock.  Instead we should be using
GFP_NOWAIT.  Using GFP_NOIO can cause us to sleep while holding the
lock and that causes the above problems.

The current behavior explained by David Rientjes:

  It will still try reclaim initially because __GFP_WAIT (or
  __GFP_KSWAPD_RECLAIM) is set by GFP_NOIO.  This is the cause of
  contention on dm_bufio_lock() that the thread holds.  You want to
  pass GFP_NOWAIT instead of GFP_NOIO to alloc_buffer() when holding a
  mutex that can be contended by a concurrent slab shrinker (if
  count_objects didn't use a trylock, this pattern would trivially
  deadlock).

This change significantly increases responsiveness of the system while
in this state.  It makes a real difference because it unblocks kswapd.
In the bug report analyzed, kswapd was hung:

   kswapd0         D ffffffc000204fd8     0    72      2 0x00000000
   Call trace:
   [<ffffffc000204fd8>] __switch_to+0x9c/0xa8
   [<ffffffc00090b794>] __schedule+0x440/0x6d8
   [<ffffffc00090bac0>] schedule+0x94/0xb4
   [<ffffffc00090be44>] schedule_preempt_disabled+0x28/0x44
   [<ffffffc00090d900>] __mutex_lock_slowpath+0x120/0x1ac
   [<ffffffc00090d9d8>] mutex_lock+0x4c/0x68
   [<ffffffc000708e7c>] dm_bufio_shrink_count+0x38/0x78
   [<ffffffc00030b268>] shrink_slab.part.54.constprop.65+0x100/0x464
   [<ffffffc00030dbd8>] shrink_zone+0xa8/0x198
   [<ffffffc00030e578>] balance_pgdat+0x328/0x508
   [<ffffffc00030eb7c>] kswapd+0x424/0x51c
   [<ffffffc00023f06c>] kthread+0x10c/0x114
   [<ffffffc000203dd0>] ret_from_fork+0x10/0x40

By unblocking kswapd, memory pressure should be reduced.
Suggested-by: David Rientjes <rientjes@google.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm table: simplify dm_table_determine_type() · 5b8c01f7

Committed by Bart Van Assche on Nov 15, 2016

Use a single loop instead of two loops to determine whether or not
all_blk_mq has to be set.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
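
A single-pass classification of the kind this commit describes can be sketched as below. `classify_devices()` and its boolean array input are hypothetical; the sketch assumes that a mix of blk-mq and non-blk-mq devices is an error and that an all-blk-mq set is what turns the flag on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns 0 and sets *all_mq when the devices agree,
 * -1 when blk-mq and non-blk-mq devices are mixed. */
static int classify_devices(const bool *is_mq, size_t n, bool *all_mq)
{
    assert(all_mq != NULL);
    assert(n == 0 || is_mq != NULL);
    size_t mq = 0, sq = 0;

    /* One walk over the devices, counting both kinds at once. */
    for (size_t i = 0; i < n; i++) {
        if (is_mq[i])
            mq++;
        else
            sq++;
    }
    if (mq > 0 && sq > 0)
        return -1;
    *all_mq = mq > 0;
    return 0;
}
```

Counting both kinds in the same pass removes the need for a second walk whose only job is to confirm what the first one already saw.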