1. 14 February 2017, 3 commits
    • md/r5cache: enable chunk_aligned_read with write back cache · 03b047f4
      Authored by Song Liu
      Chunk aligned read significantly reduces CPU usage of raid456.
      However, it is not safe to fully bypass the write back cache.
      This patch enables chunk aligned read with write back cache.
      
      For chunk aligned read, we track stripes in write back cache at
      a bigger granularity, "big_stripe". Each chunk may contain more
      than one stripe (for example, a 256kB chunk contains 64 4kB pages,
      so this chunk contains 64 stripes). For chunk_aligned_read, these
      stripes are grouped into one big_stripe, so we only need one lookup
      for the whole chunk.
      
      For each big_stripe, we count how many stripes of this big_stripe
      are in the write back cache. These counters are tracked in a radix
      tree (big_stripe_tree).
      r5c_tree_index() is used to calculate keys for the radix tree.
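      As a rough illustration (not the driver's actual code), the radix
      tree key can be derived by dividing a stripe's start sector by the
      chunk size in sectors, so every stripe inside one chunk maps to the
      same key. A minimal user-space C sketch of that arithmetic, with
      sizes matching the 256kB-chunk example above:

      #include <stdio.h>
      #include <stdint.h>

      /* Hypothetical stand-in for the big_stripe key calculation: all
       * stripes inside one chunk share one key, so a single lookup covers
       * the whole chunk. */
      static uint64_t big_stripe_key(uint64_t sector, uint64_t chunk_sectors)
      {
              return sector / chunk_sectors;   /* chunk number == key */
      }

      int main(void)
      {
              uint64_t chunk_sectors = 512;    /* 256kB = 512 sectors of 512B */
              uint64_t stripe_sectors = 8;     /* 4kB stripe */

              /* the 64 stripes of chunk 0 all map to key 0 */
              for (uint64_t s = 0; s < chunk_sectors; s += stripe_sectors)
                      if (big_stripe_key(s, chunk_sectors) != 0)
                              return 1;
              printf("key(0)=%llu key(512)=%llu\n",
                     (unsigned long long)big_stripe_key(0, chunk_sectors),
                     (unsigned long long)big_stripe_key(512, chunk_sectors));
              return 0;
      }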
      
      chunk_aligned_read() calls r5c_big_stripe_cached() to look up the
      big_stripe of each chunk in the tree. If this big_stripe is in the
      tree, chunk_aligned_read() aborts. This lookup is protected by
      rcu_read_lock().
      
      It is necessary to remember whether a stripe is counted in
      big_stripe_tree. Instead of adding a new flag, we reuse existing flags:
      STRIPE_R5C_PARTIAL_STRIPE and STRIPE_R5C_FULL_STRIPE. If either of these
      two flags is set, the stripe is counted in big_stripe_tree. This
      requires moving set_bit(STRIPE_R5C_PARTIAL_STRIPE) to
      r5c_try_caching_write(); and moving clear_bit of
      STRIPE_R5C_PARTIAL_STRIPE and STRIPE_R5C_FULL_STRIPE to
      r5c_finish_stripe_write_out().
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • raid5: only dispatch IO from raid5d for harddisk raid · 765d704d
      Authored by Shaohua Li
      We made raid5 stripe handling multi-threaded before. It works well for
      SSDs, but for hard disks the multi-threading creates more disk seeks,
      so it does not always improve performance. For a raid5 array built from
      several hard disks, multi-threading is still required, as raid5d becomes
      a bottleneck, especially for sequential writes.
      
      To overcome the disk seek issue, we only dispatch IO from raid5d if the
      array is hard disk based. Other threads can still handle stripes, but
      can't dispatch IO.
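
      As an illustration only (not the md code itself), the dispatch policy
      boils down to a small predicate; the two inputs here are simplified
      stand-ins for what the driver actually checks:

      #include <stdbool.h>
      #include <stdio.h>

      /* Simplified sketch of the dispatch decision: on SSD-backed arrays
       * any handling thread may submit IO; on rotational (hard disk)
       * arrays only the raid5d thread submits, so requests leave the
       * driver in a more seek-friendly order. */
      static bool may_dispatch_io(bool rotational_array, bool is_raid5d)
      {
              if (!rotational_array)
                      return true;     /* SSDs: dispatch from any thread */
              return is_raid5d;        /* HDDs: defer to raid5d */
      }

      int main(void)
      {
              printf("HDD array, worker thread: %d\n", may_dispatch_io(true, false));
              printf("HDD array, raid5d:        %d\n", may_dispatch_io(true, true));
              printf("SSD array, worker thread: %d\n", may_dispatch_io(false, false));
              return 0;
      }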
      
      Ideally, we should control the IO dispatch order according to IO
      position internally. Right now we still depend on the block layer,
      which isn't always very efficient.
      
      My setup has 9 hard disks; each disk can do around 180M/s sequential
      write, so in theory the raid5 array can do 180 * 8 = 1440M/s sequential
      write. The test machine uses an ATOM CPU. I measured large-iodepth
      sequential write bandwidth to the raid array:
      
      without patch: ~600M/s
      without patch and group_thread_cnt=4: 750M/s
      with patch and group_thread_cnt=4: 950M/s
      with patch, group_thread_cnt=4, skip_copy=1: 1150M/s
      
      We are pretty close to the maximum bandwidth in the large iodepth
      case. The gap between small iodepth sequential write performance of
      software raid and the theoretical value is still very big, though,
      because we don't have an efficient pipeline.
      
      Cc: NeilBrown <neilb@suse.com>
      Cc: Song Liu <songliubraving@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md linear: fix a race between linear_add() and linear_congested() · 03a9e24e
      Authored by colyli@suse.de
      Recently I received a bug report that, on a Linux v3.0 based kernel,
      hot adding a disk to an md linear device causes a kernel crash in
      linear_congested(). From the crash image analysis, I found that in
      linear_congested() mddev->raid_disks contains the value N, but
      conf->disks[] only has N-1 pointers available. A NULL pointer
      dereference then crashes the kernel.
      
      There is a race between linear_add() and linear_congested(); the RCU
      protection used in these two functions cannot avoid it. Since Linux
      v4.0 the RCU code has been replaced by mddev_suspend(). After checking
      the upstream code, it seems linear_congested() is not called in the
      generic_make_request() code path, so mddev_suspend() cannot prevent it
      from being called. The possible race still exists.
      
      Here is how the race still exists in the current code. On a machine
      with many CPUs, linear_add() is called on one CPU to add a hard disk
      to an md linear device; at the same time, on another CPU,
      linear_congested() is called to detect whether this md linear device
      is congested before issuing an I/O request to it.
      
      A possible execution time sequence demonstrates how the race can
      happen:
      
      seq    linear_add()                linear_congested()
       0                                 conf=mddev->private
       1   oldconf=mddev->private
       2   mddev->raid_disks++
       3                              for (i=0; i<mddev->raid_disks;i++)
       4                                bdev_get_queue(conf->disks[i].rdev->bdev)
       5   mddev->private=newconf
      
      In linear_add(), mddev->raid_disks is increased at time seq 2, and on
      another CPU linear_congested() iterates conf->disks[i] up to the
      increased mddev->raid_disks at time seq 3 and 4. But conf, which needs
      one more element (a pointer to struct dev_info) in conf->disks[], has
      not been updated yet, so accessing its structure members at time seq 4
      causes a NULL pointer dereference fault.
      
      To fix this race, there are two parts to the modification in this
      patch:
       1) Add 'int raid_disks' to struct linear_conf, as a copy of
          mddev->raid_disks. It is initialized in linear_conf() and always
          kept consistent with the number of entries in 'struct dev_info
          disks[]'. When iterating conf->disks[] in linear_congested(), use
          conf->raid_disks instead of mddev->raid_disks in the for-loop, so
          the NULL pointer dereference cannot happen again.
       2) Bring the RCU protection back, and use kfree_rcu() in linear_add()
          to free the oldconf memory. Because oldconf may still be referenced
          as mddev->private in linear_congested(), kfree_rcu() makes sure that
          its memory is not released until no one uses it any more.
      Some code comments are also added in this patch to make the
      modification easier to understand.
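
      A minimal user-space sketch of the idea behind part 1), with
      simplified stand-in types rather than the kernel's own structures:

      #include <stdio.h>
      #include <stdlib.h>

      struct dev_info {
              void *rdev;                /* placeholder for the member disk */
      };

      struct linear_conf_sketch {
              int raid_disks;            /* copy taken when this conf was built */
              struct dev_info disks[];   /* exactly raid_disks entries          */
      };

      /* Readers iterate with conf->raid_disks, never with the (possibly
       * already incremented) array-wide raid_disks value, so the loop can
       * never run past the end of disks[]. */
      static int congested_sketch(const struct linear_conf_sketch *conf)
      {
              int congested = 0;

              for (int i = 0; i < conf->raid_disks; i++)
                      congested |= (conf->disks[i].rdev == NULL);
              return congested;
      }

      int main(void)
      {
              struct linear_conf_sketch *conf =
                      malloc(sizeof(*conf) + 2 * sizeof(struct dev_info));

              conf->raid_disks = 2;
              conf->disks[0].rdev = conf;   /* dummy non-NULL pointers */
              conf->disks[1].rdev = conf;
              printf("congested: %d\n", congested_sketch(conf));
              free(conf);
              return 0;
      }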
      
      This patch can be applied to kernels since v4.0, after commit
      3be260cc ("md/linear: remove rcu protections in favour of
      suspend/resume"). But this bug was reported on a Linux v3.0 based
      kernel, so people who maintain kernels before Linux v4.0 need to do
      some back porting of this patch.
      
      Changelog:
       - v3: add 'int raid_disks' to struct linear_conf, and use kfree_rcu()
             to replace call_rcu() in linear_add().
       - v2: add RCU protection based on suggestions from Shaohua and Neil.
       - v1: initial effort.
      Signed-off-by: Coly Li <colyli@suse.de>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Neil Brown <neilb@suse.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Shaohua Li <shli@fb.com>
  2. 03 February 2017, 3 commits
    • dm crypt: replace RCU read-side section with rwsem · f5b0cba8
      Authored by Ondrej Kozina
      The lockdep splat below hints at a bug in RCU usage in dm-crypt that
      was introduced with commit c538f6ec ("dm crypt: add ability to use
      keys from the kernel key retention service").  The kernel keyring
      function user_key_payload() is in fact a wrapper for
      rcu_dereference_protected(), which must not be called within a plain
      rcu_read_lock() section.
      
      Unfortunately the kernel keyring subsystem doesn't currently provide
      an interface that allows the use of an RCU read-side section.  So for
      now we must drop RCU in favour of rwsem until a proper function is
      made available in the kernel keyring subsystem.
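
      For illustration only (this is generic user-space C, not dm-crypt or
      keyring code), the pattern the patch moves to looks like this: readers
      hold a reader-writer lock for the whole payload access instead of an
      RCU read-side section:

      #include <pthread.h>
      #include <stdio.h>

      static pthread_rwlock_t key_lock = PTHREAD_RWLOCK_INITIALIZER;
      static const char *key_payload = "secret";

      static void read_payload(void)
      {
              pthread_rwlock_rdlock(&key_lock);       /* shared, read side   */
              printf("payload: %s\n", key_payload);   /* valid while held    */
              pthread_rwlock_unlock(&key_lock);
      }

      static void update_payload(const char *p)
      {
              pthread_rwlock_wrlock(&key_lock);       /* exclusive, write side */
              key_payload = p;
              pthread_rwlock_unlock(&key_lock);
      }

      int main(void)
      {
              read_payload();
              update_payload("rotated");
              read_payload();
              return 0;
      }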
      
      ===============================
      [ INFO: suspicious RCU usage. ]
      4.10.0-rc5 #2 Not tainted
      -------------------------------
      ./include/keys/user-type.h:53 suspicious rcu_dereference_protected() usage!
      other info that might help us debug this:
      rcu_scheduler_active = 2, debug_locks = 1
      2 locks held by cryptsetup/6464:
       #0:  (&md->type_lock){+.+.+.}, at: [<ffffffffa02472a2>] dm_lock_md_type+0x12/0x20 [dm_mod]
       #1:  (rcu_read_lock){......}, at: [<ffffffffa02822f8>] crypt_set_key+0x1d8/0x4b0 [dm_crypt]
      stack backtrace:
      CPU: 1 PID: 6464 Comm: cryptsetup Not tainted 4.10.0-rc5 #2
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
      Call Trace:
       dump_stack+0x67/0x92
       lockdep_rcu_suspicious+0xc5/0x100
       crypt_set_key+0x351/0x4b0 [dm_crypt]
       ? crypt_set_key+0x1d8/0x4b0 [dm_crypt]
       crypt_ctr+0x341/0xa53 [dm_crypt]
       dm_table_add_target+0x147/0x330 [dm_mod]
       table_load+0x111/0x350 [dm_mod]
       ? retrieve_status+0x1c0/0x1c0 [dm_mod]
       ctl_ioctl+0x1f5/0x510 [dm_mod]
       dm_ctl_ioctl+0xe/0x20 [dm_mod]
       do_vfs_ioctl+0x8e/0x690
       ? ____fput+0x9/0x10
       ? task_work_run+0x7e/0xa0
       ? trace_hardirqs_on_caller+0x122/0x1b0
       SyS_ioctl+0x3c/0x70
       entry_SYSCALL_64_fastpath+0x18/0xad
      RIP: 0033:0x7f392c9a4ec7
      RSP: 002b:00007ffef6383378 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 00007ffef63830a0 RCX: 00007f392c9a4ec7
      RDX: 000000000124fcc0 RSI: 00000000c138fd09 RDI: 0000000000000005
      RBP: 00007ffef6383090 R08: 00000000ffffffff R09: 00000000012482b0
      R10: 2a28205d34383336 R11: 0000000000000246 R12: 00007f392d803a08
      R13: 00007ffef63831e0 R14: 0000000000000000 R15: 00007f392d803a0b
      
      Fixes: c538f6ec ("dm crypt: add ability to use keys from the kernel key retention service")
      Reported-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Ondrej Kozina <okozina@redhat.com>
      Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm rq: cope with DM device destruction while in dm_old_request_fn() · 4087a1ff
      Authored by Mike Snitzer
      Fixes a crash in dm_table_find_target() due to a NULL struct dm_table
      being passed from dm_old_request_fn() that races with DM device
      destruction.
      
      Reported-by: artem@flashgrid.io
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
  3. 25 January 2017, 6 commits
    • md/r5cache: disable write back for degraded array · 2e38a37f
      Authored by Song Liu
      Write-back cache in degraded mode introduces corner cases to the array.
      Although we try to cover all these corner cases, it is safer to just
      disable write-back cache when the array is in degraded mode.
      
      In this patch, we disable the writeback cache for degraded mode (a
      rough sketch of the policy follows the list):
      1. On device failure, if the array enters degraded mode, raid5_error()
         will submit the async job r5c_disable_writeback_async to disable
         writeback;
      2. In r5c_journal_mode_store(), it is invalid to enable writeback in
         degraded mode;
      3. In r5c_try_caching_write(), stripes with s->failed > 0 will be
         handled in write-through mode.
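
      The sketch below is a plain user-space C illustration of that policy,
      not the driver's code; the names and types are made up for clarity:

      #include <stdbool.h>
      #include <stdio.h>

      /* Write-back caching is only allowed while no member device has
       * failed; journal-mode changes and per-stripe caching decisions both
       * enforce this. */
      static bool writeback_allowed(int failed_devices)
      {
              return failed_devices == 0;
      }

      int main(void)
      {
              printf("optimal array:  writeback %s\n",
                     writeback_allowed(0) ? "allowed" : "refused");
              printf("degraded array: writeback %s\n",
                     writeback_allowed(1) ? "allowed" : "refused");
              return 0;
      }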
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md/r5cache: shift complex rmw from read path to write path · 07e83364
      Authored by Song Liu
      Write back cache requires a complex RMW mechanism, where old data is
      read into dev->orig_page for prexor, and then xor is done with
      dev->page. This logic is already implemented in the write path.
      
      However, the current read path is not aware of this requirement. When
      the array is optimal, the RMW is not required, as the data are read
      from the raid disks. However, when the target stripe is degraded,
      complex RMW is required to generate the right data.
      
      To keep the read path as clean as possible, we handle it by flushing
      degraded, in-journal stripes before processing reads to a missing
      dev.
      
      Specifically, when there are read requests to a degraded stripe
      with data in the journal, handle_stripe_fill() calls
      r5c_make_stripe_write_out() and exits. Then handle_stripe_dirtying()
      will do the complex RMW and flush the stripe to the RAID disks. After
      that, the read requests are handled.
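
      A rough user-space sketch of that decision (the helper names here are
      made up; this is not the md code):

      #include <stdbool.h>
      #include <stdio.h>

      enum read_action { FILL_READS, WRITE_OUT_FIRST };

      /* A degraded stripe that still has dirty data in the journal must be
       * flushed to the raid disks (via the write path's complex RMW) before
       * a read to a missing dev can be served. */
      static enum read_action handle_read(bool stripe_degraded, bool data_in_journal)
      {
              if (stripe_degraded && data_in_journal)
                      return WRITE_OUT_FIRST;
              return FILL_READS;
      }

      int main(void)
      {
              printf("%d\n", handle_read(true, true));    /* 1: write out first */
              printf("%d\n", handle_read(false, true));   /* 0: fill reads      */
              return 0;
      }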
      
      There is one more corner case: when there is a non-overwrite bio for
      the missing (or out of sync) dev, handle_stripe_dirtying() will not
      be able to process the non-overwrite bios without constructing the
      data in handle_stripe_fill(). This is fixed by delaying non-overwrite
      bios in handle_stripe_dirtying(), so handle_stripe_fill() works on
      these bios after the stripe is flushed to the raid disks.
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md/r5cache: flush data only stripes in r5l_recovery_log() · a85dd7b8
      Authored by Song Liu
      For safer operation, all arrays start in write-through mode, which has
      been better tested and is more mature. The write-through/write-back
      setting is not persistent across array restarts anyway, so we always
      start an array in write-through mode. However, if recovery finds
      data-only stripes left over from a previous write-back session before
      the shutdown, it is not safe to start the array in write-through mode,
      as write-through mode cannot handle stripes with data in the write-back
      cache. To solve this problem, we flush all data-only stripes in
      r5l_recovery_log(). When r5l_recovery_log() returns, the array starts
      with an empty cache in write-through mode.
      
      This logic is implemented in r5c_recovery_flush_data_only_stripes():
      
      1. enable write back cache
      2. flush all stripes
      3. wake up conf->mddev->thread
      4. wait for all stripes to get flushed (reusing wait_for_quiescent)
      5. disable write back cache
      
      The wait in step 4 will be woken up in release_inactive_stripe_list()
      when conf->active_stripes reaches 0.
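
      A generic user-space analogue of steps 3-5 (this is not md code; the
      counter and names are invented for illustration): the waiter sleeps
      until the in-flight count drops to zero, and whoever releases the last
      item wakes it up.

      #include <pthread.h>
      #include <stdio.h>

      static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
      static pthread_cond_t  quiescent = PTHREAD_COND_INITIALIZER;
      static int active_stripes = 64;

      static void *flush_worker(void *arg)
      {
              (void)arg;
              for (;;) {
                      pthread_mutex_lock(&lock);
                      if (active_stripes == 0) {
                              pthread_mutex_unlock(&lock);
                              break;
                      }
                      if (--active_stripes == 0)      /* released the last one */
                              pthread_cond_signal(&quiescent);
                      pthread_mutex_unlock(&lock);
              }
              return NULL;
      }

      int main(void)
      {
              pthread_t t;

              pthread_create(&t, NULL, flush_worker, NULL);
              pthread_mutex_lock(&lock);
              while (active_stripes > 0)              /* wait for quiescence */
                      pthread_cond_wait(&quiescent, &lock);
              pthread_mutex_unlock(&lock);
              pthread_join(t, NULL);
              printf("all stripes flushed\n");
              return 0;
      }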
      
      It is safe to wake up mddev->thread here because all the resources
      required by the thread have been initialized.
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md/raid5: move comment of fetch_block to right location · ba02684d
      Authored by Song Liu
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md/r5cache: read data into orig_page for prexor of cached data · 86aa1397
      Authored by Song Liu
      With write back cache, we use orig_page to do prexor. This patch
      makes sure we read data into orig_page for it.
      
      Flag R5_OrigPageUPTDODATE is added to show whether orig_page
      has the latest data from raid disk.
      
      We introduce a helper function uptodate_for_rmw() to simplify a
      couple of conditions in handle_stripe_dirtying().
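
      A hedged user-space sketch of the idea behind that helper (the flag
      names mirror the description above, but the struct and code here are
      illustrative, not the driver's):

      #include <stdbool.h>
      #include <stdio.h>

      struct dev_state {
              bool uptodate;             /* R5_UPTODATE            */
              bool in_journal;           /* data cached in journal */
              bool orig_page_uptodate;   /* R5_OrigPageUPTDODATE   */
      };

      /* For read-modify-write, dev->page being up to date is only enough
       * when the block is not in the journal; if it is, orig_page must also
       * hold the latest data read from the raid disk. */
      static bool uptodate_for_rmw(const struct dev_state *dev)
      {
              return dev->uptodate &&
                     (!dev->in_journal || dev->orig_page_uptodate);
      }

      int main(void)
      {
              struct dev_state d = { true, true, false };

              printf("rmw ready: %d\n", uptodate_for_rmw(&d));  /* 0 */
              d.orig_page_uptodate = true;
              printf("rmw ready: %d\n", uptodate_for_rmw(&d));  /* 1 */
              return 0;
      }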
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md/raid5-cache: delete meaningless code · d46d29f0
      Authored by Shaohua Li
      sector_t is unsigned long; it's never < 0.
      Reported-by: Julia Lawall <julia.lawall@lip6.fr>
      Signed-off-by: Shaohua Li <shli@fb.com>
  4. 10 January 2017, 1 commit
  5. 06 January 2017, 5 commits
  6. 04 January 2017, 2 commits
  7. 25 December 2016, 1 commit
  8. 18 December 2016, 2 commits
  9. 16 December 2016, 1 commit
  10. 14 December 2016, 1 commit
    • dm flakey: introduce "error_writes" feature · ef548c55
      Authored by Mike Snitzer
      Recent dm-flakey fixes, to have reads error out during the "down"
      interval, made it so that the previous read behaviour is no longer
      available.
      
      It is useful to have reads complete like normal but have writes error
      out, so make it possible again with a new "error_writes" feature.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  11. 09 December 2016, 15 commits