提交 · 28dec870aaf704af1421ac014f7f8abf4cac7c69 · openanolis / cloud-kernel

08 6月, 2018 2 次提交

md: Unify mddev destruction paths · 28dec870

由 Kent Overstreet 提交于 6月 07, 2018

Previously, mddev_put() had a couple different paths for freeing a
mddev, due to the fact that the kobject wasn't initialized when the
mddev was first allocated. If we move the kobject_init() to when it's
first allocated and just use kobject_add() later, we can clean all this
up.

This also removes a hack in mddev_put() to avoid freeing biosets under a
spinlock, which involved copying biosets on the stack after the reset
bioset_init() changes.
Signed-off-by: NKent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

28dec870

dm: use bioset_init_from_src() to copy bio_set · 2a2a4c51

由 Jens Axboe 提交于 6月 07, 2018

We can't just copy and clear a bio_set, use the bio helper to
setup a new bio_set with the settings from another one.

Fixes: 6f1c819c ("dm: convert to bioset_init()/mempool_init()")
Reported-by: NVenkat R.B <vrbagal1@linux.vnet.ibm.com>
Tested-by: NVenkat R.B <vrbagal1@linux.vnet.ibm.com>
Tested-by: NLi Wang <liwang@redhat.com>
Reviewed-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

2a2a4c51

05 6月, 2018 1 次提交

dm: Use kzalloc for all structs with embedded biosets/mempools · d3775354

由 Kent Overstreet 提交于 6月 05, 2018

mempool_init()/bioset_init() require that the mempools/biosets be zeroed
first; they probably should not _require_ this, but not allocating those
structs with kzalloc is a fairly nonsensical thing to do (calling
mempool_exit()/bioset_exit() on an uninitialized mempool/bioset is legal
and safe, but only works if said memory was zeroed.)
Acked-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NKent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

d3775354

03 6月, 2018 1 次提交

dm-crypt: fix warning in shutdown path · d00a11df

由 Kent Overstreet 提交于 6月 02, 2018

The counter for the number of allocated pages includes pages in the
mempool's reserve, so checking that the number of allocated pages is 0
needs to happen after we exit the mempool.

Fixes: 6f1c819c ("dm: convert to bioset_init()/mempool_init()")
Signed-off-by: NKent Overstreet <kent.overstreet@gmail.com>
Reported-by: NKrzysztof Kozlowski <krzk@kernel.org>
Acked-by: NMike Snitzer <snitzer@redhat.com>

Fixed to always just use percpu_counter_sum()
Signed-off-by: NJens Axboe <axboe@kernel.dk>

d00a11df

31 5月, 2018 4 次提交

dm: convert to bioset_init()/mempool_init() · 6f1c819c

由 Kent Overstreet 提交于 5月 20, 2018

Convert dm to embedded bio sets.
Acked-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NKent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

6f1c819c

md: convert to bioset_init()/mempool_init() · afeee514

由 Kent Overstreet 提交于 5月 20, 2018

Convert md to embedded bio sets.
Signed-off-by: NKent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

afeee514

bcache: convert to bioset_init()/mempool_init() · d19936a2

由 Kent Overstreet 提交于 5月 20, 2018

Convert bcache to embedded bio sets.
Reviewed-by: NColy Li <colyli@suse.de>
Signed-off-by: NKent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

d19936a2

block: convert bounce, q->bio_split to bioset_init()/mempool_init() · 338aa96d

由 Kent Overstreet 提交于 5月 20, 2018

Convert the core block functionality to embedded bio sets.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NKent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

338aa96d

29 5月, 2018 4 次提交

bcache: Replace bch_read_string_list() by __sysfs_match_string() · ce4c3e19

由 Andy Shevchenko 提交于 5月 28, 2018

Kernel library has a common function to match user input from sysfs
against an array of strings. Thus, replace bch_read_string_list() by
__sysfs_match_string().
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: NColy Li <colyli@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ce4c3e19

bcache: Move couple of functions to sysfs.c · ecb37ce9

由 Andy Shevchenko 提交于 5月 28, 2018

There is couple of functions that are used exclusively in sysfs.c.
Move it to there and make them static.

Besides above, it will allow further clean up.

No functional change intended.
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: NColy Li <colyli@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ecb37ce9

bcache: Move couple of string arrays to sysfs.c · 04cbc211

由 Andy Shevchenko 提交于 5月 28, 2018

There is couple of string arrays that are used exclusively in sysfs.c.
Move it to there and make them static.

Besides above, it will allow further clean up.

No functional change intended.
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: NColy Li <colyli@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

04cbc211

bcache: stop bcache device when backing device is offline · 0f0709e6

由 Coly Li 提交于 5月 28, 2018

Currently bcache does not handle backing device failure, if backing
device is offline and disconnected from system, its bcache device can still
be accessible. If the bcache device is in writeback mode, I/O requests even
can success if the requests hit on cache device. That is to say, when and
how bcache handles offline backing device is undefined.

This patch tries to handle backing device offline in a rather simple way,
- Add cached_dev->status_update_thread kernel thread to update backing
  device status in every 1 second.
- Add cached_dev->offline_seconds to record how many seconds the backing
  device is observed to be offline. If the backing device is offline for
  BACKING_DEV_OFFLINE_TIMEOUT (30) seconds, set dc->io_disable to 1 and
  call bcache_device_stop() to stop the bache device which linked to the
  offline backing device.

Now if a backing device is offline for BACKING_DEV_OFFLINE_TIMEOUT seconds,
its bcache device will be removed, then user space application writing on
it will get error immediately, and handler the device failure in time.

This patch is quite simple, does not handle more complicated situations.
Once the bcache device is stopped, users need to recovery the backing
device, register and attach it manually.

Changelog:
v3: call wait_for_kthread_stop() before exits kernel thread.
v2: remove "bcache: " prefix when calling pr_warn().
v1: initial version.
Signed-off-by: NColy Li <colyli@suse.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Cc: Michael Lyle <mlyle@lyle.org>
Cc: Junhui Tang <tang.junhui@zte.com.cn>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

0f0709e6

17 5月, 2018 1 次提交

bcache: return 0 from bch_debug_init() if CONFIG_DEBUG_FS=n · 1c1a2ee1

由 Coly Li 提交于 5月 17, 2018

Commit 539d39eb ("bcache: fix wrong return value in bch_debug_init()")
returns the return value of debugfs_create_dir() to bcache_init(). When
CONFIG_DEBUG_FS=n, bch_debug_init() always returns 1 and makes
bcache_init() failedi.

This patch makes bch_debug_init() always returns 0 if CONFIG_DEBUG_FS=n,
so bcache can continue to work for the kernels which don't have debugfs
enanbled.

Changelog:
v4: Add Acked-by from Kent Overstreet.
v3: Use IS_ENABLED(CONFIG_DEBUG_FS) to replace #ifdef DEBUG_FS.
v2: Remove a warning information
v1: Initial version.

Fixes: Commit 539d39eb ("bcache: fix wrong return value in bch_debug_init()")
Cc: stable@vger.kernel.org
Signed-off-by: NColy Li <colyli@suse.de>
Reported-by: NMassimo B. <massimo.b@gmx.net>
Reported-by: NKai Krakow <kai@kaishome.de>
Tested-by: NKai Krakow <kai@kaishome.de>
Acked-by: NKent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

1c1a2ee1

14 5月, 2018 1 次提交

block: sanitize blk_get_request calling conventions · ff005a06

由 Christoph Hellwig 提交于 5月 09, 2018

Switch everyone to blk_get_request_flags, and then rename
blk_get_request_flags to blk_get_request.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ff005a06

09 5月, 2018 1 次提交

block: consolidate struct request timestamp fields · 522a7775

由 Omar Sandoval 提交于 5月 09, 2018

Currently, struct request has four timestamp fields:

- A start time, set at get_request time, in jiffies, used for iostats
- An I/O start time, set at start_request time, in ktime nanoseconds,
  used for blk-stats (i.e., wbt, kyber, hybrid polling)
- Another start time and another I/O start time, used for cfq and bfq

These can all be consolidated into one start time and one I/O start
time, both in ktime nanoseconds, shaving off up to 16 bytes from struct
request depending on the kernel config.
Signed-off-by: NOmar Sandoval <osandov@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

522a7775

04 5月, 2018 1 次提交

dm mirror: remove VLA usage · 65972a6f

由 Kees Cook 提交于 4月 10, 2018

On the quest to remove all VLAs from the kernel[1], this avoids VLAs
in dm-raid1.c by just using the maximum size for the stack arrays.
The nr_mirrors value was already capped at 9, so this makes it a trivial
adjustment to the array sizes.

[1] https://lkml.org/lkml/2018/3/7/621Signed-off-by: NKees Cook <keescook@chromium.org>
Acked-by: NHeinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

65972a6f

03 5月, 2018 6 次提交

bcache: use pr_info() to inform duplicated CACHE_SET_IO_DISABLE set · 09a44ca2

由 Coly Li 提交于 5月 03, 2018

It is possible that multiple I/O requests hits on failed cache device or
backing device, therefore it is quite common that CACHE_SET_IO_DISABLE is
set already when a task tries to set the bit from bch_cache_set_error().
Currently the message "CACHE_SET_IO_DISABLE already set" is printed by
pr_warn(), which might mislead users to think a serious fault happens in
source code.

This patch uses pr_info() to print the information in such situation,
avoid extra worries. This information is helpful to understand bcache
behavior in cache device failures, so I still keep them in source code.

Fixes: 771f393e ("bcache: add CACHE_SET_IO_DISABLE to struct cache_set flags")
Signed-off-by: NColy Li <colyli@suse.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

09a44ca2

bcache: set dc->io_disable to true in conditional_stop_bcache_device() · 4fd8e138

由 Coly Li 提交于 5月 03, 2018

Commit 7e027ca4 ("bcache: add stop_when_cache_set_failed option to
backing device") adds stop_when_cache_set_failed option and stops bcache
device if stop_when_cache_set_failed is auto and there is dirty data on
broken cache device. There might exists a small time gap that the cache
set is released and set to NULL but bcache device is not released yet
(because they are released in parallel). During this time gap, dc->c is
NULL so CACHE_SET_IO_DISABLE won't be checked, and dc->io_disable is still
false, so new coming I/O requests will be accepted and directly go into
backing device as no cache set attached to. If there is dirty data on
cache device, this behavior may introduce potential inconsistent data.

This patch sets dc->io_disable to true before calling bcache_device_stop()
to make sure the backing device will reject new coming I/O request as
well, so even in the small time gap no I/O will directly go into backing
device to corrupt data consistency.

Fixes: 7e027ca4 ("bcache: add stop_when_cache_set_failed option to backing device")
Signed-off-by: NColy Li <colyli@suse.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

4fd8e138

bcache: add wait_for_kthread_stop() in bch_allocator_thread() · ecb2ba8c

由 Coly Li 提交于 5月 03, 2018

When CACHE_SET_IO_DISABLE is set on cache set flags, bcache allocator
thread routine bch_allocator_thread() may stop the while-loops and
exit. Then it is possible to observe the following kernel oops message,

[  631.068366] bcache: bch_btree_insert() error -5
[  631.069115] bcache: cached_dev_detach_finish() Caching disabled for sdf
[  631.070220] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[  631.070250] PGD 0 P4D 0
[  631.070261] Oops: 0002 [#1] SMP PTI
[snipped]
[  631.070578] Workqueue: events cache_set_flush [bcache]
[  631.070597] RIP: 0010:exit_creds+0x1b/0x50
[  631.070610] RSP: 0018:ffffc9000705fe08 EFLAGS: 00010246
[  631.070626] RAX: 0000000000000001 RBX: ffff880a622ad300 RCX: 000000000000000b
[  631.070645] RDX: 0000000000000601 RSI: 000000000000000c RDI: 0000000000000000
[  631.070663] RBP: ffff880a622ad300 R08: ffffea00190c66e0 R09: 0000000000000200
[  631.070682] R10: ffff880a48123000 R11: ffff880000000000 R12: 0000000000000000
[  631.070700] R13: ffff880a4b160e40 R14: ffff880a4b160000 R15: 0ffff880667e2530
[  631.070719] FS:  0000000000000000(0000) GS:ffff880667e00000(0000) knlGS:0000000000000000
[  631.070740] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  631.070755] CR2: 0000000000000000 CR3: 000000000200a001 CR4: 00000000003606e0
[  631.070774] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  631.070793] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  631.070811] Call Trace:
[  631.070828]  __put_task_struct+0x55/0x160
[  631.070845]  kthread_stop+0xee/0x100
[  631.070863]  cache_set_flush+0x11d/0x1a0 [bcache]
[  631.070879]  process_one_work+0x146/0x340
[  631.070892]  worker_thread+0x47/0x3e0
[  631.070906]  kthread+0xf5/0x130
[  631.070917]  ? max_active_store+0x60/0x60
[  631.070930]  ? kthread_bind+0x10/0x10
[  631.070945]  ret_from_fork+0x35/0x40
[snipped]
[  631.071017] RIP: exit_creds+0x1b/0x50 RSP: ffffc9000705fe08
[  631.071033] CR2: 0000000000000000
[  631.071045] ---[ end trace 011c63a24b22c927 ]---
[  631.071085] bcache: bcache_device_free() bcache0 stopped

The reason is when cache_set_flush() tries to call kthread_stop() to stop
allocator thread, but it exits already due to cache device I/O errors.

This patch adds wait_for_kthread_stop() at tail of bch_allocator_thread(),
to prevent the thread routine exiting directly. Then the allocator thread
can be blocked at wait_for_kthread_stop() and wait for cache_set_flush()
to stop it by calling kthread_stop().

changelog:
v3: add Reviewed-by from Hannnes.
v2: not directly return from allocator_wait(), move 'return 0' to tail of
    bch_allocator_thread().
v1: initial version.

Fixes: 771f393e ("bcache: add CACHE_SET_IO_DISABLE to struct cache_set flags")
Signed-off-by: NColy Li <colyli@suse.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ecb2ba8c

bcache: count backing device I/O error for writeback I/O · bf78980f

由 Coly Li 提交于 5月 03, 2018

Commit c7b7bd07 ("bcache: add io_disable to struct cached_dev")
counts backing device I/O requets and set dc->io_disable to true if error
counters exceeds dc->io_error_limit. But it only counts I/O errors for
regular I/O request, neglects errors of write back I/Os when backing device
is offline.

This patch counts the errors of writeback I/Os, in dirty_endio() if
bio->bi_status is  not 0, it means error happens when writing dirty keys
to backing device, then bch_count_backing_io_errors() is called.

By this fix, even there is no reqular I/O request coming, if writeback I/O
errors exceed dc->io_error_limit, the bcache device may still be stopped
for the broken backing device.

Fixes: c7b7bd07 ("bcache: add io_disable to struct cached_dev")
Signed-off-by: NColy Li <colyli@suse.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

bf78980f

bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error() · 6147305c

由 Coly Li 提交于 5月 03, 2018

Commit c7b7bd07 ("bcache: add io_disable to struct cached_dev") tries
to stop bcache device by calling bcache_device_stop() when too many I/O
errors happened on backing device. But if there is internal I/O happening
on cache device (writeback scan, garbage collection, etc), a regular I/O
request triggers the internal I/Os may still holds a refcount of dc->count,
and the refcount may only be dropped after the internal I/O stopped.

By this patch, bch_cached_dev_error() will check if the backing device is
attached to a cache set, if yes that CACHE_SET_IO_DISABLE will be set to
flags of this cache set. Then internal I/Os on cache device will be
rejected and stopped immediately, and the bcache device can be stopped.

For people who are not familiar with the interesting refcount dependance,
let me explain a bit more how the fix works. Example the writeback thread
will scan cache device for dirty data writeback purpose. Before it stopps,
it holds a refcount of dc->count. When CACHE_SET_IO_DISABLE bit is set,
the internal I/O will stopped and the while-loop in bch_writeback_thread()
quits and calls cached_dev_put() to drop dc->count. If this is the last
refcount to drop, then cached_dev_detach_finish() will be called. In this
call back function, in turn closure_put(dc->disk.cl) is called to drop a
refcount of closure dc->disk.cl. If this is the last refcount of this
closure to drop, then cached_dev_flush() will be called. Then the cached
device is freed. So if CACHE_SET_IO_DISABLE is not set, the bache device
can not be stopped until all inernal cache device I/O stopped. For large
size cache device, and writeback thread competes locks with gc thread,
there might be a quite long time to wait.

Fixes: c7b7bd07 ("bcache: add io_disable to struct cached_dev")
Signed-off-by: NColy Li <colyli@suse.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

6147305c

bcache: store disk name in struct cache and struct cached_dev · 6e916a7e

由 Coly Li 提交于 5月 03, 2018

Current code uses bdevname() or bio_devname() to reference gendisk
disk name when bcache needs to display the disk names in kernel message.
It was safe before bcache device failure handling patch set merged in,
because when devices are failed, there was deadlock to prevent bcache
printing error messages with gendisk disk name. But after the failure
handling patch set merged, the deadlock is fixed, so it is possible
that the gendisk structure bdev->hd_disk is released when bdevname() is
called to reference bdev->bd_disk->disk_name[]. This is why I receive
bug report of NULL pointers deference panic.

This patch stores gendisk disk name in a buffer inside struct cache and
struct cached_dev, then print out the offline device name won't reference
bdev->hd_disk anymore. And this patch also avoids extra function calls
of bdevname() and bio_devnmae().

Changelog:
v3, add Reviewed-by from Hannes.
v2, call bdevname() earlier in register_bdev()
v1, first version with segguestion from Junhui Tang.

Fixes: c7b7bd07 ("bcache: add io_disable to struct cached_dev")
Fixes: 5138ac67 ("bcache: fix misleading error message in bch_count_io_errors()")
Signed-off-by: NColy Li <colyli@suse.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

6e916a7e

01 5月, 2018 2 次提交

dm: fix some sparse warnings and whitespace in dax methods · 3d97c829

由 Mike Snitzer 提交于 4月 30, 2018

Eliminate these sparse warnings:
drivers/md/dm.c:1062:9: warning: context imbalance in 'dm_dax_direct_access' - unexpected unlock
drivers/md/dm.c:1086:9: warning: context imbalance in 'dm_dax_copy_from_iter' - unexpected unlock
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

3d97c829

dm cache background tracker: fix sparse warning · 280884fa

由 Mike Snitzer 提交于 4月 30, 2018

Fix drivers/md/dm-cache-background-tracker.c:169:16: warning: symbol
'alloc_work' was not declared. Should it be static?
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

280884fa

30 4月, 2018 2 次提交

dm bufio: fix buffer alignment · f7879b4c

由 Mikulas Patocka 提交于 4月 19, 2018

Commit 6b5e718c ("dm bufio: relax alignment constraint on slab
cache") relaxed alignment on dm-bufio cache, however it may break
dm-crypt or dm-integrity.

dm-crypt and dm-integrity require that the size of bio vector entries
(bv_len) is aligned on its sector size. bv_offset doesn't have to be
aligned, but bv_len must be. XFS sends unaligned bios, but they do not
cross page boundary, so the requirement for aligned bv_len is met.

Commit 6b5e718c made dm-bufio send unaligned bios that cross page
boundary, this could break dm-crypt and dm-integrity.

Reinstates the alignment. Note that misaligned entries only happen when
we use slab/slub debugging. Without debugging, the entries are always
aligned.

Fixes: 6b5e718c ("dm bufio: relax alignment constraint on slab cache")
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

f7879b4c

dm integrity: use kvfree for kvmalloc'd memory · fc8cec11

由 Mikulas Patocka 提交于 4月 17, 2018

Use kvfree instead of kfree because the array is allocated with kvmalloc.

Fixes: 7eada909 ("dm: add integrity target")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

fc8cec11

09 4月, 2018 3 次提交

raid1: copy write hint from master bio to behind bio · dba40d46

由 Mariusz Dabrowski 提交于 2月 14, 2018

Signed-off-by: NMariusz Dabrowski <mariusz.dabrowski@intel.com>
Reviewed-by: NArtur Paszkiewicz <artur.paszkiewicz@intel.com>
Reviewed-by: NPawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NShaohua Li <shli@fb.com>

dba40d46

md/raid1: exit sync request if MD_RECOVERY_INTR is set · 8c242593

由 Yufen Yu 提交于 4月 09, 2018

We met a sync thread stuck as follows:

 raid1_sync_request+0x2c9/0xb50
 md_do_sync+0x983/0xfa0
 md_thread+0x11c/0x160
 kthread+0x111/0x130
 ret_from_fork+0x35/0x40
 0xffffffffffffffff

At the same time, there is a stuck mdadm thread (mdadm --manage
/dev/md2 --add /dev/sda). It is trying to stop the sync thread:

 kthread_stop+0x42/0xf0
 md_unregister_thread+0x3a/0x70
 md_reap_sync_thread+0x15/0x160
 action_store+0x142/0x2a0
 md_attr_store+0x6c/0xb0
 kernfs_fop_write+0x102/0x180
 __vfs_write+0x33/0x170
 vfs_write+0xad/0x1a0
 SyS_write+0x52/0xc0
 do_syscall_64+0x6e/0x190
 entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Debug tools show that the sync thread is waiting in raise_barrier(),
until raid1d() end all normal IO bios into bio_end_io_list(introduced
in commit 55ce74d4). But, raid1d() cannot end these bios if
MD_CHANGE_PENDING bit is set. It needs to get mddev->reconfig_mutex lock
and then clear the bit in md_check_recovery().
However, the lock is holding by mdadm in action_store().

Thus, there is a loop:
mdadm waiting for sync thread to stop, sync thread waiting for
raid1d() to end bios, raid1d() waiting for mdadm to release
mddev->reconfig_mutex lock and then it can end bios.

Fix this by checking MD_RECOVERY_INTR while waiting in raise_barrier(),
so that sync thread can exit while mdadm is stoping the sync thread.

Fixes: 55ce74d4 ("md/raid1: ensure device failure recorded before write request returns.")
Signed-off-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NYufen Yu <yuyufen@huawei.com>
Signed-off-by: NShaohua Li <shli@fb.com>

8c242593

md-cluster: don't update recovery_offset for faulty device · 0ea9924a

由 Guoqing Jiang 提交于 4月 09, 2018

Device could become faulty when clustered array handling
METADATA_UPDATED msg, so we don't need to call read_rdev
for this device.
Signed-off-by: NGuoqing Jiang <gqjiang@suse.com>
Signed-off-by: NShaohua Li <shli@fb.com>

0ea9924a

05 4月, 2018 4 次提交

dm: remove fmode_t argument from .prepare_ioctl hook · 5bd5e8d8

由 Mike Snitzer 提交于 4月 03, 2018

Use the fmode_t that is passed to dm_blk_ioctl() rather than
inconsistently (varies across targets) drop it on the floor by
overriding it with the fmode_t stored in 'struct dm_dev'.

All the persistent reservation functions weren't using the fmode_t they
got back from .prepare_ioctl so remove them.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

5bd5e8d8

dm: hold DM table for duration of ioctl rather than use blkdev_get · 971888c4

由 Mike Snitzer 提交于 4月 03, 2018

Commit 519049af ("dm: use blkdev_get rather than bdgrab when issuing
pass-through ioctl") inadvertantly introduced a regression relative to
users of device cgroups that issue ioctls (e.g. libvirt).  Using
blkdev_get() in DM's passthrough ioctl support implicitly introduced a
cgroup permissions check that would fail unless care were taken to add
all devices in the IO stack to the device cgroup.  E.g. rather than just
adding the top-level DM multipath device to the cgroup all the
underlying devices would need to be allowed.

Fix this, to no longer require allowing all underlying devices, by
simply holding the live DM table (which includes the table's original
blkdev_get() reference on the blockdevice that the ioctl will be issued
to) for the duration of the ioctl.

Also, bump the DM ioctl version so a user can know that their device
cgroup allow workaround is no longer needed.
Reported-by: NMichal Privoznik <mprivozn@redhat.com>
Suggested-by: NMikulas Patocka <mpatocka@redhat.com>
Fixes: 519049af ("dm: use blkdev_get rather than bdgrab when issuing pass-through ioctl")
Cc: stable@vger.kernel.org # 4.16
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

971888c4

dm raid: fix parse_raid_params() variable range issue · 13bc62d4

由 Heinz Mauelshagen 提交于 3月 28, 2018

parse_raid_params() compares variable "int value" with INT_MAX.

E.g. related Coverity report excerpt:
   CID 1364818 (#2 of 3): Operands don't affect result (CONSTANT_EXPRESSION_RESULT) [select issue]
1433                        if (value > INT_MAX) {

Fix by changing checks to avoid INT_MAX.

Whilst on it, avoid unnecessary checks against constants
and add check for sane recovery speed min/max.
Signed-off-by: NHeinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

13bc62d4

dm verity: make verity_for_io_block static · d4b1aaf5

由 weiyongjun (A) 提交于 3月 28, 2018

Fixes the following sparse warning:

drivers/md/dm-verity-target.c:375:6: warning:
 symbol 'verity_for_io_block' was not declared. Should it be static?
Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

d4b1aaf5

04 4月, 2018 7 次提交

dm verity: add 'check_at_most_once' option to only validate hashes once · 843f38d3

由 Patrik Torstensson 提交于 3月 22, 2018

This allows platforms that are CPU/memory contrained to verify data
blocks only the first time they are read from the data device, rather
than every time.  As such, it provides a reduced level of security
because only offline tampering of the data device's content will be
detected, not online tampering.

Hash blocks are still verified each time they are read from the hash
device, since verification of hash blocks is less performance critical
than data blocks, and a hash block will not be verified any more after
all the data blocks it covers have been verified anyway.

This option introduces a bitset that is used to check if a block has
been validated before or not.  A block can be validated more than once
as there is no thread protection for the bitset.

These changes were developed and tested on entry-level Android Go
devices.
Signed-off-by: NPatrik Torstensson <totte@google.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

843f38d3

dm bufio: don't embed a bio in the dm_buffer structure · 45354f1e

由 Mikulas Patocka 提交于 3月 26, 2018

The bio structure consumes a substantial part of dm_buffer.  The bio
structure is only needed when doing I/O on the buffer, thus we don't
have to embed it in the buffer.

Allocate the bio structure only when doing I/O.

We don't need to create a bio_set because, in case of allocation
failure, dm-bufio falls back to using dm-io (which keeps its own
bio_set).
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

45354f1e

dm bufio: support non-power-of-two block sizes · f51f2e0a

由 Mikulas Patocka 提交于 3月 26, 2018

Support block sizes that are not a power-of-two (but they must be a
multiple of 512b).  As always, a slab cache is used for allocations.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

f51f2e0a

dm bufio: use slab cache for dm_buffer structure allocations · 359dbf19

由 Mikulas Patocka 提交于 3月 26, 2018

kmalloc padded to the next power of two, using a slab cache avoids this.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

359dbf19

dm bufio: reorder fields in dm_buffer structure · 03b02939

由 Mikulas Patocka 提交于 3月 26, 2018

Reorder fields in dm_buffer structure to improve packing and reduce
structure size.  The compiler allocates 32-bit integer for field 'enum
data_mode', so change it to unsigned char.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

03b02939

dm bufio: relax alignment constraint on slab cache · 6b5e718c

由 Mikulas Patocka 提交于 3月 15, 2018

The I/O buffer doesn't have to be aligned on block size granularity,
relax alignment to ARCH_KMALLOC_MINALIGN (required to allow DMA from
slab cache memory on some architectures).

Also, set SLAB_RECLAIM_ACCOUNT so that the memory allocated from the
cache is accounted as reclaimable and doesn't inflate the 'used' entry
in the free command.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

6b5e718c

dm bufio: remove code that merges slab caches · 21bb1327

由 Mikulas Patocka 提交于 3月 26, 2018

All slab allocators can merge duplicate caches.  So dm-bufio doesn't
need extra slab merging logic.  Instead it can just allocate one slab
cache per client and let the allocator merge them.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

21bb1327

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功