提交 · c62b37d96b6eb3ec5ae4cbe00db107bf15aebc93 · openeuler / Kernel

06 6月, 2020 5 次提交

dm zoned: select reclaim zone based on device index · 69875d44

由 Hannes Reinecke 提交于 6月 02, 2020

per-device reclaim should select zones on that device only.
Signed-off-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

69875d44

dm zoned: support arbitrary number of devices · 4dba1288

由 Hannes Reinecke 提交于 6月 02, 2020

Remove the hard-coded limit of two devices and support an unlimited
number of additional zoned devices.
Signed-off-by: NHannes Reinecke <hare@suse.de>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

4dba1288

dm zoned: move random and sequential zones into struct dmz_dev · bd82fdab

由 Hannes Reinecke 提交于 6月 02, 2020

Random and sequential zones should be part of the respective
device structure to make arbitration between devices possible.
Signed-off-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

bd82fdab

dm zoned: per-device reclaim · f97809ae

由 Hannes Reinecke 提交于 6月 02, 2020

Instead of having one reclaim workqueue for the entire set we should
be allocating a reclaim workqueue per device; doing so will reduce
contention and should boost performance for a multi-device setup.
Signed-off-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

f97809ae

dm zoned: add device pointer to struct dm_zone · 8f22272a

由 Hannes Reinecke 提交于 6月 02, 2020

Add a pointer, to the containing device, within struct dm_zone and
kill dmz_zone_to_dev().
Signed-off-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

8f22272a

21 5月, 2020 4 次提交

dm zoned: separate random and cache zones · 34f5affd

由 Hannes Reinecke 提交于 5月 19, 2020

Instead of lumping emulated zones together with random zones we
should be handling them as separate 'cache' zones. This improves
code readability and allows an easier implementation of different
cache policies.

Also add additional allocation flags, to separate the type (cache,
random, or sequential) from the purpose (eg reclaim).

Also switch the allocation policy to not use random zones as buffer
zones if cache zones are present. This avoids a performance drop when
all cache zones are used.
Signed-off-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

34f5affd

dm zoned: Avoid 64-bit division error in dmz_fixup_devices · 42c689f6

由 Nathan Chancellor 提交于 5月 13, 2020

When building arm32 allyesconfig:

ld.lld: error: undefined symbol: __aeabi_uldivmod
>>> referenced by dm-zoned-target.c
>>>               md/dm-zoned-target.o:(dmz_ctr) in archive drivers/built-in.a

dmz_fixup_devices uses DIV_ROUND_UP with variables of type sector_t. As
such, it should be using DIV_ROUND_UP_SECTOR_T, which handles this
automatically.

Fixes: 70978208ec91 ("dm zoned: metadata version 2")
Signed-off-by: NNathan Chancellor <natechancellor@gmail.com>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

42c689f6

dm zoned: remove spurious newlines from debugging messages · 49de3b7d

由 Hannes Reinecke 提交于 5月 14, 2020

DMDEBUG will already add a newline to the logging messages, so we
shouldn't be adding it to the message itself.
Signed-off-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

49de3b7d

dm zoned: metadata version 2 · bd5c4031

由 Hannes Reinecke 提交于 5月 11, 2020

Implement handling for metadata version 2. The new metadata adds a
label and UUID for the device mapper device, and additional UUID for
the underlying block devices.

It also allows for an additional regular drive to be used for
emulating random access zones. The emulated zones will be placed
logically in front of the zones from the zoned block device, causing
the superblocks and metadata to be stored on that device.

The first zone of the original zoned device will be used to hold
another, tertiary copy of the metadata; this copy carries a generation
number of 0 and is never updated; it's just used for identification.
Signed-off-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NBob Liu <bob.liu@oracle.com>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

bd5c4031

20 5月, 2020 2 次提交

dm zoned: replace 'target' pointer in the bio context · 52d67758

由 Hannes Reinecke 提交于 5月 11, 2020

Replace the 'target' pointer in the bio context with the
device pointer as this is what's actually used.
Signed-off-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NBob Liu <bob.liu@oracle.com>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

52d67758

dm zoned: remove 'dev' argument from reclaim · 6c805f77

由 Hannes Reinecke 提交于 5月 11, 2020

Use the dmz_zone_to_dev() mapping function to remove the
'dev' argument from reclaim.
Signed-off-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NBob Liu <bob.liu@oracle.com>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

6c805f77

15 5月, 2020 6 次提交

dm zoned: Introduce dmz_dev_is_dying() and dmz_check_dev() · d0e21ce4

由 Hannes Reinecke 提交于 5月 11, 2020

Introduce accessors dmz_dev_is_dying() and dmz_check_dev() to
avoid having to reference the devices directly.
Signed-off-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NBob Liu <bob.liu@oracle.com>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

d0e21ce4

dm zoned: introduce dmz_metadata_label() to format device name · 2234e732

由 Hannes Reinecke 提交于 5月 11, 2020

Introduce dmz_metadata_label() to format the device-mapper device
name and use it instead of the device name of the underlying device.
Signed-off-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: NBob Liu <bob.liu@oracle.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

2234e732

dm zoned: move fields from struct dmz_dev to dmz_metadata · 36820560

由 Hannes Reinecke 提交于 5月 11, 2020

Move fields from the device structure into the metadata structure
and provide accessor functions.
Signed-off-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: NBob Liu <bob.liu@oracle.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

36820560

dm zoned: store zone id within the zone structure and kill dmz_id() · b7122873

由 Hannes Reinecke 提交于 5月 11, 2020

Instead of calculating the zone index by the offset within the
zone array store the index within the structure itself. With that
the helper dmz_id() is pointless and can be replaced with accessing
the ->id value directly.
Signed-off-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NBob Liu <bob.liu@oracle.com>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

b7122873

dm zoned: add 'message' callback · 90b39d58

由 Hannes Reinecke 提交于 5月 11, 2020

Add callback for 'dmsetup message' to allow the reclaim process
to be triggered manually.
Eg.

	dmsetup message /dev/dm-X 0 message

will start the reclaim process even if the default threshold
of 50 percent of free random zones is not reached.
Signed-off-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NBob Liu <bob.liu@oracle.com>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

90b39d58

dm zoned: add 'status' callback · bc3d5717

由 Hannes Reinecke 提交于 5月 11, 2020

Add callback to supply information for 'dmsetup status'
and 'dmsetup table'. The output for 'dmsetup status' is

0 <size> zoned <nr_zones> zones <nr_unmap_rnd>/<nr_rnd> random <nr_unmap_seq>/<nr_seq> sequential

where <nr_unmap_rnd> is the number of unmapped (ie free) random zones,
<nr_rnd> the total number of random zones, <nr_unmap_seq> the number
of unmapped sequential zones, and <nr_seq> the total number of
sequential zones.
Signed-off-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NBob Liu <bob.liu@oracle.com>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

bc3d5717

04 3月, 2020 1 次提交

dm: bump version of core and various targets · 636be424

由 Mike Snitzer 提交于 2月 27, 2020

Changes made during the 5.6 cycle warrant bumping the version number
for DM core and the targets modified by this commit.

It should be noted that dm-thin, dm-crypt and dm-raid already had
their target version bumped during the 5.6 merge window.

Signed-off-by; Mike Snitzer <snitzer@redhat.com>

636be424

28 2月, 2020 1 次提交

dm zoned: Fix reference counter initial value of chunk works · ee63634b

由 Shin'ichiro Kawasaki 提交于 2月 27, 2020

Dm-zoned initializes reference counters of new chunk works with zero
value and refcount_inc() is called to increment the counter. However, the
refcount_inc() function handles the addition to zero value as an error
and triggers the warning as follows:

refcount_t: addition on 0; use-after-free.
WARNING: CPU: 7 PID: 1506 at lib/refcount.c:25 refcount_warn_saturate+0x68/0xf0
...
CPU: 7 PID: 1506 Comm: systemd-udevd Not tainted 5.4.0+ #134
...
Call Trace:
 dmz_map+0x2d2/0x350 [dm_zoned]
 __map_bio+0x42/0x1a0
 __split_and_process_non_flush+0x14a/0x1b0
 __split_and_process_bio+0x83/0x240
 ? kmem_cache_alloc+0x165/0x220
 dm_process_bio+0x90/0x230
 ? generic_make_request_checks+0x2e7/0x680
 dm_make_request+0x3e/0xb0
 generic_make_request+0xcf/0x320
 ? memcg_drain_all_list_lrus+0x1c0/0x1c0
 submit_bio+0x3c/0x160
 ? guard_bio_eod+0x2c/0x130
 mpage_readpages+0x182/0x1d0
 ? bdev_evict_inode+0xf0/0xf0
 read_pages+0x6b/0x1b0
 __do_page_cache_readahead+0x1ba/0x1d0
 force_page_cache_readahead+0x93/0x100
 generic_file_read_iter+0x83a/0xe40
 ? __seccomp_filter+0x7b/0x670
 new_sync_read+0x12a/0x1c0
 vfs_read+0x9d/0x150
 ksys_read+0x5f/0xe0
 do_syscall_64+0x5b/0x180
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
...

After this warning, following refcount API calls for the counter all fail
to change the counter value.

Fix this by setting the initial reference counter value not zero but one
for the new chunk works. Instead, do not call refcount_inc() via
dmz_get_chunk_work() for the new chunks works.

The failure was observed with linux version 5.4 with CONFIG_REFCOUNT_FULL
enabled. Refcount rework was merged to linux version 5.5 by the
commit 168829ad ("Merge branch 'locking-core-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip"). After this
commit, CONFIG_REFCOUNT_FULL was removed and the failure was observed
regardless of kernel configuration.

Linux version 4.20 merged the commit 092b5648 ("dm zoned: target: use
refcount_t for dm zoned reference counters"). Before this commit, dm
zoned used atomic_t APIs which does not check addition to zero, then this
fix is not necessary.

Fixes: 092b5648 ("dm zoned: target: use refcount_t for dm zoned reference counters")
Cc: stable@vger.kernel.org # 5.4+
Signed-off-by: NShin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

ee63634b

03 12月, 2019 1 次提交

block: simplify blkdev_nr_zones · 9b38bb4b

由 Christoph Hellwig 提交于 12月 03, 2019

Simplify the arguments to blkdev_nr_zones by passing a gendisk instead
of the block_device and capacity.  This also removes the need for
__blkdev_nr_zones as all callers are outside the fast path and can
deal with the additional branch.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

9b38bb4b

07 11月, 2019 1 次提交

dm zoned: reduce overhead of backing device checks · e7fad909

由 Dmitry Fomichev 提交于 11月 06, 2019

Commit 75d66ffb added backing device health checks and as a part
of these checks, check_events() block ops template call is invoked in
dm-zoned mapping path as well as in reclaim and flush path. Calling
check_events() with ATA or SCSI backing devices introduces a blocking
scsi_test_unit_ready() call being made in sd_check_events(). Even though
the overhead of calling scsi_test_unit_ready() is small for ATA zoned
devices, it is much larger for SCSI and it affects performance in a very
negative way.

Fix this performance regression by executing check_events() only in case
of any I/O errors. The function dmz_bdev_is_dying() is modified to call
only blk_queue_dying(), while calls to check_events() are made in a new
helper function, dmz_check_bdev().
Reported-by: Nzhangxiaoxu <zhangxiaoxu5@huawei.com>
Fixes: 75d66ffb ("dm zoned: properly handle backing device failure")
Cc: stable@vger.kernel.org
Signed-off-by: NDmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

e7fad909

26 8月, 2019 1 次提交

dm zoned: fix invalid memory access · 0c8e9c2d

由 Mikulas Patocka 提交于 8月 26, 2019

Commit 75d66ffb ("dm zoned: properly
handle backing device failure") triggers a coverity warning:

*** CID 1452808:  Memory - illegal accesses  (USE_AFTER_FREE)
/drivers/md/dm-zoned-target.c: 137 in dmz_submit_bio()
131             clone->bi_private = bioctx;
132
133             bio_advance(bio, clone->bi_iter.bi_size);
134
135             refcount_inc(&bioctx->ref);
136             generic_make_request(clone);
>>>     CID 1452808:  Memory - illegal accesses  (USE_AFTER_FREE)
>>>     Dereferencing freed pointer "clone".
137             if (clone->bi_status == BLK_STS_IOERR)
138                     return -EIO;
139
140             if (bio_op(bio) == REQ_OP_WRITE && dmz_is_seq(zone))
141                     zone->wp_block += nr_blocks;
142

The "clone" bio may be processed and freed before the check
"clone->bi_status == BLK_STS_IOERR" - so this check can access invalid
memory.

Fixes: 75d66ffb ("dm zoned: properly handle backing device failure")
Cc: stable@vger.kernel.org
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

0c8e9c2d

16 8月, 2019 3 次提交

dm zoned: add SPDX license identifiers · bae9a0aa

由 Dmitry Fomichev 提交于 8月 02, 2019

Signed-off-by: NDmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

bae9a0aa

dm zoned: properly handle backing device failure · 75d66ffb

由 Dmitry Fomichev 提交于 8月 10, 2019

dm-zoned is observed to lock up or livelock in case of hardware
failure or some misconfiguration of the backing zoned device.

This patch adds a new dm-zoned target function that checks the status of
the backing device. If the request queue of the backing device is found
to be in dying state or the SCSI backing device enters offline state,
the health check code sets a dm-zoned target flag prompting all further
incoming I/O to be rejected. In order to detect backing device failures
timely, this new function is called in the request mapping path, at the
beginning of every reclaim run and before performing any metadata I/O.

The proper way out of this situation is to do

dmsetup remove <dm-zoned target>

and recreate the target when the problem with the backing device
is resolved.

Fixes: 3b1a94c8 ("dm zoned: drive-managed zoned block device target")
Cc: stable@vger.kernel.org
Signed-off-by: NDmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

75d66ffb

dm zoned: improve error handling in i/o map code · d7428c50

由 Dmitry Fomichev 提交于 8月 10, 2019

Some errors are ignored in the I/O path during queueing chunks
for processing by chunk works. Since at least these errors are
transient in nature, it should be possible to retry the failed
incoming commands.

The fix -

Errors that can happen while queueing chunks are carried upwards
to the main mapping function and it now returns DM_MAPIO_REQUEUE
for any incoming requests that can not be properly queued.

Error logging/debug messages are added where needed.

Fixes: 3b1a94c8 ("dm zoned: drive-managed zoned block device target")
Cc: stable@vger.kernel.org
Signed-off-by: NDmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

d7428c50

19 4月, 2019 1 次提交

dm zoned: Silence a static checker warning · a3839bc6

由 Dan Carpenter 提交于 4月 10, 2019

My static checker complains about this line from dmz_get_zoned_device()

aligned_capacity = dev->capacity & ~(blk_queue_zone_sectors(q) - 1);

The problem is that "aligned_capacity" and "dev->capacity" are sector_t
type (which is a u64 under most configs) but blk_queue_zone_sectors(q)
returns a u32 so the higher 32 bits in aligned_capacity are cleared to
zero. This patch adds a cast to address the issue.

Fixes: 114e0259 ("dm zoned: ignore last smaller runt zone")
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

a3839bc6

21 2月, 2019 1 次提交

dm: eliminate 'split_discard_bios' flag from DM target interface · 61697a6a

由 Mike Snitzer 提交于 1月 18, 2019

There is no need to have DM core split discards on behalf of a DM target
now that blk_queue_split() handles splitting discards based on the
queue_limits.  A DM target just needs to set max_discard_sectors,
discard_granularity, etc, in queue_limits.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

61697a6a

08 12月, 2018 1 次提交

dm zoned: Fix target BIO completion handling · d57f9da8

由 Damien Le Moal 提交于 11月 30, 2018

struct bioctx includes the ref refcount_t to track the number of I/O
fragments used to process a target BIO as well as ensure that the zone
of the BIO is kept in the active state throughout the lifetime of the
BIO. However, since decrementing of this reference count is done in the
target .end_io method, the function bio_endio() must be called multiple
times for read and write target BIOs, which causes problems with the
value of the __bi_remaining struct bio field for chained BIOs (e.g. the
clone BIO passed by dm core is large and splits into fragments by the
block layer), resulting in incorrect values and inconsistencies with the
BIO_CHAIN flag setting. This is turn triggers the BUG_ON() call:

BUG_ON(atomic_read(&bio->__bi_remaining) <= 0);

in bio_remaining_done() called from bio_endio().

Fix this ensuring that bio_endio() is called only once for any target
BIO by always using internal clone BIOs for processing any read or
write target BIO. This allows reference counting using the target BIO
context counter to trigger the target BIO completion bio_endio() call
once all data, metadata and other zone work triggered by the BIO
complete.

Overall, this simplifies the code too as the target .end_io becomes
unnecessary and differences between read and write BIO issuing and
completion processing disappear.

Fixes: 3b1a94c8 ("dm zoned: drive-managed zoned block device target")
Cc: stable@vger.kernel.org
Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

d57f9da8

26 10月, 2018 1 次提交

block: Introduce blkdev_nr_zones() helper · a91e1380

由 Damien Le Moal 提交于 10月 12, 2018

Introduce the blkdev_nr_zones() helper function to get the total
number of zones of a zoned block device. This number is always 0 for a
regular block device (q->limits.zoned == BLK_ZONED_NONE case).

Replace hard-coded number of zones calculation in dmz_get_zoned_device()
with a call to this helper.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

a91e1380

17 10月, 2018 1 次提交

dm zoned: target: use refcount_t for dm zoned reference counters · 092b5648

由 John Pittman 提交于 8月 23, 2018

The API surrounding refcount_t should be used in place of atomic_t
when variables are being used as reference counters.  This API can
prevent issues such as counter overflows and use-after-free
conditions.  Within the dm zoned target stack, the atomic_t API is
used for bioctx->ref and cw->refcount.  Change these to use
refcount_t, avoiding the issues mentioned.
Signed-off-by: NJohn Pittman <jpittman@redhat.com>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Tested-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

092b5648

23 6月, 2018 1 次提交

dm zoned: avoid triggering reclaim from inside dmz_map() · 2d0b2d64

由 Bart Van Assche 提交于 6月 22, 2018

This patch avoids that lockdep reports the following:

======================================================
WARNING: possible circular locking dependency detected
4.18.0-rc1 #62 Not tainted
------------------------------------------------------
kswapd0/84 is trying to acquire lock:
00000000c313516d (&xfs_nondir_ilock_class){++++}, at: xfs_free_eofblocks+0xa2/0x1e0

but task is already holding lock:
00000000591c83ae (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (fs_reclaim){+.+.}:
  kmem_cache_alloc+0x2c/0x2b0
  radix_tree_node_alloc.constprop.19+0x3d/0xc0
  __radix_tree_create+0x161/0x1c0
  __radix_tree_insert+0x45/0x210
  dmz_map+0x245/0x2d0 [dm_zoned]
  __map_bio+0x40/0x260
  __split_and_process_non_flush+0x116/0x220
  __split_and_process_bio+0x81/0x180
  __dm_make_request.isra.32+0x5a/0x100
  generic_make_request+0x36e/0x690
  submit_bio+0x6c/0x140
  mpage_readpages+0x19e/0x1f0
  read_pages+0x6d/0x1b0
  __do_page_cache_readahead+0x21b/0x2d0
  force_page_cache_readahead+0xc4/0x100
  generic_file_read_iter+0x7c6/0xd20
  __vfs_read+0x102/0x180
  vfs_read+0x9b/0x140
  ksys_read+0x55/0xc0
  do_syscall_64+0x5a/0x1f0
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #1 (&dmz->chunk_lock){+.+.}:
  dmz_map+0x133/0x2d0 [dm_zoned]
  __map_bio+0x40/0x260
  __split_and_process_non_flush+0x116/0x220
  __split_and_process_bio+0x81/0x180
  __dm_make_request.isra.32+0x5a/0x100
  generic_make_request+0x36e/0x690
  submit_bio+0x6c/0x140
  _xfs_buf_ioapply+0x31c/0x590
  xfs_buf_submit_wait+0x73/0x520
  xfs_buf_read_map+0x134/0x2f0
  xfs_trans_read_buf_map+0xc3/0x580
  xfs_read_agf+0xa5/0x1e0
  xfs_alloc_read_agf+0x59/0x2b0
  xfs_alloc_pagf_init+0x27/0x60
  xfs_bmap_longest_free_extent+0x43/0xb0
  xfs_bmap_btalloc_nullfb+0x7f/0xf0
  xfs_bmap_btalloc+0x428/0x7c0
  xfs_bmapi_write+0x598/0xcc0
  xfs_iomap_write_allocate+0x15a/0x330
  xfs_map_blocks+0x1cf/0x3f0
  xfs_do_writepage+0x15f/0x7b0
  write_cache_pages+0x1ca/0x540
  xfs_vm_writepages+0x65/0xa0
  do_writepages+0x48/0xf0
  __writeback_single_inode+0x58/0x730
  writeback_sb_inodes+0x249/0x5c0
  wb_writeback+0x11e/0x550
  wb_workfn+0xa3/0x670
  process_one_work+0x228/0x670
  worker_thread+0x3c/0x390
  kthread+0x11c/0x140
  ret_from_fork+0x3a/0x50

-> #0 (&xfs_nondir_ilock_class){++++}:
  down_read_nested+0x43/0x70
  xfs_free_eofblocks+0xa2/0x1e0
  xfs_fs_destroy_inode+0xac/0x270
  dispose_list+0x51/0x80
  prune_icache_sb+0x52/0x70
  super_cache_scan+0x127/0x1a0
  shrink_slab.part.47+0x1bd/0x590
  shrink_node+0x3b5/0x470
  balance_pgdat+0x158/0x3b0
  kswapd+0x1ba/0x600
  kthread+0x11c/0x140
  ret_from_fork+0x3a/0x50

other info that might help us debug this:

Chain exists of:
  &xfs_nondir_ilock_class --> &dmz->chunk_lock --> fs_reclaim

Possible unsafe locking scenario:

     CPU0                    CPU1
     ----                    ----
lock(fs_reclaim);
                             lock(&dmz->chunk_lock);
                             lock(fs_reclaim);
lock(&xfs_nondir_ilock_class);

*** DEADLOCK ***

3 locks held by kswapd0/84:
 #0: 00000000591c83ae (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
 #1: 000000000f8208f5 (shrinker_rwsem){++++}, at: shrink_slab.part.47+0x3f/0x590
 #2: 00000000cacefa54 (&type->s_umount_key#43){.+.+}, at: trylock_super+0x16/0x50

stack backtrace:
CPU: 7 PID: 84 Comm: kswapd0 Not tainted 4.18.0-rc1 #62
Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0 12/17/2015
Call Trace:
 dump_stack+0x85/0xcb
 print_circular_bug.isra.36+0x1ce/0x1db
 __lock_acquire+0x124e/0x1310
 lock_acquire+0x9f/0x1f0
 down_read_nested+0x43/0x70
 xfs_free_eofblocks+0xa2/0x1e0
 xfs_fs_destroy_inode+0xac/0x270
 dispose_list+0x51/0x80
 prune_icache_sb+0x52/0x70
 super_cache_scan+0x127/0x1a0
 shrink_slab.part.47+0x1bd/0x590
 shrink_node+0x3b5/0x470
 balance_pgdat+0x158/0x3b0
 kswapd+0x1ba/0x600
 kthread+0x11c/0x140
 ret_from_fork+0x3a/0x50
Reported-by: NMasato Suzuki <masato.suzuki@wdc.com>
Fixes: 4218a955 ("dm zoned: use GFP_NOIO in I/O path")
Cc: <stable@vger.kernel.org>
Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

2d0b2d64

08 6月, 2018 1 次提交

dm: adjust structure members to improve alignment · 72d711c8

由 Mike Snitzer 提交于 5月 22, 2018

Eliminate most holes in DM data structures that were modified by
commit 6f1c819c ("dm: convert to bioset_init()/mempool_init()").
Also prevent structure members from unnecessarily spanning cache
lines.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

72d711c8

31 5月, 2018 1 次提交

dm: convert to bioset_init()/mempool_init() · 6f1c819c

由 Kent Overstreet 提交于 5月 20, 2018

Convert dm to embedded bio sets.
Acked-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NKent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

6f1c819c

05 4月, 2018 1 次提交

dm: remove fmode_t argument from .prepare_ioctl hook · 5bd5e8d8

由 Mike Snitzer 提交于 4月 03, 2018

Use the fmode_t that is passed to dm_blk_ioctl() rather than
inconsistently (varies across targets) drop it on the floor by
overriding it with the fmode_t stored in 'struct dm_dev'.

All the persistent reservation functions weren't using the fmode_t they
got back from .prepare_ioctl so remove them.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

5bd5e8d8

17 1月, 2018 1 次提交
- M
  dm: backfill missing calls to mutex_destroy() · d5ffebdd
  由 Mike Snitzer 提交于 1月 05, 2018
```
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
```
  d5ffebdd
11 11月, 2017 1 次提交

dm zoned: ignore last smaller runt zone · 114e0259

由 Damien Le Moal 提交于 10月 28, 2017

The SCSI layer allows ZBC drives to have a smaller last runt zone. For
such a device, specifying the entire capacity for a dm-zoned target
table entry fails because the specified capacity is not aligned on a
device zone size indicated in the request queue structure of the
device.

Fix this problem by ignoring the last runt zone in the entry length
when seting up the dm-zoned target (ctr method) and when iterating table
entries of the target (iterate_devices method). This allows dm-zoned
users to still easily setup a target using the entire device capacity
(as mandated by dm-zoned) or the aligned capacity excluding the last
runt zone.

While at it, replace direct references to the device queue chunk_sectors
limit with calls to the accessor blk_queue_zone_sectors().
Reported-by: NPeter Desnoyers <pjd@ccs.neu.edu>
Cc: stable@vger.kernel.org
Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

114e0259

24 8月, 2017 1 次提交

block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992

由 Christoph Hellwig 提交于 8月 23, 2017

This way we don't need a block_device structure to submit I/O.  The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open.  Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).

For the actual I/O path all that we need is the gendisk, which exists
once per block device.  But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.

Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

74d46992

27 7月, 2017 1 次提交

dm zoned: use GFP_NOIO in I/O path · 4218a955

由 Damien Le Moal 提交于 7月 24, 2017

Use GFP_NOIO for memory allocations in the I/O path.  Other memory
allocations in the initialization path can use GFP_KERNEL.
Reported-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

4218a955

26 7月, 2017 1 次提交

dm zoned: remove test for impossible REQ_OP_FLUSH conditions · edbe9597

由 Mikulas Patocka 提交于 7月 21, 2017

The value REQ_OP_FLUSH is only used by the block code for
request-based devices.

Remove the tests for REQ_OP_FLUSH from the bio-based dm-zoned-target.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

edbe9597

19 6月, 2017 1 次提交

dm zoned: drive-managed zoned block device target · 3b1a94c8

由 Damien Le Moal 提交于 6月 07, 2017

The dm-zoned device mapper target provides transparent write access
to zoned block devices (ZBC and ZAC compliant block devices).
dm-zoned hides to the device user (a file system or an application
doing raw block device accesses) any constraint imposed on write
requests by the device, equivalent to a drive-managed zoned block
device model.

Write requests are processed using a combination of on-disk buffering
using the device conventional zones and direct in-place processing for
requests aligned to a zone sequential write pointer position.
A background reclaim process implemented using dm_kcopyd_copy ensures
that conventional zones are always available for executing unaligned
write requests. The reclaim process overhead is minimized by managing
buffer zones in a least-recently-written order and first targeting the
oldest buffer zones. Doing so, blocks under regular write access (such
as metadata blocks of a file system) remain stored in conventional
zones, resulting in no apparent overhead.

dm-zoned implementation focus on simplicity and on minimizing overhead
(CPU, memory and storage overhead). For a 14TB host-managed disk with
256 MB zones, dm-zoned memory usage per disk instance is at most about
3 MB and as little as 5 zones will be used internally for storing metadata
and performing buffer zone reclaim operations. This is achieved using
zone level indirection rather than a full block indirection system for
managing block movement between zones.

dm-zoned primary target is host-managed zoned block devices but it can
also be used with host-aware device models to mitigate potential
device-side performance degradation due to excessive random writing.

Zoned block devices can be formatted and checked for use with the dm-zoned
target using the dmzadm utility available at:

https://github.com/hgst/dm-zoned-toolsSigned-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com>
[Mike Snitzer partly refactored Damien's original work to cleanup the code]
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

3b1a94c8

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功