提交 · 697e23495c94f0380c1ed8b11f830b92b64c99ea · openeuler / Kernel

03 10月, 2020 5 次提交

bcache: explicitly make cache_set only have single cache · 697e2349

由 Coly Li 提交于 10月 01, 2020

Currently although the bcache code has a framework for multiple caches
in a cache set, but indeed the multiple caches never completed and users
use md raid1 for multiple copies of the cached data.

This patch does the following change in struct cache_set, to explicitly
make a cache_set only have single cache,
- Change pointer array "*cache[MAX_CACHES_PER_SET]" to a single pointer
  "*cache".
- Remove pointer array "*cache_by_alloc[MAX_CACHES_PER_SET]".
- Remove "caches_loaded".

Now the code looks as exactly what it does in practic: only one cache is
used in the cache set.
Signed-off-by: NColy Li <colyli@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

697e2349

bcache: remove 'int n' from parameter list of bch_bucket_alloc_set() · 17e4aed8

由 Coly Li 提交于 10月 01, 2020

The parameter 'int n' from bch_bucket_alloc_set() is not cleared
defined. From the code comments n is the number of buckets to alloc, but
from the code itself 'n' is the maximum cache to iterate. Indeed all the
locations where bch_bucket_alloc_set() is called, 'n' is alwasy 1.

This patch removes the confused and unnecessary 'int n' from parameter
list of  bch_bucket_alloc_set(), and explicitly allocates only 1 bucket
for its caller.
Signed-off-by: NColy Li <colyli@suse.de>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

17e4aed8

bcache: Convert to DEFINE_SHOW_ATTRIBUTE · 84e5d136

由 Qinglang Miao 提交于 10月 01, 2020

Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.

As inode->iprivate equals to third parameter of
debugfs_create_file() which is NULL. So it's equivalent
to original code logic.
Signed-off-by: NQinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: NColy Li <colyli@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

84e5d136

bcache: check c->root with IS_ERR_OR_NULL() in mca_reserve() · 7e59c506

由 Dongsheng Yang 提交于 10月 01, 2020

In mca_reserve(c) macro, we are checking root whether is NULL or not.
But that's not enough, when we read the root node in run_cache_set(),
if we got an error in bch_btree_node_read_done(), we will return
ERR_PTR(-EIO) to c->root.

And then we will go continue to unregister, but before calling
unregister_shrinker(&c->shrink), there is a possibility to call
bch_mca_count(), and we would get a crash with call trace like that:

[ 2149.876008] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000b5
... ...
[ 2150.598931] Call trace:
[ 2150.606439]  bch_mca_count+0x58/0x98 [escache]
[ 2150.615866]  do_shrink_slab+0x54/0x310
[ 2150.624429]  shrink_slab+0x248/0x2d0
[ 2150.632633]  drop_slab_node+0x54/0x88
[ 2150.640746]  drop_slab+0x50/0x88
[ 2150.648228]  drop_caches_sysctl_handler+0xf0/0x118
[ 2150.657219]  proc_sys_call_handler.isra.18+0xb8/0x110
[ 2150.666342]  proc_sys_write+0x40/0x50
[ 2150.673889]  __vfs_write+0x48/0x90
[ 2150.681095]  vfs_write+0xac/0x1b8
[ 2150.688145]  ksys_write+0x6c/0xd0
[ 2150.695127]  __arm64_sys_write+0x24/0x30
[ 2150.702749]  el0_svc_handler+0xa0/0x128
[ 2150.710296]  el0_svc+0x8/0xc
Signed-off-by: NDongsheng Yang <dongsheng.yang@easystack.cn>
Signed-off-by: NColy Li <colyli@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

7e59c506

bcache: share register sysfs with async register · a58e88bf

由 Coly Li 提交于 10月 01, 2020

Previously the experimental async registration uses a separate sysfs
file register_async. Now the async registration code seems working well
for a while, we can do furtuher testing with it now.

This patch changes the async bcache registration shares the same sysfs
file /sys/fs/bcache/register (and register_quiet). Async registration
will be default behavior if BCACHE_ASYNC_REGISTRATION is set in kernel
configure. By default, BCACHE_ASYNC_REGISTRATION is not configured yet.
Signed-off-by: NColy Li <colyli@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

a58e88bf

25 9月, 2020 19 次提交

md/raid10: improve discard request for far layout · d3ee2d84

由 Xiao Ni 提交于 9月 02, 2020

For far layout, the discard region is not continuous on disks. So it needs
far copies r10bio to cover all regions. It needs a way to know all r10bios
have finish or not. Similar with raid10_sync_request, only the first r10bio
master_bio records the discard bio. Other r10bios master_bio record the
first r10bio. The first r10bio can finish after other r10bios finish and
then return the discard bio.
Signed-off-by: NXiao Ni <xni@redhat.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

d3ee2d84

md/raid10: improve raid10 discard request · bcc90d28

由 Xiao Ni 提交于 9月 02, 2020

Now the discard request is split by chunk size. So it takes a long time
to finish mkfs on disks which support discard function. This patch improve
handling raid10 discard request. It uses the similar way with patch
29efc390 (md/md0: optimize raid0 discard handling).

But it's a little complex than raid0. Because raid10 has different layout.
If raid10 is offset layout and the discard request is smaller than stripe
size. There are some holes when we submit discard bio to underlayer disks.

For example: five disks (disk1 - disk5)
D01 D02 D03 D04 D05
D05 D01 D02 D03 D04
D06 D07 D08 D09 D10
D10 D06 D07 D08 D09
The discard bio just wants to discard from D03 to D10. For disk3, there is
a hole between D03 and D08. For disk4, there is a hole between D04 and D09.
D03 is a chunk, raid10_write_request can handle one chunk perfectly. So
the part that is not aligned with stripe size is still handled by
raid10_write_request.

If reshape is running when discard bio comes and the discard bio spans the
reshape position, raid10_write_request is responsible to handle this
discard bio.

I did a test with this patch set.
Without patch:
time mkfs.xfs /dev/md0
real4m39.775s
user0m0.000s
sys0m0.298s

With patch:
time mkfs.xfs /dev/md0
real0m0.105s
user0m0.000s
sys0m0.007s

nvme3n1           259:1    0   477G  0 disk
└─nvme3n1p1       259:10   0    50G  0 part
nvme4n1           259:2    0   477G  0 disk
└─nvme4n1p1       259:11   0    50G  0 part
nvme5n1           259:6    0   477G  0 disk
└─nvme5n1p1       259:12   0    50G  0 part
nvme2n1           259:9    0   477G  0 disk
└─nvme2n1p1       259:15   0    50G  0 part
nvme0n1           259:13   0   477G  0 disk
└─nvme0n1p1       259:14   0    50G  0 part
Reviewed-by: NColy Li <colyli@suse.de>
Reviewed-by: NGuoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: NXiao Ni <xni@redhat.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

bcc90d28

md/raid10: pull codes that wait for blocked dev into one function · f046f5d0

由 Xiao Ni 提交于 8月 25, 2020

The following patch will reuse these logics, so pull the same codes into
one function.
Signed-off-by: NXiao Ni <xni@redhat.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

f046f5d0

md/raid10: extend r10bio devs to raid disks · 8650a889

由 Xiao Ni 提交于 8月 25, 2020

Now it allocs r10bio->devs[conf->copies]. Discard bio needs to submit
to all member disks and it needs to use r10bio. So extend to
r10bio->devs[geo.raid_disks].
Reviewed-by: NColy Li <colyli@suse.de>
Signed-off-by: NXiao Ni <xni@redhat.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

8650a889

md: add md_submit_discard_bio() for submitting discard bio · 2628089b

由 Xiao Ni 提交于 8月 25, 2020

Move these logic from raid0.c to md.c, so that we can also use it in
raid10.c.
Reviewed-by: NColy Li <colyli@suse.de>
Reviewed-by: NGuoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: NXiao Ni <xni@redhat.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

2628089b

md: Simplify code with existing definition RESYNC_SECTORS in raid10.c · e287308b

由 Zhen Lei 提交于 8月 21, 2020

#define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)

"RESYNC_BLOCK_SIZE/512" is equal to "RESYNC_BLOCK_SIZE >> 9", replace it
with RESYNC_SECTORS.
Signed-off-by: NZhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

e287308b

md/raid5: reallocate page array after setting new stripe_size · 38912584

由 Yufen Yu 提交于 8月 20, 2020

When try to resize stripe_size, we also need to free old
shared page array and allocate new.
Signed-off-by: NYufen Yu <yuyufen@huawei.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

38912584

md/raid5: resize stripe_head when reshape array · f16acaf3

由 Yufen Yu 提交于 8月 20, 2020

When reshape array, we try to reuse shared pages of old stripe_head,
and allocate more for the new one if needed.
Signed-off-by: NYufen Yu <yuyufen@huawei.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

f16acaf3

md/raid5: let multiple devices of stripe_head share page · 046169f0

由 Yufen Yu 提交于 8月 20, 2020

In current implementation, grow_buffers() uses alloc_page() to
allocate the buffers for each stripe_head, i.e. allocate a page
for each dev[i] in stripe_head.

After setting stripe_size as a configurable value by writing
sysfs entry, it means that we always allocate 64K buffers, but
just use 4K of them when stripe_size is 4K in 64KB arm64.

To avoid wasting memory, we try to let multiple sh->dev share
one real page. That means, multiple sh->dev[i].page will point
to the only page with different offset. Example of 64K PAGE_SIZE
and 4K stripe_size as following:

                    64K PAGE_SIZE
          +---+---+---+---+------------------------------+
          |   |   |   |   |
          |   |   |   |   |
          +-+-+-+-+-+-+-+-+------------------------------+
            ^   ^   ^   ^
            |   |   |   +----------------------------+
            |   |   |                                |
            |   |   +-------------------+            |
            |   |                       |            |
            |   +----------+            |            |
            |              |            |            |
            +-+            |            |            |
              |            |            |            |
        +-----+-----+------+-----+------+-----+------+------+
sh      | offset(0) | offset(4K) | offset(8K) | offset(12K) |
 +      +-----------+------------+------------+-------------+
 +----> dev[0].page  dev[1].page  dev[2].page  dev[3].page

A new 'pages' array will be added into stripe_head to record shared
page used by this stripe_head. Allocate them when grow_buffers()
and free them when shrink_buffers().

After trying to share page, the users of sh->dev[i].page need to take
care of the related page offset: page of issued bio and page passed
to xor compution functions. But thanks for previous different page offset
supported. Here, we just need to set correct dev[i].offset.
Signed-off-by: NYufen Yu <yuyufen@huawei.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

046169f0

md/raid6: let async recovery function support different page offset · 4f86ff55

由 Yufen Yu 提交于 8月 20, 2020

For now, asynchronous raid6 recovery calculate functions are require
common offset for pages. But, we expect them to support different page
offset after introducing stripe shared page. Do that by simplily adding
page offset where each page address are referred. Then, replace the
old interface with the new ones in raid6 and raid6test.
Signed-off-by: NYufen Yu <yuyufen@huawei.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

4f86ff55

md/raid6: let syndrome computor support different page offset · d69454bc

由 Yufen Yu 提交于 8月 20, 2020

For now, syndrome compute functions require common offset in the pages
array. However, we expect them to support different offset when try to
use shared page in the following. Simplily covert them by adding page
offset where each page address are referred.

Since the only caller of async_gen_syndrome() and async_syndrome_val()
are in raid6, we don't want to reserve the old interface but modify the
interface directly. After that, replacing old interfaces with new ones
for raid6 and raid6test.
Signed-off-by: NYufen Yu <yuyufen@huawei.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

d69454bc

md/raid5: convert to new xor compution interface · a7c224a8

由 Yufen Yu 提交于 8月 20, 2020

We try to replace async_xor() and async_xor_val() with the new
introduced interface async_xor_offs() and async_xor_val_offs()
for raid456.
Signed-off-by: NYufen Yu <yuyufen@huawei.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

a7c224a8

md/raid5: make async_copy_data() to support different page offset · 248728dd

由 Yufen Yu 提交于 8月 20, 2020

ops_run_biofill() and ops_run_biodrain() will call async_copy_data()
to copy sh->dev[i].page from or to bio page. For now, it implies the
offset of dev[i].page is 0. But we want to support different page offset
in the following.

Thus, pass page offset to these functions and replace 'page_offset'
with 'page_offset + poff'.

No functional change.
Signed-off-by: NYufen Yu <yuyufen@huawei.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

248728dd

md/raid5: add a new member of offset into r5dev · 7aba13b7

由 Yufen Yu 提交于 8月 20, 2020

Add a new member of offset into struct r5dev. It indicates the
offset of related dev[i].page. For now, since each device have a
privated page, the value is always 0. Thus, we set offset as 0
when allcate page in grow_buffers() and resize_stripes().

To support following different page offset, we try to use the page
offset rather than '0' directly for async_memcpy() and ops_run_io().

We try to support different page offset for xor compution functions
in the following. To avoid repeatly allocate a new array each time,
we add a memory region into scribble buffer to record offset.

No functional change.
Signed-off-by: NYufen Yu <yuyufen@huawei.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

7aba13b7

md: only calculate blocksize once and use i_blocksize() · 313b825f

由 Xianting Tian 提交于 8月 18, 2020

We alreday has the interface i_blocksize(), which can be used
to get blocksize, so use it.
Only calculate blocksize once and use it within read_page().
Signed-off-by: NXianting Tian <tian.xianting@h3c.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

313b825f

bdi: replace BDI_CAP_STABLE_WRITES with a queue and a sb flag · 1cb039f3

由 Christoph Hellwig 提交于 9月 24, 2020

The BDI_CAP_STABLE_WRITES is one of the few bits of information in the
backing_dev_info shared between the block drivers and the writeback code.
To help untangling the dependency replace it with a queue flag and a
superblock flag derived from it.  This also helps with the case of e.g.
a file system requiring stable writes due to its own checksumming, but
not forcing it on other users of the block device like the swap code.

One downside is that we an't support the stable_pages_required bdi
attribute in sysfs anymore.  It is replaced with a queue attribute which
also is writable for easier testing.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

1cb039f3

block: lift setting the readahead size into the block layer · c2e4cd57

由 Christoph Hellwig 提交于 9月 24, 2020

Drivers shouldn't really mess with the readahead size, as that is a VM
concept.  Instead set it based on the optimal I/O size by lifting the
algorithm from the md driver when registering the disk.  Also set
bdi->io_pages there as well by applying the same scheme based on
max_sectors.  To ensure the limits work well for stacking drivers a
new helper is added to update the readahead limits from the block
limits, which is also called from disk_stack_limits.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NMike Snitzer <snitzer@redhat.com>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Acked-by: NColy Li <colyli@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c2e4cd57

md: update the optimal I/O size on reshape · 16ef5101

由 Christoph Hellwig 提交于 9月 24, 2020

The raid5 and raid10 drivers currently update the read-ahead size,
but not the optimal I/O size on reshape.  To prepare for deriving the
read-ahead size from the optimal I/O size make sure it is updated
as well.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Acked-by: NSong Liu <song@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

16ef5101

bcache: inherit the optimal I/O size · 5d4ce78b

由 Christoph Hellwig 提交于 9月 24, 2020

Inherit the optimal I/O size setting just like the readahead window,
as any reason to do larger I/O does not apply to just readahead.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Acked-by: NColy Li <colyli@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

5d4ce78b

20 9月, 2020 2 次提交

dm: Call proper helper to determine dax support · e2ec5128

由 Jan Kara 提交于 9月 20, 2020

DM was calling generic_fsdax_supported() to determine whether a device
referenced in the DM table supports DAX. However this is a helper for "leaf" device drivers so that
they don't have to duplicate common generic checks. High level code
should call dax_supported() helper which that calls into appropriate
helper for the particular device. This problem manifested itself as
kernel messages:

dm-3: error: dax access failed (-95)

when lvm2-testsuite run in cases where a DM device was stacked on top of
another DM device.

Fixes: 7bf7eac8 ("dax: Arrange for dax_supported check to span multiple devices")
Cc: <stable@vger.kernel.org>
Tested-by: NAdrian Huang <ahuang12@lenovo.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Acked-by: NMike Snitzer <snitzer@redhat.com>
Reported-by: Nkernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/160061715195.13131.5503173247632041975.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>

e2ec5128

dm/dax: Fix table reference counts · 02186d88

由 Dan Williams 提交于 9月 18, 2020

A recent fix to the dm_dax_supported() flow uncovered a latent bug. When
dm_get_live_table() fails it is still required to drop the
srcu_read_lock(). Without this change the lvm2 test-suite triggers this
warning:

    # lvm2-testsuite --only pvmove-abort-all.sh

    WARNING: lock held when returning to user space!
    5.9.0-rc5+ #251 Tainted: G           OE
    ------------------------------------------------
    lvm/1318 is leaving the kernel with locks still held!
    1 lock held by lvm/1318:
     #0: ffff9372abb5a340 (&md->io_barrier){....}-{0:0}, at: dm_get_live_table+0x5/0xb0 [dm_mod]

...and later on this hang signature:

    INFO: task lvm:1344 blocked for more than 122 seconds.
          Tainted: G           OE     5.9.0-rc5+ #251
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    task:lvm             state:D stack:    0 pid: 1344 ppid:     1 flags:0x00004000
    Call Trace:
     __schedule+0x45f/0xa80
     ? finish_task_switch+0x249/0x2c0
     ? wait_for_completion+0x86/0x110
     schedule+0x5f/0xd0
     schedule_timeout+0x212/0x2a0
     ? __schedule+0x467/0xa80
     ? wait_for_completion+0x86/0x110
     wait_for_completion+0xb0/0x110
     __synchronize_srcu+0xd1/0x160
     ? __bpf_trace_rcu_utilization+0x10/0x10
     __dm_suspend+0x6d/0x210 [dm_mod]
     dm_suspend+0xf6/0x140 [dm_mod]

Fixes: 7bf7eac8 ("dax: Arrange for dax_supported check to span multiple devices")
Cc: <stable@vger.kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Reported-by: NAdrian Huang <ahuang12@lenovo.com>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Tested-by: NAdrian Huang <ahuang12@lenovo.com>
Link: https://lore.kernel.org/r/160045867590.25663.7548541079217827340.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>

02186d88

12 9月, 2020 2 次提交

bcache: use part_[begin|end]_io_acct instead of disk_[begin|end]_io_acct · 0806e60f

由 Song Liu 提交于 8月 31, 2020

This enables proper statistics in /proc/diskstats for bcache partitions.
Signed-off-by: NSong Liu <songliubraving@fb.com>
Reviewed-by: NColy Li <colyli@suse.de>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

0806e60f

md: use part_[begin|end]_io_acct instead of disk_[begin|end]_io_acct · 00fe60ea

由 Song Liu 提交于 8月 31, 2020

This enables proper statistics in /proc/diskstats for md partitions.
Signed-off-by: NSong Liu <songliubraving@fb.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

00fe60ea

10 9月, 2020 1 次提交

md: use bdev_check_media_change · 818077d6

由 Christoph Hellwig 提交于 9月 08, 2020

The md driver does not have a ->revalidate_disk method, so it can just
use bdev_check_media_change without any additional changes.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Acked-by: NSong Liu <song@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

818077d6

03 9月, 2020 3 次提交

dm thin metadata: Fix use-after-free in dm_bm_set_read_only · 3a653b20

由 Ye Bin 提交于 9月 01, 2020

The following error ocurred when testing disk online/offline:

[  301.798344] device-mapper: thin: 253:5: aborting current metadata transaction
[  301.848441] device-mapper: thin: 253:5: failed to abort metadata transaction
[  301.849206] Aborting journal on device dm-26-8.
[  301.850489] EXT4-fs error (device dm-26) in __ext4_new_inode:943: Journal has aborted
[  301.851095] EXT4-fs (dm-26): Delayed block allocation failed for inode 398742 at logical offset 181 with max blocks 19 with error 30
[  301.854476] BUG: KASAN: use-after-free in dm_bm_set_read_only+0x3a/0x40 [dm_persistent_data]

Reason is:

 metadata_operation_failed
    abort_transaction
        dm_pool_abort_metadata
	    __create_persistent_data_objects
	        r = __open_or_format_metadata
	        if (r) --> If failed will free pmd->bm but pmd->bm not set NULL
		    dm_block_manager_destroy(pmd->bm);
    set_pool_mode
	dm_pool_metadata_read_only(pool->pmd);
	dm_bm_set_read_only(pmd->bm);  --> use-after-free

Add checks to see if pmd->bm is NULL in dm_bm_set_read_only and
dm_bm_set_read_write functions.  If bm is NULL it means creating the
bm failed and so dm_bm_is_read_only must return true.
Signed-off-by: NYe Bin <yebin10@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

3a653b20

dm thin metadata: Avoid returning cmd->bm wild pointer on error · 219403d7

由 Ye Bin 提交于 9月 01, 2020

Maybe __create_persistent_data_objects() caller will use PTR_ERR as a
pointer, it will lead to some strange things.
Signed-off-by: NYe Bin <yebin10@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

219403d7

dm cache metadata: Avoid returning cmd->bm wild pointer on error · d16ff19e

由 Ye Bin 提交于 9月 01, 2020

Maybe __create_persistent_data_objects() caller will use PTR_ERR as a
pointer, it will lead to some strange things.
Signed-off-by: NYe Bin <yebin10@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

d16ff19e

02 9月, 2020 7 次提交

block: remove revalidate_disk() · de09077c

由 Christoph Hellwig 提交于 9月 01, 2020

Remove the now unused helper.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Acked-by: NSong Liu <song@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

de09077c

block: add a new revalidate_disk_size helper · 659e56ba

由 Christoph Hellwig 提交于 9月 01, 2020

revalidate_disk is a relative awkward helper for driver use, as it first
calls an optional driver method and then updates the block device size,
while most callers either don't need the method call at all, or want to
keep state between the caller and the called method.

Add a revalidate_disk_size helper that just performs the update of the
block device size from the gendisk one, and switch all drivers that do
not implement ->revalidate_disk to use the new helper instead of
revalidate_disk()
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Acked-by: NSong Liu <song@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

659e56ba

block: fix locking for struct block_device size updates · c2b4bb8c

由 Christoph Hellwig 提交于 8月 23, 2020

Two different callers use two different mutexes for updating the
block device size, which obviously doesn't help to actually protect
against concurrent updates from the different callers.  In addition
one of the locks, bd_mutex is rather prone to deadlocks with other
parts of the block stack that use it for high level synchronization.

Switch to using a new spinlock protecting just the size updates, as
that is all we need, and make sure everyone does the update through
the proper helper.

This fixes a bug reported with the nvme revalidating disks during a
hot removal operation, which can currently deadlock on bd_mutex.
Reported-by: NXianting Tian <xianting_tian@126.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c2b4bb8c

dm integrity: fix error reporting in bitmap mode after creation · e27fec66

由 Mikulas Patocka 提交于 8月 31, 2020

The dm-integrity target did not report errors in bitmap mode just after
creation. The reason is that the function integrity_recalc didn't clean up
ic->recalc_bitmap as it proceeded with recalculation.

Fix this by updating the bitmap accordingly -- the double shift serves
to rounddown.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Fixes: 468dfca3 ("dm integrity: add a bitmap mode")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

e27fec66

dm crypt: Initialize crypto wait structures · 7785a9e4

由 Damien Le Moal 提交于 8月 31, 2020

Use the DECLARE_CRYPTO_WAIT() macro to properly initialize the crypto
wait structures declared on stack before their use with
crypto_wait_req().

Fixes: 39d13a1a ("dm crypt: reuse eboiv skcipher for IV generation")
Fixes: bbb16584 ("dm crypt: Implement Elephant diffuser for Bitlocker compatibility")
Cc: stable@vger.kernel.org
Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

7785a9e4

dm mpath: fix racey management of PG initialization · c322ee93

由 Mike Snitzer 提交于 8月 24, 2020

Commit 935fcc56 ("dm mpath: only flush workqueue when needed")
changed flush_multipath_work() to avoid needless workqueue
flushing (of a multipath global workqueue). But that change didn't
realize the surrounding flush_multipath_work() code should also only
run if 'pg_init_in_progress' is set.

Fix this by only doing all of flush_multipath_work()'s PG init related
work if 'pg_init_in_progress' is set.

Otherwise multipath_wait_for_pg_init_completion() will run
unconditionally but the preceeding flush_workqueue(kmpath_handlerd)
may not. This could lead to deadlock (though only if kmpath_handlerd
never runs a corresponding work to decrement 'pg_init_in_progress').

It could also be, though highly unlikely, that the kmpath_handlerd
work that does PG init completes before 'pg_init_in_progress' is set,
and then an intervening DM table reload's multipath_postsuspend()
triggers flush_multipath_work().

Fixes: 935fcc56 ("dm mpath: only flush workqueue when needed")
Cc: stable@vger.kernel.org
Reported-by: NBen Marzinski <bmarzins@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

c322ee93

dm writecache: handle DAX to partitions on persistent memory correctly · f9e040ef

由 Mikulas Patocka 提交于 8月 24, 2020

The function dax_direct_access doesn't take partitions into account,
it always maps pages from the beginning of the device. Therefore,
persistent_memory_claim() must get the partition offset using
get_start_sect() and add it to the page offsets passed to
dax_direct_access().
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Fixes: 48debafe ("dm: add writecache target")
Cc: stable@vger.kernel.org # 4.18+
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

f9e040ef

28 8月, 2020 1 次提交

md/raid5: make sure stripe_size as power of two · 6af10a33

由 Yufen Yu 提交于 8月 20, 2020

Commit 3b5408b9 ("md/raid5: support config stripe_size by sysfs
entry") make stripe_size as a configurable value. It just requires
stripe_size as multiple of 4KB.

In fact, we should make sure stripe_size as power of two. Otherwise,
stripe_shift which is the result of ilog2 can not represent the real
stripe_size. Then, stripe_hash() and stripe_hash_locks_hash() may
get unexpected value.

Fixes: 3b5408b9 ("md/raid5: support config stripe_size by sysfs entry")
Signed-off-by: NYufen Yu <yuyufen@huawei.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

6af10a33

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功