Commits · f5b9a51db29c31f4e486b08d1d823d6f75f2c2c7 · openeuler / Kernel

15 Apr, 2021 18 commits

nvme: factor out nvme_ns_open and nvme_ns_release helpers · f5b9a51d

Committed by Christoph Hellwig on Apr 07, 2021

These will be reused for the per-namespace character devices.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

nvme: move nvme_ns_head_ops to multipath.c · 1496bd49

Committed by Christoph Hellwig on Apr 07, 2021

Move the multipath block_device_operations to multipath.c, where they
belong.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

nvme: factor out a nvme_tryget_ns_head helper · 871ca3ef

Committed by Christoph Hellwig on Apr 07, 2021

Add a helper to avoid open-coding ns_head->ref manipulations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
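
The "tryget" idea in the commit above can be illustrated outside the kernel. The following is a minimal userspace sketch, assuming C11 atomics in place of the kernel's kref; `struct ref_obj`, `ref_obj_tryget` and `ref_obj_put` are hypothetical names and are not taken from the nvme driver. It shows only the general pattern: take a reference unless the count has already reached zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an object with an embedded reference count. */
struct ref_obj {
    atomic_uint ref;
};

/* Take a reference only if the count has not already dropped to zero. */
static bool ref_obj_tryget(struct ref_obj *obj)
{
    unsigned int old;

    assert(obj != NULL);
    old = atomic_load(&obj->ref);
    while (old != 0) {
        /* On failure 'old' is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(&obj->ref, &old, old + 1))
            return true;
    }
    return false; /* the object is already being torn down */
}

/* Drop a reference; returns true when the caller released the last one. */
static bool ref_obj_put(struct ref_obj *obj)
{
    assert(obj != NULL);
    return atomic_fetch_sub(&obj->ref, 1) == 1;
}
```

Wrapping the check-and-increment in one helper keeps callers from open-coding the comparison against zero, which seems to be the motivation the commit message gives.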

nvme: move the ioctl code to a separate file · 2405252a

Committed by Christoph Hellwig on Apr 10, 2021

Split out the ioctl code from core.c into a new file.  Also update
copyrights while we're at it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

nvme: don't bother to look up a namespace for controller ioctls · 3557a440

Committed by Christoph Hellwig on Aug 14, 2020

Don't bother to look up a namespace just to drop it after retrieving the
controller for the multipath case.  Just look up a live controller for
the subsystem directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>

nvme: simplify block device ioctl handling for the !multipath case · 2f907f7f

Committed by Christoph Hellwig on Aug 14, 2020

Only use the existing ioctl handler for the multipath case, and add a
simpler one that reverts to the pre-multipath case for the non-shared
use case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>

nvme: simplify the compat ioctl handling · 89b3d6e6

Committed by Christoph Hellwig on Apr 08, 2021

Don't bother defining a separate compat_ioctl handler, and just handle
the NVME_IOCTL_SUBMIT_IO32 case inline.  Also only define it for those
ABIs (currently just i386 vs x86_64) that are affected.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>

nvme: factor out a nvme_ns_ioctl helper · a5d737f1

Committed by Christoph Hellwig on Aug 14, 2020

Factor out a helper for the namespace based ioctls.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

nvme: pass a user pointer to nvme_nvm_ioctl · d7790d37

Committed by Christoph Hellwig on Aug 14, 2020

Pass the proper user pointer instead of the not all that useful integer
representation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>

nvme: cleanup setting the disk name · 9953ab0c

Committed by Christoph Hellwig on Apr 07, 2021

Return false from nvme_set_disk_name and let the caller set the
non-multipath name instead of duplicating the naming information in two
places.  Also remove the pointless local variables for the disk name
and flags and the unneeded ctrl argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>

nvme: add a nvme_ns_head_multipath helper · 30897388

Committed by Minwoo Im on Apr 07, 2021

Move the multipath gendisk out of #ifdef CONFIG_NVME_MULTIPATH and add
a new nvme_ns_head_multipath that uses it to check if a ns_head has
a multipath device associated with it.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
[hch: added the IS_ENABLED, converted a few existing users]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

nvme: remove single trailing whitespace · 95d54bd1

Committed by Niklas Cassel on Apr 10, 2021

There is a single trailing whitespace in core.c.
Since this is just a single whitespace, the chances of this affecting
backports to stable should be quite low, so let's just remove it.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-multipath: remove single trailing whitespace · e234f1f8

Committed by Niklas Cassel on Apr 10, 2021

There is a single trailing whitespace in multipath.c.
Since this is just a single whitespace, the chances of this affecting
backports to stable should be quite low, so let's just remove it.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-pci: remove single trailing whitespace · 53dc180e

Committed by Niklas Cassel on Apr 10, 2021

There is a single trailing whitespace in pci.c.
Since this is just a single whitespace, the chances of this affecting
backports to stable should be quite low, so let's just remove it.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-pci: don't simple map sgl when sgls are disabled · e51183be

Committed by Niklas Cassel on Apr 09, 2021

According to the module parameter description for sgl_threshold,
a value of 0 means that SGLs are disabled.

If SGLs are disabled, we should respect that, even for the case
where the request is made up of a single physical segment.

Fixes: 29791057 ("nvme-pci: optimize mapping single segment requests using SGLs")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
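
The rule stated in the commit above (a threshold of 0 disables SGLs, even for single-segment requests) can be sketched as a small predicate. This is a simplified illustration under stated assumptions, not the driver's actual decision logic: `struct sketch_req`, `sketch_use_sgl` and the `ctrl_has_sgl` flag are hypothetical, and the real driver considers more conditions than shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request summary; not the driver's real data structures. */
struct sketch_req {
    unsigned int nr_segments;   /* physical segments in the request */
    unsigned int avg_seg_size;  /* average segment size in bytes */
};

static bool sketch_use_sgl(unsigned int sgl_threshold, bool ctrl_has_sgl,
                           const struct sketch_req *req)
{
    assert(req != NULL);

    /* A threshold of 0 disables SGLs entirely, single segment or not. */
    if (sgl_threshold == 0 || !ctrl_has_sgl)
        return false;

    /* In this sketch, single-segment requests may take the simple SGL path. */
    if (req->nr_segments == 1)
        return true;

    /* Otherwise use SGLs only when segments are large enough on average. */
    return req->avg_seg_size >= sgl_threshold;
}
```

The point of the sketch is the ordering: the "disabled" check comes before the single-segment shortcut, so the shortcut can no longer bypass it.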

nvmet: fix a spelling mistake "nubmer" -> "number" · ccc1003b

Committed by Colin Ian King on Apr 07, 2021

There is a spelling mistake in a pr_err error message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvmet-fc: simplify nvmet_fc_alloc_hostport · 0d8ddeea

Committed by Amit Engel on Mar 22, 2021

Once a host is already created, avoid allocating additional hostports that
will be thrown away. Add a helper function to handle host search.

Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Amit Engel <amit.engel@dell.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvmet-tcp: fix a segmentation fault during io parsing error · bdaf1327

Committed by Elad Grupi on Mar 31, 2021

In case there is an io that contains inline data and it goes to the
parsing error flow, the command response will free the command and iov
before clearing the data on the socket buffer.
This patch delays the command response until the receive flow is completed.

Fixes: 872d26a3 ("nvmet-tcp: add NVMe over TCP target driver")
Signed-off-by: Elad Grupi <elad.grupi@dell.com>
Signed-off-by: Hou Pu <houpu.main@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

13 Apr, 2021 4 commits

lightnvm: deprecated OCSSD support and schedule it for removal in Linux 5.15 · f8ee34a9

Committed by Christoph Hellwig on Apr 13, 2021

Lightnvm was an innovative idea to expose more low-level control over SSDs.
But it failed to get properly standardized and remains a non-standardized
extension to NVMe that requires vendor specific quirks for a few now mostly
obsolete SSD devices. The standardized ZNS command set for NVMe has taken
over a lot of the approaches and allows for fully standardized operation.

Remove the Linux code to support open channel SSDs as the few production
deployments of the above mentioned SSDs are using userspace driver stacks
instead of the fairly limited Linux support.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Link: https://lore.kernel.org/r/20210413105257.159260-5-matias.bjorling@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: remove duplicate include in lightnvm.h · 655cdafd

Committed by Zhang Yunkai on Apr 13, 2021

The includes of 'linux/blkdev.h' and 'uapi/linux/lightnvm.h' in 'lightnvm.h'
are duplicated. They are also included on the 5th and 7th lines.

Signed-off-by: Zhang Yunkai <zhang.yunkai@zte.com.cn>
Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
Link: https://lore.kernel.org/r/20210413105257.159260-4-matias.bjorling@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: return the correct return value · 1c6b0bc7

Committed by Tian Tao on Apr 13, 2021

memdup_user can fail with two different error values, so use PTR_ERR on
the returned pointer to get the correct return value.

Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
Link: https://lore.kernel.org/r/20210413105257.159260-3-matias.bjorling@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
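
The error-pointer convention that the commit above relies on can be imitated in userspace. The sketch below is an assumption-laden illustration: `err_ptr`, `ptr_err` and `is_err` are lowercase stand-ins for the kernel's ERR_PTR, PTR_ERR and IS_ERR, and `dup_buffer` and `consume_buffer` are hypothetical functions, not lightnvm code. It shows why propagating the decoded error beats returning one hard-coded value.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_ERRNO 4095

/* Userspace imitations of the kernel's error-pointer helpers. */
static void *err_ptr(intptr_t error) { return (void *)error; }
static intptr_t ptr_err(const void *ptr) { return (intptr_t)ptr; }
static int is_err(const void *ptr)
{
    /* Error codes occupy the top SKETCH_MAX_ERRNO values of the address space. */
    return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical copy helper that can fail in two distinct ways. */
static void *dup_buffer(const void *src, size_t len)
{
    void *copy;

    assert(len > 0);
    if (src == NULL)
        return err_ptr(-EFAULT);   /* bad source address */
    copy = malloc(len);
    if (copy == NULL)
        return err_ptr(-ENOMEM);   /* allocation failure */
    memcpy(copy, src, len);
    return copy;
}

/* Propagate whichever error dup_buffer reported instead of guessing one. */
static long consume_buffer(const void *src, size_t len)
{
    void *buf = dup_buffer(src, len);

    if (is_err(buf))
        return (long)ptr_err(buf);
    free(buf);
    return 0;
}
```

A caller that always returned -ENOMEM on failure would misreport the bad-address case; decoding the pointer keeps both causes distinguishable.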

lightnvm: use kobj_to_dev() · 327e1d29

Committed by Chaitanya Kulkarni on Apr 13, 2021

This fixes the following coccicheck warning:

drivers/nvme//host/lightnvm.c:1243:60-61: WARNING opportunity for
kobj_to_dev()

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
Link: https://lore.kernel.org/r/20210413105257.159260-2-matias.bjorling@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

12 Apr, 2021 3 commits

block: remove the -ERESTARTSYS handling in blkdev_get_by_dev · a8ed1a06

Committed by Christoph Hellwig on Apr 12, 2021

Now that md has been cleaned up we can get rid of this hack.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

null_blk: add option for managing virtual boundary · cee1b215

Committed by Max Gurtovoy on Apr 12, 2021

This will enable changing the virtual boundary of null blk devices. Until
now, null blk devices didn't have any restriction on the scatter/gather
elements received from the block layer. Add a module parameter and a
configfs option that will control the virtual boundary. This will
enable testing the efficiency of the block layer bounce buffer when
a suitable application sends discontiguous IO to the given device.

Initial testing with patched FIO showed the following results (64 jobs,
128 iodepth, 1 nullb device):

IO size     READ (virt=false)    READ (virt=true)   Write (virt=false)   Write (virt=true)
----------  -------------------  -----------------  -------------------  -------------------
1k          10.7M                8482k              10.8M                8471k
2k          10.4M                8266k              10.4M                8271k
4k          10.4M                8274k              10.3M                8226k
8k          10.2M                8131k              9800k                7933k
16k         9567k                7764k              8081k                6828k
32k         8865k                7309k              5570k                5153k
64k         7695k                6586k              2682k                2617k
128k        5346k                5489k              1320k                1296k

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Link: https://lore.kernel.org/r/20210412095523.278632-1-mgurtovoy@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

gdrom: fix compilation error · eb87e4e9

Committed by Chaitanya Kulkarni on Apr 11, 2021

Use the right name for the struct request variable that removes the
following compilation error :-

make --silent --keep-going --jobs=8
O=/home/tuxbuild/.cache/tuxmake/builds/1/tmp ARCH=sh
CROSS_COMPILE=sh4-linux-gnu- 'CC=sccache sh4-linux-gnu-gcc'
'HOSTCC=sccache gcc'

In file included from /builds/linux/include/linux/scatterlist.h:9,
                 from /builds/linux/include/linux/dma-mapping.h:10,
                 from /builds/linux/drivers/cdrom/gdrom.c:16:
/builds/linux/drivers/cdrom/gdrom.c: In function 'gdrom_readdisk_dma':
/builds/linux/drivers/cdrom/gdrom.c:586:61: error: 'rq' undeclared
(first use in this function)
  586 |  __raw_writel(page_to_phys(bio_page(req->bio)) + bio_offset(rq->bio),
      |                                                             ^~

Fixes: 1d2c8200 ("gdrom: support highmem")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

11 Apr, 2021 7 commits

bcache: fix a regression of code compiling failure in debug.c · 33ec5dfe

Committed by Coly Li on Apr 11, 2021

The patch "bcache: remove PTR_CACHE" introduces a compiling failure in
debug.c with following error message,
  In file included from drivers/md/bcache/bcache.h:182:0,
                   from drivers/md/bcache/debug.c:9:
  drivers/md/bcache/debug.c: In function 'bch_btree_verify':
  drivers/md/bcache/debug.c:53:19: error: 'c' undeclared (first use in
  this function)
    bio_set_dev(bio, c->cache->bdev);
                     ^
This patch fixes the regression by replacing c->cache->bdev with
b->c->cache->bdev.

Signed-off-by: Coly Li <colyli@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210411134316.80274-8-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

bcache: Use 64-bit arithmetic instead of 32-bit · 62594f18

Committed by Gustavo A. R. Silva on Apr 11, 2021

Cast multiple variables to (int64_t) in order to give the compiler
complete information about the proper arithmetic to use. Notice that
these variables are being used in contexts that expect expressions of
type int64_t (64 bit, signed). And currently, such expressions are
being evaluated using 32-bit arithmetic.

Fixes: d0cf9503 ("octeontx2-pf: ethtool fec mode support")
Addresses-Coverity-ID: 1501724 ("Unintentional integer overflow")
Addresses-Coverity-ID: 1501725 ("Unintentional integer overflow")
Addresses-Coverity-ID: 1501726 ("Unintentional integer overflow")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210411134316.80274-7-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
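
The difference between evaluating a product in 32-bit and in 64-bit arithmetic, as described in the commit above, can be shown with a small sketch. Some assumptions apply: the function names are hypothetical, the operands are unsigned so that the wraparound is well defined in C (the commit concerns signed variables, where overflow is undefined behaviour), and `unsigned int` is assumed to be 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(int64_t) > sizeof(uint32_t),
              "sketch assumes the result type is wider than the operands");

/* The product is computed in 32-bit arithmetic and wraps before widening. */
static int64_t scale_narrow(uint32_t rate, uint32_t interval)
{
    return rate * interval;
}

/* Casting one operand first makes the whole multiplication 64-bit. */
static int64_t scale_wide(uint32_t rate, uint32_t interval)
{
    return (int64_t)rate * interval;
}
```

With both operands equal to 100000, `scale_wide` yields 10000000000 while `scale_narrow` yields 1410065408, which is the true product reduced modulo 2^32. Assigning to a 64-bit variable does not by itself widen the multiplication; the cast has to happen before the operator is applied.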

md: bcache: Trivial typo fixes in the file journal.c · 9c9b81c4

Committed by Bhaskar Chowdhury on Apr 11, 2021

s/condidate/candidate/
s/folowing/following/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210411134316.80274-6-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

md: bcache: avoid -Wempty-body warnings · be3bacec

Committed by Arnd Bergmann on Apr 11, 2021

Building with 'make W=1' shows a harmless warning for each user of the
EBUG_ON() macro:

drivers/md/bcache/bset.c: In function 'bch_btree_sort_partial':
drivers/md/bcache/util.h:30:55: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
   30 | #define EBUG_ON(cond)                   do { if (cond); } while (0)
      |                                                       ^
drivers/md/bcache/bset.c:1312:9: note: in expansion of macro 'EBUG_ON'
 1312 |         EBUG_ON(oldsize >= 0 && bch_count_data(b) != oldsize);
      |         ^~~~~~~

Reword the macro slightly to avoid the warning.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210411134316.80274-5-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
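
One way such a macro can be reworded is sketched below. This is an illustration only: `SKETCH_EBUG_ON`, `SKETCH_DEBUG`, `probe` and `run_check` are hypothetical names, and the exact wording chosen in the bcache patch may differ. The disabled variant still evaluates and type-checks its condition, but the `if` now has an explicit no-op statement as its body rather than a bare semicolon.

```c
#include <assert.h>

#ifdef SKETCH_DEBUG
/* Debug builds: actually enforce the condition. */
#define SKETCH_EBUG_ON(cond) assert(!(cond))
#else
/*
 * Non-debug builds: a bare "if (cond);" draws -Wempty-body, so give the
 * 'if' an explicit empty statement instead.
 */
#define SKETCH_EBUG_ON(cond) do { if (cond) do {} while (0); } while (0)
#endif

static int count_evaluations;

/* Records that the condition expression was evaluated. */
static int probe(int value)
{
    count_evaluations++;
    return value;
}

static int run_check(int value)
{
    SKETCH_EBUG_ON(probe(value) < 0);
    return count_evaluations;
}
```

Keeping the condition inside an `if` rather than discarding it means that variables used only in such checks are still referenced, which avoids unused-variable warnings in non-debug builds.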

bcache: use NULL instead of using plain integer as pointer · f9a018e8

Committed by Yang Li on Apr 11, 2021

This fixes the following sparse warnings:
drivers/md/bcache/features.c:22:16: warning: Using plain integer as NULL
pointer

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210411134316.80274-4-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

bcache: remove PTR_CACHE · 11e9560e

Committed by Christoph Hellwig on Apr 11, 2021

Remove the PTR_CACHE inline and replace it with a direct dereference
of c->cache.

(Coly Li: fix the typo from PTR_BUCKET to PTR_CACHE in commit log)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210411134316.80274-3-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

bcache: reduce redundant code in bch_cached_dev_run() · 13e1db65

Committed by Zhiqiang Liu on Apr 11, 2021

In bch_cached_dev_run(), free(env[1])|free(env[2])|free(buf)
show up three times. This patch introduces an out label under
which free(env[1])|free(env[2])|free(buf) are called only
once. If we need to call free() when an error occurs,
we can set the error code in ret and then goto out directly.

Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210411134316.80274-2-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
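
The single-exit cleanup pattern described in the commit above can be sketched in plain C. Everything here is hypothetical and for illustration: `sketch_dev_run`, its `fail_at` parameter, the strings and the `live_allocs` counter are not taken from bcache, and malloc/free stand in for the kernel allocators.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

static int live_allocs;  /* allocations not yet freed, for demonstration */

static char *sketch_dup(const char *s)
{
    char *copy;

    assert(s != NULL);
    copy = malloc(strlen(s) + 1);
    if (copy != NULL) {
        strcpy(copy, s);
        live_allocs++;
    }
    return copy;
}

static void sketch_free(char *p)
{
    if (p != NULL) {
        live_allocs--;
        free(p);
    }
}

/* fail_at selects which hypothetical step fails (0 = none). */
static int sketch_dev_run(int fail_at)
{
    char *env1 = sketch_dup("CACHED_UUID=demo");
    char *env2 = sketch_dup("CACHED_LABEL=demo");
    char *buf = sketch_dup("SIZE=0");
    int ret = 0;

    if (env1 == NULL || env2 == NULL || buf == NULL) {
        ret = -ENOMEM;
        goto out;
    }
    if (fail_at == 1) {
        ret = -EBUSY;   /* first hypothetical failure point */
        goto out;
    }
    if (fail_at == 2) {
        ret = -EIO;     /* second hypothetical failure point */
        goto out;
    }
    /* The successful path would use env1, env2 and buf here. */
out:
    /* One cleanup sequence serves the success path and every error path. */
    sketch_free(buf);
    sketch_free(env2);
    sketch_free(env1);
    return ret;
}
```

Each error path only sets `ret` and jumps, so adding a new failure point does not require repeating the three frees.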

08 Apr, 2021 4 commits

Merge branch 'md-next' of... · ff917638

Committed by Jens Axboe on Apr 08, 2021

Merge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.13/drivers

Pull MD updates from Song:

"These patches fix a race condition with md_release() and md_open()."

* 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  md: split mddev_find
  md: factor out a mddev_find_locked helper from mddev_find
  md: md_open returns -EBUSY when entering racing area

md: split mddev_find · 65aa97c4

Committed by Christoph Hellwig on Apr 03, 2021

Split mddev_find into a simple mddev_find that just finds an existing
mddev by the unit number, and a more complicated mddev_find_or_alloc that
deals with finding or allocating a mddev.

This turns out to fix this bug reported by Zhao Heming.

----------------------------- snip ------------------------------
commit d3374825 ("md: make devices disappear when they are no longer
needed.") introduced protection between mddev creation and removal.
md_open shouldn't create an mddev when the all_mddevs list doesn't contain
it. With the current code logic, it is very easy to trigger a
soft lockup in a non-preempt environment.

*** env ***
kvm-qemu VM 2C1G with 2 iscsi luns
kernel should be non-preempt

*** script ***

triggers about 1 time in 10 tests

```
1  node1="15sp3-mdcluster1"
2  node2="15sp3-mdcluster2"
3
4  mdadm -Ss
5  ssh ${node2} "mdadm -Ss"
6  wipefs -a /dev/sda /dev/sdb
7  mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda \
   /dev/sdb --assume-clean
8
9  for i in {1..100}; do
10    echo ==== $i ====;
11
12    echo "test  ...."
13    ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb"
14    sleep 1
15
16    echo "clean  ....."
17    ssh ${node2} "mdadm -Ss"
18 done
```

I use an mdcluster environment to trigger the soft lockup, but it isn't an
mdcluster-specific bug. Stopping an md array in an mdcluster environment
does more work than for a non-cluster array, which leaves enough of a time
gap for the kernel to run md_open.

*** stack ***

```
ID: 2831   TASK: ffff8dd7223b5040  CPU: 0   COMMAND: "mdadm"
 #0 [ffffa15d00a13b90] __schedule at ffffffffb8f1935f
 #1 [ffffa15d00a13ba8] exact_lock at ffffffffb8a4a66d
 #2 [ffffa15d00a13bb0] kobj_lookup at ffffffffb8c62fe3
 #3 [ffffa15d00a13c28] __blkdev_get at ffffffffb89273b9
 #4 [ffffa15d00a13c98] blkdev_get at ffffffffb8927964
 #5 [ffffa15d00a13cb0] do_dentry_open at ffffffffb88dc4b4
 #6 [ffffa15d00a13ce0] path_openat at ffffffffb88f0ccc
 #7 [ffffa15d00a13db8] do_filp_open at ffffffffb88f32bb
 #8 [ffffa15d00a13ee0] do_sys_open at ffffffffb88ddc7d
 #9 [ffffa15d00a13f38] do_syscall_64 at ffffffffb86053cb ffffffffb900008c

or:
[  884.226509]  mddev_put+0x1c/0xe0 [md_mod]
[  884.226515]  md_open+0x3c/0xe0 [md_mod]
[  884.226518]  __blkdev_get+0x30d/0x710
[  884.226520]  ? bd_acquire+0xd0/0xd0
[  884.226522]  blkdev_get+0x14/0x30
[  884.226524]  do_dentry_open+0x204/0x3a0
[  884.226531]  path_openat+0x2fc/0x1520
[  884.226534]  ? seq_printf+0x4e/0x70
[  884.226536]  do_filp_open+0x9b/0x110
[  884.226542]  ? md_release+0x20/0x20 [md_mod]
[  884.226543]  ? seq_read+0x1d8/0x3e0
[  884.226545]  ? kmem_cache_alloc+0x18a/0x270
[  884.226547]  ? do_sys_open+0x1bd/0x260
[  884.226548]  do_sys_open+0x1bd/0x260
[  884.226551]  do_syscall_64+0x5b/0x1e0
[  884.226554]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
```

*** rootcause ***

"mdadm -A" (or other array assemble commands) starts a daemon, "mdadm
--monitor", by default. When "mdadm -Ss" is running, the stop action
wakes up "mdadm --monitor". The "--monitor" daemon immediately reads
/proc/mdstat. At this point the mddev still exists in the kernel, so
/proc/mdstat still shows the md device, which makes "mdadm --monitor"
open /dev/md0.

The earlier "mdadm -Ss" is a removing action, while the open by "mdadm
--monitor" triggers md_open, which is a creating action. The two race.

```
<thread 1>: "mdadm -Ss"
md_release
  mddev_put deletes mddev from all_mddevs
  queue_work for mddev_delayed_delete
  at this time, "/dev/md0" is still available for opening

<thread 2>: "mdadm --monitor ..."
md_open
 + mddev_find can't find mddev of /dev/md0, and create a new mddev and
 |    return.
 + trigger "if (mddev->gendisk != bdev->bd_disk)" and return
      -ERESTARTSYS.
```

In a non-preempt kernel, <thread 2> keeps occupying the current CPU, so
mddev_delayed_delete, which was queued by <thread 1>, cannot be
scheduled.

In a preempt kernel the same race can be triggered, but the kernel does
not allow one thread to run on a CPU all the time. After <thread 2> has
run for some time, the later "mdadm -A" (see line 13 of the script above)
calls md_alloc to allocate a new gendisk for the mddev. The md_open check
"if (mddev->gendisk != bdev->bd_disk)" then no longer holds, so md_open
returns 0 to the caller and the soft lockup is broken.
------------------------------ snip ------------------------------

Cc: stable@vger.kernel.org
Fixes: d3374825 ("md: make devices disappear when they are no longer needed.")
Reported-by: Heming Zhao <heming.zhao@suse.com>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>

md: factor out a mddev_find_locked helper from mddev_find · 8b57251f

Committed by Christoph Hellwig on Apr 03, 2021

Factor out a self-contained helper to just look up a mddev by the dev_t
"unit".

Cc: stable@vger.kernel.org
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>

md: md_open returns -EBUSY when entering racing area · 6a4db2a6

Committed by Zhao Heming on Apr 03, 2021

commit d3374825 ("md: make devices disappear when they are no longer
needed.") introduced protection between mddev creation and removal.
md_open shouldn't create an mddev when the all_mddevs list doesn't contain
it. With the current code logic, it is very easy to trigger a
soft lockup in a non-preempt environment.

This patch changes the md_open return value from -ERESTARTSYS to -EBUSY,
which breaks the infinite retry when md_open enters the racing area.

This patch is only a partial fix for the soft lockup issue; the full fix
needs mddev_find to be split into two functions: mddev_find and
mddev_find_or_alloc. md_open should then call the new mddev_find (which
only does the lookup).

For more detail, please refer to Christoph's "split mddev_find" patch
in later commits.

*** env ***
kvm-qemu VM 2C1G with 2 iscsi luns
kernel should be non-preempt

*** script ***

triggers almost every time with the script below

```
1  node1="mdcluster1"
2  node2="mdcluster2"
3
4  mdadm -Ss
5  ssh ${node2} "mdadm -Ss"
6  wipefs -a /dev/sda /dev/sdb
7  mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda \
   /dev/sdb --assume-clean
8
9  for i in {1..10}; do
10    echo ==== $i ====;
11
12    echo "test  ...."
13    ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb"
14    sleep 1
15
16    echo "clean  ....."
17    ssh ${node2} "mdadm -Ss"
18 done
```

I use an mdcluster environment to trigger the soft lockup, but it isn't an
mdcluster-specific bug. Stopping an md array in an mdcluster environment
does more work than for a non-cluster array, which leaves enough of a time
gap for the kernel to run md_open.

*** stack ***

```
[  884.226509]  mddev_put+0x1c/0xe0 [md_mod]
[  884.226515]  md_open+0x3c/0xe0 [md_mod]
[  884.226518]  __blkdev_get+0x30d/0x710
[  884.226520]  ? bd_acquire+0xd0/0xd0
[  884.226522]  blkdev_get+0x14/0x30
[  884.226524]  do_dentry_open+0x204/0x3a0
[  884.226531]  path_openat+0x2fc/0x1520
[  884.226534]  ? seq_printf+0x4e/0x70
[  884.226536]  do_filp_open+0x9b/0x110
[  884.226542]  ? md_release+0x20/0x20 [md_mod]
[  884.226543]  ? seq_read+0x1d8/0x3e0
[  884.226545]  ? kmem_cache_alloc+0x18a/0x270
[  884.226547]  ? do_sys_open+0x1bd/0x260
[  884.226548]  do_sys_open+0x1bd/0x260
[  884.226551]  do_syscall_64+0x5b/0x1e0
[  884.226554]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
```

*** rootcause ***

"mdadm -A" (or other array assemble commands) starts a daemon, "mdadm
--monitor", by default. When "mdadm -Ss" is running, the stop action
wakes up "mdadm --monitor". The "--monitor" daemon immediately reads
/proc/mdstat. At this point the mddev still exists in the kernel, so
/proc/mdstat still shows the md device, which makes "mdadm --monitor"
open /dev/md0.

The earlier "mdadm -Ss" is a removing action, while the open by "mdadm
--monitor" triggers md_open, which is a creating action. The two race.

```
<thread 1>: "mdadm -Ss"
md_release
  mddev_put deletes mddev from all_mddevs
  queue_work for mddev_delayed_delete
  at this time, "/dev/md0" is still available for opening

<thread 2>: "mdadm --monitor ..."
md_open
 + mddev_find can't find mddev of /dev/md0, and create a new mddev and
 |    return.
 + trigger "if (mddev->gendisk != bdev->bd_disk)" and return
      -ERESTARTSYS.
```

In a non-preempt kernel, <thread 2> keeps occupying the current CPU, so
mddev_delayed_delete, which was queued by <thread 1>, cannot be
scheduled.

In a preempt kernel the same race can be triggered, but the kernel does
not allow one thread to run on a CPU all the time. After <thread 2> has
run for some time, the later "mdadm -A" (see line 13 of the script above)
calls md_alloc to allocate a new gendisk for the mddev. The md_open check
"if (mddev->gendisk != bdev->bd_disk)" then no longer holds, so md_open
returns 0 to the caller and the soft lockup is broken.

Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Zhao Heming <heming.zhao@suse.com>
Signed-off-by: Song Liu <song@kernel.org>
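
The effect of switching the return value, as described in the commit above, can be modelled with a small retry loop. This is a userspace sketch under stated assumptions: `sketch_md_open` and `sketch_blkdev_get` are hypothetical stand-ins for the open path, `SKETCH_ERESTARTSYS` mirrors the kernel-internal error value, and `SKETCH_MAX_SPINS` is an artificial cap so the demonstration terminates (the situation in the commit message has no such cap, hence the soft lockup).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_ERESTARTSYS 512   /* kernel-internal value, not in userspace errno.h */
#define SKETCH_MAX_SPINS 1000    /* artificial cap so the demo terminates */

/* Hypothetical device whose open always returns the same result. */
struct sketch_md {
    int open_result;
};

static int sketch_md_open(const struct sketch_md *md)
{
    return md->open_result;
}

/* The caller retries for as long as the open asks to be restarted. */
static int sketch_blkdev_get(const struct sketch_md *md, int *attempts)
{
    int ret;

    assert(md != NULL && attempts != NULL);
    *attempts = 0;
    do {
        ret = sketch_md_open(md);
        (*attempts)++;
    } while (ret == -SKETCH_ERESTARTSYS && *attempts < SKETCH_MAX_SPINS);
    return ret;
}
```

When the open keeps returning the restart code the loop spins until the cap, whereas -EBUSY is handed back to the caller after a single attempt.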

06 Apr, 2021 4 commits

drbd: use DEFINE_SPINLOCK() for spinlock · 9c282c29

Committed by Guobin Huang on Apr 06, 2021

A spinlock can be initialized automatically with DEFINE_SPINLOCK()
rather than explicitly calling spin_lock_init().

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Guobin Huang <huangguobin4@huawei.com>
Link: https://lore.kernel.org/r/1617710988-49205-1-git-send-email-huangguobin4@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

swim3: support highmem · b60b270b

Committed by Christoph Hellwig on Apr 06, 2021

swim3 only uses the virtual address of a bio to stash it into the data
transfer using virt_to_bus. But the ppc32 virt_to_bus just uses the
physical address with an offset. Replace virt_to_bus with a local hack
that performs the equivalent transformation and stop asking for block
layer bounce buffering.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210406061839.811588-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

floppy: always use the track buffer · 3d86739c

Committed by Christoph Hellwig on Apr 06, 2021

Always use the track buffer that is already used for addresses outside
the 16MB address capability of the floppy controller. This allows
removing a lot of code that relies on kernel virtual addresses. With
this gone there is just a single place left that looks at the bio,
which can be converted to memcpy_{from,to}_page, thus removing the need
for the extra block-layer bounce buffering for highmem pages.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210406061755.811522-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

swim: don't call blk_queue_bounce_limit · 4c6e5bc8

Committed by Christoph Hellwig on Apr 06, 2021

m68k doesn't support highmem, so don't bother enabling the block layer
bounce buffer code. Just for safety throw in a depend on !HIGHMEM.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210406061725.811389-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

openeuler / Kernel synced successfully more than 1 year ago