- 15 4月, 2021 18 次提交
-
-
由 Christoph Hellwig 提交于
These will be reused for the per-namespace character devices. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NJavier González <javier.gonz@samsung.com> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
-
由 Christoph Hellwig 提交于
Move the multipath block_device_operations to multipath.c, where they belong. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NJavier González <javier.gonz@samsung.com> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
-
由 Christoph Hellwig 提交于
Add a helper to avoid opencoding ns_head->ref manipulations. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NJavier González <javier.gonz@samsung.com> Reviewed-by: NKanchan Joshi <joshi.k@samsung.com> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
-
由 Christoph Hellwig 提交于
Split out the ioctl code from core.c into a new file. Also update copyrights while we're at it. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NJavier González <javier.gonz@samsung.com> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
-
由 Christoph Hellwig 提交于
Don't bother to look up a namespace just to drop if after retreiving the controller for the multipath case. Just look up a live controller for the subsystem directly. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NJavier González <javier.gonz@samsung.com>
-
由 Christoph Hellwig 提交于
Only use the existing ioctl handler for the multipath case, and add a simpler one that reverts to the pre-multipath case for not shared use case. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NJavier González <javier.gonz@samsung.com>
-
由 Christoph Hellwig 提交于
Don't bother defining a separate compat_ioctl handler, and just handle the NVME_IOCTL_SUBMIT_IO32 case inline. Also only defined it for those ABIs (currently just i386 vs x86_64) that are affected. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NJavier González <javier.gonz@samsung.com>
-
由 Christoph Hellwig 提交于
Factor out a helper for the namespace based ioctls. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NJavier González <javier.gonz@samsung.com> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
-
由 Christoph Hellwig 提交于
Pass the proper user pointer instead of the not all that useful integer representation. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NJavier González <javier.gonz@samsung.com>
-
由 Christoph Hellwig 提交于
Return false from nvme_set_disk_name and let the caller set the non-multipath name instead of duplicating the naming information in two places. Also remove the pointless local variables for the disk name and flags and the not needed ctrl argument. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NJavier González <javier.gonz@samsung.com>
-
由 Minwoo Im 提交于
Move the multipath gendisk out of #ifdef CONFIG_NVME_MULTIPATH and add a new nvme_ns_head_multipath that uses it to check if a ns_head has a multipath device associated with it. Signed-off-by: NMinwoo Im <minwoo.im.dev@gmail.com> [hch: added the IS_ENABLED, converted a few existing users] Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NJavier González <javier.gonz@samsung.com> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
-
由 Niklas Cassel 提交于
There is a single trailing whitespace in core.c. Since this is just a single whitespace, the chances of this affecting backports to stable should be quite low, so let's just remove it. Signed-off-by: NNiklas Cassel <niklas.cassel@wdc.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Niklas Cassel 提交于
There is a single trailing whitespace in multipath.c. Since this is just a single whitespace, the chances of this affecting backports to stable should be quite low, so let's just remove it. Signed-off-by: NNiklas Cassel <niklas.cassel@wdc.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Niklas Cassel 提交于
There is a single trailing whitespace in pci.c. Since this is just a single whitespace, the chances of this affecting backports to stable should be quite low, so let's just remove it. Signed-off-by: NNiklas Cassel <niklas.cassel@wdc.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Niklas Cassel 提交于
According to the module parameter description for sgl_threshold, a value of 0 means that SGLs are disabled. If SGLs are disabled, we should respect that, even for the case where the request is made up of a single physical segment. Fixes: 29791057 ("nvme-pci: optimize mapping single segment requests using SGLs") Signed-off-by: NNiklas Cassel <niklas.cassel@wdc.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Colin Ian King 提交于
There is a spelling mistake in a pr_err error message. Fix it. Signed-off-by: NColin Ian King <colin.king@canonical.com> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Amit Engel 提交于
Once a host is already created, avoid allocate additional hostports that will be thrown away. add an helper function to handle host search. Reviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: NJames Smart <jsmart2021@gmail.com> Signed-off-by: NAmit Engel <amit.engel@dell.com> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Elad Grupi 提交于
In case there is an io that contains inline data and it goes to parsing error flow, command response will free command and iov before clearing the data on the socket buffer. This will delay the command response until receive flow is completed. Fixes: 872d26a3 ("nvmet-tcp: add NVMe over TCP target driver") Signed-off-by: NElad Grupi <elad.grupi@dell.com> Signed-off-by: NHou Pu <houpu.main@gmail.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 13 4月, 2021 4 次提交
-
-
由 Christoph Hellwig 提交于
Lightnvm was an innovative idea to expose more low-level control over SSDs. But it failed to get properly standardized and remains a non-standarized extension to NVMe that requires vendor specific quirks for a few now mostly obsolete SSD devices. The standardized ZNS command set for NVMe has take over a lot of the approaches and allows for fully standardized operation. Remove the Linux code to support open channel SSDs as the few production deployments of the above mentioned SSDs are using userspace driver stacks instead of the fairly limited Linux support. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJavier González <javier@javigon.com> Signed-off-by: NMatias Bjørling <mb@lightnvm.io> Link: https://lore.kernel.org/r/20210413105257.159260-5-matias.bjorling@wdc.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Zhang Yunkai 提交于
'linux/blkdev.h' and 'uapi/linux/lightnvm.h' included in 'lightnvm.h' is duplicated.It is also included in the 5th and 7th line. Signed-off-by: NZhang Yunkai <zhang.yunkai@zte.com.cn> Signed-off-by: NMatias Bjørling <matias.bjorling@wdc.com> Link: https://lore.kernel.org/r/20210413105257.159260-4-matias.bjorling@wdc.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Tian Tao 提交于
When memdup_user returns an error, memdup_user has two different return values, use PTR_ERR to get the correct return value. Signed-off-by: NTian Tao <tiantao6@hisilicon.com> Signed-off-by: NMatias Bjørling <matias.bjorling@wdc.com> Link: https://lore.kernel.org/r/20210413105257.159260-3-matias.bjorling@wdc.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Chaitanya Kulkarni 提交于
This fixs coccicheck warning: drivers/nvme//host/lightnvm.c:1243:60-61: WARNING opportunity for kobj_to_dev() Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: NMatias Bjørling <matias.bjorling@wdc.com> Link: https://lore.kernel.org/r/20210413105257.159260-2-matias.bjorling@wdc.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 12 4月, 2021 3 次提交
-
-
由 Christoph Hellwig 提交于
Now that md has been cleaned up we can get rid of this hack. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Max Gurtovoy 提交于
This will enable changing the virtual boundary of null blk devices. For now, null blk devices didn't have any restriction on the scatter/gather elements received from the block layer. Add a module parameter and a configfs option that will control the virtual boundary. This will enable testing the efficiency of the block layer bounce buffer in case a suitable application will send discontiguous IO to the given device. Initial testing with patched FIO showed the following results (64 jobs, 128 iodepth, 1 nullb device): IO size READ (virt=false) READ (virt=true) Write (virt=false) Write (virt=true) ---------- ------------------- ----------------- ------------------- ------------------- 1k 10.7M 8482k 10.8M 8471k 2k 10.4M 8266k 10.4M 8271k 4k 10.4M 8274k 10.3M 8226k 8k 10.2M 8131k 9800k 7933k 16k 9567k 7764k 8081k 6828k 32k 8865k 7309k 5570k 5153k 64k 7695k 6586k 2682k 2617k 128k 5346k 5489k 1320k 1296k Signed-off-by: NMax Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com> Link: https://lore.kernel.org/r/20210412095523.278632-1-mgurtovoy@nvidia.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Chaitanya Kulkarni 提交于
Use the right name for the struct request variable that removes the following compilation error :- make --silent --keep-going --jobs=8 O=/home/tuxbuild/.cache/tuxmake/builds/1/tmp ARCH=sh CROSS_COMPILE=sh4-linux-gnu- 'CC=sccache sh4-linux-gnu-gcc' 'HOSTCC=sccache gcc' In file included from /builds/linux/include/linux/scatterlist.h:9, from /builds/linux/include/linux/dma-mapping.h:10, from /builds/linux/drivers/cdrom/gdrom.c:16: /builds/linux/drivers/cdrom/gdrom.c: In function 'gdrom_readdisk_dma': /builds/linux/drivers/cdrom/gdrom.c:586:61: error: 'rq' undeclared (first use in this function) 586 | __raw_writel(page_to_phys(bio_page(req->bio)) + bio_offset(rq->bio), | ^~ Fixes: 1d2c8200 ("gdrom: support highmem") Reported-by: NNaresh Kamboju <naresh.kamboju@linaro.org> Tested-by: NNaresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 11 4月, 2021 7 次提交
-
-
由 Coly Li 提交于
The patch "bcache: remove PTR_CACHE" introduces a compiling failure in debug.c with following error message, In file included from drivers/md/bcache/bcache.h:182:0, from drivers/md/bcache/debug.c:9: drivers/md/bcache/debug.c: In function 'bch_btree_verify': drivers/md/bcache/debug.c:53:19: error: 'c' undeclared (first use in this function) bio_set_dev(bio, c->cache->bdev); ^ This patch fixes the regression by replacing c->cache->bdev by b->c-> cache->bdev. Signed-off-by: NColy Li <colyli@suse.de> Cc: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210411134316.80274-8-colyli@suse.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Gustavo A. R. Silva 提交于
Cast multiple variables to (int64_t) in order to give the compiler complete information about the proper arithmetic to use. Notice that these variables are being used in contexts that expect expressions of type int64_t (64 bit, signed). And currently, such expressions are being evaluated using 32-bit arithmetic. Fixes: d0cf9503 ("octeontx2-pf: ethtool fec mode support") Addresses-Coverity-ID: 1501724 ("Unintentional integer overflow") Addresses-Coverity-ID: 1501725 ("Unintentional integer overflow") Addresses-Coverity-ID: 1501726 ("Unintentional integer overflow") Signed-off-by: NGustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: NColy Li <colyli@suse.de> Link: https://lore.kernel.org/r/20210411134316.80274-7-colyli@suse.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Bhaskar Chowdhury 提交于
s/condidate/candidate/ s/folowing/following/ Signed-off-by: NBhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: NColy Li <colyli@suse.de> Link: https://lore.kernel.org/r/20210411134316.80274-6-colyli@suse.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Arnd Bergmann 提交于
building with 'make W=1' shows a harmless warning for each user of the EBUG_ON() macro: drivers/md/bcache/bset.c: In function 'bch_btree_sort_partial': drivers/md/bcache/util.h:30:55: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body] 30 | #define EBUG_ON(cond) do { if (cond); } while (0) | ^ drivers/md/bcache/bset.c:1312:9: note: in expansion of macro 'EBUG_ON' 1312 | EBUG_ON(oldsize >= 0 && bch_count_data(b) != oldsize); | ^~~~~~~ Reword the macro slightly to avoid the warning. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NColy Li <colyli@suse.de> Link: https://lore.kernel.org/r/20210411134316.80274-5-colyli@suse.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Yang Li 提交于
This fixes the following sparse warnings: drivers/md/bcache/features.c:22:16: warning: Using plain integer as NULL pointer Reported-by: NAbaci Robot <abaci@linux.alibaba.com> Signed-off-by: NYang Li <yang.lee@linux.alibaba.com> Signed-off-by: NColy Li <colyli@suse.de> Link: https://lore.kernel.org/r/20210411134316.80274-4-colyli@suse.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Remove the PTR_CACHE inline and replace it with a direct dereference of c->cache. (Coly Li: fix the typo from PTR_BUCKET to PTR_CACHE in commit log) Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NColy Li <colyli@suse.de> Link: https://lore.kernel.org/r/20210411134316.80274-3-colyli@suse.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Zhiqiang Liu 提交于
In bch_cached_dev_run(), free(env[1])|free(env[2])|free(buf) show up three times. This patch introduce out tag in which free(env[1])|free(env[2])|free(buf) are only called one time. If we need to call free() when errors occur, we can set error code to ret, and then goto out tag directly. Signed-off-by: NZhiqiang Liu <liuzhiqiang26@huawei.com> Signed-off-by: NColy Li <colyli@suse.de> Link: https://lore.kernel.org/r/20210411134316.80274-2-colyli@suse.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 08 4月, 2021 4 次提交
-
-
由 Jens Axboe 提交于
Merge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.13/drivers Pull MD updates from Song: "These patches fix a race condition with md_release() and md_open()." * 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md: split mddev_find md: factor out a mddev_find_locked helper from mddev_find md: md_open returns -EBUSY when entering racing area
-
由 Christoph Hellwig 提交于
Split mddev_find into a simple mddev_find that just finds an existing mddev by the unit number, and a more complicated mddev_find that deals with find or allocating a mddev. This turns out to fix this bug reported by Zhao Heming. ----------------------------- snip ------------------------------ commit d3374825 ("md: make devices disappear when they are no longer needed.") introduced protection between mddev creating & removing. The md_open shouldn't create mddev when all_mddevs list doesn't contain mddev. With currently code logic, there will be very easy to trigger soft lockup in non-preempt env. *** env *** kvm-qemu VM 2C1G with 2 iscsi luns kernel should be non-preempt *** script *** about trigger 1 time with 10 tests `1 node1="15sp3-mdcluster1" 2 node2="15sp3-mdcluster2" 3 4 mdadm -Ss 5 ssh ${node2} "mdadm -Ss" 6 wipefs -a /dev/sda /dev/sdb 7 mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda \ /dev/sdb --assume-clean 8 9 for i in {1..100}; do 10 echo ==== $i ====; 11 12 echo "test ...." 13 ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb" 14 sleep 1 15 16 echo "clean ....." 17 ssh ${node2} "mdadm -Ss" 18 done ` I use mdcluster env to trigger soft lockup, but it isn't mdcluster speical bug. To stop md array in mdcluster env will do more jobs than non-cluster array, which will leave enough time/gap to allow kernel to run md_open. *** stack *** `ID: 2831 TASK: ffff8dd7223b5040 CPU: 0 COMMAND: "mdadm" #0 [ffffa15d00a13b90] __schedule at ffffffffb8f1935f #1 [ffffa15d00a13ba8] exact_lock at ffffffffb8a4a66d #2 [ffffa15d00a13bb0] kobj_lookup at ffffffffb8c62fe3 #3 [ffffa15d00a13c28] __blkdev_get at ffffffffb89273b9 #4 [ffffa15d00a13c98] blkdev_get at ffffffffb8927964 #5 [ffffa15d00a13cb0] do_dentry_open at ffffffffb88dc4b4 #6 [ffffa15d00a13ce0] path_openat at ffffffffb88f0ccc #7 [ffffa15d00a13db8] do_filp_open at ffffffffb88f32bb #8 [ffffa15d00a13ee0] do_sys_open at ffffffffb88ddc7d #9 [ffffa15d00a13f38] do_syscall_64 at ffffffffb86053cb ffffffffb900008c or: [ 884.226509] mddev_put+0x1c/0xe0 [md_mod] [ 884.226515] md_open+0x3c/0xe0 [md_mod] [ 884.226518] __blkdev_get+0x30d/0x710 [ 884.226520] ? bd_acquire+0xd0/0xd0 [ 884.226522] blkdev_get+0x14/0x30 [ 884.226524] do_dentry_open+0x204/0x3a0 [ 884.226531] path_openat+0x2fc/0x1520 [ 884.226534] ? seq_printf+0x4e/0x70 [ 884.226536] do_filp_open+0x9b/0x110 [ 884.226542] ? md_release+0x20/0x20 [md_mod] [ 884.226543] ? seq_read+0x1d8/0x3e0 [ 884.226545] ? kmem_cache_alloc+0x18a/0x270 [ 884.226547] ? do_sys_open+0x1bd/0x260 [ 884.226548] do_sys_open+0x1bd/0x260 [ 884.226551] do_syscall_64+0x5b/0x1e0 [ 884.226554] entry_SYSCALL_64_after_hwframe+0x44/0xa9 ` *** rootcause *** "mdadm -A" (or other array assemble commands) will start a daemon "mdadm --monitor" by default. When "mdadm -Ss" is running, the stop action will wakeup "mdadm --monitor". The "--monitor" daemon will immediately get info from /proc/mdstat. This time mddev in kernel still exist, so /proc/mdstat still show md device, which makes "mdadm --monitor" to open /dev/md0. The previously "mdadm -Ss" is removing action, the "mdadm --monitor" open action will trigger md_open which is creating action. Racing is happening. `<thread 1>: "mdadm -Ss" md_release mddev_put deletes mddev from all_mddevs queue_work for mddev_delayed_delete at this time, "/dev/md0" is still available for opening <thread 2>: "mdadm --monitor ..." md_open + mddev_find can't find mddev of /dev/md0, and create a new mddev and | return. + trigger "if (mddev->gendisk != bdev->bd_disk)" and return -ERESTARTSYS. ` In non-preempt kernel, <thread 2> is occupying on current CPU. and mddev_delayed_delete which was created in <thread 1> also can't be schedule. In preempt kernel, it can also trigger above racing. But kernel doesn't allow one thread running on a CPU all the time. after <thread 2> running some time, the later "mdadm -A" (refer above script line 13) will call md_alloc to alloc a new gendisk for mddev. it will break md_open statement "if (mddev->gendisk != bdev->bd_disk)" and return 0 to caller, the soft lockup is broken. ------------------------------ snip ------------------------------ Cc: stable@vger.kernel.org Fixes: d3374825 ("md: make devices disappear when they are no longer needed.") Reported-by: NHeming Zhao <heming.zhao@suse.com> Reviewed-by: NHeming Zhao <heming.zhao@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NSong Liu <song@kernel.org>
-
由 Christoph Hellwig 提交于
Factor out a self-contained helper to just lookup a mddev by the dev_t "unit". Cc: stable@vger.kernel.org Reviewed-by: NHeming Zhao <heming.zhao@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NSong Liu <song@kernel.org>
-
由 Zhao Heming 提交于
commit d3374825 ("md: make devices disappear when they are no longer needed.") introduced protection between mddev creating & removing. The md_open shouldn't create mddev when all_mddevs list doesn't contain mddev. With currently code logic, there will be very easy to trigger soft lockup in non-preempt env. This patch changes md_open returning from -ERESTARTSYS to -EBUSY, which will break the infinitely retry when md_open enter racing area. This patch is partly fix soft lockup issue, full fix needs mddev_find is split into two functions: mddev_find & mddev_find_or_alloc. And md_open should call new mddev_find (it only does searching job). For more detail, please refer with Christoph's "split mddev_find" patch in later commits. *** env *** kvm-qemu VM 2C1G with 2 iscsi luns kernel should be non-preempt *** script *** about trigger every time with below script ``` 1 node1="mdcluster1" 2 node2="mdcluster2" 3 4 mdadm -Ss 5 ssh ${node2} "mdadm -Ss" 6 wipefs -a /dev/sda /dev/sdb 7 mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda \ /dev/sdb --assume-clean 8 9 for i in {1..10}; do 10 echo ==== $i ====; 11 12 echo "test ...." 13 ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb" 14 sleep 1 15 16 echo "clean ....." 17 ssh ${node2} "mdadm -Ss" 18 done ``` I use mdcluster env to trigger soft lockup, but it isn't mdcluster speical bug. To stop md array in mdcluster env will do more jobs than non-cluster array, which will leave enough time/gap to allow kernel to run md_open. *** stack *** ``` [ 884.226509] mddev_put+0x1c/0xe0 [md_mod] [ 884.226515] md_open+0x3c/0xe0 [md_mod] [ 884.226518] __blkdev_get+0x30d/0x710 [ 884.226520] ? bd_acquire+0xd0/0xd0 [ 884.226522] blkdev_get+0x14/0x30 [ 884.226524] do_dentry_open+0x204/0x3a0 [ 884.226531] path_openat+0x2fc/0x1520 [ 884.226534] ? seq_printf+0x4e/0x70 [ 884.226536] do_filp_open+0x9b/0x110 [ 884.226542] ? md_release+0x20/0x20 [md_mod] [ 884.226543] ? seq_read+0x1d8/0x3e0 [ 884.226545] ? kmem_cache_alloc+0x18a/0x270 [ 884.226547] ? do_sys_open+0x1bd/0x260 [ 884.226548] do_sys_open+0x1bd/0x260 [ 884.226551] do_syscall_64+0x5b/0x1e0 [ 884.226554] entry_SYSCALL_64_after_hwframe+0x44/0xa9 ``` *** rootcause *** "mdadm -A" (or other array assemble commands) will start a daemon "mdadm --monitor" by default. When "mdadm -Ss" is running, the stop action will wakeup "mdadm --monitor". The "--monitor" daemon will immediately get info from /proc/mdstat. This time mddev in kernel still exist, so /proc/mdstat still show md device, which makes "mdadm --monitor" to open /dev/md0. The previously "mdadm -Ss" is removing action, the "mdadm --monitor" open action will trigger md_open which is creating action. Racing is happening. ``` <thread 1>: "mdadm -Ss" md_release mddev_put deletes mddev from all_mddevs queue_work for mddev_delayed_delete at this time, "/dev/md0" is still available for opening <thread 2>: "mdadm --monitor ..." md_open + mddev_find can't find mddev of /dev/md0, and create a new mddev and | return. + trigger "if (mddev->gendisk != bdev->bd_disk)" and return -ERESTARTSYS. ``` In non-preempt kernel, <thread 2> is occupying on current CPU. and mddev_delayed_delete which was created in <thread 1> also can't be schedule. In preempt kernel, it can also trigger above racing. But kernel doesn't allow one thread running on a CPU all the time. after <thread 2> running some time, the later "mdadm -A" (refer above script line 13) will call md_alloc to alloc a new gendisk for mddev. it will break md_open statement "if (mddev->gendisk != bdev->bd_disk)" and return 0 to caller, the soft lockup is broken. Cc: stable@vger.kernel.org Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NZhao Heming <heming.zhao@suse.com> Signed-off-by: NSong Liu <song@kernel.org>
-
- 06 4月, 2021 4 次提交
-
-
由 Guobin Huang 提交于
spinlock can be initialized automatically with DEFINE_SPINLOCK() rather than explicitly calling spin_lock_init(). Reported-by: NHulk Robot <hulkci@huawei.com> Signed-off-by: NGuobin Huang <huangguobin4@huawei.com> Link: https://lore.kernel.org/r/1617710988-49205-1-git-send-email-huangguobin4@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
swim3 only uses the virtual address of a bio to stash it into the data transfer using virt_to_bus. But the ppc32 virt_to_bus just uses the physical address with an offset. Replace virt_to_bus with a local hack that performs the equivalent transformation and stop asking for block layer bounce buffering. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210406061839.811588-1-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Always use the track buffer that is already used for addresses outside the 16MB address capability of the floppy controller. This allows to remove a lot of code that relies on kernel virtual addresses. With this gone there is just a single place left that looks at the bio, which can be converted to memcpy_{from,to}_page, thus removing the need for the extra block-layer bounce buffering for highmem pages. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210406061755.811522-1-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
m68k doesn't support highmem, so don't bother enabling the block layer bounce buffer code. Just for safety throw in a depend on !HIGHMEM. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210406061725.811389-1-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-