- 20 Jan 2022: 1 commit
-
-
Submitted by zhangwensheng

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4QXS1?from=project-issue
CVE: NA

--------------------------------

UBSAN reports this problem:

[ 5984.281385] UBSAN: Undefined behaviour in drivers/md/md.c:8175:15
[ 5984.281390] signed integer overflow:
[ 5984.281393] -2147483291 - 2072033152 cannot be represented in type 'int'
[ 5984.281400] CPU: 25 PID: 1854 Comm: md101_resync Kdump: loaded Not tainted 4.19.90
[ 5984.281404] Hardware name: Huawei TaiShan 200 (Model 5280)/BC82AMDDA
[ 5984.281406] Call trace:
[ 5984.281415]  dump_backtrace+0x0/0x310
[ 5984.281418]  show_stack+0x28/0x38
[ 5984.281425]  dump_stack+0xec/0x15c
[ 5984.281430]  ubsan_epilogue+0x18/0x84
[ 5984.281434]  handle_overflow+0x14c/0x19c
[ 5984.281439]  __ubsan_handle_sub_overflow+0x34/0x44
[ 5984.281445]  is_mddev_idle+0x338/0x3d8
[ 5984.281449]  md_do_sync+0x1bb8/0x1cf8
[ 5984.281452]  md_thread+0x220/0x288
[ 5984.281457]  kthread+0x1d8/0x1e0
[ 5984.281461]  ret_from_fork+0x10/0x18

When the stat accum of the disk is greater than INT_MAX, its value becomes
negative after casting to 'int', which may lead to overflow after subtracting
a positive number. In the same way, when the value of sync_io is greater than
INT_MAX, overflow may also occur. These situations lead to undefined behavior.

Moreover, if the stat accum of the disk is close to INT_MAX when the raid
array is created, the initial value of last_events is set close to INT_MAX
when mddev initializes its IO event counters, so 'curr_events -
rdev->last_events > 64' will always be false during synchronization. If all
the disks of mddev are in this case, is_mddev_idle() will always return 1,
which may cause non-sync IO to become very slow.

To address these problems, use a 64-bit signed integer type for sync_io,
last_events, and curr_events.

Signed-off-by: zhangwensheng <zhangwensheng5@huawei.com>
Reviewed-by: Tao Hou <houtao1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
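A minimal sketch of the idea behind the fix, with illustrative structure and helper names (the real counters live in struct md_rdev and the gendisk statistics): widening the IO event accounting to 64-bit signed integers so the subtraction can no longer overflow.

```
/* Illustrative only: 64-bit IO event accounting as the commit describes. */
#include <linux/atomic.h>
#include <linux/types.h>

struct md_rdev_sketch {
	atomic64_t sync_io;     /* was atomic_t: may grow past INT_MAX */
	s64        last_events; /* was int: snapshot of "disk stat accum - sync_io" */
};

static bool rdev_is_idle(struct md_rdev_sketch *rdev, s64 disk_stat_accum)
{
	/* All arithmetic in 64 bits: no signed overflow even when the
	 * underlying counters have exceeded INT_MAX. */
	s64 curr_events = disk_stat_accum - atomic64_read(&rdev->sync_io);

	if (curr_events - rdev->last_events > 64) {
		rdev->last_events = curr_events;
		return false;
	}
	return true;
}
```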
-
- 14 Jan 2022: 2 commits
-
-
Submitted by Joe Thornber

stable inclusion
from stable-v5.10.88
commit 0e21e6cd5eebfc929ac5fa3b97ca2d4ace3cb6a3
bugzilla: 186058 https://gitee.com/openeuler/kernel/issues/I4QW6A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=0e21e6cd5eebfc929ac5fa3b97ca2d4ace3cb6a3

--------------------------------

commit 1b8d2789 upstream.

Move dm_tm_unlock() after dm_tm_dec().

Cc: stable@vger.kernel.org
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
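The one-line fix is easiest to read as an ordering sketch; dm_tm_dec(), dm_tm_unlock() and dm_block_location() are the real dm transaction-manager helpers, while the surrounding rebalance context is assumed here.

```
/* Before: the child block is unlocked first, so dm_block_location() runs on
 * a block that may already have been freed/recycled (use after free). */
dm_tm_unlock(info->tm, child);
dm_tm_dec(info->tm, dm_block_location(child));

/* After: take the location and drop the reference while the block is still
 * held, then unlock it. */
dm_tm_dec(info->tm, dm_block_location(child));
dm_tm_unlock(info->tm, child);
```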
-
Submitted by Markus Hochholdinger

stable inclusion
from stable-v5.10.85
commit 8b4264c27b821d6b3550fd67c0169cbc5549db8c
bugzilla: 186032 https://gitee.com/openeuler/kernel/issues/I4QVI4
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8b4264c27b821d6b3550fd67c0169cbc5549db8c

--------------------------------

commit 55df1ce0 upstream.

The superblock of version 1.0 doesn't get moved to the new position on a
device size change. This leaves the rdev without a superblock at a known
position, so the raid can't be re-assembled.

The line was removed by mistake and is re-added by this patch.

Fixes: d9c0fa50 ("md: fix max sectors calculation for super 1.0")
Cc: stable@vger.kernel.org
Signed-off-by: Markus Hochholdinger <markus@hochholdinger.net>
Reviewed-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
- 12 Jan 2022: 7 commits
-
-
Submitted by Zheng Zengkai

euleros/rtos inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4LOJ6
CVE: NA

--------------------------------

This patch set introduces many conflicts when backporting mainline bcache
patches; revert it temporarily.

This reverts commit a460ae11.

Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
-
Submitted by Zheng Zengkai

euleros/rtos inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4LOJ6
CVE: NA

--------------------------------

This patch set introduces many conflicts when backporting mainline bcache
patches; revert it temporarily.

This reverts commit 30dc9d9c.

Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
-
Submitted by Zheng Zengkai

euleros/rtos inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4LOJ6
CVE: NA

--------------------------------

This patch set introduces many conflicts when backporting mainline bcache
patches; revert it temporarily.

This reverts commit 08a3ac0e.

Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
-
Submitted by Zheng Zengkai

euleros/rtos inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4LOJ6
CVE: NA

--------------------------------

This patch set introduces many conflicts when backporting mainline bcache
patches; revert it temporarily.

This reverts commit 1751c6ad.

Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
-
Submitted by Zheng Zengkai

euleros/rtos inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4LOJ6
CVE: NA

--------------------------------

This patch set introduces many conflicts when backporting mainline bcache
patches; revert it temporarily.

This reverts commit e8c75ee9.

Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
-
Submitted by Zheng Zengkai

euleros/rtos inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4LOJ6
CVE: NA

--------------------------------

This patch set introduces many conflicts when backporting mainline bcache
patches; revert it temporarily.

This reverts commit 6214e257.

Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
-
Submitted by Zheng Zengkai

euleros/rtos inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4LOJ6
CVE: NA

--------------------------------

This patch set introduces many conflicts when backporting mainline bcache
patches; revert it temporarily.

This reverts commit 4c16a439.

Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
-
- 10 Jan 2022: 7 commits
-
-
Submitted by Li Ruilin

euleros/rtos inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4LOJ6
CVE: NA

--------------------------------

Before this patch, a bypassed I/O request did not record its start time, so
the latency data calculated from it was wrong. Make the start time always be
recorded to fix this.

Signed-off-by: Li Ruilin <liruilin4@huawei.com>
Reviewed-by: Song Chao <chao.song@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Reviewed-by: Guangxing Deng <dengguangxing@huawei.com>
Reviewed-by: chao song <chao.song@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Li Ruilin

euleros/rtos inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4LOJ6
CVE: NA

--------------------------------

Commit 6947676c374 ("bcache: add a framework to perform prefetch") collects
data insert info, which includes device info obtained from the bio. However,
the bio created by write_moving here has no device info, causing a NULL
pointer dereference.

[ 1497.991768] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 1497.991869] PGD 0 P4D 0
[ 1497.991912] Oops: 0000 [#1] SMP PTI
[ 1497.991962] CPU: 2 PID: 733 Comm: kworker/2:3 Not tainted 4.19.90+ #33
[ 1497.992030] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 1497.992137] Workqueue: bcache_gc write_moving [bcache]
[ 1497.992219] RIP: 0010:bch_data_insert+0x4c/0x140 [bcache]
...
[ 1497.993367] Call Trace:
[ 1497.993427]  ? cached_dev_read_error+0x140/0x140 [bcache]
[ 1497.993526]  write_moving+0x19e/0x1b0 [bcache]
[ 1497.993621]  process_one_work+0x1fd/0x440
[ 1497.993742]  worker_thread+0x34/0x410
[ 1497.993811]  kthread+0x121/0x140
[ 1497.993873]  ? process_one_work+0x440/0x440
[ 1497.993946]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 1497.994043]  ret_from_fork+0x35/0x40

Signed-off-by: Li Ruilin <liruilin4@huawei.com>
Reviewed-by: Song Chao <chao.song@huawei.com>
Reviewed-by: Xu Wei <xuwei56@huawei.com>
Signed-off-by: Li Ruilin <liruilin4@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Reviewed-by: Guangxing Deng <dengguangxing@huawei.com>
Reviewed-by: chao song <chao.song@huawei.com>
Reviewed-by: chao song <chao.song@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Li Ruilin

euleros inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4LOJ6
CVE: NA

------------------------------

The recently pushed bugfix patch "Delay to invalidate cache data in
writearound write" has a stupid copy&paste bug, which causes bypassed write
requests to never invalidate data in the cache device, resulting in data
corruption. This patch fixes the corruption. It also ensures that the
writeback lock is released after the data insert.

Fixes: 6a1d9c41b367 ("bcache: Delay to invalidate cache data in writearound write")
Signed-off-by: Li Ruilin <liruilin4@huawei.com>
Signed-off-by: Song Chao <chao.song@huawei.com>
Reviewed-by: Peng Junyi <pengjunyi1@huawei.com>
Acked-by: Xie Xiuqi <xiexiuqi@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Reviewed-by: Guangxing Deng <dengguangxing@huawei.com>
Reviewed-by: chao song <chao.song@huawei.com>
Reviewed-by: chao song <chao.song@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Li Ruilin

euleros inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4LOJ6
CVE: NA

------------------------------

In writearound cache mode, a read request that quickly follows a write
request may overwrite the invalidate bkey inserted by the write request.

The function bch_data_insert() is invoked asynchronously as the bio is
submitted to the backend block device, so a read request may be submitted
after bch_data_insert() has finished and complete before the backend bio
ends. Such a read request reads data from the backend block device and
inserts dirty data into the cache device. However, in writearound cache mode
bcache will not invalidate the data again, so subsequent read requests will
read the dirty data from the cache, causing data corruption.

Delay the invalidation to the end of the backend bio to avoid this
corruption.

Signed-off-by: Li Ruilin <liruilin4@huawei.com>
Reviewed-by: Luan Jianhai <luanjianhai@huawei.com>
Reviewed-by: Peng Junyi <pengjunyi1@huawei.com>
Acked-by: Xie Xiuqi <xiexiuqi@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Reviewed-by: Guangxing Deng <dengguangxing@huawei.com>
Reviewed-by: chao song <chao.song@huawei.com>
Reviewed-by: chao song <chao.song@huawei.com>
Reviewed-by: chao song <chao.song@huawei.com>
Reviewed-by: chao song <chao.song@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Li Ruilin

euleros inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4LOJ6
CVE: NA

------------------------------

Add a list to save all prefetch requests. When an IO request comes in, check
whether it overlaps with any of the prefetch requests; if it does, block the
request until the prefetch request has ended. Add a switch to control whether
this is enabled. If not enabled, count the overlapped IO request as a fake
hit for performance analysis.

Signed-off-by: Li Ruilin <liruilin4@huawei.com>
Reviewed-by: Luan Jianhai <luanjianhai@huawei.com>
Reviewed-by: Peng Junyi <pengjunyi1@huawei.com>
Acked-by: Xie Xiuqi <xiexiuqi@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Reviewed-by: Guangxing Deng <dengguangxing@huawei.com>
Reviewed-by: chao song <chao.song@huawei.com>
Reviewed-by: chao song <chao.song@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
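Since acache is not a mainline subsystem, the sketch below only illustrates the overlap check described above; every structure and function name here is hypothetical.

```
/* Hypothetical sketch of checking an incoming IO against pending prefetches. */
#include <linux/list.h>
#include <linux/types.h>

struct prefetch_req {
	struct list_head list;
	sector_t start;
	sector_t sectors;
};

/* Two sector ranges [a, a+alen) and [b, b+blen) overlap iff each one starts
 * before the other one ends. */
static bool ranges_overlap(sector_t a, sector_t alen, sector_t b, sector_t blen)
{
	return a < b + blen && b < a + alen;
}

static struct prefetch_req *find_overlapping_prefetch(struct list_head *pending,
						      sector_t start,
						      sector_t sectors)
{
	struct prefetch_req *req;

	list_for_each_entry(req, pending, list)
		if (ranges_overlap(start, sectors, req->start, req->sectors))
			return req; /* caller waits until this prefetch ends */

	return NULL;
}
```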
-
Submitted by Li Ruilin

euleros inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4LOJ6
CVE: NA

------------------------------

Provide a switch named read_bypass. If enabled, all IO requests will bypass
the cache. This option can be useful when userspace prefetch is enabled and
the cache device has low capacity.

Signed-off-by: Li Ruilin <liruilin4@huawei.com>
Reviewed-by: Luan Jianhai <luanjianhai@huawei.com>
Reviewed-by: Peng Junyi <pengjunyi1@huawei.com>
Acked-by: Xie Xiuqi <xiexiuqi@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Reviewed-by: Guangxing Deng <dengguangxing@huawei.com>
Reviewed-by: chao song <chao.song@huawei.com>
Reviewed-by: chao song <chao.song@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
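A sketch of how such a knob might gate the cache in the request path; the structure, field, and helper names are assumptions, not the actual code.

```
#include <linux/compiler.h>
#include <linux/types.h>

struct cached_dev_sketch {
	unsigned int read_bypass;	/* sysfs-controlled switch (assumed name) */
};

/* With read_bypass enabled, requests skip the cache lookup entirely and are
 * served directly from the backing device. */
static bool should_bypass_cache(struct cached_dev_sketch *dc)
{
	return READ_ONCE(dc->read_bypass) != 0;
}
```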
-
Submitted by Li Ruilin

euleros inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4LOJ6
CVE: NA

------------------------------

Add a framework to forward IO information to a userspace client and to
process prefetch requests sent by the userspace client.

Create a char device named "acache" for connecting kernelspace and userspace.
Save the information of all IO requests into a buffer and pass it to the
client when the client reads from the device. A prefetch request can be
treated as a normal IO request; the difference is that such requests do not
need to return data to userspace and should not append a readahead part.

Add two parameters: acache_dev_size controls the size of the buffer that
holds IO information, and acache_prefetch_workers controls the maximum number
of threads that process prefetch requests.

Signed-off-by: Li Ruilin <liruilin4@huawei.com>
Reviewed-by: Luan Jianhai <luanjianhai@huawei.com>
Reviewed-by: Peng Junyi <pengjunyi1@huawei.com>
Acked-by: Xie Xiuqi <xiexiuqi@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Reviewed-by: Guangxing Deng <dengguangxing@huawei.com>
Reviewed-by: chao song <chao.song@huawei.com>
Reviewed-by: chao song <chao.song@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
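A sketch of the kind of plumbing described above: a character device that userspace reads IO records from, plus the two module parameters. Only the parameter names come from the commit; the defaults, the use of a misc device, and the empty read handler are assumptions for illustration.

```
#include <linux/fs.h>
#include <linux/miscdevice.h>
#include <linux/module.h>

static unsigned long acache_dev_size = 8 * 1024 * 1024;	/* assumed default */
module_param(acache_dev_size, ulong, 0444);
MODULE_PARM_DESC(acache_dev_size, "Size of the buffer holding IO information");

static unsigned int acache_prefetch_workers = 64;	/* assumed default */
module_param(acache_prefetch_workers, uint, 0444);
MODULE_PARM_DESC(acache_prefetch_workers, "Max threads processing prefetch requests");

/* Userspace client reads queued IO records from here. */
static ssize_t acache_read(struct file *file, char __user *buf,
			   size_t count, loff_t *ppos)
{
	return 0;	/* placeholder: copy records out of a ring buffer */
}

static const struct file_operations acache_fops = {
	.owner = THIS_MODULE,
	.read  = acache_read,
};

static struct miscdevice acache_misc = {
	.minor = MISC_DYNAMIC_MINOR,
	.name  = "acache",
	.fops  = &acache_fops,
};

/* misc_register(&acache_misc) / misc_deregister(&acache_misc) would be
 * called from the module init/exit paths. */
```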
-
- 23 Dec 2021: 1 commit
-
-
Submitted by Yufen Yu

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4JYYO?from=project-issue
CVE: NA

---------------------------

We get a NULL pointer dereference oops when testing raid1 as follows:

mdadm -CR /dev/md1 -l 1 -n 2 /dev/sd[ab]
mdadm /dev/md1 -f /dev/sda
mdadm /dev/md1 -r /dev/sda
mdadm /dev/md1 -a /dev/sda
sleep 5
mdadm /dev/md1 -f /dev/sdb
mdadm /dev/md1 -r /dev/sdb
mdadm /dev/md1 -a /dev/sdb

After a disk (/dev/sda) has been removed, we add the disk to the raid array
again, which triggers a recovery action. Since the rdev's current state is
'spare', read/write bios can be issued to the disk.

Then we set the other disk (/dev/sdb) faulty. Since the raid array is now in
a degraded state and /dev/sdb is the only 'In_sync' disk, raid1_error() will
return without setting it faulty successfully. However, that can interrupt
the recovery action, and md_check_recovery will try to call
remove_and_add_spares() to remove the spare disk. The race condition between
remove_and_add_spares() and raid1_write_request() shown below can then cause
a NULL pointer dereference for conf->mirrors[i].rdev:

raid1_write_request()              md_check_recovery()
                                     raid1_error()
rcu_read_lock()
rdev != NULL
!test_bit(Faulty, &rdev->flags)
                                       conf->recovery_disabled =
                                           mddev->recovery_disabled;
                                       return busy
                                     remove_and_add_spares()
                                       raid1_remove_disk()
                                         rdev->nr_pending == 0
atomic_inc(&rdev->nr_pending);
rcu_read_unlock()
                                         p->rdev = NULL
conf->mirrors[i].rdev->data_offset
  NULL pointer deref!!!
                                         if (!test_bit(RemoveSynchronized,
                                                       &rdev->flags))
                                           synchronize_rcu();
                                           p->rdev = rdev

To fix the race condition, we add a new flag 'WantRemove' for the rdev.
Before accessing conf->mirrors[i].rdev, we need to ensure that the rdev does
not have the 'WantRemove' bit set.

Link: https://marc.info/?l=linux-raid&m=156412052717709&w=2
Reported-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Conflict: drivers/md/md.h
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
Reviewed-by: yuyufen <yuyufen@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
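A sketch of the write-path check that the last paragraph describes; the 'WantRemove' flag is specific to this backport and the surrounding raid1 context is abbreviated, so treat this as an illustration rather than the actual patch.

```
/* Writer side: skip an rdev that removal has already claimed. */
rcu_read_lock();
rdev = rcu_dereference(conf->mirrors[i].rdev);
if (rdev && !test_bit(Faulty, &rdev->flags) &&
    !test_bit(WantRemove, &rdev->flags)) {
	/* raid1_remove_disk() sets WantRemove before clearing p->rdev, so a
	 * writer that passes this check and raises nr_pending keeps the
	 * rdev alive until the IO completes. */
	atomic_inc(&rdev->nr_pending);
} else {
	rdev = NULL;
}
rcu_read_unlock();
```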
-
- 06 Dec 2021: 1 commit
-
-
Submitted by Xiao Ni

stable inclusion
from stable-5.10.80
commit 2338c3501726895c1657adda2308fcf9e6f17449
bugzilla: 185821 https://gitee.com/openeuler/kernel/issues/I4L7CG
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2338c3501726895c1657adda2308fcf9e6f17449

--------------------------------

[ Upstream commit 8b9e2291 ]

When the in-memory flag is changed, we need to persist the change in the rdev
superblock flags. This is needed for "writemostly" and "failfast".

Reviewed-by: Li Feng <fengli@smartx.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
- 15 Nov 2021: 3 commits
-
-
Submitted by Ming Lei

mainline inclusion
from mainline-v5.16
commit a1c2f7e7
category: bugfix
bugzilla: 182378 https://gitee.com/openeuler/kernel/issues/I4DDEL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a1c2f7e7f25c9d35d3bf046f99682c5373b20fa2

---------------------------

For fixing the queue quiesce race between driver and block layer (elevator
switch, update nr_requests, ...), we need to support concurrent quiesce and
unquiesce, which requires the two calls to be balanced.

__bind() is only called from dm_swap_table(), in which the dm device has
already been suspended, so there is no need to stop the queue again. This
way, request queue quiesce and unquiesce can be balanced.

Reported-by: Yi Zhang <yi.zhang@redhat.com>
Fixes: e70feb8b ("blk-mq: support concurrent queue quiesce/unquiesce")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Christoph Hellwig

stable inclusion
from stable-5.10.70
commit b18ba3f477a2fdd12d2ca2e01d2bd874968714e2
bugzilla: 182949 https://gitee.com/openeuler/kernel/issues/I4I3GQ
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b18ba3f477a2fdd12d2ca2e01d2bd874968714e2

--------------------------------

[ Upstream commit 7df835a3 ]

Commit b0140891 ("md: Fix race when creating a new md device.") not only
moved assigning mddev->gendisk before calling add_disk, which fixes the races
described in the commit log, but also added a mddev->open_mutex critical
section over add_disk and creation of the md kobj. Adding a kobject after
add_disk is racy vs deleting the gendisk right after adding it, but md
already prevents against that by holding a mddev->active reference.

On the other hand, taking this lock added a lock order reversal with what is
now disk->open_mutex (it used to be bdev->bd_mutex when the commit was added)
for partition devices, which need that lock for the internal open for the
partition scan, and a recent commit also takes it for non-partitioned
devices, leading to further lockdep splatter.

Fixes: b0140891 ("md: Fix race when creating a new md device.")
Fixes: d6263387 ("block: support delayed holder registration")
Reported-by: syzbot+fadc0aaf497e6a493b9f@syzkaller.appspotmail.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: syzbot+fadc0aaf497e6a493b9f@syzkaller.appspotmail.com
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Sami Tolvanen

stable inclusion
from stable-5.10.70
commit 55e6f8b3c0f5cc600df12ddd0371d2703b910fd7
bugzilla: 182949 https://gitee.com/openeuler/kernel/issues/I4I3GQ
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=55e6f8b3c0f5cc600df12ddd0371d2703b910fd7

--------------------------------

[ Upstream commit 4f0f586b ]

list_sort() internally casts the comparison function passed to it to a
different type with constant struct list_head pointers, and uses this pointer
to call the functions, which trips indirect call Control-Flow Integrity (CFI)
checking.

Instead of removing the consts, this change defines the list_cmp_func_t type
and changes the comparison function types of all list_sort() callers to use
const pointers, thus avoiding type mismatches.

Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210408182843.1754385-10-samitolvanen@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
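For reference, a comparison callback matching the new list_cmp_func_t signature looks like this (the element type and sort key are illustrative):

```
#include <linux/list.h>
#include <linux/list_sort.h>

struct item {
	struct list_head node;
	int key;
};

static int item_cmp(void *priv, const struct list_head *a,
		    const struct list_head *b)
{
	const struct item *ia = list_entry(a, struct item, node);
	const struct item *ib = list_entry(b, struct item, node);

	/* list_sort() expects <0, 0, >0, like memcmp(). */
	return ia->key - ib->key;
}

/* Usage: list_sort(NULL, &my_list, item_cmp); */
```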
-
- 21 Oct 2021: 1 commit
-
-
Submitted by Arne Welzel

stable inclusion
from stable-5.10.67
commit 7509c4cb7c8050177da9ee5e053c0c3d55bb66b7
bugzilla: 182619 https://gitee.com/openeuler/kernel/issues/I4EWO7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7509c4cb7c8050177da9ee5e053c0c3d55bb66b7

--------------------------------

commit 528b16bf upstream.

On systems with many cores using dm-crypt, heavy spinlock contention in
percpu_counter_compare() can be observed when the page allocation limit for a
given device is reached or close to be reached. This is due to
percpu_counter_compare() taking a spinlock to compute an exact result on
potentially many CPUs at the same time.

Switch to non-exact comparison of allocated and allowed pages by using the
value returned by percpu_counter_read_positive() to avoid taking the
percpu_counter spinlock.

This may over/under estimate the actual number of allocated pages by at most
(batch-1) * num_online_cpus().

Currently, batch is bounded by 32. The system on which this issue was first
observed has 256 CPUs and 512GB of RAM. With a 4k page size, this change may
over/under estimate by 31MB. With ~10G (2%) allowed dm-crypt allocations,
this seems an acceptable error. Certainly preferred over running into the
spinlock contention.

This behavior was reproduced on an EC2 c5.24xlarge instance with 96 CPUs and
192GB RAM as follows, but can be provoked on systems with less CPUs as well.

 * Disable swap
 * Tune vm settings to promote regular writeback
     $ echo 50 > /proc/sys/vm/dirty_expire_centisecs
     $ echo 25 > /proc/sys/vm/dirty_writeback_centisecs
     $ echo $((128 * 1024 * 1024)) > /proc/sys/vm/dirty_background_bytes
 * Create 8 dmcrypt devices based on files on a tmpfs
 * Create and mount an ext4 filesystem on each crypt devices
 * Run stress-ng --hdd 8 within one of above filesystems

Total %system usage collected from sysstat goes to ~35%. Write throughput on
the underlying loop device is ~2GB/s. perf profiling an individual kworker
kcryptd thread shows the following profile, indicating spinlock contention in
percpu_counter_compare():

    99.98%     0.00%  kworker/u193:46  [kernel.kallsyms]  [k] ret_from_fork
            |
            --ret_from_fork
              kthread
              worker_thread
              |
              --99.92%--process_one_work
                  |
                  |--80.52%--kcryptd_crypt
                  |    |
                  |    |--62.58%--mempool_alloc
                  |    |    |
                  |    |    --62.24%--crypt_page_alloc
                  |    |         |
                  |    |         --61.51%--__percpu_counter_compare
                  |    |              |
                  |    |              --61.34%--__percpu_counter_sum
                  |    |                   |
                  |    |                   |--58.68%--_raw_spin_lock_irqsave
                  |    |                   |    |
                  |    |                   |    --58.30%--native_queued_spin_lock_slowpath
                  |    |                   |
                  |    |                   --0.69%--cpumask_next
                  |    |                        |
                  |    |                        --0.51%--_find_next_bit
                  |    |
                  |    |--10.61%--crypt_convert
                  |    |    |
                  |    |    |--6.05%--xts_crypt
                  ...

After applying this patch and running the same test, %system usage is lowered
to ~7% and write throughput on the loop device increases to ~2.7GB/s. perf
report shows mempool_alloc() as ~8% rather than ~62% in the profile and not
hitting the percpu_counter() spinlock anymore.

    |--8.15%--mempool_alloc
    |    |
    |    |--3.93%--crypt_page_alloc
    |    |    |
    |    |    --3.75%--__alloc_pages
    |    |         |
    |    |         --3.62%--get_page_from_freelist
    |    |              |
    |    |              --3.22%--rmqueue_bulk
    |    |                   |
    |    |                   --2.59%--_raw_spin_lock
    |    |                        |
    |    |                        --2.57%--native_queued_spin_lock_slowpath
    |    |
    |    --3.05%--_raw_spin_lock_irqsave
    |         |
    |         --2.49%--native_queued_spin_lock_slowpath

Suggested-by: DJ Gregor <dj@corelight.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Arne Welzel <arne.welzel@corelight.com>
Fixes: 5059353d ("dm crypt: limit the number of allocated pages")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
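The core of the change reads roughly as below; field names follow dm-crypt, but this is a sketch of the approach rather than the verbatim upstream patch.

```
static void *crypt_page_alloc(gfp_t gfp_mask, void *pool_data)
{
	struct crypt_config *cc = pool_data;
	struct page *page;

	/*
	 * percpu_counter_read_positive() may over/under-estimate the real
	 * count by up to (batch - 1) * num_online_cpus(), which is acceptable
	 * here and far cheaper than summing all per-CPU deltas under the
	 * percpu_counter spinlock, as __percpu_counter_compare() does.
	 */
	if (unlikely(percpu_counter_read_positive(&cc->n_allocated_pages) >=
		     dm_crypt_pages_per_client) &&
	    likely(gfp_mask & __GFP_DIRECT_RECLAIM))
		return NULL;

	page = alloc_page(gfp_mask);
	if (likely(page != NULL))
		percpu_counter_add(&cc->n_allocated_pages, 1);

	return page;
}
```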
-
- 19 Oct 2021: 2 commits
-
-
Submitted by Christoph Hellwig

stable inclusion
from stable-5.10.65
commit cf13537be54c6ea49f208668f5f8b8bda3edd28e
bugzilla: 182361 https://gitee.com/openeuler/kernel/issues/I4EH3U
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=cf13537be54c6ea49f208668f5f8b8bda3edd28e

--------------------------------

[ Upstream commit 224b0683 ]

Except for the IDA none of the allocations in bcache_device_init is unwound
on error, fix that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210809064028.1198327-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Guoqing Jiang

mainline inclusion
from mainline-v5.14-rc1
commit ad3fc798
category: bugfix
bugzilla: 169402 https://gitee.com/openeuler/kernel/issues/I4DDEL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ad3fc798800fb7ca04c1dfc439dba946818048d8

-------------------------------------------------

The commit 41d2d848 ("md: improve io stats accounting") could cause a double
fault problem per the report [1], and it is also not correct to change
->bi_end_io if md doesn't own it, so let's revert it.

IO stats accounting will be reimplemented in later commits.

[1]. https://lore.kernel.org/linux-raid/3bf04253-3fad-434a-63a7-20214e38cf26@gmail.com/T/#t

Fixes: 41d2d848 ("md: improve io stats accounting")
Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Luo Meng <luomeng12@huawei.com>
Conflicts: drivers/md/md.c
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
- 15 Oct 2021: 8 commits
-
-
Submitted by Wei Shuyu

stable inclusion
from stable-5.10.58
commit a786282b55b4d1134931c679a80e900c46461eae
bugzilla: 176984 https://gitee.com/openeuler/kernel/issues/I4E2P4
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a786282b55b4d1134931c679a80e900c46461eae

--------------------------------

commit 5ba03936 upstream.

Similar to [1], this patch fixes the same bug in raid10. Also cleanup the
comments.

[1] commit 2417b986 ("md/raid1: properly indicate failure when ending a failed write request")

Cc: stable@vger.kernel.org
Fixes: 7cee6d4e ("md/raid10: end bio when the device faulty")
Signed-off-by: Wei Shuyu <wsy@dogben.com>
Acked-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Mikulas Patocka

stable inclusion
from stable-5.10.51
commit b716ccffbc8dc8f14773d4ae7daadbca6167da2d
bugzilla: 175263 https://gitee.com/openeuler/kernel/issues/I4DT6F
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b716ccffbc8dc8f14773d4ae7daadbca6167da2d

--------------------------------

commit 867de40c upstream.

SSDs perform badly with sub-4k writes (because they perform read-modify-write
internally), so make sure writecache writes at least 4k when committing.

Fixes: 991bd8d7 ("dm writecache: commit just one block, not a full page")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Mikulas Patocka

stable inclusion
from stable-5.10.51
commit 1b5918b087b1dd7bf193340f25ca63c50a277638
bugzilla: 175263 https://gitee.com/openeuler/kernel/issues/I4DT6F
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=1b5918b087b1dd7bf193340f25ca63c50a277638

--------------------------------

commit ee55b92a upstream.

Commit d53f1faf ("dm writecache: do direct write if the cache is full")
changed dm-writecache, so that it writes directly to the origin device if the
cache is full. Unfortunately, it doesn't forward flush requests to the origin
device, so that there is a bug where flushes are being ignored.

Fix this by adding missing flush forwarding.

For PMEM mode, we fix this bug by disabling direct writes to the origin
device, because it performs better.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: d53f1faf ("dm writecache: do direct write if the cache is full")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Damien Le Moal

stable inclusion
from stable-5.10.51
commit cbc03ffec260c28cd177fee143f2fe74cc36ba00
bugzilla: 175263 https://gitee.com/openeuler/kernel/issues/I4DT6F
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=cbc03ffec260c28cd177fee143f2fe74cc36ba00

--------------------------------

commit bab68499 upstream.

The dm-zoned target cannot support zoned block devices with zones that have a
capacity smaller than the zone size (e.g. NVMe zoned namespaces) due to the
current chunk zone mapping implementation, as it is assumed that zones and
chunks have the same size with all blocks usable. If a zoned drive is found
to have zones with a capacity different from the zone size, fail the target
initialization.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Cc: stable@vger.kernel.org # v5.9+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
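A sketch of the check described above, as it could be applied while walking the device's zone report during target initialization (the function name is illustrative; blk_zone's len/capacity fields are real):

```
#include <linux/blkdev.h>

static int dmz_check_zone_capacity(struct blk_zone *blkz)
{
	/*
	 * dm-zoned maps chunks 1:1 onto zones and assumes every block of a
	 * zone is usable, so zones whose capacity is smaller than the zone
	 * size cannot be supported.
	 */
	if (blkz->capacity != blkz->len)
		return -ENXIO;

	return 0;
}
```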
-
Submitted by Mikulas Patocka

stable inclusion
from stable-5.10.51
commit ad7083a95d8ac29acdca83942997dd4014c4149d
bugzilla: 175263 https://gitee.com/openeuler/kernel/issues/I4DT6F
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ad7083a95d8ac29acdca83942997dd4014c4149d

--------------------------------

[ Upstream commit 991bd8d7 ]

Some architectures have pages larger than 4k and committing a full page
causes needless overhead. Fix this by writing a single block when committing
the superblock.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Damien Le Moal

stable inclusion
from stable-5.10.51
commit cc4f0a9d5aa1b5abffb2366a0b37c37806362fe8
bugzilla: 175263 https://gitee.com/openeuler/kernel/issues/I4DT6F
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=cc4f0a9d5aa1b5abffb2366a0b37c37806362fe8

--------------------------------

[ Upstream commit 6842d264 ]

Fix dm_accept_partial_bio() to actually check that zone management commands
are not passed, as explained in the function documentation comment. Also,
since a zone append operation cannot be split, add REQ_OP_ZONE_APPEND as a
forbidden command.

White lines are added around the group of BUG_ON() calls to make the code
more legible.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
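The shape of the checks described above, roughly how dm_accept_partial_bio() looks after the patch (a sketch reconstructed from the commit description, not a verbatim copy):

```
void dm_accept_partial_bio(struct bio *bio, unsigned int n_sectors)
{
	struct dm_target_io *tio = container_of(bio, struct dm_target_io, clone);
	unsigned int bi_size = bio->bi_iter.bi_size >> SECTOR_SHIFT;

	/* Splitting is not allowed for flushes, zone management commands,
	 * or zone append operations, which must be processed whole. */
	BUG_ON(bio->bi_opf & REQ_PREFLUSH);
	BUG_ON(op_is_zone_mgmt(bio_op(bio)));
	BUG_ON(bio_op(bio) == REQ_OP_ZONE_APPEND);

	BUG_ON(bi_size > *tio->len_ptr);
	BUG_ON(n_sectors > bi_size);

	*tio->len_ptr -= bi_size - n_sectors;
	bio->bi_iter.bi_size = n_sectors << SECTOR_SHIFT;
}
```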
-
Submitted by Mikulas Patocka

stable inclusion
from stable-5.10.51
commit 939f750215b89d2cc774a85a6810286aec0f5718
bugzilla: 175263 https://gitee.com/openeuler/kernel/issues/I4DT6F
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=939f750215b89d2cc774a85a6810286aec0f5718

--------------------------------

[ Upstream commit ee50cc19 ]

If dm-writecache overwrites existing cached data, it splits the incoming bio
into many block-sized bios. The I/O scheduler does merge these bios into one
large request, but this needless splitting and merging causes performance
degradation.

Fix this by avoiding bio splitting if the cache target area that is being
overwritten is contiguous.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Joe Thornber

stable inclusion
from stable-5.10.51
commit 65e780667cf39f54154c6d95e40fd650bd0a31c5
bugzilla: 175263 https://gitee.com/openeuler/kernel/issues/I4DT6F
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=65e780667cf39f54154c6d95e40fd650bd0a31c5

--------------------------------

[ Upstream commit 5faafc77 ]

Current commit code resets the place where the search for free blocks will
begin back to the start of the metadata device. There are a couple of
repercussions to this:

- The first allocation after the commit is likely to take longer than normal
  as it searches for a free block in an area that is likely to have very few
  free blocks (if any).

- Any free blocks it finds will have been recently freed. Reusing them means
  we have fewer old copies of the metadata to aid recovery from hardware
  error.

Fix these issues by leaving the cursor alone, only resetting when the search
hits the end of the metadata device.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
- 03 Jul 2021: 2 commits
-
-
Submitted by Hou Tao

mainline inclusion
from mainline-next
commit b8e0c7f90e6f99ee64ea60e39253d5fcfb445f9e
category: bugfix
bugzilla: 167383
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b8e0c7f90e6f99ee64ea60e39253d5fcfb445f9e

--------------------------------

remove_raw() in dm_btree_remove() may fail due to an IO read error (e.g.
reading the content of the origin block fails during shadowing), in which
case the value of shadow_spine::root is uninitialized, but the uninitialized
value is still assigned to new_root at the end of dm_btree_remove().

For dm-thin, the value of pmd->details_root or pmd->root then becomes an
uninitialized value, so trying to read the details_info tree again may access
out-of-bounds memory, as shown below:

general protection fault, probably for non-canonical address 0x3fdcb14c8d7520
CPU: 4 PID: 515 Comm: dmsetup Not tainted 5.13.0-rc6
Hardware name: QEMU Standard PC
RIP: 0010:metadata_ll_load_ie+0x14/0x30
Call Trace:
 sm_metadata_count_is_more_than_one+0xb9/0xe0
 dm_tm_shadow_block+0x52/0x1c0
 shadow_step+0x59/0xf0
 remove_raw+0xb2/0x170
 dm_btree_remove+0xf4/0x1c0
 dm_pool_delete_thin_device+0xc3/0x140
 pool_message+0x218/0x2b0
 target_message+0x251/0x290
 ctl_ioctl+0x1c4/0x4d0
 dm_ctl_ioctl+0xe/0x20
 __x64_sys_ioctl+0x7b/0xb0
 do_syscall_64+0x40/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Fix it by only assigning new_root when the removal succeeds.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Luo Meng <luomeng12@huawei.com>
Reviewed-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
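A sketch of the fix at the tail of dm_btree_remove(): only publish the shadow root when the removal path actually succeeded (the spine helpers are the real dm-btree ones; the surrounding code is abbreviated):

```
	r = remove_raw(&spine, info, &info->value_type, root,
		       keys[info->levels - 1], &index);

	exit_shadow_spine(&spine);

	/* Previously *new_root was assigned unconditionally, publishing an
	 * uninitialized spine root whenever remove_raw() failed on IO error. */
	if (!r)
		*new_root = shadow_root(&spine);

	return r;
```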
-
Submitted by John Keeping

stable inclusion
from stable-5.10.44
commit 90547d5db50bcb2705709e420e0af51535109113
bugzilla: 109295
CVE: NA

--------------------------------

[ Upstream commit 0c1f3193 ]

The third parameter of module_param() is permissions for the sysfs node, but
it looks like it is being used as the initial value of the parameter here. In
fact, false here equates to omitting the file from sysfs and does not affect
the value of require_signatures.

Making the parameter writable is not simple because going from false->true is
fine, but it should not be possible to remove the requirement to verify a
signature. But it can be useful to inspect the value of this parameter from
userspace, so change the permissions to make a read-only file in sysfs.

Signed-off-by: John Keeping <john@metanate.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
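The pattern in question, sketched for reference (the parameter lives in dm-verity's signature verification code; the description string here is paraphrased):

```
#include <linux/module.h>
#include <linux/moduleparam.h>

static bool require_signatures;

/* The third argument of module_param() is the sysfs mode, not an initial
 * value: 0 hides the parameter from sysfs entirely, while 0444 exposes it
 * read-only, so userspace can inspect it but cannot relax the requirement
 * at runtime. */
module_param(require_signatures, bool, 0444);
MODULE_PARM_DESC(require_signatures,
		 "Verify roothash signatures for dm-verity targets");
```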
-
- 15 Jun 2021: 1 commit
-
-
Submitted by Mikulas Patocka

stable inclusion
from stable-5.10.42
commit 1a8ecc3cd1a1250c12128717e47d160f22ba3866
bugzilla: 55093
CVE: NA

--------------------------------

commit 7e768532 upstream.

If an origin target has no snapshots, o->split_boundary is set to 0. This
causes BUG_ON(sectors <= 0) in block/bio.c:bio_split().

Fix this by initializing chunk_size, and in turn split_boundary, to
rounddown_pow_of_two(UINT_MAX) -- the largest power of two that fits into
"unsigned" type.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
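An illustrative sketch of the fallback the commit describes (not the verbatim patch): when no snapshot supplies a minimum chunk size, fall back to the largest power of two that fits in 'unsigned' instead of 0.

```
#include <linux/kernel.h>
#include <linux/log2.h>

static unsigned int origin_split_boundary(unsigned int min_chunksize)
{
	/* With no snapshots attached, min_chunksize is 0 and bio_split()
	 * would hit BUG_ON(sectors <= 0); use a huge boundary instead. */
	if (!min_chunksize)
		min_chunksize = rounddown_pow_of_two(UINT_MAX);

	return min_chunksize;
}
```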
-
- 03 Jun 2021: 4 commits
-
-
Submitted by JeongHyeon Lee

mainline inclusion
from mainline-v5.13-rc1
commit 219a9b5e
category: bugfix
bugzilla: 51874
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=219a9b5e738b75a6a5e9effe1d72f60037a2f131

-----------------------------------------------

If more than one error handling mode is requested during DM verity table
load, the last requested mode will be used. Change this to impose stricter
checking so that the table load will fail if more than one error handling
mode is requested.

Signed-off-by: JeongHyeon Lee <jhs2.lee@samsung.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Luo Meng <luomeng12@huawei.com>
Reviewed-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Mikulas Patocka

stable inclusion
from stable-5.10.40
commit 2a61f0ccb756f966f7d04aa149635c843f821ad3
bugzilla: 51882
CVE: NA

--------------------------------

commit c699a0db upstream.

The following commands will crash the kernel:

modprobe brd rd_size=1048576
dmsetup create o --table "0 `blockdev --getsize /dev/ram0` snapshot-origin /dev/ram0"
dmsetup create s --table "0 `blockdev --getsize /dev/ram0` snapshot /dev/ram0 /dev/ram1 N 0"

The reason is that when we test for zero chunk size, we jump to the label
bad_read_metadata without setting the "r" variable. The function snapshot_ctr
destroys all the structures and then exits with "r == 0". The kernel then
crashes because it falsely believes that snapshot_ctr succeeded.

In order to fix the bug, we set the variable "r" to -EINVAL.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
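The shape of the fix, sketched from the description above (the surrounding snapshot_ctr() context and the exact error string are abbreviated):

```
	if (!s->store->chunk_size) {
		ti->error = "Chunk size not set";
		r = -EINVAL;		/* previously left at 0 */
		goto bad_read_metadata;
	}
```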
-
Submitted by Jan Glauber

stable inclusion
from stable-5.10.37
commit 0035a4704557ba66824c08d5759d6e743747410b
bugzilla: 51868
CVE: NA

--------------------------------

commit 7abfabaf upstream.

Reading /proc/mdstat with a read buffer size that would not fit the unused
status line in the first read will skip this line from the output. So
'dd if=/proc/mdstat bs=64 2>/dev/null' will not print something like:

unused devices: <none>

Don't return NULL immediately in start() for v=2, but call show() once to
print the status line also for multiple reads.

Cc: stable@vger.kernel.org
Fixes: 1f4aace6 ("fs/seq_file.c: simplify seq_file iteration code and interface")
Signed-off-by: Jan Glauber <jglauber@digitalocean.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Zhao Heming

stable inclusion
from stable-5.10.37
commit b70b7ec500892f8bc12ffc6f60a3af6fd61d3a8b
bugzilla: 51868
CVE: NA

--------------------------------

commit 6a4db2a6 upstream.

commit d3374825 ("md: make devices disappear when they are no longer
needed.") introduced protection between mddev creating & removing. md_open
shouldn't create an mddev when the all_mddevs list doesn't contain it. With
the current code logic, it is very easy to trigger a soft lockup in a
non-preempt env.

This patch changes md_open's return from -ERESTARTSYS to -EBUSY, which breaks
the infinite retry when md_open enters the racing area.

This patch partly fixes the soft lockup issue; a full fix needs mddev_find to
be split into two functions: mddev_find & mddev_find_or_alloc. md_open should
then call the new mddev_find (it only does the searching job). For more
detail, please refer to Christoph's "split mddev_find" patch in later
commits.

*** env ***

kvm-qemu VM 2C1G with 2 iscsi luns
kernel should be non-preempt

*** script ***

about trigger every time with below script

```
 1  node1="mdcluster1"
 2  node2="mdcluster2"
 3
 4  mdadm -Ss
 5  ssh ${node2} "mdadm -Ss"
 6  wipefs -a /dev/sda /dev/sdb
 7  mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda \
        /dev/sdb --assume-clean
 8
 9  for i in {1..10}; do
10      echo ==== $i ====;
11
12      echo "test ...."
13      ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb"
14      sleep 1
15
16      echo "clean ....."
17      ssh ${node2} "mdadm -Ss"
18  done
```

I use the mdcluster env to trigger the soft lockup, but it isn't a bug
specific to mdcluster. Stopping an md array in a mdcluster env does more work
than in a non-cluster array, which leaves enough time/gap to allow the kernel
to run md_open.

*** stack ***

```
[ 884.226509]  mddev_put+0x1c/0xe0 [md_mod]
[ 884.226515]  md_open+0x3c/0xe0 [md_mod]
[ 884.226518]  __blkdev_get+0x30d/0x710
[ 884.226520]  ? bd_acquire+0xd0/0xd0
[ 884.226522]  blkdev_get+0x14/0x30
[ 884.226524]  do_dentry_open+0x204/0x3a0
[ 884.226531]  path_openat+0x2fc/0x1520
[ 884.226534]  ? seq_printf+0x4e/0x70
[ 884.226536]  do_filp_open+0x9b/0x110
[ 884.226542]  ? md_release+0x20/0x20 [md_mod]
[ 884.226543]  ? seq_read+0x1d8/0x3e0
[ 884.226545]  ? kmem_cache_alloc+0x18a/0x270
[ 884.226547]  ? do_sys_open+0x1bd/0x260
[ 884.226548]  do_sys_open+0x1bd/0x260
[ 884.226551]  do_syscall_64+0x5b/0x1e0
[ 884.226554]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
```

*** rootcause ***

"mdadm -A" (or other array assemble commands) will start a daemon "mdadm
--monitor" by default. When "mdadm -Ss" is running, the stop action will wake
up "mdadm --monitor". The "--monitor" daemon will immediately get info from
/proc/mdstat. At this time the mddev in the kernel still exists, so
/proc/mdstat still shows the md device, which makes "mdadm --monitor" open
/dev/md0.

The previous "mdadm -Ss" is a removing action, and the "mdadm --monitor" open
action will trigger md_open, which is a creating action. Racing is happening.

```
<thread 1>: "mdadm -Ss"
md_release
  mddev_put deletes mddev from all_mddevs
  queue_work for mddev_delayed_delete
  at this time, "/dev/md0" is still available for opening

<thread 2>: "mdadm --monitor ..."
md_open
 + mddev_find can't find mddev of /dev/md0, and create a new mddev and
 | return.
 + trigger "if (mddev->gendisk != bdev->bd_disk)" and return -ERESTARTSYS.
```

In a non-preempt kernel, <thread 2> is occupying the current CPU, and the
mddev_delayed_delete created in <thread 1> also can't be scheduled.

In a preempt kernel, the above racing can also be triggered, but the kernel
doesn't allow one thread to run on a CPU all the time. After <thread 2> has
been running for some time, the later "mdadm -A" (refer to script line 13
above) will call md_alloc to allocate a new gendisk for the mddev. That
breaks the md_open statement "if (mddev->gendisk != bdev->bd_disk)" and
returns 0 to the caller, so the soft lockup is broken.

Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Zhao Heming <heming.zhao@suse.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-