- 08 2月, 2021 36 次提交
-
-
由 Pavel Begunkov 提交于
stable inclusion from stable-5.10.12 commit 7bccd1c19128140b9fefaa43808924c6932bef5b bugzilla: 47876 -------------------------------- [ Upstream commit 4aa84f2f ] CPU0 CPU1 ---- ---- lock(&new->fa_lock); local_irq_disable(); lock(&ctx->completion_lock); lock(&new->fa_lock); <Interrupt> lock(&ctx->completion_lock); *** DEADLOCK *** Move kill_fasync() out of io_commit_cqring() to io_cqring_ev_posted(), so it doesn't hold completion_lock while doing it. That saves from the reported deadlock, and it's just nice to shorten the locking time and untangle nested locks (compl_lock -> wq_head::lock). Cc: stable@vger.kernel.org # 5.5+ Reported-by: syzbot+91ca3f25bd7f795f019c@syzkaller.appspotmail.com Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Pavel Begunkov 提交于
stable inclusion from stable-5.10.12 commit 186725a80c4e931b6fe31b94d66c989d5f2354c1 bugzilla: 47876 -------------------------------- [ Upstream commit 0b5cd6c3 ] If there are no requests at the time __io_uring_task_cancel() is called, tctx_inflight() returns zero and and it terminates not getting a chance to go through __io_uring_files_cancel() and do io_disable_sqo_submit(). And we absolutely want them disabled by the time cancellation ends. Cc: stable@vger.kernel.org # 5.5+ Reported-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Fixes: d9d05217 ("io_uring: stop SQPOLL submit on creator's death") Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Pavel Begunkov 提交于
stable inclusion from stable-5.10.12 commit 54b4c4f9aba9e5d1ef6877f42a57895b189107c9 bugzilla: 47876 -------------------------------- [ Upstream commit 4325cb49 ] WARNING: CPU: 1 PID: 11100 at fs/io_uring.c:9096 io_uring_flush+0x326/0x3a0 fs/io_uring.c:9096 RIP: 0010:io_uring_flush+0x326/0x3a0 fs/io_uring.c:9096 Call Trace: filp_close+0xb4/0x170 fs/open.c:1280 close_files fs/file.c:401 [inline] put_files_struct fs/file.c:416 [inline] put_files_struct+0x1cc/0x350 fs/file.c:413 exit_files+0x7e/0xa0 fs/file.c:433 do_exit+0xc22/0x2ae0 kernel/exit.c:820 do_group_exit+0x125/0x310 kernel/exit.c:922 get_signal+0x3e9/0x20a0 kernel/signal.c:2770 arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811 handle_signal_work kernel/entry/common.c:147 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0x148/0x250 kernel/entry/common.c:201 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline] syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302 entry_SYSCALL_64_after_hwframe+0x44/0xa9 An SQPOLL ring creator task may have gotten rid of its file note during exit and called io_disable_sqo_submit(), but the io_uring is still left referenced through fdtable, which will be put during close_files() and cause a false positive warning. First split the warning into two for more clarity when is hit, and the add sqo_dead check to handle the described case. Cc: stable@vger.kernel.org # 5.5+ Reported-by: syzbot+a32b546d58dde07875a1@syzkaller.appspotmail.com Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Pavel Begunkov 提交于
stable inclusion from stable-5.10.12 commit 0682759126bc761c325325ca809ae99c93fda2a0 bugzilla: 47876 -------------------------------- [ Upstream commit 6b393a1f ] WARNING: CPU: 1 PID: 9094 at fs/io_uring.c:8884 io_disable_sqo_submit+0x106/0x130 fs/io_uring.c:8884 Call Trace: io_uring_flush+0x28b/0x3a0 fs/io_uring.c:9099 filp_close+0xb4/0x170 fs/open.c:1280 close_fd+0x5c/0x80 fs/file.c:626 __do_sys_close fs/open.c:1299 [inline] __se_sys_close fs/open.c:1297 [inline] __x64_sys_close+0x2f/0xa0 fs/open.c:1297 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 io_uring's final close() may be triggered by any task not only the creator. It's well handled by io_uring_flush() including SQPOLL case, though a warning in io_disable_sqo_submit() will fallaciously fire by moving this warning out to the only call site that matters. Cc: stable@vger.kernel.org # 5.5+ Reported-by: syzbot+2f5d1785dc624932da78@syzkaller.appspotmail.com Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Pavel Begunkov 提交于
stable inclusion from stable-5.10.12 commit 8cb6f4da831bc51145aee3a923f03114121dea6b bugzilla: 47876 -------------------------------- [ Upstream commit 06585c49 ] WARNING: CPU: 0 PID: 8494 at fs/io_uring.c:8717 io_ring_ctx_wait_and_kill+0x4f2/0x600 fs/io_uring.c:8717 Call Trace: io_uring_release+0x3e/0x50 fs/io_uring.c:8759 __fput+0x283/0x920 fs/file_table.c:280 task_work_run+0xdd/0x190 kernel/task_work.c:140 tracehook_notify_resume include/linux/tracehook.h:189 [inline] exit_to_user_mode_loop kernel/entry/common.c:174 [inline] exit_to_user_mode_prepare+0x249/0x250 kernel/entry/common.c:201 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline] syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302 entry_SYSCALL_64_after_hwframe+0x44/0xa9 failed io_uring_install_fd() is a special case, we don't do io_ring_ctx_wait_and_kill() directly but defer it to fput, though still need to io_disable_sqo_submit() before. note: it doesn't fix any real problem, just a warning. That's because sqring won't be available to the userspace in this case and so SQPOLL won't submit anything. Cc: stable@vger.kernel.org # 5.5+ Reported-by: syzbot+9c9c35374c0ecac06516@syzkaller.appspotmail.com Fixes: d9d05217 ("io_uring: stop SQPOLL submit on creator's death") Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Pavel Begunkov 提交于
stable inclusion from stable-5.10.12 commit 0e3562e3b2aeb4a6aa4615185a8f59c51cade61b bugzilla: 47876 -------------------------------- [ Upstream commit b4411616 ] general protection fault, probably for non-canonical address 0xdffffc0000000022: 0000 [#1] KASAN: null-ptr-deref in range [0x0000000000000110-0x0000000000000117] RIP: 0010:io_ring_set_wakeup_flag fs/io_uring.c:6929 [inline] RIP: 0010:io_disable_sqo_submit+0xdb/0x130 fs/io_uring.c:8891 Call Trace: io_uring_create fs/io_uring.c:9711 [inline] io_uring_setup+0x12b1/0x38e0 fs/io_uring.c:9739 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 io_disable_sqo_submit() might be called before user rings were allocated, don't do io_ring_set_wakeup_flag() in those cases. Cc: stable@vger.kernel.org # 5.5+ Reported-by: syzbot+ab412638aeb652ded540@syzkaller.appspotmail.com Fixes: d9d05217 ("io_uring: stop SQPOLL submit on creator's death") Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Pavel Begunkov 提交于
stable inclusion from stable-5.10.12 commit a63d9157571b52f7339d6db4c2ab7bc3bfe527c0 bugzilla: 47876 -------------------------------- [ Upstream commit d9d05217 ] When the creator of SQPOLL io_uring dies (i.e. sqo_task), we don't want its internals like ->files and ->mm to be poked by the SQPOLL task, it have never been nice and recently got racy. That can happen when the owner undergoes destruction and SQPOLL tasks tries to submit new requests in parallel, and so calls io_sq_thread_acquire*(). That patch halts SQPOLL submissions when sqo_task dies by introducing sqo_dead flag. Once set, the SQPOLL task must not do any submission, which is synchronised by uring_lock as well as the new flag. The tricky part is to make sure that disabling always happens, that means either the ring is discovered by creator's do_exit() -> cancel, or if the final close() happens before it's done by the creator. The last is guaranteed by the fact that for SQPOLL the creator task and only it holds exactly one file note, so either it pins up to do_exit() or removed by the creator on the final put in flush. (see comments in uring_flush() around file->f_count == 2). One more place that can trigger io_sq_thread_acquire_*() is __io_req_task_submit(). Shoot off requests on sqo_dead there, even though actually we don't need to. That's because cancellation of sqo_task should wait for the request before going any further. note 1: io_disable_sqo_submit() does io_ring_set_wakeup_flag() so the caller would enter the ring to get an error, but it still doesn't guarantee that the flag won't be cleared. note 2: if final __userspace__ close happens not from the creator task, the file note will pin the ring until the task dies. Cc: stable@vger.kernel.org # 5.5+ Fixed: b1b6b5a3 ("kernel/io_uring: cancel io_uring before task works") Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Pavel Begunkov 提交于
stable inclusion from stable-5.10.12 commit da67631a33c342528245817cc61e36dd945665b0 bugzilla: 47876 -------------------------------- [ Upstream commit 6b5733eb] files_cancel() should cancel all relevant requests and drop file notes, so we should never have file notes after that, including on-exit fput and flush. Add a WARN_ONCE to be sure. Cc: stable@vger.kernel.org # 5.5+ Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Pavel Begunkov 提交于
stable inclusion from stable-5.10.12 commit 18f31594ee52ed1f364e376767fb839935fd899c bugzilla: 47876 -------------------------------- [ Upstream commit 4f793dc4 ] A simple preparation change inlining io_uring_attempt_task_drop() into io_uring_flush(). Cc: stable@vger.kernel.org # 5.5+ Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Pavel Begunkov 提交于
stable inclusion from stable-5.10.12 commit 7bf3fb6243a3b153ab1854b331ec19d67a4878bb bugzilla: 47876 -------------------------------- [ Upstream commit b1b6b5a3 ] For cancelling io_uring requests it needs either to be able to run currently enqueued task_works or having it shut down by that moment. Otherwise io_uring_cancel_files() may be waiting for requests that won't ever complete. Go with the first way and do cancellations before setting PF_EXITING and so before putting the task_work infrastructure into a transition state where task_work_run() would better not be called. Cc: stable@vger.kernel.org # 5.5+ Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 zhangyi (F) 提交于
hulk inclusion category: bugfix bugzilla: 47443 CVE: NA --------------------------- After commit 4fdcfab5 ("jffs2: fix use-after-free on symlink traversal"), it expose a freeing uninitialized memory problem due to this commit move the operaion of freeing f->target to jffs2_i_callback(), which may not be initialized in some error path of allocating jffs2 inode (eg: jffs2_iget()->iget_locked()-> destroy_inode()->..->jffs2_i_callback()->kfree(f->target)). Fix this by initialize the jffs2_inode_info just after allocating it. Reported-by: NGuohua Zhong <zhongguohua1@huawei.com> Reported-by: NHuaijie Yi <yihuaijie@huawei.com> Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com> Reviewed-by: NYang Erkun <yangerkun@huawei.com> [backport from hulk-4.4] Signed-off-by: Nyangerkun <yangerkun@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Hou Tao 提交于
euler inclusion category: bugfix bugzilla: 47446 CVE: NA -------------------------------------------------- In jffs2_do_clear_inode(), we will check whether or not there is any jffs2_raw_node_ref associated with the current inocache. If there is no raw-node-ref, the inocache could be freed. And if there are still some jffs2_raw_node_ref linked in inocache->nodes, the inocache could not be freed and its free will be decided by jffs2_remove_node_refs_from_ino_list(). However there is a race between jffs2_do_clear_inode() and jffs2_remove_node_refs_from_ino_list() as shown in the following scenario: CPU 0 CPU 1 in sys_unlink() in jffs2_garbage_collect_pass() jffs2_do_unlink f->inocache->pino_nlink = 0 set_nlink(inode, 0) // contains all raw-node-refs of the unlinked inode start GC a jeb iput_final jffs2_evict_inode jffs2_do_clear_inode acquire f->sem mark all refs as obsolete GC complete jeb is moved to erase_pending_list jffs2_erase_pending_blocks jffs2_free_jeb_node_refs jffs2_remove_node_refs_from_ino_list f->inocache = INO_STATE_CHECKEDABSENT // no raw-node-ref is associated with the // inocache of the unlinked inode ic->nodes == (void *)ic && ic->pino_nlink == 0 jffs2_del_ino_cache f->inodecache->nodes == f->nodes // double-free occurs jffs2_del_ino_cache Double-free of inocache will lead to all kinds of weired behaviours. The following BUG_ON is one case in which two active inodes are used the same inocache (the freed inocache is reused by a new inode, then the inocache is double-freed and reused by another new inode): jffs2: Raw node at 0x006c6000 wasn't in node lists for ino #662249 ------------[ cut here ]------------ kernel BUG at fs/jffs2/gc.c:645! invalid opcode: 0000 [#1] PREEMPT SMP Modules linked in: nandsim CPU: 0 PID: 15837 Comm: cp Not tainted 4.4.172 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) RIP: [<ffffffff816f1256>] jffs2_garbage_collect_live+0x1578/0x1593 Call Trace: [<ffffffff8154b8aa>] jffs2_garbage_collect_pass+0xf6a/0x15d0 [<ffffffff81541bbd>] jffs2_reserve_space+0x2bd/0x8a0 [<ffffffff81546a62>] jffs2_do_create+0x52/0x480 [<ffffffff8153c9f2>] jffs2_create+0xe2/0x2a0 [<ffffffff8133bed7>] vfs_create+0xe7/0x220 [<ffffffff81340ab4>] path_openat+0x11f4/0x1c00 [<ffffffff81343635>] do_filp_open+0xa5/0x140 [<ffffffff813288ed>] do_sys_open+0x19d/0x320 [<ffffffff81328a96>] SyS_open+0x26/0x30 [<ffffffff81c3f8f8>] entry_SYSCALL_64_fastpath+0x18/0x73 ---[ end trace dd5c02f1653e8cac ]--- Fix it by protecting no-raw-node-ref check by erase_completion_lock. And also need to move the call of jffs2_set_inocache_state() under erase_completion_lock, else the inocache may be leaked because jffs2_del_ino_cache() invoked by jffs2_remove_node_refs_from_ino_list() may find the state of inocache is still INO_STATE_CHECKING and will not free the inocache. Link: http://lists.infradead.org/pipermail/linux-mtd/2019-February/087764.htmlSigned-off-by: NHou Tao <houtao1@huawei.com> Reviewed-by: NWei Fang <fangwei1@huawei.com> Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com> [cherry-pick from hulk-4.4] Signed-off-by: Nyangerkun <yangerkun@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Hou Tao 提交于
hulk inclusion category: bugfix bugzilla: 47446 CVE: NA -------------------------- For inode that fails to be created midway, GC procedure may try to GC its dnode, and in the following case BUG() will be triggered: CPU 0 CPU 1 in jffs2_do_create() in jffs2_garbage_collect_pass() jffs2_write_dnode succeed // for dirent jffs2_reserve_space fail inum = ic->ino nlink = ic->pino_nlink (> 0) iget_failed make_bad_inode remove_inode_hash iput jffs2_evict_inode jffs2_do_clear_inode jffs2_set_inocache_state(INO_STATE_CLEARING) jffs2_gc_fetch_inode jffs2_iget // a new inode is created because // the old inode had been unhashed iget_locked jffs2_do_read_inode jffs2_get_ino_cache // assert BUG() f->inocache->state = INO_STATE_CLEARING Fix it by waiting for its state changes to INO_STATE_CHECKEDABSENT. Link: http://lists.infradead.org/pipermail/linux-mtd/2019-February/087762.htmlSigned-off-by: NHou Tao <houtao1@huawei.com> Reviewed-by: NWei Fang <fangwei1@huawei.com> Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com> [cherry-pick from hulk-4.4] Signed-off-by: Nyangerkun <yangerkun@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Hou Tao 提交于
hulk inclusion category: bugfix bugzilla: 47446 CVE: NA ------------------------------------------------- So jffs2_do_clear_inode() could mark all flash nodes used by the inode as obsolete and GC procedure will reclaim these flash nodes, else these flash spaces will not be reclaimable forever. Link: http://lists.infradead.org/pipermail/linux-mtd/2019-February/087763.htmlSigned-off-by: NHou Tao <houtao1@huawei.com> Reviewed-by: NWei Fang <fangwei1@huawei.com> Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Nyangerkun <yangerkun@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Kyeong Yoo 提交于
hulk inclusion category: bugfix bugzilla: 47446 CVE: NA ------------------------------------------------- GC task can deadlock in read_cache_page() because it may attempt to release a page that is actually allocated by another task in jffs2_write_begin(). The reason is that in jffs2_write_begin() there is a small window a cache page is allocated for use but not set Uptodate yet. This ends up with a deadlock between two tasks: 1) A task (e.g. file copy) - jffs2_write_begin() locks a cache page - jffs2_write_end() tries to lock "alloc_sem" from jffs2_reserve_space() <-- STUCK 2) GC task (jffs2_gcd_mtd3) - jffs2_garbage_collect_pass() locks "alloc_sem" - try to lock the same cache page in read_cache_page() <-- STUCK So to avoid this deadlock, hold "alloc_sem" in jffs2_write_begin() while reading data in a cache page. Signed-off-by: NKyeong Yoo <kyeong.yoo@alliedtelesis.co.nz> Link: http://lists.infradead.org/pipermail/linux-mtd/2017-July/075581.htmlSigned-off-by: NHou Tao <houtao1@huawei.com> Reviewed-by: NWei Fang <fangwei1@huawei.com> Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com> [backport from hulk-4.4] Conflicts: fs/jffs2/file.c Signed-off-by: Nyangerkun <yangerkun@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Hou Tao 提交于
euler inclusion category: bugfix bugzilla: 47447 CVE: NA ------------------------------------------------- For xattr modification, we do not write a new jffs2_raw_xref with delete marker into flash, so if a xattr is modified then removed, and the old xref & xdatum are not erased by GC, after reboot or remount, the new xattr xref will be dead but the old xattr xref will be alive, and we will get the overwritten xattr instead of non-existent error when reading the removed xattr. Fix it by writing the deletion mark for xattr overwrite. Fixes: 8a13695c ("[JFFS2][XATTR] rid unnecessary writing of delete marker.") Signed-off-by: NHou Tao <houtao1@huawei.com> Acked-by: NMiao Xie <miaoxie@huawei.com> Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com> [cherry-pick from hulk-4.4] Signed-off-by: Nyangerkun <yangerkun@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Johannes Berg 提交于
stable inclusion from stable-5.10.11 commit e8572713897eb9e4bfaef90bf15d5dd00d7126fc bugzilla: 47621 -------------------------------- commit f8ad8187 upstream. After commit 36e2c742 ("fs: don't allow splice read/write without explicit ops") sendfile() could no longer send data from a real file to a pipe, breaking for example certain cgit setups (e.g. when running behind fcgiwrap), because in this case cgit will try to do exactly this: sendfile() to a pipe. Fix this by using iter_file_splice_write for the splice_write method of pipes, as suggested by Christoph. Cc: stable@vger.kernel.org Fixes: 36e2c742 ("fs: don't allow splice read/write without explicit ops") Suggested-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NChristoph Hellwig <hch@lst.de> Tested-by: NJohannes Berg <johannes@sipsolutions.net> Signed-off-by: NJohannes Berg <johannes@sipsolutions.net> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Christoph Hellwig 提交于
stable inclusion from stable-5.10.11 commit 0b6672fd778cd92caed7206ba520a3f056d10484 bugzilla: 47621 -------------------------------- commit f2d6c270 upstream. Wire up the splice_read and splice_write methods to the default helpers using ->read_iter and ->write_iter now that those are implemented for kernfs. This restores support to use splice and sendfile on kernfs files. Fixes: 36e2c742 ("fs: don't allow splice read/write without explicit ops") Reported-by: NSiddharth Gupta <sidgup@codeaurora.org> Tested-by: NSiddharth Gupta <sidgup@codeaurora.org> Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210120204631.274206-4-hch@lst.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Christoph Hellwig 提交于
stable inclusion from stable-5.10.11 commit 11167454e9cbfa95856fea3f8e5428b4215a534c bugzilla: 47621 -------------------------------- commit cc099e0b upstream. Switch kernfs to implement the write_iter method instead of plain old write to prepare to supporting splice and sendfile again. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210120204631.274206-3-hch@lst.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Christoph Hellwig 提交于
stable inclusion from stable-5.10.11 commit 6ce10b6481cd46040bf3c8f3daec08d3fafa30f4 bugzilla: 47621 -------------------------------- commit 4eaad21a upstream. Switch kernfs to implement the read_iter method instead of plain old read to prepare to supporting splice and sendfile again. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210120204631.274206-2-hch@lst.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Takashi Iwai 提交于
stable inclusion from stable-5.10.11 commit 76e2b0b65d47206754084233d268d57ade2a988e bugzilla: 47621 -------------------------------- commit db58465f upstream. After the recent actions to convert readpages aops to readahead, the NULL checks of readpages aops in cachefiles_read_or_alloc_page() may hit falsely. More badly, it's an ASSERT() call, and this panics. Drop the superfluous NULL checks for fixing this regression. [DH: Note that cachefiles never actually used readpages, so this check was never actually necessary] BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=208883 BugLink: https://bugzilla.opensuse.org/show_bug.cgi?id=1175245 Fixes: 9ae326a6 ("CacheFiles: A cache that backs onto a mounted filesystem") Signed-off-by: NTakashi Iwai <tiwai@suse.de> Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Pavel Begunkov 提交于
stable inclusion from stable-5.10.11 commit 2df15ef2a9cc58142d7acf1393db3fe5434f44c2 bugzilla: 47621 -------------------------------- commit 9a173346 upstream. Sockets and other non-regular files may actually expect short reads to happen, don't retry reads for them. Because non-reg files don't set FMODE_BUF_RASYNC and so it won't do second/retry do_read, we can filter out those cases after first do_read() attempt with ret>0. Cc: stable@vger.kernel.org # 5.9+ Suggested-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Jens Axboe 提交于
stable inclusion from stable-5.10.11 commit f3ac7a5996d7cd739664c5f71cab4f8da03937e7 bugzilla: 47621 -------------------------------- commit 607ec89e upstream. IORING_OP_CLOSE is special in terms of cancelation, since it has an intermediate state where we've removed the file descriptor but hasn't closed the file yet. For that reason, it's currently marked with IO_WQ_WORK_NO_CANCEL to prevent cancelation. This ensures that the op is always run even if canceled, to prevent leaving us with a live file but an fd that is gone. However, with SQPOLL, since a cancel request doesn't carry any resources on behalf of the request being canceled, if we cancel before any of the close op has been run, we can end up with io-wq not having the ->files assigned. This can result in the following oops reported by Joseph: BUG: kernel NULL pointer dereference, address: 00000000000000d8 PGD 800000010b76f067 P4D 800000010b76f067 PUD 10b462067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 1 PID: 1788 Comm: io_uring-sq Not tainted 5.11.0-rc4 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:__lock_acquire+0x19d/0x18c0 Code: 00 00 8b 1d fd 56 dd 08 85 db 0f 85 43 05 00 00 48 c7 c6 98 7b 95 82 48 c7 c7 57 96 93 82 e8 9a bc f5 ff 0f 0b e9 2b 05 00 00 <48> 81 3f c0 ca 67 8a b8 00 00 00 00 41 0f 45 c0 89 04 24 e9 81 fe RSP: 0018:ffffc90001933828 EFLAGS: 00010002 RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000d8 RBP: 0000000000000246 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: ffff888106e8a140 R15: 00000000000000d8 FS: 0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000d8 CR3: 0000000106efa004 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: lock_acquire+0x31a/0x440 ? close_fd_get_file+0x39/0x160 ? __lock_acquire+0x647/0x18c0 _raw_spin_lock+0x2c/0x40 ? close_fd_get_file+0x39/0x160 close_fd_get_file+0x39/0x160 io_issue_sqe+0x1334/0x14e0 ? lock_acquire+0x31a/0x440 ? __io_free_req+0xcf/0x2e0 ? __io_free_req+0x175/0x2e0 ? find_held_lock+0x28/0xb0 ? io_wq_submit_work+0x7f/0x240 io_wq_submit_work+0x7f/0x240 io_wq_cancel_cb+0x161/0x580 ? io_wqe_wake_worker+0x114/0x360 ? io_uring_get_socket+0x40/0x40 io_async_find_and_cancel+0x3b/0x140 io_issue_sqe+0xbe1/0x14e0 ? __lock_acquire+0x647/0x18c0 ? __io_queue_sqe+0x10b/0x5f0 __io_queue_sqe+0x10b/0x5f0 ? io_req_prep+0xdb/0x1150 ? mark_held_locks+0x6d/0xb0 ? mark_held_locks+0x6d/0xb0 ? io_queue_sqe+0x235/0x4b0 io_queue_sqe+0x235/0x4b0 io_submit_sqes+0xd7e/0x12a0 ? _raw_spin_unlock_irq+0x24/0x30 ? io_sq_thread+0x3ae/0x940 io_sq_thread+0x207/0x940 ? do_wait_intr_irq+0xc0/0xc0 ? __ia32_sys_io_uring_enter+0x650/0x650 kthread+0x134/0x180 ? kthread_create_worker_on_cpu+0x90/0x90 ret_from_fork+0x1f/0x30 Fix this by moving the IO_WQ_WORK_NO_CANCEL until _after_ we've modified the fdtable. Canceling before this point is totally fine, and running it in the io-wq context _after_ that point is also fine. For 5.12, we'll handle this internally and get rid of the no-cancel flag, as IORING_OP_CLOSE is the only user of it. Cc: stable@vger.kernel.org Fixes: b5dba59e ("io_uring: add support for IORING_OP_CLOSE") Reported-by: "Abaci <abaci@linux.alibaba.com>" Reviewed-and-tested-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Jens Axboe 提交于
stable inclusion from stable-5.10.11 commit ca75872dd9f3db7893113b8fca6f2c874a4cbccf bugzilla: 47621 -------------------------------- commit c93cc9e1 upstream. If we're freeing/finishing iopoll requests, ensure we check if the task is in idling in terms of cancelation. Otherwise we could end up waiting forever in __io_uring_task_cancel() if the task has active iopoll requests that need cancelation. Cc: stable@vger.kernel.org # 5.9+ Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Xiaoming Ni 提交于
stable inclusion from stable-5.10.11 commit cb5fe25c822057c49e61229a4e83ba27c3e24c17 bugzilla: 47621 -------------------------------- commit 697edcb0 upstream. The process_sysctl_arg() does not check whether val is empty before invoking strlen(val). If the command line parameter () is incorrectly configured and val is empty, oops is triggered. For example: "hung_task_panic=1" is incorrectly written as "hung_task_panic", oops is triggered. The call stack is as follows: Kernel command line: .... hung_task_panic ...... Call trace: __pi_strlen+0x10/0x98 parse_args+0x278/0x344 do_sysctl_args+0x8c/0xfc kernel_init+0x5c/0xf4 ret_from_fork+0x10/0x30 To fix it, check whether "val" is empty when "phram" is a sysctl field. Error codes are returned in the failure branch, and error logs are generated by parse_args(). Link: https://lkml.kernel.org/r/20210118133029.28580-1-nixiaoming@huawei.com Fixes: 3db978d4 ("kernel/sysctl: support setting sysctl parameters from kernel command line") Signed-off-by: NXiaoming Ni <nixiaoming@huawei.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Heiner Kallweit <hkallweit1@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: <stable@vger.kernel.org> [5.8+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Ronnie Sahlberg 提交于
stable inclusion from stable-5.10.11 commit 2edf2c9f3e5e7a6fbeaa40b9a4ef65b4dfc97405 bugzilla: 47621 -------------------------------- commit 214a5ea0 upstream. RHBZ 1848178 The original intent of returning an error in this function in the patch: "CIFS: Mask off signals when sending SMB packets" was to avoid interrupting packet send in the middle of sending the data (and thus breaking an SMB connection), but we also don't want to fail the request for non-fatal signals even before we have had a chance to try to send it (the reported problem could be reproduced e.g. by exiting a child process when the parent process was in the midst of calling futimens to update a file's timestamps). In addition, since the signal may remain pending when we enter the sending loop, we may end up not sending the whole packet before TCP buffers become full. In this case the code returns -EINTR but what we need here is to return -ERESTARTSYS instead to allow system calls to be restarted. Fixes: b30c74c7 ("CIFS: Mask off signals when sending SMB packets") Cc: stable@vger.kernel.org # v5.1+ Signed-off-by: NRonnie Sahlberg <lsahlber@redhat.com> Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com> Signed-off-by: NSteve French <stfrench@microsoft.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Josef Bacik 提交于
stable inclusion from stable-5.10.11 commit dbba7a38b0074412b22b8ac41092015e1dae12ae bugzilla: 47621 -------------------------------- [ Upstream commit 71008734 ] We're supposed to print the root_key.offset in btrfs_root_name in the case of a reloc root, not the objectid. Fix this helper to take the key so we have access to the offset when we need it. Fixes: 457f1864 ("btrfs: pretty print leaked root name") Reviewed-by: NQu Wenruo <wqu@suse.com> Reviewed-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NJosef Bacik <josef@toxicpanda.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Trond Myklebust 提交于
stable inclusion from stable-5.10.11 commit 6533681890902e3b59bbceaea311760b3791c28d bugzilla: 47621 -------------------------------- [ Upstream commit b68f0cbd ] If the READ_PLUS operation was truncated due to an error, then ensure we clear the 'eof' flag. Fixes: 9f0b5792 ("NFSD: Encode a full READ_PLUS reply") Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Trond Myklebust 提交于
stable inclusion from stable-5.10.11 commit de82ec8e5e8cba33f84ebef26478b636e94a90fb bugzilla: 47621 -------------------------------- [ Upstream commit 72d78717 ] Ensure that we encode the data payload + padding, and that we truncate the preallocated buffer to the actual read size. Fixes: 528b8493 ("NFSD: Add READ_PLUS data support") Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Marcelo Diop-Gonzalez 提交于
stable inclusion from stable-5.10.11 commit 2ca824c79376453e7e3df60437324b36043ff29b bugzilla: 47621 -------------------------------- [ Upstream commit f010505b ] Right now io_flush_timeouts() checks if the current number of events is equal to ->timeout.target_seq, but this will miss some timeouts if there have been more than 1 event added since the last time they were flushed (possible in io_submit_flush_completions(), for example). Fix it by recording the last sequence at which timeouts were flushed so that the number of events seen can be compared to the number of events needed without overflow. Signed-off-by: NMarcelo Diop-Gonzalez <marcelo827@gmail.com> Reviewed-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Eric Biggers 提交于
stable inclusion from stable-5.10.11 commit 13ef6bccab397c02d5a48d236316fd5f626f8b01 bugzilla: 47621 -------------------------------- commit 1e249cb5 upstream. When lazytime is enabled and an inode is being written due to its in-memory updated timestamps having expired, either due to a sync() or syncfs() system call or due to dirtytime_expire_interval having elapsed, the VFS needs to inform the filesystem so that the filesystem can copy the inode's timestamps out to the on-disk data structures. This is done by __writeback_single_inode() calling mark_inode_dirty_sync(), which then calls ->dirty_inode(I_DIRTY_SYNC). However, this occurs after __writeback_single_inode() has already cleared the dirty flags from ->i_state. This causes two bugs: - mark_inode_dirty_sync() redirties the inode, causing it to remain dirty. This wastefully causes the inode to be written twice. But more importantly, it breaks cases where sync_filesystem() is expected to clean dirty inodes. This includes the FS_IOC_REMOVE_ENCRYPTION_KEY ioctl (as reported at https://lore.kernel.org/r/20200306004555.GB225345@gmail.com), as well as possibly filesystem freezing (freeze_super()). - Since ->i_state doesn't contain I_DIRTY_TIME when ->dirty_inode() is called from __writeback_single_inode() for lazytime expiration, xfs_fs_dirty_inode() ignores the notification. (XFS only cares about lazytime expirations, and it assumes that i_state will contain I_DIRTY_TIME during those.) Therefore, lazy timestamps aren't persisted by sync(), syncfs(), or dirtytime_expire_interval on XFS. Fix this by moving the call to mark_inode_dirty_sync() to earlier in __writeback_single_inode(), before the dirty flags are cleared from i_state. This makes filesystems be properly notified of the timestamp expiration, and it avoids incorrectly redirtying the inode. This fixes xfstest generic/580 (which tests FS_IOC_REMOVE_ENCRYPTION_KEY) when run on ext4 or f2fs with lazytime enabled. It also fixes the new lazytime xfstest I've proposed, which reproduces the above-mentioned XFS bug (https://lore.kernel.org/r/20210105005818.92978-1-ebiggers@kernel.org). Alternatively, we could call ->dirty_inode(I_DIRTY_SYNC) directly. But due to the introduction of I_SYNC_QUEUED, mark_inode_dirty_sync() is the right thing to do because mark_inode_dirty_sync() now knows not to move the inode to a writeback list if it is currently queued for sync. Fixes: 0ae45f63 ("vfs: add support for a lazytime mount option") Cc: stable@vger.kernel.org Depends-on: 5afced3b ("writeback: Avoid skipping inode writeback") Link: https://lore.kernel.org/r/20210112190253.64307-2-ebiggers@kernel.orgSuggested-by: NJan Kara <jack@suse.cz> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Filipe Manana 提交于
stable inclusion from stable-5.10.11 commit adc11110d1e58b575b669f7d76982dac4220ea10 bugzilla: 47621 -------------------------------- commit 518837e6 upstream. When an incremental send finds an extent that is shared, it checks which file extent items in the range refer to that extent, and for those it emits clone operations, while for others it emits regular write operations to avoid corruption at the destination (as described and fixed by commit d906d49f ("Btrfs: send, fix file corruption due to incorrect cloning operations")). However when the root we are cloning from is the send root, we are cloning from the inode currently being processed and the source file range has several extent items that partially point to the desired extent, with an offset smaller than the offset in the file extent item for the range we want to clone into, it can cause the algorithm to issue a clone operation that starts at the current eof of the file being processed in the receiver side, in which case the receiver will fail, with EINVAL, when attempting to execute the clone operation. Example reproducer: $ cat test-send-clone.sh #!/bin/bash DEV=/dev/sdi MNT=/mnt/sdi mkfs.btrfs -f $DEV >/dev/null mount $DEV $MNT # Create our test file with a single and large extent (1M) and with # different content for different file ranges that will be reflinked # later. xfs_io -f \ -c "pwrite -S 0xab 0 128K" \ -c "pwrite -S 0xcd 128K 128K" \ -c "pwrite -S 0xef 256K 256K" \ -c "pwrite -S 0x1a 512K 512K" \ $MNT/foobar btrfs subvolume snapshot -r $MNT $MNT/snap1 btrfs send -f /tmp/snap1.send $MNT/snap1 # Now do a series of changes to our file such that we end up with # different parts of the extent reflinked into different file offsets # and we overwrite a large part of the extent too, so no file extent # items refer to that part that was overwritten. This used to confuse # the algorithm used by the kernel to figure out which file ranges to # clone, making it attempt to clone from a source range starting at # the current eof of the file, resulting in the receiver to fail since # it is an invalid clone operation. # xfs_io -c "reflink $MNT/foobar 64K 1M 960K" \ -c "reflink $MNT/foobar 0K 512K 256K" \ -c "reflink $MNT/foobar 512K 128K 256K" \ -c "pwrite -S 0x73 384K 640K" \ $MNT/foobar btrfs subvolume snapshot -r $MNT $MNT/snap2 btrfs send -f /tmp/snap2.send -p $MNT/snap1 $MNT/snap2 echo -e "\nFile digest in the original filesystem:" md5sum $MNT/snap2/foobar # Now unmount the filesystem, create a new one, mount it and try to # apply both send streams to recreate both snapshots. umount $DEV mkfs.btrfs -f $DEV >/dev/null mount $DEV $MNT btrfs receive -f /tmp/snap1.send $MNT btrfs receive -f /tmp/snap2.send $MNT # Must match what we got in the original filesystem of course. echo -e "\nFile digest in the new filesystem:" md5sum $MNT/snap2/foobar umount $MNT When running the reproducer, the incremental send operation fails due to an invalid clone operation: $ ./test-send-clone.sh wrote 131072/131072 bytes at offset 0 128 KiB, 32 ops; 0.0015 sec (80.906 MiB/sec and 20711.9741 ops/sec) wrote 131072/131072 bytes at offset 131072 128 KiB, 32 ops; 0.0013 sec (90.514 MiB/sec and 23171.6148 ops/sec) wrote 262144/262144 bytes at offset 262144 256 KiB, 64 ops; 0.0025 sec (98.270 MiB/sec and 25157.2327 ops/sec) wrote 524288/524288 bytes at offset 524288 512 KiB, 128 ops; 0.0052 sec (95.730 MiB/sec and 24506.9883 ops/sec) Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap1' At subvol /mnt/sdi/snap1 linked 983040/983040 bytes at offset 1048576 960 KiB, 1 ops; 0.0006 sec (1.419 GiB/sec and 1550.3876 ops/sec) linked 262144/262144 bytes at offset 524288 256 KiB, 1 ops; 0.0020 sec (120.192 MiB/sec and 480.7692 ops/sec) linked 262144/262144 bytes at offset 131072 256 KiB, 1 ops; 0.0018 sec (133.833 MiB/sec and 535.3319 ops/sec) wrote 655360/655360 bytes at offset 393216 640 KiB, 160 ops; 0.0093 sec (66.781 MiB/sec and 17095.8436 ops/sec) Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap2' At subvol /mnt/sdi/snap2 File digest in the original filesystem: 9c13c61cb0b9f5abf45344375cb04dfa /mnt/sdi/snap2/foobar At subvol snap1 At snapshot snap2 ERROR: failed to clone extents to foobar: Invalid argument File digest in the new filesystem: 132f0396da8f48d2e667196bff882cfc /mnt/sdi/snap2/foobar The clone operation is invalid because its source range starts at the current eof of the file in the receiver, causing the receiver to get an EINVAL error from the clone operation when attempting it. For the example above, what happens is the following: 1) When processing the extent at file offset 1M, the algorithm checks that the extent is shared and can be (fully or partially) found at file offset 0. At this point the file has a size (and eof) of 1M at the receiver; 2) It finds that our extent item at file offset 1M has a data offset of 64K and, since the file extent item at file offset 0 has a data offset of 0, it issues a clone operation, from the same file and root, that has a source range offset of 64K, destination offset of 1M and a length of 64K, since the extent item at file offset 0 refers only to the first 128K of the shared extent. After this clone operation, the file size (and eof) at the receiver is increased from 1M to 1088K (1M + 64K); 3) Now there's still 896K (960K - 64K) of data left to clone or write, so it checks for the next file extent item, which starts at file offset 128K. This file extent item has a data offset of 0 and a length of 256K, so a clone operation with a source range offset of 256K, a destination offset of 1088K (1M + 64K) and length of 128K is issued. After this operation the file size (and eof) at the receiver increases from 1088K to 1216K (1088K + 128K); 4) Now there's still 768K (896K - 128K) of data left to clone or write, so it checks for the next file extent item, located at file offset 384K. This file extent item points to a different extent, not the one we want to clone, with a length of 640K. So we issue a write operation into the file range 1216K (1088K + 128K, end of the last clone operation), with a length of 640K and with a data matching the one we can find for that range in send root. After this operation, the file size (and eof) at the receiver increases from 1216K to 1856K (1216K + 640K); 5) Now there's still 128K (768K - 640K) of data left to clone or write, so we look into the file extent item, which is for file offset 1M and it points to the extent we want to clone, with a data offset of 64K and a length of 960K. However this matches the file offset we started with, the start of the range to clone into. So we can't for sure find any file extent item from here onwards with the rest of the data we want to clone, yet we proceed and since the file extent item points to the shared extent, with a data offset of 64K, we issue a clone operation with a source range starting at file offset 1856K, which matches the file extent item's offset, 1M, plus the amount of data cloned and written so far, which is 64K (step 2) + 128K (step 3) + 640K (step 4). This clone operation is invalid since the source range offset matches the current eof of the file in the receiver. We should have stopped looking for extents to clone at this point and instead fallback to write, which would simply the contain the data in the file range from 1856K to 1856K + 128K. So fix this by stopping the loop that looks for file ranges to clone at clone_range() when we reach the current eof of the file being processed, if we are cloning from the same file and using the send root as the clone root. This ensures any data not yet cloned will be sent to the receiver through a write operation. A test case for fstests will follow soon. Reported-by: NMassimo B. <massimo.b@gmx.net> Link: https://lore.kernel.org/linux-btrfs/6ae34776e85912960a253a8327068a892998e685.camel@gmx.net/ Fixes: 11f2069c ("Btrfs: send, allow clone operations within the same file") CC: stable@vger.kernel.org # 5.5+ Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Josef Bacik 提交于
stable inclusion from stable-5.10.11 commit 018abb50891e4faf051de2ac01cb041f3904e1d1 bugzilla: 47621 -------------------------------- commit 34d1eb0e upstream. If we fail to update a block group item in the loop we'll break, however we'll do btrfs_run_delayed_refs and lose our error value in ret, and thus not clean up properly. Fix this by only running the delayed refs if there was no failure. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: NQu Wenruo <wqu@suse.com> Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: NJosef Bacik <josef@toxicpanda.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Josef Bacik 提交于
stable inclusion from stable-5.10.11 commit 14e17e90bfaaf0392d8a48744f91d81ea121fd10 bugzilla: 47621 -------------------------------- commit fb286100 upstream. While testing the error paths of relocation I hit the following lockdep splat: ====================================================== WARNING: possible circular locking dependency detected 5.10.0-rc6+ #217 Not tainted ------------------------------------------------------ mount/779 is trying to acquire lock: ffffa0e676945418 (&fs_info->balance_mutex){+.+.}-{3:3}, at: btrfs_recover_balance+0x2f0/0x340 but task is already holding lock: ffffa0e60ee31da8 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x100 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (btrfs-root-00){++++}-{3:3}: down_read_nested+0x43/0x130 __btrfs_tree_read_lock+0x27/0x100 btrfs_read_lock_root_node+0x31/0x40 btrfs_search_slot+0x462/0x8f0 btrfs_update_root+0x55/0x2b0 btrfs_drop_snapshot+0x398/0x750 clean_dirty_subvols+0xdf/0x120 btrfs_recover_relocation+0x534/0x5a0 btrfs_start_pre_rw_mount+0xcb/0x170 open_ctree+0x151f/0x1726 btrfs_mount_root.cold+0x12/0xea legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 vfs_kern_mount.part.0+0x71/0xb0 btrfs_mount+0x10d/0x380 legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 path_mount+0x433/0xc10 __x64_sys_mount+0xe3/0x120 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #1 (sb_internal#2){.+.+}-{0:0}: start_transaction+0x444/0x700 insert_balance_item.isra.0+0x37/0x320 btrfs_balance+0x354/0xf40 btrfs_ioctl_balance+0x2cf/0x380 __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (&fs_info->balance_mutex){+.+.}-{3:3}: __lock_acquire+0x1120/0x1e10 lock_acquire+0x116/0x370 __mutex_lock+0x7e/0x7b0 btrfs_recover_balance+0x2f0/0x340 open_ctree+0x1095/0x1726 btrfs_mount_root.cold+0x12/0xea legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 vfs_kern_mount.part.0+0x71/0xb0 btrfs_mount+0x10d/0x380 legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 path_mount+0x433/0xc10 __x64_sys_mount+0xe3/0x120 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 other info that might help us debug this: Chain exists of: &fs_info->balance_mutex --> sb_internal#2 --> btrfs-root-00 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(btrfs-root-00); lock(sb_internal#2); lock(btrfs-root-00); lock(&fs_info->balance_mutex); *** DEADLOCK *** 2 locks held by mount/779: #0: ffffa0e60dc040e0 (&type->s_umount_key#47/1){+.+.}-{3:3}, at: alloc_super+0xb5/0x380 #1: ffffa0e60ee31da8 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x100 stack backtrace: CPU: 0 PID: 779 Comm: mount Not tainted 5.10.0-rc6+ #217 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 Call Trace: dump_stack+0x8b/0xb0 check_noncircular+0xcf/0xf0 ? trace_call_bpf+0x139/0x260 __lock_acquire+0x1120/0x1e10 lock_acquire+0x116/0x370 ? btrfs_recover_balance+0x2f0/0x340 __mutex_lock+0x7e/0x7b0 ? btrfs_recover_balance+0x2f0/0x340 ? btrfs_recover_balance+0x2f0/0x340 ? rcu_read_lock_sched_held+0x3f/0x80 ? kmem_cache_alloc_trace+0x2c4/0x2f0 ? btrfs_get_64+0x5e/0x100 btrfs_recover_balance+0x2f0/0x340 open_ctree+0x1095/0x1726 btrfs_mount_root.cold+0x12/0xea ? rcu_read_lock_sched_held+0x3f/0x80 legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 vfs_kern_mount.part.0+0x71/0xb0 btrfs_mount+0x10d/0x380 ? __kmalloc_track_caller+0x2f2/0x320 legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 ? capable+0x3a/0x60 path_mount+0x433/0xc10 __x64_sys_mount+0xe3/0x120 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This is straightforward to fix, simply release the path before we setup the balance_ctl. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: NQu Wenruo <wqu@suse.com> Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: NJosef Bacik <josef@toxicpanda.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Josef Bacik 提交于
stable inclusion from stable-5.10.11 commit 5169a289fc8c860c1f29883053116cbef2123eaf bugzilla: 47621 -------------------------------- commit 49ecc679 upstream. Zygo reported the following KASAN splat: BUG: KASAN: use-after-free in btrfs_backref_cleanup_node+0x18a/0x420 Read of size 8 at addr ffff888112402950 by task btrfs/28836 CPU: 0 PID: 28836 Comm: btrfs Tainted: G W 5.10.0-e35f27394290-for-next+ #23 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 Call Trace: dump_stack+0xbc/0xf9 ? btrfs_backref_cleanup_node+0x18a/0x420 print_address_description.constprop.8+0x21/0x210 ? record_print_text.cold.34+0x11/0x11 ? btrfs_backref_cleanup_node+0x18a/0x420 ? btrfs_backref_cleanup_node+0x18a/0x420 kasan_report.cold.10+0x20/0x37 ? btrfs_backref_cleanup_node+0x18a/0x420 __asan_load8+0x69/0x90 btrfs_backref_cleanup_node+0x18a/0x420 btrfs_backref_release_cache+0x83/0x1b0 relocate_block_group+0x394/0x780 ? merge_reloc_roots+0x4a0/0x4a0 btrfs_relocate_block_group+0x26e/0x4c0 btrfs_relocate_chunk+0x52/0x120 btrfs_balance+0xe2e/0x1900 ? check_flags.part.50+0x6c/0x1e0 ? btrfs_relocate_chunk+0x120/0x120 ? kmem_cache_alloc_trace+0xa06/0xcb0 ? _copy_from_user+0x83/0xc0 btrfs_ioctl_balance+0x3a7/0x460 btrfs_ioctl+0x24c8/0x4360 ? __kasan_check_read+0x11/0x20 ? check_chain_key+0x1f4/0x2f0 ? __asan_loadN+0xf/0x20 ? btrfs_ioctl_get_supported_features+0x30/0x30 ? kvm_sched_clock_read+0x18/0x30 ? check_chain_key+0x1f4/0x2f0 ? lock_downgrade+0x3f0/0x3f0 ? handle_mm_fault+0xad6/0x2150 ? do_vfs_ioctl+0xfc/0x9d0 ? ioctl_file_clone+0xe0/0xe0 ? check_flags.part.50+0x6c/0x1e0 ? check_flags.part.50+0x6c/0x1e0 ? check_flags+0x26/0x30 ? lock_is_held_type+0xc3/0xf0 ? syscall_enter_from_user_mode+0x1b/0x60 ? do_syscall_64+0x13/0x80 ? rcu_read_lock_sched_held+0xa1/0xd0 ? __kasan_check_read+0x11/0x20 ? __fget_light+0xae/0x110 __x64_sys_ioctl+0xc3/0x100 do_syscall_64+0x37/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f4c4bdfe427 Allocated by task 28836: kasan_save_stack+0x21/0x50 __kasan_kmalloc.constprop.18+0xbe/0xd0 kasan_kmalloc+0x9/0x10 kmem_cache_alloc_trace+0x410/0xcb0 btrfs_backref_alloc_node+0x46/0xf0 btrfs_backref_add_tree_node+0x60d/0x11d0 build_backref_tree+0xc5/0x700 relocate_tree_blocks+0x2be/0xb90 relocate_block_group+0x2eb/0x780 btrfs_relocate_block_group+0x26e/0x4c0 btrfs_relocate_chunk+0x52/0x120 btrfs_balance+0xe2e/0x1900 btrfs_ioctl_balance+0x3a7/0x460 btrfs_ioctl+0x24c8/0x4360 __x64_sys_ioctl+0xc3/0x100 do_syscall_64+0x37/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 28836: kasan_save_stack+0x21/0x50 kasan_set_track+0x20/0x30 kasan_set_free_info+0x1f/0x30 __kasan_slab_free+0xf3/0x140 kasan_slab_free+0xe/0x10 kfree+0xde/0x200 btrfs_backref_error_cleanup+0x452/0x530 build_backref_tree+0x1a5/0x700 relocate_tree_blocks+0x2be/0xb90 relocate_block_group+0x2eb/0x780 btrfs_relocate_block_group+0x26e/0x4c0 btrfs_relocate_chunk+0x52/0x120 btrfs_balance+0xe2e/0x1900 btrfs_ioctl_balance+0x3a7/0x460 btrfs_ioctl+0x24c8/0x4360 __x64_sys_ioctl+0xc3/0x100 do_syscall_64+0x37/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This occurred because we freed our backref node in btrfs_backref_error_cleanup(), but then tried to free it again in btrfs_backref_release_cache(). This is because btrfs_backref_release_cache() will cycle through all of the cache->leaves nodes and free them up. However btrfs_backref_error_cleanup() freed the backref node with btrfs_backref_free_node(), which simply kfree()d the backref node without unlinking it from the cache. Change this to a btrfs_backref_drop_node(), which does the appropriate cleanup and removes the node from the cache->leaves list, so when we go to free the remaining cache we don't trip over items we've already dropped. Fixes: 75bfb9af ("Btrfs: cleanup error handling in build_backref_tree") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: NJosef Bacik <josef@toxicpanda.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Josef Bacik 提交于
stable inclusion from stable-5.10.11 commit 9e2fc8f10c9175e7f5d4bd636036ef427bb3eae9 bugzilla: 47621 -------------------------------- commit 18d3bff4 upstream. This was partially fixed by f3e3d9cc ("btrfs: avoid possible signal interruption of btrfs_drop_snapshot() on relocation tree"), however it missed a spot when we restart a trans handle because we need to end the transaction. The fix is the same, simply use btrfs_join_transaction() instead of btrfs_start_transaction() when deleting reloc roots. Fixes: f3e3d9cc ("btrfs: avoid possible signal interruption of btrfs_drop_snapshot() on relocation tree") CC: stable@vger.kernel.org # 5.4+ Signed-off-by: NJosef Bacik <josef@toxicpanda.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
- 29 1月, 2021 3 次提交
-
-
由 J. Bruce Fields 提交于
stable inclusion from stable-5.10.10 commit fdcaa4af5e70e2d984c9620a09e9dade067f2620 bugzilla: 47610 -------------------------------- commit 51b2ee7d upstream. If you export a subdirectory of a filesystem, a READDIRPLUS on the root of that export will return the filehandle of the parent with the ".." entry. The filehandle is optional, so let's just not return the filehandle for ".." if we're at the root of an export. Note that once the client learns one filehandle outside of the export, they can trivially access the rest of the export using further lookups. However, it is also not very difficult to guess filehandles outside of the export. So exporting a subdirectory of a filesystem should considered equivalent to providing access to the entire filesystem. To avoid confusion, we recommend only exporting entire filesystems. Reported-by: NYoujipeng <wangzhibei1999@gmail.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 zhangyi (F) 提交于
maillist inclusion category: bugfix bugzilla: 47445 CVE: NA Reference: https://lore.kernel.org/lkml/1564451504-27906-1-git-send-email-yi.zhang@huawei.com/ --------------------------- io_[p]getevents syscall should return -EINVAL if timeout is out of range, add this validity check. Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com> Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Nyangerkun <yangerkun@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 yangerkun 提交于
hulk inclusion category: bugfix bugzilla: 47438 CVE: NA --------------------------- UBSAN has reported a overflow with mem_lseek. And it's fine with mem_open set file mode with FMODE_UNSIGNED_OFFSET(memory_lseek). However, another file use mem_lseek do lseek can have not FMODE_UNSIGNED_OFFSET(proc_kpagecount_operations/proc_pagemap_operations), fix it by checking overflow and FMODE_UNSIGNED_OFFSET. Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> ================================================================== UBSAN: Undefined behaviour in ../fs/proc/base.c:941:15 signed integer overflow: 4611686018427387904 + 4611686018427387904 cannot be represented in type 'long long int' CPU: 4 PID: 4762 Comm: syz-executor.1 Not tainted 4.4.189 #3 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 Call trace: [<ffffff90080a5f28>] dump_backtrace+0x0/0x590 arch/arm64/kernel/traps.c:91 [<ffffff90080a64f0>] show_stack+0x38/0x60 arch/arm64/kernel/traps.c:234 [<ffffff9008986a34>] __dump_stack lib/dump_stack.c:15 [inline] [<ffffff9008986a34>] dump_stack+0x128/0x184 lib/dump_stack.c:51 [<ffffff9008a2d120>] ubsan_epilogue+0x34/0x9c lib/ubsan.c:166 [<ffffff9008a2d8b8>] handle_overflow+0x228/0x280 lib/ubsan.c:197 [<ffffff9008a2da2c>] __ubsan_handle_add_overflow+0x4c/0x68 lib/ubsan.c:204 [<ffffff900862b9f4>] mem_lseek+0x12c/0x130 fs/proc/base.c:941 [<ffffff90084ef78c>] vfs_llseek fs/read_write.c:260 [inline] [<ffffff90084ef78c>] SYSC_lseek fs/read_write.c:285 [inline] [<ffffff90084ef78c>] SyS_lseek+0x164/0x1f0 fs/read_write.c:276 [<ffffff9008093c80>] el0_svc_naked+0x30/0x34 ================================================================== Signed-off-by: Nyangerkun <yangerkun@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com> (cherry picked from commit a422358aa04c53a08b215b8dcd6814d916ef5cf1) Conflicts: fs/read_write.c Signed-off-by: NLi Ming <limingming.li@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 28 1月, 2021 1 次提交
-
-
由 Al Viro 提交于
stable inclusion from stable-5.10.9 commit c6dc4f8e617b4c12c519d2e01305fe5e3343f01d bugzilla: 47457 -------------------------------- commit a0a6df9a upstream. Unfortunately, there's userland code that used to rely upon these checks being done before anything else to check for UMOUNT_NOFOLLOW support. That broke in 41525f56 ("fs: refactor ksys_umount"). Separate those from the rest of checks and move them to ksys_umount(); unlike everything else in there, this can be sanely done there. Reported-by: NSargun Dhillon <sargun@sargun.me> Fixes: 41525f56 ("fs: refactor ksys_umount") Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-