- 13 4月, 2021 17 次提交
-
-
由 Pavel Begunkov 提交于
stable inclusion from stable-5.10.26 commit 1c20e9040f49687ba2ccc2ffd4411351a6c2ebff bugzilla: 51363 -------------------------------- [ Upstream commit 9ae1f8dd ] WARNING: inconsistent lock state inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage. syz-executor217/8450 [HC1[1]:SC0[0]:HE0:SE1] takes: ffff888023d6e620 (&fs->lock){?.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:354 [inline] ffff888023d6e620 (&fs->lock){?.+.}-{2:2}, at: io_req_clean_work fs/io_uring.c:1398 [inline] ffff888023d6e620 (&fs->lock){?.+.}-{2:2}, at: io_dismantle_req+0x66f/0xf60 fs/io_uring.c:2029 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&fs->lock); <Interrupt> lock(&fs->lock); *** DEADLOCK *** 1 lock held by syz-executor217/8450: #0: ffff88802417c3e8 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x1071/0x1f30 fs/io_uring.c:9442 stack backtrace: CPU: 1 PID: 8450 Comm: syz-executor217 Not tainted 5.11.0-rc5-next-20210129-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> [...] _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151 spin_lock include/linux/spinlock.h:354 [inline] io_req_clean_work fs/io_uring.c:1398 [inline] io_dismantle_req+0x66f/0xf60 fs/io_uring.c:2029 __io_free_req+0x3d/0x2e0 fs/io_uring.c:2046 io_free_req fs/io_uring.c:2269 [inline] io_double_put_req fs/io_uring.c:2392 [inline] io_put_req+0xf9/0x570 fs/io_uring.c:2388 io_link_timeout_fn+0x30c/0x480 fs/io_uring.c:6497 __run_hrtimer kernel/time/hrtimer.c:1519 [inline] __hrtimer_run_queues+0x609/0xe40 kernel/time/hrtimer.c:1583 hrtimer_interrupt+0x334/0x940 kernel/time/hrtimer.c:1645 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1085 [inline] __sysvec_apic_timer_interrupt+0x146/0x540 arch/x86/kernel/apic/apic.c:1102 asm_call_irq_on_stack+0xf/0x20 </IRQ> __run_sysvec_on_irqstack arch/x86/include/asm/irq_stack.h:37 [inline] run_sysvec_on_irqstack_cond arch/x86/include/asm/irq_stack.h:89 [inline] sysvec_apic_timer_interrupt+0xbd/0x100 arch/x86/kernel/apic/apic.c:1096 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:629 RIP: 0010:__raw_spin_unlock_irq include/linux/spinlock_api_smp.h:169 [inline] RIP: 0010:_raw_spin_unlock_irq+0x25/0x40 kernel/locking/spinlock.c:199 spin_unlock_irq include/linux/spinlock.h:404 [inline] io_queue_linked_timeout+0x194/0x1f0 fs/io_uring.c:6525 __io_queue_sqe+0x328/0x1290 fs/io_uring.c:6594 io_queue_sqe+0x631/0x10d0 fs/io_uring.c:6639 io_queue_link_head fs/io_uring.c:6650 [inline] io_submit_sqe fs/io_uring.c:6697 [inline] io_submit_sqes+0x19b5/0x2720 fs/io_uring.c:6960 __do_sys_io_uring_enter+0x107d/0x1f30 fs/io_uring.c:9443 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Don't free requests from under hrtimer context (softirq) as it may sleep or take spinlocks improperly (e.g. non-irq versions). Cc: stable@vger.kernel.org # 5.6+ Reported-by: syzbot+81d17233a2b02eafba33@syzkaller.appspotmail.com Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Steve French 提交于
stable inclusion from stable-5.10.26 commit 04eb2b2fa12ff6023a92d5199275255e9b82011b bugzilla: 51363 -------------------------------- commit 65af8f01 upstream. Applications that create and extend and write to a file do not expect to see 0 allocation size. When file is extended, set its allocation size to a plausible value until we have a chance to query the server for it. When the file is cached this will prevent showing an impossible number of allocated blocks (like 0). This fixes e.g. xfstests 614 which does 1) create a file and set its size to 64K 2) mmap write 64K to the file 3) stat -c %b for the file (to query the number of allocated blocks) It was failing because we returned 0 blocks. Even though we would return the correct cached file size, we returned an impossible allocation size. Signed-off-by: NSteve French <stfrench@microsoft.com> CC: <stable@vger.kernel.org> Reviewed-by: NAurelien Aptel <aaptel@suse.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Jens Axboe 提交于
stable inclusion from stable-5.10.26 commit 6cae8095490caae12875300243ec94b39b6a2a78 bugzilla: 51363 -------------------------------- commit 3ebba796 upstream. If we create it in a disabled state because IORING_SETUP_R_DISABLED is set on ring creation, we need to ensure that we've kicked the thread if we're exiting before it's been explicitly disabled. Otherwise we can run into a deadlock where exit is waiting go park the SQPOLL thread, but the SQPOLL thread itself is waiting to get a signal to start. That results in the below trace of both tasks hung, waiting on each other: INFO: task syz-executor458:8401 blocked for more than 143 seconds. Not tainted 5.11.0-next-20210226-syzkaller #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:syz-executor458 state:D stack:27536 pid: 8401 ppid: 8400 flags:0x00004004 Call Trace: context_switch kernel/sched/core.c:4324 [inline] __schedule+0x90c/0x21a0 kernel/sched/core.c:5075 schedule+0xcf/0x270 kernel/sched/core.c:5154 schedule_timeout+0x1db/0x250 kernel/time/timer.c:1868 do_wait_for_common kernel/sched/completion.c:85 [inline] __wait_for_common kernel/sched/completion.c:106 [inline] wait_for_common kernel/sched/completion.c:117 [inline] wait_for_completion+0x168/0x270 kernel/sched/completion.c:138 io_sq_thread_park fs/io_uring.c:7115 [inline] io_sq_thread_park+0xd5/0x130 fs/io_uring.c:7103 io_uring_cancel_task_requests+0x24c/0xd90 fs/io_uring.c:8745 __io_uring_files_cancel+0x110/0x230 fs/io_uring.c:8840 io_uring_files_cancel include/linux/io_uring.h:47 [inline] do_exit+0x299/0x2a60 kernel/exit.c:780 do_group_exit+0x125/0x310 kernel/exit.c:922 __do_sys_exit_group kernel/exit.c:933 [inline] __se_sys_exit_group kernel/exit.c:931 [inline] __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:931 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x43e899 RSP: 002b:00007ffe89376d48 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 00000000004af2f0 RCX: 000000000043e899 RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffffffffffffffc0 R09: 0000000010000000 R10: 0000000000008011 R11: 0000000000000246 R12: 00000000004af2f0 R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001 INFO: task iou-sqp-8401:8402 can't die for more than 143 seconds. task:iou-sqp-8401 state:D stack:30272 pid: 8402 ppid: 8400 flags:0x00004004 Call Trace: context_switch kernel/sched/core.c:4324 [inline] __schedule+0x90c/0x21a0 kernel/sched/core.c:5075 schedule+0xcf/0x270 kernel/sched/core.c:5154 schedule_timeout+0x1db/0x250 kernel/time/timer.c:1868 do_wait_for_common kernel/sched/completion.c:85 [inline] __wait_for_common kernel/sched/completion.c:106 [inline] wait_for_common kernel/sched/completion.c:117 [inline] wait_for_completion+0x168/0x270 kernel/sched/completion.c:138 io_sq_thread+0x27d/0x1ae0 fs/io_uring.c:6717 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 INFO: task iou-sqp-8401:8402 blocked for more than 143 seconds. Reported-by: syzbot+fb5458330b4442f2090d@syzkaller.appspotmail.com Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Tetsuo Handa 提交于
stable inclusion from stable-5.10.26 commit a7acb614287b7de8bf86d6758dac43bbd1d29534 bugzilla: 51363 -------------------------------- commit 9c7d83ae upstream. syzbot is hitting WARN_ON(pstore_sb != sb) at pstore_kill_sb() [1], for the assumption that pstore_sb != NULL is wrong because pstore_fill_super() will not assign pstore_sb = sb when new_inode() for d_make_root() returned NULL (due to memory allocation fault injection). Since mount_single() calls pstore_kill_sb() when pstore_fill_super() failed, pstore_kill_sb() needs to be aware of such failure path. [1] https://syzkaller.appspot.com/bug?id=6abacb8da5137cb47a416f2bef95719ed60508a0Reported-by: Nsyzbot <syzbot+d0cf0ad6513e9a1da5df@syzkaller.appspotmail.com> Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: NKees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210214031307.57903-1-penguin-kernel@I-love.SAKURA.ne.jpSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Olga Kornievskaia 提交于
stable inclusion from stable-5.10.26 commit 982b899ba672c1eb2e0c01fef197bda13de4af55 bugzilla: 51363 -------------------------------- commit 614c9750 upstream. A cleanup of the inter SSC copy needs to call fput() of the source file handle to make sure that file structure is freed as well as drop the reference on the superblock to unmount the source server. Fixes: 36e1e5ba ("NFSD: Fix use-after-free warning when doing inter-server copy") Signed-off-by: NOlga Kornievskaia <kolga@netapp.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NDai Ngo <dai.ngo@oracle.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 J. Bruce Fields 提交于
stable inclusion from stable-5.10.26 commit 12628e7779f8e191c010955058d278df5bf0c0d4 bugzilla: 51363 -------------------------------- commit bfdd89f2 upstream. The typical result of the backwards comparison here is that the source server in a server-to-server copy will return BAD_STATEID within a few seconds of the copy starting, instead of giving the copy a full lease period, so the copy_file_range() call will end up unnecessarily returning a short read. Fixes: 624322f1 "NFSD add COPY_NOTIFY operation" Signed-off-by: NJ. Bruce Fields <bfields@redhat.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Trond Myklebust 提交于
stable inclusion from stable-5.10.26 commit 5ea0aa29ad4b8bc96b8cfcfb367f04b50b9cf92f bugzilla: 51363 -------------------------------- commit d30881f5 upstream. If a file is unhashed, then we're going to reject it anyway and retry, so make sure we skip it when we're doing the RCU lockless lookup. This avoids a number of unnecessary nfserr_jukebox returns from nfsd_file_acquire() Fixes: 65294c1f ("nfsd: add a new struct file caching facility to nfsd") Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 David Howells 提交于
stable inclusion from stable-5.10.26 commit 64195f022ae8c24e0abccc1545d557b064e73ed3 bugzilla: 51363 -------------------------------- commit a7889c63 upstream. afs_listxattr() lists all the available special afs xattrs (i.e. those in the "afs.*" space), no matter what type of server we're dealing with. But OpenAFS servers, for example, cannot deal with some of the extra-capable attributes that AuriStor (YFS) servers provide. Unfortunately, the presence of the afs.yfs.* attributes causes errors[1] for anything that tries to read them if the server is of the wrong type. Fix the problem by removing afs_listxattr() so that none of the special xattrs are listed (AFS doesn't support xattrs). It does mean, however, that getfattr won't list them, though they can still be accessed with getxattr() and setxattr(). This can be tested with something like: getfattr -d -m ".*" /afs/example.com/path/to/file With this change, none of the afs.* attributes should be visible. Changes: ver #2: - Hide all of the afs.* xattrs, not just the ACL ones. Fixes: ae46578b ("afs: Get YFS ACLs and information through xattrs") Reported-by: NGaja Sophie Peters <gaja.peters@math.uni-hamburg.de> Signed-off-by: NDavid Howells <dhowells@redhat.com> Tested-by: NGaja Sophie Peters <gaja.peters@math.uni-hamburg.de> Reviewed-by: NJeffrey Altman <jaltman@auristor.com> Reviewed-by: NMarc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: http://lists.infradead.org/pipermail/linux-afs/2021-March/003502.html [1] Link: http://lists.infradead.org/pipermail/linux-afs/2021-March/003567.html # v1 Link: http://lists.infradead.org/pipermail/linux-afs/2021-March/003573.html # v2 Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 David Howells 提交于
stable inclusion from stable-5.10.26 commit 78ba4793b084f722a0aaf5f32a3d9f7c3e284b22 bugzilla: 51363 -------------------------------- commit 64fcbb61 upstream. If someone attempts to access YFS-related xattrs (e.g. afs.yfs.acl) on a file on a non-YFS AFS server (such as OpenAFS), then the kernel will jump to a NULL function pointer because the afs_fetch_acl_operation descriptor doesn't point to a function for issuing an operation on a non-YFS server[1]. Fix this by making afs_wait_for_operation() check that the issue_afs_rpc method is set before jumping to it and setting -ENOTSUPP if not. This fix also covers other potential operations that also only exist on YFS servers. afs_xattr_get/set_yfs() then need to translate -ENOTSUPP to -ENODATA as the former error is internal to the kernel. The bug shows up as an oops like the following: BUG: kernel NULL pointer dereference, address: 0000000000000000 [...] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. [...] Call Trace: afs_wait_for_operation+0x83/0x1b0 [kafs] afs_xattr_get_yfs+0xe6/0x270 [kafs] __vfs_getxattr+0x59/0x80 vfs_getxattr+0x11c/0x140 getxattr+0x181/0x250 ? __check_object_size+0x13f/0x150 ? __fput+0x16d/0x250 __x64_sys_fgetxattr+0x64/0xb0 do_syscall_64+0x49/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fb120a9defe This was triggered with "cp -a" which attempts to copy xattrs, including afs ones, but is easier to reproduce with getfattr, e.g.: getfattr -d -m ".*" /afs/openafs.org/ Fixes: e49c7b2f ("afs: Build an abstraction around an "operation" concept") Reported-by: NGaja Sophie Peters <gaja.peters@math.uni-hamburg.de> Signed-off-by: NDavid Howells <dhowells@redhat.com> Tested-by: NGaja Sophie Peters <gaja.peters@math.uni-hamburg.de> Reviewed-by: NMarc Dionne <marc.dionne@auristor.com> Reviewed-by: NJeffrey Altman <jaltman@auristor.com> cc: linux-afs@lists.infradead.org Link: http://lists.infradead.org/pipermail/linux-afs/2021-March/003498.html [1] Link: http://lists.infradead.org/pipermail/linux-afs/2021-March/003566.html # v1 Link: http://lists.infradead.org/pipermail/linux-afs/2021-March/003572.html # v2 Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 David Sterba 提交于
stable inclusion from stable-5.10.26 commit 2c8d6a9474f07375c87c4dc6f008610b3ce755a7 bugzilla: 51363 -------------------------------- commit 34e49994 upstream. The free space tree bitmap slab cache is created with SLAB_RED_ZONE but that's a debugging flag and not always enabled. Also the other slabs are created with at least SLAB_MEM_SPREAD that we want as well to average the memory placement cost. Reported-by: NVlastimil Babka <vbabka@suse.cz> Fixes: 3acd4850 ("btrfs: fix allocation of free space cache v1 bitmap pages") CC: stable@vger.kernel.org # 5.4+ Signed-off-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Filipe Manana 提交于
stable inclusion from stable-5.10.26 commit 38ffe9eaeb7cce383525439f0948f9eb74632e1d bugzilla: 51363 -------------------------------- commit dbcc7d57 upstream. While resolving backreferences, as part of a logical ino ioctl call or fiemap, we can end up hitting a BUG_ON() when replaying tree mod log operations of a root, triggering a stack trace like the following: ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.c:1210! invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 1 PID: 19054 Comm: crawl_335 Tainted: G W 5.11.0-2d11c0084b02-misc-next+ #89 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 RIP: 0010:__tree_mod_log_rewind+0x3b1/0x3c0 Code: 05 48 8d 74 10 (...) RSP: 0018:ffffc90001eb70b8 EFLAGS: 00010297 RAX: 0000000000000000 RBX: ffff88812344e400 RCX: ffffffffb28933b6 RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff88812344e42c RBP: ffffc90001eb7108 R08: 1ffff11020b60a20 R09: ffffed1020b60a20 R10: ffff888105b050f9 R11: ffffed1020b60a1f R12: 00000000000000ee R13: ffff8880195520c0 R14: ffff8881bc958500 R15: ffff88812344e42c FS: 00007fd1955e8700(0000) GS:ffff8881f5600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007efdb7928718 CR3: 000000010103a006 CR4: 0000000000170ee0 Call Trace: btrfs_search_old_slot+0x265/0x10d0 ? lock_acquired+0xbb/0x600 ? btrfs_search_slot+0x1090/0x1090 ? free_extent_buffer.part.61+0xd7/0x140 ? free_extent_buffer+0x13/0x20 resolve_indirect_refs+0x3e9/0xfc0 ? lock_downgrade+0x3d0/0x3d0 ? __kasan_check_read+0x11/0x20 ? add_prelim_ref.part.11+0x150/0x150 ? lock_downgrade+0x3d0/0x3d0 ? __kasan_check_read+0x11/0x20 ? lock_acquired+0xbb/0x600 ? __kasan_check_write+0x14/0x20 ? do_raw_spin_unlock+0xa8/0x140 ? rb_insert_color+0x30/0x360 ? prelim_ref_insert+0x12d/0x430 find_parent_nodes+0x5c3/0x1830 ? resolve_indirect_refs+0xfc0/0xfc0 ? lock_release+0xc8/0x620 ? fs_reclaim_acquire+0x67/0xf0 ? lock_acquire+0xc7/0x510 ? lock_downgrade+0x3d0/0x3d0 ? lockdep_hardirqs_on_prepare+0x160/0x210 ? lock_release+0xc8/0x620 ? fs_reclaim_acquire+0x67/0xf0 ? lock_acquire+0xc7/0x510 ? poison_range+0x38/0x40 ? unpoison_range+0x14/0x40 ? trace_hardirqs_on+0x55/0x120 btrfs_find_all_roots_safe+0x142/0x1e0 ? find_parent_nodes+0x1830/0x1830 ? btrfs_inode_flags_to_xflags+0x50/0x50 iterate_extent_inodes+0x20e/0x580 ? tree_backref_for_extent+0x230/0x230 ? lock_downgrade+0x3d0/0x3d0 ? read_extent_buffer+0xdd/0x110 ? lock_downgrade+0x3d0/0x3d0 ? __kasan_check_read+0x11/0x20 ? lock_acquired+0xbb/0x600 ? __kasan_check_write+0x14/0x20 ? _raw_spin_unlock+0x22/0x30 ? __kasan_check_write+0x14/0x20 iterate_inodes_from_logical+0x129/0x170 ? iterate_inodes_from_logical+0x129/0x170 ? btrfs_inode_flags_to_xflags+0x50/0x50 ? iterate_extent_inodes+0x580/0x580 ? __vmalloc_node+0x92/0xb0 ? init_data_container+0x34/0xb0 ? init_data_container+0x34/0xb0 ? kvmalloc_node+0x60/0x80 btrfs_ioctl_logical_to_ino+0x158/0x230 btrfs_ioctl+0x205e/0x4040 ? __might_sleep+0x71/0xe0 ? btrfs_ioctl_get_supported_features+0x30/0x30 ? getrusage+0x4b6/0x9c0 ? __kasan_check_read+0x11/0x20 ? lock_release+0xc8/0x620 ? __might_fault+0x64/0xd0 ? lock_acquire+0xc7/0x510 ? lock_downgrade+0x3d0/0x3d0 ? lockdep_hardirqs_on_prepare+0x210/0x210 ? lockdep_hardirqs_on_prepare+0x210/0x210 ? __kasan_check_read+0x11/0x20 ? do_vfs_ioctl+0xfc/0x9d0 ? ioctl_file_clone+0xe0/0xe0 ? lock_downgrade+0x3d0/0x3d0 ? lockdep_hardirqs_on_prepare+0x210/0x210 ? __kasan_check_read+0x11/0x20 ? lock_release+0xc8/0x620 ? __task_pid_nr_ns+0xd3/0x250 ? lock_acquire+0xc7/0x510 ? __fget_files+0x160/0x230 ? __fget_light+0xf2/0x110 __x64_sys_ioctl+0xc3/0x100 do_syscall_64+0x37/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fd1976e2427 Code: 00 00 90 48 8b 05 (...) RSP: 002b:00007fd1955e5cf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007fd1955e5f40 RCX: 00007fd1976e2427 RDX: 00007fd1955e5f48 RSI: 00000000c038943b RDI: 0000000000000004 RBP: 0000000001000000 R08: 0000000000000000 R09: 00007fd1955e6120 R10: 0000557835366b00 R11: 0000000000000246 R12: 0000000000000004 R13: 00007fd1955e5f48 R14: 00007fd1955e5f40 R15: 00007fd1955e5ef8 Modules linked in: ---[ end trace ec8931a1c36e57be ]--- (gdb) l *(__tree_mod_log_rewind+0x3b1) 0xffffffff81893521 is in __tree_mod_log_rewind (fs/btrfs/ctree.c:1210). 1205 * the modification. as we're going backwards, we do the 1206 * opposite of each operation here. 1207 */ 1208 switch (tm->op) { 1209 case MOD_LOG_KEY_REMOVE_WHILE_FREEING: 1210 BUG_ON(tm->slot < n); 1211 fallthrough; 1212 case MOD_LOG_KEY_REMOVE_WHILE_MOVING: 1213 case MOD_LOG_KEY_REMOVE: 1214 btrfs_set_node_key(eb, &tm->key, tm->slot); Here's what happens to hit that BUG_ON(): 1) We have one tree mod log user (through fiemap or the logical ino ioctl), with a sequence number of 1, so we have fs_info->tree_mod_seq == 1; 2) Another task is at ctree.c:balance_level() and we have eb X currently as the root of the tree, and we promote its single child, eb Y, as the new root. Then, at ctree.c:balance_level(), we call: tree_mod_log_insert_root(eb X, eb Y, 1); 3) At tree_mod_log_insert_root() we create tree mod log elements for each slot of eb X, of operation type MOD_LOG_KEY_REMOVE_WHILE_FREEING each with a ->logical pointing to ebX->start. These are placed in an array named tm_list. Lets assume there are N elements (N pointers in eb X); 4) Then, still at tree_mod_log_insert_root(), we create a tree mod log element of operation type MOD_LOG_ROOT_REPLACE, ->logical set to ebY->start, ->old_root.logical set to ebX->start, ->old_root.level set to the level of eb X and ->generation set to the generation of eb X; 5) Then tree_mod_log_insert_root() calls tree_mod_log_free_eb() with tm_list as argument. After that, tree_mod_log_free_eb() calls __tree_mod_log_insert() for each member of tm_list in reverse order, from highest slot in eb X, slot N - 1, to slot 0 of eb X; 6) __tree_mod_log_insert() sets the sequence number of each given tree mod log operation - it increments fs_info->tree_mod_seq and sets fs_info->tree_mod_seq as the sequence number of the given tree mod log operation. This means that for the tm_list created at tree_mod_log_insert_root(), the element corresponding to slot 0 of eb X has the highest sequence number (1 + N), and the element corresponding to the last slot has the lowest sequence number (2); 7) Then, after inserting tm_list's elements into the tree mod log rbtree, the MOD_LOG_ROOT_REPLACE element is inserted, which gets the highest sequence number, which is N + 2; 8) Back to ctree.c:balance_level(), we free eb X by calling btrfs_free_tree_block() on it. Because eb X was created in the current transaction, has no other references and writeback did not happen for it, we add it back to the free space cache/tree; 9) Later some other task T allocates the metadata extent from eb X, since it is marked as free space in the space cache/tree, and uses it as a node for some other btree; 10) The tree mod log user task calls btrfs_search_old_slot(), which calls get_old_root(), and finally that calls __tree_mod_log_oldest_root() with time_seq == 1 and eb_root == eb Y; 11) First iteration of the while loop finds the tree mod log element with sequence number N + 2, for the logical address of eb Y and of type MOD_LOG_ROOT_REPLACE; 12) Because the operation type is MOD_LOG_ROOT_REPLACE, we don't break out of the loop, and set root_logical to point to tm->old_root.logical which corresponds to the logical address of eb X; 13) On the next iteration of the while loop, the call to tree_mod_log_search_oldest() returns the smallest tree mod log element for the logical address of eb X, which has a sequence number of 2, an operation type of MOD_LOG_KEY_REMOVE_WHILE_FREEING and corresponds to the old slot N - 1 of eb X (eb X had N items in it before being freed); 14) We then break out of the while loop and return the tree mod log operation of type MOD_LOG_ROOT_REPLACE (eb Y), and not the one for slot N - 1 of eb X, to get_old_root(); 15) At get_old_root(), we process the MOD_LOG_ROOT_REPLACE operation and set "logical" to the logical address of eb X, which was the old root. We then call tree_mod_log_search() passing it the logical address of eb X and time_seq == 1; 16) Then before calling tree_mod_log_search(), task T adds a key to eb X, which results in adding a tree mod log operation of type MOD_LOG_KEY_ADD to the tree mod log - this is done at ctree.c:insert_ptr() - but after adding the tree mod log operation and before updating the number of items in eb X from 0 to 1... 17) The task at get_old_root() calls tree_mod_log_search() and gets the tree mod log operation of type MOD_LOG_KEY_ADD just added by task T. Then it enters the following if branch: if (old_root && tm && tm->op != MOD_LOG_KEY_REMOVE_WHILE_FREEING) { (...) } (...) Calls read_tree_block() for eb X, which gets a reference on eb X but does not lock it - task T has it locked. Then it clones eb X while it has nritems set to 0 in its header, before task T sets nritems to 1 in eb X's header. From hereupon we use the clone of eb X which no other task has access to; 18) Then we call __tree_mod_log_rewind(), passing it the MOD_LOG_KEY_ADD mod log operation we just got from tree_mod_log_search() in the previous step and the cloned version of eb X; 19) At __tree_mod_log_rewind(), we set the local variable "n" to the number of items set in eb X's clone, which is 0. Then we enter the while loop, and in its first iteration we process the MOD_LOG_KEY_ADD operation, which just decrements "n" from 0 to (u32)-1, since "n" is declared with a type of u32. At the end of this iteration we call rb_next() to find the next tree mod log operation for eb X, that gives us the mod log operation of type MOD_LOG_KEY_REMOVE_WHILE_FREEING, for slot 0, with a sequence number of N + 1 (steps 3 to 6); 20) Then we go back to the top of the while loop and trigger the following BUG_ON(): (...) switch (tm->op) { case MOD_LOG_KEY_REMOVE_WHILE_FREEING: BUG_ON(tm->slot < n); fallthrough; (...) Because "n" has a value of (u32)-1 (4294967295) and tm->slot is 0. Fix this by taking a read lock on the extent buffer before cloning it at ctree.c:get_old_root(). This should be done regardless of the extent buffer having been freed and reused, as a concurrent task might be modifying it (while holding a write lock on it). Reported-by: NZygo Blaxell <ce3g8jdj@umail.furryterror.org> Link: https://lore.kernel.org/linux-btrfs/20210227155037.GN28049@hungrycats.org/ Fixes: 834328a8 ("Btrfs: tree mod log's old roots could still be part of the tree") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Chao Yu 提交于
stable inclusion from stable-5.10.26 commit 78486cf1f31e3f646a981f91f4be3db62689265e bugzilla: 51363 -------------------------------- commit 6980d29c upstream. In zonefs_open_zone(), if opened zone count is larger than .s_max_open_zones threshold, we missed to recover .i_wr_refcnt, fix this. Fixes: b5c00e97 ("zonefs: open/close zone on file open/close") Cc: <stable@vger.kernel.org> Signed-off-by: NChao Yu <yuchao0@huawei.com> Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Damien Le Moal 提交于
stable inclusion from stable-5.10.26 commit 9c1c5e81a00250628b1dea74b815fc641ee77952 bugzilla: 51363 -------------------------------- commit 1601ea06 upstream. The sequential write constraint of sequential zone file prevent their use as swap files. Only allow conventional zone files to be used as swap files. Fixes: 8dcc1a9d ("fs: New zonefs file system") Cc: <stable@vger.kernel.org> Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Damien Le Moal 提交于
stable inclusion from stable-5.10.26 commit dfbdbf0f359abbe5005ee3d99d1923af904c8584 bugzilla: 51363 -------------------------------- commit ebfd68cd upstream. zonefs updates the size of a sequential zone file inode only on completion of direct writes. When executing asynchronous append writes (with a file open with O_APPEND or using RWF_APPEND), the use of the current inode size in generic_write_checks() to set an iocb offset thus leads to unaligned write if an application issues an append write operation with another write already being executed. Fix this problem by introducing zonefs_write_checks() as a modified version of generic_write_checks() using the file inode wp_offset for an append write iocb offset. Also introduce zonefs_write_check_limits() to replace generic_write_check_limits() call. This zonefs special helper makes sure that the maximum file limit used is the maximum size of the file being accessed. Since zonefs_write_checks() already truncates the iov_iter, the calls to iov_iter_truncate() in zonefs_file_dio_write() and zonefs_file_buffered_write() are removed. Fixes: 8dcc1a9d ("fs: New zonefs file system") Cc: <stable@vger.kernel.org> Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 J. Bruce Fields 提交于
stable inclusion from stable-5.10.25 commit df8596f5774387f92133e0e5b7e05808ff6595d7 bugzilla: 51362 -------------------------------- commit 6ee65a77 upstream. This reverts commit 94415b06. That commit claimed to allow a client to get a read delegation when it was the only writer. Actually it allowed a client to get a read delegation when *any* client has a write open! The main problem is that it's depending on nfs4_clnt_odstate structures that are actually only maintained for pnfs exports. This causes clients to miss writes performed by other clients, even when there have been intervening closes and opens, violating close-to-open cache consistency. We can do this a different way, but first we should just revert this. I've added pynfs 4.1 test DELEG19 to test for this, as I should have done originally! Cc: stable@vger.kernel.org Reported-by: NTimo Rothenpieler <timo@rothenpieler.org> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 J. Bruce Fields 提交于
stable inclusion from stable-5.10.25 commit 894ecf0cb505561b9f37b302b7479eea939b0790 bugzilla: 51362 -------------------------------- commit 4aa5e002 upstream. This reverts commit 50747dd5 "nfsd4: remove check_conflicting_opens warning", as a prerequisite for reverting 94415b06, which has a serious bug. Cc: stable@vger.kernel.org Signed-off-by: NJ. Bruce Fields <bfields@redhat.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Amir Goldstein 提交于
stable inclusion from stable-5.10.25 commit d955f13ea2120269319d6133d0dd82b66d1eeca3 bugzilla: 51362 -------------------------------- commit 775c5033 upstream. Commit 5d069dbe ("fuse: fix bad inode") replaced make_bad_inode() in fuse_iget() with a private implementation fuse_make_bad(). The private implementation fails to remove the bad inode from inode cache, so the retry loop with iget5_locked() finds the same bad inode and marks it bad forever. kmsg snip: [ ] rcu: INFO: rcu_sched self-detected stall on CPU ... [ ] ? bit_wait_io+0x50/0x50 [ ] ? fuse_init_file_inode+0x70/0x70 [ ] ? find_inode.isra.32+0x60/0xb0 [ ] ? fuse_init_file_inode+0x70/0x70 [ ] ilookup5_nowait+0x65/0x90 [ ] ? fuse_init_file_inode+0x70/0x70 [ ] ilookup5.part.36+0x2e/0x80 [ ] ? fuse_init_file_inode+0x70/0x70 [ ] ? fuse_inode_eq+0x20/0x20 [ ] iget5_locked+0x21/0x80 [ ] ? fuse_inode_eq+0x20/0x20 [ ] fuse_iget+0x96/0x1b0 Fixes: 5d069dbe ("fuse: fix bad inode") Cc: stable@vger.kernel.org # 5.10+ Signed-off-by: NAmir Goldstein <amir73il@gmail.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 09 4月, 2021 23 次提交
-
-
由 Jan Kara 提交于
mainline inclusion from mainline-v5.12-rc4 commit 2a4ae3bc category: bugfix bugzilla: 50839 CVE: NA ----------------------------------------------- When filesystem mount fails because of corrupted filesystem we first cancel the s_err_report timer reminding fs errors every day and only then we flush s_error_work. However s_error_work may report another fs error and re-arm timer thus resulting in timer use-after-free. Fix the problem by first flushing the work and only after that canceling the s_err_report timer. Reported-by: syzbot+628472a2aac693ab0fcd@syzkaller.appspotmail.com Fixes: 2d01ddc8 ("ext4: save error info to sb through journal if available") CC: stable@vger.kernel.org Signed-off-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210315165906.2175-1-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NYe Bin <yebin10@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Jan Kara 提交于
mainline inclusion from mainline-v5.11-rc4 commit a3f5cf14 category: bugfix bugzilla: 50839 CVE: NA ----------------------------------------------- The wrapper is now useless since it does what ext4_handle_dirty_metadata() does. Just remove it. Signed-off-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201216101844.22917-9-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NYe Bin <yebin10@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Jan Kara 提交于
mainline inclusion from mainline-v5.11-rc4 commit e92ad03f category: bugfix bugzilla: 50839 CVE: NA ----------------------------------------------- No behavioral change. Signed-off-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201216101844.22917-6-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NYe Bin <yebin10@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Jan Kara 提交于
mainline inclusion from mainline-v5.11-rc4 commit 2d01ddc8 category: bugfix bugzilla: 50839 CVE: NA ----------------------------------------------- If journalling is still working at the moment we get to writing error information to the superblock we cannot write directly to the superblock as such write could race with journalled update of the superblock and cause journal checksum failures, writing inconsistent information to the journal or other problems. We cannot journal the superblock directly from the error handling functions as we are running in uncertain context and could deadlock so just punt journalled superblock update to a workqueue. Signed-off-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201216101844.22917-5-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NYe Bin <yebin10@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Jan Kara 提交于
mainline inclusion from mainline-v5.11-rc4 commit 05c2c00f category: bugfix bugzilla: 50839 CVE: NA ----------------------------------------------- Protect all superblock modifications (including checksum computation) with a superblock buffer lock. That way we are sure computed checksum matches current superblock contents (a mismatch could cause checksum failures in nojournal mode or if an unjournalled superblock update races with a journalled one). Also we avoid modifying superblock contents while it is being written out (which can cause DIF/DIX failures if we are running in nojournal mode). Signed-off-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201216101844.22917-4-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu> conflicts: fs/ext4/file.c Signed-off-by: NYe Bin <yebin10@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Jan Kara 提交于
mainline inclusion from mainline-v5.11-rc4 commit 4392fbc4 category: bugfix bugzilla: 50839 CVE: NA ----------------------------------------------- Everybody passes 1 as sync argument of ext4_commit_super(). Just drop it. Reviewed-by: NHarshad Shirwadkar <harshadshirwadkar@gmail.com> Signed-off-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201216101844.22917-3-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NYe Bin <yebin10@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Jan Kara 提交于
mainline inclusion from mainline-v5.11-rc4 commit e789ca0c category: bugfix bugzilla: 50839 CVE: NA ----------------------------------------------- save_error_info() is always called together with ext4_handle_error(). Combine them into a single call and move unconditional bits out of save_error_info() into ext4_handle_error(). Signed-off-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201216101844.22917-2-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NYe Bin <yebin10@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Jan Kara 提交于
mainline inclusion from mainline-v5.11-rc1 commit c92dc856 category: bugfix bugzilla: 50839 CVE: NA ----------------------------------------------- When filesystem inconsistency is detected with group locked, we currently try to modify superblock to store error there without blocking. However this can cause superblock checksum failures (or DIF/DIX failure) when the superblock is just being written out. Make error handling code just store error information in ext4_sb_info structure and copy it to on-disk superblock only in ext4_commit_super(). In case of error happening with group locked, we just postpone the superblock flushing to a workqueue. [ Added fixup so that s_first_error_* does not get updated after the file system is remounted. Also added fix for syzbot failure. - Ted ] Signed-off-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201127113405.26867-8-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: Hillf Danton <hdanton@sina.com> Reported-by: syzbot+9043030c040ce1849a60@syzkaller.appspotmail.com Signed-off-by: NYe Bin <yebin10@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Jan Kara 提交于
mainline inclusion from mainline-v5.11-rc1 commit 02a7780e category: bugfix bugzilla: 50839 CVE: NA ----------------------------------------------- We convert errno's to ext4 on-disk format error codes in save_error_info(). Add a function and a bit of macro magic to make this simpler. Signed-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NAndreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201127113405.26867-7-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NYe Bin <yebin10@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Jan Kara 提交于
mainline inclusion from mainline-v5.11-rc1 commit 40676623 category: bugfix bugzilla: 50839 CVE: NA ----------------------------------------------- Just move error info related functions in super.c close to ext4_handle_error(). We'll want to combine save_error_info() with ext4_handle_error() and this makes change more obvious and saves a forward declaration as well. No functional change. Signed-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NAndreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201127113405.26867-6-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NYe Bin <yebin10@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Jan Kara 提交于
mainline inclusion from mainline-v5.11-rc1 commit 014c9caa category: bugfix bugzilla: 50839 CVE: NA ----------------------------------------------- The only difference between __ext4_abort() and __ext4_error() is that the former one ignores errors=continue mount option. Unify the code to reduce duplication. Signed-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NAndreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201127113405.26867-5-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NYe Bin <yebin10@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Jan Kara 提交于
mainline inclusion from mainline-v5.11-rc1 commit 93c20bc3 category: bugfix bugzilla: 50839 CVE: NA ----------------------------------------------- We use __ext4_error() when ext4_protect_reserved_inode() finds filesystem corruption. However EXT4_ERROR_INODE_ERR() is perfectly capable of reporting all the needed information. So just use that. Signed-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NAndreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201127113405.26867-4-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NYe Bin <yebin10@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Jan Kara 提交于
mainline inclusion from mainline-v5.11-rc1 commit 81414b4d category: bugfix bugzilla: 50839 CVE: NA ----------------------------------------------- Superblock is written out either through ext4_commit_super() or through ext4_handle_dirty_super(). In both cases we recompute the checksum so it is not necessary to recompute it after updating superblock free inodes & blocks counters. Signed-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NAndreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201127113405.26867-3-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NYe Bin <yebin10@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Lior Ribak 提交于
stable inclusion from stable-5.10.24 commit 5ab9464a2a3c538eedbb438f1802f2fd98d0953f bugzilla: 51348 -------------------------------- commit e7850f4d upstream. There is a deadlock in bm_register_write: First, in the begining of the function, a lock is taken on the binfmt_misc root inode with inode_lock(d_inode(root)). Then, if the user used the MISC_FMT_OPEN_FILE flag, the function will call open_exec on the user-provided interpreter. open_exec will call a path lookup, and if the path lookup process includes the root of binfmt_misc, it will try to take a shared lock on its inode again, but it is already locked, and the code will get stuck in a deadlock To reproduce the bug: $ echo ":iiiii:E::ii::/proc/sys/fs/binfmt_misc/bla:F" > /proc/sys/fs/binfmt_misc/register backtrace of where the lock occurs (#5): 0 schedule () at ./arch/x86/include/asm/current.h:15 1 0xffffffff81b51237 in rwsem_down_read_slowpath (sem=0xffff888003b202e0, count=<optimized out>, state=state@entry=2) at kernel/locking/rwsem.c:992 2 0xffffffff81b5150a in __down_read_common (state=2, sem=<optimized out>) at kernel/locking/rwsem.c:1213 3 __down_read (sem=<optimized out>) at kernel/locking/rwsem.c:1222 4 down_read (sem=<optimized out>) at kernel/locking/rwsem.c:1355 5 0xffffffff811ee22a in inode_lock_shared (inode=<optimized out>) at ./include/linux/fs.h:783 6 open_last_lookups (op=0xffffc9000022fe34, file=0xffff888004098600, nd=0xffffc9000022fd10) at fs/namei.c:3177 7 path_openat (nd=nd@entry=0xffffc9000022fd10, op=op@entry=0xffffc9000022fe34, flags=flags@entry=65) at fs/namei.c:3366 8 0xffffffff811efe1c in do_filp_open (dfd=<optimized out>, pathname=pathname@entry=0xffff8880031b9000, op=op@entry=0xffffc9000022fe34) at fs/namei.c:3396 9 0xffffffff811e493f in do_open_execat (fd=fd@entry=-100, name=name@entry=0xffff8880031b9000, flags=<optimized out>, flags@entry=0) at fs/exec.c:913 10 0xffffffff811e4a92 in open_exec (name=<optimized out>) at fs/exec.c:948 11 0xffffffff8124aa84 in bm_register_write (file=<optimized out>, buffer=<optimized out>, count=19, ppos=<optimized out>) at fs/binfmt_misc.c:682 12 0xffffffff811decd2 in vfs_write (file=file@entry=0xffff888004098500, buf=buf@entry=0xa758d0 ":iiiii:E::ii::i:CF ", count=count@entry=19, pos=pos@entry=0xffffc9000022ff10) at fs/read_write.c:603 13 0xffffffff811defda in ksys_write (fd=<optimized out>, buf=0xa758d0 ":iiiii:E::ii::i:CF ", count=19) at fs/read_write.c:658 14 0xffffffff81b49813 in do_syscall_64 (nr=<optimized out>, regs=0xffffc9000022ff58) at arch/x86/entry/common.c:46 15 0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:120 To solve the issue, the open_exec call is moved to before the write lock is taken by bm_register_write Link: https://lkml.kernel.org/r/20210228224414.95962-1-liorribak@gmail.com Fixes: 948b701a ("binfmt_misc: add persistent opened binary handler for containers") Signed-off-by: NLior Ribak <liorribak@gmail.com> Acked-by: NHelge Deller <deller@gmx.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Daiyue Zhang 提交于
stable inclusion from stable-5.10.24 commit 109720342efd6ace3d2e8f34a25ea65036bb1d3b bugzilla: 51348 -------------------------------- [ Upstream commit 14fbbc82 ] Commit b0841eef ("configfs: provide exclusion between IO and removals") uses ->frag_dead to mark the fragment state, thus no bothering with extra refcount on config_item when opening a file. The configfs_get_config_item was removed in __configfs_open_file, but not with config_item_put. So the refcount on config_item will lost its balance, causing use-after-free issues in some occasions like this: Test: 1. Mount configfs on /config with read-only items: drwxrwx--- 289 root root 0 2021-04-01 11:55 /config drwxr-xr-x 2 root root 0 2021-04-01 11:54 /config/a --w--w--w- 1 root root 4096 2021-04-01 11:53 /config/a/1.txt ...... 2. Then run: for file in /config do echo $file grep -R 'key' $file done 3. __configfs_open_file will be called in parallel, the first one got called will do: if (file->f_mode & FMODE_READ) { if (!(inode->i_mode & S_IRUGO)) goto out_put_module; config_item_put(buffer->item); kref_put() package_details_release() kfree() the other one will run into use-after-free issues like this: BUG: KASAN: use-after-free in __configfs_open_file+0x1bc/0x3b0 Read of size 8 at addr fffffff155f02480 by task grep/13096 CPU: 0 PID: 13096 Comm: grep VIP: 00 Tainted: G W 4.14.116-kasan #1 TGID: 13096 Comm: grep Call trace: dump_stack+0x118/0x160 kasan_report+0x22c/0x294 __asan_load8+0x80/0x88 __configfs_open_file+0x1bc/0x3b0 configfs_open_file+0x28/0x34 do_dentry_open+0x2cc/0x5c0 vfs_open+0x80/0xe0 path_openat+0xd8c/0x2988 do_filp_open+0x1c4/0x2fc do_sys_open+0x23c/0x404 SyS_openat+0x38/0x48 Allocated by task 2138: kasan_kmalloc+0xe0/0x1ac kmem_cache_alloc_trace+0x334/0x394 packages_make_item+0x4c/0x180 configfs_mkdir+0x358/0x740 vfs_mkdir2+0x1bc/0x2e8 SyS_mkdirat+0x154/0x23c el0_svc_naked+0x34/0x38 Freed by task 13096: kasan_slab_free+0xb8/0x194 kfree+0x13c/0x910 package_details_release+0x524/0x56c kref_put+0xc4/0x104 config_item_put+0x24/0x34 __configfs_open_file+0x35c/0x3b0 configfs_open_file+0x28/0x34 do_dentry_open+0x2cc/0x5c0 vfs_open+0x80/0xe0 path_openat+0xd8c/0x2988 do_filp_open+0x1c4/0x2fc do_sys_open+0x23c/0x404 SyS_openat+0x38/0x48 el0_svc_naked+0x34/0x38 To fix this issue, remove the config_item_put in __configfs_open_file to balance the refcount of config_item. Fixes: b0841eef ("configfs: provide exclusion between IO and removals") Signed-off-by: NDaiyue Zhang <zhangdaiyue1@huawei.com> Signed-off-by: NYi Chen <chenyi77@huawei.com> Signed-off-by: NGe Qiu <qiuge@huawei.com> Reviewed-by: NChao Yu <yuchao0@huawei.com> Acked-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Ondrej Mosnacek 提交于
stable inclusion from stable-5.10.24 commit caa86901c863e7c3646d189f2deb9e844afd0568 bugzilla: 51348 -------------------------------- [ Upstream commit 53cb2454 ] An xattr 'get' handler is expected to return the length of the value on success, yet _nfs4_get_security_label() (and consequently also nfs4_xattr_get_nfs4_label(), which is used as an xattr handler) returns just 0 on success. Fix this by returning label.len instead, which contains the length of the result. Fixes: aa9c2669 ("NFS: Client implementation of Labeled-NFS") Signed-off-by: NOndrej Mosnacek <omosnace@redhat.com> Reviewed-by: NJames Morris <jamorris@linux.microsoft.com> Reviewed-by: NPaul Moore <paul@paul-moore.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Trond Myklebust 提交于
stable inclusion from stable-5.10.24 commit e181960ec51d5fa089d6e8e2478febe01ca8be04 bugzilla: 51348 -------------------------------- [ Upstream commit 47397915 ] The fact that the lookup revalidation failed, does not mean that the inode contents have changed. Fixes: 5ceb9d7f ("NFS: Refactor nfs_lookup_revalidate()") Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Trond Myklebust 提交于
stable inclusion from stable-5.10.24 commit dd756d05bee58077ea0239861022ca83e7d8d23d bugzilla: 51348 -------------------------------- [ Upstream commit 82e7ca13 ] There should be no reason to expect the directory permissions to change just because the directory contents changed or a negative lookup timed out. So let's avoid doing a full call to nfs_mark_for_revalidate() in that case. Furthermore, if this is a negative dentry, and we haven't actually done a new lookup, then we have no reason yet to believe the directory has changed at all. So let's remove the gratuitous directory inode invalidation altogether when called from nfs_lookup_revalidate_negative(). Reported-by: NGeert Jansen <gerardu@amazon.com> Fixes: 5ceb9d7f ("NFS: Refactor nfs_lookup_revalidate()") Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Paulo Alcantara 提交于
stable inclusion from stable-5.10.24 commit d308202c1b96024a2f3325642f5e087cf997b5d9 bugzilla: 51348 -------------------------------- commit 04ad69c3 upstream. In case of interrupted syscalls, prevent sending CLOSE commands for compound CREATE+CLOSE requests by introducing an CIFS_CP_CREATE_CLOSE_OP flag to indicate lower layers that it should not send a CLOSE command to the MIDs corresponding the compound CREATE+CLOSE request. A simple reproducer: #!/bin/bash mount //server/share /mnt -o username=foo,password=*** tc qdisc add dev eth0 root netem delay 450ms stat -f /mnt &>/dev/null & pid=$! sleep 0.01 kill $pid tc qdisc del dev eth0 root umount /mnt Before patch: ... 6 0.256893470 192.168.122.2 → 192.168.122.15 SMB2 402 Create Request File: ;GetInfo Request FS_INFO/FileFsFullSizeInformation;Close Request 7 0.257144491 192.168.122.15 → 192.168.122.2 SMB2 498 Create Response File: ;GetInfo Response;Close Response 9 0.260798209 192.168.122.2 → 192.168.122.15 SMB2 146 Close Request File: 10 0.260841089 192.168.122.15 → 192.168.122.2 SMB2 130 Close Response, Error: STATUS_FILE_CLOSED Signed-off-by: NPaulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com> Reviewed-by: NAurelien Aptel <aaptel@suse.com> CC: <stable@vger.kernel.org> Signed-off-by: NSteve French <stfrench@microsoft.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Jan Kara 提交于
stable inclusion from stable-5.10.24 commit d44c9780ed40db88626c9354868eab72159c7a7f bugzilla: 51348 -------------------------------- commit 56887cff upstream. Commit 384d87ef ("block: Do not discard buffers under a mounted filesystem") made paths issuing discard or zeroout requests to the underlying device try to grab block device in exclusive mode. If that failed we returned EBUSY to userspace. This however caused unexpected fallout in userspace where e.g. FUSE filesystems issue discard requests from userspace daemons although the device is open exclusively by the kernel. Also shrinking of logical volume by LVM issues discard requests to a device which may be claimed exclusively because there's another LV on the same PV. So to avoid these userspace regressions, fall back to invalidate_inode_pages2_range() instead of returning EBUSY to userspace and return EBUSY only of that call fails as well (meaning that there's indeed someone using the particular device range we are trying to discard). Link: https://bugzilla.kernel.org/show_bug.cgi?id=211167 Fixes: 384d87ef ("block: Do not discard buffers under a mounted filesystem") CC: stable@vger.kernel.org Signed-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Theodore Ts'o 提交于
stable inclusion from stable-5.10.24 commit 64578f9417e1e3482f3e4492496772fca130f526 bugzilla: 51348 -------------------------------- [ Upstream commit 027f14f5 ] If we try to make any changes via the journal between when the journal is initialized, but before the multi-block allocated is initialized, we will end up deferencing a NULL pointer when the journal commit callback function calls ext4_process_freed_data(). The proximate cause of this failure was commit 2d01ddc8 ("ext4: save error info to sb through journal if available") since file system corruption problems detected before the call to ext4_mb_init() would result in a journal commit before we aborted the mount of the file system.... and we would then trigger the NULL pointer deref. Link: https://lore.kernel.org/r/YAm8qH/0oo2ofSMR@mit.eduReported-by: NMurphy Zhou <jencce.kernel@gmail.com> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Steven J. Magnani 提交于
stable inclusion from stable-5.10.24 commit 82d6c12899e2645bd17d6f9c7d494f360e1089e1 bugzilla: 51348 -------------------------------- [ Upstream commit 63c9e47a ] When extending a file, udf_do_extend_file() may enter following empty indirect extent. At the end of udf_do_extend_file() we revert prev_epos to point to the last written extent. However if we end up not adding any further extent in udf_do_extend_file(), the reverting points prev_epos into the header area of the AED and following updates of the extents (in udf_update_extents()) will corrupt the header. Make sure that we do not follow indirect extent if we are not going to add any more extents so that returning back to the last written extent works correctly. Link: https://lore.kernel.org/r/20210107234116.6190-2-magnani@ieee.orgSigned-off-by: NSteven J. Magnani <magnani@ieee.org> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Aurelien Aptel 提交于
stable inclusion from stable-5.10.24 commit 3370a84d781ca5227682bd6e747aaefb6dcc8e21 bugzilla: 51348 -------------------------------- commit a249cc8b upstream. With multichannel, operations like the queries from "ls -lR" can cause all credits to be used and errors to be returned since max_credits was not being set correctly on the secondary channels and thus the client was requesting 0 credits incorrectly in some cases (which can lead to not having enough credits to perform any operation on that channel). Signed-off-by: NAurelien Aptel <aaptel@suse.com> CC: <stable@vger.kernel.org> # v5.8+ Reviewed-by: NShyam Prasad N <sprasad@microsoft.com> Signed-off-by: NSteve French <stfrench@microsoft.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-