1. 24 Nov 2022, 1 commit
    • fscache: fix OOB Read in __fscache_acquire_volume · 9f0933ac
      Committed by David Howells
      The type of a->key[0] is char in fscache_volume_same().  If the length
      of the cache volume key is greater than 127, the value of a->key[0] is
      negative on architectures where char is signed.  In this case, klen
      becomes much larger than 255 after type conversion, because the type
      of klen is size_t.  As a result, memcmp() reads out of bounds.
      
      This causes a slab-out-of-bounds Read in __fscache_acquire_volume(), as
      reported by Syzbot.
      
      Fix this by changing the type of the stored key to "u8 *" rather than
      "char *" (it isn't a simple string anyway).  Also put in a check that
      the volume name doesn't exceed NAME_MAX.
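      The sign-extension problem can be reproduced in a few lines of user-space C.
      This is a sketch, not the fscache code: the helper names are hypothetical,
      key[0] is assumed to hold the length of the key, and `signed char` is used
      explicitly to model architectures such as x86 where plain char is signed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: key[0] is assumed to hold the key length. */

/* Buggy shape: the length byte is read through a signed character type.
 * A stored length of 128..255 becomes negative, and converting that
 * negative value to size_t wraps around to a huge number. */
static size_t key_len_signed(const signed char *key)
{
    return (size_t)key[0];
}

/* Fixed shape: reading the byte as u8 (uint8_t) always yields 0..255. */
static size_t key_len_u8(const uint8_t *key)
{
    return (size_t)key[0];
}
```

      With a stored length of 200, key_len_u8() returns 200, while
      key_len_signed() returns SIZE_MAX - 55 on a two's-complement machine;
      a length like that is what sends memcmp() past the end of the buffer.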
      
        BUG: KASAN: slab-out-of-bounds in memcmp+0x16f/0x1c0 lib/string.c:757
        Read of size 8 at addr ffff888016f3aa90 by task syz-executor344/3613
        Call Trace:
         memcmp+0x16f/0x1c0 lib/string.c:757
         memcmp include/linux/fortify-string.h:420 [inline]
         fscache_volume_same fs/fscache/volume.c:133 [inline]
         fscache_hash_volume fs/fscache/volume.c:171 [inline]
         __fscache_acquire_volume+0x76c/0x1080 fs/fscache/volume.c:328
         fscache_acquire_volume include/linux/fscache.h:204 [inline]
         v9fs_cache_session_get_cookie+0x143/0x240 fs/9p/cache.c:34
         v9fs_session_init+0x1166/0x1810 fs/9p/v9fs.c:473
         v9fs_mount+0xba/0xc90 fs/9p/vfs_super.c:126
         legacy_get_tree+0x105/0x220 fs/fs_context.c:610
         vfs_get_tree+0x89/0x2f0 fs/super.c:1530
         do_new_mount fs/namespace.c:3040 [inline]
         path_mount+0x1326/0x1e20 fs/namespace.c:3370
         do_mount fs/namespace.c:3383 [inline]
         __do_sys_mount fs/namespace.c:3591 [inline]
         __se_sys_mount fs/namespace.c:3568 [inline]
         __x64_sys_mount+0x27f/0x300 fs/namespace.c:3568
      
      Fixes: 62ab6335 ("fscache: Implement volume registration")
      Reported-by: syzbot+a76f6a6e524cf2080aa3@syzkaller.appspotmail.com
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Zhang Peng <zhangpeng362@huawei.com>
      Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
      cc: Dominique Martinet <asmadeus@codewreck.org>
      cc: Jeff Layton <jlayton@kernel.org>
      cc: v9fs-developer@lists.sourceforge.net
      cc: linux-cachefs@redhat.com
      Link: https://lore.kernel.org/r/Y3OH+Dmi0QIOK18n@codewreck.org/ # Zhang Peng's v1 fix
      Link: https://lore.kernel.org/r/20221115140447.2971680-1-zhangpeng362@huawei.com/ # Zhang Peng's v2 fix
      Link: https://lore.kernel.org/r/166869954095.3793579.8500020902371015443.stgit@warthog.procyon.org.uk/ # v1
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 23 Nov 2022, 1 commit
  3. 19 Nov 2022, 1 commit
  4. 17 Nov 2022, 1 commit
  5. 16 Nov 2022, 6 commits
  6. 14 Nov 2022, 5 commits
  7. 12 Nov 2022, 1 commit
  8. 11 Nov 2022, 1 commit
    • kernfs: Fix spurious lockdep warning in kernfs_find_and_get_node_by_id() · 1edfe4ea
      Committed by Tejun Heo
      c2549174 ("kernfs: Add KERNFS_REMOVING flags") made
      kernfs_find_and_get_node_by_id() test kernfs_active() instead of
      KERNFS_ACTIVATED. kernfs_find_and_get_node_by_id() is called without
      holding the kernfs_rwsem, triggering the following lockdep warning.
      
        WARNING: CPU: 1 PID: 6191 at fs/kernfs/dir.c:36 kernfs_active+0xe8/0x120 fs/kernfs/dir.c:38
        Modules linked in:
        CPU: 1 PID: 6191 Comm: syz-executor.1 Not tainted 6.0.0-syzkaller-09413-g4899a36f #0
        Hardware name: linux,dummy-virt (DT)
        pstate: 10000005 (nzcV daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
        pc : kernfs_active+0xe8/0x120 fs/kernfs/dir.c:36
        lr : lock_is_held include/linux/lockdep.h:283 [inline]
        lr : kernfs_active+0x94/0x120 fs/kernfs/dir.c:36
        sp : ffff8000182c7a00
        x29: ffff8000182c7a00 x28: 0000000000000002 x27: 0000000000000001
        x26: ffff00000ee1f6a8 x25: 1fffe00001dc3ed5 x24: 0000000000000000
        x23: ffff80000ca1fba0 x22: ffff8000089efcb0 x21: 0000000000000001
        x20: ffff0000091181d0 x19: ffff0000091181d0 x18: ffff00006a9e6b88
        x17: 0000000000000000 x16: 0000000000000000 x15: ffff00006a9e6bc4
        x14: 1ffff00003058f0e x13: 1fffe0000258c816 x12: ffff700003058f39
        x11: 1ffff00003058f38 x10: ffff700003058f38 x9 : dfff800000000000
        x8 : ffff80000e482f20 x7 : ffff0000091d8058 x6 : ffff80000e482c60
        x5 : ffff000009402ee8 x4 : 1ffff00001bd1f46 x3 : 1fffe0000258c6d1
        x2 : 0000000000000003 x1 : 00000000000000c0 x0 : 0000000000000000
        Call trace:
         kernfs_active+0xe8/0x120 fs/kernfs/dir.c:38
         kernfs_find_and_get_node_by_id+0x6c/0x140 fs/kernfs/dir.c:708
         __kernfs_fh_to_dentry fs/kernfs/mount.c:102 [inline]
         kernfs_fh_to_dentry+0x88/0x1fc fs/kernfs/mount.c:128
         exportfs_decode_fh_raw+0x104/0x560 fs/exportfs/expfs.c:435
         exportfs_decode_fh+0x10/0x5c fs/exportfs/expfs.c:575
         do_handle_to_path fs/fhandle.c:152 [inline]
         handle_to_path fs/fhandle.c:207 [inline]
         do_handle_open+0x2a4/0x7b0 fs/fhandle.c:223
         __do_compat_sys_open_by_handle_at fs/fhandle.c:277 [inline]
         __se_compat_sys_open_by_handle_at fs/fhandle.c:274 [inline]
         __arm64_compat_sys_open_by_handle_at+0x6c/0x9c fs/fhandle.c:274
         __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
         invoke_syscall+0x6c/0x260 arch/arm64/kernel/syscall.c:52
         el0_svc_common.constprop.0+0xc4/0x254 arch/arm64/kernel/syscall.c:142
         do_el0_svc_compat+0x40/0x70 arch/arm64/kernel/syscall.c:212
         el0_svc_compat+0x54/0x140 arch/arm64/kernel/entry-common.c:772
         el0t_32_sync_handler+0x90/0x140 arch/arm64/kernel/entry-common.c:782
         el0t_32_sync+0x190/0x194 arch/arm64/kernel/entry.S:586
        irq event stamp: 232
        hardirqs last  enabled at (231): [<ffff8000081edf70>] raw_spin_rq_unlock_irq kernel/sched/sched.h:1367 [inline]
        hardirqs last  enabled at (231): [<ffff8000081edf70>] finish_lock_switch kernel/sched/core.c:4943 [inline]
        hardirqs last  enabled at (231): [<ffff8000081edf70>] finish_task_switch.isra.0+0x200/0x880 kernel/sched/core.c:5061
        hardirqs last disabled at (232): [<ffff80000c888bb4>] el1_dbg+0x24/0x80 arch/arm64/kernel/entry-common.c:404
        softirqs last  enabled at (228): [<ffff800008010938>] _stext+0x938/0xf58
        softirqs last disabled at (207): [<ffff800008019380>] ____do_softirq+0x10/0x20 arch/arm64/kernel/irq.c:79
        ---[ end trace 0000000000000000 ]---
      
      The lockdep warning in kernfs_active() is there to ensure that the activated
      state stays stable for the caller. For kernfs_find_and_get_node_by_id(), all
      that's needed is ensuring that a node which has never been activated can't
      be looked up and guaranteeing lookup success when the caller knows the node
      to be active, both of which can be achieved by testing the active count
      without holding the kernfs_rwsem.
      
      Fix the spurious warning by introducing __kernfs_active() which doesn't have
      the lockdep annotation.
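      The split between an annotated and an unannotated activity check can be
      sketched as a user-space toy model, not kernfs itself: `rwsem_held` and
      `lockdep_warnings` are hypothetical stand-ins for lockdep, and the
      convention that a node starts with a large negative bias and counts as
      active once its count is >= 0 is an assumption made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed convention: never-activated nodes carry a large negative bias. */
#define NODE_DEACTIVATED_BIAS INT_MIN

struct node {
    atomic_int active;
};

static bool rwsem_held;        /* stand-in for "kernfs_rwsem is held" */
static int lockdep_warnings;   /* counts what lockdep would report */

/* Unannotated check (the role of __kernfs_active()): enough for a lookup
 * that only has to reject nodes that were never activated. */
static bool node_active_nolock(struct node *kn)
{
    return atomic_load(&kn->active) >= 0;
}

/* Annotated check: callers that need the state to stay stable must hold
 * the rwsem, so complain when it is not held. */
static bool node_active(struct node *kn)
{
    if (!rwsem_held)
        lockdep_warnings++;
    return node_active_nolock(kn);
}
```

      The lookup path would call node_active_nolock(), while callers that rely
      on a stable activation state keep using the annotated node_active().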

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: syzbot+590ce62b128e79cf0a35@syzkaller.appspotmail.com
      Fixes: c2549174 ("kernfs: Add KERNFS_REMOVING flags")
      Cc: Amir Goldstein <amir73il@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Link: https://lore.kernel.org/r/Y0SwqBsZ9BMmZv6x@slm.duckdns.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  9. 10 Nov 2022, 1 commit
  10. 09 Nov 2022, 6 commits
    • udf: Fix a slab-out-of-bounds write bug in udf_find_entry() · c8af247d
      Committed by ZhangPeng
      Syzbot reported a slab-out-of-bounds Write bug:
      
      loop0: detected capacity change from 0 to 2048
      ==================================================================
      BUG: KASAN: slab-out-of-bounds in udf_find_entry+0x8a5/0x14f0
      fs/udf/namei.c:253
      Write of size 105 at addr ffff8880123ff896 by task syz-executor323/3610
      
      CPU: 0 PID: 3610 Comm: syz-executor323 Not tainted
      6.1.0-rc2-syzkaller-00105-gb229b6ca #0
      Hardware name: Google Compute Engine/Google Compute Engine, BIOS
      Google 10/11/2022
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106
       print_address_description+0x74/0x340 mm/kasan/report.c:284
       print_report+0x107/0x1f0 mm/kasan/report.c:395
       kasan_report+0xcd/0x100 mm/kasan/report.c:495
       kasan_check_range+0x2a7/0x2e0 mm/kasan/generic.c:189
       memcpy+0x3c/0x60 mm/kasan/shadow.c:66
       udf_find_entry+0x8a5/0x14f0 fs/udf/namei.c:253
       udf_lookup+0xef/0x340 fs/udf/namei.c:309
       lookup_open fs/namei.c:3391 [inline]
       open_last_lookups fs/namei.c:3481 [inline]
       path_openat+0x10e6/0x2df0 fs/namei.c:3710
       do_filp_open+0x264/0x4f0 fs/namei.c:3740
       do_sys_openat2+0x124/0x4e0 fs/open.c:1310
       do_sys_open fs/open.c:1326 [inline]
       __do_sys_creat fs/open.c:1402 [inline]
       __se_sys_creat fs/open.c:1396 [inline]
       __x64_sys_creat+0x11f/0x160 fs/open.c:1396
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x7ffab0d164d9
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89
      f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
      f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007ffe1a7e6bb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000055
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffab0d164d9
      RDX: 00007ffab0d164d9 RSI: 0000000000000000 RDI: 0000000020000180
      RBP: 00007ffab0cd5a10 R08: 0000000000000000 R09: 0000000000000000
      R10: 00005555573552c0 R11: 0000000000000246 R12: 00007ffab0cd5aa0
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
       </TASK>
      
      Allocated by task 3610:
       kasan_save_stack mm/kasan/common.c:45 [inline]
       kasan_set_track+0x3d/0x60 mm/kasan/common.c:52
       ____kasan_kmalloc mm/kasan/common.c:371 [inline]
       __kasan_kmalloc+0x97/0xb0 mm/kasan/common.c:380
       kmalloc include/linux/slab.h:576 [inline]
       udf_find_entry+0x7b6/0x14f0 fs/udf/namei.c:243
       udf_lookup+0xef/0x340 fs/udf/namei.c:309
       lookup_open fs/namei.c:3391 [inline]
       open_last_lookups fs/namei.c:3481 [inline]
       path_openat+0x10e6/0x2df0 fs/namei.c:3710
       do_filp_open+0x264/0x4f0 fs/namei.c:3740
       do_sys_openat2+0x124/0x4e0 fs/open.c:1310
       do_sys_open fs/open.c:1326 [inline]
       __do_sys_creat fs/open.c:1402 [inline]
       __se_sys_creat fs/open.c:1396 [inline]
       __x64_sys_creat+0x11f/0x160 fs/open.c:1396
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      The buggy address belongs to the object at ffff8880123ff800
       which belongs to the cache kmalloc-256 of size 256
      The buggy address is located 150 bytes inside of
       256-byte region [ffff8880123ff800, ffff8880123ff900)
      
      The buggy address belongs to the physical page:
      page:ffffea000048ff80 refcount:1 mapcount:0 mapping:0000000000000000
      index:0x0 pfn:0x123fe
      head:ffffea000048ff80 order:1 compound_mapcount:0 compound_pincount:0
      flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
      raw: 00fff00000010200 ffffea00004b8500 dead000000000003 ffff888012041b40
      raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 0, migratetype Unmovable, gfp_mask 0x0(),
      pid 1, tgid 1 (swapper/0), ts 1841222404, free_ts 0
       create_dummy_stack mm/page_owner.c:67 [inline]
       register_early_stack+0x77/0xd0 mm/page_owner.c:83
       init_page_owner+0x3a/0x731 mm/page_owner.c:93
       kernel_init_freeable+0x41c/0x5d5 init/main.c:1629
       kernel_init+0x19/0x2b0 init/main.c:1519
      page_owner free stack trace missing
      
      Memory state around the buggy address:
       ffff8880123ff780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff8880123ff800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      >ffff8880123ff880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 06
                                                                      ^
       ffff8880123ff900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff8880123ff980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      ==================================================================
      
      Fix this by changing the memory size allocated for copy_name from
      UDF_NAME_LEN(254) to UDF_NAME_LEN_CS0(255), because the total length
      (lfi) of subsequent memcpy can be up to 255.
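      The arithmetic behind the fix can be checked with a short user-space
      sketch. copy_split_name() is a hypothetical helper, not the udf code: it
      copies a name split across two sources and refuses to overflow, and the
      two constants take the values quoted above. The KASAN report shows a
      105-byte write starting 150 bytes into the buffer, i.e. 255 bytes in
      total, one more than UDF_NAME_LEN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define UDF_NAME_LEN     254   /* values as quoted in the commit message */
#define UDF_NAME_LEN_CS0 255

/* Hypothetical helper: copy a file identifier of lfi bytes whose first
 * len1 bytes come from part1 and the rest from part2.  Returns false
 * instead of writing past the end of buf. */
static bool copy_split_name(char *buf, size_t buf_size,
                            const char *part1, size_t len1,
                            const char *part2, size_t lfi)
{
    if (len1 > lfi || lfi > buf_size)
        return false;
    memcpy(buf, part1, len1);
    memcpy(buf + len1, part2, lfi - len1);
    return true;
}
```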
      
      CC: stable@vger.kernel.org
      Reported-by: syzbot+69c9fdccc6dd08961d34@syzkaller.appspotmail.com
      Fixes: 066b9cde ("udf: Use separate buffer for copying split names")
      Signed-off-by: ZhangPeng <zhangpeng362@huawei.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20221109013542.442790-1-zhangpeng362@huawei.com
    • fs: fix leaked psi pressure state · 82e60d00
      Committed by Johannes Weiner
      When psi annotations were added to btrfs compression reads, the psi
      state tracking over add_ra_bio_pages and btrfs_submit_compressed_read was
      faulty.  A pressure state, once entered, is never left.  This results in
      incorrectly elevated pressure, which triggers OOM kills.
      
      pflags record the *previous* memstall state when we enter a new one.  The
      code tried to initialize pflags to 1, and then optimize the leave call
      when we either didn't enter a memstall, or were already inside a nested
      stall.  However, there can be multiple PageWorkingset pages in the bio, at
      which point it's that path itself that enters repeatedly and overwrites
      pflags.  This causes us to miss the exit.
      
      Enter the stall only once if needed, then unwind correctly.
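      The failure and the fix can be modeled in user-space C. This is a toy
      model, not the btrfs or erofs code: memstall_enter() and memstall_leave()
      are hypothetical functions loosely modeled on psi_memstall_enter() and
      psi_memstall_leave() as described above (the flags argument records the
      previous state, and leave does nothing if a stall was already in
      progress), and in_memstall stands in for the per-task stall state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool in_memstall;   /* stand-in for the per-task memstall state */

static void memstall_enter(unsigned long *flags)
{
    *flags = in_memstall;   /* remember the previous state */
    if (*flags)
        return;
    in_memstall = true;
}

static void memstall_leave(unsigned long *flags)
{
    if (*flags)             /* we were already stalled: nothing to undo */
        return;
    in_memstall = false;
}

/* Buggy shape: enter once per workingset page, leave once. */
static void read_pages_buggy(const bool *workingset, size_t n)
{
    unsigned long pflags = 1;   /* "initialize pflags to 1" */

    for (size_t i = 0; i < n; i++)
        if (workingset[i])
            memstall_enter(&pflags);   /* a second call overwrites pflags with 1 */
    /* With two or more workingset pages pflags is 1 here, so the stall
     * entered by the first call is never left. */
    memstall_leave(&pflags);
}

/* Fixed shape: enter at most once, leave only if we entered. */
static void read_pages_fixed(const bool *workingset, size_t n)
{
    unsigned long pflags = 0;
    bool entered = false;

    for (size_t i = 0; i < n; i++) {
        if (workingset[i] && !entered) {
            memstall_enter(&pflags);
            entered = true;
        }
    }
    if (entered)
        memstall_leave(&pflags);   /* no-op if a stall was already active */
}
```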
      
      erofs has the same problem, fix that up too.  And move the memstall exit
      past submit_bio() to restore submit accounting originally added by
      b8e24a93 ("block: annotate refault stalls from IO submission").
      
      Link: https://lkml.kernel.org/r/Y2UHRqthNUwuIQGS@cmpxchg.org
      Fixes: 4088a47e ("btrfs: add manual PSI accounting for compressed reads")
      Fixes: 99486c51 ("erofs: add manual PSI accounting for the compressed address space")
      Fixes: 118f3663 ("block: remove PSI accounting from the bio layer")
      Link: https://lore.kernel.org/r/d20a0a85-e415-cf78-27f9-77dd7a94bc8d@leemhuis.info/
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: Thorsten Leemhuis <linux@leemhuis.info>
      Tested-by: Thorsten Leemhuis <linux@leemhuis.info>
      Cc: Chao Yu <chao@kernel.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Sterba <dsterba@suse.com>
      Cc: Gao Xiang <xiang@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • nilfs2: fix use-after-free bug of ns_writer on remount · 8cccf05f
      Committed by Ryusuke Konishi
      If a nilfs2 filesystem is downgraded to read-only due to metadata
      corruption on disk and is remounted read/write, or if emergency read-only
      remount is performed, detaching a log writer and synchronizing the
      filesystem can be done at the same time.
      
      In these cases, use-after-free of the log writer (hereinafter
      nilfs->ns_writer) can happen as shown in the scenario below:
      
       Task1                               Task2
       --------------------------------    ------------------------------
       nilfs_construct_segment
         nilfs_segctor_sync
           init_wait
           init_waitqueue_entry
           add_wait_queue
           schedule
                                           nilfs_remount (R/W remount case)
                                             nilfs_attach_log_writer
                                               nilfs_detach_log_writer
                                                 nilfs_segctor_destroy
                                                   kfree
           finish_wait
             _raw_spin_lock_irqsave
               __raw_spin_lock_irqsave
                 do_raw_spin_lock
                   debug_spin_lock_before  <-- use-after-free
      
      While Task1 is sleeping, nilfs->ns_writer is freed by Task2.  After Task1
      wakes up, it accesses nilfs->ns_writer, which has already been freed.
      This scenario diagram is based on Shigeru Yoshida's post [1].
      
      This patch fixes the issue by not detaching nilfs->ns_writer on remount so
      that this UAF race doesn't happen.  Along with this change, this patch
      also inserts a few necessary read-only checks with superblock instance
      where only the ns_writer pointer was used to check if the filesystem is
      read-only.
      
      Link: https://syzkaller.appspot.com/bug?id=79a4c002e960419ca173d55e863bd09e8112df8b
      Link: https://lkml.kernel.org/r/20221103141759.1836312-1-syoshida@redhat.com [1]
      Link: https://lkml.kernel.org/r/20221104142959.28296-1-konishi.ryusuke@gmail.com
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Reported-by: syzbot+f816fa82f8783f7a02bb@syzkaller.appspotmail.com
      Reported-by: Shigeru Yoshida <syoshida@redhat.com>
      Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • nilfs2: fix deadlock in nilfs_count_free_blocks() · 8ac932a4
      Committed by Ryusuke Konishi
      A semaphore deadlock can occur if nilfs_get_block() detects metadata
      corruption while locating data blocks and a superblock writeback occurs at
      the same time:
      
      task 1                               task 2
      ------                               ------
      * A file operation *
      nilfs_truncate()
        nilfs_get_block()
          down_read(rwsem A) <--
          nilfs_bmap_lookup_contig()
            ...                            generic_shutdown_super()
                                             nilfs_put_super()
                                               * Prepare to write superblock *
                                               down_write(rwsem B) <--
                                               nilfs_cleanup_super()
            * Detect b-tree corruption *         nilfs_set_log_cursor()
            nilfs_bmap_convert_error()             nilfs_count_free_blocks()
              __nilfs_error()                        down_read(rwsem A) <--
                nilfs_set_error()
                  down_write(rwsem B) <--
      
                                 *** DEADLOCK ***
      
      Here, nilfs_get_block() readlocks rwsem A (= NILFS_MDT(dat_inode)->mi_sem)
      and then calls nilfs_bmap_lookup_contig(), but if it fails due to metadata
      corruption, __nilfs_error() is called from nilfs_bmap_convert_error()
      inside the lock section.
      
      Since __nilfs_error() calls nilfs_set_error() unless the filesystem is
      read-only and nilfs_set_error() attempts to writelock rwsem B (=
      nilfs->ns_sem) to write back superblock exclusively, hierarchical lock
      acquisition occurs in the order rwsem A -> rwsem B.
      
      Now, if another task starts updating the superblock, it may writelock
      rwsem B during the lock sequence above, and can deadlock trying to
      readlock rwsem A in nilfs_count_free_blocks().
      
      However, there is actually no need to take rwsem A in
      nilfs_count_free_blocks() because, within the lock section, it only
      reads a single integer from a shared struct with
      nilfs_sufile_get_ncleansegs().  This has been the case since commit
      aa474a22 ("nilfs2: add local variable to cache the number of clean
      segments"), that is, even before this bug was introduced.
      
      So, this resolves the deadlock problem by just not taking the semaphore in
      nilfs_count_free_blocks().
      
      Link: https://lkml.kernel.org/r/20221029044912.9139-1-konishi.ryusuke@gmail.com
      Fixes: e828949e ("nilfs2: call nilfs_error inside bmap routines")
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Reported-by: syzbot+45d6ce7b7ad7ef455d03@syzkaller.appspotmail.com
      Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>	[2.6.38+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • hugetlbfs: don't delete error page from pagecache · 8625147c
      Committed by James Houghton
      This change is very similar to the change that was made for shmem [1], and
      it solves the same problem but for HugeTLBFS instead.
      
      Currently, when poison is found in a HugeTLB page, the page is removed
      from the page cache.  That means that attempting to map or read that
      hugepage in the future will result in a new hugepage being allocated
      instead of notifying the user that the page was poisoned.  As [1] states,
      this is effectively memory corruption.
      
      The fix is to leave the page in the page cache.  If the user attempts to
      use a poisoned HugeTLB page with a syscall, the syscall will fail with
      EIO, the same error code that shmem uses.  For attempts to map the page,
      the thread will get a BUS_MCEERR_AR SIGBUS.
      
      [1]: commit a7605426 ("mm: shmem: don't truncate page if memory failure happens")
      
      Link: https://lkml.kernel.org/r/20221018200125.848471-1-jthoughton@google.com
      Signed-off-by: James Houghton <jthoughton@google.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Tested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: James Houghton <jthoughton@google.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • nfsd: put the export reference in nfsd4_verify_deleg_dentry · 50256e47
      Committed by Jeff Layton
      nfsd_lookup_dentry returns an export reference in addition to the dentry
      ref. Ensure that we put it too.
      
      Link: https://bugzilla.redhat.com/show_bug.cgi?id=2138866
      Fixes: 876c553c ("NFSD: verify the opened dentry after setting a delegation")
      Reported-by: Yongcheng Yang <yoyang@redhat.com>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
  11. 08 Nov 2022, 5 commits
  12. 07 Nov 2022, 7 commits
    • btrfs: zoned: fix locking imbalance on scrub · c62f6bec
      Committed by Johannes Thumshirn
      If we're doing device replace on a zoned filesystem and discover in
      scrub_enumerate_chunks() that we don't have to copy the block group, it
      is unlocked before it gets skipped.
      
      But as the block group hasn't been locked yet, this leads to a locking
      imbalance. To fix this, simply remove the unlock.
      
      This was uncovered by fstests' testcase btrfs/163.
      
      Fixes: 9283b9e0 ("btrfs: remove lock protection for BLOCK_GROUP_FLAG_TO_COPY")
      Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: zoned: initialize device's zone info for seeding · a8d1b164
      Committed by Johannes Thumshirn
      When performing seeding on a zoned filesystem it is necessary to
      initialize each zoned device's btrfs_zoned_device_info structure,
      otherwise mounting the filesystem will cause a NULL pointer dereference.
      
      This was uncovered by fstests' testcase btrfs/163.
      
      CC: stable@vger.kernel.org # 5.15+
      Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: zoned: clone zoned device info when cloning a device · 21e61ec6
      Committed by Johannes Thumshirn
      When cloning a btrfs_device, we're not cloning the associated
      btrfs_zoned_device_info structure of the device in case of a zoned
      filesystem.
      
      Later on this leads to a NULL pointer dereference when accessing the
      device's zone_info for instance when setting a zone as active.
      
      This was uncovered by fstests' testcase btrfs/161.
      
      CC: stable@vger.kernel.org # 5.15+
      Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Revert "btrfs: scrub: use larger block size for data extent scrub" · b75b51f8
      Committed by Qu Wenruo
      This reverts commit 786672e9.
      
      [BUG]
      Since commit 786672e9 ("btrfs: scrub: use larger block size for data
      extent scrub"), btrfs scrub no longer reports errors if the corruption
      is not in the first sector of a STRIPE_LEN.
      
      The following script can expose the problem:
      
        mkfs.btrfs -f $dev
        mount $dev $mnt
        xfs_io -f -c "pwrite -S 0xff 0 8k" $mnt/foobar
        umount $mnt
      
        # 13631488 is the logical bytenr of above 8K extent
        btrfs-map-logical -l 13631488 -b 4096 $dev
        mirror 1 logical 13631488 physical 13631488 device /dev/test/scratch1
      
        # Corrupt the 2nd sector of that extent
        xfs_io -f -c "pwrite -S 0x00 13635584 4k" $dev
      
        mount $dev $mnt
        btrfs scrub start -B $mnt
        scrub done for 54e63f9f-0c30-4c84-a33b-5c56014629b7
        Scrub started:    Mon Nov  7 07:18:27 2022
        Status:           finished
        Duration:         0:00:00
        Total to scrub:   536.00MiB
        Rate:             0.00B/s
        Error summary:    no errors found <<<
      
      [CAUSE]
      That offending commit enlarges the data extent scrub size from sector
      size to BTRFS_STRIPE_LEN, to avoid allocating extra scrub_blocks.
      
      But unfortunately the data extent scrub still relies heavily on the
      fact that there is only one scrub_sector per scrub_block.
      
      Thus it only checks the first sector and ignores the remaining
      sectors.
      
      Furthermore the error reporting is not able to handle multiple sectors
      either.
      
      [FIX]
      For now just revert the offending commit.
      
      The only consequence is extra memory usage during scrub.
      We will need a proper change to make the remaining data scrub path
      handle multiple sectors before we enlarge the data scrub size.

      Reported-by: Li Zhang <zhanglikernel@gmail.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: don't print stack trace when transaction is aborted due to ENOMEM · 8bb808c6
      Committed by David Sterba
      Add ENOMEM among the error codes that don't print a stack trace on
      transaction abort. We've got several reports from syzbot that flag
      the stacks as errors, although they are caused by limiting memory. As
      this is an artificial condition, we don't need to know where exactly
      the error happens; the abort and error cleanup will continue as they
      do e.g. for EIO.
      
      As the transaction abort code needs to be inlined in a lot of places,
      the implementation cares about minimal bloat. The error codes are in a
      separate function and the WARN uses the condition directly. This
      increases the code size by 571 bytes on a release build.
      
      Alternatives considered: adding -ENOMEM among the errors increases
      size by 2340 bytes; various attempts to combine the WARN and helper
      calls increase it by 700 or more bytes.
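      A minimal sketch of such a helper, as an illustration only: the name
      abort_should_print_stack() is hypothetical, and the list below contains
      only the two codes named in this message (the real list may hold more).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: returns false for error codes whose origin is not
 * worth a stack trace when a transaction is aborted. */
static bool abort_should_print_stack(int error)
{
    switch (error) {
    case -EIO:      /* handled the same way, per the message above */
    case -ENOMEM:   /* e.g. syzbot's artificial memory limits */
        return false;
    default:
        return true;
    }
}
```

      Keeping the list in a separate function means each inlined abort site
      only adds a call and a branch in front of its WARN.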
      
      Example syzbot reports (error -12):
      
      - https://syzkaller.appspot.com/bug?extid=5244d35be7f589cf093e
      - https://syzkaller.appspot.com/bug?extid=9c37714c07194d816417

      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: selftests: fix wrong error check in btrfs_free_dummy_root() · 9b2f2034
      Committed by Zhang Xiaoxu
      btrfs_alloc_dummy_root() returns an ERR_PTR value rather than NULL on
      error, but btrfs_free_dummy_root() only checks for NULL, so if an error
      happens the bogus pointer is dereferenced:
      
        BUG: KASAN: null-ptr-deref in btrfs_free_dummy_root+0x21/0x50 [btrfs]
        Read of size 8 at addr 000000000000002c by task insmod/258926
      
        CPU: 2 PID: 258926 Comm: insmod Tainted: G        W          6.1.0-rc2+ #5
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
        Call Trace:
         <TASK>
         dump_stack_lvl+0x34/0x44
         kasan_report+0xb7/0x140
         kasan_check_range+0x145/0x1a0
         btrfs_free_dummy_root+0x21/0x50 [btrfs]
         btrfs_test_free_space_cache+0x1a8c/0x1add [btrfs]
         btrfs_run_sanity_tests+0x65/0x80 [btrfs]
         init_btrfs_fs+0xec/0x154 [btrfs]
         do_one_initcall+0x87/0x2a0
         do_init_module+0xdf/0x320
         load_module+0x3006/0x3390
         __do_sys_finit_module+0x113/0x1b0
         do_syscall_64+0x35/0x80
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
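      The error-pointer convention can be reproduced in user space. The
      helpers below are a simplified re-implementation modeled on
      include/linux/err.h (which reserves the top MAX_ERRNO = 4095 addresses
      for error codes); struct dummy_root and the two may_free_*() checks are
      hypothetical. A plain NULL check lets an ERR_PTR value through, whereas
      IS_ERR_OR_NULL() rejects both.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }

/* Error pointers occupy the top MAX_ERRNO addresses. */
static inline bool IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline bool IS_ERR_OR_NULL(const void *ptr)
{
    return ptr == NULL || IS_ERR(ptr);
}

struct dummy_root { int refs; };

/* Wrong check: an ERR_PTR value passes and would then be dereferenced. */
static bool may_free_buggy(const struct dummy_root *root)
{
    return root != NULL;
}

/* Correct check for an allocator that reports failure with ERR_PTR. */
static bool may_free_fixed(const struct dummy_root *root)
{
    return !IS_ERR_OR_NULL(root);
}
```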
      
      Fixes: aaedb55b ("Btrfs: add tests for btrfs_get_extent")
      CC: stable@vger.kernel.org # 4.9+
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: fix match incorrectly in dev_args_match_device · 0fca385d
      Committed by Liu Shixin
      syzkaller found a failed assertion:
      
        assertion failed: (args->devid != (u64)-1) || args->missing, in fs/btrfs/volumes.c:6921
      
      This can be triggered when we set devid to (u64)-1 by ioctl. In this
      case, the match of devid will be skipped and the match of device may
      succeed incorrectly.
      
      Commit 562d7b15 introduced this function, which is used to match a
      device. It handles two matching scenarios, and we can distinguish them
      by checking the value of args->missing rather than checking whether
      args->devid and args->uuid have their default values.
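      A minimal sketch of dispatching on args->missing. The structures are
      simplified and hypothetical (they are not btrfs_dev_lookup_args or
      btrfs_device), and the criterion used for a missing device (present in
      the filesystem metadata but without an open block device) is an
      assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define UUID_SIZE 16

/* Simplified, hypothetical lookup arguments and device. */
struct lookup_args {
    uint64_t devid;
    const uint8_t *uuid;   /* NULL means "any uuid" */
    bool missing;          /* look for a missing device instead */
};

struct sim_device {
    uint64_t devid;
    uint8_t uuid[UUID_SIZE];
    bool in_fs_metadata;
    bool has_bdev;         /* false when the block device is missing */
};

/* Choose the scenario from args->missing instead of inferring it from
 * default devid/uuid values, so devid == (u64)-1 matches nothing by
 * accident. */
static bool dev_args_match(const struct lookup_args *args,
                           const struct sim_device *dev)
{
    if (args->missing)
        return dev->in_fs_metadata && !dev->has_bdev;
    if (dev->devid != args->devid)
        return false;
    if (args->uuid && memcmp(dev->uuid, args->uuid, UUID_SIZE) != 0)
        return false;
    return true;
}
```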
      
      Reported-by: syzbot+031687116258450f9853@syzkaller.appspotmail.com
      Fixes: 562d7b15 ("btrfs: handle device lookup with btrfs_dev_lookup_args")
      CC: stable@vger.kernel.org # 5.16+
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Liu Shixin <liushixin2@huawei.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  13. 06 Nov 2022, 4 commits
    • ext4: fix fortify warning in fs/ext4/fast_commit.c:1551 · 0d043351
      Committed by Theodore Ts'o
      With the new fortify string system, rework the memcpy to avoid this
      warning:
      
      memcpy: detected field-spanning write (size 60) of single field "&raw_inode->i_generation" at fs/ext4/fast_commit.c:1551 (size 4)
      
      Cc: stable@kernel.org
      Fixes: 54d9469b ("fortify: Add run-time WARN for cross-field memcpy()")
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • J
      ext4: fix wrong return err in ext4_load_and_init_journal() · 9f2a1d9f
      Jason Yan committed
      The return value is wrong in ext4_load_and_init_journal(). The local
      variable 'err' needs to be set before the goto out. The original code
      in __ext4_fill_super() was fine because it had two return values,
      'ret' and 'err', and 'ret' was initialized to -EINVAL. Factoring out
      ext4_load_and_init_journal() broke this, so fix it by returning
      -EINVAL directly in the error-handling path.
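
The error-path pattern can be sketched as below. The helpers load_journal() and journal_features_ok() and their parameters are hypothetical; they only model a check that fails without producing an error code of its own, which is the case where returning 'err' would hand back 0 to the caller.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the journal helpers. */
static int load_journal(int fail) { return fail ? -EIO : 0; }
static int journal_features_ok(int ok) { return ok; }

static int load_and_init_journal(int fail_load, int features_ok)
{
	int err;

	err = load_journal(fail_load);
	if (err)
		return err;

	/* This check fails without setting err, so err is still 0 here. */
	if (!journal_features_ok(features_ok))
		goto out;

	return 0;

out:
	/* Returning err here would report success; return an explicit error. */
	return -EINVAL;
}
```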
      
      Cc: stable@kernel.org
      Fixes: 9c1dd22d ("ext4: factor out ext4_load_and_init_journal()")
      Signed-off-by: Jason Yan <yanaijie@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20221025040206.3134773-1-yanaijie@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • Y
      ext4: fix warning in 'ext4_da_release_space' · 1b8f787e
      Ye Bin committed
      Syzkaller reported the following issue:
      EXT4-fs (loop0): Free/Dirty block details
      EXT4-fs (loop0): free_blocks=0
      EXT4-fs (loop0): dirty_blocks=0
      EXT4-fs (loop0): Block reservation details
      EXT4-fs (loop0): i_reserved_data_blocks=0
      EXT4-fs warning (device loop0): ext4_da_release_space:1527: ext4_da_release_space: ino 18, to_free 1 with only 0 reserved data blocks
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 92 at fs/ext4/inode.c:1528 ext4_da_release_space+0x25e/0x370 fs/ext4/inode.c:1524
      Modules linked in:
      CPU: 0 PID: 92 Comm: kworker/u4:4 Not tainted 6.0.0-syzkaller-09423-g493ffd66 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
      Workqueue: writeback wb_workfn (flush-7:0)
      RIP: 0010:ext4_da_release_space+0x25e/0x370 fs/ext4/inode.c:1528
      RSP: 0018:ffffc900015f6c90 EFLAGS: 00010296
      RAX: 42215896cd52ea00 RBX: 0000000000000000 RCX: 42215896cd52ea00
      RDX: 0000000000000000 RSI: 0000000080000001 RDI: 0000000000000000
      RBP: 1ffff1100e907d96 R08: ffffffff816aa79d R09: fffff520002bece5
      R10: fffff520002bece5 R11: 1ffff920002bece4 R12: ffff888021fd2000
      R13: ffff88807483ecb0 R14: 0000000000000001 R15: ffff88807483e740
      FS:  0000000000000000(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00005555569ba628 CR3: 000000000c88e000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       ext4_es_remove_extent+0x1ab/0x260 fs/ext4/extents_status.c:1461
       mpage_release_unused_pages+0x24d/0xef0 fs/ext4/inode.c:1589
       ext4_writepages+0x12eb/0x3be0 fs/ext4/inode.c:2852
       do_writepages+0x3c3/0x680 mm/page-writeback.c:2469
       __writeback_single_inode+0xd1/0x670 fs/fs-writeback.c:1587
       writeback_sb_inodes+0xb3b/0x18f0 fs/fs-writeback.c:1870
       wb_writeback+0x41f/0x7b0 fs/fs-writeback.c:2044
       wb_do_writeback fs/fs-writeback.c:2187 [inline]
       wb_workfn+0x3cb/0xef0 fs/fs-writeback.c:2227
       process_one_work+0x877/0xdb0 kernel/workqueue.c:2289
       worker_thread+0xb14/0x1330 kernel/workqueue.c:2436
       kthread+0x266/0x300 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
       </TASK>
      
      The above issue may happen as follows:
      ext4_da_write_begin
        ext4_create_inline_data
          ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS);
          ext4_set_inode_flag(inode, EXT4_INODE_INLINE_DATA);
      __ext4_ioctl
        ext4_ext_migrate -> leaves eh->eh_entries non-zero and sets the extents flag
      ext4_da_write_begin
        ext4_da_convert_inline_data_to_extent
          ext4_da_write_inline_data_begin
            ext4_da_map_blocks
              ext4_insert_delayed_block
                if (!ext4_es_scan_clu(inode, &ext4_es_is_delonly, lblk))
                  if (!ext4_es_scan_clu(inode, &ext4_es_is_mapped, lblk))
                    ext4_clu_mapped(inode, EXT4_B2C(sbi, lblk)); -> returns 1
                    allocated = true;
                ext4_es_insert_delayed_block(inode, lblk, allocated);
      ext4_writepages
        mpage_map_and_submit_extent(handle, &mpd, &give_up_on_write); -> returns -ENOSPC
        mpage_release_unused_pages(&mpd, give_up_on_write); -> give_up_on_write == 1
          ext4_es_remove_extent
            ext4_da_release_space(inode, reserved);
              if (unlikely(to_free > ei->i_reserved_data_blocks))
                -> to_free == 1 but ei->i_reserved_data_blocks == 0
                -> triggers the warning above
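
The accounting mismatch at the end of this chain can be modelled in a few lines: the delayed block was inserted with allocated == true, so nothing was reserved for it, yet writeback failure still tries to release one block. The struct and function names below are hypothetical; the clamp mirrors what a defensive release path has to do so the unsigned counter cannot wrap.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model of per-inode delayed-allocation reservations. */
struct da_inode {
	unsigned int reserved_data_blocks;
};

static void da_reserve(struct da_inode *ei)
{
	ei->reserved_data_blocks++;
}

/* Returns the number of blocks actually released. */
static unsigned int da_release(struct da_inode *ei, unsigned int to_free)
{
	if (to_free > ei->reserved_data_blocks) {
		/* Same condition as the warning in the trace above. */
		fprintf(stderr, "to_free %u with only %u reserved data blocks\n",
			to_free, ei->reserved_data_blocks);
		to_free = ei->reserved_data_blocks;	/* avoid unsigned wrap */
	}
	ei->reserved_data_blocks -= to_free;
	return to_free;
}
```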
      
      To solve the above issue, forbid migrating an inode that has inline data.
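
A minimal sketch of the guard, assuming a simplified inode with hypothetical flag bits (not the real EXT4_INODE_* values) and assuming -EINVAL as the error code; the upstream check in the migrate path may be written differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag bits for the model inode. */
enum { INODE_EXTENTS = 1u << 0, INODE_INLINE_DATA = 1u << 1 };

struct model_inode {
	unsigned int flags;
};

static bool has_flag(const struct model_inode *inode, unsigned int f)
{
	return (inode->flags & f) != 0;
}

/* Refuse to migrate an inode whose data lives inline in the inode body. */
static int migrate_to_extents(struct model_inode *inode)
{
	if (has_flag(inode, INODE_INLINE_DATA))
		return -EINVAL;
	inode->flags |= INODE_EXTENTS;
	return 0;
}
```

Rejecting the ioctl up front keeps the inline-data inode from ever carrying the extents flag, which is the state that later leads to the unreserved delayed block.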
      
      Cc: stable@kernel.org
      Reported-by: syzbot+c740bb18df70ad00952e@syzkaller.appspotmail.com
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20221018022701.683489-1-yebin10@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • L
      ext4: fix BUG_ON() when directory entry has invalid rec_len · 17a0bc9b
      Luís Henriques committed
      The rec_len field in the directory entry has to be a multiple of 4.  A
      corrupted filesystem image can be used to hit a BUG() in
      ext4_rec_len_to_disk(), called from make_indexed_dir().
      
       ------------[ cut here ]------------
       kernel BUG at fs/ext4/ext4.h:2413!
       ...
       RIP: 0010:make_indexed_dir+0x53f/0x5f0
       ...
       Call Trace:
        <TASK>
        ? add_dirent_to_buf+0x1b2/0x200
        ext4_add_entry+0x36e/0x480
        ext4_add_nondir+0x2b/0xc0
        ext4_create+0x163/0x200
        path_openat+0x635/0xe90
        do_filp_open+0xb4/0x160
        ? __create_object.isra.0+0x1de/0x3b0
        ? _raw_spin_unlock+0x12/0x30
        do_sys_openat2+0x91/0x150
        __x64_sys_open+0x6c/0xa0
        do_syscall_64+0x3c/0x80
        entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      The fix simply adds a call to ext4_check_dir_entry() to validate the
      directory entry, returning -EFSCORRUPTED if the entry is invalid.
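
The kind of validation involved can be sketched as a standalone function. This is a simplified model of checks a validator like ext4_check_dir_entry() performs on rec_len; the helper names are hypothetical, the 8-byte header size reflects the classic ext4 directory entry layout (inode, rec_len, name_len, file_type), and checks such as the inode-number range are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Directory entry header: inode (4) + rec_len (2) + name_len (1) + file_type (1). */
#define DIR_ENTRY_HEADER 8u

/* Space needed for an entry with a name of name_len bytes, rounded up to 4. */
static unsigned int dir_rec_len(unsigned int name_len)
{
	return (DIR_ENTRY_HEADER + name_len + 3u) & ~3u;
}

/* 'offset' is where the entry starts inside a block of 'size' bytes. */
static bool dir_entry_valid(unsigned int rec_len, unsigned int name_len,
			    unsigned int offset, unsigned int size)
{
	if (rec_len < dir_rec_len(1))
		return false;		/* too small for any entry */
	if (rec_len % 4 != 0)
		return false;		/* must be a multiple of 4 */
	if (rec_len < dir_rec_len(name_len))
		return false;		/* too small for its own name */
	if (offset + rec_len > size)
		return false;		/* runs past the end of the block */
	return true;
}
```

A corrupted rec_len such as 14 fails the multiple-of-4 check here, so the caller can return -EFSCORRUPTED instead of reaching the BUG() in ext4_rec_len_to_disk().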
      
      CC: stable@kernel.org
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=216540
      Signed-off-by: Luís Henriques <lhenriques@suse.de>
      Link: https://lore.kernel.org/r/20221012131330.32456-1-lhenriques@suse.de
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>