1. 29 Mar 2023 · 10 commits
    • xfs, iomap: limit individual ioend chain lengths in writeback · c5883137
      Committed by Dave Chinner
      mainline inclusion
      from mainline-v5.17-rc3
      commit ebb7fb15
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ebb7fb1557b1d03b906b668aa2164b51e6b7d19a
      
      --------------------------------
      
      Trond Myklebust reported soft lockups in XFS IO completion such as
      this:
      
       watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [kworker/12:1:3106]
       CPU: 12 PID: 3106 Comm: kworker/12:1 Not tainted 4.18.0-305.10.2.el8_4.x86_64 #1
       Workqueue: xfs-conv/md127 xfs_end_io [xfs]
       RIP: 0010:_raw_spin_unlock_irqrestore+0x11/0x20
       Call Trace:
        wake_up_page_bit+0x8a/0x110
        iomap_finish_ioend+0xd7/0x1c0
        iomap_finish_ioends+0x7f/0xb0
        xfs_end_ioend+0x6b/0x100 [xfs]
        xfs_end_io+0xb9/0xe0 [xfs]
        process_one_work+0x1a7/0x360
        worker_thread+0x1fa/0x390
        kthread+0x116/0x130
        ret_from_fork+0x35/0x40
      
      Ioends are processed as an atomic completion unit when all the
      chained bios in the ioend have completed their IO. Logically
      contiguous ioends can also be merged and completed as a single,
      larger unit.  Both of these things can be problematic because
      neither the bio chain length per ioend nor the size of the merged
      ioends processed as a single completion is bounded.
      
      If we have a large sequential dirty region in the page cache,
      write_cache_pages() will keep feeding us sequential pages and we
      will keep mapping them into ioends and bios until we get a dirty
      page at a non-sequential file offset. These large sequential runs
      can result in bio and ioend chaining to optimise the IO
      patterns. The pages under writeback are pinned within these chains
      until the submission chaining is broken, allowing the entire chain
      to be completed. This can result in huge chains being processed
      in IO completion context.
      
      We get deep bio chaining if we have large contiguous physical
      extents. We will keep adding pages to the current bio until it is
      full, then we'll chain a new bio to keep adding pages for writeback.
      Hence we can build bio chains that map millions of pages and tens of
      gigabytes of RAM if the page cache contains big enough contiguous
      dirty file regions. This long bio chain pins those pages until the
      final bio in the chain completes and the ioend can iterate all the
      chained bios and complete them.
      
      OTOH, if we have a physically fragmented file, we end up submitting
      one ioend per physical fragment that each have a small bio or bio
      chain attached to them. We do not chain these at IO submission time,
      but instead we chain them at completion time based on file
      offset via iomap_ioend_try_merge(). Hence we can end up with unbounded
      ioend chains being built via completion merging.
      
      XFS can then do COW remapping or unwritten extent conversion on that
      merged chain, which involves walking an extent fragment at a time
      and running a transaction to modify the physical extent information.
      IOWs, we merge all the discontiguous ioends together into a
      contiguous file range, only to then process them individually as
      discontiguous extents.
      
      This extent manipulation is computationally expensive and can run in
      a tight loop, so merging logically contiguous but physically
      discontiguous ioends gains us nothing except hiding the fact that
      we broke the ioends up into individual physical extents at
      submission and then need to loop over those individual physical
      extents at completion.
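The difference between logical and physical contiguity can be shown with a minimal user-space sketch. The struct sketch_ioend type, its fields and the helper names are hypothetical simplifications. They are not the real struct iomap_ioend or iomap_ioend_try_merge(). Sizes are in bytes and sectors are 512 bytes. If ioends merge only when both the file range and the disk sectors line up, a fragmented file yields one completion unit per physical extent instead of one large merged chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ioend: a file range plus its starting disk sector. */
struct sketch_ioend {
	uint64_t offset;	/* file offset in bytes */
	uint64_t size;		/* length in bytes */
	uint64_t sector;	/* first 512-byte sector on disk */
};

/* The next ioend starts where this one ends in the file. */
static bool logically_contiguous(const struct sketch_ioend *a,
				 const struct sketch_ioend *b)
{
	return a->offset + a->size == b->offset;
}

/* The next ioend also starts where this one ends on disk. */
static bool physically_contiguous(const struct sketch_ioend *a,
				  const struct sketch_ioend *b)
{
	return a->sector + (a->size >> 9) == b->sector;
}

/* Merge only when both hold, so each merged unit maps one physical extent. */
static bool can_merge(const struct sketch_ioend *a, const struct sketch_ioend *b)
{
	return logically_contiguous(a, b) && physically_contiguous(a, b);
}

/* Count completion units after merging an array of ioends sorted by offset. */
static size_t count_completion_units(const struct sketch_ioend *v, size_t n)
{
	size_t units = 0;

	assert(v != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* A new unit starts whenever the previous ioend cannot absorb this one. */
		if (i == 0 || !can_merge(&v[i - 1], &v[i]))
			units++;
	}
	return units;
}
```

Consider three 4 KiB ioends that are adjacent in the file but scattered on disk. Merging on file offset alone would produce one unit. This rule produces three, which matches the per-extent work that completion has to do anyway.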
      
      Hence we need to have mechanisms to limit ioend sizes and
      to break up completion processing of large merged ioend chains:
      
      1. bio chains per ioend need to be bounded in length. Pure overwrites
      go straight to iomap_finish_ioend() in softirq context with the
      exact bio chain attached to the ioend by submission. Hence the only
      way to prevent long holdoffs here is to bound ioend submission
      sizes because we can't reschedule in softirq context.
      
      2. iomap_finish_ioends() has to handle unbounded merged ioend chains
      correctly. This relies on any one call to iomap_finish_ioend() being
      bounded in runtime so that cond_resched() can be issued regularly as
      the long ioend chain is processed. i.e. this relies on mechanism #1
      to limit individual ioend sizes to work correctly.
      
      3. filesystems have to loop over the merged ioends to process
      physical extent manipulations. This means they can loop internally,
      and so we break merging at physical extent boundaries so the
      filesystem can easily insert reschedule points between individual
      extent manipulations.
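Mechanisms #1 and #2 can be modelled in user space as follows. This is a sketch under stated assumptions. The page cap SKETCH_MAX_PAGES_PER_IOEND (4096 here), the function names and the page-count model are all hypothetical. cond_resched() appears only in comments, because a real reschedule point needs kernel context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cap on pages per ioend; 4096 is an assumed value for illustration. */
#define SKETCH_MAX_PAGES_PER_IOEND 4096u

/*
 * Mechanism #1: split a run of npages contiguous dirty pages into ioends of
 * at most max_pages pages. Each ioend's page count goes into sizes[]
 * (capacity cap; splitting stops early if it fills up). Returns the number
 * of ioends built.
 */
static size_t split_into_ioends(size_t npages, size_t max_pages,
				size_t *sizes, size_t cap)
{
	size_t n = 0;

	assert(max_pages > 0);
	while (npages > 0 && n < cap) {
		/* Close the current ioend once it reaches the cap. */
		size_t chunk = npages < max_pages ? npages : max_pages;

		sizes[n++] = chunk;
		npages -= chunk;
	}
	return n;
}

/*
 * Mechanism #2: when a merged chain of these ioends is completed with a
 * reschedule point (cond_resched() in the kernel) between ioends, the
 * longest stretch of work without yielding is the largest single ioend.
 */
static size_t longest_run_without_resched(const size_t *sizes, size_t n)
{
	size_t longest = 0;

	for (size_t i = 0; i < n; i++) {
		/* ...complete the pages of ioend i, then yield... */
		if (sizes[i] > longest)
			longest = sizes[i];
	}
	return longest;
}
```

With the cap, a 10000-page sequential run becomes ioends of 4096, 4096 and 1808 pages. No stretch of completion work exceeds 4096 pages, no matter how long the merged chain grows.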
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reported-and-tested-by: Trond Myklebust <trondmy@hammerspace.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Conflicts:
      	include/linux/iomap.h
      	fs/iomap/buffered-io.c
      	fs/xfs/xfs_aops.c
      
      	[ 6e552494 ("iomap: remove unused private field from ioend")
      	  is not applied.
      	  95c4cd05 ("iomap: Convert to_iomap_page to take a folio") is
      	  not applied.
      	  8ffd74e9 ("iomap: Convert bio completions to use folios") is
      	  not applied.
      	  044c6449 ("xfs: drop unused ioend private merge and
      	  setfilesize code") is not applied. ]
      Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
    • fs/ntfs3: Validate resident attribute name · af9e000b
      Committed by Edward Lo
      mainline inclusion
      from mainline-v6.2-rc1
      commit 54e45702
      category: bugfix
      bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6OD4U
      CVE: CVE-2022-48423
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=54e45702b648b7c0000e90b3e9b890e367e16ea8
      
      --------------------------------
      
      Though we already have some sanity checks while enumerating attributes,
      resident attribute names aren't covered by them. This patch checks that
      resident attribute names lie within valid ranges.
      
      [  259.209031] BUG: KASAN: slab-out-of-bounds in ni_create_attr_list+0x1e1/0x850
      [  259.210770] Write of size 426 at addr ffff88800632f2b2 by task exp/255
      [  259.211551]
      [  259.212035] CPU: 0 PID: 255 Comm: exp Not tainted 6.0.0-rc6 #37
      [  259.212955] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      [  259.214387] Call Trace:
      [  259.214640]  <TASK>
      [  259.214895]  dump_stack_lvl+0x49/0x63
      [  259.215284]  print_report.cold+0xf5/0x689
      [  259.215565]  ? kasan_poison+0x3c/0x50
      [  259.215778]  ? kasan_unpoison+0x28/0x60
      [  259.215991]  ? ni_create_attr_list+0x1e1/0x850
      [  259.216270]  kasan_report+0xa7/0x130
      [  259.216481]  ? ni_create_attr_list+0x1e1/0x850
      [  259.216719]  kasan_check_range+0x15a/0x1d0
      [  259.216939]  memcpy+0x3c/0x70
      [  259.217136]  ni_create_attr_list+0x1e1/0x850
      [  259.217945]  ? __rcu_read_unlock+0x5b/0x280
      [  259.218384]  ? ni_remove_attr+0x2e0/0x2e0
      [  259.218712]  ? kernel_text_address+0xcf/0xe0
      [  259.219064]  ? __kernel_text_address+0x12/0x40
      [  259.219434]  ? arch_stack_walk+0x9e/0xf0
      [  259.219668]  ? __this_cpu_preempt_check+0x13/0x20
      [  259.219904]  ? sysvec_apic_timer_interrupt+0x57/0xc0
      [  259.220140]  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
      [  259.220561]  ni_ins_attr_ext+0x52c/0x5c0
      [  259.220984]  ? ni_create_attr_list+0x850/0x850
      [  259.221532]  ? run_deallocate+0x120/0x120
      [  259.221972]  ? vfs_setxattr+0x128/0x300
      [  259.222688]  ? setxattr+0x126/0x140
      [  259.222921]  ? path_setxattr+0x164/0x180
      [  259.223431]  ? __x64_sys_setxattr+0x6d/0x80
      [  259.223828]  ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
      [  259.224417]  ? mi_find_attr+0x3c/0xf0
      [  259.224772]  ni_insert_attr+0x1ba/0x420
      [  259.225216]  ? ni_ins_attr_ext+0x5c0/0x5c0
      [  259.225504]  ? ntfs_read_ea+0x119/0x450
      [  259.225775]  ni_insert_resident+0xc0/0x1c0
      [  259.226316]  ? ni_insert_nonresident+0x400/0x400
      [  259.227001]  ? __kasan_kmalloc+0x88/0xb0
      [  259.227468]  ? __kmalloc+0x192/0x320
      [  259.227773]  ntfs_set_ea+0x6bf/0xb30
      [  259.228216]  ? ftrace_graph_ret_addr+0x2a/0xb0
      [  259.228494]  ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
      [  259.228838]  ? ntfs_read_ea+0x450/0x450
      [  259.229098]  ? is_bpf_text_address+0x24/0x40
      [  259.229418]  ? kernel_text_address+0xcf/0xe0
      [  259.229681]  ? __kernel_text_address+0x12/0x40
      [  259.229948]  ? unwind_get_return_address+0x3a/0x60
      [  259.230271]  ? write_profile+0x270/0x270
      [  259.230537]  ? arch_stack_walk+0x9e/0xf0
      [  259.230836]  ntfs_setxattr+0x114/0x5c0
      [  259.231099]  ? ntfs_set_acl_ex+0x2e0/0x2e0
      [  259.231529]  ? evm_protected_xattr_common+0x6d/0x100
      [  259.231817]  ? posix_xattr_acl+0x13/0x80
      [  259.232073]  ? evm_protect_xattr+0x1f7/0x440
      [  259.232351]  __vfs_setxattr+0xda/0x120
      [  259.232635]  ? xattr_resolve_name+0x180/0x180
      [  259.232912]  __vfs_setxattr_noperm+0x93/0x300
      [  259.233219]  __vfs_setxattr_locked+0x141/0x160
      [  259.233492]  ? kasan_poison+0x3c/0x50
      [  259.233744]  vfs_setxattr+0x128/0x300
      [  259.234002]  ? __vfs_setxattr_locked+0x160/0x160
      [  259.234837]  do_setxattr+0xb8/0x170
      [  259.235567]  ? vmemdup_user+0x53/0x90
      [  259.236212]  setxattr+0x126/0x140
      [  259.236491]  ? do_setxattr+0x170/0x170
      [  259.236791]  ? debug_smp_processor_id+0x17/0x20
      [  259.237232]  ? kasan_quarantine_put+0x57/0x180
      [  259.237605]  ? putname+0x80/0xa0
      [  259.237870]  ? __kasan_slab_free+0x11c/0x1b0
      [  259.238234]  ? putname+0x80/0xa0
      [  259.238500]  ? preempt_count_sub+0x18/0xc0
      [  259.238775]  ? __mnt_want_write+0xaa/0x100
      [  259.238990]  ? mnt_want_write+0x8b/0x150
      [  259.239290]  path_setxattr+0x164/0x180
      [  259.239605]  ? setxattr+0x140/0x140
      [  259.239849]  ? debug_smp_processor_id+0x17/0x20
      [  259.240174]  ? fpregs_assert_state_consistent+0x67/0x80
      [  259.240411]  __x64_sys_setxattr+0x6d/0x80
      [  259.240715]  do_syscall_64+0x3b/0x90
      [  259.240934]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
      [  259.241697] RIP: 0033:0x7fc6b26e4469
      [  259.242647] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 088
      [  259.244512] RSP: 002b:00007ffc3c7841f8 EFLAGS: 00000217 ORIG_RAX: 00000000000000bc
      [  259.245086] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc6b26e4469
      [  259.246025] RDX: 00007ffc3c784380 RSI: 00007ffc3c7842e0 RDI: 00007ffc3c784238
      [  259.246961] RBP: 00007ffc3c788410 R08: 0000000000000001 R09: 00007ffc3c7884f8
      [  259.247775] R10: 000000000000007f R11: 0000000000000217 R12: 00000000004004e0
      [  259.248534] R13: 00007ffc3c7884f0 R14: 0000000000000000 R15: 0000000000000000
      [  259.249368]  </TASK>
      [  259.249644]
      [  259.249888] Allocated by task 255:
      [  259.250283]  kasan_save_stack+0x26/0x50
      [  259.250957]  __kasan_kmalloc+0x88/0xb0
      [  259.251826]  __kmalloc+0x192/0x320
      [  259.252745]  ni_create_attr_list+0x11e/0x850
      [  259.253298]  ni_ins_attr_ext+0x52c/0x5c0
      [  259.253685]  ni_insert_attr+0x1ba/0x420
      [  259.253974]  ni_insert_resident+0xc0/0x1c0
      [  259.254311]  ntfs_set_ea+0x6bf/0xb30
      [  259.254629]  ntfs_setxattr+0x114/0x5c0
      [  259.254859]  __vfs_setxattr+0xda/0x120
      [  259.255155]  __vfs_setxattr_noperm+0x93/0x300
      [  259.255445]  __vfs_setxattr_locked+0x141/0x160
      [  259.255862]  vfs_setxattr+0x128/0x300
      [  259.256251]  do_setxattr+0xb8/0x170
      [  259.256522]  setxattr+0x126/0x140
      [  259.256911]  path_setxattr+0x164/0x180
      [  259.257308]  __x64_sys_setxattr+0x6d/0x80
      [  259.257637]  do_syscall_64+0x3b/0x90
      [  259.257970]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
      [  259.258550]
      [  259.258772] The buggy address belongs to the object at ffff88800632f000
      [  259.258772]  which belongs to the cache kmalloc-1k of size 1024
      [  259.260190] The buggy address is located 690 bytes inside of
      [  259.260190]  1024-byte region [ffff88800632f000, ffff88800632f400)
      [  259.261412]
      [  259.261743] The buggy address belongs to the physical page:
      [  259.262354] page:0000000081e8cac9 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x632c
      [  259.263722] head:0000000081e8cac9 order:2 compound_mapcount:0 compound_pincount:0
      [  259.264284] flags: 0xfffffc0010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
      [  259.265312] raw: 000fffffc0010200 ffffea0000060d00 dead000000000004 ffff888001041dc0
      [  259.265772] raw: 0000000000000000 0000000080080008 00000001ffffffff 0000000000000000
      [  259.266305] page dumped because: kasan: bad access detected
      [  259.266588]
      [  259.266728] Memory state around the buggy address:
      [  259.267225]  ffff88800632f300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  259.267841]  ffff88800632f380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  259.269111] >ffff88800632f400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  259.269626]                    ^
      [  259.270162]  ffff88800632f480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  259.270810]  ffff88800632f500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
      Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
      Signed-off-by: ZhaoLong Wang <wangzhaolong1@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
    • coredump: Use the vma snapshot in fill_files_note · 5ba037e0
      Committed by Eric W. Biederman
      stable inclusion
      from stable-v5.10.110
      commit 558564db44755dfb3e48b0d64de327d20981e950
      category: bugfix
      bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6KT9C
      CVE: CVE-2023-1249
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=558564db44755dfb3e48b0d64de327d20981e950
      
      --------------------------------
      
      commit 390031c9 upstream.
      
      Matthew Wilcox reported that there is a missing mmap_lock in
      fill_files_note that could possibly lead to a use after free.
      
      Solve this by using the existing vma snapshot for consistency
      and to avoid the need to take the mmap_lock anywhere in the
      coredump code except for dump_vma_snapshot.
      
      Update dump_vma_snapshot to capture vm_pgoff and vm_file,
      which are needed by fill_files_note.
      
      Add free_vma_snapshot to free the captured values of vm_file.
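The snapshot idea can be sketched in user space as below. struct sketch_file, its refcount and the sketch_* helpers are hypothetical stand-ins for struct file with get_file()/fput(), and the locking appears only as comments. Everything fill_files_note reads, including vm_pgoff and vm_file, is copied while the lock would be held. Each captured file keeps its own reference until the snapshot is freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct file and its reference count. */
struct sketch_file {
	int refcount;
};

static struct sketch_file *sketch_get_file(struct sketch_file *f)
{
	if (f)
		f->refcount++;	/* get_file() in the kernel */
	return f;
}

static void sketch_put_file(struct sketch_file *f)
{
	if (f)
		f->refcount--;	/* fput() in the kernel */
}

/* Live mapping: only safe to read while the mmap lock is held. */
struct sketch_vma {
	unsigned long start, end, pgoff;
	struct sketch_file *file;
};

/* Snapshot entry: a private copy that holds its own file reference. */
struct sketch_vma_meta {
	unsigned long start, end, pgoff;
	struct sketch_file *file;
};

/* Copy what fill_files_note needs; returns NULL on allocation failure. */
static struct sketch_vma_meta *sketch_snapshot(const struct sketch_vma *vmas, size_t n)
{
	struct sketch_vma_meta *meta;

	assert(vmas != NULL || n == 0);
	meta = calloc(n ? n : 1, sizeof(*meta));
	if (!meta)
		return NULL;
	/* mmap_read_lock() would be taken here in the kernel... */
	for (size_t i = 0; i < n; i++) {
		meta[i].start = vmas[i].start;
		meta[i].end = vmas[i].end;
		meta[i].pgoff = vmas[i].pgoff;
		meta[i].file = sketch_get_file(vmas[i].file);
	}
	/* ...and released here; later readers touch only meta[]. */
	return meta;
}

/* Counterpart of free_vma_snapshot: drop the file references, then free. */
static void sketch_free_snapshot(struct sketch_vma_meta *meta, size_t n)
{
	for (size_t i = 0; i < n; i++)
		sketch_put_file(meta[i].file);
	free(meta);
}
```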
      Reported-by: Matthew Wilcox <willy@infradead.org>
      Link: https://lkml.kernel.org/r/20220131153740.2396974-1-willy@infradead.org
      Cc: stable@vger.kernel.org
      Fixes: a07279c9 ("binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot")
      Fixes: 2aa362c4 ("coredump: extend core dump note section to contain file names of mapped files")
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Li Huafei <lihuafei1@huawei.com>
      Reviewed-by: Xu Kuohai <xukuohai@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
    • coredump/elf: Pass coredump_params into fill_note_info · a682fb18
      Committed by Eric W. Biederman
      stable inclusion
      from stable-v5.10.110
      commit b7933f145ad32bb5e084af55176ab6dcaa15a036
      category: bugfix
      bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6KT9C
      CVE: CVE-2023-1249
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b7933f145ad32bb5e084af55176ab6dcaa15a036
      
      --------------------------------
      
      commit 9ec7d323 upstream.
      
      Instead of individually passing cprm->siginfo and cprm->regs
      into fill_note_info, pass all of struct coredump_params.
      
      This is preparation to allow fill_files_note to use the existing
      vma snapshot.
      Reviewed-by: Jann Horn <jannh@google.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Li Huafei <lihuafei1@huawei.com>
      Reviewed-by: Xu Kuohai <xukuohai@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
    • coredump: Remove the WARN_ON in dump_vma_snapshot · a6541c35
      Committed by Eric W. Biederman
      stable inclusion
      from stable-v5.10.110
      commit b043ae637a83585b2a497c2eb7ee49446fc68e98
      category: bugfix
      bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6KT9C
      CVE: CVE-2023-1249
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b043ae637a83585b2a497c2eb7ee49446fc68e98
      
      --------------------------------
      
      commit 49c18663 upstream.
      
      The condition is impossible and to the best of my knowledge has never
      triggered.
      
      We are in deep trouble if that condition happens and we walk past
      the end of our allocated array.
      
      So delete the WARN_ON and the code that makes it look like the kernel
      can handle the case of walking past the end of its vma_meta array.
      Reviewed-by: Jann Horn <jannh@google.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Li Huafei <lihuafei1@huawei.com>
      Reviewed-by: Xu Kuohai <xukuohai@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
    • coredump: Snapshot the vmas in do_coredump · 0e07125c
      Committed by Eric W. Biederman
      stable inclusion
      from stable-v5.10.110
      commit 936c8be4d1447f36ac4d2a464bd03a5cd659c42f
      category: bugfix
      bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6KT9C
      CVE: CVE-2023-1249
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=936c8be4d1447f36ac4d2a464bd03a5cd659c42f
      
      --------------------------------
      
      commit 95c5436a upstream.
      
      Move the call of dump_vma_snapshot and kvfree(vma_meta) out of the
      individual coredump routines into do_coredump itself.  This makes
      the code less error prone and easier to maintain.
      
      Make the vma snapshot available to the coredump routines
      in struct coredump_params.  This makes it easier to
      change and update what is captured in the vma snapshot
      and will be needed for fixing fill_files_note.
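The ownership pattern described above can be sketched with hypothetical names: struct sketch_coredump_params, sketch_do_coredump() and sketch_elf_core_dump(). An array of region sizes stands in for vma_meta. The driver takes the snapshot once, the per-format writer only reads it through the params struct, and a single call site frees it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down params struct that carries the snapshot. */
struct sketch_coredump_params {
	int vma_count;
	unsigned long *vma_sizes;	/* stand-in for the vma_meta array */
};

/* A per-format writer (ELF, FDPIC, ...) reads the snapshot but never frees it. */
typedef int (*sketch_core_dump_fn)(const struct sketch_coredump_params *cprm);

static int sketch_elf_core_dump(const struct sketch_coredump_params *cprm)
{
	unsigned long total = 0;

	for (int i = 0; i < cprm->vma_count; i++)
		total += cprm->vma_sizes[i];
	return total > 0;	/* 1 if there was anything to dump */
}

/* do_coredump-style driver: snapshot once, call the writer, free once. */
static int sketch_do_coredump(sketch_core_dump_fn core_dump,
			      const unsigned long *sizes, int n)
{
	struct sketch_coredump_params cprm = { .vma_count = n };
	int ret;

	assert(n >= 0);
	cprm.vma_sizes = malloc((size_t)(n ? n : 1) * sizeof(*cprm.vma_sizes));
	if (!cprm.vma_sizes)
		return -1;
	for (int i = 0; i < n; i++)
		cprm.vma_sizes[i] = sizes[i];	/* the "dump_vma_snapshot" step */

	ret = core_dump(&cprm);

	free(cprm.vma_sizes);	/* the only place the snapshot is released */
	return ret;
}
```

Because the writers never allocate or free the snapshot, adding a field such as vm_pgoff or vm_file later means changing only the snapshot step and its matching free.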
      Reviewed-by: Jann Horn <jannh@google.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Li Huafei <lihuafei1@huawei.com>
      Reviewed-by: Xu Kuohai <xukuohai@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
    • fs/ntfs3: Validate MFT flags before replaying logs · 447350f6
      Committed by Edward Lo
      maillist inclusion
      category: bugfix
      bugzilla: 188564, https://gitee.com/src-openeuler/kernel/issues/I6OD5R
      CVE: CVE-2022-48425
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/fs/ntfs3?id=467333af2f7b95eeaa61a5b5369a80063cd971fd
      
      ----------------------------------------
      
      Log load and replay is part of the metadata handling flow during the
      mount operation. The $MFT record is loaded and used while replaying logs.
      However, a malformed $MFT record that, say, has the RECORD_FLAG_DIR flag
      set and contains an ATTR_ROOT attribute will mislead the kernel into
      treating it as a directory. The kernel then tries to free the allocated
      resources when the corresponding inode is freed, which causes an invalid
      kfree because that memory was never actually allocated.
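The validation idea can be sketched as a flags check on the $MFT record before it is trusted during replay. The constant names and mft_record_flags_ok() are hypothetical. The flag values 0x0001 (in use) and 0x0002 (directory) are the standard on-disk NTFS MFT record header flags. The sketch assumes that $MFT (record 0) must be in use and can never be a directory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* On-disk NTFS MFT record header flags. */
#define SKETCH_RECORD_FLAG_IN_USE	0x0001u
#define SKETCH_RECORD_FLAG_DIR		0x0002u

/* Record number of $MFT itself. */
#define SKETCH_MFT_REC_MFT		0u

static_assert((SKETCH_RECORD_FLAG_IN_USE & SKETCH_RECORD_FLAG_DIR) == 0,
	      "record flags must be distinct bits");

/*
 * Hypothetical sanity check before log replay: the $MFT record describes a
 * data file, so it must be in use and must never claim to be a directory.
 */
static bool mft_record_flags_ok(uint64_t rec_no, uint16_t flags)
{
	if (rec_no != SKETCH_MFT_REC_MFT)
		return true;	/* this sketch only validates $MFT's own record */
	if (!(flags & SKETCH_RECORD_FLAG_IN_USE))
		return false;	/* an unused $MFT record cannot be trusted */
	if (flags & SKETCH_RECORD_FLAG_DIR)
		return false;	/* the malformed case described above */
	return true;
}
```

Rejecting the record at this point means the mount fails cleanly instead of reaching the directory teardown path that frees memory which was never allocated.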
      
      [  101.368647] BUG: KASAN: invalid-free in kvfree+0x2c/0x40
      [  101.369457]
      [  101.369986] CPU: 0 PID: 198 Comm: mount Not tainted 6.0.0-rc7+ #5
      [  101.370529] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      [  101.371362] Call Trace:
      [  101.371795]  <TASK>
      [  101.372157]  dump_stack_lvl+0x49/0x63
      [  101.372658]  print_report.cold+0xf5/0x689
      [  101.373022]  ? ni_write_inode+0x754/0xd90
      [  101.373378]  ? kvfree+0x2c/0x40
      [  101.373698]  kasan_report_invalid_free+0x77/0xf0
      [  101.374058]  ? kvfree+0x2c/0x40
      [  101.374352]  ? kvfree+0x2c/0x40
      [  101.374668]  __kasan_slab_free+0x189/0x1b0
      [  101.374992]  ? kvfree+0x2c/0x40
      [  101.375271]  kfree+0x168/0x3b0
      [  101.375717]  kvfree+0x2c/0x40
      [  101.376002]  indx_clear+0x26/0x60
      [  101.376316]  ni_clear+0xc5/0x290
      [  101.376661]  ntfs_evict_inode+0x45/0x70
      [  101.377001]  evict+0x199/0x280
      [  101.377432]  iput.part.0+0x286/0x320
      [  101.377819]  iput+0x32/0x50
      [  101.378166]  ntfs_loadlog_and_replay+0x143/0x320
      [  101.378656]  ? ntfs_bio_fill_1+0x510/0x510
      [  101.378968]  ? iput.part.0+0x286/0x320
      [  101.379367]  ntfs_fill_super+0xecb/0x1ba0
      [  101.379729]  ? put_ntfs+0x1d0/0x1d0
      [  101.380046]  ? vsprintf+0x20/0x20
      [  101.380542]  ? mutex_unlock+0x81/0xd0
      [  101.380914]  ? set_blocksize+0x95/0x150
      [  101.381597]  get_tree_bdev+0x232/0x370
      [  101.382254]  ? put_ntfs+0x1d0/0x1d0
      [  101.382699]  ntfs_fs_get_tree+0x15/0x20
      [  101.383094]  vfs_get_tree+0x4c/0x130
      [  101.383675]  path_mount+0x654/0xfe0
      [  101.384203]  ? putname+0x80/0xa0
      [  101.384540]  ? finish_automount+0x2e0/0x2e0
      [  101.384943]  ? putname+0x80/0xa0
      [  101.385362]  ? kmem_cache_free+0x1c4/0x440
      [  101.385968]  ? putname+0x80/0xa0
      [  101.386666]  do_mount+0xd6/0xf0
      [  101.387228]  ? path_mount+0xfe0/0xfe0
      [  101.387585]  ? __kasan_check_write+0x14/0x20
      [  101.387979]  __x64_sys_mount+0xca/0x110
      [  101.388436]  do_syscall_64+0x3b/0x90
      [  101.388757]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
      [  101.389289] RIP: 0033:0x7fa0f70e948a
      [  101.390048] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
      [  101.391297] RSP: 002b:00007ffc24fdecc8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
      [  101.391988] RAX: ffffffffffffffda RBX: 000055932c183060 RCX: 00007fa0f70e948a
      [  101.392494] RDX: 000055932c183260 RSI: 000055932c1832e0 RDI: 000055932c18bce0
      [  101.393053] RBP: 0000000000000000 R08: 000055932c183280 R09: 0000000000000020
      [  101.393577] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 000055932c18bce0
      [  101.394044] R13: 000055932c183260 R14: 0000000000000000 R15: 00000000ffffffff
      [  101.394747]  </TASK>
      [  101.395402]
      [  101.396047] Allocated by task 198:
      [  101.396724]  kasan_save_stack+0x26/0x50
      [  101.397400]  __kasan_slab_alloc+0x6d/0x90
      [  101.397974]  kmem_cache_alloc_lru+0x192/0x5a0
      [  101.398524]  ntfs_alloc_inode+0x23/0x70
      [  101.399137]  alloc_inode+0x3b/0xf0
      [  101.399534]  iget5_locked+0x54/0xa0
      [  101.400026]  ntfs_iget5+0xaf/0x1780
      [  101.400414]  ntfs_loadlog_and_replay+0xe5/0x320
      [  101.400883]  ntfs_fill_super+0xecb/0x1ba0
      [  101.401313]  get_tree_bdev+0x232/0x370
      [  101.401774]  ntfs_fs_get_tree+0x15/0x20
      [  101.402224]  vfs_get_tree+0x4c/0x130
      [  101.402673]  path_mount+0x654/0xfe0
      [  101.403160]  do_mount+0xd6/0xf0
      [  101.403537]  __x64_sys_mount+0xca/0x110
      [  101.404058]  do_syscall_64+0x3b/0x90
      [  101.404333]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
      [  101.404816]
      [  101.405067] The buggy address belongs to the object at ffff888008cc9ea0
      [  101.405067]  which belongs to the cache ntfs_inode_cache of size 992
      [  101.406171] The buggy address is located 232 bytes inside of
      [  101.406171]  992-byte region [ffff888008cc9ea0, ffff888008cca280)
      [  101.406995]
      [  101.408559] The buggy address belongs to the physical page:
      [  101.409320] page:00000000dccf19dd refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cc8
      [  101.410654] head:00000000dccf19dd order:2 compound_mapcount:0 compound_pincount:0
      [  101.411533] flags: 0xfffffc0010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
      [  101.412665] raw: 000fffffc0010200 0000000000000000 dead000000000122 ffff888003695140
      [  101.413209] raw: 0000000000000000 00000000800e000e 00000001ffffffff 0000000000000000
      [  101.413799] page dumped because: kasan: bad access detected
      [  101.414213]
      [  101.414427] Memory state around the buggy address:
      [  101.414991]  ffff888008cc9e80: fc fc fc fc 00 00 00 00 00 00 00 00 00 00 00 00
      [  101.415785]  ffff888008cc9f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  101.416933] >ffff888008cc9f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  101.417857]                       ^
      [  101.418566]  ffff888008cca000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  101.419704]  ffff888008cca080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
      Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
      Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
    • fs/ntfs3: Validate attribute name offset · 887a9199
      Committed by Edward Lo
      maillist inclusion
      category: bugfix
      bugzilla: 188564, https://gitee.com/src-openeuler/kernel/issues/I6OD4P
      CVE: CVE-2022-48424
      
      Reference: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4f1dc7d9756e66f3f876839ea174df2e656b7f79
      
      ----------------------------------------
      
      Although the attribute name length is checked before comparing it to
      some common names (e.g., $I30), the offset isn't. This adds a sanity
      check for the attribute name offset to guarantee its validity and
      prevent possible out-of-bounds memory accesses.
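Here is a minimal sketch that validates both the length and the offset before reading the name. It assumes the caller holds the raw attribute record in a byte buffer; attr_name_is_i30() and I30_NAME_LE are hypothetical names. NTFS stores names as little-endian UTF-16, so the name length in code units is doubled to get bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* "$I30" as little-endian UTF-16, the way it appears on disk. */
static const uint8_t I30_NAME_LE[] = { '$', 0, 'I', 0, '3', 0, '0', 0 };

/*
 * Hypothetical check: does the attribute record rec (rec_size bytes) carry
 * the name "$I30"? The length and the offset are both validated before any
 * name byte is read.
 */
static bool attr_name_is_i30(const uint8_t *rec, size_t rec_size,
			     uint16_t name_off, uint8_t name_len)
{
	size_t name_bytes = (size_t)name_len * 2;	/* UTF-16 code units */

	assert(rec != NULL);
	if (name_bytes != sizeof(I30_NAME_LE))
		return false;	/* length check (already present upstream) */
	if (name_off > rec_size || name_bytes > rec_size - name_off)
		return false;	/* offset check: name must lie inside the record */
	return memcmp(rec + name_off, I30_NAME_LE, name_bytes) == 0;
}
```

The offset test is written as `name_bytes > rec_size - name_off` after confirming `name_off <= rec_size`, so the bound cannot overflow even for a corrupted name_off.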
      
      [  191.720056] BUG: unable to handle page fault for address: ffffebde00000008
      [  191.721060] #PF: supervisor read access in kernel mode
      [  191.721586] #PF: error_code(0x0000) - not-present page
      [  191.722079] PGD 0 P4D 0
      [  191.722571] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
      [  191.723179] CPU: 0 PID: 244 Comm: mount Not tainted 6.0.0-rc4 #28
      [  191.723749] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      [  191.724832] RIP: 0010:kfree+0x56/0x3b0
      [  191.725870] Code: 80 48 01 d8 0f 82 65 03 00 00 48 c7 c2 00 00 00 80 48 2b 15 2c 06 dd 01 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 0a 069
      [  191.727375] RSP: 0018:ffff8880076f7878 EFLAGS: 00000286
      [  191.727897] RAX: ffffebde00000000 RBX: 0000000000000040 RCX: ffffffff8528d5b9
      [  191.728531] RDX: 0000777f80000000 RSI: ffffffff8522d49c RDI: 0000000000000040
      [  191.729183] RBP: ffff8880076f78a0 R08: 0000000000000000 R09: 0000000000000000
      [  191.729628] R10: ffff888008949fd8 R11: ffffed10011293fd R12: 0000000000000040
      [  191.730158] R13: ffff888008949f98 R14: ffff888008949ec0 R15: ffff888008949fb0
      [  191.730645] FS:  00007f3520cd7e40(0000) GS:ffff88805ba00000(0000) knlGS:0000000000000000
      [  191.731328] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  191.731667] CR2: ffffebde00000008 CR3: 0000000009704000 CR4: 00000000000006f0
      [  191.732568] Call Trace:
      [  191.733231]  <TASK>
      [  191.733860]  kvfree+0x2c/0x40
      [  191.734632]  ni_clear+0x180/0x290
      [  191.735085]  ntfs_evict_inode+0x45/0x70
      [  191.735495]  evict+0x199/0x280
      [  191.735996]  iput.part.0+0x286/0x320
      [  191.736438]  iput+0x32/0x50
      [  191.736811]  iget_failed+0x23/0x30
      [  191.737270]  ntfs_iget5+0x337/0x1890
      [  191.737629]  ? ntfs_clear_mft_tail+0x20/0x260
      [  191.738201]  ? ntfs_get_block_bmap+0x70/0x70
      [  191.738482]  ? ntfs_objid_init+0xf6/0x140
      [  191.738779]  ? ntfs_reparse_init+0x140/0x140
      [  191.739266]  ntfs_fill_super+0x121b/0x1b50
      [  191.739623]  ? put_ntfs+0x1d0/0x1d0
      [  191.739984]  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
      [  191.740466]  ? put_ntfs+0x1d0/0x1d0
      [  191.740787]  ? sb_set_blocksize+0x6a/0x80
      [  191.741272]  get_tree_bdev+0x232/0x370
      [  191.741829]  ? put_ntfs+0x1d0/0x1d0
      [  191.742669]  ntfs_fs_get_tree+0x15/0x20
      [  191.743132]  vfs_get_tree+0x4c/0x130
      [  191.743457]  path_mount+0x654/0xfe0
      [  191.743938]  ? putname+0x80/0xa0
      [  191.744271]  ? finish_automount+0x2e0/0x2e0
      [  191.744582]  ? putname+0x80/0xa0
      [  191.745053]  ? kmem_cache_free+0x1c4/0x440
      [  191.745403]  ? putname+0x80/0xa0
      [  191.745616]  do_mount+0xd6/0xf0
      [  191.745887]  ? path_mount+0xfe0/0xfe0
      [  191.746287]  ? __kasan_check_write+0x14/0x20
      [  191.746582]  __x64_sys_mount+0xca/0x110
      [  191.746850]  do_syscall_64+0x3b/0x90
      [  191.747122]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
      [  191.747517] RIP: 0033:0x7f351fee948a
      [  191.748332] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
      [  191.749341] RSP: 002b:00007ffd51cf3af8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
      [  191.749960] RAX: ffffffffffffffda RBX: 000055b903733060 RCX: 00007f351fee948a
      [  191.750589] RDX: 000055b903733260 RSI: 000055b9037332e0 RDI: 000055b90373bce0
      [  191.751115] RBP: 0000000000000000 R08: 000055b903733280 R09: 0000000000000020
      [  191.751537] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 000055b90373bce0
      [  191.751946] R13: 000055b903733260 R14: 0000000000000000 R15: 00000000ffffffff
      [  191.752519]  </TASK>
      [  191.752782] Modules linked in:
      [  191.753785] CR2: ffffebde00000008
      [  191.754937] ---[ end trace 0000000000000000 ]---
      [  191.755429] RIP: 0010:kfree+0x56/0x3b0
      [  191.755725] Code: 80 48 01 d8 0f 82 65 03 00 00 48 c7 c2 00 00 00 80 48 2b 15 2c 06 dd 01 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 0a 069
      [  191.756744] RSP: 0018:ffff8880076f7878 EFLAGS: 00000286
      [  191.757218] RAX: ffffebde00000000 RBX: 0000000000000040 RCX: ffffffff8528d5b9
      [  191.757580] RDX: 0000777f80000000 RSI: ffffffff8522d49c RDI: 0000000000000040
      [  191.758016] RBP: ffff8880076f78a0 R08: 0000000000000000 R09: 0000000000000000
      [  191.758570] R10: ffff888008949fd8 R11: ffffed10011293fd R12: 0000000000000040
      [  191.758957] R13: ffff888008949f98 R14: ffff888008949ec0 R15: ffff888008949fb0
      [  191.759317] FS:  00007f3520cd7e40(0000) GS:ffff88805ba00000(0000) knlGS:0000000000000000
      [  191.759711] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  191.760118] CR2: ffffebde00000008 CR3: 0000000009704000 CR4: 00000000000006f0

      Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
      Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
      Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
    • ext4: make sure fs error flag setted before clear journal error · 6546b303
      Committed by Ye Bin
      mainline inclusion
      from mainline-v6.3-rc2
      commit f57886ca
      category: bugfix
      bugzilla: 188471,https://gitee.com/openeuler/kernel/issues/I6MR1V
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f57886ca1606ba74cc4ec4eb5cbf073934ffa559
      
      --------------------------------
      
      Currently, the journal error number may be cleared even though
      ext4_commit_super() failed. The error flag can then be lost, so fsck
      will not know that it needs to check the file system thoroughly.
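
A minimal userspace sketch of the ordering this fix is about, assuming a simplified model: set the error flag, commit the superblock, and clear the journal error only if that commit succeeded. The struct, the `errsb_*` functions and `ERRSB_ERROR_FS` are hypothetical stand-ins, not the kernel's ext4_clear_journal_err() code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the relevant ext4/jbd2 state; names are
 * illustrative, not kernel identifiers. 0x0002 mirrors EXT4_ERROR_FS. */
#define ERRSB_ERROR_FS 0x0002

struct errsb {
	unsigned short s_state; /* on-disk superblock state flags */
	int journal_errno;      /* error recorded in the journal superblock */
	bool commit_fails;      /* simulate ext4_commit_super() failing */
	bool error_on_disk;     /* did a successful commit persist the flag? */
};

/* Simulated superblock write: succeeds unless told to fail. */
int errsb_commit_super(struct errsb *sb)
{
	if (sb->commit_fails)
		return -EIO;
	sb->error_on_disk = (sb->s_state & ERRSB_ERROR_FS) != 0;
	return 0;
}

/* Set the error flag, commit it, and only then clear the journal error.
 * If the commit fails, the journal errno is kept so the error is not lost. */
int errsb_clear_journal_err(struct errsb *sb)
{
	int err;

	if (!sb->journal_errno)
		return 0;
	sb->s_state |= ERRSB_ERROR_FS;
	err = errsb_commit_super(sb);
	if (err)
		return err;            /* journal errno left intact */
	sb->journal_errno = 0;     /* safe: the flag is on disk now */
	return 0;
}
```

With `commit_fails` set, the journal errno survives, so a later mount can still see and record the error.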

      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20230307061703.245965-3-yebin@huaweicloud.com
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Reviewed-by: Yang Erkun <yangerkun@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
    • ext4: commit super block if fs record error when journal record without error · 04933ef6
      Committed by Ye Bin
      mainline inclusion
      from mainline-v6.3-rc2
      commit eee00237
      category: bugfix
      bugzilla: 188471,https://gitee.com/openeuler/kernel/issues/I6MR1V
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=eee00237fa5ec8f704f7323b54e48cc34e2d9168
      
      --------------------------------
      
      Currently, 'es->s_state' may be overwritten by journal recovery, and the
      journal errno may not have been recorded in the journal superblock
      because of an I/O error. ext4_update_super() only updates error
      information when 'sbi->s_add_error_count' is greater than zero, so the
      'EXT4_ERROR_FS' flag may be lost.
      To solve this, restore the 'es->s_state' error flag after journal
      replay, just as the error info is restored.
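
A rough sketch of restoring the error flag after replay, assuming the flag seen at mount time is kept in a separate field (standing in for sbi->s_mount_state) and that replay simply overwrites s_state with an older copy. All names are hypothetical; 0x0001 and 0x0002 mirror EXT4_VALID_FS and EXT4_ERROR_FS.

```c
#include <assert.h>
#include <stdbool.h>

#define RS_VALID_FS 0x0001  /* mirrors EXT4_VALID_FS */
#define RS_ERROR_FS 0x0002  /* mirrors EXT4_ERROR_FS */

struct replaysb {
	unsigned short s_state;     /* stands in for es->s_state */
	unsigned short mount_state; /* state remembered before replay */
};

/* Hypothetical replay: the journaled copy of s_state may predate the
 * error, so it can silently drop RS_ERROR_FS. */
void replay_journal(struct replaysb *sb, unsigned short journaled_state)
{
	sb->s_state = journaled_state;
}

/* Re-apply the remembered error flag. Returns true when s_state changed,
 * meaning the superblock would need to be committed again. */
bool restore_error_flag(struct replaysb *sb)
{
	unsigned short orig = sb->s_state;

	sb->s_state |= sb->mount_state & RS_ERROR_FS;
	return orig != sb->s_state;
}
```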

      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Reviewed-by: Baokun Li <libaokun1@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20230307061703.245965-2-yebin@huaweicloud.com
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Reviewed-by: Yang Erkun <yangerkun@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
  2. 22 March 2023: 2 commits
    • ext4: fix another off-by-one fsmap error on 1k block filesystems · 9c07e0e5
      Committed by Darrick J. Wong
      mainline inclusion
      from mainline-v6.3-rc2
      commit c993799b
      category: bugfix
      bugzilla: 188522,https://gitee.com/openeuler/kernel/issues/I6N7ZP
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c993799baf9c5861f8df91beb80e1611b12efcbd
      
      --------------------------------
      
      Apparently syzbot figured out that issuing this FSMAP call:
      
      struct fsmap_head cmd = {
      	.fmh_count	= ...;
      	.fmh_keys	= {
      		{ .fmr_device = /* ext4 dev */, .fmr_physical = 0, },
      		{ .fmr_device = /* ext4 dev */, .fmr_physical = 0, },
      	},
      ...
      };
      ret = ioctl(fd, FS_IOC_GETFSMAP, &cmd);
      
      Produces this crash if the underlying filesystem is a 1k-block ext4
      filesystem:
      
      kernel BUG at fs/ext4/ext4.h:3331!
      invalid opcode: 0000 [#1] PREEMPT SMP
      CPU: 3 PID: 3227965 Comm: xfs_io Tainted: G        W  O       6.2.0-rc8-achx
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
      RIP: 0010:ext4_mb_load_buddy_gfp+0x47c/0x570 [ext4]
      RSP: 0018:ffffc90007c03998 EFLAGS: 00010246
      RAX: ffff888004978000 RBX: ffffc90007c03a20 RCX: ffff888041618000
      RDX: 0000000000000000 RSI: 00000000000005a4 RDI: ffffffffa0c99b11
      RBP: ffff888012330000 R08: ffffffffa0c2b7d0 R09: 0000000000000400
      R10: ffffc90007c03950 R11: 0000000000000000 R12: 0000000000000001
      R13: 00000000ffffffff R14: 0000000000000c40 R15: ffff88802678c398
      FS:  00007fdf2020c880(0000) GS:ffff88807e100000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffd318a5fe8 CR3: 000000007f80f001 CR4: 00000000001706e0
      Call Trace:
       <TASK>
       ext4_mballoc_query_range+0x4b/0x210 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
       ext4_getfsmap_datadev+0x713/0x890 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
       ext4_getfsmap+0x2b7/0x330 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
       ext4_ioc_getfsmap+0x153/0x2b0 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
       __ext4_ioctl+0x2a7/0x17e0 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
       __x64_sys_ioctl+0x82/0xa0
       do_syscall_64+0x2b/0x80
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7fdf20558aff
      RSP: 002b:00007ffd318a9e30 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 00000000000200c0 RCX: 00007fdf20558aff
      RDX: 00007fdf1feb2010 RSI: 00000000c0c0583b RDI: 0000000000000003
      RBP: 00005625c0634be0 R08: 00005625c0634c40 R09: 0000000000000001
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007fdf1feb2010
      R13: 00005625be70d994 R14: 0000000000000800 R15: 0000000000000000
      
      For GETFSMAP calls, the caller selects a physical block device by
      writing its device number into both fsmap_head.fmh_keys[0].fmr_device
      and fsmap_head.fmh_keys[1].fmr_device.
      To query mappings for a subrange of the device, the starting byte of the
      range is written to fsmap_head.fmh_keys[0].fmr_physical and the last
      byte of the range goes in fsmap_head.fmh_keys[1].fmr_physical.
      
      IOWs, to query what mappings overlap with bytes 3-14 of /dev/sda, you'd
      set the inputs as follows:
      
      	fmh_keys[0] = { .fmr_device = makedev(8, 0), .fmr_physical = 3},
      	fmh_keys[1] = { .fmr_device = makedev(8, 0), .fmr_physical = 14},
      
      Which would return you whatever is mapped in the 12 bytes starting at
      physical offset 3.
      
      The crash is due to insufficient range validation of keys[1] in
      ext4_getfsmap_datadev.  On 1k-block filesystems, block 0 is not part of
      the filesystem, which means that s_first_data_block is nonzero.
      ext4_get_group_no_and_offset subtracts this quantity from the blocknr
      argument before cracking it into a group number and a block number
      within a group.  IOWs, block group 0 spans blocks 1-8192 (1-based)
      instead of 0-8191 (0-based) like what happens with larger blocksizes.
      
      The net result of this encoding is that blocknr < s_first_data_block is
      not a valid input to this function.  The end_fsb variable is set from
      the keys that are copied from userspace, which means that in the above
      example, its value is zero.  That leads to an underflow here:
      
      	blocknr = blocknr - le32_to_cpu(es->s_first_data_block);
      
      The division then operates on -1:
      
      	offset = do_div(blocknr, EXT4_BLOCKS_PER_GROUP(sb)) >>
      		EXT4_SB(sb)->s_cluster_bits;
      
      Leaving an impossibly large group number (2^32-1) in blocknr.
      ext4_getfsmap_check_keys checked that keys[0].fmr_physical and
      keys[1].fmr_physical are in increasing order, but
      ext4_getfsmap_datadev adjusts keys[0].fmr_physical to be at least
      s_first_data_block.  This implies that we have to check it again after
      the adjustment, which is the piece that I forgot.
      
      Reported-by: syzbot+6be2b977c89f79b6b153@syzkaller.appspotmail.com
      Fixes: 4a495624 ("ext4: fix off-by-one fsmap error on 1k block filesystems")
      Link: https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c30002
      Cc: stable@vger.kernel.org
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Link: https://lore.kernel.org/r/Y+58NPTH7VNGgzdd@magnolia
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
    • jbd2: fix data missing when reusing bh which is ready to be checkpointed · bf2cd51e
      Committed by Zhihao Cheng
      mainline inclusion
      from mainline-v6.3-rc1
      commit e6b9bd72
      category: bugfix
      bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6C5HV
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e6b9bd7290d334451ce054e98e752abc055e0034
      
      --------------------------------
      
      The following sequence loses data and can lead to a corrupted
      filesystem:
      
      1. jh(bh) is inserted into T1->t_checkpoint_list, bh is dirty, and
         jh->b_transaction = NULL
      2. T1 is added into journal->j_checkpoint_transactions.
      3. bh is prepared for writing while a checkpoint is in progress:
                 PA				    PB
         do_get_write_access             jbd2_log_do_checkpoint
          spin_lock(&jh->b_state_lock)
           if (buffer_dirty(bh))
            clear_buffer_dirty(bh)   // clear buffer dirty
             set_buffer_jbddirty(bh)
      				    transaction =
      				    journal->j_checkpoint_transactions
      				    jh = transaction->t_checkpoint_list
      				    if (!buffer_dirty(bh))
      		                      __jbd2_journal_remove_checkpoint(jh)
      				      // bh won't be flushed
      		                    jbd2_cleanup_journal_tail
          __jbd2_journal_file_buffer(jh, transaction, BJ_Reserved)
      4. The journal is aborted, or power is cut, before the latest bh is written to the journal area.
      
      In this way we get a corrupted filesystem with bh's data lost.
      
      Fix it by moving the clearing of the buffer_dirty bit to just before the
      call to __jbd2_journal_file_buffer(). Both the bit clearing and the
      jh->b_transaction assignment happen with journal->j_list_lock held, so
      jbd2_log_do_checkpoint() will wait until jh's new transaction has
      finished even if bh is currently not dirty. Likewise,
      journal_shrink_one_cp_list() won't remove jh from the checkpoint list
      if the buffer head is reused in do_get_write_access().
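
A toy model of the locking invariant, assuming a single spinlock built on C11 `atomic_flag` in place of journal->j_list_lock. The checkpoint side only drops a buffer that is both clean and not owned by a transaction, and the writer updates both facts in one critical section, so the checkpointer can never observe "clean and unowned" in between. The real jbd2_log_do_checkpoint() waits on the owning transaction rather than simply refusing; all names here are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_jh {
	bool dirty;       /* buffer_dirty(bh) */
	bool jbddirty;    /* buffer_jbddirty(bh) */
	int transaction;  /* jh->b_transaction, 0 means none */
};

/* Stand-in for journal->j_list_lock. */
atomic_flag toy_list_lock = ATOMIC_FLAG_INIT;

void toy_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&toy_list_lock,
						 memory_order_acquire))
		; /* spin until the holder releases the lock */
}

void toy_unlock(void)
{
	atomic_flag_clear_explicit(&toy_list_lock, memory_order_release);
}

/* Writer side after the fix: clearing the dirty bit and filing the
 * buffer into the new transaction happen in one critical section. */
void toy_reuse_buffer(struct toy_jh *jh, int new_transaction)
{
	toy_lock();
	if (jh->dirty) {
		jh->dirty = false;
		jh->jbddirty = true;
	}
	jh->transaction = new_transaction;
	toy_unlock();
}

/* Checkpoint side: both conditions are read under the same lock. */
bool toy_checkpoint_can_drop(struct toy_jh *jh)
{
	bool drop;

	toy_lock();
	drop = !jh->dirty && jh->transaction == 0;
	toy_unlock();
	return drop;
}
```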
      
      A reproducer is available at [Link].
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=216898
      Cc: <stable@kernel.org>
      Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Signed-off-by: zhanchengbin <zhanchengbin1@huawei.com>
      Suggested-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20230110015327.1181863-1-chengzhihao1@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Yang Erkun <yangerkun@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
  3. 15 March 2023: 3 commits
    • ext4: fix WARNING in mb_find_extent · 7dec118b
      Committed by Ye Bin
      maillist inclusion
      category: bugfix
      bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6K53I
      
      Reference: https://patchwork.ozlabs.org/project/linux-ext4/patch/20230116020015.1506120-1-yebin@huaweicloud.com/
      
      --------------------------------
      
      Syzbot found the following issue:
      
      EXT4-fs: Warning: mounting with data=journal disables delayed allocation, dioread_nolock, O_DIRECT and fast_commit support!
      EXT4-fs (loop0): orphan cleanup on readonly fs
      ------------[ cut here ]------------
      WARNING: CPU: 1 PID: 5067 at fs/ext4/mballoc.c:1869 mb_find_extent+0x8a1/0xe30
      Modules linked in:
      CPU: 1 PID: 5067 Comm: syz-executor307 Not tainted 6.2.0-rc1-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
      RIP: 0010:mb_find_extent+0x8a1/0xe30 fs/ext4/mballoc.c:1869
      RSP: 0018:ffffc90003c9e098 EFLAGS: 00010293
      RAX: ffffffff82405731 RBX: 0000000000000041 RCX: ffff8880783457c0
      RDX: 0000000000000000 RSI: 0000000000000041 RDI: 0000000000000040
      RBP: 0000000000000040 R08: ffffffff82405723 R09: ffffed10053c9402
      R10: ffffed10053c9402 R11: 1ffff110053c9401 R12: 0000000000000000
      R13: ffffc90003c9e538 R14: dffffc0000000000 R15: ffffc90003c9e2cc
      FS:  0000555556665300(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000056312f6796f8 CR3: 0000000022437000 CR4: 00000000003506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       ext4_mb_complex_scan_group+0x353/0x1100 fs/ext4/mballoc.c:2307
       ext4_mb_regular_allocator+0x1533/0x3860 fs/ext4/mballoc.c:2735
       ext4_mb_new_blocks+0xddf/0x3db0 fs/ext4/mballoc.c:5605
       ext4_ext_map_blocks+0x1868/0x6880 fs/ext4/extents.c:4286
       ext4_map_blocks+0xa49/0x1cc0 fs/ext4/inode.c:651
       ext4_getblk+0x1b9/0x770 fs/ext4/inode.c:864
       ext4_bread+0x2a/0x170 fs/ext4/inode.c:920
       ext4_quota_write+0x225/0x570 fs/ext4/super.c:7105
       write_blk fs/quota/quota_tree.c:64 [inline]
       get_free_dqblk+0x34a/0x6d0 fs/quota/quota_tree.c:130
       do_insert_tree+0x26b/0x1aa0 fs/quota/quota_tree.c:340
       do_insert_tree+0x722/0x1aa0 fs/quota/quota_tree.c:375
       do_insert_tree+0x722/0x1aa0 fs/quota/quota_tree.c:375
       do_insert_tree+0x722/0x1aa0 fs/quota/quota_tree.c:375
       dq_insert_tree fs/quota/quota_tree.c:401 [inline]
       qtree_write_dquot+0x3b6/0x530 fs/quota/quota_tree.c:420
       v2_write_dquot+0x11b/0x190 fs/quota/quota_v2.c:358
       dquot_acquire+0x348/0x670 fs/quota/dquot.c:444
       ext4_acquire_dquot+0x2dc/0x400 fs/ext4/super.c:6740
       dqget+0x999/0xdc0 fs/quota/dquot.c:914
       __dquot_initialize+0x3d0/0xcf0 fs/quota/dquot.c:1492
       ext4_process_orphan+0x57/0x2d0 fs/ext4/orphan.c:329
       ext4_orphan_cleanup+0xb60/0x1340 fs/ext4/orphan.c:474
       __ext4_fill_super fs/ext4/super.c:5516 [inline]
       ext4_fill_super+0x81cd/0x8700 fs/ext4/super.c:5644
       get_tree_bdev+0x400/0x620 fs/super.c:1282
       vfs_get_tree+0x88/0x270 fs/super.c:1489
       do_new_mount+0x289/0xad0 fs/namespace.c:3145
       do_mount fs/namespace.c:3488 [inline]
       __do_sys_mount fs/namespace.c:3697 [inline]
       __se_sys_mount+0x2d3/0x3c0 fs/namespace.c:3674
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      With some debug information added:
      mb_find_extent: mb_find_extent block=41, order=0 needed=64 next=0 ex=0/41/1@3735929054 64 64 7
      block_bitmap: ff 3f 0c 00 fc 01 00 00 d2 3d 00 00 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      
      Actually, blocks per group is 64, but the block bitmap indicates at least
      128 blocks. Currently, ext4_validate_block_bitmap() does not validate the
      bits for blocks past the end of the group (the padding).
      To resolve this, add a check like fsck's "Padding at end of block bitmap
      is not set".
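
A sketch of the padding check, assuming ext4's little-endian bit order (bit n lives in byte n/8 at position n%8). `first_unset_padding_bit()` is a hypothetical helper, not the kernel change. Fed the first 32 bytes of the bitmap dumped above with 64 blocks per group, it reports bit 64 as the first padding bit that is not set.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the first padding bit (at or after blocks_per_group,
 * below nbits) that is clear, or -1 if every padding bit is set as fsck
 * expects. */
long first_unset_padding_bit(const unsigned char *bitmap, size_t nbits,
			     size_t blocks_per_group)
{
	for (size_t bit = blocks_per_group; bit < nbits; bit++) {
		if (!(bitmap[bit / 8] & (1u << (bit % 8))))
			return (long)bit;
	}
	return -1;
}
```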
      
      Reported-by: syzbot+68223fe9f6c95ad43bed@syzkaller.appspotmail.com
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Reviewed-by: Yang Erkun <yangerkun@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
    • xfs: aborting inodes on shutdown may need buffer lock · cfc78156
      Committed by Dave Chinner
      mainline inclusion
      from mainline-v5.17-rc6
      commit d2d7c047
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d2d7c0473586d2f22e85d615275f34cf19f94447
      
      --------------------------------
      
      Most buffer I/O list operations run with bp->b_lock held, but
      xfs_iflush_abort() can be called without the buffer lock held,
      resulting in inodes being removed from the buffer list while other
      list operations are occurring. This causes problems with corrupted
      bp->b_io_list inode lists during filesystem shutdown, leading to
      traversals that never end, double removals from the AIL, etc.
      
      Fix this by passing the buffer to xfs_iflush_abort() if we have
      it locked. If the inode is attached to the buffer, we're going to
      have to remove it from the buffer list and we'd have to get the
      buffer off the inode log item to do that anyway.
      
      If we don't have a buffer passed in (e.g. from xfs_reclaim_inode())
      then we can determine if the inode has a log item and if it is
      attached to a buffer before we do anything else. If it does have an
      attached buffer, we can lock it safely (because the inode has a
      reference to it) and then perform the inode abort.
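
A toy model of the two calling conventions described above, assuming a boolean in place of the real buffer lock: callers that already hold the buffer lock pass the buffer in, and callers such as inode reclaim pass NULL so the function finds the attached buffer through the inode and locks it itself. Everything here is hypothetical and not XFS code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_buf {
	bool locked;       /* stands in for the buffer lock */
	int nr_io_inodes;  /* inodes on this buffer's b_io_list */
};

struct toy_inode {
	struct toy_buf *attached; /* buffer reachable via the inode log item */
};

/* List manipulation is only legal with the buffer locked. */
void toy_remove_from_buf(struct toy_inode *ip, struct toy_buf *bp)
{
	assert(bp->locked);
	bp->nr_io_inodes--;
	ip->attached = NULL;
}

/* Caller holds bp's lock if bp != NULL; otherwise look it up and lock it.
 * The inode's reference keeps the attached buffer alive meanwhile. */
void toy_iflush_abort(struct toy_inode *ip, struct toy_buf *bp)
{
	if (bp) {
		toy_remove_from_buf(ip, bp);
		return;
	}
	bp = ip->attached;
	if (!bp)
		return;           /* not attached: no buffer list to touch */
	bp->locked = true;    /* take the buffer lock */
	toy_remove_from_buf(ip, bp);
	bp->locked = false;   /* drop it again */
}
```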

      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      
      conflicts:
      	fs/xfs/xfs_icache.c
      Signed-off-by: Long Li <leo.lilong@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Reviewed-by: Yang Erkun <yangerkun@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
    • ext4: fix incorrect options show of original mount_opt and extend mount_opt2 · 2b59ae5e
      Committed by Zhang Yi
      maillist inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I6D5XF
      
      Reference: https://lore.kernel.org/linux-ext4/20230130111138.76tp6pij3yhh4brh@quack3/T/#t
      
      --------------------------------
      
      Currently, _ext4_show_options() does not distinguish the MOPT_2 flag, so
      it mixes the extended sbi->s_mount_opt2 options with sbi->s_mount_opt.
      This can show incorrect options: e.g. fc_debug_force is shown if we
      mount with errors=continue, and is missing when we actually set it.
      
        $ mkfs.ext4 /dev/pmem0
        $ mount -o errors=remount-ro /dev/pmem0 /mnt
        $ cat /proc/fs/ext4/pmem0/options | grep fc_debug_force
          #empty
        $ mount -o remount,errors=continue /mnt
        $ cat /proc/fs/ext4/pmem0/options | grep fc_debug_force
          fc_debug_force
        $ mount -o remount,errors=remount-ro,fc_debug_force /mnt
        $ cat /proc/fs/ext4/pmem0/options | grep fc_debug_force
          #empty
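
The collision can be sketched in C, assuming (for illustration only) that an s_mount_opt bit and an s_mount_opt2 bit share the value 0x10. The buggy variant tests every token against s_mount_opt; the fixed one picks the option word according to the token's MOPT_2 flag. All `TOY_*` and `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MOUNT_ERRORS_CONT     0x10u /* a bit in s_mount_opt */
#define TOY_MOUNT2_FC_DEBUG_FORCE 0x10u /* same value, in s_mount_opt2 */
#define TOY_MOPT_2                0x1u  /* token flag: bit lives in opt2 */

struct toy_token {
	const char *name;
	unsigned int mount_opt;
	unsigned int flags;
};

const struct toy_token toy_fc_debug_force = {
	"fc_debug_force", TOY_MOUNT2_FC_DEBUG_FORCE, TOY_MOPT_2
};

/* Buggy: ignores MOPT_2 and always looks at s_mount_opt. */
bool toy_shown_buggy(const struct toy_token *t,
		     unsigned int s_mount_opt, unsigned int s_mount_opt2)
{
	(void)s_mount_opt2;
	return (s_mount_opt & t->mount_opt) != 0;
}

/* Fixed: choose the option word the token's bit actually lives in. */
bool toy_shown_fixed(const struct toy_token *t,
		     unsigned int s_mount_opt, unsigned int s_mount_opt2)
{
	unsigned int word = (t->flags & TOY_MOPT_2) ? s_mount_opt2 : s_mount_opt;

	return (word & t->mount_opt) != 0;
}
```

This mirrors the shell session above: errors=continue alone makes the buggy check report fc_debug_force, and setting fc_debug_force without errors=continue makes it disappear.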
      
      Fixes: 995a3ed6 ("ext4: add fast_commit feature and handling for extended mount options")
      Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
      
      Conflict:
        fs/ext4/super.c
      Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Reviewed-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
  4. 08 March 2023: 3 commits
  5. 28 February 2023: 18 commits
  6. 22 February 2023: 1 commit
    • cifs: Fix use-after-free in rdata->read_into_pages() · 3ce6f394
      Committed by ZhaoLong Wang
      mainline inclusion
      from mainline-v6.2-rc8
      commit aa5465ae
      category: bugfix
      bugzilla: 188381, https://gitee.com/openeuler/kernel/issues/I644ST
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=aa5465aeca3c66fecdf7efcf554aed79b4c4b211
      
      ------------------------------------------------------
      
      When the network status is unstable, a use-after-free may occur while
      reading data from the server.
      
        BUG: KASAN: use-after-free in readpages_fill_pages+0x14c/0x7e0
      
        Call Trace:
         <TASK>
         dump_stack_lvl+0x38/0x4c
         print_report+0x16f/0x4a6
         kasan_report+0xb7/0x130
         readpages_fill_pages+0x14c/0x7e0
         cifs_readv_receive+0x46d/0xa40
         cifs_demultiplex_thread+0x121c/0x1490
         kthread+0x16b/0x1a0
         ret_from_fork+0x2c/0x50
         </TASK>
      
        Allocated by task 2535:
         kasan_save_stack+0x22/0x50
         kasan_set_track+0x25/0x30
         __kasan_kmalloc+0x82/0x90
         cifs_readdata_direct_alloc+0x2c/0x110
         cifs_readdata_alloc+0x2d/0x60
         cifs_readahead+0x393/0xfe0
         read_pages+0x12f/0x470
         page_cache_ra_unbounded+0x1b1/0x240
         filemap_get_pages+0x1c8/0x9a0
         filemap_read+0x1c0/0x540
         cifs_strict_readv+0x21b/0x240
         vfs_read+0x395/0x4b0
         ksys_read+0xb8/0x150
         do_syscall_64+0x3f/0x90
         entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
        Freed by task 79:
         kasan_save_stack+0x22/0x50
         kasan_set_track+0x25/0x30
         kasan_save_free_info+0x2e/0x50
         __kasan_slab_free+0x10e/0x1a0
         __kmem_cache_free+0x7a/0x1a0
         cifs_readdata_release+0x49/0x60
         process_one_work+0x46c/0x760
         worker_thread+0x2a4/0x6f0
         kthread+0x16b/0x1a0
         ret_from_fork+0x2c/0x50
      
        Last potentially related work creation:
         kasan_save_stack+0x22/0x50
         __kasan_record_aux_stack+0x95/0xb0
         insert_work+0x2b/0x130
         __queue_work+0x1fe/0x660
         queue_work_on+0x4b/0x60
         smb2_readv_callback+0x396/0x800
         cifs_abort_connection+0x474/0x6a0
         cifs_reconnect+0x5cb/0xa50
         cifs_readv_from_socket.cold+0x22/0x6c
         cifs_read_page_from_socket+0xc1/0x100
         readpages_fill_pages.cold+0x2f/0x46
         cifs_readv_receive+0x46d/0xa40
         cifs_demultiplex_thread+0x121c/0x1490
         kthread+0x16b/0x1a0
         ret_from_fork+0x2c/0x50
      
      The following function calls will cause UAF of the rdata pointer.
      
      readpages_fill_pages
       cifs_read_page_from_socket
        cifs_readv_from_socket
         cifs_reconnect
          __cifs_reconnect
           cifs_abort_connection
            mid->callback() --> smb2_readv_callback
             queue_work(&rdata->work)  # if the worker completes first,
                                       # the rdata is freed
                cifs_readv_complete
                  kref_put
                    cifs_readdata_release
                      kfree(rdata)
       return rdata->...               # UAF in readpages_fill_pages()
      
      The same problem also occurs in uncache_fill_pages().
      
      Fix this by reordering the conditions in the return statement so that
      the error result is checked before rdata is accessed.
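
A sketch of why the order matters, assuming the abort is reported as -ECONNABORTED and the returned byte count comes from rdata->got_bytes (both assumptions; the message above does not name them). Instead of freeing memory, the toy struct records whether the return path read rdata after the completion worker released it. With `result` tested first, `&&` short-circuits and rdata is never read.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_rdata {
	int got_bytes;
	bool freed;              /* completion worker already released it */
	bool touched_after_free; /* the return path read it anyway */
};

int toy_got_bytes(struct toy_rdata *rdata)
{
	if (rdata->freed)
		rdata->touched_after_free = true; /* a UAF in the real code */
	return rdata->got_bytes;
}

/* Buggy order: rdata is read before result is examined. */
int toy_fill_pages_ret_buggy(struct toy_rdata *rdata, int result)
{
	return toy_got_bytes(rdata) > 0 && result != -ECONNABORTED ?
		toy_got_bytes(rdata) : result;
}

/* Fixed order: an aborted connection short-circuits before rdata. */
int toy_fill_pages_ret_fixed(struct toy_rdata *rdata, int result)
{
	return result != -ECONNABORTED && toy_got_bytes(rdata) > 0 ?
		toy_got_bytes(rdata) : result;
}
```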

      Signed-off-by: ZhaoLong Wang <wangzhaolong1@huawei.com>
      Cc: stable@vger.kernel.org
      Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Reviewed-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
  7. 14 February 2023: 3 commits