1. 05 Jul 2022: 1 commit
  2. 21 Jun 2022: 2 commits
    • ext4: correct the judgment of BUG in ext4_mb_normalize_request · a936ae29
      Authored by Baokun Li
      hulk inclusion
      category: bugfix
      bugzilla: 186777, https://gitee.com/openeuler/kernel/issues/I5C568
      CVE: NA
      
      --------------------------------
      
      ext4_mb_normalize_request() can move the logical start of the allocated
      blocks to reduce fragmentation and better utilize preallocation. However,
      the logical block requested as the start of allocation
      (ac->ac_o_ex.fe_logical) should always be covered by the allocated
      blocks, so we should check for that by changing the 'and' to 'or' in
      the assertion.
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • ext4: fix bug_on ext4_mb_use_inode_pa · d45943ea
      Authored by Baokun Li
      hulk inclusion
      category: bugfix
      bugzilla: 186777, https://gitee.com/openeuler/kernel/issues/I5C568
      CVE: NA
      
      --------------------------------
      
      Hulk Robot reported a BUG_ON:
      
      ==================================================================
      kernel BUG at fs/ext4/mballoc.c:3211!
      [...]
      RIP: 0010:ext4_mb_mark_diskspace_used.cold+0x85/0x136f
      [...]
      Call Trace:
       ext4_mb_new_blocks+0x9df/0x5d30
       ext4_ext_map_blocks+0x1803/0x4d80
       ext4_map_blocks+0x3a4/0x1a10
       ext4_writepages+0x126d/0x2c30
       do_writepages+0x7f/0x1b0
       __filemap_fdatawrite_range+0x285/0x3b0
       file_write_and_wait_range+0xb1/0x140
       ext4_sync_file+0x1aa/0xca0
       vfs_fsync_range+0xfb/0x260
       do_fsync+0x48/0xa0
      [...]
      ==================================================================
      
      The above issue may happen as follows:
      -------------------------------------
      do_fsync
       vfs_fsync_range
        ext4_sync_file
         file_write_and_wait_range
          __filemap_fdatawrite_range
           do_writepages
            ext4_writepages
             mpage_map_and_submit_extent
              mpage_map_one_extent
               ext4_map_blocks
                ext4_mb_new_blocks
                 ext4_mb_normalize_request
                  >>> start + size <= ac->ac_o_ex.fe_logical
                 ext4_mb_regular_allocator
                  ext4_mb_simple_scan_group
                   ext4_mb_use_best_found
                    ext4_mb_new_preallocation
                     ext4_mb_new_inode_pa
                      ext4_mb_use_inode_pa
                       >>> set ac->ac_b_ex.fe_len <= 0
                 ext4_mb_mark_diskspace_used
                  >>> BUG_ON(ac->ac_b_ex.fe_len <= 0);
      
      We can easily reproduce this problem with the following commands:
      	`fallocate -l100M disk`
      	`mkfs.ext4 -b 1024 -g 256 disk`
      	`mount disk /mnt`
      	`fsstress -d /mnt -l 0 -n 1000 -p 1`
      
      The size must be smaller than or equal to EXT4_BLOCKS_PER_GROUP, so
      "start + size <= ac->ac_o_ex.fe_logical" may occur when the size is
      truncated to the group size. Therefore, start should be rounded down to
      the start of the group containing ac_o_ex.fe_logical. In addition, when
      fe_logical or EXT4_BLOCKS_PER_GROUP is very large, the value calculated
      via start_off is more accurate.
      
      Fixes: cd648b8a ("ext4: trim allocation requests to group size")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
  3. 17 May 2022: 1 commit
  4. 27 Apr 2022: 2 commits
  5. 13 Oct 2021: 1 commit
  6. 15 Jun 2021: 1 commit
  7. 19 Apr 2021: 1 commit
  8. 18 Jan 2021: 1 commit
  9. 12 Jan 2021: 1 commit
  10. 07 Nov 2020: 2 commits
  11. 22 Oct 2020: 1 commit
  12. 18 Oct 2020: 5 commits
  13. 20 Aug 2020: 3 commits
  14. 19 Aug 2020: 4 commits
  15. 08 Aug 2020: 3 commits
  16. 06 Aug 2020: 3 commits
  17. 11 Jun 2020: 1 commit
    • ext4: mballoc: Use this_cpu_read instead of this_cpu_ptr · 81198536
      Authored by Ritesh Harjani
      Simplify reading a seq variable by directly using the this_cpu_read()
      API instead of doing this_cpu_ptr() and then dereferencing it.
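The change itself is a one-liner; a sketch in kernel context (illustrative only, not standalone-compilable):

```c
/* before: this_cpu_ptr() + dereference implies smp_processor_id(),
 * which is invalid in preemptible context with CONFIG_DEBUG_PREEMPT */
seq = *this_cpu_ptr(&discard_pa_seq);

/* after: this_cpu_read() performs a single preemption-safe access */
seq = this_cpu_read(discard_pa_seq);
```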
      
      This also avoids the kernel BUG below, which happens when
      CONFIG_DEBUG_PREEMPT is enabled:
      
      BUG: using smp_processor_id() in preemptible [00000000] code: syz-fuzzer/6927
      caller is ext4_mb_new_blocks+0xa4d/0x3b70 fs/ext4/mballoc.c:4711
      CPU: 1 PID: 6927 Comm: syz-fuzzer Not tainted 5.7.0-next-20200602-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x18f/0x20d lib/dump_stack.c:118
       check_preemption_disabled+0x20d/0x220 lib/smp_processor_id.c:48
       ext4_mb_new_blocks+0xa4d/0x3b70 fs/ext4/mballoc.c:4711
       ext4_ext_map_blocks+0x201b/0x33e0 fs/ext4/extents.c:4244
       ext4_map_blocks+0x4cb/0x1640 fs/ext4/inode.c:626
       ext4_getblk+0xad/0x520 fs/ext4/inode.c:833
       ext4_bread+0x7c/0x380 fs/ext4/inode.c:883
       ext4_append+0x153/0x360 fs/ext4/namei.c:67
       ext4_init_new_dir fs/ext4/namei.c:2757 [inline]
       ext4_mkdir+0x5e0/0xdf0 fs/ext4/namei.c:2802
       vfs_mkdir+0x419/0x690 fs/namei.c:3632
       do_mkdirat+0x21e/0x280 fs/namei.c:3655
       do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 42f56b7a4a7d ("ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling")
      Suggested-by: Borislav Petkov <bp@alien8.de>
      Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reported-by: syzbot+82f324bb69744c5f6969@syzkaller.appspotmail.com
      Link: https://lore.kernel.org/r/534f275016296996f54ecf65168bb3392b6f653d.1591699601.git.riteshh@linux.ibm.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  18. 04 Jun 2020: 7 commits
    • ext4: mballoc: use lock for checking free blocks while retrying · 99377830
      Authored by Ritesh Harjani
      Currently, while doing block allocation, grp->bb_free may be modified if
      a discard is happening in parallel. For example, consider a case where a
      lot of threads have preallocated a lot of blocks and one thread is
      trying to discard all of this group's PAs. It could then happen that we
      see the group's bb_free as zero and fail the allocation, even though
      there would be sufficient space if we freed up all the PAs.
      
      So this patch adds another flag, "EXT4_MB_STRICT_CHECK", which will be
      set if we are unable to allocate any blocks in the first try (since we
      may not have considered blocks about to be discarded from PA lists).
      During the retry attempt we then hold ext4_lock_group() while checking
      whether the group is good for allocation.
      Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
      Link: https://lore.kernel.org/r/9cb740a117c958c36596f167b12af1beae9a68b7.1589955723.git.riteshh@linux.ibm.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: mballoc: refactor ext4_mb_good_group() · 8ef123fe
      Authored by Ritesh Harjani
      The ext4_mb_good_group() definition was changed some time back, and it
      now even initializes the buddy cache (via ext4_mb_init_group()) if
      EXT4_MB_GRP_NEED_INIT() is true for a group. Note that
      ext4_mb_init_group() can sleep and so must not be called with a
      spinlock held. This is fine as of now because ext4_mb_good_group() is
      called once before loading the buddy bitmap, without ext4_lock_group()
      held, and again after loading the bitmap, this time with
      ext4_lock_group() held. But this whole arrangement is confusing.
      
      So this patch refactors out ext4_mb_good_group_nolock(), which should
      be called when ext4_lock_group() is not held. In further patches we
      hold the spinlock (ext4_lock_group()) while doing any calculations
      that involve grp->bb_free or grp->bb_fragments.
      Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
      Link: https://lore.kernel.org/r/d9f7d031a5fbe1c943fae6bf1ff5cdf0604ae722.1589955723.git.riteshh@linux.ibm.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling · 07b5b8e1
      Authored by Ritesh Harjani
      There could be a race in ext4_mb_discard_group_preallocations() where
      the 1st thread may iterate through the group's bb_prealloc_list, remove
      all the PAs, and add them to the function's local list head. Now if the
      2nd thread comes in to discard the group preallocations, it will see
      that group->bb_prealloc_list is empty and will return 0.

      Consider a case where we have only a few groups (e.g. just group 0):
      this may even return an -ENOSPC error from ext4_mb_new_blocks() (where
      we call ext4_mb_discard_group_preallocations()). But that is wrong,
      since the 2nd thread should have waited for the 1st thread to release
      all the PAs and then retried the allocation, given that the 1st thread
      was going to discard the PAs anyway.
      
      The algorithm using this percpu seq counter goes as follows:
      1. We sample the percpu discard_pa_seq counter before trying for block
         allocation in ext4_mb_new_blocks().
      2. We increment this percpu discard_pa_seq counter when we either
         allocate or free these blocks, i.e. while marking those blocks as
         used/free in mb_mark_used()/mb_free_blocks().
      3. We also increment this percpu seq counter when we successfully
         identify that the bb_prealloc_list is not empty and hence proceed
         to discard those PAs inside ext4_mb_discard_group_preallocations().

      Now, to make sure that the regular fast path of block allocation is not
      affected, as a small optimization we only sample the percpu seq counter
      on the current cpu. Only when the block allocation fails and the number
      of freed blocks found is 0 do we sample the percpu seq counter for all
      cpus, using ext4_get_discard_pa_seq_sum(). This happens after making
      sure that all the PAs on grp->bb_prealloc_list got freed, or that the
      list is empty.
      
      It can be well argued that we could just check grp->bb_free to see if
      there are any free blocks to be allocated. So here are the two concerns
      that were discussed:

      1. If for some reason the blocks available in the group are not
         appropriate for the allocation logic (say e.g.
         EXT4_MB_HINT_GOAL_ONLY, although this is not yet implemented), then
         the retry logic may result in infinite looping since grp->bb_free is
         non-zero.

      2. Also, before preallocation was clubbed with block allocation under
         the same ext4_lock_group() held, there were a lot of races where
         grp->bb_free could not be relied upon.
      Due to the above, this patch uses the discard_pa_seq logic to determine
      whether we should retry block allocation. If there are n threads trying
      for block allocation and none of them could allocate or discard any
      blocks, then all n threads will fail the block allocation and return
      -ENOSPC (since the seq counter for all of them will match, as no block
      allocation/discard was done during that duration).
      Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
      Link: https://lore.kernel.org/r/7f254686903b87c419d798742fd9a1be34f0657b.1589955723.git.riteshh@linux.ibm.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: mballoc: refactor ext4_mb_discard_preallocations() · cf5e2ca6
      Authored by Ritesh Harjani
      Implement ext4_mb_discard_preallocations_should_retry(), which we will
      need in later patches to add more logic, such as checking for a
      sequence number match to decide whether we should retry block
      allocation.
      There should be no functionality change in this patch.
      Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
      Link: https://lore.kernel.org/r/1cfae0098d2aa9afbeb59331401258182868c8f2.1589955723.git.riteshh@linux.ibm.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: mballoc: add blocks to PA list under same spinlock after allocating blocks · 53f86b17
      Authored by Ritesh Harjani
      ext4_mb_discard_preallocations() only checks grp->bb_prealloc_list of
      every group to discard the group's PAs and free up space if an
      allocation request fails. Consider the race below:
      
      Process A  				Process B
      
      1. allocate blocks
      					1. Fails block allocation from
      					     ext4_mb_regular_allocator()
         ext4_lock_group()
      	allocated blocks
      	more than ac_o_ex.fe_len
         ext4_unlock_group()
      					2. Scans the
      					   grp->bb_prealloc_list (under
      					   ext4_lock_group()) and
      					   find nothing and thus return
      					   -ENOSPC.
      
      2. Add the additional blocks to PA list
      
         ext4_lock_group()
         	add blocks to grp->bb_prealloc_list
         ext4_unlock_group()
      
      The above race can be avoided if we add those additional blocks to
      grp->bb_prealloc_list at the same time as the block allocation, while
      ext4_lock_group() is still held. With this, the discard-PA path will
      know whether there are actually any blocks that could be freed from
      the PA.
      Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
      Link: https://lore.kernel.org/r/a2217dd782585b42328981832e6d396abaaccb80.1589955723.git.riteshh@linux.ibm.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: mballoc: make mb_debug() implementation to use pr_debug() · d3df1453
      Authored by Ritesh Harjani
      mb_debug() msgs had only one control level for all types of msgs, and
      if we enabled mballoc_debug then all of those msgs would be enabled.
      Instead of adding multiple debug levels for mb_debug() msgs, use
      pr_debug(), with which we get finer control to print msgs at all the
      different levels (i.e. per file, function, and line number).

      Also add the process name/pid, superblock id, and other info to the
      mb_debug() msg. This also kills the mballoc_debug module parameter,
      since it is no longer needed.
      Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
      Link: https://lore.kernel.org/r/f0c660cbde9e2edbe95c67942ca9ad80dd2231eb.1589086800.git.riteshh@linux.ibm.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: mballoc: fix possible NULL ptr & remove BUG_ONs from DOUBLE_CHECK · eb2b8ebb
      Authored by Ritesh Harjani
      Make sure to check for e4b->bd_info->bb_bitmap == NULL in
      mb_cmp_bitmaps() and return if it is NULL, to avoid a possible NULL
      pointer dereference, similar to how we do this in the other ifdef
      DOUBLE_CHECK functions.

      Also remove the BUG_ON() logic if kmalloc() or
      ext4_read_block_bitmap() fails. We should simply mark grp->bb_bitmap
      as NULL if the above happens. In fact, ext4_read_block_bitmap() may
      even return an error in the case of a resize ioctl. Hence remove this
      BUG_ON logic (fstests ext4/032 may trigger this).
      
      Link: https://lore.kernel.org/r/9a54f8a696ff17c057cd571be3d15ac3ec1407f1.1589086800.git.riteshh@linux.ibm.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>