1. 16 6月, 2020 8 次提交
    • Q
      ext4: fix a data race at inode->i_blocks · 7b067e44
      Qian Cai 提交于
      task #28557685
      
      commit 28936b62e71e41600bab319f262ea9f9b1027629 upstream.
      
      inode->i_blocks could be accessed concurrently as noticed by KCSAN,
      
       BUG: KCSAN: data-race in ext4_do_update_inode [ext4] / inode_add_bytes
      
       write to 0xffff9a00d4b982d0 of 8 bytes by task 22100 on cpu 118:
        inode_add_bytes+0x65/0xf0
        __inode_add_bytes at fs/stat.c:689
        (inlined by) inode_add_bytes at fs/stat.c:702
        ext4_mb_new_blocks+0x418/0xca0 [ext4]
        ext4_ext_map_blocks+0x1a6b/0x27b0 [ext4]
        ext4_map_blocks+0x1a9/0x950 [ext4]
        _ext4_get_block+0xfc/0x270 [ext4]
        ext4_get_block_unwritten+0x33/0x50 [ext4]
        __block_write_begin_int+0x22e/0xae0
        __block_write_begin+0x39/0x50
        ext4_write_begin+0x388/0xb50 [ext4]
        ext4_da_write_begin+0x35f/0x8f0 [ext4]
        generic_perform_write+0x15d/0x290
        ext4_buffered_write_iter+0x11f/0x210 [ext4]
        ext4_file_write_iter+0xce/0x9e0 [ext4]
        new_sync_write+0x29c/0x3b0
        __vfs_write+0x92/0xa0
        vfs_write+0x103/0x260
        ksys_write+0x9d/0x130
        __x64_sys_write+0x4c/0x60
        do_syscall_64+0x91/0xb05
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
       read to 0xffff9a00d4b982d0 of 8 bytes by task 8 on cpu 65:
        ext4_do_update_inode+0x4a0/0xf60 [ext4]
        ext4_inode_blocks_set at fs/ext4/inode.c:4815
        ext4_mark_iloc_dirty+0xaf/0x160 [ext4]
        ext4_mark_inode_dirty+0x129/0x3e0 [ext4]
        ext4_convert_unwritten_extents+0x253/0x2d0 [ext4]
        ext4_convert_unwritten_io_end_vec+0xc5/0x150 [ext4]
        ext4_end_io_rsv_work+0x22c/0x350 [ext4]
        process_one_work+0x54f/0xb90
        worker_thread+0x80/0x5f0
        kthread+0x1cd/0x1f0
        ret_from_fork+0x27/0x50
      
       4 locks held by kworker/u256:0/8:
        #0: ffff9a025abc4328 ((wq_completion)ext4-rsv-conversion){+.+.}, at: process_one_work+0x443/0xb90
        #1: ffffab5a862dbe20 ((work_completion)(&ei->i_rsv_conversion_work)){+.+.}, at: process_one_work+0x443/0xb90
        #2: ffff9a025a9d0f58 (jbd2_handle){++++}, at: start_this_handle+0x1c1/0x9d0 [jbd2]
        #3: ffff9a00d4b985d8 (&(&ei->i_raw_lock)->rlock){+.+.}, at: ext4_do_update_inode+0xaa/0xf60 [ext4]
       irq event stamp: 3009267
       hardirqs last  enabled at (3009267): [<ffffffff980da9b7>] __find_get_block+0x107/0x790
       hardirqs last disabled at (3009266): [<ffffffff980da8f9>] __find_get_block+0x49/0x790
       softirqs last  enabled at (3009230): [<ffffffff98a0034c>] __do_softirq+0x34c/0x57c
       softirqs last disabled at (3009223): [<ffffffff97cc67a2>] irq_exit+0xa2/0xc0
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 65 PID: 8 Comm: kworker/u256:0 Tainted: G L 5.6.0-rc2-next-20200221+ #7
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
       Workqueue: ext4-rsv-conversion ext4_end_io_rsv_work [ext4]
      
      The plain read is outside of inode->i_lock critical section which
      results in a data race. Fix it by adding READ_ONCE() there.
      
      Link: https://lore.kernel.org/r/20200222043258.2279-1-cai@lca.pwSigned-off-by: NQian Cai <cai@lca.pw>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      7b067e44
    • Q
      ext4: fix a data race in EXT4_I(inode)->i_disksize · ff11a028
      Qian Cai 提交于
      task #28557685
      
      commit 35df4299a6487f323b0aca120ea3f485dfee2ae3 upstream.
      
      EXT4_I(inode)->i_disksize could be accessed concurrently as noticed by
      KCSAN,
      
       BUG: KCSAN: data-race in ext4_write_end [ext4] / ext4_writepages [ext4]
      
       write to 0xffff91c6713b00f8 of 8 bytes by task 49268 on cpu 127:
        ext4_write_end+0x4e3/0x750 [ext4]
        ext4_update_i_disksize at fs/ext4/ext4.h:3032
        (inlined by) ext4_update_inode_size at fs/ext4/ext4.h:3046
        (inlined by) ext4_write_end at fs/ext4/inode.c:1287
        generic_perform_write+0x208/0x2a0
        ext4_buffered_write_iter+0x11f/0x210 [ext4]
        ext4_file_write_iter+0xce/0x9e0 [ext4]
        new_sync_write+0x29c/0x3b0
        __vfs_write+0x92/0xa0
        vfs_write+0x103/0x260
        ksys_write+0x9d/0x130
        __x64_sys_write+0x4c/0x60
        do_syscall_64+0x91/0xb47
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
       read to 0xffff91c6713b00f8 of 8 bytes by task 24872 on cpu 37:
        ext4_writepages+0x10ac/0x1d00 [ext4]
        mpage_map_and_submit_extent at fs/ext4/inode.c:2468
        (inlined by) ext4_writepages at fs/ext4/inode.c:2772
        do_writepages+0x5e/0x130
        __writeback_single_inode+0xeb/0xb20
        writeback_sb_inodes+0x429/0x900
        __writeback_inodes_wb+0xc4/0x150
        wb_writeback+0x4bd/0x870
        wb_workfn+0x6b4/0x960
        process_one_work+0x54c/0xbe0
        worker_thread+0x80/0x650
        kthread+0x1e0/0x200
        ret_from_fork+0x27/0x50
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 37 PID: 24872 Comm: kworker/u261:2 Tainted: G        W  O L 5.5.0-next-20200204+ #5
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
       Workqueue: writeback wb_workfn (flush-7:0)
      
      Since only the read is operating as lockless (outside of the
      "i_data_sem"), load tearing could introduce a logic bug. Fix it by
      adding READ_ONCE() for the read and WRITE_ONCE() for the write.
      Signed-off-by: NQian Cai <cai@lca.pw>
      Link: https://lore.kernel.org/r/1581085751-31793-1-git-send-email-cai@lca.pwSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      ff11a028
    • S
      ext4: add cond_resched() to ext4_protect_reserved_inode · b6063100
      Shijie Luo 提交于
      task #28557685
      
      commit af133ade9a40794a37104ecbcc2827c0ea373a3c upstream.
      
      When journal size is set too big by "mkfs.ext4 -J size=", or when
      we mount a crafted image to make journal inode->i_size too big,
      the loop, "while (i < num)", holds cpu too long. This could cause
      soft lockup.
      
      [  529.357541] Call trace:
      [  529.357551]  dump_backtrace+0x0/0x198
      [  529.357555]  show_stack+0x24/0x30
      [  529.357562]  dump_stack+0xa4/0xcc
      [  529.357568]  watchdog_timer_fn+0x300/0x3e8
      [  529.357574]  __hrtimer_run_queues+0x114/0x358
      [  529.357576]  hrtimer_interrupt+0x104/0x2d8
      [  529.357580]  arch_timer_handler_virt+0x38/0x58
      [  529.357584]  handle_percpu_devid_irq+0x90/0x248
      [  529.357588]  generic_handle_irq+0x34/0x50
      [  529.357590]  __handle_domain_irq+0x68/0xc0
      [  529.357593]  gic_handle_irq+0x6c/0x150
      [  529.357595]  el1_irq+0xb8/0x140
      [  529.357599]  __ll_sc_atomic_add_return_acquire+0x14/0x20
      [  529.357668]  ext4_map_blocks+0x64/0x5c0 [ext4]
      [  529.357693]  ext4_setup_system_zone+0x330/0x458 [ext4]
      [  529.357717]  ext4_fill_super+0x2170/0x2ba8 [ext4]
      [  529.357722]  mount_bdev+0x1a8/0x1e8
      [  529.357746]  ext4_mount+0x44/0x58 [ext4]
      [  529.357748]  mount_fs+0x50/0x170
      [  529.357752]  vfs_kern_mount.part.9+0x54/0x188
      [  529.357755]  do_mount+0x5ac/0xd78
      [  529.357758]  ksys_mount+0x9c/0x118
      [  529.357760]  __arm64_sys_mount+0x28/0x38
      [  529.357764]  el0_svc_common+0x78/0x130
      [  529.357766]  el0_svc_handler+0x38/0x78
      [  529.357769]  el0_svc+0x8/0xc
      [  541.356516] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [mount:18674]
      
      Link: https://lore.kernel.org/r/20200211011752.29242-1-luoshijie1@huawei.comReviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NShijie Luo <luoshijie1@huawei.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      b6063100
    • E
      ext4: fix deadlock allocating crypto bounce page from mempool · 4141548c
      Eric Biggers 提交于
      task #28557685
      
      [ Upstream commit 547c556f4db7c09447ecf5f833ab6aaae0c5ab58 ]
      
      ext4_writepages() on an encrypted file has to encrypt the data, but it
      can't modify the pagecache pages in-place, so it encrypts the data into
      bounce pages and writes those instead.  All bounce pages are allocated
      from a mempool using GFP_NOFS.
      
      This is not correct use of a mempool, and it can deadlock.  This is
      because GFP_NOFS includes __GFP_DIRECT_RECLAIM, which enables the "never
      fail" mode for mempool_alloc() where a failed allocation will fall back
      to waiting for one of the preallocated elements in the pool.
      
      But since this mode is used for all a bio's pages and not just the
      first, it can deadlock waiting for pages already in the bio to be freed.
      
      This deadlock can be reproduced by patching mempool_alloc() to pretend
      that pool->alloc() always fails (so that it always falls back to the
      preallocations), and then creating an encrypted file of size > 128 KiB.
      
      Fix it by only using GFP_NOFS for the first page in the bio.  For
      subsequent pages just use GFP_NOWAIT, and if any of those fail, just
      submit the bio and start a new one.
      
      This will need to be fixed in f2fs too, but that's less straightforward.
      
      Fixes: c9af28fd ("ext4 crypto: don't let data integrity writebacks fail with ENOMEM")
      Cc: stable@vger.kernel.org
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Link: https://lore.kernel.org/r/20191231181149.47619-1-ebiggers@kernel.orgSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      4141548c
    • C
      ext4: set error return correctly when ext4_htree_store_dirent fails · 219c5eb7
      Colin Ian King 提交于
      task #28557685
      
      [ Upstream commit 7a14826ede1d714f0bb56de8167c0e519041eeda ]
      
      Currently when the call to ext4_htree_store_dirent fails the error return
      variable 'ret' is is not being set to the error code and variable count is
      instead, hence the error code is not being returned.  Fix this by assigning
      ret to the error return code.
      
      Addresses-Coverity: ("Unused value")
      Fixes: 8af0f082 ("ext4: fix readdir error in the case of inline_data+dir_index")
      Signed-off-by: NColin Ian King <colin.king@canonical.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      219c5eb7
    • D
      ext4: unlock on error in ext4_expand_extra_isize() · b23f8091
      Dan Carpenter 提交于
      task #28557685
      
      commit 7f420d64a08c1dcd65b27be82a27cf2bdb2e7847 upstream.
      
      We need to unlock the xattr before returning on this error path.
      
      Cc: stable@kernel.org # 4.13
      Fixes: c03b45b8 ("ext4, project: expand inode extra size if possible")
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Link: https://lore.kernel.org/r/20191213185010.6k7yl2tck3wlsdkt@kili.mountainSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      b23f8091
    • J
      ext4: check for directory entries too close to block end · 3ac53bd1
      Jan Kara 提交于
      task #28557685
      
      commit 109ba779d6cca2d519c5dd624a3276d03e21948e upstream.
      
      ext4_check_dir_entry() currently does not catch a case when a directory
      entry ends so close to the block end that the header of the next
      directory entry would not fit in the remaining space. This can lead to
      directory iteration code trying to access address beyond end of current
      buffer head leading to oops.
      
      CC: stable@vger.kernel.org
      Signed-off-by: NJan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20191202170213.4761-3-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      3ac53bd1
    • J
      ext4: fix ext4_empty_dir() for directories with holes · 824b5c3c
      Jan Kara 提交于
      task #28557685
      
      commit 64d4ce892383b2ad6d782e080d25502f91bf2a38 upstream.
      
      Function ext4_empty_dir() doesn't correctly handle directories with
      holes and crashes on bh->b_data dereference when bh is NULL. Reorganize
      the loop to use 'offset' variable all the times instead of comparing
      pointers to current direntry with bh->b_data pointer. Also add more
      strict checking of '.' and '..' directory entries to avoid entering loop
      in possibly invalid state on corrupted filesystems.
      
      References: CVE-2019-19037
      CC: stable@vger.kernel.org
      Fixes: 4e19d6b65fb4 ("ext4: allow directory holes")
      Signed-off-by: NJan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20191202170213.4761-2-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      824b5c3c
  2. 12 6月, 2020 1 次提交
    • J
      ext4: fix partial cluster initialization when splitting extent · b4a0105f
      Jeffle Xu 提交于
      fix #27212891
      
      Fix the bug when calculating the physical block number of the first
      block in the split extent.
      
      This bug will cause xfstests shared/298 failure on ext4 with bigalloc
      enabled occasionally. Ext4 error messages indicate that previously freed
      blocks are being freed again, and the following fsck will fail due to
      the inconsistency of block bitmap and bg descriptor.
      
      The following is an example case:
      
      1. First, Initialize a ext4 filesystem with cluster size '16K', block size
      '4K', in which case, one cluster contains four blocks.
      
      2. Create one file (e.g., xxx.img) on this ext4 filesystem. Now the extent
      tree of this file is like:
      
      ...
      36864:[0]4:220160
      36868:[0]14332:145408
      51200:[0]2:231424
      ...
      
      3. Then execute PUNCH_HOLE fallocate on this file. The hole range is
      like:
      
      ..
      ext4_ext_remove_space: dev 254,16 ino 12 since 49506 end 49506 depth 1
      ext4_ext_remove_space: dev 254,16 ino 12 since 49544 end 49546 depth 1
      ext4_ext_remove_space: dev 254,16 ino 12 since 49605 end 49607 depth 1
      ...
      
      4. Then the extent tree of this file after punching is like
      
      ...
      49507:[0]37:158047
      49547:[0]58:158087
      ...
      
      5. Detailed procedure of punching hole [49544, 49546]
      
      5.1. The block address space:
      ```
      lblk        ~49505  49506   49507~49543     49544~49546    49547~
      	  ---------+------+-------------+----------------+--------
      	    extent | hole |   extent	|	hole	 | extent
      	  ---------+------+-------------+----------------+--------
      pblk       ~158045  158046  158047~158083  158084~158086   158087~
      ```
      
      5.2. The detailed layout of cluster 39521:
      ```
      		cluster 39521
      	<------------------------------->
      
      		hole		  extent
      	<----------------------><--------
      
      lblk      49544   49545   49546   49547
      	+-------+-------+-------+-------+
      	|	|	|	|	|
      	+-------+-------+-------+-------+
      pblk     158084  1580845  158086  158087
      ```
      
      5.3. The ftrace output when punching hole [49544, 49546]:
      - ext4_ext_remove_space (start 49544, end 49546)
        - ext4_ext_rm_leaf (start 49544, end 49546, last_extent [49507(158047), 40], partial [pclu 39522 lblk 0 state 2])
          - ext4_remove_blocks (extent [49507(158047), 40], from 49544 to 49546, partial [pclu 39522 lblk 0 state 2]
            - ext4_free_blocks: (block 158084 count 4)
              - ext4_mballoc_free (extent 1/6753/1)
      
      5.4. Ext4 error message in dmesg:
      EXT4-fs error (device vdb): mb_free_blocks:1457: group 1, block 158084:freeing already freed block (bit 6753); block bitmap corrupt.
      EXT4-fs error (device vdb): ext4_mb_generate_buddy:747: group 1, block bitmap and bg descriptor inconsistent: 19550 vs 19551 free clusters
      
      In this case, the whole cluster 39521 is freed mistakenly when freeing
      pblock 158084~158086 (i.e., the first three blocks of this cluster),
      although pblock 158087 (the last remaining block of this cluster) has
      not been freed yet.
      
      The root cause of this isuue is that, the pclu of the partial cluster is
      calculated mistakenly in ext4_ext_remove_space(). The correct
      partial_cluster.pclu (i.e., the cluster number of the first block in the
      next extent, that is, lblock 49597 (pblock 158086)) should be 39521 rather
      than 39522.
      
      Fixes: f4226d9e ("ext4: fix partial cluster initialization")
      Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
      Reviewed-by: NEric Whitney <enwlinux@gmail.com>
      Cc: stable@kernel.org # v3.19+
      Link: https://lore.kernel.org/r/1590121124-37096-1-git-send-email-jefflexu@linux.alibaba.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      b4a0105f
  3. 30 4月, 2020 1 次提交
  4. 24 4月, 2020 1 次提交
  5. 18 3月, 2020 17 次提交
  6. 17 1月, 2020 3 次提交
  7. 15 1月, 2020 3 次提交
  8. 27 12月, 2019 6 次提交
    • E
      ext4: fix bigalloc cluster freeing when hole punching under load · 9e24ae5c
      Eric Whitney 提交于
      commit 7bd75230b43727b258a4f7a59d62114cffe1b6c8 upstream.
      
      Ext4 may not free clusters correctly when punching holes in bigalloc
      file systems under high load conditions.  If it's not possible to
      extend and restart the journal in ext4_ext_rm_leaf() when preparing to
      remove blocks from a punched region, a retry of the entire punch
      operation is triggered in ext4_ext_remove_space().  This causes a
      partial cluster to be set to the first cluster in the extent found to
      the right of the punched region.  However, if the punch operation
      prior to the retry had made enough progress to delete one or more
      extents and a partial cluster candidate for freeing had already been
      recorded, the retry would overwrite the partial cluster.  The loss of
      this information makes it impossible to correctly free the original
      partial cluster in all cases.
      
      This bug can cause generic/476 to fail when run as part of
      xfstests-bld's bigalloc and bigalloc_1k test cases.  The failure is
      reported when e2fsck detects bad iblocks counts greater than expected
      in units of whole clusters and also detects a number of negative block
      bitmap differences equal to the iblocks discrepancy in cluster units.
      Signed-off-by: NEric Whitney <enwlinux@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      9e24ae5c
    • X
      ext4: unlock unused_pages timely when doing writeback · 507547e8
      Xiaoguang Wang 提交于
      commit a297b2fcee461e40df763e179cbbfba5a9e572d2 upstream.
      
      In mpage_add_bh_to_extent(), when accumulated extents length is greater
      than MAX_WRITEPAGES_EXTENT_LEN or buffer head's b_stat is not equal, we
      will not continue to search unmapped area for this page, but note this
      page is locked, and will only be unlocked in mpage_release_unused_pages()
      after ext4_io_submit, if io also is throttled by blk-throttle or similar
      io qos, we will hold this page locked for a while, it's unnecessary.
      
      I think the best fix is to refactor mpage_add_bh_to_extent() to let it
      return some hints whether to unlock this page, but given that we will
      improve dioread_nolock later, we can let it done later, so currently
      the simple fix would just call mpage_release_unused_pages() before
      ext4_io_submit().
      Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: NLiu Bo <bo.liu@linux.alibaba.com>
      507547e8
    • E
      ext4: fix reserved cluster accounting at page invalidation time · c94af26e
      Eric Whitney 提交于
      commit f456767d3391e9f7d9d25a2e7241d75676dc19da upstream.
      
      Add new code to count canceled pending cluster reservations on bigalloc
      file systems and to reduce the cluster reservation count on all file
      systems using delayed allocation.  This replaces old code in
      ext4_da_page_release_reservations that was incorrect.
      Signed-off-by: NEric Whitney <enwlinux@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      c94af26e
    • E
      ext4: adjust reserved cluster count when removing extents · 60a5bf34
      Eric Whitney 提交于
      commit 9fe671496b6c286f9033aedfc1718d67721da0ae upstream.
      
      Modify ext4_ext_remove_space() and the code it calls to correct the
      reserved cluster count for pending reservations (delayed allocated
      clusters shared with allocated blocks) when a block range is removed
      from the extent tree.  Pending reservations may be found for the clusters
      at the ends of written or unwritten extents when a block range is removed.
      If a physical cluster at the end of an extent is freed, it's necessary
      to increment the reserved cluster count to maintain correct accounting
      if the corresponding logical cluster is shared with at least one
      delayed and unwritten extent as found in the extents status tree.
      
      Add a new function, ext4_rereserve_cluster(), to reapply a reservation
      on a delayed allocated cluster sharing blocks with a freed allocated
      cluster.  To avoid ENOSPC on reservation, a flag is applied to
      ext4_free_blocks() to briefly defer updating the freeclusters counter
      when an allocated cluster is freed.  This prevents another thread
      from allocating the freed block before the reservation can be reapplied.
      
      Redefine the partial cluster object as a struct to carry more state
      information and to clarify the code using it.
      
      Adjust the conditional code structure in ext4_ext_remove_space to
      reduce the indentation level in the main body of the code to improve
      readability.
      Signed-off-by: NEric Whitney <enwlinux@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      60a5bf34
    • E
      ext4: reduce reserved cluster count by number of allocated clusters · d49d8b35
      Eric Whitney 提交于
      commit b6bf9171ef5c37b66d446378ba63af5339a56a97 upstream.
      
      Ext4 does not always reduce the reserved cluster count by the number
      of clusters allocated when mapping a delayed extent.  It sometimes
      adds back one or more clusters after allocation if delalloc blocks
      adjacent to the range allocated by ext4_ext_map_blocks() share the
      clusters newly allocated for that range.  However, this overcounts
      the number of clusters needed to satisfy future mapping requests
      (holding one or more reservations for clusters that have already been
      allocated) and premature ENOSPC and quota failures, etc., result.
      
      Ext4 also does not reduce the reserved cluster count when allocating
      clusters for non-delayed allocated writes that have previously been
      reserved for delayed writes.  This also results in overcounts.
      
      To make it possible to handle reserved cluster accounting for
      fallocated regions in the same manner as used for other non-delayed
      writes, do the reserved cluster accounting for them at the time of
      allocation.  In the current code, this is only done later when a
      delayed extent sharing the fallocated region is finally mapped.
      
      Address comment correcting handling of unsigned long long constant
      from Jan Kara's review of RFC version of this patch.
      Signed-off-by: NEric Whitney <enwlinux@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      d49d8b35
    • E
      ext4: fix reserved cluster accounting at delayed write time · f683c7e6
      Eric Whitney 提交于
      commit 0b02f4c0d6d9e2c611dfbdd4317193e9dca740e6 upstream.
      
      The code in ext4_da_map_blocks sometimes reserves space for more
      delayed allocated clusters than it should, resulting in premature
      ENOSPC, exceeded quota, and inaccurate free space reporting.
      
      Fix this by checking for written and unwritten blocks shared in the
      same cluster with the newly delayed allocated block.  A cluster
      reservation should not be made for a cluster for which physical space
      has already been allocated.
      Signed-off-by: NEric Whitney <enwlinux@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      f683c7e6