1. 18 Mar, 2020 13 commits
  2. 17 Jan, 2020 3 commits
  3. 15 Jan, 2020 3 commits
  4. 27 Dec, 2019 8 commits
    • ext4: fix bigalloc cluster freeing when hole punching under load · 9e24ae5c
      Authored by Eric Whitney
      commit 7bd75230b43727b258a4f7a59d62114cffe1b6c8 upstream.
      
      Ext4 may not free clusters correctly when punching holes in bigalloc
      file systems under high load conditions.  If it's not possible to
      extend and restart the journal in ext4_ext_rm_leaf() when preparing to
      remove blocks from a punched region, a retry of the entire punch
      operation is triggered in ext4_ext_remove_space().  This causes a
      partial cluster to be set to the first cluster in the extent found to
      the right of the punched region.  However, if the punch operation
      prior to the retry had made enough progress to delete one or more
      extents and a partial cluster candidate for freeing had already been
      recorded, the retry would overwrite the partial cluster.  The loss of
      this information makes it impossible to correctly free the original
      partial cluster in all cases.
      
      This bug can cause generic/476 to fail when run as part of
      xfstests-bld's bigalloc and bigalloc_1k test cases.  The failure is
      reported when e2fsck detects bad iblocks counts greater than expected
      in units of whole clusters and also detects a number of negative block
      bitmap differences equal to the iblocks discrepancy in cluster units.
      Signed-off-by: Eric Whitney <enwlinux@gmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
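The retry behaviour described in this commit can be pictured with a small user-space model. This is only a sketch under assumptions: `struct partial`, `enum pc_state`, and `seed_partial()` are hypothetical names chosen for illustration and are not the kernel's definitions. It shows the partial cluster being seeded from the extent to the right of the punched region only when no candidate has been recorded yet, so that a retry keeps whatever an earlier pass had marked for freeing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the partial cluster record (not the kernel type). */
enum pc_state { PC_INITIAL, PC_TOFREE, PC_NOFREE };

struct partial {
	long long pclu;		/* physical cluster number of the candidate */
	enum pc_state state;	/* PC_INITIAL means "nothing recorded yet" */
};

/*
 * Called each time the punch operation starts or restarts.  Returns true if
 * the record was seeded from the extent to the right of the hole, false if
 * an earlier pass had already recorded a candidate that must be preserved.
 */
bool seed_partial(struct partial *p, long long right_pclu)
{
	assert(p != NULL);
	if (p->state != PC_INITIAL)
		return false;	/* retry: keep the original information */
	p->pclu = right_pclu;
	p->state = PC_NOFREE;	/* still in use by the extent to the right */
	return true;
}
```

On a first pass the record starts as `PC_INITIAL` and is seeded; on a retry after some extents were already deleted, a `PC_TOFREE` record is left untouched.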
    • ext4: unlock unused_pages timely when doing writeback · 507547e8
      Authored by Xiaoguang Wang
      commit a297b2fcee461e40df763e179cbbfba5a9e572d2 upstream.
      
      In mpage_add_bh_to_extent(), when the accumulated extent length is
      greater than MAX_WRITEPAGES_EXTENT_LEN or the buffer head's b_state
      does not match, we stop searching this page for unmapped areas.  The
      page stays locked, however, and is only unlocked in
      mpage_release_unused_pages() after ext4_io_submit().  If the I/O is
      also throttled by blk-throttle or a similar I/O QoS mechanism, the page
      is held locked for a while, which is unnecessary.
      
      I think the best fix is to refactor mpage_add_bh_to_extent() so that it
      returns a hint on whether to unlock this page.  Given that
      dioread_nolock will be improved later, that refactoring can wait; for
      now the simple fix is to call mpage_release_unused_pages() before
      ext4_io_submit().
      Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
    • ext4: fix reserved cluster accounting at page invalidation time · c94af26e
      Authored by Eric Whitney
      commit f456767d3391e9f7d9d25a2e7241d75676dc19da upstream.
      
      Add new code to count canceled pending cluster reservations on bigalloc
      file systems and to reduce the cluster reservation count on all file
      systems using delayed allocation.  This replaces old code in
      ext4_da_page_release_reservations that was incorrect.
      Signed-off-by: Eric Whitney <enwlinux@gmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
    • ext4: adjust reserved cluster count when removing extents · 60a5bf34
      Authored by Eric Whitney
      commit 9fe671496b6c286f9033aedfc1718d67721da0ae upstream.
      
      Modify ext4_ext_remove_space() and the code it calls to correct the
      reserved cluster count for pending reservations (delayed allocated
      clusters shared with allocated blocks) when a block range is removed
      from the extent tree.  Pending reservations may be found for the clusters
      at the ends of written or unwritten extents when a block range is removed.
      If a physical cluster at the end of an extent is freed, it's necessary
      to increment the reserved cluster count to maintain correct accounting
      if the corresponding logical cluster is shared with at least one
      delayed and unwritten extent as found in the extents status tree.
      
      Add a new function, ext4_rereserve_cluster(), to reapply a reservation
      on a delayed allocated cluster sharing blocks with a freed allocated
      cluster.  To avoid ENOSPC on reservation, a flag is applied to
      ext4_free_blocks() to briefly defer updating the freeclusters counter
      when an allocated cluster is freed.  This prevents another thread
      from allocating the freed block before the reservation can be reapplied.
      
      Redefine the partial cluster object as a struct to carry more state
      information and to clarify the code using it.
      
      Adjust the conditional code structure in ext4_ext_remove_space to
      reduce the indentation level in the main body of the code to improve
      readability.
      Signed-off-by: Eric Whitney <enwlinux@gmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
    • ext4: reduce reserved cluster count by number of allocated clusters · d49d8b35
      Authored by Eric Whitney
      commit b6bf9171ef5c37b66d446378ba63af5339a56a97 upstream.
      
      Ext4 does not always reduce the reserved cluster count by the number
      of clusters allocated when mapping a delayed extent.  It sometimes
      adds back one or more clusters after allocation if delalloc blocks
      adjacent to the range allocated by ext4_ext_map_blocks() share the
      clusters newly allocated for that range.  However, this overcounts
      the number of clusters needed to satisfy future mapping requests
      (holding one or more reservations for clusters that have already been
      allocated) and premature ENOSPC and quota failures, etc., result.
      
      Ext4 also does not reduce the reserved cluster count when allocating
      clusters for non-delayed allocated writes that have previously been
      reserved for delayed writes.  This also results in overcounts.
      
      To make it possible to handle reserved cluster accounting for
      fallocated regions in the same manner as used for other non-delayed
      writes, do the reserved cluster accounting for them at the time of
      allocation.  In the current code, this is only done later when a
      delayed extent sharing the fallocated region is finally mapped.
      
      Address comment correcting handling of unsigned long long constant
      from Jan Kara's review of RFC version of this patch.
      Signed-off-by: Eric Whitney <enwlinux@gmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
    • ext4: fix reserved cluster accounting at delayed write time · f683c7e6
      Authored by Eric Whitney
      commit 0b02f4c0d6d9e2c611dfbdd4317193e9dca740e6 upstream.
      
      The code in ext4_da_map_blocks sometimes reserves space for more
      delayed allocated clusters than it should, resulting in premature
      ENOSPC, exceeded quota, and inaccurate free space reporting.
      
      Fix this by checking for written and unwritten blocks shared in the
      same cluster with the newly delayed allocated block.  A cluster
      reservation should not be made for a cluster for which physical space
      has already been allocated.
      Signed-off-by: Eric Whitney <enwlinux@gmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
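The rule stated in this commit (no new reservation for a cluster that already has physical space) can be sketched as a user-space check over a per-block status array. Everything here is an assumption made for illustration: `enum blk_status`, `needs_reservation()`, and the flat array are hypothetical and stand in for the extents status tree. The sketch also treats an existing delayed block in the same cluster as already holding the reservation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-block states; BLK_HOLE is 0 so zeroed arrays are holes. */
enum blk_status { BLK_HOLE, BLK_DELAYED, BLK_WRITTEN, BLK_UNWRITTEN };

/*
 * Decide whether a newly delayed block at index lblk needs a fresh cluster
 * reservation.  cluster_ratio is the number of blocks per cluster.
 */
bool needs_reservation(const enum blk_status *blocks, size_t nblocks,
		       size_t lblk, size_t cluster_ratio)
{
	assert(cluster_ratio > 0 && lblk < nblocks);

	size_t start = (lblk / cluster_ratio) * cluster_ratio;
	size_t end = start + cluster_ratio;

	if (end > nblocks)
		end = nblocks;
	for (size_t i = start; i < end; i++) {
		if (i == lblk)
			continue;
		/*
		 * Written or unwritten: physical space is already allocated.
		 * Delayed: the cluster is already reserved.
		 */
		if (blocks[i] != BLK_HOLE)
			return false;
	}
	return true;
}
```

With four blocks per cluster, a delayed write next to a written block in the same cluster needs no reservation, while one in an empty cluster does.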
    • ext4: add new pending reservation mechanism · a993adbe
      Authored by Eric Whitney
      commit 1dc0aa46e74a3366e12f426b7caaca477853e9c3 upstream.
      
      Add new pending reservation mechanism to help manage reserved cluster
      accounting.  Its primary function is to avoid the need to read extents
      from the disk when invalidating pages as a result of a truncate, punch
      hole, or collapse range operation.
      Signed-off-by: Eric Whitney <enwlinux@gmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
    • ext4: generalize extents status tree search functions · fccb6f6e
      Authored by Eric Whitney
      commit ad431025aecda85d3ebef5e4a3aca5c1c681d0c7 upstream.
      
      Ext4 contains a few functions that are used to search for delayed
      extents or blocks in the extents status tree.  Rather than duplicate
      code to add new functions to search for extents with different status
      values, such as written or a combination of delayed and unwritten,
      generalize the existing code to search for caller-specified extents
      status values.  Also, move this code into extents_status.c where it
      is better associated with the data structures it operates upon, and
      where it can be more readily used to implement new extents status tree
      functions that might want a broader scope for i_es_lock.
      
      Three missing static specifiers in RFC version of patch reported and
      fixed by Fengguang Wu <fengguang.wu@intel.com>.
      Signed-off-by: Eric Whitney <enwlinux@gmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
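The generalization described in this commit, one search routine that takes a caller-specified status test instead of a separate function per status, can be sketched in user space as follows. The names (`struct extent`, `match_fn`, `find_range()`, the `ES_*` bits) and the sorted array are hypothetical; the kernel code works on a tree under i_es_lock, which this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status bits and extent record for illustration only. */
enum { ES_WRITTEN = 1, ES_UNWRITTEN = 2, ES_DELAYED = 4 };

struct extent {
	unsigned int lblk;	/* first logical block */
	unsigned int len;	/* number of blocks, at least 1 */
	unsigned int status;	/* bitwise OR of ES_* values */
};

/* Caller-supplied predicate selecting the status values of interest. */
typedef bool (*match_fn)(const struct extent *es);

bool is_delayed(const struct extent *es)
{
	return (es->status & ES_DELAYED) != 0;
}

bool is_mapped(const struct extent *es)
{
	return (es->status & (ES_WRITTEN | ES_UNWRITTEN)) != 0;
}

/*
 * Return the first extent in the array (sorted by lblk) that overlaps
 * [start, end] and satisfies the predicate, or NULL if there is none.
 */
const struct extent *find_range(const struct extent *tbl, size_t n,
				match_fn matches,
				unsigned int start, unsigned int end)
{
	assert(start <= end);
	for (size_t i = 0; i < n; i++) {
		unsigned int last = tbl[i].lblk + tbl[i].len - 1;

		if (tbl[i].lblk > end)
			break;	/* sorted: nothing further can overlap */
		if (last >= start && matches(&tbl[i]))
			return &tbl[i];
	}
	return NULL;
}
```

Adding a search for a new status combination then only needs a new predicate, not a new copy of the search loop.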
  5. 18 Dec, 2019 3 commits
    • ext4: fix a bug in ext4_wait_for_tail_page_commit · b1ec93dd
      Authored by yangerkun
      commit 565333a1554d704789e74205989305c811fd9c7a upstream.
      
      There is no need to wait for any commit once the page is fully
      truncated.  Besides, waiting may confuse e.g. a concurrent
      ext4_writepage(): the page is still dirty (the dirty state will be
      cleared by truncate_pagecache() in ext4_setattr()) but its buffers
      have been freed, which triggers the bug shown below:
      
      [   26.057508] ------------[ cut here ]------------
      [   26.058531] kernel BUG at fs/ext4/inode.c:2134!
      ...
      [   26.088130] Call trace:
      [   26.088695]  ext4_writepage+0x914/0xb28
      [   26.089541]  writeout.isra.4+0x1b4/0x2b8
      [   26.090409]  move_to_new_page+0x3b0/0x568
      [   26.091338]  __unmap_and_move+0x648/0x988
      [   26.092241]  unmap_and_move+0x48c/0xbb8
      [   26.093096]  migrate_pages+0x220/0xb28
      [   26.093945]  kernel_mbind+0x828/0xa18
      [   26.094791]  __arm64_sys_mbind+0xc8/0x138
      [   26.095716]  el0_svc_common+0x190/0x490
      [   26.096571]  el0_svc_handler+0x60/0xd0
      [   26.097423]  el0_svc+0x8/0xc
      
      Run the following program (generated by syzkaller) in parallel on ext3.
      
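      #define _GNU_SOURCE		/* for O_NOATIME */
      #include <assert.h>
      #include <fcntl.h>
      #include <numaif.h>		/* mbind(), MPOL_MF_MOVE; link with -lnuma */
      #include <string.h>
      #include <sys/ioctl.h>
      #include <sys/mman.h>
      #include <unistd.h>
      
      int main(void)
      {
      	int fd, fd1, ret;
      	void *addr;
      	size_t length = 4096;
      	int flags;
      	off_t offset = 0;
      	char *str = "12345";
      
      	/* O_CREAT requires a mode argument */
      	fd = open("a", O_RDWR | O_CREAT, 0644);
      	assert(fd >= 0);
      
      	/* Truncate to 4k */
      	ret = ftruncate(fd, length);
      	assert(ret == 0);
      
      	/* Journal data mode; _IOW('f', 2, long) is FS_IOC_SETFLAGS */
      	flags = 0xc00f;
      	ret = ioctl(fd, _IOW('f', 2, long), &flags);
      	assert(ret == 0);
      
      	/* Truncate to 0 */
      	fd1 = open("a", O_TRUNC | O_NOATIME);
      	assert(fd1 >= 0);
      
      	addr = mmap(NULL, length, PROT_WRITE | PROT_READ,
      					MAP_SHARED, fd, offset);
      	assert(addr != (void *)-1);
      
      	/* Dirty the page, then ask the kernel to migrate it */
      	memcpy(addr, str, 5);
      	mbind(addr, length, 0, 0, 0, MPOL_MF_MOVE);
      	return 0;
      }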
      
      The bug is triggered once the two instances run in the order below.
      
      reproduce1                         reproduce2
      
      ...                            |   ...
      truncate to 4k                 |
      change to journal data mode    |
                                     |   memcpy(set page dirty)
      truncate to 0:                 |
      ext4_setattr:                  |
      ...                            |
      ext4_wait_for_tail_page_commit |
                                     |   mbind(trigger bug)
      truncate_pagecache(clean dirty)|   ...
      ...                            |
      
      mbind will call ext4_writepage() since the page is still dirty, and
      then report the bug since the buffers have been freed.  Fix it by
      returning directly once offset equals 0, which means the page has been
      fully truncated.
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: yangerkun <yangerkun@huawei.com>
      Link: https://lore.kernel.org/r/20190919063508.1045-1-yangerkun@huawei.com
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
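The fix described in this commit amounts to an early exit when the new file size ends exactly on a page boundary. A minimal user-space sketch follows, assuming that "offset" means the position of the new size within its page; `tail_page_may_need_wait()` is a hypothetical helper, not a kernel function, and it ignores any other conditions the real function may check.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return false when the tail page is fully truncated (the in-page offset of
 * the new size is 0), in which case there is nothing to wait for.
 * page_size must be a power of two.
 */
bool tail_page_may_need_wait(unsigned long i_size, unsigned long page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	unsigned long offset = i_size & (page_size - 1);

	return offset != 0;
}
```

In the reproducer above the file is truncated to 0, so the offset is 0 and the wait is skipped.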
    • ext4: work around deleting a file with i_nlink == 0 safely · 8e7a8653
      Authored by Theodore Ts'o
      commit c7df4a1ecb8579838ec8c56b2bb6a6716e974f37 upstream.
      
      If the file system is corrupted such that a file's i_links_count is
      too small, then it's possible that when unlinking that file, i_nlink
      will already be zero.  Previously we were working around this kind of
      corruption by forcing i_nlink to one; but we were doing this before
      trying to delete the directory entry --- and if the file system is
      corrupted enough that ext4_delete_entry() fails, then we exit with
      i_nlink elevated, and this causes the orphan inode list handling to be
      FUBAR'ed, such that when we unmount the file system, the orphan inode
      list can get corrupted.
      
      A better way to fix this is to simply skip trying to call drop_nlink()
      if i_nlink is already zero, thus moving the check to the place where
      it makes the most sense.
      
      https://bugzilla.kernel.org/show_bug.cgi?id=205433
      
      Link: https://lore.kernel.org/r/20191112032903.8828-1-tytso@mit.edu
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
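The approach described in this commit (skip the link-count decrement when the count is already zero, rather than forcing it to one beforehand) can be modelled in a few lines of user-space C. `struct toy_inode` and `toy_drop_link()` are hypothetical names for illustration; the warning text is also made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical inode model with only the fields the sketch needs. */
struct toy_inode {
	unsigned long ino;
	unsigned int nlink;
};

/*
 * Drop one link after the directory entry has been removed.  Returns false
 * and leaves the count alone if it is already zero, which indicates a
 * corrupted link count on disk.
 */
bool toy_drop_link(struct toy_inode *inode)
{
	assert(inode != NULL);
	if (inode->nlink == 0) {
		fprintf(stderr, "inode %lu: deleting file with no links\n",
			inode->ino);
		return false;
	}
	inode->nlink--;
	return true;
}
```

Because the count is never raised artificially, a failure earlier in the unlink path cannot leave it elevated.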
    • ext4: Fix credit estimate for final inode freeing · 595a92a4
      Authored by Jan Kara
      commit 65db869c754e7c271691dd5feabf884347e694f5 upstream.
      
      The estimate for the number of credits needed for the final freeing of
      an inode in ext4_evict_inode() was too small.  We may modify 4 blocks
      (inode & sb for orphan deletion, bitmap & group descriptor for inode
      freeing) and not just 3.
      
      [ Fixed minor whitespace nit. -- TYT ]
      
      Fixes: e50e5129 ("ext4: xattr-in-inode support")
      CC: stable@vger.kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20191105164437.32602-6-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  6. 05 Dec, 2019 1 commit
  7. 24 Nov, 2019 1 commit
  8. 06 Nov, 2019 1 commit
  9. 08 Oct, 2019 1 commit
    • ext4: fix potential use after free after remounting with noblock_validity · 5b400fed
      Authored by zhangyi (F)
      [ Upstream commit 7727ae52975d4f4ef7ff69ed8e6e25f6a4168158 ]
      
      The remount process releases the previously allocated system zone if
      "noblock_validity" is specified.  If we mount an ext4 file system on
      two mountpoints with default mount options and then remount one of
      them with "noblock_validity", it may trigger a use-after-free when
      someone accesses the other one.
      
       # mount /dev/sda foo
       # mount /dev/sda bar
      
      User access mountpoint "foo"   |   Remount mountpoint "bar"
                                     |
      ext4_map_blocks()              |   ext4_remount()
      check_block_validity()         |   ext4_setup_system_zone()
      ext4_data_block_valid()        |   ext4_release_system_zone()
                                     |   free system_blks rb nodes
      access system_blks rb nodes    |
      trigger use after free         |
      
      This problem can also be reproduced with a single mountpoint.  In
      addition, add_system_zone() can be called during remount as well, so
      ext4_data_block_valid() may be reading the rbtree while it is being
      modified.
      
      This patch adds RCU to protect the system zone from being released or
      built while a remount inverts the current "noblock_validity" mount
      option.  It assigns the rbtree only after the whole tree is complete
      and does the actual freeing after an RCU grace period, avoiding any
      intermediate state.
      
      Reported-by: syzbot+1e470567330b7ad711d5@syzkaller.appspotmail.com
      Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
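The publish-then-free ordering described in this commit can be illustrated in user space with C11 atomics. This is a sketch under stated assumptions, not the kernel code: `struct zone_set`, `publish_zones()`, and `in_system_zone()` are hypothetical, a flat array replaces the rbtree, and the atomic exchange and acquire load only play roughly the role that rcu_assign_pointer() and rcu_dereference() play in the kernel. No grace period is implemented here; the caller is responsible for freeing the returned old set only once no reader can still be using it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_ZONES 4

/* Hypothetical flat replacement for the system zone rbtree. */
struct zone_set {
	int nzones;
	unsigned long start[MAX_ZONES];
	unsigned long count[MAX_ZONES];
};

/* Pointer that readers follow; NULL models "noblock_validity". */
static _Atomic(struct zone_set *) live_zones;

/*
 * Build the new set completely, then publish it in one step.  Returns the
 * previous set (possibly NULL), which must not be freed until all readers
 * that might have loaded it are done.
 */
struct zone_set *publish_zones(const unsigned long *start,
			       const unsigned long *count, int n)
{
	struct zone_set *fresh = NULL;

	assert(n >= 0 && n <= MAX_ZONES);
	if (n > 0) {
		fresh = calloc(1, sizeof(*fresh));
		if (fresh == NULL)
			abort();
		for (int i = 0; i < n; i++) {
			fresh->start[i] = start[i];
			fresh->count[i] = count[i];
		}
		fresh->nzones = n;
	}
	return atomic_exchange_explicit(&live_zones, fresh,
					memory_order_acq_rel);
}

/* Reader side: sees either the old complete set or the new complete set. */
bool in_system_zone(unsigned long blk)
{
	struct zone_set *z = atomic_load_explicit(&live_zones,
						  memory_order_acquire);

	if (z == NULL)
		return false;
	for (int i = 0; i < z->nzones; i++)
		if (blk >= z->start[i] && blk - z->start[i] < z->count[i])
			return true;
	return false;
}
```

Readers never observe a half-built set because the pointer is swapped only after the new set is fully initialized.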
  10. 05 Oct, 2019 2 commits
  11. 16 Sep, 2019 4 commits