1. 30 10月, 2019 5 次提交
    • E
      ext4: adjust reserved cluster count when removing extents · eb792dc6
      Eric Whitney 提交于
      commit 9fe671496b6c286f9033aedfc1718d67721da0ae upstream.
      
      Modify ext4_ext_remove_space() and the code it calls to correct the
      reserved cluster count for pending reservations (delayed allocated
      clusters shared with allocated blocks) when a block range is removed
      from the extent tree.  Pending reservations may be found for the clusters
      at the ends of written or unwritten extents when a block range is removed.
      If a physical cluster at the end of an extent is freed, it's necessary
      to increment the reserved cluster count to maintain correct accounting
      if the corresponding logical cluster is shared with at least one
      delayed and unwritten extent as found in the extents status tree.
      
      Add a new function, ext4_rereserve_cluster(), to reapply a reservation
      on a delayed allocated cluster sharing blocks with a freed allocated
      cluster.  To avoid ENOSPC on reservation, a flag is applied to
      ext4_free_blocks() to briefly defer updating the freeclusters counter
      when an allocated cluster is freed.  This prevents another thread
      from allocating the freed block before the reservation can be reapplied.
      
      Redefine the partial cluster object as a struct to carry more state
      information and to clarify the code using it.
      
      Adjust the conditional code structure in ext4_ext_remove_space to
      reduce the indentation level in the main body of the code to improve
      readability.
      Signed-off-by: NEric Whitney <enwlinux@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      eb792dc6
    • E
      ext4: reduce reserved cluster count by number of allocated clusters · 39ad46f5
      Eric Whitney 提交于
      commit b6bf9171ef5c37b66d446378ba63af5339a56a97 upstream.
      
      Ext4 does not always reduce the reserved cluster count by the number
      of clusters allocated when mapping a delayed extent.  It sometimes
      adds back one or more clusters after allocation if delalloc blocks
      adjacent to the range allocated by ext4_ext_map_blocks() share the
      clusters newly allocated for that range.  However, this overcounts
      the number of clusters needed to satisfy future mapping requests
      (holding one or more reservations for clusters that have already been
      allocated) and premature ENOSPC and quota failures, etc., result.
      
      Ext4 also does not reduce the reserved cluster count when allocating
      clusters for non-delayed allocated writes that have previously been
      reserved for delayed writes.  This also results in overcounts.
      
      To make it possible to handle reserved cluster accounting for
      fallocated regions in the same manner as used for other non-delayed
      writes, do the reserved cluster accounting for them at the time of
      allocation.  In the current code, this is only done later when a
      delayed extent sharing the fallocated region is finally mapped.
      
      Address comment correcting handling of unsigned long long constant
      from Jan Kara's review of RFC version of this patch.
      Signed-off-by: NEric Whitney <enwlinux@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      39ad46f5
    • E
      ext4: fix reserved cluster accounting at delayed write time · b834eb8a
      Eric Whitney 提交于
      commit 0b02f4c0d6d9e2c611dfbdd4317193e9dca740e6 upstream.
      
      The code in ext4_da_map_blocks sometimes reserves space for more
      delayed allocated clusters than it should, resulting in premature
      ENOSPC, exceeded quota, and inaccurate free space reporting.
      
      Fix this by checking for written and unwritten blocks shared in the
      same cluster with the newly delayed allocated block.  A cluster
      reservation should not be made for a cluster for which physical space
      has already been allocated.
      Signed-off-by: NEric Whitney <enwlinux@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      b834eb8a
    • E
      ext4: add new pending reservation mechanism · a7fecf4b
      Eric Whitney 提交于
      commit 1dc0aa46e74a3366e12f426b7caaca477853e9c3 upstream.
      
      Add new pending reservation mechanism to help manage reserved cluster
      accounting.  Its primary function is to avoid the need to read extents
      from the disk when invalidating pages as a result of a truncate, punch
      hole, or collapse range operation.
      Signed-off-by: NEric Whitney <enwlinux@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      a7fecf4b
    • E
      ext4: generalize extents status tree search functions · 05cc8939
      Eric Whitney 提交于
      commit ad431025aecda85d3ebef5e4a3aca5c1c681d0c7 upstream.
      
      Ext4 contains a few functions that are used to search for delayed
      extents or blocks in the extents status tree.  Rather than duplicate
      code to add new functions to search for extents with different status
      values, such as written or a combination of delayed and unwritten,
      generalize the existing code to search for caller-specified extents
      status values.  Also, move this code into extents_status.c where it
      is better associated with the data structures it operates upon, and
      where it can be more readily used to implement new extents status tree
      functions that might want a broader scope for i_es_lock.
      
      Three missing static specifiers in RFC version of patch reported and
      fixed by Fengguang Wu <fengguang.wu@intel.com>.
      Signed-off-by: NEric Whitney <enwlinux@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      05cc8939
  2. 08 10月, 2019 1 次提交
    • Z
      ext4: fix potential use after free after remounting with noblock_validity · 5b400fed
      zhangyi (F) 提交于
      [ Upstream commit 7727ae52975d4f4ef7ff69ed8e6e25f6a4168158 ]
      
      Remount process will release system zone which was allocated before if
      "noblock_validity" is specified. If we mount an ext4 file system to two
      mountpoints with default mount options, and then remount one of them
      with "noblock_validity", it may trigger a use after free problem when
      someone accessing the other one.
      
       # mount /dev/sda foo
       # mount /dev/sda bar
      
      User access mountpoint "foo"   |   Remount mountpoint "bar"
                                     |
      ext4_map_blocks()              |   ext4_remount()
      check_block_validity()         |   ext4_setup_system_zone()
      ext4_data_block_valid()        |   ext4_release_system_zone()
                                     |   free system_blks rb nodes
      access system_blks rb nodes    |
      trigger use after free         |
      
      This problem can also be reproduced by one mountpint, At the same time,
      add_system_zone() can get called during remount as well so there can be
      racing ext4_data_block_valid() reading the rbtree at the same time.
      
      This patch add RCU to protect system zone from releasing or building
      when doing a remount which inverse current "noblock_validity" mount
      option. It assign the rbtree after the whole tree was complete and
      do actual freeing after rcu grace period, avoid any intermediate state.
      
      Reported-by: syzbot+1e470567330b7ad711d5@syzkaller.appspotmail.com
      Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      5b400fed
  3. 05 10月, 2019 2 次提交
  4. 16 9月, 2019 4 次提交
  5. 28 7月, 2019 4 次提交
  6. 31 5月, 2019 2 次提交
  7. 22 5月, 2019 12 次提交
  8. 02 5月, 2019 1 次提交
  9. 20 4月, 2019 4 次提交
  10. 06 4月, 2019 1 次提交
  11. 27 3月, 2019 3 次提交
    • Z
      ext4: brelse all indirect buffer in ext4_ind_remove_space() · d12d8641
      zhangyi (F) 提交于
      commit 674a2b27234d1b7afcb0a9162e81b2e53aeef217 upstream.
      
      All indirect buffers get by ext4_find_shared() should be released no
      mater the branch should be freed or not. But now, we forget to release
      the lower depth indirect buffers when removing space from the same
      higher depth indirect block. It will lead to buffer leak and futher
      more, it may lead to quota information corruption when using old quota,
      consider the following case.
      
       - Create and mount an empty ext4 filesystem without extent and quota
         features,
       - quotacheck and enable the user & group quota,
       - Create some files and write some data to them, and then punch hole
         to some files of them, it may trigger the buffer leak problem
         mentioned above.
       - Disable quota and run quotacheck again, it will create two new
         aquota files and write the checked quota information to them, which
         probably may reuse the freed indirect block(the buffer and page
         cache was not freed) as data block.
       - Enable quota again, it will invoke
         vfs_load_quota_inode()->invalidate_bdev() to try to clean unused
         buffers and pagecache. Unfortunately, because of the buffer of quota
         data block is still referenced, quota code cannot read the up to date
         quota info from the device and lead to quota information corruption.
      
      This problem can be reproduced by xfstests generic/231 on ext3 file
      system or ext4 file system without extent and quota features.
      
      This patch fix this problem by releasing the missing indirect buffers,
      in ext4_ind_remove_space().
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: stable@kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d12d8641
    • L
      ext4: fix data corruption caused by unaligned direct AIO · 76c9ee6b
      Lukas Czerner 提交于
      commit 372a03e01853f860560eade508794dd274e9b390 upstream.
      
      Ext4 needs to serialize unaligned direct AIO because the zeroing of
      partial blocks of two competing unaligned AIOs can result in data
      corruption.
      
      However it decides not to serialize if the potentially unaligned aio is
      past i_size with the rationale that no pending writes are possible past
      i_size. Unfortunately if the i_size is not block aligned and the second
      unaligned write lands past i_size, but still into the same block, it has
      the potential of corrupting the previous unaligned write to the same
      block.
      
      This is (very simplified) reproducer from Frank
      
          // 41472 = (10 * 4096) + 512
          // 37376 = 41472 - 4096
      
          ftruncate(fd, 41472);
          io_prep_pwrite(iocbs[0], fd, buf[0], 4096, 37376);
          io_prep_pwrite(iocbs[1], fd, buf[1], 4096, 41472);
      
          io_submit(io_ctx, 1, &iocbs[1]);
          io_submit(io_ctx, 1, &iocbs[2]);
      
          io_getevents(io_ctx, 2, 2, events, NULL);
      
      Without this patch the 512B range from 40960 up to the start of the
      second unaligned write (41472) is going to be zeroed overwriting the data
      written by the first write. This is a data corruption.
      
      00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
      *
      00009200  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30
      *
      0000a000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
      *
      0000a200  31 31 31 31 31 31 31 31  31 31 31 31 31 31 31 31
      
      With this patch the data corruption is avoided because we will recognize
      the unaligned_aio and wait for the unwritten extent conversion.
      
      00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
      *
      00009200  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30
      *
      0000a200  31 31 31 31 31 31 31 31  31 31 31 31 31 31 31 31
      *
      0000b200
      Reported-by: NFrank Sorenson <fsorenso@redhat.com>
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Fixes: e9e3bcec ("ext4: serialize unaligned asynchronous DIO")
      Cc: stable@vger.kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      76c9ee6b
    • J
      ext4: fix NULL pointer dereference while journal is aborted · 558331d0
      Jiufei Xue 提交于
      commit fa30dde38aa8628c73a6dded7cb0bba38c27b576 upstream.
      
      We see the following NULL pointer dereference while running xfstests
      generic/475:
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      PGD 8000000c84bad067 P4D 8000000c84bad067 PUD c84e62067 PMD 0
      Oops: 0000 [#1] SMP PTI
      CPU: 7 PID: 9886 Comm: fsstress Kdump: loaded Not tainted 5.0.0-rc8 #10
      RIP: 0010:ext4_do_update_inode+0x4ec/0x760
      ...
      Call Trace:
      ? jbd2_journal_get_write_access+0x42/0x50
      ? __ext4_journal_get_write_access+0x2c/0x70
      ? ext4_truncate+0x186/0x3f0
      ext4_mark_iloc_dirty+0x61/0x80
      ext4_mark_inode_dirty+0x62/0x1b0
      ext4_truncate+0x186/0x3f0
      ? unmap_mapping_pages+0x56/0x100
      ext4_setattr+0x817/0x8b0
      notify_change+0x1df/0x430
      do_truncate+0x5e/0x90
      ? generic_permission+0x12b/0x1a0
      
      This is triggered because the NULL pointer handle->h_transaction was
      dereferenced in function ext4_update_inode_fsync_trans().
      I found that the h_transaction was set to NULL in jbd2__journal_restart
      but failed to attached to a new transaction while the journal is aborted.
      
      Fix this by checking the handle before updating the inode.
      
      Fixes: b436b9be ("ext4: Wait for proper transaction commit on fsync")
      Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Cc: stable@kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      558331d0
  12. 24 3月, 2019 1 次提交
    • J
      ext4: fix crash during online resizing · 8a4fdc64
      Jan Kara 提交于
      commit f96c3ac8dfc24b4e38fc4c2eba5fea2107b929d1 upstream.
      
      When computing maximum size of filesystem possible with given number of
      group descriptor blocks, we forget to include s_first_data_block into
      the number of blocks. Thus for filesystems with non-zero
      s_first_data_block it can happen that computed maximum filesystem size
      is actually lower than current filesystem size which confuses the code
      and eventually leads to a BUG_ON in ext4_alloc_group_tables() hitting on
      flex_gd->count == 0. The problem can be reproduced like:
      
      truncate -s 100g /tmp/image
      mkfs.ext4 -b 1024 -E resize=262144 /tmp/image 32768
      mount -t ext4 -o loop /tmp/image /mnt
      resize2fs /dev/loop0 262145
      resize2fs /dev/loop0 300000
      
      Fix the problem by properly including s_first_data_block into the
      computed number of filesystem blocks.
      
      Fixes: 1c6bd717 "ext4: convert file system to meta_bg if needed..."
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8a4fdc64