1. 30 6月, 2016 1 次提交
    • V
      ext4: check for extents that wrap around · f70749ca
      Vegard Nossum 提交于
      An extent with lblock = 4294967295 and len = 1 will pass the
      ext4_valid_extent() test:
      
      	ext4_lblk_t last = lblock + len - 1;
      
      	if (len == 0 || lblock > last)
      		return 0;
      
      since last = 4294967295 + 1 - 1 = 4294967295. This would later trigger
      the BUG_ON(es->es_lblk + es->es_len < es->es_lblk) in ext4_es_end().
      
      We can simplify it by removing the - 1 altogether and changing the test
      to use lblock + len <= lblock, since now if len = 0, then lblock + 0 ==
      lblock and it fails, and if len > 0 then lblock + len > lblock in order
      to pass (i.e. it doesn't overflow).
      
      Fixes: 5946d089 ("ext4: check for overlapping extents in ext4_valid_extent_entries()")
      Fixes: 2f974865 ("ext4: check for zero length extent explicitly")
      Cc: Eryu Guan <guaneryu@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPhil Turnbull <phil.turnbull@oracle.com>
      Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      f70749ca
  2. 06 5月, 2016 1 次提交
  3. 27 4月, 2016 1 次提交
  4. 26 4月, 2016 1 次提交
    • T
      ext4: fix jbd2 handle extension in ext4_ext_truncate_extend_restart() · 7b808191
      Theodore Ts'o 提交于
      The function jbd2_journal_extend() takes as its argument the number of
      new credits to be added to the handle.  We weren't taking into account
      the currently unused handle credits; worse, we would try to extend the
      handle by N credits when it had N credits available.
      
      In the case where jbd2_journal_extend() fails because the transaction
      is too large, when jbd2_journal_restart() gets called, the N credits
      owned by the handle gets returned to the transaction, and the
      transaction commit is asynchronously requested, and then
      start_this_handle() will be able to successfully attach the handle to
      the current transaction since the required credits are now available.
      
      This is mostly harmless, but since ext4_ext_truncate_extend_restart()
      returns EAGAIN, the truncate machinery will once again try to call
      ext4_ext_truncate_extend_restart(), which will do the above sequence
      over and over again until the transaction has committed.
      
      This was found while I was debugging a lockup in caused by running
      xfstests generic/074 in the data=journal case.  I'm still not sure why
      we ended up looping forever, which suggests there may still be another
      bug hiding in the transaction accounting machinery, but this commit
      prevents us from looping in the first place.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      7b808191
  5. 10 3月, 2016 3 次提交
  6. 09 3月, 2016 1 次提交
    • J
      ext4: simplify io_end handling for AIO DIO · 109811c2
      Jan Kara 提交于
      When mapping blocks for direct IO, we allocate io_end structure before
      mapping blocks and store pointer to it in the inode. This creates a
      requirement that any AIO DIO using io_end must be protected by i_mutex.
      This created problems in the past with dioread_nolock mode which was
      corrupting io_end pointers. Also io_end is allocated unnecessarily in
      case where we don't need to convert any extents (which is a common case
      for example when overwriting file).
      
      We fix the problem by allocating io_end only once we return unwritten
      extent from block mapping function for AIO DIO (so we can save some
      pointless io_end allocations) and we pass pointer to it in bh->b_private
      which generic DIO code later passes to our end IO callback. That way we
      remove any need for global pointer to io_end structure and thus fix the
      races.
      
      The downside of this change is that the checking for unwritten IO in
      flight in ext4_extents_can_be_merged() is more racy since we now
      increment i_unwritten / set EXT4_STATE_DIO_UNWRITTEN only after dropping
      i_data_sem. However the check has been racy already before because
      ext4_writepages() already increment i_unwritten after dropping
      i_data_sem and reserved blocks save us from hitting ENOSPC in the worst
      case.
      Signed-off-by: NJan Kara <jack@suse.cz>
      109811c2
  7. 23 2月, 2016 1 次提交
  8. 12 2月, 2016 1 次提交
  9. 23 1月, 2016 1 次提交
    • A
      wrappers for ->i_mutex access · 5955102c
      Al Viro 提交于
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5955102c
  10. 08 12月, 2015 6 次提交
    • J
      ext4: implement allocation of pre-zeroed blocks · c86d8db3
      Jan Kara 提交于
      DAX page fault path needs to get blocks that are pre-zeroed to avoid
      races when two concurrent page faults happen in the same block of a
      file. Implement support for this in ext4_map_blocks().
      Signed-off-by: NJan Kara <jack@suse.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      c86d8db3
    • J
      ext4: provide ext4_issue_zeroout() · 53085fac
      Jan Kara 提交于
      Create new function ext4_issue_zeroout() to zeroout contiguous (both
      logically and physically) part of inode data. We will need to issue
      zeroout when extent structure is not readily available and this function
      will allow us to do it without making up fake extent structures.
      Signed-off-by: NJan Kara <jack@suse.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      53085fac
    • J
      ext4: fix races of writeback with punch hole and zero range · 01127848
      Jan Kara 提交于
      When doing delayed allocation, update of on-disk inode size is postponed
      until IO submission time. However hole punch or zero range fallocate
      calls can end up discarding the tail page cache page and thus on-disk
      inode size would never be properly updated.
      
      Make sure the on-disk inode size is updated before truncating page
      cache.
      Signed-off-by: NJan Kara <jack@suse.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      01127848
    • J
      ext4: fix races between buffered IO and collapse / insert range · 32ebffd3
      Jan Kara 提交于
      Current code implementing FALLOC_FL_COLLAPSE_RANGE and
      FALLOC_FL_INSERT_RANGE is prone to races with buffered writes and page
      faults. If buffered write or write via mmap manages to squeeze between
      filemap_write_and_wait_range() and truncate_pagecache() in the fallocate
      implementations, the written data is simply discarded by
      truncate_pagecache() although it should have been shifted.
      
      Fix the problem by moving filemap_write_and_wait_range() call inside
      i_mutex and i_mmap_sem. That way we are protected against races with
      both buffered writes and page faults.
      Signed-off-by: NJan Kara <jack@suse.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      32ebffd3
    • J
      ext4: move unlocked dio protection from ext4_alloc_file_blocks() · 17048e8a
      Jan Kara 提交于
      Currently ext4_alloc_file_blocks() was handling protection against
      unlocked DIO. However we now need to sometimes call it under i_mmap_sem
      and sometimes not and DIO protection ranks above it (although strictly
      speaking this cannot currently create any deadlocks). Also
      ext4_zero_range() was actually getting & releasing unlocked DIO
      protection twice in some cases. Luckily it didn't introduce any real bug
      but it was a land mine waiting to be stepped on.  So move DIO protection
      out from ext4_alloc_file_blocks() into the two callsites.
      Signed-off-by: NJan Kara <jack@suse.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      17048e8a
    • J
      ext4: fix races between page faults and hole punching · ea3d7209
      Jan Kara 提交于
      Currently, page faults and hole punching are completely unsynchronized.
      This can result in page fault faulting in a page into a range that we
      are punching after truncate_pagecache_range() has been called and thus
      we can end up with a page mapped to disk blocks that will be shortly
      freed. Filesystem corruption will shortly follow. Note that the same
      race is avoided for truncate by checking page fault offset against
      i_size but there isn't similar mechanism available for punching holes.
      
      Fix the problem by creating new rw semaphore i_mmap_sem in inode and
      grab it for writing over truncate, hole punching, and other functions
      removing blocks from extent tree and for read over page faults. We
      cannot easily use i_data_sem for this since that ranks below transaction
      start and we need something ranking above it so that it can be held over
      the whole truncate / hole punching operation. Also remove various
      workarounds we had in the code to reduce race window when page fault
      could have created pages with stale mapping information.
      Signed-off-by: NJan Kara <jack@suse.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      ea3d7209
  11. 18 10月, 2015 2 次提交
  12. 03 10月, 2015 1 次提交
    • T
      ext4 crypto: fix bugs in ext4_encrypted_zeroout() · 36086d43
      Theodore Ts'o 提交于
      Fix multiple bugs in ext4_encrypted_zeroout(), including one that
      could cause us to write an encrypted zero page to the wrong location
      on disk, potentially causing data and file system corruption.
      Fortunately, this tends to only show up in stress tests, but even with
      these fixes, we are seeing some test failures with generic/127 --- but
      these are now caused by data failures instead of metadata corruption.
      
      Since ext4_encrypted_zeroout() is only used for some optimizations to
      keep the extent tree from being too fragmented, and
      ext4_encrypted_zeroout() itself isn't all that optimized from a time
      or IOPS perspective, disable the extent tree optimization for
      encrypted inodes for now.  This prevents the data corruption issues
      reported by generic/127 until we can figure out what's going wrong.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      36086d43
  13. 29 9月, 2015 1 次提交
  14. 02 7月, 2015 1 次提交
  15. 21 6月, 2015 1 次提交
    • T
      ext4: prevent ext4_quota_write() from failing due to ENOSPC · c5e298ae
      Theodore Ts'o 提交于
      In order to prevent quota block tracking to be inaccurate when
      ext4_quota_write() fails with ENOSPC, we make two changes.  The quota
      file can now use the reserved block (since the quota file is arguably
      file system metadata), and ext4_quota_write() now uses
      ext4_should_retry_alloc() to retry the block allocation after a commit
      has completed and released some blocks for allocation.
      
      This fixes failures of xfstests generic/270:
      
      Quota error (device vdc): write_blk: dquota write failed
      Quota error (device vdc): qtree_write_dquot: Error -28 occurred while creating quota
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      c5e298ae
  16. 15 6月, 2015 2 次提交
    • L
      ext4: wait for existing dio workers in ext4_alloc_file_blocks() · 0d306dcf
      Lukas Czerner 提交于
      Currently existing dio workers can jump in and potentially increase
      extent tree depth while we're allocating blocks in
      ext4_alloc_file_blocks().  This may cause us to underestimate the
      number of credits needed for the transaction because the extent tree
      depth can change after our estimation.
      
      Fix this by waiting for all the existing dio workers in the same way
      as we do it in ext4_punch_hole.  We've seen errors caused by this in
      xfstest generic/299, however it's really hard to reproduce.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      0d306dcf
    • L
      ext4: recalculate journal credits as inode depth changes · 4134f5c8
      Lukas Czerner 提交于
      Currently in ext4_alloc_file_blocks() the number of credits is
      calculated only once before we enter the allocation loop. However within
      the allocation loop the extent tree depth can change, hence the number
      of credits needed can increase potentially exceeding the number of credits
      reserved in the handle which can cause journal failures.
      
      Fix this by recalculating number of credits when the inode depth
      changes. Note that even though ext4_alloc_file_blocks() is only
      currently used by extent base inodes we will avoid recalculating number
      of credits unnecessarily in the case of indirect based inodes.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      4134f5c8
  17. 09 6月, 2015 1 次提交
    • N
      ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate · 331573fe
      Namjae Jeon 提交于
      This patch implements fallocate's FALLOC_FL_INSERT_RANGE for Ext4.
      
      1) Make sure that both offset and len are block size aligned.
      2) Update the i_size of inode by len bytes.
      3) Compute the file's logical block number against offset. If the computed
         block number is not the starting block of the extent, split the extent
         such that the block number is the starting block of the extent.
      4) Shift all the extents which are lying between [offset, last allocated extent]
         towards right by len bytes. This step will make a hole of len bytes
         at offset.
      Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: NAshish Sangwan <a.sangwan@samsung.com>
      331573fe
  18. 08 6月, 2015 1 次提交
  19. 02 6月, 2015 1 次提交
    • T
      writeback: separate out include/linux/backing-dev-defs.h · 66114cad
      Tejun Heo 提交于
      With the planned cgroup writeback support, backing-dev related
      declarations will be more widely used across block and cgroup;
      unfortunately, including backing-dev.h from include/linux/blkdev.h
      makes cyclic include dependency quite likely.
      
      This patch separates out backing-dev-defs.h which only has the
      essential definitions and updates blkdev.h to include it.  c files
      which need access to more backing-dev details now include
      backing-dev.h directly.  This takes backing-dev.h off the common
      include dependency chain making it a lot easier to use it across block
      and cgroup.
      
      v2: fs/fat build failure fixed.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      66114cad
  20. 15 5月, 2015 2 次提交
  21. 03 5月, 2015 1 次提交
  22. 12 4月, 2015 1 次提交
  23. 03 4月, 2015 4 次提交
    • E
      ext4: don't release reserved space for previously allocated cluster · 9d21c9fa
      Eric Whitney 提交于
      When xfstests' auto group is run on a bigalloc filesystem with a
      4.0-rc3 kernel, e2fsck failures and kernel warnings occur for some
      tests. e2fsck reports incorrect iblocks values, and the warnings
      indicate that the space reserved for delayed allocation is being
      overdrawn at allocation time.
      
      Some of these errors occur because the reserved space is incorrectly
      decreased by one cluster when ext4_ext_map_blocks satisfies an
      allocation request by mapping an unused portion of a previously
      allocated cluster.  Because a cluster's worth of reserved space was
      already released when it was first allocated, it should not be released
      again.
      
      This patch appears to correct the e2fsck failure reported for
      generic/232 and the kernel warnings produced by ext4/001, generic/009,
      and generic/033.  Failures and warnings for some other tests remain to
      be addressed.
      Signed-off-by: NEric Whitney <enwlinux@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      9d21c9fa
    • E
      ext4: fix loss of delalloc extent info in ext4_zero_range() · 94426f4b
      Eric Whitney 提交于
      In ext4_zero_range(), removing a file's entire block range from the
      extent status tree removes all records of that file's delalloc extents.
      The delalloc accounting code uses this information, and its loss can
      then lead to accounting errors and kernel warnings at writeback time and
      subsequent file system damage.  This is most noticeable on bigalloc
      file systems where code in ext4_ext_map_blocks() handles cases where
      delalloc extents share clusters with a newly allocated extent.
      
      Because we're not deleting a block range and are correctly updating the
      status of its associated extent, there is no need to remove anything
      from the extent status tree.
      
      When this patch is combined with an unrelated bug fix for
      ext4_zero_range(), kernel warnings and e2fsck errors reported during
      xfstests runs on bigalloc filesystems are greatly reduced without
      introducing regressions on other xfstests-bld test scenarios.
      Signed-off-by: NEric Whitney <enwlinux@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      94426f4b
    • L
      ext4: allocate entire range in zero range · 0f2af21a
      Lukas Czerner 提交于
      Currently there is a bug in zero range code which causes zero range
      calls to only allocate block aligned portion of the range, while
      ignoring the rest in some cases.
      
      In some cases, namely if the end of the range is past i_size, we do
      attempt to preallocate the last nonaligned block. However this might
      cause kernel to BUG() in some carefully designed zero range requests
      on setups where page size > block size.
      
      Fix this problem by first preallocating the entire range, including
      the nonaligned edges and converting the written extents to unwritten
      in the next step. This approach will also give us the advantage of
      having the range to be as linearly contiguous as possible.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      0f2af21a
    • X
      ext4: fix comments in ext4_can_extents_be_merged() · 4255c224
      Xiaoguang Wang 提交于
      Since commit a9b82415, we are allowed to merge unwritten extents,
      so here these comments are wrong, remove it.
      Signed-off-by: NXiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      4255c224
  24. 03 1月, 2015 1 次提交
  25. 03 12月, 2014 2 次提交
  26. 26 11月, 2014 1 次提交