1. 08 1月, 2016 1 次提交
    • D
      xfs: bmapbt checking on debug kernels too expensive · e3543819
      Dave Chinner 提交于
      For large sparse or fragmented files, checking every single entry in
      the bmapbt on every operation is prohibitively expensive. Especially
      as such checks rarely discover problems during normal operations on
      high extent coutn files. Our regression tests don't tend to exercise
      files with hundreds of thousands to millions of extents, so mostly
      this isn't noticed.
      
      However, trying to run things like xfs_mdrestore of large filesystem
      dumps on a debug kernel quickly becomes impossible as the CPU is
      completely burnt up repeatedly walking the sparse file bmapbt that
      is generated for every allocation that is made.
      
      Hence, if the file has more than 10,000 extents, just don't bother
      with walking the tree to check it exhaustively. The btree code has
      checks that ensure that the newly inserted/removed/modified record
      is correctly ordered, so the entrie tree walk in thses cases has
      limited additional value.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      e3543819
  2. 04 1月, 2016 2 次提交
  3. 03 11月, 2015 1 次提交
    • D
      xfs: introduce BMAPI_ZERO for allocating zeroed extents · 3fbbbea3
      Dave Chinner 提交于
      To enable DAX to do atomic allocation of zeroed extents, we need to
      drive the block zeroing deep into the allocator. Because
      xfs_bmapi_write() can return merged extents on allocation that were
      only partially allocated (i.e. requested range spans allocated and
      hole regions, allocation into the hole was contiguous), we cannot
      zero the extent returned from xfs_bmapi_write() as that can
      overwrite existing data with zeros.
      
      Hence we have to drive the extent zeroing into the allocation code,
      prior to where we merge the extents into the BMBT and return the
      resultant map. This means we need to propagate this need down to
      the xfs_alloc_vextent() and issue the block zeroing at this point.
      
      While this functionality is being introduced for DAX, there is no
      reason why it is specific to DAX - we can per-zero blocks during the
      allocation transaction on any type of device. It's just slow (and
      usually slower than unwritten allocation and conversion) on
      traditional block devices so doesn't tend to get used. We can,
      however, hook hardware zeroing optimisations via sb_issue_zeroout()
      to this operation, so it may be useful in future and hence the
      "allocate zeroed blocks" API needs to be implementation neutral.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      3fbbbea3
  4. 12 10月, 2015 2 次提交
    • B
      xfs: per-filesystem stats counter implementation · ff6d6af2
      Bill O'Donnell 提交于
      This patch modifies the stats counting macros and the callers
      to those macros to properly increment, decrement, and add-to
      the xfs stats counts. The counts for global and per-fs stats
      are correctly advanced, and cleared by writing a "1" to the
      corresponding clear file.
      
      global counts: /sys/fs/xfs/stats/stats
      per-fs counts: /sys/fs/xfs/sda*/stats/stats
      
      global clear:  /sys/fs/xfs/stats/stats_clear
      per-fs clear:  /sys/fs/xfs/sda*/stats/stats_clear
      
      [dchinner: cleaned up macro variables, removed CONFIG_FS_PROC around
       stats structures and macros. ]
      Signed-off-by: NBill O'Donnell <billodo@redhat.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      ff6d6af2
    • B
      xfs: log local to remote symlink conversions correctly on v5 supers · b7cdc66b
      Brian Foster 提交于
      A local format symlink inode is converted to extent format when an
      extended attribute is set on an inode as part of the attribute fork
      creation. This means a block is allocated, the local symlink target name
      is copied to the block and the block is logged. Currently,
      xfs_bmap_local_to_extents() handles logging the remote block data based
      on the size of the data fork prior to the conversion. This is not
      correct on v5 superblock filesystems, which add an additional header to
      remote symlink blocks that is nonexistent in local format inodes.
      
      As a result, the full length of the remote symlink block content is not
      logged. This can lead to corruption should a crash occur and log
      recovery replay this transaction.
      
      Since a callout is already used to initialize the new remote symlink
      block, update the local-to-extents conversion mechanism to make the
      callout also responsible for logging the block. It is already required
      to set the log buffer type and format the block appropriately based on
      the superblock version. This ensures the remote symlink is always logged
      correctly. Note that xfs_bmap_local_to_extents() is only called for
      symlinks so there are no other callouts that require modification.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      b7cdc66b
  5. 19 8月, 2015 1 次提交
  6. 22 6月, 2015 2 次提交
    • D
      xfs: clean up XFS_MIN_FREELIST macros · 496817b4
      Dave Chinner 提交于
      We no longer calculate the minimum freelist size from the on-disk
      AGF, so we don't need the macros used for this. That means the
      nested macros can be cleaned up, and turn this into an actual
      function so the logic is clear and concise. This will make it much
      easier to add support for the rmap btree when the time comes.
      
      This also gets rid of the XFS_AG_MAXLEVELS macro used by these
      freelist macros as it is simply a wrapper around a single variable.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      496817b4
    • D
      xfs: xfs_alloc_fix_freelist() can use incore perag structures · 50adbcb4
      Dave Chinner 提交于
      At the moment, xfs_alloc_fix_freelist() uses a mix of per-ag based
      access and agf buffer  based access to freelist and space usage
      information. However, once the AGF buffer is locked inside this
      function, it is guaranteed that both the in-memory and on-disk
      values are identical. xfs_alloc_fix_freelist() doesn't modify the
      values in the structures directly, so it is a read-only user of the
      infomration, and hence can use the per-ag structure exclusively for
      determining what it should do.
      
      This opens up an avenue for cleaning up a lot of duplicated logic
      whose only difference is the structure it gets the data from, and in
      doing so removes a lot of needless byte swapping overhead when
      fixing up the free list.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      50adbcb4
  7. 04 6月, 2015 2 次提交
    • C
      xfs: saner xfs_trans_commit interface · 70393313
      Christoph Hellwig 提交于
      The flags argument to xfs_trans_commit is not useful for most callers, as
      a commit of a transaction without a permanent log reservation must pass
      0 here, and all callers for a transaction with a permanent log reservation
      except for xfs_trans_roll must pass XFS_TRANS_RELEASE_LOG_RES.  So remove
      the flags argument from the public xfs_trans_commit interfaces, and
      introduce low-level __xfs_trans_commit variant just for xfs_trans_roll
      that regrants a log reservation instead of releasing it.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      70393313
    • C
      xfs: remove the flags argument to xfs_trans_cancel · 4906e215
      Christoph Hellwig 提交于
      xfs_trans_cancel takes two flags arguments: XFS_TRANS_RELEASE_LOG_RES and
      XFS_TRANS_ABORT.  Both of them are a direct product of the transaction
      state, and can be deducted:
      
       - any dirty transaction needs XFS_TRANS_ABORT to be properly canceled,
         and XFS_TRANS_ABORT is a noop for a transaction that is not dirty.
       - any transaction with a permanent log reservation needs
         XFS_TRANS_RELEASE_LOG_RES to be properly canceled, and passing
         XFS_TRANS_RELEASE_LOG_RES for a transaction without a permanent
         log reservation is invalid.
      
      So just remove the flags argument and do the right thing.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      4906e215
  8. 01 6月, 2015 1 次提交
    • B
      xfs: always log the inode on unwritten extent conversion · 2e588a46
      Brian Foster 提交于
      The fsync() requirements for crash consistency on XFS are to flush file
      data and force any in-core inode updates to the log. We currently check
      whether the inode is pinned to identify whether the log needs to be
      forced, since a non-zero pin count generally represents an inode that
      has transactions awaiting a flush to the on-disk log.
      
      This is not sufficient in all cases, however. Reports of xfstests test
      generic/311 failures on ppc64/s390x hosts have identified failures to
      fsync outstanding inode modifications due to the inode not being pinned
      at the time of the fsync. This occurs because certain bmap updates can
      complete by logging bmapbt buffers but without ever dirtying (and thus
      pinning) the core inode. The following is a specific incarnation of this
      problem:
      
      $ mount $dev /mnt -o noatime,nobarrier
      $ for i in $(seq 0 2 31); do \
              xfs_io -f -c "falloc $((i * 32768)) 32k" -c fsync /mnt/file; \
      	done
      $ xfs_io -c "pwrite -S 0 80k 16k" -c fsync -c "pwrite 76k 4k" -c fsync /mnt/file; \
      	hexdump /mnt/file; \
      	./xfstests-dev/src/godown /mnt
      ...
      0000000 0000 0000 0000 0000 0000 0000 0000 0000
      *
      0013000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
      *
      0014000 0000 0000 0000 0000 0000 0000 0000 0000
      *
      00f8000
      $ umount /mnt; mount ...
      $ hexdump /mnt/file
      0000000 0000 0000 0000 0000 0000 0000 0000 0000
      *
      00f8000
      
      In short, the unwritten extent conversion for the last write is lost
      despite the fact that an fsync executed before the filesystem was
      shutdown. Note that this is impossible to reproduce on v5 supers due to
      unconditional time callbacks for di_changecount and highly difficult to
      reproduce on CONFIG_HZ=1000 kernels due to those same callbacks
      frequently updating cmtime prior to the bmap update. CONFIG_HZ=100
      reduces timer granularity enough to increase the odds that time updates
      are skipped and allows this to reproduce within a handful of attempts.
      
      To deal with this problem, unconditionally log the core in the unwritten
      extent conversion path. Fix up logflags after the extent conversion to
      keep the extent update code consistent with the other extent update
      helpers. This fixup is not necessary for the other (hole, delay) extent
      helpers because they execute in the block allocation codepath, which
      already logs the inode for other reasons (e.g., for di_nblocks).
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      2e588a46
  9. 29 5月, 2015 1 次提交
    • D
      xfs: extent size hints can round up extents past MAXEXTLEN · 6dea405e
      Dave Chinner 提交于
      This results in BMBT corruption, as seen by this test:
      
      # mkfs.xfs -f -d size=40051712b,agcount=4 /dev/vdc
      ....
      # mount /dev/vdc /mnt/scratch
      # xfs_io -ft -c "extsize 16m" -c "falloc 0 30g" -c "bmap -vp" /mnt/scratch/foo
      
      which results in this failure on a debug kernel:
      
      XFS: Assertion failed: (blockcount & xfs_mask64hi(64-BMBT_BLOCKCOUNT_BITLEN)) == 0, file: fs/xfs/libxfs/xfs_bmap_btree.c, line: 211
      ....
      Call Trace:
       [<ffffffff814cf0ff>] xfs_bmbt_set_allf+0x8f/0x100
       [<ffffffff814cf18d>] xfs_bmbt_set_all+0x1d/0x20
       [<ffffffff814f2efe>] xfs_iext_insert+0x9e/0x120
       [<ffffffff814c7956>] ? xfs_bmap_add_extent_hole_real+0x1c6/0xc70
       [<ffffffff814c7956>] xfs_bmap_add_extent_hole_real+0x1c6/0xc70
       [<ffffffff814caaab>] xfs_bmapi_write+0x72b/0xed0
       [<ffffffff811c72ac>] ? kmem_cache_alloc+0x15c/0x170
       [<ffffffff814fe070>] xfs_alloc_file_space+0x160/0x400
       [<ffffffff81ddcc29>] ? down_write+0x29/0x60
       [<ffffffff815063eb>] xfs_file_fallocate+0x29b/0x310
       [<ffffffff811d2bc8>] ? __sb_start_write+0x58/0x120
       [<ffffffff811e3e18>] ? do_vfs_ioctl+0x318/0x570
       [<ffffffff811cd680>] vfs_fallocate+0x140/0x260
       [<ffffffff811ce6f8>] SyS_fallocate+0x48/0x80
       [<ffffffff81ddec09>] system_call_fastpath+0x12/0x17
      
      The tracepoint that indicates the extent that triggered the assert
      failure is:
      
      xfs_iext_insert:   idx 0 offset 0 block 16777224 count 2097152 flag 1
      
      Clearly indicating that the extent length is greater than MAXEXTLEN,
      which is 2097151. A prior trace point shows the allocation was an
      exact size match and that a length greater than MAXEXTLEN was asked
      for:
      
      xfs_alloc_size_done:  agno 1 agbno 8 minlen 2097152 maxlen 2097152
      					    ^^^^^^^        ^^^^^^^
      
      We don't see this problem with extent size hints through the IO path
      because we can't do single IOs large enough to trigger MAXEXTLEN
      allocation. fallocate(), OTOH, is not limited in it's allocation
      sizes and so needs help here.
      
      The issue is that the extent size hint alignment is rounding up the
      extent size past MAXEXTLEN, because xfs_bmapi_write() is not taking
      into account extent size hints when calculating the maximum extent
      length to allocate. xfs_bmapi_reserve_delalloc() is already doing
      this, but direct extent allocation is not.
      
      Unfortunately, the calculation in xfs_bmapi_reserve_delalloc() is
      wrong, and it works only because delayed allocation extents are not
      limited in size to MAXEXTLEN in the in-core extent tree. hence this
      calculation does not work for direct allocation, and the delalloc
      code needs fixing. This may, in fact be the underlying bug that
      occassionally causes transaction overruns in delayed allocation
      extent conversion, so now we know it's wrong we should fix it, too.
      Many thanks to Brian Foster for finding this problem during review
      of this patch.
      
      Hence the fix, after much code reading, is to allow
      xfs_bmap_extsize_align() to align partial extents when full
      alignment would extend the alignment past MAXEXTLEN. We can safely
      do this because all callers have higher layer allocation loops that
      already handle short allocations, and so will simply run another
      allocation to cover the remainder of the requested allocation range
      that we ignored during alignment. The advantage of this approach is
      that it also removes the need for callers to do anything other than
      limit their requests to MAXEXTLEN - they don't really need to be
      aware of extent size hints at all.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      6dea405e
  10. 25 3月, 2015 2 次提交
  11. 23 2月, 2015 4 次提交
    • E
      xfs: pass mp to XFS_WANT_CORRUPTED_RETURN · 5fb5aeee
      Eric Sandeen 提交于
      Today, if we hit an XFS_WANT_CORRUPTED_RETURN we don't print any
      information about which filesystem hit it.  Passing in the mp allows
      us to print the filesystem (device) name, which is a pretty critical
      piece of information.
      
      Tested by running fsfuzzer 'til I hit some.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      5fb5aeee
    • E
      xfs: pass mp to XFS_WANT_CORRUPTED_GOTO · c29aad41
      Eric Sandeen 提交于
      Today, if we hit an XFS_WANT_CORRUPTED_GOTO we don't print any
      information about which filesystem hit it.  Passing in the mp allows
      us to print the filesystem (device) name, which is a pretty critical
      piece of information.
      
      Tested by running fsfuzzer 'til I hit some.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      c29aad41
    • D
      xfs: introduce xfs_mod_frextents · bab98bbe
      Dave Chinner 提交于
      Add a new helper to modify the incore counter of free realtime
      extents. This matches the helpers used for inode and data block
      counters, and removes a significant users of the xfs_mod_incore_sb()
      interface.
      
      Based on a patch originally from Christoph Hellwig.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      bab98bbe
    • D
      xfs: use generic percpu counters for free block counter · 0d485ada
      Dave Chinner 提交于
      XFS has hand-rolled per-cpu counters for the superblock since before
      there was any generic implementation. The free block counter is
      special in that it is used for ENOSPC detection outside transaction
      contexts for for delayed allocation. This means that the counter
      needs to be accurate at zero. The current per-cpu counter code jumps
      through lots of hoops to ensure we never run past zero, but we don't
      need to make all those jumps with the generic counter
      implementation.
      
      The generic counter implementation allows us to pass a "batch"
      threshold at which the addition/subtraction to the counter value
      will be folded back into global value under lock. We can use this
      feature to reduce the batch size as we approach 0 in a very similar
      manner to the existing counters and their rebalance algorithm. If we
      use a batch size of 1 as we approach 0, then every addition and
      subtraction will be done against the global value and hence allow
      accurate detection of zero threshold crossing.
      
      Hence we can replace the handrolled, accurate-at-zero counters with
      generic percpu counters.
      
      Note: this removes just enough of the icsb infrastructure to compile
      without warnings. The rest will go in subsequent commits.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      0d485ada
  12. 22 1月, 2015 3 次提交
    • D
      xfs: set buf types when converting extent formats · fe22d552
      Dave Chinner 提交于
      Conversion from local to extent format does not set the buffer type
      correctly on the new extent buffer when a symlink data is moved out
      of line.
      
      Fix the symlink code and leave a comment in the generic bmap code
      reminding us that the format-specific data copy needs to set the
      destination buffer type appropriately.
      
      cc: <stable@vger.kernel.org> # 3.10 to current
      Tested-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      fe22d552
    • D
      xfs: consolidate superblock logging functions · 61e63ecb
      Dave Chinner 提交于
      We now have several superblock loggin functions that are identical
      except for the transaction reservation and whether it shoul dbe a
      synchronous transaction or not. Consolidate these all into a single
      function, a single reserveration and a sync flag and call it
      xfs_sync_sb().
      
      Also, xfs_mod_sb() is not really a modification function - it's the
      operation of logging the superblock buffer. hence change the name of
      it to reflect this.
      
      Note that we have to change the mp->m_update_flags that are passed
      around at mount time to a boolean simply to indicate a superblock
      update is needed.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      61e63ecb
    • D
      xfs: remove bitfield based superblock updates · 4d11a402
      Dave Chinner 提交于
      When we log changes to the superblock, we first have to write them
      to the on-disk buffer, and then log that. Right now we have a
      complex bitfield based arrangement to only write the modified field
      to the buffer before we log it.
      
      This used to be necessary as a performance optimisation because we
      logged the superblock buffer in every extent or inode allocation or
      freeing, and so performance was extremely important. We haven't done
      this for years, however, ever since the lazy superblock counters
      pulled the superblock logging out of the transaction commit
      fast path.
      
      Hence we have a bunch of complexity that is not necessary that makes
      writing the in-core superblock to disk much more complex than it
      needs to be. We only need to log the superblock now during
      management operations (e.g. during mount, unmount or quota control
      operations) so it is not a performance critical path anymore.
      
      As such, remove the complex field based logging mechanism and
      replace it with a simple conversion function similar to what we use
      for all other on-disk structures.
      
      This means we always log the entirity of the superblock, but again
      because we rarely modify the superblock this is not an issue for log
      bandwidth or CPU time. Indeed, if we do log the superblock
      frequently, delayed logging will minimise the impact of this
      overhead.
      
      [Fixed gquota/pquota inode sharing regression noticed by bfoster.]
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      4d11a402
  13. 04 12月, 2014 3 次提交
  14. 01 12月, 2014 1 次提交
  15. 28 11月, 2014 3 次提交
  16. 23 9月, 2014 4 次提交
    • B
      xfs: writeback and inval. file range to be shifted by collapse · f71721d0
      Brian Foster 提交于
      The collapse range operation currently writes the entire file before
      starting the collapse to avoid changes in the in-core extent list due to
      writeback causing the extent count to change. Now that collapse range is
      fsb based rather than extent index based it can sustain changes in the
      extent list during the shift sequence without disruption.
      
      Modify xfs_collapse_file_space() to writeback and invalidate pages
      associated with the range of the file to be shifted.
      xfs_free_file_space() currently has similar behavior, but the space free
      need only affect the region of the file that is freed and this could
      change in the future.
      
      Also update the comments to reflect the current implementation. We
      retain the eofblocks trim permanently as a best option for dealing with
      delalloc extents. We don't shift delalloc extents because this scenario
      only occurs with post-eof preallocation (since data must be flushed such
      that the cache can be invalidated and data can be shifted). That means
      said space must also be initialized before being shifted into the
      accessible region of the file only to be immediately truncated off as
      the last part of the collapse. In other words, the eofblocks trim will
      happen anyways, we just run it first to ensure the file remains in a
      consistent state throughout the collapse.
      
      Finally, detect and fail explicitly in the event of a delalloc extent
      during the extent shift. The implementation does not support delalloc
      extents and the caller is expected to prevent this scenario in advance
      as is done by collapse.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      f71721d0
    • B
      xfs: refactor single extent shift into xfs_bmse_shift_one() helper · a979bdfe
      Brian Foster 提交于
      xfs_bmap_shift_extents() has a variety of conditions and error checks
      that make the logic difficult to follow and indent heavy. Refactor the
      loop body of this function into a new xfs_bmse_shift_one() helper. This
      simplifies the error checks, eliminates index decrement on merge hack by
      pushing the index increment down into the helper, and makes the code
      more readable by reducing multiple levels of indentation.
      
      This is a code refactor only. The behavior of extent shift and collapse
      range is not modified.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      a979bdfe
    • B
      xfs: refactor shift-by-merge into xfs_bmse_merge() helper · ddb19e31
      Brian Foster 提交于
      The extent shift mechanism in xfs_bmap_shift_extents() is complicated
      and handles several different, non-deterministic scenarios. These
      include extent shifts, extent merges and potential btree updates in
      either of the former scenarios.
      
      Refactor the code to be more linear and readable. The loop logic in
      xfs_bmap_shift_extents() and some initial error checking is adjusted
      slightly. The associated btree lookup and update/delete operations are
      condensed into single blocks of code. This reduces the number of
      btree-specific blocks and facilitates the separation of the merge
      operation into a new xfs_bmse_merge() and xfs_bmse_can_merge() helpers.
      
      This is a code refactor only. The behavior of extent shift and collapse
      range is not modified.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      ddb19e31
    • B
      xfs: track collapse via file offset rather than extent index · 2c845f5a
      Brian Foster 提交于
      The collapse range implementation uses a transaction per extent shift.
      The progress of the overall operation is tracked via the current extent
      index of the in-core extent list. This is racy because the ilock must be
      dropped and reacquired for each transaction according to locking and log
      reservation rules. Therefore, writeback to prior regions of the file is
      possible and can change the extent count. This changes the extent to
      which the current index refers and causes the collapse to fail mid
      operation. To avoid this problem, the entire file is currently written
      back before the collapse operation starts.
      
      To eliminate the need to flush the entire file, use the file offset
      (fsb) to track the progress of the overall extent shift operation rather
      than the extent index. Modify xfs_bmap_shift_extents() to
      unconditionally convert the start_fsb parameter to an extent index and
      return the file offset of the extent where the shift left off, if
      further extents exist. The bulk of ths function can remain based on
      extent index as ilock is held by the caller. xfs_collapse_file_space()
      now uses the fsb output as the starting point for the subsequent shift.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      2c845f5a
  17. 02 9月, 2014 1 次提交
    • B
      xfs: don't log inode unless extent shift makes extent modifications · ca446d88
      Brian Foster 提交于
      The file collapse mechanism uses xfs_bmap_shift_extents() to collapse
      all subsequent extents down into the specified, previously punched out,
      region. This function performs some validation, such as whether a
      sufficient hole exists in the target region of the collapse, then shifts
      the remaining exents downward.
      
      The exit path of the function currently logs the inode unconditionally.
      While we must log the inode (and abort) if an error occurs and the
      transaction is dirty, the initial validation paths can generate errors
      before the transaction has been dirtied. This creates an unnecessary
      filesystem shutdown scenario, as the caller will cancel a transaction
      that has been marked dirty.
      
      Modify xfs_bmap_shift_extents() to OR the logflags bits as modifications
      are made to the inode bmap. Only log the inode in the exit path if
      logflags has been set. This ensures we only have to cancel a dirty
      transaction if modifications have been made and prevents an unnecessary
      filesystem shutdown otherwise.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      ca446d88
  18. 30 7月, 2014 1 次提交
  19. 25 6月, 2014 2 次提交
  20. 22 6月, 2014 1 次提交
  21. 06 6月, 2014 1 次提交
  22. 23 4月, 2014 1 次提交