1. 06 Apr, 2016 1 commit
    • xfs: better xfs_trans_alloc interface · 253f4911
      Christoph Hellwig committed
      Merge xfs_trans_reserve and xfs_trans_alloc into a single function call
      that returns a transaction with all the required log and block reservations,
      and which allows passing transaction flags directly to avoid the cumbersome
      _xfs_trans_alloc interface.
      
      While we're at it we also get rid of the transaction type argument that has
      been superfluous since we stopped supporting the non-CIL logging mode.  The
      guts of it will be removed in another patch.
      
      [dchinner: fixed transaction leak in error path in xfs_setattr_nonsize]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
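The merged interface described in this commit can be pictured with a small userspace model. This is a hedged sketch, not the kernel code: `struct txn`, `txn_alloc`, `txn_reserve`, `txn_cancel`, and the `free_blocks` pool are hypothetical names invented for illustration. It shows the one-call pattern: allocate, reserve, and free the transaction if the reservation fails, so the caller never sees a half-built transaction.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical model of a transaction; not the kernel's struct xfs_trans. */
struct txn {
    unsigned int flags;   /* caller-supplied transaction flags */
    unsigned int blocks;  /* blocks reserved for this transaction */
};

/* Hypothetical pool standing in for a free-block counter. */
static unsigned int free_blocks = 100;

/* Reserve blocks for an already allocated transaction. */
static int txn_reserve(struct txn *tp, unsigned int blocks)
{
    if (blocks > free_blocks)
        return -ENOSPC;
    free_blocks -= blocks;
    tp->blocks = blocks;
    return 0;
}

/*
 * Single entry point: allocate and reserve in one call. On success *tpp
 * holds a fully reserved transaction; on failure nothing is leaked and
 * *tpp is NULL.
 */
static int txn_alloc(unsigned int blocks, unsigned int flags, struct txn **tpp)
{
    struct txn *tp;
    int error;

    *tpp = NULL;
    tp = calloc(1, sizeof(*tp));
    if (!tp)
        return -ENOMEM;
    tp->flags = flags;

    error = txn_reserve(tp, blocks);
    if (error) {
        free(tp);  /* undo the allocation on the error path */
        return error;
    }
    *tpp = tp;
    return 0;
}

/* Release a transaction and return its reservation to the pool. */
static void txn_cancel(struct txn *tp)
{
    free_blocks += tp->blocks;
    free(tp);
}
```

With a single call, the caller has one error check instead of two and no separate cleanup step between allocation and reservation, which is where a leak such as the one noted in the bracketed fix could otherwise hide.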
  2. 18 Mar, 2016 2 commits
  3. 16 Mar, 2016 2 commits
  4. 15 Mar, 2016 6 commits
    • xfs: always set rvalp in xfs_dir2_node_trim_free · 355cced4
      Christoph Hellwig committed
      xfs_dir2_node_trim_free can return without setting the rvalp argument
      pointer.  Initialize it to 0 at the beginning of the function and
      only update it to 1 if we succeeded trimming a freespace block.
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: ensure committed is initialized in xfs_trans_roll · cc07eed8
      Eric Sandeen committed
      __xfs_trans_roll() can return without setting the
      *committed argument; this was a problem for xfs_bmap_finish():
      
              int       committed;/* xact committed or not */
      ...
              error = __xfs_trans_roll(tp, ip, &committed);
              if (error) {
      ...
                      if (committed) {
      
      and we tested an uninitialized "committed" variable on the
      error path.  No caller is preserving "committed" state across
      calls to __xfs_trans_roll(), so just initialize committed inside
      the function to avoid future errors like this.
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
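The fix pattern described in this commit, initializing an out-parameter before any early return, can be sketched in isolation. This is an illustrative userspace model only: `roll_model` and its `fail_early` and `fail_late` switches are hypothetical and do not reflect the real `__xfs_trans_roll()` logic.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for a roll operation with an out-parameter.
 * The out-parameter is initialized before any early return, so callers
 * can safely test it on the error path.
 */
static int roll_model(int fail_early, int fail_late, int *committed)
{
    *committed = 0;       /* initialize before anything can fail */

    if (fail_early)
        return -EIO;      /* failed before the commit step */

    *committed = 1;       /* the commit step succeeded */

    if (fail_late)
        return -ENOSPC;   /* failed after the commit step */

    return 0;
}
```

A caller that checks `committed` after an error now always reads a value the function wrote, whichever return path was taken.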
    • xfs: borrow indirect blocks from freed extent when available · d34999c9
      Brian Foster committed
      xfs_bmap_del_extent() handles extent removal from the in-core and
      on-disk extent lists. When removing a delalloc range, it updates the
      indirect block reservation appropriately based on the removal. It
      currently enforces that the new indirect block reservation is less than
      or equal to the original. This is normally the case in all situations
      except for in certain cases when the removed range creates a hole in a
      single delalloc extent, thus splitting a single delalloc extent in two.
      
      It is possible with small enough extents to split an indlen==1 extent
      into two such slightly smaller extents. This leaves one extent with 0
      indirect blocks and leads to assert failures in other areas (e.g.,
      xfs_bunmapi() if the extent happens to be removed).
      
      Update the indlen distribution code to steal blocks from the deleted
      extent, if necessary, to satisfy the worst case total indirect
      reservation for the new extents. This is safe as the caller does not
      update the fdblocks counters until the extent is removed. Blocks stolen
      in this manner simply remain accounted as allocated, having ownership
      transferred from the data extent to an indirect reservation.
      
      As a precaution, fall back to the original reservation algorithm if the
      new indlen requirement is not met and warn if we end up with extents
      without any reservation at all to detect this more easily in the future.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
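The reservation split described in this commit can be modeled in a few lines of userspace C. This is a simplified sketch under stated assumptions, not the kernel implementation: `split_indlen` and its parameter names are hypothetical, and block counts are plain `unsigned long` values. It first steals blocks from the removed range to cover any shortfall, then falls back to trimming the two shares alternately if that is still not enough.

```c
/*
 * Simplified model of splitting an indirect-block reservation.
 *   ores    - reservation held by the original extent
 *   indlen1 - in: worst-case need of the first new extent; out: its share
 *   indlen2 - same, for the second new extent
 *   avail   - blocks that may be stolen from the range being removed
 * Returns the number of blocks stolen.
 */
static unsigned long split_indlen(unsigned long ores,
                                  unsigned long *indlen1,
                                  unsigned long *indlen2,
                                  unsigned long avail)
{
    unsigned long len1 = *indlen1;
    unsigned long len2 = *indlen2;
    unsigned long need = len1 + len2;
    unsigned long stolen = 0;

    /* Steal from the removed range to cover the shortfall, if any. */
    if (need > ores) {
        stolen = need - ores;
        if (stolen > avail)
            stolen = avail;
    }

    /* Fallback: trim the two shares alternately until they fit. */
    while (len1 + len2 > ores + stolen) {
        if (len1)
            len1--;
        if (len1 + len2 == ores + stolen)
            break;
        if (len2)
            len2--;
    }

    *indlen1 = len1;
    *indlen2 = len2;
    return stolen;
}
```

In this model, splitting a reservation of 1 into two extents that each need 1 block leaves the first extent with 0 blocks when nothing can be stolen, which is the failure case the commit describes. With at least one stealable block, both extents keep a reservation of 1.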
    • xfs: refactor delalloc indlen reservation split into helper · a9bd24ac
      Brian Foster committed
      The delayed allocation indirect reservation splitting code is not
      sufficient in some cases where a delalloc extent is split in two. In
      preparation for enhancements to this code, refactor the current indlen
      distribution algorithm into a new helper function.
      
      [dchinner: rename temp, temp2 variables]
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: update freeblocks counter after extent deletion · b2706a05
      Brian Foster committed
      xfs_bunmapi() currently updates the fdblocks counter, unreserves quota,
      etc. before the extent is deleted by xfs_bmap_del_extent(). The function
      has problems dividing up the indirect reserved blocks for scenarios
      where a single delalloc extent is split in two. Particularly, there
      aren't always enough blocks reserved for multiple extents in a single
      extent reservation.
      
      The solution to this problem is to allow the extent removal code to
      steal from the deleted extent to meet indirect reservation requirements.
      Move the block of code in xfs_bunmapi() that updates the fdblocks counter
      to after the call to xfs_bmap_del_extent() to allow the codepath to
      update the extent record before the free blocks are accounted. Also,
      reshuffle the code slightly so the delalloc accounting occurs near the
      xfs_bmap_del_extent() call to provide context for the comments.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: debug mode forced buffered write failure · 801cc4e1
      Brian Foster committed
      Add a DEBUG mode-only sysfs knob to enable forced buffered write
      failure. An additional side effect of this mode is brute force killing
      of delayed allocation blocks in the range of the write. The latter is
      the prime motivation behind this patch, as userspace test
      infrastructure requires a reliable mechanism to create and split
      delalloc extents without causing extent conversion.
      
      Certain fallocate operations (i.e., zero range) were used for this in
      the past, but the implementations have changed such that delalloc
      extents are flushed and converted to real blocks, rendering the test
      useless.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  5. 09 Mar, 2016 2 commits
  6. 07 Mar, 2016 8 commits
  7. 02 Mar, 2016 6 commits
  8. 01 Mar, 2016 4 commits
  9. 28 Feb, 2016 2 commits
    • dax: move writeback calls into the filesystems · 7f6d5b52
      Ross Zwisler committed
      Previously calls to dax_writeback_mapping_range() for all DAX filesystems
      (ext2, ext4 & xfs) were centralized in filemap_write_and_wait_range().
      
      dax_writeback_mapping_range() needs a struct block_device, and it used
      to get that from inode->i_sb->s_bdev.  This is correct for normal inodes
      mounted on ext2, ext4 and XFS filesystems, but is incorrect for DAX raw
      block devices and for XFS real-time files.
      
      Instead, call dax_writeback_mapping_range() directly from the filesystem
      ->writepages function so that it can supply us with a valid block
      device.  This also fixes DAX code to properly flush caches in response
      to sync(2).
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Al Viro <viro@ftp.linux.org.uk>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dax: give DAX clearing code correct bdev · 20a90f58
      Ross Zwisler committed
      dax_clear_blocks() needs a valid struct block_device and previously it
      was using inode->i_sb->s_bdev in all cases.  This is correct for normal
      inodes on mounted ext2, ext4 and XFS filesystems, but is incorrect for
      DAX raw block devices and for XFS real-time devices.
      
      Instead, rename dax_clear_blocks() to dax_clear_sectors(), and change
      its arguments to take a bdev and a sector instead of an inode and a
      block.  This better reflects what the function does, and it allows the
      filesystem and raw block device code to pass in an appropriate struct
      block_device.
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Suggested-by: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Al Viro <viro@ftp.linux.org.uk>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
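The idea behind both DAX commits, letting the filesystem name the device and address it by sector rather than deriving both from the inode, can be sketched with a small model. Everything here is hypothetical and for illustration only: `struct bdev_model`, `struct inode_model`, `inode_target_bdev`, and `block_to_sector` are not kernel types or functions, and the sketch assumes 512-byte sectors and a block size of at least 512 bytes.

```c
#include <stdint.h>

/* Hypothetical stand-ins; not the kernel's structures. */
struct bdev_model {
    const char *name;
};

struct inode_model {
    struct bdev_model *data_bdev;  /* device behind the superblock */
    struct bdev_model *rt_bdev;    /* separate real-time device, if any */
    int is_realtime;               /* file data lives on rt_bdev */
    unsigned int blkbits;          /* log2 of the filesystem block size */
};

/* Only the filesystem knows which device actually backs a file's data. */
static struct bdev_model *inode_target_bdev(const struct inode_model *ip)
{
    return ip->is_realtime ? ip->rt_bdev : ip->data_bdev;
}

/*
 * Convert a filesystem block number to a 512-byte sector number.
 * Assumes blkbits >= 9.
 */
static uint64_t block_to_sector(uint64_t block, unsigned int blkbits)
{
    return block << (blkbits - 9);
}
```

A helper that takes a device and a sector, as the renamed function does, would be handed the results of these two lookups by the caller, so it no longer has to guess the device from the superblock.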
  10. 15 Feb, 2016 6 commits
    • xfs: don't chain ioends during writepage submission · e10de372
      Dave Chinner committed
      Currently we can build a long ioend chain during ->writepages that
      gets attached to the writepage context. IO submission only then
      occurs when we finish all the writepage processing. This means we
      can have many ioends allocated and pending, and this violates the
      mempool guarantees that we need to give about forwards progress.
      i.e. we really should only have one ioend being built at a time,
      otherwise we may drain the mempool trying to allocate a new ioend
      and that blocks submission, completion and freeing of ioends that
      are already in progress.
      
      To prevent this situation from happening, we need to submit ioends
      for IO as soon as they are ready for dispatch rather than queuing
      them for later submission. This means the ioends have bios built
      immediately and they get queued on any plug that is currently active.
      Hence if we schedule away from writeback, the ioends that have been
      built will make forwards progress due to the plug flushing on
      context switch. This will also prevent context switches from
      creating unnecessary IO submission latency.
      
      We can't completely avoid having nested IO allocation - when we have
      a block size smaller than a page size, we still need to hold the
      ioend submission until after we have marked the current page dirty.
      Hence we may need multiple ioends to be held while the current page
      is completely mapped and made ready for IO dispatch. We cannot avoid
      this problem - the current code already has this ioend chaining
      within a page so we can mostly ignore that it occurs.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: factor mapping out of xfs_do_writepage · bfce7d2e
      Dave Chinner committed
      Separate out the bufferhead based mapping from the writepage code so
      that we have a clear separation of the page operations and the
      bufferhead state.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: xfs_cluster_write is redundant · ad68972a
      Dave Chinner committed
      xfs_cluster_write() is not necessary now that xfs_vm_writepages()
      aggregates writepage calls across a single mapping. This means we no
      longer need to do page lookups in xfs_cluster_write, so writeback
      only needs to look up the page cache once per page being written.
      This also removes a large amount of mostly duplicate code between
      xfs_do_writepage() and xfs_convert_page().
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: Introduce writeback context for writepages · fbcc0256
      Dave Chinner committed
      xfs_vm_writepages() calls generic_writepages to writeback a range of
      a file, but then xfs_vm_writepage() clusters pages itself as it does
      not have any context it can pass between ->writepage calls from
      __write_cache_pages().
      
      Introduce a writeback context for xfs_vm_writepages() and call
      __write_cache_pages directly with our own writepage callback so that
      we can pass that context to each writepage invocation. This
      encapsulates the current mapping, whether it is valid or not, the
      current ioend and its IO type and the ioend chain being built.
      
      This requires us to move the ioend submission up to the level where
      the writepage context is declared. This does mean we do not submit
      IO until we have packaged the entire writeback range, but with the block
      plugging in the writepages call this is the way IO is submitted,
      anyway.
      
      It also means that we need to handle discontiguous page ranges.  If
      the pages sent down by write_cache_pages to the writepage callback
      are discontiguous, we need to detect this and put each discontiguous
      page range into individual ioends. This is needed to ensure that the
      ioend accurately represents the range of the file that it covers so
      that file size updates during IO completion set the size correctly.
      Failure to take into account the discontiguous ranges results in
      files being too small when writeback patterns are non-sequential.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
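The handling of discontiguous page ranges described in this commit can be illustrated with a small userspace model of a context passed to a per-page callback. This is a sketch under assumptions, not the XFS code: `struct wb_ctx`, `struct range_model`, `wb_add_page`, and the `MAX_RANGES` limit are hypothetical, and each range stands in for one ioend.

```c
#include <stddef.h>

#define MAX_RANGES 16

/* Hypothetical model of an ioend: a contiguous run of page indices. */
struct range_model {
    unsigned long start;  /* first page index in the run */
    unsigned long count;  /* number of contiguous pages */
};

/* Context carried across per-page callbacks. */
struct wb_ctx {
    struct range_model ranges[MAX_RANGES];
    size_t nranges;
    unsigned long next;   /* page index that would extend the current run */
};

/* Per-page callback: extend the current run or start a new one. */
static int wb_add_page(struct wb_ctx *ctx, unsigned long index)
{
    if (ctx->nranges && index == ctx->next) {
        ctx->ranges[ctx->nranges - 1].count++;
    } else {
        if (ctx->nranges == MAX_RANGES)
            return -1;    /* model limit reached */
        ctx->ranges[ctx->nranges].start = index;
        ctx->ranges[ctx->nranges].count = 1;
        ctx->nranges++;
    }
    ctx->next = index + 1;
    return 0;
}
```

Because each run records exactly the pages it covers, a size update derived from a run's end cannot skip over or include a gap between runs, which is the property the commit message relies on for correct file sizes.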
    • xfs: remove xfs_cancel_ioend · 150d5be0
      Dave Chinner committed
      We currently have code to cancel ioends being built because we
      change bufferhead state as we build the ioend. On error, this needs
      to be unwound and so we have cancelling code that walks the buffers
      on the ioend chain and undoes these state changes.
      
      However, the IO submission path already handles state changes for
      buffers when a submission error occurs, so we don't really need a
      separate cancel function to do this - we can simply submit the
      ioend chain with the specific error and it will be cancelled rather
      than submitted.
      
      Hence we can remove the explicit cancel code and just rely on
      submission to deal with the error correctly.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: remove nonblocking mode from xfs_vm_writepage · 988ef927
      Dave Chinner committed
      Remove the nonblocking optimisation done for mapping lookups during
      writeback. It's not clear that leaving a hole in the writeback range
      just because we couldn't get a lock is really a win, as it makes us
      do another small random IO later on rather than a large sequential
      IO now.
      
      As this gets in the way of sane error handling later on, just remove
      it for the moment and we can re-introduce an equivalent optimisation in
      future if we see problems due to extent map lock contention.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  11. 10 Feb, 2016 1 commit