1. 03 Aug, 2016 8 commits
    • D
      aa966d84
    • D
      xfs: add rmap btree operations · 4b8ed677
      Darrick J. Wong committed
      Originally-From: Dave Chinner <dchinner@redhat.com>
      
      Implement the generic btree operations needed to manipulate rmap
      btree blocks. This is very similar to the per-ag freespace btree
      implementation, and uses the AGFL for allocation and freeing of
      blocks.
      
      Adapt the rmap btree to store owner offsets within each rmap record,
      and to handle the primary key being redefined as the tuple
      [agblk, owner, offset].  The expansion of the primary key is crucial
      to allowing multiple owners per extent.
      
      [darrick: adapt the btree ops to deal with offsets]
      [darrick: remove init_rec_from_key]
      [darrick: move unwritten bit to rm_offset]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      4b8ed677
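The tuple ordering described in the commit above can be sketched as a three-way comparison. This is a minimal illustration, not the kernel code: the struct, its field names, and both function names are hypothetical, and only the [agblk, owner, offset] ordering comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory key; field names are illustrative only. */
struct rmap_key {
	uint32_t agblk;   /* start block within the allocation group */
	uint64_t owner;   /* owner identifier */
	uint64_t offset;  /* logical offset within the owner */
};

/* Three-way compare that cannot overflow the way subtraction could. */
static int cmp_u64(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}

/* Compare lexicographically: agblk first, then owner, then offset. */
static int rmap_key_cmp(const struct rmap_key *k1, const struct rmap_key *k2)
{
	int d;

	assert(k1 && k2);
	d = cmp_u64(k1->agblk, k2->agblk);
	if (d)
		return d;
	d = cmp_u64(k1->owner, k2->owner);
	if (d)
		return d;
	return cmp_u64(k1->offset, k2->offset);
}
```

Because owner and offset take part in the ordering, two records that start at the same agblk but belong to different owners remain distinct keys, which is what allows multiple owners per extent.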
    • D
      xfs: define the on-disk rmap btree format · 035e00ac
      Darrick J. Wong committed
      Originally-From: Dave Chinner <dchinner@redhat.com>
      
      Now we have all the surrounding call infrastructure in place, we can
      start filling out the rmap btree implementation. Start with the
      on-disk btree format; add everything needed to read, write and
      manipulate rmap btree blocks. This prepares the way for adding the
      btree operations implementation.
      
      [darrick: record owner and offset info in rmap btree]
      [darrick: fork, bmbt and unwritten state in rmap btree]
      [darrick: flags are a separate field in xfs_rmap_irec]
      [darrick: calculate maxlevels separately]
      [darrick: move the 'unwritten' bit into unused parts of rm_offset]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      035e00ac
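Storing the 'unwritten' bit in unused parts of rm_offset, as the commit above mentions, amounts to packing a flag into the high bits of a 64-bit field. The sketch below assumes a layout (offset in the low 54 bits, flag at bit 61) purely for illustration; the macro and function names are hypothetical, and the commit message itself does not state the bit positions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: low 54 bits hold the offset, bit 61 marks unwritten. */
#define OFF_LEN_BITS   54
#define OFF_MASK       ((UINT64_C(1) << OFF_LEN_BITS) - 1)
#define OFF_UNWRITTEN  (UINT64_C(1) << 61)

/* Combine an offset and the unwritten flag into one 64-bit value. */
static uint64_t rmap_pack_offset(uint64_t offset, bool unwritten)
{
	/* The offset must fit below the flag bits. */
	assert((offset & ~OFF_MASK) == 0);
	return offset | (unwritten ? OFF_UNWRITTEN : 0);
}

/* Recover the plain offset by masking off the flag bits. */
static uint64_t rmap_unpack_offset(uint64_t packed)
{
	return packed & OFF_MASK;
}

/* Test whether the unwritten flag is set. */
static bool rmap_is_unwritten(uint64_t packed)
{
	return (packed & OFF_UNWRITTEN) != 0;
}
```

Keeping the flag inside the offset field means the record needs no separate flags member on disk, at the cost of masking on every read.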
    • D
      xfs: introduce rmap extent operation stubs · 673930c3
      Darrick J. Wong committed
      Originally-From: Dave Chinner <dchinner@redhat.com>
      
      Add the stubs into the extent allocation and freeing paths that the
      rmap btree implementation will hook into. While doing this, add the
      trace points that will be used to track rmap btree extent
      manipulations.
      
      [darrick.wong@oracle.com: Extend the stubs to take full owner info.]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      673930c3
    • D
      xfs: add tracepoints and error injection for deferred extent freeing · ba9e7802
      Darrick J. Wong committed
      Add a couple of tracepoints for the deferred extent free operation and
      a site for injecting errors while finishing the operation.  This makes
      it easier to debug deferred ops and test log redo.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      ba9e7802
    • D
      xfs: add tracepoints for the deferred ops mechanism · 3cd48abc
      Darrick J. Wong committed
      Add tracepoints for the internals of the deferred ops mechanism
      and tracepoint classes for clients of the dops, to make debugging
      easier.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      3cd48abc
    • D
      xfs: introduce interval queries on btrees · 105f7d83
      Darrick J. Wong committed
      Create a function to enable querying of btree records mapping to a
      range of keys.  This will be used in subsequent patches to allow
      querying the reverse mapping btree to find the extents mapped to a
      range of physical blocks, though the generic code can be used for
      any range query.
      
      The overlapped query range function needs to use the btree get_block
      helper because the root block could be an inode, in which case
      bc_bufs[nlevels-1] will be NULL.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      105f7d83
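A range query over sorted, non-overlapping keys, in the spirit of the commit above, can be sketched on a flat sorted array instead of a btree. Everything here is an assumption made for illustration: the callback type, the function names, and the use of plain integer keys are not taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-record callback; a nonzero return aborts the walk. */
typedef int (*range_fn)(uint64_t key, void *priv);

/* Find the first index whose key is >= low (binary search). */
static size_t lower_bound(const uint64_t *keys, size_t n, uint64_t low)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (keys[mid] < low)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Call fn for every key in [low, high], in ascending order. */
static int query_range(const uint64_t *keys, size_t n,
		       uint64_t low, uint64_t high,
		       range_fn fn, void *priv)
{
	size_t i;

	assert(low <= high);
	for (i = lower_bound(keys, n, low); i < n && keys[i] <= high; i++) {
		int error = fn(keys[i], priv);

		if (error)
			return error;
	}
	return 0;
}

/* Example callback: count the records visited. */
static int count_cb(uint64_t key, void *priv)
{
	(void)key;
	++*(size_t *)priv;
	return 0;
}
```

Passing the per-record work in as a callback keeps the walk generic, which matches the commit's note that the code can serve any range query rather than only reverse-mapping lookups.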
    • D
      xfs: support btrees with overlapping intervals for keys · 2c813ad6
      Darrick J. Wong committed
      On a filesystem with both reflink and reverse mapping enabled, it's
      possible to have multiple rmap records referring to the same blocks on
      disk.  When overlapping intervals are possible, querying a classic
      btree to find all records intersecting a given interval is inefficient
      because we cannot use the left side of the search interval to filter
      out non-matching records the same way that we can use the existing
      btree key to filter out records coming after the right side of the
      search interval.  This will become important once we want to use the
      rmap btree to rebuild BMBTs, or implement the (future) fsmap ioctl.
      
      (For the non-overlapping case, we can perform such queries trivially
      by starting at the left side of the interval and walking the tree
      until we pass the right side.)
      
      Therefore, extend the btree code to come closer to supporting
      intervals as a first-class record attribute.  This involves widening
      the btree node's key space to store both the lowest key reachable via
      the node pointer (as the btree does now) and the highest key reachable
      via the same pointer and teaching the btree modifying functions to
      keep the highest-key records up to date.
      
      This behavior can be turned on via a new btree ops flag so that btrees
      that cannot store overlapping intervals don't pay the overhead costs
      in terms of extra code and disk format changes.
      
      When we're deleting a record in a btree that supports overlapped
      interval records and the deletion results in two btree blocks being
      joined, we defer updating the high/low keys until after all possible
      joining (at higher levels in the tree) has finished.  At this point,
      the btree pointers at all levels have been updated to remove the empty
      blocks and we can update the low and high keys.
      
      When we're doing this, we must be careful to update the keys of all
      node pointers up to the root instead of stopping at the first set of
      keys that don't need updating.  This is because it's possible for a
      single deletion to cause joining of multiple levels of tree, and so
      we need to update everything going back to the root.
      
      The diff_two_keys functions return < 0, 0, or > 0 if key1 is less than,
      equal to, or greater than key2, respectively.  This is consistent
      with the rest of the kernel and the C library.
      
      In btree_updkeys(), we need to evaluate the force_all parameter before
      running the key diff to avoid reading uninitialized memory when we're
      forcing a key update.  This happens when we've allocated an empty slot
      at level N + 1 to point to a new block at level N and we're in the
      process of filling out the new keys.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      2c813ad6
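The low/high key idea in the commit above can be illustrated with a small sketch: each node pointer carries both the lowest and the highest key reachable through it, so a query can skip a subtree when its interval cannot intersect the search interval. The struct and function names are hypothetical, and integer keys stand in for the real btree keys.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pair of keys stored alongside one node pointer. */
struct node_keys {
	uint64_t low;   /* lowest key reachable via the pointer */
	uint64_t high;  /* highest key reachable via the pointer */
};

/* Recompute a parent's keys from the keys of all of its children. */
static struct node_keys node_keys_from_children(const struct node_keys *child,
						size_t n)
{
	struct node_keys out;
	size_t i;

	assert(child && n > 0);
	out = child[0];
	for (i = 1; i < n; i++) {
		if (child[i].low < out.low)
			out.low = child[i].low;
		if (child[i].high > out.high)
			out.high = child[i].high;
	}
	return out;
}

/* A subtree can be skipped unless [low, high] intersects [qlow, qhigh]. */
static bool subtree_may_overlap(const struct node_keys *nk,
				uint64_t qlow, uint64_t qhigh)
{
	return nk->low <= qhigh && nk->high >= qlow;
}
```

With only the low key, a child starting at 2 could not be ruled in or out for a query at [50, 60]; the high key settles it. The sketch also hints at why key updates must continue to the root: a change to any child's high key can change the parent's.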
  2. 20 Jul, 2016 2 commits
  3. 21 Jun, 2016 2 commits
  4. 18 May, 2016 1 commit
  5. 06 Apr, 2016 2 commits
  6. 08 Feb, 2016 1 commit
    • C
      xfs: don't use ioends for direct write completions · 273dda76
      Christoph Hellwig committed
      We only need to communicate two bits of information to the direct I/O
      completion handler:
      
       (1) do we need to convert any unwritten extents in the range
       (2) do we need to check if we need to update the inode size based
           on the range passed to the completion handler
      
      We can use the private data passed to the get_block handler and the
      completion handler as a simple bitmask to communicate this information
      instead of the current complicated infrastructure reusing the ioends
      from the buffer I/O path, and thus avoiding a memory allocation and
      a context switch for any non-trivial direct write.  As a nice side
      effect we also decouple the direct I/O path implementation from that
      of the buffered I/O path.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      
      273dda76
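Using the private data pointer as a simple bitmask, as the commit above describes, might look like the following sketch. The flag names and helper functions are hypothetical; the only point taken from the commit message is that two bits of state travel in the pointer value itself, so nothing needs to be allocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag names for the two bits of information. */
#define DIO_FLAG_UNWRITTEN (1u << 0)  /* convert unwritten extents */
#define DIO_FLAG_APPEND    (1u << 1)  /* check for an inode size update */

/* Record a flag by OR-ing it into the pointer-sized private value. */
static void dio_set_flag(void **priv, unsigned int flag)
{
	assert(priv);
	*priv = (void *)((uintptr_t)*priv | flag);
}

/* Test a flag in the completion handler; the pointer is never dereferenced. */
static bool dio_test_flag(void *priv, unsigned int flag)
{
	return ((uintptr_t)priv & flag) != 0;
}
```

A mapping callback would call dio_set_flag() as it discovers each condition, and the completion handler would read the bits back with dio_test_flag().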
  7. 08 Jan, 2016 1 commit
  8. 03 Nov, 2015 1 commit
  9. 12 Oct, 2015 1 commit
  10. 09 Sep, 2015 1 commit
  11. 19 Aug, 2015 1 commit
  12. 29 May, 2015 1 commit
    • B
      xfs: allocate sparse inode chunks on full chunk allocation failure · 56d1115c
      Brian Foster committed
      xfs_ialloc_ag_alloc() makes several attempts to allocate a full inode
      chunk. If all else fails, reduce the allocation to the sparse length and
      alignment and attempt to allocate a sparse inode chunk.
      
      If sparse chunk allocation succeeds, check whether an inobt record
      already exists that can track the chunk. If so, inherit and update the
      existing record. Otherwise, insert a new record for the sparse chunk.
      
      Create helpers to align sparse chunk inode records and insert or update
      existing records in the inode btrees. The xfs_inobt_insert_sprec()
      helper implements the merge or update semantics required for sparse
      inode records with respect to both the inobt and finobt. To update the
      inobt, either insert a new record or merge with an existing record. To
      update the finobt, use the updated inobt record to either insert or
      replace an existing record.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      56d1115c
  13. 16 Apr, 2015 3 commits
    • D
      xfs: DIO writes within EOF don't need an ioend · a06c277a
      Dave Chinner committed
      DIO writes that lie entirely within EOF have nothing to do in IO
      completion. In this case, we don't need no steekin' ioend, and so we
      can avoid allocating an ioend until we have a mapping that spans
      EOF.
      
      This means that IO completion has two contexts - deferred completion
      to the dio workqueue that uses an ioend, and interrupt completion
      that does nothing because there is nothing that can be done in this
      context.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      a06c277a
    • D
      xfs: handle DIO overwrite EOF update completion correctly · 6dfa1b67
      Dave Chinner committed
      Currently a DIO overwrite that extends the EOF (e.g. sub-block IO or
      write into allocated blocks beyond EOF) requires a transaction for
      the EOF update. This is done in IO completion context, but we aren't
      explicitly handling this situation properly and so it can run in
      interrupt context. Ensure that we defer IO that spans EOF correctly
      to the DIO completion workqueue, and now that we have an ioend in IO
      completion we can use the common ioend completion path to do all the
      work.
      
      Note: we do not preallocate the append transaction as we can have
      multiple mapping and allocation calls per direct IO. Hence
      preallocating can still leave us with nested transactions by
      attempting to map and allocate more blocks after we've preallocated
      an append transaction.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      6dfa1b67
    • D
      xfs: DIO needs an ioend for writes · d5cc2e3f
      Dave Chinner committed
      Currently we can only tell DIO completion that an IO requires
      unwritten extent completion. This is done by a hacky non-null
      private pointer passed to IO completion, but the private pointer
      does not actually contain any information that is used.
      
      We also need to pass to IO completion the fact that the IO may be
      beyond EOF and so a size update transaction needs to be done. This
      is currently determined by checks in the io completion, but we need
      to determine if this is necessary at block mapping time as we need
      to defer the size update transactions to a completion workqueue,
      just like unwritten extent conversion.
      
      To do this, first we need to allocate and pass an ioend to IO
      completion. Add this for unwritten extent conversion; we'll do the
      EOF updates in the next commit.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      d5cc2e3f
  14. 25 Mar, 2015 2 commits
  15. 23 Feb, 2015 2 commits
  16. 02 Oct, 2014 1 commit
    • D
      xfs: introduce xfs_buf_submit[_wait] · 595bff75
      Dave Chinner committed
      There is a lot of cookie-cutter code that looks like:
      
      	if (shutdown)
      		handle buffer error
      	xfs_buf_iorequest(bp)
      	error = xfs_buf_iowait(bp)
      	if (error)
      		handle buffer error
      
      spread through XFS. There's significant complexity now in
      xfs_buf_iorequest() to specifically handle this sort of synchronous
      IO pattern, but there's all sorts of nasty surprises in different
      error handling code dependent on who owns the buffer references and
      the locks.
      
      Pull this pattern into a single helper, where we can hide all the
      synchronous IO warts and hence make the error handling for all the
      callers much saner. This removes the need for a special extra
      reference to protect IO completion processing, as we can now hold a
      single reference across dispatch and waiting, simplifying the sync
      IO semantics and error handling.
      
      In doing this, also rename xfs_buf_iorequest to xfs_buf_submit and
      make it explicitly handle only asynchronous IO. This forces all users
      to be switched specifically to one interface or the other and
      removes any ambiguity between how the interfaces are to be used. It
      also means that xfs_buf_iowait() goes away.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      595bff75
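The consolidation described in the commit above, one helper that owns the shutdown check, the dispatch, the wait, and the error return, can be sketched with a mock buffer. Nothing here is the XFS implementation: the struct, its fields, and the function names are hypothetical, and the mock IO completes immediately instead of blocking.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a buffer; not the real xfs_buf. */
struct mock_buf {
	bool shutdown;   /* filesystem has been shut down */
	int  io_error;   /* error the mock IO will report */
	int  error;      /* error recorded on the buffer */
	int  refcount;   /* references currently held */
};

/* Pretend to run the IO; it completes immediately in this mock. */
static void mock_start_io(struct mock_buf *bp)
{
	bp->error = bp->io_error;
}

/* Single helper replacing the repeated check/submit/wait/check pattern. */
static int mock_buf_submit_wait(struct mock_buf *bp)
{
	assert(bp);
	if (bp->shutdown) {
		bp->error = -EIO;
		return bp->error;
	}
	bp->refcount++;      /* one reference held across dispatch and wait */
	mock_start_io(bp);   /* a real helper would block here until done */
	bp->refcount--;
	return bp->error;
}
```

Callers then reduce to a single call followed by one error check, which is the simplification the commit message is after.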
  17. 12 Jun, 2014 1 commit
    • A
      ->splice_write() via ->write_iter() · 8d020765
      Al Viro committed
      iter_file_splice_write() - a ->splice_write() instance that gathers the
      pipe buffers, builds a bio_vec-based iov_iter covering those and feeds
      it to ->write_iter().  A bunch of simple cases converted to that...
      
      [AV: fixed the braino spotted by Cyrill]
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      8d020765
  18. 23 Apr, 2014 1 commit
  19. 14 Apr, 2014 1 commit
  20. 24 Feb, 2014 1 commit
  21. 07 Nov, 2013 2 commits
  22. 28 Jun, 2013 1 commit
    • D
      xfs: Introduce an ordered buffer item · 5f6bed76
      Dave Chinner committed
      If we have a buffer that we have modified but we do not wish to
      physically log in a transaction (e.g. we've logged a logical
      change), we still need to ensure that transactional integrity is
      maintained. Hence we must not move the tail of the log past the
      transaction that the buffer is associated with before the buffer is
      written to disk.
      
      This means these special buffers still need to be included in the
      transaction and added to the AIL just like a normal buffer, but we
      do not want the modifications to the buffer written into the
      transaction. IOWs, what we want is an "ordered buffer" that
      maintains the same transactional life cycle as a physically logged
      buffer, just without the transcribing of the modifications to the
      log.
      
      Hence we need to flag the buffer as an "ordered buffer" to avoid
      including it in vector size calculations or formatting during the
      transaction. Once the transaction is committed, the buffer appears
      for all intents to be the same as a physically logged buffer as it
      transitions through the log and AIL.
      
      Relogging will also work just fine for such an ordered buffer - the
      logical transaction will be replayed before the subsequent
      modifications that relog the buffer, so everything will be
      reconstructed correctly by recovery.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      5f6bed76
  23. 20 Jun, 2013 1 commit
  24. 22 May, 2013 1 commit
  25. 23 Mar, 2013 1 commit