1. 15 May, 2012 5 commits
    • xfs: use blocks for counting length of buffers · 4e94b71b
      Committed by Dave Chinner
      Now that we pass block counts everywhere, and index buffers by block
      number, track the length of the buffer in units of blocks rather
      than bytes. Convert the code to use block counts, and those that
      need byte counts get converted at the time of use.
      
      Also, remove the XFS_BUF_{SET_}SIZE() macros that are just wrappers
      around the buffer length. They only serve to make the code shouty
      loud and don't actually add any real value.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
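The unit change described in this commit can be sketched as follows. This is a minimal illustration, not the kernel code: the struct, the helper, and the `SKETCH_BBSHIFT` name are hypothetical, and the sketch assumes 512-byte basic blocks (a shift of 9).

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: one basic block is 512 bytes, i.e. a shift of 9. */
#define SKETCH_BBSHIFT 9

/* Hypothetical buffer that tracks its length in blocks, not bytes. */
struct sketch_len_buf {
    uint64_t blkno;  /* starting disk address, in basic blocks */
    size_t   nblks;  /* buffer length, in basic blocks */
};

/* Convert to a byte count only at the point where bytes are needed. */
static inline size_t sketch_buf_bytes(const struct sketch_len_buf *bp)
{
    return bp->nblks << SKETCH_BBSHIFT;
}
```

Keeping the stored length in blocks means callers that already hold a block count pass it through unchanged, and only the few byte-oriented users pay for a conversion.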
    • xfs: kill b_file_offset · de1cbee4
      Committed by Dave Chinner
      Seeing as we pass block numbers around everywhere in the buffer
      cache now, it makes no sense to index everything by byte offset.
      Replace all the byte offset indexing with block number based
      indexing, and replace all uses of the byte offset with direct
      conversion from the block index.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: clean up buffer get/read call API · e70b73f8
      Committed by Dave Chinner
      The xfs_buf_get/read API is not consistent in the units it uses, and
      does not use appropriate or consistent units/types for the
      variables.
      
      Convert the API to use disk addresses and block counts for all
      buffer get and read calls. Use consistent naming for all the
      functions and their declarations, and convert the internal functions
      to use disk addresses and block counts to avoid the need to convert them
      from one type to another and back again.
      
      Fix all the callers to use disk addresses and block counts. In many
      cases, this removes an additional conversion from the function call
      as the callers already have a block count.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: check for buffer errors before waiting · 0e95f19a
      Committed by Dave Chinner
      If we call xfs_buf_iowait() on a buffer that failed dispatch due to
      an IO error, it will wait forever for an IO that does not exist.
      This is handled in xfs_buf_read, but there is other code that calls
      xfs_buf_iowait directly that doesn't.
      
      Rather than make the call sites have to handle checking for dispatch
      errors and then checking for completion errors, make
      xfs_buf_iowait() check for dispatch errors on the buffer before
      waiting. This means we handle both dispatch and completion errors
      with one set of error handling at the caller sites.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
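The idea of checking for a dispatch error before waiting can be sketched as below. This is a simplified model under stated assumptions, not the kernel's xfs_buf_iowait(): the struct, its fields, and the helper names are hypothetical, and the "wait" is a stand-in that completes immediately.

```c
#include <stdbool.h>

/* Hypothetical, simplified buffer state; not the kernel's struct xfs_buf. */
struct sketch_io_buf {
    int  error;       /* non-zero if dispatch or completion failed */
    bool io_pending;  /* true while an IO is actually in flight */
};

/* Hypothetical stand-in for blocking until the IO completes. */
static void sketch_wait_for_completion(struct sketch_io_buf *bp)
{
    /* A real implementation would sleep here; the sketch completes at once. */
    bp->io_pending = false;
}

/*
 * Check for a dispatch error first: if dispatch failed, no IO was issued,
 * so waiting for its completion would never return.
 */
static int sketch_buf_iowait(struct sketch_io_buf *bp)
{
    if (!bp->error)
        sketch_wait_for_completion(bp);
    return bp->error;
}
```

With this shape a caller needs only one error check after the wait, whether the failure happened at dispatch or at completion.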
    • xfs: on-stack delayed write buffer lists · 43ff2122
      Committed by Christoph Hellwig
      Queue delwri buffers on a local on-stack list instead of a per-buftarg one,
      and write back the buffers per-process instead of by waking up xfsbufd.
      
      This is now easily doable given that we have very few places left that write
      delwri buffers:
      
       - log recovery:
      	Only done at mount time, and already forcing out the buffers
      	synchronously using xfs_flush_buftarg
      
       - quotacheck:
      	Same story.
      
       - dquot reclaim:
      	Writes out dirty dquots on the LRU under memory pressure.  We might
      	want to look into doing more of this via xfsaild, but it's already
      	more optimal than the synchronous inode reclaim that writes each
      	buffer synchronously.
      
       - xfsaild:
      	This is the main beneficiary of the change.  By keeping a local list
      	of buffers to write we reduce latency of writing out buffers, and
	more importantly we can remove all the delwri list promotions which
      	were hitting the buffer cache hard under sustained metadata loads.
      
      The implementation is very straightforward - xfs_buf_delwri_queue now gets
      a new list_head pointer that it adds the delwri buffers to, and all callers
      need to eventually submit the list using xfs_buf_delwri_submit or
      xfs_buf_delwri_submit_nowait.  Buffers that already are on a delwri list are
      skipped in xfs_buf_delwri_queue, assuming they already are on another delwri
      list.  The biggest change to pass down the buffer list was done to the AIL
      pushing. Now that we operate on buffers the trylock, push and pushbuf log
      item methods are merged into a single push routine, which tries to lock the
      item, and if possible add the buffer that needs writeback to the buffer list.
      This leads to much simpler code than the previous split but requires the
      individual IOP_PUSH instances to unlock and reacquire the AIL around calls
      to blocking routines.
      
      Given that xfsailds now also handle writing out buffers, the conditions for
      log forcing and the sleep times needed some small changes.  The most
      important one is that we consider an AIL busy as long as we still have buffers
      to push, and the other one is that we do increment the pushed LSN for
      buffers that are under flushing at this moment, but still count them towards
      the stuck items for restart purposes.  Without this we could hammer on stuck
      items without ever forcing the log and not make progress under heavy random
      delete workloads on fast flash storage devices.
      
      [ Dave Chinner:
      	- rebase on previous patches.
      	- improved comments for XBF_DELWRI_Q handling
      	- fix XBF_ASYNC handling in queue submission (test 106 failure)
      	- rename delwri submit function buffer list parameters for clarity
      	- xfs_efd_item_push() should return XFS_ITEM_PINNED ]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
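The queue-then-submit pattern described in this commit can be sketched roughly as follows. This is an illustration only: it uses a plain singly linked list rather than the kernel's list_head, and every name (`dw_buf`, `dw_list`, `dw_queue`, `dw_submit`) is hypothetical. The `queued` flag plays the role the message assigns to the delwri-queued state: a buffer already on some list is skipped.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer with just enough state for the sketch. */
struct dw_buf {
    struct dw_buf *next;
    bool queued;   /* already on a delayed-write list */
    bool written;  /* stand-in for "write was issued" */
};

/* Caller-owned list, typically a local variable on the stack. */
struct dw_list {
    struct dw_buf *head;
};

/* Add the buffer unless it is already on a list. Returns true if added. */
static bool dw_queue(struct dw_buf *bp, struct dw_list *list)
{
    if (bp->queued)
        return false;
    bp->queued = true;
    bp->next = list->head;
    list->head = bp;
    return true;
}

/* Write out every queued buffer and empty the list. Returns the count. */
static int dw_submit(struct dw_list *list)
{
    int n = 0;

    while (list->head) {
        struct dw_buf *bp = list->head;

        list->head = bp->next;
        bp->next = NULL;
        bp->queued = false;
        bp->written = true;  /* a real implementation would issue IO here */
        n++;
    }
    return n;
}
```

Because the list lives with the caller, whoever queued the buffers is also responsible for submitting them, which is the contract the commit message places on all callers.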
  2. 29 Mar, 2012 1 commit
  3. 17 Dec, 2011 1 commit
  4. 12 Oct, 2011 12 commits
  5. 13 Aug, 2011 1 commit
    • xfs: remove subdirectories · c59d87c4
      Committed by Christoph Hellwig
      Use the move from Linux 2.6 to Linux 3.x as an excuse to kill the
      annoying subdirectories in the XFS source code.  Besides the large
      number of file renames, the only changes are to the Makefile, a few
      files including headers with the subdirectory prefix, and the binary
      sysctl compat code that includes a header under fs/xfs/ from
      kernel/.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
  6. 26 Jul, 2011 11 commits
  7. 13 Jul, 2011 3 commits
  8. 08 Jul, 2011 3 commits
  9. 20 May, 2011 1 commit
    • xfs: reset buffer pointers before freeing them · 44396476
      Committed by Dave Chinner
      When we free a vmapped buffer, we need to ensure the vmap address
      and length we free is the same as when it was allocated. In various
      places in the log code we change the memory the buffer is pointing
      to before issuing IO, but we never reset the buffer to point back to
      its original memory (or no memory, if that is the case for the
      buffer).
      
      As a result, when we free the buffer it points to memory that is
      owned by something else and attempts to unmap and free it. Because
      the range does not match any known mapped range, it can trigger
      BUG_ON() traps in the vmap code, and potentially corrupt the vmap
      area tracking.
      
      Fix this by always resetting these buffers to their original state
      before freeing them.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
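The fix pattern, restoring a buffer to its original memory before freeing it, can be sketched in userspace C as below. This is an analogy under stated assumptions rather than the kernel change itself: it uses malloc()/free() in place of vmap-backed memory, and all names (`log_buf`, `log_buf_associate`, `log_buf_reset`) are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical buffer that can be temporarily pointed at other memory. */
struct log_buf {
    void  *addr;       /* memory the buffer currently points at */
    size_t len;
    void  *orig_addr;  /* memory allocated when the buffer was created */
    size_t orig_len;
};

static int log_buf_init(struct log_buf *bp, size_t len)
{
    bp->orig_addr = malloc(len);
    if (!bp->orig_addr)
        return -1;
    bp->orig_len = len;
    bp->addr = bp->orig_addr;
    bp->len = len;
    return 0;
}

/* Temporarily point the buffer at memory owned by someone else. */
static void log_buf_associate(struct log_buf *bp, void *mem, size_t len)
{
    bp->addr = mem;
    bp->len = len;
}

/* Point the buffer back at the memory it was created with. */
static void log_buf_reset(struct log_buf *bp)
{
    bp->addr = bp->orig_addr;
    bp->len = bp->orig_len;
}

/* Always reset first, so we only ever free what we allocated. */
static void log_buf_free(struct log_buf *bp)
{
    log_buf_reset(bp);
    free(bp->addr);
    bp->addr = bp->orig_addr = NULL;
    bp->len = bp->orig_len = 0;
}
```

Without the reset in `log_buf_free()`, the free would act on the borrowed memory, which mirrors the mismatch between freed and allocated ranges that the commit message describes.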
  10. 26 Mar, 2011 1 commit
    • xfs: stop using the page cache to back the buffer cache · 0e6e847f
      Committed by Dave Chinner
      Now that the buffer cache has its own LRU, we do not need to use
      the page cache to provide persistent caching and reclaim
      infrastructure. Convert the buffer cache to use alloc_pages()
      instead of the page cache. This will remove all the overhead of page
      cache management from setup and teardown of the buffers, as well as
      needing to mark pages accessed as we find buffers in the buffer
      cache.
      
      By avoiding the page cache, we also remove the need to keep state in
      the page_private(page) field for persistent storage across buffer
      free/buffer rebuild and so all that code can be removed. This also
      fixes the long-standing problem of not having enough bits in the
      page_private field to track all the state needed for a 512
      sector/64k page setup.
      
      It also removes the need for page locking during reads as the pages
      are unique to the buffer and nobody else will be attempting to
      access them.
      
      Finally, it removes the buftarg address space lock as a point of
      global contention on workloads that allocate and free buffers
      quickly such as when creating or removing large numbers of inodes in
      parallel. This removes the 16TB limit on filesystem size on 32 bit
      machines as the page index (32 bit) is no longer used for lookups
      of metadata buffers - the buffer cache is now solely indexed by disk
      address which is stored in a 64 bit field in the buffer.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
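The 16TB figure follows from the width of the page index. As a worked check, assuming a 4 KiB page size (the commit message does not state the page size, so that value is an assumption, and the helper name is hypothetical):

```c
#include <stdint.h>

/*
 * Largest byte range addressable through a page index of the given width.
 * index_bits must be less than 64.
 */
static uint64_t max_indexable_bytes(unsigned index_bits, uint64_t page_size)
{
    return ((uint64_t)1 << index_bits) * page_size;
}
```

For a 32-bit index and 4096-byte pages this gives 2^32 * 2^12 = 2^44 bytes, which is 16 TiB. Indexing by a 64-bit disk address instead removes that ceiling.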
  11. 12 Jan, 2011 1 commit
    • xfs: fix error handling for synchronous writes · bfc60177
      Committed by Christoph Hellwig
      If we get an IO error on a synchronous superblock write, we attach an
      error release function to it so that when the last reference goes away
      the release function is called and the buffer is invalidated and
      unlocked. The buffer is left locked until the release function is
      called so that other concurrent users of the buffer will be locked out
      until the buffer error is fully processed.
      
      Unfortunately, for the superblock buffer the filesystem itself holds a
      reference to the buffer which prevents the reference count from
      dropping to zero and the release function being called. As a result,
      once an IO error occurs on a sync write, the buffer will never be
      unlocked and all future attempts to lock the buffer will hang.
      
      To make matters worse, this problem is not unique to such buffers;
      if there is a concurrent _xfs_buf_find() running, the lookup will grab
      a reference to the buffer and then wait on the buffer lock, preventing
      the reference count from ever falling to zero and hence unlocking the
      buffer.
      
      As such, the whole b_relse function implementation is broken because it
      cannot rely on the buffer reference count falling to zero to unlock the
      errored buffer. The synchronous write error path is the only path that
      uses this callback - it is used to ensure that the synchronous waiter
      gets the buffer error before the error state is cleared from the buffer
      by the release function.
      
      Given that the only synchronous buffer writes now go through xfs_bwrite
      and the error path in question can only occur for a write of a dirty,
      logged buffer, we can move most of the b_relse processing to happen
      inline in xfs_buf_iodone_callbacks, just like a normal I/O completion.
      In addition to that we make sure the error is not cleared in
      xfs_buf_iodone_callbacks, so that xfs_bwrite can reliably check it.
      Given that xfs_bwrite keeps the buffer locked until it has waited for
      it and checked the error, this allows us to reliably propagate the error
      to the caller, and make sure that the buffer is reliably unlocked.
      
      Given that xfs_buf_iodone_callbacks was the only instance of the
      b_relse callback we can remove it entirely.
      
      Based on earlier patches by Dave Chinner and Ajeet Yadav.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reported-by: Ajeet Yadav <ajeet.yadav.77@gmail.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>