1. 11 9月, 2013 2 次提交
    • D
      xfs: rework buffer dispose list tracking · a4082357
      Dave Chinner 提交于
      In converting the buffer lru lists to use the generic code, the locking
      for marking the buffers as on the dispose list was lost.  This results in
      confusion in LRU buffer tracking and acocunting, resulting in reference
      counts being mucked up and filesystem beig unmountable.
      
      To fix this, introduce an internal buffer spinlock to protect the state
      field that holds the dispose list information.  Because there is now
      locking needed around xfs_buf_lru_add/del, and they are used in exactly
      one place each two lines apart, get rid of the wrappers and code the logic
      directly in place.
      
      Further, the LRU emptying code used on unmount is less than optimal.
      Convert it to use a dispose list as per a normal shrinker walk, and repeat
      the walk that fills the dispose list until the LRU is empty.  Thi avoids
      needing to drop and regain the LRU lock for every item being freed, and
      allows the same logic as the shrinker isolate call to be used.  Simpler,
      easier to understand.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NGlauber Costa <glommer@openvz.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      a4082357
    • D
      xfs: convert buftarg LRU to generic code · e80dfa19
      Dave Chinner 提交于
      Convert the buftarg LRU to use the new generic LRU list and take advantage
      of the functionality it supplies to make the buffer cache shrinker node
      aware.
      Signed-off-by: NGlauber Costa <glommer@openvz.org>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      e80dfa19
  2. 17 1月, 2013 1 次提交
  3. 18 12月, 2012 1 次提交
  4. 16 11月, 2012 3 次提交
    • D
      xfs: convert buffer verifiers to an ops structure. · 1813dd64
      Dave Chinner 提交于
      To separate the verifiers from iodone functions and associate read
      and write verifiers at the same time, introduce a buffer verifier
      operations structure to the xfs_buf.
      
      This avoids the need for assigning the write verifier, clearing the
      iodone function and re-running ioend processing in the read
      verifier, and gets rid of the nasty "b_pre_io" name for the write
      verifier function pointer. If we ever need to, it will also be
      easier to add further content specific callbacks to a buffer with an
      ops structure in place.
      
      We also avoid needing to export verifier functions, instead we
      can simply export the ops structures for those that are needed
      outside the function they are defined in.
      
      This patch also fixes a directory block readahead verifier issue
      it exposed.
      
      This patch also adds ops callbacks to the inode/alloc btree blocks
      initialised by growfs. These will need more work before they will
      work with CRCs.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      1813dd64
    • D
      xfs: add buffer pre-write callback · cfb02852
      Dave Chinner 提交于
      Add a callback to the buffer write path to enable verification of
      the buffer and CRC calculation prior to issuing the write to the
      underlying storage.
      
      If the callback function detects some kind of failure or error
      condition, it must mark the buffer with an error so that the caller
      can take appropriate action. In the case of xfs_buf_ioapply(), a
      corrupt metadta buffer willt rigger a shutdown of the filesystem,
      because something is clearly wrong and we can't allow corrupt
      metadata to be written to disk.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      cfb02852
    • D
      xfs: make buffer read verication an IO completion function · c3f8fc73
      Dave Chinner 提交于
      Add a verifier function callback capability to the buffer read
      interfaces.  This will be used by the callers to supply a function
      that verifies the contents of the buffer when it is read from disk.
      This patch does not provide callback functions, but simply modifies
      the interfaces to allow them to be called.
      
      The reason for adding this to the read interfaces is that it is very
      difficult to tell fom the outside is a buffer was just read from
      disk or whether we just pulled it out of cache. Supplying a callbck
      allows the buffer cache to use it's internal knowledge of the buffer
      to execute it only when the buffer is read from disk.
      
      It is intended that the verifier functions will mark the buffer with
      an EFSCORRUPTED error when verification fails. This allows the
      reading context to distinguish a verification error from an IO
      error, and potentially take further actions on the buffer (e.g.
      attempt repair) based on the error reported.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      c3f8fc73
  5. 30 8月, 2012 1 次提交
    • C
      xfs: fix race while discarding buffers [V4] · 6fb8a90a
      Carlos Maiolino 提交于
      While xfs_buftarg_shrink() is freeing buffers from the dispose list (filled with
      buffers from lru list), there is a possibility to have xfs_buf_stale() racing
      with it, and removing buffers from dispose list before xfs_buftarg_shrink() does
      it.
      
      This happens because xfs_buftarg_shrink() handle the dispose list without
      locking and the test condition in xfs_buf_stale() checks for the buffer being in
      *any* list:
      
      if (!list_empty(&bp->b_lru))
      
      If the buffer happens to be on dispose list, this causes the buffer counter of
      lru list (btp->bt_lru_nr) to be decremented twice (once in xfs_buftarg_shrink()
      and another in xfs_buf_stale()) causing a wrong account usage of the lru list.
      
      This may cause xfs_buftarg_shrink() to return a wrong value to the memory
      shrinker shrink_slab(), and such account error may also cause an underflowed
      value to be returned; since the counter is lower than the current number of
      items in the lru list, a decrement may happen when the counter is 0, causing
      an underflow on the counter.
      
      The fix uses a new flag field (and a new buffer flag) to serialize buffer
      handling during the shrink process. The new flag field has been designed to use
      btp->bt_lru_lock/unlock instead of xfs_buf_lock/unlock mechanism.
      
      dchinner, sandeen, aquini and aris also deserve credits for this.
      Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      6fb8a90a
  6. 25 8月, 2012 1 次提交
    • C
      xfs: fix race while discarding buffers [V4] · e599b325
      Carlos Maiolino 提交于
      While xfs_buftarg_shrink() is freeing buffers from the dispose list (filled with
      buffers from lru list), there is a possibility to have xfs_buf_stale() racing
      with it, and removing buffers from dispose list before xfs_buftarg_shrink() does
      it.
      
      This happens because xfs_buftarg_shrink() handle the dispose list without
      locking and the test condition in xfs_buf_stale() checks for the buffer being in
      *any* list:
      
      if (!list_empty(&bp->b_lru))
      
      If the buffer happens to be on dispose list, this causes the buffer counter of
      lru list (btp->bt_lru_nr) to be decremented twice (once in xfs_buftarg_shrink()
      and another in xfs_buf_stale()) causing a wrong account usage of the lru list.
      
      This may cause xfs_buftarg_shrink() to return a wrong value to the memory
      shrinker shrink_slab(), and such account error may also cause an underflowed
      value to be returned; since the counter is lower than the current number of
      items in the lru list, a decrement may happen when the counter is 0, causing
      an underflow on the counter.
      
      The fix uses a new flag field (and a new buffer flag) to serialize buffer
      handling during the shrink process. The new flag field has been designed to use
      btp->bt_lru_lock/unlock instead of xfs_buf_lock/unlock mechanism.
      
      dchinner, sandeen, aquini and aris also deserve credits for this.
      Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      e599b325
  7. 14 7月, 2012 2 次提交
  8. 02 7月, 2012 3 次提交
  9. 15 5月, 2012 10 次提交
    • D
      xfs: make XBF_MAPPED the default behaviour · 611c9946
      Dave Chinner 提交于
      Rather than specifying XBF_MAPPED for almost all buffers, introduce
      XBF_UNMAPPED for the couple of users that use unmapped buffers.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      611c9946
    • D
      xfs: kill XBF_DONTBLOCK · aa5c158e
      Dave Chinner 提交于
      Just about all callers of xfs_buf_read() and xfs_buf_get() use XBF_DONTBLOCK.
      This is used to make memory allocation use GFP_NOFS rather than GFP_KERNEL to
      avoid recursion through memory reclaim back into the filesystem.
      
      All the blocking get calls in growfs occur inside a transaction, even though
      they are no part of the transaction, so all allocation will be GFP_NOFS due to
      the task flag PF_TRANS being set. The blocking read calls occur during log
      recovery, so they will probably be unaffected by converting to GFP_NOFS
      allocations.
      
      Hence make XBF_DONTBLOCK behaviour always occur for buffers and kill the flag.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      aa5c158e
    • D
      xfs: kill XBF_LOCK · a8acad70
      Dave Chinner 提交于
      Buffers are always returned locked from the lookup routines. Hence
      we don't need to tell the lookup routines to return locked buffers,
      on to try and lock them. Remove XBF_LOCK from all the callers and
      from internal buffer cache usage.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      a8acad70
    • D
      xfs: kill xfs_buf_btoc · 795cac72
      Dave Chinner 提交于
      xfs_buf_btoc and friends are simple macros that do basic block
      to page index conversion and vice versa. These aren't widely used,
      and we use open coded masking and shifting everywhere else. Hence
      remove the macros and open code the work they do.
      
      Also, use of PAGE_CACHE_{SIZE|SHIFT|MASK} for these macros is now
      incorrect - we are using pages directly and not the page cache, so
      use PAGE_{SIZE|MASK|SHIFT} instead.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      795cac72
    • D
      xfs: use blocks for storing the desired IO size · aa0e8833
      Dave Chinner 提交于
      Now that we pass block counts everywhere, and index buffers by block
      number and length in units of blocks, convert the desired IO size
      into block counts rather than bytes. Convert the code to use block
      counts, and those that need byte counts get converted at the time of
      use.
      
      Rename the b_desired_count variable to something closer to it's
      purpose - b_io_length - as it is only used to specify the length of
      an IO for a subset of the buffer.  The only time this is used is for
      log IO - both writing iclogs and during log recovery. In all other
      cases, the b_io_length matches b_length, and hence a lot of code
      confuses the two. e.g. the buf item code uses the io count
      exclusively when it should be using the buffer length. Fix these
      apprpriately as they are found.
      
      Also, remove the XFS_BUF_{SET_}COUNT() macros that are just wrappers
      around the desired IO length. They only serve to make the code
      shouty loud, don't actually add any real value, and are often used
      incorrectly.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      aa0e8833
    • D
      xfs: use blocks for counting length of buffers · 4e94b71b
      Dave Chinner 提交于
      Now that we pass block counts everywhere, and index buffers by block
      number, track the length of the buffer in units of blocks rather
      than bytes. Convert the code to use block counts, and those that
      need byte counts get converted at the time of use.
      
      Also, remove the XFS_BUF_{SET_}SIZE() macros that are just wrappers
      around the buffer length. They only serve to make the code shouty
      loud and don't actually add any real value.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      4e94b71b
    • D
      xfs: kill b_file_offset · de1cbee4
      Dave Chinner 提交于
      Seeing as we pass block numbers around everywhere in the buffer
      cache now, it makes no sense to index everything by byte offset.
      Replace all the byte offset indexing with block number based
      indexing, and replace all uses of the byte offset with direct
      conversion from the block index.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      de1cbee4
    • D
      xfs: clean up buffer get/read call API · e70b73f8
      Dave Chinner 提交于
      The xfs_buf_get/read API is not consistent in the units it uses, and
      does not use appropriate or consistent units/types for the
      variables.
      
      Convert the API to use disk addresses and block counts for all
      buffer get and read calls. Use consistent naming for all the
      functions and their declarations, and convert the internal functions
      to use disk addresses and block counts to avoid need to convert them
      from one type to another and back again.
      
      Fix all the callers to use disk addresses and block counts. In many
      cases, this removes an additional conversion from the function call
      as the callers already have a block count.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      e70b73f8
    • D
      xfs: check for buffer errors before waiting · 0e95f19a
      Dave Chinner 提交于
      If we call xfs_buf_iowait() on a buffer that failed dispatch due to
      an IO error, it will wait forever for an Io that does not exist.
      This is hndled in xfs_buf_read, but there is other code that calls
      xfs_buf_iowait directly that doesn't.
      
      Rather than make the call sites have to handle checking for dispatch
      errors and then checking for completion errors, make
      xfs_buf_iowait() check for dispatch errors on the buffer before
      waiting. This means we handle both dispatch and completion errors
      with one set of error handling at the caller sites.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      0e95f19a
    • C
      xfs: on-stack delayed write buffer lists · 43ff2122
      Christoph Hellwig 提交于
      Queue delwri buffers on a local on-stack list instead of a per-buftarg one,
      and write back the buffers per-process instead of by waking up xfsbufd.
      
      This is now easily doable given that we have very few places left that write
      delwri buffers:
      
       - log recovery:
      	Only done at mount time, and already forcing out the buffers
      	synchronously using xfs_flush_buftarg
      
       - quotacheck:
      	Same story.
      
       - dquot reclaim:
      	Writes out dirty dquots on the LRU under memory pressure.  We might
      	want to look into doing more of this via xfsaild, but it's already
      	more optimal than the synchronous inode reclaim that writes each
      	buffer synchronously.
      
       - xfsaild:
      	This is the main beneficiary of the change.  By keeping a local list
      	of buffers to write we reduce latency of writing out buffers, and
      	more importably we can remove all the delwri list promotions which
      	were hitting the buffer cache hard under sustained metadata loads.
      
      The implementation is very straight forward - xfs_buf_delwri_queue now gets
      a new list_head pointer that it adds the delwri buffers to, and all callers
      need to eventually submit the list using xfs_buf_delwi_submit or
      xfs_buf_delwi_submit_nowait.  Buffers that already are on a delwri list are
      skipped in xfs_buf_delwri_queue, assuming they already are on another delwri
      list.  The biggest change to pass down the buffer list was done to the AIL
      pushing. Now that we operate on buffers the trylock, push and pushbuf log
      item methods are merged into a single push routine, which tries to lock the
      item, and if possible add the buffer that needs writeback to the buffer list.
      This leads to much simpler code than the previous split but requires the
      individual IOP_PUSH instances to unlock and reacquire the AIL around calls
      to blocking routines.
      
      Given that xfsailds now also handle writing out buffers, the conditions for
      log forcing and the sleep times needed some small changes.  The most
      important one is that we consider an AIL busy as long we still have buffers
      to push, and the other one is that we do increment the pushed LSN for
      buffers that are under flushing at this moment, but still count them towards
      the stuck items for restart purposes.  Without this we could hammer on stuck
      items without ever forcing the log and not make progress under heavy random
      delete workloads on fast flash storage devices.
      
      [ Dave Chinner:
      	- rebase on previous patches.
      	- improved comments for XBF_DELWRI_Q handling
      	- fix XBF_ASYNC handling in queue submission (test 106 failure)
      	- rename delwri submit function buffer list parameters for clarity
      	- xfs_efd_item_push() should return XFS_ITEM_PINNED ]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      43ff2122
  10. 29 3月, 2012 1 次提交
  11. 17 12月, 2011 1 次提交
  12. 12 10月, 2011 12 次提交
  13. 13 8月, 2011 1 次提交
    • C
      xfs: remove subdirectories · c59d87c4
      Christoph Hellwig 提交于
      Use the move from Linux 2.6 to Linux 3.x as an excuse to kill the
      annoying subdirectories in the XFS source code.  Besides the large
      amount of file rename the only changes are to the Makefile, a few
      files including headers with the subdirectory prefix, and the binary
      sysctl compat code that includes a header under fs/xfs/ from
      kernel/.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      c59d87c4
  14. 26 7月, 2011 1 次提交