1. 24 10月, 2013 2 次提交
  2. 01 10月, 2013 1 次提交
    • D
      xfs: lockdep needs to know about 3 dquot-deep nesting · f112a049
      Dave Chinner 提交于
      Michael Semon reported that xfs/299 generated this lockdep warning:
      
      =============================================
      [ INFO: possible recursive locking detected ]
      3.12.0-rc2+ #2 Not tainted
      ---------------------------------------------
      touch/21072 is trying to acquire lock:
       (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64
      
      but task is already holding lock:
       (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&xfs_dquot_other_class);
        lock(&xfs_dquot_other_class);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      7 locks held by touch/21072:
       #0:  (sb_writers#10){++++.+}, at: [<c11185b6>] mnt_want_write+0x1e/0x3e
       #1:  (&type->i_mutex_dir_key#4){+.+.+.}, at: [<c11078ee>] do_last+0x245/0xe40
       #2:  (sb_internal#2){++++.+}, at: [<c122c9e0>] xfs_trans_alloc+0x1f/0x35
       #3:  (&(&ip->i_lock)->mr_lock/1){+.+...}, at: [<c126cd1b>] xfs_ilock+0x100/0x1f1
       #4:  (&(&ip->i_lock)->mr_lock){++++-.}, at: [<c126cf52>] xfs_ilock_nowait+0x105/0x22f
       #5:  (&dqp->q_qlock){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64
       #6:  (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64
      
      The lockdep annotation for dquot lock nesting only understands
      locking for user and "other" dquots, not user, group and quota
      dquots. Fix the annotations to match the locking heirarchy we now
      have.
      Reported-by: NMichael L. Semon <mlsemon35@gmail.com>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      f112a049
  3. 11 9月, 2013 1 次提交
    • D
      xfs: convert dquot cache lru to list_lru · cd56a39a
      Dave Chinner 提交于
      Convert the XFS dquot lru to use the list_lru construct and convert the
      shrinker to being node aware.
      
      [glommer@openvz.org: edited for conflicts + warning fixes]
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NGlauber Costa <glommer@openvz.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      cd56a39a
  4. 13 8月, 2013 3 次提交
  5. 11 7月, 2013 1 次提交
    • C
      xfs: Add pquota fields where gquota is used. · 92f8ff73
      Chandra Seetharaman 提交于
      Add project quota changes to all the places where group quota field
      is used:
         * add separate project quota members into various structures
         * split project quota and group quotas so that instead of overriding
           the group quota members incore, the new project quota members are
           used instead
         * get rid of usage of the OQUOTA flag incore, in favor of separate
           group and project quota flags.
         * add a project dquot argument to various functions.
      
      Not using the pquotino field from superblock yet.
      Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      92f8ff73
  6. 29 6月, 2013 3 次提交
  7. 06 6月, 2013 1 次提交
    • D
      xfs: rework dquot CRCs · bb9b8e86
      Dave Chinner 提交于
      Calculating dquot CRCs when the backing buffer is written back just
      doesn't work reliably. There are several places which manipulate
      dquots directly in the buffers, and they don't calculate CRCs
      appropriately, nor do they always set the buffer up to calculate
      CRCs appropriately.
      
      Firstly, if we log a dquot buffer (e.g. during allocation) it gets
      logged without valid CRC, and so on recovery we end up with a dquot
      that is not valid.
      
      Secondly, if we recover/repair a dquot, we don't have a verifier
      attached to the buffer and hence CRCs are not calculated on the way
      down to disk.
      
      Thirdly, calculating the CRC after we've changed the contents means
      that if we re-read the dquot from the buffer, we cannot verify the
      contents of the dquot are valid, as the CRC is invalid.
      
      So, to avoid all the dquot CRC errors that are being detected by the
      read verifier, change to using the same model as for inodes. That
      is, dquot CRCs are calculated and written to the backing buffer at
      the time the dquot is flushed to the backing buffer. If we modify
      the dquot directly in the backing buffer, calculate the CRC
      immediately after the modification is complete. Hence the dquot in
      the on-disk buffer should always have a valid CRC.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit 6fcdc59d)
      bb9b8e86
  8. 05 6月, 2013 1 次提交
    • D
      xfs: rework dquot CRCs · 6fcdc59d
      Dave Chinner 提交于
      Calculating dquot CRCs when the backing buffer is written back just
      doesn't work reliably. There are several places which manipulate
      dquots directly in the buffers, and they don't calculate CRCs
      appropriately, nor do they always set the buffer up to calculate
      CRCs appropriately.
      
      Firstly, if we log a dquot buffer (e.g. during allocation) it gets
      logged without valid CRC, and so on recovery we end up with a dquot
      that is not valid.
      
      Secondly, if we recover/repair a dquot, we don't have a verifier
      attached to the buffer and hence CRCs are not calculated on the way
      down to disk.
      
      Thirdly, calculating the CRC after we've changed the contents means
      that if we re-read the dquot from the buffer, we cannot verify the
      contents of the dquot are valid, as the CRC is invalid.
      
      So, to avoid all the dquot CRC errors that are being detected by the
      read verifier, change to using the same model as for inodes. That
      is, dquot CRCs are calculated and written to the backing buffer at
      the time the dquot is flushed to the backing buffer. If we modify
      the dquot directly in the backing buffer, calculate the CRC
      immediately after the modification is complete. Hence the dquot in
      the on-disk buffer should always have a valid CRC.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      6fcdc59d
  9. 22 4月, 2013 1 次提交
  10. 23 3月, 2013 2 次提交
  11. 02 2月, 2013 1 次提交
  12. 16 11月, 2012 5 次提交
    • D
      xfs: convert buffer verifiers to an ops structure. · 1813dd64
      Dave Chinner 提交于
      To separate the verifiers from iodone functions and associate read
      and write verifiers at the same time, introduce a buffer verifier
      operations structure to the xfs_buf.
      
      This avoids the need for assigning the write verifier, clearing the
      iodone function and re-running ioend processing in the read
      verifier, and gets rid of the nasty "b_pre_io" name for the write
      verifier function pointer. If we ever need to, it will also be
      easier to add further content specific callbacks to a buffer with an
      ops structure in place.
      
      We also avoid needing to export verifier functions, instead we
      can simply export the ops structures for those that are needed
      outside the function they are defined in.
      
      This patch also fixes a directory block readahead verifier issue
      it exposed.
      
      This patch also adds ops callbacks to the inode/alloc btree blocks
      initialised by growfs. These will need more work before they will
      work with CRCs.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      1813dd64
    • D
      xfs: connect up write verifiers to new buffers · b0f539de
      Dave Chinner 提交于
      Metadata buffers that are read from disk have write verifiers
      already attached to them, but newly allocated buffers do not. Add
      appropriate write verifiers to all new metadata buffers.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      b0f539de
    • D
      xfs: add pre-write metadata buffer verifier callbacks · 612cfbfe
      Dave Chinner 提交于
      These verifiers are essentially the same code as the read verifiers,
      but do not require ioend processing. Hence factor the read verifier
      functions and add a new write verifier wrapper that is used as the
      callback.
      
      This is done as one large patch for all verifiers rather than one
      patch per verifier as the change is largely mechanical. This
      includes hooking up the write verifier via the read verifier
      function.
      
      Hooking up the write verifier for buffers obtained via
      xfs_trans_get_buf() will be done in a separate patch as that touches
      code in many different places rather than just the verifier
      functions.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      612cfbfe
    • D
      xfs: verify dquot blocks as they are read from disk · c6319198
      Dave Chinner 提交于
      Add a dquot buffer verify callback function and pass it into the
      buffer read functions. This checks all the dquots in a buffer, but
      cannot completely verify the dquot ids are correct. Also, errors
      cannot be repaired, so an additional function is added to repair bad
      dquots in the buffer if such an error is detected in a context where
      repair is allowed.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      c6319198
    • D
      xfs: make buffer read verication an IO completion function · c3f8fc73
      Dave Chinner 提交于
      Add a verifier function callback capability to the buffer read
      interfaces.  This will be used by the callers to supply a function
      that verifies the contents of the buffer when it is read from disk.
      This patch does not provide callback functions, but simply modifies
      the interfaces to allow them to be called.
      
      The reason for adding this to the read interfaces is that it is very
      difficult to tell fom the outside is a buffer was just read from
      disk or whether we just pulled it out of cache. Supplying a callbck
      allows the buffer cache to use it's internal knowledge of the buffer
      to execute it only when the buffer is read from disk.
      
      It is intended that the verifier functions will mark the buffer with
      an EFSCORRUPTED error when verification fails. This allows the
      reading context to distinguish a verification error from an IO
      error, and potentially take further actions on the buffer (e.g.
      attempt repair) based on the error reported.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      c3f8fc73
  13. 15 5月, 2012 6 次提交
    • D
      xfs: move xfsagino_t to xfs_types.h · 60a34607
      Dave Chinner 提交于
      Untangle the header file includes a bit by moving the definition of
      xfs_agino_t to xfs_types.h. This removes the dependency that xfs_ag.h has on
      xfs_inum.h, meaning we don't need to include xfs_inum.h everywhere we include
      xfs_ag.h.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      60a34607
    • D
      xfs: pass shutdown method into xfs_trans_ail_delete_bulk · 04913fdd
      Dave Chinner 提交于
      xfs_trans_ail_delete_bulk() can be called from different contexts so
      if the item is not in the AIL we need different shutdown for each
      context.  Pass in the shutdown method needed so the correct action
      can be taken.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      04913fdd
    • C
      xfs: on-stack delayed write buffer lists · 43ff2122
      Christoph Hellwig 提交于
      Queue delwri buffers on a local on-stack list instead of a per-buftarg one,
      and write back the buffers per-process instead of by waking up xfsbufd.
      
      This is now easily doable given that we have very few places left that write
      delwri buffers:
      
       - log recovery:
      	Only done at mount time, and already forcing out the buffers
      	synchronously using xfs_flush_buftarg
      
       - quotacheck:
      	Same story.
      
       - dquot reclaim:
      	Writes out dirty dquots on the LRU under memory pressure.  We might
      	want to look into doing more of this via xfsaild, but it's already
      	more optimal than the synchronous inode reclaim that writes each
      	buffer synchronously.
      
       - xfsaild:
      	This is the main beneficiary of the change.  By keeping a local list
      	of buffers to write we reduce latency of writing out buffers, and
      	more importably we can remove all the delwri list promotions which
      	were hitting the buffer cache hard under sustained metadata loads.
      
      The implementation is very straight forward - xfs_buf_delwri_queue now gets
      a new list_head pointer that it adds the delwri buffers to, and all callers
      need to eventually submit the list using xfs_buf_delwi_submit or
      xfs_buf_delwi_submit_nowait.  Buffers that already are on a delwri list are
      skipped in xfs_buf_delwri_queue, assuming they already are on another delwri
      list.  The biggest change to pass down the buffer list was done to the AIL
      pushing. Now that we operate on buffers the trylock, push and pushbuf log
      item methods are merged into a single push routine, which tries to lock the
      item, and if possible add the buffer that needs writeback to the buffer list.
      This leads to much simpler code than the previous split but requires the
      individual IOP_PUSH instances to unlock and reacquire the AIL around calls
      to blocking routines.
      
      Given that xfsailds now also handle writing out buffers, the conditions for
      log forcing and the sleep times needed some small changes.  The most
      important one is that we consider an AIL busy as long we still have buffers
      to push, and the other one is that we do increment the pushed LSN for
      buffers that are under flushing at this moment, but still count them towards
      the stuck items for restart purposes.  Without this we could hammer on stuck
      items without ever forcing the log and not make progress under heavy random
      delete workloads on fast flash storage devices.
      
      [ Dave Chinner:
      	- rebase on previous patches.
      	- improved comments for XBF_DELWRI_Q handling
      	- fix XBF_ASYNC handling in queue submission (test 106 failure)
      	- rename delwri submit function buffer list parameters for clarity
      	- xfs_efd_item_push() should return XFS_ITEM_PINNED ]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      43ff2122
    • C
      xfs: do not write the buffer from xfs_qm_dqflush · fe7257fd
      Christoph Hellwig 提交于
      Instead of writing the buffer directly from inside xfs_qm_dqflush return it
      to the caller and let the caller decide what to do with the buffer.  Also
      remove the pincount check in xfs_qm_dqflush that all non-blocking callers
      already implement and the now unused flags parameter and the XFS_DQ_IS_DIRTY
      check that all callers already perform.
      
      [ Dave Chinner: fixed build error cause by missing '{'. ]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      fe7257fd
    • C
      xfs: remove log item from AIL in xfs_iflush after a shutdown · 32ce90a4
      Christoph Hellwig 提交于
      If a filesystem has been forced shutdown we are never going to write inodes
      to disk, which means the inode items will stay in the AIL until we free
      the inode. Currently that is not a problem, but a pending change requires us
      to empty the AIL before shutting down the filesystem. In that case leaving
      the inode in the AIL is lethal. Make sure to remove the log item from the AIL
      to allow emptying the AIL on shutdown filesystems.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      32ce90a4
    • C
      xfs: remove log item from AIL in xfs_qm_dqflush after a shutdown · dea96095
      Christoph Hellwig 提交于
      If a filesystem has been forced shutdown we are never going to write dquots
      to disk, which means the dquot items will stay in the AIL forever.
      Currently that is not a problem, but a pending chance requires us to
      empty the AIL before shutting down the filesystem, in which case this
      behaviour is lethal.  Make sure to remove the log item from the AIL
      to allow emptying the AIL on shutdown filesystems.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      dea96095
  14. 23 3月, 2012 1 次提交
  15. 15 3月, 2012 5 次提交
  16. 23 2月, 2012 1 次提交
  17. 22 2月, 2012 2 次提交
  18. 11 2月, 2012 2 次提交
    • C
      xfs: use a normal shrinker for the dquot freelist · 92b2e5b3
      Christoph Hellwig 提交于
      Stop reusing dquots from the freelist when allocating new ones directly, and
      implement a shrinker that actually follows the specifications for the
      interface.  The shrinker implementation is still highly suboptimal at this
      point, but we can gradually work on it.
      
      This also fixes an bug in the previous lock ordering, where we would take
      the hash and dqlist locks inside of the freelist lock against the normal
      lock ordering.  This is only solvable by introducing the dispose list,
      and thus not when using direct reclaim of unused dquots for new allocations.
      
      As a side-effect the quota upper bound and used to free ratio values in
      /proc/fs/xfs/xqm are set to 0 as these values don't make any sense in the
      new world order.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit 04da0c81)
      92b2e5b3
    • C
      xfs: use a normal shrinker for the dquot freelist · 04da0c81
      Christoph Hellwig 提交于
      Stop reusing dquots from the freelist when allocating new ones directly, and
      implement a shrinker that actually follows the specifications for the
      interface.  The shrinker implementation is still highly suboptimal at this
      point, but we can gradually work on it.
      
      This also fixes an bug in the previous lock ordering, where we would take
      the hash and dqlist locks inside of the freelist lock against the normal
      lock ordering.  This is only solvable by introducing the dispose list,
      and thus not when using direct reclaim of unused dquots for new allocations.
      
      As a side-effect the quota upper bound and used to free ratio values in
      /proc/fs/xfs/xqm are set to 0 as these values don't make any sense in the
      new world order.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      04da0c81
  19. 04 2月, 2012 1 次提交