  1. 12 Jan, 2011 · 1 commit
    • xfs: introduce xfs_rw_lock() helpers for locking the inode · 487f84f3
      Dave Chinner committed
      We need to obtain the i_mutex, i_iolock and i_ilock during the read
      and write paths. Add a set of wrapper functions to neatly
      encapsulate the lock ordering and shared/exclusive semantics to make
      the locking easier to follow and get right.
      
      Note that this changes some of the exclusive locking serialisation in
      that serialisation will occur against the i_mutex instead of the
      XFS_IOLOCK_EXCL. This does not change any behaviour, and it is
      arguably more efficient to use the mutex for such serialisation than
      the rw_sem.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      487f84f3
  2. 11 Jan, 2011 · 3 commits
  3. 21 Dec, 2010 · 5 commits
    • xfs: introduce new locks for the log grant ticket wait queues · 3f16b985
      Dave Chinner committed
      The log grant ticket wait queues are currently protected by the log
      grant lock.  However, the queues are functionally independent from
      each other, and operations on them only require serialisation
      against other queue operations now that all of the other log
      variables they use are atomic values.
      
      Hence, we can make them independent of the grant lock by introducing
      new locks just to protect the list operations. Because the lists
      are independent, we can use a lock per list and ensure that reserve
      and write head queuing do not contend.
      
      To ensure forced shutdowns work correctly in conjunction with the
      new fast paths, ensure that we check whether the log has been shut
      down in the grant functions once we hold the relevant spin locks but
      before we go to sleep. This is needed to co-ordinate correctly with
      the wakeups that are issued on the ticket queues so we don't leave
      any processes sleeping on the queues during a shutdown.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      3f16b985
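The per-queue locking and the shutdown check described in 3f16b985 can be modelled roughly as follows. This is a userspace sketch, not the kernel code: `GrantQueue`, `Log`, `wait_for_space` and `force_shutdown` are hypothetical names, and a `threading.Condition` stands in for the kernel's spin lock plus wait queue. The point it illustrates is that the shutdown flag is tested while the queue lock is held and before sleeping, so a waiter either sees the flag or is already asleep when the shutdown wakeup arrives.

```python
import threading


class GrantQueue:
    """One ticket wait queue with its own lock (hypothetical model)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.cond = threading.Condition(self.lock)


class Log:
    def __init__(self, free_bytes):
        self.free_bytes = free_bytes
        self.shutdown = False
        # Separate queues, separate locks: reserve and write waiters
        # never contend with each other.
        self.reserve_q = GrantQueue()
        self.write_q = GrantQueue()


def wait_for_space(log, queue, need):
    """Return True once space is available, False if the log was shut down."""
    with queue.lock:
        while True:
            # Check for shutdown under the queue lock, before sleeping.
            if log.shutdown:
                return False
            if log.free_bytes >= need:
                return True
            queue.cond.wait()


def force_shutdown(log):
    """Mark the log as shut down and wake every waiter on both queues."""
    log.shutdown = True
    for queue in (log.reserve_q, log.write_q):
        with queue.lock:
            queue.cond.notify_all()
```

Because `force_shutdown` takes each queue lock before issuing the wakeup, it cannot slip in between a waiter's shutdown check and its sleep.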
    • xfs: convert l_tail_lsn to an atomic variable. · 1c3cb9ec
      Dave Chinner committed
      log->l_tail_lsn is currently protected by the log grant lock. The
      lock is only needed for serialising readers against writers, so we
      don't really need the lock if we make the l_tail_lsn variable an
      atomic. Converting the l_tail_lsn variable to an atomic64_t means we
      can start to peel back the grant lock from various operations.
      
      Also, provide functions to safely crack an atomic LSN variable into
      its component pieces and to recombine the components into an
      atomic variable. Use them where appropriate.
      
      This also removes the need for explicitly holding a spinlock to read
      the l_tail_lsn on 32 bit platforms.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      
      1c3cb9ec
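The "crack" and "recombine" helpers mentioned in 1c3cb9ec can be illustrated with plain integer arithmetic. The sketch below assumes the LSN keeps a cycle number in the upper 32 bits and a block number in the lower 32 bits; that layout and the names `assign_lsn` and `crack_lsn` are assumptions for illustration, not the kernel's functions.

```python
MASK32 = 0xFFFFFFFF


def assign_lsn(cycle, block):
    """Pack a cycle (assumed high 32 bits) and block (low 32 bits) into one value."""
    if not (0 <= cycle <= MASK32 and 0 <= block <= MASK32):
        raise ValueError("cycle and block must each fit in 32 bits")
    return (cycle << 32) | block


def crack_lsn(lsn):
    """Split a packed 64-bit value back into (cycle, block)."""
    return (lsn >> 32) & MASK32, lsn & MASK32
```

Keeping both halves in one 64-bit value means a single atomic load yields a consistent pair, which is what lets readers drop the lock.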
    • xfs: use wait queues directly for the log wait queues · eb40a875
      Dave Chinner committed
      The log grant queues are one of the few places left using sv_t
      constructs for waiting. Given we are touching this code, we should
      convert them to plain wait queues. While there, convert all the
      other sv_t users in the log code as well.
      
      Seeing as this removes the last users of the sv_t type, remove the
      header file defining the wrapper and the fragments that still
      reference it.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      eb40a875
    • xfs: combine grant heads into a single 64 bit integer · a69ed03c
      Dave Chinner committed
      Prepare for switching the grant heads to atomic variables by
      combining the two 32 bit values that make up the grant head into a
      single 64 bit variable.  Provide wrapper functions to combine and
      split the grant heads appropriately for calculations and use them as
      necessary.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      a69ed03c
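To show what "combine and split the grant heads appropriately for calculations" in a69ed03c might look like, the sketch below advances a combined grant head by a number of bytes and wraps into the next cycle at the end of the log. It assumes the two 32 bit halves are a cycle count and a byte offset. The function names and the wrap rule are hypothetical, chosen to illustrate the split, calculate, recombine pattern.

```python
MASK32 = 0xFFFFFFFF


def combine_head(cycle, space):
    """Pack cycle and byte offset into a single 64-bit grant head."""
    return ((cycle & MASK32) << 32) | (space & MASK32)


def split_head(head):
    """Unpack a grant head into (cycle, space)."""
    return (head >> 32) & MASK32, head & MASK32


def grant_add_space(head, nbytes, log_size):
    """Advance the head by nbytes, wrapping to the next cycle at log_size."""
    cycle, space = split_head(head)
    room = log_size - space          # bytes left before the end of the log
    if nbytes < room:
        space += nbytes
    else:
        space = nbytes - room        # carry the remainder into the next cycle
        cycle += 1
    return combine_head(cycle, space)
```

For example, with a 1000 byte log, a head at cycle 1, offset 900 that grows by 150 bytes ends up at cycle 2, offset 50.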
    • xfs: convert log grant ticket queues to list heads · 10547941
      Dave Chinner committed
      The grant write and reserve queues use a roll-your-own doubly linked
      list, so convert them to a standard list_head structure and convert
      all the list traversals to use list_for_each_entry(). We can also
      get rid of the XLOG_TIC_IN_Q flag as we can use the list_empty()
      check to tell if the ticket is in a list or not.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      10547941
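The reason a separate "in queue" flag becomes redundant in 10547941 is that a circular list node that is not on any list points at itself, so emptiness doubles as a membership test. The Python model below mirrors the names of the kernel list helpers for readability, but it is only a sketch; `Ticket` and `is_queued` are hypothetical.

```python
class ListHead:
    """Circular doubly linked list node; an unlinked node points at itself."""

    def __init__(self):
        self.next = self
        self.prev = self


def list_empty(head):
    return head.next is head


def list_add_tail(new, head):
    """Insert new just before head, i.e. at the tail of the list."""
    last = head.prev
    new.prev, new.next = last, head
    last.next = new
    head.prev = new


def list_del_init(entry):
    """Unlink entry and reset it to the self-linked (empty) state."""
    entry.prev.next = entry.next
    entry.next.prev = entry.prev
    entry.next = entry.prev = entry


class Ticket:
    def __init__(self, name):
        self.name = name
        self.queue_link = ListHead()

    def is_queued(self):
        # No flag needed: a linked node is never self-linked.
        return not list_empty(self.queue_link)
```

This only works if removal always re-initialises the node, which is why the sketch uses `list_del_init` rather than a plain unlink.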
  4. 17 Dec, 2010 · 1 commit
    • xfs: reduce the number of AIL push wakeups · e677d0f9
      Dave Chinner committed
      The xfsaild often tries to rest to wait for congestion to pass or for
      IO to complete, but is regularly woken in tail-pushing situations.
      In severe cases, the xfsaild is getting woken tens of thousands of
      times a second. Reduce the number of needless wakeups by only waking
      the xfsaild if the new target is larger than the old one. Further,
      make short sleeps uninterruptible as they occur when the xfsaild has
      decided it needs to back off to allow some IO to complete and being
      woken early is counter-productive.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      e677d0f9
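The wakeup filter described in e677d0f9 amounts to a single comparison. A minimal sketch, assuming a hypothetical `AilPusher` class in which a counter stands in for actually waking the worker thread:

```python
class AilPusher:
    """Tracks the push target and counts how often the worker is woken."""

    def __init__(self):
        self.target = 0
        self.wakeups = 0

    def push(self, new_target):
        """Wake the worker only if the target actually moved forward."""
        if new_target <= self.target:
            return False          # nothing new to do, let the worker rest
        self.target = new_target
        self.wakeups += 1         # stands in for waking the worker thread
        return True
```

Repeated pushes to the same or an older target then cost a comparison instead of a wakeup.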
  5. 02 Dec, 2010 · 2 commits
    • xfs: connect up buffer reclaim priority hooks · 821eb21d
      Dave Chinner committed
      Now that the buffer reclaim infrastructure can handle different reclaim
      priorities for different types of buffers, reconnect the hooks in the
      XFS code that have been sitting dormant since it was ported to Linux. This
      should finally give us reclaim prioritisation that is on a par with the
      functionality that Irix provided XFS 15 years ago.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      821eb21d
    • xfs: add a lru to the XFS buffer cache · 430cbeb8
      Dave Chinner committed
      Introduce a per-buftarg LRU for memory reclaim to operate on. This
      is the last piece we need to put in place so that we can fully
      control the buffer lifecycle. This allows XFS to be responsible for
      maintaining the working set of buffers under memory pressure instead
      of relying on the VM reclaim not to take pages we need out from
      underneath us.
      
      The implementation introduces a b_lru_ref counter into the buffer.
      This is currently set to 1 whenever the buffer is referenced and so is used to
      determine if the buffer should be added to the LRU or not when freed.
      Effectively it allows lazy LRU initialisation of the buffer so we do not need
      to touch the LRU list and locks in xfs_buf_find().
      
      Instead, when the buffer is being released and we drop the last
      reference to it, we check the b_lru_ref count and if it is non-zero
      we re-add the buffer reference and add the buffer to the LRU. The
      b_lru_ref counter is decremented by the shrinker, and whenever the
      shrinker comes across a buffer with a zero b_lru_ref counter, it
      releases the LRU reference on the buffer. In the absence of a lookup
      race, this will result in the buffer being freed.
      
      This counting mechanism is used instead of a reference flag so that
      it is simple to re-introduce buffer-type specific reclaim reference
      counts to prioritise reclaim more effectively. We still have all
      those hooks in the XFS code, so this will provide the infrastructure
      to re-implement that functionality.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      430cbeb8
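The b_lru_ref lifecycle in 430cbeb8 can be followed in a small model. The sketch below is a simplification under stated assumptions: `Buffer`, `BufferCache`, `release` and `shrink` are hypothetical names, locking and lookup races are ignored, and an `OrderedDict` stands in for the per-buftarg LRU list.

```python
from collections import OrderedDict


class Buffer:
    def __init__(self, name):
        self.name = name
        self.hold = 1       # ordinary reference count
        self.lru_ref = 1    # set to 1 when the buffer is referenced
        self.freed = False


class BufferCache:
    def __init__(self):
        self.lru = OrderedDict()  # oldest entry first

    def release(self, buf):
        """Drop one reference. On the last drop, park the buffer on the
        LRU if lru_ref is non-zero, otherwise free it."""
        buf.hold -= 1
        if buf.hold > 0:
            return
        if buf.lru_ref > 0 and buf.name not in self.lru:
            buf.hold = 1                  # the LRU now owns a reference
            self.lru[buf.name] = buf
        else:
            buf.freed = True

    def shrink(self):
        """One shrinker pass: age each buffer, reclaim those already at zero."""
        for name, buf in list(self.lru.items()):
            if buf.lru_ref > 0:
                buf.lru_ref -= 1
                self.lru.move_to_end(name)  # give it another trip round
                continue
            del self.lru[name]
            self.release(buf)               # drop the LRU's reference
```

With an initial count of 1, a buffer survives one shrinker pass and is freed on the second. A larger initial count per buffer type would make that type survive more passes, which is the prioritisation the counter is meant to allow.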
  6. 30 Nov, 2010 · 1 commit
  7. 16 Dec, 2010 · 1 commit
  8. 17 Dec, 2010 · 1 commit
    • xfs: convert inode cache lookups to use RCU locking · 1a3e8f3d
      Dave Chinner committed
      With delayed logging greatly increasing the sustained parallelism of inode
      operations, the inode cache locking is showing significant read vs write
      contention when inode reclaim runs at the same time as lookups. There are
      also many more write lock acquisitions than there are read locks (4:1 ratio)
      so the read locking is not really buying us much in the way of parallelism.
      
      To avoid the read vs write contention, change the cache to use RCU locking on
      the read side. To avoid needing to RCU free every single inode, use the
      built-in slab RCU freeing mechanism. This requires us to be able to detect lookups of
      freed inodes, so ensure that every freed inode has an inode number of zero and
      the XFS_IRECLAIM flag set. We already check the XFS_IRECLAIM flag in the cache hit
      lookup path, so add a check for a zero inode number as well.
      
      We can then convert all the read locking lookups to use RCU read side locking
      and hence remove all read side locking.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      1a3e8f3d
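With slab RCU freeing, a lookup can return an object that has already been freed, so the cache hit path in 1a3e8f3d has to revalidate what it found. The sketch below models only that validation step. `IRECLAIM_FLAG` is a made-up flag value, the class and function names are hypothetical, and the comparison against the wanted inode number is an added assumption beyond the zero check named in the commit message.

```python
IRECLAIM_FLAG = 0x1  # placeholder value for this sketch only


class Inode:
    def __init__(self, ino):
        self.ino = ino
        self.flags = 0


def mark_freed(inode):
    """Leave a freed inode in a state that lookups can recognise."""
    inode.ino = 0
    inode.flags |= IRECLAIM_FLAG


def lookup_is_valid(inode, wanted_ino):
    """Revalidate an inode found under the RCU read lock."""
    if inode.ino == 0:
        return False                    # freed: inode number was zeroed
    if inode.ino != wanted_ino:
        return False                    # assumption: memory reused for another inode
    if inode.flags & IRECLAIM_FLAG:
        return False                    # being reclaimed
    return True
```

A lookup that fails validation would be retried or treated as a cache miss; that policy is outside this sketch.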
  9. 23 Dec, 2010 · 1 commit
    • xfs: provide a inode iolock lockdep class · dcfcf205
      Dave Chinner committed
      The XFS iolock needs to be re-initialised to a new lock class before
      it enters reclaim to prevent lockdep false positives. Unfortunately,
      this is not sufficient protection as inodes in the XFS_IRECLAIMABLE
      state can be recycled and not re-initialised before being reused.
      
      We need to re-initialise the lock state when transferring out of
      XFS_IRECLAIMABLE state to XFS_INEW, but we need to keep the same
      class as if the inode was just allocated. Hence we need a specific
      lockdep class variable for the iolock so that both initialisations
      use the same class.
      
      While there, add a specific class for inodes in the reclaim state so
      that it is easy to tell from lockdep reports what state the inode
      was in that generated the report.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      dcfcf205
  10. 17 Dec, 2010 · 11 commits
  11. 01 Dec, 2010 · 2 commits
    • xfs: push stale, pinned buffers on trylock failures · 90810b9e
      Dave Chinner committed
      As reported by Nick Piggin, XFS is suffering from long pauses under
      highly concurrent workloads when hosted on ramdisks. The problem is
      that an inode buffer is stuck in the pinned state in memory and as a
      result either the inode buffer or one of the inodes within the
      buffer is stopping the tail of the log from being moved forward.
      
      The system remains in this state until a periodic log force issued
      by xfssyncd causes the buffer to be unpinned. The main problem is
      that these are stale buffers, and are hence held locked until the
      transaction/checkpoint that marked them stale has been committed to
      disk. When the filesystem gets into this state, only the xfssyncd
      can cause the async transactions to be committed to disk and hence
      unpin the inode buffer.
      
      This problem was encountered when scaling the busy extent list, but
      only the blocking lock interface was fixed to solve the problem.
      Extend the same fix to the buffer trylock operations - if we fail to
      lock a pinned, stale buffer, then force the log immediately so that
      when the next attempt to lock it comes around, it will have been
      unpinned.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      90810b9e
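The trylock behaviour described in 90810b9e can be sketched as follows. `Log`, `Buf` and `buf_trylock` are hypothetical stand-ins, and the log force is reduced to a counter; the sketch only shows the decision made when the trylock fails.

```python
class Log:
    def __init__(self):
        self.forces = 0

    def force(self):
        # Stands in for forcing the log so pinned buffers can be unpinned.
        self.forces += 1


class Buf:
    def __init__(self, locked=False, pinned=False, stale=False):
        self.locked = locked
        self.pinned = pinned
        self.stale = stale


def buf_trylock(buf, log):
    """Try to lock buf without blocking. If that fails on a pinned, stale
    buffer, force the log so a later attempt can succeed."""
    if not buf.locked:
        buf.locked = True
        return True
    if buf.pinned and buf.stale:
        log.force()
    return False
```

The caller still sees a failed trylock; the force only makes it likely that the next attempt finds the buffer unpinned and unlocked.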
    • xfs: fix failed write truncation handling. · c726de44
      Dave Chinner committed
      Since the move to the new truncate sequence we call xfs_setattr to
      truncate down excessively instantiated blocks.  As shown by the testcase
      in kernel.org BZ #22452 that doesn't work too well.  Due to the confusion
      between the internal inode size and the VFS inode i_size, it zeroes data that
      it shouldn't.
      
      But full blown truncate seems like overkill here.  We only instantiate
      delayed allocations in the write path, and given that we never released
      the iolock we can't have converted them to real allocations yet either.
      
      The only nasty case is pre-existing preallocation which we need to skip.
      We already do this for page discard during writeback, so make the delayed
      allocation block punching a generic function and call it from the failed
      write path as well as xfs_aops_discard_page. The callers are
      responsible for ensuring that partial blocks are not truncated away,
      and that they hold the ilock.
      
      Based on a fix originally from Christoph Hellwig. This version uses
      filesystem blocks as the range unit.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      c726de44
  12. 11 Nov, 2010 · 6 commits
  13. 29 Oct, 2010 · 1 commit
  14. 27 Oct, 2010 · 1 commit
    • writeback: remove nonblocking/encountered_congestion references · 1b430bee
      Wu Fengguang committed
      This removes more dead code that was somehow missed by commit 0d99519e
      (writeback: remove unused nonblocking and congestion checks).  There is
      no behavior change except for the removal of two entries from one of the
      ext4 tracing interfaces.
      
      The nonblocking checks in ->writepages are no longer used because the
      flusher now prefers to block on get_request_wait() rather than skip inodes on
      IO congestion.  The latter will lead to more seeky IO.
      
      The nonblocking checks in ->writepage are no longer used because they are
      redundant with the WB_SYNC_NONE check.
      
      We no longer set ->nonblocking in VM page out and page migration, because
      a) it's effectively redundant with WB_SYNC_NONE in current code
      b) its old semantic of "Don't get stuck on request queues" is misbehavior:
         that would skip some dirty inodes on congestion and page out others, which
         is unfair in terms of LRU age.
      
      Inspired by Christoph Hellwig. Thanks!
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Sage Weil <sage@newdream.net>
      Cc: Steve French <sfrench@samba.org>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1b430bee
  15. 26 Oct, 2010 · 3 commits