1. 30 October 2008 (1 commit)
    • [XFS] Use a cursor for AIL traversal. · 27d8d5fe
      Committed by David Chinner
      Replace the generation number currently used to keep AIL traversal sane
      with an external cursor that is linked to the AIL.
      
      Basically, we store the next item in the cursor whenever we want to drop
      the AIL lock to do something to the current item. When we regain the lock,
      the current item may already have been freed, so we can't reference it, but
      the next item in the traversal is already held in the cursor.
      
      When we move or delete an object, we search all the active cursors and if
      there is an item match we clear the cursor(s) that point to the object.
      This forces the traversal to restart transparently.
      
      We don't invalidate the cursor on insert because the cursor still points
      to a valid item. If the item is inserted between the current item and the
      cursor it does not matter; the traversal is considered to be past the
      insertion point so it will be picked up in the next traversal.
      
      Hence traversal restarts pretty much disappear altogether with this method
      of traversal, which should substantially reduce the overhead of pushing on
      a busy AIL.
      
      Version 2:
      o add restart logic
      o comment cursor interface
      o minor cleanups
      
      SGI-PV: 988143
      
      SGI-Modid: xfs-linux-melb:xfs-kern:32347a
      Signed-off-by: David Chinner <david@fromorbit.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
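
      A minimal sketch of the cursor scheme described in this commit; the
      structure and function names below are illustrative assumptions, not the
      exact symbols the patch introduces.

      struct log_item;

      struct ail_cursor {
              struct ail_cursor *next;   /* link in the AIL's list of active cursors */
              struct log_item   *item;   /* next item to visit; NULL forces a restart */
      };

      struct ail {
              struct ail_cursor *cursors;        /* all cursors for in-progress traversals */
      };

      /* Remember the next item before dropping the AIL lock to work on the current one. */
      static void ail_cursor_set(struct ail_cursor *cur, struct log_item *next)
      {
              cur->item = next;
      }

      /*
       * Called when an item is moved or deleted: any cursor pointing at it is
       * cleared, so those traversals restart transparently.  Inserts need no
       * invalidation because the cursor still points at a valid item.
       */
      static void ail_cursor_clear(struct ail *ailp, struct log_item *lip)
      {
              struct ail_cursor *cur;

              for (cur = ailp->cursors; cur; cur = cur->next) {
                      if (cur->item == lip)
                              cur->item = NULL;
              }
      }

      The real code also has to pick the saved item back out of the cursor once
      the lock is retaken; that detail is omitted here.
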
  2. 11 October 2008 (1 commit)
  3. 17 September 2008 (2 commits)
  4. 13 August 2008 (4 commits)
  5. 28 July 2008 (3 commits)
  6. 12 July 2008 (1 commit)
    • Fix reference counting race on log buffers · 49641f1a
      Committed by Dave Chinner
      When we release the iclog, we do an atomic_dec_and_lock() to determine if
      we are the last reference and need to trigger update of log headers and
      writeout. However, in xlog_state_get_iclog_space() we also need to
      check whether we hold the last reference there. If we do, we release
      the log buffer; otherwise we decrement the reference count.
      
      But the compare and decrement in xlog_state_get_iclog_space() are not
      atomic, so both places can see a reference count of 2 and neither will
      release the iclog. That leads to a filesystem hang.
      
      Close the race by replacing the atomic_read() and atomic_dec() pair with
      a single atomic_add_unless(), so the test and decrement are executed
      atomically.
      
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Tim Shimmin <tes@sgi.com>
      Tested-by: Eric Sandeen <sandeen@sandeen.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
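
      The fix reduces to a single atomic primitive. A hedged sketch of the
      pattern follows; the struct is cut down to the refcount and the release
      helper is a stand-in for the real release path.

      #include <linux/atomic.h>

      struct iclog_sketch {
              atomic_t ic_refcnt;
              /* ... */
      };

      void release_last_iclog_ref(struct iclog_sketch *iclog);        /* stand-in helper */

      /*
       * Racy form (two separate atomic operations):
       *
       *      if (atomic_read(&iclog->ic_refcnt) == 1)
       *              release_last_iclog_ref(iclog);
       *      else
       *              atomic_dec(&iclog->ic_refcnt);
       *
       * Two CPUs can both read 2 and both take the atomic_dec() branch.
       */
      static void put_iclog_ref(struct iclog_sketch *iclog)
      {
              /*
               * atomic_add_unless(v, -1, 1) decrements v unless it equals 1 and
               * returns 0 only when no decrement happened, i.e. we held the
               * last reference and must do the release ourselves.
               */
              if (!atomic_add_unless(&iclog->ic_refcnt, -1, 1))
                      release_last_iclog_ref(iclog);
      }
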
  7. 18 April 2008 (9 commits)
  8. 10 April 2008 (1 commit)
  9. 14 February 2008 (1 commit)
  10. 07 February 2008 (10 commits)
    • [XFS] Move AIL pushing into its own thread · 249a8c11
      Committed by David Chinner
      When many hundreds to thousands of threads all try to do simultaneous
      transactions and the log is in a tail-pushing situation (i.e. full), we
      can get multiple threads walking the AIL list and contending on the AIL
      lock.
      
      The AIL push is, in effect, a simple I/O dispatch algorithm complicated by
      the ordering constraints placed on it by the transaction subsystem. It
      really does not need multiple threads to push on it - even when only a
      single CPU is pushing the AIL, it can push the I/O out far faster than
      pretty much any disk subsystem can handle.
      
      So, to avoid contention problems stemming from multiple list walkers, move
      the list walk off into another thread and simply provide a "target" to
      push to. When a thread requires a push, it sets the target and wakes the
      push thread, then goes to sleep waiting for the required amount of space
      to become available in the log.
      
      This mechanism should also be a lot fairer under heavy load as the waiters
      will queue in arrival order, rather than queuing in "who completed a push
      first" order.
      
      Also, by moving the pushing to a separate thread we can do more
      effective overload detection and prevention, as we can keep context from
      loop iteration to loop iteration. That is, we can push only part of the
      list each loop and not have to loop back to the start of the list every
      time we run. This should also help by reducing the number of times we
      try to lock and/or push items that we cannot move.
      
      Note that this patch is not intended to solve the inefficiencies in the
      AIL structure and the associated issues with extremely large list
      contents. That needs to be addressed separately; parallel access would
      cause problems for any new structure as well, so I'm only aiming to isolate
      the structure from unbounded parallelism here.
      
      SGI-PV: 972759
      SGI-Modid: xfs-linux-melb:xfs-kern:30371a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
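
      A hedged sketch of the "publish a target and wake the single pusher"
      idea; the field and function names here are assumptions, not the
      patch's API, and the LSN comparison is simplified.

      #include <linux/types.h>
      #include <linux/spinlock.h>
      #include <linux/sched.h>

      struct ail_sketch {
              spinlock_t              lock;
              s64                     target_lsn;     /* push everything at or below this LSN */
              struct task_struct      *push_task;     /* the one and only AIL walker */
      };

      /*
       * A thread that needs log space publishes its threshold and wakes the
       * pusher, then sleeps until enough space frees up.  It never walks the
       * AIL itself, so the list walk is single-threaded by construction.
       */
      static void ail_push_target(struct ail_sketch *ailp, s64 threshold_lsn)
      {
              spin_lock(&ailp->lock);
              /* plain ">" here; real LSNs compare cycle and block separately */
              if (threshold_lsn > ailp->target_lsn)   /* the target only moves forward */
                      ailp->target_lsn = threshold_lsn;
              spin_unlock(&ailp->lock);

              wake_up_process(ailp->push_task);
      }
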
    • [XFS] Remove the BPCSHIFT and NB* based macros from XFS. · e6a4b37f
      Committed by Tim Shimmin
      The BPCSHIFT-based macros (btoc*, ctob*, offtoc* and ctooff) are either not
      used or don't need to be used. The NDPP and NBBY macros don't need to be
      used either and are replaced directly by PAGE_SIZE and PAGE_CACHE_SIZE
      where appropriate. Initial patch and motivation from Nicolas Kaiser.
      
      SGI-PV: 971186
      SGI-Modid: xfs-linux-melb:xfs-kern:30096a
      Signed-off-by: Tim Shimmin <tes@sgi.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
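
      The replacements amount to plain page arithmetic. A small illustrative
      sketch; these helpers are examples of what the wrappers computed, not
      the macros the patch touches.

      #include <linux/mm.h>           /* PAGE_SIZE, PAGE_SHIFT */

      /* bytes -> pages, rounding up (what a btoc-style wrapper would compute) */
      static inline unsigned long bytes_to_pages(unsigned long bytes)
      {
              return (bytes + PAGE_SIZE - 1) >> PAGE_SHIFT;
      }

      /* pages -> bytes (what a ctob-style wrapper would compute) */
      static inline unsigned long pages_to_bytes(unsigned long pages)
      {
              return pages << PAGE_SHIFT;
      }
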
    • [XFS] Remove bogus assert · f7b7c367
      Committed by Niv Sardi
      This assert is bogus. A forced shutdown can occur between the check for
      XLOG_FORCED_SHUTDOWN and the ASSERT. Also, the logging system shouldn't
      care about the state of XFS_FORCED_SHUTDOWN; it should only check
      XLOG_FORCED_SHUTDOWN. The logging system has its own forced shutdown
      flag, so for the case of a forced shutdown that's not due to a logging
      error we can still flush the log.
      
      SGI-PV: 972985
      SGI-Modid: xfs-linux-melb:xfs-kern:30029a
      Signed-off-by: Niv Sardi <xaiki@sgi.com>
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
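
      A hedged sketch of why the assert races; the identifiers here are
      stand-ins, the flag is reduced to a plain bit, and WARN_ON stands in
      for ASSERT.

      #include <linux/types.h>
      #include <linux/bug.h>

      #define XLOG_SHUTDOWN_FLAG      0x1     /* the log's own shutdown flag (illustrative value) */

      struct xlog_sketch {
              unsigned int l_flags;
      };

      static void maybe_flush_log(struct xlog_sketch *log, bool fs_shut_down)
      {
              if (log->l_flags & XLOG_SHUTDOWN_FLAG)
                      return;
              /*
               * fs_shut_down can become true on another CPU at any point after
               * the check above, so asserting on it here can trip spuriously.
               * Only the log's own flag matters: a forced shutdown that is not
               * a log error still allows the log to be flushed.
               */
              WARN_ON(fs_shut_down);          /* the removed, bogus assert */

              /* ... flush the log ... */
      }
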
    • [XFS] Fix up sparse warnings. · a8272ce0
      Committed by David Chinner
      These are mostly locking annotations, marking things static, adding casts
      where needed and declaring things in header files.
      
      SGI-PV: 971186
      SGI-Modid: xfs-linux-melb:xfs-kern:30002a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
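
      Typical shape of such fixes, shown as a hedged example rather than the
      commit's actual hunks: lock-balance annotations for sparse plus marking
      file-local helpers static.

      #include <linux/spinlock.h>

      /* Tell sparse this function returns with the lock held. */
      static void demo_lock(spinlock_t *lock)
              __acquires(lock)
      {
              spin_lock(lock);
      }

      /* ...and that this one releases it, keeping lock contexts balanced. */
      static void demo_unlock(spinlock_t *lock)
              __releases(lock)
      {
              spin_unlock(lock);
      }
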
    • [XFS] xlog_rec_header/xlog_rec_ext_header endianness annotations · b53e675d
      Committed by Christoph Hellwig
      Mostly a trivial conversion with one exception: h_num_logops was kept in
      native endianness previously and only converted to big endian in xlog_sync,
      but we now keep it big endian at all times. With today's fast CPU byteswap
      instructions that's not a performance issue, and the new variant keeps the
      code clean and maintainable.
      
      SGI-PV: 971186
      SGI-Modid: xfs-linux-melb:xfs-kern:29821a
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
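
      The annotation style boils down to declaring the field __be32 and
      funnelling every access through the byteswap helpers, so sparse catches
      a missed conversion. A reduced sketch, showing only the one field named
      in the message:

      #include <linux/types.h>
      #include <asm/byteorder.h>

      struct xlog_rec_header_sketch {
              __be32  h_num_logops;   /* now kept big-endian at all times */
      };

      static void set_num_logops(struct xlog_rec_header_sketch *head, int num)
      {
              head->h_num_logops = cpu_to_be32(num);
      }

      static int get_num_logops(const struct xlog_rec_header_sketch *head)
      {
              return be32_to_cpu(head->h_num_logops);
      }
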
    • [XFS] clean up some xfs_log_priv.h macros · 67fcb7bf
      Committed by Christoph Hellwig
      - The various assign-lsn macros are replaced by a single inline,
      xlog_assign_lsn, which is equivalent to ASSIGN_ANY_LSN_HOST except
      for a saner calling convention. ASSIGN_LSN_DISK is replaced by
      xlog_assign_lsn plus a manual byteswap, and ASSIGN_LSN by the same,
      except that the cycle and block arguments are passed explicitly instead
      of a log parameter. The latter two variants only had two and one users
      respectively anyway.
      - GET_CYCLE is replaced by an xlog_get_cycle inline with exactly the
      same calling convention.
      - GET_CLIENT_ID is replaced by xlog_get_client_id, which drops the
      unused arch argument. Instead of conditional definitions depending on
      host endianness we now do an unconditional swap and shift, which
      generates the same code.
      - The unused XLOG_SET macro is removed.
      
      SGI-PV: 971186
      SGI-Modid: xfs-linux-melb:xfs-kern:29820a
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
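
      A hedged sketch of the kind of inlines described above; names and types
      are simplified stand-ins, and the cycle is assumed to live in the high
      32 bits of an LSN.

      #include <linux/types.h>
      #include <asm/byteorder.h>

      typedef __s64 lsn_sketch_t;     /* stand-in for xfs_lsn_t */

      /* Explicit (cycle, block) arguments replace the old ASSIGN_*_LSN macros;
       * the on-disk variant is this plus a manual byteswap at the call site. */
      static inline lsn_sketch_t assign_lsn(unsigned int cycle, unsigned int block)
      {
              return ((lsn_sketch_t)cycle << 32) | block;
      }

      /* Unconditional swap-and-shift replaces the endian-conditional
       * GET_CLIENT_ID definitions; the generated code is the same. */
      static inline unsigned int get_client_id(__be32 field)
      {
              return be32_to_cpu(field) >> 24;
      }
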
    • [XFS] clean up some xfs_log_priv.h macros · 03bea6fe
      Committed by Christoph Hellwig
      - The various assign-lsn macros are replaced by a single inline,
      xlog_assign_lsn, which is equivalent to ASSIGN_ANY_LSN_HOST except
      for a saner calling convention. ASSIGN_LSN_DISK is replaced by
      xlog_assign_lsn plus a manual byteswap, and ASSIGN_LSN by the same,
      except that the cycle and block arguments are passed explicitly instead
      of a log parameter. The latter two variants only had two and one users
      respectively anyway.
      - GET_CYCLE is replaced by an xlog_get_cycle inline with exactly the
      same calling convention.
      - GET_CLIENT_ID is replaced by xlog_get_client_id, which drops the
      unused arch argument. Instead of conditional definitions depending on
      host endianness we now do an unconditional swap and shift, which
      generates the same code.
      - The unused XLOG_SET macro is removed.
      
      SGI-PV: 971186
      SGI-Modid: xfs-linux-melb:xfs-kern:29819a
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
    • [XFS] Remove spin.h · 007c61c6
      Committed by Eric Sandeen
      Remove the spinlock init abstraction macro in spin.h, remove the callers,
      and remove the file. Move the no-op spinlock_destroy to xfs_linux.h.
      Clean up spinlock locals in xfs_mount.c.
      
      SGI-PV: 970382
      SGI-Modid: xfs-linux-melb:xfs-kern:29751a
      Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
      Signed-off-by: Donald Douwsma <donaldd@sgi.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
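
      The shape of the cleanup, as a hedged example; the struct is a stand-in
      and the old wrapper shown in the comment approximates the removed macro.

      #include <linux/spinlock.h>

      struct demo_mount {
              spinlock_t m_sb_lock;
      };

      static void demo_mount_init(struct demo_mount *mp)
      {
              /* was roughly: spinlock_init(&mp->m_sb_lock, "xfs_sb"); */
              spin_lock_init(&mp->m_sb_lock);

              /* spinlock_destroy() becomes a no-op kept only for symmetry */
      }
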
    • [XFS] Unwrap GRANT_LOCK. · c8b5ea28
      Committed by Eric Sandeen
      Un-obfuscate GRANT_LOCK, remove GRANT_LOCK->mutex_lock->spin_lock macros,
      call spin_lock directly, remove extraneous cookie holdover from old xfs
      code, and change lock type to spinlock_t.
      
      SGI-PV: 970382
      SGI-Modid: xfs-linux-melb:xfs-kern:29741a
      Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
      Signed-off-by: Donald Douwsma <donaldd@sgi.com>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
    • [XFS] Unwrap LOG_LOCK. · b22cd72c
      Committed by Eric Sandeen
      Un-obfuscate LOG_LOCK, remove LOG_LOCK->mutex_lock->spin_lock macros, call
      spin_lock directly, remove extraneous cookie holdover from old xfs code,
      and change lock type to spinlock_t.
      
      SGI-PV: 970382
      SGI-Modid: xfs-linux-melb:xfs-kern:29740a
      Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
      Signed-off-by: Donald Douwsma <donaldd@sgi.com>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
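
      This and the GRANT_LOCK commit above follow the same pattern; a hedged
      before/after sketch, with the struct reduced to the two locks and the
      old macros in the comment given only as an approximation.

      #include <linux/spinlock.h>

      struct demo_log {
              spinlock_t l_icloglock;         /* was hidden behind LOG_LOCK/LOG_UNLOCK */
              spinlock_t l_grant_lock;        /* was hidden behind GRANT_LOCK/GRANT_UNLOCK */
      };

      static void demo_grant_update(struct demo_log *log)
      {
              /* was roughly: s = GRANT_LOCK(log); ... GRANT_UNLOCK(log, s);
               * the cookie 's' was a holdover from old xfs code and is dropped. */
              spin_lock(&log->l_grant_lock);
              /* ... manipulate the grant heads ... */
              spin_unlock(&log->l_grant_lock);
      }
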
  11. 16 October 2007 (2 commits)
  12. 15 October 2007 (2 commits)
  13. 05 September 2007 (1 commit)
  14. 14 July 2007 (2 commits)
    • [XFS] Lazy Superblock Counters · 92821e2b
      Committed by David Chinner
      When we have a couple of hundred transactions in flight at once, they all
      typically modify the on-disk superblock in some way:
      create/unlink/mkdir/rmdir modify inode counts, and allocation/freeing
      modify free block counts.
      
      When these counts are modified in a transaction, they must eventually lock
      the superblock buffer and apply the mods. The buffer then remains locked
      until the transaction is committed into the incore log buffer. The result
      of this is that with enough transactions on the fly the incore superblock
      buffer becomes a bottleneck.
      
      The result of contention on the incore superblock buffer is that
      transaction rates fall - the more pressure that is put on the superblock
      buffer, the slower things go.
      
      The key to removing the contention is to not require the superblock fields
      in question to be locked. We do that by not marking the superblock dirty
      in the transaction. IOWs, we modify the incore superblock but do not
      modify the cached superblock buffer. In short, we do not log superblock
      modifications to critical fields in the superblock on every transaction.
      In fact we only do it just before we write the superblock to disk every
      sync period or just before unmount.
      
      This creates an interesting problem - if we don't log or write out the
      fields in every transaction, then how do the values get recovered after a
      crash? The answer is simple - we keep enough duplicate, logged information
      in other structures that we can reconstruct the correct count after log
      recovery has been performed.
      
      It is the AGF and AGI structures that contain the duplicate information;
      after recovery, we walk every AGI and AGF and sum their individual
      counters to get the correct value, and we do a transaction into the log to
      correct them. An optimisation of this is that if we have a clean unmount
      record, we know the value in the superblock is correct, so we can avoid
      the summation walk under normal conditions and so mount/recovery times do
      not change under normal operation.
      
      One wrinkle that was discovered during development was that the blocks
      used in the freespace btrees are never accounted for in the AGF counters.
      This was once a valid optimisation to make; when the filesystem is full,
      the free space btrees are empty and consume no space. Hence when it
      matters, the "accounting" is correct. But that means that when we do the
      AGF summations we would not have a correct count, and xfs_check would
      complain. Hence a new counter was added to track the number of blocks used
      by the free space btrees. This is an *on-disk format change*.
      
      As a result of this, lazy superblock counters are a mkfs option and at the
      moment on linux there is no way to convert an old filesystem. This is
      possible - xfs_db can be used to twiddle the right bits and then
      xfs_repair will do the format conversion for you. Similarly, you can
      convert backwards as well. At some point we'll add functionality to
      xfs_admin to do the bit twiddling easily....
      
      SGI-PV: 964999
      SGI-Modid: xfs-linux-melb:xfs-kern:28652a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
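
      A hedged sketch of the post-recovery summation described above, with the
      structures reduced to the relevant counters; the names are illustrative,
      not exact XFS symbols. It only runs when there is no clean unmount
      record, so ordinary mounts skip the walk.

      #include <linux/types.h>

      struct agf_sketch { u64 freeblks; u64 btreeblks; };     /* per-AG free space */
      struct agi_sketch { u64 icount;   u64 ifree;     };     /* per-AG inodes */
      struct sb_sketch  { u64 icount;   u64 ifree; u64 fdblocks; };

      static void recompute_sb_counters(struct sb_sketch *sb,
                                        const struct agf_sketch *agf,
                                        const struct agi_sketch *agi,
                                        unsigned int agcount)
      {
              unsigned int agno;

              sb->icount = sb->ifree = sb->fdblocks = 0;
              for (agno = 0; agno < agcount; agno++) {
                      sb->icount   += agi[agno].icount;
                      sb->ifree    += agi[agno].ifree;
                      /* btreeblks is the new on-disk counter for blocks used by
                       * the freespace btrees, added so this sum can be exact. */
                      sb->fdblocks += agf[agno].freeblks + agf[agno].btreeblks;
              }
              /* a transaction then logs the corrected superblock values */
      }
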
    • [XFS] Fix vmalloc leak on mount/unmount. · 511105b3
      Committed by David Chinner
      When setting the length of the iclogbuf to write out, we should just be
      changing the desired byte count rather than completely reassociating the
      buffer memory with the buffer. Reassociating the buffer memory changes the
      apparent length of the buffer, and hence when we free the buffer we don't
      free all the vmap()d space we originally allocated.
      
      SGI-PV: 964983
      SGI-Modid: xfs-linux-melb:xfs-kern:28640a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
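
      A hedged sketch of the distinction the fix relies on; the buffer fields
      below are stand-ins. Trim the I/O by adjusting the desired byte count,
      not by re-pointing the buffer at a shorter mapping, because the free
      path frees the vmap()d region based on the buffer's recorded length.

      #include <linux/types.h>

      struct demo_buf {
              void    *b_addr;                /* vmap()d region backing the buffer */
              size_t  b_buffer_length;        /* size of that region; used when freeing */
              size_t  b_count_desired;        /* bytes actually submitted for I/O */
      };

      static void set_write_size(struct demo_buf *bp, size_t bytes)
      {
              /* right: only the desired I/O count changes */
              bp->b_count_desired = bytes;

              /* wrong (the leak): shrinking b_buffer_length or re-associating
               * b_addr makes the later free miss part of the original vmap()d
               * allocation. */
      }
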