1. 12 10月, 2011 4 次提交
  2. 11 8月, 2011 1 次提交
  3. 26 7月, 2011 8 次提交
  4. 13 7月, 2011 3 次提交
  5. 08 7月, 2011 1 次提交
  6. 31 3月, 2011 1 次提交
  7. 07 3月, 2011 1 次提交
  8. 28 1月, 2011 1 次提交
    • D
      xfs: fix efi item leak on forced shutdown · e34a314c
      Dave Chinner 提交于
      After test 139, kmemleak shows:
      
      unreferenced object 0xffff880078b405d8 (size 400):
        comm "xfs_io", pid 4904, jiffies 4294909383 (age 1186.728s)
        hex dump (first 32 bytes):
          60 c1 17 79 00 88 ff ff 60 c1 17 79 00 88 ff ff  `..y....`..y....
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff81afb04d>] kmemleak_alloc+0x2d/0x60
          [<ffffffff8115c6cf>] kmem_cache_alloc+0x13f/0x2b0
          [<ffffffff814aaa97>] kmem_zone_alloc+0x77/0xf0
          [<ffffffff814aab2e>] kmem_zone_zalloc+0x1e/0x50
          [<ffffffff8147cd6b>] xfs_efi_init+0x4b/0xb0
          [<ffffffff814a4ee8>] xfs_trans_get_efi+0x58/0x90
          [<ffffffff81455fab>] xfs_bmap_finish+0x8b/0x1d0
          [<ffffffff814851b4>] xfs_itruncate_finish+0x2c4/0x5d0
          [<ffffffff814a970f>] xfs_setattr+0x8df/0xa70
          [<ffffffff814b5c7b>] xfs_vn_setattr+0x1b/0x20
          [<ffffffff8117dc00>] notify_change+0x170/0x2e0
          [<ffffffff81163bf6>] do_truncate+0x66/0xa0
          [<ffffffff81163d0b>] sys_ftruncate+0xdb/0xe0
          [<ffffffff8103a002>] system_call_fastpath+0x16/0x1b
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      The cause of the leak is that the "remove" parameter of IOP_UNPIN()
      is never set when a CIL push is aborted. This means that the EFI
      item is never freed if it was in the push being cancelled. The
      problem is specific to delayed logging, but has uncovered a couple
      of problems with the handling of IOP_UNPIN(remove).
      
      Firstly, we cannot safely call xfs_trans_del_item() from IOP_UNPIN()
      in the CIL commit failure path or the iclog write failure path
      because for delayed loging we have no transaction context. Hence we
      must only call xfs_trans_del_item() if the log item being unpinned
      has an active log item descriptor.
      
      Secondly, xfs_trans_uncommit() does not handle log item descriptor
      freeing during the traversal of log items on a transaction. It can
      reference a freed log item descriptor when unpinning an EFI item.
      Hence it needs to use a safe list traversal method to allow items to
      be removed from the transaction during IOP_UNPIN().
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      e34a314c
  9. 12 1月, 2011 1 次提交
    • C
      xfs: fix error handling for synchronous writes · bfc60177
      Christoph Hellwig 提交于
      If we get an IO error on a synchronous superblock write, we attach an
      error release function to it so that when the last reference goes away
      the release function is called and the buffer is invalidated and
      unlocked. The buffer is left locked until the release function is
      called so that other concurrent users of the buffer will be locked out
      until the buffer error is fully processed.
      
      Unfortunately, for the superblock buffer the filesyetm itself holds a
      reference to the buffer which prevents the reference count from
      dropping to zero and the release function being called. As a result,
      once an IO error occurs on a sync write, the buffer will never be
      unlocked and all future attempts to lock the buffer will hang.
      
      To make matters worse, this problems is not unique to such buffers;
      if there is a concurrent _xfs_buf_find() running, the lookup will grab
      a reference to the buffer and then wait on the buffer lock, preventing
      the reference count from ever falling to zero and hence unlocking the
      buffer.
      
      As such, the whole b_relse function implementation is broken because it
      cannot rely on the buffer reference count falling to zero to unlock the
      errored buffer. The synchronous write error path is the only path that
      uses this callback - it is used to ensure that the synchronous waiter
      gets the buffer error before the error state is cleared from the buffer
      by the release function.
      
      Given that the only sychronous buffer writes now go through xfs_bwrite
      and the error path in question can only occur for a write of a dirty,
      logged buffer, we can move most of the b_relse processing to happen
      inline in xfs_buf_iodone_callbacks, just like a normal I/O completion.
      In addition to that we make sure the error is not cleared in
      xfs_buf_iodone_callbacks, so that xfs_bwrite can reliably check it.
      Given that xfs_bwrite keeps the buffer locked until it has waited for
      it and checked the error this allows to reliably propagate the error
      to the caller, and make sure that the buffer is reliably unlocked.
      
      Given that xfs_buf_iodone_callbacks was the only instance of the
      b_relse callback we can remove it entirely.
      
      Based on earlier patches by Dave Chinner and Ajeet Yadav.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reported-by: NAjeet Yadav <ajeet.yadav.77@gmail.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      bfc60177
  10. 03 12月, 2010 1 次提交
    • D
      xfs: consume iodone callback items on buffers as they are processed · c90821a2
      Dave Chinner 提交于
      To allow buffer iodone callbacks to consume multiple items off the
      callback list, first we need to convert the xfs_buf_do_callbacks()
      to consume items and always pull the next item from the head of the
      list.
      
      The means the item list walk is never dependent on knowing the
      next item on the list and hence allows callbacks to remove items
      from the list as well. This allows callbacks to do bulk operations
      by scanning the list for identical callbacks, consuming them all
      and then processing them in bulk, negating the need for multiple
      callbacks of that type.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      c90821a2
  11. 19 10月, 2010 2 次提交
  12. 27 7月, 2010 8 次提交
    • C
      xfs: kill the b_strat callback in xfs_buf · 939d723b
      Christoph Hellwig 提交于
      The b_strat callback is used by xfs_buf_iostrategy to perform additional
      checks before submitting a buffer.  It is used in xfs_bwrite and when
      writing out delayed buffers.  In xfs_bwrite it we can de-virtualize the
      call easily as b_strat is set a few lines above the call to
      xfs_buf_iostrategy.  For the delayed buffers the rationale is a bit
      more complicated:
      
       - there are three callers of xfs_buf_delwri_queue, which places buffers
         on the delwri list:
          (1) xfs_bdwrite - this sets up b_strat, so it's fine
          (2) xfs_buf_iorequest.  None of the callers can have XBF_DELWRI set:
      	- xlog_bdstrat is only used for log buffers, which are never delwri
      	- _xfs_buf_read explicitly clears the delwri flag
      	- xfs_buf_iodone_work retries log buffers only
      	- xfsbdstrat - only used for reads, superblock writes without the
      	  delwri flag, log I/O and file zeroing with explicitly allocated
      	  buffers.
      	- xfs_buf_iostrategy - only calls xfs_buf_iorequest if b_strat is
      	  not set
          (3) xfs_buf_unlock
      	- only puts the buffer on the delwri list if the DELWRI flag is
      	  already set.  The DELWRI flag is only ever set in xfs_bwrite,
      	  xfs_buf_iodone_callbacks, or xfs_trans_log_buf.  For
      	  xfs_buf_iodone_callbacks and xfs_trans_log_buf we require
      	  an initialized buf item, which means b_strat was set to
      	  xfs_bdstrat_cb in xfs_buf_item_init.
      
      Conclusion: we can just get rid of the callback and replace it with
      explicit calls to xfs_bdstrat_cb.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      939d723b
    • C
      xfs: fix the xfs_log_iovec i_addr type · 4e0d5f92
      Christoph Hellwig 提交于
      By making this member a void pointer we can get rid of a lot of pointless
      casts.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      4e0d5f92
    • C
      xfs: simplify buffer pinning · 4d16e924
      Christoph Hellwig 提交于
      Get rid of the xfs_buf_pin/xfs_buf_unpin/xfs_buf_ispin helpers and opencode
      them in their only callers, just like we did for the inode pinning a while
      ago.  Also remove duplicate trace points - the bufitem tracepoints cover
      all the information that is present in a buffer tracepoint.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      4d16e924
    • C
      xfs: give li_cb callbacks the correct prototype · ca30b2a7
      Christoph Hellwig 提交于
      Stop the function pointer casting madness and give all the li_cb instances
      correct prototype.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      ca30b2a7
    • C
      xfs: give xfs_item_ops methods the correct prototypes · 7bfa31d8
      Christoph Hellwig 提交于
      Stop the function pointer casting madness and give all the xfs_item_ops the
      correct prototypes.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      7bfa31d8
    • C
      xfs: merge iop_unpin_remove into iop_unpin · 9412e318
      Christoph Hellwig 提交于
      The unpin_remove item operation instances always share most of the
      implementation with the respective unpin implementation.  So instead
      of keeping two different entry points add a remove flag to the unpin
      operation and share the code more easily.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      9412e318
    • C
      xfs: simplify log item descriptor tracking · e98c414f
      Christoph Hellwig 提交于
      Currently we track log item descriptor belonging to a transaction using a
      complex opencoded chunk allocator.  This code has been there since day one
      and seems to work around the lack of an efficient slab allocator.
      
      This patch replaces it with dynamically allocated log item descriptors
      from a dedicated slab pool, linked to the transaction by a linked list.
      
      This allows to greatly simplify the log item descriptor tracking to the
      point where it's just a couple hundred lines in xfs_trans.c instead of
      a separate file.  The external API has also been simplified while we're
      at it - the xfs_trans_add_item and xfs_trans_del_item functions to add/
      delete items from a transaction have been simplified to the bare minium,
      and the xfs_trans_find_item function is replaced with a direct dereference
      of the li_desc field.  All debug code walking the list of log items in
      a transaction is down to a simple list_for_each_entry.
      
      Note that we could easily use a singly linked list here instead of the
      double linked list from list.h as the fastpath only does deletion from
      sequential traversal.  But given that we don't have one available as
      a library function yet I use the list.h functions for simplicity.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      e98c414f
    • C
      xfs: drop dmapi hooks · 288699fe
      Christoph Hellwig 提交于
      Dmapi support was never merged upstream, but we still have a lot of hooks
      bloating XFS for it, all over the fast pathes of the filesystem.
      
      This patch drops over 700 lines of dmapi overhead.  If we'll ever get HSM
      support in mainline at least the namespace events can be done much saner
      in the VFS instead of the individual filesystem, so it's not like this
      is much help for future work.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      288699fe
  13. 24 5月, 2010 3 次提交
    • D
      xfs: Ensure inode allocation buffers are fully replayed · ccf7c23f
      Dave Chinner 提交于
      With delayed logging, we can get inode allocation buffers in the
      same transaction inode unlink buffers. We don't currently mark inode
      allocation buffers in the log, so inode unlink buffers take
      precedence over allocation buffers.
      
      The result is that when they are combined into the same checkpoint,
      only the unlinked inode chain fields are replayed, resulting in
      uninitialised inode buffers being detected when the next inode
      modification is replayed.
      
      To fix this, we need to ensure that we do not set the inode buffer
      flag in the buffer log item format flags if the inode allocation has
      not already hit the log. To avoid requiring a change to log
      recovery, we really need to make this a modification that relies
      only on in-memory sate.
      
      We can do this by checking during buffer log formatting (while the
      CIL cannot be flushed) if we are still in the same sequence when we
      commit the unlink transaction as the inode allocation transaction.
      If we are, then we do not add the inode buffer flag to the buffer
      log format item flags. This means the entire buffer will be
      replayed, not just the unlinked fields. We do this while
      CIL flusheѕ are locked out to ensure that we don't race with the
      sequence numbers changing and hence fail to put the inode buffer
      flag in the buffer format flags when we really need to.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      ccf7c23f
    • D
      xfs: Clean up XFS_BLI_* flag namespace · c1155410
      Dave Chinner 提交于
      Clean up the buffer log format (XFS_BLI_*) flags because they have a
      polluted namespace. They XFS_BLI_ prefix is used for both in-memory
      and on-disk flag feilds, but have overlapping values for different
      flags. Rename the buffer log format flags to use the XFS_BLF_*
      prefix to avoid confusing them with the in-memory XFS_BLI_* prefixed
      flags.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      c1155410
    • D
      xfs: modify buffer item reference counting · 64fc35de
      Dave Chinner 提交于
      The buffer log item reference counts used to take referenceѕ for every
      transaction, similar to the pin counting. This is symmetric (like the
      pin/unpin) with respect to transaction completion, but with dleayed logging
      becomes assymetric as the pinning becomes assymetric w.r.t. transaction
      completion.
      
      To make both cases the same, allow the buffer pinning to take a reference to
      the buffer log item and always drop the reference the transaction has on it
      when being unlocked. This is balanced correctly because the unpin operation
      always drops a reference to the log item. Hence reference counting becomes
      symmetric w.r.t. item pinning as well as w.r.t active transactions and as a
      result the reference counting model remain consistent between normal and
      delayed logging.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      64fc35de
  14. 19 5月, 2010 2 次提交
  15. 02 2月, 2010 1 次提交
    • D
      xfs: Don't issue buffer IO direct from AIL push V2 · d808f617
      Dave Chinner 提交于
      All buffers logged into the AIL are marked as delayed write.
      When the AIL needs to push the buffer out, it issues an async write of the
      buffer. This means that IO patterns are dependent on the order of
      buffers in the AIL.
      
      Instead of flushing the buffer, promote the buffer in the delayed
      write list so that the next time the xfsbufd is run the buffer will
      be flushed by the xfsbufd. Return the state to the xfsaild that the
      buffer was promoted so that the xfsaild knows that it needs to cause
      the xfsbufd to run to flush the buffers that were promoted.
      
      Using the xfsbufd for issuing the IO allows us to dispatch all
      buffer IO from the one queue. This means that we can make much more
      enlightened decisions on what order to flush buffers to disk as
      we don't have multiple places issuing IO. Optimisations to xfsbufd
      will be in a future patch.
      
      Version 2
      - kill XFS_ITEM_FLUSHING as it is now unused.
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      d808f617
  16. 22 1月, 2010 1 次提交
  17. 15 12月, 2009 1 次提交
    • C
      xfs: event tracing support · 0b1b213f
      Christoph Hellwig 提交于
      Convert the old xfs tracing support that could only be used with the
      out of tree kdb and xfsidbg patches to use the generic event tracer.
      
      To use it make sure CONFIG_EVENT_TRACING is enabled and then enable
      all xfs trace channels by:
      
         echo 1 > /sys/kernel/debug/tracing/events/xfs/enable
      
      or alternatively enable single events by just doing the same in one
      event subdirectory, e.g.
      
         echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_ihold/enable
      
      or set more complex filters, etc. In Documentation/trace/events.txt
      all this is desctribed in more detail.  To reads the events do a
      
         cat /sys/kernel/debug/tracing/trace
      
      Compared to the last posting this patch converts the tracing mostly to
      the one tracepoint per callsite model that other users of the new
      tracing facility also employ.  This allows a very fine-grained control
      of the tracing, a cleaner output of the traces and also enables the
      perf tool to use each tracepoint as a virtual performance counter,
           allowing us to e.g. count how often certain workloads git various
           spots in XFS.  Take a look at
      
          http://lwn.net/Articles/346470/
      
      for some examples.
      
      Also the btree tracing isn't included at all yet, as it will require
      additional core tracing features not in mainline yet, I plan to
      deliver it later.
      
      And the really nice thing about this patch is that it actually removes
      many lines of code while adding this nice functionality:
      
       fs/xfs/Makefile                |    8
       fs/xfs/linux-2.6/xfs_acl.c     |    1
       fs/xfs/linux-2.6/xfs_aops.c    |   52 -
       fs/xfs/linux-2.6/xfs_aops.h    |    2
       fs/xfs/linux-2.6/xfs_buf.c     |  117 +--
       fs/xfs/linux-2.6/xfs_buf.h     |   33
       fs/xfs/linux-2.6/xfs_fs_subr.c |    3
       fs/xfs/linux-2.6/xfs_ioctl.c   |    1
       fs/xfs/linux-2.6/xfs_ioctl32.c |    1
       fs/xfs/linux-2.6/xfs_iops.c    |    1
       fs/xfs/linux-2.6/xfs_linux.h   |    1
       fs/xfs/linux-2.6/xfs_lrw.c     |   87 --
       fs/xfs/linux-2.6/xfs_lrw.h     |   45 -
       fs/xfs/linux-2.6/xfs_super.c   |  104 ---
       fs/xfs/linux-2.6/xfs_super.h   |    7
       fs/xfs/linux-2.6/xfs_sync.c    |    1
       fs/xfs/linux-2.6/xfs_trace.c   |   75 ++
       fs/xfs/linux-2.6/xfs_trace.h   | 1369 +++++++++++++++++++++++++++++++++++++++++
       fs/xfs/linux-2.6/xfs_vnode.h   |    4
       fs/xfs/quota/xfs_dquot.c       |  110 ---
       fs/xfs/quota/xfs_dquot.h       |   21
       fs/xfs/quota/xfs_qm.c          |   40 -
       fs/xfs/quota/xfs_qm_syscalls.c |    4
       fs/xfs/support/ktrace.c        |  323 ---------
       fs/xfs/support/ktrace.h        |   85 --
       fs/xfs/xfs.h                   |   16
       fs/xfs/xfs_ag.h                |   14
       fs/xfs/xfs_alloc.c             |  230 +-----
       fs/xfs/xfs_alloc.h             |   27
       fs/xfs/xfs_alloc_btree.c       |    1
       fs/xfs/xfs_attr.c              |  107 ---
       fs/xfs/xfs_attr.h              |   10
       fs/xfs/xfs_attr_leaf.c         |   14
       fs/xfs/xfs_attr_sf.h           |   40 -
       fs/xfs/xfs_bmap.c              |  507 +++------------
       fs/xfs/xfs_bmap.h              |   49 -
       fs/xfs/xfs_bmap_btree.c        |    6
       fs/xfs/xfs_btree.c             |    5
       fs/xfs/xfs_btree_trace.h       |   17
       fs/xfs/xfs_buf_item.c          |   87 --
       fs/xfs/xfs_buf_item.h          |   20
       fs/xfs/xfs_da_btree.c          |    3
       fs/xfs/xfs_da_btree.h          |    7
       fs/xfs/xfs_dfrag.c             |    2
       fs/xfs/xfs_dir2.c              |    8
       fs/xfs/xfs_dir2_block.c        |   20
       fs/xfs/xfs_dir2_leaf.c         |   21
       fs/xfs/xfs_dir2_node.c         |   27
       fs/xfs/xfs_dir2_sf.c           |   26
       fs/xfs/xfs_dir2_trace.c        |  216 ------
       fs/xfs/xfs_dir2_trace.h        |   72 --
       fs/xfs/xfs_filestream.c        |    8
       fs/xfs/xfs_fsops.c             |    2
       fs/xfs/xfs_iget.c              |  111 ---
       fs/xfs/xfs_inode.c             |   67 --
       fs/xfs/xfs_inode.h             |   76 --
       fs/xfs/xfs_inode_item.c        |    5
       fs/xfs/xfs_iomap.c             |   85 --
       fs/xfs/xfs_iomap.h             |    8
       fs/xfs/xfs_log.c               |  181 +----
       fs/xfs/xfs_log_priv.h          |   20
       fs/xfs/xfs_log_recover.c       |    1
       fs/xfs/xfs_mount.c             |    2
       fs/xfs/xfs_quota.h             |    8
       fs/xfs/xfs_rename.c            |    1
       fs/xfs/xfs_rtalloc.c           |    1
       fs/xfs/xfs_rw.c                |    3
       fs/xfs/xfs_trans.h             |   47 +
       fs/xfs/xfs_trans_buf.c         |   62 -
       fs/xfs/xfs_vnodeops.c          |    8
       70 files changed, 2151 insertions(+), 2592 deletions(-)
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      0b1b213f