1. 28 6月, 2013 1 次提交
  2. 02 2月, 2013 9 次提交
  3. 31 7月, 2012 1 次提交
    • J
      xfs: Convert to new freezing code · d9457dc0
      Jan Kara 提交于
      Generic code now blocks all writers from standard write paths. So we add
      blocking of all writers coming from ioctl (we get a protection of ioctl against
      racing remount read-only as a bonus) and convert xfs_file_aio_write() to a
      non-racy freeze protection. We also keep freeze protection on transaction
      start to block internal filesystem writes such as removal of preallocated
      blocks.
      
      CC: Ben Myers <bpm@sgi.com>
      CC: Alex Elder <elder@kernel.org>
      CC: xfs@oss.sgi.com
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      d9457dc0
  4. 30 5月, 2012 1 次提交
  5. 15 5月, 2012 4 次提交
  6. 23 2月, 2012 1 次提交
  7. 14 2月, 2012 2 次提交
  8. 09 12月, 2011 3 次提交
  9. 12 10月, 2011 2 次提交
    • C
      xfs: simplify xfs_trans_ijoin* again · ddc3415a
      Christoph Hellwig 提交于
      There is no reason to keep a reference to the inode even if we unlock
      it during transaction commit because we never drop a reference between
      the ijoin and commit.  Also use this fact to merge xfs_trans_ijoin_ref
      back into xfs_trans_ijoin - the third argument decides if an unlock
      is needed now.
      
      I'm actually starting to wonder if allowing inodes to be unlocked
      at transaction commit really is worth the effort.  The only real
      benefit is that they can be unlocked earlier when commiting a
      synchronous transactions, but that could be solved by doing the
      log force manually after the unlock, too.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      
      ddc3415a
    • C
      xfs: unlock the inode before log force in xfs_fsync · b1037058
      Christoph Hellwig 提交于
      Only read the LSN we need to push to with the ilock held, and then release
      it before we do the log force to improve concurrency.
      
      This also removes the only direct caller of _xfs_trans_commit, thus
      allowing it to be merged into the plain xfs_trans_commit again.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      
      b1037058
  10. 21 7月, 2011 1 次提交
    • D
      xfs: use a cursor for bulk AIL insertion · 1d8c95a3
      Dave Chinner 提交于
      Delayed logging can insert tens of thousands of log items into the
      AIL at the same LSN. When the committing of log commit records
      occur, we can get insertions occurring at an LSN that is not at the
      end of the AIL. If there are thousands of items in the AIL on the
      tail LSN, each insertion has to walk the AIL to find the correct
      place to insert the new item into the AIL. This can consume large
      amounts of CPU time and block other operations from occurring while
      the traversals are in progress.
      
      To avoid this repeated walk, use a AIL cursor to record
      where we should be inserting the new items into the AIL without
      having to repeat the walk. The cursor infrastructure already
      provides this functionality for push walks, so is a simple extension
      of existing code. While this will not avoid the initial walk, it
      will avoid repeating it tens of thousands of times during a single
      checkpoint commit.
      
      This version includes logic improvements from Christoph Hellwig.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      1d8c95a3
  11. 11 7月, 2011 1 次提交
  12. 08 7月, 2011 1 次提交
  13. 07 7月, 2011 1 次提交
    • D
      xfs: unpin stale inodes directly in IOP_COMMITTED · 1316d4da
      Dave Chinner 提交于
      When inodes are marked stale in a transaction, they are treated
      specially when the inode log item is being inserted into the AIL.
      It tries to avoid moving the log item forward in the AIL due to a
      race condition with the writing the underlying buffer back to disk.
      The was "fixed" in commit de25c181 ("xfs: avoid moving stale inodes
      in the AIL").
      
      To avoid moving the item forward, we return a LSN smaller than the
      commit_lsn of the completing transaction, thereby trying to trick
      the commit code into not moving the inode forward at all. I'm not
      sure this ever worked as intended - it assumes the inode is already
      in the AIL, but I don't think the returned LSN would have been small
      enough to prevent moving the inode. It appears that the reason it
      worked is that the lower LSN of the inodes meant they were inserted
      into the AIL and flushed before the inode buffer (which was moved to
      the commit_lsn of the transaction).
      
      The big problem is that with delayed logging, the returning of the
      different LSN means insertion takes the slow, non-bulk path.  Worse
      yet is that insertion is to a position -before- the commit_lsn so it
      is doing a AIL traversal on every insertion, and has to walk over
      all the items that have already been inserted into the AIL. It's
      expensive.
      
      To compound the matter further, with delayed logging inodes are
      likely to go from clean to stale in a single checkpoint, which means
      they aren't even in the AIL at all when we come across them at AIL
      insertion time. Hence these were all getting inserted into the AIL
      when they simply do not need to be as inodes marked XFS_ISTALE are
      never written back.
      
      Transactional/recovery integrity is maintained in this case by the
      other items in the unlink transaction that were modified (e.g. the
      AGI btree blocks) and committed in the same checkpoint.
      
      So to fix this, simply unpin the stale inodes directly in
      xfs_inode_item_committed() and return -1 to indicate that the AIL
      insertion code does not need to do any further processing of these
      inodes.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      1316d4da
  14. 25 5月, 2011 1 次提交
    • C
      xfs: add online discard support · e84661aa
      Christoph Hellwig 提交于
      Now that we have reliably tracking of deleted extents in a
      transaction we can easily implement "online" discard support
      which calls blkdev_issue_discard once a transaction commits.
      
      The actual discard is a two stage operation as we first have
      to mark the busy extent as not available for reuse before we
      can start the actual discard.  Note that we don't bother
      supporting discard for the non-delaylog mode.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      e84661aa
  15. 29 4月, 2011 1 次提交
  16. 28 1月, 2011 2 次提交
    • D
      xfs: handle CIl transaction commit failures correctly · c6f990d1
      Dave Chinner 提交于
      Failure to commit a transaction into the CIL is not handled
      correctly. This currently can only happen when racing with a
      shutdown and requires an explicit shutdown check, so it rare and can
      be avoided. Remove the shutdown check and make the CIL commit a void
      function to indicate it will always succeed, thereby removing the
      incorrectly handled failure case.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      c6f990d1
    • D
      xfs: fix efi item leak on forced shutdown · e34a314c
      Dave Chinner 提交于
      After test 139, kmemleak shows:
      
      unreferenced object 0xffff880078b405d8 (size 400):
        comm "xfs_io", pid 4904, jiffies 4294909383 (age 1186.728s)
        hex dump (first 32 bytes):
          60 c1 17 79 00 88 ff ff 60 c1 17 79 00 88 ff ff  `..y....`..y....
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff81afb04d>] kmemleak_alloc+0x2d/0x60
          [<ffffffff8115c6cf>] kmem_cache_alloc+0x13f/0x2b0
          [<ffffffff814aaa97>] kmem_zone_alloc+0x77/0xf0
          [<ffffffff814aab2e>] kmem_zone_zalloc+0x1e/0x50
          [<ffffffff8147cd6b>] xfs_efi_init+0x4b/0xb0
          [<ffffffff814a4ee8>] xfs_trans_get_efi+0x58/0x90
          [<ffffffff81455fab>] xfs_bmap_finish+0x8b/0x1d0
          [<ffffffff814851b4>] xfs_itruncate_finish+0x2c4/0x5d0
          [<ffffffff814a970f>] xfs_setattr+0x8df/0xa70
          [<ffffffff814b5c7b>] xfs_vn_setattr+0x1b/0x20
          [<ffffffff8117dc00>] notify_change+0x170/0x2e0
          [<ffffffff81163bf6>] do_truncate+0x66/0xa0
          [<ffffffff81163d0b>] sys_ftruncate+0xdb/0xe0
          [<ffffffff8103a002>] system_call_fastpath+0x16/0x1b
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      The cause of the leak is that the "remove" parameter of IOP_UNPIN()
      is never set when a CIL push is aborted. This means that the EFI
      item is never freed if it was in the push being cancelled. The
      problem is specific to delayed logging, but has uncovered a couple
      of problems with the handling of IOP_UNPIN(remove).
      
      Firstly, we cannot safely call xfs_trans_del_item() from IOP_UNPIN()
      in the CIL commit failure path or the iclog write failure path
      because for delayed loging we have no transaction context. Hence we
      must only call xfs_trans_del_item() if the log item being unpinned
      has an active log item descriptor.
      
      Secondly, xfs_trans_uncommit() does not handle log item descriptor
      freeing during the traversal of log items on a transaction. It can
      reference a freed log item descriptor when unpinning an EFI item.
      Hence it needs to use a safe list traversal method to allow items to
      be removed from the transaction during IOP_UNPIN().
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      e34a314c
  17. 12 1月, 2011 1 次提交
  18. 20 12月, 2010 1 次提交
    • D
      xfs: bulk AIL insertion during transaction commit · 0e57f6a3
      Dave Chinner 提交于
      When inserting items into the AIL from the transaction committed
      callbacks, we take the AIL lock for every single item that is to be
      inserted. For a CIL checkpoint commit, this can be tens of thousands
      of individual inserts, yet almost all of the items will be inserted
      at the same point in the AIL because they have the same index.
      
      To reduce the overhead and contention on the AIL lock for such
      operations, introduce a "bulk insert" operation which allows a list
      of log items with the same LSN to be inserted in a single operation
      via a list splice. To do this, we need to pre-sort the log items
      being committed into a temporary list for insertion.
      
      The complexity is that not every log item will end up with the same
      LSN, and not every item is actually inserted into the AIL. Items
      that don't match the commit LSN will be inserted and unpinned as per
      the current one-at-a-time method (relatively rare), while items that
      are not to be inserted will be unpinned and freed immediately. Items
      that are to be inserted at the given commit lsn are placed in a
      temporary array and inserted into the AIL in bulk each time the
      array fills up.
      
      As a result of this, we trade off AIL hold time for a significant
      reduction in traffic. lock_stat output shows that the worst case
      hold time is unchanged, but contention from AIL inserts drops by an
      order of magnitude and the number of lock traversal decreases
      significantly.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      0e57f6a3
  19. 19 10月, 2010 4 次提交
  20. 24 8月, 2010 1 次提交
    • D
      xfs: unlock items before allowing the CIL to commit · d17c701c
      Dave Chinner 提交于
      When we commit a transaction using delayed logging, we need to
      unlock the items in the transaciton before we unlock the CIL context
      and allow it to be checkpointed. If we unlock them after we release
      the CIl context lock, the CIL can checkpoint and complete before
      we free the log items. This breaks stale buffer item unlock and
      unpin processing as there is an implicit assumption that the unlock
      will occur before the unpin.
      
      Also, some log items need to store the LSN of the transaction commit
      in the item (inodes and EFIs) and so can race with other transaction
      completions if we don't prevent the CIL from checkpointing before
      the unlock occurs.
      
      Cc: <stable@kernel.org>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      d17c701c
  21. 27 7月, 2010 1 次提交
    • D
      xfs: fix xfs_trans_add_item() lockdep warnings · 43869706
      Dave Chinner 提交于
      xfs_trans_add_item() is called with ip->i_ilock held, which means it
      is unsafe for memory reclaim to recurse back into the filesystem
      (ilock is required in writeback). Hence the allocation needs to be
      KM_NOFS to avoid recursion.
      
      Lockdep report indicating memory allocation being called with the
      ip->i_ilock held is as follows:
      
      [ 1749.866796] =================================
      [ 1749.867788] [ INFO: inconsistent lock state ]
      [ 1749.868327] 2.6.35-rc3-dgc+ #25
      [ 1749.868741] ---------------------------------
      [ 1749.868741] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
      [ 1749.868741] dd/2835 [HC0[0]:SC0[0]:HE1:SE1] takes:
      [ 1749.868741]  (&(&ip->i_lock)->mr_lock){++++?.}, at: [<ffffffff813170fb>] xfs_ilock+0x10b/0x190
      [ 1749.868741] {IN-RECLAIM_FS-W} state was registered at:
      [ 1749.868741]   [<ffffffff810b3a97>] __lock_acquire+0x437/0x1450
      [ 1749.868741]   [<ffffffff810b4b56>] lock_acquire+0xa6/0x160
      [ 1749.868741]   [<ffffffff810a20b5>] down_write_nested+0x65/0xb0
      [ 1749.868741]   [<ffffffff813170fb>] xfs_ilock+0x10b/0x190
      [ 1749.868741]   [<ffffffff8134e819>] xfs_reclaim_inode+0x99/0x310
      [ 1749.868741]   [<ffffffff8134f56b>] xfs_inode_ag_walk+0x8b/0x150
      [ 1749.868741]   [<ffffffff8134f6bb>] xfs_inode_ag_iterator+0x8b/0xf0
      [ 1749.868741]   [<ffffffff8134f7a8>] xfs_reclaim_inode_shrink+0x88/0x90
      [ 1749.868741]   [<ffffffff81119d07>] shrink_slab+0x137/0x1a0
      [ 1749.868741]   [<ffffffff8111bbe1>] balance_pgdat+0x421/0x6a0
      [ 1749.868741]   [<ffffffff8111bf7d>] kswapd+0x11d/0x320
      [ 1749.868741]   [<ffffffff8109ce56>] kthread+0x96/0xa0
      [ 1749.868741]   [<ffffffff81035de4>] kernel_thread_helper+0x4/0x10
      [ 1749.868741] irq event stamp: 4234335
      [ 1749.868741] hardirqs last  enabled at (4234335): [<ffffffff81147d25>] kmem_cache_free+0x115/0x220
      [ 1749.868741] hardirqs last disabled at (4234334): [<ffffffff81147c4d>] kmem_cache_free+0x3d/0x220
      [ 1749.868741] softirqs last  enabled at (4233112): [<ffffffff81084dd2>] __do_softirq+0x142/0x260
      [ 1749.868741] softirqs last disabled at (4233095): [<ffffffff81035edc>] call_softirq+0x1c/0x50
      [ 1749.868741] 
      [ 1749.868741] other info that might help us debug this:
      [ 1749.868741] 2 locks held by dd/2835:
      [ 1749.868741]  #0:  (&(&ip->i_iolock)->mr_lock#2){+.+.+.}, at: [<ffffffff81316edd>] xfs_ilock_nowait+0xed/0x200
      [ 1749.868741]  #1:  (&(&ip->i_lock)->mr_lock){++++?.}, at: [<ffffffff813170fb>] xfs_ilock+0x10b/0x190
      [ 1749.868741] 
      [ 1749.868741] stack backtrace:
      [ 1749.868741] Pid: 2835, comm: dd Not tainted 2.6.35-rc3-dgc+ #25
      [ 1749.868741] Call Trace:
      [ 1749.868741]  [<ffffffff810b1faa>] print_usage_bug+0x18a/0x190
      [ 1749.868741]  [<ffffffff8104264f>] ? save_stack_trace+0x2f/0x50
      [ 1749.868741]  [<ffffffff810b2400>] ? check_usage_backwards+0x0/0xf0
      [ 1749.868741]  [<ffffffff810b2f11>] mark_lock+0x331/0x400
      [ 1749.868741]  [<ffffffff810b3047>] mark_held_locks+0x67/0x90
      [ 1749.868741]  [<ffffffff810b3111>] lockdep_trace_alloc+0xa1/0xe0
      [ 1749.868741]  [<ffffffff81147419>] kmem_cache_alloc+0x39/0x1e0
      [ 1749.868741]  [<ffffffff8133f954>] kmem_zone_alloc+0x94/0xe0
      [ 1749.868741]  [<ffffffff8133f9be>] kmem_zone_zalloc+0x1e/0x50
      [ 1749.868741]  [<ffffffff81335f02>] xfs_trans_add_item+0x72/0xb0
      [ 1749.868741]  [<ffffffff81339e41>] xfs_trans_ijoin+0xa1/0xd0
      [ 1749.868741]  [<ffffffff81319f82>] xfs_itruncate_finish+0x312/0x5d0
      [ 1749.868741]  [<ffffffff8133cb87>] xfs_free_eofblocks+0x227/0x280
      [ 1749.868741]  [<ffffffff8133cd18>] xfs_release+0x138/0x190
      [ 1749.868741]  [<ffffffff813464c5>] xfs_file_release+0x15/0x20
      [ 1749.868741]  [<ffffffff81150ebf>] fput+0x13f/0x260
      [ 1749.868741]  [<ffffffff8114d8c2>] filp_close+0x52/0x80
      [ 1749.868741]  [<ffffffff8114d9a9>] sys_close+0xb9/0x120
      [ 1749.868741]  [<ffffffff81034ff2>] system_call_fastpath+0x16/0x1b
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      43869706