1. 20 5月, 2014 2 次提交
    • D
      xfs: log vector rounding leaks log space · 110dc24a
      Dave Chinner 提交于
      The addition of direct formatting of log items into the CIL
      linear buffer added alignment restrictions that the start of each
      vector needed to be 64 bit aligned. Hence padding was added in
      xlog_finish_iovec() to round up the vector length to ensure the next
      vector started with the correct alignment.
      
      This adds a small number of bytes to the size of
      the linear buffer that is otherwise unused. The issue is that we
      then use the linear buffer size to determine the log space used by
      the log item, and this includes the unused space. Hence when we
      account for space used by the log item, it's more than is actually
      written into the iclogs, and hence we slowly leak this space.
      
      This results on log hangs when reserving space, with threads getting
      stuck with these stack traces:
      
      Call Trace:
      [<ffffffff81d15989>] schedule+0x29/0x70
      [<ffffffff8150d3a2>] xlog_grant_head_wait+0xa2/0x1a0
      [<ffffffff8150d55d>] xlog_grant_head_check+0xbd/0x140
      [<ffffffff8150ee33>] xfs_log_reserve+0x103/0x220
      [<ffffffff814b7f05>] xfs_trans_reserve+0x2f5/0x310
      .....
      
      The 4 bytes is significant. Brain Foster did all the hard work in
      tracking down a reproducable leak to inode chunk allocation (it went
      away with the ikeep mount option). His rough numbers were that
      creating 50,000 inodes leaked 11 log blocks. This turns out to be
      roughly 800 inode chunks or 1600 inode cluster buffers. That
      works out at roughly 4 bytes per cluster buffer logged, and at that
      I started looking for a 4 byte leak in the buffer logging code.
      
      What I found was that a struct xfs_buf_log_format structure for an
      inode cluster buffer is 28 bytes in length. This gets rounded up to
      32 bytes, but the vector length remains 28 bytes. Hence the CIL
      ticket reservation is decremented by 32 bytes (via lv->lv_buf_len)
      for that vector rather than 28 bytes which are written into the log.
      
      The fix for this problem is to separately track the bytes used by
      the log vectors in the item and use that instead of the buffer
      length when accounting for the log space that will be used by the
      formatted log item.
      
      Again, thanks to Brian Foster for doing all the hard work and long
      hours to isolate this leak and make finding the bug relatively
      simple.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      110dc24a
    • N
      xfs: remove XFS_TRANS_RESERVE in collapse range · ce576f1c
      Namjae Jeon 提交于
      There is no need to dip into reserve pool. Reserve pool is used for much
      more important things. And xfs_trans_reserve will never return ENOSPC
      because punch hole is already done. If we get ENOSPC, collapse range
      will be simply failed.
      
      Cc: Brian Foster <bfoster@redhat.com>
      Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: NAshish Sangwan <a.sangwan@samsung.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      ce576f1c
  2. 07 5月, 2014 5 次提交
  3. 06 5月, 2014 2 次提交
    • D
      xfs: remote attribute overwrite causes transaction overrun · 8275cdd0
      Dave Chinner 提交于
      Commit e461fcb1 ("xfs: remote attribute lookups require the value
      length") passes the remote attribute length in the xfs_da_args
      structure on lookup so that CRC calculations and validity checking
      can be performed correctly by related code. This, unfortunately has
      the side effect of changing the args->valuelen parameter in cases
      where it shouldn't.
      
      That is, when we replace a remote attribute, the incoming
      replacement stores the value and length in args->value and
      args->valuelen, but then the lookup which finds the existing remote
      attribute overwrites args->valuelen with the length of the remote
      attribute being replaced. Hence when we go to create the new
      attribute, we create it of the size of the existing remote
      attribute, not the size it is supposed to be. When the new attribute
      is much smaller than the old attribute, this results in a
      transaction overrun and an ASSERT() failure on a debug kernel:
      
      XFS: Assertion failed: tp->t_blk_res_used <= tp->t_blk_res, file: fs/xfs/xfs_trans.c, line: 331
      
      Fix this by keeping the remote attribute value length separate to
      the attribute value length in the xfs_da_args structure. The enables
      us to pass the length of the remote attribute to be removed without
      overwriting the new attribute's length.
      
      Also, ensure that when we save remote block contexts for a later
      rename we zero the original state variables so that we don't confuse
      the state of the attribute to be removes with the state of the new
      attribute that we just added. [Spotted by Brain Foster.]
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      8275cdd0
    • B
      xfs: initialize default acls for ->tmpfile() · d540e43b
      Brian Foster 提交于
      The current tmpfile handler does not initialize default ACLs. Doing so
      within xfs_vn_tmpfile() makes it roughly equivalent to xfs_vn_mknod(),
      which is already used as a common create handler.
      
      xfs_vn_mknod() does not currently have a mechanism to determine whether
      to link the file into the namespace. Therefore, further abstract
      xfs_vn_mknod() into a new xfs_generic_create() handler with a tmpfile
      parameter. This new handler calls xfs_create_tmpfile() and d_tmpfile()
      on the dentry when called via ->tmpfile().
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      d540e43b
  4. 05 5月, 2014 2 次提交
  5. 04 5月, 2014 3 次提交
    • M
      dcache: don't need rcu in shrink_dentry_list() · 60942f2f
      Miklos Szeredi 提交于
      Since now the shrink list is private and nobody can free the dentry while
      it is on the shrink list, we can remove RCU protection from this.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      60942f2f
    • A
      more graceful recovery in umount_collect() · 9c8c10e2
      Al Viro 提交于
      Start with shrink_dcache_parent(), then scan what remains.
      
      First of all, BUG() is very much an overkill here; we are holding
      ->s_umount, and hitting BUG() means that a lot of interesting stuff
      will be hanging after that point (sync(2), for example).  Moreover,
      in cases when there had been more than one leak, we'll be better
      off reporting all of them.  And more than just the last component
      of pathname - %pd is there for just such uses...
      
      That was the last user of dentry_lru_del(), so kill it off...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      9c8c10e2
    • A
      don't remove from shrink list in select_collect() · fe91522a
      Al Viro 提交于
      	If we find something already on a shrink list, just increment
      data->found and do nothing else.  Loops in shrink_dcache_parent() and
      check_submounts_and_drop() will do the right thing - everything we
      did put into our list will be evicted and if there had been nothing,
      but data->found got non-zero, well, we have somebody else shrinking
      those guys; just try again.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      fe91522a
  6. 01 5月, 2014 6 次提交
  7. 29 4月, 2014 5 次提交
  8. 28 4月, 2014 15 次提交