1. 06 Jun 2014, 1 commit
    • xfs: introduce directory geometry structure · 0650b554
      Dave Chinner committed
      The directory code has a dependency on the struct xfs_mount to
      supply the directory block geometry. Block size, block log size,
      and other parameters are pre-calculated in the struct xfs_mount or
      accessed directly from the superblock embedded in the struct
      xfs_mount.
      
      Extract all of this geometry information out of the struct xfs_mount
      and superblock and place it into a new struct xfs_da_geometry
      defined by the directory code. Allocate and initialise it at mount
      time, and attach it to the struct xfs_mount so it can be passed back
      into the directory code appropriately rather than using the struct
      xfs_mount.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
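A minimal sketch of the idea of computing directory geometry once at mount time and handing it to directory code instead of the whole mount structure. `struct da_geometry_sketch`, `da_geometry_alloc()` and `dir_block_bytes()` are hypothetical names, not the kernel's `struct xfs_da_geometry`. The inputs are assumed to follow the superblock meanings of `sb_blocklog` (log2 of the fs block size) and `sb_dirblklog` (log2 of fs blocks per directory block).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct xfs_da_geometry. */
struct da_geometry_sketch {
	unsigned int fsblog;   /* log2(filesystem block size in bytes) */
	unsigned int blklog;   /* log2(directory block size in bytes) */
	unsigned int blksize;  /* directory block size in bytes */
	unsigned int fsbcount; /* filesystem blocks per directory block */
};

/*
 * Compute the geometry once ("at mount time") from superblock-style values.
 * Returns NULL if the allocation fails.
 */
static struct da_geometry_sketch *da_geometry_alloc(unsigned int sb_blocklog,
						    unsigned int sb_dirblklog)
{
	struct da_geometry_sketch *geo = malloc(sizeof(*geo));

	if (!geo)
		return NULL;
	geo->fsblog = sb_blocklog;
	geo->blklog = sb_blocklog + sb_dirblklog;
	geo->blksize = 1u << geo->blklog;
	geo->fsbcount = 1u << sb_dirblklog;
	return geo;
}

/* Directory code consumes the geometry, not the mount structure. */
static unsigned int dir_block_bytes(const struct da_geometry_sketch *geo)
{
	return geo->blksize;
}
```

With 4 KiB filesystem blocks (`sb_blocklog` of 12) and 4 fs blocks per directory block (`sb_dirblklog` of 2), the sketch yields a 16 KiB directory block.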
      
  2. 15 May 2014, 9 commits
  3. 13 May 2014, 5 commits
  4. 07 May 2014, 3 commits
    • xfs: fix directory readahead offset off-by-one · 8cfcc3e5
      Dave Chinner committed
      Directory readahead can throw loud, scary, but harmless warnings
      when multiblock directories are in use and a specific pattern of
      discontiguous blocks is found in the directory. That is, if a hole
      follows a discontiguous block, it will throw a warning like:
      
      XFS (dm-1): xfs_da_do_buf: bno 637 dir: inode 34363923462
      XFS (dm-1): [00] br_startoff 637 br_startblock 1917954575 br_blockcount 1 br_state 0
      XFS (dm-1): [01] br_startoff 638 br_startblock -2 br_blockcount 1 br_state 0
      
      It also dumps a stack trace.
      
      This is because the readahead offset increment loop does a double
      increment of the block index - it does an increment for the loop
      iteration as well as increase the loop counter by the number of
      blocks in the extent. As a result, the readahead offset does not get
      incremented correctly for discontiguous blocks and hence can ask for
      readahead of a directory block from an offset part way through a
      directory block.  If that directory block is followed by a hole, it
      will trigger a mapping warning like the above.
      
      The bad readahead will be ignored, though, because the main
      directory block read loop uses the correct mapping offsets rather
      than the readahead offset and so will ignore the bad readahead
      altogether.
      
      Fix the warning by ensuring that the readahead offset is correctly
      incremented.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
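The double-increment pattern described above can be shown in isolation. This is a simplified model, not the kernel's directory readahead code, and it ignores discontiguous mappings. The functions emit the starting offset (in fs blocks) of each directory block in a range, where each directory block spans `fsbcount` fs blocks. `ra_offsets_buggy()` and `ra_offsets_fixed()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Buggy: the loop header adds 1 and the body adds fsbcount (double increment). */
static size_t ra_offsets_buggy(uint64_t start, uint64_t end, unsigned int fsbcount,
			       uint64_t *out, size_t max)
{
	size_t n = 0;

	for (uint64_t off = start; off < end && n < max; off++) {
		out[n++] = off;
		off += fsbcount;	/* plus the off++ above: advances fsbcount + 1 */
	}
	return n;
}

/* Fixed: advance exactly one directory block per iteration. */
static size_t ra_offsets_fixed(uint64_t start, uint64_t end, unsigned int fsbcount,
			       uint64_t *out, size_t max)
{
	size_t n = 0;

	for (uint64_t off = start; off < end && n < max; off += fsbcount)
		out[n++] = off;
	return n;
}
```

With two fs blocks per directory block over offsets 0..7, the fixed loop yields 0, 2, 4, 6. The buggy loop yields 0, 3, 6, and offset 3 points part way through a directory block, which is the kind of request that triggers the mapping warning.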
    • xfs: don't sleep in xlog_cil_force_lsn on shutdown · ac983517
      Dave Chinner committed
      Reports of a shutdown hang when fsyncing a directory have surfaced,
      such as this:
      
      [ 3663.394472] Call Trace:
      [ 3663.397199]  [<ffffffff815f1889>] schedule+0x29/0x70
      [ 3663.402743]  [<ffffffffa01feda5>] xlog_cil_force_lsn+0x185/0x1a0 [xfs]
      [ 3663.416249]  [<ffffffffa01fd3af>] _xfs_log_force_lsn+0x6f/0x2f0 [xfs]
      [ 3663.429271]  [<ffffffffa01a339d>] xfs_dir_fsync+0x7d/0xe0 [xfs]
      [ 3663.435873]  [<ffffffff811df8c5>] do_fsync+0x65/0xa0
      [ 3663.441408]  [<ffffffff811dfbc0>] SyS_fsync+0x10/0x20
      [ 3663.447043]  [<ffffffff815fc7d9>] system_call_fastpath+0x16/0x1b
      
      If we trigger a shutdown in xlog_cil_push() from xlog_write(), we
      will never wake waiters on the current push sequence number, so
      anything waiting in xlog_cil_force_lsn() for that push sequence
      number to come up will not get woken and hence stall the shutdown.
      
      Fix this by ensuring we call wake_up_all(&cil->xc_commit_wait) in
      the push abort handling and in the log shutdown code when waking all
      waiters, and by adding a shutdown check in the sequence completion
      wait loops to ensure they abort when a wakeup due to a shutdown occurs.
      Reported-by: Boris Ranto <branto@redhat.com>
      Reported-by: Eric Sandeen <esandeen@redhat.com>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
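A single-threaded sketch of why the wakeup and the shutdown check both matter. The `sleep` callback stands in for `schedule()` until the next wakeup. `wake_commit_waiters()` models `wake_up_all(&cil->xc_commit_wait)`. `struct cil_model`, `push_abort()`, `push_complete()` and `wait_for_seq()` are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the CIL wait state (not struct xfs_cil). */
struct cil_model {
	unsigned long committed_seq;	/* highest push sequence that completed */
	bool shutdown;			/* log has been shut down */
	unsigned int wakeups;		/* times waiters were woken */
};

/* Stand-in for wake_up_all(&cil->xc_commit_wait). */
static void wake_commit_waiters(struct cil_model *cil)
{
	cil->wakeups++;
}

/* Push abort path after a write failure: mark shutdown and wake waiters. */
static void push_abort(struct cil_model *cil)
{
	cil->shutdown = true;
	wake_commit_waiters(cil);
}

/* A push that completes normally advances the sequence and wakes waiters. */
static void push_complete(struct cil_model *cil)
{
	cil->committed_seq++;
	wake_commit_waiters(cil);
}

enum wait_result { WAIT_DONE, WAIT_ABORTED };

/*
 * Wait for sequence seq. Without the shutdown check, a waiter whose sequence
 * never commits (because the push aborted) would loop forever.
 */
static enum wait_result wait_for_seq(struct cil_model *cil, unsigned long seq,
				     void (*sleep)(struct cil_model *))
{
	for (;;) {
		if (cil->shutdown)
			return WAIT_ABORTED;
		if (cil->committed_seq >= seq)
			return WAIT_DONE;
		sleep(cil);
	}
}
```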
    • xfs: truncate_setsize should be outside transactions · 49abc3a8
      Dave Chinner committed
      truncate_setsize() removes pages from the page cache, and hence
      requires page locks to be held. It is not valid to lock a page cache
      page inside a transaction context, as we can hold page locks when we
      reserve space for a transaction. If we do, then we expose an ABBA
      deadlock between log space reservation and page locks.
      
      That is, both the write path and writeback lock a page, then start a
      transaction for block allocation, which means they can block waiting
      for a log reservation with the page lock held. If we hold a log
      reservation and then do something that locks a page (e.g.
      truncate_setsize() in xfs_setattr_size()), then that page lock can
      block on a page that is locked by a task waiting for a log reservation. If the
      transaction that is waiting for the page lock is the only active
      transaction in the system that can free log space via a commit,
      then writeback will never make progress and so log space will never
      free up.
      
      This issue with xfs_setattr_size() was introduced back in 2010 by
      commit fa9b227e ("xfs: new truncate sequence") which moved the page
      cache truncate from outside the transaction context (what was
      xfs_itruncate_data()) to inside the transaction context as a call to
      truncate_setsize().
      
      The reason truncate_setsize() was placed here was that we shouldn't
      change the file size until after we are in the transaction context
      and the operation will either succeed or shut down the filesystem on
      failure. However, block_truncate_page()
      already modifies the file contents before we enter the transaction
      context, so we can't really fulfill this guarantee in any way. Hence
      we may as well ensure that on success or failure, the in-memory
      inode and data is truncated away and that the application cleans up
      the mess appropriately.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
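A toy lock-ordering checker that sketches the ordering rule from this commit: page locks come before log reservations. It is a hypothetical illustration, not lockdep and not XFS code. It flags any acquisition of a lower-ranked lock while a higher-ranked one is held. The old `xfs_setattr_size()` ordering (reservation, then `truncate_setsize()` page locks) trips it. The new ordering (truncate first, then the transaction) does not.

```c
#include <assert.h>

/* Rule assumed from the commit: page lock (rank 1) before log space (rank 2). */
enum lock_rank { RANK_PAGE_LOCK = 1, RANK_LOG_RESERVATION = 2 };

struct lock_ctx {
	int held[8];	/* ranks currently held, in acquisition order */
	int depth;
	int violations;	/* out-of-order acquisitions observed */
};

static void acquire(struct lock_ctx *c, enum lock_rank rank)
{
	assert(c->depth < 8);
	for (int i = 0; i < c->depth; i++)
		if (c->held[i] > (int)rank)
			c->violations++;	/* lower rank taken under a higher one */
	c->held[c->depth++] = rank;
}

static void release(struct lock_ctx *c)
{
	c->depth--;	/* LIFO release, enough for this model */
}

/* Old ordering: truncate_setsize() ran inside the transaction. */
static void setattr_size_old(struct lock_ctx *c)
{
	acquire(c, RANK_LOG_RESERVATION);	/* transaction reserved */
	acquire(c, RANK_PAGE_LOCK);		/* page cache truncate locks pages */
	release(c);
	release(c);
}

/* New ordering: truncate the page cache first, then start the transaction. */
static void setattr_size_new(struct lock_ctx *c)
{
	acquire(c, RANK_PAGE_LOCK);
	release(c);
	acquire(c, RANK_LOG_RESERVATION);
	release(c);
}
```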
  5. 06 May 2014, 2 commits
    • xfs: remote attribute overwrite causes transaction overrun · 8275cdd0
      Dave Chinner committed
      Commit e461fcb1 ("xfs: remote attribute lookups require the value
      length") passes the remote attribute length in the xfs_da_args
      structure on lookup so that CRC calculations and validity checking
      can be performed correctly by related code. This, unfortunately, has
      the side effect of changing the args->valuelen parameter in cases
      where it shouldn't.
      
      That is, when we replace a remote attribute, the incoming
      replacement stores the value and length in args->value and
      args->valuelen, but then the lookup which finds the existing remote
      attribute overwrites args->valuelen with the length of the remote
      attribute being replaced. Hence when we go to create the new
      attribute, we create it of the size of the existing remote
      attribute, not the size it is supposed to be. When the new attribute
      is much smaller than the old attribute, this results in a
      transaction overrun and an ASSERT() failure on a debug kernel:
      
      XFS: Assertion failed: tp->t_blk_res_used <= tp->t_blk_res, file: fs/xfs/xfs_trans.c, line: 331
      
      Fix this by keeping the remote attribute value length separate from
      the attribute value length in the xfs_da_args structure. This enables
      us to pass the length of the remote attribute to be removed without
      overwriting the new attribute's length.
      
      Also, ensure that when we save remote block contexts for a later
      rename we zero the original state variables so that we don't confuse
      the state of the attribute to be removed with the state of the new
      attribute that we just added. [Spotted by Brian Foster.]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
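A minimal sketch of the overwrite bug and the fix, using a hypothetical subset of `xfs_da_args` that keeps a separate `rmtvaluelen`. The block arithmetic is a plain round-up that ignores any per-block remote attribute headers, so real block counts may differ. The point is that once lookup clobbers `valuelen`, the creation step needs more blocks than were reserved.

```c
#include <assert.h>

/* Hypothetical subset of xfs_da_args. */
struct da_args_sketch {
	int valuelen;		/* length of the new value being written */
	int rmtvaluelen;	/* length of the existing remote value found by lookup */
};

/* Blocks needed for len bytes in blksize-byte blocks (rounded up, no headers). */
static int attr_blocks(int len, int blksize)
{
	return (len + blksize - 1) / blksize;
}

/* Before the fix: lookup overwrote the caller's new length. */
static void lookup_remote_buggy(struct da_args_sketch *args, int existing_len)
{
	args->valuelen = existing_len;
}

/* After the fix: lookup records the old length in its own field. */
static void lookup_remote_fixed(struct da_args_sketch *args, int existing_len)
{
	args->rmtvaluelen = existing_len;
}

/* Returns 1 if creating the new value fits the blocks reserved up front. */
static int fits_reservation(const struct da_args_sketch *args, int reserved_blocks,
			    int blksize)
{
	return attr_blocks(args->valuelen, blksize) <= reserved_blocks;
}
```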
    • xfs: initialize default acls for ->tmpfile() · d540e43b
      Brian Foster committed
      The current tmpfile handler does not initialize default ACLs. Doing so
      within xfs_vn_tmpfile() makes it roughly equivalent to xfs_vn_mknod(),
      which is already used as a common create handler.
      
      xfs_vn_mknod() does not currently have a mechanism to determine whether
      to link the file into the namespace. Therefore, further abstract
      xfs_vn_mknod() into a new xfs_generic_create() handler with a tmpfile
      parameter. This new handler calls xfs_create_tmpfile() and d_tmpfile()
      on the dentry when called via ->tmpfile().
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
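A hypothetical model of the refactoring shape: one shared create helper with a `tmpfile` flag, called by thin `->mknod`-like and `->tmpfile`-like wrappers. Nothing here calls into the VFS. The comments only name the kernel calls the commit mentions, and the struct records which steps ran, so both paths visibly get default ACL setup.

```c
#include <assert.h>
#include <stdbool.h>

/* Records which steps the (hypothetical) create helper performed. */
struct create_trace {
	bool default_acls_set;	/* default ACL inheritance applied */
	bool linked;		/* entry added to the namespace (normal create) */
	bool tmpfile;		/* unlinked tmpfile inode attached to the dentry */
};

static int generic_create_sketch(bool tmpfile, struct create_trace *t)
{
	*t = (struct create_trace){0};
	if (tmpfile)
		t->tmpfile = true;	/* models xfs_create_tmpfile() + d_tmpfile() */
	else
		t->linked = true;	/* models a regular create linked into the dir */
	/* Shared step: default ACL setup is applied on both paths. */
	t->default_acls_set = true;
	return 0;
}

/* Thin wrappers delegating to the one helper. */
static int vn_mknod_sketch(struct create_trace *t)
{
	return generic_create_sketch(false, t);
}

static int vn_tmpfile_sketch(struct create_trace *t)
{
	return generic_create_sketch(true, t);
}
```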
  6. 05 May 2014, 5 commits
    • xfs: Fix wrong error codes being returned · b28fd7b5
      Tuomas Tynkkynen committed
      xfs_{compat_,}attrmulti_by_handle could return an errno with incorrect
      sign in some cases. While at it, make sure ENOMEM is returned instead of
      E2BIG if kmalloc fails.
      Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
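A minimal sketch of the error-code distinction, assuming the usual Linux convention that ioctl handlers return negative errno values. An oversized request is `-E2BIG`, while an allocation failure is `-ENOMEM`. `attrmulti_buf_sketch()` and `failing_alloc()` are hypothetical. The `alloc` callback lets a caller simulate `kmalloc()` failing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical buffer setup for an attr-multi style request. */
static int attrmulti_buf_sketch(size_t size, size_t max, void *(*alloc)(size_t),
				void **out)
{
	*out = NULL;
	if (size > max)
		return -E2BIG;		/* request itself is too large */
	*out = alloc(size);
	if (!*out)
		return -ENOMEM;		/* size was fine; memory was not */
	return 0;
}

/* Test helper: an allocator that always fails. */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}
```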
    • xfs: remove dquot hints · 3c353375
      Dave Chinner committed
      Group and project quota hints are currently stored on the user
      dquot. If we are attaching quotas to the inode, then the group and
      project dquots are stored as hints on the user dquot to save having
      to look them up again later.
      
      The thing is, the hints are not used for that inode for the rest of
      the life of the inode - the dquots are attached directly to the
      inode itself - so the only time the hints are used is when an inode
      first has dquots attached.
      
      When the hints on the user dquot don't match the dquots being
      attached to the inode, they are then removed and replaced with the
      new hints. If a user is concurrently modifying files in different
      group and/or project contexts, then this leads to thrashing of the
      hints attached to the user dquot.
      
      If user quotas are not enabled, then hints are never even used.
      
      So, if the hints are used to avoid the cost of the lookup, is the
      cost of the lookup significant enough to justify the hint
      infrastructure? Maybe it was once, when there was a global, hash
      table based quota manager shared between all XFS filesystems.
      
      However, lookups are now much simpler, requiring only a single lock and
      radix tree lookup local to the filesystem and no hash or LRU
      manipulations to be made. Hence the cost of lookup is much lower
      than when hints were implemented. Benchmarks show that, too: there
      is no difference in performance when doing file creation workloads
      as a single user with user, group and project quotas enabled, so the
      hints do not make the code go any faster. In fact, removing the hints
      shows a 2-3% reduction in the time it takes to create 50 million
      inodes.
      
      So, let's just get rid of the hints and the complexity around them.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: bulletfproof xfs_qm_scall_trunc_qfiles() · f58522c5
      Eric Sandeen committed
      Coverity noticed that if we sent junk into
      xfs_qm_scall_trunc_qfiles(), we could get back an
      uninitialized error value.  So sanitize the flags we
      will accept, and initialize error anyway for good measure.
      
      (This bug may have been introduced via c61a9e39).
      
      Should resolve Coverity CID 1163872.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Jie Liu <jeff.liu@oracle.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
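A sketch of the hardening pattern described above: reject empty or unknown flags up front and initialize `error`, so no path returns an uninitialized value. The `SK_DQ_*` constants are illustrative stand-ins for the XFS quota type flags, and `trunc_one()` is a hypothetical per-type truncate callback. The sketch uses negative errno values for illustration, which may not match the XFS code of that era.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative type flags (assumed values, not taken from the XFS headers). */
#define SK_DQ_USER	0x1
#define SK_DQ_PROJ	0x2
#define SK_DQ_GROUP	0x4
#define SK_DQ_ALLTYPES	(SK_DQ_USER | SK_DQ_PROJ | SK_DQ_GROUP)

static int trunc_qfiles_sketch(unsigned int flags, int (*trunc_one)(unsigned int type))
{
	int error = -EINVAL;	/* initialized, so no path returns garbage */

	/* Sanitize: nothing requested, or bits outside the known types. */
	if (flags == 0 || (flags & ~SK_DQ_ALLTYPES))
		return -EINVAL;

	if (flags & SK_DQ_USER) {
		error = trunc_one(SK_DQ_USER);
		if (error)
			return error;
	}
	if (flags & SK_DQ_GROUP) {
		error = trunc_one(SK_DQ_GROUP);
		if (error)
			return error;
	}
	if (flags & SK_DQ_PROJ)
		error = trunc_one(SK_DQ_PROJ);
	return error;
}

/* Test helper: a per-type truncate that always succeeds. */
static int trunc_ok(unsigned int type)
{
	(void)type;
	return 0;
}
```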
    • xfs: fix Q_XQUOTARM ioctl · 9da93f9b
      Eric Sandeen committed
      The Q_XQUOTARM quotactl was not working properly, because
      we weren't passing around proper flags.  The xfs_fs_set_xstate()
      ioctl handler used the same flags for Q_XQUOTAON/OFF as
      well as for Q_XQUOTARM, but Q_XQUOTAON/OFF look for
      XFS_UQUOTA_ACCT, XFS_UQUOTA_ENFD, XFS_GQUOTA_ACCT etc,
      i.e. quota type + state, while Q_XQUOTARM looks only for
      the type of quota, i.e. XFS_DQ_USER, XFS_DQ_GROUP etc.
      
      Unfortunately these flag spaces overlap a bit, so we
      got semi-random results for Q_XQUOTARM; i.e. the value
      for XFS_DQ_USER == XFS_UQUOTA_ACCT, etc.  yeargh.
      
      Add a new quotactl op vector specifically for the QUOTARM
      operation, since it operates with a different flag space.
      
      This has been broken more or less forever, AFAICT.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Acked-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
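A sketch of how two overlapping flag spaces produce semi-random results. The commit states only that `XFS_DQ_USER == XFS_UQUOTA_ACCT`. The other constant values below are assumptions chosen to illustrate a collision, so check the real XFS headers before relying on them. Decoding a type flag with the state-space decoder misreads it. A dedicated decoder for the type space, like the separate QUOTARM op vector, reads it correctly.

```c
#include <assert.h>
#include <stdbool.h>

/* State flag space (type + state), as used by Q_XQUOTAON/OFF. Assumed values. */
#define SK_UQUOTA_ACCT	0x0001
#define SK_UQUOTA_ENFD	0x0002
#define SK_UQUOTA_CHKD	0x0004
#define SK_GQUOTA_ACCT	0x0040

/* Type flag space, as used by Q_XQUOTARM. Assumed values. */
#define SK_DQ_USER	0x0001
#define SK_DQ_PROJ	0x0002
#define SK_DQ_GROUP	0x0004

/* State-space decoder: does this request mention group quota? */
static bool state_flags_want_group(unsigned int flags)
{
	return flags & SK_GQUOTA_ACCT;
}

/* Type-space decoder, as a dedicated QUOTARM handler would use. */
static bool type_flags_want_group(unsigned int flags)
{
	return flags & SK_DQ_GROUP;
}
```

Under these assumed values, a group removal request (`SK_DQ_GROUP`) passed through the state-space decoder is not seen as a group request at all. It aliases an unrelated user-quota state bit instead.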
      
    • xfs: fully support v5 format filesystems · c99d609a
      Dave Chinner committed
      We have had this code in the kernel for over a year now and have
      shaken all the known issues out of the code over the past few
      releases. It's now time to remove the experimental warnings during
      mount and fully support the new filesystem format in production
      systems.
      
      Remove the experimental warning, and add a version number to the
      initial "mounting filesystem" message to tell us what type of
      filesystem is being mounted. Also, remove the temporary inode
      cluster size output at mount time, now that we know this code
      works fine.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  7. 24 Apr 2014, 11 commits
  8. 23 Apr 2014, 4 commits