1. 15 3月, 2017 1 次提交
  2. 08 3月, 2017 1 次提交
  3. 31 1月, 2017 1 次提交
  4. 25 1月, 2017 1 次提交
    • C
      xfs: use per-AG reservations for the finobt · 76d771b4
      Christoph Hellwig 提交于
      Currently we try to rely on the global reserved block pool for block
      allocations for the free inode btree, but I have customer reports
      (fairly complex workload, need to find an easier reproducer) where that
      is not enough as the AG where we free an inode that requires a new
      finobt block is entirely full.  This causes us to cancel a dirty
      transaction and thus a file system shutdown.
      
      I think the right way to guard against this is to treat the finot the same
      way as the refcount btree and have a per-AG reservations for the possible
      worst case size of it, and the patch below implements that.
      
      Note that this could increase mount times with large finobt trees.  In
      an ideal world we would have added a field for the number of finobt
      fields to the AGI, similar to what we did for the refcount blocks.
      We should do add it next time we rev the AGI or AGF format by adding
      new fields.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      76d771b4
  5. 05 12月, 2016 1 次提交
  6. 30 11月, 2016 1 次提交
  7. 06 10月, 2016 4 次提交
  8. 05 10月, 2016 1 次提交
    • D
      xfs: when replaying bmap operations, don't let unlinked inodes get reaped · 17c12bcd
      Darrick J. Wong 提交于
      Log recovery will iget an inode to replay BUI items and iput the inode
      when it's done.  Unfortunately, if the inode was unlinked, the iput
      will see that i_nlink == 0 and decide to truncate & free the inode,
      which prevents us from replaying subsequent BUIs.  We can't skip the
      BUIs because we have to replay all the redo items to ensure that
      atomic operations complete.
      
      Since unlinked inode recovery will reap the inode anyway, we can
      safely introduce a new inode flag to indicate that an inode is in this
      'unlinked recovery' state and should not be auto-reaped in the
      drop_inode path.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      17c12bcd
  9. 28 9月, 2016 1 次提交
  10. 22 9月, 2016 1 次提交
  11. 03 8月, 2016 3 次提交
  12. 01 6月, 2016 1 次提交
  13. 18 5月, 2016 6 次提交
  14. 06 4月, 2016 3 次提交
  15. 10 2月, 2016 1 次提交
  16. 09 2月, 2016 7 次提交
  17. 11 1月, 2016 1 次提交
    • E
      xfs: eliminate committed arg from xfs_bmap_finish · f6106efa
      Eric Sandeen 提交于
      Calls to xfs_bmap_finish() and xfs_trans_ijoin(), and the
      associated comments were replicated several times across
      the attribute code, all dealing with what to do if the
      transaction was or wasn't committed.
      
      And in that replicated code, an ASSERT() test of an
      uninitialized variable occurs in several locations:
      
      	error = xfs_attr_thing(&args);
      	if (!error) {
      		error = xfs_bmap_finish(&args.trans, args.flist,
      					&committed);
      	}
      	if (error) {
      		ASSERT(committed);
      
      If the first xfs_attr_thing() failed, we'd skip the xfs_bmap_finish,
      never set "committed", and then test it in the ASSERT.
      
      Fix this up by moving the committed state internal to xfs_bmap_finish,
      and add a new inode argument.  If an inode is passed in, it is passed
      through to __xfs_trans_roll() and joined to the transaction there if
      the transaction was committed.
      
      xfs_qm_dqalloc() was a little unique in that it called bjoin rather
      than ijoin, but as Dave points out we can detect the committed state
      but checking whether (*tpp != tp).
      
      Addresses-Coverity-Id: 102360
      Addresses-Coverity-Id: 102361
      Addresses-Coverity-Id: 102363
      Addresses-Coverity-Id: 102364
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      f6106efa
  18. 04 1月, 2016 2 次提交
    • D
      xfs: introduce per-inode DAX enablement · 58f88ca2
      Dave Chinner 提交于
      Rather than just being able to turn DAX on and off via a mount
      option, some applications may only want to enable DAX for certain
      performance critical files in a filesystem.
      
      This patch introduces a new inode flag to enable DAX in the v3 inode
      di_flags2 field. It adds support for setting and clearing flags in
      the di_flags2 field via the XFS_IOC_FSSETXATTR ioctl, and sets the
      S_DAX inode flag appropriately when it is seen.
      
      When this flag is set on a directory, it acts as an "inherit flag".
      That is, inodes created in the directory will automatically inherit
      the on-disk inode DAX flag, enabling administrators to set up
      directory heirarchies that automatically use DAX. Setting this flag
      on an empty root directory will make the entire filesystem use DAX
      by default.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      58f88ca2
    • D
      xfs: use FS_XFLAG definitions directly · e7b89481
      Dave Chinner 提交于
      Now that the ioctls have been hoisted up to the VFS level, use
      the VFs definitions directly and remove the XFS specific definitions
      completely. Userspace is going to have to handle the change of this
      interface separately, so removing the definitions from xfs_fs.h is
      not an issue here at all.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      e7b89481
  19. 03 11月, 2015 1 次提交
    • D
      xfs: optimise away log forces on timestamp updates for fdatasync · fc0561ce
      Dave Chinner 提交于
      xfs: timestamp updates cause excessive fdatasync log traffic
      
      Sage Weil reported that a ceph test workload was writing to the
      log on every fdatasync during an overwrite workload. Event tracing
      showed that the only metadata modification being made was the
      timestamp updates during the write(2) syscall, but fdatasync(2)
      is supposed to ignore them. The key observation was that the
      transactions in the log all looked like this:
      
      INODE: #regs: 4   ino: 0x8b  flags: 0x45   dsize: 32
      
      And contained a flags field of 0x45 or 0x85, and had data and
      attribute forks following the inode core. This means that the
      timestamp updates were triggering dirty relogging of previously
      logged parts of the inode that hadn't yet been flushed back to
      disk.
      
      There are two parts to this problem. The first is that XFS relogs
      dirty regions in subsequent transactions, so it carries around the
      fields that have been dirtied since the last time the inode was
      written back to disk, not since the last time the inode was forced
      into the log.
      
      The second part is that on v5 filesystems, the inode change count
      update during inode dirtying also sets the XFS_ILOG_CORE flag, so
      on v5 filesystems this makes a timestamp update dirty the entire
      inode.
      
      As a result when fdatasync is run, it looks at the dirty fields in
      the inode, and sees more than just the timestamp flag, even though
      the only metadata change since the last fdatasync was just the
      timestamps. Hence we force the log on every subsequent fdatasync
      even though it is not needed.
      
      To fix this, add a new field to the inode log item that tracks
      changes since the last time fsync/fdatasync forced the log to flush
      the changes to the journal. This flag is updated when we dirty the
      inode, but we do it before updating the change count so it does not
      carry the "core dirty" flag from timestamp updates. The fields are
      zeroed when the inode is marked clean (due to writeback/freeing) or
      when an fsync/datasync forces the log. Hence if we only dirty the
      timestamps on the inode between fsync/fdatasync calls, the fdatasync
      will not trigger another log force.
      
      Over 100 runs of the test program:
      
      Ext4 baseline:
      	runtime: 1.63s +/- 0.24s
      	avg lat: 1.59ms +/- 0.24ms
      	iops: ~2000
      
      XFS, vanilla kernel:
              runtime: 2.45s +/- 0.18s
      	avg lat: 2.39ms +/- 0.18ms
      	log forces: ~400/s
      	iops: ~1000
      
      XFS, patched kernel:
              runtime: 1.49s +/- 0.26s
      	avg lat: 1.46ms +/- 0.25ms
      	log forces: ~30/s
      	iops: ~1500
      Reported-by: NSage Weil <sage@redhat.com>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      fc0561ce
  20. 12 10月, 2015 1 次提交
    • B
      xfs: per-filesystem stats counter implementation · ff6d6af2
      Bill O'Donnell 提交于
      This patch modifies the stats counting macros and the callers
      to those macros to properly increment, decrement, and add-to
      the xfs stats counts. The counts for global and per-fs stats
      are correctly advanced, and cleared by writing a "1" to the
      corresponding clear file.
      
      global counts: /sys/fs/xfs/stats/stats
      per-fs counts: /sys/fs/xfs/sda*/stats/stats
      
      global clear:  /sys/fs/xfs/stats/stats_clear
      per-fs clear:  /sys/fs/xfs/sda*/stats/stats_clear
      
      [dchinner: cleaned up macro variables, removed CONFIG_FS_PROC around
       stats structures and macros. ]
      Signed-off-by: NBill O'Donnell <billodo@redhat.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      ff6d6af2
  21. 25 8月, 2015 1 次提交