1. 06 10月, 2016 2 次提交
  2. 05 10月, 2016 1 次提交
    • D
      xfs: when replaying bmap operations, don't let unlinked inodes get reaped · 17c12bcd
      Darrick J. Wong 提交于
      Log recovery will iget an inode to replay BUI items and iput the inode
      when it's done.  Unfortunately, if the inode was unlinked, the iput
      will see that i_nlink == 0 and decide to truncate & free the inode,
      which prevents us from replaying subsequent BUIs.  We can't skip the
      BUIs because we have to replay all the redo items to ensure that
      atomic operations complete.
      
      Since unlinked inode recovery will reap the inode anyway, we can
      safely introduce a new inode flag to indicate that an inode is in this
      'unlinked recovery' state and should not be auto-reaped in the
      drop_inode path.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      17c12bcd
  3. 03 8月, 2016 3 次提交
  4. 01 6月, 2016 1 次提交
  5. 18 5月, 2016 6 次提交
  6. 06 4月, 2016 3 次提交
  7. 10 2月, 2016 1 次提交
  8. 09 2月, 2016 7 次提交
  9. 11 1月, 2016 1 次提交
    • E
      xfs: eliminate committed arg from xfs_bmap_finish · f6106efa
      Eric Sandeen 提交于
      Calls to xfs_bmap_finish() and xfs_trans_ijoin(), and the
      associated comments were replicated several times across
      the attribute code, all dealing with what to do if the
      transaction was or wasn't committed.
      
      And in that replicated code, an ASSERT() test of an
      uninitialized variable occurs in several locations:
      
      	error = xfs_attr_thing(&args);
      	if (!error) {
      		error = xfs_bmap_finish(&args.trans, args.flist,
      					&committed);
      	}
      	if (error) {
      		ASSERT(committed);
      
      If the first xfs_attr_thing() failed, we'd skip the xfs_bmap_finish,
      never set "committed", and then test it in the ASSERT.
      
      Fix this up by moving the committed state internal to xfs_bmap_finish,
      and add a new inode argument.  If an inode is passed in, it is passed
      through to __xfs_trans_roll() and joined to the transaction there if
      the transaction was committed.
      
      xfs_qm_dqalloc() was a little unique in that it called bjoin rather
      than ijoin, but as Dave points out we can detect the committed state
      but checking whether (*tpp != tp).
      
      Addresses-Coverity-Id: 102360
      Addresses-Coverity-Id: 102361
      Addresses-Coverity-Id: 102363
      Addresses-Coverity-Id: 102364
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      f6106efa
  10. 04 1月, 2016 2 次提交
    • D
      xfs: introduce per-inode DAX enablement · 58f88ca2
      Dave Chinner 提交于
      Rather than just being able to turn DAX on and off via a mount
      option, some applications may only want to enable DAX for certain
      performance critical files in a filesystem.
      
      This patch introduces a new inode flag to enable DAX in the v3 inode
      di_flags2 field. It adds support for setting and clearing flags in
      the di_flags2 field via the XFS_IOC_FSSETXATTR ioctl, and sets the
      S_DAX inode flag appropriately when it is seen.
      
      When this flag is set on a directory, it acts as an "inherit flag".
      That is, inodes created in the directory will automatically inherit
      the on-disk inode DAX flag, enabling administrators to set up
      directory heirarchies that automatically use DAX. Setting this flag
      on an empty root directory will make the entire filesystem use DAX
      by default.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      58f88ca2
    • D
      xfs: use FS_XFLAG definitions directly · e7b89481
      Dave Chinner 提交于
      Now that the ioctls have been hoisted up to the VFS level, use
      the VFs definitions directly and remove the XFS specific definitions
      completely. Userspace is going to have to handle the change of this
      interface separately, so removing the definitions from xfs_fs.h is
      not an issue here at all.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      e7b89481
  11. 03 11月, 2015 1 次提交
    • D
      xfs: optimise away log forces on timestamp updates for fdatasync · fc0561ce
      Dave Chinner 提交于
      xfs: timestamp updates cause excessive fdatasync log traffic
      
      Sage Weil reported that a ceph test workload was writing to the
      log on every fdatasync during an overwrite workload. Event tracing
      showed that the only metadata modification being made was the
      timestamp updates during the write(2) syscall, but fdatasync(2)
      is supposed to ignore them. The key observation was that the
      transactions in the log all looked like this:
      
      INODE: #regs: 4   ino: 0x8b  flags: 0x45   dsize: 32
      
      And contained a flags field of 0x45 or 0x85, and had data and
      attribute forks following the inode core. This means that the
      timestamp updates were triggering dirty relogging of previously
      logged parts of the inode that hadn't yet been flushed back to
      disk.
      
      There are two parts to this problem. The first is that XFS relogs
      dirty regions in subsequent transactions, so it carries around the
      fields that have been dirtied since the last time the inode was
      written back to disk, not since the last time the inode was forced
      into the log.
      
      The second part is that on v5 filesystems, the inode change count
      update during inode dirtying also sets the XFS_ILOG_CORE flag, so
      on v5 filesystems this makes a timestamp update dirty the entire
      inode.
      
      As a result when fdatasync is run, it looks at the dirty fields in
      the inode, and sees more than just the timestamp flag, even though
      the only metadata change since the last fdatasync was just the
      timestamps. Hence we force the log on every subsequent fdatasync
      even though it is not needed.
      
      To fix this, add a new field to the inode log item that tracks
      changes since the last time fsync/fdatasync forced the log to flush
      the changes to the journal. This flag is updated when we dirty the
      inode, but we do it before updating the change count so it does not
      carry the "core dirty" flag from timestamp updates. The fields are
      zeroed when the inode is marked clean (due to writeback/freeing) or
      when an fsync/datasync forces the log. Hence if we only dirty the
      timestamps on the inode between fsync/fdatasync calls, the fdatasync
      will not trigger another log force.
      
      Over 100 runs of the test program:
      
      Ext4 baseline:
      	runtime: 1.63s +/- 0.24s
      	avg lat: 1.59ms +/- 0.24ms
      	iops: ~2000
      
      XFS, vanilla kernel:
              runtime: 2.45s +/- 0.18s
      	avg lat: 2.39ms +/- 0.18ms
      	log forces: ~400/s
      	iops: ~1000
      
      XFS, patched kernel:
              runtime: 1.49s +/- 0.26s
      	avg lat: 1.46ms +/- 0.25ms
      	log forces: ~30/s
      	iops: ~1500
      Reported-by: NSage Weil <sage@redhat.com>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      fc0561ce
  12. 12 10月, 2015 1 次提交
    • B
      xfs: per-filesystem stats counter implementation · ff6d6af2
      Bill O'Donnell 提交于
      This patch modifies the stats counting macros and the callers
      to those macros to properly increment, decrement, and add-to
      the xfs stats counts. The counts for global and per-fs stats
      are correctly advanced, and cleared by writing a "1" to the
      corresponding clear file.
      
      global counts: /sys/fs/xfs/stats/stats
      per-fs counts: /sys/fs/xfs/sda*/stats/stats
      
      global clear:  /sys/fs/xfs/stats/stats_clear
      per-fs clear:  /sys/fs/xfs/sda*/stats/stats_clear
      
      [dchinner: cleaned up macro variables, removed CONFIG_FS_PROC around
       stats structures and macros. ]
      Signed-off-by: NBill O'Donnell <billodo@redhat.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      ff6d6af2
  13. 25 8月, 2015 1 次提交
  14. 20 8月, 2015 1 次提交
  15. 19 8月, 2015 4 次提交
    • D
      xfs: stop holding ILOCK over filldir callbacks · dbad7c99
      Dave Chinner 提交于
      The recent change to the readdir locking made in 40194ecc ("xfs:
      reinstate the ilock in xfs_readdir") for CXFS directory sanity was
      probably the wrong thing to do. Deep in the readdir code we
      can take page faults in the filldir callback, and so taking a page
      fault while holding an inode ilock creates a new set of locking
      issues that lockdep warns all over the place about.
      
      The locking order for regular inodes w.r.t. page faults is io_lock
      -> pagefault -> mmap_sem -> ilock. The directory readdir code now
      triggers ilock -> page fault -> mmap_sem. While we cannot deadlock
      at this point, it inverts all the locking patterns that lockdep
      normally sees on XFS inodes, and so triggers lockdep. We worked
      around this with commit 93a8614e ("xfs: fix directory inode iolock
      lockdep false positive"), but that then just moved the lockdep
      warning to deeper in the page fault path and triggered on security
      inode locks. Fixing the shmem issue there just moved the lockdep
      reports somewhere else, and now we are getting false positives from
      filesystem freezing annotations getting confused.
      
      Further, if we enter memory reclaim in a readdir path, we now get
      lockdep warning about potential deadlocks because the ilock is held
      when we enter reclaim. This, again, is different to a regular file
      in that we never allow memory reclaim to run while holding the ilock
      for regular files. Hence lockdep now throws
      ilock->kmalloc->reclaim->ilock warnings.
      
      Basically, the problem is that the ilock is being used to protect
      the directory data and the inode metadata, whereas for a regular
      file the iolock protects the data and the ilock protects the
      metadata. From the VFS perspective, the i_mutex serialises all
      accesses to the directory data, and so not holding the ilock for
      readdir doesn't matter. The issue is that CXFS doesn't access
      directory data via the VFS, so it has no "data serialisaton"
      mechanism. Hence we need to hold the IOLOCK in the correct places to
      provide this low level directory data access serialisation.
      
      The ilock can then be used just when the extent list needs to be
      read, just like we do for regular files. The directory modification
      code can take the iolock exclusive when the ilock is also taken,
      and this then ensures that readdir is correct excluded while
      modifications are in progress.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      dbad7c99
    • D
      xfs: clean up inode lockdep annotations · 0952c818
      Dave Chinner 提交于
      Lockdep annotations are a maintenance nightmare. Locking has to be
      modified to suit the limitations of the annotations, and we're
      always having to fix the annotations because they are unable to
      express the complexity of locking heirarchies correctly.
      
      So, next up, we've got more issues with lockdep annotations for
      inode locking w.r.t. XFS_LOCK_PARENT:
      
      	- lockdep classes are exclusive and can't be ORed together
      	  to form new classes.
      	- IOLOCK needs multiple PARENT subclasses to express the
      	  changes needed for the readdir locking rework needed to
      	  stop the endless flow of lockdep false positives involving
      	  readdir calling filldir under the ILOCK.
      	- there are only 8 unique lockdep subclasses available,
      	  so we can't create a generic solution.
      
      IOWs we need to treat the 3-bit space available to each lock type
      differently:
      
      	- IOLOCK uses xfs_lock_two_inodes(), so needs:
      		- at least 2 IOLOCK subclasses
      		- at least 2 IOLOCK_PARENT subclasses
      	- MMAPLOCK uses xfs_lock_two_inodes(), so needs:
      		- at least 2 MMAPLOCK subclasses
      	- ILOCK uses xfs_lock_inodes with up to 5 inodes, so needs:
      		- at least 5 ILOCK subclasses
      		- one ILOCK_PARENT subclass
      		- one RTBITMAP subclass
      		- one RTSUM subclass
      
      For the IOLOCK, split the space into two sets of subclasses.
      For the MMAPLOCK, just use half the space for the one subclass to
      match the non-parent lock classes of the IOLOCK.
      For the ILOCK, use 0-4 as the ILOCK subclasses, 5-7 for the
      remaining individual subclasses.
      
      Because they are now all different, modify xfs_lock_inumorder() to
      handle the nested subclasses, and to assert fail if passed an
      invalid subclass. Further, annotate xfs_lock_inodes() to assert fail
      if an invalid combination of lock primitives and inode counts are
      passed that would result in a lockdep subclass annotation overflow.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      0952c818
    • D
      xfs: fix sb_meta_uuid usage · bbf155ad
      Dave Chinner 提交于
      After changing the UUID on a v5 filesystem, xfstests fails
      immediately on a debug kernel with:
      
      XFS: Assertion failed: uuid_equal(&ip->i_d.di_uuid, &mp->m_sb.sb_uuid), file: fs/xfs/xfs_inode.c, line: 799
      
      This needs to check against the sb_meta_uuid, not the user visible
      UUID that was changed.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      bbf155ad
    • B
      xfs: add missing bmap cancel calls in error paths · d4a97a04
      Brian Foster 提交于
      If a failure occurs after the bmap free list is populated and before
      xfs_bmap_finish() completes successfully (which returns a partial
      list on failure), the bmap free list must be cancelled. Otherwise,
      the extent items on the list are never freed and a memory leak
      occurs.
      
      Several random error paths throughout the code suffer this problem.
      Fix these up such that xfs_bmap_cancel() is always called on error.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      d4a97a04
  16. 29 7月, 2015 1 次提交
  17. 22 6月, 2015 1 次提交
  18. 04 6月, 2015 3 次提交
    • C
      xfs: saner xfs_trans_commit interface · 70393313
      Christoph Hellwig 提交于
      The flags argument to xfs_trans_commit is not useful for most callers, as
      a commit of a transaction without a permanent log reservation must pass
      0 here, and all callers for a transaction with a permanent log reservation
      except for xfs_trans_roll must pass XFS_TRANS_RELEASE_LOG_RES.  So remove
      the flags argument from the public xfs_trans_commit interfaces, and
      introduce low-level __xfs_trans_commit variant just for xfs_trans_roll
      that regrants a log reservation instead of releasing it.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      70393313
    • C
      xfs: remove the flags argument to xfs_trans_cancel · 4906e215
      Christoph Hellwig 提交于
      xfs_trans_cancel takes two flags arguments: XFS_TRANS_RELEASE_LOG_RES and
      XFS_TRANS_ABORT.  Both of them are a direct product of the transaction
      state, and can be deducted:
      
       - any dirty transaction needs XFS_TRANS_ABORT to be properly canceled,
         and XFS_TRANS_ABORT is a noop for a transaction that is not dirty.
       - any transaction with a permanent log reservation needs
         XFS_TRANS_RELEASE_LOG_RES to be properly canceled, and passing
         XFS_TRANS_RELEASE_LOG_RES for a transaction without a permanent
         log reservation is invalid.
      
      So just remove the flags argument and do the right thing.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      4906e215
    • C
      xfs: switch remaining xfs_trans_dup users to xfs_trans_roll · 2e6db6c4
      Christoph Hellwig 提交于
      We have three remaining callers of xfs_trans_dup:
      
       - xfs_itruncate_extents which open codes xfs_trans_roll
       - xfs_bmap_finish doesn't have an xfs_inode argument and thus leaves
         attaching them to it's callers, but otherwise is identical to
         xfs_trans_roll
       - xfs_dir_ialloc looks at the log reservations in the old xfs_trans
         structure instead of the log reservation parameters, but otherwise
         is identical to xfs_trans_roll.
      
      By allowing a NULL xfs_inode argument to xfs_trans_roll we can switch
      these three remaining users over to xfs_trans_roll and mark xfs_trans_dup
      static.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      2e6db6c4