1. 07 6月, 2018 1 次提交
    • D
      xfs: convert to SPDX license tags · 0b61f8a4
      Dave Chinner 提交于
      Remove the verbose license text from XFS files and replace them
      with SPDX tags. This does not change the license of any of the code,
      merely refers to the common, up-to-date license files in LICENSES/
      
      This change was mostly scripted. fs/xfs/Makefile and
      fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
      and modified by the following command:
      
      for f in `git grep -l "GNU General" fs/xfs/` ; do
      	echo $f
      	cat $f | awk -f hdr.awk > $f.new
      	mv -f $f.new $f
      done
      
      And the hdr.awk script that did the modification (including
      detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
      is as follows:
      
      $ cat hdr.awk
      BEGIN {
      	hdr = 1.0
      	tag = "GPL-2.0"
      	str = ""
      }
      
      /^ \* This program is free software/ {
      	hdr = 2.0;
      	next
      }
      
      /any later version./ {
      	tag = "GPL-2.0+"
      	next
      }
      
      /^ \*\// {
      	if (hdr > 0.0) {
      		print "// SPDX-License-Identifier: " tag
      		print str
      		print $0
      		str=""
      		hdr = 0.0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \* / {
      	if (hdr > 1.0)
      		next
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \*/ {
      	if (hdr > 0.0)
      		next
      	print $0
      	next
      }
      
      // {
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      }
      
      END { }
      $
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      0b61f8a4
  2. 22 5月, 2018 1 次提交
    • D
      xfs: prepare xfs_break_layouts() for another layout type · 69eb5fa1
      Dan Williams 提交于
      When xfs is operating as the back-end of a pNFS block server, it
      prevents collisions between local and remote operations by requiring a
      lease to be held for remotely accessed blocks. Local filesystem
      operations break those leases before writing or mutating the extent map
      of the file.
      
      A similar mechanism is needed to prevent operations on pinned dax
      mappings, like device-DMA, from colliding with extent unmap operations.
      
      BREAK_WRITE and BREAK_UNMAP are introduced as two distinct levels of
      layout breaking.
      
      Layouts are broken in the BREAK_WRITE case to ensure that layout-holders
      do not collide with local writes. Additionally, layouts are broken in
      the BREAK_UNMAP case to make sure the layout-holder has a consistent
      view of the file's extent map. While BREAK_WRITE breaks can be satisfied
      be recalling FL_LAYOUT leases, BREAK_UNMAP breaks additionally require
      waiting for busy dax-pages to go idle while holding XFS_MMAPLOCK_EXCL.
      
      After this refactoring xfs_break_layouts() becomes the entry point for
      coordinating both types of breaks. Finally, xfs_break_leased_layouts()
      becomes just the BREAK_WRITE handler.
      
      Note that the unlock tracking is needed in a follow on change. That will
      coordinate retrying either break handler until both successfully test
      for a lease break while maintaining the lock state.
      
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Reported-by: NDave Chinner <david@fromorbit.com>
      Reported-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      69eb5fa1
  3. 16 5月, 2018 1 次提交
    • B
      xfs: factor out nodiscard helpers · 4e529339
      Brian Foster 提交于
      The changes to skip discards of speculative preallocation and
      unwritten extents introduced several new wrapper functions through
      the bunmapi -> extent free codepath to reduce churn in all of the
      associated callers. In several cases, these wrappers simply toggle a
      single flag to skip or not skip discards for the resulting blocks.
      
      The explicit _nodiscard() wrappers for such an isolated set of
      callers is a bit overkill. Kill off these wrappers and replace with
      the calls to the underlying functions in the contexts that need to
      control discard behavior. Retain the wrappers that preserve the
      original calling conventions to serve the original purpose of
      reducing code churn.
      
      This is a refactoring patch and does not change behavior.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      4e529339
  4. 10 5月, 2018 1 次提交
    • B
      xfs: skip online discard during eofblocks trims · 13b86fc3
      Brian Foster 提交于
      We've had reports of online discard operations being sent from XFS
      on write-only workloads. These discards occur as a result of
      eofblocks trims that can occur after a large file copy completes.
      
      These discards are slightly confusing for users who might be paying
      close attention to online discards (i.e., vdo) due to performance
      sensitivity. They also happen to be spurious because freed post-eof
      blocks by definition have not been written to during the current
      allocation cycle.
      
      Update xfs_free_eofblocks() to skip discards that are purely
      attributed to eofblocks trims. This cuts down the number of spurious
      discards that may occur on write-only workloads due to normal
      preallocation activity.
      
      Note that discards of post-eof extents can still occur from other
      codepaths that do not isolate handling of post-eof blocks from those
      within eof. For example, file unlinks and truncates may still cause
      discards for any file blocks affected by the operation.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      13b86fc3
  5. 10 4月, 2018 1 次提交
  6. 03 4月, 2018 1 次提交
  7. 16 3月, 2018 1 次提交
  8. 29 1月, 2018 1 次提交
  9. 09 1月, 2018 1 次提交
  10. 21 12月, 2017 1 次提交
  11. 09 12月, 2017 1 次提交
    • C
      xfs: remove "no-allocation" reservations for file creations · f59cf5c2
      Christoph Hellwig 提交于
      If we create a new file we will need an inode, and usually some metadata
      in the parent direction.  Aiming for everything to go well despite the
      lack of a reservation leads to dirty transactions cancelled under a heavy
      create/delete load.  This patch removes those nospace transactions, which
      will lead to slightly earlier ENOSPC on some workloads, but instead
      prevent file system shutdowns due to cancelling dirty transactions for
      others.
      
      A customer could observe assertations failures and shutdowns due to
      cancelation of dirty transactions during heavy NFS workloads as shown
      below:
      
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728125] XFS: Assertion failed: error != -ENOSPC, file: fs/xfs/xfs_inode.c, line: 1262
      
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728222] Call Trace:
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728246]  [<ffffffff81795daf>] dump_stack+0x63/0x81
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728262]  [<ffffffff810a1a5a>] warn_slowpath_common+0x8a/0xc0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728264]  [<ffffffff810a1b8a>] warn_slowpath_null+0x1a/0x20
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728285]  [<ffffffffa01bf403>] asswarn+0x33/0x40 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728308]  [<ffffffffa01bb07e>] xfs_create+0x7be/0x7d0 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728329]  [<ffffffffa01b6ffb>] xfs_generic_create+0x1fb/0x2e0 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728348]  [<ffffffffa01b7114>] xfs_vn_mknod+0x14/0x20 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728366]  [<ffffffffa01b7153>] xfs_vn_create+0x13/0x20 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728380]  [<ffffffff81231de5>] vfs_create+0xd5/0x140
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728390]  [<ffffffffa045ddb9>] do_nfsd_create+0x499/0x610 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728396]  [<ffffffffa0465fa5>] nfsd3_proc_create+0x135/0x210 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728401]  [<ffffffffa04561e3>] nfsd_dispatch+0xc3/0x210 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728416]  [<ffffffffa03bfa43>] svc_process_common+0x453/0x6f0 [sunrpc]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728423]  [<ffffffffa03bfdf3>] svc_process+0x113/0x1f0 [sunrpc]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728427]  [<ffffffffa0455bcf>] nfsd+0x10f/0x180 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728432]  [<ffffffffa0455ac0>] ? nfsd_destroy+0x80/0x80 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728438]  [<ffffffff810c0d58>] kthread+0xd8/0xf0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728441]  [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728451]  [<ffffffff8179d962>] ret_from_fork+0x42/0x70
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728453]  [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728454] ---[ end trace f9822c842fec81d4 ]---
      
      2017-05-30 21:17:06 kernel: ALERT: [ 2670.728477] XFS (sdb): Internal error xfs_trans_cancel at line 983 of file fs/xfs/xfs_trans.c.  Caller xfs_create+0x4ee/0x7d0 [xfs]
      
      2017-05-30 21:17:06 kernel: ALERT: [ 2670.728684] XFS (sdb): Corruption of in-memory data detected. Shutting down filesystem
      2017-05-30 21:17:06 kernel: ALERT: [ 2670.728685] XFS (sdb): Please umount the filesystem and rectify the problem(s)
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      f59cf5c2
  12. 27 10月, 2017 1 次提交
  13. 03 7月, 2017 1 次提交
  14. 20 6月, 2017 1 次提交
    • D
      xfs: remove double-underscore integer types · c8ce540d
      Darrick J. Wong 提交于
      This is a purely mechanical patch that removes the private
      __{u,}int{8,16,32,64}_t typedefs in favor of using the system
      {u,}int{8,16,32,64}_t typedefs.  This is the sed script used to perform
      the transformation and fix the resulting whitespace and indentation
      errors:
      
      s/typedef\t__uint8_t/typedef __uint8_t\t/g
      s/typedef\t__uint/typedef __uint/g
      s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g
      s/__uint8_t\t/__uint8_t\t\t/g
      s/__uint/uint/g
      s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g
      s/__int/int/g
      /^typedef.*int[0-9]*_t;$/d
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      c8ce540d
  15. 28 4月, 2017 1 次提交
  16. 30 11月, 2016 1 次提交
  17. 10 11月, 2016 1 次提交
    • B
      xfs: fix unbalanced inode reclaim flush locking · 98efe8af
      Brian Foster 提交于
      Filesystem shutdown testing on an older distro kernel has uncovered an
      imbalanced locking pattern for the inode flush lock in
      xfs_reclaim_inode(). Specifically, there is a double unlock sequence
      between the call to xfs_iflush_abort() and xfs_reclaim_inode() at the
      "reclaim:" label.
      
      This actually does not cause obvious problems on current kernels due to
      the current flush lock implementation. Older kernels use a counting
      based flush lock mechanism, however, which effectively breaks the lock
      indefinitely when an already unlocked flush lock is repeatedly unlocked.
      Though this only currently occurs on filesystem shutdown, it has
      reproduced the effect of elevating an fs shutdown to a system-wide crash
      or hang.
      
      As it turns out, the flush lock is not actually required for the reclaim
      logic in xfs_reclaim_inode() because by that time we have already cycled
      the flush lock once while holding ILOCK_EXCL. Therefore, remove the
      additional flush lock/unlock cycle around the 'reclaim:' label and
      update branches into this label to release the flush lock where
      appropriate. Add an assert to xfs_ifunlock() to help prevent future
      occurences of the same problem.
      Reported-by: NZorro Lang <zlang@redhat.com>
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      98efe8af
  18. 06 10月, 2016 2 次提交
  19. 05 10月, 2016 2 次提交
  20. 04 10月, 2016 1 次提交
  21. 19 9月, 2016 1 次提交
  22. 03 8月, 2016 1 次提交
  23. 20 7月, 2016 1 次提交
  24. 21 6月, 2016 1 次提交
  25. 01 6月, 2016 1 次提交
  26. 06 4月, 2016 1 次提交
  27. 09 2月, 2016 3 次提交
  28. 08 2月, 2016 1 次提交
  29. 19 8月, 2015 1 次提交
    • D
      xfs: clean up inode lockdep annotations · 0952c818
      Dave Chinner 提交于
      Lockdep annotations are a maintenance nightmare. Locking has to be
      modified to suit the limitations of the annotations, and we're
      always having to fix the annotations because they are unable to
      express the complexity of locking heirarchies correctly.
      
      So, next up, we've got more issues with lockdep annotations for
      inode locking w.r.t. XFS_LOCK_PARENT:
      
      	- lockdep classes are exclusive and can't be ORed together
      	  to form new classes.
      	- IOLOCK needs multiple PARENT subclasses to express the
      	  changes needed for the readdir locking rework needed to
      	  stop the endless flow of lockdep false positives involving
      	  readdir calling filldir under the ILOCK.
      	- there are only 8 unique lockdep subclasses available,
      	  so we can't create a generic solution.
      
      IOWs we need to treat the 3-bit space available to each lock type
      differently:
      
      	- IOLOCK uses xfs_lock_two_inodes(), so needs:
      		- at least 2 IOLOCK subclasses
      		- at least 2 IOLOCK_PARENT subclasses
      	- MMAPLOCK uses xfs_lock_two_inodes(), so needs:
      		- at least 2 MMAPLOCK subclasses
      	- ILOCK uses xfs_lock_inodes with up to 5 inodes, so needs:
      		- at least 5 ILOCK subclasses
      		- one ILOCK_PARENT subclass
      		- one RTBITMAP subclass
      		- one RTSUM subclass
      
      For the IOLOCK, split the space into two sets of subclasses.
      For the MMAPLOCK, just use half the space for the one subclass to
      match the non-parent lock classes of the IOLOCK.
      For the ILOCK, use 0-4 as the ILOCK subclasses, 5-7 for the
      remaining individual subclasses.
      
      Because they are now all different, modify xfs_lock_inumorder() to
      handle the nested subclasses, and to assert fail if passed an
      invalid subclass. Further, annotate xfs_lock_inodes() to assert fail
      if an invalid combination of lock primitives and inode counts are
      passed that would result in a lockdep subclass annotation overflow.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      0952c818
  30. 23 2月, 2015 3 次提交
    • D
      xfs: inodes are new until the dentry cache is set up · 58c90473
      Dave Chinner 提交于
      Al Viro noticed a generic set of issues to do with filehandle lookup
      racing with dentry cache setup. They involve a filehandle lookup
      occurring while an inode is being created and the filehandle lookup
      racing with the dentry creation for the real file. This can lead to
      multiple dentries for the one path being instantiated. There are a
      host of other issues around this same set of paths.
      
      The underlying cause is that file handle lookup only waits on inode
      cache instantiation rather than full dentry cache instantiation. XFS
      is mostly immune to the problems discovered due to it's own internal
      inode cache, but there are a couple of corner cases where races can
      happen.
      
      We currently clear the XFS_INEW flag when the inode is fully set up
      after insertion into the cache. Newly allocated inodes are inserted
      locked and so aren't usable until the allocation transaction
      commits. This, however, occurs before the dentry and security
      information is fully initialised and hence the inode is unlocked and
      available for lookups to find too early.
      
      To solve the problem, only clear the XFS_INEW flag for newly created
      inodes once the dentry is fully instantiated. This means lookups
      will retry until the XFS_INEW flag is removed from the inode and
      hence avoids the race conditions in questions.
      
      THis also means that xfs_create(), xfs_create_tmpfile() and
      xfs_symlink() need to finish the setup of the inode in their error
      paths if we had allocated the inode but failed later in the creation
      process. xfs_symlink(), in particular, needed a lot of help to make
      it's error handling match that of xfs_create().
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      58c90473
    • D
      xfs: ensure truncate forces zeroed blocks to disk · 5885ebda
      Dave Chinner 提交于
      A new fsync vs power fail test in xfstests indicated that XFS can
      have unreliable data consistency when doing extending truncates that
      require block zeroing. The blocks beyond EOF get zeroed in memory,
      but we never force those changes to disk before we run the
      transaction that extends the file size and exposes those blocks to
      userspace. This can result in the blocks not being correctly zeroed
      after a crash.
      
      Because in-memory behaviour is correct, tools like fsx don't pick up
      any coherency problems - it's not until the filesystem is shutdown
      or the system crashes after writing the truncate transaction to the
      journal but before the zeroed data in the page cache is flushed that
      the issue is exposed.
      
      Fix this by also flushing the dirty data in memory region between
      the old size and new size when we've found blocks that need zeroing
      in the truncate process.
      Reported-by: NLiu Bo <bo.li.liu@oracle.com>
      cc: <stable@vger.kernel.org>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      5885ebda
    • D
      xfs: introduce mmap/truncate lock · 653c60b6
      Dave Chinner 提交于
      Right now we cannot serialise mmap against truncate or hole punch
      sanely. ->page_mkwrite is not able to take locks that the read IO
      path normally takes (i.e. the inode iolock) because that could
      result in lock inversions (read - iolock - page fault - page_mkwrite
      - iolock) and so we cannot use an IO path lock to serialise page
      write faults against truncate operations.
      
      Instead, introduce a new lock that is used *only* in the
      ->page_mkwrite path that is the equivalent of the iolock. The lock
      ordering in a page fault is i_mmaplock -> page lock -> i_ilock,
      and so in truncate we can i_iolock -> i_mmaplock and so lock out
      new write faults during the process of truncation.
      
      Because i_mmap_lock is outside the page lock, we can hold it across
      all the same operations we hold the i_iolock for. The only
      difference is that we never hold the i_mmaplock in the normal IO
      path and so do not ever have the possibility that we can page fault
      inside it. Hence there are no recursion issues on the i_mmap_lock
      and so we can use it to serialise page fault IO against inode
      modification operations that affect the IO path.
      
      This patch introduces the i_mmaplock infrastructure, lockdep
      annotations and initialisation/destruction code. Use of the new lock
      will be in subsequent patches.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      653c60b6
  31. 02 2月, 2015 1 次提交
  32. 24 12月, 2014 1 次提交
  33. 28 11月, 2014 1 次提交
  34. 02 10月, 2014 1 次提交
    • B
      xfs: check for inode size overflow in xfs_new_eof() · ce57bcf6
      Brian Foster 提交于
      If we write to the maximum file offset (2^63-2), XFS fails to log the
      inode size update when the page is flushed. For example:
      
      $ xfs_io -fc "pwrite `echo "2^63-1-1" | bc` 1" /mnt/file
      wrote 1/1 bytes at offset 9223372036854775806
      1.000000 bytes, 1 ops; 0.0000 sec (22.711 KiB/sec and 23255.8140 ops/sec)
      $ stat -c %s /mnt/file
      9223372036854775807
      $ umount /mnt ; mount <dev> /mnt/
      $ stat -c %s /mnt/file
      0
      
      This occurs because XFS calculates the new file size as io_offset +
      io_size, I/O occurs in block sized requests, and the maximum supported
      file size is not block aligned. Therefore, a write to the max allowable
      offset on a 4k blocksize fs results in a write of size 4k to offset
      2^63-4096 (e.g., equivalent to round_down(2^63-1, 4096), or IOW the
      offset of the block that contains the max file size). The offset plus
      size calculation (2^63 - 4096 + 4096 == 2^63) overflows the signed
      64-bit variable which goes negative and causes the > comparison to the
      on-disk inode size to fail. This returns 0 from xfs_new_eof() and
      results in no change to the inode on-disk.
      
      Update xfs_new_eof() to explicitly detect overflow of the local
      calculation and use the VFS inode size in this scenario. The VFS inode
      size is capped to the maximum and thus XFS writes the correct inode size
      to disk.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      ce57bcf6