1. 12 7月, 2018 1 次提交
  2. 25 6月, 2018 1 次提交
    • D
      xfs: recheck reflink state after grabbing ILOCK_SHARED for a write · 5bd88d15
      Darrick J. Wong 提交于
      The reflink iflag could have changed since the earlier unlocked check,
      so if we got ILOCK_SHARED for a write and but we're now a reflink inode
      we have to switch to ILOCK_EXCL and relock.
      
      This helps us avoid blowing lock assertions in things like generic/166:
      
      XFS: Assertion failed: xfs_isilocked(ip, XFS_ILOCK_EXCL), file: fs/xfs/xfs_reflink.c, line: 383
      WARNING: CPU: 1 PID: 24707 at fs/xfs/xfs_message.c:104 assfail+0x25/0x30 [xfs]
      Modules linked in: deadline_iosched dm_snapshot dm_bufio ext4 mbcache jbd2 dm_flakey xfs libcrc32c dax_pmem device_dax nd_pmem sch_fq_codel af_packet [last unloaded: scsi_debug]
      CPU: 1 PID: 24707 Comm: xfs_io Not tainted 4.18.0-rc1-djw #1
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1 04/01/2014
      RIP: 0010:assfail+0x25/0x30 [xfs]
      Code: ff 0f 0b c3 90 66 66 66 66 90 48 89 f1 41 89 d0 48 c7 c6 e8 ef 1b a0 48 89 fa 31 ff e8 54 f9 ff ff 80 3d fd ba 0f 00 00 75 03 <0f> 0b c3 0f 0b 66 0f 1f 44 00 00 66 66 66 66 90 48 63 f6 49 89 f9
      RSP: 0018:ffffc90006423ad8 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff880030b65e80 RCX: 0000000000000000
      RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffffa01b0447
      RBP: ffffc90006423c10 R08: 0000000000000000 R09: 0000000000000000
      R10: ffff88003d43fc30 R11: f000000000000000 R12: ffff880077cda000
      R13: 0000000000000000 R14: ffffc90006423c30 R15: ffffc90006423bf9
      FS:  00007feba8986800(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000000138ab58 CR3: 000000003d40a000 CR4: 00000000000006a0
      Call Trace:
       xfs_reflink_allocate_cow+0x24c/0x3d0 [xfs]
       xfs_file_iomap_begin+0x6d2/0xeb0 [xfs]
       ? iomap_to_fiemap+0x80/0x80
       iomap_apply+0x5e/0x130
       iomap_dio_rw+0x2e0/0x400
       ? iomap_to_fiemap+0x80/0x80
       ? xfs_file_dio_aio_write+0x133/0x4a0 [xfs]
       xfs_file_dio_aio_write+0x133/0x4a0 [xfs]
       xfs_file_write_iter+0x7b/0xb0 [xfs]
       __vfs_write+0x16f/0x1f0
       vfs_write+0xc8/0x1c0
       ksys_pwrite64+0x74/0x90
       do_syscall_64+0x56/0x180
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      5bd88d15
  3. 21 6月, 2018 1 次提交
  4. 09 6月, 2018 1 次提交
  5. 07 6月, 2018 1 次提交
    • D
      xfs: convert to SPDX license tags · 0b61f8a4
      Dave Chinner 提交于
      Remove the verbose license text from XFS files and replace them
      with SPDX tags. This does not change the license of any of the code,
      merely refers to the common, up-to-date license files in LICENSES/
      
      This change was mostly scripted. fs/xfs/Makefile and
      fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
      and modified by the following command:
      
      for f in `git grep -l "GNU General" fs/xfs/` ; do
      	echo $f
      	cat $f | awk -f hdr.awk > $f.new
      	mv -f $f.new $f
      done
      
      And the hdr.awk script that did the modification (including
      detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
      is as follows:
      
      $ cat hdr.awk
      BEGIN {
      	hdr = 1.0
      	tag = "GPL-2.0"
      	str = ""
      }
      
      /^ \* This program is free software/ {
      	hdr = 2.0;
      	next
      }
      
      /any later version./ {
      	tag = "GPL-2.0+"
      	next
      }
      
      /^ \*\// {
      	if (hdr > 0.0) {
      		print "// SPDX-License-Identifier: " tag
      		print str
      		print $0
      		str=""
      		hdr = 0.0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \* / {
      	if (hdr > 1.0)
      		next
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \*/ {
      	if (hdr > 0.0)
      		next
      	print $0
      	next
      }
      
      // {
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      }
      
      END { }
      $
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      0b61f8a4
  6. 10 5月, 2018 4 次提交
  7. 02 3月, 2018 3 次提交
  8. 03 1月, 2018 1 次提交
  9. 09 12月, 2017 1 次提交
    • D
      xfs: make iomap_begin functions trim iomaps consistently · b7e0b6ff
      Darrick J. Wong 提交于
      Historically, the XFS iomap_begin function only returned mappings for
      exactly the range queried, i.e. it doesn't do XFS_BMAPI_ENTIRE lookups.
      The current vfs iomap consumers are only set up to deal with trimmed
      mappings.  xfs_xattr_iomap_begin does BMAPI_ENTIRE lookups, which is
      inconsistent with the current iomap usage.  Remove the flag so that both
      iomap_begin functions behave the same way.
      
      FWIW this also fixes a behavioral regression in xattr FIEMAP that was
      introduced in 4.8 wherein attr fork extents are no longer trimmed like
      they used to be.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      b7e0b6ff
  10. 14 11月, 2017 1 次提交
  11. 07 11月, 2017 1 次提交
  12. 03 11月, 2017 1 次提交
    • C
      xfs: support for synchronous DAX faults · a39e596b
      Christoph Hellwig 提交于
      Return IOMAP_F_DIRTY from xfs_file_iomap_begin() when asked to prepare
      blocks for writing and the inode is pinned, and has dirty fields other
      than the timestamps.  In __xfs_filemap_fault() we then detect this case
      and call dax_finish_sync_fault() to make sure all metadata is committed,
      and to insert the page table entry.
      
      Note that this will also dirty corresponding radix tree entry which is
      what we want - fsync(2) will still provide data integrity guarantees for
      applications not using userspace flushing. And applications using
      userspace flushing can avoid calling fsync(2) and thus avoid the
      performance overhead.
      
      [JK: Added VM_SYNC flag handling]
      Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      a39e596b
  13. 02 11月, 2017 1 次提交
  14. 02 10月, 2017 1 次提交
  15. 27 9月, 2017 1 次提交
  16. 02 9月, 2017 2 次提交
  17. 01 9月, 2017 1 次提交
  18. 28 6月, 2017 2 次提交
  19. 20 6月, 2017 1 次提交
  20. 14 5月, 2017 1 次提交
    • D
      dax, xfs, ext4: compile out iomap-dax paths in the FS_DAX=n case · f5705aa8
      Dan Williams 提交于
      Tetsuo reports:
      
        fs/built-in.o: In function `xfs_file_iomap_end':
        xfs_iomap.c:(.text+0xe0ef9): undefined reference to `put_dax'
        fs/built-in.o: In function `xfs_file_iomap_begin':
        xfs_iomap.c:(.text+0xe1a7f): undefined reference to `dax_get_by_host'
        make: *** [vmlinux] Error 1
        $ grep DAX .config
        CONFIG_DAX=m
        # CONFIG_DEV_DAX is not set
        # CONFIG_FS_DAX is not set
      
      When FS_DAX=n we can/must throw away the dax code in filesystems.
      Implement 'fs_' versions of dax_get_by_host() and put_dax() that are
      nops in the FS_DAX=n case.
      
      Cc: <linux-xfs@vger.kernel.org>
      Cc: <linux-ext4@vger.kernel.org>
      Cc: Jan Kara <jack@suse.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Tested-by: NTony Luck <tony.luck@intel.com>
      Fixes: ef510424 ("block, dax: move 'select DAX' from BLOCK to FS_DAX")
      Reported-by: NTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      f5705aa8
  21. 26 4月, 2017 1 次提交
  22. 07 4月, 2017 1 次提交
  23. 04 4月, 2017 1 次提交
  24. 09 3月, 2017 1 次提交
    • B
      xfs: use iomap new flag for newly allocated delalloc blocks · f65e6fad
      Brian Foster 提交于
      Commit fa7f138a ("xfs: clear delalloc and cache on buffered write
      failure") fixed one regression in the iomap error handling code and
      exposed another. The fundamental problem is that if a buffered write
      is a rewrite of preexisting delalloc blocks and the write fails, the
      failure handling code can punch out preexisting blocks with valid
      file data.
      
      This was reproduced directly by sub-block writes in the LTP
      kernel/syscalls/write/write03 test. A first 100 byte write allocates
      a single block in a file. A subsequent 100 byte write fails and
      punches out the block, including the data successfully written by
      the previous write.
      
      To address this problem, update the ->iomap_begin() handler to
      distinguish newly allocated delalloc blocks from preexisting
      delalloc blocks via the IOMAP_F_NEW flag. Use this flag in the
      ->iomap_end() handler to decide when a failed or short write should
      punch out delalloc blocks.
      
      This introduces the subtle requirement that ->iomap_begin() should
      never combine newly allocated delalloc blocks with existing blocks
      in the resulting iomap descriptor. This can occur when a new
      delalloc reservation merges with a neighboring extent that is part
      of the current write, for example. Therefore, drop the
      post-allocation extent lookup from xfs_bmapi_reserve_delalloc() and
      just return the record inserted into the fork. This ensures only new
      blocks are returned and thus that preexisting delalloc blocks are
      always handled as "found" blocks and not punched out on a failed
      rewrite.
      Reported-by: NXiong Zhou <xzhou@redhat.com>
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      f65e6fad
  25. 17 2月, 2017 2 次提交
    • B
      xfs: resurrect debug mode drop buffered writes mechanism · 9dbddd7b
      Brian Foster 提交于
      A debug mode write failure mechanism was introduced to XFS in commit
      801cc4e1 ("xfs: debug mode forced buffered write failure") to
      facilitate targeted testing of delalloc indirect reservation management
      from userspace. This code was subsequently rendered ineffective by the
      move to iomap based buffered writes in commit 68a9f5e7 ("xfs:
      implement iomap based buffered write path"). This likely went unnoticed
      because the associated userspace code had not made it into xfstests.
      
      Resurrect this mechanism to facilitate effective indlen reservation
      testing from xfstests. The move to iomap based buffered writes relocated
      the hook this mechanism needs to return write failure from XFS to
      generic code. The failure trigger must remain in XFS. Given that
      limitation, convert this from a write failure mechanism to one that
      simply drops writes without returning failure to userspace. Rename all
      "fail_writes" references to "drop_writes" to illustrate the point. This
      is more hacky than preferred, but still triggers the XFS error handling
      behavior required to drive the indlen tests. This is only available in
      DEBUG mode and for testing purposes only.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      9dbddd7b
    • B
      xfs: clear delalloc and cache on buffered write failure · fa7f138a
      Brian Foster 提交于
      The buffered write failure handling code in
      xfs_file_iomap_end_delalloc() has a couple minor problems. First, if
      written == 0, start_fsb is not rounded down and it fails to kill off a
      delalloc block if the start offset is block unaligned. This results in a
      lingering delalloc block and broken delalloc block accounting detected
      at unmount time. Fix this by rounding down start_fsb in the unlikely
      event that written == 0.
      
      Second, it is possible for a failed overwrite of a delalloc extent to
      leave dirty pagecache around over a hole in the file. This is because is
      possible to hit ->iomap_end() on write failure before the iomap code has
      attempted to allocate pagecache, and thus has no need to clean it up. If
      the targeted delalloc extent was successfully written by a previous
      write, however, then it does still have dirty pages when ->iomap_end()
      punches out the underlying blocks. This ultimately results in writeback
      over a hole. To fix this problem, unconditionally punch out the
      pagecache from XFS before the associated delalloc range.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      fa7f138a
  26. 07 2月, 2017 3 次提交
  27. 03 2月, 2017 1 次提交
    • D
      xfs: mark speculative prealloc CoW fork extents unwritten · 5eda4300
      Darrick J. Wong 提交于
      Christoph Hellwig pointed out that there's a potentially nasty race when
      performing simultaneous nearby directio cow writes:
      
      "Thread 1 writes a range from B to c
      
      "                    B --------- C
                                 p
      
      "a little later thread 2 writes from A to B
      
      "        A --------- B
                     p
      
      [editor's note: the 'p' denote cowextsize boundaries, which I added to
      make this more clear]
      
      "but the code preallocates beyond B into the range where thread
      "1 has just written, but ->end_io hasn't been called yet.
      "But once ->end_io is called thread 2 has already allocated
      "up to the extent size hint into the write range of thread 1,
      "so the end_io handler will splice the unintialized blocks from
      "that preallocation back into the file right after B."
      
      We can avoid this race by ensuring that thread 1 cannot accidentally
      remap the blocks that thread 2 allocated (as part of speculative
      preallocation) as part of t2's write preparation in t1's end_io handler.
      The way we make this happen is by taking advantage of the unwritten
      extent flag as an intermediate step.
      
      Recall that when we begin the process of writing data to shared blocks,
      we create a delayed allocation extent in the CoW fork:
      
      D: --RRRRRRSSSRRRRRRRR---
      C: ------DDDDDDD---------
      
      When a thread prepares to CoW some dirty data out to disk, it will now
      convert the delalloc reservation into an /unwritten/ allocated extent in
      the cow fork.  The da conversion code tries to opportunistically
      allocate as much of a (speculatively prealloc'd) extent as possible, so
      we may end up allocating a larger extent than we're actually writing
      out:
      
      D: --RRRRRRSSSRRRRRRRR---
      U: ------UUUUUUU---------
      
      Next, we convert only the part of the extent that we're actively
      planning to write to normal (i.e. not unwritten) status:
      
      D: --RRRRRRSSSRRRRRRRR---
      U: ------UURRUUU---------
      
      If the write succeeds, the end_cow function will now scan the relevant
      range of the CoW fork for real extents and remap only the real extents
      into the data fork:
      
      D: --RRRRRRRRSRRRRRRRR---
      U: ------UU--UUU---------
      
      This ensures that we never obliterate valid data fork extents with
      unwritten blocks from the CoW fork.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      5eda4300
  28. 31 1月, 2017 1 次提交
  29. 24 1月, 2017 1 次提交
    • C
      xfs: fix COW writeback race · d2b3964a
      Christoph Hellwig 提交于
      Due to the way how xfs_iomap_write_allocate tries to convert the whole
      found extents from delalloc to real space we can run into a race
      condition with multiple threads doing writes to this same extent.
      For the non-COW case that is harmless as the only thing that can happen
      is that we call xfs_bmapi_write on an extent that has already been
      converted to a real allocation.  For COW writes where we move the extent
      from the COW to the data fork after I/O completion the race is, however,
      not quite as harmless.  In the worst case we are now calling
      xfs_bmapi_write on a region that contains hole in the COW work, which
      will trip up an assert in debug builds or lead to file system corruption
      in non-debug builds.  This seems to be reproducible with workloads of
      small O_DSYNC write, although so far I've not managed to come up with
      a with an isolated reproducer.
      
      The fix for the issue is relatively simple:  tell xfs_bmapi_write
      that we are only asked to convert delayed allocations and skip holes
      in that case.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      d2b3964a
  30. 30 11月, 2016 1 次提交
    • C
      xfs: use iomap_dio_rw · acdda3aa
      Christoph Hellwig 提交于
      Straight switch over to using iomap for direct I/O - we already have the
      non-COW dio path in write_begin for DAX and files with extent size hints,
      so nothing to add there.  The COW path is ported over from the old
      get_blocks version and a bit of a mess, but I have some work in progress
      to make it look more like the buffered I/O COW path.
      
      This gets rid of xfs_get_blocks_direct and the last caller of
      xfs_get_blocks with the create flag set, so all that code can be removed.
      
      Last but not least I've removed a comment in xfs_filemap_fault that
      refers to xfs_get_blocks entirely instead of updating it - while the
      reference is correct, the whole DAX fault path looks different than
      the non-DAX one, so it seems rather pointless.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Tested-by: NJens Axboe <axboe@fb.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      acdda3aa