1. 12 Jul, 2018 11 commits
    • xfs: remove xfs_reflink_find_cow_mapping · 060d4eaa
      Committed by Christoph Hellwig
      We only have one caller left, and open coding the simple extent list
      lookup in it allows us to make the code both more understandable and
      reuse calculations and variables already present.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: make xfs_writepage_map extent map centric · e2f6ad46
      Committed by Dave Chinner
      xfs_writepage_map() iterates over the bufferheads on a page to decide
      what sort of IO to do and what actions to take.  However, when it comes
      to reflink and deciding when it needs to execute a COW operation, we no
      longer look at the bufferhead state but instead we ignore that and look
      up internal state held in the COW fork extent list.
      
      This means xfs_writepage_map() is somewhat confused. It does stuff, then
      ignores it, then tries to handle the impedance mismatch by shovelling the
      results inside the existing mapping code.  It works, but it's a bit of a
      mess and it makes it hard to fix the cached map bug that the writepage
      code currently has.
      
      To unify the two different mechanisms, we first have to choose a direction.
      That's already been set - we're de-emphasising bufferheads so they are no
      longer a control structure as we need to do that to allow for eventual
      removal.  Hence we need to move away from looking at bufferhead state to
      determine what operations we need to perform.
      
      We can't completely get rid of bufferheads yet - they do contain some
      state that is absolutely necessary, such as whether that part of the page
      contains valid data or not (buffer_uptodate()).  Other state in the
      bufferhead is redundant:
      
      	BH_dirty - the page is dirty, so we can ignore this and just
      		write it
      	BH_delay - we have delalloc extent info in the DATA fork extent
      		tree
      	BH_unwritten - same as BH_delay
      	BH_mapped - indicates we've already used it once for IO and it is
      		mapped to a disk address. Needs to be ignored for COW
      		blocks.
      
      The BH_mapped flag is an interesting case - it's supposed to indicate that
      it's already mapped to disk and so we can just use it "as is".  In theory,
      we don't even have to do an extent lookup to find where to write it to,
      but we have to do that anyway to determine we are actually writing over a
      valid extent.  Hence it's not even serving the purpose of avoiding an
      extent lookup during writeback, and so we can pretty much ignore it.
      Especially as we have to ignore it for COW operations...
      
      Therefore, use the extent map as the source of information to tell us
      what actions we need to take and what sort of IO we should perform.  The
      first step is to have xfs_map_blocks() set the io type according to what
      it looks up.  This means it can easily handle both normal overwrite and
      COW cases.  The only thing we also need to add is the ability to return
      hole mappings.
      
      We need to return and cache hole mappings now for the case of multiple
      blocks per page.  We no longer use the BH_mapped to indicate a block over
      a hole, so we have to get that info from xfs_map_blocks().  We cache it so
      that holes that span two pages don't need separate lookups.  This allows us
      to avoid ever doing write IO over a hole, too.
      
      Now that we have xfs_map_blocks() returning both a cached map and the type
      of IO we need to perform, we can rewrite xfs_writepage_map() to drop all
      the bufferhead control. It's also much simplified because it doesn't need
      to explicitly handle COW operations.  Instead of iterating bufferheads, it
      iterates blocks within the page and then looks up what per-block state is
      required from the appropriate bufferhead.  It then validates the cached
      map, and if it's not valid, we get a new map.  If we don't get a valid map
      or it's over a hole, we skip the block.
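
      The per-block loop described above can be sketched as a small userspace
      model. This is only an illustration under stated assumptions: Mapping,
      map_blocks() and writepage_map() are hypothetical stand-ins written for
      this example, not the kernel's data structures or functions, and the
      io_type strings merely echo the XFS_IO_* names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

HOLE = "hole"


@dataclass
class Mapping:
    start: int      # first file block covered by this mapping
    count: int      # number of blocks covered
    io_type: str    # "overwrite", "delalloc", "unwritten", "cow" or HOLE

    def covers(self, block: int) -> bool:
        return self.start <= block < self.start + self.count


def map_blocks(extents: List[Mapping], block: int) -> Optional[Mapping]:
    """Hypothetical stand-in for the extent lookup: return the mapping
    (possibly a hole mapping) that covers block, or None."""
    for ext in extents:
        if ext.covers(block):
            return ext
    return None


def writepage_map(extents: List[Mapping], first_block: int,
                  uptodate: List[bool]) -> List[Tuple[int, str]]:
    """Walk the blocks of one page and return the (block, io_type) pairs
    that would be queued for writeback."""
    cached: Optional[Mapping] = None
    queued = []
    for i, is_uptodate in enumerate(uptodate):
        block = first_block + i
        if not is_uptodate:
            continue        # this part of the page holds no valid data
        if cached is None or not cached.covers(block):
            cached = map_blocks(extents, block)   # cached map is stale
        if cached is None or cached.io_type == HOLE:
            continue        # no valid map, or the block is over a hole
        queued.append((block, cached.io_type))
    return queued
```

      The only per-block state taken from the "bufferhead" side in this model
      is the uptodate flag; the IO type comes entirely from the mapping.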
      
      At this point, we have to remap the bufferhead via xfs_map_at_offset().
      As previously noted, we had to do this even if the buffer was already
      mapped as the mapping would be stale for XFS_IO_DELALLOC, XFS_IO_UNWRITTEN
      and XFS_IO_COW IO types.  With xfs_map_blocks() now controlling the type,
      even XFS_IO_OVERWRITE types need remapping, as converted-but-not-yet-written
      delalloc extents beyond EOF can be reported as XFS_IO_OVERWRITE.
      Bufferheads that span such regions still need their BH_Delay flags cleared
      and their block numbers calculated, so we now unconditionally map each
      bufferhead before submission.
      
      But wait! There's more - remember the old "treat unwritten extents as
      holes on read" hack?  Yeah, that means we can have a dirty page with
      unmapped, unwritten bufferheads that contain data!  What makes these so
      special is that the unwritten "hole" bufferheads do not have a valid block
      device pointer, so if we attempt to write them xfs_add_to_ioend() blows
      up. So we make xfs_map_at_offset() do the "realtime or data device"
      lookup from the inode and ignore what was or wasn't put into the
      bufferhead when the buffer was instantiated.
      
      The astute reader will have realised by now that this code treats
      unwritten extents in multiple-blocks-per-page situations differently.
      If we get any combination of unwritten blocks on a dirty page that contain
      valid data in the page, we're going to convert them to real extents.  This
      can actually be a win, because it means that pages with interleaving
      unwritten and written blocks will get converted to a single written extent
      with zeros replacing the interspersed unwritten blocks.  This is actually
      good for reducing extent list and conversion overhead, and it means we
      issue a contiguous IO instead of lots of little ones.  The downside is
      that we use up a little extra IO bandwidth.  Neither of these seem like a
      bad thing given that spinning disks are seek sensitive, and SSDs/pmem have
      bandwidth to burn and the lower IO latency/CPU overhead of fewer, larger
      IOs will result in better performance on them...
      
      As a result of all this, the only state we actually care about from the
      bufferhead is a single flag - BH_Uptodate. We still use the bufferhead to
      pass some information to the bio via xfs_add_to_ioend(), but that is
      trivial to separate and pass explicitly.  This means we really only need
      1 bit of state per block per page from the buffered write path in the
      writeback path.  Everything else we do with the bufferhead is purely to
      make the buffered IO front end continue to work correctly, i.e. we've
      pretty much marginalised bufferheads in the writeback path completely.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      [hch: forward port, refactor and split off bits into other commits]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: rename the offset variable in xfs_writepage_map · 6a4c9501
      Committed by Christoph Hellwig
      Calling it file_offset makes the usage more clear, especially with
      a new poffset variable that will be added soon for the offset inside
      the page.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: remove xfs_map_cow · 5c665e5b
      Committed by Christoph Hellwig
      We can handle the existing cow mapping case as a special case directly
      in xfs_writepage_map, and share code for allocating delalloc blocks
      with regular I/O in xfs_map_blocks.  This means we need to always
      call xfs_map_blocks for reflink inodes, but we can still skip most of
      the work if it turns out that there is no COW mapping overlapping the
      current block.
      
      As a subtle detail we need to start caching holes in the wpc to deal
      with the case of COW reservations between EOF.  But we'll need that
      infrastructure later anyway, so this is no big deal.
      
      Based on a patch from Dave Chinner.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: remove xfs_reflink_trim_irec_to_next_cow · fca8c805
      Committed by Christoph Hellwig
      We already have to check for overlapping COW extents every time we
      come back to a page in xfs_writepage_map / xfs_map_cow, so this
      additional trim is not required.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: don't use XFS_BMAPI_IGSTATE in xfs_map_blocks · a7b28f72
      Committed by Christoph Hellwig
      We want to be able to use the extent state as a reliable indicator for
      the type of I/O, and stop using the buffer head state.  For this we
      need to stop using the XFS_BMAPI_IGSTATE flag so that we don't see merged
      extents of different types.
      
      Based on a patch from Dave Chinner.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: don't clear imap_valid for a non-uptodate buffers · c57371a1
      Committed by Christoph Hellwig
      Finding a buffer that isn't uptodate doesn't invalidate the mapping for
      any given block.  The last_sector check will already take care of starting
      another ioend as soon as we find any non-uptodate buffer, and if the current
      mapping doesn't include the next uptodate buffer the xfs_imap_valid check
      will take care of it.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: do not set the page uptodate in xfs_writepage_map · 91cdfd17
      Committed by Christoph Hellwig
      We already track the page uptodate status based on the buffer uptodate
      status, which is updated whenever reading or zeroing blocks.
      
      This code has been there since a ptool commit in 2002, which
      claims to:
      
          "merge" the 2.4 fsx fix for block size < page size to 2.5.  This needed
          major changes to actually fit.
      
      and isn't present in other writepage implementations.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: move locking into xfs_bmap_punch_delalloc_range · d4380177
      Committed by Christoph Hellwig
      Both callers want the same locking, so do it only once.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: simplify xfs_aops_discard_page · 03625721
      Committed by Christoph Hellwig
      Instead of looking at the buffer heads to see if a block is delalloc just
      call xfs_bmap_punch_delalloc_range on the whole page - this will leave
      any non-delalloc block intact and handle the iteration for us.  As a side
      effect one more place stops caring about buffer heads and we can remove the
      xfs_check_page_type function entirely.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: use iomap for blocksize == PAGE_SIZE readpage and readpages · 8b2e77c1
      Committed by Christoph Hellwig
      For file systems with a block size that equals the page size we never do
      partial reads, so we can use the buffer_head-less iomap versions of
      readpage and readpages without conflicting with the buffer_head structures
      created later in write_begin.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  2. 09 Jun, 2018 1 commit
  3. 07 Jun, 2018 1 commit
    • xfs: convert to SPDX license tags · 0b61f8a4
      Committed by Dave Chinner
      Remove the verbose license text from XFS files and replace them
      with SPDX tags. This does not change the license of any of the code,
      merely refers to the common, up-to-date license files in LICENSES/.
      
      This change was mostly scripted. fs/xfs/Makefile and
      fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
      and modified by the following command:
      
      for f in `git grep -l "GNU General" fs/xfs/` ; do
      	echo $f
      	cat $f | awk -f hdr.awk > $f.new
      	mv -f $f.new $f
      done
      
      And the hdr.awk script that did the modification (including
      detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
      is as follows:
      
      $ cat hdr.awk
      BEGIN {
      	hdr = 1.0
      	tag = "GPL-2.0"
      	str = ""
      }
      
      /^ \* This program is free software/ {
      	hdr = 2.0;
      	next
      }
      
      /any later version./ {
      	tag = "GPL-2.0+"
      	next
      }
      
      /^ \*\// {
      	if (hdr > 0.0) {
      		print "// SPDX-License-Identifier: " tag
      		print str
      		print $0
      		str=""
      		hdr = 0.0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \* / {
      	if (hdr > 1.0)
      		next
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \*/ {
      	if (hdr > 0.0)
      		next
      	print $0
      	next
      }
      
      // {
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      }
      
      END { }
      $
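
      The tag selection performed by hdr.awk can be illustrated in isolation.
      The following is a minimal sketch, not a port of the whole script: it
      only models how the default "GPL-2.0" tag is upgraded to "GPL-2.0+"
      when the header mentions "any later version.", and spdx_tag() is a
      hypothetical helper named for this example. Unlike the awk pattern,
      where the trailing "." matches any character, it uses a literal
      substring test.

```python
from typing import Iterable


def spdx_tag(header_lines: Iterable[str]) -> str:
    """Return the SPDX comment line for a license header.

    Starts from GPL-2.0 and switches to GPL-2.0+ as soon as a line contains
    the literal text "any later version.".
    """
    tag = "GPL-2.0"
    for line in header_lines:
        if "any later version." in line:
            tag = "GPL-2.0+"
    return "// SPDX-License-Identifier: " + tag
```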
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  4. 02 Jun, 2018 1 commit
  5. 31 May, 2018 1 commit
  6. 16 May, 2018 1 commit
  7. 12 Apr, 2018 1 commit
  8. 11 Apr, 2018 1 commit
  9. 31 Mar, 2018 1 commit
    • xfs, dax: introduce xfs_dax_aops · 6e2608df
      Committed by Dan Williams
      In preparation for the dax implementation to start associating dax pages
      to inodes via page->mapping, we need to provide a 'struct
      address_space_operations' instance for dax. Otherwise, direct-I/O
      triggers incorrect page cache assumptions and warnings like the
      following:
      
       WARNING: CPU: 27 PID: 1783 at fs/xfs/xfs_aops.c:1468
       xfs_vm_set_page_dirty+0xf3/0x1b0 [xfs]
       [..]
       CPU: 27 PID: 1783 Comm: dma-collision Tainted: G           O 4.15.0-rc2+ #984
       [..]
       Call Trace:
        set_page_dirty_lock+0x40/0x60
        bio_set_pages_dirty+0x37/0x50
        iomap_dio_actor+0x2b7/0x3b0
        ? iomap_dio_zero+0x110/0x110
        iomap_apply+0xa4/0x110
        iomap_dio_rw+0x29e/0x3b0
        ? iomap_dio_zero+0x110/0x110
        ? xfs_file_dio_aio_read+0x7c/0x1a0 [xfs]
        xfs_file_dio_aio_read+0x7c/0x1a0 [xfs]
        xfs_file_read_iter+0xa0/0xc0 [xfs]
        __vfs_read+0xf9/0x170
        vfs_read+0xa6/0x150
        SyS_pread64+0x93/0xb0
        entry_SYSCALL_64_fastpath+0x1f/0x96
      
      ...where the default set_page_dirty() handler assumes that dirty state
      is being tracked in 'struct page' flags.
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Suggested-by: Jan Kara <jack@suse.cz>
      Suggested-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  10. 19 Mar, 2018 1 commit
  11. 16 Mar, 2018 2 commits
  12. 12 Mar, 2018 1 commit
  13. 29 Jan, 2018 1 commit
    • xfs: skip CoW writes past EOF when writeback races with truncate · 70c57dcd
      Committed by Darrick J. Wong
      Every so often we blow the ASSERT(type != XFS_IO_COW) in xfs_map_blocks
      when running fsstress, as we do in generic/269.  The cause of this is
      writeback racing with truncate -- writeback doesn't take the iolock, so
      truncate can sneak in to decrease i_size and truncate page cache while
      writeback is gathering buffer heads to schedule writeout.
      
      If we hit this race on a block that has a CoW mapping, we'll get a valid
      imap from the CoW fork but the reduced i_size trims the mapping to zero
      length (which makes it invalid), so we call xfs_map_blocks to try again.
      This doesn't do much anyway, since any mapping we get out of that will
      also be invalid, so we might as well skip the assert and just stop.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  14. 13 Jan, 2018 1 commit
    • xfs: use %px for data pointers when debugging · c9690043
      Committed by Darrick J. Wong
      Starting with commit 57e73442 ("vsprintf: refactor %pK code out of
      pointer"), the behavior of the raw '%p' printk format specifier was
      changed to print a 32-bit hash of the pointer value to avoid leaking
      kernel pointers into dmesg.  For most situations that's good.
      
      This is /undesirable/ behavior when we're trying to debug XFS, however,
      so define a PTR_FMT that prints the actual pointer when we're in debug
      mode.
      
      Note that %p for tracepoints still prints the raw pointer, so in the
      long run we could consider rewriting some of these messages as
      tracepoints.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
  15. 03 Jan, 2018 1 commit
  16. 01 Dec, 2017 2 commits
  17. 17 Oct, 2017 2 commits
    • xfs: trim writepage mapping to within eof · 40214d12
      Committed by Brian Foster
      The writeback rework in commit fbcc0256 ("xfs: Introduce
      writeback context for writepages") introduced a subtle change in
      behavior with regard to the block mapping used across the
      ->writepages() sequence. The previous xfs_cluster_write() code would
      only flush pages up to EOF at the time of the writepage, thus
      ensuring that any pages due to file-extending writes would be
      handled on a separate cycle and with a new, updated block mapping.
      
      The updated code establishes a block mapping in xfs_writepage_map()
      that could extend beyond EOF if the file has post-eof preallocation.
      Because we now use the generic writeback infrastructure and pass the
      cached mapping to each writepage call, there is no implicit EOF
      limit in place. If eofblocks trimming occurs during ->writepages(),
      any post-eof portion of the cached mapping becomes invalid. The
      eofblocks code has no means to serialize against writeback because
      there are no pages associated with post-eof blocks. Therefore if an
      eofblocks trim occurs and is followed by a file-extending buffered
      write, not only has the mapping become invalid, but we could end up
      writing a page to disk based on the invalid mapping.
      
      Consider the following sequence of events:
      
      - A buffered write creates a delalloc extent and post-eof
        speculative preallocation.
      - Writeback starts and on the first writepage cycle, the delalloc
        extent is converted to real blocks (including the post-eof blocks)
        and the mapping is cached.
      - The file is closed and xfs_release() trims post-eof blocks. The
        cached writeback mapping is now invalid.
      - Another buffered write appends the file with a delalloc extent.
      - The concurrent writeback cycle picks up the just written page
        because the writeback range end is LLONG_MAX. xfs_writepage_map()
        attributes it to the (now invalid) cached mapping and writes the
        data to an incorrect location on disk (and where the file offset is
        still backed by a delalloc extent).
      
      This problem is reproduced by xfstests test generic/464, which
      triggers racing writes, appends, open/closes and writeback requests.
      
      To address this problem, trim the mapping used during writeback to
      within EOF when the mapping is validated. This ensures the mapping
      is revalidated for any pages encountered beyond EOF as of the time
      the current mapping was cached or last validated.
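
      The effect of trimming at validation time can be modelled in a few
      lines. This is a sketch under stated assumptions: Imap and
      imap_valid() are hypothetical names for this example, block numbers
      are plain integers, and eof_block is the first block past EOF at the
      moment of validation. The key point it tries to show is that the trim
      is applied to the cached mapping itself, so a later increase in EOF
      does not make the trimmed-off part valid again.

```python
from dataclasses import dataclass


@dataclass
class Imap:
    start: int   # first file block of the cached mapping
    count: int   # length of the cached mapping in blocks


def imap_valid(imap: Imap, block: int, eof_block: int) -> bool:
    """Trim the cached mapping to EOF in place, then check whether it still
    covers block. A False result means a fresh lookup is required."""
    end = min(imap.start + imap.count, eof_block)
    imap.count = max(0, end - imap.start)
    return imap.start <= block < imap.start + imap.count
```

      For example, a mapping of 8 blocks cached while EOF is at block 4 is
      trimmed to 4 blocks on its first validation; if the file is then
      extended to 6 blocks, block 4 still fails validation and forces a new
      mapping lookup instead of reusing the stale post-EOF part.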
      Reported-by: Eryu Guan <eguan@redhat.com>
      Diagnosed-by: Eryu Guan <eguan@redhat.com>
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: cancel dirty pages on invalidation · 793d7dbe
      Committed by Dave Chinner
      Recently we've had warnings arise from the vm handing us pages
      without bufferheads attached to them. This should not ever occur
      in XFS, but we don't defend against it properly if it does. The only
      place where we remove bufferheads from a page is in
      xfs_vm_releasepage(), but we can't tell the difference here between
      "page is dirty so don't release" and "page is dirty but is being
      invalidated so release it".
      
      Some places that invalidate pages ask for the pages to be released
      and, after calling ->releasepage, follow up by checking whether the
      page was dirty and then aborting the invalidation. This is a possible
      vector for releasing buffers from a page but then leaving it in the
      mapping, so we really do need to avoid dirty pages in
      xfs_vm_releasepage().
      
      To differentiate between invalidated pages and normal pages, we need
      to clear the page dirty flag when invalidating the pages. This can
      be done through xfs_vm_invalidatepage(), and will result in
      xfs_vm_releasepage() seeing the page as clean which matches the
      bufferhead state on the page after calling block_invalidatepage().
      
      Hence we can re-add the page dirty check in xfs_vm_releasepage to
      catch the case where we might be releasing a page that is actually
      dirty and so should not have the bufferheads on it removed. This
      will remove one possible vector of "dirty page with no bufferheads"
      and so help narrow down the search for the root cause of that
      problem.
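
      The interaction between the two hooks can be pictured with a toy
      model. This is only a sketch: Page, invalidatepage() and releasepage()
      are simplified stand-ins defined for this example, not the kernel's
      address_space operations, and the model assumes the whole page is
      being invalidated.

```python
class Page:
    """Toy page: a dirty flag plus whether bufferheads are attached."""

    def __init__(self, dirty: bool = True):
        self.dirty = dirty
        self.has_buffers = True


def invalidatepage(page: Page) -> None:
    # Cancel the dirty state first: the data is being thrown away, so the
    # page must look clean to a subsequent release attempt.
    page.dirty = False


def releasepage(page: Page) -> bool:
    # Refuse to strip bufferheads from a page that is still dirty.
    if page.dirty:
        return False
    page.has_buffers = False
    return True
```

      In this model a dirty page that has not been invalidated keeps its
      bufferheads, while an invalidated page is seen as clean and can be
      released.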
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  18. 27 Sep, 2017 1 commit
  19. 04 Sep, 2017 1 commit
  20. 01 Sep, 2017 1 commit
  21. 24 Aug, 2017 1 commit
    • block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Committed by Christoph Hellwig
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different life time rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  22. 28 Jun, 2017 1 commit
  23. 22 Jun, 2017 1 commit
    • xfs: don't allow bmap on rt files · eb5e248d
      Committed by Darrick J. Wong
      bmap returns a dumb LBA address but not the block device that goes with
      that LBA.  Swapfiles don't care about this and will blindly assume that
      the data volume is the correct blockdev, which is totally bogus for
      files on the rt subvolume.  This results in the swap code doing IOs to
      arbitrary locations on the data device(!) if the passed in mapping is a
      realtime file, so just turn off bmap for rt files.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  24. 21 Jun, 2017 1 commit
    • xfs: don't allow bmap on rt files · 61d819e7
      Committed by Darrick J. Wong
      bmap returns a dumb LBA address but not the block device that goes with
      that LBA.  Swapfiles don't care about this and will blindly assume that
      the data volume is the correct blockdev, which is totally bogus for
      files on the rt subvolume.  This results in the swap code doing IOs to
      arbitrary locations on the data device(!) if the passed in mapping is a
      realtime file, so just turn off bmap for rt files.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  25. 20 Jun, 2017 1 commit
    • xfs: remove double-underscore integer types · c8ce540d
      Committed by Darrick J. Wong
      This is a purely mechanical patch that removes the private
      __{u,}int{8,16,32,64}_t typedefs in favor of using the system
      {u,}int{8,16,32,64}_t typedefs.  This is the sed script used to perform
      the transformation and fix the resulting whitespace and indentation
      errors:
      
      s/typedef\t__uint8_t/typedef __uint8_t\t/g
      s/typedef\t__uint/typedef __uint/g
      s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g
      s/__uint8_t\t/__uint8_t\t\t/g
      s/__uint/uint/g
      s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g
      s/__int/int/g
      /^typedef.*int[0-9]*_t;$/d
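
      The two core renames in the sed script (s/__uint/uint/g and
      s/__int/int/g) can be reproduced with Python's re module. This is a
      minimal sketch that deliberately leaves out the whitespace fix-ups and
      the deletion of the now-redundant typedef lines;
      drop_double_underscore() is a hypothetical helper named for this
      example.

```python
import re


def drop_double_underscore(source: str) -> str:
    """Rename __uintN_t / __intN_t style identifiers to uintN_t / intN_t.

    Mirrors only the s/__uint/uint/g and s/__int/int/g substitutions.
    """
    source = re.sub(r"__uint", "uint", source)
    return re.sub(r"__int", "int", source)
```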
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  26. 09 Jun, 2017 1 commit
  27. 06 May, 2017 1 commit
    • xfs: fix use-after-free in xfs_finish_page_writeback · 161f55ef
      Committed by Eryu Guan
      Commit 28b783e4 ("xfs: bufferhead chains are invalid after
      end_page_writeback") fixed one use-after-free issue by
      pre-calculating the loop conditionals before calling bh->b_end_io()
      in the end_io processing loop, but it assigned 'next' pointer before
      checking end offset boundary & breaking the loop, at which point the
      bh might be freed already, and caused use-after-free.
      
      This is caught by KASAN when running fstests generic/127 on sub-page
      block size XFS.
      
      [ 2517.244502] run fstests generic/127 at 2017-04-27 07:30:50
      [ 2747.868840] ==================================================================
      [ 2747.876949] BUG: KASAN: use-after-free in xfs_destroy_ioend+0x3d3/0x4e0 [xfs] at addr ffff8801395ae698
      ...
      [ 2747.918245] Call Trace:
      [ 2747.920975]  dump_stack+0x63/0x84
      [ 2747.924673]  kasan_object_err+0x21/0x70
      [ 2747.928950]  kasan_report+0x271/0x530
      [ 2747.933064]  ? xfs_destroy_ioend+0x3d3/0x4e0 [xfs]
      [ 2747.938409]  ? end_page_writeback+0xce/0x110
      [ 2747.943171]  __asan_report_load8_noabort+0x19/0x20
      [ 2747.948545]  xfs_destroy_ioend+0x3d3/0x4e0 [xfs]
      [ 2747.953724]  xfs_end_io+0x1af/0x2b0 [xfs]
      [ 2747.958197]  process_one_work+0x5ff/0x1000
      [ 2747.962766]  worker_thread+0xe4/0x10e0
      [ 2747.966946]  kthread+0x2d3/0x3d0
      [ 2747.970546]  ? process_one_work+0x1000/0x1000
      [ 2747.975405]  ? kthread_create_on_node+0xc0/0xc0
      [ 2747.980457]  ? syscall_return_slowpath+0xe6/0x140
      [ 2747.985706]  ? do_page_fault+0x30/0x80
      [ 2747.989887]  ret_from_fork+0x2c/0x40
      [ 2747.993874] Object at ffff8801395ae690, in cache buffer_head size: 104
      [ 2748.001155] Allocated:
      [ 2748.003782] PID = 8327
      [ 2748.006411]  save_stack_trace+0x1b/0x20
      [ 2748.010688]  save_stack+0x46/0xd0
      [ 2748.014383]  kasan_kmalloc+0xad/0xe0
      [ 2748.018370]  kasan_slab_alloc+0x12/0x20
      [ 2748.022648]  kmem_cache_alloc+0xb8/0x1b0
      [ 2748.027024]  alloc_buffer_head+0x22/0xc0
      [ 2748.031399]  alloc_page_buffers+0xd1/0x250
      [ 2748.035968]  create_empty_buffers+0x30/0x410
      [ 2748.040730]  create_page_buffers+0x120/0x1b0
      [ 2748.045493]  __block_write_begin_int+0x17a/0x1800
      [ 2748.050740]  iomap_write_begin+0x100/0x2f0
      [ 2748.055308]  iomap_zero_range_actor+0x253/0x5c0
      [ 2748.060362]  iomap_apply+0x157/0x270
      [ 2748.064347]  iomap_zero_range+0x5a/0x80
      [ 2748.068624]  iomap_truncate_page+0x6b/0xa0
      [ 2748.073227]  xfs_setattr_size+0x1f7/0xa10 [xfs]
      [ 2748.078312]  xfs_vn_setattr_size+0x68/0x140 [xfs]
      [ 2748.083589]  xfs_file_fallocate+0x4ac/0x820 [xfs]
      [ 2748.088838]  vfs_fallocate+0x2cf/0x780
      [ 2748.093021]  SyS_fallocate+0x48/0x80
      [ 2748.097006]  do_syscall_64+0x18a/0x430
      [ 2748.101186]  return_from_SYSCALL_64+0x0/0x6a
      [ 2748.105948] Freed:
      [ 2748.108189] PID = 8327
      [ 2748.110816]  save_stack_trace+0x1b/0x20
      [ 2748.115093]  save_stack+0x46/0xd0
      [ 2748.118788]  kasan_slab_free+0x73/0xc0
      [ 2748.122969]  kmem_cache_free+0x7a/0x200
      [ 2748.127247]  free_buffer_head+0x41/0x80
      [ 2748.131524]  try_to_free_buffers+0x178/0x250
      [ 2748.136316]  xfs_vm_releasepage+0x2e9/0x3d0 [xfs]
      [ 2748.141563]  try_to_release_page+0x100/0x180
      [ 2748.146325]  invalidate_inode_pages2_range+0x7da/0xcf0
      [ 2748.152087]  xfs_shift_file_space+0x37d/0x6e0 [xfs]
      [ 2748.157557]  xfs_collapse_file_space+0x49/0x120 [xfs]
      [ 2748.163223]  xfs_file_fallocate+0x2a7/0x820 [xfs]
      [ 2748.168462]  vfs_fallocate+0x2cf/0x780
      [ 2748.172642]  SyS_fallocate+0x48/0x80
      [ 2748.176629]  do_syscall_64+0x18a/0x430
      [ 2748.180810]  return_from_SYSCALL_64+0x0/0x6a
      
      Fix it by checking the offset against end and breaking out first, and
      dereference bh only if there are still bufferheads to process.
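
      The ordering issue can be demonstrated with a toy model in which
      touching a freed bufferhead raises an error. This is a sketch under
      stated assumptions: BufferHead, link_page() and
      finish_page_writeback() are simplified stand-ins written for this
      example, and "freeing" is simulated with a flag rather than real
      memory management.

```python
class BufferHead:
    """Toy bufferhead whose fields may not be read once it is freed."""

    def __init__(self, size: int):
        self._size = size
        self._next = None
        self.freed = False

    @property
    def size(self) -> int:
        if self.freed:
            raise RuntimeError("use-after-free: size")
        return self._size

    @property
    def next(self) -> "BufferHead":
        if self.freed:
            raise RuntimeError("use-after-free: next")
        return self._next


def link_page(bhs):
    """Link the bufferheads of one page into a circular list."""
    for i, bh in enumerate(bhs):
        bh._next = bhs[(i + 1) % len(bhs)]
    return bhs[0]


def finish_page_writeback(head, bv_offset, bv_len, end_io) -> int:
    """Complete the bufferheads covered by [bv_offset, bv_offset + bv_len).

    The end-offset check comes before bh is dereferenced, so the loop never
    reads a bufferhead that a completion callback may already have freed.
    """
    bsize = head.size              # read once, before any callback runs
    end = bv_offset + bv_len - 1
    off = 0
    completed = 0
    bh = head
    while True:
        if off > end:
            break                  # past the range: stop without touching bh
        nxt = bh.next              # safe: bh has not been completed yet
        if off >= bv_offset:
            end_io(bh)
            completed += 1
        off += bsize
        bh = nxt
        if bh is head:             # identity check only, no dereference
            break
    return completed
```

      If the end check were placed after the bh.next read, a callback that
      frees the page's bufferheads on the last block of the range would make
      the next iteration fail in this model.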
      Signed-off-by: Eryu Guan <eguan@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>