1. 17 10月, 2017 1 次提交
    • D
      xfs: cancel dirty pages on invalidation · 793d7dbe
      Dave Chinner 提交于
      Recently we've had warnings arise from the vm handing us pages
      without bufferheads attached to them. This should not ever occur
      in XFS, but we don't defend against it properly if it does. The only
      place where we remove bufferheads from a page is in
      xfs_vm_releasepage(), but we can't tell the difference here between
      "page is dirty so don't release" and "page is dirty but is being
      invalidated so release it".
      
      In some places that are invalidating pages ask for pages to be
      released and follow up afterward calling ->releasepage by checking
      whether the page was dirty and then aborting the invalidation. This
      is a possible vector for releasing buffers from a page but then
      leaving it in the mapping, so we really do need to avoid dirty pages
      in xfs_vm_releasepage().
      
      To differentiate between invalidated pages and normal pages, we need
      to clear the page dirty flag when invalidating the pages. This can
      be done through xfs_vm_invalidatepage(), and will result
      xfs_vm_releasepage() seeing the page as clean which matches the
      bufferhead state on the page after calling block_invalidatepage().
      
      Hence we can re-add the page dirty check in xfs_vm_releasepage to
      catch the case where we might be releasing a page that is actually
      dirty and so should not have the bufferheads on it removed. This
      will remove one possible vector of "dirty page with no bufferheads"
      and so help narrow down the search for the root cause of that
      problem.
      Signed-Off-By: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      793d7dbe
  2. 27 9月, 2017 1 次提交
  3. 04 9月, 2017 1 次提交
  4. 01 9月, 2017 1 次提交
  5. 24 8月, 2017 1 次提交
    • C
      block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Christoph Hellwig 提交于
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different life time rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      74d46992
  6. 28 6月, 2017 1 次提交
  7. 22 6月, 2017 1 次提交
    • D
      xfs: don't allow bmap on rt files · eb5e248d
      Darrick J. Wong 提交于
      bmap returns a dumb LBA address but not the block device that goes with
      that LBA.  Swapfiles don't care about this and will blindly assume that
      the data volume is the correct blockdev, which is totally bogus for
      files on the rt subvolume.  This results in the swap code doing IOs to
      arbitrary locations on the data device(!) if the passed in mapping is a
      realtime file, so just turn off bmap for rt files.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      eb5e248d
  8. 21 6月, 2017 1 次提交
    • D
      xfs: don't allow bmap on rt files · 61d819e7
      Darrick J. Wong 提交于
      bmap returns a dumb LBA address but not the block device that goes with
      that LBA.  Swapfiles don't care about this and will blindly assume that
      the data volume is the correct blockdev, which is totally bogus for
      files on the rt subvolume.  This results in the swap code doing IOs to
      arbitrary locations on the data device(!) if the passed in mapping is a
      realtime file, so just turn off bmap for rt files.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      61d819e7
  9. 20 6月, 2017 1 次提交
    • D
      xfs: remove double-underscore integer types · c8ce540d
      Darrick J. Wong 提交于
      This is a purely mechanical patch that removes the private
      __{u,}int{8,16,32,64}_t typedefs in favor of using the system
      {u,}int{8,16,32,64}_t typedefs.  This is the sed script used to perform
      the transformation and fix the resulting whitespace and indentation
      errors:
      
      s/typedef\t__uint8_t/typedef __uint8_t\t/g
      s/typedef\t__uint/typedef __uint/g
      s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g
      s/__uint8_t\t/__uint8_t\t\t/g
      s/__uint/uint/g
      s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g
      s/__int/int/g
      /^typedef.*int[0-9]*_t;$/d
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      c8ce540d
  10. 09 6月, 2017 1 次提交
  11. 06 5月, 2017 1 次提交
    • E
      xfs: fix use-after-free in xfs_finish_page_writeback · 161f55ef
      Eryu Guan 提交于
      Commit 28b783e4 ("xfs: bufferhead chains are invalid after
      end_page_writeback") fixed one use-after-free issue by
      pre-calculating the loop conditionals before calling bh->b_end_io()
      in the end_io processing loop, but it assigned 'next' pointer before
      checking end offset boundary & breaking the loop, at which point the
      bh might be freed already, and caused use-after-free.
      
      This is caught by KASAN when running fstests generic/127 on sub-page
      block size XFS.
      
      [ 2517.244502] run fstests generic/127 at 2017-04-27 07:30:50
      [ 2747.868840] ==================================================================
      [ 2747.876949] BUG: KASAN: use-after-free in xfs_destroy_ioend+0x3d3/0x4e0 [xfs] at addr ffff8801395ae698
      ...
      [ 2747.918245] Call Trace:
      [ 2747.920975]  dump_stack+0x63/0x84
      [ 2747.924673]  kasan_object_err+0x21/0x70
      [ 2747.928950]  kasan_report+0x271/0x530
      [ 2747.933064]  ? xfs_destroy_ioend+0x3d3/0x4e0 [xfs]
      [ 2747.938409]  ? end_page_writeback+0xce/0x110
      [ 2747.943171]  __asan_report_load8_noabort+0x19/0x20
      [ 2747.948545]  xfs_destroy_ioend+0x3d3/0x4e0 [xfs]
      [ 2747.953724]  xfs_end_io+0x1af/0x2b0 [xfs]
      [ 2747.958197]  process_one_work+0x5ff/0x1000
      [ 2747.962766]  worker_thread+0xe4/0x10e0
      [ 2747.966946]  kthread+0x2d3/0x3d0
      [ 2747.970546]  ? process_one_work+0x1000/0x1000
      [ 2747.975405]  ? kthread_create_on_node+0xc0/0xc0
      [ 2747.980457]  ? syscall_return_slowpath+0xe6/0x140
      [ 2747.985706]  ? do_page_fault+0x30/0x80
      [ 2747.989887]  ret_from_fork+0x2c/0x40
      [ 2747.993874] Object at ffff8801395ae690, in cache buffer_head size: 104
      [ 2748.001155] Allocated:
      [ 2748.003782] PID = 8327
      [ 2748.006411]  save_stack_trace+0x1b/0x20
      [ 2748.010688]  save_stack+0x46/0xd0
      [ 2748.014383]  kasan_kmalloc+0xad/0xe0
      [ 2748.018370]  kasan_slab_alloc+0x12/0x20
      [ 2748.022648]  kmem_cache_alloc+0xb8/0x1b0
      [ 2748.027024]  alloc_buffer_head+0x22/0xc0
      [ 2748.031399]  alloc_page_buffers+0xd1/0x250
      [ 2748.035968]  create_empty_buffers+0x30/0x410
      [ 2748.040730]  create_page_buffers+0x120/0x1b0
      [ 2748.045493]  __block_write_begin_int+0x17a/0x1800
      [ 2748.050740]  iomap_write_begin+0x100/0x2f0
      [ 2748.055308]  iomap_zero_range_actor+0x253/0x5c0
      [ 2748.060362]  iomap_apply+0x157/0x270
      [ 2748.064347]  iomap_zero_range+0x5a/0x80
      [ 2748.068624]  iomap_truncate_page+0x6b/0xa0
      [ 2748.073227]  xfs_setattr_size+0x1f7/0xa10 [xfs]
      [ 2748.078312]  xfs_vn_setattr_size+0x68/0x140 [xfs]
      [ 2748.083589]  xfs_file_fallocate+0x4ac/0x820 [xfs]
      [ 2748.088838]  vfs_fallocate+0x2cf/0x780
      [ 2748.093021]  SyS_fallocate+0x48/0x80
      [ 2748.097006]  do_syscall_64+0x18a/0x430
      [ 2748.101186]  return_from_SYSCALL_64+0x0/0x6a
      [ 2748.105948] Freed:
      [ 2748.108189] PID = 8327
      [ 2748.110816]  save_stack_trace+0x1b/0x20
      [ 2748.115093]  save_stack+0x46/0xd0
      [ 2748.118788]  kasan_slab_free+0x73/0xc0
      [ 2748.122969]  kmem_cache_free+0x7a/0x200
      [ 2748.127247]  free_buffer_head+0x41/0x80
      [ 2748.131524]  try_to_free_buffers+0x178/0x250
      [ 2748.136316]  xfs_vm_releasepage+0x2e9/0x3d0 [xfs]
      [ 2748.141563]  try_to_release_page+0x100/0x180
      [ 2748.146325]  invalidate_inode_pages2_range+0x7da/0xcf0
      [ 2748.152087]  xfs_shift_file_space+0x37d/0x6e0 [xfs]
      [ 2748.157557]  xfs_collapse_file_space+0x49/0x120 [xfs]
      [ 2748.163223]  xfs_file_fallocate+0x2a7/0x820 [xfs]
      [ 2748.168462]  vfs_fallocate+0x2cf/0x780
      [ 2748.172642]  SyS_fallocate+0x48/0x80
      [ 2748.176629]  do_syscall_64+0x18a/0x430
      [ 2748.180810]  return_from_SYSCALL_64+0x0/0x6a
      
      Fixed it by checking on offset against end & breaking out first,
      dereference bh only if there're still bufferheads to process.
      Signed-off-by: NEryu Guan <eguan@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      161f55ef
  12. 04 5月, 2017 1 次提交
  13. 04 4月, 2017 2 次提交
  14. 08 3月, 2017 2 次提交
  15. 28 2月, 2017 1 次提交
  16. 03 2月, 2017 1 次提交
    • D
      xfs: mark speculative prealloc CoW fork extents unwritten · 5eda4300
      Darrick J. Wong 提交于
      Christoph Hellwig pointed out that there's a potentially nasty race when
      performing simultaneous nearby directio cow writes:
      
      "Thread 1 writes a range from B to c
      
      "                    B --------- C
                                 p
      
      "a little later thread 2 writes from A to B
      
      "        A --------- B
                     p
      
      [editor's note: the 'p' denote cowextsize boundaries, which I added to
      make this more clear]
      
      "but the code preallocates beyond B into the range where thread
      "1 has just written, but ->end_io hasn't been called yet.
      "But once ->end_io is called thread 2 has already allocated
      "up to the extent size hint into the write range of thread 1,
      "so the end_io handler will splice the unintialized blocks from
      "that preallocation back into the file right after B."
      
      We can avoid this race by ensuring that thread 1 cannot accidentally
      remap the blocks that thread 2 allocated (as part of speculative
      preallocation) as part of t2's write preparation in t1's end_io handler.
      The way we make this happen is by taking advantage of the unwritten
      extent flag as an intermediate step.
      
      Recall that when we begin the process of writing data to shared blocks,
      we create a delayed allocation extent in the CoW fork:
      
      D: --RRRRRRSSSRRRRRRRR---
      C: ------DDDDDDD---------
      
      When a thread prepares to CoW some dirty data out to disk, it will now
      convert the delalloc reservation into an /unwritten/ allocated extent in
      the cow fork.  The da conversion code tries to opportunistically
      allocate as much of a (speculatively prealloc'd) extent as possible, so
      we may end up allocating a larger extent than we're actually writing
      out:
      
      D: --RRRRRRSSSRRRRRRRR---
      U: ------UUUUUUU---------
      
      Next, we convert only the part of the extent that we're actively
      planning to write to normal (i.e. not unwritten) status:
      
      D: --RRRRRRSSSRRRRRRRR---
      U: ------UURRUUU---------
      
      If the write succeeds, the end_cow function will now scan the relevant
      range of the CoW fork for real extents and remap only the real extents
      into the data fork:
      
      D: --RRRRRRRRSRRRRRRRR---
      U: ------UU--UUU---------
      
      This ensures that we never obliterate valid data fork extents with
      unwritten blocks from the CoW fork.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      5eda4300
  17. 12 1月, 2017 1 次提交
    • J
      xfs: Timely free truncated dirty pages · 0a417b8d
      Jan Kara 提交于
      Commit 99579cce "xfs: skip dirty pages in ->releasepage()" started
      to skip dirty pages in xfs_vm_releasepage() which also has the effect
      that if a dirty page is truncated, it does not get freed by
      block_invalidatepage() and is lingering in LRU list waiting for reclaim.
      So a simple loop like:
      
      while true; do
      	dd if=/dev/zero of=file bs=1M count=100
      	rm file
      done
      
      will keep using more and more memory until we hit low watermarks and
      start pagecache reclaim which will eventually reclaim also the truncate
      pages. Keeping these truncated (and thus never usable) pages in memory
      is just a waste of memory, is unnecessarily stressing page cache
      reclaim, and reportedly also leads to anonymous mmap(2) returning ENOMEM
      prematurely.
      
      So instead of just skipping dirty pages in xfs_vm_releasepage(), return
      to old behavior of skipping them only if they have delalloc or unwritten
      buffers and fix the spurious warnings by warning only if the page is
      clean.
      
      CC: stable@vger.kernel.org
      CC: Brian Foster <bfoster@redhat.com>
      CC: Vlastimil Babka <vbabka@suse.cz>
      Reported-by: NPetr Tůma <petr.tuma@d3s.mff.cuni.cz>
      Fixes: 99579cceSigned-off-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      0a417b8d
  18. 30 11月, 2016 2 次提交
    • C
      xfs: use iomap_dio_rw · acdda3aa
      Christoph Hellwig 提交于
      Straight switch over to using iomap for direct I/O - we already have the
      non-COW dio path in write_begin for DAX and files with extent size hints,
      so nothing to add there.  The COW path is ported over from the old
      get_blocks version and a bit of a mess, but I have some work in progress
      to make it look more like the buffered I/O COW path.
      
      This gets rid of xfs_get_blocks_direct and the last caller of
      xfs_get_blocks with the create flag set, so all that code can be removed.
      
      Last but not least I've removed a comment in xfs_filemap_fault that
      refers to xfs_get_blocks entirely instead of updating it - while the
      reference is correct, the whole DAX fault path looks different than
      the non-DAX one, so it seems rather pointless.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Tested-by: NJens Axboe <axboe@fb.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      acdda3aa
    • C
      xfs: remove i_iolock and use i_rwsem in the VFS inode instead · 65523218
      Christoph Hellwig 提交于
      This patch drops the XFS-own i_iolock and uses the VFS i_rwsem which
      recently replaced i_mutex instead.  This means we only have to take
      one lock instead of two in many fast path operations, and we can
      also shrink the xfs_inode structure.  Thanks to the xfs_ilock family
      there is very little churn, the only thing of note is that we need
      to switch to use the lock_two_directory helper for taking the i_rwsem
      on two inodes in a few places to make sure our lock order matches
      the one used in the VFS.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Tested-by: NJens Axboe <axboe@fb.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      65523218
  19. 24 11月, 2016 1 次提交
  20. 08 11月, 2016 2 次提交
    • B
      xfs: don't BUG() on mixed direct and mapped I/O · 04197b34
      Brian Foster 提交于
      We've had reports of generic/095 causing XFS to BUG() in
      __xfs_get_blocks() due to the existence of delalloc blocks on a
      direct I/O read. generic/095 issues a mix of various types of I/O,
      including direct and memory mapped I/O to a single file. This is
      clearly not supported behavior and is known to lead to such
      problems. E.g., the lack of exclusion between the direct I/O and
      write fault paths means that a write fault can allocate delalloc
      blocks in a region of a file that was previously a hole after the
      direct read has attempted to flush/inval the file range, but before
      it actually reads the block mapping. In turn, the direct read
      discovers a delalloc extent and cannot proceed.
      
      While the appropriate solution here is to not mix direct and memory
      mapped I/O to the same regions of the same file, the current
      BUG_ON() behavior is probably overkill as it can crash the entire
      system.  Instead, localize the failure to the I/O in question by
      returning an error for a direct I/O that cannot be handled safely
      due to delalloc blocks. Be careful to allow the case of a direct
      write to post-eof delalloc blocks. This can occur due to speculative
      preallocation and is safe as post-eof blocks are not accompanied by
      dirty pages in pagecache (conversely, preallocation within eof must
      have been zeroed, and thus dirtied, before the inode size could have
      been increased beyond said blocks).
      
      Finally, provide an additional warning if a direct I/O write occurs
      while the file is memory mapped. This may not catch all problematic
      scenarios, but provides a hint that some known-to-be-problematic I/O
      methods are in use.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      04197b34
    • R
      xfs: use struct iomap based DAX PMD fault path · 862f1b9d
      Ross Zwisler 提交于
      Switch xfs_filemap_pmd_fault() from using dax_pmd_fault() to the new and
      improved dax_iomap_pmd_fault().  Also, now that it has no more users,
      remove xfs_get_blocks_dax_fault().
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      862f1b9d
  21. 03 11月, 2016 1 次提交
  22. 01 11月, 2016 1 次提交
  23. 11 10月, 2016 1 次提交
  24. 06 10月, 2016 3 次提交
    • D
      xfs: implement CoW for directio writes · 0613f16c
      Darrick J. Wong 提交于
      For O_DIRECT writes to shared blocks, we have to CoW them just like
      we would with buffered writes.  For writes that are not block-aligned,
      just bounce them to the page cache.
      
      For block-aligned writes, however, we can do better than that.  Use
      the same mechanisms that we employ for buffered CoW to set up a
      delalloc reservation, allocate all the blocks at once, issue the
      writes against the new blocks and use the same ioend functions to
      remap the blocks after the write.  This should be fairly performant.
      
      Christoph discovered that xfs_reflink_allocate_cow_range may stumble
      over invalid entries in the extent array given that it drops the ilock
      but still expects the index to be stable.  Simple fixing it to a new
      lookup for every iteration still isn't correct given that
      xfs_bmapi_allocate will trigger a BUG_ON() if hitting a hole, and
      there is nothing preventing a xfs_bunmapi_cow call removing extents
      once we dropped the ilock either.
      
      This patch duplicates the inner loop of xfs_bmapi_allocate into a
      helper for xfs_reflink_allocate_cow_range so that it can be done under
      the same ilock critical section as our CoW fork delayed allocation.
      The directio CoW warts will be revisited in a later patch.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      0613f16c
    • D
      xfs: report shared extent mappings to userspace correctly · db1327b1
      Darrick J. Wong 提交于
      Report shared extents through the iomap interface so that FIEMAP flags
      shared blocks accurately.  Have xfs_vm_bmap return zero for reflinked
      files because the bmap-based swap code requires static block mappings,
      which is incompatible with copy on write.
      
      NOTE: Existing userspace bmap users such as lilo will have the same
      problem with reflink files.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      db1327b1
    • D
      xfs: move mappings from cow fork to data fork after copy-write · 43caeb18
      Darrick J. Wong 提交于
      After the write component of a copy-write operation finishes, clean up
      the bookkeeping left behind.  On error, we simply free the new blocks
      and pass the error up.  If we succeed, however, then we must remove
      the old data fork mapping and move the cow fork mapping to the data
      fork.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      [hch: Call the CoW failure function during xfs_cancel_ioend]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      43caeb18
  25. 05 10月, 2016 2 次提交
    • D
      xfs: allocate delayed extents in CoW fork · ef473667
      Darrick J. Wong 提交于
      Modify the writepage handler to find and convert pending delalloc
      extents to real allocations.  Furthermore, when we're doing non-cow
      writes to a part of a file that already has a CoW reservation (the
      cowextsz hint that we set up in a subsequent patch facilitates this),
      promote the write to copy-on-write so that the entire extent can get
      written out as a single extent on disk, thereby reducing post-CoW
      fragmentation.
      
      Christoph moved the CoW support code in _map_blocks to a separate helper
      function, refactored other functions, and reduced the number of CoW fork
      lookups, so I merged those changes here to reduce churn.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      ef473667
    • D
      xfs: support allocating delayed extents in CoW fork · 60b4984f
      Darrick J. Wong 提交于
      Modify xfs_bmap_add_extent_delay_real() so that we can convert delayed
      allocation extents in the CoW fork to real allocations, and wire this
      up all the way back to xfs_iomap_write_allocate().  In a subsequent
      patch, we'll modify the writepage handler to call this.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      60b4984f
  26. 19 9月, 2016 1 次提交
  27. 22 7月, 2016 2 次提交
    • D
      xfs: bufferhead chains are invalid after end_page_writeback · 28b783e4
      Dave Chinner 提交于
      In xfs_finish_page_writeback(), we have a loop that looks like this:
      
              do {
                      if (off < bvec->bv_offset)
                              goto next_bh;
                      if (off > end)
                              break;
                      bh->b_end_io(bh, !error);
      next_bh:
                      off += bh->b_size;
              } while ((bh = bh->b_this_page) != head);
      
      The b_end_io function is end_buffer_async_write(), which will call
      end_page_writeback() once all the buffers have marked as no longer
      under IO.  This issue here is that the only thing currently
      protecting both the bufferhead chain and the page from being
      reclaimed is the PageWriteback state held on the page.
      
      While we attempt to limit the loop to just the buffers covered by
      the IO, we still read from the buffer size and follow the next
      pointer in the bufferhead chain. There is no guarantee that either
      of these are valid after the PageWriteback flag has been cleared.
      Hence, loops like this are completely unsafe, and result in
      use-after-free issues. One such problem was caught by Calvin Owens
      with KASAN:
      
      .....
       INFO: Freed in 0x103fc80ec age=18446651500051355200 cpu=2165122683 pid=-1
        free_buffer_head+0x41/0x90
        __slab_free+0x1ed/0x340
        kmem_cache_free+0x270/0x300
        free_buffer_head+0x41/0x90
        try_to_free_buffers+0x171/0x240
        xfs_vm_releasepage+0xcb/0x3b0
        try_to_release_page+0x106/0x190
        shrink_page_list+0x118e/0x1a10
        shrink_inactive_list+0x42c/0xdf0
        shrink_zone_memcg+0xa09/0xfa0
        shrink_zone+0x2c3/0xbc0
      .....
       Call Trace:
        <IRQ>  [<ffffffff81e8b8e4>] dump_stack+0x68/0x94
        [<ffffffff8153a995>] print_trailer+0x115/0x1a0
        [<ffffffff81541174>] object_err+0x34/0x40
        [<ffffffff815436e7>] kasan_report_error+0x217/0x530
        [<ffffffff81543b33>] __asan_report_load8_noabort+0x43/0x50
        [<ffffffff819d651f>] xfs_destroy_ioend+0x3bf/0x4c0
        [<ffffffff819d69d4>] xfs_end_bio+0x154/0x220
        [<ffffffff81de0c58>] bio_endio+0x158/0x1b0
        [<ffffffff81dff61b>] blk_update_request+0x18b/0xb80
        [<ffffffff821baf57>] scsi_end_request+0x97/0x5a0
        [<ffffffff821c5558>] scsi_io_completion+0x438/0x1690
        [<ffffffff821a8d95>] scsi_finish_command+0x375/0x4e0
        [<ffffffff821c3940>] scsi_softirq_done+0x280/0x340
      
      
      Where the access is occuring during IO completion after the buffer
      had been freed from direct memory reclaim.
      
      Prevent use-after-free accidents in this end_io processing loop by
      pre-calculating the loop conditionals before calling bh->b_end_io().
      The loop is already limited to just the bufferheads covered by the
      IO in progress, so the offset checks are sufficient to prevent
      accessing buffers in the chain after end_page_writeback() has been
      called by the the bh->b_end_io() callout.
      
      Yet another example of why Bufferheads Must Die.
      
      cc: <stable@vger.kernel.org> # 4.7
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reported-and-Tested-by: NCalvin Owens <calvinowens@fb.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      28b783e4
    • B
      xfs: skip dirty pages in ->releasepage() · 99579cce
      Brian Foster 提交于
      XFS has had scattered reports of delalloc blocks present at
      ->releasepage() time. This results in a warning with a stack trace
      similar to the following:
      
       ...
       Call Trace:
        [<ffffffffa23c5b8f>] dump_stack+0x63/0x84
        [<ffffffffa20837a7>] warn_slowpath_common+0x97/0xe0
        [<ffffffffa208380a>] warn_slowpath_null+0x1a/0x20
        [<ffffffffa2326caf>] xfs_vm_releasepage+0x10f/0x140
        [<ffffffffa218c680>] ? page_mkclean_one+0xd0/0xd0
        [<ffffffffa218d3a0>] ? anon_vma_prepare+0x150/0x150
        [<ffffffffa21521c2>] try_to_release_page+0x32/0x50
        [<ffffffffa2166b2e>] shrink_active_list+0x3ce/0x3e0
        [<ffffffffa21671c7>] shrink_lruvec+0x687/0x7d0
        [<ffffffffa21673ec>] shrink_zone+0xdc/0x2c0
        [<ffffffffa2168539>] kswapd+0x4f9/0x970
        [<ffffffffa2168040>] ? mem_cgroup_shrink_node_zone+0x1a0/0x1a0
        [<ffffffffa20a0d99>] kthread+0xc9/0xe0
        [<ffffffffa20a0cd0>] ? kthread_stop+0x100/0x100
        [<ffffffffa26b404f>] ret_from_fork+0x3f/0x70
        [<ffffffffa20a0cd0>] ? kthread_stop+0x100/0x100
      
      This occurs because it is possible for shrink_active_list() to send
      pages marked dirty to ->releasepage() when certain buffer_head threshold
      conditions are met. shrink_active_list() doesn't check the page dirty
      state apparently to handle an old ext3 corner case where in some cases
      clean pages would not have the dirty bit cleared, thus it is up to the
      filesystem to determine how to handle the page.
      
      XFS currently handles the delalloc case properly, but this behavior
      makes the warning spurious. Update the XFS ->releasepage() handler to
      explicitly skip dirty pages. Retain the existing delalloc/unwritten
      checks so we continue to warn if such buffers exist on clean pages when
      they shouldn't.
      Diagnosed-by: NDave Chinner <david@fromorbit.com>
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      99579cce
  28. 20 7月, 2016 1 次提交
  29. 21 6月, 2016 2 次提交
  30. 08 6月, 2016 2 次提交