1. 31 Jan, 2017 6 commits
  2. 28 Jan, 2017 1 commit
    • xfs: prevent quotacheck from overloading inode lru · e0d76fa4
      Brian Foster committed
      Quotacheck runs at mount time in situations where quota accounting must
      be recalculated. In doing so, it uses bulkstat to visit every inode in
      the filesystem. Historically, every inode processed during quotacheck
      was released and immediately tagged for reclaim because quotacheck runs
      before the superblock is marked active by the VFS. In other words,
      the final iput() led to an immediate ->destroy_inode() call, which
      allowed the XFS background reclaim worker to start reclaiming inodes.
      
      Commit 17c12bcd ("xfs: when replaying bmap operations, don't let
      unlinked inodes get reaped") marks the XFS superblock active sooner as
      part of the mount process to support caching inodes processed during log
      recovery. This occurs before quotacheck and thus means all inodes
      processed by quotacheck are inserted into the LRU on release.  The
      s_umount lock is held until the mount has completed and thus prevents
      the shrinkers from operating on the sb. This means that quotacheck can
      excessively populate the inode LRU and lead to OOM conditions on systems
      without sufficient RAM.
      
      Update the quotacheck bulkstat handler to set XFS_IGET_DONTCACHE on
      inodes processed by quotacheck. This causes ->drop_inode() to return 1
      and in turn causes iput_final() to evict the inode. This preserves the
      original quotacheck behavior and prevents it from overloading the LRU
      and running out of memory.
      
      CC: stable@vger.kernel.org # v4.9
      Reported-by: Martin Svec <martin.svec@zoner.cz>
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
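The release path described in this commit can be sketched as a small userspace model. This is only an illustration, not kernel code: `toy_sb`, `toy_inode`, `toy_drop_inode`, `toy_iput_final`, `toy_quotacheck` and `TOY_DONTCACHE` are hypothetical names, and the flag merely stands in for the effect the message attributes to XFS_IGET_DONTCACHE (making ->drop_inode() return 1).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag standing in for the effect of XFS_IGET_DONTCACHE. */
#define TOY_DONTCACHE 0x1u

struct toy_inode {
    unsigned int flags;
};

struct toy_sb {
    bool   active;   /* models the superblock having been marked active */
    size_t lru_len;  /* inodes parked on the LRU at final release */
    size_t evicted;  /* inodes torn down immediately at final release */
};

/* Models ->drop_inode(): nonzero means "do not cache this inode". */
int toy_drop_inode(const struct toy_inode *ip)
{
    return (ip->flags & TOY_DONTCACHE) ? 1 : 0;
}

/*
 * Models the final-reference path: an inode is cached on the LRU only
 * when the superblock is active and drop_inode did not ask for eviction.
 */
void toy_iput_final(struct toy_sb *sb, const struct toy_inode *ip)
{
    if (sb->active && !toy_drop_inode(ip))
        sb->lru_len++;
    else
        sb->evicted++;
}

/* Release n inodes that were looked up with iget_flags; return LRU length. */
size_t toy_quotacheck(bool sb_active, unsigned int iget_flags, size_t n)
{
    struct toy_sb sb = { .active = sb_active };
    struct toy_inode ip = { .flags = iget_flags };
    size_t i;

    for (i = 0; i < n; i++)
        toy_iput_final(&sb, &ip);
    return sb.lru_len;
}
```

In this model, an inactive superblock leaves the LRU empty (the historical behavior), an active superblock without the flag parks every inode on the LRU (the regression), and an active superblock with the flag set leaves the LRU empty again (the fix).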
  3. 27 Jan, 2017 1 commit
    • xfs: fix bmv_count confusion w/ shared extents · c364b6d0
      Darrick J. Wong committed
      In a bmapx call, bmv_count is the total size of the array, including the
      zeroth element that userspace uses to supply the search key.  The output
      array starts at offset 1 so that we can set up the user for the next
      invocation.  Since we now can split an extent into multiple bmap records
      due to shared/unshared status, we have to be careful that we don't
      overflow the output array.
      
      In the original patch f86f4037 ("xfs: teach get_bmapx about shared
      extents and the CoW fork") I used cur_ext (the output index) to check
      for overflows, albeit with an off-by-one error.  Since nexleft no longer
      describes the number of unfilled slots in the output, we can rip all
      that out and use cur_ext for the overflow check directly.
      
      Failure to do this causes heap corruption in bmapx callers such as
      xfs_io and xfs_scrub.  xfs/328 can reproduce this problem.

      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
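The array layout this commit describes (slot 0 holds the search key, output starts at slot 1, and one extent may yield several records) can be sketched in userspace as follows. This is a minimal model under stated assumptions, not the kernel's getbmapx code: `toy_rec`, `toy_emit` and `toy_fill` are hypothetical names, and the alternating shared/unshared split is invented purely to force several records per extent.

```c
#include <stddef.h>

struct toy_rec {
    long long offset;
    long long length;
    int       shared;
};

/*
 * Emit one record. out[] has `count` slots in total and out[0] is
 * reserved for the caller's search key, so at most count - 1 records
 * fit. *cur_ext is the number of records emitted so far.
 * Returns 1 if the record was stored, 0 if the array is full.
 */
int toy_emit(struct toy_rec *out, size_t count, size_t *cur_ext,
             struct toy_rec rec)
{
    if (count == 0 || *cur_ext >= count - 1)
        return 0;
    out[1 + *cur_ext] = rec;
    (*cur_ext)++;
    return 1;
}

/*
 * Split [start, start + len) into pieces of piece_len that alternate
 * between unshared and shared, emitting one record per piece until the
 * extent is consumed or the output is full. Returns records emitted.
 */
size_t toy_fill(struct toy_rec *out, size_t count,
                long long start, long long len, long long piece_len)
{
    size_t cur_ext = 0;
    int shared = 0;

    if (piece_len <= 0)
        return 0;
    while (len > 0) {
        struct toy_rec rec;

        rec.offset = start;
        rec.length = len < piece_len ? len : piece_len;
        rec.shared = shared;
        if (!toy_emit(out, count, &cur_ext, rec))
            break;
        start += rec.length;
        len -= rec.length;
        shared = !shared;
    }
    return cur_ext;
}
```

The point of the sketch is that the bound is checked against the output index on every emitted record rather than once per input extent, so a split extent cannot write past the last slot or touch slot 0.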
  4. 26 Jan, 2017 2 commits
    • xfs: clear _XBF_PAGES from buffers when readahead page · 2aa6ba7b
      Darrick J. Wong committed
      If we try to allocate memory pages to back an xfs_buf that we're trying
      to read, it's possible that we'll be so short on memory that the page
      allocation fails.  For a blocking read we'll just wait, but for
      readahead we simply dump all the pages we've collected so far.
      
      Unfortunately, after dumping the pages we neglect to clear the
      _XBF_PAGES state, which means that the subsequent call to xfs_buf_free
      thinks that b_pages still points to pages we own.  It then double-frees
      the b_pages pages.
      
      This results in screaming about negative page refcounts from the memory
      manager, which xfs oughtn't be triggering.  To reproduce this case,
      mount a filesystem where the size of the inodes far outweighs the
      available memory (a ~500M inode filesystem on a VM with 300MB memory
      did the trick here) and run bulkstat in parallel with other memory
      eating processes to put a huge load on the system.  The "check summary"
      phase of xfs_scrub also works for this purpose.

      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
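The double free described in this commit comes from an ownership flag that outlives the pages it describes. A minimal userspace sketch of the pattern follows; it is not the xfs_buf code. `toy_buf`, `toy_buf_alloc`, `toy_buf_free`, `TOY_XBF_PAGES` and the `fail_at` parameter (which simulates an allocation failure) are all hypothetical, and `malloc` stands in for page allocation.

```c
#include <stdlib.h>

#define TOY_XBF_PAGES 0x1u  /* hypothetical stand-in for _XBF_PAGES */
#define TOY_MAX_PAGES 8

struct toy_buf {
    unsigned int flags;
    size_t       page_count;
    void        *pages[TOY_MAX_PAGES];
};

/*
 * Allocate `want` pages. The allocation at index fail_at is forced to
 * fail (pass fail_at >= want for no simulated failure).
 * Returns 0 on success, -1 on failure.
 */
int toy_buf_alloc(struct toy_buf *bp, size_t want, size_t fail_at)
{
    size_t i;

    if (want > TOY_MAX_PAGES)
        return -1;
    bp->flags |= TOY_XBF_PAGES;
    bp->page_count = want;
    for (i = 0; i < want; i++) {
        bp->pages[i] = (i == fail_at) ? NULL : malloc(64);
        if (!bp->pages[i]) {
            /* Readahead-style failure: drop what was collected so far. */
            while (i > 0) {
                free(bp->pages[--i]);
                bp->pages[i] = NULL;
            }
            /* Clear the ownership flag so teardown skips pages[]. */
            bp->flags &= ~TOY_XBF_PAGES;
            bp->page_count = 0;
            return -1;
        }
    }
    return 0;
}

/* Tear down the buffer; returns how many pages this call freed. */
size_t toy_buf_free(struct toy_buf *bp)
{
    size_t i, freed = 0;

    if (bp->flags & TOY_XBF_PAGES) {
        for (i = 0; i < bp->page_count; i++) {
            free(bp->pages[i]);
            bp->pages[i] = NULL;
            freed++;
        }
        bp->flags &= ~TOY_XBF_PAGES;
    }
    bp->page_count = 0;
    return freed;
}
```

If the flag were left set on the failure path, `toy_buf_free` would walk `pages[]` a second time, which is the userspace analogue of the negative page refcounts mentioned in the message.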
    • xfs: extsize hints are not unlikely in xfs_bmap_btalloc · 493611eb
      Christoph Hellwig committed
      With COW files they are the hotpath, just like for files with the
      extent size hint attribute.  We really shouldn't micro-manage anything
      but failure cases with unlikely.
      
      Additionally Arnd Bergmann recently reported that one of these two
      unlikely annotations causes link failures together with an upcoming
      kernel instrumentation patch, so let's get rid of it ASAP.

      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reported-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  5. 25 Jan, 2017 4 commits
  6. 24 Jan, 2017 1 commit
    • xfs: fix COW writeback race · d2b3964a
      Christoph Hellwig committed
      Because xfs_iomap_write_allocate tries to convert the whole extent it
      found from delalloc to real space, we can run into a race condition
      when multiple threads write to the same extent.
      For the non-COW case that is harmless, as the only thing that can happen
      is that we call xfs_bmapi_write on an extent that has already been
      converted to a real allocation.  For COW writes, where we move the extent
      from the COW fork to the data fork after I/O completion, the race is
      not quite as harmless.  In the worst case we end up calling
      xfs_bmapi_write on a region that contains a hole in the COW fork, which
      will trip an assert in debug builds or lead to file system corruption
      in non-debug builds.  This seems to be reproducible with workloads of
      small O_DSYNC writes, although so far I've not managed to come up with
      an isolated reproducer.
      
      The fix for the issue is relatively simple:  tell xfs_bmapi_write
      that we are only asked to convert delayed allocations and skip holes
      in that case.

      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  7. 19 Jan, 2017 1 commit
  8. 18 Jan, 2017 7 commits
  9. 12 Jan, 2017 1 commit
    • xfs: Timely free truncated dirty pages · 0a417b8d
      Jan Kara committed
      Commit 99579cce "xfs: skip dirty pages in ->releasepage()" started
      to skip dirty pages in xfs_vm_releasepage(), which also has the effect
      that if a dirty page is truncated, it does not get freed by
      block_invalidatepage() and lingers on the LRU list waiting for reclaim.
      So a simple loop like:
      
      while true; do
      	dd if=/dev/zero of=file bs=1M count=100
      	rm file
      done
      
      will keep using more and more memory until we hit the low watermarks and
      start pagecache reclaim, which will eventually also reclaim the truncated
      pages. Keeping these truncated (and thus never usable) pages in memory
      is just a waste of memory, unnecessarily stresses page cache
      reclaim, and reportedly also leads to anonymous mmap(2) returning ENOMEM
      prematurely.

      So instead of just skipping dirty pages in xfs_vm_releasepage(), return
      to the old behavior of skipping them only if they have delalloc or unwritten
      buffers, and fix the spurious warnings by warning only if the page is
      clean.
      
      CC: stable@vger.kernel.org
      CC: Brian Foster <bfoster@redhat.com>
      CC: Vlastimil Babka <vbabka@suse.cz>
      Reported-by: Petr Tůma <petr.tuma@d3s.mff.cuni.cz>
      Fixes: 99579cce
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  10. 10 Jan, 2017 5 commits
  11. 04 Jan, 2017 4 commits
  12. 25 Dec, 2016 1 commit
  13. 23 Dec, 2016 1 commit
  14. 10 Dec, 2016 2 commits
    • vfs: refactor clone/dedupe_file_range common functions · 876bec6f
      Darrick J. Wong committed
      Hoist both the XFS reflink inode state and preparation code and the XFS
      file block comparison functions into the VFS so that ocfs2 can take
      advantage of them for reflink and dedupe.

      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • fs: try to clone files first in vfs_copy_file_range · a76b5b04
      Christoph Hellwig committed
      A clone is a perfectly fine implementation of a file copy, so most
      file systems just implement the copy that way.  Instead of duplicating
      this logic, move it to the VFS.  Currently btrfs and XFS implement copies
      the same way as clones, so there is no behavior change for them.  cifs
      only implements clones and grows support for copy_file_range with this
      patch.  NFS implements both, so this will allow copy_file_range to work
      on servers that only implement CLONE and be a lot more efficient on servers
      that implement both CLONE and COPY.

      Signed-off-by: Christoph Hellwig <hch@lst.de>
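The "clone first, then copy" ordering described in this commit can be sketched with a pair of optional callbacks. This is a simplified userspace model, not the VFS implementation: `toy_file_ops`, `toy_copy_file_range` and the sample callbacks are hypothetical, and returning -EOPNOTSUPP when neither callback exists is a simplification of this sketch rather than a statement about what the kernel does in that case.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-filesystem operations; either pointer may be NULL. */
struct toy_file_ops {
    int  (*clone_range)(size_t len); /* 0 on success, negative errno on failure */
    long (*copy_range)(size_t len);  /* bytes copied, or negative errno */
};

/*
 * Try a clone first: if it succeeds, the whole range counts as copied.
 * Otherwise fall back to the filesystem's copy operation, if any.
 */
long toy_copy_file_range(const struct toy_file_ops *ops, size_t len)
{
    if (ops->clone_range && ops->clone_range(len) == 0)
        return (long)len;
    if (ops->copy_range)
        return ops->copy_range(len);
    return -EOPNOTSUPP;
}

/* Sample callbacks used only to exercise the dispatch above. */
int toy_clone_ok(size_t len)
{
    (void)len;
    return 0;
}

int toy_clone_unsupported(size_t len)
{
    (void)len;
    return -EOPNOTSUPP;
}

/* Simulates a short copy so the copy path is distinguishable from a clone. */
long toy_copy_half(size_t len)
{
    return (long)(len / 2);
}
```

With this shape, a filesystem that only provides a clone callback still services copy requests, and one that provides both takes the clone path whenever the clone succeeds.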
  15. 09 Dec, 2016 3 commits