1. 23 2月, 2017 13 次提交
  2. 29 1月, 2017 15 次提交
  3. 28 1月, 2017 1 次提交
    • B
      xfs: prevent quotacheck from overloading inode lru · e0d76fa4
      Brian Foster 提交于
      Quotacheck runs at mount time in situations where quota accounting must
      be recalculated. In doing so, it uses bulkstat to visit every inode in
      the filesystem. Historically, every inode processed during quotacheck
      was released and immediately tagged for reclaim because quotacheck runs
      before the superblock is marked active by the VFS. In other words,
      the final iput() lead to an immediate ->destroy_inode() call, which
      allowed the XFS background reclaim worker to start reclaiming inodes.
      
      Commit 17c12bcd ("xfs: when replaying bmap operations, don't let
      unlinked inodes get reaped") marks the XFS superblock active sooner as
      part of the mount process to support caching inodes processed during log
      recovery. This occurs before quotacheck and thus means all inodes
      processed by quotacheck are inserted to the LRU on release.  The
      s_umount lock is held until the mount has completed and thus prevents
      the shrinkers from operating on the sb. This means that quotacheck can
      excessively populate the inode LRU and lead to OOM conditions on systems
      without sufficient RAM.
      
      Update the quotacheck bulkstat handler to set XFS_IGET_DONTCACHE on
      inodes processed by quotacheck. This causes ->drop_inode() to return 1
      and in turn causes iput_final() to evict the inode. This preserves the
      original quotacheck behavior and prevents it from overloading the LRU
      and running out of memory.
      
      CC: stable@vger.kernel.org # v4.9
      Reported-by: NMartin Svec <martin.svec@zoner.cz>
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      e0d76fa4
  4. 27 1月, 2017 6 次提交
  5. 26 1月, 2017 2 次提交
    • D
      xfs: clear _XBF_PAGES from buffers when readahead page · 2aa6ba7b
      Darrick J. Wong 提交于
      If we try to allocate memory pages to back an xfs_buf that we're trying
      to read, it's possible that we'll be so short on memory that the page
      allocation fails.  For a blocking read we'll just wait, but for
      readahead we simply dump all the pages we've collected so far.
      
      Unfortunately, after dumping the pages we neglect to clear the
      _XBF_PAGES state, which means that the subsequent call to xfs_buf_free
      thinks that b_pages still points to pages we own.  It then double-frees
      the b_pages pages.
      
      This results in screaming about negative page refcounts from the memory
      manager, which xfs oughtn't be triggering.  To reproduce this case,
      mount a filesystem where the size of the inodes far outweighs the
      availalble memory (a ~500M inode filesystem on a VM with 300MB memory
      did the trick here) and run bulkstat in parallel with other memory
      eating processes to put a huge load on the system.  The "check summary"
      phase of xfs_scrub also works for this purpose.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      2aa6ba7b
    • C
      xfs: extsize hints are not unlikely in xfs_bmap_btalloc · 493611eb
      Christoph Hellwig 提交于
      With COW files they are the hotpath, just like for files with the
      extent size hint attribute.  We really shouldn't micro-manage anything
      but failure cases with unlikely.
      
      Additionally Arnd Bergmann recently reported that one of these two
      unlikely annotations causes link failures together with an upcoming
      kernel instrumentation patch, so let's get rid of it ASAP.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reported-by: NArnd Bergmann <arnd@arndb.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      493611eb
  6. 25 1月, 2017 3 次提交
    • B
      xfs: remove racy hasattr check from attr ops · 5a93790d
      Brian Foster 提交于
      xfs_attr_[get|remove]() have unlocked attribute fork checks to optimize
      away a lock cycle in cases where the fork does not exist or is otherwise
      empty. This check is not safe, however, because an attribute fork short
      form to extent format conversion includes a transient state that causes
      the xfs_inode_hasattr() check to fail. Specifically,
      xfs_attr_shortform_to_leaf() creates an empty extent format attribute
      fork and then adds the existing shortform attributes to it.
      
      This means that lookup of an existing xattr can spuriously return
      -ENOATTR when racing against a setxattr that causes the associated
      format conversion. This was originally reproduced by an untar on a
      particularly configured glusterfs volume, but can also be reproduced on
      demand with properly crafted xattr requests.
      
      The format conversion occurs under the exclusive ilock. xfs_attr_get()
      and xfs_attr_remove() already have the proper locking and checks further
      down in the functions to handle this situation correctly. Drop the
      unlocked checks to avoid the spurious failure and rely on the existing
      logic.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      5a93790d
    • C
      xfs: use per-AG reservations for the finobt · 76d771b4
      Christoph Hellwig 提交于
      Currently we try to rely on the global reserved block pool for block
      allocations for the free inode btree, but I have customer reports
      (fairly complex workload, need to find an easier reproducer) where that
      is not enough as the AG where we free an inode that requires a new
      finobt block is entirely full.  This causes us to cancel a dirty
      transaction and thus a file system shutdown.
      
      I think the right way to guard against this is to treat the finot the same
      way as the refcount btree and have a per-AG reservations for the possible
      worst case size of it, and the patch below implements that.
      
      Note that this could increase mount times with large finobt trees.  In
      an ideal world we would have added a field for the number of finobt
      fields to the AGI, similar to what we did for the refcount blocks.
      We should do add it next time we rev the AGI or AGF format by adding
      new fields.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      76d771b4
    • C
      xfs: only update mount/resv fields on success in __xfs_ag_resv_init · 4dfa2b84
      Christoph Hellwig 提交于
      Try to reserve the blocks first and only then update the fields in
      or hanging off the mount structure.  This way we can call __xfs_ag_resv_init
      again after a previous failure.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      4dfa2b84