1. 23 2月, 2015 1 次提交
  2. 04 12月, 2014 1 次提交
    • D
      xfs: fix premature enospc on inode allocation · 7a1df156
      Dave Chinner 提交于
      After growing a filesystem, XFS can fail to allocate inodes even
      though there is a large amount of space available in the filesystem
      for inodes. The issue is caused by a nearly full allocation group
      having enough free space in it to be considered for inode
      allocation, but not enough contiguous free space to actually
      allocation inodes.  This situation results in successful selection
      of the AG for allocation, then failure of the allocation resulting
      in ENOSPC being reported to the caller.
      
      It is caused by two possible issues. Firstly, we only consider the
      lognest free extent and whether it would fit an inode chunk. If the
      extent is not correctly aligned, then we can't allocate an inode
      chunk in it regardless of the fact that it is large enough. This
      tends to be a permanent error until space in the AG is freed.
      
      The second issue is that we don't actually lock the AGI or AGF when
      we are doing these checks, and so by the time we get to actually
      allocating the inode chunk the space we thought we had in the AG may
      have been allocated. This tends to be a spurious error as it
      requires a race to trigger. Hence this case is ignored in this patch
      as the reported problem is for permanent errors.
      
      The first issue could be addressed by simply taking into account the
      alignment when checking the longest extent. This, however, would
      prevent allocation in AGs that have aligned, exact sized extents
      free. However, this case should be fairly rare compared to the
      number of allocations that occur near ENOSPC that would trigger this
      condition.
      
      Hence, when selecting the inode AG, take into account the inode
      cluster alignment when checking the lognest free extent in the AG.
      If we can't find any AGs with a contiguous free space large
      enough to be aligned, drop the alignment addition and just try for
      an AG that has enough contiguous free space available for an inode
      chunk. This won't prevent issues from occurring, but should avoid
      situations where other AGs have lots of free space but the selected
      AG can't allocate due to alignment constraints.
      Reported-by: NArkadiusz Miskiewicz <arekm@maven.pl>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      7a1df156
  3. 01 12月, 2014 1 次提交
  4. 28 11月, 2014 3 次提交
  5. 29 9月, 2014 1 次提交
  6. 09 9月, 2014 1 次提交
    • E
      xfs: add a few more verifier tests · e1b05723
      Eric Sandeen 提交于
      These were exposed by fsfuzzer runs; without them we fail
      in various exciting and sometimes convoluted ways when we
      encounter disk corruption.
      
      Without the MAXLEVELS tests we tend to walk off the end of
      an array in a loop like this:
      
              for (i = 0; i < cur->bc_nlevels; i++) {
                      if (cur->bc_bufs[i])
      
      Without the dirblklog test we try to allocate more memory
      than we could possibly hope for and loop forever:
      
      xfs_dabuf_map()
      	nfsb = mp->m_dir_geo->fsbcount;
      	irecs = kmem_zalloc(sizeof(irec) * nfsb, KM_SLEEP...
      
      As for the logbsize check, that's the convoluted one.
      
      If logbsize is specified at mount time, it's sanitized
      in xfs_parseargs; in particular it makes sure that it's
      not > XLOG_MAX_RECORD_BSIZE.
      
      If not specified at mount time, it comes from the superblock
      via sb_logsunit; this is limited to 256k at mkfs time as well;
      it's copied into m_logbsize in xfs_finish_flags().
      
      However, if for some reason the on-disk value is corrupt and
      too large, nothing catches it.  It's a circuitous path, but
      that size eventually finds its way to places that make the kernel
      very unhappy, leading to oopses in xlog_pack_data() because we
      use the size as an index into iclog->ic_data, but the array
      is not necessarily that big.
      
      Anyway - bounds checking when we read from disk is a good thing!
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      e1b05723
  7. 25 6月, 2014 2 次提交
  8. 22 6月, 2014 1 次提交
  9. 06 6月, 2014 1 次提交
  10. 20 5月, 2014 2 次提交
  11. 24 4月, 2014 6 次提交
  12. 07 3月, 2014 1 次提交
    • B
      xfs: avoid AGI/AGF deadlock scenario for inode chunk allocation · e480a723
      Brian Foster 提交于
      The inode chunk allocation path can lead to deadlock conditions if
      a transaction is dirtied with an AGF (to fix up the freelist) for
      an AG that cannot satisfy the actual allocation request. This code
      path is written to try and avoid this scenario, but it can be
      reproduced by running xfstests generic/270 in a loop on a 512b fs.
      
      An example situation is:
      - process A attempts an inode allocation on AG 3, modifies
        the freelist, fails the allocation and ultimately moves on to
        AG 0 with the AG 3 AGF held
      - process B is doing a free space operation (i.e., truncate) and
        acquires the AG 0 AGF, waits on the AG 3 AGF
      - process A acquires the AG 0 AGI, waits on the AG 0 AGF (deadlock)
      
      The problem here is that process A acquired the AG 3 AGF while
      moving on to AG 0 (and releasing the AG 3 AGI with the AG 3 AGF
      held). xfs_dialloc() makes one pass through each of the AGs when
      attempting to allocate an inode chunk. The expectation is a clean
      transaction if a particular AG cannot satisfy the allocation
      request. xfs_ialloc_ag_alloc() is written to support this through
      use of the minalignslop allocation args field.
      
      When using the agi->agi_newino optimization, we attempt an exact
      bno allocation request based on the location of the previously
      allocated chunk. minalignslop is set to inform the allocator that
      we will require alignment on this chunk, and thus to not allow the
      request for this AG if the extra space is not available. Suppose
      that the AG in question has just enough space for this request, but
      not at the requested bno. xfs_alloc_fix_freelist() will proceed as
      normal as it determines the request should succeed, and thus it is
      allowed to modify the agf. xfs_alloc_ag_vextent() ultimately fails
      because the requested bno is not available. In response, the caller
      moves on to a NEAR_BNO allocation request for the same AG. The
      alignment is set, but the minalignslop field is never reset. This
      increases the overall requirement of the request from the first
      attempt. If this delta is the difference between allocation success
      and failure for the AG, xfs_alloc_fix_freelist() rejects this
      request outright the second time around and causes the allocation
      request to unnecessarily fail for this AG.
      
      To address this situation, reset the minalignslop field immediately
      after use and prevent it from leaking into subsequent requests.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      e480a723
  13. 27 2月, 2014 4 次提交
  14. 13 12月, 2013 5 次提交
  15. 12 12月, 2013 1 次提交
  16. 07 11月, 2013 1 次提交
  17. 24 10月, 2013 3 次提交
    • D
      xfs: decouple inode and bmap btree header files · a4fbe6ab
      Dave Chinner 提交于
      Currently the xfs_inode.h header has a dependency on the definition
      of the BMAP btree records as the inode fork includes an array of
      xfs_bmbt_rec_host_t objects in it's definition.
      
      Move all the btree format definitions from xfs_btree.h,
      xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
      xfs_format.h to continue the process of centralising the on-disk
      format definitions. With this done, the xfs inode definitions are no
      longer dependent on btree header files.
      
      The enables a massive culling of unnecessary includes, with close to
      200 #include directives removed from the XFS kernel code base.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      a4fbe6ab
    • D
      xfs: decouple log and transaction headers · 239880ef
      Dave Chinner 提交于
      xfs_trans.h has a dependency on xfs_log.h for a couple of
      structures. Most code that does transactions doesn't need to know
      anything about the log, but this dependency means that they have to
      include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
      files and clean up the includes to be in dependency order.
      
      In doing this, remove the direct include of xfs_trans_reserve.h from
      xfs_trans.h so that we remove the dependency between xfs_trans.h and
      xfs_mount.h. Hence the xfs_trans.h include can be moved to the
      indicate the actual dependencies other header files have on it.
      
      Note that these are kernel only header files, so this does not
      translate to any userspace changes at all.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      239880ef
    • D
      xfs: create a shared header file for format-related information · 70a9883c
      Dave Chinner 提交于
      All of the buffer operations structures are needed to be exported
      for xfs_db, so move them all to a common location rather than
      spreading them all over the place. They are verifying the on-disk
      format, so while xfs_format.h might be a good place, it is not part
      of the on disk format.
      
      Hence we need to create a new header file that we centralise these
      related definitions. Start by moving the bffer operations
      structures, and then also move all the other definitions that have
      crept into xfs_log_format.h and xfs_format.h as there was no other
      shared header file to put them in.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      70a9883c
  18. 31 8月, 2013 1 次提交
  19. 21 8月, 2013 1 次提交
  20. 13 8月, 2013 2 次提交
  21. 28 6月, 2013 1 次提交
    • D
      xfs: Use inode create transaction · ddf6ad01
      Dave Chinner 提交于
      Replace the use of buffer based logging of inode initialisation,
      uses the new logical form to describe the range to be initialised
      in recovery. We continue to "log" the inode buffers to push them
      into the AIL and ensure that the inode create transaction is not
      removed from the log before the inode buffers are written to disk.
      
      Update the transaction identifier and reservations to match the
      changed implementation.
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      ddf6ad01