1. 29 5月, 2015 1 次提交
  2. 23 2月, 2015 3 次提交
    • E
      xfs: pass mp to XFS_WANT_CORRUPTED_RETURN · 5fb5aeee
      Eric Sandeen 提交于
      Today, if we hit an XFS_WANT_CORRUPTED_RETURN we don't print any
      information about which filesystem hit it.  Passing in the mp allows
      us to print the filesystem (device) name, which is a pretty critical
      piece of information.
      
      Tested by running fsfuzzer 'til I hit some.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      5fb5aeee
    • E
      xfs: pass mp to XFS_WANT_CORRUPTED_GOTO · c29aad41
      Eric Sandeen 提交于
      Today, if we hit an XFS_WANT_CORRUPTED_GOTO we don't print any
      information about which filesystem hit it.  Passing in the mp allows
      us to print the filesystem (device) name, which is a pretty critical
      piece of information.
      
      Tested by running fsfuzzer 'til I hit some.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      c29aad41
    • D
      xfs: use generic percpu counters for inode counter · 501ab323
      Dave Chinner 提交于
      XFS has hand-rolled per-cpu counters for the superblock since before
      there was any generic implementation. There are some warts around
      the  use of them for the inode counter as the hand rolled counter is
      designed to be accurate at zero, but has no specific accurracy at
      any other value. This design causes problems for the maximum inode
      count threshold enforcement, as there is no trigger that balances
      the counters as they get close tothe maximum threshold.
      
      Instead of designing new triggers for balancing, just replace the
      handrolled per-cpu counter with a generic counter.  This enables us
      to update the counter through the normal superblock modification
      funtions, but rather than do that we add a xfs_mod_icount() helper
      function (from Christoph Hellwig) and keep the percpu counter
      outside the superblock in the struct xfs_mount.
      
      This means we still need to initialise the per-cpu counter
      specifically when we read the superblock, and vice versa when we
      log/write it, but it does mean that we don't need to change any
      other code.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      501ab323
  3. 04 12月, 2014 1 次提交
    • D
      xfs: fix premature enospc on inode allocation · 7a1df156
      Dave Chinner 提交于
      After growing a filesystem, XFS can fail to allocate inodes even
      though there is a large amount of space available in the filesystem
      for inodes. The issue is caused by a nearly full allocation group
      having enough free space in it to be considered for inode
      allocation, but not enough contiguous free space to actually
      allocation inodes.  This situation results in successful selection
      of the AG for allocation, then failure of the allocation resulting
      in ENOSPC being reported to the caller.
      
      It is caused by two possible issues. Firstly, we only consider the
      lognest free extent and whether it would fit an inode chunk. If the
      extent is not correctly aligned, then we can't allocate an inode
      chunk in it regardless of the fact that it is large enough. This
      tends to be a permanent error until space in the AG is freed.
      
      The second issue is that we don't actually lock the AGI or AGF when
      we are doing these checks, and so by the time we get to actually
      allocating the inode chunk the space we thought we had in the AG may
      have been allocated. This tends to be a spurious error as it
      requires a race to trigger. Hence this case is ignored in this patch
      as the reported problem is for permanent errors.
      
      The first issue could be addressed by simply taking into account the
      alignment when checking the longest extent. This, however, would
      prevent allocation in AGs that have aligned, exact sized extents
      free. However, this case should be fairly rare compared to the
      number of allocations that occur near ENOSPC that would trigger this
      condition.
      
      Hence, when selecting the inode AG, take into account the inode
      cluster alignment when checking the lognest free extent in the AG.
      If we can't find any AGs with a contiguous free space large
      enough to be aligned, drop the alignment addition and just try for
      an AG that has enough contiguous free space available for an inode
      chunk. This won't prevent issues from occurring, but should avoid
      situations where other AGs have lots of free space but the selected
      AG can't allocate due to alignment constraints.
      Reported-by: NArkadiusz Miskiewicz <arekm@maven.pl>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      7a1df156
  4. 01 12月, 2014 1 次提交
  5. 28 11月, 2014 3 次提交
  6. 29 9月, 2014 1 次提交
  7. 09 9月, 2014 1 次提交
    • E
      xfs: add a few more verifier tests · e1b05723
      Eric Sandeen 提交于
      These were exposed by fsfuzzer runs; without them we fail
      in various exciting and sometimes convoluted ways when we
      encounter disk corruption.
      
      Without the MAXLEVELS tests we tend to walk off the end of
      an array in a loop like this:
      
              for (i = 0; i < cur->bc_nlevels; i++) {
                      if (cur->bc_bufs[i])
      
      Without the dirblklog test we try to allocate more memory
      than we could possibly hope for and loop forever:
      
      xfs_dabuf_map()
      	nfsb = mp->m_dir_geo->fsbcount;
      	irecs = kmem_zalloc(sizeof(irec) * nfsb, KM_SLEEP...
      
      As for the logbsize check, that's the convoluted one.
      
      If logbsize is specified at mount time, it's sanitized
      in xfs_parseargs; in particular it makes sure that it's
      not > XLOG_MAX_RECORD_BSIZE.
      
      If not specified at mount time, it comes from the superblock
      via sb_logsunit; this is limited to 256k at mkfs time as well;
      it's copied into m_logbsize in xfs_finish_flags().
      
      However, if for some reason the on-disk value is corrupt and
      too large, nothing catches it.  It's a circuitous path, but
      that size eventually finds its way to places that make the kernel
      very unhappy, leading to oopses in xlog_pack_data() because we
      use the size as an index into iclog->ic_data, but the array
      is not necessarily that big.
      
      Anyway - bounds checking when we read from disk is a good thing!
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      e1b05723
  8. 25 6月, 2014 2 次提交
  9. 22 6月, 2014 1 次提交
  10. 06 6月, 2014 1 次提交
  11. 20 5月, 2014 2 次提交
  12. 24 4月, 2014 6 次提交
  13. 07 3月, 2014 1 次提交
    • B
      xfs: avoid AGI/AGF deadlock scenario for inode chunk allocation · e480a723
      Brian Foster 提交于
      The inode chunk allocation path can lead to deadlock conditions if
      a transaction is dirtied with an AGF (to fix up the freelist) for
      an AG that cannot satisfy the actual allocation request. This code
      path is written to try and avoid this scenario, but it can be
      reproduced by running xfstests generic/270 in a loop on a 512b fs.
      
      An example situation is:
      - process A attempts an inode allocation on AG 3, modifies
        the freelist, fails the allocation and ultimately moves on to
        AG 0 with the AG 3 AGF held
      - process B is doing a free space operation (i.e., truncate) and
        acquires the AG 0 AGF, waits on the AG 3 AGF
      - process A acquires the AG 0 AGI, waits on the AG 0 AGF (deadlock)
      
      The problem here is that process A acquired the AG 3 AGF while
      moving on to AG 0 (and releasing the AG 3 AGI with the AG 3 AGF
      held). xfs_dialloc() makes one pass through each of the AGs when
      attempting to allocate an inode chunk. The expectation is a clean
      transaction if a particular AG cannot satisfy the allocation
      request. xfs_ialloc_ag_alloc() is written to support this through
      use of the minalignslop allocation args field.
      
      When using the agi->agi_newino optimization, we attempt an exact
      bno allocation request based on the location of the previously
      allocated chunk. minalignslop is set to inform the allocator that
      we will require alignment on this chunk, and thus to not allow the
      request for this AG if the extra space is not available. Suppose
      that the AG in question has just enough space for this request, but
      not at the requested bno. xfs_alloc_fix_freelist() will proceed as
      normal as it determines the request should succeed, and thus it is
      allowed to modify the agf. xfs_alloc_ag_vextent() ultimately fails
      because the requested bno is not available. In response, the caller
      moves on to a NEAR_BNO allocation request for the same AG. The
      alignment is set, but the minalignslop field is never reset. This
      increases the overall requirement of the request from the first
      attempt. If this delta is the difference between allocation success
      and failure for the AG, xfs_alloc_fix_freelist() rejects this
      request outright the second time around and causes the allocation
      request to unnecessarily fail for this AG.
      
      To address this situation, reset the minalignslop field immediately
      after use and prevent it from leaking into subsequent requests.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      e480a723
  14. 27 2月, 2014 4 次提交
  15. 13 12月, 2013 5 次提交
  16. 12 12月, 2013 1 次提交
  17. 07 11月, 2013 1 次提交
  18. 24 10月, 2013 3 次提交
    • D
      xfs: decouple inode and bmap btree header files · a4fbe6ab
      Dave Chinner 提交于
      Currently the xfs_inode.h header has a dependency on the definition
      of the BMAP btree records as the inode fork includes an array of
      xfs_bmbt_rec_host_t objects in it's definition.
      
      Move all the btree format definitions from xfs_btree.h,
      xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
      xfs_format.h to continue the process of centralising the on-disk
      format definitions. With this done, the xfs inode definitions are no
      longer dependent on btree header files.
      
      The enables a massive culling of unnecessary includes, with close to
      200 #include directives removed from the XFS kernel code base.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      a4fbe6ab
    • D
      xfs: decouple log and transaction headers · 239880ef
      Dave Chinner 提交于
      xfs_trans.h has a dependency on xfs_log.h for a couple of
      structures. Most code that does transactions doesn't need to know
      anything about the log, but this dependency means that they have to
      include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
      files and clean up the includes to be in dependency order.
      
      In doing this, remove the direct include of xfs_trans_reserve.h from
      xfs_trans.h so that we remove the dependency between xfs_trans.h and
      xfs_mount.h. Hence the xfs_trans.h include can be moved to the
      indicate the actual dependencies other header files have on it.
      
      Note that these are kernel only header files, so this does not
      translate to any userspace changes at all.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      239880ef
    • D
      xfs: create a shared header file for format-related information · 70a9883c
      Dave Chinner 提交于
      All of the buffer operations structures are needed to be exported
      for xfs_db, so move them all to a common location rather than
      spreading them all over the place. They are verifying the on-disk
      format, so while xfs_format.h might be a good place, it is not part
      of the on disk format.
      
      Hence we need to create a new header file that we centralise these
      related definitions. Start by moving the bffer operations
      structures, and then also move all the other definitions that have
      crept into xfs_log_format.h and xfs_format.h as there was no other
      shared header file to put them in.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      70a9883c
  19. 31 8月, 2013 1 次提交
  20. 21 8月, 2013 1 次提交