1. 29 1月, 2018 1 次提交
    • C
      Split buffer's b_fspriv field · fb1755a6
      Carlos Maiolino 提交于
      By splitting the b_fspriv field into two different fields (b_log_item
      and b_li_list). It's possible to get rid of an old ABI workaround, by
      using the new b_log_item field to store xfs_buf_log_item separated from
      the log items attached to the buffer, which will be linked in the new
      b_li_list field.
      
      This way, there is no more need to reorder the log items list to place
      the buf_log_item at the beginning of the list, simplifying a bit the
      logic to handle buffer IO.
      
      This also opens the possibility to change buffer's log items list into a
      proper list_head.
      
      b_log_item field is still defined as a void *, because it is still used
      by the log buffers to store xlog_in_core structures, and there is no
      need to add an extra field on xfs_buf just for xlog_in_core.
      Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: NBill O'Donnell <billodo@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      [darrick: minor style changes]
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      fb1755a6
  2. 18 1月, 2018 1 次提交
  3. 09 1月, 2018 4 次提交
  4. 22 12月, 2017 1 次提交
    • D
      xfs: always honor OWN_UNKNOWN rmap removal requests · 33df3a9c
      Darrick J. Wong 提交于
      Calling xfs_rmap_free with an unknown owner is supposed to remove any
      rmaps covering that range regardless of owner.  This is used by the EFI
      recovery code to say "we're freeing this, it mustn't be owned by
      anything anymore", but for whatever reason xfs_free_ag_extent filters
      them out.
      
      Therefore, remove the filter and make xfs_rmap_unmap actually treat it
      as a wildcard owner -- free anything that's already there, and if
      there's no owner at all then that's fine too.
      
      There are two existing callers of bmap_add_free that take care the rmap
      deferred ops themselves and use OWN_UNKNOWN to skip the EFI-based rmap
      cleanup; convert these to use OWN_NULL (via helpers), and now we really
      require that an RUI (if any) gets added to the defer ops before any EFI.
      
      Lastly, now that xfs_free_extent filters out OWN_NULL rmap free requests,
      growfs will have to consult directly with the rmap to ensure that there
      aren't any rmaps in the grown region.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      33df3a9c
  5. 02 11月, 2017 1 次提交
  6. 27 10月, 2017 1 次提交
  7. 12 10月, 2017 1 次提交
  8. 28 6月, 2017 1 次提交
  9. 20 6月, 2017 1 次提交
  10. 04 4月, 2017 2 次提交
  11. 18 2月, 2017 1 次提交
  12. 10 2月, 2017 1 次提交
  13. 10 1月, 2017 4 次提交
    • C
      xfs: don't rely on ->total in xfs_alloc_space_available · 12ef8301
      Christoph Hellwig 提交于
      ->total is a bit of an odd parameter passed down to the low-level
      allocator all the way from the high-level callers.  It's supposed to
      contain the maximum number of blocks to be allocated for the whole
      transaction [1].
      
      But in xfs_iomap_write_allocate we only convert existing delayed
      allocations and thus only have a minimal block reservation for the
      current transaction, so xfs_alloc_space_available can't use it for
      the allocation decisions.  Use the maximum of args->total and the
      calculated block requirement to make a decision.  We probably should
      get rid of args->total eventually and instead apply ->minleft more
      broadly, but that will require some extensive changes all over.
      
      [1] which creates lots of confusion as most callers don't decrement it
      once doing a first allocation.  But that's for a separate series.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      12ef8301
    • C
      xfs: adjust allocation length in xfs_alloc_space_available · 54fee133
      Christoph Hellwig 提交于
      We must decide in xfs_alloc_fix_freelist if we can perform an
      allocation from a given AG is possible or not based on the available
      space, and should not fail the allocation past that point on a
      healthy file system.
      
      But currently we have two additional places that second-guess
      xfs_alloc_fix_freelist: xfs_alloc_ag_vextent tries to adjust the
      maxlen parameter to remove the reservation before doing the
      allocation (but ignores the various minium freespace requirements),
      and xfs_alloc_fix_minleft tries to fix up the allocated length
      after we've found an extent, but ignores the reservations and also
      doesn't take the AGFL into account (and thus fails allocations
      for not matching minlen in some cases).
      
      Remove all these later fixups and just correct the maxlen argument
      inside xfs_alloc_fix_freelist once we have the AGF buffer locked.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      54fee133
    • C
      xfs: fix bogus minleft manipulations · 255c5162
      Christoph Hellwig 提交于
      We can't just set minleft to 0 when we're low on space - that's exactly
      what we need minleft for: to protect space in the AG for btree block
      allocations when we are low on free space.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      255c5162
    • C
      xfs: bump up reserved blocks in xfs_alloc_set_aside · 5149fd32
      Christoph Hellwig 提交于
      Setting aside 4 blocks globally for bmbt splits isn't all that useful,
      as different threads can allocate space in parallel.  Bump it to 4
      blocks per AG to allow each thread that is currently doing an
      allocation to dip into it separately.  Without that we may no have
      enough reserved blocks if there are enough parallel transactions
      in an almost out space file system that all run into bmap btree
      splits.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      5149fd32
  14. 05 12月, 2016 1 次提交
  15. 04 10月, 2016 4 次提交
  16. 26 9月, 2016 1 次提交
    • D
      xfs: remote attribute blocks aren't really userdata · 292378ed
      Dave Chinner 提交于
      When adding a new remote attribute, we write the attribute to the
      new extent before the allocation transaction is committed. This
      means we cannot reuse busy extents as that violates crash
      consistency semantics. Hence we currently treat remote attribute
      extent allocation like userdata because it has the same overwrite
      ordering constraints as userdata.
      
      Unfortunately, this also allows the allocator to incorrectly apply
      extent size hints to the remote attribute extent allocation. This
      results in interesting failures, such as transaction block
      reservation overruns and in-memory inode attribute fork corruption.
      
      To fix this, we need to separate the busy extent reuse configuration
      from the userdata configuration. This changes the definition of
      XFS_BMAPI_METADATA slightly - it now means that allocation is
      metadata and reuse of busy extents is acceptible due to the metadata
      ordering semantics of the journal. If this flag is not set, it
      means the allocation is that has unordered data writeback, and hence
      busy extent reuse is not allowed. It no longer implies the
      allocation is for user data, just that the data write will not be
      strictly ordered. This matches the semantics for both user data
      and remote attribute block allocation.
      
      As such, This patch changes the "userdata" field to a "datatype"
      field, and adds a "no busy reuse" flag to the field.
      When we detect an unordered data extent allocation, we immediately set
      the no reuse flag. We then set the "user data" flags based on the
      inode fork we are allocating the extent to. Hence we only set
      userdata flags on data fork allocations now and consider attribute
      fork remote extents to be an unordered metadata extent.
      
      The result is that remote attribute extents now have the expected
      allocation semantics, and the data fork allocation behaviour is
      completely unchanged.
      
      It should be noted that there may be other ways to fix this (e.g.
      use ordered metadata buffers for the remote attribute extent data
      write) but they are more invasive and difficult to validate both
      from a design and implementation POV. Hence this patch takes the
      simple, obvious route to fixing the problem...
      Reported-and-tested-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      292378ed
  17. 19 9月, 2016 1 次提交
    • D
      xfs: set up per-AG free space reservations · 3fd129b6
      Darrick J. Wong 提交于
      One unfortunate quirk of the reference count and reverse mapping
      btrees -- they can expand in size when blocks are written to *other*
      allocation groups if, say, one large extent becomes a lot of tiny
      extents.  Since we don't want to start throwing errors in the middle
      of CoWing, we need to reserve some blocks to handle future expansion.
      The transaction block reservation counters aren't sufficient here
      because we have to have a reserve of blocks in every AG, not just
      somewhere in the filesystem.
      
      Therefore, create two per-AG block reservation pools.  One feeds the
      AGFL so that rmapbt expansion always succeeds, and the other feeds all
      other metadata so that refcountbt expansion never fails.
      
      Use the count of how many reserved blocks we need to have on hand to
      create a virtual reservation in the AG.  Through selective clamping of
      the maximum length of allocation requests and of the length of the
      longest free extent, we can make it look like there's less free space
      in the AG unless the reservation owner is asking for blocks.
      
      In other words, play some accounting tricks in-core to make sure that
      we always have blocks available.  On the plus side, there's nothing to
      clean up if we crash, which is contrast to the strategy that the rough
      draft used (actually removing extents from the freespace btrees).
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      3fd129b6
  18. 26 8月, 2016 1 次提交
  19. 17 8月, 2016 2 次提交
  20. 03 8月, 2016 9 次提交
  21. 21 6月, 2016 1 次提交
    • D
      xfs: refactor btree maxlevels computation · 19b54ee6
      Darrick J. Wong 提交于
      Create a common function to calculate the maximum height of a per-AG
      btree.  This will eventually be used by the rmapbt and refcountbt
      code to calculate appropriate maxlevels values for each.  This is
      important because the verifiers and the transaction block
      reservations depend on accurate estimates of how many blocks are
      needed to satisfy a btree split.
      
      We were mistakenly using the max bnobt height for all the btrees,
      which creates a dangerous situation since the larger records and
      keys in an rmapbt make it very possible that the rmapbt will be
      taller than the bnobt and so we can run out of transaction block
      reservation.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      19b54ee6