1. 26 7月, 2018 1 次提交
    • A
      gfs2: Special-case rindex for gfs2_grow · 77612578
      Andreas Gruenbacher 提交于
      To speed up the common case of appending to a file,
      gfs2_write_alloc_required presumes that writing beyond the end of a file
      will always require additional blocks to be allocated.  This assumption
      is incorrect for preallocates files, but there are no negative
      consequences as long as *some* space is still left on the filesystem.
      
      One special file that always has some space preallocated beyond the end
      of the file is the rindex: when growing a filesystem, gfs2_grow adds one
      or more new resource groups and appends records describing those
      resource groups to the rindex; the preallocated space ensures that this
      is always possible.
      
      However, when a filesystem is completely full, gfs2_write_alloc_required
      will indicate that an additional allocation is required, and appending
      the next record to the rindex will fail even though space for that
      record has already been preallocated.  To fix that, skip the incorrect
      optimization in gfs2_write_alloc_required, but for the rindex only.
      Other writes to preallocated space beyond the end of the file are still
      allowed to fail on completely full filesystems.
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      Reviewed-by: NBob Peterson <rpeterso@redhat.com>
      77612578
  2. 02 7月, 2018 4 次提交
  3. 21 6月, 2018 1 次提交
  4. 04 6月, 2018 4 次提交
  5. 02 6月, 2018 2 次提交
  6. 17 4月, 2018 1 次提交
    • A
      gfs2: Remove sdp->sd_jheightsize · 9a38662b
      Andreas Gruenbacher 提交于
      GFS2 keeps two arrarys in the superblock that define the maximum size of
      an inode depending on the inode's height: sdp->sd_heightsize defines the
      heights in units of sb->s_blocksize; sdp->sd_jheightsize defines them in
      units of sb->s_blocksize - sizeof(struct gfs2_meta_header).  These
      arrays are used to determine when additional layers of indirect blocks
      are needed.  The second array is used for directories which have an
      additional gfs2_meta_header at the beginning of each block.
      
      Distinguishing between these two cases makes no sense: the height
      required for representing N blocks will come out the same no matter if
      the calculation is done in gross (sb->s_blocksize) or net
      (sb->s_blocksize - sizeof(struct gfs2_meta_header)) units.
      
      Stuffed directories don't have an additional gfs2_meta_header, but the
      stuffed case is handled separately for both files and directories,
      anyway.
      
      Remove the unncessary sdp->sd_jheightsize array.
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      9a38662b
  7. 13 4月, 2018 1 次提交
  8. 29 3月, 2018 1 次提交
  9. 24 3月, 2018 1 次提交
    • A
      gfs2: Check for the end of metadata in punch_hole · bb491ce6
      Andreas Gruenbacher 提交于
      When punching a hole or truncating an inode down to a given size, also
      check if the truncate point / start of the hole is within the range we
      have metadata for.  Otherwise, we can end up freeing blocks that
      shouldn't be freed, corrupting the inode, or crashing the machine when
      trying to punch a hole into the void.
      
      When growing an inode via truncate, we set the new size but we don't
      allocate additional levels of indirect blocks and grow the inode height.
      When shrinking that inode again, the new size may still point beyond the
      end of the inode's metadata.
      
      Fixes xfstest generic/476.
      Debugged-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      bb491ce6
  10. 09 3月, 2018 1 次提交
  11. 08 3月, 2018 1 次提交
  12. 14 2月, 2018 1 次提交
    • A
      gfs2: Fixes to "Implement iomap for block_map" · 49edd5bf
      Andreas Gruenbacher 提交于
      It turns out that commit 3974320c "Implement iomap for block_map"
      introduced a few bugs that trigger occasional failures with xfstest
      generic/476:
      
      In gfs2_iomap_begin, we jump to do_alloc when we determine that we are
      beyond the end of the allocated metadata (height > ip->i_height).
      There, we can end up calling hole_size with a metapath that doesn't
      match the current metadata tree, which doesn't make sense.  After
      untangling the code at do_alloc, fix this by checking if the block we
      are looking for is within the range of allocated metadata.
      
      In addition, add a BUG() in case gfs2_iomap_begin is accidentally called
      for reading stuffed files: this is handled separately.  Make sure we
      don't truncate iomap->length for reads beyond the end of the file; in
      that case, the entire range counts as a hole.
      
      Finally, revert to taking a bitmap write lock when doing allocations.
      It's unclear why that change didn't lead to any failures during testing.
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      49edd5bf
  13. 19 1月, 2018 4 次提交
  14. 17 1月, 2018 8 次提交
  15. 31 10月, 2017 2 次提交
    • B
      GFS2: Implement iomap for block_map · 3974320c
      Bob Peterson 提交于
      This patch implements iomap for block mapping, and switches the
      block_map function to use it under the covers.
      
      The additional IOMAP_F_BOUNDARY iomap flag indicates when iomap has
      reached a "metadata boundary" and fetching the next mapping is likely to
      incur an additional I/O.  This flag is used for setting the bh buffer
      boundary flag.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      3974320c
    • B
      GFS2: Make height info part of metapath · 5f8bd444
      Bob Peterson 提交于
      This patch eliminates height parameters from function gfs2_bmap_alloc.
      Function find_metapath determines the metapath's "find height", also
      known as the desired height. Function lookup_metapath determines the
      metapath's "actual height", previously known as starting height or
      sheight. Function gfs2_bmap_alloc now gets both height values from
      the metapath. This simplification was done as a step toward switching
      the block_map functions to using iomap. The bh_map responsibilities
      are also removed from function gfs2_bmap_alloc for the same reason.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      5f8bd444
  16. 26 9月, 2017 1 次提交
  17. 31 8月, 2017 1 次提交
    • B
      GFS2: Fix non-recursive truncate bug · c4a9d189
      Bob Peterson 提交于
      Before this patch if you truncated a file to a smaller size it
      wasn't freeing all the blocks properly. There are two reasons.
      
      First, the metapath comparison was not comparing previous heights.
      I added a function, mp_eq_to_hgt, which checks the metapath at
      all heights prior to the target height.
      
      Second, in function find_nonnull_ptr, it needed to zero out all
      pointers for heights following the target height. Translated into
      decimal integer terms, this way a number like 299, when incremented,
      becomes 300, not 399. The 2 gets incremented to 3, and the following
      digits need to be reset.
      
      These two things allow the truncate state machine to properly find
      the blocks it needs to delete.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      c4a9d189
  18. 21 7月, 2017 1 次提交
    • C
      gfs2: add flag REQ_PRIO for metadata I/O · e477b24b
      Coly Li 提交于
      When gfs2 does metadata I/O, only REQ_META is used as a metadata hint of
      the bio. But flag REQ_META is just a hint for block trace, not for block
      layer code to handle a bio as metadata request.
      
      For some of metadata I/Os of gfs2, A REQ_PRIO flag on the metadata bio
      would be very informative to block layer code. For example, if bcache is
      used as a I/O cache for gfs2, it will be possible for bcache code to get
      the hint and cache the pre-fetched metadata blocks on cache device. This
      behavior may be helpful to improve metadata I/O performance if the
      following requests hit the cache.
      
      Here are the locations in gfs2 code where a REQ_PRIO flag should be added,
      - All places where REQ_READAHEAD is used, gfs2 code uses this flag for
        metadata read ahead.
      - In gfs2_meta_rq() where the first metadata block is read in.
      - In gfs2_write_buf_to_page(), read in quota metadata blocks to have them
        up to date.
      These metadata blocks are probably to be accessed again in future, adding
      a REQ_PRIO flag may have bcache to keep such metadata in fast cache
      device. For system without a cache layer, REQ_PRIO can still provide hint
      to block layer to handle metadata requests more properly.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      e477b24b
  19. 05 7月, 2017 1 次提交
  20. 09 5月, 2017 1 次提交
  21. 19 4月, 2017 1 次提交
    • B
      GFS2: Non-recursive delete · d552a2b9
      Bob Peterson 提交于
      Implement truncate/delete as a non-recursive algorithm. The older
      algorithm was implemented with recursion to strip off each layer
      at a time (going by height, starting with the maximum height.
      This version tries to do the same thing but without recursion,
      and without needing to allocate new structures or lists in memory.
      
      For example, say you want to truncate a very large file to 1 byte,
      and its end-of-file metapath is: 0.505.463.428. The starting
      metapath would be 0.0.0.0. Since it's a truncate to non-zero, it
      needs to preserve that byte, and all metadata pointing to it.
      So it would start at 0.0.0.0, look up all its metadata buffers,
      then free all data blocks pointed to at the highest level.
      After that buffer is "swept", it moves on to 0.0.0.1, then
      0.0.0.2, etc., reading in buffers and sweeping them clean.
      When it gets to the end of the 0.0.0 metadata buffer (for 4K
      blocks the last valid one is 0.0.0.508), it backs up to the
      previous height and starts working on 0.0.1.0, then 0.0.1.1,
      and so forth. After it reaches the end and sweeps 0.0.1.508,
      it continues with 0.0.2.0, and so on. When that height is
      exhausted, and it reaches 0.0.508.508 it backs up another level,
      to 0.1.0.0, then 0.1.0.1, through 0.1.0.508. So it has to keep
      marching backwards and forwards through the metadata until it's
      all swept clean. Once it has all the data blocks freed, it
      lowers the strip height, and begins the process all over again,
      but with one less height. This time it sweeps 0.0.0 through
      0.505.463. When that's clean, it lowers the strip height again
      and works to free 0.505. Eventually it strips the lowest height, 0.
      For a delete or truncate to 0, all metadata for all heights of
      0.0.0.0 would be freed. For a truncate to 1 byte, 0.0.0.0 would
      be preserved.
      
      This isn't much different from normal integer incrementing,
      where an integer gets incremented from 0000 (0.0.0.0) to 3021
      (3.0.2.1). So 0000 gets increments to 0001, 0002, up to 0009,
      then on to 0010, 0011 up to 0099, then 0100 and so forth. It's
      just that each "digit" goes from 0 to 508 (for a total of 509
      pointers) rather than from 0 to 9.
      
      Note that the dinode will only have 483 pointers due to the
      dinode structure itself.
      
      Also note: this is just an example. These numbers (509 and 483)
      are based on a standard 4K block size. Smaller block sizes will
      yield smaller numbers of indirect pointers accordingly.
      
      The truncation process is accomplished with the help of two
      major functions and a few helper functions.
      
      Functions do_strip and recursive_scan are obsolete, so removed.
      
      New function sweep_bh_for_rgrps cleans a buffer_head pointed to
      by the given metapath and height. By cleaning, I mean it frees
      all blocks starting at the offset passed in metapath. It starts
      at the first block in the buffer pointed to by the metapath and
      identifies its resource group (rgrp). From there it frees all
      subsequent block pointers that lie within that rgrp. If it's
      already inside a transaction, it stays within it as long as it
      can. In other words, it doesn't close a transaction until it knows
      it's freed what it can from the resource group. In this way,
      multiple buffers may be cleaned in a single transaction, as long
      as those blocks in the buffer all lie within the same rgrp.
      
      If it's not in a transaction, it starts one. If the buffer_head
      has references to blocks within multiple rgrps, it frees all the
      blocks inside the first rgrp it finds, then closes the
      transaction. Then it repeats the cycle: identifies the next
      unfreed block, uses it to find its rgrp, then starts a new
      transaction for that set. It repeats this process repeatedly
      until the buffer_head contains no more references to any blocks
      past the given metapath.
      
      Function trunc_dealloc has been reworked into a finite state
      automaton. It has basically 3 active states:
      DEALLOC_MP_FULL, DEALLOC_MP_LOWER, and DEALLOC_FILL_MP:
      
      The DEALLOC_MP_FULL state implies the metapath has a full set
      of buffers out to the "shrink height", and therefore, it can
      call function sweep_bh_for_rgrps to free the blocks within the
      highest height of the metapath. If it's just swept the lowest
      level (or an error has occurred) the state machine is ended.
      Otherwise it proceeds to the DEALLOC_MP_LOWER state.
      
      The DEALLOC_MP_LOWER state implies we are finished with a given
      buffer_head, which may now be released, and therefore we are
      then missing some buffer information from the metapath. So we
      need to find more buffers to read in. In most cases, this is
      just a matter of releasing the buffer_head and moving to the
      next pointer from the previous height, so it may be read in and
      swept as well. If it can't find another non-null pointer to
      process, it checks whether it's reached the end of a height
      and needs to lower the strip height, or whether it still needs
      move forward through the previous height's metadata. In this
      state, all zero-pointers are skipped. From this state, it can
      only loop around (once more backing up another height) or,
      once a valid metapath is found (one that has non-zero
      pointers), proceed to state DEALLOC_FILL_MP.
      
      The DEALLOC_FILL_MP state implies that we have a metapath
      but not all its buffers are read in. So we must proceed to read
      in buffer_heads until the metapath has a valid buffer for every
      height. If the previous state backed us up 3 heights, we may
      need to read in a buffer, increment the height, then repeat the
      process until buffers have been read in for all required heights.
      If it's successful reading a buffer, and it's at the highest
      height we need, it proceeds back to the DEALLOC_MP_FULL state.
      If it's unable to fill in a buffer, (encounters a hole, etc.)
      it tries to find another non-zero block pointer. If they're all
      zero, it lowers the height and returns to the DEALLOC_MP_LOWER
      state. If it finds a good non-null pointer, it loops around and
      reads it in, while keeping the metapath in lock-step with the
      pointers it examines.
      
      The state machine runs until the truncation request is
      satisfied. Then any transactions are ended, the quota and
      statfs data are updated, and the function is complete.
      
      Helper function metaptr1 was introduced to be an easy way to
      determine the start of a buffer_head's indirect pointers.
      
      Helper function lookup_mp_height was introduced to find a
      metapath index and read in the buffer that corresponds to it.
      In this way, function lookup_metapath becomes a simple loop to
      call it for every height.
      
      Helper function fillup_metapath is similar to lookup_metapath
      except it can do partial lookups. If the state machine
      backed up multiple levels (like 2999 wrapping to 3000) it
      needs to find out the next starting point and start issuing
      metadata reads at that point.
      
      Helper function hptrs is a shortcut to determine how many
      pointers should be expected in a buffer. Height 0 is the dinode
      which has fewer pointers than the others.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      d552a2b9
  22. 06 1月, 2017 1 次提交
    • B
      GFS2: Limit number of transaction blocks requested for truncates · 2fcf5cc3
      Bob Peterson 提交于
      This patch limits the number of transaction blocks requested during
      file truncates. If we have very large multi-terabyte files, and want
      to delete or truncate them, they might span so many resource groups
      that we overflow the journal blocks, and cause an assert failure.
      By limiting the number of blocks in the transaction, we prevent this
      overflow and give other running processes time to do transactions.
      
      The limiting factor I chose is sd_log_thresh2 which is currently
      set to 4/5ths of the journal. This same ratio is used in function
      gfs2_ail_flush_reqd to determine when a log flush is required.
      If we make the maximum value less than this, we can get into a
      infinite hang whereby the log stops moving because the number of
      used blocks is less than the threshold and the iterative loop
      needs more, but since we're under the threshold, the log daemon
      never starts any IO on the log.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      2fcf5cc3
新手
引导
客服 返回
顶部