1. 18 Sep, 2014 3 commits
    • Btrfs: improve free space cache management and space allocation · 20005523
      Filipe Manana committed
      While under random IO, a block group's free space cache eventually reaches
      a state where it has a mix of extent entries and bitmap entries representing
      free space regions.
      
      As later free space regions are returned to the cache, some of them are merged
      with existing extent entries if they are contiguous with them. But others are
      not merged: despite the existence of adjacent free space regions in the cache,
      the merging doesn't happen because those regions are represented in bitmap
      entries. Even when new free space regions are merged with existing extent
      entries (enlarging the free space range they represent), we may end up with
      an enlarged region that is contiguous with some other region represented in
      a bitmap entry.
      
      Both clustered and non-clustered space allocation work by iterating over our
      extent and bitmap entries and skipping any that represents a region smaller
      than the allocation request (and giving preference to extent entries before
      bitmap entries). By having a contiguous free space region that is represented
      by 2 (or more) entries (mix of extent and bitmap entries), we end up not
      satisfying an allocation request with a size larger than the size of any of
      the entries but no larger than the sum of their sizes. This makes the caller
      assume we're under an ENOSPC condition, or forces it to allocate multiple
      smaller space regions (as we do for file data writes), which adds extra
      overhead and more chances of causing fragmentation due to the smaller regions
      being all spread apart from each other (more likely when under concurrency).
      
      For example, if we have the following in the cache:
      
      * extent entry representing free space range: [128Mb - 256Kb, 128Mb[
      
      * bitmap entry covering the range [128Mb, 256Mb[, but only with the bits
        representing the range [128Mb, 128Mb + 768Kb[ set - that is, only that
        space in this 128Mb area is marked as free
      
      An allocation request for 1Mb, starting at an offset not greater than 128Mb - 256Kb,
      would fail before, despite the existence of such a contiguous free space area in the
      cache. The caller could only allocate up to 768Kb of space at once and later another
      256Kb (or vice-versa). In between each smaller allocation request, another task
      working on a different file/inode might come in and take that space, preventing the
      former task from getting a contiguous 1Mb region of free space.
      
      Therefore this change implements the ability to move free space from bitmap
      entries into existing and new free space regions represented with extent
      entries. This is done when a space region is added to the cache.
      
      A test was added to the sanity tests that explains in detail the issue too.
      
      Some performance test results with compilebench on a 4 cores machine, with
      32Gb of ram and using an HDD follow.
      
      Test: compilebench -D /mnt -i 30 -r 1000 --makej
      
      Before this change:
      
         intial create total runs 30 avg 69.02 MB/s (user 0.28s sys 0.57s)
         compile total runs 30 avg 314.96 MB/s (user 0.12s sys 0.25s)
         read compiled tree total runs 3 avg 27.14 MB/s (user 1.52s sys 0.90s)
         delete compiled tree total runs 30 avg 3.14 seconds (user 0.15s sys 0.66s)
      
      After this change:
      
         intial create total runs 30 avg 68.37 MB/s (user 0.29s sys 0.55s)
         compile total runs 30 avg 382.83 MB/s (user 0.12s sys 0.24s)
         read compiled tree total runs 3 avg 27.82 MB/s (user 1.45s sys 0.97s)
         delete compiled tree total runs 30 avg 3.18 seconds (user 0.17s sys 0.65s)
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
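The example in the commit message above can be sketched in a few lines of C. This is only an illustration of the idea (moving free bits that directly follow an extent entry into that extent), not the kernel implementation: the structure names, the 4 KiB unit per bit, and the helper names `bitmap_set` and `steal_from_bitmap` are all assumptions made for this sketch, and offsets are assumed to be aligned to the unit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UNIT        4096ULL   /* bytes tracked by one bitmap bit (assumed) */
#define BITMAP_BITS 32768ULL  /* 32768 * 4 KiB = 128 MiB covered per bitmap */

/* Hypothetical, simplified stand-ins for the two kinds of cache entries. */
struct extent_entry { uint64_t offset; uint64_t bytes; };
struct bitmap_entry { uint64_t offset; bool bits[BITMAP_BITS]; };

/* Mark [start, start + bytes) as free in the bitmap. */
void bitmap_set(struct bitmap_entry *b, uint64_t start, uint64_t bytes)
{
    assert(start >= b->offset);
    assert(start + bytes <= b->offset + BITMAP_BITS * UNIT);
    uint64_t first = (start - b->offset) / UNIT;
    uint64_t last = (start + bytes - b->offset) / UNIT;
    for (uint64_t i = first; i < last; i++)
        b->bits[i] = true;
}

/*
 * Grow the extent to the right by taking the run of free bits that starts
 * exactly where the extent ends. Returns the number of bytes moved.
 */
uint64_t steal_from_bitmap(struct extent_entry *e, struct bitmap_entry *b)
{
    uint64_t end = e->offset + e->bytes;
    uint64_t stolen = 0;

    if (end < b->offset || end >= b->offset + BITMAP_BITS * UNIT)
        return 0; /* the extent does not end inside this bitmap */

    for (uint64_t i = (end - b->offset) / UNIT;
         i < BITMAP_BITS && b->bits[i]; i++) {
        b->bits[i] = false; /* the space now belongs to the extent entry */
        stolen += UNIT;
    }
    e->bytes += stolen;
    return stolen;
}
```

With an extent entry covering [128Mb - 256Kb, 128Mb[ and a bitmap at 128Mb whose first 768Kb are marked free, `steal_from_bitmap` moves those 768Kb into the extent entry, which then describes a single contiguous 1Mb region that a 1Mb allocation request can use.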
    • btrfs: use DIV_ROUND_UP instead of open-coded variants · ed6078f7
      David Sterba committed
      The form
      
        (value + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT
      
      is equivalent to
      
        (value + PAGE_CACHE_SIZE - 1) / PAGE_CACHE_SIZE
      
      The rest is a simple substitution, with no difference in the generated
      assembly code.
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Chris Mason <clm@fb.com>
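The equivalence claimed in this commit can be checked with a small userspace sketch. The page size values below (4 KiB pages, shift of 12) are assumptions chosen for illustration, and the function names are hypothetical; the `DIV_ROUND_UP` definition mirrors the kernel helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration: 4 KiB pages. */
#define PAGE_CACHE_SHIFT 12
#define PAGE_CACHE_SIZE  (1UL << PAGE_CACHE_SHIFT)

/* Same shape as the kernel's DIV_ROUND_UP helper. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Number of pages needed for `value` bytes, open-coded with a shift. */
uint64_t pages_open_coded(uint64_t value)
{
    return (value + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
}

/* The same computation written with DIV_ROUND_UP. */
uint64_t pages_div_round_up(uint64_t value)
{
    return DIV_ROUND_UP(value, PAGE_CACHE_SIZE);
}

/* Both forms agree because PAGE_CACHE_SIZE is a power of two. */
void check_same(uint64_t value)
{
    assert(pages_open_coded(value) == pages_div_round_up(value));
}
```

Since the divisor is a compile-time power of two, a compiler can turn the division into the same shift, which is consistent with the commit's note about the generated assembly.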
    • btrfs: cleanup ino cache members of btrfs_root · 57cdc8db
      David Sterba committed
      The naming is confusing, generic yet used for a specific cache. Add a
      prefix 'ino_' or rename appropriately.
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Chris Mason <clm@fb.com>
  2. 20 Jun, 2014 2 commits
    • Btrfs: fix broken free space cache after the system crashed · e570fd27
      Miao Xie committed
      When we mounted the filesystem after the crash, we got the following
      message:
        BTRFS error (device xxx): block group xxxx has wrong amount of free space
        BTRFS error (device xxx): failed to load free space cache for block group xxx
      
      It is because we didn't update the metadata of the allocated space (in the extent
      tree) until the file data was written to disk. During this time, there was
      no information about the allocated space in either the extent tree or the
      free space cache. When we wrote out the free space cache at this time (at
      transaction commit), that space was lost. In fact, only the free space that is
      used to store file data had this problem; the rest didn't, because
      its metadata is updated in the same transaction context.
      
      There are several ways to fix the above problem:
      - track the allocated space, and write it out when we write out the free
        space cache
      - account the size of the allocated space that is used to store the file
        data; if the size is not zero, don't write out the free space cache.
      
      The first one is complex and may hurt performance.
      This patch chose the second method: we use a per-block-group variable to
      account the size of that allocated space. Besides that, we also introduce
      a per-block-group read-write semaphore to avoid the race between
      the allocation and the free space cache write out.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <clm@fb.com>
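The second method described in this commit amounts to a counter that gates the cache write out. Below is a minimal single-threaded sketch of that accounting. The struct and function names are hypothetical, and the read-write semaphore the commit mentions is deliberately left out, so this does not show how the race with concurrent allocation is avoided.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced block group: only the accounting discussed here. */
struct block_group {
    /* Data space handed out but not yet recorded in the extent tree. */
    uint64_t delalloc_bytes;
};

/* Called when space for file data is allocated from this block group. */
void data_space_allocated(struct block_group *bg, uint64_t bytes)
{
    bg->delalloc_bytes += bytes;
}

/* Called once the extent tree has been updated for that space. */
void data_extent_recorded(struct block_group *bg, uint64_t bytes)
{
    assert(bytes <= bg->delalloc_bytes);
    bg->delalloc_bytes -= bytes;
}

/*
 * The cache is only safe to write out when no allocation is in flight;
 * otherwise the in-flight space would appear in neither the extent tree
 * nor the free space cache.
 */
bool may_write_free_space_cache(const struct block_group *bg)
{
    return bg->delalloc_bytes == 0;
}
```

Skipping the write out is a trade: the block group's cache has to be rebuilt on the next mount, but the filesystem never stores a cache that disagrees with the extent tree.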
    • Btrfs: make free space cache write out functions more readable · 5349d6c3
      Miao Xie committed
      This patch makes the free space cache write out functions more readable,
      and besides that, it also reduces the stack space that the function
      __btrfs_write_out_cache uses from 194 bytes to 144 bytes.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <clm@fb.com>
  3. 10 Jun, 2014 2 commits
  4. 29 Jan, 2014 2 commits
  5. 12 Nov, 2013 4 commits
  6. 21 Sep, 2013 1 commit
    • Btrfs: allocate the free space by the existed max extent size when ENOSPC · a4820398
      Miao Xie committed
      With the current code, if the requested size is very large and all the extents
      in the free space cache are small, we waste a lot of CPU time cutting
      the requested size in half and searching the cache again and again until it gets
      down to a size the allocator can return. In fact, we know the max extent
      size in the cache after the first search, so we needn't cut the size in half
      repeatedly and can just use the max extent size directly. This saves
      a lot of CPU time and improves performance when there are only fragments
      in the free space cache.
      
      According to my test, if there are only 4KB free space extents in the fs,
      and the total size of those extents is 256MB, we can reduce the execution
      time of the following test from 5.4s to 1.4s.
        dd if=/dev/zero of=<testfile> bs=1MB count=1 oflag=sync
      
      Changelog v2 -> v3:
      - fix the problem that we skip the block group with the space which is
        less than we need.
      
      Changelog v1 -> v2:
      - address the problem that we return a wrong start position when searching
        the free space in a bitmap.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
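The saving described in this commit can be seen by counting searches in a toy model. The sketch below treats the free space cache as a plain array of extent sizes, which is an assumption for illustration only; the function names are hypothetical and the real allocator is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Look for the first extent of at least `want` bytes. Returns its index or
 * -1. On failure every extent has been scanned, so *max_seen then holds the
 * largest extent size in the cache.
 */
int find_extent(const uint64_t *sizes, size_t n, uint64_t want,
                uint64_t *max_seen)
{
    assert(max_seen != NULL);
    *max_seen = 0;
    for (size_t i = 0; i < n; i++) {
        if (sizes[i] > *max_seen)
            *max_seen = sizes[i];
        if (sizes[i] >= want)
            return (int)i;
    }
    return -1;
}

/* Old behaviour: halve the request until something fits. Returns searches. */
unsigned alloc_by_halving(const uint64_t *sizes, size_t n, uint64_t want,
                          uint64_t *got)
{
    unsigned searches = 0;
    uint64_t max_seen;

    *got = 0;
    while (want > 0) {
        searches++;
        if (find_extent(sizes, n, want, &max_seen) >= 0) {
            *got = want;
            break;
        }
        want /= 2;
    }
    return searches;
}

/* New behaviour: retry once with the max extent size learned on failure. */
unsigned alloc_by_max_extent(const uint64_t *sizes, size_t n, uint64_t want,
                             uint64_t *got)
{
    unsigned searches = 1;
    uint64_t max_seen, unused;

    *got = 0;
    if (find_extent(sizes, n, want, &max_seen) >= 0) {
        *got = want;
        return searches;
    }
    if (max_seen == 0)
        return searches; /* the cache is empty */
    searches++;
    if (find_extent(sizes, n, max_seen, &unused) >= 0)
        *got = max_seen;
    return searches;
}
```

For a 1 MiB request against a cache that only holds 4 KiB extents, the halving loop in this model needs nine searches before it reaches 4 KiB, while the max-extent variant needs two.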
  7. 13 Sep, 2013 1 commit
  8. 01 Sep, 2013 4 commits
  9. 14 Jun, 2013 3 commits
  10. 28 May, 2013 1 commit
  11. 18 May, 2013 2 commits
  12. 07 May, 2013 5 commits
    • btrfs: make static code static & remove dead code · 48a3b636
      Eric Sandeen committed
      Big patch, but all it does is add statics to functions which
      are in fact static, then remove the associated dead-code fallout.
      
      removed functions:
      
      btrfs_iref_to_path()
      __btrfs_lookup_delayed_deletion_item()
      __btrfs_search_delayed_insertion_item()
      __btrfs_search_delayed_deletion_item()
      find_eb_for_page()
      btrfs_find_block_group()
      range_straddles_pages()
      extent_range_uptodate()
      btrfs_file_extent_length()
      btrfs_scrub_cancel_devid()
      btrfs_start_transaction_lflush()
      
      btrfs_print_tree() is left because it is used for debugging.
      btrfs_start_transaction_lflush() and btrfs_reada_detach() are
      left for symmetry.
      
      ulist.c functions are left, another patch will take care of those.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: deal with free space cache errors while replaying log · b50c6e25
      Josef Bacik committed
      So everybody who got hit by my fsync bug will still continue to hit this
      BUG_ON() in the free space cache, which is pretty heavy handed.  So I took a
      file system that had this bug and fixed up all the BUG_ON()'s and leaks that
      popped up when I tried to mount a broken file system like this.  With this patch
      we just fail to mount instead of panicking.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: Include the device in most error printk()s · c2cf52eb
      Simon Kirby committed
      With more than one btrfs volume mounted, it can be very difficult to find
      out which volume is hitting an error. btrfs_error() will print this, but
      it is currently rigged as more of a fatal error handler, while many of
      the printk()s are currently for debugging and yet-unhandled cases.
      
      This patch just changes the functions where the device information is
      already available. Some cases remain where the root or fs_info is not
      passed to the function emitting the error.
      
      This may introduce some confusion with volumes backed by multiple devices
      emitting errors referring to the primary device in the set instead of the
      one on which the error occurred.
      
      Use btrfs_printk(fs_info, format, ...) rather than writing the device
      string every time, and introduce macro wrappers ala XFS for brevity.
      Since the function already cannot be used for continuations, print a
      newline as part of the btrfs_printk() message rather than at each caller.
      Signed-off-by: Simon Kirby <sim@hostway.ca>
      Reviewed-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: cleanup unused arguments of btrfs_csum_data · b0496686
      Liu Bo committed
      Argument 'root' is no longer used in btrfs_csum_data().
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: add some free space cache tests · 74255aa0
      Josef Bacik committed
      We keep hitting bugs in the tree log replay because btrfs_remove_free_space
      doesn't account for some corner case.  So add a bunch of tests to try and fully
      test btrfs_remove_free_space since the only time it is called is during tree log
      replay.  These tests all finish successfully, so as we find more of these bugs
      we need to add to these tests to make sure we don't regress in fixing things.
      I've hidden the tests behind a Kconfig option, but they take no time to run so
      all btrfs developers should have this turned on all the time.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
  13. 21 Feb, 2013 1 commit
    • Btrfs: relax the block group size limit for bitmaps · dde5740f
      Josef Bacik committed
      Dave pointed out that xfstests 273 will tell you that it failed to load the
      space cache for a block group when it remounts.  This is because we run out
      of space writing out the block group cache.  This is ok and is working as it
      should, but let's try to be a bit nicer.  This happens because the block
      group was 100mb, but bitmap entries cover 128mb, so we were only getting
      extent entries for this block group, which ended up being too many to fit in
      the free space cache.  So relax the bitmap size requirements to block groups
      that are at least half the size a bitmap will cover or larger, that way we
      can still keep the amount of space used in the free space cache low enough
      to be able to write it out.  With this patch I no longer fail to write out
      the free space cache.  Thanks,
      Reported-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
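The rule change described in this commit can be written down as two predicates. This is a sketch of the rule as the commit message states it, not the kernel code: the function names are hypothetical, and the 128 MiB figure is taken from the message's statement that bitmap entries cover 128mb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MiB              (1024ULL * 1024ULL)
#define BYTES_PER_BITMAP (128 * MiB) /* range covered by one bitmap entry */

/* Before: bitmaps only for block groups at least as large as one bitmap. */
bool bitmaps_allowed_before(uint64_t block_group_size)
{
    return block_group_size >= BYTES_PER_BITMAP;
}

/* After: block groups of at least half a bitmap's range qualify too. */
bool bitmaps_allowed_after(uint64_t block_group_size)
{
    return block_group_size >= BYTES_PER_BITMAP / 2;
}

/* The 100 MiB block group from the report only qualifies after the change. */
void check_example(void)
{
    assert(!bitmaps_allowed_before(100 * MiB));
    assert(bitmaps_allowed_after(100 * MiB));
    assert(!bitmaps_allowed_after(32 * MiB));
}
```

Allowing a bitmap for the 100 MiB block group lets many small free regions share one entry instead of each needing its own extent entry, which is what keeps the cache small enough to write out.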
  14. 02 Feb, 2013 1 commit
    • Btrfs: RAID5 and RAID6 · 53b381b3
      David Woodhouse committed
      This builds on David Woodhouse's original Btrfs raid5/6 implementation.
      The code has changed quite a bit, blame Chris Mason for any bugs.
      
      Read/modify/write is done after the higher levels of the filesystem have
      prepared a given bio.  This means the higher layers are not responsible
      for building full stripes, and they don't need to query for the topology
      of the extents that may get allocated during delayed allocation runs.
      It also means different files can easily share the same stripe.
      
      But, it does expose us to incorrect parity if we crash or lose power
      while doing a read/modify/write cycle.  This will be addressed in a
      later commit.
      
      Scrub is unable to repair crc errors on raid5/6 chunks.
      
      Discard does not work on raid5/6 (yet)
      
      The stripe size is fixed at 64KiB per disk.  This will be tunable
      in a later commit.
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
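The parity exposure mentioned in this commit follows from how RAID5 parity works in general. The sketch below is generic XOR parity over a flat buffer of equally sized data stripes. It is not the btrfs implementation, and the function names and buffer layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * `data` holds nr_data stripes of `len` bytes each, back to back.
 * The parity stripe is the XOR of all data stripes.
 */
void raid5_compute_parity(const uint8_t *data, size_t nr_data, size_t len,
                          uint8_t *parity)
{
    memset(parity, 0, len);
    for (size_t s = 0; s < nr_data; s++)
        for (size_t i = 0; i < len; i++)
            parity[i] ^= data[s * len + i];
}

/* Rebuild stripe `missing` from the parity and the surviving data stripes. */
void raid5_rebuild(const uint8_t *data, size_t nr_data, size_t missing,
                   const uint8_t *parity, size_t len, uint8_t *out)
{
    assert(missing < nr_data);
    memcpy(out, parity, len);
    for (size_t s = 0; s < nr_data; s++) {
        if (s == missing)
            continue;
        for (size_t i = 0; i < len; i++)
            out[i] ^= data[s * len + i];
    }
}
```

If one data stripe is rewritten and the machine crashes before the matching parity update reaches disk, the parity is stale, and a later rebuild of any other stripe in the same full stripe returns wrong bytes. That is the window the commit message says a later commit will address.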
  15. 25 Jan, 2013 1 commit
    • Btrfs: fix panic when recovering tree log · b0175117
      Josef Bacik committed
      A user reported a BUG_ON(ret) that occurred during tree log replay.  Ret was
      -EAGAIN, so what I think happened is that we removed an extent that covered
      a bitmap entry and an extent entry.  We remove the part from the bitmap and
      return -EAGAIN and then search for the next piece we want to remove, which
      happens to be an entire extent entry, so we just free the sucker and return.
      The problem is ret is still set to -EAGAIN so we trip the BUG_ON().  The
      user used btrfs-zero-log so I'm not 100% sure this is what happened so I've
      added a WARN_ON() to catch the other possibility.  Thanks,
      Reported-by: Jan Steffens <jan.steffens@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
  16. 17 Dec, 2012 2 commits
  17. 12 Dec, 2012 1 commit
  18. 09 Oct, 2012 1 commit
    • Btrfs: cache extent state when writing out dirty metadata pages · e6138876
      Josef Bacik committed
      Every time we write out dirty pages we search for an offset in the tree,
      convert the bits in the state, and then when we wait we search for the
      offset again and clear the bits.  So for every dirty range in the io tree we
      are doing 4 rb searches, which is suboptimal.  With this patch we are only
      doing 2 searches for every cycle (modulo weird things happening).  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
  19. 04 Oct, 2012 1 commit
  20. 24 Jul, 2012 1 commit
  21. 03 Jul, 2012 1 commit
    • Btrfs: fix tree log remove space corner case · bdb7d303
      Josef Bacik committed
      The tree log stuff can have allocated space that we end up having split
      across a bitmap and a real extent.  The free space code does not deal with
      this, it assumes that if it finds an extent or bitmap entry that the entire
      range must fall within the entry it finds.  This isn't necessarily the case,
      so rework the remove function so it can handle this case properly.  This
      fixed two panics the user hit, first in the case where the space was
      initially in a bitmap and then in an extent entry, and then the reverse
      case.  Thanks,
      Reported-and-tested-by: Shaun Reich <sreich@kde.org>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>