1. 13 May, 2011 3 commits
    • A
      btrfs: quasi-round-robin for chunk allocation · 73c5de00
      Arne Jansen committed
      In a multi device setup, the chunk allocator currently always allocates
      chunks on the devices in the same order. This leads to a very uneven
      distribution, especially with RAID1 or RAID10 and an uneven number of
      devices.
      This patch always sorts the devices before allocating, and allocates the
      stripes on the devices with the most available space, as long as there
      is enough space available. In a low space situation, it first tries to
      maximize striping.
      The patch also simplifies the allocator and reduces the checks for
      corner cases.
      The simplification is done by several means. First, it defines the
      properties of each RAID type upfront. These properties are used afterwards
      instead of differentiating cases in several places.
      Second, the old allocator defined a minimum stripe size for each block
      group type, tried to find a large enough chunk, and if this failed, just
      allocated a smaller one. This is now done in one step: the largest possible
      chunk (up to max_chunk_size) is searched for and allocated.
      Because we now have only one pass, the allocation of the map (struct
      map_lookup) is moved down to the point where the number of stripes is
      already known. This way we avoid reallocation of the map.
      We still avoid allocating stripes that are not a multiple of STRIPE_SIZE.
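The sort-then-allocate idea in the commit message above can be modelled in a few lines. This is a simplified sketch in Python, not the kernel code: the `Device` class, its `free` field, and `pick_stripes` are hypothetical names, and the low-space behaviour (maximizing striping) and the STRIPE_SIZE rounding are left out.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    free: int  # unallocated bytes on this device (hypothetical field)


def pick_stripes(devices, num_stripes, stripe_size):
    """Place one stripe on each of the num_stripes devices with the most free space.

    Returns the chosen device names, or None if too few devices have room.
    """
    candidates = [d for d in devices if d.free >= stripe_size]
    # Sort before every allocation so the emptiest devices come first.
    candidates.sort(key=lambda d: d.free, reverse=True)
    if len(candidates) < num_stripes:
        return None
    chosen = candidates[:num_stripes]
    for d in chosen:
        d.free -= stripe_size
    return [d.name for d in chosen]
```

With three equally sized devices and two stripes per chunk (as in RAID1), repeated calls rotate across the devices instead of always filling the first two, which is the uneven distribution the commit describes.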
    • A
      btrfs: heed alloc_start · a9c9bf68
      Arne Jansen committed
      Currently alloc_start is disregarded if the requested
      chunk size is bigger than (device size - alloc_start),
      but smaller than the device size.
      The only situation where I see this could have made sense
      was when a chunk equal to the size of the device has been
      requested. This was possible as the allocator failed to
      take alloc_start into account when calculating the requested
      chunk size. As this gets fixed by this patch, the workaround
      is not necessary anymore.
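The arithmetic behind this fix can be illustrated with a small Python model. It is only a sketch under the assumption that sizes are plain byte counts; `usable_bytes` and `clamp_chunk_size` are hypothetical helpers, not functions from the btrfs source.

```python
def usable_bytes(device_size, alloc_start):
    """Bytes the allocator may use once alloc_start is honoured."""
    return max(device_size - alloc_start, 0)


def clamp_chunk_size(requested, device_size, alloc_start):
    """Never request more than fits between alloc_start and the device end."""
    return min(requested, usable_bytes(device_size, alloc_start))
```

Once the requested size is limited this way, a request can no longer exceed (device size - alloc_start), so there is no case left in which alloc_start would have to be ignored.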
    • A
      btrfs: move btrfs_cmp_device_free_bytes to super.c · bcd53741
      Arne Jansen committed
      This function won't be used here anymore, so move it to super.c, where it is
      used for the df calculation.
  2. 26 April, 2011 8 commits
  3. 18 April, 2011 1 commit
    • C
      Btrfs: fix free space cache leak · f65647c2
      Chris Mason committed
      The free space caching code was recently reworked to
      cache all the pages it needed instead of using find_get_page everywhere.
      
      One loop was missed though, so it ended up leaking pages.  This fixes
      it to use our page array instead of find_get_page.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  4. 16 April, 2011 3 commits
  5. 13 April, 2011 5 commits
  6. 12 April, 2011 8 commits
  7. 09 April, 2011 8 commits
    • J
      Btrfs: check for duplicate iov_base's when doing dio reads · 93a54bc4
      Josef Bacik committed
      Apparently it is ok to submit a read to an IDE device with the same target page
      for different offsets.  This is what Windows does under qemu.  The problem is
      under DIO we expect them to be different buffers for checksumming reasons, and
      so this sort of thing will result in checksum errors, when in reality the file
      is fine.  So when reading, check to make sure that all iov bases are different,
      and if they aren't fall back to buffered mode, since that will work out right.
      Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
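The duplicate check described above amounts to a set-membership test over the buffer addresses. The following Python sketch models it with integers standing in for `iov_base` pointers; the function names and the string return values are assumptions made for illustration only.

```python
def has_duplicate_bases(iov_bases):
    """Return True if any buffer address appears more than once."""
    seen = set()
    for base in iov_bases:
        if base in seen:
            return True
        seen.add(base)
    return False


def choose_read_path(iov_bases):
    """Fall back to buffered reads when two segments share a target buffer."""
    return "buffered" if has_duplicate_bases(iov_bases) else "direct"
```

For example, `choose_read_path([0x1000, 0x2000, 0x1000])` selects the buffered path, because the first and third segments point at the same buffer.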
    • J
      Btrfs: reuse the extent_map we found when calling btrfs_get_extent · 16d299ac
      Josef Bacik committed
      In btrfs_get_block_direct we call btrfs_get_extent to look up the extent for the
      range that we are looking for.  If we don't find an extent, btrfs_get_extent
      will insert an extent_map for that area and mark it as a hole.  So it does the
      job of allocating a new extent map and inserting it into the io tree.  But if
      we're creating a new extent we free it up and redo all of that work.  So instead
      pass the em to btrfs_new_extent_direct(), and if it will work just allocate the
      disk space and set it up properly and bypass the freeing/allocating of a new
      extent map and the expensive operation of inserting the thing into the io_tree.
      Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • J
      Btrfs: do not use async submit for small DIO io's · 1ae39938
      Josef Bacik committed
      When looking at our DIO performance Chris said that for small IO's doing the
      async submit stuff tends to be more overhead than it's worth.  With this on top
      of my other fixes I get about a 17-20% speedup doing a sequential dd with 4k
      IO's.  Basically if we don't have to split the bio for the map length it's small
      enough to be directly submitted, otherwise go back to the async submit.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • J
      Btrfs: don't split dio bios if we don't have to · 02f57c7a
      Josef Bacik committed
      We have been unconditionally allocating a new bio and re-adding all pages from
      our original bio to the new bio.  This is needed if our original bio is larger
      than our stripe size, but if it is smaller than the stripe size then there is no
      need to do this.  So check the map length and if we are under that then go ahead
      and submit the original bio.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
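The split decision in the commit above can be sketched as a length comparison. This Python model assumes a single fixed `map_len` for the whole request, which is a simplification (in the kernel the mappable length depends on where the request falls within a stripe); `plan_submission` is a hypothetical name.

```python
def plan_submission(bio_len, map_len):
    """Return the lengths of the pieces that would be submitted.

    A request that fits within map_len is submitted as-is; only larger
    requests are split. Assumes map_len > 0.
    """
    if bio_len <= map_len:
        return [bio_len]  # submit the original bio untouched
    pieces = []
    remaining = bio_len
    while remaining > 0:
        piece = min(remaining, map_len)
        pieces.append(piece)
        remaining -= piece
    return pieces
```

A 4k request against a 64k mappable length yields a single piece, so no new bio needs to be allocated for it.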
    • J
      Btrfs: do not call btrfs_update_inode in endio if nothing changed · 1ef30be1
      Josef Bacik committed
      In the DIO code we often don't update the i_disk_size because the i_size isn't
      updated until after the DIO is completed, so basically we are allocating a path,
      doing a search, and updating the inode item for no reason since nothing changed.
      btrfs_ordered_update_i_size will return 1 if it didn't update i_disk_size, so
      only run btrfs_update_inode if btrfs_ordered_update_i_size returns 0.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
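The return-value convention described above (1 means nothing changed, 0 means i_disk_size was updated) can be modelled as follows. This is a toy Python sketch: the dictionary-based inode, the size comparison, and both function names are assumptions and do not reproduce the real btrfs_ordered_update_i_size logic.

```python
def ordered_update_i_size(inode, end):
    """Toy model: return 0 if i_disk_size was advanced, 1 if it was left alone."""
    new_size = min(end, inode["i_size"])
    if new_size <= inode["i_disk_size"]:
        return 1  # nothing to update
    inode["i_disk_size"] = new_size
    return 0


def finish_ordered_io(inode, end, update_inode):
    """Only pay for the inode update when the on-disk size actually changed."""
    if ordered_update_i_size(inode, end) == 0:
        update_inode(inode)
```

Here `update_inode` stands in for the expensive step (allocating a path, searching, and rewriting the inode item), which is skipped whenever the size is unchanged.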
    • J
      Btrfs: map the inode item when doing fill_inode_item · 12ddb96c
      Josef Bacik committed
      Instead of calling kmap_atomic for everything we set in the inode item, map the
      entire inode item at the start and unmap it at the end.  This makes a sequential
      dd of 400MB O_DIRECT something like 1% faster.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • J
      Btrfs: only retry transaction reservation once · 06d5a589
      Josef Bacik committed
      I saw a lockup where we kept getting into this start transaction->commit
      transaction loop because of ENOSPC.  The fact is if we fail to make our
      reservation, we've tried _everything_ several times, so we only need to try and
      commit the transaction once, and if that doesn't work then we really are out of
      space and need to just exit.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
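The retry-once policy from the commit above can be expressed as a bounded loop. The Python sketch below uses two hypothetical callables, `try_reserve` and `commit_transaction`, in place of the kernel's reservation and commit paths.

```python
def reserve_with_one_retry(try_reserve, commit_transaction):
    """Try to reserve space; on failure commit once and retry, then give up."""
    retried = False
    while True:
        if try_reserve():
            return True
        if retried:
            return False  # really out of space: exit instead of looping
        commit_transaction()
        retried = True
```

Because `retried` is set after the first commit, the function can call `commit_transaction` at most once, which rules out the endless start/commit loop the message describes.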
    • J
      Btrfs: deal with the case that we run out of space in the cache · be1a12a0
      Josef Bacik committed
      Currently we don't handle running out of space in the cache, so to fix this we
      keep track of how far in the cache we are.  Then we only dirty the pages if we
      successfully modify all of them, otherwise if we have an error or run out of
      space we can just drop them and not worry about the vm writing them out.
      Thanks,
      
      Tested-by: Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
      Signed-off-by: Josef Bacik <josef@redhat.com>
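The all-or-nothing behaviour described above (track the position in the cache, and only dirty the pages if every entry fit) can be sketched in Python. Lists stand in for pages here, and `fill_cache_pages` and its parameters are hypothetical; the real code works on page-cache pages rather than lists.

```python
def fill_cache_pages(entries, num_pages, entries_per_page):
    """Stage entries into a fixed number of pages.

    Returns the filled pages if everything fit (the caller may then dirty
    them), or None if space ran out, in which case nothing is dirtied.
    Assumes entries_per_page > 0.
    """
    pages = [[] for _ in range(num_pages)]
    index = 0  # how far into the cache we are
    for entry in entries:
        if index >= num_pages:
            return None  # out of space: drop the staged pages
        pages[index].append(entry)
        if len(pages[index]) == entries_per_page:
            index += 1
    return pages
```

Returning `None` instead of a partially filled result mirrors the point of the commit: pages that were only partly modified are dropped, so the VM never writes them out.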
  8. 05 April, 2011 4 commits