  1. 09 Oct, 2012 1 commit
    • J
      Btrfs: cache extent state when writing out dirty metadata pages · e6138876
      Josef Bacik committed
      Every time we write out dirty pages we search for an offset in the tree,
      convert the bits in the state, and then when we wait we search for the
      offset again and clear the bits.  So for every dirty range in the io tree we
      are doing 4 rb searches, which is suboptimal.  With this patch we are only
      doing 2 searches for every cycle (modulo weird things happening).  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
  2. 04 Oct, 2012 1 commit
  3. 24 Jul, 2012 1 commit
  4. 03 Jul, 2012 1 commit
    • J
      Btrfs: fix tree log remove space corner case · bdb7d303
      Josef Bacik committed
      The tree log stuff can have allocated space that we end up having split
      across a bitmap and a real extent.  The free space code does not deal with
      this, it assumes that if it finds an extent or bitmap entry that the entire
      range must fall within the entry it finds.  This isn't necessarily the case,
      so rework the remove function so it can handle this case properly.  This
      fixed two panics the user hit, first in the case where the space was
      initially in a bitmap and then in an extent entry, and then the reverse
      case.  Thanks,
      Reported-and-tested-by: Shaun Reich <sreich@kde.org>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
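The corner case in bdb7d303 (a removed range that straddles a bitmap entry and an extent entry) can be illustrated with a small userspace model. This is only a sketch: `remove_free_space` and the `(offset, length, kind)` tuples are hypothetical stand-ins, not the kernel's data structures, and the real code walks an rbtree rather than a sorted list.

```python
def remove_free_space(entries, offset, length):
    """Remove [offset, offset + length) from a list of free-space entries.

    Each entry is a hypothetical (offset, length, kind) tuple, where kind
    is "bitmap" or "extent". The range may span several entries; the point
    of the sketch is that removal must not assume one entry covers it all.
    """
    end = offset + length
    remaining = []
    for e_off, e_len, kind in sorted(entries):
        e_end = e_off + e_len
        if e_end <= offset or e_off >= end:
            # No overlap with the removed range: keep the entry unchanged.
            remaining.append((e_off, e_len, kind))
            continue
        if e_off < offset:
            # Keep the part of the entry that lies before the removed range.
            remaining.append((e_off, offset - e_off, kind))
        if e_end > end:
            # Keep the part of the entry that lies after the removed range.
            remaining.append((end, e_end - end, kind))
    return remaining


# The removed range starts in a bitmap entry and ends in an extent entry.
free = [(0, 100, "bitmap"), (100, 100, "extent")]
print(remove_free_space(free, 50, 100))
```

Swapping the two `kind` labels models the reverse case mentioned in the commit message.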
  5. 30 May, 2012 3 commits
    • J
      Btrfs: merge contigous regions when loading free space cache · cd023e7b
      Josef Bacik committed
      When we write out the free space cache we will write out everything that is
      in our in memory tree, and then we will just walk the pinned extents tree
      and write anything we see there.  The problem with this is that during
      normal operations the pinned extents will be merged back into the free space
      tree normally, and then we can allocate space from the merged areas and
      commit them to the tree log.  If we crash and replay the tree log we will
      crash again because the tree log will try to free up space from what looks
      like 2 separate but contiguous entries, since one entry is from the original
      free space cache and the other was a pinned extent that was merged back.  To
      fix this we just need to walk the free space tree after we load it and merge
      contiguous entries back together.  This will keep the tree log stuff from
      breaking and it will make the allocator behave more nicely.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
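The merge pass described in cd023e7b amounts to coalescing entries whose end equals the next entry's start. A minimal sketch follows, assuming the loaded cache can be modeled as `(offset, length)` tuples; `merge_contiguous` is a hypothetical name and the kernel operates on its in-memory free space tree instead.

```python
def merge_contiguous(entries):
    """Merge free-space entries that touch end-to-start.

    `entries` is an iterable of (offset, length) tuples, a hypothetical
    stand-in for the free space tree right after it has been loaded.
    """
    merged = []
    for offset, length in sorted(entries):
        if merged and merged[-1][0] + merged[-1][1] == offset:
            # The previous entry ends exactly where this one starts.
            prev_offset, prev_length = merged[-1]
            merged[-1] = (prev_offset, prev_length + length)
        else:
            merged.append((offset, length))
    return merged


# Two adjacent entries (for example one from the original cache and one
# that had been a pinned extent) collapse into a single entry.
loaded = [(0, 4096), (4096, 8192), (16384, 4096)]
print(merge_contiguous(loaded))
```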
    • J
      Btrfs: finish ordered extents in their own thread · 5fd02043
      Josef Bacik committed
      We noticed that the ordered extent completion doesn't really rely on having
      a page and that it could be done independently of ending the writeback on a
      page.  This patch makes us not do the threaded endio stuff for normal
      buffered writes and direct writes so we can end page writeback as soon as
      possible (in irq context) and only start threads to do the ordered work when
      it is actually done.  Compression needs to be reworked some to take
      advantage of this as well, but atm it has to do a find_get_page in its endio
      handler so it must be done in its own thread.  This makes direct writes
      quite a bit faster.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • A
      btrfs: trivial endianness annotations · 528c0327
      Al Viro committed
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  6. 13 Apr, 2012 1 commit
  7. 22 Mar, 2012 2 commits
  8. 15 Feb, 2012 1 commit
  9. 10 Feb, 2012 1 commit
  10. 27 Jan, 2012 3 commits
  11. 17 Jan, 2012 1 commit
  12. 11 Jan, 2012 4 commits
    • L
      Btrfs: rewrite btrfs_trim_block_group() · 7fe1e641
      Li Zefan committed
      There are various bugs in block group trimming:
      
      - It may trim from offset smaller than user-specified offset.
      - It may trim beyond user-specified range.
      - It may leak free space for extents smaller than specified minlen.
      - It may truncate the last trimmed extent thus leak free space.
      - With mixed extents+bitmaps, some extents may not be trimmed.
      - With mixed extents+bitmaps, some bitmaps may not be trimmed (even
        none will be trimmed). Even for those trimmed, not all the free space
        in the bitmaps will be trimmed.
      
      I rewrite btrfs_trim_block_group() and break it into two functions.
      One is to trim extents only, and the other is to trim bitmaps only.
      
      Before patching:
      
      	# fstrim -v /mnt/
      	/mnt/: 1496465408 bytes were trimmed
      
      After patching:
      
      	# fstrim -v /mnt/
      	/mnt/: 2193768448 bytes were trimmed
      
      And this matches the total free space:
      
      	# btrfs fi df /mnt
      	Data: total=3.58GB, used=1.79GB
      	System, DUP: total=8.00MB, used=4.00KB
      	System: total=4.00MB, used=0.00
      	Metadata, DUP: total=205.12MB, used=97.14MB
      	Metadata: total=8.00MB, used=0.00
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
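The first three bugs listed in 7fe1e641 are all range-clamping problems, which a short model makes concrete. This is a sketch under assumptions: `trim_ranges` is a hypothetical helper, free space is modeled as `(offset, length)` tuples, and the `minlen` check is applied to the clamped length, which is an assumption about the intended semantics rather than something the commit message states.

```python
def trim_ranges(free_extents, start, end, minlen):
    """Return the (offset, length) pieces a trim of [start, end) would cover.

    Hypothetical model: each free extent is clamped to the user-specified
    range, and pieces shorter than minlen are skipped rather than trimmed.
    Skipped pieces simply stay in the free list, so nothing is leaked.
    """
    trimmed = []
    for offset, length in sorted(free_extents):
        lo = max(offset, start)        # never trim below the requested offset
        hi = min(offset + length, end)  # never trim beyond the requested range
        if hi > lo and hi - lo >= minlen:
            trimmed.append((lo, hi - lo))
    return trimmed


free = [(0, 100), (200, 50), (400, 300)]
print(trim_ranges(free, start=50, end=500, minlen=60))
```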
    • L
      Btrfs: check the return value of io_ctl_init() · 706efc66
      Li Zefan committed
      It can return -ENOMEM.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
    • L
      Btrfs: avoid possible NULL deref in io_ctl_drop_pages() · a1ee5a45
      Li Zefan committed
      If we run into some failure path in io_ctl_prepare_pages(),
      io_ctl->pages[] array may have some NULL pointers.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
    • L
      Btrfs: add pinned extents to on-disk free space cache correctly · db804f23
      Li Zefan committed
      I got this while running xfstests:
      
      [24256.836098] block group 317849600 has an wrong amount of free space
      [24256.836100] btrfs: failed to load free space cache for block group 317849600
      
      We should clamp the extent returned by find_first_extent_bit(),
      so the start of the extent won't be smaller than the start of the
      block group.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
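The clamp described in db804f23 can be sketched as below. The function name `clamp_extent` and the half-open `[start, end)` convention are assumptions made for illustration. The commit message only mentions clamping the start; clamping the end as well is an addition made here for symmetry.

```python
def clamp_extent(ext_start, ext_end, bg_start, bg_len):
    """Clamp a pinned extent [ext_start, ext_end) to a block group.

    Returns (start, length) of the part inside the block group, or None
    if the extent does not intersect it. Hypothetical model, not kernel code.
    """
    bg_end = bg_start + bg_len
    start = max(ext_start, bg_start)  # the clamp the commit message describes
    end = min(ext_end, bg_end)        # assumption: also clamp the end
    if start >= end:
        return None
    return start, end - start


# A pinned extent that begins before the block group is cut at its start,
# so the free space recorded for the group is not overcounted.
print(clamp_extent(90, 150, bg_start=100, bg_len=1000))
```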
  13. 08 Jan, 2012 1 commit
  14. 15 Dec, 2011 1 commit
  15. 01 Dec, 2011 2 commits
    • A
      Btrfs: reset cluster's max_size when creating bitmap · b78d09bc
      Alexandre Oliva committed
      The field that indicates the size of the largest contiguous chunk of
      free space in the cluster is not initialized when setting up bitmaps,
      it's only increased when we find a larger contiguous chunk.  We end up
      retaining a larger value than appropriate for highly-fragmented
      clusters, which may cause pointless searches for large contiguous
      groups, and even cause clusters that do not meet the density
      requirements to be set up.
      Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
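The effect of the uninitialized field in b78d09bc can be reproduced with a toy bitmap scan. This is a sketch only: `largest_free_run`, the list-of-bits bitmap, and the `unit` size are hypothetical, and the real cluster setup code is considerably more involved.

```python
def largest_free_run(bits, unit, max_size=0):
    """Return the largest contiguous run of set bits, in bytes.

    `bits` is a sequence of 0/1 values, `unit` is the number of bytes each
    bit covers, and `max_size` is the starting value of the running maximum.
    Starting from a stale non-zero max_size models the bug: the value is
    only ever increased, so it never drops to the true maximum.
    """
    run = 0
    for bit in bits:
        if bit:
            run += 1
            max_size = max(max_size, run * unit)
        else:
            run = 0
    return max_size


fragmented = [1, 1, 0, 1, 0, 1, 1, 1, 0]
print(largest_free_run(fragmented, unit=4096))                  # reset to 0 first
print(largest_free_run(fragmented, unit=4096, max_size=65536))  # stale value kept
```

With the reset, the result reflects the bitmap actually scanned; with the stale starting value, the fragmented bitmap appears to hold a much larger contiguous chunk than it does.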
    • A
      Btrfs: initialize new bitmaps' list · f2d0f676
      Alexandre Oliva committed
      We're failing to create clusters with bitmaps because
      setup_cluster_no_bitmap checks that the list is empty before inserting
      the bitmap entry in the list for setup_cluster_bitmap, but the list
      field is only initialized when it is restored from the on-disk free
      space cache, or when it is written out to disk.
      
      Besides a potential race condition due to the multiple use of the list
      field, filesystem performance severely degrades over time: as we use
      up all non-bitmap free extents, the try-to-set-up-cluster dance is
      done at every metadata block allocation.  For every block group, we
      fail to set up a cluster, and after failing on them all up to twice,
      we fall back to the much slower unclustered allocation.
      
      To make matters worse, before the unclustered allocation, we try to
      create new block groups until we reach the 1% threshold, which
      introduces additional bitmaps and thus block groups that we'll iterate
      over at each metadata block request.
  16. 22 Nov, 2011 1 commit
    • C
      Btrfs: remove free-space-cache.c WARN during log replay · 24a70313
      Chris Mason committed
      The log replay code only partially loads block groups, since
      the block group caching code is able to detect and deal with
      extents the logging code has pinned down.
      
      While the logging code is pinning down block groups, there is
      a bogus WARN_ON we're hitting if the code wasn't able to find
      an extent in the cache.  This commit removes the warning because
      it can happen any time there isn't a valid free space cache
      for that block group.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  17. 20 Nov, 2011 3 commits
  18. 11 Nov, 2011 1 commit
  19. 06 Nov, 2011 2 commits
  20. 20 Oct, 2011 9 commits
    • J
      Btrfs: don't flush the cache inode before writing it · 016fc6a6
      Josef Bacik committed
      I noticed we had a little bit of latency when writing out the space cache
      inodes.  It's because we flush it before we write anything in case we have dirty
      pages already there.  This doesn't matter though since we're just going to
      overwrite the space, and there really shouldn't be any dirty pages anyway.  This
      makes some of my tests run a little bit faster.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • J
      Btrfs: seperate out btrfs_block_rsv_check out into 2 different functions · 36ba022a
      Josef Bacik committed
      Currently btrfs_block_rsv_check does 2 things, it will either refill a block
      reserve like in the truncate or refill case, or it will check to see if there is
      enough space in the global reserve and possibly refill it.  However because of
      overcommit we could be well overcommitting ourselves just to try and refill the
      global reserve, when really we should just be committing the transaction.  So
      break this out into btrfs_block_rsv_refill and btrfs_block_rsv_check.  Refill
      will try to reserve more metadata if it can and btrfs_block_rsv_check will not,
      it will only tell you if the factor of the total space is still reserved.
      Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • J
      Btrfs: inline checksums into the disk free space cache · 5b0e95bf
      Josef Bacik committed
      Yeah yeah I know this is how we used to do it and then I changed it, but damnit
      I'm changing it back.  The fact is that writing out checksums will modify
      metadata, which could cause us to dirty a block group we've already written out,
      so we have to truncate it and all of its checksums and re-write it which will
      write new checksums which could dirty a block group that has already been
      written and you see where I'm going with this?  This can cause unmount or really
      anything that depends on a transaction to commit to take its sweet damned time
      to happen.  So go back to the way it was, only this time we're specifically
      setting NODATACOW because we can't go through the COW pathway anyway and we're
      doing our own built-in cow'ing by truncating the free space cache.  The other
      new thing is once we truncate the old cache and preallocate the new space, we
      don't need to do that song and dance at all for the rest of the transaction, we
      can just overwrite the existing space with the new cache if the block group
      changes for whatever reason, and the NODATACOW will let us do this fine.  So
      keep track of which transaction we last cleared our cache in and if we cleared
      it in this transaction just say we're all setup and carry on.  This survives
      xfstests and stress.sh.
      
      The inode cache will continue to use the normal csum infrastructure since it
      only gets written once and there will be no more modifications to the fs tree in
      a transaction commit.
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • J
      Btrfs: check the return value of filemap_write_and_wait in the space cache · 549b4fdb
      Josef Bacik committed
      We need to check the return value of filemap_write_and_wait in the space cache
      writeout code.  Also don't set the inode's generation until we're sure nothing
      else is going to fail.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • J
      Btrfs: add a io_ctl struct and helpers for dealing with the space cache · a67509c3
      Josef Bacik committed
      In writing and reading the space cache we have one big loop that keeps track of
      which page we are on and then a bunch of sizeable loops underneath this big loop
      to try and read/write out properly.  Especially in the write case this makes
      things hugely complicated and hard to follow, and makes our error checking and
      recovery equally as complex.  So add a io_ctl struct with a bunch of helpers to
      keep track of the pages we have, where we are, if we have enough space etc.
      This unifies how we deal with the pages we're writing and keeps all the messy
      tracking internal.  This allows us to kill the big loops in both the read and
      write case and makes reviewing and changing the write and read paths much
      simpler.  I've run xfstests and stress.sh on this code and it survives.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • J
      Btrfs: don't skip writing out a empty block groups cache · f75b130e
      Josef Bacik committed
      I noticed a slight bug where we will not bother writing out the block group
      cache's space cache if its space tree is empty.  Since it could have a cluster
      or pinned extents that need to be written out this is just not a valid test.
      Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • J
      Btrfs: use the inode's mapping mask for allocating pages · 3b16a4e3
      Josef Bacik committed
      Johannes pointed out we were allocating only kernel pages for doing writes,
      which is kind of a big deal if you are on 32bit and have more than a gig of ram.
      So fix our allocations to use the mapping's gfp but still clear __GFP_FS so we
      don't re-enter.  Thanks,
      Reported-by: Johannes Weiner <jweiner@redhat.com>
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • J
      Btrfs: stop passing a trans handle all around the reservation code · 4a92b1b8
      Josef Bacik committed
      The only thing that we need to have a trans handle for is in
      reserve_metadata_bytes and that's to know how much flushing we can do.  So
      instead of passing it around, just check current->journal_info for a
      trans_handle so we know if we can commit a transaction to try and free up space
      or not.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • J
      Btrfs: handle enospc accounting for free space inodes · c09544e0
      Josef Bacik committed
      Since free space inodes now use normal checksumming we need to make sure to
      account for their metadata use.  So reserve metadata space, and then if we fail
      to write out the metadata we can just release it, otherwise it will be freed up
      when the io completes.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>