1. 13 Apr 2012: 1 commit
  2. 22 Mar 2012: 2 commits
  3. 15 Feb 2012: 1 commit
  4. 10 Feb 2012: 1 commit
  5. 27 Jan 2012: 3 commits
  6. 17 Jan 2012: 1 commit
  7. 11 Jan 2012: 4 commits
    • Btrfs: rewrite btrfs_trim_block_group() · 7fe1e641
      Committed by Li Zefan
      There are various bugs in block group trimming:
      
      - It may trim from an offset smaller than the user-specified offset.
      - It may trim beyond the user-specified range.
      - It may leak free space for extents smaller than the specified minlen.
      - It may truncate the last trimmed extent and thus leak free space.
      - With mixed extents+bitmaps, some extents may not be trimmed.
      - With mixed extents+bitmaps, some bitmaps may not be trimmed (possibly
      none at all). Even for those that are trimmed, not all the free space
      in the bitmaps is trimmed.
      
      I rewrote btrfs_trim_block_group() and broke it into two functions:
      one trims extents only, and the other trims bitmaps only.
      
      Before patching:
      
      	# fstrim -v /mnt/
      	/mnt/: 1496465408 bytes were trimmed
      
      After patching:
      
      	# fstrim -v /mnt/
      	/mnt/: 2193768448 bytes were trimmed
      
      And this matches the total free space:
      
      	# btrfs fi df /mnt
      	Data: total=3.58GB, used=1.79GB
      	System, DUP: total=8.00MB, used=4.00KB
      	System: total=4.00MB, used=0.00
      	Metadata, DUP: total=205.12MB, used=97.14MB
      	Metadata: total=8.00MB, used=0.00
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
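The range-related bugs listed in 7fe1e641 (trimming before the requested offset, trimming past the requested range, mishandling extents shorter than minlen) all come down to how one free extent is intersected with the request. The following is a minimal userspace sketch of that intersection, not the kernel code: `struct free_extent` and `clamp_trim_extent()` are hypothetical names introduced here for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical free-space extent covering [offset, offset + bytes). */
struct free_extent {
    uint64_t offset;
    uint64_t bytes;
};

/*
 * Intersect a free extent with the requested trim range [start, end).
 * Returns true and fills *out when the overlap is at least minlen bytes;
 * returns false when the extent should be skipped and left untouched.
 */
bool clamp_trim_extent(const struct free_extent *e,
                       uint64_t start, uint64_t end, uint64_t minlen,
                       struct free_extent *out)
{
    assert(start <= end);

    uint64_t e_end = e->offset + e->bytes;
    uint64_t lo = e->offset > start ? e->offset : start; /* never below start */
    uint64_t hi = e_end < end ? e_end : end;             /* never past end */

    if (lo >= hi)
        return false;           /* no overlap with the requested range */
    if (hi - lo < minlen)
        return false;           /* too small to be worth trimming */

    out->offset = lo;
    out->bytes = hi - lo;
    return true;
}
```

A caller would trim only the returned sub-range and keep the rest of the extent accounted as free space, which is the property the commit message says the old code violated.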
    • Btrfs: check the return value of io_ctl_init() · 706efc66
      Committed by Li Zefan
      It can return -ENOMEM.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
    • Btrfs: avoid possible NULL deref in io_ctl_drop_pages() · a1ee5a45
      Committed by Li Zefan
      If we run into some failure path in io_ctl_prepare_pages(),
      io_ctl->pages[] array may have some NULL pointers.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
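The situation described in a1ee5a45 is a cleanup loop walking an array that was only partly filled before a failure. A minimal userspace sketch of the defensive pattern follows; `struct io_ctl_sketch` and `io_ctl_sketch_drop_pages()` are hypothetical stand-ins, and `free()` stands in for whatever the kernel does to release a page.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an I/O control block holding page pointers. */
struct io_ctl_sketch {
    void **pages;    /* slots may be NULL if preparation failed midway */
    int num_pages;
};

/*
 * Release every populated slot and skip the NULL ones.
 * Returns the number of slots actually released.
 * Note: free(NULL) is harmless in userspace, but a kernel helper that
 * dereferences the page is not, so the explicit check is the point here.
 */
int io_ctl_sketch_drop_pages(struct io_ctl_sketch *ctl)
{
    assert(ctl->num_pages >= 0);

    int released = 0;
    for (int i = 0; i < ctl->num_pages; i++) {
        if (!ctl->pages[i])
            continue;           /* slot never filled: nothing to drop */
        free(ctl->pages[i]);
        ctl->pages[i] = NULL;   /* makes a second call safe */
        released++;
    }
    return released;
}
```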
    • Btrfs: add pinned extents to on-disk free space cache correctly · db804f23
      Committed by Li Zefan
      I got this while running xfstests:
      
      [24256.836098] block group 317849600 has an wrong amount of free space
      [24256.836100] btrfs: failed to load free space cache for block group 317849600
      
      We should clamp the extent returned by find_first_extent_bit(),
      so the start of the extent won't be smaller than the start of the
      block group.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
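The clamping that db804f23 describes can be sketched as an interval intersection. This is an illustration under assumptions, not the patch itself: `struct range` and `clamp_to_block_group()` are hypothetical, the extent is taken as an inclusive [ext_start, ext_end] pair, and ext_end is assumed to be below UINT64_MAX so that adding one does not overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: a start offset plus a length in bytes. */
struct range {
    uint64_t start;
    uint64_t len;   /* 0 means "no overlap with the block group" */
};

/*
 * Clamp the inclusive extent [ext_start, ext_end] to the block group
 * [bg_start, bg_start + bg_len), so the result never starts before the
 * block group and never runs past its end.
 */
struct range clamp_to_block_group(uint64_t ext_start, uint64_t ext_end,
                                  uint64_t bg_start, uint64_t bg_len)
{
    assert(ext_start <= ext_end);

    struct range r = { 0, 0 };
    uint64_t bg_end = bg_start + bg_len;   /* exclusive */
    uint64_t end = ext_end + 1;            /* exclusive */
    uint64_t lo = ext_start > bg_start ? ext_start : bg_start;
    uint64_t hi = end < bg_end ? end : bg_end;

    if (lo < hi) {
        r.start = lo;
        r.len = hi - lo;
    }
    return r;
}
```

Without the lower clamp, an extent that begins before the block group would contribute bytes that do not belong to it, which is consistent with the "wrong amount of free space" message quoted in the commit.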
  8. 08 Jan 2012: 1 commit
  9. 15 Dec 2011: 1 commit
  10. 01 Dec 2011: 2 commits
    • Btrfs: reset cluster's max_size when creating bitmap · b78d09bc
      Committed by Alexandre Oliva
      The field that indicates the size of the largest contiguous chunk of
      free space in the cluster is not initialized when setting up bitmaps;
      it's only increased when we find a larger contiguous chunk.  We end up
      retaining a larger value than appropriate for highly-fragmented
      clusters, which may cause pointless searches for large contiguous
      groups, and even cause clusters that do not meet the density
      requirements to be set up.
      Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
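The stale-maximum problem in b78d09bc can be shown with a small userspace sketch. The names `struct cluster_sketch` and `cluster_scan_bitmap()` are hypothetical, and the bitmap is simplified to one byte per allocation unit; the only point being illustrated is the reset before the scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cluster state: largest contiguous free run, in bytes. */
struct cluster_sketch {
    uint64_t max_size;
};

/*
 * Scan a simplified bitmap (bits[i] != 0 means unit i is free) and record
 * the largest contiguous free run. The reset at the top is the fix being
 * illustrated: without it, a larger value left over from an earlier setup
 * would survive, because the loop only ever raises max_size.
 */
void cluster_scan_bitmap(struct cluster_sketch *c, const unsigned char *bits,
                         size_t nbits, uint64_t unit_bytes)
{
    assert(unit_bytes > 0);

    uint64_t run = 0;
    c->max_size = 0;                     /* drop any stale value first */

    for (size_t i = 0; i < nbits; i++) {
        if (bits[i]) {
            run += unit_bytes;
            if (run > c->max_size)
                c->max_size = run;
        } else {
            run = 0;                     /* run broken by an allocated unit */
        }
    }
}
```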
    • Btrfs: initialize new bitmaps' list · f2d0f676
      Committed by Alexandre Oliva
      We're failing to create clusters with bitmaps because
      setup_cluster_no_bitmap checks that the list is empty before inserting
      the bitmap entry in the list for setup_cluster_bitmap, but the list
      field is only initialized when it is restored from the on-disk free
      space cache, or when it is written out to disk.
      
      Besides a potential race condition due to the multiple use of the list
      field, filesystem performance severely degrades over time: as we use
      up all non-bitmap free extents, the try-to-set-up-cluster dance is
      done at every metadata block allocation.  For every block group, we
      fail to set up a cluster, and after failing on them all up to twice,
      we fall back to the much slower unclustered allocation.
      
      To make matters worse, before the unclustered allocation, we try to
      create new block groups until we reach the 1% threshold, which
      introduces additional bitmaps and thus block groups that we'll iterate
      over at each metadata block request.
  11. 22 Nov 2011: 1 commit
    • Btrfs: remove free-space-cache.c WARN during log replay · 24a70313
      Committed by Chris Mason
      The log replay code only partially loads block groups, since
      the block group caching code is able to detect and deal with
      extents the logging code has pinned down.
      
      While the logging code is pinning down block groups, there is
      a bogus WARN_ON we're hitting if the code wasn't able to find
      an extent in the cache.  This commit removes the warning because
      it can happen any time there isn't a valid free space cache
      for that block group.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  12. 20 Nov 2011: 3 commits
  13. 11 Nov 2011: 1 commit
  14. 06 Nov 2011: 2 commits
  15. 20 Oct 2011: 14 commits
    • Btrfs: don't flush the cache inode before writing it · 016fc6a6
      Committed by Josef Bacik
      I noticed we had a little bit of latency when writing out the space cache
      inodes.  It's because we flush it before we write anything in case we have dirty
      pages already there.  This doesn't matter though since we're just going to
      overwrite the space, and there really shouldn't be any dirty pages anyway.  This
      makes some of my tests run a little bit faster.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: seperate out btrfs_block_rsv_check out into 2 different functions · 36ba022a
      Committed by Josef Bacik
      Currently btrfs_block_rsv_check does two things: it will either refill a block
      reserve, as in the truncate or refill case, or it will check whether there is
      enough space in the global reserve and possibly refill it.  However, because of
      overcommit we could be well overcommitting ourselves just to try to refill the
      global reserve, when really we should just be committing the transaction.  So
      break this out into btrfs_block_rsv_refill and btrfs_block_rsv_check.  Refill
      will try to reserve more metadata if it can and btrfs_block_rsv_check will not;
      it will only tell you if the given factor of the total space is still reserved.
      Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
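The split described in 36ba022a separates a read-only question ("is enough still reserved?") from an action ("try to reserve more"). A rough userspace sketch of that separation follows. Everything in it is an assumption made for illustration: `struct rsv_sketch`, `rsv_check()`, `rsv_refill()`, the `available` pool, and the choice to express the factor in tenths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical block reserve: target size and bytes currently reserved. */
struct rsv_sketch {
    uint64_t size;
    uint64_t reserved;
};

/*
 * Check only: report whether at least factor/10 of the target size is
 * still reserved. Never reserves anything. (Tenths are an assumption.)
 */
bool rsv_check(const struct rsv_sketch *rsv, unsigned factor)
{
    assert(factor <= 10);
    return rsv->reserved >= rsv->size * factor / 10;
}

/*
 * Refill: try to bring the reserve up to min_reserved by taking bytes
 * from *available. Returns false and changes nothing if there is not
 * enough space, leaving the caller to decide what to do next.
 */
bool rsv_refill(struct rsv_sketch *rsv, uint64_t min_reserved,
                uint64_t *available)
{
    if (rsv->reserved >= min_reserved)
        return true;

    uint64_t need = min_reserved - rsv->reserved;
    if (*available < need)
        return false;

    *available -= need;
    rsv->reserved += need;
    return true;
}
```

Keeping the check free of side effects lets a caller that gets a negative answer commit the transaction instead of overcommitting, which is the behaviour the commit message argues for.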
    • Btrfs: inline checksums into the disk free space cache · 5b0e95bf
      Committed by Josef Bacik
      Yeah yeah I know this is how we used to do it and then I changed it, but damnit
      I'm changing it back.  The fact is that writing out checksums will modify
      metadata, which could cause us to dirty a block group we've already written out,
      so we have to truncate it and all of its checksums and re-write it, which will
      write new checksums, which could dirty a block group that has already been
      written, and you see where I'm going with this?  This can cause unmount or really
      anything that depends on a transaction commit to take its sweet damned time
      to happen.  So go back to the way it was, only this time we're specifically
      setting NODATACOW because we can't go through the COW pathway anyway and we're
      doing our own built-in cow'ing by truncating the free space cache.  The other
      new thing is that once we truncate the old cache and preallocate the new space, we
      don't need to do that song and dance at all for the rest of the transaction; we
      can just overwrite the existing space with the new cache if the block group
      changes for whatever reason, and the NODATACOW will let us do this fine.  So
      keep track of which transaction we last cleared our cache in, and if we cleared
      it in this transaction just say we're all set up and carry on.  This survives
      xfstests and stress.sh.
      
      The inode cache will continue to use the normal csum infrastructure since it
      only gets written once and there will be no more modifications to the fs tree in
      a transaction commit.
      Signed-off-by: Josef Bacik <josef@redhat.com>
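The "clear once per transaction" bookkeeping mentioned in 5b0e95bf can be sketched as a single remembered transaction id. This is a simplified userspace illustration: `struct bg_cache_sketch` and `prepare_cache()` are hypothetical, and transaction id 0 is assumed to mean "never cleared".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-block-group cache state. */
struct bg_cache_sketch {
    uint64_t cache_cleared_transid;  /* 0 = never cleared (assumption) */
    int truncations;                 /* counts the expensive path */
};

/*
 * Returns true when the cache had to be truncated and preallocated for
 * this transaction, false when that already happened in the same
 * transaction and the existing space can simply be overwritten.
 */
bool prepare_cache(struct bg_cache_sketch *bg, uint64_t transid)
{
    assert(transid != 0);

    if (bg->cache_cleared_transid == transid)
        return false;                /* already set up: carry on */

    bg->truncations++;               /* truncate + preallocate would go here */
    bg->cache_cleared_transid = transid;
    return true;
}
```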
    • Btrfs: check the return value of filemap_write_and_wait in the space cache · 549b4fdb
      Committed by Josef Bacik
      We need to check the return value of filemap_write_and_wait in the space cache
      writeout code.  Also don't set the inode's generation until we're sure nothing
      else is going to fail.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: add a io_ctl struct and helpers for dealing with the space cache · a67509c3
      Committed by Josef Bacik
      In writing and reading the space cache we have one big loop that keeps track of
      which page we are on and then a bunch of sizeable loops underneath this big loop
      to try and read/write out properly.  Especially in the write case this makes
      things hugely complicated and hard to follow, and makes our error checking and
      recovery equally complex.  So add an io_ctl struct with a bunch of helpers to
      keep track of the pages we have, where we are, if we have enough space etc.
      This unifies how we deal with the pages we're writing and keeps all the messy
      tracking internal.  This allows us to kill the big loops in both the read and
      write case and makes reviewing and changing the write and read paths much
      simpler.  I've run xfstests and stress.sh on this code and it survives.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: don't skip writing out a empty block groups cache · f75b130e
      Committed by Josef Bacik
      I noticed a slight bug where we will not bother writing out the block group's
      space cache if its space tree is empty.  Since it could have a cluster
      or pinned extents that need to be written out, this is just not a valid test.
      Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: use the inode's mapping mask for allocating pages · 3b16a4e3
      Committed by Josef Bacik
      Johannes pointed out we were allocating only kernel pages for doing writes,
      which is kind of a big deal if you are on 32bit and have more than a gig of ram.
      So fix our allocations to use the mapping's gfp but still clear __GFP_FS so we
      don't re-enter.  Thanks,
      Reported-by: Johannes Weiner <jweiner@redhat.com>
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: stop passing a trans handle all around the reservation code · 4a92b1b8
      Committed by Josef Bacik
      The only thing that we need to have a trans handle for is in
      reserve_metadata_bytes, and that's to know how much flushing we can do.  So
      instead of passing it around, just check current->journal_info for a
      trans_handle so we know if we can commit a transaction to try and free up space
      or not.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: handle enospc accounting for free space inodes · c09544e0
      Committed by Josef Bacik
      Since free space inodes now use normal checksumming we need to make sure to
      account for their metadata use.  So reserve metadata space, and then if we fail
      to write out the metadata we can just release it, otherwise it will be freed up
      when the io completes.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: put the block group cache after we commit the super · 300e4f8a
      Committed by Josef Bacik
      In moving some enospc stuff around I noticed that when we unmount we are often
      evicting the free space cache inodes before we do our last commit.  This isn't
      bad, but it makes us constantly have to re-read the inodes back.  So instead
      don't evict the cache until after we do our last commit, this will make things a
      little less crappy and makes a future enospc change work properly.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: fix call to btrfs_search_slot in free space cache · a9b5fcdd
      Committed by Josef Bacik
      We are setting ins_len to 1 even though we are just modifying an item that should
      be there already.  This may cause the search code to split nodes on the way
      down needlessly.  Set this to 0 since we aren't inserting anything.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: allow callers to specify if flushing can occur for btrfs_block_rsv_check · 482e6dc5
      Committed by Josef Bacik
      If you run xfstests 224 you will get lots of messages about not being able to
      delete inodes and that they will be cleaned up on the next mount.  This is because
      btrfs_block_rsv_check was not calling reserve_metadata_bytes with the ability to
      flush, so if there was not enough space, it simply failed.  But in the truncate and
      evict cases we could easily flush space to try and get enough space to do our
      work, so make btrfs_block_rsv_check take a flush argument to pass down to
      reserve_metadata_bytes.  Now xfstests 224 runs fine without all those
      complaints.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: ratelimit the generation printk for the free space cache · 6ab60601
      Committed by Josef Bacik
      A user reported getting spammed when moving to 3.0 by this message.  Since we
      switched to the normal checksumming infrastructure all old free space caches
      will be wrong and need to be regenerated so people are likely to see this
      message a lot, so ratelimit it so it doesn't fill up their logs and freak them
      out.  Thanks,
      Reported-by: Andrew Lutomirski <luto@mit.edu>
      Signed-off-by: Josef Bacik <josef@redhat.com>
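Rate limiting a log message, as 6ab60601 does, means allowing a small burst per time window and dropping the rest. The kernel has its own helpers for this (for example printk_ratelimited()); the block below is not that implementation, only a userspace sketch of the idea with hypothetical names (`struct ratelimit_sketch`, `ratelimit_allow()`) and abstract time units.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed-window rate limiter state. */
struct ratelimit_sketch {
    uint64_t interval;      /* window length, in caller-defined time units */
    unsigned burst;         /* messages allowed per window */
    uint64_t window_start;  /* start time of the current window */
    unsigned printed;       /* messages already allowed in this window */
};

/*
 * Returns true if a message may be emitted at time `now`.
 * Assumes `now` never goes backwards.
 */
bool ratelimit_allow(struct ratelimit_sketch *rl, uint64_t now)
{
    assert(now >= rl->window_start);

    if (now - rl->window_start >= rl->interval) {
        rl->window_start = now;   /* open a new window */
        rl->printed = 0;
    }
    if (rl->printed >= rl->burst)
        return false;             /* budget used up: suppress */

    rl->printed++;
    return true;
}
```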
    • Btrfs: use bytes_may_use for all ENOSPC reservations · fb25e914
      Committed by Josef Bacik
      We have been using bytes_reserved for metadata reservations, which is wrong
      since we use that to keep track of outstanding reservations from the allocator.
      This resulted in us doing a lot of silly things to make sure we don't allocate a
      bunch of metadata chunks since we never had a real view of how much space was
      actually in use by metadata.
      
      This passes Arne's enospc test and xfstests as well as my own enospc tests.
      Hopefully this will get us moving in the right direction.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
  16. 11 Sep 2011: 1 commit
  17. 17 Aug 2011: 1 commit