1. 27 Jun, 2012 1 commit
    • Btrfs: avoid waiting for delayed refs when we must not · 8ca78f3e
      Jan Schmidt committed
      We track two conditions to decide if we should sleep while waiting for more
      delayed refs, the number of delayed refs (num_refs) and the first entry in
      the list of blockers (first_seq).
      
      When we suspect staleness, we save num_refs and do one more cycle. If
      nothing changes, we then save first_seq for later comparison and do
      wait_event. We ought to save first_seq the very same moment we're saving
      num_refs. Otherwise we cannot be sure that nothing has changed and we might
      start waiting when we shouldn't, which could lead to starvation.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
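The rule in this commit message (record both conditions at the same moment, and sleep only if neither has moved since) can be sketched in user-space C. The struct and function names below are hypothetical and only mirror the two values named in the message, num_refs and first_seq; this illustrates the comparison, not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two conditions named in the commit message. */
struct sketch_refs_state {
    uint64_t num_refs;   /* number of delayed refs */
    uint64_t first_seq;  /* first entry in the list of blockers */
};

/* Save both values in one step so they describe the same moment. */
static struct sketch_refs_state sketch_snapshot(const struct sketch_refs_state *cur)
{
    assert(cur);
    struct sketch_refs_state saved = { cur->num_refs, cur->first_seq };
    return saved;
}

/* Sleeping is only justified if neither value changed since the snapshot. */
static bool sketch_may_wait(const struct sketch_refs_state *cur,
                            const struct sketch_refs_state *saved)
{
    assert(cur && saved);
    return cur->num_refs == saved->num_refs &&
           cur->first_seq == saved->first_seq;
}
```

If first_seq were saved one cycle later than num_refs, a change in first_seq between the two saves would go unnoticed, which is the case the commit closes.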
  2. 30 May, 2012 1 commit
    • Btrfs: convert the inode bit field to use the actual bit operations · 72ac3c0d
      Josef Bacik committed
      Miao pointed this out while I was working on an orphan problem that messing
      with a bitfield where different ranges are protected by different locks
      doesn't work out right.  Turns out we've been doing this forever where we
      have different parts of the bit field protected by either no lock at all or
      different locks which could cause all sorts of weird problems including the
      issue I was hitting.  So instead make a runtime_flags thing that we use the
      normal bit operations on that are all atomic so we can keep having our
      no/different locking for the different flags and then make force_compress
      its own thing so it can be treated normally.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
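The idea of replacing a lock-dependent bit field with a flags word that is only touched by atomic bit operations can be sketched in user-space C. This is a minimal illustration that assumes C11 atomics as a stand-in for the kernel's bit operations; the flag names and the struct are hypothetical, and only runtime_flags and force_compress come from the commit message.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag numbers, for illustration only. */
enum sketch_runtime_flag {
    SKETCH_FLAG_A = 0,
    SKETCH_FLAG_B = 1,
};

struct sketch_inode {
    atomic_ulong runtime_flags;  /* every update is a single atomic operation */
    unsigned force_compress;     /* its own field, no storage shared with the flags */
};

static unsigned long sketch_mask(enum sketch_runtime_flag bit)
{
    assert((unsigned)bit < sizeof(unsigned long) * CHAR_BIT);
    return 1UL << bit;
}

static void sketch_set_bit(struct sketch_inode *inode, enum sketch_runtime_flag bit)
{
    atomic_fetch_or(&inode->runtime_flags, sketch_mask(bit));
}

static void sketch_clear_bit(struct sketch_inode *inode, enum sketch_runtime_flag bit)
{
    atomic_fetch_and(&inode->runtime_flags, ~sketch_mask(bit));
}

static bool sketch_test_bit(struct sketch_inode *inode, enum sketch_runtime_flag bit)
{
    return (atomic_load(&inode->runtime_flags) & sketch_mask(bit)) != 0;
}

/* Sets the bit and reports whether it was already set. */
static bool sketch_test_and_set_bit(struct sketch_inode *inode,
                                    enum sketch_runtime_flag bit)
{
    unsigned long mask = sketch_mask(bit);
    return (atomic_fetch_or(&inode->runtime_flags, mask) & mask) != 0;
}
```

Because each update is one atomic read-modify-write on the whole word, two flags guarded by different locks (or by none) can no longer overwrite each other, which a compiler-generated bit field update does not guarantee.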
  3. 26 May, 2012 1 commit
  4. 11 May, 2012 1 commit
  5. 06 May, 2012 1 commit
    • Btrfs: avoid sleeping in verify_parent_transid while atomic · b9fab919
      Chris Mason committed
      verify_parent_transid needs to lock the extent range to make
      sure no IO is underway, and so it can safely clear the
      uptodate bits if our checks fail.
      
      But, a few callers are using it with spinlocks held.  Most
      of the time, the generation numbers are going to match, and
      we don't want to switch to a blocking lock just for the error
      case.  This adds an atomic flag to verify_parent_transid,
      and changes it to return EAGAIN if it needs to block to
      properly verify things.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
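The control flow described here (fast path without a lock, -EAGAIN when the caller cannot sleep, blocking path otherwise) can be sketched as follows. This is a user-space illustration only: the struct, the function name, the negative-errno convention, and the return value 1 for a confirmed mismatch are assumptions, and the blocking extent lock is represented by a comment.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cached tree block. */
struct sketch_buffer {
    uint64_t generation;
    bool uptodate;
};

/*
 * Returns 0 when the generation matches, -EAGAIN when the check would have
 * to block but the caller is atomic, and 1 (an illustrative failure value)
 * after the buffer has been invalidated.
 */
static int sketch_verify_parent_transid(struct sketch_buffer *eb,
                                        uint64_t parent_transid, bool atomic)
{
    assert(eb);

    /* Common case: generations match, so no lock is needed at all. */
    if (eb->generation == parent_transid)
        return 0;

    /* The caller holds a spinlock, so it must retry from a context that may sleep. */
    if (atomic)
        return -EAGAIN;

    /* A real implementation would take the blocking extent lock here. */
    eb->uptodate = false;
    return 1;
}
```

A caller holding a spinlock would pass atomic as true, and on -EAGAIN drop the lock and repeat the lookup in a context where blocking is allowed.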
  6. 28 Apr, 2012 1 commit
  7. 19 Apr, 2012 2 commits
    • btrfs: don't return EINTR · b9688bb8
      Arne Jansen committed
      It is basically a good thing if we are interruptible when waiting for
      free space, but the generality in which it is implemented currently
      leads to system calls being interruptible that are not documented this
      way. For example git can't handle interrupted unlink(), leading to
      corrupt repos under space pressure.
      Instead we raise the bar to only be interruptible by SIGKILL.
      Thanks to David Sterba for suggesting this.
      Signed-off-by: Arne Jansen <sensille@gmx.net>
    • Btrfs: double unlock bug in error handling · 253beebd
      Dan Carpenter committed
      The caller expects this function to return with the lock held and
      releases it immediately on error.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
  8. 13 Apr, 2012 3 commits
  9. 29 Mar, 2012 2 commits
  10. 27 Mar, 2012 10 commits
  11. 22 Mar, 2012 4 commits
  12. 24 Feb, 2012 1 commit
  13. 23 Feb, 2012 1 commit
  14. 17 Feb, 2012 1 commit
  15. 15 Feb, 2012 1 commit
    • Btrfs: fix trim 0 bytes after a device delete · 2cac13e4
      Liu Bo committed
      A user reported a bug in btrfs's trim: we trim 0 bytes after a device
      delete.
      
      The reproducer:
      
      $ mkfs.btrfs disk1
      $ mkfs.btrfs disk2
      $ mount disk1 /mnt
      $ fstrim -v /mnt
      $ btrfs device add disk2 /mnt
      $ btrfs device del disk1 /mnt
      $ fstrim -v /mnt
      
      This is because after we delete the device, the block group may start from
      a non-zero offset, which confuses trim into discarding nothing.
      Reported-by: Lutz Euler <lutz.euler@freenet.de>
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
  16. 27 Jan, 2012 1 commit
    • Btrfs: fix enospc error caused by wrong checks of the chunk · 9e622d6b
      Miao Xie committed
      When we ran a sysbench test on inline files, enospc errors happened easily
      even though there was plenty of free disk space that could be allocated for
      new chunks.
      
      Reproduce steps:
       # mkfs.btrfs -b $((2 * 1024 * 1024 * 1024)) <test partition>
       # mount <test partition> /mnt
       # ulimit -n 102400
       # cd /mnt
       # sysbench --num-threads=1 --test=fileio --file-num=81920 \
       > --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
       > --file-test-mode=seqwr prepare
       # sysbench --num-threads=1 --test=fileio --file-num=81920 \
       > --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
       > --file-test-mode=seqwr run
       <soon afterwards, BUG_ON() was triggered by an enospc error>
      
      The reason for this bug is:
      We can now reserve more space than is free in the existing chunks, provided
      there is enough free disk space for new chunks. In that case, the space
      allocator should force the allocation of a new chunk if there is no free
      space in the free space cache. But two wrong checks break this operation.
      
      One is
      	if (ret == -ENOSPC && num_bytes > min_alloc_size)
      in btrfs_reserve_extent(). It is wrong: we should try to allocate a new
      chunk even if we fail to allocate free space at the minimum allocatable size.
      
      The other is
      	if (space_info->force_alloc)
      		force = space_info->force_alloc;
      in do_chunk_alloc(). It makes the allocator ignore CHUNK_ALLOC_FORCE if
      someone has set ->force_alloc to CHUNK_ALLOC_LIMITED, which causes the
      enospc error.
      
      Fix these two wrong checks. For the second one, we change the values of
      CHUNK_ALLOC_LIMITED and CHUNK_ALLOC_FORCE so that CHUNK_ALLOC_FORCE is
      greater than CHUNK_ALLOC_LIMITED, since CHUNK_ALLOC_FORCE has higher
      priority. And if the value passed in by the caller is greater than
      ->force_alloc, use the passed value.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
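The second fix, ordering the force levels so that the stronger request always wins, can be sketched as below. The numeric values and the CHUNK_ALLOC_NO_FORCE name are assumptions made for illustration; the commit message only states that CHUNK_ALLOC_FORCE must be greater than CHUNK_ALLOC_LIMITED. The struct and helper names are hypothetical.

```c
#include <assert.h>

/* Assumed values; only the ordering FORCE > LIMITED comes from the message. */
enum {
    CHUNK_ALLOC_NO_FORCE = 0,
    CHUNK_ALLOC_LIMITED  = 1,
    CHUNK_ALLOC_FORCE    = 2,
};

/* Hypothetical stand-in for the space info structure. */
struct sketch_space_info {
    int force_alloc;
};

/*
 * Pick the stronger of the caller's request and the pending ->force_alloc,
 * instead of unconditionally overwriting the caller's value.
 */
static int sketch_effective_force(const struct sketch_space_info *space_info,
                                  int force)
{
    assert(space_info);
    if (force < space_info->force_alloc)
        force = space_info->force_alloc;
    return force;
}
```

With this ordering, a caller passing CHUNK_ALLOC_FORCE keeps it even when ->force_alloc is CHUNK_ALLOC_LIMITED, which is the case the old check got wrong.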
  17. 17 Jan, 2012 8 commits
    • Btrfs: use larger system chunks · 96bdc7dc
      Chris Mason committed
      system chunks by default are very small.  This makes them slightly
      larger and also fixes the conditional checks to make sure we don't
      allocate a billion of them at once.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: add a delalloc mutex to inodes for delalloc reservations · f248679e
      Josef Bacik committed
      I was using i_mutex for this, but we're getting bogus lockdep warnings by doing
      that and there's no real way to get rid of those, so just stop using i_mutex to
      protect delalloc metadata reservations and use a delalloc mutex instead.  This
      shouldn't be contended often at all, only if you are writing and mmap writing to
      the file at the same time.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: space leak tracepoints · 8c2a3ca2
      Josef Bacik committed
      This in addition to a script in my btrfs-tracing tree will help track down space
      leaks when we're getting space left over in block groups on umount.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: add allocator tracepoints · 3f7de037
      Josef Bacik committed
      I used these tracepoints when figuring out what the cluster stuff was doing, so
      add them to mainline in case we need to profile this stuff again.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: implement online profile changing · e4d8ec0f
      Ilya Dryomov committed
      Profile changing is done by launching a balance with
      BTRFS_BALANCE_CONVERT bits set and target fields of respective
      btrfs_balance_args structs initialized.  Profile reducing code in this
      case will pick restriper's target profile if it's available instead of
      doing a blind reduce.  If target profile is not yet available it goes
      back to a plain reduce.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • Btrfs: do not reduce profile in do_chunk_alloc() · 70922617
      Ilya Dryomov committed
      Every caller of do_chunk_alloc() feeds it the reduced allocation
      profile, so stop trying to reduce it one more time.  Instead check the
      validity of the passed profile.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • Btrfs: make avail_*_alloc_bits fields dynamic · 10ea00f5
      Ilya Dryomov committed
      Currently when new chunks are created respective avail_alloc_bits field
      is updated to reflect profiles of all chunks present in the system.
      However when chunks are removed profile bits are never cleared.
      
      This patch clears profile bit of respective avail_alloc_bits field when
      the last chunk with that profile is removed.  Restriper needs this to
      properly operate when "downgrading".
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
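The behaviour this patch describes, setting a profile bit when a chunk appears and clearing it only when the last chunk with that profile goes away, can be sketched in user-space C. The per-bit counter used here is purely an illustration device and an assumption; the commit message does not say how the kernel detects the last chunk. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_BITS 64

/* Hypothetical stand-in for one avail_*_alloc_bits field plus bookkeeping. */
struct sketch_avail {
    uint64_t avail_alloc_bits;
    unsigned chunks_per_bit[SKETCH_NR_BITS];  /* illustration only */
};

/* A chunk with the given profile bit was created or read from disk. */
static void sketch_chunk_added(struct sketch_avail *a, unsigned bit)
{
    assert(a && bit < SKETCH_NR_BITS);
    a->chunks_per_bit[bit]++;
    a->avail_alloc_bits |= UINT64_C(1) << bit;
}

/* A chunk was removed; clear the bit only if it was the last of its profile. */
static void sketch_chunk_removed(struct sketch_avail *a, unsigned bit)
{
    assert(a && bit < SKETCH_NR_BITS);
    if (a->chunks_per_bit[bit] > 0 && --a->chunks_per_bit[bit] == 0)
        a->avail_alloc_bits &= ~(UINT64_C(1) << bit);
}
```

Before the patch, the equivalent of sketch_chunk_removed() never touched the bits, so a profile that no longer existed on disk still looked available.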
    • Btrfs: add BTRFS_AVAIL_ALLOC_BIT_SINGLE bit · a46d11a8
      Ilya Dryomov committed
      Right now on-disk BTRFS_BLOCK_GROUP_* profile bits are used for
      avail_{data,metadata,system}_alloc_bits fields, which gather info about
      available allocation profiles in the FS.  When chunk is created or read
      from disk, its profile is OR'ed with the corresponding avail_alloc_bits
      field.  Since SINGLE is denoted by 0 in the on-disk format, currently
      there is no way to tell when such chunks become available.  Restriper
      needs that information, so add a separate bit for SINGLE profile.
      
      This bit is going to be in-memory only, it should never be written out
      to disk, so it's not a disk format change.  However to avoid remappings
      in future, reserve corresponding on-disk bit.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
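Why a profile encoded as 0 needs its own in-memory bit can be shown with a short sketch: OR-ing 0 into a mask leaves it unchanged, so SINGLE chunks would be invisible. The bit positions and names below are hypothetical and chosen for illustration; they are not taken from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define SKETCH_PROFILE_RAID0   (UINT64_C(1) << 3)
#define SKETCH_PROFILE_RAID1   (UINT64_C(1) << 4)
#define SKETCH_PROFILE_MASK    (SKETCH_PROFILE_RAID0 | SKETCH_PROFILE_RAID1)
#define SKETCH_AVAIL_SINGLE    (UINT64_C(1) << 48)  /* in-memory only */

/*
 * Translate a chunk's on-disk flags into the value OR'ed into the
 * avail bits: a profile of 0 (SINGLE) maps to the dedicated bit.
 */
static uint64_t sketch_to_avail_bits(uint64_t chunk_flags)
{
    uint64_t profile = chunk_flags & SKETCH_PROFILE_MASK;
    return profile ? profile : SKETCH_AVAIL_SINGLE;
}

/* Record that a chunk with these flags exists. */
static void sketch_note_chunk(uint64_t *avail_alloc_bits, uint64_t chunk_flags)
{
    assert(avail_alloc_bits);
    *avail_alloc_bits |= sketch_to_avail_bits(chunk_flags);
}
```

Since the extra bit is never written to disk, the mapping is one-way: the on-disk value for a SINGLE chunk stays 0.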