  1. 17 Jan, 2012 21 commits
  2. 11 Jan, 2012 11 commits
    • Btrfs: fix possible deadlock when opening a seed device · b367e47f
      Li Zefan committed
      The correct lock order is uuid_mutex -> volume_mutex -> chunk_mutex,
      but when we mount a filesystem which has backing seed devices, we have
      this lock chain:
      
          open_ctree()
              lock(chunk_mutex);
              read_chunk_tree();
                  read_one_dev();
                      open_seed_devices();
                          lock(uuid_mutex);
      
      and then we hit a lockdep splat.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
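The lock-order rule in this commit message can be illustrated with a small rank check. This is only a sketch, not kernel code: the `lock_rank` enum and the `lock_chain_ok()` helper are hypothetical names, and the model assumes locks are acquired in a nested chain with no releases in between.

```c
#include <assert.h>
#include <stddef.h>

/* Ranks follow the order stated in the commit message:
 * uuid_mutex -> volume_mutex -> chunk_mutex.
 * The enum itself is a hypothetical illustration. */
enum lock_rank { UUID_MUTEX = 0, VOLUME_MUTEX = 1, CHUNK_MUTEX = 2 };

/*
 * Return 1 if a chain of nested acquisitions respects the ranking,
 * 0 if some lock is taken while a lock of equal or higher rank is held.
 * Checking adjacent pairs is enough because the chain must be strictly
 * increasing.
 */
int lock_chain_ok(const enum lock_rank *chain, size_t n)
{
    assert(chain != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        if (chain[i] <= chain[i - 1])
            return 0;
    }
    return 1;
}
```

Under this model, the chain from open_ctree() (chunk_mutex first, then uuid_mutex) is rejected, while taking uuid_mutex, volume_mutex and chunk_mutex in that order is accepted.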
    • Btrfs: update global block_rsv when creating a new block group · c7c144db
      Li Zefan committed
      A bug was triggered while using a seed device:
      
          # mkfs.btrfs /dev/loop1
          # btrfstune -S 1 /dev/loop1
          # mount /dev/loop1 /mnt
          # btrfs dev add /dev/loop2 /mnt
      
      btrfs: block rsv returned -28
      ------------[ cut here ]------------
      WARNING: at fs/btrfs/extent-tree.c:5969 btrfs_alloc_free_block+0x166/0x396 [btrfs]()
      ...
      Call Trace:
      ...
      [<f7b7c31c>] btrfs_cow_block+0x101/0x147 [btrfs]
      [<f7b7eaa6>] btrfs_search_slot+0x1b8/0x55f [btrfs]
      [<f7b7f844>] btrfs_insert_empty_items+0x42/0x7f [btrfs]
      [<f7b7f8c1>] btrfs_insert_item+0x40/0x7e [btrfs]
      [<f7b8ac02>] btrfs_make_block_group+0x243/0x2aa [btrfs]
      [<f7bb3f53>] __btrfs_alloc_chunk+0x672/0x70e [btrfs]
      [<f7bb41ff>] init_first_rw_device+0x77/0x13c [btrfs]
      [<f7bb5a62>] btrfs_init_new_device+0x664/0x9fd [btrfs]
      [<f7bbb65a>] btrfs_ioctl+0x694/0xdbe [btrfs]
      [<c04f55f7>] do_vfs_ioctl+0x496/0x4cc
      [<c04f5660>] sys_ioctl+0x33/0x4f
      [<c07b9edf>] sysenter_do_call+0x12/0x38
      ---[ end trace 906adac595facc7d ]---
      
      Since the seed device is read-only, there is no usable space in the filesystem.
      When we then add a sprout device, the kernel creates a METADATA block group
      and a SYSTEM block group, which provide free space we can reserve. The
      reservation still fails, however, because the global block_rsv hasn't been
      updated accordingly.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
    • Btrfs: rewrite btrfs_trim_block_group() · 7fe1e641
      Li Zefan committed
      There are various bugs in block group trimming:
      
      - It may trim from an offset smaller than the user-specified offset.
      - It may trim beyond the user-specified range.
      - It may leak free space for extents smaller than the specified minlen.
      - It may truncate the last trimmed extent and thus leak free space.
      - With mixed extents+bitmaps, some extents may not be trimmed.
      - With mixed extents+bitmaps, some bitmaps may not be trimmed (possibly
        none at all). Even for those that are trimmed, not all the free space
        in the bitmaps is trimmed.
      
      I rewrote btrfs_trim_block_group() and broke it into two functions:
      one trims extents only, and the other trims bitmaps only.
      
      Before patching:
      
      	# fstrim -v /mnt/
      	/mnt/: 1496465408 bytes were trimmed
      
      After patching:
      
      	# fstrim -v /mnt/
      	/mnt/: 2193768448 bytes were trimmed
      
      And this matches the total free space:
      
      	# btrfs fi df /mnt
      	Data: total=3.58GB, used=1.79GB
      	System, DUP: total=8.00MB, used=4.00KB
      	System: total=4.00MB, used=0.00
      	Metadata, DUP: total=205.12MB, used=97.14MB
      	Metadata: total=8.00MB, used=0.00
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
    • Btrfs: simplfy calculation of stripe length for discard operation · ec9ef7a1
      Li Zefan committed
      For btrfs raid, when discarding a range of space, we need to know
      the start offset and length to discard on each device; this is computed
      in btrfs_map_block().
      
      However, the calculation is a bit complex for raid0 and raid10, so I
      reimplemented it based on the following observation:
      
              dev1          dev2           dev3    (raid0)
              -----------------------------------
              s0 s3 s6      s1 s4 s7       s2 s5
      
      Each device has either (total_stripes / nr_dev) stripes or one more than that.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
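The per-device stripe count described above can be written out as a short helper. This is a minimal sketch, not the code in btrfs_map_block(): the function name `stripes_on_device()` is hypothetical, and it assumes the discarded range starts on device 0 and that stripes are distributed round-robin, as in the raid0 diagram.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of stripes that land on device dev_index when total_stripes
 * consecutive stripes are laid out round-robin across nr_dev devices,
 * starting at device 0. Every device gets total_stripes / nr_dev
 * stripes, and the first (total_stripes % nr_dev) devices get one more.
 */
uint64_t stripes_on_device(uint64_t total_stripes, uint32_t nr_dev,
                           uint32_t dev_index)
{
    assert(nr_dev > 0 && dev_index < nr_dev);
    uint64_t base = total_stripes / nr_dev;
    uint64_t extra = total_stripes % nr_dev;
    return base + (dev_index < extra ? 1 : 0);
}
```

For the layout in the diagram (stripes s0 to s7 over three devices), the sketch gives 3, 3 and 2 stripes for dev1, dev2 and dev3. A range that starts on a different device would need the device index rotated first.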
    • Btrfs: don't pre-allocate btrfs bio · de11cc12
      Li Zefan committed
      We pre-allocate a btrfs bio with a fixed size, and may then re-allocate
      memory if we find the stripes are bigger than the fixed size. This
      pre-allocation is not necessary.
      
      Also, we don't have to calculate the stripe number twice.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
    • Btrfs: don't pass a trans handle unnecessarily in volumes.c · 125ccb0a
      Li Zefan committed
      Some functions never use the transaction handle passed to them.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
    • Btrfs: reserve metadata space in btrfs_ioctl_setflags() · 4da6f1a3
      Li Zefan committed
      Check and reserve space for btrfs_update_inode().
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
    • Btrfs: remove BUG_ON()s in btrfs_ioctl_setflags() · f062abf0
      Li Zefan committed
      We can recover from errors and return -errno to user space.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
    • Btrfs: check the return value of io_ctl_init() · 706efc66
      Li Zefan committed
      It can return -ENOMEM.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
    • Btrfs: avoid possible NULL deref in io_ctl_drop_pages() · a1ee5a45
      Li Zefan committed
      If we hit a failure path in io_ctl_prepare_pages(), the
      io_ctl->pages[] array may contain some NULL pointers.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
    • Btrfs: add pinned extents to on-disk free space cache correctly · db804f23
      Li Zefan committed
      I got this while running xfstests:
      
      [24256.836098] block group 317849600 has an wrong amount of free space
      [24256.836100] btrfs: failed to load free space cache for block group 317849600
      
      We should clamp the extent returned by find_first_extent_bit()
      so that the start of the extent is never smaller than the start of the
      block group.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
  3. 08 Jan, 2012 2 commits
  4. 07 Jan, 2012 4 commits
    • Btrfs: test free space only for unclustered allocation · a5f6f719
      Alexandre Oliva committed
      Since the clustered allocation may be taking extents from a different
      block group, there's no point in spin-locking and testing the current
      block group free space before attempting to allocate space from a
      cluster, even more so when we might refrain from even trying the
      cluster in the current block group because, after the cluster was set
      up, not enough free space remained.  Furthermore, cluster creation
      attempts fail fast when the block group doesn't have enough free
      space, so the test was completely superfluous.
      
      I've moved the free space test past the cluster allocation attempt,
      where it is more useful, and arranged for a cluster in the current
      block group to be released before trying an unclustered allocation,
      when we reach the LOOP_NO_EMPTY_SIZE stage, so that the free space in
      the cluster stands a chance of being combined with additional free
      space in the block group so as to succeed in the allocation attempt.
      Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: use bigger metadata chunks on bigger filesystems · 1100373f
      Chris Mason committed
      The 256MB chunk is a little small on a huge FS.  This scales up the
      chunk size.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: lower the bar for chunk allocation · cf1d72c9
      Chris Mason committed
      The chunk allocation code has tried to keep a pretty tight lid on creating new
      metadata chunks.  This is partially because in the past the reservation
      code didn't give us an accurate idea of how much space was being used.
      
      The new code is much more accurate, so we're able to get rid of some of these
      checks.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: run chunk allocations while we do delayed refs · 203bf287
      Chris Mason committed
      Btrfs tries to batch extent allocation tree changes to improve performance
      and reduce metadata thrashing.  But it doesn't allocate new metadata chunks
      while it is doing allocations for the extent allocation tree.
      
      This commit changes the delayed reference code to do chunk allocations if we're
      getting low on room.  It prevents crashes and improves performance.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  5. 05 Jan, 2012 2 commits