1. 28 Apr, 2012 3 commits
  2. 19 Apr, 2012 19 commits
  3. 13 Apr, 2012 7 commits
  4. 30 Mar, 2012 1 commit
  5. 29 Mar, 2012 10 commits
    • Btrfs: update to the right index of defragment · e1f041e1
      Committed by Liu Bo
      When we use autodefrag, we forget to update the index that records
      the last page we have dirtied, so we set the dirty flag on the same
      set of pages again and again.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
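The effect of a missing index update can be illustrated with a toy model. This is only a sketch: `defrag_pass`, `batch`, and `update_index` are hypothetical names and do not come from the Btrfs source.

```python
def defrag_pass(total_pages, batch, update_index):
    """Toy model of a defrag loop that dirties pages in batches.

    Returns the page indexes that were dirtied, in order.
    Illustrative only; this is not Btrfs code.
    """
    dirtied = []
    last_index = 0  # first page that has not been dirtied yet
    for _ in range(total_pages // batch):
        end = min(last_index + batch, total_pages)
        dirtied.extend(range(last_index, end))
        if update_index:
            last_index = end  # the fix: remember how far we got
    return dirtied


buggy = defrag_pass(total_pages=8, batch=4, update_index=False)
fixed = defrag_pass(total_pages=8, batch=4, update_index=True)
print(buggy)  # [0, 1, 2, 3, 0, 1, 2, 3]
print(fixed)  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Without the update, every batch starts from the same index, so the same pages are dirtied repeatedly and the rest are never reached.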
    • Btrfs: do not bother to defrag an extent if it is a big real extent · 66c26892
      Committed by Liu Bo
      $ mkfs.btrfs /dev/sdb7
      $ mount /dev/sdb7 /mnt/btrfs/ -oautodefrag
      $ dd if=/dev/zero of=/mnt/btrfs/foobar bs=4k count=10 oflag=direct 2>/dev/null
      $ filefrag -v /mnt/btrfs/foobar
      Filesystem type is: 9123683e
      File size of /mnt/btrfs/foobar is 40960 (10 blocks, blocksize 4096)
       ext logical physical expected length flags
         0       0     3072              10 eof
      /mnt/btrfs/foobar: 1 extent found
      
      Now we have a big real extent [0, 40960), but autodefrag will still defrag it:
      after the sync below, the physical offset moves from 3072 to 3082.
      
      $ sync
      $ filefrag -v /mnt/btrfs/foobar
      Filesystem type is: 9123683e
      File size of /mnt/btrfs/foobar is 40960 (10 blocks, blocksize 4096)
       ext logical physical expected length flags
         0       0     3082              10 eof
      /mnt/btrfs/foobar: 1 extent found
      
      So if we find that the range is already one big real extent, there is
      nothing to gain from defragmenting it; just skip it.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: add a check to decide if we should defrag the range · 17ce6ef8
      Committed by Liu Bo
      If our file's layout is as follows:
      | hole | data1 | hole | data2 |
      
      we do not need to defrag this file, because the holes keep the data
      extents from being merged into one extent.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
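The check described in this commit message can be sketched as a small predicate over the file layout. This is a simplified model under stated assumptions: `range_needs_defrag` and the `HOLE`/`DATA` labels are hypothetical, and the real check works on extent maps rather than a list of labels.

```python
HOLE, DATA = "hole", "data"


def range_needs_defrag(layout):
    """Return True if any two neighbouring entries are both data extents.

    `layout` lists the extent kinds in file order. Two data extents
    separated by a hole cannot be merged, so they do not count.
    """
    return any(a == DATA and b == DATA for a, b in zip(layout, layout[1:]))


print(range_needs_defrag([HOLE, DATA, HOLE, DATA]))  # False
print(range_needs_defrag([DATA, DATA, HOLE]))        # True
```

For the layout in the commit message, no data extent has a data neighbour, so the predicate reports that there is nothing to merge.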
    • Btrfs: fix recursive defragment with autodefrag option · 4cb13e5d
      Committed by Liu Bo
      $ mkfs.btrfs disk
      $ mount disk /mnt -o autodefrag
      $ dd if=/dev/zero of=/mnt/foobar bs=4k count=10 2>/dev/null && sync
      $ for i in `seq 9 -2 0`; do dd if=/dev/zero of=/mnt/foobar bs=4k count=1 \
        seek=$i conv=notrunc 2> /dev/null; done && sync
      
      Then "foobar" gets defragmented again and again.
      The same happens with the option "-o autodefrag,compress".
      
      Reasons:
      When the cleaner kthread fetches inodes from the defrag tree and defrags
      them, it dirties pages and submits them. This leads to another data COW,
      in which the inode being processed is inserted into the defrag tree again.
      
      This patch sets a rule for the COW code: insert an inode only when we're really
      going to defragment it.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
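The feedback loop between the cleaner and the COW path can be modelled in a few lines. This is a sketch only: the commit message does not state the exact condition the patch uses, so the boolean `requeue_on_cow` is a hypothetical stand-in for "COW inserts the inode again", and `max_rounds` exists just to stop the toy loop.

```python
def run_cleaner(defrag_tree, requeue_on_cow, max_rounds=5):
    """Count how many defrag rounds run before the tree drains.

    `defrag_tree` is a set of inode numbers waiting to be defragged.
    Illustrative model only; not the Btrfs implementation.
    """
    rounds = 0
    while defrag_tree and rounds < max_rounds:
        inode = defrag_tree.pop()
        rounds += 1
        # Defragging dirties pages; writing them back goes through COW.
        if requeue_on_cow:
            defrag_tree.add(inode)  # old behaviour: COW queues the inode again
    return rounds


print(run_cleaner({257}, requeue_on_cow=True))   # 5 (stopped by max_rounds)
print(run_cleaner({257}, requeue_on_cow=False))  # 1
```

With the inode queued again on every COW, the loop only ends because of the artificial round limit; without it, the inode is processed once.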
    • Btrfs: fix the mismatch of page->mapping · 1f12bd06
      Committed by Liu Bo
      commit 600a45e1
      (Btrfs: fix deadlock on page lock when doing auto-defragment)
      fixes the deadlock on the page lock, but it also introduces another bug.
      
      A page may have been truncated between the unlock and the relock,
      so we need to look it up again to get the right one.
      
      And since we hold i_mutex, the inode size remains unchanged and
      we can drop the isize overflow checks.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: fix race between direct io and autodefrag · ecb8bea8
      Committed by Liu Bo
      The bug shows up when running xfstests 209 with autodefrag.
      
      The race is as follows:
             t1                       t2(autodefrag)
         direct IO
           invalidate pagecache
           dio(old data)             add_inode_defrag
           invalidate pagecache
         endio
      
         direct IO
           invalidate pagecache
                                      run_defrag
                                        readpage(old data)
                                        set page dirty (old data)
           dio(new data, rewrite)
           invalidate pagecache (*)
           endio
      
      t2 (autodefrag) reads old data into the page cache via readpage and marks
      the pages dirty.  Meanwhile, invalidate pagecache (*) fails because the
      pages are dirty.  The old data may then be flushed to disk by the
      flush thread, which leads to data loss.
      
      The same applies to user-space defragment programs.
      
      The patch fixes this race by holding i_mutex while we readpage and set the page dirty.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
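The interleaving in the diagram can be replayed deterministically with a toy page cache. This is a simplified model, not kernel code: `PageCache`, `racy_schedule`, and `serialized_schedule` are hypothetical names, the file has a single page, and the initial writeback in the direct IO path is an assumption about how the path flushes dirty pages before invalidating them.

```python
class PageCache:
    """Toy single-file page cache: index -> {"data": ..., "dirty": bool}."""

    def __init__(self):
        self.pages = {}

    def invalidate(self, idx):
        """Drop a clean page; return False if the page is dirty."""
        page = self.pages.get(idx)
        if page is not None and page["dirty"]:
            return False
        self.pages.pop(idx, None)
        return True

    def read_and_dirty(self, idx, disk):
        """Defrag side: read the page from disk and mark it dirty."""
        self.pages[idx] = {"data": disk[idx], "dirty": True}

    def writeback(self, disk):
        """Flush every dirty page to disk and mark it clean."""
        for idx, page in self.pages.items():
            if page["dirty"]:
                disk[idx] = page["data"]
                page["dirty"] = False


def racy_schedule():
    """Replay the second half of the diagram without any locking."""
    disk, cache = {0: "old"}, PageCache()
    cache.writeback(disk)
    cache.invalidate(0)                # t1: direct IO invalidates the page cache
    cache.read_and_dirty(0, disk)      # t2: defrag reads old data, sets it dirty
    disk[0] = "new"                    # t1: dio writes new data
    invalidated = cache.invalidate(0)  # t1: (*) fails, the page is dirty
    cache.writeback(disk)              # flush thread writes the old data back
    return disk[0], invalidated


def serialized_schedule(defrag_first):
    """Same work, but each side runs as one unit, as if under one mutex."""
    disk, cache = {0: "old"}, PageCache()

    def direct_io():
        cache.writeback(disk)
        cache.invalidate(0)
        disk[0] = "new"
        cache.invalidate(0)

    def defrag():
        cache.read_and_dirty(0, disk)

    for step in ([defrag, direct_io] if defrag_first else [direct_io, defrag]):
        step()
    cache.writeback(disk)
    return disk[0]


print(racy_schedule())             # ('old', False)
print(serialized_schedule(True))   # new
print(serialized_schedule(False))  # new
```

In the racy replay the new data is overwritten by the stale dirty page. Once the two sides cannot interleave, either order leaves the new data on disk.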
    • Btrfs: fix deadlock during allocating chunks · 15d1ff81
      Committed by Liu Bo
      This deadlock comes from xfstests 251.
      
      We hold chunk_mutex throughout the whole of a chunk allocation.
      But if we find that we've used up the system chunk space, we need to allocate a
      new system chunk, which recurses into chunk allocation and ends
      up deadlocking on chunk_mutex.
      So instead we need to allocate the system chunk first if we find we're in ENOSPC.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
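The recursion on a non-reentrant mutex, and the "allocate the system chunk first" ordering, can be sketched as follows. Everything here is a hypothetical model: `ChunkAllocator`, `SYSTEM_CHUNK_UNITS`, and the unit accounting are invented for illustration, and a non-blocking try-lock stands in for the kernel mutex so the example reports the deadlock instead of hanging.

```python
import threading

SYSTEM_CHUNK_UNITS = 4  # hypothetical: room that one system chunk provides


class ChunkAllocator:
    def __init__(self, system_units):
        self.chunk_mutex = threading.Lock()  # non-reentrant
        self.system_units = system_units     # free system chunk space
        self.allocated = []

    def _record_chunk(self, kind):
        """Record a new chunk; this consumes one unit of system space."""
        if self.system_units == 0:
            # Out of system space in the middle of an allocation: recurse.
            self.alloc_buggy("system")
        self.system_units -= 1
        self.allocated.append(kind)

    def alloc_buggy(self, kind):
        # Try-lock instead of a blocking lock, so the nested call fails
        # loudly where a real mutex would block forever.
        if not self.chunk_mutex.acquire(blocking=False):
            raise RuntimeError("deadlock on chunk_mutex")
        try:
            self._record_chunk(kind)
        finally:
            self.chunk_mutex.release()

    def alloc_fixed(self, kind):
        with self.chunk_mutex:
            if self.system_units == 0:
                # Top up system space first, without re-entering the allocator.
                self.system_units += SYSTEM_CHUNK_UNITS
                self.allocated.append("system")
            self.system_units -= 1
            self.allocated.append(kind)


buggy = ChunkAllocator(system_units=0)
try:
    buggy.alloc_buggy("data")
except RuntimeError as err:
    print(err)  # deadlock on chunk_mutex

fixed = ChunkAllocator(system_units=0)
fixed.alloc_fixed("data")
print(fixed.allocated)  # ['system', 'data']
```

The fixed path checks the system space before the main allocation starts, so nothing needs to take `chunk_mutex` a second time.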
    • Btrfs: show useful info in space reservation tracepoint · 2bcc0328
      Committed by Liu Bo
      o For space info, the type of the space info is useful for debugging.
      o For a transaction handle, its transid is useful.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: don't use crc items bigger than 4KB · 7ca4be45
      Committed by Chris Mason
      With the big metadata blocks, we can have crc items
      that are much bigger than a page.  There are a few
      places where we try to kmalloc memory to hold the
      items during a split.
      
      Items bigger than 4KB don't really have a huge benefit
      in efficiency, but they do trigger larger order allocations.
      This commit changes the csums to make sure they stay under
      4KB.  This is not a format change, just a #define to limit
      huge items.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
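A rough worked example shows what a 4KB cap means for one checksum item. The numbers are assumptions, not taken from the commit: 4-byte checksums (as with crc32c), one checksum per 4096-byte data block, and hypothetical leaf sizes. The function `max_csums_per_item` is likewise invented for illustration and is not the kernel's #define.

```python
CSUM_SIZE = 4         # bytes per checksum, assuming crc32c
BLOCK_SIZE = 4096     # bytes of data covered by one checksum (assumed)
MAX_ITEM_SIZE = 4096  # the 4KB cap described in the commit message


def max_csums_per_item(leaf_data_size):
    """Checksums that fit in one item once the item is capped at 4KB."""
    return min(leaf_data_size, MAX_ITEM_SIZE) // CSUM_SIZE


for leaf in (2048, 16384, 65536):  # hypothetical usable leaf sizes in bytes
    n = max_csums_per_item(leaf)
    print(leaf, n, n * BLOCK_SIZE)
# 2048 512 2097152
# 16384 1024 4194304
# 65536 1024 4194304
```

Under these assumptions a capped item still describes 4 MiB of data, which fits the message's point that items bigger than 4KB bring little extra efficiency.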
    • Btrfs: flush out and clean up any block device pages during mount · 3c4bb26b
      Committed by Chris Mason
      Btrfs puts the filesystem metadata into its own address space, and
      somehow the block device address space isn't getting onto disk properly
      before a mount.  The end result is that a loop of mkfs and mounting the
      filesystem will sometimes find stale or incorrect data.
      
      This commit should fix it by sprinkling fdatawrites and invalidate_bdev
      calls around.  This is a short term measure to make sure it is fixed.
      The block devices really should be flushed and cleaned up higher in the
      stack.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>