1. 29 Aug, 2012: 7 commits
  2. 26 Jul, 2012: 10 commits
  3. 25 Jul, 2012: 2 commits
  4. 24 Jul, 2012: 21 commits
    • Btrfs: improve multi-thread buffer read · 67c9684f
      Committed by Liu Bo
      While testing with my buffered-read fio jobs[1], I found that btrfs does not
      perform well enough.
      
      Here is a scenario in fio jobs:
      
      We have 4 threads, "t1 t2 t3 t4", starting buffered reads of the same file.
      All of them race on add_to_page_cache_lru(), and whichever thread
      successfully puts its page into the page cache takes responsibility for
      reading that page's data.
      
      Moreover, reading a page takes some time to finish, during which other
      threads can slide in and process the remaining pages:
      
           t1          t2          t3          t4
         add Page1
         read Page1  add Page2
           |         read Page2  add Page3
           |            |        read Page3  add Page4
           |            |           |        read Page4
      -----|------------|-----------|-----------|--------
           v            v           v           v
          bio          bio         bio         bio
      
      Now we have four bios, each holding only one page, since the pages in a
      bio must be consecutive.  Thus, we can end up with far more bios than we
      need.
      
      Here we're going to
      a) delay the real read-page section and
      b) try to put more pages into the page cache.
      
      This way, each bio can hold more pages, and we need fewer bios.
      
      Here are some numbers from the fio results:
               w/o patch                 w patch
             -------------  --------  ---------------
      READ:    745MB/s        +25%       934MB/s
      
      [1]:
      [global]
      group_reporting
      thread
      numjobs=4
      bs=32k
      rw=read
      ioengine=sync
      directory=/mnt/btrfs/
      
      [READ]
      filename=foobar
      size=2000M
      invalidate=1
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
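The bio arithmetic in the scenario above can be sketched as follows. This is a hypothetical userspace model, not btrfs code: `count_bios()` is an illustrative helper that assumes a bio can only carry a run of consecutive page indices, so a batch of pages needs one bio per run.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model, not btrfs code: a bio can only carry pages whose
 * indices are consecutive, so a batch of pages needs one bio per run.
 */
static size_t count_bios(const unsigned long *pages, size_t n)
{
	size_t bios = 0;

	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* The first page, or any gap in the indices, starts a new bio. */
		if (i == 0 || pages[i] != pages[i - 1] + 1)
			bios++;
	}
	return bios;
}
```

When t1..t4 each read their single page immediately, every submission is a one-page batch and the total is four bios; if one thread first collects Page1..Page4, the batch `{1, 2, 3, 4}` fits in one bio, while a batch with a hole such as `{1, 2, 4}` still needs two.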
    • Btrfs: make btrfs's allocation smoothly with preallocation · df57dbe6
      Committed by Liu Bo
      For backref walking, we've introduced a sequence number for delayed refs.
      However, it changes our preallocation behavior.
      
      The story is that when we preallocate an extent and then mark it written
      piece by piece, ideally we don't need to COW the extent at all, which is
      why we preallocate in the first place.
      
      But we may fail to make use of the preallocation: when we check for cross
      refs on the extent, we may find two ref entries whose content is identical
      except for the sequence value, recognize them as cross refs, and COW to
      allocate another extent.
      
      So we end up with several pieces of space instead of a whole extent.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
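A minimal sketch of a cross-ref check that ignores the sequence value, assuming a simplified, hypothetical `struct data_ref` (the real btrfs backref structures differ, and `same_owner()` / `has_cross_ref()` are illustrative names): refs that differ only in their delayed-ref sequence number are treated as the same owner, so they no longer force a COW.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified data ref; not the real btrfs structure. */
struct data_ref {
	uint64_t root;     /* tree that owns the reference */
	uint64_t objectid; /* inode number */
	uint64_t offset;   /* file offset key of the reference */
	uint64_t seq;      /* delayed-ref sequence, only for backref walking */
};

/* Two refs name the same owner if everything except seq matches. */
static bool same_owner(const struct data_ref *a, const struct data_ref *b)
{
	return a->root == b->root && a->objectid == b->objectid &&
	       a->offset == b->offset;
}

/*
 * True if some ref belongs to a different owner than refs[0], i.e. the
 * extent is really shared and must be COWed.
 */
static bool has_cross_ref(const struct data_ref *refs, size_t n)
{
	assert(refs != NULL || n == 0);
	for (size_t i = 1; i < n; i++)
		if (!same_owner(&refs[0], &refs[i]))
			return true;
	return false;
}
```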
    • Btrfs: lock the transition from dirty to writeback for an eb · 51561ffe
      Committed by Josef Bacik
      There is a small window where an eb can have no IO bits set on it, which
      could result in extent_buffer_under_io() returning false when we want it
      to return true, which could lead to not fun things happening.  So to
      protect this case we need to hold the refs_lock when we make this
      transition, to make sure we get reliable results out of
      extent_buffer_under_io().  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
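The locking idea can be sketched in userspace C. Everything here is hypothetical: `struct eb` is a stand-in for the real extent buffer, its two booleans stand in for the dirty and writeback bits, and a C11 `atomic_flag` spin lock stands in for `refs_lock`. The point is that the dirty bit is cleared and the writeback bit set inside one critical section, so a reader holding the same lock never sees neither bit set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified extent buffer; not struct extent_buffer. */
struct eb {
	atomic_flag refs_lock; /* stands in for the kernel spinlock */
	bool dirty;
	bool writeback;
};

static void lock_refs(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		; /* spin until the holder clears the flag */
}

static void unlock_refs(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

static void eb_init(struct eb *eb, bool dirty)
{
	atomic_flag_clear(&eb->refs_lock);
	eb->dirty = dirty;
	eb->writeback = false;
}

/* Like extent_buffer_under_io(): dirty or under writeback means busy. */
static bool eb_under_io(struct eb *eb)
{
	bool busy;

	lock_refs(&eb->refs_lock);
	busy = eb->dirty || eb->writeback;
	unlock_refs(&eb->refs_lock);
	return busy;
}

/* Both bit changes happen under the lock, so no reader sees the gap. */
static void eb_dirty_to_writeback(struct eb *eb)
{
	lock_refs(&eb->refs_lock);
	assert(eb->dirty); /* only a dirty buffer starts writeback here */
	eb->dirty = false;
	eb->writeback = true;
	unlock_refs(&eb->refs_lock);
}
```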
    • Btrfs: fix potential race in extent buffer freeing · 594831c4
      Committed by Josef Bacik
      This sounds sort of impossible, but it is the only thing I can think of,
      and it is at least theoretically possible, so here it goes.
      
      If we are in try_release_extent_buffer, we check that the ref count on
      the extent buffer is 1 and that it is not under IO, and then go down and
      clear the tree ref.  If, between this check and clearing the tree ref,
      somebody else comes in, grabs a ref on the eb and then marks it dirty
      before try_release_extent_buffer() does its tree ref clear, we can end up
      with a dirty eb that is freed while it is still dirty, which results in a
      panic.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
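The underlying problem is a check-then-act race: the "refs == 1" check and the release are separate steps. One generic way to close such a window is to make the check and the state change a single atomic step; the sketch below does that with a C11 compare-and-swap on a hypothetical reference count. It illustrates the principle only and is not necessarily how the patch fixes it (the kernel code works with `eb->refs_lock`); `try_release()` and `grab_ref()` are illustrative names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: refs counts every holder, including the tree's. */
struct eb {
	atomic_int refs;
};

/*
 * Release only if refs is exactly 1 at the instant it drops to 0: the
 * compare-and-swap makes the check and the release indivisible, so no
 * other thread can slip a ref in between.
 */
static bool try_release(struct eb *eb)
{
	int expected = 1;

	return atomic_compare_exchange_strong(&eb->refs, &expected, 0);
}

/* Take a ref only while the buffer is still alive (refs > 0). */
static bool grab_ref(struct eb *eb)
{
	int cur = atomic_load(&eb->refs);

	while (cur > 0) {
		if (atomic_compare_exchange_weak(&eb->refs, &cur, cur + 1))
			return true;
		/* a failed CAS reloaded cur; retry with the fresh value */
	}
	return false; /* already released: the caller must not touch eb */
}
```

Either the releaser wins and a later grabber sees refs == 0 and backs off, or the grabber wins and the release check fails; the "freed while dirty" interleaving cannot occur in this model.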
    • Btrfs: don't return true in releasepage unless we actually freed the eb · e64860aa
      Committed by Josef Bacik
      While looking at an extent_buffer race, I noticed that we
      unconditionally return 1 if we get down to release_extent_buffer after
      clearing the tree ref.  However, another thread can easily race in here
      and get a ref on the eb, so the eb is not actually freed.  So make
      release_extent_buffer return 1 if it freed the eb and 0 if not, so we can
      be a little kinder to the vm.
      Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
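A minimal sketch of the new return convention, using a hypothetical refcounted `struct eb` and an illustrative `release_eb()` rather than the real `release_extent_buffer()`: the function reports 1 only when it dropped the last reference and actually freed the memory.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of an extent buffer with a plain reference count. */
struct eb {
	int refs;
};

/*
 * Drop one reference; return 1 if the buffer was actually freed and 0
 * if someone else still holds a reference, so a releasepage-style caller
 * can report the truth to the VM.
 */
static int release_eb(struct eb *eb)
{
	assert(eb->refs > 0);
	if (--eb->refs == 0) {
		free(eb);
		return 1;
	}
	return 0;
}
```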
    • Btrfs: suppress printk() if all device I/O stats are zero · a98cdb85
      Committed by Stefan Behrens
      Code is added to suppress the I/O stats printing at mount time if all
      statistic values are zero.
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
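A minimal sketch of the suppression logic, assuming the per-device statistics are available as a plain array of counters. The layout and the names `all_stats_zero()` and `print_dev_stats()` are hypothetical, not btrfs functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: true if every counter in stats[0..n) is zero. */
static bool all_stats_zero(const unsigned long long *stats, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (stats[i] != 0)
			return false;
	return true;
}

/* Print one line of device stats, but only if something is non-zero. */
static void print_dev_stats(const char *dev,
			    const unsigned long long *stats, size_t n)
{
	if (all_stats_zero(stats, n))
		return; /* all zero: nothing worth logging at mount time */
	printf("%s:", dev);
	for (size_t i = 0; i < n; i++)
		printf(" %llu", stats[i]);
	printf("\n");
}
```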
    • Btrfs: remove unwanted printk() for btrfs device I/O stats · 5021976d
      Committed by Stefan Behrens
      People complained about the annoying kernel log message
      "btrfs: no dev_stats entry found ... (OK on first mount after mkfs)"
      every time a filesystem is mounted for the first time after running
      mkfs. Since btrfs-progs releases are not synchronized with the kernel
      version, mkfs as it is now will also be used in the future. So this
      message does not help to find errors; it is just annoying. This commit
      removes the printk().
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
    • Btrfs: rewrite BTRFS_SETGET_FUNCS · 18077bb4
      Committed by Li Zefan
      The BTRFS_SETGET_FUNCS macro is used to generate btrfs_set_foo() and
      btrfs_foo() functions, which read and write specific fields in the
      extent buffer.
      
      The total number of set/get functions is ~200, but in fact we only
      need 8 functions: 2 for u8 fields, 2 for u16, 2 for u32 and 2 for u64.
      
      This results in a reduction of ~37K bytes:
      
         text    data     bss     dec     hex filename
       629661   12489     216  642366   9cd3e fs/btrfs/btrfs.o.orig
       592637   12489     216  605342   93c9e fs/btrfs/btrfs.o
      Signed-off-by: Li Zefan <lizefan@huawei.com>
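The idea can be sketched as a handful of generic little-endian accessors, one getter/setter pair per width, plus a per-field macro that only supplies the field offset. This is a hypothetical userspace illustration, not the kernel macro: it works on a plain byte buffer instead of an extent buffer, and `struct disk_item`, `get_le()`, `set_le()`, `DEFINE_WIDTH` and `SETGET_FUNCS` are names made up for the example. Because the per-field wrappers are `static inline` one-liners, the real code lives only in the per-width functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read/write `size` bytes at buf + off in little-endian byte order. */
static uint64_t get_le(const uint8_t *buf, size_t off, size_t size)
{
	uint64_t v = 0;

	for (size_t i = 0; i < size; i++)
		v |= (uint64_t)buf[off + i] << (8 * i);
	return v;
}

static void set_le(uint8_t *buf, size_t off, size_t size, uint64_t v)
{
	for (size_t i = 0; i < size; i++)
		buf[off + i] = (uint8_t)(v >> (8 * i));
}

/* One getter/setter pair per width: 4 widths x 2 = 8 functions. */
#define DEFINE_WIDTH(bits)						\
static inline uint##bits##_t get_##bits(const uint8_t *buf, size_t off)	\
{									\
	return (uint##bits##_t)get_le(buf, off, (bits) / 8);		\
}									\
static inline void set_##bits(uint8_t *buf, size_t off, uint##bits##_t v) \
{									\
	set_le(buf, off, (bits) / 8, v);				\
}

DEFINE_WIDTH(8)
DEFINE_WIDTH(16)
DEFINE_WIDTH(32)
DEFINE_WIDTH(64)

/* Hypothetical on-disk item; its members need no internal padding. */
struct disk_item {
	uint64_t size;
	uint32_t nlink;
	uint16_t mode;
	uint8_t flags;
};

/* Per-field accessors are thin wrappers that only pass the offset. */
#define SETGET_FUNCS(name, type, member, bits)				\
static inline uint##bits##_t name(const uint8_t *buf, size_t base)	\
{									\
	return get_##bits(buf, base + offsetof(type, member));		\
}									\
static inline void set_##name(uint8_t *buf, size_t base, uint##bits##_t v) \
{									\
	set_##bits(buf, base + offsetof(type, member), v);		\
}

SETGET_FUNCS(item_size, struct disk_item, size, 64)
SETGET_FUNCS(item_nlink, struct disk_item, nlink, 32)
SETGET_FUNCS(item_mode, struct disk_item, mode, 16)
SETGET_FUNCS(item_flags, struct disk_item, flags, 8)
```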
    • Btrfs: zero unused bytes in inode item · 293f7e07
      Committed by Li Zefan
      The otime field is not zeroed, so users will see a random otime in an old
      filesystem once a future kernel with otime support mounts it.
      
      The reserved bytes are not zeroed either, and we'll have compatibility
      issues if we ever make use of those bytes.
      Signed-off-by: Li Zefan <lizefan@huawei.com>
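The fix amounts to zeroing the whole item before filling in the fields the kernel knows about. A minimal sketch with a hypothetical, simplified `struct inode_item` and an illustrative `fill_inode_item()` (the real `btrfs_inode_item` has many more fields):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical item layout: one known field, one future one, reserved. */
struct inode_item {
	uint64_t size;
	uint64_t otime_sec;   /* not maintained by this kernel yet */
	uint64_t reserved[4]; /* for future use */
};

/*
 * Zero the whole item first, then set the fields we know about, so
 * otime_sec and reserved[] never carry leftover memory contents.
 */
static void fill_inode_item(struct inode_item *item, uint64_t size)
{
	memset(item, 0, sizeof(*item));
	item->size = size;
}
```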
    • Btrfs: kill free_space pointer from inode structure · b4d7c3c9
      Committed by Li Zefan
      Inodes always allocate free space with BTRFS_BLOCK_GROUP_DATA type,
      which means every inode has the same BTRFS_I(inode)->free_space pointer.
      
      This shrinks struct btrfs_inode by 4 bytes (or 8 bytes on 64-bit).
      Signed-off-by: Li Zefan <lizefan@huawei.com>
    • btrfs read error corrected message floods the console during recovery · d5b025d5
      Committed by Anand Jain
      Changing printk_in_rcu to printk_ratelimited_in_rcu is sufficient.
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
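Rate limiting of this kind is typically an interval/burst scheme: at most `burst` messages per `interval`, with the rest counted as missed. The sketch below is a hypothetical userspace model of that idea, not the kernel's `ratelimit_state` implementation; `struct ratelimit` and `ratelimit_ok()` are illustrative names, and times are plain integers supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical interval/burst limiter; times are caller-supplied ticks. */
struct ratelimit {
	long interval; /* length of one window */
	int burst;     /* messages allowed per window */
	long begin;    /* start of the current window */
	int printed;   /* messages let through in this window */
	int missed;    /* messages suppressed in this window */
};

/* Returns true if a message logged at time `now` may be printed. */
static bool ratelimit_ok(struct ratelimit *rs, long now)
{
	assert(rs->interval > 0 && rs->burst > 0);
	if (now - rs->begin >= rs->interval) {
		/* A new window has started: reset the counters. */
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}
```

A caller would guard each console message with `if (ratelimit_ok(&rs, now))`, so a flood of identical lines during recovery collapses to at most `burst` lines per interval.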
    • Btrfs: fix buffer leak in btrfs_next_old_leaf · e6466e35
      Committed by Jan Schmidt
      When calling btrfs_next_old_leaf, we were leaking an extent buffer in the
      rare case of using the deadlock avoidance code needed for the tree mod log.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: do not count in readonly bytes · f6175efa
      Committed by Liu Bo
      If a block group is ro, do not count its entries when we dump space info.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
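A minimal sketch of the accounting change, with a hypothetical `struct block_group` that only records its size, used bytes and ro state, and an illustrative `free_bytes()` helper: read-only groups are skipped when free space is summed, since nothing can be allocated from them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified block group summary. */
struct block_group {
	uint64_t size;
	uint64_t used;
	bool ro; /* e.g. set while the group is being relocated */
};

/* Sum the free bytes of all writable block groups. */
static uint64_t free_bytes(const struct block_group *bgs, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++) {
		if (bgs[i].ro)
			continue; /* cannot allocate from it: don't count it */
		assert(bgs[i].used <= bgs[i].size);
		total += bgs[i].size - bgs[i].used;
	}
	return total;
}
```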
    • Btrfs: add ro notification to dump_space_info · 799ffc3c
      Committed by Liu Bo
      Block groups have an ro attribute; make dump_space_info show it.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: fix a bug of writting free space cache during balance · cf7c1ef6
      Committed by Liu Bo
      Here is the whole story:
      1)
      A free space cache consists of two parts:
      o  free space cache inode, which is special because it's stored in the root tree.
      o  free space info, which is stored as the above inode's file data.
      
      But when we _clear and set up_ the free space cache, we only build a new
      inode and do not flush its free space info onto disk, so the block group
      cache's cache_state remains DC_SETUP instead of DC_WRITTEN.
      
      And staying in DC_SETUP means that we will not truncate this free space
      cache inode, so the disk offset of its file extent will remain _unchanged_
      at least until the next transaction finishes committing.
      
      2)
      We can set a block group readonly when we relocate the block group.
      
      However, if the readonly block group covers the disk offset that our free
      space cache inode is going to write to, the free space cache inode is
      forced into cow_file_range(), where it ends up hitting a BUG_ON.
      
      3)
      Based on the above analysis, we fix this bug by adding the missing dirty flag.
      
      4)
      However, that's not all: there is still another case, nospace_cache.
      
      With nospace_cache, we do not want to set the dirty flag; instead we just
      truncate the free space cache inode and bail out, setting the cache state
      to DC_WRITTEN.
      
      We benefit from this since it saves us the 'pre-allocation' step, which
      usually costs a lot.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
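The state handling described in 3) and 4) can be sketched as a tiny state machine. `DC_SETUP` and `DC_WRITTEN` come from the message; `DC_CLEAR`, `struct bg_cache`, `setup_cache()` and `commit_cache()` are hypothetical names for this illustration, and the real btrfs code tracks much more state.

```c
#include <assert.h>
#include <stdbool.h>

enum cache_state { DC_CLEAR, DC_SETUP, DC_WRITTEN };

/* Hypothetical per-block-group view of the free space cache. */
struct bg_cache {
	enum cache_state state;
	bool dirty; /* cache inode must be written out at commit */
};

/* Called after the cache inode has been cleared and set up. */
static void setup_cache(struct bg_cache *c, bool nospace_cache)
{
	if (nospace_cache) {
		/* Truncate and bail out: nothing is preallocated or written. */
		c->dirty = false;
		c->state = DC_WRITTEN;
		return;
	}
	c->state = DC_SETUP;
	c->dirty = true; /* the previously missing flag */
}

/* At transaction commit, only dirty caches are written out. */
static void commit_cache(struct bg_cache *c)
{
	if (c->dirty) {
		/* ... write the free space info as the inode's file data ... */
		c->state = DC_WRITTEN;
		c->dirty = false;
	}
}
```

Without the `c->dirty = true` line, `commit_cache()` would skip the write-out and the cache would stay in `DC_SETUP`, which is the stuck state described in 1).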
    • Btrfs: do not abort transaction in prealloc case · 06789384
      Committed by Liu Bo
      During disk balance, we preallocate new file extents for file data
      relocation, but this may fail with 'no available space', which flips
      btrfs into read-only mode.
      
      It is not necessary to bail out and abort the transaction, since we have
      several ways to rescue ourselves from the ENOSPC case.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: kill root from btrfs_is_free_space_inode · 83eea1f1
      Committed by Liu Bo
      Since root can be fetched directly via the BTRFS_I macro, we can drop an
      argument from btrfs_is_free_space_inode().
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: fix btrfs_is_free_space_inode to recognize btree inode · 51a8cf9d
      Committed by Liu Bo
      For the btree inode, its root is also the 'tree root', so the btree inode
      can be mistaken for a free space inode.
      
      We should add one more check for btree inode.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
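A minimal sketch of the extra check, using hypothetical, simplified `struct root` and `struct inode` types (the real function has further cases). `BTREE_INODE_OBJECTID` is a constant defined for this sketch to play the role of the btree inode's number; `is_free_space_inode()` is an illustrative name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constant standing in for the btree inode's number. */
#define BTREE_INODE_OBJECTID 1ULL

struct root {
	int id;
};

/* Hypothetical, simplified inode: just its root and inode number. */
struct inode {
	const struct root *root;
	uint64_t ino;
};

/*
 * Free space inodes live in the tree root, but so does the btree inode,
 * so "root == tree_root" alone is not enough: exclude the btree inode.
 */
static bool is_free_space_inode(const struct inode *inode,
				const struct root *tree_root)
{
	return inode->root == tree_root && inode->ino != BTREE_INODE_OBJECTID;
}
```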
    • Btrfs: avoid I/O repair BUG() from btree_read_extent_buffer_pages() · c0901581
      Committed by Stefan Behrens
      From btree_read_extent_buffer_pages(), repair_io_failure() can currently
      be called with mirror_num being zero if submit_one_bio() returned an
      error earlier. This used to trigger a BUG_ON(!mirror_num) in
      repair_io_failure(), and this is indeed not a case that needs the I/O
      repair code to rewrite disk blocks.
      This commit prevents calling repair_io_failure() in this case and thus
      avoids the BUG_ON() and the malfunction.
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
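A minimal sketch of the guard, under two assumptions: that `mirror_num == 0` means no specific mirror was identified (as the message says happens after a submit_one_bio() error), and that a repair also needs a successful read to copy from. `should_repair()` is a hypothetical helper, not the btrfs code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical guard: repair only if the read eventually succeeded and a
 * specific, non-zero mirror was identified as bad.  failed_mirror == 0
 * means no mirror was blamed, so there is no disk copy to rewrite.
 */
static bool should_repair(bool read_ok, int failed_mirror)
{
	return read_ok && failed_mirror != 0;
}
```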
    • Btrfs: rework shrink_delalloc · f4c738c2
      Committed by Josef Bacik
      So shrink_delalloc has grown all sorts of cruft over the years thanks to
      many reworkings of how we track enospc.  What happens now as we fill up the
      disk is that we loop for freaking ever hoping to reclaim an arbitrary
      amount of metadata space; this dates from when everybody flushed at the
      same time.  Now people only flush one at a time.  So instead of trying to
      reclaim a huge amount of space, just try to flush a decent chunk of space,
      and stop looping as soon as we have enough free space to satisfy our
      reservation.  This makes xfstests 224 go much faster.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
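The new loop shape can be sketched as follows. This is a hypothetical model, not the kernel's shrink_delalloc(): it assumes that flushing `n` bytes of delalloc frees exactly `n` bytes of reservable space, which the real accounting does not guarantee, and `flush_until_fits()`, `chunk` and `max_loops` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Flush delalloc in fixed-size chunks and stop as soon as the reservation
 * fits, instead of chasing one large, arbitrary reclaim target.
 * Returns the number of flush rounds performed.
 */
static int flush_until_fits(uint64_t *free_space, uint64_t *delalloc,
			    uint64_t need, uint64_t chunk, int max_loops)
{
	int loops = 0;

	assert(chunk > 0);
	while (*free_space < need && *delalloc > 0 && loops < max_loops) {
		uint64_t n = *delalloc < chunk ? *delalloc : chunk;

		*delalloc -= n;   /* write back n bytes of dirty data ... */
		*free_space += n; /* ... assumed to release n bytes */
		loops++;
	}
	return loops;
}
```

The early exit on `*free_space >= need` is the key difference from looping toward a fixed target: a task that only needs a small reservation stops after one or two rounds.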
    • Btrfs: do not set subvolume flags in readonly mode · b9ca0664
      Committed by Liu Bo
      $ mkfs.btrfs /dev/sdb7
      $ btrfstune -S1 /dev/sdb7
      $ mount /dev/sdb7 /mnt/btrfs
      mount: block device /dev/sdb7 is write-protected, mounting read-only
      $ btrfs dev add /dev/sdb8 /mnt/btrfs/
      
      Now we have a btrfs whose mnt flags say read-only but whose sb flags do
      not.  So ioctls that only check the sb flags for MS_RDONLY have a
      problem.
      Setting subvolume flags is such an ioctl; we should use
      mnt_want_write_file() to check the RO flags.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>