1. 10 Jun, 2014 1 commit
  2. 25 Apr, 2014 2 commits
    • Btrfs: fix inode caching vs tree log · 1c70d8fb
      Miao Xie authored
      Currently, with the inode cache enabled, an inode id is reused immediately
      after its file is unlinked, so we may hit a sequence like the following:
      
      |->iput inode
      |->return inode id into inode cache
      |->create dir,fsync
      |->power off
      
      An easy way to reproduce this problem is:
      
      mkfs.btrfs -f /dev/sdb
      mount /dev/sdb /mnt -o inode_cache,commit=100
      dd if=/dev/zero of=/mnt/data bs=1M count=10 oflag=sync
      inode_id=$(ls -i /mnt/data | awk '{print $1}')
      rm -f /mnt/data

      i=1
      while true
      do
              mkdir /mnt/dir_$i
              test1=$(stat /mnt/dir_$i | grep Inode: | awk '{print $4}')
              if [ "$test1" -eq "$inode_id" ]
              then
                      # the freed inode id was reused: write, sync, then force a reboot
                      dd if=/dev/zero of=/mnt/dir_$i/data bs=1M count=1 oflag=sync
                      echo b > /proc/sysrq-trigger
              fi
              sleep 1
              i=$(($i+1))
      done

      # after the reboot:
      mount /dev/sdb /mnt
      umount /dev/sdb
      btrfs check /dev/sdb
      
      We fix this problem by adding the unlinked inode's id to the pinned tree,
      so that it cannot be reused until the transaction is committed.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: Chris Mason <clm@fb.com>
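The pinning idea described in this commit can be illustrated with a small user-space model. This is only a sketch, not the kernel code: the class `InodeIdAllocator`, its method names, and the starting id 256 are all hypothetical and chosen for illustration. It shows that an id freed by unlink is not handed out again until the transaction has been committed.

```python
class InodeIdAllocator:
    """Toy model: freed inode ids stay pinned until the transaction commits."""

    def __init__(self, first_free=256):
        self.next_new = first_free  # next never-used id (hypothetical start value)
        self.free = set()           # ids that may be handed out again
        self.pinned = set()         # ids freed in the running transaction

    def alloc(self):
        # Prefer a reusable id; otherwise hand out a brand-new one.
        if self.free:
            ino = min(self.free)
            self.free.remove(ino)
            return ino
        ino = self.next_new
        self.next_new += 1
        return ino

    def unlink(self, ino):
        # Do not return the id to the free cache directly; pin it first.
        self.pinned.add(ino)

    def commit_transaction(self):
        # Only after the commit do pinned ids become reusable.
        self.free |= self.pinned
        self.pinned.clear()


alloc = InodeIdAllocator()
data_ino = alloc.alloc()
alloc.unlink(data_ino)
dir_ino = alloc.alloc()        # must differ from data_ino before the commit
alloc.commit_transaction()
reused_ino = alloc.alloc()     # now the old id may come back
```

In this model a directory created and fsynced right after the unlink gets a fresh id, which is the property the reproducer above checks for.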
    • Btrfs: avoid triggering bug_on() when we fail to start inode caching task · e60efa84
      Wang Shilong authored
      When running stress tests (including snapshots, balance and fsstress), we trigger
      the following BUG_ON() because we fail to start the inode caching task.
      
      [  181.131945] kernel BUG at fs/btrfs/inode-map.c:179!
      [  181.137963] invalid opcode: 0000 [#1] SMP
      [  181.217096] CPU: 11 PID: 2532 Comm: btrfs Not tainted 3.14.0 #1
      [  181.240521] task: ffff88013b621b30 ti: ffff8800b6ada000 task.ti: ffff8800b6ada000
      [  181.367506] Call Trace:
      [  181.371107]  [<ffffffffa036c1be>] btrfs_return_ino+0x9e/0x110 [btrfs]
      [  181.379191]  [<ffffffffa038082b>] btrfs_evict_inode+0x46b/0x4c0 [btrfs]
      [  181.387464]  [<ffffffff810b5a70>] ? autoremove_wake_function+0x40/0x40
      [  181.395642]  [<ffffffff811dc5fe>] evict+0x9e/0x190
      [  181.401882]  [<ffffffff811dcde3>] iput+0xf3/0x180
      [  181.408025]  [<ffffffffa03812de>] btrfs_orphan_cleanup+0x1ee/0x430 [btrfs]
      [  181.416614]  [<ffffffffa03a6abd>] btrfs_mksubvol.isra.29+0x3bd/0x450 [btrfs]
      [  181.425399]  [<ffffffffa03a6cd6>] btrfs_ioctl_snap_create_transid+0x186/0x190 [btrfs]
      [  181.435059]  [<ffffffffa03a6e3b>] btrfs_ioctl_snap_create_v2+0xeb/0x130 [btrfs]
      [  181.444148]  [<ffffffffa03a9656>] btrfs_ioctl+0xf76/0x2b90 [btrfs]
      [  181.451971]  [<ffffffff8117e565>] ? handle_mm_fault+0x475/0xe80
      [  181.459509]  [<ffffffff8167ba0c>] ? __do_page_fault+0x1ec/0x520
      [  181.467046]  [<ffffffff81185b35>] ? do_mmap_pgoff+0x2f5/0x3c0
      [  181.474393]  [<ffffffff811d4da8>] do_vfs_ioctl+0x2d8/0x4b0
      [  181.481450]  [<ffffffff811d5001>] SyS_ioctl+0x81/0xa0
      [  181.488021]  [<ffffffff81680b69>] system_call_fastpath+0x16/0x1b
      
      We should avoid triggering BUG_ON() here; instead, we output warning messages
      and clear the inode_cache option.
      Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: Chris Mason <clm@fb.com>
  3. 07 Apr, 2014 1 commit
    • Btrfs: remove transaction from send · 9e351cc8
      Josef Bacik authored
      Let's try this again.  We can deadlock the box if we send on a box and try to
      write onto the same fs with the app that is trying to listen to the send pipe.
      This is because the writer could get stuck waiting for a transaction commit
      which is being blocked by the send.  So fix this by making sure looking at the
      commit roots is always going to be consistent.  We do this by keeping track of
      which roots need to have their commit roots swapped during commit, and then
      taking the commit_root_sem and swapping them all at once.  Then make sure we
      take a read lock on the commit_root_sem in cases where we search the commit root
      to make sure we're always looking at a consistent view of the commit roots.
      Previously we had problems with this because we would swap a fs tree commit root
      and then swap the extent tree commit root independently which would cause the
      backref walking code to screw up sometimes.  With this patch we no longer
      deadlock and pass all the weird send/receive corner cases.  Thanks,
      Reported-by: Hugo Mills <hugo@carfax.org.uk>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
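The "swap them all at once" scheme from this commit message can be modelled in a few lines. This is a simplified sketch under stated assumptions: `Root`, `FsInfo`, `mark_dirty`, `switch_commit_roots` and `read_commit_roots` are illustrative names, and a plain `threading.Lock` stands in for the read/write semaphore (so readers here exclude each other, unlike a real rw-semaphore).

```python
import threading


class Root:
    def __init__(self, name):
        self.name = name
        self.node = 0         # root of the running transaction
        self.commit_root = 0  # root as of the last commit


class FsInfo:
    def __init__(self):
        # Stand-in for commit_root_sem; a real rw-semaphore allows many readers.
        self.commit_root_sem = threading.Lock()
        self.pending = []     # roots whose commit root must be swapped at commit

    def mark_dirty(self, root, new_node):
        root.node = new_node
        if root not in self.pending:
            self.pending.append(root)

    def switch_commit_roots(self):
        # Swap every pending commit root under one lock hold.
        with self.commit_root_sem:
            for root in self.pending:
                root.commit_root = root.node
            self.pending.clear()

    def read_commit_roots(self, roots):
        # Readers see either all old or all new commit roots, never a mix.
        with self.commit_root_sem:
            return [r.commit_root for r in roots]


fs = FsInfo()
fs_tree, extent_tree = Root("fs"), Root("extent")
fs.mark_dirty(fs_tree, 7)
fs.mark_dirty(extent_tree, 7)
before = fs.read_commit_roots([fs_tree, extent_tree])
fs.switch_commit_roots()
after = fs.read_commit_roots([fs_tree, extent_tree])
```

The point of the design is that a reader such as backref walking never observes the fs tree at the new commit root while the extent tree is still at the old one.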
  4. 12 Nov, 2013 5 commits
    • btrfs: Use WARN_ON()'s return value in place of WARN_ON(1) · fae7f21c
      Dulshani Gunawardhana authored
      Use WARN_ON()'s return value in place of WARN_ON(1) for cleaner source
      code that outputs more descriptive warnings. Also fix the styling
      warning of redundant braces that came up as a result of this fix.
      Signed-off-by: Dulshani Gunawardhana <dulshani.gunawardhana89@gmail.com>
      Reviewed-by: Zach Brown <zab@redhat.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: Don't allocate inode that is already in use · ff76b056
      Stefan Behrens authored
      Due to an off-by-one error, it is possible to reproduce a bug
      when the inode cache is used.
      
      The same inode number is assigned twice, the second time this
      leads to an EEXIST in btrfs_insert_empty_items().
      
      The issue can happen when a file is removed right after a subvolume
      is created and then a new inode number is created before the
      inodes in free_ino_pinned are processed.
      unlink() calls btrfs_return_ino() which calls start_caching() in this
      case which adds [highest_ino + 1, BTRFS_LAST_FREE_OBJECTID] by
      searching for the highest inode (which already cannot find the
      unlinked one anymore in btrfs_find_free_objectid()). So if this
      unlinked inode's number is equal to the highest_ino + 1 (or >= this value
      instead of > this value which was the off-by-one error), we mustn't add
      the inode number to free_ino_pinned (caching_thread() does it right).
      In this case we need to try directly to add the number to the inode_cache
      which will fail in this case.
      
      When this inode number is allocated while it is still in free_ino_pinned,
      it is allocated and still added to the free inode cache when the
      pinned inodes are processed, thus one of the following inode number
      allocations will get an inode that is already in use and fail with EEXIST
      in btrfs_insert_empty_items().
      
      One example which was created with the reproducer below:
      Create a snapshot, work in the newly created snapshot for the rest.
      In unlink(inode 34284) call btrfs_return_ino() which calls start_caching().
      start_caching() calls add_free_space [34284, 18446744073709517077].
      In btrfs_return_ino(), call start_caching pinned [34284, 1] which is wrong.
      mkdir() call btrfs_find_ino_for_alloc() which returns the number 34284.
      btrfs_unpin_free_ino calls add_free_space [34284, 1].
      mkdir() call btrfs_find_ino_for_alloc() which returns the number 34284.
      EEXIST when the new inode is inserted.
      
      One possible reproducer is this one:
      #!/bin/sh
      # preparation
      TEST_DEV=/dev/sdc1
      TEST_MNT=/mnt
      umount ${TEST_MNT} 2>/dev/null || true
      mkfs.btrfs -f ${TEST_DEV}
      mount ${TEST_DEV} ${TEST_MNT} -o \
       rw,relatime,compress=lzo,space_cache,inode_cache
      btrfs subv create ${TEST_MNT}/s1
      for i in $(seq 34027); do touch ${TEST_MNT}/s1/${i}; done
      btrfs subv snap ${TEST_MNT}/s1 ${TEST_MNT}/s2
      FILENAME=$(find ${TEST_MNT}/s1/ -inum 4085 | sed 's|^.*/\([^/]*\)$|\1|')
      rm ${TEST_MNT}/s2/$FILENAME
      touch ${TEST_MNT}/s2/$FILENAME
      # the following steps can be repeated to reproduce the issue again and again
      [ -e ${TEST_MNT}/s3 ] && btrfs subv del ${TEST_MNT}/s3
      btrfs subv snap ${TEST_MNT}/s2 ${TEST_MNT}/s3
      rm ${TEST_MNT}/s3/$FILENAME
      touch ${TEST_MNT}/s3/$FILENAME
      ls -alFi ${TEST_MNT}/s?/$FILENAME
      touch ${TEST_MNT}/s3/_1 || logger FAILED
      ls -alFi ${TEST_MNT}/s?/_1
      touch ${TEST_MNT}/s3/_2 || logger FAILED
      ls -alFi ${TEST_MNT}/s?/_2
      touch ${TEST_MNT}/s3/__1 || logger FAILED
      ls -alFi ${TEST_MNT}/s?/__1
      touch ${TEST_MNT}/s3/__2 || logger FAILED
      ls -alFi ${TEST_MNT}/s?/__2
      # if the above is not enough, add the following loop:
      for i in $(seq 3 9); do touch ${TEST_MNT}/s3/__${i} || logger FAILED; done
      #for i in $(seq 3 34027); do touch ${TEST_MNT}/s3/__${i} || logger FAILED; done
      # one of the touch(1) calls in s3 fails with EEXIST because the inode number
      # that btrfs_find_ino_for_alloc() returns is already in use.
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
      Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
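The off-by-one described in this commit comes down to a single boundary comparison. The sketch below models only that comparison and ignores everything else the kernel checks (for example caching progress); the function names and the `highest_ino` parameter are hypothetical. It assumes, as the message states, that start_caching() has already added the range starting at highest_ino + 1 to the free cache.

```python
def should_pin_buggy(ino, highest_ino):
    # Pre-fix boundary: only ids strictly above highest_ino + 1 skip pinning.
    return not (ino > highest_ino + 1)


def should_pin_fixed(ino, highest_ino):
    # Fixed boundary: highest_ino + 1 is already covered by the free range
    # [highest_ino + 1, ...], so it must not be pinned as well.
    return not (ino >= highest_ino + 1)


# Example from the commit message: inode 34284 is unlinked, so the search for
# the highest remaining inode finds 34283 and the free range starts at 34284.
unlinked = 34284
highest = 34283
pinned_before_fix = should_pin_buggy(unlinked, highest)
pinned_after_fix = should_pin_fixed(unlinked, highest)
```

With the buggy check the number ends up both in the free range and in the pinned tree, which is how the same inode number gets handed out twice.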
    • Btrfs: remove path arg from btrfs_truncate_free_space_cache · 74514323
      Filipe David Borba Manana authored
      Not used for anything, and removing it avoids caller's need to
      allocate a path structure.
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: remove duplicated ino cache's inode lookup · 53645a91
      Filipe David Borba Manana authored
      We're doing an unnecessary extra lookup of the ino cache's
      inode when we already have it (and hold a reference to it)
      during the process of saving the ino cache contents to disk.
      Therefore remove this extra lookup.
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: eliminate the exceptional root_tree refs=0 · 69e9c6c6
      Stefan Behrens authored
      The fact that btrfs_root_refs() returned 0 for the tree_root caused
      bugs in the past, therefore it is set to 1 with this patch and
      (hopefully) all affected code is adapted to this change.
      
      I verified this change by temporarily adding WARN_ON() checks
      everywhere where btrfs_root_refs() is used, checking whether the
      logic of the code is changed by btrfs_root_refs() returning 1
      instead of 0 for root->root_key.objectid == BTRFS_ROOT_TREE_OBJECTID.
      With these added checks, I ran the xfstests './check -g auto'.
      
      The two roots chunk_root and log_root_tree that are only referenced
      by the superblock and the log_roots below the log_root_tree still
      have btrfs_root_refs() == 0, only the tree_root is changed.
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
  5. 18 May, 2013 2 commits
  6. 12 Dec, 2012 1 commit
    • Btrfs: improve the noflush reservation · 08e007d2
      Miao Xie authored
      In some places (such as when evicting an inode), we cannot flush the reserved
      space of delalloc, but flushing the delayed directory index and delayed inode
      is OK. However, we do not try to flush those either and just return when
      there is not enough space to reserve. This patch fixes this problem.

      We define 3 types of flush operations: NO_FLUSH, FLUSH_LIMIT and FLUSH_ALL.
      If we are in a transaction, we must not flush anything, or a deadlock
      would happen, so use NO_FLUSH. If flushing the reserved space of delalloc
      would cause a deadlock, use FLUSH_LIMIT. In all other cases, FLUSH_ALL is used,
      and we flush everything.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
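The three flush types and the rule for choosing between them can be written down as a small decision function. This is a sketch of the rule as stated in the message, not the kernel's reservation code; the `Flush` enum and `pick_flush` helper with its two boolean parameters are hypothetical.

```python
from enum import Enum


class Flush(Enum):
    NO_FLUSH = 0     # flush nothing
    FLUSH_LIMIT = 1  # flush everything except the reserved space of delalloc
    FLUSH_ALL = 2    # flush everything


def pick_flush(in_transaction, delalloc_flush_may_deadlock):
    # Inside a transaction any flushing could deadlock.
    if in_transaction:
        return Flush.NO_FLUSH
    # Delalloc flushing is unsafe here (e.g. while evicting an inode),
    # but delayed directory index / delayed inode flushing is still fine.
    if delalloc_flush_may_deadlock:
        return Flush.FLUSH_LIMIT
    return Flush.FLUSH_ALL


evict_mode = pick_flush(in_transaction=False, delalloc_flush_may_deadlock=True)
```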
  7. 29 Mar, 2012 1 commit
  8. 22 Mar, 2012 1 commit
  9. 24 Feb, 2012 1 commit
  10. 17 Jan, 2012 1 commit
  11. 11 Nov, 2011 1 commit
    • Btrfs: fix no reserved space for writing out inode cache · ba38eb4d
      Miao Xie authored
      The inode cache forgets to reserve space when it is written out. When
      we run a stress test such as synctest, this triggers the WARN_ON() in
      use_block_rsv().
      
      WARNING: at fs/btrfs/extent-tree.c:5718 btrfs_alloc_free_block+0xbf/0x281 [btrfs]()
      ...
      Call Trace:
       [<ffffffff8104df86>] warn_slowpath_common+0x80/0x98
       [<ffffffff8104dfb3>] warn_slowpath_null+0x15/0x17
       [<ffffffffa0369c60>] btrfs_alloc_free_block+0xbf/0x281 [btrfs]
       [<ffffffff810cbcb8>] ? __set_page_dirty_nobuffers+0xfe/0x108
       [<ffffffffa035c040>] __btrfs_cow_block+0x118/0x3b5 [btrfs]
       [<ffffffffa035c7ba>] btrfs_cow_block+0x103/0x14e [btrfs]
       [<ffffffffa035e4c4>] btrfs_search_slot+0x249/0x6a4 [btrfs]
       [<ffffffffa036d086>] btrfs_lookup_inode+0x2a/0x8a [btrfs]
       [<ffffffffa03788b7>] btrfs_update_inode+0xaa/0x141 [btrfs]
       [<ffffffffa036d7ec>] btrfs_save_ino_cache+0xea/0x202 [btrfs]
       [<ffffffffa03a761e>] ? btrfs_update_reloc_root+0x17e/0x197 [btrfs]
       [<ffffffffa0373867>] commit_fs_roots+0xaa/0x158 [btrfs]
       [<ffffffffa03746a6>] btrfs_commit_transaction+0x405/0x731 [btrfs]
       [<ffffffff810690df>] ? wake_up_bit+0x25/0x25
       [<ffffffffa039d652>] ? btrfs_log_dentry_safe+0x43/0x51 [btrfs]
       [<ffffffffa0381c5f>] btrfs_sync_file+0x16a/0x198 [btrfs]
       [<ffffffff81122806>] ? mntput+0x21/0x23
       [<ffffffff8112d150>] vfs_fsync_range+0x18/0x21
       [<ffffffff8112d170>] vfs_fsync+0x17/0x19
       [<ffffffff8112d316>] do_fsync+0x29/0x3e
       [<ffffffff8112d348>] sys_fsync+0xb/0xf
       [<ffffffff81468352>] system_call_fastpath+0x16/0x1b
      
      Sometimes it also triggers a BUG_ON() in the reservation code of the
      delayed inode.

      So we must reserve enough space for the inode cache.

      Note: If we cannot reserve enough space for the inode cache, we
      give up writing it out.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  12. 20 Oct, 2011 1 commit
  13. 04 Jun, 2011 4 commits
  14. 27 May, 2011 1 commit
  15. 25 Apr, 2011 2 commits
    • Btrfs: Support reading/writing on disk free ino cache · 82d5902d
      Li Zefan authored
      This is similar to block group caching.

      We dedicate a special inode in the fs tree to save the free ino cache.

      The very first time we create or delete a file after mount, the free ino
      cache is loaded from disk into memory. When the fs tree is committed,
      the cache is written back to disk.

      To keep compatibility, we check the root generation against the generation
      of the special inode when loading the cache, so the loading will fail
      if the btrfs filesystem was previously mounted with an older kernel.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
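The compatibility check described above can be sketched as a generation comparison. This is a toy model with hypothetical names (`load_ino_cache` and its parameters); it assumes that a kernel without ino cache support changes the fs tree without rewriting the special inode, so the two generations no longer match.

```python
def load_ino_cache(root_generation, cache_inode_generation, on_disk_cache):
    """Return the cached free ino extents, or None if the cache is stale."""
    if cache_inode_generation != root_generation:
        # The fs tree changed after the cache was last written;
        # the caller has to rebuild the cache by scanning the tree.
        return None
    return list(on_disk_cache)


fresh = load_ino_cache(42, 42, [(300, 10)])
stale = load_ino_cache(43, 42, [(300, 10)])
```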
    • Btrfs: Cache free inode numbers in memory · 581bb050
      Li Zefan authored
      Currently btrfs stores the highest objectid of the fs tree, and it always
      returns the (highest+1) inode number when we create a file. Inode numbers
      are therefore never reclaimed when we delete files, so we will run out of
      inode numbers as we keep creating and deleting files on 32-bit machines.

      This fixes it, and it works similarly to how we cache free space in block
      groups.

      We start a kernel thread to read the file tree. By scanning inode items,
      we know which chunks of inode numbers are free, and we cache them in
      an rb-tree.

      Because we are searching the commit root, we have to carefully handle the
      cross-transaction case.

      The rb-tree is a hybrid extent+bitmap tree, so if we have too many small
      chunks of inode numbers, we'll use bitmaps. Initially we allow 16K of RAM
      for extents, and a bitmap will be used if we exceed this threshold. The
      extents threshold is adjusted at runtime.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
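The scan that finds free chunks of inode numbers can be sketched as a single pass over the used numbers. This is a minimal user-space illustration, not the kernel's caching thread: `free_ino_ranges` is a hypothetical helper, a plain list replaces the rb-tree, and the extent/bitmap hybrid is not modelled. Extents are written as (start, count), matching the `[34284, 1]` notation used elsewhere on this page.

```python
def free_ino_ranges(used_inos, first, last):
    """Return (start, count) extents of unused inode numbers in [first, last]."""
    ranges = []
    cursor = first  # lowest number not yet classified as used or free
    for ino in sorted(used_inos):
        if ino < cursor:
            continue  # below the window, or a duplicate
        if ino > last:
            break
        if ino > cursor:
            # Everything between cursor and this used inode is free.
            ranges.append((cursor, ino - cursor))
        cursor = ino + 1
    if cursor <= last:
        # Tail after the highest used inode.
        ranges.append((cursor, last - cursor + 1))
    return ranges


extents = free_ino_ranges([256, 257, 260, 263], first=256, last=270)
```

Many small extents like these are what the message means by "too many small chunks", the case in which the real cache switches to bitmaps.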
  16. 28 Mar, 2011 1 commit
  17. 22 Sep, 2009 1 commit
  18. 27 Apr, 2009 1 commit
  19. 13 Feb, 2009 1 commit
    • Btrfs: remove btrfs_init_path · e00f7308
      Jeff Mahoney authored
      btrfs_init_path was initially used when the path objects were on the
      stack.  Now all the work is done by btrfs_alloc_path and btrfs_init_path
      isn't required.
      
      This patch removes it, and just uses kmem_cache_zalloc to zero out the object.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  20. 06 Jan, 2009 1 commit
  21. 26 Sep, 2008 1 commit
    • Btrfs: extent_map and data=ordered fixes for space balancing · 5b21f2ed
      Zheng Yan authored
      * Add an EXTENT_BOUNDARY state bit to keep the writepage code
      from merging data extents that are in the process of being
      relocated.  This allows us to do accounting for them properly.
      
      * The balancing code relocates data extents independent of the underlying
      inode.  The extent_map code was modified to properly account for
      things moving around (invalidating extent_map caches in the inode).
      
      * Don't take the drop_mutex in the create_subvol ioctl.  It isn't
      required.
      
      * Fix walking of the ordered extent list to avoid races with sys_unlink
      
      * Change the lock ordering rules.  Transaction start goes outside
      the drop_mutex.  This allows btrfs_commit_transaction to directly
      drop the relocation trees.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  22. 25 Sep, 2008 4 commits
  23. 11 Jul, 2007 1 commit
  24. 12 Jun, 2007 1 commit
  25. 11 Apr, 2007 1 commit
  26. 06 Apr, 2007 1 commit
  27. 05 Apr, 2007 1 commit