1. 10 6月, 2014 3 次提交
    • J
      Btrfs: rework qgroup accounting · fcebe456
      Josef Bacik 提交于
      Currently qgroups account for space by intercepting delayed ref updates to fs
      trees.  It does this by adding sequence numbers to delayed ref updates so that
      it can figure out how the tree looked before the update so we can adjust the
      counters properly.  The problem with this is that it does not allow delayed refs
      to be merged, so if you say are defragging an extent with 5k snapshots pointing
      to it we will thrash the delayed ref lock because we need to go back and
      manually merge these things together.  Instead we want to process quota changes
      when we know they are going to happen, like when we first allocate an extent, we
      free a reference for an extent, we add new references etc.  This patch
      accomplishes this by only adding qgroup operations for real ref changes.  We
      only modify the sequence number when we need to lookup roots for bytenrs, this
      reduces the amount of churn on the sequence number and allows us to merge
      delayed refs as we add them most of the time.  This patch encompasses a bunch of
      architectural changes
      
      1) qgroup ref operations: instead of tracking qgroup operations through the
      delayed refs we simply add new ref operations whenever we notice that we need to
      when we've modified the refs themselves.
      
      2) tree mod seq:  we no longer have this separation of major/minor counters.
      this makes the sequence number stuff much more sane and we can remove some
      locking that was needed to protect the counter.
      
      3) delayed ref seq: we now read the tree mod seq number and use that as our
      sequence.  This means each new delayed ref doesn't have it's own unique sequence
      number, rather whenever we go to lookup backrefs we inc the sequence number so
      we can make sure to keep any new operations from screwing up our world view at
      that given point.  This allows us to merge delayed refs during runtime.
      
      With all of these changes the delayed ref stuff is a little saner and the qgroup
      accounting stuff no longer goes negative in some cases like it was before.
      Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      fcebe456
    • M
    • D
      btrfs: assert that send is not in progres before root deletion · 61155aa0
      David Sterba 提交于
      CC: Miao Xie <miaox@cn.fujitsu.com>
      CC: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      61155aa0
  2. 07 4月, 2014 2 次提交
    • J
      Btrfs: remove transaction from send · 9e351cc8
      Josef Bacik 提交于
      Lets try this again.  We can deadlock the box if we send on a box and try to
      write onto the same fs with the app that is trying to listen to the send pipe.
      This is because the writer could get stuck waiting for a transaction commit
      which is being blocked by the send.  So fix this by making sure looking at the
      commit roots is always going to be consistent.  We do this by keeping track of
      which roots need to have their commit roots swapped during commit, and then
      taking the commit_root_sem and swapping them all at once.  Then make sure we
      take a read lock on the commit_root_sem in cases where we search the commit root
      to make sure we're always looking at a consistent view of the commit roots.
      Previously we had problems with this because we would swap a fs tree commit root
      and then swap the extent tree commit root independently which would cause the
      backref walking code to screw up sometimes.  With this patch we no longer
      deadlock and pass all the weird send/receive corner cases.  Thanks,
      Reportedy-by: NHugo Mills <hugo@carfax.org.uk>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      9e351cc8
    • J
      Btrfs: don't clear uptodate if the eb is under IO · a26e8c9f
      Josef Bacik 提交于
      So I have an awful exercise script that will run snapshot, balance and
      send/receive in parallel.  This sometimes would crash spectacularly and when it
      came back up the fs would be completely hosed.  Turns out this is because of a
      bad interaction of balance and send/receive.  Send will hold onto its entire
      path for the whole send, but its blocks could get relocated out from underneath
      it, and because it doesn't old tree locks theres nothing to keep this from
      happening.  So it will go to read in a slot with an old transid, and we could
      have re-allocated this block for something else and it could have a completely
      different transid.  But because we think it is invalid we clear uptodate and
      re-read in the block.  If we do this before we actually write out the new block
      we could write back stale data to the fs, and boom we're screwed.
      
      Now we definitely need to fix this disconnect between send and balance, but we
      really really need to not allow ourselves to accidently read in stale data over
      new data.  So make sure we check if the extent buffer is not under io before
      clearing uptodate, this will kick back EIO to the caller instead of reading in
      stale data and keep us from corrupting the fs.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      a26e8c9f
  3. 21 3月, 2014 1 次提交
    • J
      Btrfs: fix deadlock with nested trans handles · 3bbb24b2
      Josef Bacik 提交于
      Zach found this deadlock that would happen like this
      
      btrfs_end_transaction <- reduce trans->use_count to 0
        btrfs_run_delayed_refs
          btrfs_cow_block
            find_free_extent
      	btrfs_start_transaction <- increase trans->use_count to 1
                allocate chunk
      	btrfs_end_transaction <- decrease trans->use_count to 0
      	  btrfs_run_delayed_refs
      	    lock tree block we are cowing above ^^
      
      We need to only decrease trans->use_count if it is above 1, otherwise leave it
      alone.  This will make nested trans be the only ones who decrease their added
      ref, and will let us get rid of the trans->use_count++ hack if we have to commit
      the transaction.  Thanks,
      
      cc: stable@vger.kernel.org
      Reported-by: NZach Brown <zab@redhat.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Tested-by: NZach Brown <zab@redhat.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      3bbb24b2
  4. 11 3月, 2014 3 次提交
  5. 29 1月, 2014 9 次提交
    • Q
      btrfs: Add noinode_cache mount option · 3818aea2
      Qu Wenruo 提交于
      Add noinode_cache mount option for btrfs.
      
      Since inode map cache involves all the btrfs_find_free_ino/return_ino
      things and if just trigger the mount_opt,
      an inode number get from inode map cache will not returned to inode map
      cache.
      
      To keep the find and return inode both in the same behavior,
      a new bit in mount_opt, CHANGE_INODE_CACHE, is introduced for this idea.
      CHANGE_INODE_CACHE is set/cleared in remounting, and the original
      INODE_MAP_CACHE is set/cleared according to CHANGE_INODE_CACHE after a
      success transaction.
      Since find/return inode is all done between btrfs_start_transaction and
      btrfs_commit_transaction, this will keep consistent behavior.
      
      Also noinode_cache mount option will not stop the caching_kthread.
      
      Cc: David Sterba <dsterba@suse.cz>
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      3818aea2
    • J
      Btrfs: throttle delayed refs better · 0a2b2a84
      Josef Bacik 提交于
      On one of our gluster clusters we noticed some pretty big lag spikes.  This
      turned out to be because our transaction commit was taking like 3 minutes to
      complete.  This is because we have like 30 gigs of metadata, so our global
      reserve would end up being the max which is like 512 mb.  So our throttling code
      would allow a ridiculous amount of delayed refs to build up and then they'd all
      get run at transaction commit time, and for a cold mounted file system that
      could take up to 3 minutes to run.  So fix the throttling to be based on both
      the size of the global reserve and how long it takes us to run delayed refs.
      This patch tracks the time it takes to run delayed refs and then only allows 1
      seconds worth of outstanding delayed refs at a time.  This way it will auto-tune
      itself from cold cache up to when everything is in memory and it no longer has
      to go to disk.  This makes our transaction commits take much less time to run.
      Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      0a2b2a84
    • J
      Btrfs: attach delayed ref updates to delayed ref heads · d7df2c79
      Josef Bacik 提交于
      Currently we have two rb-trees, one for delayed ref heads and one for all of the
      delayed refs, including the delayed ref heads.  When we process the delayed refs
      we have to hold onto the delayed ref lock for all of the selecting and merging
      and such, which results in quite a bit of lock contention.  This was solved by
      having a waitqueue and only one flusher at a time, however this hurts if we get
      a lot of delayed refs queued up.
      
      So instead just have an rb tree for the delayed ref heads, and then attach the
      delayed ref updates to an rb tree that is per delayed ref head.  Then we only
      need to take the delayed ref lock when adding new delayed refs and when
      selecting a delayed ref head to process, all the rest of the time we deal with a
      per delayed ref head lock which will be much less contentious.
      
      The locking rules for this get a little more complicated since we have to lock
      up to 3 things to properly process delayed refs, but I will address that problem
      later.  For now this passes all of xfstests and my overnight stress tests.
      Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      d7df2c79
    • J
      Btrfs: make fsync latency less sucky · 5039eddc
      Josef Bacik 提交于
      Looking into some performance related issues with large amounts of metadata
      revealed that we can have some pretty huge swings in fsync() performance.  If we
      have a lot of delayed refs backed up (as you will tend to do with lots of
      metadata) fsync() will wander off and try to run some of those delayed refs
      which can result in reading from disk and such.  Since the actual act of fsync()
      doesn't create any delayed refs there is no need to make it throttle on delayed
      ref stuff, that will be handled by other people.  With this patch we get much
      smoother fsync performance with large amounts of metadata.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      5039eddc
    • W
      Btrfs: fix protection between send and root deletion · 18f687d5
      Wang Shilong 提交于
      We should gurantee that parent and clone roots can not be destroyed
      during send, for this we have two ideas.
      
      1.by holding @subvol_sem, this might be a nightmare, because it will
      block all subvolumes deletion for a long time.
      
      2.Miao pointed out we can reuse @send_in_progress, that mean we will
      skip snapshot deletion if root sending is in progress.
      
      Here we adopt the second approach since it won't block other subvolumes
      deletion for a long time.
      
      Besides in btrfs_clean_one_deleted_snapshot(), we only check first root
      , if this root is involved in send, we return directly rather than
      continue to check.There are several reasons about it:
      
      1.this case happen seldomly.
      2.after sending,cleaner thread can continue to drop that root.
      3.make code simple
      
      Cc: David Sterba <dsterba@suse.cz>
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Reviewed-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      18f687d5
    • M
      Btrfs: remove btrfs_end_transaction_dmeta() · a56dbd89
      Miao Xie 提交于
      Two reasons:
      - btrfs_end_transaction_dmeta() is the same as btrfs_end_transaction_throttle()
        so it is unnecessary.
      - All the delayed items should be dealt in the current transaction, so the
        workers should not commit the transaction, instead, deal with the delayed
        items as many as possible.
      
      So we can remove btrfs_end_transaction_dmeta()
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      a56dbd89
    • F
      Btrfs: convert printk to btrfs_ and fix BTRFS prefix · efe120a0
      Frank Holton 提交于
      Convert all applicable cases of printk and pr_* to the btrfs_* macros.
      
      Fix all uses of the BTRFS prefix.
      Signed-off-by: NFrank Holton <fholton@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      efe120a0
    • W
      Btrfs: wrap repeated code into scrub_blocked_if_needed() · cb7ab021
      Wang Shilong 提交于
      Just wrap same code into one function scrub_blocked_if_needed().
      
      This make a change that we will move waiting (@workers_pending = 0)
      before we can wake up commiting transaction(atomic_inc(@scrub_paused)),
      we must take carefully to not deadlock here.
      
      Thread 1			Thread 2
      				|->btrfs_commit_transaction()
      					|->set trans type(COMMIT_DOING)
      					|->btrfs_scrub_paused()(blocked)
      |->join_transaction(blocked)
      
      Move btrfs_scrub_paused() before setting trans type which means we can
      still join a transaction when commiting_transaction is blocked.
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Suggested-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      cb7ab021
    • L
      Btrfs: introduce a head ref rbtree · c46effa6
      Liu Bo 提交于
      The way how we process delayed refs is
      1) get a bunch of head refs,
      2) pick up one head ref,
      3) go one node back for any delayed ref updates.
      
      The head ref is also linked in the same rbtree as the delayed ref is,
      so in 1) stage, we have to walk one by one including not only head refs, but
      delayed refs.
      
      When we have a great number of delayed refs pending to process,
      this'll cost time a lot.
      
      Here we introduce a head ref specific rbtree, it only has head refs, so troubles
      go away.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      c46effa6
  6. 21 11月, 2013 1 次提交
    • L
      Btrfs: fix lockdep error in async commit · b1a06a4b
      Liu Bo 提交于
      Lockdep complains about btrfs's async commit:
      
      [ 2372.462171] [ BUG: bad unlock balance detected! ]
      [ 2372.462191] 3.12.0+ #32 Tainted: G        W
      [ 2372.462209] -------------------------------------
      [ 2372.462228] ceph-osd/14048 is trying to release lock (sb_internal) at:
      [ 2372.462275] [<ffffffffa022cb10>] btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs]
      [ 2372.462305] but there are no more locks to release!
      [ 2372.462324]
      [ 2372.462324] other info that might help us debug this:
      [ 2372.462349] no locks held by ceph-osd/14048.
      [ 2372.462367]
      [ 2372.462367] stack backtrace:
      [ 2372.462386] CPU: 2 PID: 14048 Comm: ceph-osd Tainted: G        W    3.12.0+ #32
      [ 2372.462414] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015  11/09/2011
      [ 2372.462455]  ffffffffa022cb10 ffff88007490fd28 ffffffff816f094a ffff8800378aa320
      [ 2372.462491]  ffff88007490fd50 ffffffff810adf4c ffff8800378aa320 ffff88009af97650
      [ 2372.462526]  ffffffffa022cb10 ffff88007490fd88 ffffffff810b01ee ffff8800898c0000
      [ 2372.462562] Call Trace:
      [ 2372.462584]  [<ffffffffa022cb10>] ? btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs]
      [ 2372.462619]  [<ffffffff816f094a>] dump_stack+0x45/0x56
      [ 2372.462642]  [<ffffffff810adf4c>] print_unlock_imbalance_bug+0xec/0x100
      [ 2372.462677]  [<ffffffffa022cb10>] ? btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs]
      [ 2372.462710]  [<ffffffff810b01ee>] lock_release+0x18e/0x210
      [ 2372.462742]  [<ffffffffa022cb36>] btrfs_commit_transaction_async+0x1d6/0x2a0 [btrfs]
      [ 2372.462783]  [<ffffffffa025a7ce>] btrfs_ioctl_start_sync+0x3e/0xc0 [btrfs]
      [ 2372.462822]  [<ffffffffa025f1d3>] btrfs_ioctl+0x4c3/0x1f70 [btrfs]
      [ 2372.462849]  [<ffffffff812c0321>] ? avc_has_perm+0x121/0x1b0
      [ 2372.462873]  [<ffffffff812c0224>] ? avc_has_perm+0x24/0x1b0
      [ 2372.462897]  [<ffffffff8107ecc8>] ? sched_clock_cpu+0xa8/0x100
      [ 2372.462922]  [<ffffffff8117b145>] do_vfs_ioctl+0x2e5/0x4e0
      [ 2372.462946]  [<ffffffff812c19e6>] ? file_has_perm+0x86/0xa0
      [ 2372.462969]  [<ffffffff8117b3c1>] SyS_ioctl+0x81/0xa0
      [ 2372.462991]  [<ffffffff817045a4>] tracesys+0xdd/0xe2
      
      ====================================================
      
      It's because that we don't do the right thing when checking if it's ok to
      tell lockdep that we're trying to release the rwsem.
      
      If the trans handle's type is TRANS_ATTACH, we won't acquire the freeze rwsem, but
      as TRANS_ATTACH fits the check (trans < TRANS_JOIN_NOLOCK), we'll release the freeze
      rwsem, which makes lockdep complains a lot.
      Reported-by: NMa Jianpeng <majianpeng@gmail.com>
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      b1a06a4b
  7. 12 11月, 2013 8 次提交
    • M
      Btrfs: rename btrfs_start_all_delalloc_inodes · 91aef86f
      Miao Xie 提交于
      rename the function -- btrfs_start_all_delalloc_inodes(), and make its
      name be compatible to btrfs_wait_ordered_roots(), since they are always
      used at the same place.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      91aef86f
    • M
      Btrfs: don't wait for the completion of all the ordered extents · b0244199
      Miao Xie 提交于
      It is very likely that there are lots of ordered extents in the filesytem,
      if we wait for the completion of all of them when we want to reclaim some
      space for the metadata space reservation, we would be blocked for a long
      time. The performance would drop down suddenly for a long time.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      b0244199
    • F
      Btrfs: fix memory leaks on transaction commit failure · 80d94fb3
      Filipe David Borba Manana 提交于
      Structures of the types tree_mod_elem and qgroup_update are allocated
      during transaction commit but were not being released if the call to
      btrfs_run_delayed_items() returned an error.
      
      Stack trace reported by kmemleak:
      
      unreferenced object 0xffff880679f0b398 (size 128):
        comm "umount", pid 21508, jiffies 4295967793 (age 36718.112s)
        hex dump (first 32 bytes):
          60 b5 f0 79 06 88 ff ff 00 00 00 00 00 00 00 00  `..y............
          00 00 00 00 00 00 00 00 50 1c 00 00 00 00 00 00  ........P.......
        backtrace:
          [<ffffffff81742d26>] kmemleak_alloc+0x26/0x50
          [<ffffffff811889c2>] kmem_cache_alloc_trace+0x112/0x200
          [<ffffffffa046f2d3>] tree_mod_log_insert_key.constprop.45+0x93/0x150 [btrfs]
          [<ffffffffa04720f9>] __btrfs_cow_block+0x299/0x4f0 [btrfs]
          [<ffffffffa0472510>] btrfs_cow_block+0x120/0x1f0 [btrfs]
          [<ffffffffa0476679>] btrfs_search_slot+0x449/0x930 [btrfs]
          [<ffffffffa048eecf>] btrfs_lookup_inode+0x2f/0xa0 [btrfs]
          [<ffffffffa04eb49c>] __btrfs_update_delayed_inode+0x1c/0x1d0 [btrfs]
          [<ffffffffa04eb9e2>] __btrfs_run_delayed_items+0x162/0x1e0 [btrfs]
          [<ffffffffa04eba63>] btrfs_delayed_inode_exit+0x3/0x20 [btrfs]
          [<ffffffffa0499c63>] btrfs_commit_transaction+0x203/0xa50 [btrfs]
          [<ffffffffa046b519>] btrfs_sync_fs+0x69/0x110 [btrfs]
          [<ffffffff811cb210>] __sync_filesystem+0x30/0x60
          [<ffffffff811cb2bb>] sync_filesystem+0x4b/0x70
          [<ffffffff8119ce7b>] generic_shutdown_super+0x3b/0xf0
          [<ffffffff8119cfc6>] kill_anon_super+0x16/0x30
      unreferenced object 0xffff880677e0dd88 (size 32):
        comm "umount", pid 21508, jiffies 4295967793 (age 36718.112s)
        hex dump (first 32 bytes):
          78 75 11 a9 06 88 ff ff 00 c0 e0 77 06 88 ff ff  xu.........w....
          40 c3 a2 70 06 88 ff ff 00 00 00 00 00 00 00 00  @..p............
        backtrace:
          [<ffffffff81742d26>] kmemleak_alloc+0x26/0x50
          [<ffffffff811889c2>] kmem_cache_alloc_trace+0x112/0x200
          [<ffffffffa04fa54f>] btrfs_qgroup_record_ref+0xf/0x90 [btrfs]
          [<ffffffffa04e1914>] btrfs_add_delayed_tree_ref+0xf4/0x170 [btrfs]
          [<ffffffffa048518a>] btrfs_free_tree_block+0x9a/0x220 [btrfs]
          [<ffffffffa0472163>] __btrfs_cow_block+0x303/0x4f0 [btrfs]
          [<ffffffffa0472510>] btrfs_cow_block+0x120/0x1f0 [btrfs]
          [<ffffffffa0476679>] btrfs_search_slot+0x449/0x930 [btrfs]
          [<ffffffffa048eecf>] btrfs_lookup_inode+0x2f/0xa0 [btrfs]
          [<ffffffffa04eb49c>] __btrfs_update_delayed_inode+0x1c/0x1d0 [btrfs]
          [<ffffffffa04eb9e2>] __btrfs_run_delayed_items+0x162/0x1e0 [btrfs]
          [<ffffffffa04eba63>] btrfs_delayed_inode_exit+0x3/0x20 [btrfs]
          [<ffffffffa0499c63>] btrfs_commit_transaction+0x203/0xa50 [btrfs]
          [<ffffffffa046b519>] btrfs_sync_fs+0x69/0x110 [btrfs]
          [<ffffffff811cb210>] __sync_filesystem+0x30/0x60
          [<ffffffff811cb2bb>] sync_filesystem+0x4b/0x70
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      80d94fb3
    • M
      Btrfs: fix BUG_ON() casued by the reserved space migration · 20dd2cbf
      Miao Xie 提交于
      When we did space balance and snapshot creation at the same time, we might
      meet the following oops:
       kernel BUG at fs/btrfs/inode.c:3038!
       [SNIP]
       Call Trace:
       [<ffffffffa0411ec7>] btrfs_orphan_cleanup+0x293/0x407 [btrfs]
       [<ffffffffa042dc45>] btrfs_mksubvol.isra.28+0x259/0x373 [btrfs]
       [<ffffffffa042de85>] btrfs_ioctl_snap_create_transid+0x126/0x156 [btrfs]
       [<ffffffffa042dff1>] btrfs_ioctl_snap_create_v2+0xd0/0x121 [btrfs]
       [<ffffffffa0430b2c>] btrfs_ioctl+0x414/0x1854 [btrfs]
       [<ffffffff813b60b7>] ? __do_page_fault+0x305/0x379
       [<ffffffff811215a9>] vfs_ioctl+0x1d/0x39
       [<ffffffff81121d7c>] do_vfs_ioctl+0x32d/0x3e2
       [<ffffffff81057fe7>] ? finish_task_switch+0x80/0xb8
       [<ffffffff81121e88>] SyS_ioctl+0x57/0x83
       [<ffffffff813b39ff>] ? do_device_not_available+0x12/0x14
       [<ffffffff813b99c2>] system_call_fastpath+0x16/0x1b
       [SNIP]
       RIP  [<ffffffffa040da40>] btrfs_orphan_add+0xc3/0x126 [btrfs]
      
      The reason of the problem is that the relocation root creation stole
      the reserved space, which was reserved for orphan item deletion.
      
      There are several ways to fix this problem, one is to increasing
      the reserved space size of the space balace, and then we can use
      that space to create the relocation tree for each fs/file trees.
      But it is hard to calculate the suitable size because we doesn't
      know how many fs/file trees we need relocate.
      
      We fixed this problem by reserving the space for relocation root creation
      actively since the space it need is very small (one tree block, used for
      root node copy), then we use that reserved space to create the
      relocation tree. If we don't reserve space for relocation tree creation,
      we will use the reserved space of the balance.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      20dd2cbf
    • J
      Btrfs: fix two use-after-free bugs with transaction cleanup · 724e2315
      Josef Bacik 提交于
      I was noticing the slab redzone stuff going off every once and a while during
      transaction aborts.  This was caused by two things
      
      1) We would walk the pending snapshots and set their error to -ECANCELED.  We
      don't need to do this, the snapshot stuff waits for a transaction commit and if
      there is a problem we just free our pending snapshot object and exit.  Doing
      this was causing us to touch the pending snapshot object after the thing had
      already been freed.
      
      2) We were freeing the transaction manually with wanton disregard for it's
      use_count reference counter.  To fix this I cleaned up the transaction freeing
      loop to either wait for the transaction commit to finish if it was in the middle
      of that (since it will be cleaned and freed up there) or to do the cleanup
      oursevles.
      
      I also moved the global "kill all things dirty everywhere" stuff outside of the
      transaction cleanup loop since that only needs to be done once.  With this patch
      I'm no longer seeing slab corruption because of use after frees.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      724e2315
    • J
      Btrfs: remove all BUG_ON()'s from commit_cowonly_roots · c16ce190
      Josef Bacik 提交于
      Noticed this when forcing errors to happen during delayed ref running.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      c16ce190
    • J
      Btrfs: cleanup transaction on abort · 4e121c06
      Josef Bacik 提交于
      If we abort not during a transaction commit we won't clean up anything until we
      unmount.  Unfortunately if we abort in the middle of writing out an ordered
      extent we won't clean it up and if somebody is waiting on that ordered extent
      they will wait forever.  To fix this just make the transaction kthread call the
      cleanup transaction stuff if it notices theres an error, and make
      btrfs_end_transaction wake up the transaction kthread if there is an error.
      Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      4e121c06
    • J
      Btrfs: reset intwrite on transaction abort · e0228285
      Josef Bacik 提交于
      If we abort a transaction in the middle of a commit we weren't undoing the
      intwrite locking.  This patch fixes that problem.
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      e0228285
  8. 05 10月, 2013 1 次提交
    • J
      Btrfs: fix transid verify errors when recovering log tree · 60e7cd3a
      Josef Bacik 提交于
      If we crash with a log, remount and recover that log, and then crash before we
      can commit another transaction we will get transid verify errors on the next
      mount.  This is because we were not zero'ing out the log when we committed the
      transaction after recovery.  This is ok as long as we commit another transaction
      at some point in the future, but if you abort or something else goes wrong you
      can end up in this weird state because the recovery stuff says that the tree log
      should have a generation+1 of the super generation, which won't be the case of
      the transaction that was started for recovery.  Fix this by removing the check
      and _always_ zero out the log portion of the super when we commit a transaction.
      This fixes the transid verify issues I was seeing with my force errors tests.
      Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      60e7cd3a
  9. 21 9月, 2013 1 次提交
  10. 01 9月, 2013 7 次提交
  11. 10 8月, 2013 1 次提交
  12. 02 7月, 2013 1 次提交
    • J
      Btrfs: make the chunk allocator completely tree lockless · 6df9a95e
      Josef Bacik 提交于
      When adjusting the enospc rules for relocation I ran into a deadlock because we
      were relocating the only system chunk and that forced us to try and allocate a
      new system chunk while holding locks in the chunk tree, which caused us to
      deadlock.  To fix this I've moved all of the dev extent addition and chunk
      addition out to the delayed chunk completion stuff.  We still keep the in-memory
      stuff which makes sure everything is consistent.
      
      One change I had to make was to search the commit root of the device tree to
      find a free dev extent, and hold onto any chunk em's that we allocated in that
      transaction so we do not allocate the same dev extent twice.  This has the side
      effect of fixing a bug with balance that has been there ever since balance
      existed.  Basically you can free a block group and it's dev extent and then
      immediately allocate that dev extent for a new block group and write stuff to
      that dev extent, all within the same transaction.  So if you happen to crash
      during a balance you could come back to a completely broken file system.  This
      patch should keep these sort of things from happening in the future since we
      won't be able to allocate free'd dev extents until after the transaction
      commits.  This has passed all of the xfstests and my super annoying stress test
      followed by a balance.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      6df9a95e
  13. 01 7月, 2013 2 次提交