1. 29 1月, 2014 9 次提交
    • Q
      btrfs: Add noinode_cache mount option · 3818aea2
      Qu Wenruo 提交于
      Add noinode_cache mount option for btrfs.
      
      Since inode map cache involves all the btrfs_find_free_ino/return_ino
      things and if just trigger the mount_opt,
      an inode number get from inode map cache will not returned to inode map
      cache.
      
      To keep the find and return inode both in the same behavior,
      a new bit in mount_opt, CHANGE_INODE_CACHE, is introduced for this idea.
      CHANGE_INODE_CACHE is set/cleared in remounting, and the original
      INODE_MAP_CACHE is set/cleared according to CHANGE_INODE_CACHE after a
      success transaction.
      Since find/return inode is all done between btrfs_start_transaction and
      btrfs_commit_transaction, this will keep consistent behavior.
      
      Also noinode_cache mount option will not stop the caching_kthread.
      
      Cc: David Sterba <dsterba@suse.cz>
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      3818aea2
    • J
      Btrfs: throttle delayed refs better · 0a2b2a84
      Josef Bacik 提交于
      On one of our gluster clusters we noticed some pretty big lag spikes.  This
      turned out to be because our transaction commit was taking like 3 minutes to
      complete.  This is because we have like 30 gigs of metadata, so our global
      reserve would end up being the max which is like 512 mb.  So our throttling code
      would allow a ridiculous amount of delayed refs to build up and then they'd all
      get run at transaction commit time, and for a cold mounted file system that
      could take up to 3 minutes to run.  So fix the throttling to be based on both
      the size of the global reserve and how long it takes us to run delayed refs.
      This patch tracks the time it takes to run delayed refs and then only allows 1
      seconds worth of outstanding delayed refs at a time.  This way it will auto-tune
      itself from cold cache up to when everything is in memory and it no longer has
      to go to disk.  This makes our transaction commits take much less time to run.
      Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      0a2b2a84
    • J
      Btrfs: attach delayed ref updates to delayed ref heads · d7df2c79
      Josef Bacik 提交于
      Currently we have two rb-trees, one for delayed ref heads and one for all of the
      delayed refs, including the delayed ref heads.  When we process the delayed refs
      we have to hold onto the delayed ref lock for all of the selecting and merging
      and such, which results in quite a bit of lock contention.  This was solved by
      having a waitqueue and only one flusher at a time, however this hurts if we get
      a lot of delayed refs queued up.
      
      So instead just have an rb tree for the delayed ref heads, and then attach the
      delayed ref updates to an rb tree that is per delayed ref head.  Then we only
      need to take the delayed ref lock when adding new delayed refs and when
      selecting a delayed ref head to process, all the rest of the time we deal with a
      per delayed ref head lock which will be much less contentious.
      
      The locking rules for this get a little more complicated since we have to lock
      up to 3 things to properly process delayed refs, but I will address that problem
      later.  For now this passes all of xfstests and my overnight stress tests.
      Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      d7df2c79
    • J
      Btrfs: make fsync latency less sucky · 5039eddc
      Josef Bacik 提交于
      Looking into some performance related issues with large amounts of metadata
      revealed that we can have some pretty huge swings in fsync() performance.  If we
      have a lot of delayed refs backed up (as you will tend to do with lots of
      metadata) fsync() will wander off and try to run some of those delayed refs
      which can result in reading from disk and such.  Since the actual act of fsync()
      doesn't create any delayed refs there is no need to make it throttle on delayed
      ref stuff, that will be handled by other people.  With this patch we get much
      smoother fsync performance with large amounts of metadata.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      5039eddc
    • W
      Btrfs: fix protection between send and root deletion · 18f687d5
      Wang Shilong 提交于
      We should gurantee that parent and clone roots can not be destroyed
      during send, for this we have two ideas.
      
      1.by holding @subvol_sem, this might be a nightmare, because it will
      block all subvolumes deletion for a long time.
      
      2.Miao pointed out we can reuse @send_in_progress, that mean we will
      skip snapshot deletion if root sending is in progress.
      
      Here we adopt the second approach since it won't block other subvolumes
      deletion for a long time.
      
      Besides in btrfs_clean_one_deleted_snapshot(), we only check first root
      , if this root is involved in send, we return directly rather than
      continue to check.There are several reasons about it:
      
      1.this case happen seldomly.
      2.after sending,cleaner thread can continue to drop that root.
      3.make code simple
      
      Cc: David Sterba <dsterba@suse.cz>
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Reviewed-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      18f687d5
    • M
      Btrfs: remove btrfs_end_transaction_dmeta() · a56dbd89
      Miao Xie 提交于
      Two reasons:
      - btrfs_end_transaction_dmeta() is the same as btrfs_end_transaction_throttle()
        so it is unnecessary.
      - All the delayed items should be dealt in the current transaction, so the
        workers should not commit the transaction, instead, deal with the delayed
        items as many as possible.
      
      So we can remove btrfs_end_transaction_dmeta()
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      a56dbd89
    • F
      Btrfs: convert printk to btrfs_ and fix BTRFS prefix · efe120a0
      Frank Holton 提交于
      Convert all applicable cases of printk and pr_* to the btrfs_* macros.
      
      Fix all uses of the BTRFS prefix.
      Signed-off-by: NFrank Holton <fholton@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      efe120a0
    • W
      Btrfs: wrap repeated code into scrub_blocked_if_needed() · cb7ab021
      Wang Shilong 提交于
      Just wrap same code into one function scrub_blocked_if_needed().
      
      This make a change that we will move waiting (@workers_pending = 0)
      before we can wake up commiting transaction(atomic_inc(@scrub_paused)),
      we must take carefully to not deadlock here.
      
      Thread 1			Thread 2
      				|->btrfs_commit_transaction()
      					|->set trans type(COMMIT_DOING)
      					|->btrfs_scrub_paused()(blocked)
      |->join_transaction(blocked)
      
      Move btrfs_scrub_paused() before setting trans type which means we can
      still join a transaction when commiting_transaction is blocked.
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Suggested-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      cb7ab021
    • L
      Btrfs: introduce a head ref rbtree · c46effa6
      Liu Bo 提交于
      The way how we process delayed refs is
      1) get a bunch of head refs,
      2) pick up one head ref,
      3) go one node back for any delayed ref updates.
      
      The head ref is also linked in the same rbtree as the delayed ref is,
      so in 1) stage, we have to walk one by one including not only head refs, but
      delayed refs.
      
      When we have a great number of delayed refs pending to process,
      this'll cost time a lot.
      
      Here we introduce a head ref specific rbtree, it only has head refs, so troubles
      go away.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      c46effa6
  2. 21 11月, 2013 1 次提交
    • L
      Btrfs: fix lockdep error in async commit · b1a06a4b
      Liu Bo 提交于
      Lockdep complains about btrfs's async commit:
      
      [ 2372.462171] [ BUG: bad unlock balance detected! ]
      [ 2372.462191] 3.12.0+ #32 Tainted: G        W
      [ 2372.462209] -------------------------------------
      [ 2372.462228] ceph-osd/14048 is trying to release lock (sb_internal) at:
      [ 2372.462275] [<ffffffffa022cb10>] btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs]
      [ 2372.462305] but there are no more locks to release!
      [ 2372.462324]
      [ 2372.462324] other info that might help us debug this:
      [ 2372.462349] no locks held by ceph-osd/14048.
      [ 2372.462367]
      [ 2372.462367] stack backtrace:
      [ 2372.462386] CPU: 2 PID: 14048 Comm: ceph-osd Tainted: G        W    3.12.0+ #32
      [ 2372.462414] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015  11/09/2011
      [ 2372.462455]  ffffffffa022cb10 ffff88007490fd28 ffffffff816f094a ffff8800378aa320
      [ 2372.462491]  ffff88007490fd50 ffffffff810adf4c ffff8800378aa320 ffff88009af97650
      [ 2372.462526]  ffffffffa022cb10 ffff88007490fd88 ffffffff810b01ee ffff8800898c0000
      [ 2372.462562] Call Trace:
      [ 2372.462584]  [<ffffffffa022cb10>] ? btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs]
      [ 2372.462619]  [<ffffffff816f094a>] dump_stack+0x45/0x56
      [ 2372.462642]  [<ffffffff810adf4c>] print_unlock_imbalance_bug+0xec/0x100
      [ 2372.462677]  [<ffffffffa022cb10>] ? btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs]
      [ 2372.462710]  [<ffffffff810b01ee>] lock_release+0x18e/0x210
      [ 2372.462742]  [<ffffffffa022cb36>] btrfs_commit_transaction_async+0x1d6/0x2a0 [btrfs]
      [ 2372.462783]  [<ffffffffa025a7ce>] btrfs_ioctl_start_sync+0x3e/0xc0 [btrfs]
      [ 2372.462822]  [<ffffffffa025f1d3>] btrfs_ioctl+0x4c3/0x1f70 [btrfs]
      [ 2372.462849]  [<ffffffff812c0321>] ? avc_has_perm+0x121/0x1b0
      [ 2372.462873]  [<ffffffff812c0224>] ? avc_has_perm+0x24/0x1b0
      [ 2372.462897]  [<ffffffff8107ecc8>] ? sched_clock_cpu+0xa8/0x100
      [ 2372.462922]  [<ffffffff8117b145>] do_vfs_ioctl+0x2e5/0x4e0
      [ 2372.462946]  [<ffffffff812c19e6>] ? file_has_perm+0x86/0xa0
      [ 2372.462969]  [<ffffffff8117b3c1>] SyS_ioctl+0x81/0xa0
      [ 2372.462991]  [<ffffffff817045a4>] tracesys+0xdd/0xe2
      
      ====================================================
      
      It's because that we don't do the right thing when checking if it's ok to
      tell lockdep that we're trying to release the rwsem.
      
      If the trans handle's type is TRANS_ATTACH, we won't acquire the freeze rwsem, but
      as TRANS_ATTACH fits the check (trans < TRANS_JOIN_NOLOCK), we'll release the freeze
      rwsem, which makes lockdep complains a lot.
      Reported-by: NMa Jianpeng <majianpeng@gmail.com>
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      b1a06a4b
  3. 12 11月, 2013 8 次提交
    • M
      Btrfs: rename btrfs_start_all_delalloc_inodes · 91aef86f
      Miao Xie 提交于
      rename the function -- btrfs_start_all_delalloc_inodes(), and make its
      name be compatible to btrfs_wait_ordered_roots(), since they are always
      used at the same place.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      91aef86f
    • M
      Btrfs: don't wait for the completion of all the ordered extents · b0244199
      Miao Xie 提交于
      It is very likely that there are lots of ordered extents in the filesytem,
      if we wait for the completion of all of them when we want to reclaim some
      space for the metadata space reservation, we would be blocked for a long
      time. The performance would drop down suddenly for a long time.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      b0244199
    • F
      Btrfs: fix memory leaks on transaction commit failure · 80d94fb3
      Filipe David Borba Manana 提交于
      Structures of the types tree_mod_elem and qgroup_update are allocated
      during transaction commit but were not being released if the call to
      btrfs_run_delayed_items() returned an error.
      
      Stack trace reported by kmemleak:
      
      unreferenced object 0xffff880679f0b398 (size 128):
        comm "umount", pid 21508, jiffies 4295967793 (age 36718.112s)
        hex dump (first 32 bytes):
          60 b5 f0 79 06 88 ff ff 00 00 00 00 00 00 00 00  `..y............
          00 00 00 00 00 00 00 00 50 1c 00 00 00 00 00 00  ........P.......
        backtrace:
          [<ffffffff81742d26>] kmemleak_alloc+0x26/0x50
          [<ffffffff811889c2>] kmem_cache_alloc_trace+0x112/0x200
          [<ffffffffa046f2d3>] tree_mod_log_insert_key.constprop.45+0x93/0x150 [btrfs]
          [<ffffffffa04720f9>] __btrfs_cow_block+0x299/0x4f0 [btrfs]
          [<ffffffffa0472510>] btrfs_cow_block+0x120/0x1f0 [btrfs]
          [<ffffffffa0476679>] btrfs_search_slot+0x449/0x930 [btrfs]
          [<ffffffffa048eecf>] btrfs_lookup_inode+0x2f/0xa0 [btrfs]
          [<ffffffffa04eb49c>] __btrfs_update_delayed_inode+0x1c/0x1d0 [btrfs]
          [<ffffffffa04eb9e2>] __btrfs_run_delayed_items+0x162/0x1e0 [btrfs]
          [<ffffffffa04eba63>] btrfs_delayed_inode_exit+0x3/0x20 [btrfs]
          [<ffffffffa0499c63>] btrfs_commit_transaction+0x203/0xa50 [btrfs]
          [<ffffffffa046b519>] btrfs_sync_fs+0x69/0x110 [btrfs]
          [<ffffffff811cb210>] __sync_filesystem+0x30/0x60
          [<ffffffff811cb2bb>] sync_filesystem+0x4b/0x70
          [<ffffffff8119ce7b>] generic_shutdown_super+0x3b/0xf0
          [<ffffffff8119cfc6>] kill_anon_super+0x16/0x30
      unreferenced object 0xffff880677e0dd88 (size 32):
        comm "umount", pid 21508, jiffies 4295967793 (age 36718.112s)
        hex dump (first 32 bytes):
          78 75 11 a9 06 88 ff ff 00 c0 e0 77 06 88 ff ff  xu.........w....
          40 c3 a2 70 06 88 ff ff 00 00 00 00 00 00 00 00  @..p............
        backtrace:
          [<ffffffff81742d26>] kmemleak_alloc+0x26/0x50
          [<ffffffff811889c2>] kmem_cache_alloc_trace+0x112/0x200
          [<ffffffffa04fa54f>] btrfs_qgroup_record_ref+0xf/0x90 [btrfs]
          [<ffffffffa04e1914>] btrfs_add_delayed_tree_ref+0xf4/0x170 [btrfs]
          [<ffffffffa048518a>] btrfs_free_tree_block+0x9a/0x220 [btrfs]
          [<ffffffffa0472163>] __btrfs_cow_block+0x303/0x4f0 [btrfs]
          [<ffffffffa0472510>] btrfs_cow_block+0x120/0x1f0 [btrfs]
          [<ffffffffa0476679>] btrfs_search_slot+0x449/0x930 [btrfs]
          [<ffffffffa048eecf>] btrfs_lookup_inode+0x2f/0xa0 [btrfs]
          [<ffffffffa04eb49c>] __btrfs_update_delayed_inode+0x1c/0x1d0 [btrfs]
          [<ffffffffa04eb9e2>] __btrfs_run_delayed_items+0x162/0x1e0 [btrfs]
          [<ffffffffa04eba63>] btrfs_delayed_inode_exit+0x3/0x20 [btrfs]
          [<ffffffffa0499c63>] btrfs_commit_transaction+0x203/0xa50 [btrfs]
          [<ffffffffa046b519>] btrfs_sync_fs+0x69/0x110 [btrfs]
          [<ffffffff811cb210>] __sync_filesystem+0x30/0x60
          [<ffffffff811cb2bb>] sync_filesystem+0x4b/0x70
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      80d94fb3
    • M
      Btrfs: fix BUG_ON() casued by the reserved space migration · 20dd2cbf
      Miao Xie 提交于
      When we did space balance and snapshot creation at the same time, we might
      meet the following oops:
       kernel BUG at fs/btrfs/inode.c:3038!
       [SNIP]
       Call Trace:
       [<ffffffffa0411ec7>] btrfs_orphan_cleanup+0x293/0x407 [btrfs]
       [<ffffffffa042dc45>] btrfs_mksubvol.isra.28+0x259/0x373 [btrfs]
       [<ffffffffa042de85>] btrfs_ioctl_snap_create_transid+0x126/0x156 [btrfs]
       [<ffffffffa042dff1>] btrfs_ioctl_snap_create_v2+0xd0/0x121 [btrfs]
       [<ffffffffa0430b2c>] btrfs_ioctl+0x414/0x1854 [btrfs]
       [<ffffffff813b60b7>] ? __do_page_fault+0x305/0x379
       [<ffffffff811215a9>] vfs_ioctl+0x1d/0x39
       [<ffffffff81121d7c>] do_vfs_ioctl+0x32d/0x3e2
       [<ffffffff81057fe7>] ? finish_task_switch+0x80/0xb8
       [<ffffffff81121e88>] SyS_ioctl+0x57/0x83
       [<ffffffff813b39ff>] ? do_device_not_available+0x12/0x14
       [<ffffffff813b99c2>] system_call_fastpath+0x16/0x1b
       [SNIP]
       RIP  [<ffffffffa040da40>] btrfs_orphan_add+0xc3/0x126 [btrfs]
      
      The reason of the problem is that the relocation root creation stole
      the reserved space, which was reserved for orphan item deletion.
      
      There are several ways to fix this problem, one is to increasing
      the reserved space size of the space balace, and then we can use
      that space to create the relocation tree for each fs/file trees.
      But it is hard to calculate the suitable size because we doesn't
      know how many fs/file trees we need relocate.
      
      We fixed this problem by reserving the space for relocation root creation
      actively since the space it need is very small (one tree block, used for
      root node copy), then we use that reserved space to create the
      relocation tree. If we don't reserve space for relocation tree creation,
      we will use the reserved space of the balance.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      20dd2cbf
    • J
      Btrfs: fix two use-after-free bugs with transaction cleanup · 724e2315
      Josef Bacik 提交于
      I was noticing the slab redzone stuff going off every once and a while during
      transaction aborts.  This was caused by two things
      
      1) We would walk the pending snapshots and set their error to -ECANCELED.  We
      don't need to do this, the snapshot stuff waits for a transaction commit and if
      there is a problem we just free our pending snapshot object and exit.  Doing
      this was causing us to touch the pending snapshot object after the thing had
      already been freed.
      
      2) We were freeing the transaction manually with wanton disregard for it's
      use_count reference counter.  To fix this I cleaned up the transaction freeing
      loop to either wait for the transaction commit to finish if it was in the middle
      of that (since it will be cleaned and freed up there) or to do the cleanup
      oursevles.
      
      I also moved the global "kill all things dirty everywhere" stuff outside of the
      transaction cleanup loop since that only needs to be done once.  With this patch
      I'm no longer seeing slab corruption because of use after frees.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      724e2315
    • J
      Btrfs: remove all BUG_ON()'s from commit_cowonly_roots · c16ce190
      Josef Bacik 提交于
      Noticed this when forcing errors to happen during delayed ref running.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      c16ce190
    • J
      Btrfs: cleanup transaction on abort · 4e121c06
      Josef Bacik 提交于
      If we abort not during a transaction commit we won't clean up anything until we
      unmount.  Unfortunately if we abort in the middle of writing out an ordered
      extent we won't clean it up and if somebody is waiting on that ordered extent
      they will wait forever.  To fix this just make the transaction kthread call the
      cleanup transaction stuff if it notices theres an error, and make
      btrfs_end_transaction wake up the transaction kthread if there is an error.
      Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      4e121c06
    • J
      Btrfs: reset intwrite on transaction abort · e0228285
      Josef Bacik 提交于
      If we abort a transaction in the middle of a commit we weren't undoing the
      intwrite locking.  This patch fixes that problem.
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      e0228285
  4. 05 10月, 2013 1 次提交
    • J
      Btrfs: fix transid verify errors when recovering log tree · 60e7cd3a
      Josef Bacik 提交于
      If we crash with a log, remount and recover that log, and then crash before we
      can commit another transaction we will get transid verify errors on the next
      mount.  This is because we were not zero'ing out the log when we committed the
      transaction after recovery.  This is ok as long as we commit another transaction
      at some point in the future, but if you abort or something else goes wrong you
      can end up in this weird state because the recovery stuff says that the tree log
      should have a generation+1 of the super generation, which won't be the case of
      the transaction that was started for recovery.  Fix this by removing the check
      and _always_ zero out the log portion of the super when we commit a transaction.
      This fixes the transid verify issues I was seeing with my force errors tests.
      Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      60e7cd3a
  5. 21 9月, 2013 1 次提交
  6. 01 9月, 2013 7 次提交
  7. 10 8月, 2013 1 次提交
  8. 02 7月, 2013 1 次提交
    • J
      Btrfs: make the chunk allocator completely tree lockless · 6df9a95e
      Josef Bacik 提交于
      When adjusting the enospc rules for relocation I ran into a deadlock because we
      were relocating the only system chunk and that forced us to try and allocate a
      new system chunk while holding locks in the chunk tree, which caused us to
      deadlock.  To fix this I've moved all of the dev extent addition and chunk
      addition out to the delayed chunk completion stuff.  We still keep the in-memory
      stuff which makes sure everything is consistent.
      
      One change I had to make was to search the commit root of the device tree to
      find a free dev extent, and hold onto any chunk em's that we allocated in that
      transaction so we do not allocate the same dev extent twice.  This has the side
      effect of fixing a bug with balance that has been there ever since balance
      existed.  Basically you can free a block group and it's dev extent and then
      immediately allocate that dev extent for a new block group and write stuff to
      that dev extent, all within the same transaction.  So if you happen to crash
      during a balance you could come back to a completely broken file system.  This
      patch should keep these sort of things from happening in the future since we
      won't be able to allocate free'd dev extents until after the transaction
      commits.  This has passed all of the xfstests and my super annoying stress test
      followed by a balance.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      6df9a95e
  9. 01 7月, 2013 3 次提交
    • W
      Btrfs: fix the comment typo for btrfs_attach_transaction_barrier · 90b6d283
      Wang Sheng-Hui 提交于
      The comment is for btrfs_attach_transaction_barrier, not for
      btrfs_attach_transaction. Fix the typo.
      Signed-off-by: NWang Sheng-Hui <shhuiw@gmail.com>
      Acked-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      90b6d283
    • J
      Btrfs: fix transaction throttling for delayed refs · 1be41b78
      Josef Bacik 提交于
      Dave has this fs_mark script that can make btrfs abort with sufficient amount of
      ram.  This is because with more ram we can keep more dirty metadata in cache
      which in a round about way makes for many more pending delayed refs.  What
      happens is we end up not throttling the transaction enough so when we go to
      commit the transaction when we've completely filled the file system we'll
      abort() because we use all of the space in the global reserve and we still have
      delayed refs to run.  To fix this we need to make the delayed ref flushing and
      the transaction throttling dependant upon the number of delayed refs that we
      have instead of how much reserved space is left in the global reserve.  With
      this patch we not only stop aborting transactions but we also get a smoother run
      speed with fs_mark and it makes us about 10% faster.  Thanks,
      Reported-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      1be41b78
    • J
      Btrfs: stop waiting on current trans if we aborted · 501407aa
      Josef Bacik 提交于
      I hit a hang when run_delayed_refs returned an error in the beginning of
      btrfs_commit_transaction.  If we decide we need to commit the transaction in
      btrfs_end_transaction we'll set BLOCKED and start to commit, but if we get an
      error this early on we'll just exit without committing.  This is fine, except
      that anybody else who tried to start a transaction will sit in
      wait_current_trans() since we're set to BLOCKED and we never set it to something
      else and woke people up.  To fix this we want to check for trans->aborted
      everywhere we wait for the transaction state to change, and make
      btrfs_abort_transaction() wake up any waiters there may be.  All the callers
      will notice that the transaction has aborted and exit out properly.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      501407aa
  10. 14 6月, 2013 8 次提交
    • M
      Btrfs: merge pending IO for tree log write back · c6adc9cc
      Miao Xie 提交于
      Before applying this patch, we flushed the log tree of the fs/file
      tree firstly, and then flushed the log root tree. It is ineffective,
      especially on the hard disk. This patch improved this problem by wrapping
      the above two flushes by the same blk_plug.
      
      By test, the performance of the sync write went up ~60%(2.9MB/s -> 4.6MB/s)
      on my scsi disk whose disk buffer was enabled.
      
      Test step:
       # mkfs.btrfs -f -m single <disk>
       # mount <disk> <mnt>
       # dd if=/dev/zero of=<mnt>/file0 bs=32K count=1024 oflag=sync
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      c6adc9cc
    • M
      Btrfs: make the state of the transaction more readable · 4a9d8bde
      Miao Xie 提交于
      We used 3 variants to track the state of the transaction, it was complex
      and wasted the memory space. Besides that, it was hard to understand that
      which types of the transaction handles should be blocked in each transaction
      state, so the developers often made mistakes.
      
      This patch improved the above problem. In this patch, we define 6 states
      for the transaction,
        enum btrfs_trans_state {
      	TRANS_STATE_RUNNING		= 0,
      	TRANS_STATE_BLOCKED		= 1,
      	TRANS_STATE_COMMIT_START	= 2,
      	TRANS_STATE_COMMIT_DOING	= 3,
      	TRANS_STATE_UNBLOCKED		= 4,
      	TRANS_STATE_COMPLETED		= 5,
      	TRANS_STATE_MAX			= 6,
        }
      and just use 1 variant to track those state.
      
      In order to make the blocked handle types for each state more clear,
      we introduce a array:
        unsigned int btrfs_blocked_trans_types[TRANS_STATE_MAX] = {
      	[TRANS_STATE_RUNNING]		= 0U,
      	[TRANS_STATE_BLOCKED]		= (__TRANS_USERSPACE |
      					   __TRANS_START),
      	[TRANS_STATE_COMMIT_START]	= (__TRANS_USERSPACE |
      					   __TRANS_START |
      					   __TRANS_ATTACH),
      	[TRANS_STATE_COMMIT_DOING]	= (__TRANS_USERSPACE |
      					   __TRANS_START |
      					   __TRANS_ATTACH |
      					   __TRANS_JOIN),
      	[TRANS_STATE_UNBLOCKED]		= (__TRANS_USERSPACE |
      					   __TRANS_START |
      					   __TRANS_ATTACH |
      					   __TRANS_JOIN |
      					   __TRANS_JOIN_NOLOCK),
      	[TRANS_STATE_COMPLETED]		= (__TRANS_USERSPACE |
      					   __TRANS_START |
      					   __TRANS_ATTACH |
      					   __TRANS_JOIN |
      					   __TRANS_JOIN_NOLOCK),
        }
      it is very intuitionistic.
      
      Besides that, because we remove ->in_commit in transaction structure, so
      the lock ->commit_lock which was used to protect it is unnecessary, remove
      ->commit_lock.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      4a9d8bde
    • M
      Btrfs: remove the time check in btrfs_commit_transaction() · 581227d0
      Miao Xie 提交于
      We checked the commit time to avoid committing the transaction
      frequently, but it is unnecessary because:
      - It made the transaction commit spend more time, and delayed the
        operation of the external writers(TRANS_START/TRANS_USERSPACE).
      - Except the space that we have to commit transaction, such as
        snapshot creation, btrfs doesn't commit the transaction on its
        own initiative.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      581227d0
    • M
      Btrfs: remove unnecessary varient ->num_joined in btrfs_transaction structure · 3f1e3fa6
      Miao Xie 提交于
      We used ->num_joined track if there were some writers which join the current
      transaction when the committer was sleeping. If some writers joined the current
      transaction, we has to continue the while loop to do some necessary stuff, such
      as flush the ordered operations. But it is unnecessary because we will do it
      after the while loop.
      
      Besides that, tracking ->num_joined would make the committer drop into the while
      loop when there are lots of internal writers(TRANS_JOIN).
      
      So we remove ->num_joined and don't track if there are some writers which join
      the current transaction when the committer is sleeping.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      3f1e3fa6
    • M
      Btrfs: don't flush the delalloc inodes in the while loop if flushoncommit is set · 82436617
      Miao Xie 提交于
      It is unnecessary to flush the delalloc inodes again and again because
      we don't care the dirty pages which are introduced after the flush, and
      they will be flush in the transaction commit.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      82436617
    • M
      Btrfs: don't wait for all the writers circularly during the transaction commit · 0860adfd
      Miao Xie 提交于
      btrfs_commit_transaction has the following loop before we commit the
      transaction.
      
      do {
          // attempt to do some useful stuff and/or sleep
      } while (atomic_read(&cur_trans->num_writers) > 1 ||
      	 (should_grow && cur_trans->num_joined != joined));
      
      This is used to prevent from the TRANS_START to get in the way of a
      committing transaction. But it does not prevent from TRANS_JOIN, that
      is we would do this loop for a long time if some writers JOIN the
      current transaction endlessly.
      
      Because we need join the current transaction to do some useful stuff,
      we can not block TRANS_JOIN here. So we introduce a external writer
      counter, which is used to count the TRANS_USERSPACE/TRANS_START writers.
      If the external writer counter is zero, we can break the above loop.
      
      In order to make the code more clear, we don't use enum variant
      to define the type of the transaction handle, use bitmask instead.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      0860adfd
    • M
      Btrfs: remove the code for the impossible case in cleanup_transaction() · 25d8c284
      Miao Xie 提交于
      If the transaction is removed from the transaction list, it means the
      transaction has been committed successfully. So it is impossible to
      call cleanup_transaction(), otherwise there is something wrong with
      the code logic. Thus, we use BUG_ON() instead of the original handle.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      25d8c284
    • M
      Btrfs: just flush the delalloc inodes in the source tree before snapshot creation · 6a03843d
      Miao Xie 提交于
      Before applying this patch, we need flush all the delalloc inodes in
      the fs when we want to create a snapshot, it wastes time, and make
      the transaction commit be blocked for a long time. It means some other
      user operation would also be blocked for a long time.
      
      This patch improves this problem, we just flush the delalloc inodes that
      in the source trees before snapshot creation, so the transaction commit
      will complete quickly.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      6a03843d