1. 12 11月, 2013 10 次提交
    • J
      Btrfs: stop all workers after we free block groups · 96192499
      Josef Bacik 提交于
      Stefan was hitting a panic in the async worker stuff because we had outstanding
      read bios while we were stopping the worker threads.  You could reproduce this
      easily if you mount -o nospace_cache and ran generic/273.  This is because the
      caching thread stuff is still going and we were stopping all the worker threads.
      We need to stop the workers after this work is done, and the free block groups
      code will wait for all the caching threads to stop first so we don't run into
      this problem.  With this patch we no longer panic.  Thanks,
      
      Cc: stable@vger.kernel.org
      Reported-by: NStefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      96192499
    • J
      Btrfs: free up block groups after everything · 2b1360da
      Josef Bacik 提交于
      If we abort a transaction we will do the tree log cleanup at unmount, but this
      happens after we free up the block groups.  This makes all the leak detection
      warnings go off because we think we've leaked space but in reality we just
      haven't cleaned it up yet.  So instead do the block group cleanup stuff after
      free'ing the fs roots so we don't get these warnings.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      2b1360da
    • J
      Btrfs: do not free the dirty bytes from the trans block rsv on cleanup · eb58bb37
      Josef Bacik 提交于
      The transactions should be cleaning up their reservations on failure, this just
      causes us to have warnings on unmount because we go negative by free'ing
      reservations that have already been free'ed.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      eb58bb37
    • F
      Btrfs: improve inode hash function/inode lookup · 778ba82b
      Filipe David Borba Manana 提交于
      Currently the hash value used for adding an inode to the VFS's inode
      hash table consists of the plain inode number, which is a 64 bits
      integer. This results in hash table buckets (hlist_head lists) with
      too many elements for at least 2 important scenarios:
      
      1) When we have many subvolumes. Each subvolume has its own btree
         where its files and directories are added to, and each has its
         own objectid (inode number) namespace. This means that if we have
         N subvolumes, and all have inode number X associated to a file or
         directory, the corresponding inodes all map to the same hash table
         entry, resulting in a bucket (hlist_head list) with N elements;
      
      2) On 32 bits machines. Th VFS hash values are unsigned longs, which
         are 32 bits wide on 32 bits machines, and the inode (objectid)
         numbers are 64 bits unsigned integers. We simply cast the inode
         numbers to hash values, which means that for all inodes with the
         same 32 bits lower half, the same hash bucket is used for all of
         them. For example, all inodes with a number (objectid) between
         0x0000_0000_ffff_ffff and 0xffff_ffff_ffff_ffff will end up in
         the same hash table bucket.
      
      This change ensures the inode's hash value depends both on the
      objectid (inode number) and its subvolume's (btree root) objectid.
      For 32 bits machines, this change gives better entropy by making
      the hash value depend on both the upper and lower 32 bits of the
      64 bits hash previously computed.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      778ba82b
    • R
      btrfs: remove unused parameter from btrfs_header_fsid · 0a4e5586
      Ross Kirk 提交于
      Remove unused parameter, 'eb'. Unused since introduction in
      5f39d397
      
      Updated to be rebased against current upstream and correct diff supplied this time!
      Signed-off-by: NRoss Kirk <ross.kirk@gmail.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      0a4e5586
    • J
      Btrfs: fix two use-after-free bugs with transaction cleanup · 724e2315
      Josef Bacik 提交于
      I was noticing the slab redzone stuff going off every once and a while during
      transaction aborts.  This was caused by two things
      
      1) We would walk the pending snapshots and set their error to -ECANCELED.  We
      don't need to do this, the snapshot stuff waits for a transaction commit and if
      there is a problem we just free our pending snapshot object and exit.  Doing
      this was causing us to touch the pending snapshot object after the thing had
      already been freed.
      
      2) We were freeing the transaction manually with wanton disregard for it's
      use_count reference counter.  To fix this I cleaned up the transaction freeing
      loop to either wait for the transaction commit to finish if it was in the middle
      of that (since it will be cleaned and freed up there) or to do the cleanup
      oursevles.
      
      I also moved the global "kill all things dirty everywhere" stuff outside of the
      transaction cleanup loop since that only needs to be done once.  With this patch
      I'm no longer seeing slab corruption because of use after frees.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      724e2315
    • J
      Btrfs: don't delete ordered roots from list during cleanup · 1de2cfde
      Josef Bacik 提交于
      During transaction cleanup after an abort we are just removing roots from the
      ordered roots list which is incorrect.  We have a BUG_ON() to make sure that the
      root is still part of the ordered roots list when we put our ordered extent
      which we were tripping in this case.  So do like we do everywhere else and just
      move it to the tail of the ordered roots list and allow the normal cleanup to
      take care of stuff.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      1de2cfde
    • J
      Btrfs: cleanup transaction on abort · 4e121c06
      Josef Bacik 提交于
      If we abort not during a transaction commit we won't clean up anything until we
      unmount.  Unfortunately if we abort in the middle of writing out an ordered
      extent we won't clean it up and if somebody is waiting on that ordered extent
      they will wait forever.  To fix this just make the transaction kthread call the
      cleanup transaction stuff if it notices theres an error, and make
      btrfs_end_transaction wake up the transaction kthread if there is an error.
      Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      4e121c06
    • J
      Btrfs: add a sanity test for btrfs_split_item · 06ea65a3
      Josef Bacik 提交于
      While looking at somebodys corruption I became completely convinced that
      btrfs_split_item was broken, so I wrote this test to verify that it was working
      as it was supposed to.  Thankfully it appears to be working as intended, so just
      add this test to make sure nobody breaks it in the future.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      06ea65a3
    • S
      Btrfs: eliminate the exceptional root_tree refs=0 · 69e9c6c6
      Stefan Behrens 提交于
      The fact that btrfs_root_refs() returned 0 for the tree_root caused
      bugs in the past, therefore it is set to 1 with this patch and
      (hopefully) all affected code is adapted to this change.
      
      I verified this change by temporarily adding WARN_ON() checks
      everywhere where btrfs_root_refs() is used, checking whether the
      logic of the code is changed by btrfs_root_refs() returning 1
      instead of 0 for root->root_key.objectid == BTRFS_ROOT_TREE_OBJECTID.
      With these added checks, I ran the xfstests './check -g auto'.
      
      The two roots chunk_root and log_root_tree that are only referenced
      by the superblock and the log_roots below the log_root_tree still
      have btrfs_root_refs() == 0, only the tree_root is changed.
      Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      69e9c6c6
  2. 11 10月, 2013 1 次提交
    • M
      Btrfs: fix oops caused by the space balance and dead roots · c00869f1
      Miao Xie 提交于
      When doing space balance and subvolume destroy at the same time, we met
      the following oops:
      
      kernel BUG at fs/btrfs/relocation.c:2247!
      RIP: 0010: [<ffffffffa04cec16>] prepare_to_merge+0x154/0x1f0 [btrfs]
      Call Trace:
       [<ffffffffa04b5ab7>] relocate_block_group+0x466/0x4e6 [btrfs]
       [<ffffffffa04b5c7a>] btrfs_relocate_block_group+0x143/0x275 [btrfs]
       [<ffffffffa0495c56>] btrfs_relocate_chunk.isra.27+0x5c/0x5a2 [btrfs]
       [<ffffffffa0459871>] ? btrfs_item_key_to_cpu+0x15/0x31 [btrfs]
       [<ffffffffa048b46a>] ? btrfs_get_token_64+0x7e/0xcd [btrfs]
       [<ffffffffa04a3467>] ? btrfs_tree_read_unlock_blocking+0xb2/0xb7 [btrfs]
       [<ffffffffa049907d>] btrfs_balance+0x9c7/0xb6f [btrfs]
       [<ffffffffa049ef84>] btrfs_ioctl_balance+0x234/0x2ac [btrfs]
       [<ffffffffa04a1e8e>] btrfs_ioctl+0xd87/0x1ef9 [btrfs]
       [<ffffffff81122f53>] ? path_openat+0x234/0x4db
       [<ffffffff813c3b78>] ? __do_page_fault+0x31d/0x391
       [<ffffffff810f8ab6>] ? vma_link+0x74/0x94
       [<ffffffff811250f5>] vfs_ioctl+0x1d/0x39
       [<ffffffff811258c8>] do_vfs_ioctl+0x32d/0x3e2
       [<ffffffff811259d4>] SyS_ioctl+0x57/0x83
       [<ffffffff813c3bfa>] ? do_page_fault+0xe/0x10
       [<ffffffff813c73c2>] system_call_fastpath+0x16/0x1b
      
      It is because we returned the error number if the reference of the root was 0
      when doing space relocation. It was not right here, because though the root
      was dead(refs == 0), but the space it held still need be relocated, or we
      could not remove the block group. So in this case, we should return the root
      no matter it is dead or not.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      c00869f1
  3. 21 9月, 2013 2 次提交
  4. 01 9月, 2013 20 次提交
  5. 14 6月, 2013 7 次提交
    • J
      Btrfs: do not pin while under spin lock · e78417d1
      Josef Bacik 提交于
      When testing a corrupted fs I noticed I was getting sleep while atomic errors
      when the transaction aborted.  This is because btrfs_pin_extent may need to
      allocate memory and we are calling this under the spin lock.  Fix this by moving
      it out and doing the pin after dropping the spin lock but before dropping the
      mutex, the same way it works when delayed refs run normally.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      e78417d1
    • J
      Btrfs: fix qgroup rescan resume on mount · b382a324
      Jan Schmidt 提交于
      When called during mount, we cannot start the rescan worker thread until
      open_ctree is done. This commit restuctures the qgroup rescan internals to
      enable a clean deferral of the rescan resume operation.
      
      First of all, the struct qgroup_rescan is removed, saving us a malloc and
      some initialization synchronizations problems. Its only element (the worker
      struct) now lives within fs_info just as the rest of the rescan code.
      
      Then setting up a rescan worker is split into several reusable stages.
      Currently we have three different rescan startup scenarios:
      	(A) rescan ioctl
      	(B) rescan resume by mount
      	(C) rescan by quota enable
      
      Each case needs its own combination of the four following steps:
      	(1) set the progress [A, C: zero; B: state of umount]
      	(2) commit the transaction [A]
      	(3) set the counters [A, C: zero; B: state of umount]
      	(4) start worker [A, B, C]
      
      qgroup_rescan_init does step (1). There's no extra function added to commit
      a transaction, we've got that already. qgroup_rescan_zero_tracking does
      step (3). Step (4) is nothing more than a call to the generic
      btrfs_queue_worker.
      
      We also get rid of a double check for the rescan progress during
      btrfs_qgroup_account_ref, which is no longer required due to having step 2
      from the list above.
      
      As a side effect, this commit prepares to move the rescan start code from
      btrfs_run_qgroups (which is run during commit) to a less time critical
      section.
      Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      b382a324
    • M
      Btrfs: make the state of the transaction more readable · 4a9d8bde
      Miao Xie 提交于
      We used 3 variants to track the state of the transaction, it was complex
      and wasted the memory space. Besides that, it was hard to understand that
      which types of the transaction handles should be blocked in each transaction
      state, so the developers often made mistakes.
      
      This patch improved the above problem. In this patch, we define 6 states
      for the transaction,
        enum btrfs_trans_state {
      	TRANS_STATE_RUNNING		= 0,
      	TRANS_STATE_BLOCKED		= 1,
      	TRANS_STATE_COMMIT_START	= 2,
      	TRANS_STATE_COMMIT_DOING	= 3,
      	TRANS_STATE_UNBLOCKED		= 4,
      	TRANS_STATE_COMPLETED		= 5,
      	TRANS_STATE_MAX			= 6,
        }
      and just use 1 variant to track those state.
      
      In order to make the blocked handle types for each state more clear,
      we introduce a array:
        unsigned int btrfs_blocked_trans_types[TRANS_STATE_MAX] = {
      	[TRANS_STATE_RUNNING]		= 0U,
      	[TRANS_STATE_BLOCKED]		= (__TRANS_USERSPACE |
      					   __TRANS_START),
      	[TRANS_STATE_COMMIT_START]	= (__TRANS_USERSPACE |
      					   __TRANS_START |
      					   __TRANS_ATTACH),
      	[TRANS_STATE_COMMIT_DOING]	= (__TRANS_USERSPACE |
      					   __TRANS_START |
      					   __TRANS_ATTACH |
      					   __TRANS_JOIN),
      	[TRANS_STATE_UNBLOCKED]		= (__TRANS_USERSPACE |
      					   __TRANS_START |
      					   __TRANS_ATTACH |
      					   __TRANS_JOIN |
      					   __TRANS_JOIN_NOLOCK),
      	[TRANS_STATE_COMPLETED]		= (__TRANS_USERSPACE |
      					   __TRANS_START |
      					   __TRANS_ATTACH |
      					   __TRANS_JOIN |
      					   __TRANS_JOIN_NOLOCK),
        }
      it is very intuitionistic.
      
      Besides that, because we remove ->in_commit in transaction structure, so
      the lock ->commit_lock which was used to protect it is unnecessary, remove
      ->commit_lock.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      4a9d8bde
    • M
      Btrfs: cleanup unnecessary assignment when cleaning up all the residual transaction · ac673879
      Miao Xie 提交于
      When we umount a fs with serious errors, we will invoke btrfs_cleanup_transactions()
      to clean up the residual transaction. At this time, It is impossible to start a new
      transaction, so we needn't assign trans_no_join to 1, and also needn't clear running
      transaction every time we destroy a residual transaction.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      ac673879
    • M
      Btrfs: introduce per-subvolume ordered extent list · 199c2a9c
      Miao Xie 提交于
      The reason we introduce per-subvolume ordered extent list is the same
      as the per-subvolume delalloc inode list.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      199c2a9c
    • M
      Btrfs: introduce per-subvolume delalloc inode list · eb73c1b7
      Miao Xie 提交于
      When we create a snapshot, we need flush all delalloc inodes in the
      fs, just flushing the inodes in the source tree is OK. So we introduce
      per-subvolume delalloc inode list.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      eb73c1b7
    • M
      Btrfs: introduce grab/put functions for the root of the fs/file tree · b0feb9d9
      Miao Xie 提交于
      The grab/put funtions will be used in the next patch, which need grab
      the root object and ensure it is not freed. We use reference counter
      instead of the srcu lock is to aovid blocking the memory reclaim task,
      which invokes synchronize_srcu().
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      b0feb9d9