1. 06 12月, 2016 6 次提交
  2. 28 9月, 2016 1 次提交
  3. 27 9月, 2016 4 次提交
  4. 26 9月, 2016 1 次提交
    • J
      Btrfs: add a flags field to btrfs_fs_info · afcdd129
      Josef Bacik 提交于
      We have a lot of random ints in btrfs_fs_info that can be put into flags.  This
      is mostly equivalent with the exception of how we deal with quota going on or
      off, now instead we set a flag when we are turning it on or off and deal with
      that appropriately, rather than just having a pending state that the current
      quota_enabled gets set to.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      afcdd129
  5. 25 8月, 2016 1 次提交
    • W
      btrfs: fix fsfreeze hang caused by delayed iputs deal · 9e7cc91a
      Wang Xiaoguang 提交于
      When running fstests generic/068, sometimes we got below deadlock:
        xfs_io          D ffff8800331dbb20     0  6697   6693 0x00000080
        ffff8800331dbb20 ffff88007acfc140 ffff880034d895c0 ffff8800331dc000
        ffff880032d243e8 fffffffeffffffff ffff880032d24400 0000000000000001
        ffff8800331dbb38 ffffffff816a9045 ffff880034d895c0 ffff8800331dbba8
        Call Trace:
        [<ffffffff816a9045>] schedule+0x35/0x80
        [<ffffffff816abab2>] rwsem_down_read_failed+0xf2/0x140
        [<ffffffff8118f5e1>] ? __filemap_fdatawrite_range+0xd1/0x100
        [<ffffffff8134f978>] call_rwsem_down_read_failed+0x18/0x30
        [<ffffffffa06631fc>] ? btrfs_alloc_block_rsv+0x2c/0xb0 [btrfs]
        [<ffffffff810d32b5>] percpu_down_read+0x35/0x50
        [<ffffffff81217dfc>] __sb_start_write+0x2c/0x40
        [<ffffffffa067f5d5>] start_transaction+0x2a5/0x4d0 [btrfs]
        [<ffffffffa067f857>] btrfs_join_transaction+0x17/0x20 [btrfs]
        [<ffffffffa068ba34>] btrfs_evict_inode+0x3c4/0x5d0 [btrfs]
        [<ffffffff81230a1a>] evict+0xba/0x1a0
        [<ffffffff812316b6>] iput+0x196/0x200
        [<ffffffffa06851d0>] btrfs_run_delayed_iputs+0x70/0xc0 [btrfs]
        [<ffffffffa067f1d8>] btrfs_commit_transaction+0x928/0xa80 [btrfs]
        [<ffffffffa0646df0>] btrfs_freeze+0x30/0x40 [btrfs]
        [<ffffffff81218040>] freeze_super+0xf0/0x190
        [<ffffffff81229275>] do_vfs_ioctl+0x4a5/0x5c0
        [<ffffffff81003176>] ? do_audit_syscall_entry+0x66/0x70
        [<ffffffff810038cf>] ? syscall_trace_enter_phase1+0x11f/0x140
        [<ffffffff81229409>] SyS_ioctl+0x79/0x90
        [<ffffffff81003c12>] do_syscall_64+0x62/0x110
        [<ffffffff816acbe1>] entry_SYSCALL64_slow_path+0x25/0x25
      
      >From this warning, freeze_super() already holds SB_FREEZE_FS, but
      btrfs_freeze() will call btrfs_commit_transaction() again, if
      btrfs_commit_transaction() finds that it has delayed iputs to handle,
      it'll start_transaction(), which will try to get SB_FREEZE_FS lock
      again, then deadlock occurs.
      
      The root cause is that in btrfs, sync_filesystem(sb) does not make
      sure all metadata is updated. There still maybe some codes adding
      delayed iputs, see below sample race window:
      
               CPU1                                  |         CPU2
      |-> freeze_super()                             |
          |-> sync_filesystem(sb);                   |
          |                                          |-> cleaner_kthread()
          |                                          |   |-> btrfs_delete_unused_bgs()
          |                                          |       |-> btrfs_remove_chunk()
          |                                          |           |-> btrfs_remove_block_group()
          |                                          |               |-> btrfs_add_delayed_iput()
          |                                          |
          |-> sb->s_writers.frozen = SB_FREEZE_FS;   |
          |-> sb_wait_write(sb, SB_FREEZE_FS);       |
          |   acquire SB_FREEZE_FS lock.             |
          |                                          |
          |-> btrfs_freeze()                         |
              |-> btrfs_commit_transaction()         |
                  |-> btrfs_run_delayed_iputs()      |
                  |   will handle delayed iputs,     |
                  |   that means start_transaction() |
                  |   will be called, which will try |
                  |   to get SB_FREEZE_FS lock.      |
      
      To fix this issue, introduce a "int fs_frozen" to record internally whether
      fs has been frozen. If fs has been frozen, we can not handle delayed iputs.
      Signed-off-by: NWang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      [ add comment to btrfs_freeze ]
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      9e7cc91a
  6. 26 7月, 2016 3 次提交
  7. 23 6月, 2016 1 次提交
    • J
      Btrfs: track transid for delayed ref flushing · 31b9655f
      Josef Bacik 提交于
      Using the offwakecputime bpf script I noticed most of our time was spent waiting
      on the delayed ref throttling.  This is what is supposed to happen, but
      sometimes the transaction can commit and then we're waiting for throttling that
      doesn't matter anymore.  So change this stuff to be a little smarter by tracking
      the transid we were in when we initiated the throttling.  If the transaction we
      get is different then we can just bail out.  This resulted in a 50% speedup in
      my fs_mark test, and reduced the amount of time spent throttling by 60 seconds
      over the entire run (which is about 30 minutes).  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      31b9655f
  8. 18 6月, 2016 2 次提交
  9. 13 5月, 2016 1 次提交
  10. 12 5月, 2016 2 次提交
    • D
      btrfs: build fixup for qgroup_account_snapshot · 2c1984f2
      David Sterba 提交于
      The macro btrfs_std_error got renamed to btrfs_handle_fs_error in an
      independent branch for the same merge target (4.7). To make the code
      compilable for bisectability reasons, add a temporary stub.
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      2c1984f2
    • Q
      btrfs: qgroup: Fix qgroup accounting when creating snapshot · 6426c7ad
      Qu Wenruo 提交于
      Current btrfs qgroup design implies a requirement that after calling
      btrfs_qgroup_account_extents() there must be a commit root switch.
      
      Normally this is OK, as btrfs_qgroup_accounting_extents() is only called
      inside btrfs_commit_transaction() just be commit_cowonly_roots().
      
      However there is a exception at create_pending_snapshot(), which will
      call btrfs_qgroup_account_extents() but no any commit root switch.
      
      In case of creating a snapshot whose parent root is itself (create a
      snapshot of fs tree), it will corrupt qgroup by the following trace:
      (skipped unrelated data)
      ======
      btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, nr_old_roots = 0, nr_new_roots = 1
      qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer = 0, excl = 0
      qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer = 16384, excl = 16384
      btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, nr_old_roots = 0, nr_new_roots = 0
      ======
      
      The problem here is in first qgroup_account_extent(), the
      nr_new_roots of the extent is 1, which means its reference got
      increased, and qgroup increased its rfer and excl.
      
      But at second qgroup_account_extent(), its reference got decreased, but
      between these two qgroup_account_extent(), there is no switch roots.
      This leads to the same nr_old_roots, and this extent just got ignored by
      qgroup, which means this extent is wrongly accounted.
      
      Fix it by call commit_cowonly_roots() after qgroup_account_extent() in
      create_pending_snapshot(), with needed preparation.
      
      Mark: I added a check at the top of qgroup_account_snapshot() to skip this
      code if qgroups are turned off. xfstest btrfs/122 exposes this problem.
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Reviewed-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.de>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      6426c7ad
  11. 29 4月, 2016 1 次提交
  12. 28 4月, 2016 1 次提交
  13. 18 2月, 2016 2 次提交
  14. 07 1月, 2016 3 次提交
  15. 17 12月, 2015 1 次提交
    • F
      Btrfs: fix memory leaks after transaction is aborted · 7785a663
      Filipe Manana 提交于
      When a transaction is aborted, or its commit fails before writing the new
      superblock and calling btrfs_finish_extent_commit(), we leak reference
      counts on the block groups attached to the transaction's delete_bgs list,
      because btrfs_finish_extent_commit() is never called for those two cases.
      Fix this by dropping their references at btrfs_put_transaction(), which
      is called when transactions are aborted (by making the transaction kthread
      commit the transaction) or if their commits fail.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      7785a663
  16. 10 12月, 2015 1 次提交
    • F
      Btrfs: fix unprotected list move from unused_bgs to deleted_bgs list · 348a0013
      Filipe Manana 提交于
      As of my previous change titled "Btrfs: fix scrub preventing unused block
      groups from being deleted", the following warning at
      extent-tree.c:btrfs_delete_unused_bgs() can be hit when we mount the a
      filesysten with "-o discard":
      
       10263  void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
       10264  {
       (...)
       10405                  if (trimming) {
       10406                          WARN_ON(!list_empty(&block_group->bg_list));
       10407                          spin_lock(&trans->transaction->deleted_bgs_lock);
       10408                          list_move(&block_group->bg_list,
       10409                                    &trans->transaction->deleted_bgs);
       10410                          spin_unlock(&trans->transaction->deleted_bgs_lock);
       10411                          btrfs_get_block_group(block_group);
       10412                  }
       (...)
      
      This happens because scrub can now add back the block group to the list of
      unused block groups (fs_info->unused_bgs). This is dangerous because we
      are moving the block group from the unused block groups list to the list
      of deleted block groups without holding the lock that protects the source
      list (fs_info->unused_bgs_lock).
      
      The following diagram illustrates how this happens:
      
                  CPU 1                                     CPU 2
      
       cleaner_kthread()
         btrfs_delete_unused_bgs()
      
           sees bg X in list
            fs_info->unused_bgs
      
           deletes bg X from list
            fs_info->unused_bgs
      
                                                  scrub_enumerate_chunks()
      
                                                    searches device tree using
                                                    its commit root
      
                                                    finds device extent for
                                                    block group X
      
                                                    gets block group X from the tree
                                                    fs_info->block_group_cache_tree
                                                    (via btrfs_lookup_block_group())
      
                                                    sets bg X to RO (again)
      
                                                    scrub_chunk(bg X)
      
                                                    sets bg X back to RW mode
      
                                                    adds bg X to the list
                                                    fs_info->unused_bgs again,
                                                    since it's still unused and
                                                    currently not in that list
      
           sets bg X to RO mode
      
           btrfs_remove_chunk(bg X)
      
           --> discard is enabled and bg X
               is in the fs_info->unused_bgs
               list again so the warning is
               triggered
           --> we move it from that list into
               the transaction's delete_bgs
               list, but we can have another
               task currently manipulating
               the first list (fs_info->unused_bgs)
      
      Fix this by using the same lock (fs_info->unused_bgs_lock) to protect both
      the list of unused block groups and the list of deleted block groups. This
      makes it safe and there's not much worry for more lock contention, as this
      lock is seldom used and only the cleaner kthread adds elements to the list
      of deleted block groups. The warning goes away too, as this was previously
      an impossible case (and would have been better a BUG_ON/ASSERT) but it's
      not impossible anymore.
      Reproduced with fstest btrfs/073 (using MOUNT_OPTIONS="-o discard").
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      348a0013
  17. 25 11月, 2015 1 次提交
    • F
      Btrfs: use global reserve when deleting unused block group after ENOSPC · 8eab77ff
      Filipe Manana 提交于
      It's possible to reach a state where the cleaner kthread isn't able to
      start a transaction to delete an unused block group due to lack of enough
      free metadata space and due to lack of unallocated device space to allocate
      a new metadata block group as well. If this happens try to use space from
      the global block group reserve just like we do for unlink operations, so
      that we don't reach a permanent state where starting a transaction for
      filesystem operations (file creation, renames, etc) keeps failing with
      -ENOSPC. Such an unfortunate state was observed on a machine where over
      a dozen unused data block groups existed and the cleaner kthread was
      failing to delete them due to ENOSPC error when attempting to start a
      transaction, and even running balance with a -dusage=0 filter failed with
      ENOSPC as well. Also unmounting and mounting again the filesystem didn't
      help. Allowing the cleaner kthread to use the global block reserve to
      delete the unused data block groups fixed the problem.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      8eab77ff
  18. 22 10月, 2015 6 次提交
  19. 11 10月, 2015 2 次提交