1. 06 Aug, 2018: 21 commits
  2. 28 Jun, 2018: 2 commits
  3. 30 May, 2018: 1 commit
    • btrfs: qgroup: show more meaningful qgroup_rescan_init error message · 9593bf49
      Qu Wenruo committed
      The error message from qgroup_rescan_init() mostly looks like:
      
        BTRFS info (device nvme0n1p1): qgroup_rescan_init failed with -115
      
      This is far from meaningful and sometimes confusing: the -EINPROGRESS
      above is mostly harmless (apart from the init race), but a return value
      of -EINVAL can indicate a real problem.

      Change it to more meaningful messages, such as:
      
        BTRFS info (device nvme0n1p1): qgroup rescan is already in progress
      
      And
      
        BTRFS err(device nvme0n1p1): qgroup rescan init failed, qgroup is not enabled
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      [ update the messages and level ]
      Signed-off-by: David Sterba <dsterba@suse.com>
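The idea can be sketched in userspace. The function maps the return value to a descriptive reason instead of printing the raw errno. On Linux, EINPROGRESS is 115, which is where the `-115` in the old message comes from. The function name `rescan_init_errmsg()` is hypothetical, and the message strings are copied from the examples above. Pairing -EINVAL with "qgroup is not enabled" is an assumption suggested by those messages, not the kernel's code.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: describe why qgroup rescan init failed instead of
 * logging "qgroup_rescan_init failed with -115".
 */
const char *rescan_init_errmsg(int ret)
{
	switch (ret) {
	case -EINPROGRESS:
		/* Mostly harmless: a rescan is already running. */
		return "qgroup rescan is already in progress";
	case -EINVAL:
		/* Assumed mapping: rescan requested while qgroup is disabled. */
		return "qgroup rescan init failed, qgroup is not enabled";
	default:
		/* Anything else keeps a generic message. */
		return "qgroup rescan init failed";
	}
}
```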
  4. 29 May, 2018: 4 commits
    • btrfs: qgroup: Finish rescan when hit the last leaf of extent tree · ff3d27a0
      Qu Wenruo committed
      In the following case, qgroup rescan can double-account CoWed tree
      blocks:
      
      In this case, the extent tree has only one tree block.
      
      -
      | transid=5 last committed=4
      | btrfs_qgroup_rescan_worker()
      | |- btrfs_start_transaction()
      | |  transid = 5
      | |- qgroup_rescan_leaf()
      |    |- btrfs_search_slot_for_read() on extent tree
      |       Get the only extent tree block from commit root (transid = 4).
      |       Scan it, set qgroup_rescan_progress to the last
      |       EXTENT/META_ITEM + 1
      |       now qgroup_rescan_progress = A + 1.
      |
      | fs tree gets CoWed, new tree block is at A + 16K
      | transid 5 gets committed
      -
      | transid=6 last committed=5
      | btrfs_qgroup_rescan_worker()
      | |- btrfs_start_transaction()
      | |  transid = 6
      | |- qgroup_rescan_leaf()
      |    |- btrfs_search_slot_for_read() on extent tree
      |       Get the only extent tree block from commit root (transid = 5).
      |       scan it using qgroup_rescan_progress (A + 1).
      |       found new tree block beyond A, and it's an fs tree block,
      |       account it to increase qgroup numbers.
      -
      
      In the above case, both tree block A and tree block A + 16K get
      accounted, so the CoWed block is accounted twice, while qgroup rescan
      should stop once it has reached the last leaf rather than continuing
      with its qgroup_rescan_progress.

      Such a case can happen by simply looping btrfs/017, which occasionally
      hits this double qgroup accounting problem.

      Fix it by checking the path to determine whether we should finish the
      qgroup rescan, rather than relying on the next loop iteration to exit.
      Reported-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
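The race can be replayed with a toy model. The model has a single-leaf extent tree, a progress cursor, and a flag that finishes the rescan as soon as the last leaf has been scanned. Everything here is hypothetical and only mirrors the sequence in the diagram above: the struct names, `rescan_step()`, and the bytenr chosen for A. The real fix inspects the btrfs path instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ITEMS 8

/* Toy extent tree that fits in one leaf: sorted bytenrs of tree blocks. */
struct leaf {
	unsigned long long bytenr[MAX_ITEMS];
	size_t nr;
};

struct rescan {
	unsigned long long progress;	/* next bytenr to look at */
	bool done;
	int accounted;			/* tree blocks accounted so far */
};

/*
 * One rescan pass over @commit_root, which is also the last leaf.
 * Without @stop_at_last_leaf the rescan only finishes when a later pass
 * finds nothing at or past @progress; with it, the rescan finishes right
 * after scanning the last leaf.
 */
void rescan_step(struct rescan *rs, const struct leaf *commit_root,
		 bool stop_at_last_leaf)
{
	bool found = false;

	if (rs->done)
		return;
	for (size_t i = 0; i < commit_root->nr; i++) {
		if (commit_root->bytenr[i] < rs->progress)
			continue;
		rs->accounted++;
		rs->progress = commit_root->bytenr[i] + 1;
		found = true;
	}
	if (!found || stop_at_last_leaf)
		rs->done = true;
}
```

In the replay, the transid 4 leaf holds only A and the transid 5 leaf holds only A + 16K. Feeding both leaves in order, the old behavior accounts 2 blocks and the fixed behavior accounts 1.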
    • btrfs: qgroup: Search commit root for rescan to avoid missing extent · b6debf15
      Qu Wenruo committed
      When doing a qgroup rescan using the following script (modified from
      the btrfs/017 test case), we can sometimes hit qgroup corruption.
      
      ------
      umount $dev &> /dev/null
      umount $mnt &> /dev/null
      
      mkfs.btrfs -f -n 64k $dev
      mount $dev $mnt
      
      extent_size=8192
      
      xfs_io -f -d -c "pwrite 0 $extent_size" $mnt/foo > /dev/null
      btrfs subvolume snapshot $mnt $mnt/snap
      
      xfs_io -f -c "reflink $mnt/foo" $mnt/foo-reflink > /dev/null
      xfs_io -f -c "reflink $mnt/foo" $mnt/snap/foo-reflink > /dev/null
      xfs_io -f -c "reflink $mnt/foo" $mnt/snap/foo-reflink2 > /dev/null
      btrfs quota enable $mnt

       # -W is the new option to only wait for the rescan without starting a new one
      btrfs quota rescan -W $mnt
      btrfs qgroup show -prce $mnt
      umount $mnt
      
       # Need to patch btrfs-progs to report qgroup mismatch as error
      btrfs check $dev || _fail
      ------
      
      On a fast machine, we can hit corruption where tree blocks are missing
      from the accounting:
      ------
      qgroupid         rfer         excl     max_rfer     max_excl parent  child
      --------         ----         ----     --------     -------- ------  -----
      0/5           8.00KiB        0.00B         none         none ---     ---
      0/257         8.00KiB        0.00B         none         none ---     ---
      ------
      
      This is because btrfs_find_all_roots(), called from qgroup_rescan_leaf(),
      always searches the commit root, but the leaf we get is from the
      current transaction, not the commit root.

      If our tree blocks get modified in the current transaction, we won't
      find any owner in the commit root, which causes the corruption.

      Fix it by making qgroup_rescan_leaf() search the commit root of the
      extent tree.
      Reported-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
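A toy model shows the mismatch. Owner lookup consults only the commit root, so scanning a leaf taken from the current transaction can hit a CoWed block the commit root has never seen. All types and functions here are hypothetical. `find_owner_in_commit_root()` merely stands in for btrfs_find_all_roots() searching the commit root.

```c
#include <assert.h>
#include <stddef.h>

/* Toy extent record: a tree block and the root that owns it. */
struct extent {
	unsigned long long bytenr;
	unsigned long long owner;
};

struct extent_leaf {
	struct extent items[8];
	size_t nr;
};

/* Owner lookup that only sees the commit root; 0 means "no owner found". */
unsigned long long find_owner_in_commit_root(const struct extent_leaf *commit,
					     unsigned long long bytenr)
{
	for (size_t i = 0; i < commit->nr; i++)
		if (commit->items[i].bytenr == bytenr)
			return commit->items[i].owner;
	return 0;
}

/* Count the tree blocks of @leaf whose owner can be resolved. */
size_t rescan_leaf(const struct extent_leaf *leaf,
		   const struct extent_leaf *commit)
{
	size_t accounted = 0;

	for (size_t i = 0; i < leaf->nr; i++)
		if (find_owner_in_commit_root(commit, leaf->items[i].bytenr))
			accounted++;
	return accounted;
}
```

A leaf from the current transaction may contain a freshly CoWed block, which then goes unaccounted. Scanning the commit root's own leaf, as the fix does, resolves every block that leaf contains.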
    • btrfs: qgroup: Allow trace_btrfs_qgroup_account_extent() to record its transid · c9f6f3cd
      Qu Wenruo committed
      When debugging a quota rescan race, btrfs rescan could sometimes
      account an old (committed) leaf and then re-account the newly committed
      leaf in the next generation.

      Locating this race requires the transid, so add @transid to
      trace_btrfs_qgroup_account_extent() for such debugging.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: trace: Allow trace_qgroup_update_counters() to record old rfer/excl value · 8b317901
      Qu Wenruo committed
      The original trace_qgroup_update_counters() only records the qgroup id
      and its reference count change.
      
      This is good enough to debug qgroup accounting changes, but when a
      rescan race is involved, it's hard to tell which modification belongs
      to which rescan.

      So add old_rfer and old_excl to the trace output to help distinguish
      different rescan instances.
      (A new rescan instance should reset its qgroup->rfer to 0.)

      For the trace event parameters, only u64 qgroup_id changes to struct
      btrfs_qgroup *qgroup, so the number of parameters is unchanged.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
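A userspace sketch shows why taking the qgroup pointer helps: the event can read the rfer/excl values before the update alongside the qgroup id. The struct and `format_update_counters()` are hypothetical stand-ins, and the exact field layout of the formatted line is an assumption, not the kernel's trace format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct btrfs_qgroup. */
struct qgroup {
	unsigned long long qgroupid;
	unsigned long long rfer;
	unsigned long long excl;
};

/*
 * Format one counter-update event. Passing the qgroup itself instead of a
 * bare id lets the event also capture rfer/excl before the update.
 */
int format_update_counters(char *buf, size_t len, const struct qgroup *qg,
			   unsigned long long cur_old_count,
			   unsigned long long cur_new_count)
{
	return snprintf(buf, len,
			"qgid=%llu old_rfer=%llu old_excl=%llu "
			"cur_old_count=%llu cur_new_count=%llu",
			qg->qgroupid, qg->rfer, qg->excl,
			cur_old_count, cur_new_count);
}
```

Since a new rescan instance resets rfer to 0, a line reporting old_rfer=0 for a qgroup that already held references is a hint that a new rescan has started.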
  5. 18 Apr, 2018: 1 commit
    • btrfs: qgroup: Commit transaction in advance to reduce early EDQUOT · a514d638
      Qu Wenruo committed
      Unlike the previous method, which tried to commit the transaction inside
      qgroup_reserve(), this time we try to commit the transaction using
      fs_info->transaction_kthread. This avoids nested transactions, and we
      don't need to worry about the locking context.

      Since this is an asynchronous call and, unlike the previous method, we
      won't wait for the transaction commit, we must trigger it before we hit
      the qgroup limit.

      So this patch uses the ratio and size of the qgroup meta_pertrans
      reservation as an indicator of whether we should trigger a transaction
      commit. (meta_prealloc isn't released at transaction commit, so it's
      useless as an indicator anyway.)
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
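A minimal sketch of such an indicator follows. The thresholds are made-up values for illustration: a 32 MiB size and a ratio of 1/32 of the remaining headroom. The name `should_commit_early()` is also hypothetical. The commit message only says that the ratio and size of the meta_pertrans reservation drive the decision.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical thresholds, chosen only for illustration. */
#define EARLY_COMMIT_SIZE	(32ULL << 20)	/* 32 MiB */
#define EARLY_COMMIT_RATIO	32ULL

/*
 * Decide whether to wake the transaction kthread before the qgroup limit
 * is hit. @used is the current qgroup usage, @limit its limit and
 * @pertrans the META_PERTRANS bytes that a commit would release.
 */
bool should_commit_early(unsigned long long used, unsigned long long limit,
			 unsigned long long pertrans)
{
	unsigned long long headroom = used < limit ? limit - used : 0;

	/* A large pertrans reservation is worth releasing on its own. */
	if (pertrans >= EARLY_COMMIT_SIZE)
		return true;
	/* Otherwise commit once pertrans exceeds 1/RATIO of the headroom. */
	return pertrans * EARLY_COMMIT_RATIO > headroom;
}
```

The commit is asynchronous, so the check has to fire while there is still headroom left. That is why this sketch compares against the remaining space instead of waiting for a reservation to fail.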
  6. 12 Apr, 2018: 1 commit
  7. 31 Mar, 2018: 10 commits
    • btrfs: use lockdep_assert_held for spinlocks · a4666e68
      David Sterba committed
      lockdep_assert_held is preferred, so use it to replace assert_spin_locked.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: Validate child tree block's level and first key · 581c1760
      Qu Wenruo committed
      We have several reports of node pointers pointing to incorrect child
      tree blocks, which can even have the wrong owner and level while still
      having a valid generation and checksum.
      
      Although btrfs check can handle it and print an error message like:
      leaf parent key incorrect 60670574592
      
      The kernel doesn't check for this type of corruption properly.
      At least add such a check to read_tree_block() and btrfs_read_buffer(),
      which need two new parameters, @level and @first_key, to verify the
      child tree block.
      
      The new @level check is mandatory, and all call sites have been
      modified to extract the expected level from their call chain.
      
      @first_key is optional, and the following call sites skip that
      check:
      1) Root node/leaf
         As ROOT_ITEM doesn't contain the first key, skip @first_key check.
      2) Direct backref
         Only the parent bytenr and level are known and we need to resolve the
         key ourselves, so skip the @first_key check.
      
      Another note on this verification: it needs extra info from the nodeptr
      or ROOT_ITEM, so it can't fit into the current tree-checker framework,
      which is limited to the node/leaf boundary.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
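A userspace sketch of the verification rule follows. The level check always runs, while the first-key check is skipped when the caller passes NULL, as at the root node/leaf and direct backref call sites listed above. The structs and `verify_child()` are hypothetical simplifications, and returning -EIO on mismatch is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified btrfs key: (objectid, type, offset). */
struct key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Hypothetical in-memory view of a child tree block just read from disk. */
struct tree_block {
	int level;
	struct key first_key;
};

static int key_equal(const struct key *a, const struct key *b)
{
	return a->objectid == b->objectid && a->type == b->type &&
	       a->offset == b->offset;
}

/*
 * Check a child block against what its parent's node pointer promised.
 * @level is always checked; @first_key may be NULL at call sites that
 * cannot know it, which skips that check.
 * Returns 0 if the child matches, -EIO otherwise.
 */
int verify_child(const struct tree_block *child, int level,
		 const struct key *first_key)
{
	if (child->level != level)
		return -EIO;
	if (first_key && !key_equal(&child->first_key, first_key))
		return -EIO;
	return 0;
}
```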
    • btrfs: use helper to set ulist aux from a qgroup · a1840b50
      David Sterba committed
      We have a nice helper to properly cast a qgroup to a ulist aux value,
      and several places that could make use of it.
      Signed-off-by: David Sterba <dsterba@suse.com>
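The cast such a helper wraps can be sketched as a pair of inline functions that store a qgroup pointer in a u64 aux value and recover it. The names and the stand-in struct are illustrative, not necessarily the kernel's. Going through uintptr_t keeps the conversion explicit and lets it round-trip even where pointers are narrower than 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct btrfs_qgroup. */
struct qgroup {
	uint64_t qgroupid;
};

/* Store a qgroup pointer in a u64 ulist aux value. */
static inline uint64_t qgroup_to_aux(struct qgroup *qg)
{
	return (uint64_t)(uintptr_t)qg;
}

/* Recover the qgroup pointer from a ulist aux value. */
static inline struct qgroup *aux_to_qgroup(uint64_t aux)
{
	return (struct qgroup *)(uintptr_t)aux;
}
```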
    • Revert "btrfs: qgroups: Retry after commit on getting EDQUOT" · 0b78877a
      Qu Wenruo committed
      This reverts commit 48a89bc4.
      
      The idea of committing the transaction to free some space after hitting
      the qgroup limit is good, but the problem is that it can easily cause deadlocks.
      
      One deadlock example is caused by trying to flush data while still
      holding it:
      
      Call Trace:
       __schedule+0x49d/0x10f0
       schedule+0xc6/0x290
       schedule_timeout+0x187/0x1c0
       wait_for_completion+0x204/0x3a0
       btrfs_wait_ordered_extents+0xa40/0xaf0 [btrfs]
       qgroup_reserve+0x913/0xa10 [btrfs]
       btrfs_qgroup_reserve_data+0x3ef/0x580 [btrfs]
       btrfs_check_data_free_space+0x96/0xd0 [btrfs]
       __btrfs_buffered_write+0x3ac/0xd40 [btrfs]
       btrfs_file_write_iter+0x62a/0xba0 [btrfs]
       __vfs_write+0x320/0x430
       vfs_write+0x107/0x270
       SyS_write+0xbf/0x150
       do_syscall_64+0x1b0/0x3d0
       entry_SYSCALL64_slow_path+0x25/0x25
      
      Another can be caused by trying to commit a transaction while nested
      inside a trans handle we already hold:
      
      btrfs_start_transaction()
      |- btrfs_qgroup_reserve_meta_pertrans()
         |- qgroup_reserve()
            |- btrfs_join_transaction()
            |- btrfs_commit_transaction()
      
      The retry causes more problems than expected when a limit is enabled.
      At the very least, a graceful EDQUOT is far better than a deadlock.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: qgroup: Update trace events for metadata reservation · 4ee0d883
      Qu Wenruo committed
      trace_qgroup_meta_reserve() now has an extra type parameter.

      Also introduce two new trace events:
      
      1) trace_qgroup_meta_free_all_pertrans()
         For btrfs_qgroup_free_meta_all_pertrans()
      
      2) trace_qgroup_meta_convert()
         For btrfs_qgroup_convert_reserved_meta()
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: qgroup: Use root::qgroup_meta_rsv_* to record qgroup meta reserved space · 8287475a
      Qu Wenruo committed
      In the quota disabled->enabled case, it's possible that quota was not
      enabled at reservation time, so no bytes were really reserved, while
      quota is enabled at release time, so we try to release bytes we never
      really owned.

      This can cause metadata reservation underflow for both types, though it
      is less likely for the per-trans type since enabling quota commits a
      transaction.
      
      To address this, record the qgroup meta reserved bytes in
      root::qgroup_meta_rsv_pertrans and ::prealloc, so at release time we
      won't free any bytes we didn't reserve.
      
      For DATA, it's already handled by io_tree, so nothing needs to be done
      there.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
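The per-root bookkeeping can be sketched as two counters plus a release function clamped to what was actually recorded. Bytes "reserved" while quota was off were never recorded, so releasing them frees nothing instead of underflowing. The names are hypothetical, and locking, which kernel code would need around these counters, is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Metadata reservation types tracked per root (illustrative names). */
enum meta_rsv_type { META_PERTRANS, META_PREALLOC };

/* Hypothetical per-root record of the qgroup meta bytes really reserved. */
struct root_meta_rsv {
	uint64_t pertrans;
	uint64_t prealloc;
};

static uint64_t *rsv_counter(struct root_meta_rsv *rsv, enum meta_rsv_type type)
{
	return type == META_PERTRANS ? &rsv->pertrans : &rsv->prealloc;
}

/* Record bytes that were actually reserved (quota was enabled). */
void add_root_meta_rsv(struct root_meta_rsv *rsv, enum meta_rsv_type type,
		       uint64_t bytes)
{
	*rsv_counter(rsv, type) += bytes;
}

/*
 * Release at most what this root recorded and return the amount that may
 * really be freed from the qgroup.
 */
uint64_t sub_root_meta_rsv(struct root_meta_rsv *rsv, enum meta_rsv_type type,
			   uint64_t bytes)
{
	uint64_t *counter = rsv_counter(rsv, type);

	if (bytes > *counter)
		bytes = *counter;
	*counter -= bytes;
	return bytes;
}
```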
    • btrfs: qgroup: Introduce function to convert META_PREALLOC into META_PERTRANS · 64cfaef6
      Qu Wenruo committed
      For meta_prealloc reservation users, the caller modifies trees after
      btrfs_join_transaction(), so part (or even all) of the meta_prealloc
      reservation should be converted to meta_pertrans and kept until
      transaction commit time.

      This patch introduces a new function,
      btrfs_qgroup_convert_reserved_meta(), to do this for META_PREALLOC
      reservation users.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
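The conversion can be sketched as moving bytes from one counter to the other, so the total reservation is unchanged. Only the type changes, which decides when the bytes are released. The struct and `convert_reserved_meta()` are hypothetical. Capping the conversion at the preallocated amount is an assumption of this sketch and is not stated in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-qgroup metadata reservation counters. */
struct qgroup_meta_rsv {
	uint64_t prealloc;	/* reserved before joining a transaction */
	uint64_t pertrans;	/* kept until transaction commit */
};

/*
 * Once the caller has joined a transaction and is about to modify trees,
 * move @bytes from META_PREALLOC to META_PERTRANS, capped at what is
 * preallocated (an assumption). Returns the bytes actually converted.
 */
uint64_t convert_reserved_meta(struct qgroup_meta_rsv *rsv, uint64_t bytes)
{
	if (bytes > rsv->prealloc)
		bytes = rsv->prealloc;
	rsv->prealloc -= bytes;
	rsv->pertrans += bytes;
	return bytes;
}
```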
    • btrfs: qgroup: Don't use root->qgroup_meta_rsv for qgroup · e1211d0e
      Qu Wenruo committed
      Since qgroup now has separate metadata reservation types, we can
      completely get rid of the old root->qgroup_meta_rsv, which mostly acts
      as the current META_PERTRANS reservation type.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: qgroup: Split meta rsv type into meta_prealloc and meta_pertrans · 733e03a0
      Qu Wenruo committed
      Btrfs uses 2 different methods to reserve metadata qgroup space.
      
      1) Reserve at btrfs_start_transaction() time
         This is quite straightforward: the caller uses the allocated trans
         handle to modify b-trees.
      
         In this case, reserved metadata should be kept until qgroup numbers
         are updated.
      
      2) Reserve by using block_rsv first, and later btrfs_join_transaction()
         This is more complicated: the caller reserves space using block_rsv
         first, and later calls btrfs_join_transaction() to get a trans
         handle.
      
         In this case, before we modify trees, the reserved space can be
         modified on demand, and after btrfs_join_transaction(), such reserved
         space should also be kept until qgroup numbers are updated.
      
      Since these two types behave differently, split the original "META"
      reservation type into 2 sub-types:
      
        META_PERTRANS:
          For above case 1)
      
        META_PREALLOC:
          For reservations that happened before btrfs_join_transaction() of
          case 2)
      
      NOTE: This patch only converts existing qgroup meta reservation callers
      according to their situation; it doesn't ensure that all callers
      reserve at the correct time.
      Such fixes will be added in later patches.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      [ update comments ]
      Signed-off-by: David Sterba <dsterba@suse.com>
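The split can be sketched as one counter per reservation type. META_PERTRANS is dropped when the transaction commits and the qgroup numbers are updated. META_PREALLOC survives the commit, consistent with the earlier note that meta_prealloc isn't released at transaction commit. The enum, struct and function names here are illustrative, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Reservation types after the split (illustrative names). */
enum rsv_type {
	RSV_DATA,
	RSV_META_PERTRANS,	/* case 1): reserved at btrfs_start_transaction() */
	RSV_META_PREALLOC,	/* case 2): reserved before btrfs_join_transaction() */
	RSV_LAST,
};

/* Hypothetical per-qgroup counters, one slot per type. */
struct qgroup_rsv {
	uint64_t values[RSV_LAST];
};

void qgroup_rsv_add(struct qgroup_rsv *rsv, enum rsv_type type, uint64_t bytes)
{
	rsv->values[type] += bytes;
}

/*
 * At transaction commit the qgroup numbers are updated, so all
 * META_PERTRANS bytes can go. META_PREALLOC is left untouched.
 */
void qgroup_rsv_commit(struct qgroup_rsv *rsv)
{
	rsv->values[RSV_META_PERTRANS] = 0;
}

/* Sum of all reservation types. */
uint64_t qgroup_rsv_total(const struct qgroup_rsv *rsv)
{
	uint64_t total = 0;

	for (int i = 0; i < RSV_LAST; i++)
		total += rsv->values[i];
	return total;
}
```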
    • btrfs: qgroup: Cleanup the remaining old reservation counters · 5c40507f
      Qu Wenruo committed
      Now that qgroup has switched to the new reservation system with
      separate types, clean up the remaining old counters.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>