1. 18 Apr, 2018 1 commit
    • btrfs: qgroup: Commit transaction in advance to reduce early EDQUOT · a514d638
      Qu Wenruo committed
      Unlike previous method that tries to commit transaction inside
      qgroup_reserve(), this time we will try to commit transaction using
      fs_info->transaction_kthread to avoid nested transaction and no need to
      worry about locking context.
      
      Since it's an asynchronous function call and we won't wait for
      transaction commit, unlike previous method, we must call it before we
      hit the qgroup limit.
      
      So this patch uses the ratio and size of the qgroup meta_pertrans
      reservation as the indicator for whether we should trigger a transaction
      commit.  (meta_prealloc won't be cleaned at transaction commit, so it's
      useless as an indicator anyway.)
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
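The trigger described in this commit can be modelled roughly in userspace Python. This is a toy sketch, not the kernel implementation: `should_request_commit`, `RATIO` and `SIZE_CAP` are hypothetical names, and the threshold values are assumptions picked only for the example.

```python
# Toy model, not kernel code. RATIO and SIZE_CAP are assumed values.
RATIO = 32                  # assumed: act once pertrans reaches 1/RATIO of the limit
SIZE_CAP = 32 * 1024 ** 2   # assumed: absolute cap on the threshold, in bytes


def should_request_commit(limit: int, meta_pertrans: int) -> bool:
    """Return True when the per-transaction metadata reservation is large
    enough that asking for an asynchronous commit looks worthwhile."""
    if limit == 0:
        # No limit configured, so there is no EDQUOT to run into.
        return False
    threshold = min(limit // RATIO, SIZE_CAP)
    return meta_pertrans >= threshold
```

Because the commit is only requested and not waited for, a check like this has to fire before the limit is actually reached, which is why it looks at the reservation size instead of waiting for a failure.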
  2. 12 Apr, 2018 1 commit
  3. 31 Mar, 2018 16 commits
    • btrfs: use lockdep_assert_held for spinlocks · a4666e68
      David Sterba committed
      Using lockdep_assert_held is preferred, replace assert_spin_locked.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: Validate child tree block's level and first key · 581c1760
      Qu Wenruo committed
      We have several reports of node pointers pointing to incorrect child
      tree blocks, which may even have the wrong owner and level while still
      carrying a valid generation and checksum.

      Although btrfs check can handle it and print an error message like:
      leaf parent key incorrect 60670574592

      the kernel doesn't have enough checks for this type of corruption.
      At least add such a check to read_tree_block() and btrfs_read_buffer(),
      where we need two new parameters @level and @first_key to verify the
      child tree block.
      
      The new @level check is mandatory and all call sites are already
      modified to extract expected level from its call chain.
      
      While @first_key is optional, the following call sites are skipping such
      check:
      1) Root node/leaf
         As ROOT_ITEM doesn't contain the first key, skip @first_key check.
      2) Direct backref
         Only the parent bytenr and level are known and we need to resolve the
         key all by ourselves, skip @first_key check.
      
      Another note of this verification is, it needs extra info from nodeptr
      or ROOT_ITEM, so it can't fit into current tree-checker framework, which
      is limited to node/leaf boundary.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
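The mandatory level check and the optional first-key check described in this commit can be sketched as follows. This is a userspace toy model under stated assumptions: `TreeBlock`, `Key` and `verify_child` are hypothetical names, and real tree blocks carry far more state than shown here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical stand-in for a btrfs key: (objectid, type, offset).
Key = Tuple[int, int, int]


@dataclass
class TreeBlock:
    level: int
    first_key: Key


def verify_child(block: TreeBlock, expected_level: int,
                 expected_first_key: Optional[Key] = None) -> bool:
    """Level is always checked; the first key is checked only when the
    caller knows it (root nodes and direct backrefs pass None)."""
    if block.level != expected_level:
        return False
    if expected_first_key is not None and block.first_key != expected_first_key:
        return False
    return True
```

Passing `None` for the key mirrors the two call-site classes the message lists as skipping the @first_key check.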
    • btrfs: use helper to set ulist aux from a qgroup · a1840b50
      David Sterba committed
      We have a nice helper to do proper casting of a qgroup to a ulist aux
      value, and several places that could make use of it.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Revert "btrfs: qgroups: Retry after commit on getting EDQUOT" · 0b78877a
      Qu Wenruo committed
      This reverts commit 48a89bc4.
      
      The idea to commit transaction and free some space after hitting qgroup
      limit is good, although the problem is it can easily cause deadlocks.
      
      One deadlock example is caused by trying to flush data while still
      holding it:
      
      Call Trace:
       __schedule+0x49d/0x10f0
       schedule+0xc6/0x290
       schedule_timeout+0x187/0x1c0
       wait_for_completion+0x204/0x3a0
       btrfs_wait_ordered_extents+0xa40/0xaf0 [btrfs]
       qgroup_reserve+0x913/0xa10 [btrfs]
       btrfs_qgroup_reserve_data+0x3ef/0x580 [btrfs]
       btrfs_check_data_free_space+0x96/0xd0 [btrfs]
       __btrfs_buffered_write+0x3ac/0xd40 [btrfs]
       btrfs_file_write_iter+0x62a/0xba0 [btrfs]
       __vfs_write+0x320/0x430
       vfs_write+0x107/0x270
       SyS_write+0xbf/0x150
       do_syscall_64+0x1b0/0x3d0
       entry_SYSCALL64_slow_path+0x25/0x25
      
      Another can be caused by trying to commit one transaction while nesting
      with trans handle held by ourselves:
      
      btrfs_start_transaction()
      |- btrfs_qgroup_reserve_meta_pertrans()
         |- qgroup_reserve()
            |- btrfs_join_transaction()
            |- btrfs_commit_transaction()
      
      The retry is causing more problems than expected when limit is enabled.
      At least a graceful EDQUOT is way better than deadlock.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: qgroup: Update trace events for metadata reservation · 4ee0d883
      Qu Wenruo committed
      Now trace_qgroup_meta_reserve() will have an extra type parameter.
      
      And introduce two new trace events:
      
      1) trace_qgroup_meta_free_all_pertrans()
         For btrfs_qgroup_free_meta_all_pertrans()
      
      2) trace_qgroup_meta_convert()
         For btrfs_qgroup_convert_reserved_meta()
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: qgroup: Use root::qgroup_meta_rsv_* to record qgroup meta reserved space · 8287475a
      Qu Wenruo committed
      For quota disabled->enable case, it's possible that at reservation time
      quota was not enabled so no bytes were really reserved, while at release
      time, quota was enabled so we will try to release some bytes we didn't
      really own.
      
      Such a situation can cause metadata reservation underflow for both types,
      although it is less likely for the per-trans type since enabling quota
      commits the transaction.
      
      To address this, record qgroup meta reserved bytes into
      root::qgroup_meta_rsv_pertrans and ::prealloc.
      So at releasing time we won't free any bytes we didn't reserve.
      
      For DATA, it's already handled by io_tree, so nothing needs to be done
      there.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
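The idea of recording what was really reserved, so that a later release cannot free more than that, can be illustrated with a small userspace model. `RootMetaRsv` and its methods are hypothetical names for this sketch and do not correspond to kernel structures.

```python
class RootMetaRsv:
    """Toy per-root record of metadata bytes that were really reserved."""

    def __init__(self) -> None:
        self.recorded = 0

    def reserve(self, nbytes: int, quota_enabled: bool) -> None:
        # With quota disabled nothing is reserved, so nothing is recorded.
        if quota_enabled:
            self.recorded += nbytes

    def release(self, nbytes: int) -> int:
        # Free at most what was recorded; return the bytes actually freed.
        freed = min(nbytes, self.recorded)
        self.recorded -= freed
        return freed
```

In the disabled-then-enabled case from the message, the reserve call records nothing, so the later release frees 0 bytes instead of underflowing.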
    • btrfs: qgroup: Introduce function to convert META_PREALLOC into META_PERTRANS · 64cfaef6
      Qu Wenruo committed
      For meta_prealloc reservation users, the caller modifies the tree after
      btrfs_join_transaction(), so part (or even all) of the meta_prealloc
      reservation should be converted to meta_pertrans and kept until
      transaction commit time.

      This patch introduces a new function,
      btrfs_qgroup_convert_reserved_meta(), to do this for META_PREALLOC
      reservation users.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
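A minimal sketch of the conversion, assuming reservations are tracked as two plain counters. The function name `convert_reserved_meta` and the dictionary layout are hypothetical and exist only for this illustration.

```python
def convert_reserved_meta(rsv: dict, nbytes: int) -> int:
    """Move up to nbytes from the 'prealloc' counter to 'pertrans'.

    Returns the number of bytes moved. The total reservation is unchanged;
    only its type changes, so it is kept until transaction commit.
    """
    moved = min(nbytes, rsv["prealloc"])
    rsv["prealloc"] -= moved
    rsv["pertrans"] += moved
    return moved
```

For example, converting 4096 bytes out of an 8192-byte prealloc reservation leaves 4096 bytes in each counter.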
    • btrfs: qgroup: Don't use root->qgroup_meta_rsv for qgroup · e1211d0e
      Qu Wenruo committed
      Since qgroup has separate metadata reservation types now, we can
      completely get rid of the old root->qgroup_meta_rsv, which mostly acts
      as the current META_PERTRANS reservation type.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: qgroup: Split meta rsv type into meta_prealloc and meta_pertrans · 733e03a0
      Qu Wenruo committed
      Btrfs uses 2 different methods to reserve metadata qgroup space.

      1) Reserve at btrfs_start_transaction() time
         This is quite straightforward, caller will use the trans handle
         allocated to modify b-trees.
      
         In this case, reserved metadata should be kept until qgroup numbers
         are updated.
      
      2) Reserve by using block_rsv first, and later btrfs_join_transaction()
         This is more complicated, caller will reserve space using block_rsv
         first, and then later call btrfs_join_transaction() to get a trans
         handle.
      
         In this case, before we modify trees, the reserved space can be
         modified on demand, and after btrfs_join_transaction(), such reserved
         space should also be kept until qgroup numbers are updated.
      
      Since these two types behave differently, split the original "META"
      reservation type into 2 sub-types:
      
        META_PERTRANS:
          For above case 1)
      
        META_PREALLOC:
          For reservations that happened before btrfs_join_transaction() of
          case 2)
      
      NOTE: This patch will only convert existing qgroup meta reservation
      callers according to its situation, not ensuring all callers are at
      correct timing.
      Such fix will be added in later patches.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      [ update comments ]
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: qgroup: Cleanup the remaining old reservation counters · 5c40507f
      Qu Wenruo committed
      So qgroup is switched to the new separate-type reservation system.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Q
      64ee4e75
    • btrfs: qgroup: Fix wrong qgroup reservation update for relationship modification · 429d6275
      Qu Wenruo committed
      When modifying qgroup relationship, for qgroup which only owns exclusive
      extents, we will go through quick update path.
      
      In this path, we will add/subtract exclusive and reference number for
      parent qgroup, since the source (child) qgroup only has exclusive
      extents, destination (parent) qgroup will also own or lose those extents
      exclusively.
      
      The same should apply to reservation, since later reservation
      adding/releasing will also affect the parent qgroup. Without the
      reservation carried over from the child, the parent will underflow its
      reservation or have a dead reservation which will never be freed.
      
      However original code doesn't do the same thing for reservation.
      It handles qgroup reservation quite differently:
      
      It removes qgroup reservation, as it's allocating space from the
      reserved qgroup for relationship adding.
      But does nothing for qgroup reservation if we're removing a qgroup
      relationship.
      
      Judging from the original code, it looks as if, because we're adding
      qgroup->rfer, the code assumes we're writing new data, so it follows
      the normal write routine, by reducing qgroup->reserved and adding
      qgroup->rfer/excl.
      
      This old behavior is wrong, and should be fixed to follow the same
      excl/rfer behavior.
      
      Just fix it by using the correct behavior described above.
      
      Fixes: 31193213 ("Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use.")
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
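The corrected behaviour on the quick update path, where reservation follows the same add/subtract rule as rfer and excl, can be sketched like this. It is a userspace toy model: `quick_update` and the dictionary fields are hypothetical names, and a single `reserved` counter stands in for the real reservation state.

```python
def quick_update(parent: dict, child: dict, sign: int) -> None:
    """Apply a child qgroup that owns only exclusive extents to its parent.

    sign is +1 when the relationship is added and -1 when it is removed.
    Reservation is carried the same way as the rfer/excl numbers.
    """
    for field in ("rfer", "excl", "reserved"):
        parent[field] += sign * child[field]
```

Adding and then removing the same relationship leaves the parent exactly where it started, which is the property the old code violated for the reservation counter.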
    • btrfs: qgroup: Make qgroup_reserve and its callers to use separate reservation type · dba21324
      Qu Wenruo committed
      Since most callers of qgroup_reserve() are already defined by type,
      converting qgroup_reserve() is quite easy.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: qgroup: Introduce helpers to update and access new qgroup rsv · f59c0347
      Qu Wenruo committed
      Introduce helpers to:
      
      1) Get total reserved space
         For limit calculation
      2) Add/release reserved space for given type
         With underflow detection and warning
      3) Add/release reserved space according to child qgroup
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
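The three kinds of helper listed in this commit can be modelled in a few lines of userspace Python. All names here (`RsvType`, `Qgroup`, the `rsv_*` methods) are hypothetical, and clamping to zero on underflow is an assumption made for the sketch, not a statement about the kernel code.

```python
import warnings
from enum import Enum


class RsvType(Enum):
    DATA = 0
    META_PERTRANS = 1
    META_PREALLOC = 2


class Qgroup:
    def __init__(self) -> None:
        self.rsv = {t: 0 for t in RsvType}

    def rsv_total(self) -> int:
        # 1) Total reserved space across all types, for limit calculation.
        return sum(self.rsv.values())

    def rsv_add(self, rsv_type: RsvType, nbytes: int) -> None:
        # 2) Add reserved space for one type.
        self.rsv[rsv_type] += nbytes

    def rsv_release(self, rsv_type: RsvType, nbytes: int) -> None:
        # 2) Release for one type, warning on underflow (assumed: clamp to 0).
        if self.rsv[rsv_type] < nbytes:
            warnings.warn(f"qgroup {rsv_type.name} reservation underflow")
            self.rsv[rsv_type] = 0
            return
        self.rsv[rsv_type] -= nbytes

    def rsv_add_by_qgroup(self, child: "Qgroup") -> None:
        # 3) Inherit every type of reservation from a child qgroup.
        for t in RsvType:
            self.rsv_add(t, child.rsv[t])

    def rsv_release_by_qgroup(self, child: "Qgroup") -> None:
        # 3) Drop every type of reservation carried from a child qgroup.
        for t in RsvType:
            self.rsv_release(t, child.rsv[t])
```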
    • btrfs: qgroup: Skeleton to support separate qgroup reservation type · d4e5c920
      Qu Wenruo committed
      Instead of single qgroup->reserved, use a new structure btrfs_qgroup_rsv
      to store different types of reservation.
      
      This patch only updates the header needed to compile.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: Drop fs_info parameter from btrfs_qgroup_account_extents · 460fb20a
      Nikolay Borisov committed
      It's provided by the transaction handle.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  4. 26 Mar, 2018 1 commit
    • btrfs: Move qgroup rescan on quota enable to btrfs_quota_enable · 5d23515b
      Nikolay Borisov committed
      Currently btrfs_run_qgroups is doing a bit too much. Not only is it
      responsible for synchronizing in-memory state of qgroups to disk but
      it also contains code to trigger the initial qgroup rescan when
      quota is enabled initially. This condition is detected by checking that
      BTRFS_FS_QUOTA_ENABLED is not set and BTRFS_FS_QUOTA_ENABLING is set.
      Nothing really requires the code to be structured (and scattered)
      the way it is so let's streamline things. First move the quota rescan
      code into btrfs_quota_enable, where its invocation is closer to the
      use. This also makes the FS_QUOTA_ENABLING flag redundant so let's
      remove it as well.
      
      This has been tested with a full xfstest run with qgroups enabled on
      the scratch device of every xfstest and no regressions were observed.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  5. 02 Feb, 2018 1 commit
    • btrfs: Ignore errors from btrfs_qgroup_trace_extent_post · 952bd3db
      Nikolay Borisov committed
      Running generic/019 with qgroups on the scratch device enabled is almost
      guaranteed to trigger the BUG_ON in btrfs_free_tree_block. It's supposed
      to trigger only on -ENOMEM, in reality, however, it's possible to get
      -EIO from btrfs_qgroup_trace_extent_post. This function just finds the
      roots of the extent being tracked and sets the qrecord->old_roots list.
      If this operation fails nothing critical happens except the quota
      accounting can be considered wrong. In such case just set the
      INCONSISTENT flag for the quota and print a warning, rather than killing
      off the system. Additionally, it's possible to trigger a BUG_ON in
      btrfs_truncate_inode_items as well.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      [ error message adjustments ]
      Signed-off-by: David Sterba <dsterba@suse.com>
  6. 22 Jan, 2018 1 commit
  7. 02 Nov, 2017 1 commit
    • btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents · c995ab3c
      Zygo Blaxell committed
      The LOGICAL_INO ioctl provides a backward mapping from extent bytenr and
      offset (encoded as a single logical address) to a list of extent refs.
      LOGICAL_INO complements TREE_SEARCH, which provides the forward mapping
      (extent ref -> extent bytenr and offset, or logical address).  These are
      useful capabilities for programs that manipulate extents and extent
      references from userspace (e.g. dedup and defrag utilities).
      
      When the extents are uncompressed (and not encrypted and not other),
      check_extent_in_eb performs filtering of the extent refs to remove any
      extent refs which do not contain the same extent offset as the 'logical'
      parameter's extent offset.  This prevents LOGICAL_INO from returning
      references to more than a single block.
      
      To find the set of extent references to an uncompressed extent from [a, b),
      userspace has to run a loop like this pseudocode:
      
      	for (i = a; i < b; ++i)
      		extent_ref_set += LOGICAL_INO(i);
      
      At each iteration of the loop (up to 32768 iterations for a 128M extent),
      data we are interested in is collected in the kernel, then deleted by
      the filter in check_extent_in_eb.
      
      When the extents are compressed (or encrypted or other), the 'logical'
      parameter must be an extent bytenr (the 'a' parameter in the loop).
      No filtering by extent offset is done (or possible?) so the result is
      the complete set of extent refs for the entire extent.  This removes
      the need for the loop, since we get all the extent refs in one call.
      
      Add an 'ignore_offset' argument to iterate_inodes_from_logical,
      [...several levels of function call graph...], and check_extent_in_eb, so
      that we can disable the extent offset filtering for uncompressed extents.
      This flag can be set by an improved version of the LOGICAL_INO ioctl to
      get either behavior as desired.
      
      There is no functional change in this patch.  The new flag is always
      false.
      Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ minor coding style fixes ]
      Signed-off-by: David Sterba <dsterba@suse.com>
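The effect of the new flag on the offset filter can be shown with a small userspace model. `refs_for_logical` and the shape of the ref records are hypothetical; the real filtering happens in check_extent_in_eb on on-disk items, which this sketch does not attempt to reproduce.

```python
def refs_for_logical(refs, extent_start, logical, ignore_offset=False):
    """Return the extent refs matching a logical address.

    Each ref is a dict with an 'offset' field (offset into the extent).
    With ignore_offset=False only refs at the same extent offset as
    'logical' survive; with ignore_offset=True every ref is returned.
    """
    wanted = logical - extent_start
    return [r for r in refs if ignore_offset or r["offset"] == wanted]
```

With the flag set, one call on the extent returns what the per-block loop in the message would otherwise have to collect over many calls.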
  8. 26 Sep, 2017 2 commits
    • btrfs: Report error on removing qgroup if del_qgroup_item fails · 36b96fdc
      Sargun Dhillon committed
      Previously, we were calling del_qgroup_item, and ignoring the return code
      resulting in a potential to have divergent in-memory state without an
      error. Perhaps, it makes sense to handle this error code, and put the
      filesystem into a read only, or similar state.
      
      This patch only adds reporting of the error if the error is fatal,
      (any error other than qgroup not found).
      Signed-off-by: Sargun Dhillon <sargun@sargun.me>
      Reviewed-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: remove BTRFS_FS_QUOTA_DISABLING flag · c2faff79
      Misono, Tomohiro committed
      Currently, "btrfs quota enable" would fail after "btrfs quota disable" on
      the first time with syslog output "qgroup_rescan_init failed with -22", but
      it would succeed on the second time.
      
      When "quota disable" is called, BTRFS_FS_QUOTA_DISABLING flag bit will be
      set in fs_info->flags in btrfs_quota_disable(), but it will not be dropped
      in btrfs_run_qgroups() (which is called in btrfs_commit_transaction())
      because quota_root has already been freed. If "quota enable" is called
      after that, both BTRFS_FS_QUOTA_DISABLING and BTRFS_FS_QUOTA_ENABLED flag
      would be dropped in the btrfs_run_qgroups() since quota_root is not NULL.
      This leads to the failure of "quota enable" on the first time.
      
      BTRFS_FS_QUOTA_DISABLING flag is not used outside of "quota disable"
      context and is equivalent to whether quota_root is NULL or not.
      btrfs_run_qgroups() checks whether quota_root is NULL or not in the first
      place.
      
      So, let's remove BTRFS_FS_QUOTA_DISABLING flag.
      Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  9. 21 Aug, 2017 1 commit
  10. 16 Aug, 2017 2 commits
  11. 30 Jun, 2017 6 commits
    • btrfs: fix integer overflow in calc_reclaim_items_nr · 6374e57a
      Chris Mason committed
      Dave Jones hit a WARN_ON(nr < 0) in btrfs_wait_ordered_roots() with
      v4.12-rc6.  This was because commit 70e7af24 made it possible for
      calc_reclaim_items_nr() to return a negative number.  It's not really a
      bug in that commit, it just didn't go far enough down the stack to find
      all the possible 64->32 bit overflows.
      
      This switches calc_reclaim_items_nr() to return a u64 and changes everyone
      that uses the results of that math to u64 as well.
      Reported-by: Dave Jones <davej@codemonkey.org.uk>
      Fixes: 70e7af24 ("Btrfs: fix delalloc accounting leak caused by u32 overflow")
      Signed-off-by: Chris Mason <clm@fb.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
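The class of bug fixed here, a value computed in 64 bits turning negative once it is narrowed to a signed 32-bit integer, can be reproduced in a few lines. `as_s32` is a hypothetical helper that only emulates the narrowing; it is not part of btrfs.

```python
def as_s32(value: int) -> int:
    """Emulate storing a non-negative 64-bit value in a signed 32-bit int."""
    value &= 0xFFFFFFFF                      # keep the low 32 bits
    return value - 2 ** 32 if value >= 2 ** 31 else value
```

A count of 3 * 2**30 items fits comfortably in a u64 but reads back as a negative number after narrowing, which is the kind of result that would trip a check such as WARN_ON(nr < 0).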
    • btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges · bc42bda2
      Qu Wenruo committed
      [BUG]
      For the following case, btrfs can underflow qgroup reserved space
      at an error path:
      (Page size 4K, function name without "btrfs_" prefix)
      
               Task A                  |             Task B
      ----------------------------------------------------------------------
      Buffered_write [0, 2K)           |
      |- check_data_free_space()       |
      |  |- qgroup_reserve_data()      |
      |     Range aligned to page      |
      |     range [0, 4K)          <<< |
      |     4K bytes reserved      <<< |
      |- copy pages to page cache      |
                                       | Buffered_write [2K, 4K)
                                       | |- check_data_free_space()
                                       | |  |- qgroup_reserve_data()
                                       | |     Range aligned to page
                                       | |     range [0, 4K)
                                       | |     Already reserved by A <<<
                                       | |     0 bytes reserved      <<<
                                       | |- delalloc_reserve_metadata()
                                       | |  And it *FAILED* (Maybe EDQUOT)
                                       | |- free_reserved_data_space()
                                            |- qgroup_free_data()
                                               Range aligned to page range
                                               [0, 4K)
                                               Freeing 4K
      (Special thanks to Chandan for the detailed report and analysis)
      
      [CAUSE]
      Above Task B is freeing reserved data range [0, 4K) which is actually
      reserved by Task A.
      
      And at writeback time, the page dirtied by Task A will go through the
      writeback routine, which will free 4K reserved data space at file extent
      insert time, causing the qgroup underflow.
      
      [FIX]
      For btrfs_qgroup_free_data(), add @reserved parameter to only free
      data ranges reserved by the previous btrfs_qgroup_reserve_data().
      So in the above case, Task B will try to free 0 bytes, so no underflow.
      Reported-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
      Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
      Tested-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
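The Task A / Task B scenario from this commit can be replayed with a toy reservation tracker. This is a userspace sketch under stated assumptions: `DataRsv` is a hypothetical class, ranges are tracked as page indexes, and a Python set stands in for the changeset that records what one reserve call newly reserved.

```python
PAGE = 4096


class DataRsv:
    """Toy page-granular data reservation with a per-call changeset."""

    def __init__(self) -> None:
        self.pages = set()
        self.reserved_bytes = 0

    def reserve(self, start: int, length: int) -> set:
        # Align the byte range to pages and reserve only pages that are new.
        first = start // PAGE
        last = (start + length - 1) // PAGE
        changeset = {p for p in range(first, last + 1) if p not in self.pages}
        self.pages |= changeset
        self.reserved_bytes += len(changeset) * PAGE
        return changeset

    def free(self, changeset: set) -> int:
        # Free only the pages this caller reserved; return bytes freed.
        owned = changeset & self.pages
        self.pages -= owned
        self.reserved_bytes -= len(owned) * PAGE
        return len(owned) * PAGE
```

Task A's write to [0, 2K) reserves page 0; Task B's write to [2K, 4K) gets an empty changeset, so its error path frees 0 bytes and Task A's 4K stays accounted.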
    • btrfs: qgroup: Introduce extent changeset for qgroup reserve functions · 364ecf36
      Qu Wenruo committed
      Introduce a new parameter, struct extent_changeset, for
      btrfs_qgroup_reserve_data() and its callers.

      Such an extent_changeset was used in btrfs_qgroup_reserve_data() to record
      which range it reserved in the current reserve, so it can free it in error
      paths.

      The reason we need to export it to callers is that, at the buffered write
      error path, without knowing exactly which range we reserved in the current
      allocation, we can free space which was not reserved by us.
      
      This will lead to qgroup reserved space underflow.
      Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
      Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: qgroup: Return actually freed bytes for qgroup release or free data · 7bc329c1
      Qu Wenruo committed
      btrfs_qgroup_release/free_data() only returns 0 or a negative error
      number (ENOMEM is the only possible error).
      
      This is normally good enough, but sometimes we need the exact byte
      count it freed/released.
      
      Change it to return the number of bytes actually released/freed instead
      of 0 for success.
      Also slightly modify the related extent_changeset structure: since in btrfs
      one no-hole data extent won't be larger than 128M, "unsigned int"
      is large enough for the use case.
      Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: qgroup: Cleanup btrfs_qgroup_prepare_account_extents function · d1b8b94a
      Qu Wenruo committed
      Quite a lot of qgroup corruption happens because
      btrfs_qgroup_prepare_account_extents() is called at the wrong time.
      
      Since the safest time is to call it just before
      btrfs_qgroup_account_extents(), there is no need to separate these 2
      functions.
      
      Merging them will make code cleaner and less bug prone.
      Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      [ changelog and comment adjustments ]
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: qgroup: Add quick exit for non-fs extents · 5edfd9fd
      Qu Wenruo committed
      Modify btrfs_qgroup_account_extent() to exit quicker for non-fs extents.
      
      The quick exit condition is:
      1) The extent belongs to a non-fs tree
         Only fs-tree extents can affect qgroup numbers, and this is the only
         case where an extent can be shared between different trees.
      
         Although strictly speaking extent in data-reloc or tree-reloc tree
         can be shared, data/tree-reloc root won't appear in the result of
         btrfs_find_all_roots(), so we can ignore such case.
      
         So we can check the first root in old_roots/new_roots ulist.
         - if we find the 1st root is not a fs/subvol root, then we can skip
           the extent
         - if we find the 1st root is a fs/subvol root, then we must continue
           calculation
      
      OR
      
      2) both 'nr_old_roots' and 'nr_new_roots' are 0
         This means either such extent got allocated then freed in current
         transaction or it's a new reloc tree extent, whose nr_new_roots is 0.
         Either way it won't affect qgroup accounting and can be skipped
         safely.
      
      Such quick exit can make trace output quieter and less confusing:
      (example with fs uuid and time stamp removed)
      
      Before:
      ------
      add_delayed_tree_ref: bytenr=29556736 num_bytes=16384 action=ADD_DELAYED_REF parent=0(-) ref_root=2(EXTENT_TREE) level=0 type=TREE_BLOCK_REF seq=0
      btrfs_qgroup_account_extent: bytenr=29556736 num_bytes=16384 nr_old_roots=0 nr_new_roots=1
      ------
      Extent tree block will trigger btrfs_qgroup_account_extent() trace point
      while no qgroup number is changed, as extent tree won't affect qgroup
      accounting.
      
      After:
      ------
      add_delayed_tree_ref: bytenr=29556736 num_bytes=16384 action=ADD_DELAYED_REF parent=0(-) ref_root=2(EXTENT_TREE) level=0 type=TREE_BLOCK_REF seq=0
      ------
      Now such unrelated extent won't trigger btrfs_qgroup_account_extent()
      trace point, making the trace less noisy.
      Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      [ changelog and comment adjustments ]
      Signed-off-by: David Sterba <dsterba@suse.com>
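The two quick-exit conditions can be written down as a small predicate. This is a userspace sketch with hypothetical names (`is_fs_root`, `can_skip_accounting`); the root-id test is a simplification that treats id 5 and ids from 256 up to the signed 64-bit maximum as fs/subvolume roots, which is an assumption made for the example.

```python
FS_TREE_OBJECTID = 5        # top-level fs tree
FIRST_FREE_OBJECTID = 256   # first subvolume id


def is_fs_root(root_id: int) -> bool:
    # Simplified check; assumed range, see the lead-in.
    return root_id == FS_TREE_OBJECTID or FIRST_FREE_OBJECTID <= root_id < 2 ** 63


def can_skip_accounting(old_roots: list, new_roots: list) -> bool:
    """Return True when an extent cannot change any qgroup number."""
    if not old_roots and not new_roots:
        # Condition 2: no roots before or after, nothing to account.
        return True
    first = (old_roots or new_roots)[0]
    # Condition 1: the first root decides; non-fs trees never affect qgroups.
    return not is_fs_root(first)
```

The extent-tree block from the trace above (nr_old_roots=0, nr_new_roots=1, root 2) is skipped by the first-root check.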
  12. 21 Jun, 2017 1 commit
  13. 20 Jun, 2017 1 commit
    • btrfs: add quota override flag to enable quota override for CAP_SYS_RESOURCE · f29efe29
      Sargun Dhillon committed
      This patch introduces the quota override flag to btrfs_fs_info, and a
      change to quota limit checking code to temporarily allow for quota to be
      overridden for processes with CAP_SYS_RESOURCE.
      
      It's useful for administrative programs, such as log rotation, that may
      need to temporarily use more disk space in order to free up a greater
      amount of overall disk space without yielding more disk space to the
      rest of userland.
      
      Eventually, we may want to add the idea of an operator-specific quota,
      operator reserved space, or something else to allow for administrative
      override, but this is perhaps the simplest solution.
      Signed-off-by: Sargun Dhillon <sargun@sargun.me>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ minor changelog edits ]
      Signed-off-by: David Sterba <dsterba@suse.com>
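How an override flag combined with a capability check changes the limit test can be sketched as below. `would_exceed_limit` and its parameters are hypothetical names for this illustration; the sketch assumes the override applies only when both the filesystem-wide flag is set and the caller holds CAP_SYS_RESOURCE.

```python
def would_exceed_limit(used: int, reserved: int, request: int, limit: int,
                       override_enabled: bool = False,
                       has_cap_sys_resource: bool = False) -> bool:
    """Return True when the request should fail with EDQUOT."""
    if override_enabled and has_cap_sys_resource:
        # Privileged caller with the override flag set: skip the limit check.
        return False
    return used + reserved + request > limit
```

An administrative task such as log rotation could then go over the limit temporarily, while ordinary processes on the same filesystem are still refused.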
  14. 19 Apr, 2017 1 commit
  15. 18 Apr, 2017 4 commits
    • btrfs: qgroup: Re-arrange tracepoint timing to co-operate with reserved space tracepoint · d51ea5dd
      Qu Wenruo committed
      Newly introduced qgroup reserved space trace points are normally nested
      into several common qgroup operations.
      
      While some other trace points are not well placed to co-operate with
      them, causing confusing output.
      
      This patch re-arranges the trace_btrfs_qgroup_release_data() and
      trace_btrfs_qgroup_free_delayed_ref() trace points so they are triggered
      before the reserved space ones.
      Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      Reviewed-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: qgroup: Add trace point for qgroup reserved space · 3159fe7b
      Qu Wenruo committed
      Introduce the following trace points:
      qgroup_update_reserve
      qgroup_meta_reserve
      
      These trace points are handy to trace qgroup reserve space related
      problems.
      
      Also export btrfs_qgroup structure, as now we directly pass btrfs_qgroup
      structure to trace points, so that structure needs to be exported.
      Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: qgroups: Retry after commit on getting EDQUOT · 48a89bc4
      Goldwyn Rodrigues committed
      We are facing the same problem with EDQUOT which was experienced with
      ENOSPC. Not sure if we require a full ticketing system such as ENOSPC, but
      here is a quick fix, which may be too big a hammer.
      
      Quotas are reserved during the start of an operation, incrementing
      qg->reserved. However, it is written to disk in a commit_transaction
      which could take as long as commit_interval. In the meantime there
      could be deletions which are not accounted for because deletions are
      accounted for only while committed (free_refroot). So, when we get
      an EDQUOT, flush the data to disk and try again.
      
      This fixes fstests btrfs/139.
      
      Here is a sample script which shows this issue.
      
      DEVICE=/dev/vdb
      MOUNTPOINT=/mnt
      TESTVOL=$MOUNTPOINT/tmp
      QUOTA=5
      PROG=btrfs
      DD_BS="4k"
      DD_COUNT="256"
      RUN_TIMES=5000
      
      mkfs.btrfs -f $DEVICE
      mount -o commit=240 $DEVICE $MOUNTPOINT
      $PROG subvolume create $TESTVOL
      $PROG quota enable $TESTVOL
      $PROG qgroup limit ${QUOTA}G $TESTVOL
      
      typeset -i DD_RUN_GOOD
      typeset -i QUOTA
      
      function _check_cmd() {
              if [[ ${?} > 0 ]]; then
                      echo -n "$(date) E: Running previous command"
                      echo ${*}
                      echo "Without sync"
                      $PROG qgroup show -pcreFf ${TESTVOL}
                      echo "With sync"
                      $PROG qgroup show -pcreFf --sync ${TESTVOL}
                      exit 1
              fi
      }
      
      while true; do
        DD_RUN_GOOD=$RUN_TIMES
      
        while (( ${DD_RUN_GOOD} != 0 )); do
              dd if=/dev/zero of=${TESTVOL}/quotatest${DD_RUN_GOOD} bs=${DD_BS} count=${DD_COUNT}
              _check_cmd "dd if=/dev/zero of=${TESTVOL}/quotatest${DD_RUN_GOOD} bs=${DD_BS} count=${DD_COUNT}"
              DD_RUN_GOOD=(${DD_RUN_GOOD}-1)
        done
      
        $PROG qgroup show -pcref $TESTVOL
        echo "----------- Cleanup ---------- "
        rm $TESTVOL/quotatest*
      
      done
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: replace hardcoded value with SEQ_LAST macro · de47c9d3
      Edmund Nadolski committed
      Define the SEQ_LAST macro to replace (u64)-1 in places where said
      value triggers a special-case ref search behavior.
      Signed-off-by: Edmund Nadolski <enadolski@suse.com>
      Reviewed-by: Jeff Mahoney <jeffm@suse.com>
      Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: David Sterba <dsterba@suse.com>