1. 02 Nov, 2017 1 commit
    • btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents · c995ab3c
      Zygo Blaxell committed
      The LOGICAL_INO ioctl provides a backward mapping from extent bytenr and
      offset (encoded as a single logical address) to a list of extent refs.
      LOGICAL_INO complements TREE_SEARCH, which provides the forward mapping
      (extent ref -> extent bytenr and offset, or logical address).  These are
      useful capabilities for programs that manipulate extents and extent
      references from userspace (e.g. dedup and defrag utilities).
      
      When the extents are uncompressed (and not encrypted and not other),
      check_extent_in_eb performs filtering of the extent refs to remove any
      extent refs which do not contain the same extent offset as the 'logical'
      parameter's extent offset.  This prevents LOGICAL_INO from returning
      references to more than a single block.
      
      To find the set of extent references to an uncompressed extent from [a, b),
      userspace has to run a loop like this pseudocode:
      
      	for (i = a; i < b; ++i)
      		extent_ref_set += LOGICAL_INO(i);
      
      At each iteration of the loop (up to 32768 iterations for a 128M extent),
      data we are interested in is collected in the kernel, then deleted by
      the filter in check_extent_in_eb.
      
      When the extents are compressed (or encrypted or other), the 'logical'
      parameter must be an extent bytenr (the 'a' parameter in the loop).
      No filtering by extent offset is done (or possible?) so the result is
      the complete set of extent refs for the entire extent.  This removes
      the need for the loop, since we get all the extent refs in one call.
      
      Add an 'ignore_offset' argument to iterate_inodes_from_logical,
      [...several levels of function call graph...], and check_extent_in_eb, so
      that we can disable the extent offset filtering for uncompressed extents.
      This flag can be set by an improved version of the LOGICAL_INO ioctl to
      get either behavior as desired.
      
      There is no functional change in this patch.  The new flag is always
      false.
      Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ minor coding style fixes ]
      Signed-off-by: David Sterba <dsterba@suse.com>
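The offset filtering described in this commit can be sketched in userspace C. This is a minimal model under stated assumptions, not the kernel code: `struct extent_ref`, `ref_matches` and `count_matching_refs` are hypothetical names, and each ref is assumed to cover one byte range within the extent. With `ignore_offset` set, a single lookup returns every ref, which is what would remove the need for the per-block loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one extent ref: the slice of the extent it covers. */
struct extent_ref {
	uint64_t offset; /* start of the referenced slice within the extent */
	uint64_t len;    /* length of the referenced slice in bytes */
};

/* Keep a ref if filtering is disabled or if the ref contains extent_offset. */
bool ref_matches(const struct extent_ref *ref, uint64_t extent_offset,
		 bool ignore_offset)
{
	if (ignore_offset)
		return true;
	return extent_offset >= ref->offset &&
	       extent_offset - ref->offset < ref->len;
}

/* Count the refs that one lookup at extent_offset would report. */
size_t count_matching_refs(const struct extent_ref *refs, size_t nr,
			   uint64_t extent_offset, bool ignore_offset)
{
	size_t found = 0;

	assert(refs != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++)
		if (ref_matches(&refs[i], extent_offset, ignore_offset))
			found++;
	return found;
}
```

With filtering enabled, a ref that covers only part of the extent is reported only for offsets inside that part; with `ignore_offset` it is always reported.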
  2. 26 Sep, 2017 2 commits
    • btrfs: Report error on removing qgroup if del_qgroup_item fails · 36b96fdc
      Sargun Dhillon committed
      Previously, we called del_qgroup_item and ignored its return code,
      potentially leaving divergent in-memory state without reporting an
      error. It may make sense to handle this error code by putting the
      filesystem into a read-only or similar state.
      
      This patch only adds reporting of the error when it is fatal (any
      error other than qgroup not found).
      Signed-off-by: Sargun Dhillon <sargun@sargun.me>
      Reviewed-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: remove BTRFS_FS_QUOTA_DISABLING flag · c2faff79
      Misono, Tomohiro committed
      Currently, the first "btrfs quota enable" after "btrfs quota disable"
      fails with syslog output "qgroup_rescan_init failed with -22", but a
      second attempt succeeds.
      
      When "quota disable" is called, the BTRFS_FS_QUOTA_DISABLING flag bit is
      set in fs_info->flags in btrfs_quota_disable(), but it is not dropped
      in btrfs_run_qgroups() (which is called in btrfs_commit_transaction())
      because quota_root has already been freed. If "quota enable" is called
      after that, both the BTRFS_FS_QUOTA_DISABLING and BTRFS_FS_QUOTA_ENABLED
      flags are dropped in btrfs_run_qgroups() since quota_root is not NULL.
      This makes the first "quota enable" fail.
      
      BTRFS_FS_QUOTA_DISABLING flag is not used outside of "quota disable"
      context and is equivalent to whether quota_root is NULL or not.
      btrfs_run_qgroups() checks whether quota_root is NULL or not in the first
      place.
      
      So, let's remove BTRFS_FS_QUOTA_DISABLING flag.
      Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  3. 21 Aug, 2017 1 commit
  4. 16 Aug, 2017 2 commits
  5. 30 Jun, 2017 6 commits
    • btrfs: fix integer overflow in calc_reclaim_items_nr · 6374e57a
      Chris Mason committed
      Dave Jones hit a WARN_ON(nr < 0) in btrfs_wait_ordered_roots() with
      v4.12-rc6.  This was because commit 70e7af24 made it possible for
      calc_reclaim_items_nr() to return a negative number.  It's not really a
      bug in that commit, it just didn't go far enough down the stack to find
      all the possible 64->32 bit overflows.
      
      This switches calc_reclaim_items_nr() to return a u64 and changes everyone
      that uses the results of that math to u64 as well.
      Reported-by: Dave Jones <davej@codemonkey.org.uk>
      Fixes: 70e7af24 ("Btrfs: fix delalloc accounting leak caused by u32 overflow")
      Signed-off-by: Chris Mason <clm@fb.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
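The 64->32 bit overflow class fixed here can be illustrated with a small userspace sketch. The names `reclaim_items_nr` and `overflows_int` are hypothetical stand-ins, and the division is only an assumed shape for the calculation, not the body of calc_reclaim_items_nr(). The point is that a count computed in 64 bits can exceed what a signed int holds, so every consumer of the result has to stay 64-bit.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in: number of items needed to reclaim to_reclaim bytes. */
uint64_t reclaim_items_nr(uint64_t to_reclaim, uint64_t bytes_per_item)
{
	assert(bytes_per_item > 0);
	return to_reclaim / bytes_per_item;
}

/* True when a 64-bit count cannot be stored in a signed int without loss. */
bool overflows_int(uint64_t nr)
{
	return nr > (uint64_t)INT_MAX;
}
```

Keeping the return type and all intermediate variables as 64-bit unsigned values avoids the lossy conversion entirely, rather than checking for it after the fact.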
    • btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges · bc42bda2
      Qu Wenruo committed
      [BUG]
      For the following case, btrfs can underflow qgroup reserved space
      at an error path:
      (Page size 4K, function name without "btrfs_" prefix)
      
               Task A                  |             Task B
      ----------------------------------------------------------------------
      Buffered_write [0, 2K)           |
      |- check_data_free_space()       |
      |  |- qgroup_reserve_data()      |
      |     Range aligned to page      |
      |     range [0, 4K)          <<< |
      |     4K bytes reserved      <<< |
      |- copy pages to page cache      |
                                       | Buffered_write [2K, 4K)
                                       | |- check_data_free_space()
                                       | |  |- qgroup_reserve_data()
                                       | |     Range aligned to page
                                       | |     range [0, 4K)
                                       | |     Already reserved by A <<<
                                       | |     0 bytes reserved      <<<
                                       | |- delalloc_reserve_metadata()
                                       | |  And it *FAILED* (Maybe EQUOTA)
                                       | |- free_reserved_data_space()
                                            |- qgroup_free_data()
                                               Range aligned to page range
                                               [0, 4K)
                                               Freeing 4K
      (Special thanks to Chandan for the detailed report and analysis)
      
      [CAUSE]
      Task B above frees the reserved data range [0, 4K), which was actually
      reserved by Task A.
      
      And at writeback time, the page dirtied by Task A will go through the
      writeback routine, which will free the 4K reserved data space at file
      extent insert time, causing the qgroup underflow.
      
      [FIX]
      For btrfs_qgroup_free_data(), add a @reserved parameter so that it only
      frees data ranges reserved by the previous btrfs_qgroup_reserve_data().
      So in the above case, Task B will try to free 0 bytes, so no underflow.
      Reported-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
      Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
      Tested-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
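The Task A / Task B scenario can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not the btrfs implementation: `struct rsv_state`, `struct changeset`, `reserve_data` and `free_reserved` are hypothetical names, the page size is fixed at 4K, and the file is limited to 64 pages. Each reserve call records only the pages it newly reserved, and the error path frees only those pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SZ  4096u
#define NR_PAGES 64u

/* Hypothetical per-file state: which pages currently hold a reservation. */
struct rsv_state {
	bool reserved[NR_PAGES];
	uint64_t qgroup_reserved; /* bytes charged to the qgroup */
};

/* Hypothetical changeset: pages newly reserved by one caller. */
struct changeset {
	bool pages[NR_PAGES];
	uint64_t bytes;
};

/* Reserve [start, start + len), rounded out to page boundaries. */
void reserve_data(struct rsv_state *st, struct changeset *cs,
		  uint64_t start, uint64_t len)
{
	assert(len > 0);
	uint64_t first = start / PAGE_SZ;
	uint64_t last = (start + len - 1) / PAGE_SZ;

	assert(last < NR_PAGES);
	for (uint64_t p = first; p <= last; p++) {
		if (st->reserved[p])
			continue; /* already reserved by someone else */
		st->reserved[p] = true;
		st->qgroup_reserved += PAGE_SZ;
		cs->pages[p] = true;
		cs->bytes += PAGE_SZ;
	}
}

/* Error path: free only the pages in the caller's changeset. */
uint64_t free_reserved(struct rsv_state *st, const struct changeset *cs)
{
	uint64_t freed = 0;

	for (uint64_t p = 0; p < NR_PAGES; p++) {
		if (cs->pages[p] && st->reserved[p]) {
			st->reserved[p] = false;
			st->qgroup_reserved -= PAGE_SZ;
			freed += PAGE_SZ;
		}
	}
	return freed;
}
```

In this model, a second writer to the same page gets an empty changeset, so its error path frees 0 bytes and the first writer's 4K reservation stays charged until writeback.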
    • btrfs: qgroup: Introduce extent changeset for qgroup reserve functions · 364ecf36
      Qu Wenruo committed
      Introduce a new parameter, struct extent_changeset, for
      btrfs_qgroup_reserve_data() and its callers.
      
      The extent_changeset is used in btrfs_qgroup_reserve_data() to record
      which ranges were reserved by the current call, so that they can be
      freed in error paths.
      
      The reason we need to export it to callers is that, in the buffered
      write error path, without knowing exactly which ranges we reserved in
      the current allocation, we can free space which was not reserved by us.
      
      This will lead to qgroup reserved space underflow.
      Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
      Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: qgroup: Return actually freed bytes for qgroup release or free data · 7bc329c1
      Qu Wenruo committed
      btrfs_qgroup_release/free_data() only returns 0 or a negative error
      number (ENOMEM is the only possible error).
      
      This is normally good enough, but sometimes we need the exact byte
      count it freed/released.
      
      Change it to return the number of bytes actually released/freed instead
      of 0 for success.
      Also slightly modify the related extent_changeset structure: since in
      btrfs one no-hole data extent won't be larger than 128M, "unsigned int"
      is large enough for the use case.
      Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: qgroup: Cleanup btrfs_qgroup_prepare_account_extents function · d1b8b94a
      Qu Wenruo committed
      Quite a lot of qgroup corruption happens because
      btrfs_qgroup_prepare_account_extents() is called at the wrong time.
      
      Since the safest time to call it is just before
      btrfs_qgroup_account_extents(), there is no need to keep these 2
      functions separate.
      
      Merging them makes the code cleaner and less bug prone.
      Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      [ changelog and comment adjustments ]
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: qgroup: Add quick exit for non-fs extents · 5edfd9fd
      Qu Wenruo committed
      Modify btrfs_qgroup_account_extent() to exit quicker for non-fs extents.
      
      The quick exit condition is:
      1) The extent belongs to a non-fs tree
         Only fs-tree extents can affect qgroup numbers, and this is the only
         case where an extent can be shared between different trees.
      
         Although strictly speaking an extent in the data-reloc or tree-reloc
         tree can be shared, the data/tree-reloc root won't appear in the
         result of btrfs_find_all_roots(), so we can ignore such a case.
      
         So we can check the first root in the old_roots/new_roots ulist.
         - if we find the 1st root is not a fs/subvol root, then we can skip
           the extent
         - if we find the 1st root is a fs/subvol root, then we must continue
           calculation
      
      OR
      
      2) both 'nr_old_roots' and 'nr_new_roots' are 0
         This means either such extent got allocated then freed in current
         transaction or it's a new reloc tree extent, whose nr_new_roots is 0.
         Either way it won't affect qgroup accounting and can be skipped
         safely.
      
      Such a quick exit can make trace output quieter and less confusing:
      (example with fs uuid and time stamp removed)
      
      Before:
      ------
      add_delayed_tree_ref: bytenr=29556736 num_bytes=16384 action=ADD_DELAYED_REF parent=0(-) ref_root=2(EXTENT_TREE) level=0 type=TREE_BLOCK_REF seq=0
      btrfs_qgroup_account_extent: bytenr=29556736 num_bytes=16384 nr_old_roots=0 nr_new_roots=1
      ------
      Extent tree block will trigger btrfs_qgroup_account_extent() trace point
      while no qgroup number is changed, as extent tree won't affect qgroup
      accounting.
      
      After:
      ------
      add_delayed_tree_ref: bytenr=29556736 num_bytes=16384 action=ADD_DELAYED_REF parent=0(-) ref_root=2(EXTENT_TREE) level=0 type=TREE_BLOCK_REF seq=0
      ------
      Now such unrelated extent won't trigger btrfs_qgroup_account_extent()
      trace point, making the trace less noisy.
      Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      [ changelog and comment adjustments ]
      Signed-off-by: David Sterba <dsterba@suse.com>
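The two quick exit conditions listed in this commit can be written as a single predicate. The sketch below is an illustration, not the kernel function: `is_fs_root` and `can_skip_accounting` are hypothetical names, the root lists are plain arrays rather than ulists, and the objectid values (5 for the top-level fs tree, 256 as the first subvolume id) are assumed from the btrfs on-disk format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed objectids: top-level fs tree and first subvolume id. */
#define FS_TREE_OBJECTID    5ULL
#define FIRST_FREE_OBJECTID 256ULL

/* A root counts as fs/subvol if it is the fs tree or a subvolume id. */
bool is_fs_root(uint64_t rootid)
{
	return rootid == FS_TREE_OBJECTID ||
	       (rootid >= FIRST_FREE_OBJECTID && rootid <= (uint64_t)INT64_MAX);
}

/* Return true when accounting for this extent can be skipped. */
bool can_skip_accounting(const uint64_t *old_roots, size_t nr_old,
			 const uint64_t *new_roots, size_t nr_new)
{
	assert(old_roots != NULL || nr_old == 0);
	assert(new_roots != NULL || nr_new == 0);

	/* Condition 2: no roots before or after, nothing to account. */
	if (nr_old == 0 && nr_new == 0)
		return true;

	/* Condition 1: only the first root of a non-empty list is checked. */
	uint64_t first = nr_old ? old_roots[0] : new_roots[0];
	return !is_fs_root(first);
}
```

For the extent tree block in the trace above (ref_root=2, nr_old_roots=0, nr_new_roots=1), this predicate returns true, which corresponds to the trace point no longer firing.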
  6. 21 Jun, 2017 1 commit
  7. 20 Jun, 2017 1 commit
    • btrfs: add quota override flag to enable quota override for CAP_SYS_RESOURCE · f29efe29
      Sargun Dhillon committed
      This patch introduces the quota override flag to btrfs_fs_info, and a
      change to quota limit checking code to temporarily allow for quota to be
      overridden for processes with CAP_SYS_RESOURCE.
      
      It's useful for administrative programs, such as log rotation, that may
      need to temporarily use more disk space in order to free up a greater
      amount of overall disk space without yielding more disk space to the
      rest of userland.
      
      Eventually, we may want to add the idea of an operator-specific quota,
      operator reserved space, or something else to allow for administrative
      override, but this is perhaps the simplest solution.
      Signed-off-by: Sargun Dhillon <sargun@sargun.me>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ minor changelog edits ]
      Signed-off-by: David Sterba <dsterba@suse.com>
  8. 19 Apr, 2017 1 commit
  9. 18 Apr, 2017 5 commits
  10. 29 Mar, 2017 1 commit
    • btrfs: Change qgroup_meta_rsv to 64bit · ce0dcee6
      Goldwyn Rodrigues committed
      Using an int value is causing qg->reserved to become negative and
      exclusive -EDQUOT to be reached prematurely.
      
      This affects exclusive qgroups only.
      
      TEST CASE:
      
      DEVICE=/dev/vdb
      MOUNTPOINT=/mnt
      SUBVOL=$MOUNTPOINT/tmp
      
      umount $SUBVOL
      umount $MOUNTPOINT
      
      mkfs.btrfs -f $DEVICE
      mount $DEVICE $MOUNTPOINT
      btrfs quota enable $MOUNTPOINT
      btrfs subvol create $SUBVOL
      umount $MOUNTPOINT
      mount $DEVICE $MOUNTPOINT
      mount -o subvol=tmp $DEVICE $SUBVOL
      btrfs qgroup limit -e 3G $SUBVOL
      
      btrfs quota rescan -w $MOUNTPOINT
      
      # Write 10K files until dd fails, then dump the qgroup state.
      for i in $(seq 1 44000); do
        if ! dd if=/dev/zero of=$SUBVOL/test_$i bs=10k count=1; then
           btrfs qgroup show -pcref $SUBVOL
           exit 1
        fi
      done
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      [ add reproducer to changelog ]
      Signed-off-by: David Sterba <dsterba@suse.com>
  11. 17 Feb, 2017 12 commits
  12. 14 Feb, 2017 2 commits
    • btrfs: allow unlink to exceed subvolume quota · 003d7c59
      Jeff Mahoney committed
      Once a qgroup limit is exceeded, it's impossible to restore normal
      operation to the subvolume without modifying the limit or removing
      the subvolume.  This is a surprising situation for many users used
      to the typical workflow with quotas on other file systems where it's
      possible to remove files until the used space is back under the limit.
      
      When we go to unlink a file and start the transaction, we'll hit
      the qgroup limit while trying to reserve space for the items we'll
      modify while removing the file.  We discussed last month how best
      to handle this situation and agreed that there is no perfect solution.
      The best principle-of-least-surprise solution is to handle it similarly
      to how we already handle ENOSPC when unlinking, which is to allow
      the operation to succeed with the expectation that it will ultimately
      release space under most circumstances.
      
      This patch modifies the transaction start path to select whether to
      honor the qgroup limits.  btrfs_start_transaction_fallback_global_rsv
      is the only caller that skips enforcement.  The reservation and tracking
      still happen normally -- only the enforcement step is skipped.
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
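The split between tracking and enforcement described in this commit can be sketched as a reservation helper with an `enforce` switch. This is a simplified model, not the kernel's reservation code: `struct qgroup_model` and `qgroup_reserve_model` are hypothetical names, and the limit check is reduced to a single referenced-bytes limit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified qgroup: one limit and two counters. */
struct qgroup_model {
	uint64_t max_rfer; /* limit on referenced bytes */
	uint64_t rfer;     /* bytes currently referenced */
	uint64_t reserved; /* bytes reserved but not yet written */
};

/*
 * Reserve num_bytes. The reservation is always tracked; the limit is
 * checked only when enforce is true. Returns 0 or -EDQUOT.
 */
int qgroup_reserve_model(struct qgroup_model *qg, uint64_t num_bytes,
			 bool enforce)
{
	assert(qg != NULL);
	if (enforce && qg->rfer + qg->reserved + num_bytes > qg->max_rfer)
		return -EDQUOT;
	qg->reserved += num_bytes;
	return 0;
}
```

A caller on the unlink path would pass `enforce = false`, so the reservation is still counted but a subvolume that is already at its limit can delete files to get back under it.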
    • btrfs: Add WARN_ON for qgroup reserved underflow · 18dc22c1
      Qu Wenruo committed
      Goldwyn Rodrigues has exposed and fixed a bug which underflows btrfs
      qgroup reserved space and leads to a non-writable fs.
      
      This reminds us that we don't have enough underflow checks for qgroup
      reserved space.
      
      In the underflow case, we should not really underflow the numbers but
      warn and keep qgroup working.
      
      So add more checks on qgroup reserved space and add WARN_ON() and
      btrfs_warn() for any underflow case.
      Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
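The "warn instead of underflowing" behaviour can be sketched as a guarded subtraction. This is a userspace illustration with a hypothetical helper name, `rsv_release_checked`; a message on stderr stands in for WARN_ON() and btrfs_warn(), and clamping to zero is one assumed way of keeping the counter usable.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Subtract num_bytes from *reserved. If that would underflow, warn and
 * clamp the counter to 0 instead. Returns true when an underflow was
 * detected.
 */
bool rsv_release_checked(uint64_t *reserved, uint64_t num_bytes)
{
	assert(reserved != NULL);
	if (*reserved < num_bytes) {
		fprintf(stderr,
			"warning: reserved space underflow, have %" PRIu64
			" to free %" PRIu64 "\n", *reserved, num_bytes);
		*reserved = 0;
		return true;
	}
	*reserved -= num_bytes;
	return false;
}
```

Without the guard, an unsigned counter would wrap to a huge value, which a limit check would then treat as the quota being exhausted.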
  13. 06 Dec, 2016 4 commits
  14. 30 Nov, 2016 1 commit