1. 06 8月, 2018 2 次提交
  2. 12 4月, 2018 1 次提交
  3. 31 3月, 2018 2 次提交
    • Q
      btrfs: Validate child tree block's level and first key · 581c1760
      Qu Wenruo 提交于
      We have several reports about node pointer points to incorrect child
      tree blocks, which could have even wrong owner and level but still with
      valid generation and checksum.
      
      Although btrfs check could handle it and print error message like:
      leaf parent key incorrect 60670574592
      
      Kernel doesn't have enough check on this type of corruption correctly.
      At least add such check to read_tree_block() and btrfs_read_buffer(),
      where we need two new parameters @level and @first_key to verify the
      child tree block.
      
      The new @level check is mandatory and all call sites are already
      modified to extract expected level from its call chain.
      
      While @first_key is optional, the following call sites are skipping such
      check:
      1) Root node/leaf
         As ROOT_ITEM doesn't contain the first key, skip @first_key check.
      2) Direct backref
         Only parent bytenr and level is known and we need to resolve the key
         all by ourselves, skip @first_key check.
      
      Another note of this verification is, it needs extra info from nodeptr
      or ROOT_ITEM, so it can't fit into current tree-checker framework, which
      is limited to node/leaf boundary.
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      581c1760
    • N
      btrfs: Remove unused op_key var from add_delayed_refs · a6dbceaf
      Nikolay Borisov 提交于
      Added as part of 86d5f994 ("btrfs: convert prelimary reference
      tracking to use rbtrees") but never used. tmp_op_key essentially
      subsumed that variable.
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      a6dbceaf
  4. 26 3月, 2018 1 次提交
    • D
      btrfs: add more __cold annotations · e67c718b
      David Sterba 提交于
      The __cold functions are placed to a special section, as they're
      expected to be called rarely. This could help i-cache prefetches or help
      compiler to decide which branches are more/less likely to be taken
      without any other annotations needed.
      
      Though we can't add more __exit annotations, it's still possible to add
      __cold (that's also added with __exit). That way the following function
      categories are tagged:
      
      - printf wrappers, error messages
      - exit helpers
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      e67c718b
  5. 15 3月, 2018 1 次提交
    • E
      btrfs: add missing initialization in btrfs_check_shared · 18bf591b
      Edmund Nadolski 提交于
      This patch addresses an issue that causes fiemap to falsely
      report a shared extent.  The test case is as follows:
      
      xfs_io -f -d -c "pwrite -b 16k 0 64k" -c "fiemap -v" /media/scratch/file5
      sync
      xfs_io  -c "fiemap -v" /media/scratch/file5
      
      which gives the resulting output:
      
      wrote 65536/65536 bytes at offset 0
      64 KiB, 4 ops; 0.0000 sec (121.359 MiB/sec and 7766.9903 ops/sec)
      /media/scratch/file5:
       EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
         0: [0..127]:        24576..24703       128 0x2001
      /media/scratch/file5:
       EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
         0: [0..127]:        24576..24703       128   0x1
      
      This is because btrfs_check_shared calls find_parent_nodes
      repeatedly in a loop, passing a share_check struct to report
      the count of shared extent. But btrfs_check_shared does not
      re-initialize the count value to zero for subsequent calls
      from the loop, resulting in a false share count value. This
      is a regressive behavior from 4.13.
      
      With proper re-initialization the test result is as follows:
      
      wrote 65536/65536 bytes at offset 0
      64 KiB, 4 ops; 0.0000 sec (110.035 MiB/sec and 7042.2535 ops/sec)
      /media/scratch/file5:
       EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
         0: [0..127]:        24576..24703       128   0x1
      /media/scratch/file5:
       EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
         0: [0..127]:        24576..24703       128   0x1
      
      which corrects the regression.
      
      Fixes: 3ec4d323 ("btrfs: allow backref search checks for shared extents")
      Signed-off-by: NEdmund Nadolski <enadolski@suse.com>
      [ add text from cover letter to changelog ]
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      18bf591b
  6. 02 2月, 2018 1 次提交
    • Z
      btrfs: remove spurious WARN_ON(ref->count < 0) in find_parent_nodes · c8195a7b
      Zygo Blaxell 提交于
      Until v4.14, this warning was very infrequent:
      
      	WARNING: CPU: 3 PID: 18172 at fs/btrfs/backref.c:1391 find_parent_nodes+0xc41/0x14e0
      	Modules linked in: [...]
      	CPU: 3 PID: 18172 Comm: bees Tainted: G      D W    L  4.11.9-zb64+ #1
      	Hardware name: System manufacturer System Product Name/M5A78L-M/USB3, BIOS 2101    12/02/2014
      	Call Trace:
      	 dump_stack+0x85/0xc2
      	 __warn+0xd1/0xf0
      	 warn_slowpath_null+0x1d/0x20
      	 find_parent_nodes+0xc41/0x14e0
      	 __btrfs_find_all_roots+0xad/0x120
      	 ? extent_same_check_offsets+0x70/0x70
      	 iterate_extent_inodes+0x168/0x300
      	 iterate_inodes_from_logical+0x87/0xb0
      	 ? iterate_inodes_from_logical+0x87/0xb0
      	 ? extent_same_check_offsets+0x70/0x70
      	 btrfs_ioctl+0x8ac/0x2820
      	 ? lock_acquire+0xc2/0x200
      	 do_vfs_ioctl+0x91/0x700
      	 ? __fget+0x112/0x200
      	 SyS_ioctl+0x79/0x90
      	 entry_SYSCALL_64_fastpath+0x23/0xc6
      	 ? trace_hardirqs_off_caller+0x1f/0x140
      
      Starting with v4.14 (specifically 86d5f994 ("btrfs: convert prelimary
      reference tracking to use rbtrees")) the WARN_ON occurs three orders of
      magnitude more frequently--almost once per second while running workloads
      like bees.
      
      Replace the WARN_ON() with a comment rationale for its removal.
      The rationale is paraphrased from an explanation by Edmund Nadolski
      <enadolski@suse.de> on the linux-btrfs mailing list.
      
      Fixes: 8da6d581 ("Btrfs: added btrfs_find_all_roots()")
      Signed-off-by: NZygo Blaxell <ce3g8jdj@umail.furryterror.org>
      Reviewed-by: NLu Fengqi <lufq.fnst@cn.fujitsu.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      c8195a7b
  7. 22 1月, 2018 1 次提交
  8. 02 11月, 2017 2 次提交
    • J
      btrfs: track refs in a rb_tree instead of a list · 0e0adbcf
      Josef Bacik 提交于
      If we get a significant amount of delayed refs for a single block (think
      modifying multiple snapshots) we can end up spending an ungodly amount
      of time looping through all of the entries trying to see if they can be
      merged.  This is because we only add them to a list, so we have O(2n)
      for every ref head.  This doesn't make any sense as we likely have refs
      for different roots, and so they cannot be merged.  Tracking in a tree
      will allow us to break as soon as we hit an entry that doesn't match,
      making our worst case O(n).
      
      With this we can also merge entries more easily.  Before we had to hope
      that matching refs were on the ends of our list, but with the tree we
      can search down to exact matches and merge them at insert time.
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      0e0adbcf
    • Z
      btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents · c995ab3c
      Zygo Blaxell 提交于
      The LOGICAL_INO ioctl provides a backward mapping from extent bytenr and
      offset (encoded as a single logical address) to a list of extent refs.
      LOGICAL_INO complements TREE_SEARCH, which provides the forward mapping
      (extent ref -> extent bytenr and offset, or logical address).  These are
      useful capabilities for programs that manipulate extents and extent
      references from userspace (e.g. dedup and defrag utilities).
      
      When the extents are uncompressed (and not encrypted and not other),
      check_extent_in_eb performs filtering of the extent refs to remove any
      extent refs which do not contain the same extent offset as the 'logical'
      parameter's extent offset.  This prevents LOGICAL_INO from returning
      references to more than a single block.
      
      To find the set of extent references to an uncompressed extent from [a, b),
      userspace has to run a loop like this pseudocode:
      
      	for (i = a; i < b; ++i)
      		extent_ref_set += LOGICAL_INO(i);
      
      At each iteration of the loop (up to 32768 iterations for a 128M extent),
      data we are interested in is collected in the kernel, then deleted by
      the filter in check_extent_in_eb.
      
      When the extents are compressed (or encrypted or other), the 'logical'
      parameter must be an extent bytenr (the 'a' parameter in the loop).
      No filtering by extent offset is done (or possible?) so the result is
      the complete set of extent refs for the entire extent.  This removes
      the need for the loop, since we get all the extent refs in one call.
      
      Add an 'ignore_offset' argument to iterate_inodes_from_logical,
      [...several levels of function call graph...], and check_extent_in_eb, so
      that we can disable the extent offset filtering for uncompressed extents.
      This flag can be set by an improved version of the LOGICAL_INO ioctl to
      get either behavior as desired.
      
      There is no functional change in this patch.  The new flag is always
      false.
      Signed-off-by: NZygo Blaxell <ce3g8jdj@umail.furryterror.org>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      [ minor coding style fixes ]
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      c995ab3c
  9. 30 10月, 2017 1 次提交
  10. 21 8月, 2017 1 次提交
  11. 16 8月, 2017 11 次提交
  12. 20 6月, 2017 1 次提交
  13. 18 4月, 2017 3 次提交
  14. 17 2月, 2017 1 次提交
  15. 14 2月, 2017 1 次提交
  16. 06 12月, 2016 3 次提交
  17. 27 9月, 2016 2 次提交
  18. 26 9月, 2016 1 次提交
    • L
      btrfs: fix check_shared for fiemap ioctl · afce772e
      Lu Fengqi 提交于
      Only in the case of different root_id or different object_id, check_shared
      identified extent as the shared. However, If a extent was referred by
      different offset of same file, it should also be identified as shared.
      In addition, check_shared's loop scale is at least n^3, so if a extent
      has too many references, even causes soft hang up.
      
      First, add all delayed_ref to the ref_tree and calculate the unqiue_refs,
      if the unique_refs is greater than one, return BACKREF_FOUND_SHARED.
      Then individually add the on-disk reference(inline/keyed) to the ref_tree
      and calculate the unique_refs of the ref_tree to check if the unique_refs
      is greater than one.Because once there are two references to return
      SHARED, so the time complexity is close to the constant.
      Reported-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: NLu Fengqi <lufq.fnst@cn.fujitsu.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      afce772e
  19. 25 8月, 2016 1 次提交
    • Q
      btrfs: backref: Fix soft lockup in __merge_refs function · d8422ba3
      Qu Wenruo 提交于
      When over 1000 file extents refers to one extent, find_parent_nodes()
      will be obviously slow, due to the O(n^2)~O(n^3) loops inside
      __merge_refs().
      
      The following ftrace shows the cubic growth of execution time:
      
      256 refs
       5) + 91.768 us   |  __add_keyed_refs.isra.12 [btrfs]();
       5)   1.447 us    |  __add_missing_keys.isra.13 [btrfs]();
       5) ! 114.544 us  |  __merge_refs [btrfs]();
       5) ! 136.399 us  |  __merge_refs [btrfs]();
      
      512 refs
       6) ! 279.859 us  |  __add_keyed_refs.isra.12 [btrfs]();
       6)   3.164 us    |  __add_missing_keys.isra.13 [btrfs]();
       6) ! 442.498 us  |  __merge_refs [btrfs]();
       6) # 2091.073 us |  __merge_refs [btrfs]();
      
      and 1024 refs
       7) ! 368.683 us  |  __add_keyed_refs.isra.12 [btrfs]();
       7)   4.810 us    |  __add_missing_keys.isra.13 [btrfs]();
       7) # 2043.428 us |  __merge_refs [btrfs]();
       7) * 18964.23 us |  __merge_refs [btrfs]();
      
      And sort them into the following char:
      (Unit: us)
      ------------------------------------------------------------------------
       Trace function        | 256 ref        | 512 refs      | 1024 refs    |
      ------------------------------------------------------------------------
       __add_keyed_refs      | 91             | 249           | 368          |
       __add_missing_keys    | 1              | 3             | 4            |
       __merge_refs 1st call | 114            | 442           | 2043         |
       __merge_refs 2nd call | 136            | 2091          | 18964        |
      ------------------------------------------------------------------------
      
      We can see the that __add_keyed_refs() grows almost in linear behavior.
      And __add_missing_keys() in this case doesn't change much or takes much
      time.
      
      While for the 1st __merge_refs() it's square growth
      for the 2nd __merge_refs() call it's cubic growth.
      
      It's no doubt that merge_refs() will take a long long time to execute if
      the number of refs continues its grows.
      
      So add a cond_resced() into the loop of __merge_refs().
      
      Although this will solve the problem of soft lockup, we need to use the
      new rb_tree based structure introduced by Lu Fengqi to really solve the
      long execution time.
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      d8422ba3
  20. 26 7月, 2016 2 次提交
  21. 26 5月, 2016 1 次提交