1. 18 Feb, 2016 1 commit
  2. 16 Jan, 2016 1 commit
  3. 07 Jan, 2016 1 commit
  4. 25 Nov, 2015 1 commit
    • Btrfs: use btrfs_get_fs_root in resolve_indirect_ref · 2d9e9776
      Josef Bacik committed
      The backref code looks up the fs_root for which we are trying to resolve
      indirect refs.  Unfortunately it uses btrfs_read_fs_root_no_name, which
      returns -ENOENT if the root's ref count is 0.  This isn't helpful for the
      qgroup code during snapshot delete, as it won't be able to search down the
      snapshot we are deleting, which will cause us to miss roots.  So use
      btrfs_get_fs_root and pass false for check_ref so we can always get the root
      we're looking for.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.de>
      Signed-off-by: Chris Mason <clm@fb.com>
  5. 27 Oct, 2015 1 commit
  6. 22 Oct, 2015 1 commit
    • Btrfs: fix qgroup sanity tests · d9ee522b
      Josef Bacik committed
      With my changes to allow us to find old roots when resolving indirect refs I
      introduced a regression to the sanity tests.  Since we don't really care to go
      down into the fs roots we just need to have the old behavior of returning ENOENT
      for dummy roots for the sanity tests.  In the future if we want to get fancy we
      can populate the test fs trees with the references as well.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
  7. 14 Oct, 2015 1 commit
    • btrfs: fix use after free iterating extrefs · dc6c5fb3
      Chris Mason committed
      The code for btrfs inode-resolve has never worked properly for
      files with enough hard links to trigger extrefs.  It was trying to
      get the leaf out of a path after freeing the path:
      
      	btrfs_release_path(path);
      	leaf = path->nodes[0];
      	item_size = btrfs_item_size_nr(leaf, slot);
      
      The fix here is to use the extent buffer we cloned just a little higher
      up to avoid deadlocks caused by using the leaf in the path.
      Signed-off-by: Chris Mason <clm@fb.com>
      cc: stable@vger.kernel.org # v3.7+
      cc: Mark Fasheh <mfasheh@suse.de>
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
  8. 09 Aug, 2015 2 commits
    • Btrfs: fix warning in backref walking · acdf898d
      Liu Bo committed
      When we do backref walking, we first search the queued delayed refs and
      then the on-disk backrefs, but we parse shared references differently in
      the two cases: for delayed refs we also add 'ref->root', while for
      on-disk backrefs we don't.  This can prevent us from merging refs indexed
      by the same bytenr and cause find_parent_nodes() to throw a warning at
      'WARN_ON(ref->count < 0)'.  This happens, for example, when we have a shared
      data extent with 'ref_cnt=1' and a delayed shared data ref with a
      BTRFS_DROP_DELAYED_REF action.

      For shared references, whether delayed or on-disk, ref->root is not used
      at all; it is ref->parent that really matters.  So handle delayed refs the
      same way as on-disk refs.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: teach backref walking about backrefs with underflowed offset values · d6589101
      Filipe Manana committed
      When cloning/deduplicating file extents (through the clone and extent_same
      ioctls) we can get data back references with offset values that are a
      result of an unsigned integer arithmetic underflow, that is, values that
      are much larger than they could be otherwise.
      
      This is not a problem when decrementing or dropping the back references
      (happens when we overwrite the extents or punch a hole for example, through
      __btrfs_drop_extents()), since we compute the same too large offset value,
      but it is a problem for the backref walking code, used by an incremental
      send and the ioctls that are used by the btrfs tool "inspect-internal"
      commands, as it makes it miss the corresponding file extent items because
      the search key is set for an extent item that starts at an offset matching
      the exceptionally large offset value of the data back reference. For an
      incremental send this causes the send ioctl to fail with -EIO.
      
      So teach the backref walking code to deal with these cases by setting the
      search key's offset to 0 if the backref's offset value is larger than
      LLONG_MAX (the largest possible file offset). This makes sure the backref
      walking code finds the corresponding file extent items at the expense of
      scanning more items and leafs in the btree.
      
      Fixing the clone/dedup ioctls to not produce such underflowed results would
      require major changes breaking backward compatibility, updating user space
      tools, etc.
      
      Simple reproducer case for fstests:
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
      
        tmp=/tmp/$$
        status=1	# failure is the default!
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        _cleanup()
        {
            rm -fr $send_files_dir
            rm -f $tmp.*
        }
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
      
        # real QA test starts here
        _supported_fs btrfs
        _supported_os Linux
        _require_scratch
        _require_cloner
        _need_to_be_root
      
        send_files_dir=$TEST_DIR/btrfs-test-$seq
      
        rm -f $seqres.full
        rm -fr $send_files_dir
        mkdir $send_files_dir
      
        _scratch_mkfs >>$seqres.full 2>&1
        _scratch_mount
      
        # Create our test file with a single extent of 64K starting at file
        # offset 128K.
        $XFS_IO_PROG -f -c "pwrite -S 0xaa 128K 64K" $SCRATCH_MNT/foo \
            | _filter_xfs_io
      
        _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT \
            $SCRATCH_MNT/mysnap1
      
        # Now clone parts of the original extent into lower offsets of the file.
        #
        # The first clone operation adds a file extent item to file offset 0
        # that points to our initial extent with a data offset of 16K. The
        # corresponding data back reference in the extent tree has an offset of
        # 18446744073709535232, which is the result of file_offset - data_offset
        # = 0 - 16K.
        #
        # The second clone operation adds a file extent item to file offset 16K
        # that points to our initial extent with a data offset of 48K. The
        # corresponding data back reference in the extent tree has an offset of
        # 18446744073709518848, which is the result of file_offset - data_offset
        # = 16K - 48K.
        #
        # Those large back reference offsets (result of unsigned arithmetic
        # underflow) confused the back reference walking code (used by an
        # incremental send and the multiple inspect-internal ioctls) and made it
        # miss the back references, which for the case of an incremental send it
        # made it fail with -EIO and print a message like the following to
        # dmesg:
        #
        # "BTRFS error (device sdc): did not find backref in send_root. \
        #  inode=257, offset=0, disk_byte=12845056 found extent=12845056"
        #
        $CLONER_PROG -s $(((128 + 16) * 1024)) -d 0 -l $((16 * 1024)) \
            $SCRATCH_MNT/foo $SCRATCH_MNT/foo
        $CLONER_PROG -s $(((128 + 48) * 1024)) -d $((16 * 1024)) \
            -l $((16 * 1024)) $SCRATCH_MNT/foo $SCRATCH_MNT/foo
      
        _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT \
            $SCRATCH_MNT/mysnap2
      
        _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $send_files_dir/1.snap
        _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \
            -f $send_files_dir/2.snap
      
        echo "File digest in the original filesystem:"
        md5sum $SCRATCH_MNT/mysnap2/foo | _filter_scratch
      
        # Now recreate the filesystem by receiving both send streams and verify
        # we get the same file contents that the original filesystem had.
        _scratch_unmount
        _scratch_mkfs >>$seqres.full 2>&1
        _scratch_mount
      
        _run_btrfs_util_prog receive $SCRATCH_MNT -f $send_files_dir/1.snap
        _run_btrfs_util_prog receive $SCRATCH_MNT -f $send_files_dir/2.snap
      
        echo "File digest in the new filesystem:"
        md5sum $SCRATCH_MNT/mysnap2/foo | _filter_scratch
      
        status=0
        exit
      
      The test's expected golden output is:
      
        wrote 65536/65536 bytes at offset 131072
        XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
        File digest in the original filesystem:
        6c6079335cff141b8a31233ead04cbff  SCRATCH_MNT/mysnap2/foo
        File digest in the new filesystem:
        6c6079335cff141b8a31233ead04cbff  SCRATCH_MNT/mysnap2/foo
      
      But it failed with:
      
          (...)
          @@ -1,7 +1,5 @@
           QA output created by 097
           wrote 65536/65536 bytes at offset 131072
           XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
          -File digest in the original filesystem:
          -6c6079335cff141b8a31233ead04cbff  SCRATCH_MNT/mysnap2/foo
          -File digest in the new filesystem:
          -6c6079335cff141b8a31233ead04cbff  SCRATCH_MNT/mysnap2/foo
          ...
      
        $ cat /home/fdmanana/git/hub/xfstests/results//btrfs/097.full
        (...)
        ERROR: send ioctl failed with -5: Input/output error
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
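The offsets quoted in the reproducer comments, and the clamping rule the commit message describes, can be checked with a small standalone C sketch. This is not the kernel code: `backref_offset` and `search_key_offset` are hypothetical helper names introduced only for illustration, and the sketch assumes nothing beyond the arithmetic stated in the commit message.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: the data backref offset as described in the commit
 * message, i.e. file_offset - data_offset in unsigned 64-bit arithmetic.
 * The subtraction wraps around when data_offset > file_offset. */
static uint64_t backref_offset(uint64_t file_offset, uint64_t data_offset)
{
    return file_offset - data_offset;
}

/* Hypothetical helper mirroring the described fix: an offset larger than
 * LLONG_MAX cannot be a real file offset, so the search starts at 0. */
static uint64_t search_key_offset(uint64_t ref_offset)
{
    return ref_offset > (uint64_t)LLONG_MAX ? 0 : ref_offset;
}
```

Calling `backref_offset(0, 16 * 1024)` reproduces the 18446744073709535232 value from the first clone operation. Starting the search at offset 0 trades extra scanning for correctness, which matches the cost the commit message acknowledges.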
  9. 11 Jun, 2015 3 commits
  10. 03 Jun, 2015 1 commit
  11. 20 May, 2015 1 commit
    • btrfs: clear 'ret' in btrfs_check_shared() loop · 2c2ed5aa
      Mark Fasheh committed
      btrfs_check_shared() is leaking a return value of '1' from
      find_parent_nodes(). As a result, callers (in this case, extent_fiemap())
      are told extents are shared when they are not. This in turn broke fiemap on
      btrfs for kernels v3.18 and up.
      
      The fix is simple - we just have to clear 'ret' after we are done processing
      the results of find_parent_nodes().
      
      It wasn't clear to me at first what was happening with return values in
      btrfs_check_shared() and find_parent_nodes() - thanks to Josef for the help
      on irc. I added documentation to both functions to make things more clear
      for the next hacker who might come across them.
      
      If we could queue this up for -stable too that would be great.
      Signed-off-by: Mark Fasheh <mfasheh@suse.de>
      Reviewed-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
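The kind of bug described in this commit, a stale positive status leaking out of a loop, can be illustrated with a simplified model. Everything here is an assumption made for illustration: `walk_step` replays canned statuses in place of find_parent_nodes(), `FOUND_SHARED` is a hypothetical status constant, and the error handling is reduced to "stop on any negative value".

```c
#define FOUND_SHARED 6   /* hypothetical "extent is shared" status */

/* Hypothetical stand-in for find_parent_nodes(): replays canned statuses.
 * A positive value other than FOUND_SHARED is a benign status. */
static int walk_step(const int *statuses, int i)
{
    return statuses[i];
}

/* Returns 1 if shared, 0 if not shared, <0 on error.
 * clear_ret selects between the leaky loop (0) and the fixed loop (1). */
static int check_shared(const int *statuses, int n, int clear_ret)
{
    int ret = 0;

    for (int i = 0; i < n; i++) {
        ret = walk_step(statuses, i);
        if (ret == FOUND_SHARED) {
            ret = 1;
            break;
        }
        if (ret < 0)
            break;
        if (clear_ret)
            ret = 0;   /* the fix: drop the benign status before looping */
    }
    return ret;
}
```

With the statuses `{0, 1}` the leaky variant reports the extent as shared even though `FOUND_SHARED` was never seen, while the variant that clears `ret` returns 0.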
  12. 04 Mar, 2015 1 commit
  13. 15 Jan, 2015 2 commits
  14. 03 Jan, 2015 1 commit
    • Btrfs: correctly get tree level in tree_backref_for_extent · a1317f45
      Filipe Manana committed
      If we are using skinny metadata, the block's tree level is in the offset
      of the key and not in a btrfs_tree_block_info structure following the
      extent item (it doesn't exist). Therefore fix it.
      
      Besides returning the correct level in the tree, this also prevents reading
      past the leaf's end in the case where the extent item is the last item in
      the leaf (eb) and it has only 1 inline reference - this is because
      sizeof(struct btrfs_tree_block_info) is greater than
      sizeof(struct btrfs_extent_inline_ref).
      
      Got it while running a scrub which produced the following warning:
      
          BTRFS: checksum error at logical 42123264 on dev /dev/sde, sector 15840: metadata node (level 24) in tree 5
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
      Signed-off-by: Chris Mason <clm@fb.com>
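The distinction this commit relies on can be sketched as follows. The structs are simplified stand-ins rather than the on-disk btrfs definitions, and `tree_block_level` is a hypothetical helper; the two key type values are assumed to match BTRFS_EXTENT_ITEM_KEY and BTRFS_METADATA_ITEM_KEY.

```c
#include <stddef.h>
#include <stdint.h>

#define EXTENT_ITEM_KEY   168  /* assumed to match BTRFS_EXTENT_ITEM_KEY */
#define METADATA_ITEM_KEY 169  /* assumed to match BTRFS_METADATA_ITEM_KEY */

/* Simplified stand-in for a btrfs key. */
struct key {
    uint64_t objectid;
    uint8_t  type;
    uint64_t offset;
};

/* Simplified stand-in for struct btrfs_tree_block_info (level only). */
struct tree_block_info {
    uint8_t level;
};

/* Hypothetical helper: with skinny metadata the level lives in the key's
 * offset and no tree_block_info follows the extent item, so info must not
 * be read in that case. Returns -1 if a regular item has no info. */
static int tree_block_level(const struct key *key,
                            const struct tree_block_info *info)
{
    if (key->type == METADATA_ITEM_KEY)
        return (int)key->offset;
    return info ? info->level : -1;
}
```

Reading a `tree_block_info` that does not exist is what yields nonsense such as the "level 24" in the scrub warning quoted above, and it can also read past the end of the leaf.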
  15. 02 Oct, 2014 1 commit
  16. 18 Sep, 2014 3 commits
  17. 15 Aug, 2014 2 commits
    • Btrfs: Fix memory corruption by ulist_add_merge() on 32bit arch · 4eb1f66d
      Takashi Iwai committed
      We've got bug reports that btrfs crashes when quota is enabled on
      32bit kernel, typically with the Oops like below:
       BUG: unable to handle kernel NULL pointer dereference at 00000004
       IP: [<f9234590>] find_parent_nodes+0x360/0x1380 [btrfs]
       *pde = 00000000
       Oops: 0000 [#1] SMP
       CPU: 0 PID: 151 Comm: kworker/u8:2 Tainted: G S      W 3.15.2-1.gd43d97e-default #1
       Workqueue: btrfs-qgroup-rescan normal_work_helper [btrfs]
       task: f1478130 ti: f147c000 task.ti: f147c000
       EIP: 0060:[<f9234590>] EFLAGS: 00010213 CPU: 0
       EIP is at find_parent_nodes+0x360/0x1380 [btrfs]
       EAX: f147dda8 EBX: f147ddb0 ECX: 00000011 EDX: 00000000
       ESI: 00000000 EDI: f147dda4 EBP: f147ddf8 ESP: f147dd38
        DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
       CR0: 8005003b CR2: 00000004 CR3: 00bf3000 CR4: 00000690
       Stack:
        00000000 00000000 f147dda4 00000050 00000001 00000000 00000001 00000050
        00000001 00000000 d3059000 00000001 00000022 000000a8 00000000 00000000
        00000000 000000a1 00000000 00000000 00000001 00000000 00000000 11800000
       Call Trace:
        [<f923564d>] __btrfs_find_all_roots+0x9d/0xf0 [btrfs]
        [<f9237bb1>] btrfs_qgroup_rescan_worker+0x401/0x760 [btrfs]
        [<f9206148>] normal_work_helper+0xc8/0x270 [btrfs]
        [<c025e38b>] process_one_work+0x11b/0x390
        [<c025eea1>] worker_thread+0x101/0x340
        [<c026432b>] kthread+0x9b/0xb0
        [<c0712a71>] ret_from_kernel_thread+0x21/0x30
        [<c0264290>] kthread_create_on_node+0x110/0x110
      
      This indicates a NULL corruption in the prefs_delayed list.  Further
      investigation and bisection showed that the call to ulist_add_merge()
      causes the corruption.

      ulist_add_merge() takes a u64 as aux and writes a 64bit value into
      old_aux.  The callers of this function in backref.c, however, pass a
      pointer to a pointer as old_aux.  That is, the function writes a 64bit
      value over a 32bit pointer.  This caused a NULL in the adjacent variable,
      in this case prefs_delayed.

      Here is a quick attempt to band-aid over this: a new function,
      ulist_add_merge_ptr(), is introduced to properly pass and store a pointer
      value instead of a u64.  There are still ugly void ** casts remaining
      in the callers because void ** cannot be taken implicitly.  But it's
      safer than an explicit cast to u64, anyway.
      
      Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=887046
      Cc: <stable@vger.kernel.org> [v3.11+]
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Chris Mason <clm@fb.com>
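The idea behind a pointer-flavoured wrapper like the one this commit describes can be sketched in portable C. This is only an illustration under stated assumptions: `add_merge` and `add_merge_ptr` are hypothetical names, and a single static slot stands in for the ulist. The point is that the 8-byte write goes into a genuine 64-bit temporary rather than through a cast of the caller's pointer variable, which may be only 4 bytes wide.

```c
#include <stdint.h>

/* Single slot standing in for the list's stored aux value. */
static uint64_t stored_aux;

/* Hypothetical stand-in for a u64-based merge: reports the previous aux
 * through a 64-bit out-parameter and installs the new one. */
static void add_merge(uint64_t aux, uint64_t *old_aux)
{
    *old_aux = stored_aux;   /* always an 8-byte write */
    stored_aux = aux;
}

/* Hypothetical pointer wrapper: route the 64-bit result through a real
 * uint64_t, then narrow it back to a pointer of the native width. */
static void add_merge_ptr(void *aux, void **old_aux)
{
    uint64_t old64 = 0;

    add_merge((uint64_t)(uintptr_t)aux, &old64);
    *old_aux = (void *)(uintptr_t)old64;
}
```

Casting the caller's `void **` to a 64-bit pointer and passing it straight to `add_merge` would compile, but on a 32-bit target the 8-byte store would also overwrite whatever sits next to the pointer, which is the corruption described above.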
    • Btrfs: read lock extent buffer while walking backrefs · 6f7ff6d7
      Filipe Manana committed
      Before processing the extent buffer, acquire a read lock on it, so
      that we're safe against concurrent updates on the extent buffer.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
  18. 10 Jun, 2014 4 commits
  19. 07 Apr, 2014 1 commit
    • Btrfs: remove transaction from send · 9e351cc8
      Josef Bacik committed
      Let's try this again.  We can deadlock the box if we send on a box and try to
      write onto the same fs with the app that is trying to listen to the send pipe.
      This is because the writer could get stuck waiting for a transaction commit
      which is being blocked by the send.  So fix this by making sure looking at the
      commit roots is always going to be consistent.  We do this by keeping track of
      which roots need to have their commit roots swapped during commit, and then
      taking the commit_root_sem and swapping them all at once.  Then make sure we
      take a read lock on the commit_root_sem in cases where we search the commit root
      to make sure we're always looking at a consistent view of the commit roots.
      Previously we had problems with this because we would swap a fs tree commit root
      and then swap the extent tree commit root independently which would cause the
      backref walking code to screw up sometimes.  With this patch we no longer
      deadlock and pass all the weird send/receive corner cases.  Thanks,
      Reported-by: Hugo Mills <hugo@carfax.org.uk>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
  20. 22 Mar, 2014 1 commit
  21. 11 Mar, 2014 3 commits
  22. 29 Jan, 2014 7 commits
    • Btrfs: fix memory leaks on walking backrefs failure · f05c4746
      Wang Shilong committed
      When walking backrefs, we may iterate over every inode's extents
      and add/merge them into a ulist, and the caller will free the memory
      from the ulist.

      However, if we fail to allocate memory for an inode's extent element,
      or ulist_add() fails to allocate memory, the already allocated memory
      is not added to the ulist, so the caller never frees it and memory
      leaks.
      Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: add a reschedule point in btrfs_find_all_roots() · bca1a290
      Wang Shilong committed
      I can easily trigger the following warnings when enabling quota
      in my virtual machine (running openSUSE).  The steps are to first create
      a subvolume full of fragmented extents, and then create many snapshots
      (500 in my test case).
      
      [ 2362.808459] BUG: soft lockup - CPU#0 stuck for 22s! [btrfs-qgroup-re:1970]
      
      [ 2362.809023] task: e4af8450 ti: e371c000 task.ti: e371c000
      [ 2362.809026] EIP: 0060:[<fa38f4ae>] EFLAGS: 00000246 CPU: 0
      [ 2362.809049] EIP is at __merge_refs+0x5e/0x100 [btrfs]
      [ 2362.809051] EAX: 00000000 EBX: cfadbcf0 ECX: 00000000 EDX: cfadbcb0
      [ 2362.809052] ESI: dd8d3370 EDI: e371dde0 EBP: e371dd6c ESP: e371dd5c
      [ 2362.809054]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      [ 2362.809055] CR0: 80050033 CR2: ac454d50 CR3: 009a9000 CR4: 001407d0
      [ 2362.809099] Stack:
      [ 2362.809100]  00000001 e371dde0 dfcc6890 f29f8000 e371de28 fa39016d 00000011 00000001
      [ 2362.809105]  99bfc000 00000000 93928000 00000000 00000001 00000050 e371dda8 00000001
      [ 2362.809109]  f3a31000 f3413000 00000001 e371ddb8 000040a8 00000202 00000000 00000023
      [ 2362.809113] Call Trace:
      [ 2362.809136]  [<fa39016d>] find_parent_nodes+0x34d/0x1280 [btrfs]
      [ 2362.809156]  [<fa391172>] btrfs_find_all_roots+0xb2/0x110 [btrfs]
      [ 2362.809174]  [<fa3934a8>] btrfs_qgroup_rescan_worker+0x358/0x7a0 [btrfs]
      [ 2362.809180]  [<c024d0ce>] ? lock_timer_base.isra.39+0x1e/0x40
      [ 2362.809199]  [<fa3648df>] worker_loop+0xff/0x470 [btrfs]
      [ 2362.809204]  [<c027a88a>] ? __wake_up_locked+0x1a/0x20
      [ 2362.809221]  [<fa3647e0>] ? btrfs_queue_worker+0x2b0/0x2b0 [btrfs]
      [ 2362.809225]  [<c025ebbc>] kthread+0x9c/0xb0
      [ 2362.809229]  [<c06b487b>] ret_from_kernel_thread+0x1b/0x30
      [ 2362.809233]  [<c025eb20>] ? kthread_create_on_node+0x110/0x110
      
      By adding a reschedule point at the end of btrfs_find_all_roots(), I no longer
      hit these warnings.
      
      Cc: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
      Reviewed-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: fix to catch all errors when resolving indirect ref · 95def2ed
      Wang Shilong committed
      We can only tolerate ENOENT here; for any other error, we should
      return directly.
      Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
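The rule stated in this commit message (skip ENOENT, propagate everything else) can be sketched as a small loop. `resolve_all` is a hypothetical function, and the array of canned results stands in for the per-ref resolution calls; neither is taken from the kernel source.

```c
#include <errno.h>

/* Hypothetical loop over per-ref resolution results. A ref whose root is
 * gone (-ENOENT) is skipped; any other error stops the walk and is
 * returned. *resolved counts the refs resolved before returning. */
static int resolve_all(const int *results, int n, int *resolved)
{
    *resolved = 0;
    for (int i = 0; i < n; i++) {
        int err = results[i];

        if (err == -ENOENT)
            continue;        /* tolerated: nothing to resolve here */
        if (err)
            return err;      /* e.g. -ENOMEM must not be swallowed */
        (*resolved)++;
    }
    return 0;
}
```

Treating every failure like ENOENT would silently drop refs after, say, an allocation failure, so the caller would act on an incomplete result without knowing it.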
    • Btrfs: fix protection between walking backrefs and root deletion · 538f72cd
      Wang Shilong committed
      There is a race condition between resolving an indirect ref and root deletion,
      and we should guarantee that the root cannot be destroyed, to avoid accessing
      a broken tree here.
      
      Here we fix it by holding @subvol_srcu, and we will release it as soon
      as we have held root node lock.
      Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: only process as many file extents as there are refs · 7ef81ac8
      Josef Bacik committed
      The backref walking code will search down to the key it is looking for and then
      proceed to walk _all_ of the extents on the file until it hits the end.  This is
      suboptimal with large files, we only need to look for as many extents as we have
      references for that inode.  I have a testcase that creates a randomly written 4
      gig file and before this patch it took 6min 30sec to do the initial send, with
      this patch it takes 2min 30sec to do the initial send.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
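The early-exit idea in this commit can be sketched with a flat array standing in for the file extent items of one inode. `collect_refs` and `struct file_extent` are hypothetical and far simpler than the real btree walk; the sketch only shows why knowing the reference count allows the scan to stop early.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a file extent item. */
struct file_extent {
    uint64_t disk_bytenr;
};

/* Hypothetical scan: count items pointing at wanted_bytenr, stopping as
 * soon as total_refs matches have been seen. *visited reports how many
 * items were actually examined. */
static size_t collect_refs(const struct file_extent *items, size_t n,
                           uint64_t wanted_bytenr, size_t total_refs,
                           size_t *visited)
{
    size_t found = 0;
    size_t i = 0;

    while (i < n && found < total_refs) {
        if (items[i].disk_bytenr == wanted_bytenr)
            found++;
        i++;
    }
    *visited = i;
    return found;
}
```

When the expected number of references is reached early, the remaining items are never read, which is where the time saving reported above for a large, randomly written file would come from.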
    • Btrfs: fix extent_from_logical to deal with skinny metadata · 580f0a67
      Josef Bacik committed
      I don't think this is an issue and I've not seen it in practice but
      extent_from_logical will fail to find a skinny extent because it uses
      btrfs_previous_item and gives it the normal extent item type.  This is just not
      a place to use btrfs_previous_item since we care about either normal extents or
      skinny extents, so open code btrfs_previous_item to properly check.  This would
      only affect metadata and the only place this is used for metadata is scrub and
      I'm pretty sure it's just for printing stuff out, not actually doing any work so
      hopefully it was never a problem other than a cosmetic one.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: attach delayed ref updates to delayed ref heads · d7df2c79
      Josef Bacik committed
      Currently we have two rb-trees, one for delayed ref heads and one for all of the
      delayed refs, including the delayed ref heads.  When we process the delayed refs
      we have to hold onto the delayed ref lock for all of the selecting and merging
      and such, which results in quite a bit of lock contention.  This was solved by
      having a waitqueue and only one flusher at a time, however this hurts if we get
      a lot of delayed refs queued up.
      
      So instead just have an rb tree for the delayed ref heads, and then attach the
      delayed ref updates to an rb tree that is per delayed ref head.  Then we only
      need to take the delayed ref lock when adding new delayed refs and when
      selecting a delayed ref head to process, all the rest of the time we deal with a
      per delayed ref head lock which will be much less contentious.
      
      The locking rules for this get a little more complicated since we have to lock
      up to 3 things to properly process delayed refs, but I will address that problem
      later.  For now this passes all of xfstests and my overnight stress tests.
      Thanks,
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>