1. 29 Jan, 2014 10 commits
    • Btrfs: reduce btree node locking duration on item update · eb653de1
      Filipe David Borba Manana committed
      If we do a btree search with the goal of updating an existing item
      without changing its size (ins_len == 0 and cow == 1), then we never
      need to hold locks on upper level nodes (even when slot == 0) after we
      COW their child nodes/leaves, as we won't have node splits or merges
      in this scenario (that is, no key additions, removals or shifts on any
      nodes or leaves).
      
      Therefore release the locks immediately after COWing the child nodes/leaves
      while navigating the btree, even if their parent slot is 0, instead of
      returning a path to the caller with those nodes locked, which would get
      released only when the caller releases or frees the path (or if it calls
      btrfs_unlock_up_safe).
      
      This is a common scenario, for example when updating inode items in fs
      trees and block group items in the extent tree.
      
      The following benchmarks were performed on a quad core machine with 32GB
      of RAM, using a leaf/node size of 4KB (to generate deeper fs trees more
      quickly).
      
        sysbench --test=fileio --file-num=131072 --file-total-size=8G \
          --file-test-mode=seqwr --num-threads=512 --file-block-size=8192 \
          --max-requests=100000 --file-io-mode=sync [prepare|run]
      
      Before this change:  49.85Mb/s (average of 5 runs)
      After this change:   50.38Mb/s (average of 5 runs)
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: convert printk to btrfs_ and fix BTRFS prefix · efe120a0
      Frank Holton committed
      Convert all applicable cases of printk and pr_* to the btrfs_* macros.
      
      Fix all uses of the BTRFS prefix.
      Signed-off-by: Frank Holton <fholton@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: fix tree mod logging · 5de865ee
      Filipe David Borba Manana committed
      While running the test btrfs/004 from xfstests in a loop, it failed
      about 1 time out of 20 runs in my desktop. The failure happened in
      the backref walking part of the test, and the test's error message was
      like this:
      
        btrfs/004 93s ... [failed, exit status 1] - output mismatch (see /home/fdmanana/git/hub/xfstests_2/results//btrfs/004.out.bad)
            --- tests/btrfs/004.out	2013-11-26 18:25:29.263333714 +0000
            +++ /home/fdmanana/git/hub/xfstests_2/results//btrfs/004.out.bad	2013-12-10 15:25:10.327518516 +0000
            @@ -1,3 +1,8 @@
             QA output created by 004
             *** test backref walking
            -*** done
            +unexpected output from
            +	/home/fdmanana/git/hub/btrfs-progs/btrfs inspect-internal logical-resolve -P 141512704 /home/fdmanana/btrfs-tests/scratch_1
            +expected inum: 405, expected address: 454656, file: /home/fdmanana/btrfs-tests/scratch_1/snap1/p0/d6/d3d/d156/fce, got:
            +
             ...
             (Run 'diff -u tests/btrfs/004.out /home/fdmanana/git/hub/xfstests_2/results//btrfs/004.out.bad' to see the entire diff)
        Ran: btrfs/004
        Failures: btrfs/004
        Failed 1 of 1 tests
      
      But immediately after the test finished, the btrfs inspect-internal command
      returned the expected output:
      
        $ btrfs inspect-internal logical-resolve -P 141512704 /home/fdmanana/btrfs-tests/scratch_1
        inode 405 offset 454656 root 258
        inode 405 offset 454656 root 5
      
      It turned out this was because the btrfs_search_old_slot() calls performed
      during backref walking (backref.c:__resolve_indirect_ref) were not finding
      anything. The reason for this turned out to be that the tree mod logging
      code was not logging some node multi-step operations atomically, therefore
      btrfs_search_old_slot() callers iterated often over an incomplete tree that
      wasn't fully consistent with any tree state from the past. Besides missing
      items, this often (but not always) resulted in -EIO errors during old slot
      searches, reported in dmesg like this:
      
      [ 4299.933936] ------------[ cut here ]------------
      [ 4299.933949] WARNING: CPU: 0 PID: 23190 at fs/btrfs/ctree.c:1343 btrfs_search_old_slot+0x57b/0xab0 [btrfs]()
      [ 4299.933950] Modules linked in: btrfs raid6_pq xor pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) bnep rfcomm bluetooth parport_pc ppdev binfmt_misc joydev snd_hda_codec_h
      [ 4299.933977] CPU: 0 PID: 23190 Comm: btrfs Tainted: G        W  O 3.12.0-fdm-btrfs-next-16+ #70
      [ 4299.933978] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Pro4, BIOS P1.50 09/04/2012
      [ 4299.933979]  000000000000053f ffff8806f3fd98f8 ffffffff8176d284 0000000000000007
      [ 4299.933982]  0000000000000000 ffff8806f3fd9938 ffffffff8104a81c ffff880659c64b70
      [ 4299.933984]  ffff880659c643d0 ffff8806599233d8 ffff880701e2e938 0000160000000000
      [ 4299.933987] Call Trace:
      [ 4299.933991]  [<ffffffff8176d284>] dump_stack+0x55/0x76
      [ 4299.933994]  [<ffffffff8104a81c>] warn_slowpath_common+0x8c/0xc0
      [ 4299.933997]  [<ffffffff8104a86a>] warn_slowpath_null+0x1a/0x20
      [ 4299.934003]  [<ffffffffa065d3bb>] btrfs_search_old_slot+0x57b/0xab0 [btrfs]
      [ 4299.934005]  [<ffffffff81775f3b>] ? _raw_read_unlock+0x2b/0x50
      [ 4299.934010]  [<ffffffffa0655001>] ? __tree_mod_log_search+0x81/0xc0 [btrfs]
      [ 4299.934019]  [<ffffffffa06dd9b0>] __resolve_indirect_refs+0x130/0x5f0 [btrfs]
      [ 4299.934027]  [<ffffffffa06a21f1>] ? free_extent_buffer+0x61/0xc0 [btrfs]
      [ 4299.934034]  [<ffffffffa06de39c>] find_parent_nodes+0x1fc/0xe40 [btrfs]
      [ 4299.934042]  [<ffffffffa06b13e0>] ? defrag_lookup_extent+0xe0/0xe0 [btrfs]
      [ 4299.934048]  [<ffffffffa06b13e0>] ? defrag_lookup_extent+0xe0/0xe0 [btrfs]
      [ 4299.934056]  [<ffffffffa06df980>] iterate_extent_inodes+0xe0/0x250 [btrfs]
      [ 4299.934058]  [<ffffffff817762db>] ? _raw_spin_unlock+0x2b/0x50
      [ 4299.934065]  [<ffffffffa06dfb82>] iterate_inodes_from_logical+0x92/0xb0 [btrfs]
      [ 4299.934071]  [<ffffffffa06b13e0>] ? defrag_lookup_extent+0xe0/0xe0 [btrfs]
      [ 4299.934078]  [<ffffffffa06b7015>] btrfs_ioctl+0xf65/0x1f60 [btrfs]
      [ 4299.934080]  [<ffffffff811658b8>] ? handle_mm_fault+0x278/0xb00
      [ 4299.934083]  [<ffffffff81075563>] ? up_read+0x23/0x40
      [ 4299.934085]  [<ffffffff8177a41c>] ? __do_page_fault+0x20c/0x5a0
      [ 4299.934088]  [<ffffffff811b2946>] do_vfs_ioctl+0x96/0x570
      [ 4299.934090]  [<ffffffff81776e23>] ? error_sti+0x5/0x6
      [ 4299.934093]  [<ffffffff810b71e8>] ? trace_hardirqs_off_caller+0x28/0xd0
      [ 4299.934096]  [<ffffffff81776a09>] ? retint_swapgs+0xe/0x13
      [ 4299.934098]  [<ffffffff811b2eb1>] SyS_ioctl+0x91/0xb0
      [ 4299.934100]  [<ffffffff813eecde>] ? trace_hardirqs_on_thunk+0x3a/0x3f
      [ 4299.934102]  [<ffffffff8177ef12>] system_call_fastpath+0x16/0x1b
      [ 4299.934104] ---[ end trace 48f0cfc902491414 ]---
      [ 4299.934378] btrfs bad fsid on block 0
      
      These tree mod log operations that must be performed atomically, tree_mod_log_free_eb,
      tree_mod_log_eb_copy, tree_mod_log_insert_root and tree_mod_log_insert_move, used to
      be performed atomically before the following commit:
      
        c8cc6341
        (Btrfs: stop using GFP_ATOMIC for the tree mod log allocations)
      
      That change removed the atomicity of such operations. This patch restores the
      atomicity while still not doing the GFP_ATOMIC allocations of tree_mod_elem
      structures, so it has to do the allocations using GFP_NOFS before acquiring
      the mod log lock.
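The fix described above follows a general pattern: build every log element before taking the lock, then publish the whole group under a single lock acquisition, so that a reader never observes a half-logged multi-step operation. The following is a minimal Python sketch of that pattern only. The `ModLog` class and its method names are hypothetical stand-ins and do not come from the patch or from the kernel.

```python
import threading


class ModLog:
    """Toy stand-in for a modification log shared by writers and readers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = []
        self._seq = 0

    def insert_group(self, ops):
        # Step 1: build ("allocate") every element before taking the lock.
        # This mirrors doing the allocations outside the critical section.
        elems = [{"op": op, "seq": None} for op in ops]

        # Step 2: publish all elements under one lock acquisition, so a
        # reader sees either none of them or all of them.
        with self._lock:
            for elem in elems:
                self._seq += 1
                elem["seq"] = self._seq
                self._entries.append(elem)
        return [elem["seq"] for elem in elems]

    def snapshot(self):
        # Readers take the same lock, so they never see a partial group.
        with self._lock:
            return [(e["seq"], e["op"]) for e in self._entries]


log = ModLog()
seqs = log.insert_group(["remove slot 0", "remove slot 1", "free node"])
```

The design choice is that allocation, which may block, happens while no lock is held, and only the cheap list appends happen inside the critical section.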
      
      This issue has been experienced by several users recently, for example:
      
        http://www.spinics.net/lists/linux-btrfs/msg28574.html
      
      After running the btrfs/004 test for 679 consecutive iterations with this
      patch applied, I didn't run into the issue anymore.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: return immediately if tree log mod is not necessary · 78357766
      Filipe David Borba Manana committed
      In ctree.c:tree_mod_log_set_node_key() we were calling
      __tree_mod_log_insert_key() even when the modification doesn't need
      to be logged. This would allocate a tree_mod_elem structure, fill it
      and pass it to  __tree_mod_log_insert(), which would just acquire
      the tree mod log write lock and then free the tree_mod_elem structure
      and return (that is, a no-op).
      
      Therefore call tree_mod_log_insert_key() instead of
      __tree_mod_log_insert_key(), as it returns immediately if the modification
      doesn't need to be logged (without allocating the structure, filling it,
      acquiring the write lock and freeing the structure).
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: more efficient push_leaf_right · 2ef1fed2
      Filipe David Borba Manana committed
      Currently when finding the leaf to insert a key into a btree, if the
      leaf doesn't have enough space to store the item we attempt to move
      off some items from our leaf to its right neighbor leaf, and if this
      fails to create enough free space in our leaf, we try to move off more
      items to the left neighbor leaf as well.
      
      When trying to move off items to the right neighbor leaf, if it has
      enough room to store the new key but not enough room to move off
      at least one item from our target leaf, __push_leaf_right returns 1 and
      we have to attempt to move items to the left neighbor (push_leaf_left
      function) without touching the right neighbor leaf.
      For the case where the right leaf has enough room to store at least 1
      item from our leaf, we end up modifying (and dirtying) both our leaf
      and the right leaf. This is non-optimal for the case where the new key
      is greater than any key in our target leaf because it can be inserted at
      slot 0 of the right neighbor leaf and we don't need to touch our leaf
      at all nor to attempt to move off items to the left neighbor leaf.
      
      Therefore this change just selects the right neighbor leaf as our new
      target leaf if it has enough room for the new key without modifying our
      initial target leaf - we do this only if the new key is higher than any
      key in the initial target leaf.
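The selection rule described above can be sketched as a small decision function. This is a toy model under stated assumptions: leaves are represented only by their sorted key list and a free-space count, and the function name `pick_target_leaf` and its return convention are hypothetical, not taken from the kernel code.

```python
def pick_target_leaf(leaf_keys, right_free, new_key, item_size):
    """Decide where a new item should go.

    leaf_keys:  sorted keys currently in the initial target leaf
    right_free: free bytes in the right neighbor leaf
    new_key:    key of the item being inserted
    item_size:  bytes needed to store the new item

    Returns ("right", 0) when the new key can be placed at slot 0 of the
    right neighbor without touching the initial leaf, otherwise
    ("current", None), meaning the usual item-pushing path is needed.
    """
    key_is_highest = bool(leaf_keys) and new_key > leaf_keys[-1]
    if key_is_highest and right_free >= item_size:
        return "right", 0
    return "current", None


# New key 100 is higher than every key in the leaf and the right leaf has
# room for a 50 byte item, so the right leaf becomes the target.
choice = pick_target_leaf([1, 5, 9], right_free=100, new_key=100, item_size=50)
```

When the special case applies, neither the initial leaf nor the left neighbor is modified, which is where the saving comes from.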
      
      While running the following test, push_leaf_right was called by split_leaf
      4802 times. Out of those 4802 calls, for 2571 calls (53.5%) we hit this
      special case (right leaf has enough room and new key is higher than any key
      in the initial target leaf).
      
      Test:
      
        sysbench --test=fileio --file-num=512 --file-total-size=5G \
          --file-test-mode=[seqwr|rndwr] --num-threads=512 --file-block-size=8192 \
          --max-requests=100000 --file-io-mode=sync [prepare|run]
      
      Results:
      
      sequential writes
      
      Throughput before this change: 65.71Mb/sec (average of 10 runs)
      Throughput after this change:  66.58Mb/sec (average of 10 runs)
      
      random writes
      
      Throughput before this change: 10.75Mb/sec (average of 10 runs)
      Throughput after this change:  11.56Mb/sec (average of 10 runs)
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: try harder to avoid btree node splits · 5a4267ca
      Filipe David Borba Manana committed
      When attempting to move items from our target leaf to its neighbor
      leaves (right and left), we only need to free data_size - free_space
      bytes from our leaf in order to add the new item (which has size of
      data_size bytes). Therefore attempt to move items to the right and
      left leaves if they have at least data_size - free_space bytes free,
      instead of data_size bytes free.
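A short worked example of the arithmetic above, as a Python sketch. The function names are hypothetical and the byte counts are made-up illustration values, not measurements from the patch.

```python
def bytes_to_free(data_size, free_space):
    """Bytes that must leave our leaf so that a new item of data_size fits."""
    return max(0, data_size - free_space)


def neighbor_worth_trying_old(neighbor_free, data_size):
    # Old rule: the neighbor needed room for the whole new item.
    return neighbor_free >= data_size


def neighbor_worth_trying_new(neighbor_free, data_size, free_space):
    # New rule: the neighbor only needs room for the shortfall.
    return neighbor_free >= bytes_to_free(data_size, free_space)


# Our leaf has 150 bytes free and the new item needs 200 bytes, so only
# 50 bytes have to be moved out. A neighbor with 80 free bytes is skipped
# by the old rule but tried by the new one.
shortfall = bytes_to_free(data_size=200, free_space=150)
old_rule = neighbor_worth_trying_old(neighbor_free=80, data_size=200)
new_rule = neighbor_worth_trying_new(neighbor_free=80, data_size=200, free_space=150)
```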
      
      After 5 runs of the following test, I got a smaller number of btree
      node splits overall:
      
        sysbench --test=fileio --file-num=512 --file-total-size=5G \
          --file-test-mode=seqwr --num-threads=512 \
          --file-block-size=8192 --max-requests=100000 --file-io-mode=sync
      
      Before this change:
      * 6171 splits (average of 5 test runs)
      * 61.508Mb/sec of throughput (average of 5 test runs)
      
      After this change:
      * 6036 splits (average of 5 test runs)
      * 63.533Mb/sec of throughput (average of 5 test runs)
      
      An ideal test would not just have multiple threads/processes writing
      to a file (insertion of file extent items) but also do other operations
      that result in insertion of items with varied sizes, like file/directory
      creations, creation of links, symlinks, xattrs, etc.
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • btrfs: expand btrfs_find_item() to include find_orphan_item functionality · 3f870c28
      Kelley Nielsen committed
      This is the third step in bootstrapping the btrfs_find_item interface.
      The function find_orphan_item(), in orphan.c, is similar to the two
      functions already replaced by the new interface. It uses two parameters,
      which are already present in the interface, and is nearly identical to
      the function brought in in the previous patch.
      
      Replace the two calls to find_orphan_item() with calls to
      btrfs_find_item(), with the defined objectid and type that was used
      internally by find_orphan_item(), a null path, and a null key. Add a
      test for a null path to btrfs_find_item, and if it passes, allocate and
      free the path. Finally, remove find_orphan_item().
      Signed-off-by: Kelley Nielsen <kelleynnn@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • btrfs: expand btrfs_find_item() to include find_root_ref functionality · 75ac2dd9
      Kelley Nielsen committed
      This patch is the second step in bootstrapping the btrfs_find_item
      interface. The btrfs_find_root_ref() is similar to the former
      __inode_info(); it accepts four of its parameters, and duplicates the
      first half of its functionality.
      
      Replace the one former call to btrfs_find_root_ref() with a call to
      btrfs_find_item(), along with the defined key type that was used
      internally by btrfs_find_root_ref(), and a null found key. In
      btrfs_find_item(), add a test for the null key at the place where
      the functionality of btrfs_find_root_ref() ends; btrfs_find_item()
      then returns if the test passes. Finally, remove btrfs_find_root_ref().
      Signed-off-by: Kelley Nielsen <kelleynnn@gmail.com>
      Suggested-by: Zach Brown <zab@redhat.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • btrfs: bootstrap generic btrfs_find_item interface · e33d5c3d
      Kelley Nielsen committed
      There are many btrfs functions that manually search the tree for an
      item. They all reimplement the same mechanism and differ in the
      conditions that they use to find the item. __inode_info() is one such
      example. Zach Brown proposed creating a new interface to take the place
      of these functions.
      
      This patch is the first step to creating the interface. A new function,
      btrfs_find_item, has been added to ctree.c and prototyped in ctree.h.
      It is identical to __inode_info, except that the order of the parameters
      has been rearranged to more closely match those of similar functions elsewhere
      in the code (now, root and path come first, then the objectid, offset
      and type, and the key to be filled in last). __inode_info's callers have
      been set to call this new function instead, and __inode_info itself has
      been removed.
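To illustrate the kind of lookup such a generic interface wraps, here is a toy Python model that searches a sorted list of (objectid, type, offset) keys. It is a sketch only: the function name `find_item`, the list-based tree, and the 0/1 return convention are assumptions made for illustration and are not taken from the commit message. The parameter order follows the one described above (objectid, offset and type).

```python
import bisect


def find_item(items, objectid, offset, key_type):
    """Look up the first item at or after the search key.

    items is a sorted list of (objectid, type, offset) tuples, which is a
    simplification of how keys order items in a tree.

    Returns (0, key) if the item found has the requested objectid and
    type, otherwise (1, None).
    """
    search_key = (objectid, key_type, offset)
    slot = bisect.bisect_left(items, search_key)
    if slot >= len(items):
        return 1, None
    found = items[slot]
    if found[0] != objectid or found[1] != key_type:
        return 1, None
    return 0, found


items = [(256, 1, 0), (256, 12, 5), (257, 1, 0)]
hit = find_item(items, objectid=256, offset=0, key_type=12)
miss = find_item(items, objectid=256, offset=0, key_type=24)
```

Callers that previously open-coded this search would differ only in the objectid, offset and type they pass in.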
      Signed-off-by: Kelley Nielsen <kelleynnn@gmail.com>
      Suggested-by: Zach Brown <zab@redhat.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: incompatible format change to remove hole extents · 16e7549f
      Josef Bacik committed
      Btrfs has always had these filler extent data items for holes in inodes.  This
      has made some things very easy, like logging hole punches and sending hole
      punches.  However for large holey files these extent data items are pure
      overhead.  So add an incompatible feature to no longer add hole extents to
      reduce the amount of metadata used by these sorts of files.  This has a few
      changes for logging and send obviously since they will need to detect holes and
      log/send the holes if there are any.  I've tested this thoroughly with xfstests
      and it doesn't cause any issues with and without the incompat format set.
      Thanks,
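Without explicit hole items, code such as logging and send has to infer holes from the gaps between the extents that do exist. A minimal Python sketch of that inference follows. It is a toy model: the `find_holes` name and the (offset, length) tuple representation are hypothetical and are not the kernel's data structures.

```python
def find_holes(extents, file_size):
    """Return the holes of a file as (offset, length) tuples.

    extents:   non-overlapping (offset, length) tuples of real data extents
    file_size: size of the file in bytes
    """
    holes = []
    pos = 0  # end of the data seen so far
    for offset, length in sorted(extents):
        if offset > pos:
            # Nothing covers [pos, offset), so that range is a hole.
            holes.append((pos, offset - pos))
        pos = max(pos, offset + length)
    if pos < file_size:
        # Trailing hole between the last extent and the end of the file.
        holes.append((pos, file_size - pos))
    return holes


holes = find_holes([(0, 4096), (8192, 4096)], file_size=16384)
```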
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <clm@fb.com>
  2. 12 Nov, 2013 8 commits
  3. 21 Sep, 2013 1 commit
  4. 01 Sep, 2013 9 commits
    • Btrfs: optimize key searches in btrfs_search_slot · d7396f07
      Filipe David Borba Manana committed
      When the binary search returns 0 (exact match), the target key
      will necessarily be at slot 0 of all nodes below the current one,
      so in this case the binary search is not needed because it will
      always return 0, and we waste time doing it, holding node locks
      for longer than necessary, etc.
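The shortcut can be modelled on a toy tree in which every key stored in an internal node equals the smallest key of the matching child. The Python sketch below counts how many binary searches a descent performs. The dictionary-based node layout and the function names are assumptions made for illustration and do not reflect the kernel's structures.

```python
import bisect


def bin_search(keys, key):
    """Return (exact, slot): the slot of key if present, else where it would go."""
    slot = bisect.bisect_left(keys, key)
    exact = slot < len(keys) and keys[slot] == key
    return exact, slot


def search(root, key):
    """Descend from root to a leaf; return (found, path of slots, searches done)."""
    node, path, searches, exact_above = root, [], 0, False
    while True:
        if exact_above:
            # An exact match higher up means the key is the first key of
            # every node below, so slot 0 is picked without searching.
            exact, slot = True, 0
        else:
            exact, slot = bin_search(node["keys"], key)
            searches += 1
        if node["children"] is None:  # reached a leaf
            path.append(slot)
            return exact, path, searches
        if not exact and slot > 0:
            slot -= 1  # follow the child whose first key is <= key
        path.append(slot)
        exact_above = exact
        node = node["children"][slot]


leaf_a = {"keys": [10, 20], "children": None}
leaf_b = {"keys": [30, 40], "children": None}
leaf_c = {"keys": [50, 60], "children": None}
mid_1 = {"keys": [10, 30], "children": [leaf_a, leaf_b]}
mid_2 = {"keys": [50], "children": [leaf_c]}
root = {"keys": [10, 50], "children": [mid_1, mid_2]}

exact_at_root = search(root, 50)   # exact match at the root level
no_shortcut = search(root, 40)     # exact match only at the leaf
```

In this model a search for key 50 needs one binary search instead of three, while a search for key 40 still needs one per level.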
      
      Below follow histograms with the times spent on the current approach of
      doing a binary search when the previous binary search returned 0, and
      times for the new approach, which directly picks the first item/child
      node in the leaf/node.
      
      Current approach:
      
      Count: 6682
      Range: 35.000 - 8370.000; Mean: 85.837; Median: 75.000; Stddev: 106.429
      Percentiles:  90th: 124.000; 95th: 145.000; 99th: 206.000
        35.000 -   61.080:  1235 ################
        61.080 -  106.053:  4207 #####################################################
       106.053 -  183.606:  1122 ##############
       183.606 -  317.341:   111 #
       317.341 -  547.959:     6 |
       547.959 - 8370.000:     1 |
      
      Approach proposed by this patch:
      
      Count: 6682
      Range:  6.000 - 135.000; Mean: 16.690; Median: 16.000; Stddev:  7.160
      Percentiles:  90th: 23.000; 95th: 27.000; 99th: 40.000
         6.000 -    8.418:    58 #
         8.418 -   11.670:  1149 #########################
        11.670 -   16.046:  2418 #####################################################
        16.046 -   21.934:  2098 ##############################################
        21.934 -   29.854:   744 ################
        29.854 -   40.511:   154 ###
        40.511 -   54.848:    41 #
        54.848 -   74.136:     5 |
        74.136 -  100.087:     9 |
       100.087 -  135.000:     6 |
      
      These samples were captured during a run of the btrfs tests 001, 002 and
      004 in the xfstests, with a leaf/node size of 4KB.
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: Make btrfs_header_chunk_tree_uuid() return unsigned long · b308bc2f
      Geert Uytterhoeven committed
      Internally, btrfs_header_chunk_tree_uuid() calculates an unsigned long, but
      casts it to a pointer, while all callers cast it to unsigned long again.
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: Make btrfs_header_fsid() return unsigned long · fba6aa75
      Geert Uytterhoeven committed
      Internally, btrfs_header_fsid() calculates an unsigned long, but casts
      it to a pointer, while all callers cast it to unsigned long again.
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: Remove superfluous casts from u64 to unsigned long long · c1c9ff7c
      Geert Uytterhoeven committed
      u64 is "unsigned long long" on all architectures now, so there's no need to
      cast it when formatting it using the "ll" length modifier.
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: get rid of sparse warnings · 35a3621b
      Stefan Behrens committed
      make C=2 fs/btrfs/ CF=-D__CHECK_ENDIAN__
      
      I tried to filter out the warnings for which patches have already
      been sent to the mailing list, pending for inclusion in btrfs-next.
      
      All these changes should be obviously safe.
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: fix send issues related to inode number reuse · ba5e8f2e
      Josef Bacik committed
      If you are sending a snapshot and specifying a parent snapshot we will walk the
      trees and figure out where they differ and send the differences only.  We
      detect differences by checking whether the leaves differ and whether the keys
      within the leaves differ.  So if neither leaf is the same (ie the leaf has
      been cow'ed from the parent snapshot) we walk each item in the send root and
      check it against the parent root.  If the items match exactly then we don't do
      anything.  This doesn't quite work for inode refs, since they will just have the
      name and the parent objectid.  If you move the file from a directory and then
      remove that directory and re-create a directory with the same inode number as
      the old directory and then move that file back into that directory we will
      assume that nothing changed and you will get errors when you try to receive.
      
      In order to fix this we need to do extra checking to see if the inode ref really
      is the same or not.  So do this by passing down BTRFS_COMPARE_TREE_SAME if the
      items match.  Then if the key type is an inode ref we can do some extra
      checking, otherwise we just keep processing.  The extra checking is to look up
      the generation of the directory in the parent volume and compare it to the
      generation of the send volume.  If they match then they are the same directory
      and we are good to go.  If they don't we have to add them to the changed refs
      list.
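The extra check amounts to comparing the generation of the directory in both roots, since an inode number alone cannot distinguish a directory from a later one that reused its number. A toy Python sketch, assuming each root is modelled as a mapping from inode number to generation (the function name and the dictionaries are hypothetical):

```python
def dir_ref_changed(send_root, parent_root, dir_ino):
    """Return True if dir_ino refers to a different directory in the two roots.

    send_root and parent_root map inode number -> generation.
    """
    return send_root.get(dir_ino) != parent_root.get(dir_ino)


# Directory 260 was removed and re-created with the same inode number,
# so its generation differs between the parent snapshot and the send
# snapshot. Directory 257 is untouched.
parent_root = {257: 7, 260: 9}
send_root = {257: 7, 260: 15}

reused = dir_ref_changed(send_root, parent_root, 260)
unchanged = dir_ref_changed(send_root, parent_root, 257)
```

A ref whose directory generation differs would be added to the changed refs list even though the item bytes match.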
      
      This means we have to track the generation of the ref we're trying to lookup
      when we iterate all the refs for a particular inode.  So in the case of looking
      for new refs we have to get the generation from the parent volume, and in the
      case of looking for deleted refs we have to get the generation from the send
      volume to compare with.
      
      There was also the issue of using a ulist to keep track of the directories we
      needed to check.  Because we can get a deleted ref and a new ref for the same
      inode number the ulist won't work since it indexes based on the value.  So
      instead just dup any directory ref we find and add it to a local list, and then
      process that list as normal and do away with using a ulist for this altogether.
      
      Before we would fail all of the tests in the far-progs that related to moving
      directories (test group 32).  With this patch we now pass these tests, and all
      of the tests in the far-progs send testing suite.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: stop using GFP_ATOMIC when allocating rewind ebs · 9ec72677
      Josef Bacik committed
      GFP_NOFS allocations for these extent buffers.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: deal with enomem in the rewind path · db7f3436
      Josef Bacik committed
      tree mod log.  Instead of BUG_ON()'ing in this case pass up ENOMEM.  I looked
      back through the callers and I'm pretty sure I got everybody who did BUG_ON(ret)
      in this path.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: stop using GFP_ATOMIC for the tree mod log allocations · c8cc6341
      Josef Bacik committed
      check and see if we truly do want to track tree modifications.  This is
      admirable, but GFP_ATOMIC in a critical area that is going to get hit pretty
      hard and often is not nice.  So instead do our basic checks to see if we don't
      need to track modifications, and if those pass then do our allocation, and then
      when we go to insert the new modification check if we still care, and if we
      don't just free up our mod and return.  Otherwise we're good to go and we can
      carry on.  Thanks,
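The sequence described above is a check, allocate, recheck pattern: a cheap test without the lock, the allocation outside the lock, and a second test once the lock is held. A minimal Python sketch of it follows. The `TrackedLog` class, its `watchers` counter and the method names are hypothetical and only illustrate the control flow.

```python
import threading


class TrackedLog:
    """Toy log that only records modifications while someone is watching."""

    def __init__(self):
        self._lock = threading.Lock()
        self.watchers = 0   # number of users that want modifications tracked
        self.entries = []

    def _dont_log(self):
        return self.watchers == 0

    def insert(self, op):
        # 1. Basic check without the lock: bail out before allocating.
        if self._dont_log():
            return False
        # 2. Allocate outside the lock, so the allocation may block.
        elem = {"op": op}
        # 3. Recheck under the lock; if nobody cares anymore, drop elem.
        with self._lock:
            if self._dont_log():
                return False
            self.entries.append(elem)
            return True


log = TrackedLog()
skipped = log.insert("a")   # no watchers yet, nothing is allocated or stored
log.watchers = 1
stored = log.insert("b")
```

The price of moving the allocation out of the critical section is an occasional wasted allocation when the state changes between the two checks.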
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
  5. 10 Aug, 2013 1 commit
  6. 02 Jul, 2013 2 commits
    • Btrfs: only do the tree_mod_log_free_eb if this is our last ref · 7fb7d76f
      Josef Bacik committed
      There is another bug in the tree mod log stuff in that we're calling
      tree_mod_log_free_eb every single time a block is cow'ed.  The problem with this
      is that if this block is shared by multiple snapshots we will call this multiple
      times per block, so if we go to rewind the mod log for this block we'll BUG_ON()
      in __tree_mod_log_rewind because we try to rewind a free twice.  We only want to
      call tree_mod_log_free_eb if we are actually freeing the block.  With this patch
      I no longer hit the panic in __tree_mod_log_rewind.  Thanks,
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: hold the tree mod lock in __tree_mod_log_rewind · f1ca7e98
      Josef Bacik committed
      We need to hold the tree mod log lock in __tree_mod_log_rewind since we walk
      forward in the tree mod entries, otherwise we'll end up with random entries and
      trip the BUG_ON() at the front of __tree_mod_log_rewind.  This fixes the panics
      people were seeing when running
      
      find /whatever -type f -exec btrfs fi defrag {} \;
      
      Thanks,
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
  7. 01 Jul, 2013 2 commits
    • Btrfs: optimize reada_for_balance · 0b08851f
      Josef Bacik committed
      This patch does two things.  First we no longer explicitly read in the blocks
      we're trying to readahead.  For things like balance_level we may never actually
      use the blocks so this just adds unneeded latency, and balance_level and
      split_node will both read in the blocks they care about explicitly so if the
      blocks need to be waited on it will be done there.  Secondly we no longer drop
      the path if we do readahead, we just set the path blocking before we call
      reada_for_balance() and then we're good to go.  Hopefully this will cut down on
      the number of re-searches.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: optimize read_block_for_search · bdf7c00e
      Josef Bacik committed
      This patch does two things.  First, it only does one call to
      btrfs_buffer_uptodate() with the gen specified instead of once with 0 and then
      again with gen specified.  The other thing is to call btrfs_read_buffer() on the
      buffer we've found instead of dropping it and then calling read_tree_block().
      This will keep us from doing yet another radix tree lookup for a buffer we've
      already found.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
  8. 14 Jun, 2013 3 commits
  9. 28 May, 2013 1 commit
  10. 18 May, 2013 1 commit
    • Btrfs: handle running extent ops with skinny metadata · b1c79e09
      Josef Bacik committed
      Chris hit a bug where we weren't finding extent records when running extent ops.
      This is because we use the delayed_ref_head when running the extent op, which
      means we can't use the ->type checks to see if we are metadata.  We also lose
      the level of the metadata we are working on.  So to fix this we can just check
      the ->is_data section of the extent_op, and we can store the level of the buffer
      we were modifying in the extent_op.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
  11. 07 May, 2013 2 commits
    • btrfs: make static code static & remove dead code · 48a3b636
      Eric Sandeen committed
      Big patch, but all it does is add statics to functions which
      are in fact static, then remove the associated dead-code fallout.
      
      removed functions:
      
      btrfs_iref_to_path()
      __btrfs_lookup_delayed_deletion_item()
      __btrfs_search_delayed_insertion_item()
      __btrfs_search_delayed_deletion_item()
      find_eb_for_page()
      btrfs_find_block_group()
      range_straddles_pages()
      extent_range_uptodate()
      btrfs_file_extent_length()
      btrfs_scrub_cancel_devid()
      btrfs_start_transaction_lflush()
      
      btrfs_print_tree() is left because it is used for debugging.
      btrfs_start_transaction_lflush() and btrfs_reada_detach() are
      left for symmetry.
      
      ulist.c functions are left, another patch will take care of those.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: separate sequence numbers for delayed ref tracking and tree mod log · fc36ed7e
      Jan Schmidt committed
      Sequence numbers for delayed refs have been introduced in the first version
      of the qgroup patch set. To solve the problem of find_all_roots on a busy
      file system, the tree mod log was introduced. The sequence numbers for that
      were simply shared between those two users.
      
      However, at one point in qgroup's quota accounting, there's a statement
      accessing the previous sequence number, that's still just doing (seq - 1)
      just as it would have to in the very first version.
      
      To satisfy that requirement, this patch makes the sequence number counter 64
      bit and splits it into a major part (used for qgroup sequence number
      counting) and a minor part (incremented for each tree modification in the
      log). This enables us to go exactly one major step backwards, as required
      for qgroups, while still incrementing the sequence counter for tree mod log
      insertions to keep track of their order. Keeping them in a single variable
      means there's no need to change all the code dealing with comparisons of two
      sequence numbers.
      
      The sequence number is reset to 0 on commit (not new in this patch), which
      ensures we won't overflow the two 32 bit counters.
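The split counter can be illustrated with plain integer arithmetic. The sketch below assumes the major part lives in the upper 32 bits and the minor part in the lower 32 bits of one 64 bit value. That layout and the helper names are assumptions made for illustration, since the commit message does not spell out which half is which.

```python
MINOR_BITS = 32
MINOR_MASK = (1 << MINOR_BITS) - 1


def inc_minor(seq):
    """One tree modification logged: bump the minor part."""
    return seq + 1


def inc_major(seq):
    """One major step: bump the major part and clear the minor part."""
    return ((seq >> MINOR_BITS) + 1) << MINOR_BITS


def split(seq):
    """Return (major, minor) for a combined sequence number."""
    return seq >> MINOR_BITS, seq & MINOR_MASK


seq = 0
seq = inc_minor(inc_minor(seq))   # two tree modifications
after_minor = split(seq)
seq = inc_major(seq)              # next major step
after_major = split(seq)
```

Because both parts share one integer, two combined sequence numbers still compare correctly with an ordinary less-than, which is why the comparison code does not need to change.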
      
      Without this fix, the qgroup tracking can occasionally go wrong and WARN_ONs
      from the tree mod log code may happen.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>