1. 26 4月, 2014 1 次提交
    • C
      Btrfs: limit the path size in send to PATH_MAX · cfd4a535
      Chris Mason 提交于
      fs_path_ensure_buf is used to make sure our path buffers for
      send are big enough for the path names as we construct them.
      The buffer size is limited to 32K by the length field in
      the struct.
      
      But bugs in the path construction can end up trying to build
      a huge buffer, and we'll do invalid memmmoves when the
      buffer length field wraps.
      
      This patch is step one, preventing the overflows.
      Signed-off-by: NChris Mason <clm@fb.com>
      cfd4a535
  2. 08 4月, 2014 3 次提交
    • F
      Btrfs: send, build path string only once in send_hole · c715e155
      Filipe Manana 提交于
      There's no point building the path string in each iteration of the
      send_hole loop, as it produces always the same string.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      c715e155
    • F
      Btrfs: send, fix data corruption due to incorrect hole detection · 766b5e5a
      Filipe Manana 提交于
      During an incremental send, when we finish processing an inode (corresponding to
      a regular file) we would assume the gap between the end of the last processed file
      extent and the file's size corresponded to a file hole, and therefore incorrectly
      send a bunch of zero bytes to overwrite that region in the file.
      
      This affects only kernel 3.14.
      
      Reproducer:
      
          mkfs.btrfs -f /dev/sdc
          mount /dev/sdc /mnt
      
          xfs_io -f -c "falloc -k 0 268435456" /mnt/foo
      
          btrfs subvolume snapshot -r /mnt /mnt/mysnap0
      
          xfs_io -c "pwrite -S 0x01 -b 9216 16190218 9216" /mnt/foo
          xfs_io -c "pwrite -S 0x02 -b 1121 198720104 1121" /mnt/foo
          xfs_io -c "pwrite -S 0x05 -b 9216 107887439 9216" /mnt/foo
          xfs_io -c "pwrite -S 0x06 -b 9216 225520207 9216" /mnt/foo
          xfs_io -c "pwrite -S 0x07 -b 67584 102138300 67584" /mnt/foo
          xfs_io -c "pwrite -S 0x08 -b 7000 94897484 7000" /mnt/foo
          xfs_io -c "pwrite -S 0x09 -b 113664 245083212 113664" /mnt/foo
          xfs_io -c "pwrite -S 0x10 -b 123 17937788 123" /mnt/foo
          xfs_io -c "pwrite -S 0x11 -b 39936 229573311 39936" /mnt/foo
          xfs_io -c "pwrite -S 0x12 -b 67584 174792222 67584" /mnt/foo
          xfs_io -c "pwrite -S 0x13 -b 9216 249253213 9216" /mnt/foo
          xfs_io -c "pwrite -S 0x16 -b 67584 150046083 67584" /mnt/foo
          xfs_io -c "pwrite -S 0x17 -b 39936 118246040 39936" /mnt/foo
          xfs_io -c "pwrite -S 0x18 -b 67584 215965442 67584" /mnt/foo
          xfs_io -c "pwrite -S 0x19 -b 33792 97096725 33792" /mnt/foo
          xfs_io -c "pwrite -S 0x20 -b 125952 166300596 125952" /mnt/foo
          xfs_io -c "pwrite -S 0x21 -b 123 1078957 123" /mnt/foo
          xfs_io -c "pwrite -S 0x25 -b 9216 212044492 9216" /mnt/foo
          xfs_io -c "pwrite -S 0x26 -b 7000 265037146 7000" /mnt/foo
          xfs_io -c "pwrite -S 0x27 -b 42757 215922685 42757" /mnt/foo
          xfs_io -c "pwrite -S 0x28 -b 7000 69865411 7000" /mnt/foo
          xfs_io -c "pwrite -S 0x29 -b 67584 67948958 67584" /mnt/foo
          xfs_io -c "pwrite -S 0x30 -b 39936 266967019 39936" /mnt/foo
          xfs_io -c "pwrite -S 0x31 -b 1121 19582453 1121" /mnt/foo
          xfs_io -c "pwrite -S 0x32 -b 17408 257710255 17408" /mnt/foo
          xfs_io -c "pwrite -S 0x33 -b 39936 3895518 39936" /mnt/foo
          xfs_io -c "pwrite -S 0x34 -b 125952 12045847 125952" /mnt/foo
          xfs_io -c "pwrite -S 0x35 -b 17408 19156379 17408" /mnt/foo
          xfs_io -c "pwrite -S 0x36 -b 39936 50160066 39936" /mnt/foo
          xfs_io -c "pwrite -S 0x37 -b 113664 9549793 113664" /mnt/foo
          xfs_io -c "pwrite -S 0x38 -b 105472 94391506 105472" /mnt/foo
          xfs_io -c "pwrite -S 0x39 -b 23552 143632863 23552" /mnt/foo
          xfs_io -c "pwrite -S 0x40 -b 39936 241283845 39936" /mnt/foo
          xfs_io -c "pwrite -S 0x41 -b 113664 199937606 113664" /mnt/foo
          xfs_io -c "pwrite -S 0x42 -b 67584 67380093 67584" /mnt/foo
          xfs_io -c "pwrite -S 0x43 -b 67584 26793129 67584" /mnt/foo
          xfs_io -c "pwrite -S 0x44 -b 39936 14421913 39936" /mnt/foo
          xfs_io -c "pwrite -S 0x45 -b 123 253097405 123" /mnt/foo
          xfs_io -c "pwrite -S 0x46 -b 1121 128233424 1121" /mnt/foo
          xfs_io -c "pwrite -S 0x47 -b 105472 91577959 105472" /mnt/foo
          xfs_io -c "pwrite -S 0x48 -b 1121 7245381 1121" /mnt/foo
          xfs_io -c "pwrite -S 0x49 -b 113664 182414694 113664" /mnt/foo
          xfs_io -c "pwrite -S 0x50 -b 9216 32750608 9216" /mnt/foo
          xfs_io -c "pwrite -S 0x51 -b 67584 266546049 67584" /mnt/foo
          xfs_io -c "pwrite -S 0x52 -b 67584 87969398 67584" /mnt/foo
          xfs_io -c "pwrite -S 0x53 -b 9216 260848797 9216" /mnt/foo
          xfs_io -c "pwrite -S 0x54 -b 39936 119461243 39936" /mnt/foo
          xfs_io -c "pwrite -S 0x55 -b 7000 200178693 7000" /mnt/foo
          xfs_io -c "pwrite -S 0x56 -b 9216 243316029 9216" /mnt/foo
          xfs_io -c "pwrite -S 0x57 -b 7000 209658229 7000" /mnt/foo
          xfs_io -c "pwrite -S 0x58 -b 101376 179745192 101376" /mnt/foo
          xfs_io -c "pwrite -S 0x59 -b 9216 64012300 9216" /mnt/foo
          xfs_io -c "pwrite -S 0x60 -b 125952 181705139 125952" /mnt/foo
          xfs_io -c "pwrite -S 0x61 -b 23552 235737348 23552" /mnt/foo
          xfs_io -c "pwrite -S 0x62 -b 113664 106021355 113664" /mnt/foo
          xfs_io -c "pwrite -S 0x63 -b 67584 135753552 67584" /mnt/foo
          xfs_io -c "pwrite -S 0x64 -b 23552 95730888 23552" /mnt/foo
          xfs_io -c "pwrite -S 0x65 -b 11 17311415 11" /mnt/foo
          xfs_io -c "pwrite -S 0x66 -b 33792 120695553 33792" /mnt/foo
          xfs_io -c "pwrite -S 0x67 -b 9216 17164631 9216" /mnt/foo
          xfs_io -c "pwrite -S 0x68 -b 9216 136065853 9216" /mnt/foo
          xfs_io -c "pwrite -S 0x69 -b 67584 37752198 67584" /mnt/foo
          xfs_io -c "pwrite -S 0x70 -b 101376 189717473 101376" /mnt/foo
          xfs_io -c "pwrite -S 0x71 -b 7000 227463698 7000" /mnt/foo
          xfs_io -c "pwrite -S 0x72 -b 9216 12655137 9216" /mnt/foo
          xfs_io -c "pwrite -S 0x73 -b 7000 7488866 7000" /mnt/foo
          xfs_io -c "pwrite -S 0x74 -b 113664 87813649 113664" /mnt/foo
          xfs_io -c "pwrite -S 0x75 -b 33792 25802183 33792" /mnt/foo
          xfs_io -c "pwrite -S 0x76 -b 39936 93524024 39936" /mnt/foo
          xfs_io -c "pwrite -S 0x77 -b 33792 113336388 33792" /mnt/foo
          xfs_io -c "pwrite -S 0x78 -b 105472 184955320 105472" /mnt/foo
          xfs_io -c "pwrite -S 0x79 -b 101376 225691598 101376" /mnt/foo
          xfs_io -c "pwrite -S 0x80 -b 23552 77023155 23552" /mnt/foo
          xfs_io -c "pwrite -S 0x81 -b 11 201888192 11" /mnt/foo
          xfs_io -c "pwrite -S 0x82 -b 11 115332492 11" /mnt/foo
          xfs_io -c "pwrite -S 0x83 -b 67584 230278015 67584" /mnt/foo
          xfs_io -c "pwrite -S 0x84 -b 11 120589073 11" /mnt/foo
          xfs_io -c "pwrite -S 0x85 -b 125952 202207819 125952" /mnt/foo
          xfs_io -c "pwrite -S 0x86 -b 113664 86672080 113664" /mnt/foo
          xfs_io -c "pwrite -S 0x87 -b 17408 208459603 17408" /mnt/foo
          xfs_io -c "pwrite -S 0x88 -b 7000 73372211 7000" /mnt/foo
          xfs_io -c "pwrite -S 0x89 -b 7000 42252122 7000" /mnt/foo
          xfs_io -c "pwrite -S 0x90 -b 23552 46784881 23552" /mnt/foo
          xfs_io -c "pwrite -S 0x91 -b 101376 63172351 101376" /mnt/foo
          xfs_io -c "pwrite -S 0x92 -b 23552 59341931 23552" /mnt/foo
          xfs_io -c "pwrite -S 0x93 -b 39936 239599283 39936" /mnt/foo
          xfs_io -c "pwrite -S 0x94 -b 67584 175643105 67584" /mnt/foo
          xfs_io -c "pwrite -S 0x97 -b 23552 105534880 23552" /mnt/foo
          xfs_io -c "pwrite -S 0x98 -b 113664 8236844 113664" /mnt/foo
          xfs_io -c "pwrite -S 0x99 -b 125952 144489686 125952" /mnt/foo
          xfs_io -c "pwrite -S 0xa0 -b 7000 73273112 7000" /mnt/foo
          xfs_io -c "pwrite -S 0xa1 -b 125952 194580243 125952" /mnt/foo
          xfs_io -c "pwrite -S 0xa2 -b 123 56296779 123" /mnt/foo
          xfs_io -c "pwrite -S 0xa3 -b 11 233066845 11" /mnt/foo
          xfs_io -c "pwrite -S 0xa4 -b 39936 197727090 39936" /mnt/foo
          xfs_io -c "pwrite -S 0xa5 -b 101376 53579812 101376" /mnt/foo
          xfs_io -c "pwrite -S 0xa6 -b 9216 85669738 9216" /mnt/foo
          xfs_io -c "pwrite -S 0xa7 -b 125952 21266322 125952" /mnt/foo
          xfs_io -c "pwrite -S 0xa8 -b 23552 125726568 23552" /mnt/foo
          xfs_io -c "pwrite -S 0xa9 -b 9216 18423680 9216" /mnt/foo
          xfs_io -c "pwrite -S 0xb0 -b 1121 165901483 1121" /mnt/foo
      
          btrfs subvolume snapshot -r /mnt /mnt/mysnap1
      
          xfs_io -c "pwrite -S 0xff -b 10 16190218 10" /mnt/foo
      
          btrfs subvolume snapshot -r /mnt /mnt/mysnap2
      
          md5sum /mnt/foo          # returns 79e53f1466bfc09fd82b450689e6119e
          md5sum /mnt/mysnap2/foo  # returns 79e53f1466bfc09fd82b450689e6119e too
      
          btrfs send /mnt/mysnap1 -f /tmp/1.snap
          btrfs send -p /mnt/mysnap1 /mnt/mysnap2 -f /tmp/2.snap
      
          mkfs.btrfs -f /dev/sdc
          mount /dev/sdc /mnt
      
          btrfs receive /mnt -f /tmp/1.snap
          btrfs receive /mnt -f /tmp/2.snap
      
          md5sum /mnt/mysnap2/foo  # returns 2bb414c5155767cedccd7063e51beabd !!
      
      A testcase for xfstests follows soon too.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      766b5e5a
    • J
      Btrfs: hold the commit_root_sem when getting the commit root during send · 3f8a18cc
      Josef Bacik 提交于
      We currently rely too heavily on roots being read-only to save us from just
      accessing root->commit_root.  We can easily balance blocks out from underneath a
      read only root, so to save us from getting screwed make sure we only access
      root->commit_root under the commit root sem.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      3f8a18cc
  3. 07 4月, 2014 2 次提交
    • J
      Btrfs: remove transaction from send · 9e351cc8
      Josef Bacik 提交于
      Lets try this again.  We can deadlock the box if we send on a box and try to
      write onto the same fs with the app that is trying to listen to the send pipe.
      This is because the writer could get stuck waiting for a transaction commit
      which is being blocked by the send.  So fix this by making sure looking at the
      commit roots is always going to be consistent.  We do this by keeping track of
      which roots need to have their commit roots swapped during commit, and then
      taking the commit_root_sem and swapping them all at once.  Then make sure we
      take a read lock on the commit_root_sem in cases where we search the commit root
      to make sure we're always looking at a consistent view of the commit roots.
      Previously we had problems with this because we would swap a fs tree commit root
      and then swap the extent tree commit root independently which would cause the
      backref walking code to screw up sometimes.  With this patch we no longer
      deadlock and pass all the weird send/receive corner cases.  Thanks,
      Reportedy-by: NHugo Mills <hugo@carfax.org.uk>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      9e351cc8
    • J
      Btrfs: don't clear uptodate if the eb is under IO · a26e8c9f
      Josef Bacik 提交于
      So I have an awful exercise script that will run snapshot, balance and
      send/receive in parallel.  This sometimes would crash spectacularly and when it
      came back up the fs would be completely hosed.  Turns out this is because of a
      bad interaction of balance and send/receive.  Send will hold onto its entire
      path for the whole send, but its blocks could get relocated out from underneath
      it, and because it doesn't old tree locks theres nothing to keep this from
      happening.  So it will go to read in a slot with an old transid, and we could
      have re-allocated this block for something else and it could have a completely
      different transid.  But because we think it is invalid we clear uptodate and
      re-read in the block.  If we do this before we actually write out the new block
      we could write back stale data to the fs, and boom we're screwed.
      
      Now we definitely need to fix this disconnect between send and balance, but we
      really really need to not allow ourselves to accidently read in stale data over
      new data.  So make sure we check if the extent buffer is not under io before
      clearing uptodate, this will kick back EIO to the caller instead of reading in
      stale data and keep us from corrupting the fs.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      a26e8c9f
  4. 22 3月, 2014 3 次提交
    • C
      btrfs: fix uninit variable warning · 73b802f4
      Chris Mason 提交于
      fs/btrfs/send.c:2926: warning: ‘entry’ may be used uninitialized in this
      function
      Signed-off-by: NChris Mason <clm@fb.com>
      73b802f4
    • F
      Btrfs: part 2, fix incremental send's decision to delay a dir move/rename · bfa7e1f8
      Filipe Manana 提交于
      For an incremental send, fix the process of determining whether the directory
      inode we're currently processing needs to have its move/rename operation delayed.
      
      We were ignoring the fact that if the inode's new immediate ancestor has a higher
      inode number than ours but wasn't renamed/moved, we might still need to delay our
      move/rename, because some other ancestor directory higher in the hierarchy might
      have an inode number higher than ours *and* was renamed/moved too - in this case
      we have to wait for rename/move of that ancestor to happen before our current
      directory's rename/move operation.
      
      Simple steps to reproduce this issue:
      
            $ mkfs.btrfs -f /dev/sdd
            $ mount /dev/sdd /mnt
      
            $ mkdir -p /mnt/a/x1/x2
            $ mkdir /mnt/a/Z
            $ mkdir -p /mnt/a/x1/x2/x3/x4/x5
      
            $ btrfs subvolume snapshot -r /mnt /mnt/snap1
            $ btrfs send /mnt/snap1 -f /tmp/base.send
      
            $ mv /mnt/a/x1/x2/x3 /mnt/a/Z/X33
            $ mv /mnt/a/x1/x2 /mnt/a/Z/X33/x4/x5/X22
      
            $ btrfs subvolume snapshot -r /mnt /mnt/snap2
            $ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/incremental.send
      
      The incremental send caused the kernel code to enter an infinite loop when
      building the path string for directory Z after its references are processed.
      
      A more complex scenario:
      
            $ mkfs.btrfs -f /dev/sdd
            $ mount /dev/sdd /mnt
      
            $ mkdir -p /mnt/a/b/c/d
            $ mkdir /mnt/a/b/c/d/e
            $ mkdir /mnt/a/b/c/d/f
            $ mv /mnt/a/b/c/d/e /mnt/a/b/c/d/f/E2
            $ mkdir /mmt/a/b/c/g
            $ mv /mnt/a/b/c/d /mnt/a/b/D2
      
            $ btrfs subvolume snapshot -r /mnt /mnt/snap1
            $ btrfs send /mnt/snap1 -f /tmp/base.send
      
            $ mkdir /mnt/a/o
            $ mv /mnt/a/b/c/g /mnt/a/b/D2/f/G2
            $ mv /mnt/a/b/D2 /mnt/a/b/dd
            $ mv /mnt/a/b/c /mnt/a/C2
            $ mv /mnt/a/b/dd/f /mnt/a/o/FF
            $ mv /mnt/a/b /mnt/a/o/FF/E2/BB
      
            $ btrfs subvolume snapshot -r /mnt /mnt/snap2
            $ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/incremental.send
      
      A test case for xfstests follows.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      bfa7e1f8
    • F
      Btrfs: fix incremental send's decision to delay a dir move/rename · 7b119a8b
      Filipe Manana 提交于
      It's possible to change the parent/child relationship between directories
      in such a way that if a child directory has a higher inode number than
      its parent, it doesn't necessarily means the child rename/move operation
      can be performed immediately. The parent migth have its own rename/move
      operation delayed, therefore in this case the child needs to have its
      rename/move operation delayed too, and be performed after its new parent's
      rename/move.
      
      Steps to reproduce the issue:
      
            $ umount /mnt
            $ mkfs.btrfs -f /dev/sdd
            $ mount /dev/sdd /mnt
      
            $ mkdir /mnt/A
            $ mkdir /mnt/B
            $ mkdir /mnt/C
            $ mv /mnt/C /mnt/A
            $ mv /mnt/B /mnt/A/C
            $ mkdir /mnt/A/C/D
      
            $ btrfs subvolume snapshot -r /mnt /mnt/snap1
            $ btrfs send /mnt/snap1 -f /tmp/base.send
      
            $ mv /mnt/A/C/D /mnt/A/D2
            $ mv /mnt/A/C/B /mnt/A/D2/B2
            $ mv /mnt/A/C /mnt/A/D2/B2/C2
      
            $ btrfs subvolume snapshot -r /mnt /mnt/snap2
            $ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/incremental.send
      
      The incremental send caused the kernel code to enter an infinite loop when
      building the path string for directory C after its references are processed.
      
      The necessary conditions here are that C has an inode number higher than both
      A and B, and B as an higher inode number higher than A, and D has the highest
      inode number, that is:
          inode_number(A) < inode_number(B) < inode_number(C) < inode_number(D)
      
      The same issue could happen if after the first snapshot there's any number
      of intermediary parent directories between A2 and B2, and between B2 and C2.
      
      A test case for xfstests follows, covering this simple case and more advanced
      ones, with files and hard links created inside the directories.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      7b119a8b
  5. 21 3月, 2014 1 次提交
  6. 11 3月, 2014 23 次提交
    • L
      Btrfs: add readahead for send_write · 2131bcd3
      Liu Bo 提交于
      Btrfs send reads data from disk and then writes to a stream via pipe or
      a file via flush.
      
      Currently we're going to read each page a time, so every page results
      in a disk read, which is not friendly to disks, esp. HDD.  Given that,
      the performance can be gained by adding readahead for those pages.
      
      Here is a quick test:
      $ btrfs subvolume create send
      $ xfs_io -f -c "pwrite 0 1G" send/foobar
      $ btrfs subvolume snap -r send ro
      $ time "btrfs send ro -f /dev/null"
      
                 w/o             w
      real    1m37.527s       0m9.097s
      user    0m0.122s        0m0.086s
      sys     0m53.191s       0m12.857s
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      2131bcd3
    • L
      Btrfs: share the same code for __record_{new,deleted}_ref · a4d96d62
      Liu Bo 提交于
      This has no functional change, only picks out the same part of two functions,
      and makes it shared.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      a4d96d62
    • F
      Btrfs: avoid unnecessary utimes update in incremental send · fcbd2154
      Filipe Manana 提交于
      When we're finishing processing of an inode, if we're dealing with a
      directory inode that has a pending move/rename operation, we don't
      need to send a utimes update instruction to the send stream, as we'll
      do it later after doing the move/rename operation. Therefore we save
      some time here building paths and doing btree lookups.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      fcbd2154
    • L
      Btrfs: skip search tree for REG files · 644d1940
      Liu Bo 提交于
      It is really unnecessary to search tree again for @gen, @mode and @rdev
      in the case of REG inodes' creation, as we've got btrfs_inode_item in sctx,
      and @gen, @mode and @rdev can easily be fetched.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      644d1940
    • D
      9c9ca00b
    • D
      btrfs: send: fix old buffer length in fs_path_ensure_buf · 1b2782c8
      David Sterba 提交于
      In "btrfs: send: lower memory requirements in common case" the code to
      save the old_buf_len was incorrectly moved to a wrong place and broke
      the original logic.
      Reported-by: NFilipe David Manana <fdmanana@gmail.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Reviewed-by: NFilipe David Manana <fdmanana@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      1b2782c8
    • F
      Btrfs: fix send issuing outdated paths for utimes, chown and chmod · bf0d1f44
      Filipe Manana 提交于
      When doing an incremental send, if we had a directory pending a move/rename
      operation and none of its parents, except for the immediate parent, were
      pending a move/rename, after processing the directory's references, we would
      be issuing utimes, chown and chmod intructions against am outdated path - a
      path which matched the one in the parent root.
      
      This change also simplifies a bit the code that deals with building a path
      for a directory which has a move/rename operation delayed.
      
      Steps to reproduce:
      
          $ mkfs.btrfs -f /dev/sdb3
          $ mount /dev/sdb3 /mnt/btrfs
          $ mkdir -p /mnt/btrfs/a/b/c/d/e
          $ mkdir /mnt/btrfs/a/b/c/f
          $ chmod 0777 /mnt/btrfs/a/b/c/d/e
          $ btrfs subvolume snapshot -r /mnt/btrfs /mnt/btrfs/snap1
          $ btrfs send /mnt/btrfs/snap1 -f /tmp/base.send
          $ mv /mnt/btrfs/a/b/c/f /mnt/btrfs/a/b/f2
          $ mv /mnt/btrfs/a/b/c/d/e /mnt/btrfs/a/b/f2/e2
          $ mv /mnt/btrfs/a/b/c /mnt/btrfs/a/b/c2
          $ mv /mnt/btrfs/a/b/c2/d /mnt/btrfs/a/b/c2/d2
          $ chmod 0700 /mnt/btrfs/a/b/f2/e2
          $ btrfs subvolume snapshot -r /mnt/btrfs /mnt/btrfs/snap2
          $ btrfs send -p /mnt/btrfs/snap1 /mnt/btrfs/snap2 -f /tmp/incremental.send
      
          $ umount /mnt/btrfs
          $ mkfs.btrfs -f /dev/sdb3
          $ mount /dev/sdb3 /mnt/btrfs
          $ btrfs receive /mnt/btrfs -f /tmp/base.send
          $ btrfs receive /mnt/btrfs -f /tmp/incremental.send
      
      The second btrfs receive command failed with:
      
          ERROR: chmod a/b/c/d/e failed. No such file or directory
      
      A test case for xfstests follows.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      bf0d1f44
    • F
      Btrfs: fix send attempting to rmdir non-empty directories · 9dc44214
      Filipe Manana 提交于
      The incremental send algorithm assumed that it was possible to issue
      a directory remove (rmdir) if the the inode number it was currently
      processing was greater than (or equal) to any inode that referenced
      the directory's inode. This wasn't a valid assumption because any such
      inode might be a child directory that is pending a move/rename operation,
      because it was moved into a directory that has a higher inode number and
      was moved/renamed too - in other words, the case the following commit
      addressed:
      
          9f03740a
          (Btrfs: fix infinite path build loops in incremental send)
      
      This made an incremental send issue an rmdir operation before the
      target directory was actually empty, which made btrfs receive fail.
      Therefore it needs to wait for all pending child directory inodes to
      be moved/renamed before sending an rmdir operation.
      
      Simple steps to reproduce this issue:
      
          $ mkfs.btrfs -f /dev/sdb3
          $ mount /dev/sdb3 /mnt/btrfs
          $ mkdir -p /mnt/btrfs/a/b/c/x
          $ mkdir /mnt/btrfs/a/b/y
          $ btrfs subvolume snapshot -r /mnt/btrfs /mnt/btrfs/snap1
          $ btrfs send /mnt/btrfs/snap1 -f /tmp/base.send
          $ mv /mnt/btrfs/a/b/y /mnt/btrfs/a/b/YY
          $ mv /mnt/btrfs/a/b/c/x /mnt/btrfs/a/b/YY
          $ rmdir /mnt/btrfs/a/b/c
          $ btrfs subvolume snapshot -r /mnt/btrfs /mnt/btrfs/snap2
          $ btrfs send -p /mnt/btrfs/snap1 /mnt/btrfs/snap2 -f /tmp/incremental.send
      
          $ umount /mnt/btrfs
          $ mkfs.btrfs -f /dev/sdb3
          $ mount /dev/sdb3 /mnt/btrfs
          $ btrfs receive /mnt/btrfs -f /tmp/base.send
          $ btrfs receive /mnt/btrfs -f /tmp/incremental.send
      
      The second btrfs receive command failed with:
      
          ERROR: rmdir o259-6-0 failed. Directory not empty
      
      A test case for xfstests follows.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      9dc44214
    • F
      Btrfs: send, don't send rmdir for same target multiple times · 29d6d30f
      Filipe Manana 提交于
      When doing an incremental send, if we delete a directory that has N > 1
      hardlinks for the same file and that file has the highest inode number
      inside the directory contents, an incremental send would send N times an
      rmdir operation against the directory. This made the btrfs receive command
      fail on the second rmdir instruction, as the target directory didn't exist
      anymore.
      
      Steps to reproduce the issue:
      
          $ mkfs.btrfs -f /dev/sdb3
          $ mount /dev/sdb3 /mnt/btrfs
          $ mkdir -p /mnt/btrfs/a/b/c
          $ echo 'ola mundo' > /mnt/btrfs/a/b/c/foo.txt
          $ ln /mnt/btrfs/a/b/c/foo.txt /mnt/btrfs/a/b/c/bar.txt
          $ btrfs subvolume snapshot -r /mnt/btrfs /mnt/btrfs/snap1
          $ btrfs send /mnt/btrfs/snap1 -f /tmp/base.send
          $ rm -f /mnt/btrfs/a/b/c/foo.txt
          $ rm -f /mnt/btrfs/a/b/c/bar.txt
          $ rmdir /mnt/btrfs/a/b/c
          $ btrfs subvolume snapshot -r /mnt/btrfs /mnt/btrfs/snap2
          $ btrfs send -p /mnt/btrfs/snap1 /mnt/btrfs/snap2 -f /tmp/incremental.send
      
          $ umount /mnt/btrfs
          $ mkfs.btrfs -f /dev/sdb3
          $ mount /dev/sdb3 /mnt/btrfs
          $ btrfs receive /mnt/btrfs -f /tmp/base.send
          $ btrfs receive /mnt/btrfs -f /tmp/incremental.send
      
      The second btrfs receive command failed with:
      
          ERROR: rmdir o259-6-0 failed. No such file or directory
      
      A test case for xfstests follows.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      29d6d30f
    • F
      Btrfs: incremental send, fix invalid path after dir rename · 2b863a13
      Filipe Manana 提交于
      This fixes yet one more case not caught by the commit titled:
      
         Btrfs: fix infinite path build loops in incremental send
      
      In this case, even before the initial full send, we have a directory
      which is a child of a directory with a higher inode number. Then we
      perform the initial send, and after we rename both the child and the
      parent, without moving them around. After doing these 2 renames, an
      incremental send sent a rename instruction for the child directory
      which contained an invalid "from" path (referenced the parent's old
      name, not the new one), which made the btrfs receive command fail.
      
      Steps to reproduce:
      
          $ mkfs.btrfs -f /dev/sdb3
          $ mount /dev/sdb3 /mnt/btrfs
          $ mkdir -p /mnt/btrfs/a/b
          $ mkdir /mnt/btrfs/d
          $ mkdir /mnt/btrfs/a/b/c
          $ mv /mnt/btrfs/d /mnt/btrfs/a/b/c
          $ btrfs subvolume snapshot -r /mnt/btrfs /mnt/btrfs/snap1
          $ btrfs send /mnt/btrfs/snap1 -f /tmp/base.send
          $ mv /mnt/btrfs/a/b/c /mnt/btrfs/a/b/x
          $ mv /mnt/btrfs/a/b/x/d /mnt/btrfs/a/b/x/y
          $ btrfs subvolume snapshot -r /mnt/btrfs /mnt/btrfs/snap2
          $ btrfs send -p /mnt/btrfs/snap1 /mnt/btrfs/snap2 -f /tmp/incremental.send
      
          $ umout /mnt/btrfs
          $ mkfs.btrfs -f /dev/sdb3
          $ mount /dev/sdb3 /mnt/btrfs
          $ btrfs receive /mnt/btrfs -f /tmp/base.send
          $ btrfs receive /mnt/btrfs -f /tmp/incremental.send
      
      The second btrfs receive command failed with:
        "ERROR: rename a/b/c/d -> a/b/x/y failed. No such file or directory"
      
      A test case for xfstests follows.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      2b863a13
    • W
      Revert "Btrfs: remove transaction from btrfs send" · dcfd5ad2
      Wang Shilong 提交于
      This reverts commit 41ce9970.
      Previously i was thinking we can use readonly root's commit root
      safely while it is not true, readonly root may be cowed with the
      following cases.
      
      1.snapshot send root will cow source root.
      2.balance,device operations will also cow readonly send root
      to relocate.
      
      So i have two ideas to make us safe to use commit root.
      
      -->approach 1:
      make it protected by transaction and end transaction properly and we research
      next item from root node(see btrfs_search_slot_for_read()).
      
      -->approach 2:
      add another counter to local root structure to sync snapshot with send.
      and add a global counter to sync send with exclusive device operations.
      
      So with approach 2, send can use commit root safely, because we make sure
      send root can not be cowed during send. Unfortunately, it make codes *ugly*
      and more complex to maintain.
      
      To make snapshot and send exclusively, device operations and send operation
      exclusively with each other is a little confusing for common users.
      
      So why not drop into previous way.
      
      Cc: Josef Bacik <jbacik@fb.com>
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      dcfd5ad2
    • D
      btrfs: send: lower memory requirements in common case · ace01050
      David Sterba 提交于
      The fs_path structure uses an inline buffer and falls back to a chain of
      allocations, but vmalloc is not necessary because PATH_MAX fits into
      PAGE_SIZE.
      
      The size of fs_path has been reduced to 256 bytes from PAGE_SIZE,
      usually 4k. Experimental measurements show that most paths on a single
      filesystem do not exceed 200 bytes, and these get stored into the inline
      buffer directly, which is now 230 bytes. Longer paths are kmalloced when
      needed.
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      ace01050
    • F
      Btrfs: make some tree searches in send.c more efficient · dff6d0ad
      Filipe David Borba Manana 提交于
      We have this pattern where we do search for a contiguous group of
      items in a tree and everytime we find an item, we process it, then
      we release our path, increment the offset of the search key, do
      another full tree search and repeat these steps until a tree search
      can't find more items we're interested in.
      
      Instead of doing these full tree searches after processing each item,
      just process the next item/slot in our leaf and don't release the path.
      Since all these trees are read only and we always use the commit root
      for a search and skip node/leaf locks, we're not affecting concurrency
      on the trees.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      dff6d0ad
    • F
      Btrfs: use right extent item position in send when finding extent clones · a0859c09
      Filipe David Borba Manana 提交于
      This was a leftover from the commit:
      
         74dd17fb
         (Btrfs: fix btrfs send for inline items and compression)
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      a0859c09
    • D
      btrfs: send: remove BUG_ON from name_cache_delete · 57fb8910
      David Sterba 提交于
      If cleaning the name cache fails, we could try to proceed at the cost of
      some memory leak. This is not expected to happen often.
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      57fb8910
    • D
      btrfs: send: remove BUG from process_all_refs · 4d1a63b2
      David Sterba 提交于
      There are only 2 static callers, the BUG would normally be never
      reached, but let's be nice.
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      4d1a63b2
    • D
      btrfs: send: squeeze bitfilelds in fs_path · 1f5a7ff9
      David Sterba 提交于
      We know that buf_len is at most PATH_MAX, 4k, and can merge it with the
      reversed member. This saves 3 bytes in favor of inline_buf.
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      1f5a7ff9
    • D
      btrfs: send: remove virtual_mem member from fs_path · e25a8122
      David Sterba 提交于
      We don't need to keep track of that, it's available via is_vmalloc_addr.
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      e25a8122
    • D
      btrfs: send: remove prepared member from fs_path · b23ab57d
      David Sterba 提交于
      The member is used only to return value back from
      fs_path_prepare_for_add, we can do it locally and save 8 bytes for the
      inline_buf path.
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      b23ab57d
    • D
      btrfs: send: replace check with an assert in gen_unique_name · 64792f25
      David Sterba 提交于
      The buffer passed to snprintf can hold the fully expanded format string,
      64 = 3x largest ULL + 3x char + trailing null.  I don't think that removing the
      check entirely is a good idea, hence the ASSERT.
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      64792f25
    • F
      Btrfs: more send support for parent/child dir relationship inversion · 5ed7f9ff
      Filipe David Borba Manana 提交于
      The commit titled "Btrfs: fix infinite path build loops in incremental send"
      didn't cover a particular case where the parent-child relationship inversion
      of directories doesn't imply a rename of the new parent directory. This was
      due to a simple logic mistake, a logical and instead of a logical or.
      
      Steps to reproduce:
      
        $ mkfs.btrfs -f /dev/sdb3
        $ mount /dev/sdb3 /mnt/btrfs
        $ mkdir -p /mnt/btrfs/a/b/bar1/bar2/bar3/bar4
        $ btrfs subvol snapshot -r /mnt/btrfs /mnt/btrfs/snap1
        $ mv /mnt/btrfs/a/b/bar1/bar2/bar3/bar4 /mnt/btrfs/a/b/k44
        $ mv /mnt/btrfs/a/b/bar1/bar2/bar3 /mnt/btrfs/a/b/k44
        $ mv /mnt/btrfs/a/b/bar1/bar2 /mnt/btrfs/a/b/k44/bar3
        $ mv /mnt/btrfs/a/b/bar1 /mnt/btrfs/a/b/k44/bar3/bar2/k11
        $ btrfs subvol snapshot -r /mnt/btrfs /mnt/btrfs/snap2
        $ btrfs send -p /mnt/btrfs/snap1 /mnt/btrfs/snap2 > /tmp/incremental.send
      
      A patch to update the test btrfs/030 from xfstests, so that it covers
      this case, will be submitted soon.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      5ed7f9ff
    • F
      Btrfs: fix send dealing with file renames and directory moves · 03cb4fb9
      Filipe David Borba Manana 提交于
      This fixes a case that the commit titled:
      
         Btrfs: fix infinite path build loops in incremental send
      
      didn't cover. If the parent-child relationship between 2 directories
      is inverted, both get renamed, and the former parent has a file that
      got renamed too (but remains a child of that directory), the incremental
      send operation would use the file's old path after sending an unlink
      operation for that old path, causing receive to fail on future operations
      like changing owner, permissions or utimes of the corresponding inode.
      
      This is not a regression from the commit mentioned before, as without
      that commit we would fall into the issues that commit fixed, so it's
      just one case that wasn't covered before.
      
      Simple steps to reproduce this issue are:
      
            $ mkfs.btrfs -f /dev/sdb3
            $ mount /dev/sdb3 /mnt/btrfs
            $ mkdir -p /mnt/btrfs/a/b/c/d
            $ touch /mnt/btrfs/a/b/c/d/file
            $ mkdir -p /mnt/btrfs/a/b/x
            $ btrfs subvol snapshot -r /mnt/btrfs /mnt/btrfs/snap1
            $ mv /mnt/btrfs/a/b/x /mnt/btrfs/a/b/c/x2
            $ mv /mnt/btrfs/a/b/c/d /mnt/btrfs/a/b/c/x2/d2
            $ mv /mnt/btrfs/a/b/c/x2/d2/file /mnt/btrfs/a/b/c/x2/d2/file2
            $ btrfs subvol snapshot -r /mnt/btrfs /mnt/btrfs/snap2
            $ btrfs send -p /mnt/btrfs/snap1 /mnt/btrfs/snap2 > /tmp/incremental.send
      
      A patch to update the test btrfs/030 from xfstests, so that it covers
      this case, will be submitted soon.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      03cb4fb9
    • F
      Btrfs: add missing error check in incremental send · d86477b3
      Filipe David Borba Manana 提交于
      Function wait_for_parent_move() returns negative value if an error
      happened, 0 if we don't need to wait for the parent's move, and
      1 if the wait is needed.
      Before this change an error return value was being treated like the
      return value 1, which was not correct.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      d86477b3
  7. 16 2月, 2014 1 次提交
    • F
      Btrfs: use right clone root offset for compressed extents · 93de4ba8
      Filipe David Borba Manana 提交于
      For non compressed extents, iterate_extent_inodes() gives us offsets
      that take into account the data offset from the file extent items, while
      for compressed extents it doesn't. Therefore we have to adjust them before
      placing them in a send clone instruction. Not doing this adjustment leads to
      the receiving end requesting for a wrong a file range to the clone ioctl,
      which results in different file content from the one in the original send
      root.
      
      Issue reproducible with the following excerpt from the test I made for
      xfstests:
      
        _scratch_mkfs
        _scratch_mount "-o compress-force=lzo"
      
        $XFS_IO_PROG -f -c "truncate 118811" $SCRATCH_MNT/foo
        $XFS_IO_PROG -c "pwrite -S 0x0d -b 39987 92267 39987" $SCRATCH_MNT/foo
      
        $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1
      
        $XFS_IO_PROG -c "pwrite -S 0x3e -b 80000 200000 80000" $SCRATCH_MNT/foo
        $BTRFS_UTIL_PROG filesystem sync $SCRATCH_MNT
        $XFS_IO_PROG -c "pwrite -S 0xdc -b 10000 250000 10000" $SCRATCH_MNT/foo
        $XFS_IO_PROG -c "pwrite -S 0xff -b 10000 300000 10000" $SCRATCH_MNT/foo
      
        # will be used for incremental send to be able to issue clone operations
        $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/clones_snap
      
        $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2
      
        $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1
        $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \
            -x $SCRATCH_MNT/mysnap2/clones_snap $SCRATCH_MNT/mysnap2
        $FSSUM_PROG -A -f -w $tmp/clones.fssum $SCRATCH_MNT/clones_snap \
            -x $SCRATCH_MNT/clones_snap/mysnap1 -x $SCRATCH_MNT/clones_snap/mysnap2
      
        $BTRFS_UTIL_PROG send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap
        $BTRFS_UTIL_PROG send $SCRATCH_MNT/clones_snap -f $tmp/clones.snap
        $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/mysnap1 \
            -c $SCRATCH_MNT/clones_snap $SCRATCH_MNT/mysnap2 -f $tmp/2.snap
      
        _scratch_unmount
        _scratch_mkfs
        _scratch_mount
      
        $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/1.snap
        $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1 2>> $seqres.full
      
        $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/clones.snap
        $FSSUM_PROG -r $tmp/clones.fssum $SCRATCH_MNT/clones_snap 2>> $seqres.full
      
        $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/2.snap
        $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2 2>> $seqres.full
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      93de4ba8
  8. 09 2月, 2014 1 次提交
    • J
      Btrfs: fix assert screwup for the pending move stuff · 6cc98d90
      Josef Bacik 提交于
      Wang noticed that he was failing btrfs/030 even though me and Filipe couldn't
      reproduce.  Turns out this is because Wang didn't have CONFIG_BTRFS_ASSERT set,
      which meant that a key part of Filipe's original patch was not being built in.
      This appears to be a mess up with merging Filipe's patch as it does not exist in
      his original patch.  Fix this by changing how we make sure del_waiting_dir_move
      asserts that it did not error and take the function out of the ifdef check.
      This makes btrfs/030 pass with the assert on or off.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Reviewed-by: NFilipe Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      6cc98d90
  9. 04 2月, 2014 1 次提交
  10. 29 1月, 2014 4 次提交
    • C
      Btrfs: don't use ram_bytes for uncompressed inline items · 514ac8ad
      Chris Mason 提交于
      If we truncate an uncompressed inline item, ram_bytes isn't updated to reflect
      the new size.  The fixe uses the size directly from the item header when
      reading uncompressed inlines, and also fixes truncate to update the
      size as it goes.
      Reported-by: NJens Axboe <axboe@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      CC: stable@vger.kernel.org
      514ac8ad
    • F
      Btrfs: fix send file hole detection leading to data corruption · bf54f412
      Filipe David Borba Manana 提交于
      There was a case where file hole detection was incorrect and it would
      cause an incremental send to override a section of a file with zeroes.
      
      This happened in the case where between the last leaf we processed which
      contained a file extent item for our current inode and the leaf we're
      currently are at (and has a file extent item for our current inode) there
      are only leafs containing exclusively file extent items for our current
      inode, and none of them was updated since the previous send operation.
      The file hole detection code would incorrectly consider the file range
      covered by these leafs as a hole.
      
      A test case for xfstests follows soon.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      bf54f412
    • F
      Btrfs: make send's file extent item search more efficient · 7fdd29d0
      Filipe David Borba Manana 提交于
      Instead of looking for a file extent item, process it, release the path
      and do a btree search for the next file extent item, just process all
      file extent items in a leaf without intermediate btree searches. This way
      we save cpu and we're not blocking other tasks or affecting concurrency on
      the btree, because send's paths use the commit root and skip btree node/leaf
      locking.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      7fdd29d0
    • F
      Btrfs: fix infinite path build loops in incremental send · 9f03740a
      Filipe David Borba Manana 提交于
      The send operation processes inodes by their ascending number, and assumes
      that any rename/move operation can be successfully performed (sent to the
      caller) once all previous inodes (those with a smaller inode number than the
      one we're currently processing) were processed.
      
      This is not true when an incremental send had to process an hierarchical change
      between 2 snapshots where the parent-children relationship between directory
      inodes was reversed - that is, parents became children and children became
      parents. This situation made the path building code go into an infinite loop,
      which kept allocating more and more memory that eventually lead to a krealloc
      warning being displayed in dmesg:
      
        WARNING: CPU: 1 PID: 5705 at mm/page_alloc.c:2477 __alloc_pages_nodemask+0x365/0xad0()
        Modules linked in: btrfs raid6_pq xor pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) snd_hda_codec_hdmi snd_hda_codec_realtek joydev radeon snd_hda_intel snd_hda_codec snd_hwdep snd_seq_midi snd_pcm psmouse i915 snd_rawmidi serio_raw snd_seq_midi_event lpc_ich snd_seq snd_timer ttm snd_seq_device rfcomm drm_kms_helper parport_pc bnep bluetooth drm ppdev snd soundcore i2c_algo_bit snd_page_alloc binfmt_misc video lp parport r8169 mii hid_generic usbhid hid
        CPU: 1 PID: 5705 Comm: btrfs Tainted: G           O 3.13.0-rc7-fdm-btrfs-next-18+ #3
        Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Pro4, BIOS P1.50 09/04/2012
        [ 5381.660441]  00000000000009ad ffff8806f6f2f4e8 ffffffff81777434 0000000000000007
        [ 5381.660447]  0000000000000000 ffff8806f6f2f528 ffffffff8104a9ec ffff8807038f36f0
        [ 5381.660452]  0000000000000000 0000000000000206 ffff8807038f2490 ffff8807038f36f0
        [ 5381.660457] Call Trace:
        [ 5381.660464]  [<ffffffff81777434>] dump_stack+0x4e/0x68
        [ 5381.660471]  [<ffffffff8104a9ec>] warn_slowpath_common+0x8c/0xc0
        [ 5381.660476]  [<ffffffff8104aa3a>] warn_slowpath_null+0x1a/0x20
        [ 5381.660480]  [<ffffffff81144995>] __alloc_pages_nodemask+0x365/0xad0
        [ 5381.660487]  [<ffffffff8108313f>] ? local_clock+0x4f/0x60
        [ 5381.660491]  [<ffffffff811430e8>] ? free_one_page+0x98/0x440
        [ 5381.660495]  [<ffffffff8108313f>] ? local_clock+0x4f/0x60
        [ 5381.660502]  [<ffffffff8113fae4>] ? __get_free_pages+0x14/0x50
        [ 5381.660508]  [<ffffffff81095fb8>] ? trace_hardirqs_off_caller+0x28/0xd0
        [ 5381.660515]  [<ffffffff81183caf>] alloc_pages_current+0x10f/0x1f0
        [ 5381.660520]  [<ffffffff8113fae4>] ? __get_free_pages+0x14/0x50
        [ 5381.660524]  [<ffffffff8113fae4>] __get_free_pages+0x14/0x50
        [ 5381.660530]  [<ffffffff8115dace>] kmalloc_order_trace+0x3e/0x100
        [ 5381.660536]  [<ffffffff81191ea0>] __kmalloc_track_caller+0x220/0x230
        [ 5381.660560]  [<ffffffffa0729fdb>] ? fs_path_ensure_buf.part.12+0x6b/0x200 [btrfs]
        [ 5381.660564]  [<ffffffff8178085c>] ? retint_restore_args+0xe/0xe
        [ 5381.660569]  [<ffffffff811580ef>] krealloc+0x6f/0xb0
        [ 5381.660586]  [<ffffffffa0729fdb>] fs_path_ensure_buf.part.12+0x6b/0x200 [btrfs]
        [ 5381.660601]  [<ffffffffa072a208>] fs_path_prepare_for_add+0x98/0xb0 [btrfs]
        [ 5381.660615]  [<ffffffffa072a2bc>] fs_path_add_path+0x2c/0x60 [btrfs]
        [ 5381.660628]  [<ffffffffa072c55c>] get_cur_path+0x7c/0x1c0 [btrfs]
      
      Even without this loop, the incremental send couldn't succeed, because it would attempt
      to send a rename/move operation for the lower inode before the highest inode number was
      renamed/move. This issue is easy to trigger with the following steps:
      
        $ mkfs.btrfs -f /dev/sdb3
        $ mount /dev/sdb3 /mnt/btrfs
        $ mkdir -p /mnt/btrfs/a/b/c/d
        $ mkdir /mnt/btrfs/a/b/c2
        $ btrfs subvol snapshot -r /mnt/btrfs /mnt/btrfs/snap1
        $ mv /mnt/btrfs/a/b/c/d /mnt/btrfs/a/b/c2/d2
        $ mv /mnt/btrfs/a/b/c /mnt/btrfs/a/b/c2/d2/cc
        $ btrfs subvol snapshot -r /mnt/btrfs /mnt/btrfs/snap2
        $ btrfs send -p /mnt/btrfs/snap1 /mnt/btrfs/snap2 > /tmp/incremental.send
      
      The structure of the filesystem when the first snapshot is taken is:
      
      	 .                       (ino 256)
      	 |-- a                   (ino 257)
      	     |-- b               (ino 258)
      	         |-- c           (ino 259)
      	         |   |-- d       (ino 260)
                       |
      	         |-- c2          (ino 261)
      
      And its structure when the second snapshot is taken is:
      
      	 .                       (ino 256)
      	 |-- a                   (ino 257)
      	     |-- b               (ino 258)
      	         |-- c2          (ino 261)
      	             |-- d2      (ino 260)
      	                 |-- cc  (ino 259)
      
      Before the move/rename operation is performed for the inode 259, the
      move/rename for inode 260 must be performed, since 259 is now a child
      of 260.
      
      A test case for xfstests, with a more complex scenario, will follow soon.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      9f03740a