1. 12 10月, 2016 1 次提交
    • F
      Btrfs: fix incremental send failure caused by balance · d5e84fd8
      Filipe Manana 提交于
      Commit 95155585 ("Btrfs: send, don't bug on inconsistent snapshots")
      removed some BUG_ON() statements (replacing them with returning errors
      to user space and logging error messages) when a snapshot is in an
      inconsistent state due to failures to update a delayed inode item (ENOMEM
      or ENOSPC) after adding/updating/deleting references, xattrs or file
      extent items.
      
      However there is a case, when no errors happen, where a file extent item
      can be modified without having the corresponding inode item updated. This
      case happens during balance under very specific timings, when relocation
      is in the stage where it updates data pointers and a leaf that contains
      file extent items is COWed. When that happens file extent items get their
      disk_bytenr field updated to a new value that reflects the post relocation
      logical address of the extent, without updating their respective inode
      items (as there is nothing that needs to be updated on them). This is
      performed at relocation.c:replace_file_extents() through
      relocation.c:btrfs_reloc_cow_block().
      
      So make an incremental send deal with this case and don't do any processing
      for a file extent item that got its disk_bytenr field updated by relocation,
      since the extent's data is the same as the one pointed by the file extent
      item in the parent snapshot.
      
      After the recent commit mentioned above this case resulted in EIO errors
      returned to user space (and an error message logged to dmesg/syslog) when
      doing an incremental send, while before it, it resulted in hitting a
      BUG_ON leading to the following trace:
      
      [  952.206705] ------------[ cut here ]------------
      [  952.206714] kernel BUG at ../fs/btrfs/send.c:5653!
      [  952.206719] Internal error: Oops - BUG: 0 [#1] SMP
      [  952.209854] Modules linked in: st dm_mod nls_utf8 isofs fuse nf_log_ipv6 xt_pkttype xt_physdev br_netfilter nf_log_ipv4 nf_log_common xt_LOG xt_limit ebtable_filter ebtables af_packet bridge stp llc ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables xfs libcrc32c nls_iso8859_1 nls_cp437 vfat fat joydev aes_ce_blk ablk_helper cryptd snd_intel8x0 aes_ce_cipher snd_ac97_codec ac97_bus snd_pcm ghash_ce sha2_ce sha1_ce snd_timer snd virtio_net soundcore btrfs xor sr_mod cdrom hid_generic usbhid raid6_pq virtio_blk virtio_scsi bochs_drm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_mmio xhci_pci xhci_hcd usbcore usb_common virtio_pci virtio_ring virtio drm sg efivarfs
      [  952.228333] Supported: Yes
      [  952.228908] CPU: 0 PID: 12779 Comm: snapperd Not tainted 4.4.14-50-default #1
      [  952.230329] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
      [  952.231683] task: ffff800058e94100 ti: ffff8000d866c000 task.ti: ffff8000d866c000
      [  952.233279] PC is at changed_cb+0x9f4/0xa48 [btrfs]
      [  952.234375] LR is at changed_cb+0x58/0xa48 [btrfs]
      [  952.236552] pc : [<ffff7ffffc39de7c>] lr : [<ffff7ffffc39d4e0>] pstate: 80000145
      [  952.238049] sp : ffff8000d866fa20
      [  952.238732] x29: ffff8000d866fa20 x28: 0000000000000019
      [  952.239840] x27: 00000000000028d5 x26: 00000000000024a2
      [  952.241008] x25: 0000000000000002 x24: ffff8000e66e92f0
      [  952.242131] x23: ffff8000b8c76800 x22: ffff800092879140
      [  952.243238] x21: 0000000000000002 x20: ffff8000d866fb78
      [  952.244348] x19: ffff8000b8f8c200 x18: 0000000000002710
      [  952.245607] x17: 0000ffff90d42480 x16: ffff800000237dc0
      [  952.246719] x15: 0000ffff90de7510 x14: ab000c000a2faf08
      [  952.247835] x13: 0000000000577c2b x12: ab000c000b696665
      [  952.248981] x11: 2e65726f632f6966 x10: 652d34366d72612f
      [  952.250101] x9 : 32627572672f746f x8 : ab000c00092f1671
      [  952.251352] x7 : 8000000000577c2b x6 : ffff800053eadf45
      [  952.252468] x5 : 0000000000000000 x4 : ffff80005e169494
      [  952.253582] x3 : 0000000000000004 x2 : ffff8000d866fb78
      [  952.254695] x1 : 000000000003e2a3 x0 : 000000000003e2a4
      [  952.255803]
      [  952.256150] Process snapperd (pid: 12779, stack limit = 0xffff8000d866c020)
      [  952.257516] Stack: (0xffff8000d866fa20 to 0xffff8000d8670000)
      [  952.258654] fa20: ffff8000d866fae0 ffff7ffffc308fc0 ffff800092879140 ffff8000e66e92f0
      [  952.260219] fa40: 0000000000000035 ffff800055de6000 ffff8000b8c76800 ffff8000d866fb78
      [  952.261745] fa60: 0000000000000002 00000000000024a2 00000000000028d5 0000000000000019
      [  952.263269] fa80: ffff8000d866fae0 ffff7ffffc3090f0 ffff8000d866fae0 ffff7ffffc309128
      [  952.264797] faa0: ffff800092879140 ffff8000e66e92f0 0000000000000035 ffff800055de6000
      [  952.268261] fac0: ffff8000b8c76800 ffff8000d866fb78 0000000000000002 0000000000001000
      [  952.269822] fae0: ffff8000d866fbc0 ffff7ffffc39ecfc ffff8000b8f8c200 ffff8000b8f8c368
      [  952.271368] fb00: ffff8000b8f8c378 ffff800055de6000 0000000000000001 ffff8000ecb17500
      [  952.272893] fb20: ffff8000b8c76800 ffff800092879140 ffff800062b6d000 ffff80007a9e2470
      [  952.274420] fb40: ffff8000b8f8c208 0000000005784000 ffff8000580a8000 ffff8000b8f8c200
      [  952.276088] fb60: ffff7ffffc39d488 00000002b8f8c368 0000000000000000 000000000003e2a4
      [  952.280275] fb80: 000000000000006c ffff7ffffc39ec00 000000000003e2a4 000000000000006c
      [  952.283219] fba0: ffff8000b8f8c300 0000000000000100 0000000000000001 ffff8000ecb17500
      [  952.286166] fbc0: ffff8000d866fcd0 ffff7ffffc3643c0 ffff8000f8842700 0000ffff8ffe9278
      [  952.289136] fbe0: 0000000040489426 ffff800055de6000 0000ffff8ffe9278 0000000040489426
      [  952.292083] fc00: 000000000000011d 000000000000001d ffff80007a9e4598 ffff80007a9e43e8
      [  952.294959] fc20: ffff8000b8c7693f 0000000000003b24 0000000000000019 ffff8000b8f8c218
      [  952.301161] fc40: 00000001d866fc70 ffff8000b8c76800 0000000000000128 ffffffffffffff84
      [  952.305749] fc60: ffff800058e941ff 0000000000003a58 ffff8000d866fcb0 ffff8000000f7390
      [  952.308875] fc80: 000000000000012a 0000000000010290 ffff8000d866fc00 000000000000007b
      [  952.311915] fca0: 0000000000010290 ffff800046c1b100 74732d7366727462 000001006d616572
      [  952.314937] fcc0: ffff8000fffc4100 cb88537fdc8ba60e ffff8000d866fe10 ffff8000002499e8
      [  952.318008] fce0: 0000000040489426 ffff8000f8842700 0000ffff8ffe9278 ffff80007a9e4598
      [  952.321321] fd00: 0000ffff8ffe9278 0000000040489426 000000000000011d 000000000000001d
      [  952.324280] fd20: ffff80000072c000 ffff8000d866c000 ffff8000d866fda0 ffff8000000e997c
      [  952.327156] fd40: ffff8000fffc4180 00000000000031ed ffff8000fffc4180 ffff800046c1b7d4
      [  952.329895] fd60: 0000000000000140 0000ffff907ea170 000000000000011d 00000000000000dc
      [  952.334641] fd80: ffff80000072c000 ffff8000d866c000 0000000000000000 0000000000000002
      [  952.338002] fda0: ffff8000d866fdd0 ffff8000000ebacc ffff800046c1b080 ffff800046c1b7d4
      [  952.340724] fdc0: ffff8000d866fdf0 ffff8000000db67c 0000000000000040 ffff800000e69198
      [  952.343415] fde0: 0000ffff8ffea790 00000000000031ed ffff8000d866fe20 ffff800000254000
      [  952.346101] fe00: 000000000000001d 0000000000000004 ffff8000d866fe90 ffff800000249d3c
      [  952.348980] fe20: ffff8000f8842700 0000000000000000 ffff8000f8842701 0000000000000008
      [  952.351696] fe40: ffff8000d866fe70 0000000000000008 ffff8000d866fe90 ffff800000249cf8
      [  952.354387] fe60: ffff8000f8842700 0000ffff8ffe9170 ffff8000f8842701 0000000000000008
      [  952.357083] fe80: 0000ffff8ffe9278 ffff80008ff85500 0000ffff8ffe90c0 ffff800000085c84
      [  952.359800] fea0: 0000000000000000 0000ffff8ffe9170 ffffffffffffffff 0000ffff90d473bc
      [  952.365351] fec0: 0000000000000000 0000000000000015 0000000000000008 0000000040489426
      [  952.369550] fee0: 0000ffff8ffe9278 0000ffff907ea790 0000ffff907ea170 0000ffff907ea790
      [  952.372416] ff00: 0000ffff907ea170 0000000000000000 000000000000001d 0000000000000004
      [  952.375223] ff20: 0000ffff90a32220 00000000003d0f00 0000ffff907ea0a0 0000ffff8ffe8f30
      [  952.378099] ff40: 0000ffff9100f554 0000ffff91147000 0000ffff91117bc0 0000ffff90d473b0
      [  952.381115] ff60: 0000ffff9100f620 0000ffff880069b0 0000ffff8ffe9170 0000ffff8ffe91a0
      [  952.384003] ff80: 0000ffff8ffe9160 0000ffff8ffe9140 0000ffff88006990 0000ffff8ffe9278
      [  952.386860] ffa0: 0000ffff88008a60 0000ffff8ffe9480 0000ffff88014ca0 0000ffff8ffe90c0
      [  952.389654] ffc0: 0000ffff910be8e8 0000ffff8ffe90c0 0000ffff90d473bc 0000000000000000
      [  952.410986] ffe0: 0000000000000008 000000000000001d 6e2079747265706f 72616d223d656d61
      [  952.415497] Call trace:
      [  952.417403] [<ffff7ffffc39de7c>] changed_cb+0x9f4/0xa48 [btrfs]
      [  952.420023] [<ffff7ffffc308fc0>] btrfs_compare_trees+0x500/0x6b0 [btrfs]
      [  952.422759] [<ffff7ffffc39ecfc>] btrfs_ioctl_send+0xb4c/0xe10 [btrfs]
      [  952.425601] [<ffff7ffffc3643c0>] btrfs_ioctl+0x374/0x29a4 [btrfs]
      [  952.428031] [<ffff8000002499e8>] do_vfs_ioctl+0x33c/0x600
      [  952.430360] [<ffff800000249d3c>] SyS_ioctl+0x90/0xa4
      [  952.432552] [<ffff800000085c84>] el0_svc_naked+0x38/0x3c
      [  952.434803] Code: 2a1503e0 17fffdac b9404282 17ffff28 (d4210000)
      [  952.437457] ---[ end trace 9afd7090c466cf15 ]---
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      d5e84fd8
  2. 28 9月, 2016 1 次提交
  3. 27 9月, 2016 3 次提交
  4. 01 9月, 2016 1 次提交
    • J
      Btrfs: kill invalid ASSERT() in process_all_refs() · 3dc09ec8
      Josef Bacik 提交于
      Suppose you have the following tree in snap1 on a file system mounted with -o
      inode_cache so that inode numbers are recycled
      
      └── [    258]  a
          └── [    257]  b
      
      and then you remove b, rename a to c, and then re-create b in c so you have the
      following tree
      
      └── [    258]  c
          └── [    257]  b
      
      and then you try to do an incremental send you will hit
      
      ASSERT(pending_move == 0);
      
      in process_all_refs().  This is because we assume that any recycling of inodes
      will not have a pending change in our path, which isn't the case.  This is the
      case for the DELETE side, since we want to remove the old file using the old
      path, but on the create side we could have a pending move and need to do the
      normal pending rename dance.  So remove this ASSERT() and put a comment about
      why we ignore pending_move.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      3dc09ec8
  5. 01 8月, 2016 8 次提交
    • F
      Btrfs: send, don't bug on inconsistent snapshots · 95155585
      Filipe Manana 提交于
      When doing an incremental send, if we find a new/modified/deleted extent,
      reference or xattr without having previously processed the corresponding
      inode item we end up exexuting a BUG_ON(). This is because whenever an
      extent, xattr or reference is added, modified or deleted, we always expect
      to have the corresponding inode item updated. However there are situations
      where this will not happen due to transient -ENOMEM or -ENOSPC errors when
      doing delayed inode updates.
      
      For example, when punching holes we can succeed in deleting and modifying
      (shrinking) extents but later fail to do the delayed inode update. So after
      such failure we close our transaction handle and right after a snapshot of
      the fs/subvol tree can be made and used later for a send operation. The
      same thing can happen during truncate, link, unlink, and xattr related
      operations.
      
      So instead of executing a BUG_ON, make send return an -EIO error and print
      an informative error message do dmesg/syslog.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      95155585
    • F
      Btrfs: send, avoid incorrect leaf accesses when sending utimes operations · 15b253ea
      Filipe Manana 提交于
      The caller of send_utimes() is supposed to be sure that the inode number
      it passes to this function does actually exists in the send snapshot.
      However due to logic/algorithm bugs (such as the one fixed by the patch
      titled "Btrfs: send, fix invalid leaf accesses due to incorrect utimes
      operations"), this might not be the case and when that happens it makes
      send_utimes() access use an unrelated leaf item as the target inode item
      or access beyond a leaf's boundaries (when the leaf is full and
      path->slots[0] matches the number of items in the leaf).
      
      So if the call to btrfs_search_slot() done by send_utimes() does not find
      the inode item, just make sure send_utimes() returns -ENOENT and does not
      silently accesses unrelated leaf items or does invalid leaf accesses, also
      allowing us to easialy and deterministically catch such algorithmic/logic
      bugs.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      15b253ea
    • R
      Btrfs: send, fix invalid leaf accesses due to incorrect utimes operations · 764433a1
      Robbie Ko 提交于
      During an incremental send, if we have delayed rename operations for inodes
      that were children of directories which were removed in the send snapshot,
      we can end up accessing incorrect items in a leaf or accessing beyond the
      last item of the leaf due to issuing utimes operations for the removed
      inodes. Consider the following example:
      
        Parent snapshot:
        .                                                             (ino 256)
        |--- a/                                                       (ino 257)
        |    |--- c/                                                  (ino 262)
        |
        |--- b/                                                       (ino 258)
        |    |--- d/                                                  (ino 263)
        |
        |--- del/                                                     (ino 261)
              |--- x/                                                 (ino 259)
              |--- y/                                                 (ino 260)
      
        Send snapshot:
      
        .                                                             (ino 256)
        |--- a/                                                       (ino 257)
        |
        |--- b/                                                       (ino 258)
        |
        |--- c/                                                       (ino 262)
        |    |--- y/                                                  (ino 260)
        |
        |--- d/                                                       (ino 263)
             |--- x/                                                  (ino 259)
      
      1) When processing inodes 259 and 260, we end up delaying their rename
         operations because their parents, inodes 263 and 262 respectively, were
         not yet processed and therefore not yet renamed;
      
      2) When processing inode 262, its rename operation is issued and right
         after the rename operation for inode 260 is issued. However right after
         issuing the rename operation for inode 260, at send.c:apply_dir_move(),
         we issue utimes operations for all current and past parents of inode
         260. This means we try to send a utimes operation for its old parent,
         inode 261 (deleted in the send snapshot), which does not cause any
         immediate and deterministic failure, because when the target inode is
         not found in the send snapshot, the send.c:send_utimes() function
         ignores it and uses the leaf region pointed to by path->slots[0],
         which can be any unrelated item (belonging to other inode) or it can
         be a region outside the leaf boundaries, if the leaf is full and
         path->slots[0] matches the number of items in the leaf. So we end
         up either successfully sending a utimes operation, which is fine
         and irrelevant because the old parent (inode 261) will end up being
         deleted later, or we end up doing an invalid memory access tha
         crashes the kernel.
      
      So fix this by making apply_dir_move() issue utimes operations only for
      parents that still exist in the send snapshot. In a separate patch we
      will make send_utimes() return an error (-ENOENT) if the given inode
      does not exists in the send snapshot.
      Signed-off-by: NRobbie Ko <robbieko@synology.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      [Rewrote change log to be more detailed and better organized]
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      764433a1
    • R
      Btrfs: send, fix warning due to late freeing of orphan_dir_info structures · 443f9d26
      Robbie Ko 提交于
      Under certain situations, when doing an incremental send, we can end up
      not freeing orphan_dir_info structures as soon as they are no longer
      needed. Instead we end up freeing them only after finishing the send
      stream, which causes a warning to be emitted:
      
      [282735.229200] ------------[ cut here ]------------
      [282735.229968] WARNING: CPU: 9 PID: 10588 at fs/btrfs/send.c:6298 btrfs_ioctl_send+0xe2f/0xe51 [btrfs]
      [282735.231282] Modules linked in: btrfs crc32c_generic xor raid6_pq acpi_cpufreq tpm_tis ppdev tpm parport_pc psmouse parport sg pcspkr i2c_piix4 i2c_core evdev processor serio_raw button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring virtio e1000 scsi_mod floppy [last unloaded: btrfs]
      [282735.237130] CPU: 9 PID: 10588 Comm: btrfs Tainted: G        W       4.6.0-rc7-btrfs-next-31+ #1
      [282735.239309] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
      [282735.240160]  0000000000000000 ffff880224273ca8 ffffffff8126b42c 0000000000000000
      [282735.240160]  0000000000000000 ffff880224273ce8 ffffffff81052b14 0000189a24273ac8
      [282735.240160]  ffff8802210c9800 0000000000000000 0000000000000001 0000000000000000
      [282735.240160] Call Trace:
      [282735.240160]  [<ffffffff8126b42c>] dump_stack+0x67/0x90
      [282735.240160]  [<ffffffff81052b14>] __warn+0xc2/0xdd
      [282735.240160]  [<ffffffff81052beb>] warn_slowpath_null+0x1d/0x1f
      [282735.240160]  [<ffffffffa03c99d5>] btrfs_ioctl_send+0xe2f/0xe51 [btrfs]
      [282735.240160]  [<ffffffffa0398358>] btrfs_ioctl+0x14f/0x1f81 [btrfs]
      [282735.240160]  [<ffffffff8108e456>] ? arch_local_irq_save+0x9/0xc
      [282735.240160]  [<ffffffff8118da05>] vfs_ioctl+0x18/0x34
      [282735.240160]  [<ffffffff8118e00c>] do_vfs_ioctl+0x550/0x5be
      [282735.240160]  [<ffffffff81196f0c>] ? __fget+0x6b/0x77
      [282735.240160]  [<ffffffff81196fa1>] ? __fget_light+0x62/0x71
      [282735.240160]  [<ffffffff8118e0d1>] SyS_ioctl+0x57/0x79
      [282735.240160]  [<ffffffff8149e025>] entry_SYSCALL_64_fastpath+0x18/0xa8
      [282735.240160]  [<ffffffff81100c6b>] ? time_hardirqs_off+0x9/0x14
      [282735.240160]  [<ffffffff8108e87d>] ? trace_hardirqs_off_caller+0x1f/0xaa
      [282735.256343] ---[ end trace a4539270c8056f93 ]---
      
      Consider the following example:
      
        Parent snapshot:
      
        .                                                             (ino 256)
        |--- a/                                                       (ino 257)
        |    |--- c/                                                  (ino 260)
        |
        |--- del/                                                     (ino 259)
              |--- tmp/                                               (ino 258)
              |--- x/                                                 (ino 261)
              |--- y/                                                 (ino 262)
      
        Send snapshot:
      
        .                                                             (ino 256)
        |--- a/                                                       (ino 257)
        |    |--- x/                                                  (ino 261)
        |    |--- y/                                                  (ino 262)
        |
        |--- c/                                                       (ino 260)
             |--- tmp/                                                (ino 258)
      
      1) When processing inode 258, we end up delaying its rename operation
         because it has an ancestor (in the send snapshot) that has a higher
         inode number (inode 260) which was also renamed in the send snapshot,
         therefore we delay the rename of inode 258 so that it happens after
         inode 260 is renamed;
      
      2) When processing inode 259, we end up delaying its deletion (rmdir
         operation) because it has a child inode (258) that has its rename
         operation delayed. At this point we allocate an orphan_dir_info
         structure and tag inode 258 so that we later attempt to see if we
         can delete (rmdir) inode 259 once inode 258 is renamed;
      
      3) When we process inode 260, after renaming it we finally do the rename
         operation for inode 258. Once we issue the rename operation for inode
         258 we notice that this inode was tagged so that we attempt to see
         if at this point we can delete (rmdir) inode 259. But at this point
         we can not still delete inode 259 because it has 2 children, inodes
         261 and 262, that were not yet processed and therefore not yet
         moved (renamed) away from inode 259. We end up not freeing the
         orphan_dir_info structure allocated in step 2;
      
      4) We process inodes 261 and 262, and once we move/rename inode 262
         we issue the rmdir operation for inode 260;
      
      5) We finish the send stream and notice that red black tree that
         contains orphan_dir_info structures is not empty, so we emit
         a warning and then free any orphan_dir_structures left.
      
      So fix this by freeing an orphan_dir_info structure once we try to
      apply a pending rename operation if we can not delete yet the tagged
      directory.
      
      A test case for fstests follows soon.
      Signed-off-by: NRobbie Ko <robbieko@synology.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      [Modified changelog to be more detailed and easier to understand]
      443f9d26
    • R
      Btrfs: incremental send, fix premature rmdir operations · 99ea42dd
      Robbie Ko 提交于
      Under certain situations, an incremental send operation can contain
      a rmdir operation that will make the receiving end fail when attempting
      to execute it, because the target directory is not yet empty.
      
      Consider the following example:
      
        Parent snapshot:
      
        .                                                             (ino 256)
        |--- a/                                                       (ino 257)
        |    |--- c/                                                  (ino 260)
        |
        |--- del/                                                     (ino 259)
              |--- tmp/                                               (ino 258)
              |--- x/                                                 (ino 261)
      
        Send snapshot:
      
        .                                                             (ino 256)
        |--- a/                                                       (ino 257)
        |    |--- x/                                                  (ino 261)
        |
        |--- c/                                                       (ino 260)
             |--- tmp/                                                (ino 258)
      
      1) When processing inode 258, we delay its rename operation because inode
         260 is its new parent in the send snapshot and it was not yet renamed
         (since 260 > 258, that is, beyond the current progress);
      
      2) When processing inode 259, we realize we can not yet send an rmdir
         operation (against inode 259) because inode 258 was still not yet
         renamed/moved away from inode 259. Therefore we update data structures
         so that after inode 258 is renamed, we try again to see if we can
         finally send an rmdir operation for inode 259;
      
      3) When we process inode 260, we send a rename operation for it followed
         by a rename operation for inode 258. Once we send the rename operation
         for inode 258 we then check if we can finally issue an rmdir for its
         previous parent, inode 259, by calling the can_rmdir() function with
         a value of sctx->cur_ino + 1 (260 + 1 = 261) for its "progress"
         argument. This makes can_rmdir() return true (value 1) because even
         though there's still a child inode of inode 259 that was not yet
         renamed/moved, which is inode 261, the given value of progress (261)
         is not lower then 261 (that is, not lower than the inode number of
         some child of inode 259). So we end up sending a rmdir operation for
         inode 259 before its child inode 261 is processed and renamed.
      
      So fix this by passing the correct progress value to the call to
      can_rmdir() from within apply_dir_move() (where we issue delayed rename
      operations), which should match stcx->cur_ino (the number of the inode
      currently being processed) and not sctx->cur_ino + 1.
      
      A test case for fstests follows soon.
      Signed-off-by: NRobbie Ko <robbieko@synology.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      [Rewrote change log to be more detailed, clear and well formatted]
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      99ea42dd
    • F
      Btrfs: incremental send, fix invalid paths for rename operations · 4122ea64
      Filipe Manana 提交于
      Example scenario:
      
        Parent snapshot:
      
        .                                                       (ino 277)
        |---- tmp/                                              (ino 278)
        |---- pre/                                              (ino 280)
        |      |---- wait_dir/                                  (ino 281)
        |
        |---- desc/                                             (ino 282)
        |---- ance/                                             (ino 283)
        |       |---- below_ance/                               (ino 279)
        |
        |---- other_dir/                                        (ino 284)
      
        Send snapshot:
      
        .                                                       (ino 277)
        |---- tmp/                                              (ino 278)
               |---- other_dir/                                 (ino 284)
                         |---- below_ance/                      (ino 279)
                         |            |---- pre/                (ino 280)
                         |
                         |---- wait_dir/                        (ino 281)
                                    |---- desc/                 (ino 282)
                                            |---- ance/         (ino 283)
      
      While computing the send stream the following steps happen:
      
      1) While processing inode 279 we end up delaying its rename operation
         because its new parent in the send snapshot, inode 284, was not
         yet processed and therefore not yet renamed;
      
      2) Later when processing inode 280 we end up renaming it immediately to
         "ance/below_once/pre" and not delay its rename operation because its
         new parent (inode 279 in the send snapshot) has its rename operation
         delayed and inode 280 is not an encestor of inode 279 (its parent in
         the send snapshot) in the parent snapshot;
      
      3) When processing inode 281 we end up delaying its rename operation
         because its new parent in the send snapshot, inode 284, was not yet
         processed and therefore not yet renamed;
      
      4) When processing inode 282 we do not delay its rename operation because
         its parent in the send snapshot, inode 281, already has its own rename
         operation delayed and our current inode (282) is not an ancestor of
         inode 281 in the parent snapshot. Therefore inode 282 is renamed to
         "ance/below_ance/pre/wait_dir";
      
      5) When processing inode 283 we realize that we can rename it because one
         of its ancestors in the send snapshot, inode 281, has its rename
         operation delayed and inode 283 is not an ancestor of inode 281 in the
         parent snapshot. So a rename operation to rename inode 283 to
         "ance/below_ance/pre/wait_dir/desc/ance" is issued. This path is
         invalid due to a missing path building loop that was undetected by
         the incremental send implementation, as inode 283 ends up getting
         included twice in the path (once with its path in the parent snapshot).
         Therefore its rename operation must wait before the ancestor inode 284
         is renamed.
      
      Fix this by not terminating the rename dependency checks when we find an
      ancestor, in the send snapshot, that has its rename operation delayed. So
      that we continue doing the same checks if the current inode is not an
      ancestor, in the parent snapshot, of an ancestor in the send snapshot we
      are processing in the loop.
      
      The problem and reproducer were reported by Robbie Ko, as part of a patch
      titled "Btrfs: incremental send, avoid ancestor rename to descendant".
      However the fix was unnecessarily complicated and can be addressed with
      much less code and effort.
      Reported-by: NRobbie Ko <robbieko@synology.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      4122ea64
    • F
      Btrfs: send, add missing error check for calls to path_loop() · 7969e77a
      Filipe Manana 提交于
      The function path_loop() can return a negative integer, signaling an
      error, 0 if there's no path loop and 1 if there's a path loop. We were
      treating any non zero values as meaning that a path loop exists. Fix
      this by explicitly checking for errors and gracefully return them to
      user space.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      7969e77a
    • R
      Btrfs: send, fix failure to move directories with the same name around · 801bec36
      Robbie Ko 提交于
      When doing an incremental send we can end up not moving directories that
      have the same name. This happens when the same parent directory has
      different child directories with the same name in the parent and send
      snapshots.
      
      For example, consider the following scenario:
      
        Parent snapshot:
      
        .                   (ino 256)
        |---- d/            (ino 257)
        |     |--- p1/      (ino 258)
        |
        |---- p1/           (ino 259)
      
        Send snapshot:
      
        .                    (ino 256)
        |--- d/              (ino 257)
             |--- p1/        (ino 259)
                   |--- p1/  (ino 258)
      
      The directory named "d" (inode 257) has in both snapshots an entry with
      the name "p1" but it refers to different inodes in both snapshots (inode
      258 in the parent snapshot and inode 259 in the send snapshot). When
      attempting to move inode 258, the operation is delayed because its new
      parent, inode 259, was not yet moved/renamed (as the stream is currently
      processing inode 258). Then when processing inode 259, we also end up
      delaying its move/rename operation so that it happens after inode 258 is
      moved/renamed. This decision to delay the move/rename rename operation
      of inode 259 is due to the fact that the new parent inode (257) still
      has inode 258 as its child, which has the same name has inode 259. So
      we end up with inode 258 move/rename operation waiting for inode's 259
      move/rename operation, which in turn it waiting for inode's 258
      move/rename. This results in ending the send stream without issuing
      move/rename operations for inodes 258 and 259 and generating the
      following warnings in syslog/dmesg:
      
      [148402.979747] ------------[ cut here ]------------
      [148402.980588] WARNING: CPU: 14 PID: 4117 at fs/btrfs/send.c:6177 btrfs_ioctl_send+0xe03/0xe51 [btrfs]
      [148402.981928] Modules linked in: btrfs crc32c_generic xor raid6_pq acpi_cpufreq tpm_tis ppdev tpm parport_pc psmouse parport sg pcspkr i2c_piix4 i2c_core evdev processor serio_raw button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring virtio e1000 scsi_mod floppy [last unloaded: btrfs]
      [148402.986999] CPU: 14 PID: 4117 Comm: btrfs Tainted: G        W       4.6.0-rc7-btrfs-next-31+ #1
      [148402.988136] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
      [148402.988136]  0000000000000000 ffff88022139fca8 ffffffff8126b42c 0000000000000000
      [148402.988136]  0000000000000000 ffff88022139fce8 ffffffff81052b14 000018212139fac8
      [148402.988136]  ffff88022b0db400 0000000000000000 0000000000000001 0000000000000000
      [148402.988136] Call Trace:
      [148402.988136]  [<ffffffff8126b42c>] dump_stack+0x67/0x90
      [148402.988136]  [<ffffffff81052b14>] __warn+0xc2/0xdd
      [148402.988136]  [<ffffffff81052beb>] warn_slowpath_null+0x1d/0x1f
      [148402.988136]  [<ffffffffa04bc831>] btrfs_ioctl_send+0xe03/0xe51 [btrfs]
      [148402.988136]  [<ffffffffa048b358>] btrfs_ioctl+0x14f/0x1f81 [btrfs]
      [148402.988136]  [<ffffffff8108e456>] ? arch_local_irq_save+0x9/0xc
      [148402.988136]  [<ffffffff8108eb51>] ? __lock_is_held+0x3c/0x57
      [148402.988136]  [<ffffffff8118da05>] vfs_ioctl+0x18/0x34
      [148402.988136]  [<ffffffff8118e00c>] do_vfs_ioctl+0x550/0x5be
      [148402.988136]  [<ffffffff81196f0c>] ? __fget+0x6b/0x77
      [148402.988136]  [<ffffffff81196fa1>] ? __fget_light+0x62/0x71
      [148402.988136]  [<ffffffff8118e0d1>] SyS_ioctl+0x57/0x79
      [148402.988136]  [<ffffffff8149e025>] entry_SYSCALL_64_fastpath+0x18/0xa8
      [148402.988136]  [<ffffffff8108e89d>] ? trace_hardirqs_off_caller+0x3f/0xaa
      [148403.011373] ---[ end trace a4539270c8056f8b ]---
      [148403.012296] ------------[ cut here ]------------
      [148403.013071] WARNING: CPU: 14 PID: 4117 at fs/btrfs/send.c:6194 btrfs_ioctl_send+0xe19/0xe51 [btrfs]
      [148403.014447] Modules linked in: btrfs crc32c_generic xor raid6_pq acpi_cpufreq tpm_tis ppdev tpm parport_pc psmouse parport sg pcspkr i2c_piix4 i2c_core evdev processor serio_raw button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring virtio e1000 scsi_mod floppy [last unloaded: btrfs]
      [148403.019708] CPU: 14 PID: 4117 Comm: btrfs Tainted: G        W       4.6.0-rc7-btrfs-next-31+ #1
      [148403.020104] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
      [148403.020104]  0000000000000000 ffff88022139fca8 ffffffff8126b42c 0000000000000000
      [148403.020104]  0000000000000000 ffff88022139fce8 ffffffff81052b14 000018322139fac8
      [148403.020104]  ffff88022b0db400 0000000000000000 0000000000000001 0000000000000000
      [148403.020104] Call Trace:
      [148403.020104]  [<ffffffff8126b42c>] dump_stack+0x67/0x90
      [148403.020104]  [<ffffffff81052b14>] __warn+0xc2/0xdd
      [148403.020104]  [<ffffffff81052beb>] warn_slowpath_null+0x1d/0x1f
      [148403.020104]  [<ffffffffa04bc847>] btrfs_ioctl_send+0xe19/0xe51 [btrfs]
      [148403.020104]  [<ffffffffa048b358>] btrfs_ioctl+0x14f/0x1f81 [btrfs]
      [148403.020104]  [<ffffffff8108e456>] ? arch_local_irq_save+0x9/0xc
      [148403.020104]  [<ffffffff8108eb51>] ? __lock_is_held+0x3c/0x57
      [148403.020104]  [<ffffffff8118da05>] vfs_ioctl+0x18/0x34
      [148403.020104]  [<ffffffff8118e00c>] do_vfs_ioctl+0x550/0x5be
      [148403.020104]  [<ffffffff81196f0c>] ? __fget+0x6b/0x77
      [148403.020104]  [<ffffffff81196fa1>] ? __fget_light+0x62/0x71
      [148403.020104]  [<ffffffff8118e0d1>] SyS_ioctl+0x57/0x79
      [148403.020104]  [<ffffffff8149e025>] entry_SYSCALL_64_fastpath+0x18/0xa8
      [148403.020104]  [<ffffffff8108e89d>] ? trace_hardirqs_off_caller+0x3f/0xaa
      [148403.038981] ---[ end trace a4539270c8056f8c ]---
      
      There's another issue caused by similar (but more complex) changes in the
      directory hierarchy that makes move/rename operations fail, described with
      the following example:
      
        Parent snapshot:
      
        .
        |---- a/                                                   (ino 262)
        |     |---- c/                                             (ino 268)
        |
        |---- d/                                                   (ino 263)
              |---- ance/                                          (ino 267)
                      |---- e/                                     (ino 264)
                      |---- f/                                     (ino 265)
                      |---- ance/                                  (ino 266)
      
        Send snapshot:
      
        .
        |---- a/                                                   (ino 262)
        |---- c/                                                   (ino 268)
        |     |---- ance/                                          (ino 267)
        |
        |---- d/                                                   (ino 263)
        |     |---- ance/                                          (ino 266)
        |
        |---- f/                                                   (ino 265)
              |---- e/                                             (ino 264)
      
      When the inode 265 is processed, the path for inode 267 is computed, which
      at that time corresponds to "d/ance", and it's stored in the names cache.
      Later on when processing inode 266, we end up orphanizing (renaming to a
      name matching the pattern o<ino>-<gen>-<seq>) inode 267 because it has
      the same name as inode 266 and it's currently a child of the new parent
      directory (inode 263) for inode 266. After the orphanization and while we
      are still processing inode 266, a rename operation for inode 266 is
      generated. However the source path for that rename operation is incorrect
      because it ends up using the old, pre-orphanization, name of inode 267.
      The no longer valid name for inode 267 was previously cached when
      processing inode 265 and it remains usable and considered valid until
      the inode currently being processed has a number greater than 267.
      This resulted in the receiving side failing with the following error:
      
        ERROR: rename d/ance/ance -> d/ance failed: No such file or directory
      
      So fix these issues by detecting such circular dependencies for rename
      operations and by clearing the cached name of an inode once the inode
      is orphanized.
      
      A test case for fstests will follow soon.
      Signed-off-by: NRobbie Ko <robbieko@synology.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      [Rewrote change log to be more detailed and organized, and improved
       comments]
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      801bec36
  6. 26 5月, 2016 1 次提交
  7. 06 5月, 2016 6 次提交
  8. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  9. 12 3月, 2016 1 次提交
  10. 11 2月, 2016 1 次提交
  11. 01 1月, 2016 1 次提交
    • F
      Btrfs: send, don't BUG_ON() when an empty symlink is found · a879719b
      Filipe Manana 提交于
      When a symlink is successfully created it always has an inline extent
      containing the source path. However if an error happens when creating
      the symlink, we can leave in the subvolume's tree a symlink inode without
      any such inline extent item - this happens if after btrfs_symlink() calls
      btrfs_end_transaction() and before it calls the inode eviction handler
      (through the final iput() call), the transaction gets committed and a
      crash happens before the eviction handler gets called, or if a snapshot
      of the subvolume is made before the eviction handler gets called. Sadly
      we can't just avoid this by making btrfs_symlink() call
      btrfs_end_transaction() after it calls the eviction handler, because the
      later can commit the current transaction before it removes any items from
      the subvolume tree (if it encounters ENOSPC errors while reserving space
      for removing all the items).
      
      So make send fail more gracefully, with an -EIO error, and print a
      message to dmesg/syslog informing that there's an empty symlink inode,
      so that the user can delete the empty symlink or do something else
      about it.
      Reported-by: NStephen R. van den Berg <srb@cuci.nl>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      a879719b
  12. 14 10月, 2015 1 次提交
    • R
      btrfs: fix resending received snapshot with parent · b96b1db0
      Robin Ruede 提交于
      This fixes a regression introduced by 37b8d27d between v4.1 and v4.2.
      
      When a snapshot is received, its received_uuid is set to the original
      uuid of the subvolume. When that snapshot is then resent to a third
      filesystem, it's received_uuid is set to the second uuid
      instead of the original one. The same was true for the parent_uuid.
      This behaviour was partially changed in 37b8d27d, but in that patch
      only the parent_uuid was taken from the real original,
      not the uuid itself, causing the search for the parent to fail in
      the case below.
      
      This happens for example when trying to send a series of linked
      snapshots (e.g. created by snapper) from the backup file system back
      to the original one.
      
      The following commands reproduce the issue in v4.2.1
      (no error in 4.1.6)
      
          # setup three test file systems
          for i in 1 2 3; do
      	    truncate -s 50M fs$i
      	    mkfs.btrfs fs$i
      	    mkdir $i
      	    mount fs$i $i
          done
          echo "content" > 1/testfile
          btrfs su snapshot -r 1/ 1/snap1
          echo "changed content" > 1/testfile
          btrfs su snapshot -r 1/ 1/snap2
      
          # works fine:
          btrfs send 1/snap1 | btrfs receive 2/
          btrfs send -p 1/snap1 1/snap2 | btrfs receive 2/
      
          # ERROR: could not find parent subvolume
          btrfs send 2/snap1 | btrfs receive 3/
          btrfs send -p 2/snap1 2/snap2 | btrfs receive 3/
      Signed-off-by: NRobin Ruede <rruede+git@gmail.com>
      Fixes: 37b8d27d ("Btrfs: use received_uuid of parent during send")
      Cc: stable@vger.kernel.org # v4.2+
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Tested-by: NEd Tomlinson <edt@aei.ca>
      b96b1db0
  13. 13 10月, 2015 1 次提交
    • F
      Btrfs: send, fix file corruption due to incorrect cloning operations · d906d49f
      Filipe Manana 提交于
      If we have a file that shares an extent with other files, when processing
      the extent item relative to a shared extent, we blindly issue a clone
      operation that will target a length matching the length in the extent item
      and uses as a source some other file the receiver already has and points
      to the same extent. However that range in the other file might not
      exclusively point only to the shared extent, and so using that length
      will result in the receiver getting a file with different data from the
      one in the send snapshot. This issue happened both for incremental and
      full send operations.
      
      So fix this by issuing clone operations with lengths that don't cover
      regions of the source file that point to different extents (or have holes).
      
      The following test case for fstests reproduces the problem.
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
      
        tmp=/tmp/$$
        status=1	# failure is the default!
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        _cleanup()
        {
            rm -fr $send_files_dir
            rm -f $tmp.*
        }
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
      
        # real QA test starts here
        _supported_fs btrfs
        _supported_os Linux
        _require_scratch
        _need_to_be_root
        _require_cp_reflink
        _require_xfs_io_command "fpunch"
      
        send_files_dir=$TEST_DIR/btrfs-test-$seq
      
        rm -f $seqres.full
        rm -fr $send_files_dir
        mkdir $send_files_dir
      
        _scratch_mkfs >>$seqres.full 2>&1
        _scratch_mount
      
        # Create our test file with a single 100K extent.
        $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 100K" \
           $SCRATCH_MNT/foo | _filter_xfs_io
      
        # Clone our file into a new file named bar.
        cp --reflink=always $SCRATCH_MNT/foo $SCRATCH_MNT/bar
      
        # Now overwrite parts of our foo file.
        $XFS_IO_PROG -c "pwrite -S 0xbb 50K 10K" \
           -c "pwrite -S 0xcc 90K 10K" \
           -c "fpunch 70K 10k" \
           $SCRATCH_MNT/foo | _filter_xfs_io
      
        _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT \
           $SCRATCH_MNT/snap
      
        echo "File digests in the original filesystem:"
        md5sum $SCRATCH_MNT/snap/foo | _filter_scratch
        md5sum $SCRATCH_MNT/snap/bar | _filter_scratch
      
        _run_btrfs_util_prog send $SCRATCH_MNT/snap -f $send_files_dir/1.snap
      
        # Now recreate the filesystem by receiving the send stream and verify
        # we get the same file contents that the original filesystem had.
        _scratch_unmount
        _scratch_mkfs >>$seqres.full 2>&1
        _scratch_mount
      
        _run_btrfs_util_prog receive $SCRATCH_MNT -f $send_files_dir/1.snap
      
        # We expect the destination filesystem to have exactly the same file
        # data as the original filesystem.
        # The btrfs send implementation had a bug where it sent a clone
        # operation from file foo into file bar covering the whole [0, 100K[
        # range after creating and writing the file foo. This was incorrect
        # because the file bar now included the updates done to file foo after
        # we cloned foo to bar, breaking the COW nature of reflink copies
        # (cloned extents).
        echo "File digests in the new filesystem:"
        md5sum $SCRATCH_MNT/snap/foo | _filter_scratch
        md5sum $SCRATCH_MNT/snap/bar | _filter_scratch
      
        status=0
        exit
      
      Another test case that reproduces the problem when we have compressed
      extents:
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
      
        tmp=/tmp/$$
        status=1	# failure is the default!
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        _cleanup()
        {
            rm -fr $send_files_dir
            rm -f $tmp.*
        }
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
      
        # real QA test starts here
        _supported_fs btrfs
        _supported_os Linux
        _require_scratch
        _need_to_be_root
        _require_cp_reflink
      
        send_files_dir=$TEST_DIR/btrfs-test-$seq
      
        rm -f $seqres.full
        rm -fr $send_files_dir
        mkdir $send_files_dir
      
        _scratch_mkfs >>$seqres.full 2>&1
        _scratch_mount "-o compress"
      
        # Create our file with an extent of 100K starting at file offset 0K.
        $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 100K"       \
                        -c "fsync"                        \
                        $SCRATCH_MNT/foo | _filter_xfs_io
      
        # Rewrite part of the previous extent (its first 40K) and write a new
        # 100K extent starting at file offset 100K.
        $XFS_IO_PROG -c "pwrite -S 0xbb 0K 40K"    \
                -c "pwrite -S 0xcc 100K 100K"      \
                $SCRATCH_MNT/foo | _filter_xfs_io
      
        # Our file foo now has 3 file extent items in its metadata:
        #
        # 1) One covering the file range 0 to 40K;
        # 2) One covering the file range 40K to 100K, which points to the first
        #    extent we wrote to the file and has a data offset field with value
        #    40K (our file no longer uses the first 40K of data from that
        #    extent);
        # 3) One covering the file range 100K to 200K.
      
        # Now clone our file foo into file bar.
        cp --reflink=always $SCRATCH_MNT/foo $SCRATCH_MNT/bar
      
        # Create our snapshot for the send operation.
        _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT \
                $SCRATCH_MNT/snap
      
        echo "File digests in the original filesystem:"
        md5sum $SCRATCH_MNT/snap/foo | _filter_scratch
        md5sum $SCRATCH_MNT/snap/bar | _filter_scratch
      
        _run_btrfs_util_prog send $SCRATCH_MNT/snap -f $send_files_dir/1.snap
      
        # Now recreate the filesystem by receiving the send stream and verify we
        # get the same file contents that the original filesystem had.
        # Btrfs send used to issue a clone operation from foo's range
        # [80K, 140K[ to bar's range [40K, 100K[ when cloning the extent pointed
        # to by foo's second file extent item, this was incorrect because of bad
        # accounting of the file extent item's data offset field. The correct
        # range to clone from should have been [40K, 100K[.
        _scratch_unmount
        _scratch_mkfs >>$seqres.full 2>&1
        _scratch_mount "-o compress"
      
        _run_btrfs_util_prog receive $SCRATCH_MNT -f $send_files_dir/1.snap
      
        echo "File digests in the new filesystem:"
        # Must match the digests we got in the original filesystem.
        md5sum $SCRATCH_MNT/snap/foo | _filter_scratch
        md5sum $SCRATCH_MNT/snap/bar | _filter_scratch
      
        status=0
        exit
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      d906d49f
  14. 08 10月, 2015 1 次提交
  15. 06 10月, 2015 1 次提交
    • F
      Btrfs: send, fix corner case for reference overwrite detection · b786f16a
      Filipe Manana 提交于
      When the inode given to did_overwrite_ref() matches the current progress
      and has a reference that collides with the reference of other inode that
      has the same number as the current progress, we were always telling our
      caller that the inode's reference was overwritten, which is incorrect
      because the other inode might be a new inode (different generation number)
      in which case we must return false from did_overwrite_ref() so that its
      callers don't use an orphanized path for the inode (as it will never be
      orphanized, instead it will be unlinked and the new inode created later).
      
      The following test case for fstests reproduces the issue:
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
      
        tmp=/tmp/$$
        status=1	# failure is the default!
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        _cleanup()
        {
            rm -fr $send_files_dir
            rm -f $tmp.*
        }
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
      
        # real QA test starts here
        _supported_fs btrfs
        _supported_os Linux
        _require_scratch
        _need_to_be_root
      
        send_files_dir=$TEST_DIR/btrfs-test-$seq
      
        rm -f $seqres.full
        rm -fr $send_files_dir
        mkdir $send_files_dir
      
        _scratch_mkfs >>$seqres.full 2>&1
        _scratch_mount
      
        # Create our test file with a single extent of 64K.
        mkdir -p $SCRATCH_MNT/foo
        $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 64K" $SCRATCH_MNT/foo/bar \
            | _filter_xfs_io
      
        _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT \
            $SCRATCH_MNT/mysnap1
        _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT \
            $SCRATCH_MNT/mysnap2
      
        echo "File digest before being replaced:"
        md5sum $SCRATCH_MNT/mysnap1/foo/bar | _filter_scratch
      
        # Remove the file and then create a new one in the same location with
        # the same name but with different content. This new file ends up
        # getting the same inode number as the previous one, because that inode
        # number was the highest inode number used by the snapshot's root and
        # therefore when attempting to find the a new inode number for the new
        # file, we end up reusing the same inode number. This happens because
        # currently btrfs uses the highest inode number summed by 1 for the
        # first inode created once a snapshot's root is loaded (done at
        # fs/btrfs/inode-map.c:btrfs_find_free_objectid in the linux kernel
        # tree).
        # Having these two different files in the snapshots with the same inode
        # number (but different generation numbers) caused the btrfs send code
        # to emit an incorrect path for the file when issuing an unlink
        # operation because it failed to realize they were different files.
        rm -f $SCRATCH_MNT/mysnap2/foo/bar
        $XFS_IO_PROG -f -c "pwrite -S 0xbb 0 96K" \
            $SCRATCH_MNT/mysnap2/foo/bar | _filter_xfs_io
      
        _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT/mysnap2 \
            $SCRATCH_MNT/mysnap2_ro
      
        _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $send_files_dir/1.snap
        _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 \
            $SCRATCH_MNT/mysnap2_ro -f $send_files_dir/2.snap
      
        echo "File digest in the original filesystem after being replaced:"
        md5sum $SCRATCH_MNT/mysnap2_ro/foo/bar | _filter_scratch
      
        # Now recreate the filesystem by receiving both send streams and verify
        # we get the same file contents that the original filesystem had.
        _scratch_unmount
        _scratch_mkfs >>$seqres.full 2>&1
        _scratch_mount
      
        _run_btrfs_util_prog receive -vv $SCRATCH_MNT -f $send_files_dir/1.snap
        _run_btrfs_util_prog receive -vv $SCRATCH_MNT -f $send_files_dir/2.snap
      
        echo "File digest in the new filesystem:"
        # Must match the digest from the new file.
        md5sum $SCRATCH_MNT/mysnap2_ro/foo/bar | _filter_scratch
      
        status=0
        exit
      Reported-by: NMartin Raiber <martin@urbackup.org>
      Fixes: 8b191a68 ("Btrfs: incremental send, check if orphanized dir inode needs delayed rename")
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      b786f16a
  16. 13 6月, 2015 1 次提交
    • J
      Btrfs: use received_uuid of parent during send · 37b8d27d
      Josef Bacik 提交于
      Neil Horman pointed out a problem where if he did something like this
      
      receive A
      snap A B
      change B
      send -p A B
      
      and then on another box do
      
      recieve A
      receive B
      
      the receive B would fail because we use the UUID of A for the clone sources for
      B.  This makes sense most of the time because normally you are sending from the
      original sources, not a received source.  However when you use a recieved subvol
      its UUID is going to be something completely different, so if you then try to
      receive the diff on a different volume it won't find the UUID because the new A
      will be something else.  The only constant is the received uuid.  So instead
      check to see if we have received_uuid set on the root, and if so use that as the
      clone source, as btrfs receive looks for matches either in received_uuid or
      uuid.  Thanks,
      Reported-by: NNeil Horman <nhorman@redhat.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Reviewed-by: NHugo Mills <hugo@carfax.org.uk>
      Signed-off-by: NChris Mason <clm@fb.com>
      37b8d27d
  17. 03 6月, 2015 3 次提交
    • F
      Btrfs: incremental send, fix clone operations for compressed extents · 619d8c4e
      Filipe Manana 提交于
      Marc reported a problem where the receiving end of an incremental send
      was performing clone operations that failed with -EINVAL. This happened
      because, unlike for uncompressed extents, we were not checking if the
      source clone offset and length, after summing the data offset, falls
      within the source file's boundaries.
      
      So make sure we do such checks when attempting to issue clone operations
      for compressed extents.
      
      Problem reproducible with the following steps:
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount -o compress /dev/sdb /mnt
        $ mkfs.btrfs -f /dev/sdc
        $ mount -o compress /dev/sdc /mnt2
      
        # Create the file with a single extent of 128K. This creates a metadata file
        # extent item with a data start offset of 0 and a logical length of 128K.
        $ xfs_io -f -c "pwrite -S 0xaa 64K 128K" -c "fsync" /mnt/foo
      
        # Now rewrite the range 64K to 112K of our file. This will make the inode's
        # metadata continue to point to the 128K extent we created before, but now
        # with an extent item that points to the extent with a data start offset of
        # 112K and a logical length of 16K.
        # That metadata file extent item is associated with the logical file offset
        # at 176K and covers the logical file range 176K to 192K.
        $ xfs_io -c "pwrite -S 0xbb 64K 112K" -c "fsync" /mnt/foo
      
        # Now rewrite the range 180K to 12K. This will make the inode's metadata
        # continue to point the the 128K extent we created earlier, with a single
        # extent item that points to it with a start offset of 112K and a logical
        # length of 4K.
        # That metadata file extent item is associated with the logical file offset
        # at 176K and covers the logical file range 176K to 180K.
        $ xfs_io -c "pwrite -S 0xcc 180K 12K" -c "fsync" /mnt/foo
      
        $ btrfs subvolume snapshot -r /mnt /mnt/snap1
      
        $ touch /mnt/bar
        # Calls the btrfs clone ioctl.
        $ ~/xfstests/src/cloner -s $((176 * 1024)) -d $((176 * 1024)) \
          -l $((4 * 1024)) /mnt/foo /mnt/bar
      
        $ btrfs subvolume snapshot -r /mnt /mnt/snap2
      
        $ btrfs send /mnt/snap1 | btrfs receive /mnt2
        At subvol /mnt/snap1
        At subvol snap1
      
        $ btrfs send -p /mnt/snap1 /mnt/snap2 | btrfs receive /mnt2
        At subvol /mnt/snap2
        At snapshot snap2
        ERROR: failed to clone extents to bar
        Invalid argument
      
      A test case for fstests follows soon.
      Reported-by: NMarc MERLIN <marc@merlins.org>
      Tested-by: NMarc MERLIN <marc@merlins.org>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Tested-by: NDavid Sterba <dsterba@suse.cz>
      Tested-by: NJan Alexander Steffens (heftig) <jan.steffens@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      619d8c4e
    • F
      Btrfs: incremental send, check if orphanized dir inode needs delayed rename · 8b191a68
      Filipe Manana 提交于
      If a directory inode is orphanized, because some inode previously
      processed has a new name that collides with the old name of the current
      inode, we need to check if it needs its rename operation delayed too,
      as its ancestor-descendent relationship with some other inode might
      have been reversed between the parent and send snapshots and therefore
      its rename operation needs to happen after that other inode is renamed.
      
      For example, for the following reproducer where this is needed (provided
      by Robbie Ko):
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt
        $ mkfs.btrfs -f /dev/sdc
        $ mount /dev/sdc /mnt2
      
        $ mkdir -p /mnt/data/n1/n2
        $ mkdir /mnt/data/n4
        $ mkdir -p /mnt/data/t6/t7
        $ mkdir /mnt/data/t5
        $ mkdir /mnt/data/t7
        $ mkdir /mnt/data/n4/t2
        $ mkdir /mnt/data/t4
        $ mkdir /mnt/data/t3
        $ mv /mnt/data/t7 /mnt/data/n4/t2
        $ mv /mnt/data/t4 /mnt/data/n4/t2/t7
        $ mv /mnt/data/t5 /mnt/data/n4/t2/t7/t4
        $ mv /mnt/data/t6 /mnt/data/n4/t2/t7/t4/t5
        $ mv /mnt/data/n1/n2 /mnt/data/n4/t2/t7/t4/t5/t6
        $ mv /mnt/data/n1 /mnt/data/n4/t2/t7/t4/t5/t6
        $ mv /mnt/data/n4/t2/t7/t4/t5/t6/t7 /mnt/data/n4/t2/t7/t4/t5/t6/n2
        $ mv /mnt/data/t3 /mnt/data/n4/t2/t7/t4/t5/t6/n2/t7
      
        $ btrfs subvolume snapshot -r /mnt /mnt/snap1
      
        $ mv /mnt/data/n4/t2/t7/t4/t5/t6/n1 /mnt/data/n4
        $ mv /mnt/data/n4/t2 /mnt/data/n4/n1
        $ mv /mnt/data/n4/n1/t2/t7/t4/t5/t6/n2 /mnt/data/n4/n1/t2
        $ mv /mnt/data/n4/n1/t2/n2/t7/t3 /mnt/data/n4/n1/t2
        $ mv /mnt/data/n4/n1/t2/t7/t4/t5/t6 /mnt/data/n4/n1/t2
        $ mv /mnt/data/n4/n1/t2/t7/t4 /mnt/data/n4/n1/t2/t6
        $ mv /mnt/data/n4/n1/t2/t7 /mnt/data/n4/n1/t2/t3
        $ mv /mnt/data/n4/n1/t2/n2/t7 /mnt/data/n4/n1/t2
      
        $ btrfs subvolume snapshot -r /mnt /mnt/snap2
      
        $ btrfs send /mnt/snap1 | btrfs receive /mnt2
        $ btrfs send -p /mnt/snap1 /mnt/snap2 | btrfs receive /mnt2
        ERROR: send ioctl failed with -12: Cannot allocate memory
      
      Where the parent snapshot directory hierarchy is the following:
      
        .                                                        (ino 256)
        |-- data/                                                (ino 257)
              |-- n4/                                            (ino 260)
                   |-- t2/                                       (ino 265)
                        |-- t7/                                  (ino 264)
                             |-- t4/                             (ino 266)
                                  |-- t5/                        (ino 263)
                                       |-- t6/                   (ino 261)
                                            |-- n1/              (ino 258)
                                            |-- n2/              (ino 259)
                                                 |-- t7/         (ino 262)
                                                      |-- t3/    (ino 267)
      
      And the send snapshot's directory hierarchy is the following:
      
        .                                                        (ino 256)
        |-- data/                                                (ino 257)
              |-- n4/                                            (ino 260)
                   |-- n1/                                       (ino 258)
                        |-- t2/                                  (ino 265)
                             |-- n2/                             (ino 259)
                             |-- t3/                             (ino 267)
                             |    |-- t7                         (ino 264)
                             |
                             |-- t6/                             (ino 261)
                             |    |-- t4/                        (ino 266)
                             |         |-- t5/                   (ino 263)
                             |
                             |-- t7/                             (ino 262)
      
      While processing inode 262 we orphanize inode 264 and later attempt
      to rename inode 264 to its new name/location, which resulted in building
      an incorrect destination path string for the rename operation with the
      value "data/n4/t2/t7/t4/t5/t6/n2/t7/t3/t7". This rename operation must
      have been done only after inode 267 is processed and renamed, as the
      ancestor-descendent relationship between inodes 264 and 267 was reversed
      between both snapshots, because otherwise it results in an infinite loop
      when building the path string for inode 264 when we are processing an
      inode with a number larger than 264. That loop is the following:
      
        start inode 264, send progress of 265 for example
        parent of 264 -> 267
        parent of 267 -> 262
        parent of 262 -> 259
        parent of 259 -> 261
        parent of 261 -> 263
        parent of 263 -> 266
        parent of 266 -> 264
          |--> back to first iteration while current path string length
               is <= PATH_MAX, and fail with -ENOMEM otherwise
      
      So fix this by making the check if we need to delay a directory rename
      regardless of the current inode having been orphanized or not.
      
      A test case for fstests follows soon.
      
      Thanks to Robbie Ko for providing a reproducer for this problem.
      Reported-by: NRobbie Ko <robbieko@synology.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      8b191a68
    • F
      Btrfs: incremental send, don't delay directory renames unnecessarily · 80aa6027
      Filipe Manana 提交于
      Even though we delay the rename of directories when they become
      descendents of other directories that were also renamed in the send
      root to prevent infinite path build loops, we were doing it in cases
      where this was not needed and was actually harmful resulting in
      infinite path build loops as we ended up with a circular dependency
      of delayed directory renames.
      
      Consider the following reproducer:
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt
        $ mkfs.btrfs -f /dev/sdc
        $ mount /dev/sdc /mnt2
      
        $ mkdir /mnt/data
        $ mkdir /mnt/data/n1
        $ mkdir /mnt/data/n1/n2
        $ mkdir /mnt/data/n4
        $ mkdir /mnt/data/n1/n2/p1
        $ mkdir /mnt/data/n1/n2/p1/p2
        $ mkdir /mnt/data/t6
        $ mkdir /mnt/data/t7
        $ mkdir -p /mnt/data/t5/t7
        $ mkdir /mnt/data/t2
        $ mkdir /mnt/data/t4
        $ mkdir -p /mnt/data/t1/t3
        $ mkdir /mnt/data/p1
        $ mv /mnt/data/t1 /mnt/data/p1
        $ mkdir -p /mnt/data/p1/p2
        $ mv /mnt/data/t4 /mnt/data/p1/p2/t1
        $ mv /mnt/data/t5 /mnt/data/n4/t5
        $ mv /mnt/data/n1/n2/p1/p2 /mnt/data/n4/t5/p2
        $ mv /mnt/data/t7 /mnt/data/n4/t5/p2/t7
        $ mv /mnt/data/t2 /mnt/data/n4/t1
        $ mv /mnt/data/p1 /mnt/data/n4/t5/p2/p1
        $ mv /mnt/data/n1/n2 /mnt/data/n4/t5/p2/p1/p2/n2
        $ mv /mnt/data/n4/t5/p2/p1/p2/t1 /mnt/data/n4/t5/p2/p1/p2/n2/t1
        $ mv /mnt/data/n4/t5/t7 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t7
        $ mv /mnt/data/n4/t5/p2/p1/t1/t3 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t3
        $ mv /mnt/data/n4/t5/p2/p1/p2/n2/p1 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t7/p1
        $ mv /mnt/data/t6 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t3/t5
        $ mv /mnt/data/n4/t5/p2/p1/t1 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t3/t1
        $ mv /mnt/data/n1 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t7/p1/n1
      
        $ btrfs subvolume snapshot -r /mnt /mnt/snap1
      
        $ mv /mnt/data/n4/t1 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t7/p1/t1
        $ mv /mnt/data/n4/t5/p2/p1/p2/n2/t1 /mnt/data/n4/
        $ mv /mnt/data/n4/t5/p2/p1/p2/n2 /mnt/data/n4/t1/n2
        $ mv /mnt/data/n4/t1/t7/p1 /mnt/data/n4/t1/n2/p1
        $ mv /mnt/data/n4/t1/t3/t1 /mnt/data/n4/t1/n2/t1
        $ mv /mnt/data/n4/t1/t3 /mnt/data/n4/t1/n2/t1/t3
        $ mv /mnt/data/n4/t5/p2/p1/p2 /mnt/data/n4/t1/n2/p1/p2
        $ mv /mnt/data/n4/t1/t7 /mnt/data/n4/t1/n2/p1/t7
        $ mv /mnt/data/n4/t5/p2/p1 /mnt/data/n4/t1/n2/p1/p2/p1
        $ mv /mnt/data/n4/t1/n2/t1/t3/t5 /mnt/data/n4/t1/n2/p1/p2/t5
        $ mv /mnt/data/n4/t5 /mnt/data/n4/t1/n2/p1/p2/p1/t5
        $ mv /mnt/data/n4/t1/n2/p1/p2/p1/t5/p2 /mnt/data/n4/t1/n2/p1/p2/p1/p2
        $ mv /mnt/data/n4/t1/n2/p1/p2/p1/p2/t7 /mnt/data/n4/t1/t7
      
        $ btrfs subvolume snapshot -r /mnt /mnt/snap2
      
        $ btrfs send /mnt/snap1 | btrfs receive /mnt2
        $ btrfs send -p /mnt/snap1 /mnt/snap2 | btrfs receive -vv /mnt2
        ERROR: send ioctl failed with -12: Cannot allocate memory
      
      This reproducer resulted in an infinite path build loop when building the
      path for inode 266 because the following circular dependency of delayed
      directory renames was created:
      
         ino 272 <- ino 261 <- ino 259 <- ino 268 <- ino 267 <- ino 261
      
      Where the notation "X <- Y" means the rename of inode X is delayed by the
      rename of inode Y (X will be renamed after Y is renamed). This resulted
      in an infinite path build loop of inode 266 because that inode has inode
      261 as an ancestor in the send root and inode 261 is in the circular
      dependency of delayed renames listed above.
      
      Fix this by not delaying the rename of a directory inode if an ancestor of
      the inode in the send root, which has a delayed rename operation, is not
      also a descendent of the inode in the parent root.
      
      Thanks to Robbie Ko for sending the reproducer example.
      A test case for xfstests follows soon.
      Reported-by: NRobbie Ko <robbieko@synology.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      80aa6027
  18. 27 3月, 2015 4 次提交
    • F
      Btrfs: incremental send, remove dead code · 5f806c3a
      Filipe Manana 提交于
      The logic to detect path loops when attempting to apply a pending
      directory rename, introduced in commit
      f959492f (Btrfs: send, fix more issues related to directory renames)
      is no longer needed, and the respective fstests test case for that commit,
      btrfs/045, now passes without this code (as well as all the other test
      cases for send/receive).
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      5f806c3a
    • F
      Btrfs: incremental send, clear name from cache after orphanization · 8996a48c
      Filipe Manana 提交于
      If a directory's reference ends up being orphanized, because the inode
      currently being processed has a new path that matches that directory's
      path, make sure we evict the name of the directory from the name cache.
      This is because there might be descendent inodes (either directories or
      regular files) that will be orphanized later too, and therefore the
      orphan name of the ancestor must be used, otherwise we send issue rename
      operations with a wrong path in the send stream.
      
      Reproducer:
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt
      
        $ mkdir -p /mnt/data/n1/n2/p1/p2
        $ mkdir /mnt/data/n4
        $ mkdir -p /mnt/data/p1/p2
      
        $ btrfs subvolume snapshot -r /mnt /mnt/snap1
      
        $ mv /mnt/data/p1/p2 /mnt/data
        $ mv /mnt/data/n1/n2/p1/p2 /mnt/data/p1
        $ mv /mnt/data/p2 /mnt/data/n1/n2/p1
        $ mv /mnt/data/n1/n2 /mnt/data/p1
        $ mv /mnt/data/p1 /mnt/data/n4
        $ mv /mnt/data/n4/p1/n2/p1 /mnt/data
      
        $ btrfs subvolume snapshot -r /mnt /mnt/snap2
      
        $ btrfs send /mnt/snap1 -f /tmp/1.send
        $ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/2.send
      
        $ mkfs.btrfs -f /dev/sdc
        $ mount /dev/sdc /mnt2
        $ btrfs receive /mnt2 -f /tmp/1.send
        $ btrfs receive /mnt2 -f /tmp/2.send
        ERROR: rename data/p1/p2 -> data/n4/p1/p2 failed. no such file or directory
      
      Directories data/p1 (inode 263) and data/p1/p2 (inode 264) in the parent
      snapshot are both orphanized during the incremental send, and as soon as
      data/p1 is orphanized, we must make sure that when orphanizing data/p1/p2
      we use a source path of o263-6-o/p2 for the rename operation instead of
      the old path data/p1/p2 (the one before the orphanization of inode 263).
      
      A test case for xfstests follows soon.
      Reported-by: NRobbie Ko <robbieko@synology.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      8996a48c
    • F
      Btrfs: send, don't leave without decrementing clone root's send_progress · 2f1f465a
      Filipe Manana 提交于
      If the clone root was not readonly or the dead flag was set on it, we were
      leaving without decrementing the root's send_progress counter (and before
      we just incremented it). If a concurrent snapshot deletion was in progress
      and ended up being aborted, it would be impossible to later attempt to
      delete again the snapshot, since the root's send_in_progress counter could
      never go back to 0.
      
      We were also setting clone_sources_to_rollback to i + 1 too early - if we
      bailed out because the clone root we got is not readonly or flagged as dead
      we ended up later derreferencing a null pointer because we didn't assign
      the clone root to sctx->clone_roots[i].root:
      
      		for (i = 0; sctx && i < clone_sources_to_rollback; i++)
      			btrfs_root_dec_send_in_progress(
      					sctx->clone_roots[i].root);
      
      So just don't increment the send_in_progress counter if the root is readonly
      or flagged as dead.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      2f1f465a
    • F
      Btrfs: send, add missing check for dead clone root · 5cc2b17e
      Filipe Manana 提交于
      After we locked the root's root item, a concurrent snapshot deletion
      call might have set the dead flag on it. So check if the dead flag
      is set and abort if it is, just like we do for the parent root.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      5cc2b17e
  19. 03 3月, 2015 1 次提交
    • F
      Btrfs: incremental send, don't rename a directory too soon · 84471e24
      Filipe Manana 提交于
      There's one more case where we can't issue a rename operation for a
      directory as soon as we process it. We used to delay directory renames
      only if they have some ancestor directory with a higher inode number
      that got renamed too, but there's another case where we need to delay
      the rename too - when a directory A is renamed to the old name of a
      directory B but that directory B has its rename delayed because it
      has now (in the send root) an ancestor with a higher inode number that
      was renamed. If we don't delay the directory rename in this case, the
      receiving end of the send stream will attempt to rename A to the old
      name of B before B got renamed to its new name, which results in a
      "directory not empty" error. So fix this by delaying directory renames
      for this case too.
      
      Steps to reproduce:
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt
      
        $ mkdir /mnt/a
        $ mkdir /mnt/b
        $ mkdir /mnt/c
        $ touch /mnt/a/file
      
        $ btrfs subvolume snapshot -r /mnt /mnt/snap1
      
        $ mv /mnt/c /mnt/x
        $ mv /mnt/a /mnt/x/y
        $ mv /mnt/b /mnt/a
      
        $ btrfs subvolume snapshot -r /mnt /mnt/snap2
      
        $ btrfs send /mnt/snap1 -f /tmp/1.send
        $ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/2.send
      
        $ mkfs.btrfs -f /dev/sdc
        $ mount /dev/sdc /mnt2
        $ btrfs receive /mnt2 -f /tmp/1.send
        $ btrfs receive /mnt2 -f /tmp/2.send
        ERROR: rename b -> a failed. Directory not empty
      
      A test case for xfstests follows soon.
      Reported-by: NAmes Cornish <ames@cornishes.net>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      84471e24
  20. 03 2月, 2015 1 次提交
  21. 25 11月, 2014 1 次提交