1. 03 6月, 2015 7 次提交
    • F
      Btrfs: fix -ENOSPC on block group removal · 39c2d7fa
      Filipe Manana 提交于
      Unlike when attempting to allocate a new block group, where we check
      that we have enough space in the system space_info to update the device
      items and insert a new chunk item in the chunk tree, we were not checking
      if the system space_info had enough space for updating the device items
      and deleting the chunk item in the chunk tree. This often lead to -ENOSPC
      error when attempting to allocate blocks for the chunk tree (during btree
      node/leaf COW operations) while updating the device items or deleting the
      chunk item, which resulted in the current transaction being aborted and
      turning the filesystem into read-only mode.
      
      While running fstests generic/038, which stresses allocation of block
      groups and removal of unused block groups, with a large scratch device
      (750Gb) this happened often, despite more than enough unallocated space,
      and resulted in the following trace:
      
      [68663.586604] WARNING: CPU: 3 PID: 1521 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x52/0x114 [btrfs]()
      [68663.600407] BTRFS: Transaction aborted (error -28)
      (...)
      [68663.730829] Call Trace:
      [68663.732585]  [<ffffffff8142fa46>] dump_stack+0x4f/0x7b
      [68663.734334]  [<ffffffff8108b6a2>] ? console_unlock+0x361/0x3ad
      [68663.739980]  [<ffffffff81045ea5>] warn_slowpath_common+0xa1/0xbb
      [68663.757153]  [<ffffffffa036ca6d>] ? __btrfs_abort_transaction+0x52/0x114 [btrfs]
      [68663.760925]  [<ffffffff81045f05>] warn_slowpath_fmt+0x46/0x48
      [68663.762854]  [<ffffffffa03b159d>] ? btrfs_update_device+0x15a/0x16c [btrfs]
      [68663.764073]  [<ffffffffa036ca6d>] __btrfs_abort_transaction+0x52/0x114 [btrfs]
      [68663.765130]  [<ffffffffa03b3638>] btrfs_remove_chunk+0x597/0x5ee [btrfs]
      [68663.765998]  [<ffffffffa0384663>] ? btrfs_delete_unused_bgs+0x245/0x296 [btrfs]
      [68663.767068]  [<ffffffffa0384676>] btrfs_delete_unused_bgs+0x258/0x296 [btrfs]
      [68663.768227]  [<ffffffff8143527f>] ? _raw_spin_unlock_irq+0x2d/0x4c
      [68663.769081]  [<ffffffffa038b109>] cleaner_kthread+0x13d/0x16c [btrfs]
      [68663.799485]  [<ffffffffa038afcc>] ? btrfs_alloc_root+0x28/0x28 [btrfs]
      [68663.809208]  [<ffffffff8105f367>] kthread+0xef/0xf7
      [68663.828795]  [<ffffffff810e603f>] ? time_hardirqs_on+0x15/0x28
      [68663.844942]  [<ffffffff8105f278>] ? __kthread_parkme+0xad/0xad
      [68663.846486]  [<ffffffff81435a88>] ret_from_fork+0x58/0x90
      [68663.847760]  [<ffffffff8105f278>] ? __kthread_parkme+0xad/0xad
      [68663.849503] ---[ end trace 798477c6d6dbaad6 ]---
      [68663.850525] BTRFS: error (device sdc) in btrfs_remove_chunk:2652: errno=-28 No space left
      
      So fix this by verifying that enough space exists in system space_info,
      and reserving the space in the chunk block reserve, before attempting to
      delete the block group and allocate a new system chunk if we don't have
      enough space to perform the necessary updates and delete in the chunk
      tree. Like for the block group creation case, we don't error our if we
      fail to allocate a new system chunk, since we might end up not needing
      it (no node/leaf splits happen during the COW operations and/or we end
      up not needing to COW any btree nodes or leafs because they were already
      COWed in the current transaction and their writeback didn't start yet).
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      39c2d7fa
    • F
      Btrfs: fix chunk allocation regression leading to transaction abort · c152b63e
      Filipe Manana 提交于
      With commit 1b984508 ("Btrfs: fix find_free_dev_extent() malfunction
      in case device tree has hole") introduced in the kernel 4.1 merge window,
      we end up using part of a device hole for which there are already pending
      chunks or pinned chunks. Before that commit we didn't use the hole and
      would just move on to the next hole in the device.
      
      However when we adjust the start offset for the chunk allocation and we
      have pinned chunks, we set it blindly to the end offset of the pinned
      chunk we are currently processing, which is dangerous because we can
      have a pending chunk that has a start offset that matches the end offset
      of our pinned chunk - leading us to a case where we end up getting two
      pending chunks that start at the same physical device offset, which makes
      us later abort the current transaction with -EEXIST when finishing the
      chunk allocation at btrfs_create_pending_block_groups():
      
      [194737.659017] ------------[ cut here ]------------
      [194737.660192] WARNING: CPU: 15 PID: 31111 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x52/0x106 [btrfs]()
      [194737.662209] BTRFS: Transaction aborted (error -17)
      [194737.663175] Modules linked in: btrfs dm_snapshot dm_bufio dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse
      [194737.674015] CPU: 15 PID: 31111 Comm: xfs_io Tainted: G        W       4.0.0-rc5-btrfs-next-9+ #2
      [194737.675986] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
      [194737.682999]  0000000000000009 ffff8800564c7a98 ffffffff8142fa46 ffffffff8108b6a2
      [194737.684540]  ffff8800564c7ae8 ffff8800564c7ad8 ffffffff81045ea5 ffff8800564c7b78
      [194737.686017]  ffffffffa0383aa7 00000000ffffffef ffff88000c7ba000 ffff8801a1f66f40
      [194737.687509] Call Trace:
      [194737.688068]  [<ffffffff8142fa46>] dump_stack+0x4f/0x7b
      [194737.689027]  [<ffffffff8108b6a2>] ? console_unlock+0x361/0x3ad
      [194737.690095]  [<ffffffff81045ea5>] warn_slowpath_common+0xa1/0xbb
      [194737.691198]  [<ffffffffa0383aa7>] ? __btrfs_abort_transaction+0x52/0x106 [btrfs]
      [194737.693789]  [<ffffffff81045f05>] warn_slowpath_fmt+0x46/0x48
      [194737.695065]  [<ffffffffa0383aa7>] __btrfs_abort_transaction+0x52/0x106 [btrfs]
      [194737.696806]  [<ffffffffa039a3bd>] btrfs_create_pending_block_groups+0x101/0x130 [btrfs]
      [194737.698683]  [<ffffffffa03aa433>] __btrfs_end_transaction+0x84/0x366 [btrfs]
      [194737.700329]  [<ffffffffa03aa725>] btrfs_end_transaction+0x10/0x12 [btrfs]
      [194737.701924]  [<ffffffffa0394b51>] btrfs_check_data_free_space+0x11f/0x27c [btrfs]
      [194737.703675]  [<ffffffffa03b8ba4>] __btrfs_buffered_write+0x16a/0x4c8 [btrfs]
      [194737.705417]  [<ffffffffa03bb502>] ? btrfs_file_write_iter+0x19a/0x431 [btrfs]
      [194737.707058]  [<ffffffffa03bb511>] ? btrfs_file_write_iter+0x1a9/0x431 [btrfs]
      [194737.708560]  [<ffffffffa03bb68d>] btrfs_file_write_iter+0x325/0x431 [btrfs]
      [194737.710673]  [<ffffffff81067d85>] ? get_parent_ip+0xe/0x3e
      [194737.712076]  [<ffffffff811534c3>] new_sync_write+0x7c/0xa0
      [194737.713293]  [<ffffffff81153b58>] vfs_write+0xb2/0x117
      [194737.714443]  [<ffffffff81154424>] SyS_pwrite64+0x64/0x82
      [194737.715646]  [<ffffffff81435b32>] system_call_fastpath+0x12/0x17
      [194737.717175] ---[ end trace f2d5dc04e56d7e48 ]---
      [194737.718170] BTRFS: error (device sdc) in btrfs_create_pending_block_groups:9524: errno=-17 Object already exists
      
      The -EEXIST failure comes from btrfs_finish_chunk_alloc(), called by
      btrfs_create_pending_block_groups(), when it attempts to insert a
      duplicated device extent item via btrfs_alloc_dev_extent().
      
      This issue was reproducible with fstests generic/038 running in a loop for
      several hours (it's very hard to hit) and using MOUNT_OPTIONS="-o discard".
      Applying Jeff's recent patch titled "btrfs: add missing discards when
      unpinning extents with -o discard" makes the issue much easier to reproduce
      (usually within 4 to 5 hours), since it pins chunks for longer periods of
      time when an unused block group is deleted by the cleaner kthread.
      
      Fix this by making sure that we never adjust the start offset to a lower
      value than it currently has.
      
      Fixes: 1b984508 ("Btrfs: fix find_free_dev_extent() malfunction in case device tree has hole"
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      c152b63e
    • S
      btrfs: use after free when closing devices · 2037a093
      Sasha Levin 提交于
      __btrfs_close_devices() would call_rcu to free the device, which is racy with
      list_for_each_entry() accessing the memory to retrieve the next device on the
      list.
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      2037a093
    • A
      Btrfs: check error before reporting missing device and add uuid · 33b97e43
      Anand Jain 提交于
      Report missing device when add is successful,
      otherwise it would exit as ENOMEM. And add uuid
      to the report.
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      33b97e43
    • A
      Btrfs: log when missing device is created · 816fcebe
      Anand Jain 提交于
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      816fcebe
    • D
      btrfs: fix warnings after changes in btrfs_abort_transaction · 6d13f549
      David Sterba 提交于
      fs/btrfs/volumes.c: In function ‘btrfs_create_uuid_tree’:
      fs/btrfs/volumes.c:3909:3: warning: format ‘%d’ expects argument of type ‘int’, but argument 4 has type ‘long int’ [-Wformat=]
         btrfs_abort_transaction(trans, tree_root,
         ^
        CC [M]  fs/btrfs/ioctl.o
      fs/btrfs/ioctl.c: In function ‘create_subvol’:
      fs/btrfs/ioctl.c:549:3: warning: format ‘%d’ expects argument of type ‘int’, but argument 4 has type ‘long int’ [-Wformat=]
         btrfs_abort_transaction(trans, root, PTR_ERR(new_root));
      
      PTR_ERR returns long, but we're really using 'int' for the error codes
      everywhere so just set and use the local variable.
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      6d13f549
    • F
      Btrfs: check pending chunks when shrinking fs to avoid corruption · 53e489bc
      Filipe Manana 提交于
      When we shrink the usable size of a device (its total_bytes), we go over
      all the device extent items in the device tree and attempt to relocate
      the chunk of any device extent that goes beyond the new usable size for
      the device. We do that after setting the new usable size (total_bytes) in
      the device object, so that all new allocations (and reallocations) don't
      use areas of the device that go beyond the new (shorter) size. However we
      were not considering that before setting the new size in the device,
      pending chunks might have been created that use device extents that go
      beyond the new size, and those device extents are not yet in the device
      tree after we search the device tree - they are still attached to the
      list of new block group for some ongoing transaction handle, and they are
      only added to the device tree when the transaction handle is ended (via
      btrfs_create_pending_block_groups()).
      
      So check for pending chunks with device extents that go beyond the new
      size and if any exists, commit the current transaction and repeat the
      search in the device tree.
      
      Not doing this it would mean we would return success to user space while
      still having extents that go beyond the new size, and later user space
      could override those locations on the device while the fs still references
      them, causing all sorts of corruption and unexpected events.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      53e489bc
  2. 20 5月, 2015 1 次提交
    • F
      Btrfs: fix racy system chunk allocation when setting block group ro · a9629596
      Filipe Manana 提交于
      If while setting a block group read-only we end up allocating a system
      chunk, through check_system_chunk(), we were not doing it while holding
      the chunk mutex which is a problem if a concurrent chunk allocation is
      happening, through do_chunk_alloc(), as it means both block groups can
      end up using the same logical addresses and physical regions in the
      device(s). So make sure we hold the chunk mutex.
      
      Cc: stable@vger.kernel.org  # 4.0+
      Fixes: 2f081088 ("btrfs: delete chunk allocation attemp when
                            setting block group ro")
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      a9629596
  3. 26 4月, 2015 1 次提交
    • F
      Btrfs: fix find_free_dev_extent() malfunction in case device tree has hole · 1b984508
      Forrest Liu 提交于
      If device tree has hole, find_free_dev_extent() cannot find available
      address properly.
      
      The problem can be reproduce by following script.
      
          mntpath=/btrfs
          loopdev=/dev/loop0
          filepath=/home/forrest/image
      
          umount $mntpath
          losetup -d $loopdev
          truncate --size 100g $filepath
          losetup $loopdev $filepath
          mkfs.btrfs -f $loopdev
          mount $loopdev $mntpath
      
          # make device tree with one big hole
          for i in `seq 1 1 100`; do
              fallocate -l 1g $mntpath/$i
          done
          sync
          for i in `seq 1 1 95`; do
              rm $mntpath/$i
          done
          sync
      
          # wait cleaner thread remove unused block group
          sleep 300
      
          fallocate -l 1g $mntpath/aaa
      
          # failed to allocate new chunk
          fallocate -l 1g $mntpath/bbb
      
      Above script will make device tree with one big hole, and can only allocate
      just one chunk in a transaction, so failed to allocate new chunk for $mntpath/bbb
      
          item 8 key (1 DEV_EXTENT 2185232384) itemoff 15859 itemsize 48
              dev extent chunk_tree 3
              chunk objectid 256 chunk offset 106292051968 length 1073741824
          item 9 key (1 DEV_EXTENT 104190705664) itemoff 15811 itemsize 48
              dev extent chunk_tree 3
              chunk objectid 256 chunk offset 103108575232 length 1073741824
      Signed-off-by: NForrest Liu <forrestl@synology.com>
      Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      1b984508
  4. 13 4月, 2015 1 次提交
  5. 04 3月, 2015 9 次提交
  6. 20 2月, 2015 1 次提交
  7. 17 2月, 2015 4 次提交
  8. 15 2月, 2015 1 次提交
    • Z
      btrfs: Fix out-of-space bug · 13212b54
      Zhao Lei 提交于
      Btrfs will report NO_SPACE when we create and remove files for several times,
      and we can't write to filesystem until mount it again.
      
      Steps to reproduce:
       1: Create a single-dev btrfs fs with default option
       2: Write a file into it to take up most fs space
       3: Delete above file
       4: Wait about 100s to let chunk removed
       5: goto 2
      
      Script is like following:
       #!/bin/bash
      
       # Recommend 1.2G space, too large disk will make test slow
       DEV="/dev/sda16"
       MNT="/mnt/tmp"
      
       dev_size="$(lsblk -bn -o SIZE "$DEV")" || exit 2
       file_size_m=$((dev_size * 75 / 100 / 1024 / 1024))
      
       echo "Loop write ${file_size_m}M file on $((dev_size / 1024 / 1024))M dev"
      
       for ((i = 0; i < 10; i++)); do umount "$MNT" 2>/dev/null; done
       echo "mkfs $DEV"
       mkfs.btrfs -f "$DEV" >/dev/null || exit 2
       echo "mount $DEV $MNT"
       mount "$DEV" "$MNT" || exit 2
      
       for ((loop_i = 0; loop_i < 20; loop_i++)); do
           echo
           echo "loop $loop_i"
      
           echo "dd file..."
           cmd=(dd if=/dev/zero of="$MNT"/file0 bs=1M count="$file_size_m")
           "${cmd[@]}" 2>/dev/null || {
               # NO_SPACE error triggered
               echo "dd failed: ${cmd[*]}"
               exit 1
           }
      
           echo "rm file..."
           rm -f "$MNT"/file0 || exit 2
      
           for ((i = 0; i < 10; i++)); do
               df "$MNT" | tail -1
               sleep 10
           done
       done
      
      Reason:
       It is triggered by commit: 47ab2a6c
       which is used to remove empty block groups automatically, but the
       reason is not in that patch. Code before works well because btrfs
       don't need to create and delete chunks so many times with high
       complexity.
       Above bug is caused by many reason, any of them can trigger it.
      
      Reason1:
       When we remove some continuous chunks but leave other chunks after,
       these disk space should be used by chunk-recreating, but in current
       code, only first create will successed.
       Fixed by Forrest Liu <forrestl@synology.com> in:
       Btrfs: fix find_free_dev_extent() malfunction in case device tree has hole
      
      Reason2:
       contains_pending_extent() return wrong value in calculation.
       Fixed by Forrest Liu <forrestl@synology.com> in:
       Btrfs: fix find_free_dev_extent() malfunction in case device tree has hole
      
      Reason3:
       btrfs_check_data_free_space() try to commit transaction and retry
       allocating chunk when the first allocating failed, but space_info->full
       is set in first allocating, and prevent second allocating in retry.
       Fixed in this patch by clear space_info->full in commit transaction.
      
       Tested for severial times by above script.
      
      Changelog v3->v4:
       use light weight int instead of atomic_t to record have_remove_bgs in
       transaction, suggested by:
       Josef Bacik <jbacik@fb.com>
      
      Changelog v2->v3:
       v2 fixed the bug by adding more commit-transaction, but we
       only need to reclaim space when we are really have no space for
       new chunk, noticed by:
       Filipe David Manana <fdmanana@gmail.com>
      
       Actually, our code already have this type of commit-and-retry,
       we only need to make it working with removed-bgs.
       v3 fixed the bug with above way.
      
      Changelog v1->v2:
       v1 will introduce a new bug when delete and create chunk in same disk
       space in same transaction, noticed by:
       Filipe David Manana <fdmanana@gmail.com>
       V2 fix this bug by commit transaction after remove block grops.
      Reported-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
      Suggested-by: NFilipe David Manana <fdmanana@gmail.com>
      Suggested-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      13212b54
  9. 03 2月, 2015 2 次提交
  10. 22 1月, 2015 5 次提交
  11. 17 12月, 2014 1 次提交
  12. 13 12月, 2014 1 次提交
    • D
      btrfs: sink blocksize parameter to btrfs_find_create_tree_block · a83fffb7
      David Sterba 提交于
      Finally it's clear that the requested blocksize is always equal to
      nodesize, with one exception, the superblock.
      
      Superblock has fixed size regardless of the metadata block size, but
      uses the same helpers to initialize sys array/chunk tree and to work
      with the chunk items. So it pretends to be an extent_buffer for a
      moment, btrfs_read_sys_array is full of special cases, we're adding one
      more.
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      a83fffb7
  13. 03 12月, 2014 6 次提交
    • F
      Btrfs: fix fs mapping extent map leak · 495e64f4
      Filipe Manana 提交于
      On chunk allocation error (label "error_del_extent"), after adding the
      extent map to the tree and to the pending chunks list, we would leave
      decrementing the extent map's refcount by 2 instead of 3 (our allocation
      + tree reference + list reference).
      
      Also, on chunk/block group removal, if the block group was on the list
      pending_chunks we weren't decrementing the respective list reference.
      
      Detected by 'rmmod btrfs':
      
      [20770.105881] kmem_cache_destroy btrfs_extent_map: Slab cache still has objects
      [20770.106127] CPU: 2 PID: 11093 Comm: rmmod Tainted: G        W    L 3.17.0-rc5-btrfs-next-1+ #1
      [20770.106128] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
      [20770.106130]  0000000000000000 ffff8800ba867eb8 ffffffff813e7a13 ffff8800a2e11040
      [20770.106132]  ffff8800ba867ed0 ffffffff81105d0c 0000000000000000 ffff8800ba867ee0
      [20770.106134]  ffffffffa035d65e ffff8800ba867ef0 ffffffffa03b0654 ffff8800ba867f78
      [20770.106136] Call Trace:
      [20770.106142]  [<ffffffff813e7a13>] dump_stack+0x45/0x56
      [20770.106145]  [<ffffffff81105d0c>] kmem_cache_destroy+0x4b/0x90
      [20770.106164]  [<ffffffffa035d65e>] extent_map_exit+0x1a/0x1c [btrfs]
      [20770.106176]  [<ffffffffa03b0654>] exit_btrfs_fs+0x27/0x9d3 [btrfs]
      [20770.106179]  [<ffffffff8109dc97>] SyS_delete_module+0x153/0x1c4
      [20770.106182]  [<ffffffff8121261b>] ? trace_hardirqs_on_thunk+0x3a/0x3c
      [20770.106184]  [<ffffffff813ebf52>] system_call_fastpath+0x16/0x1b
      
      This applies on top (depends on) of my previous patch titled:
      "Btrfs: fix race between fs trimming and block group remove/allocation"
      
      But the issue in fact was already present before that change, it only
      became easier to hit after Josef's 3.18 patch that added automatic
      removal of empty block groups.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      495e64f4
    • F
      Btrfs: fix race between fs trimming and block group remove/allocation · 04216820
      Filipe Manana 提交于
      Our fs trim operation, which is completely transactionless (doesn't start
      or joins an existing transaction) consists of visiting all block groups
      and then for each one to iterate its free space entries and perform a
      discard operation against the space range represented by the free space
      entries. However before performing a discard, the corresponding free space
      entry is removed from the free space rbtree, and when the discard completes
      it is added back to the free space rbtree.
      
      If a block group remove operation happens while the discard is ongoing (or
      before it starts and after a free space entry is hidden), we end up not
      waiting for the discard to complete, remove the extent map that maps
      logical address to physical addresses and the corresponding chunk metadata
      from the the chunk and device trees. After that and before the discard
      completes, the current running transaction can finish and a new one start,
      allowing for new block groups that map to the same physical addresses to
      be allocated and written to.
      
      So fix this by keeping the extent map in memory until the discard completes
      so that the same physical addresses aren't reused before it completes.
      
      If the physical locations that are under a discard operation end up being
      used for a new metadata block group for example, and dirty metadata extents
      are written before the discard finishes (the VM might call writepages() of
      our btree inode's i_mapping for example, or an fsync log commit happens) we
      end up overwriting metadata with zeroes, which leads to errors from fsck
      like the following:
      
              checking extents
              Check tree block failed, want=833912832, have=0
              Check tree block failed, want=833912832, have=0
              Check tree block failed, want=833912832, have=0
              Check tree block failed, want=833912832, have=0
              Check tree block failed, want=833912832, have=0
              read block failed check_tree_block
              owner ref check failed [833912832 16384]
              Errors found in extent allocation tree or chunk allocation
              checking free space cache
              checking fs roots
              Check tree block failed, want=833912832, have=0
              Check tree block failed, want=833912832, have=0
              Check tree block failed, want=833912832, have=0
              Check tree block failed, want=833912832, have=0
              Check tree block failed, want=833912832, have=0
              read block failed check_tree_block
              root 5 root dir 256 error
              root 5 inode 260 errors 2001, no inode item, link count wrong
                      unresolved ref dir 256 index 0 namelen 8 name foobar_3 filetype 1 errors 6, no dir index, no inode ref
              root 5 inode 262 errors 2001, no inode item, link count wrong
                      unresolved ref dir 256 index 0 namelen 8 name foobar_5 filetype 1 errors 6, no dir index, no inode ref
              root 5 inode 263 errors 2001, no inode item, link count wrong
              (...)
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      04216820
    • M
      Btrfs, raid56: fix use-after-free problem in the final device replace procedure on raid56 · 4245215d
      Miao Xie 提交于
      The commit c404e0dc (Btrfs: fix use-after-free in the finishing
      procedure of the device replace) fixed a use-after-free problem
      which happened when removing the source device at the end of device
      replace, but at that time, btrfs didn't support device replace
      on raid56, so we didn't fix the problem on the raid56 profile.
      Currently, we implemented device replace for raid56, so we need
      kick that problem out before we enable that function for raid56.
      
      The fix method is very simple, we just increase the bio per-cpu
      counter before we submit a raid56 io, and decrease the counter
      when the raid56 io ends.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      4245215d
    • M
      Btrfs, replace: write dirty pages into the replace target device · 2c8cdd6e
      Miao Xie 提交于
      The implementation is simple:
      - In order to avoid changing the code logic of btrfs_map_bio and
        RAID56, we add the stripes of the replace target devices at the
        end of the stripe array in btrfs bio, and we sort those target
        device stripes in the array. And we keep the number of the target
        device stripes in the btrfs bio.
      - Except write operation on RAID56, all the other operation don't
        take the target device stripes into account.
      - When we do write operation, we read the data from the common devices
        and calculate the parity. Then write the dirty data and new parity
        out, at this time, we will find the relative replace target stripes
        and wirte the relative data into it.
      
      Note: The function that copying old data on the source device to
      the target device was implemented in the past, it is similar to
      the other RAID type.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      2c8cdd6e
    • M
      Btrfs, scrub: repair the common data on RAID5/6 if it is corrupted · af8e2d1d
      Miao Xie 提交于
      This patch implement the RAID5/6 common data repair function, the
      implementation is similar to the scrub on the other RAID such as
      RAID1, the differentia is that we don't read the data from the
      mirror, we use the data repair function of RAID5/6.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      af8e2d1d
    • Z
      Btrfs: remove unnecessary code of stripe_index assignment in __btrfs_map_block · 6de65650
      Zhao Lei 提交于
      stripe_index's value was set again in latter line:
      stripe_index = 0;
      Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      6de65650