1. 19 11月, 2019 3 次提交
  2. 18 11月, 2019 4 次提交
  3. 05 11月, 2019 1 次提交
    • D
      btrfs: un-deprecate ioctls START_SYNC and WAIT_SYNC · a5009d3a
      David Sterba 提交于
      The two ioctls START_SYNC and WAIT_SYNC were mistakenly marked as
      deprecated and scheduled for removal but we actualy do use them for
      'btrfs subvolume delete -C/-c'. The deprecated thing in ebc87351
      should have been just the async flag for subvolume creation.
      
      The deprecation has been added in this development cycle, remove it
      until it's time.
      
      Fixes: ebc87351 ("btrfs: Deprecate BTRFS_SUBVOL_CREATE_ASYNC flag")
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      a5009d3a
  4. 16 10月, 2019 1 次提交
    • Q
      btrfs: qgroup: Always free PREALLOC META reserve in btrfs_delalloc_release_extents() · 8702ba93
      Qu Wenruo 提交于
      [Background]
      Btrfs qgroup uses two types of reserved space for METADATA space,
      PERTRANS and PREALLOC.
      
      PERTRANS is metadata space reserved for each transaction started by
      btrfs_start_transaction().
      While PREALLOC is for delalloc, where we reserve space before joining a
      transaction, and finally it will be converted to PERTRANS after the
      writeback is done.
      
      [Inconsistency]
      However there is inconsistency in how we handle PREALLOC metadata space.
      
      The most obvious one is:
      In btrfs_buffered_write():
      	btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes, true);
      
      We always free qgroup PREALLOC meta space.
      
      While in btrfs_truncate_block():
      	btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize, (ret != 0));
      
      We only free qgroup PREALLOC meta space when something went wrong.
      
      [The Correct Behavior]
      The correct behavior should be the one in btrfs_buffered_write(), we
      should always free PREALLOC metadata space.
      
      The reason is, the btrfs_delalloc_* mechanism works by:
      - Reserve metadata first, even it's not necessary
        In btrfs_delalloc_reserve_metadata()
      
      - Free the unused metadata space
        Normally in:
        btrfs_delalloc_release_extents()
        |- btrfs_inode_rsv_release()
           Here we do calculation on whether we should release or not.
      
      E.g. for 64K buffered write, the metadata rsv works like:
      
      /* The first page */
      reserve_meta:	num_bytes=calc_inode_reservations()
      free_meta:	num_bytes=0
      total:		num_bytes=calc_inode_reservations()
      /* The first page caused one outstanding extent, thus needs metadata
         rsv */
      
      /* The 2nd page */
      reserve_meta:	num_bytes=calc_inode_reservations()
      free_meta:	num_bytes=calc_inode_reservations()
      total:		not changed
      /* The 2nd page doesn't cause new outstanding extent, needs no new meta
         rsv, so we free what we have reserved */
      
      /* The 3rd~16th pages */
      reserve_meta:	num_bytes=calc_inode_reservations()
      free_meta:	num_bytes=calc_inode_reservations()
      total:		not changed (still space for one outstanding extent)
      
      This means, if btrfs_delalloc_release_extents() determines to free some
      space, then those space should be freed NOW.
      So for qgroup, we should call btrfs_qgroup_free_meta_prealloc() other
      than btrfs_qgroup_convert_reserved_meta().
      
      The good news is:
      - The callers are not that hot
        The hottest caller is in btrfs_buffered_write(), which is already
        fixed by commit 336a8bb8 ("btrfs: Fix wrong
        btrfs_delalloc_release_extents parameter"). Thus it's not that
        easy to cause false EDQUOT.
      
      - The trans commit in advance for qgroup would hide the bug
        Since commit f5fef459 ("btrfs: qgroup: Make qgroup async transaction
        commit more aggressive"), when btrfs qgroup metadata free space is slow,
        it will try to commit transaction and free the wrongly converted
        PERTRANS space, so it's not that easy to hit such bug.
      
      [FIX]
      So to fix the problem, remove the @qgroup_free parameter for
      btrfs_delalloc_release_extents(), and always pass true to
      btrfs_inode_rsv_release().
      Reported-by: NFilipe Manana <fdmanana@suse.com>
      Fixes: 43b18595 ("btrfs: qgroup: Use separate meta reservation type for delalloc")
      CC: stable@vger.kernel.org # 4.19+
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      8702ba93
  5. 09 9月, 2019 7 次提交
  6. 04 7月, 2019 1 次提交
  7. 02 7月, 2019 1 次提交
  8. 01 7月, 2019 3 次提交
    • D
      vfs: create a generic checking function for FS_IOC_FSSETXATTR · 7b0e492e
      Darrick J. Wong 提交于
      Create a generic checking function for the incoming FS_IOC_FSSETXATTR
      fsxattr values so that we can standardize some of the implementation
      behaviors.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      7b0e492e
    • D
      vfs: create a generic checking and prep function for FS_IOC_SETFLAGS · 5aca2842
      Darrick J. Wong 提交于
      Create a generic function to check incoming FS_IOC_SETFLAGS flag values
      and later prepare the inode for updates so that we can standardize the
      implementations that follow ext4's flag values.
      
      Note that the efivarfs implementation no longer fails a no-op SETFLAGS
      without CAP_LINUX_IMMUTABLE since that's the behavior in ext*.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Acked-by: NDavid Sterba <dsterba@suse.com>
      Reviewed-by: NBob Peterson <rpeterso@redhat.com>
      5aca2842
    • Q
      btrfs: Flush before reflinking any extent to prevent NOCOW write falling back... · a94d1d0c
      Qu Wenruo 提交于
      btrfs: Flush before reflinking any extent to prevent NOCOW write falling back to COW without data reservation
      
      [BUG]
      The following script can cause unexpected fsync failure:
      
        #!/bin/bash
      
        dev=/dev/test/test
        mnt=/mnt/btrfs
      
        mkfs.btrfs -f $dev -b 512M > /dev/null
        mount $dev $mnt -o nospace_cache
      
        # Prealloc one extent
        xfs_io -f -c "falloc 8k 64m" $mnt/file1
        # Fill the remaining data space
        xfs_io -f -c "pwrite 0 -b 4k 512M" $mnt/padding
        sync
      
        # Write into the prealloc extent
        xfs_io -c "pwrite 1m 16m" $mnt/file1
      
        # Reflink then fsync, fsync would fail due to ENOSPC
        xfs_io -c "reflink $mnt/file1 8k 0 4k" -c "fsync" $mnt/file1
        umount $dev
      
      The fsync fails with ENOSPC, and the last page of the buffered write is
      lost.
      
      [CAUSE]
      This is caused by:
      - Btrfs' back reference only has extent level granularity
        So write into shared extent must be COWed even only part of the extent
        is shared.
      
      So for above script we have:
      - fallocate
        Create a preallocated extent where we can do NOCOW write.
      
      - fill all the remaining data and unallocated space
      
      - buffered write into preallocated space
        As we have not enough space available for data and the extent is not
        shared (yet) we fall into NOCOW mode.
      
      - reflink
        Now part of the large preallocated extent is shared, later write
        into that extent must be COWed.
      
      - fsync triggers writeback
        But now the extent is shared and therefore we must fallback into COW
        mode, which fails with ENOSPC since there's not enough space to
        allocate data extents.
      
      [WORKAROUND]
      The workaround is to ensure any buffered write in the related extents
      (not just the reflink source range) get flushed before reflink/dedupe,
      so that NOCOW writes succeed that happened before reflinking succeed.
      
      The workaround is expensive, we could do it better by only flushing
      NOCOW range, but that needs extra accounting for NOCOW range.
      For now, fix the possible data loss first.
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      a94d1d0c
  9. 20 6月, 2019 1 次提交
  10. 17 6月, 2019 1 次提交
    • F
      Btrfs: fix failure to persist compression property xattr deletion on fsync · 3763771c
      Filipe Manana 提交于
      After the recent series of cleanups in the properties and xattrs modules
      that landed in the 5.2 merge window, we ended up with a regression where
      after deleting the compression xattr property through the setflags ioctl,
      we don't set the BTRFS_INODE_COPY_EVERYTHING flag in the inode anymore.
      As a consequence, if the inode was fsync'ed when it had the compression
      property set, after deleting the compression property through the setflags
      ioctl and fsync'ing again the inode, the log will still contain the
      compression xattr, because the inode did not had that bit set, which
      made the fsync not delete all xattrs from the log and copy all xattrs
      from the subvolume tree to the log tree.
      
      This regression happens due to the fact that that series of cleanups
      made btrfs_set_prop() call the old function do_setxattr() (which is now
      named btrfs_setxattr()), and not the old version of btrfs_setxattr(),
      which is now called btrfs_setxattr_trans().
      
      Fix this by setting the BTRFS_INODE_COPY_EVERYTHING bit in the current
      btrfs_setxattr() function and remove it from everywhere else, including
      its setup at btrfs_ioctl_setflags(). This is cleaner, avoids similar
      regressions in the future, and centralizes the setup of the bit. After
      all, the need to setup this bit should only be in the xattrs module,
      since it is an implementation of xattrs.
      
      Fixes: 04e6863b ("btrfs: split btrfs_setxattr calls regarding transaction")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      3763771c
  11. 30 4月, 2019 11 次提交
    • A
      btrfs: drop local copy of inode i_mode · 44e5194b
      Anand Jain 提交于
      There isn't real use of making struct inode::i_mode a local copy, it
      saves a dereference one time, not much. Just use it directly.
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      44e5194b
    • A
      btrfs: drop old_fsflags in btrfs_ioctl_setflags · 3c8d8b63
      Anand Jain 提交于
      btrfs_inode_flags_to_fsflags() is copied into @old_fsflags and used only
      once. Instead used it directly.
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      3c8d8b63
    • A
      btrfs: modify local copy of btrfs_inode flags · d2b8fcfe
      Anand Jain 提交于
      Instead of updating the binode::flags directly, update a local copy, and
      then at the point of no error, store copy it to the binode::flags.
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      d2b8fcfe
    • A
      btrfs: drop useless inode i_flags copy and restore · 11d3cd5c
      Anand Jain 提交于
      The patch ("btrfs: start transaction in btrfs_ioctl_setflags()") used
      btrfs_set_prop() instead of btrfs_set_prop_trans() by which now the
      inode::i_flags update functions such as
      btrfs_sync_inode_flags_to_i_flags() and btrfs_update_inode() is called
      in btrfs_ioctl_setflags() instead of
      btrfs_set_prop_trans()->btrfs_setxattr() as earlier. So the
      inode::i_flags remains unmodified until the thread has checked all the
      conditions. So drop the saved inode::i_flags in out_i_flags.
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      11d3cd5c
    • A
      btrfs: start transaction in btrfs_ioctl_setflags() · ff9fef55
      Anand Jain 提交于
      Inode attribute can be set through the FS_IOC_SETFLAGS ioctl.  This
      flags also includes compression attribute for which we would set/reset
      the compression extended attribute. While doing this there is a bit of
      duplicate code, the following things happens twice:
      
      - start/end_transaction
      - inode_inc_iversion()
      - current_time update to inode->i_ctime
      - and btrfs_update_inode()
      
      These are updated both at btrfs_ioctl_setflags() and btrfs_set_props()
      as well.  This patch merges these two duplicate codes at
      btrfs_ioctl_setflags().
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      ff9fef55
    • A
      btrfs: refactor btrfs_set_props to validate externally · f22125e5
      Anand Jain 提交于
      In preparation to merge multiple transactions when setting the
      compression flags, split btrfs_set_props() validation part outside of
      it.
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      f22125e5
    • F
      Btrfs: fix race between send and deduplication that lead to failures and crashes · 62d54f3a
      Filipe Manana 提交于
      Send operates on read only trees and expects them to never change while it
      is using them. This is part of its initial design, and this expection is
      due to two different reasons:
      
      1) When it was introduced, no operations were allowed to modifiy read-only
         subvolumes/snapshots (including defrag for example).
      
      2) It keeps send from having an impact on other filesystem operations.
         Namely send does not need to keep locks on the trees nor needs to hold on
         to transaction handles and delay transaction commits. This ends up being
         a consequence of the former reason.
      
      However the deduplication feature was introduced later (on September 2013,
      while send was introduced in July 2012) and it allowed for deduplication
      with destination files that belong to read-only trees (subvolumes and
      snapshots).
      
      That means that having a send operation (either full or incremental) running
      in parallel with a deduplication that has the destination inode in one of
      the trees used by the send operation, can result in tree nodes and leaves
      getting freed and reused while send is using them. This problem is similar
      to the problem solved for the root nodes getting freed and reused when a
      snapshot is made against one tree that is currenly being used by a send
      operation, fixed in commits [1] and [2]. These commits explain in detail
      how the problem happens and the explanation is valid for any node or leaf
      that is not the root of a tree as well. This problem was also discussed
      and explained recently in a thread [3].
      
      The problem is very easy to reproduce when using send with large trees
      (snapshots) and just a few concurrent deduplication operations that target
      files in the trees used by send. A stress test case is being sent for
      fstests that triggers the issue easily. The most common error to hit is
      the send ioctl return -EIO with the following messages in dmesg/syslog:
      
       [1631617.204075] BTRFS error (device sdc): did not find backref in send_root. inode=63292, offset=0, disk_byte=5228134400 found extent=5228134400
       [1631633.251754] BTRFS error (device sdc): parent transid verify failed on 32243712 wanted 24 found 27
      
      The first one is very easy to hit while the second one happens much less
      frequently, except for very large trees (in that test case, snapshots
      with 100000 files having large xattrs to get deep and wide trees).
      Less frequently, at least one BUG_ON can be hit:
      
       [1631742.130080] ------------[ cut here ]------------
       [1631742.130625] kernel BUG at fs/btrfs/ctree.c:1806!
       [1631742.131188] invalid opcode: 0000 [#6] SMP DEBUG_PAGEALLOC PTI
       [1631742.131726] CPU: 1 PID: 13394 Comm: btrfs Tainted: G    B D W         5.0.0-rc8-btrfs-next-45 #1
       [1631742.132265] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
       [1631742.133399] RIP: 0010:read_node_slot+0x122/0x130 [btrfs]
       (...)
       [1631742.135061] RSP: 0018:ffffb530021ebaa0 EFLAGS: 00010246
       [1631742.135615] RAX: ffff93ac8912e000 RBX: 000000000000009d RCX: 0000000000000002
       [1631742.136173] RDX: 000000000000009d RSI: ffff93ac564b0d08 RDI: ffff93ad5b48c000
       [1631742.136759] RBP: ffffb530021ebb7d R08: 0000000000000001 R09: ffffb530021ebb7d
       [1631742.137324] R10: ffffb530021eba70 R11: 0000000000000000 R12: ffff93ac87d0a708
       [1631742.137900] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
       [1631742.138455] FS:  00007f4cdb1528c0(0000) GS:ffff93ad76a80000(0000) knlGS:0000000000000000
       [1631742.139010] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [1631742.139568] CR2: 00007f5acb3d0420 CR3: 000000012be3e006 CR4: 00000000003606e0
       [1631742.140131] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       [1631742.140719] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       [1631742.141272] Call Trace:
       [1631742.141826]  ? do_raw_spin_unlock+0x49/0xc0
       [1631742.142390]  tree_advance+0x173/0x1d0 [btrfs]
       [1631742.142948]  btrfs_compare_trees+0x268/0x690 [btrfs]
       [1631742.143533]  ? process_extent+0x1070/0x1070 [btrfs]
       [1631742.144088]  btrfs_ioctl_send+0x1037/0x1270 [btrfs]
       [1631742.144645]  _btrfs_ioctl_send+0x80/0x110 [btrfs]
       [1631742.145161]  ? trace_sched_stick_numa+0xe0/0xe0
       [1631742.145685]  btrfs_ioctl+0x13fe/0x3120 [btrfs]
       [1631742.146179]  ? account_entity_enqueue+0xd3/0x100
       [1631742.146662]  ? reweight_entity+0x154/0x1a0
       [1631742.147135]  ? update_curr+0x20/0x2a0
       [1631742.147593]  ? check_preempt_wakeup+0x103/0x250
       [1631742.148053]  ? do_vfs_ioctl+0xa2/0x6f0
       [1631742.148510]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
       [1631742.148942]  do_vfs_ioctl+0xa2/0x6f0
       [1631742.149361]  ? __fget+0x113/0x200
       [1631742.149767]  ksys_ioctl+0x70/0x80
       [1631742.150159]  __x64_sys_ioctl+0x16/0x20
       [1631742.150543]  do_syscall_64+0x60/0x1b0
       [1631742.150931]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
       [1631742.151326] RIP: 0033:0x7f4cd9f5add7
       (...)
       [1631742.152509] RSP: 002b:00007ffe91017708 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
       [1631742.152892] RAX: ffffffffffffffda RBX: 0000000000000105 RCX: 00007f4cd9f5add7
       [1631742.153268] RDX: 00007ffe91017790 RSI: 0000000040489426 RDI: 0000000000000007
       [1631742.153633] RBP: 0000000000000007 R08: 00007f4cd9e79700 R09: 00007f4cd9e79700
       [1631742.153999] R10: 00007f4cd9e799d0 R11: 0000000000000202 R12: 0000000000000003
       [1631742.154365] R13: 0000555dfae53020 R14: 0000000000000000 R15: 0000000000000001
       (...)
       [1631742.156696] ---[ end trace 5dac9f96dcc3fd6b ]---
      
      That BUG_ON happens because while send is using a node, that node is COWed
      by a concurrent deduplication, gets freed and gets reused as a leaf (because
      a transaction commit happened in between), so when it attempts to read a
      slot from the extent buffer, at ctree.c:read_node_slot(), the extent buffer
      contents were wiped out and it now matches a leaf (which can even belong to
      some other tree now), hitting the BUG_ON(level == 0).
      
      Fix this concurrency issue by not allowing send and deduplication to run
      in parallel if both operate on the same readonly trees, returning EAGAIN
      to user space and logging an exlicit warning in dmesg/syslog.
      
      [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=be6821f82c3cc36e026f5afd10249988852b35ea
      [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6f2f0b394b54e2b159ef969a0b5274e9bbf82ff2
      [3] https://lore.kernel.org/linux-btrfs/CAL3q7H7iqSEEyFaEtpRZw3cp613y+4k2Q8b4W7mweR3tZA05bQ@mail.gmail.com/
      
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      62d54f3a
    • Q
      btrfs: extent-tree: Use btrfs_ref to refactor btrfs_inc_extent_ref() · 82fa113f
      Qu Wenruo 提交于
      Use the new btrfs_ref structure and replace parameter list to clean up
      the usage of owner and level to distinguish the extent types.
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      82fa113f
    • G
      btrfs: Perform locking/unlocking in btrfs_remap_file_range() · 7984ae52
      Goldwyn Rodrigues 提交于
      Move code to make it more readable, so as locking and unlocking is
      done in the same function. The generic checks that are now performed in
      the locked section are unaffected.
      Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      7984ae52
    • A
      btrfs: refactor btrfs_set_prop and add btrfs_set_prop_trans · 262c96a3
      Anand Jain 提交于
      btrfs_set_prop() takes transaction pointer as the first argument,
      however in ioctl.c for the purpose of setting the compression property,
      we call btrfs_set_prop() with NULL transaction pointer. Down in
      the call chain  btrfs_setxattr() starts transaction to update the
      attribute and also to update the inode.
      
      So for clarity, create btrfs_set_prop_trans() with no transaction
      pointer as argument, in preparation to start transaction here instead of
      doing it down the call chain at btrfs_setxattr().
      
      Also now the btrfs_set_prop() is a static function.
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      262c96a3
    • A
      btrfs: merge _btrfs_set_prop helpers · 7715da84
      Anand Jain 提交于
      btrfs_set_prop() is a redirect to __btrfs_set_prop() with the
      transaction handle equal to NULL.  __btrfs_set_prop() in turn passes
      this to do_setxattr() which then transaction is actually created.
      
      Instead merge  __btrfs_set_prop() to btrfs_set_prop(), and update the
      caller with NULL argument.
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      7715da84
  12. 29 3月, 2019 1 次提交
  13. 27 2月, 2019 1 次提交
    • F
      Btrfs: fix deadlock between clone/dedupe and rename · 4ea748e1
      Filipe Manana 提交于
      Reflinking (clone/dedupe) and rename are operations that operate on two
      inodes and therefore need to lock them in the same order to avoid ABBA
      deadlocks. It happens that Btrfs' reflink implementation always locked
      them in a different order from VFS's lock_two_nondirectories() helper,
      which is used by the rename code in VFS, resulting in ABBA type deadlocks.
      
      Btrfs' locking order:
      
        static void btrfs_double_inode_lock(struct inode *inode1, struct inode *inode2)
        {
               if (inode1 < inode2)
                      swap(inode1, inode2);
      
               inode_lock_nested(inode1, I_MUTEX_PARENT);
               inode_lock_nested(inode2, I_MUTEX_CHILD);
        }
      
      VFS's locking order:
      
        void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
        {
              if (inode1 > inode2)
                      swap(inode1, inode2);
      
              if (inode1 && !S_ISDIR(inode1->i_mode))
                      inode_lock(inode1);
              if (inode2 && !S_ISDIR(inode2->i_mode) && inode2 != inode1)
                      inode_lock_nested(inode2, I_MUTEX_NONDIR2);
      }
      
      Fix this by killing the btrfs helper function that does the double inode
      locking and replace it with VFS's helper lock_two_nondirectories().
      Reported-by: NZygo Blaxell <ce3g8jdj@umail.furryterror.org>
      Fixes: 416161db ("btrfs: offline dedupe")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      4ea748e1
  14. 25 2月, 2019 4 次提交