1. 22 Jun 2021, 2 commits
    • btrfs: send: fix crash when memory allocations trigger reclaim · 35b22c19
      Filipe Manana authored
      When doing a send we don't expect the task to ever start a transaction
      after the initial check that verifies if commit roots match the regular
      roots. This is because after that we set current->journal_info with a
      stub (special value) that signals we are in send context, so that we take
      a read lock on an extent buffer when reading it from disk and verifying
      it is valid (its generation matches the generation stored in the parent).
      This stub was introduced in 2014 by commit a26e8c9f ("Btrfs: don't
      clear uptodate if the eb is under IO") in order to fix a concurrency issue
      between send and balance.
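
      A minimal sketch of how the stub is used around the send ioctl (the
      do_send() helper name here is hypothetical; only the journal_info
      handling is the point):

         current->journal_info = BTRFS_SEND_TRANS_STUB;
         ret = do_send(sctx);    /* hypothetical: the actual send work */
         current->journal_info = NULL;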
      
      However there is one particular exception where we end up needing to start
      a transaction and when this happens it results in a crash with a stack
      trace like the following:
      
      [60015.902283] kernel: WARNING: CPU: 3 PID: 58159 at arch/x86/include/asm/kfence.h:44 kfence_protect_page+0x21/0x80
      [60015.902292] kernel: Modules linked in: uinput rfcomm snd_seq_dummy (...)
      [60015.902384] kernel: CPU: 3 PID: 58159 Comm: btrfs Not tainted 5.12.9-300.fc34.x86_64 #1
      [60015.902387] kernel: Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./F2A88XN-WIFI, BIOS F6 12/24/2015
      [60015.902389] kernel: RIP: 0010:kfence_protect_page+0x21/0x80
      [60015.902393] kernel: Code: ff 0f 1f 84 00 00 00 00 00 55 48 89 fd (...)
      [60015.902396] kernel: RSP: 0018:ffff9fb583453220 EFLAGS: 00010246
      [60015.902399] kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9fb583453224
      [60015.902401] kernel: RDX: ffff9fb583453224 RSI: 0000000000000000 RDI: 0000000000000000
      [60015.902402] kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
      [60015.902404] kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
      [60015.902406] kernel: R13: ffff9fb583453348 R14: 0000000000000000 R15: 0000000000000001
      [60015.902408] kernel: FS:  00007f158e62d8c0(0000) GS:ffff93bd37580000(0000) knlGS:0000000000000000
      [60015.902410] kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [60015.902412] kernel: CR2: 0000000000000039 CR3: 00000001256d2000 CR4: 00000000000506e0
      [60015.902414] kernel: Call Trace:
      [60015.902419] kernel:  kfence_unprotect+0x13/0x30
      [60015.902423] kernel:  page_fault_oops+0x89/0x270
      [60015.902427] kernel:  ? search_module_extables+0xf/0x40
      [60015.902431] kernel:  ? search_bpf_extables+0x57/0x70
      [60015.902435] kernel:  kernelmode_fixup_or_oops+0xd6/0xf0
      [60015.902437] kernel:  __bad_area_nosemaphore+0x142/0x180
      [60015.902440] kernel:  exc_page_fault+0x67/0x150
      [60015.902445] kernel:  asm_exc_page_fault+0x1e/0x30
      [60015.902450] kernel: RIP: 0010:start_transaction+0x71/0x580
      [60015.902454] kernel: Code: d3 0f 84 92 00 00 00 80 e7 06 0f 85 63 (...)
      [60015.902456] kernel: RSP: 0018:ffff9fb5834533f8 EFLAGS: 00010246
      [60015.902458] kernel: RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000000
      [60015.902460] kernel: RDX: 0000000000000801 RSI: 0000000000000000 RDI: 0000000000000039
      [60015.902462] kernel: RBP: ffff93bc0a7eb800 R08: 0000000000000001 R09: 0000000000000000
      [60015.902463] kernel: R10: 0000000000098a00 R11: 0000000000000001 R12: 0000000000000001
      [60015.902464] kernel: R13: 0000000000000000 R14: ffff93bc0c92b000 R15: ffff93bc0c92b000
      [60015.902468] kernel:  btrfs_commit_inode_delayed_inode+0x5d/0x120
      [60015.902473] kernel:  btrfs_evict_inode+0x2c5/0x3f0
      [60015.902476] kernel:  evict+0xd1/0x180
      [60015.902480] kernel:  inode_lru_isolate+0xe7/0x180
      [60015.902483] kernel:  __list_lru_walk_one+0x77/0x150
      [60015.902487] kernel:  ? iput+0x1a0/0x1a0
      [60015.902489] kernel:  ? iput+0x1a0/0x1a0
      [60015.902491] kernel:  list_lru_walk_one+0x47/0x70
      [60015.902495] kernel:  prune_icache_sb+0x39/0x50
      [60015.902497] kernel:  super_cache_scan+0x161/0x1f0
      [60015.902501] kernel:  do_shrink_slab+0x142/0x240
      [60015.902505] kernel:  shrink_slab+0x164/0x280
      [60015.902509] kernel:  shrink_node+0x2c8/0x6e0
      [60015.902512] kernel:  do_try_to_free_pages+0xcb/0x4b0
      [60015.902514] kernel:  try_to_free_pages+0xda/0x190
      [60015.902516] kernel:  __alloc_pages_slowpath.constprop.0+0x373/0xcc0
      [60015.902521] kernel:  ? __memcg_kmem_charge_page+0xc2/0x1e0
      [60015.902525] kernel:  __alloc_pages_nodemask+0x30a/0x340
      [60015.902528] kernel:  pipe_write+0x30b/0x5c0
      [60015.902531] kernel:  ? set_next_entity+0xad/0x1e0
      [60015.902534] kernel:  ? switch_mm_irqs_off+0x58/0x440
      [60015.902538] kernel:  __kernel_write+0x13a/0x2b0
      [60015.902541] kernel:  kernel_write+0x73/0x150
      [60015.902543] kernel:  send_cmd+0x7b/0xd0
      [60015.902545] kernel:  send_extent_data+0x5a3/0x6b0
      [60015.902549] kernel:  process_extent+0x19b/0xed0
      [60015.902551] kernel:  btrfs_ioctl_send+0x1434/0x17e0
      [60015.902554] kernel:  ? _btrfs_ioctl_send+0xe1/0x100
      [60015.902557] kernel:  _btrfs_ioctl_send+0xbf/0x100
      [60015.902559] kernel:  ? enqueue_entity+0x18c/0x7b0
      [60015.902562] kernel:  btrfs_ioctl+0x185f/0x2f80
      [60015.902564] kernel:  ? psi_task_change+0x84/0xc0
      [60015.902569] kernel:  ? _flat_send_IPI_mask+0x21/0x40
      [60015.902572] kernel:  ? check_preempt_curr+0x2f/0x70
      [60015.902576] kernel:  ? selinux_file_ioctl+0x137/0x1e0
      [60015.902579] kernel:  ? expand_files+0x1cb/0x1d0
      [60015.902582] kernel:  ? __x64_sys_ioctl+0x82/0xb0
      [60015.902585] kernel:  __x64_sys_ioctl+0x82/0xb0
      [60015.902588] kernel:  do_syscall_64+0x33/0x40
      [60015.902591] kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [60015.902595] kernel: RIP: 0033:0x7f158e38f0ab
      [60015.902599] kernel: Code: ff ff ff 85 c0 79 9b (...)
      [60015.902602] kernel: RSP: 002b:00007ffcb2519bf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      [60015.902605] kernel: RAX: ffffffffffffffda RBX: 00007ffcb251ae00 RCX: 00007f158e38f0ab
      [60015.902607] kernel: RDX: 00007ffcb2519cf0 RSI: 0000000040489426 RDI: 0000000000000004
      [60015.902608] kernel: RBP: 0000000000000004 R08: 00007f158e297640 R09: 00007f158e297640
      [60015.902610] kernel: R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000000
      [60015.902612] kernel: R13: 0000000000000002 R14: 00007ffcb251aee0 R15: 0000558c1a83e2a0
      [60015.902615] kernel: ---[ end trace 7bbc33e23bb887ae ]---
      
      This happens because when writing to the pipe, by calling kernel_write(),
      we end up doing page allocations using GFP_HIGHUSER | __GFP_ACCOUNT as the
      gfp flags, which allow reclaim to happen if there is memory pressure. This
      allocation happens at fs/pipe.c:pipe_write().
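
      That allocation is roughly the following (paraphrasing fs/pipe.c; the
      details may vary between kernel versions):

         struct page *page = alloc_page(GFP_HIGHUSER | __GFP_ACCOUNT);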
      
      If reclaim is triggered, it can in turn trigger inode eviction, which can
      result in starting a transaction if the inode has a link count
      of 0. The transaction start happens early on during eviction, when we call
      btrfs_commit_inode_delayed_inode() at btrfs_evict_inode(). This happens if
      there is currently an open file descriptor for an inode with a link count
      of 0 and the reclaim task gets a reference on the inode before that
      descriptor is closed, in which case the reclaim task ends up doing the
      final iput that triggers the inode eviction.
      
      When we have assertions enabled (CONFIG_BTRFS_ASSERT=y), this triggers
      the following assertion at transaction.c:start_transaction():
      
          /* Send isn't supposed to start transactions. */
          ASSERT(current->journal_info != BTRFS_SEND_TRANS_STUB);
      
      And when assertions are not enabled, it triggers a crash since after that
      assertion we cast current->journal_info into a transaction handle pointer
      and then dereference it:
      
         if (current->journal_info) {
             WARN_ON(type & TRANS_EXTWRITERS);
             h = current->journal_info;
             refcount_inc(&h->use_count);
             (...)
      
      Which obviously results in a crash due to an invalid memory access.
      
      The same type of issue can happen during other memory allocations we
      do directly in the send code with kmalloc (and friends) as they use
      GFP_KERNEL and therefore may trigger reclaim too, which started to
      happen since 2016 after commit e780b0d1 ("btrfs: send: use
      GFP_KERNEL everywhere").
      
      The issue could be solved by setting up a NOFS context for the entire
      send operation so that reclaim could not be triggered when allocating
      memory or pages through kernel_write(). However that is not very friendly
      and we can in fact get rid of the send stub because:
      
      1) The stub was introduced way back in 2014 by commit a26e8c9f
         ("Btrfs: don't clear uptodate if the eb is under IO") to solve an
         issue exclusive to when send and balance are running in parallel,
         however there were other problems between balance and send, and we no
         longer allow balance and send to run concurrently, since
         commit 9e967495 ("Btrfs: prevent send failures and crashes due
         to concurrent relocation"). More generally the issues are between
         send and relocation, and that last commit eliminated only the
         possibility of having send and balance run concurrently, but shrinking
         a device can also trigger relocation, and on zoned filesystems we have
         relocation of partially used block groups triggered automatically as
         well. The previous patch in this series, with the subject:

         "btrfs: ensure relocation never runs while we have send operations running"

         addresses all the remaining cases that can trigger relocation.
      
      2) We can actually allow starting and even committing transactions while
         in a send context if needed because send is not holding any locks that
         would block the start or the commit of a transaction.
      
      So get rid of all the logic added by commit a26e8c9f ("Btrfs: don't
      clear uptodate if the eb is under IO"). We can now always call
      clear_extent_buffer_uptodate() at verify_parent_transid() since send is
      the only case that uses commit roots without having a transaction open or
      without holding the commit_root_sem.
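
      A sketch of the resulting mismatch path in verify_parent_transid(), with
      the surrounding details (error message, return value) assumed from
      context and possibly differing from the exact code:

         btrfs_err_rl(eb->fs_info,
              "parent transid verify failed on %llu wanted %llu found %llu",
              eb->start, parent_transid, btrfs_header_generation(eb));
         ret = 1;
         /* Previously skipped when extent_buffer_under_io(eb) was true, for
          * the sake of send; now it can be done unconditionally. */
         clear_extent_buffer_uptodate(eb);
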
      Reported-by: Chris Murphy <lists@colorremedies.com>
      Link: https://lore.kernel.org/linux-btrfs/CAJCQCtRQ57=qXo3kygwpwEBOU_CA_eKvdmjP52sU=eFvuVOEGw@mail.gmail.com/
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      35b22c19
    • btrfs: fix typos in comments · 1a9fd417
      David Sterba authored
      Fix typos that have snuck in since the last round. Found by codespell.
      Signed-off-by: David Sterba <dsterba@suse.com>
      1a9fd417
  2. 29 Apr 2021, 2 commits
    • btrfs: fix deadlock when cloning inline extents and using qgroups · f9baa501
      Filipe Manana authored
      There are a few exceptional cases where cloning an inline extent needs to
      copy the inline extent data into a page of the destination inode.
      
      When this happens, we end up starting a transaction while having a dirty
      page for the destination inode and while having the range locked in the
      destination's inode iotree too. Because when reserving metadata space
      for a transaction we may need to flush existing delalloc in case there is
      not enough free space, we have a mechanism in place to prevent a deadlock,
      which was introduced in commit 3d45f221 ("btrfs: fix deadlock when
      cloning inline extent and low on free metadata space").
      
      However when using qgroups, a transaction also reserves metadata qgroup
      space, which can also result in flushing delalloc in case there is not
      enough available space at the moment. When this happens we deadlock, since
      flushing delalloc requires locking the file range in the inode's iotree
      and the range was already locked at the very beginning of the clone
      operation, before attempting to start the transaction.
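
      Schematically, the ordering that deadlocks looks like this (a sketch of
      the relevant calls only, not the exact code):

         lock_extent_bits(&BTRFS_I(dst)->io_tree, start, end, &cached);
         /* copy the inline extent data into a page of the destination,
          * leaving it dirty (delalloc) while the range stays locked */
         trans = btrfs_start_transaction(root, 1);
         /* the qgroup metadata reservation may flush delalloc, which needs
          * to lock the same range in the destination's io_tree: deadlock */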
      
      When this issue happens, stack traces like the following are reported:
      
        [72747.556262] task:kworker/u81:9   state:D stack:    0 pid:  225 ppid:     2 flags:0x00004000
        [72747.556268] Workqueue: writeback wb_workfn (flush-btrfs-1142)
        [72747.556271] Call Trace:
        [72747.556273]  __schedule+0x296/0x760
        [72747.556277]  schedule+0x3c/0xa0
        [72747.556279]  io_schedule+0x12/0x40
        [72747.556284]  __lock_page+0x13c/0x280
        [72747.556287]  ? generic_file_readonly_mmap+0x70/0x70
        [72747.556325]  extent_write_cache_pages+0x22a/0x440 [btrfs]
        [72747.556331]  ? __set_page_dirty_nobuffers+0xe7/0x160
        [72747.556358]  ? set_extent_buffer_dirty+0x5e/0x80 [btrfs]
        [72747.556362]  ? update_group_capacity+0x25/0x210
        [72747.556366]  ? cpumask_next_and+0x1a/0x20
        [72747.556391]  extent_writepages+0x44/0xa0 [btrfs]
        [72747.556394]  do_writepages+0x41/0xd0
        [72747.556398]  __writeback_single_inode+0x39/0x2a0
        [72747.556403]  writeback_sb_inodes+0x1ea/0x440
        [72747.556407]  __writeback_inodes_wb+0x5f/0xc0
        [72747.556410]  wb_writeback+0x235/0x2b0
        [72747.556414]  ? get_nr_inodes+0x35/0x50
        [72747.556417]  wb_workfn+0x354/0x490
        [72747.556420]  ? newidle_balance+0x2c5/0x3e0
        [72747.556424]  process_one_work+0x1aa/0x340
        [72747.556426]  worker_thread+0x30/0x390
        [72747.556429]  ? create_worker+0x1a0/0x1a0
        [72747.556432]  kthread+0x116/0x130
        [72747.556435]  ? kthread_park+0x80/0x80
        [72747.556438]  ret_from_fork+0x1f/0x30
      
        [72747.566958] Workqueue: btrfs-flush_delalloc btrfs_work_helper [btrfs]
        [72747.566961] Call Trace:
        [72747.566964]  __schedule+0x296/0x760
        [72747.566968]  ? finish_wait+0x80/0x80
        [72747.566970]  schedule+0x3c/0xa0
        [72747.566995]  wait_extent_bit.constprop.68+0x13b/0x1c0 [btrfs]
        [72747.566999]  ? finish_wait+0x80/0x80
        [72747.567024]  lock_extent_bits+0x37/0x90 [btrfs]
        [72747.567047]  btrfs_invalidatepage+0x299/0x2c0 [btrfs]
        [72747.567051]  ? find_get_pages_range_tag+0x2cd/0x380
        [72747.567076]  __extent_writepage+0x203/0x320 [btrfs]
        [72747.567102]  extent_write_cache_pages+0x2bb/0x440 [btrfs]
        [72747.567106]  ? update_load_avg+0x7e/0x5f0
        [72747.567109]  ? enqueue_entity+0xf4/0x6f0
        [72747.567134]  extent_writepages+0x44/0xa0 [btrfs]
        [72747.567137]  ? enqueue_task_fair+0x93/0x6f0
        [72747.567140]  do_writepages+0x41/0xd0
        [72747.567144]  __filemap_fdatawrite_range+0xc7/0x100
        [72747.567167]  btrfs_run_delalloc_work+0x17/0x40 [btrfs]
        [72747.567195]  btrfs_work_helper+0xc2/0x300 [btrfs]
        [72747.567200]  process_one_work+0x1aa/0x340
        [72747.567202]  worker_thread+0x30/0x390
        [72747.567205]  ? create_worker+0x1a0/0x1a0
        [72747.567208]  kthread+0x116/0x130
        [72747.567211]  ? kthread_park+0x80/0x80
        [72747.567214]  ret_from_fork+0x1f/0x30
      
        [72747.569686] task:fsstress        state:D stack:    0 pid:841421 ppid:841417 flags:0x00000000
        [72747.569689] Call Trace:
        [72747.569691]  __schedule+0x296/0x760
        [72747.569694]  schedule+0x3c/0xa0
        [72747.569721]  try_flush_qgroup+0x95/0x140 [btrfs]
        [72747.569725]  ? finish_wait+0x80/0x80
        [72747.569753]  btrfs_qgroup_reserve_data+0x34/0x50 [btrfs]
        [72747.569781]  btrfs_check_data_free_space+0x5f/0xa0 [btrfs]
        [72747.569804]  btrfs_buffered_write+0x1f7/0x7f0 [btrfs]
        [72747.569810]  ? path_lookupat.isra.48+0x97/0x140
        [72747.569833]  btrfs_file_write_iter+0x81/0x410 [btrfs]
        [72747.569836]  ? __kmalloc+0x16a/0x2c0
        [72747.569839]  do_iter_readv_writev+0x160/0x1c0
        [72747.569843]  do_iter_write+0x80/0x1b0
        [72747.569847]  vfs_writev+0x84/0x140
        [72747.569869]  ? btrfs_file_llseek+0x38/0x270 [btrfs]
        [72747.569873]  do_writev+0x65/0x100
        [72747.569876]  do_syscall_64+0x33/0x40
        [72747.569879]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        [72747.569899] task:fsstress        state:D stack:    0 pid:841424 ppid:841417 flags:0x00004000
        [72747.569903] Call Trace:
        [72747.569906]  __schedule+0x296/0x760
        [72747.569909]  schedule+0x3c/0xa0
        [72747.569936]  try_flush_qgroup+0x95/0x140 [btrfs]
        [72747.569940]  ? finish_wait+0x80/0x80
        [72747.569967]  __btrfs_qgroup_reserve_meta+0x36/0x50 [btrfs]
        [72747.569989]  start_transaction+0x279/0x580 [btrfs]
        [72747.570014]  clone_copy_inline_extent+0x332/0x490 [btrfs]
        [72747.570041]  btrfs_clone+0x5b7/0x7a0 [btrfs]
        [72747.570068]  ? lock_extent_bits+0x64/0x90 [btrfs]
        [72747.570095]  btrfs_clone_files+0xfc/0x150 [btrfs]
        [72747.570122]  btrfs_remap_file_range+0x3d8/0x4a0 [btrfs]
        [72747.570126]  do_clone_file_range+0xed/0x200
        [72747.570131]  vfs_clone_file_range+0x37/0x110
        [72747.570134]  ioctl_file_clone+0x7d/0xb0
        [72747.570137]  do_vfs_ioctl+0x138/0x630
        [72747.570140]  __x64_sys_ioctl+0x62/0xc0
        [72747.570143]  do_syscall_64+0x33/0x40
        [72747.570146]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      So fix this by skipping the flush of delalloc for an inode that is
      flagged with BTRFS_INODE_NO_DELALLOC_FLUSH, meaning it is currently under
      such a special case of cloning an inline extent, when flushing delalloc
      during qgroup metadata reservation.
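
      The core of the idea is a check along these lines in the delalloc
      flushing done for qgroup reservations (a sketch only; the actual patch
      plumbs the information down differently):

         /* 'inode' is the struct btrfs_inode considered for flushing */
         if (test_bit(BTRFS_INODE_NO_DELALLOC_FLUSH, &inode->runtime_flags))
                 continue;   /* its range is already locked by the cloner */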
      
      The special cases for cloning inline extents were added in kernel 5.7
      by commit 05a5a762 ("Btrfs: implement full reflink support for
      inline extents"), while having qgroup metadata space reservation flushing
      delalloc when low on space was added in kernel 5.9 by commit
      c53e9653 ("btrfs: qgroup: try to flush qgroup space when we get
      -EDQUOT"). So use a "Fixes:" tag for the later commit to ease stable
      kernel backports.
      Reported-by: Wang Yugui <wangyugui@e16-tech.com>
      Link: https://lore.kernel.org/linux-btrfs/20210421083137.31E3.409509F4@e16-tech.com/
      Fixes: c53e9653 ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT")
      CC: stable@vger.kernel.org # 5.9+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      f9baa501
    • btrfs: do not consider send context as valid when trying to flush qgroups · ffb7c2e9
      Filipe Manana authored
      At qgroup.c:try_flush_qgroup() we are asserting that current->journal_info
      is either NULL or has the value BTRFS_SEND_TRANS_STUB.
      
      However, allowing for BTRFS_SEND_TRANS_STUB makes no sense because:
      
      1) It is misleading, because send operations are read-only and do not
         ever need to reserve qgroup space;
      
      2) We already assert that current->journal_info != BTRFS_SEND_TRANS_STUB
         at transaction.c:start_transaction();
      
      3) On a kernel without CONFIG_BTRFS_ASSERT=y set, it would result in
         a crash if try_flush_qgroup() is ever called in a send context, because
         at transaction.c:start_transaction we cast current->journal_info into
         a struct btrfs_trans_handle pointer and then dereference it.
      
      So just do not allow a send context at try_flush_qgroup() and update the
      comment about it.
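
      The tightened check then becomes something along these lines (a sketch;
      the exact comment and WARN handling may differ):

         /* We either hold a transaction handle or nothing at all, never the
          * send stub, since send never reserves qgroup space. */
         ASSERT(current->journal_info == NULL);
         if (WARN_ON(current->journal_info))
                 return 0;
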
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      ffb7c2e9
  3. 19 Apr 2021, 3 commits
  4. 18 Mar 2021, 1 commit
    • btrfs: fix sleep while in non-sleep context during qgroup removal · 0bb78830
      Filipe Manana authored
      While removing a qgroup's sysfs entry we end up taking the kernfs_mutex,
      through kobject_del(), while holding the fs_info->qgroup_lock spinlock,
      producing the following trace:
      
        [821.843637] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:281
        [821.843641] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 28214, name: podman
        [821.843644] CPU: 3 PID: 28214 Comm: podman Tainted: G        W         5.11.6 #15
        [821.843646] Hardware name: Dell Inc. PowerEdge R330/084XW4, BIOS 2.11.0 12/08/2020
        [821.843647] Call Trace:
        [821.843650]  dump_stack+0xa1/0xfb
        [821.843656]  ___might_sleep+0x144/0x160
        [821.843659]  mutex_lock+0x17/0x40
        [821.843662]  kernfs_remove_by_name_ns+0x1f/0x80
        [821.843666]  sysfs_remove_group+0x7d/0xe0
        [821.843668]  sysfs_remove_groups+0x28/0x40
        [821.843670]  kobject_del+0x2a/0x80
        [821.843672]  btrfs_sysfs_del_one_qgroup+0x2b/0x40 [btrfs]
        [821.843685]  __del_qgroup_rb+0x12/0x150 [btrfs]
        [821.843696]  btrfs_remove_qgroup+0x288/0x2a0 [btrfs]
        [821.843707]  btrfs_ioctl+0x3129/0x36a0 [btrfs]
        [821.843717]  ? __mod_lruvec_page_state+0x5e/0xb0
        [821.843719]  ? page_add_new_anon_rmap+0xbc/0x150
        [821.843723]  ? kfree+0x1b4/0x300
        [821.843725]  ? mntput_no_expire+0x55/0x330
        [821.843728]  __x64_sys_ioctl+0x5a/0xa0
        [821.843731]  do_syscall_64+0x33/0x70
        [821.843733]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
        [821.843736] RIP: 0033:0x4cd3fb
        [821.843741] RSP: 002b:000000c000906b20 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
        [821.843744] RAX: ffffffffffffffda RBX: 000000c000050000 RCX: 00000000004cd3fb
        [821.843745] RDX: 000000c000906b98 RSI: 000000004010942a RDI: 000000000000000f
        [821.843747] RBP: 000000c000907cd0 R08: 000000c000622901 R09: 0000000000000000
        [821.843748] R10: 000000c000d992c0 R11: 0000000000000206 R12: 000000000000012d
        [821.843749] R13: 000000000000012c R14: 0000000000000200 R15: 0000000000000049
      
      Fix this by removing the qgroup sysfs entry while not holding the spinlock,
      since the spinlock is only meant for protection of the qgroup rbtree.
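
      In other words the ordering becomes roughly the following (a simplified
      sketch; the real code also has to look up and unlink the qgroup first):

         spin_lock(&fs_info->qgroup_lock);
         /* rbtree and relation cleanup only, nothing that can sleep */
         spin_unlock(&fs_info->qgroup_lock);
         /* sysfs removal may sleep (it takes kernfs_mutex), do it unlocked */
         btrfs_sysfs_del_one_qgroup(fs_info, qgroup);
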
      Reported-by: Stuart Shelton <srcshelton@gmail.com>
      Link: https://lore.kernel.org/linux-btrfs/7A5485BB-0628-419D-A4D3-27B1AF47E25A@gmail.com/
      Fixes: 49e5fb46 ("btrfs: qgroup: export qgroups in sysfs")
      CC: stable@vger.kernel.org # 5.10+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      0bb78830
  5. 02 Mar 2021, 1 commit
  6. 18 Dec 2020, 2 commits
    • btrfs: fix transaction leak and crash after RO remount caused by qgroup rescan · cb13eea3
      Filipe Manana authored
      If we remount a filesystem in RO mode while the qgroup rescan worker is
      running, we can end up having it still running after the remount is done,
      and at unmount time we may end up with an open transaction that ends up
      never getting committed. If that happens we end up with several memory
      leaks and can crash when hardware acceleration is unavailable for crc32c.
      Possibly it can lead to other nasty surprises too, due to use-after-free
      issues.
      
      The following steps explain how the problem happens.
      
      1) We have a filesystem mounted in RW mode and the qgroup rescan worker is
         running;
      
      2) We remount the filesystem in RO mode, and never stop/pause the rescan
         worker, so after the remount the rescan worker is still running. The
         important detail here is that the rescan task is still running after
         the remount operation committed any ongoing transaction through its
         call to btrfs_commit_super();
      
      3) The rescan is still running, and after the remount completed, the
         rescan worker started a transaction, after it finished iterating all
         leaves of the extent tree, to update the qgroup status item in the
         quotas tree. It does not commit the transaction, it only releases its
         handle on the transaction;
      
      4) A filesystem unmount operation starts shortly after;
      
      5) The unmount task, at close_ctree(), stops the transaction kthread,
         which had not had a chance to commit the open transaction since it was
         sleeping and the commit interval (default of 30 seconds) has not yet
         elapsed since the last time it committed a transaction;
      
      6) So after stopping the transaction kthread we still have the transaction
         used to update the qgroup status item open. At close_ctree(), when the
         filesystem is in RO mode and no transaction abort happened (or the
         filesystem is in error mode), we do not expect to have any transaction
         open, so we do not call btrfs_commit_super();
      
      7) We then proceed to destroy the work queues, free the roots and block
         groups, etc. After that we drop the last reference on the btree inode
         by calling iput() on it. Since there are dirty pages for the btree
         inode, corresponding to the COWed extent buffer for the quotas btree,
         btree_write_cache_pages() is invoked to flush those dirty pages. This
         results in creating a bio and submitting it, which makes us end up at
         btrfs_submit_metadata_bio();
      
      8) At btrfs_submit_metadata_bio() we end up at the if-then-else branch
         that calls btrfs_wq_submit_bio(), because check_async_write() returned
         a value of 1. This value of 1 is because we did not have hardware
         acceleration available for crc32c, so BTRFS_FS_CSUM_IMPL_FAST was not
         set in fs_info->flags;
      
      9) Then at btrfs_wq_submit_bio() we call btrfs_queue_work() against the
         workqueue at fs_info->workers, which was already freed before by the
         call to btrfs_stop_all_workers() at close_ctree(). This results in an
         invalid memory access due to a use-after-free, leading to a crash.
      
      When this happens, before the crash there are several warnings triggered,
      since we have reserved metadata space in a block group, the delayed refs
      reservation, etc:
      
        ------------[ cut here ]------------
        WARNING: CPU: 4 PID: 1729896 at fs/btrfs/block-group.c:125 btrfs_put_block_group+0x63/0xa0 [btrfs]
        Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
        CPU: 4 PID: 1729896 Comm: umount Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        RIP: 0010:btrfs_put_block_group+0x63/0xa0 [btrfs]
        Code: f0 01 00 00 48 39 c2 75 (...)
        RSP: 0018:ffffb270826bbdd8 EFLAGS: 00010206
        RAX: 0000000000000001 RBX: ffff947ed73e4000 RCX: ffff947ebc8b29c8
        RDX: 0000000000000001 RSI: ffffffffc0b150a0 RDI: ffff947ebc8b2800
        RBP: ffff947ebc8b2800 R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ed73e4110
        R13: ffff947ed73e4160 R14: ffff947ebc8b2988 R15: dead000000000100
        FS:  00007f15edfea840(0000) GS:ffff9481ad600000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f37e2893320 CR3: 0000000138f68001 CR4: 00000000003706e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         btrfs_free_block_groups+0x17f/0x2f0 [btrfs]
         close_ctree+0x2ba/0x2fa [btrfs]
         generic_shutdown_super+0x6c/0x100
         kill_anon_super+0x14/0x30
         btrfs_kill_super+0x12/0x20 [btrfs]
         deactivate_locked_super+0x31/0x70
         cleanup_mnt+0x100/0x160
         task_work_run+0x68/0xb0
         exit_to_user_mode_prepare+0x1bb/0x1c0
         syscall_exit_to_user_mode+0x4b/0x260
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f15ee221ee7
        Code: ff 0b 00 f7 d8 64 89 01 48 (...)
        RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7
        RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000
        RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0
        R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000
        R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60
        irq event stamp: 0
        hardirqs last  enabled at (0): [<0000000000000000>] 0x0
        hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
        softirqs last  enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
        softirqs last disabled at (0): [<0000000000000000>] 0x0
        ---[ end trace dd74718fef1ed5c6 ]---
        ------------[ cut here ]------------
        WARNING: CPU: 2 PID: 1729896 at fs/btrfs/block-rsv.c:459 btrfs_release_global_block_rsv+0x70/0xc0 [btrfs]
        Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
        CPU: 2 PID: 1729896 Comm: umount Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        RIP: 0010:btrfs_release_global_block_rsv+0x70/0xc0 [btrfs]
        Code: 48 83 bb b0 03 00 00 00 (...)
        RSP: 0018:ffffb270826bbdd8 EFLAGS: 00010206
        RAX: 000000000033c000 RBX: ffff947ed73e4000 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: ffffffffc0b0d8c1 RDI: 00000000ffffffff
        RBP: ffff947ebc8b7000 R08: 0000000000000001 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ed73e4110
        R13: ffff947ed73e5278 R14: dead000000000122 R15: dead000000000100
        FS:  00007f15edfea840(0000) GS:ffff9481aca00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000561a79f76e20 CR3: 0000000138f68006 CR4: 00000000003706e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         btrfs_free_block_groups+0x24c/0x2f0 [btrfs]
         close_ctree+0x2ba/0x2fa [btrfs]
         generic_shutdown_super+0x6c/0x100
         kill_anon_super+0x14/0x30
         btrfs_kill_super+0x12/0x20 [btrfs]
         deactivate_locked_super+0x31/0x70
         cleanup_mnt+0x100/0x160
         task_work_run+0x68/0xb0
         exit_to_user_mode_prepare+0x1bb/0x1c0
         syscall_exit_to_user_mode+0x4b/0x260
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f15ee221ee7
        Code: ff 0b 00 f7 d8 64 89 01 (...)
        RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7
        RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000
        RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0
        R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000
        R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60
        irq event stamp: 0
        hardirqs last  enabled at (0): [<0000000000000000>] 0x0
        hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
        softirqs last  enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
        softirqs last disabled at (0): [<0000000000000000>] 0x0
        ---[ end trace dd74718fef1ed5c7 ]---
        ------------[ cut here ]------------
        WARNING: CPU: 2 PID: 1729896 at fs/btrfs/block-group.c:3377 btrfs_free_block_groups+0x25d/0x2f0 [btrfs]
        Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
        CPU: 5 PID: 1729896 Comm: umount Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        RIP: 0010:btrfs_free_block_groups+0x25d/0x2f0 [btrfs]
        Code: ad de 49 be 22 01 00 (...)
        RSP: 0018:ffffb270826bbde8 EFLAGS: 00010206
        RAX: ffff947ebeae1d08 RBX: ffff947ed73e4000 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: ffff947e9d823ae8 RDI: 0000000000000246
        RBP: ffff947ebeae1d08 R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ebeae1c00
        R13: ffff947ed73e5278 R14: dead000000000122 R15: dead000000000100
        FS:  00007f15edfea840(0000) GS:ffff9481ad200000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f1475d98ea8 CR3: 0000000138f68005 CR4: 00000000003706e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         close_ctree+0x2ba/0x2fa [btrfs]
         generic_shutdown_super+0x6c/0x100
         kill_anon_super+0x14/0x30
         btrfs_kill_super+0x12/0x20 [btrfs]
         deactivate_locked_super+0x31/0x70
         cleanup_mnt+0x100/0x160
         task_work_run+0x68/0xb0
         exit_to_user_mode_prepare+0x1bb/0x1c0
         syscall_exit_to_user_mode+0x4b/0x260
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f15ee221ee7
        Code: ff 0b 00 f7 d8 64 89 (...)
        RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7
        RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000
        RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0
        R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000
        R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60
        irq event stamp: 0
        hardirqs last  enabled at (0): [<0000000000000000>] 0x0
        hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
        softirqs last  enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
        softirqs last disabled at (0): [<0000000000000000>] 0x0
        ---[ end trace dd74718fef1ed5c8 ]---
        BTRFS info (device sdc): space_info 4 has 268238848 free, is not full
        BTRFS info (device sdc): space_info total=268435456, used=114688, pinned=0, reserved=16384, may_use=0, readonly=65536
        BTRFS info (device sdc): global_block_rsv: size 0 reserved 0
        BTRFS info (device sdc): trans_block_rsv: size 0 reserved 0
        BTRFS info (device sdc): chunk_block_rsv: size 0 reserved 0
        BTRFS info (device sdc): delayed_block_rsv: size 0 reserved 0
        BTRFS info (device sdc): delayed_refs_rsv: size 524288 reserved 0
      
      And the crash, which only happens when we do not have crc32c hardware
      acceleration, produces the following trace immediately after those
      warnings:
      
        stack segment: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
        CPU: 2 PID: 1749129 Comm: umount Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        RIP: 0010:btrfs_queue_work+0x36/0x190 [btrfs]
        Code: 54 55 53 48 89 f3 (...)
        RSP: 0018:ffffb27082443ae8 EFLAGS: 00010282
        RAX: 0000000000000004 RBX: ffff94810ee9ad90 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: ffff94810ee9ad90 RDI: ffff947ed8ee75a0
        RBP: a56b6b6b6b6b6b6b R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000007 R11: 0000000000000001 R12: ffff947fa9b435a8
        R13: ffff94810ee9ad90 R14: 0000000000000000 R15: ffff947e93dc0000
        FS:  00007f3cfe974840(0000) GS:ffff9481ac600000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f1b42995a70 CR3: 0000000127638003 CR4: 00000000003706e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         btrfs_wq_submit_bio+0xb3/0xd0 [btrfs]
         btrfs_submit_metadata_bio+0x44/0xc0 [btrfs]
         submit_one_bio+0x61/0x70 [btrfs]
         btree_write_cache_pages+0x414/0x450 [btrfs]
         ? kobject_put+0x9a/0x1d0
         ? trace_hardirqs_on+0x1b/0xf0
         ? _raw_spin_unlock_irqrestore+0x3c/0x60
         ? free_debug_processing+0x1e1/0x2b0
         do_writepages+0x43/0xe0
         ? lock_acquired+0x199/0x490
         __writeback_single_inode+0x59/0x650
         writeback_single_inode+0xaf/0x120
         write_inode_now+0x94/0xd0
         iput+0x187/0x2b0
         close_ctree+0x2c6/0x2fa [btrfs]
         generic_shutdown_super+0x6c/0x100
         kill_anon_super+0x14/0x30
         btrfs_kill_super+0x12/0x20 [btrfs]
         deactivate_locked_super+0x31/0x70
         cleanup_mnt+0x100/0x160
         task_work_run+0x68/0xb0
         exit_to_user_mode_prepare+0x1bb/0x1c0
         syscall_exit_to_user_mode+0x4b/0x260
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f3cfebabee7
        Code: ff 0b 00 f7 d8 64 89 01 (...)
        RSP: 002b:00007ffc9c9a05f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        RAX: 0000000000000000 RBX: 00007f3cfecd1264 RCX: 00007f3cfebabee7
        RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 0000562b6b478000
        RBP: 0000562b6b473a30 R08: 0000000000000000 R09: 00007f3cfec6cbe0
        R10: 0000562b6b479fe0 R11: 0000000000000246 R12: 0000000000000000
        R13: 0000562b6b478000 R14: 0000562b6b473b40 R15: 0000562b6b473c60
        Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
        ---[ end trace dd74718fef1ed5cc ]---
      
      Finally when we remove the btrfs module (rmmod btrfs), there are several
      warnings about objects that were allocated from our slabs but were never
      freed, consequence of the transaction that was never committed and got
      leaked:
      
        =============================================================================
        BUG btrfs_delayed_ref_head (Tainted: G    B   W        ): Objects remaining in btrfs_delayed_ref_head on __kmem_cache_shutdown()
        -----------------------------------------------------------------------------
      
        INFO: Slab 0x0000000094c2ae56 objects=24 used=2 fp=0x000000002bfa2521 flags=0x17fffc000010200
        CPU: 5 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x8d/0xb5
         slab_err+0xb7/0xdc
         ? lock_acquired+0x199/0x490
         __kmem_cache_shutdown+0x1ac/0x3c0
         ? lock_release+0x20e/0x4c0
         kmem_cache_destroy+0x55/0x120
         btrfs_delayed_ref_exit+0x11/0x35 [btrfs]
         exit_btrfs_fs+0xa/0x59 [btrfs]
         __x64_sys_delete_module+0x194/0x260
         ? fpregs_assert_state_consistent+0x1e/0x40
         ? exit_to_user_mode_prepare+0x55/0x1c0
         ? trace_hardirqs_on+0x1b/0xf0
         do_syscall_64+0x33/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f693e305897
        Code: 73 01 c3 48 8b 0d f9 f5 (...)
        RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
        RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
        RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
        RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
        R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
        R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
        INFO: Object 0x0000000050cbdd61 @offset=12104
        INFO: Allocated in btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs] age=1894 cpu=6 pid=1729873
      	__slab_alloc.isra.0+0x109/0x1c0
      	kmem_cache_alloc+0x7bb/0x830
      	btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs]
      	btrfs_free_tree_block+0x128/0x360 [btrfs]
      	__btrfs_cow_block+0x489/0x5f0 [btrfs]
      	btrfs_cow_block+0xf7/0x220 [btrfs]
      	btrfs_search_slot+0x62a/0xc40 [btrfs]
      	btrfs_del_orphan_item+0x65/0xd0 [btrfs]
      	btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
      	open_ctree+0x125a/0x18a0 [btrfs]
      	btrfs_mount_root.cold+0x13/0xed [btrfs]
      	legacy_get_tree+0x30/0x60
      	vfs_get_tree+0x28/0xe0
      	fc_mount+0xe/0x40
      	vfs_kern_mount.part.0+0x71/0x90
      	btrfs_mount+0x13b/0x3e0 [btrfs]
        INFO: Freed in __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs] age=4292 cpu=2 pid=1729526
      	kmem_cache_free+0x34c/0x3c0
      	__btrfs_run_delayed_refs+0x1117/0x1290 [btrfs]
      	btrfs_run_delayed_refs+0x81/0x210 [btrfs]
      	commit_cowonly_roots+0xfb/0x300 [btrfs]
      	btrfs_commit_transaction+0x367/0xc40 [btrfs]
      	sync_filesystem+0x74/0x90
      	generic_shutdown_super+0x22/0x100
      	kill_anon_super+0x14/0x30
      	btrfs_kill_super+0x12/0x20 [btrfs]
      	deactivate_locked_super+0x31/0x70
      	cleanup_mnt+0x100/0x160
      	task_work_run+0x68/0xb0
      	exit_to_user_mode_prepare+0x1bb/0x1c0
      	syscall_exit_to_user_mode+0x4b/0x260
      	entry_SYSCALL_64_after_hwframe+0x44/0xa9
        INFO: Object 0x0000000086e9b0ff @offset=12776
        INFO: Allocated in btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs] age=1900 cpu=6 pid=1729873
      	__slab_alloc.isra.0+0x109/0x1c0
      	kmem_cache_alloc+0x7bb/0x830
      	btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs]
      	btrfs_alloc_tree_block+0x2bf/0x360 [btrfs]
      	alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs]
      	__btrfs_cow_block+0x12d/0x5f0 [btrfs]
      	btrfs_cow_block+0xf7/0x220 [btrfs]
      	btrfs_search_slot+0x62a/0xc40 [btrfs]
      	btrfs_del_orphan_item+0x65/0xd0 [btrfs]
      	btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
      	open_ctree+0x125a/0x18a0 [btrfs]
      	btrfs_mount_root.cold+0x13/0xed [btrfs]
      	legacy_get_tree+0x30/0x60
      	vfs_get_tree+0x28/0xe0
      	fc_mount+0xe/0x40
      	vfs_kern_mount.part.0+0x71/0x90
        INFO: Freed in __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs] age=3141 cpu=6 pid=1729803
      	kmem_cache_free+0x34c/0x3c0
      	__btrfs_run_delayed_refs+0x1117/0x1290 [btrfs]
      	btrfs_run_delayed_refs+0x81/0x210 [btrfs]
      	btrfs_write_dirty_block_groups+0x17d/0x3d0 [btrfs]
      	commit_cowonly_roots+0x248/0x300 [btrfs]
      	btrfs_commit_transaction+0x367/0xc40 [btrfs]
      	close_ctree+0x113/0x2fa [btrfs]
      	generic_shutdown_super+0x6c/0x100
      	kill_anon_super+0x14/0x30
      	btrfs_kill_super+0x12/0x20 [btrfs]
      	deactivate_locked_super+0x31/0x70
      	cleanup_mnt+0x100/0x160
      	task_work_run+0x68/0xb0
      	exit_to_user_mode_prepare+0x1bb/0x1c0
      	syscall_exit_to_user_mode+0x4b/0x260
      	entry_SYSCALL_64_after_hwframe+0x44/0xa9
        kmem_cache_destroy btrfs_delayed_ref_head: Slab cache still has objects
        CPU: 5 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x8d/0xb5
         kmem_cache_destroy+0x119/0x120
         btrfs_delayed_ref_exit+0x11/0x35 [btrfs]
         exit_btrfs_fs+0xa/0x59 [btrfs]
         __x64_sys_delete_module+0x194/0x260
         ? fpregs_assert_state_consistent+0x1e/0x40
         ? exit_to_user_mode_prepare+0x55/0x1c0
         ? trace_hardirqs_on+0x1b/0xf0
         do_syscall_64+0x33/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f693e305897
        Code: 73 01 c3 48 8b 0d f9 f5 0b (...)
        RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
        RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
        RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
        RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
        R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
        R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
        =============================================================================
        BUG btrfs_delayed_tree_ref (Tainted: G    B   W        ): Objects remaining in btrfs_delayed_tree_ref on __kmem_cache_shutdown()
        -----------------------------------------------------------------------------
      
        INFO: Slab 0x0000000011f78dc0 objects=37 used=2 fp=0x0000000032d55d91 flags=0x17fffc000010200
        CPU: 3 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x8d/0xb5
         slab_err+0xb7/0xdc
         ? lock_acquired+0x199/0x490
         __kmem_cache_shutdown+0x1ac/0x3c0
         ? lock_release+0x20e/0x4c0
         kmem_cache_destroy+0x55/0x120
         btrfs_delayed_ref_exit+0x1d/0x35 [btrfs]
         exit_btrfs_fs+0xa/0x59 [btrfs]
         __x64_sys_delete_module+0x194/0x260
         ? fpregs_assert_state_consistent+0x1e/0x40
         ? exit_to_user_mode_prepare+0x55/0x1c0
         ? trace_hardirqs_on+0x1b/0xf0
         do_syscall_64+0x33/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f693e305897
        Code: 73 01 c3 48 8b 0d f9 f5 (...)
        RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
        RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
        RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
        RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
        R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
        R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
        INFO: Object 0x000000001a340018 @offset=4408
        INFO: Allocated in btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs] age=1917 cpu=6 pid=1729873
      	__slab_alloc.isra.0+0x109/0x1c0
      	kmem_cache_alloc+0x7bb/0x830
      	btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs]
      	btrfs_free_tree_block+0x128/0x360 [btrfs]
      	__btrfs_cow_block+0x489/0x5f0 [btrfs]
      	btrfs_cow_block+0xf7/0x220 [btrfs]
      	btrfs_search_slot+0x62a/0xc40 [btrfs]
      	btrfs_del_orphan_item+0x65/0xd0 [btrfs]
      	btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
      	open_ctree+0x125a/0x18a0 [btrfs]
      	btrfs_mount_root.cold+0x13/0xed [btrfs]
      	legacy_get_tree+0x30/0x60
      	vfs_get_tree+0x28/0xe0
      	fc_mount+0xe/0x40
      	vfs_kern_mount.part.0+0x71/0x90
      	btrfs_mount+0x13b/0x3e0 [btrfs]
        INFO: Freed in __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs] age=4167 cpu=4 pid=1729795
      	kmem_cache_free+0x34c/0x3c0
      	__btrfs_run_delayed_refs+0x63d/0x1290 [btrfs]
      	btrfs_run_delayed_refs+0x81/0x210 [btrfs]
      	btrfs_commit_transaction+0x60/0xc40 [btrfs]
      	create_subvol+0x56a/0x990 [btrfs]
      	btrfs_mksubvol+0x3fb/0x4a0 [btrfs]
      	__btrfs_ioctl_snap_create+0x119/0x1a0 [btrfs]
      	btrfs_ioctl_snap_create+0x58/0x80 [btrfs]
      	btrfs_ioctl+0x1a92/0x36f0 [btrfs]
      	__x64_sys_ioctl+0x83/0xb0
      	do_syscall_64+0x33/0x80
      	entry_SYSCALL_64_after_hwframe+0x44/0xa9
        INFO: Object 0x000000002b46292a @offset=13648
        INFO: Allocated in btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs] age=1923 cpu=6 pid=1729873
      	__slab_alloc.isra.0+0x109/0x1c0
      	kmem_cache_alloc+0x7bb/0x830
      	btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs]
      	btrfs_alloc_tree_block+0x2bf/0x360 [btrfs]
      	alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs]
      	__btrfs_cow_block+0x12d/0x5f0 [btrfs]
      	btrfs_cow_block+0xf7/0x220 [btrfs]
      	btrfs_search_slot+0x62a/0xc40 [btrfs]
      	btrfs_del_orphan_item+0x65/0xd0 [btrfs]
      	btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
      	open_ctree+0x125a/0x18a0 [btrfs]
      	btrfs_mount_root.cold+0x13/0xed [btrfs]
      	legacy_get_tree+0x30/0x60
      	vfs_get_tree+0x28/0xe0
      	fc_mount+0xe/0x40
      	vfs_kern_mount.part.0+0x71/0x90
        INFO: Freed in __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs] age=3164 cpu=6 pid=1729803
      	kmem_cache_free+0x34c/0x3c0
      	__btrfs_run_delayed_refs+0x63d/0x1290 [btrfs]
      	btrfs_run_delayed_refs+0x81/0x210 [btrfs]
      	commit_cowonly_roots+0xfb/0x300 [btrfs]
      	btrfs_commit_transaction+0x367/0xc40 [btrfs]
      	close_ctree+0x113/0x2fa [btrfs]
      	generic_shutdown_super+0x6c/0x100
      	kill_anon_super+0x14/0x30
      	btrfs_kill_super+0x12/0x20 [btrfs]
      	deactivate_locked_super+0x31/0x70
      	cleanup_mnt+0x100/0x160
      	task_work_run+0x68/0xb0
      	exit_to_user_mode_prepare+0x1bb/0x1c0
      	syscall_exit_to_user_mode+0x4b/0x260
      	entry_SYSCALL_64_after_hwframe+0x44/0xa9
        kmem_cache_destroy btrfs_delayed_tree_ref: Slab cache still has objects
        CPU: 5 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x8d/0xb5
         kmem_cache_destroy+0x119/0x120
         btrfs_delayed_ref_exit+0x1d/0x35 [btrfs]
         exit_btrfs_fs+0xa/0x59 [btrfs]
         __x64_sys_delete_module+0x194/0x260
         ? fpregs_assert_state_consistent+0x1e/0x40
         ? exit_to_user_mode_prepare+0x55/0x1c0
         ? trace_hardirqs_on+0x1b/0xf0
         do_syscall_64+0x33/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f693e305897
        Code: 73 01 c3 48 8b 0d f9 f5 (...)
        RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
        RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
        RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
        RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
        R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
        R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
        =============================================================================
        BUG btrfs_delayed_extent_op (Tainted: G    B   W        ): Objects remaining in btrfs_delayed_extent_op on __kmem_cache_shutdown()
        -----------------------------------------------------------------------------
      
        INFO: Slab 0x00000000f145ce2f objects=22 used=1 fp=0x00000000af0f92cf flags=0x17fffc000010200
        CPU: 5 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x8d/0xb5
         slab_err+0xb7/0xdc
         ? lock_acquired+0x199/0x490
         __kmem_cache_shutdown+0x1ac/0x3c0
         ? __mutex_unlock_slowpath+0x45/0x2a0
         kmem_cache_destroy+0x55/0x120
         exit_btrfs_fs+0xa/0x59 [btrfs]
         __x64_sys_delete_module+0x194/0x260
         ? fpregs_assert_state_consistent+0x1e/0x40
         ? exit_to_user_mode_prepare+0x55/0x1c0
         ? trace_hardirqs_on+0x1b/0xf0
         do_syscall_64+0x33/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f693e305897
        Code: 73 01 c3 48 8b 0d f9 f5 (...)
        RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
        RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
        RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
        RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
        R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
        R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
        INFO: Object 0x000000004cf95ea8 @offset=6264
        INFO: Allocated in btrfs_alloc_tree_block+0x1e0/0x360 [btrfs] age=1931 cpu=6 pid=1729873
      	__slab_alloc.isra.0+0x109/0x1c0
      	kmem_cache_alloc+0x7bb/0x830
      	btrfs_alloc_tree_block+0x1e0/0x360 [btrfs]
      	alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs]
      	__btrfs_cow_block+0x12d/0x5f0 [btrfs]
      	btrfs_cow_block+0xf7/0x220 [btrfs]
      	btrfs_search_slot+0x62a/0xc40 [btrfs]
      	btrfs_del_orphan_item+0x65/0xd0 [btrfs]
      	btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
      	open_ctree+0x125a/0x18a0 [btrfs]
      	btrfs_mount_root.cold+0x13/0xed [btrfs]
      	legacy_get_tree+0x30/0x60
      	vfs_get_tree+0x28/0xe0
      	fc_mount+0xe/0x40
      	vfs_kern_mount.part.0+0x71/0x90
      	btrfs_mount+0x13b/0x3e0 [btrfs]
        INFO: Freed in __btrfs_run_delayed_refs+0xabd/0x1290 [btrfs] age=3173 cpu=6 pid=1729803
      	kmem_cache_free+0x34c/0x3c0
      	__btrfs_run_delayed_refs+0xabd/0x1290 [btrfs]
      	btrfs_run_delayed_refs+0x81/0x210 [btrfs]
      	commit_cowonly_roots+0xfb/0x300 [btrfs]
      	btrfs_commit_transaction+0x367/0xc40 [btrfs]
      	close_ctree+0x113/0x2fa [btrfs]
      	generic_shutdown_super+0x6c/0x100
      	kill_anon_super+0x14/0x30
      	btrfs_kill_super+0x12/0x20 [btrfs]
      	deactivate_locked_super+0x31/0x70
      	cleanup_mnt+0x100/0x160
      	task_work_run+0x68/0xb0
      	exit_to_user_mode_prepare+0x1bb/0x1c0
      	syscall_exit_to_user_mode+0x4b/0x260
      	entry_SYSCALL_64_after_hwframe+0x44/0xa9
        kmem_cache_destroy btrfs_delayed_extent_op: Slab cache still has objects
        CPU: 3 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x8d/0xb5
         kmem_cache_destroy+0x119/0x120
         exit_btrfs_fs+0xa/0x59 [btrfs]
         __x64_sys_delete_module+0x194/0x260
         ? fpregs_assert_state_consistent+0x1e/0x40
         ? exit_to_user_mode_prepare+0x55/0x1c0
         ? trace_hardirqs_on+0x1b/0xf0
         do_syscall_64+0x33/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f693e305897
        Code: 73 01 c3 48 8b 0d f9 (...)
        RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
        RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
        RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
        RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
        R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
        R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
        BTRFS: state leak: start 30408704 end 30425087 state 1 in tree 1 refs 1
      
      Fix this issue by having the remount path stop the qgroup rescan worker
      when we are remounting RO and teach the rescan worker to stop when a
      remount is in progress. If later a remount in RW mode happens, we are
      already resuming the qgroup rescan worker through the call to
      btrfs_qgroup_rescan_resume(), so we do not need to worry about that.
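
      On the remount side this boils down to something like the following in
      the read-only branch (a sketch, assuming the existing helper
      btrfs_qgroup_wait_for_completion()):

         if (*flags & SB_RDONLY) {
                 /* make sure the rescan worker is done before the last
                  * transaction is committed via btrfs_commit_super() */
                 btrfs_qgroup_wait_for_completion(fs_info, false);
         }
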
      Tested-by: Fabian Vogt <fvogt@suse.com>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      cb13eea3
    • btrfs: qgroup: don't try to wait flushing if we're already holding a transaction · ae5e070e
      Qu Wenruo authored
      There is a chance of racing for qgroup flushing which may lead to
      deadlock:
      
      	Thread A		|	Thread B
         (not holding trans handle)	|  (holding a trans handle)
      --------------------------------+--------------------------------
      __btrfs_qgroup_reserve_meta()   | __btrfs_qgroup_reserve_meta()
      |- try_flush_qgroup()		| |- try_flush_qgroup()
         |- QGROUP_FLUSHING bit set   |    |
         |				|    |- test_and_set_bit()
         |				|    |- wait_event()
         |- btrfs_join_transaction()	|
         |- btrfs_commit_transaction()|
      
      			!!! DEAD LOCK !!!
      
      Thread A wants to commit the transaction, but thread B is holding a
      transaction handle, which blocks the commit. At the same time, thread B
      is waiting for thread A to finish its commit.

      This is just a hot fix, and it can lead to more EDQUOT errors when we
      are near the qgroup limit.
      
      The proper fix would be to make all metadata/data reservations happen
      without holding a transaction handle.
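
      A minimal user-space model of the bail-out this hot fix adds (not the
      kernel function; have_trans_handle and qgroup_flushing are stand-ins):
      a task that already holds a transaction handle must not wait for the
      running flush, so it skips the wait instead of blocking, which is why
      EDQUOT becomes more likely near the limit:

        /* Model of the bail-out (not the kernel function): skip waiting for a
         * running flush when the caller already holds a transaction handle. */
        #include <stdbool.h>
        #include <stdio.h>

        static bool qgroup_flushing;    /* models the QGROUP_FLUSHING bit */

        static void try_flush_qgroup(bool have_trans_handle)
        {
            if (qgroup_flushing) {
                if (have_trans_handle) {
                    /* Waiting here is what deadlocked: the flusher needs our
                     * handle released before it can commit. */
                    puts("flush running and we hold a handle: skip the wait");
                    return;     /* caller may see EDQUOT instead */
                }
                puts("no handle held: wait for the running flush to finish");
                return;
            }
            qgroup_flushing = true;
            puts("start the flush: join and commit the transaction");
            qgroup_flushing = false;
        }

        int main(void)
        {
            try_flush_qgroup(false);    /* thread A: no handle, does the flush */
            qgroup_flushing = true;     /* pretend A's flush is still running  */
            try_flush_qgroup(true);     /* thread B: holds a handle, must bail */
            return 0;
        }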
      
      CC: stable@vger.kernel.org # 5.9+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      ae5e070e
  7. 08 Dec 2020, 6 commits
  8. 24 Nov 2020, 3 commits
    • F
      btrfs: fix lockdep splat when enabling and disabling qgroups · a855fbe6
      Committed by Filipe Manana
      When running test case btrfs/017 from fstests, lockdep reported the
      following splat:
      
        [ 1297.067385] ======================================================
        [ 1297.067708] WARNING: possible circular locking dependency detected
        [ 1297.068022] 5.10.0-rc4-btrfs-next-73 #1 Not tainted
        [ 1297.068322] ------------------------------------------------------
        [ 1297.068629] btrfs/189080 is trying to acquire lock:
        [ 1297.068929] ffff9f2725731690 (sb_internal#2){.+.+}-{0:0}, at: btrfs_quota_enable+0xaf/0xa70 [btrfs]
        [ 1297.069274]
      		 but task is already holding lock:
        [ 1297.069868] ffff9f2702b61a08 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_enable+0x3b/0xa70 [btrfs]
        [ 1297.070219]
      		 which lock already depends on the new lock.
      
        [ 1297.071131]
      		 the existing dependency chain (in reverse order) is:
        [ 1297.071721]
      		 -> #1 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}:
        [ 1297.072375]        lock_acquire+0xd8/0x490
        [ 1297.072710]        __mutex_lock+0xa3/0xb30
        [ 1297.073061]        btrfs_qgroup_inherit+0x59/0x6a0 [btrfs]
        [ 1297.073421]        create_subvol+0x194/0x990 [btrfs]
        [ 1297.073780]        btrfs_mksubvol+0x3fb/0x4a0 [btrfs]
        [ 1297.074133]        __btrfs_ioctl_snap_create+0x119/0x1a0 [btrfs]
        [ 1297.074498]        btrfs_ioctl_snap_create+0x58/0x80 [btrfs]
        [ 1297.074872]        btrfs_ioctl+0x1a90/0x36f0 [btrfs]
        [ 1297.075245]        __x64_sys_ioctl+0x83/0xb0
        [ 1297.075617]        do_syscall_64+0x33/0x80
        [ 1297.075993]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
        [ 1297.076380]
      		 -> #0 (sb_internal#2){.+.+}-{0:0}:
        [ 1297.077166]        check_prev_add+0x91/0xc60
        [ 1297.077572]        __lock_acquire+0x1740/0x3110
        [ 1297.077984]        lock_acquire+0xd8/0x490
        [ 1297.078411]        start_transaction+0x3c5/0x760 [btrfs]
        [ 1297.078853]        btrfs_quota_enable+0xaf/0xa70 [btrfs]
        [ 1297.079323]        btrfs_ioctl+0x2c60/0x36f0 [btrfs]
        [ 1297.079789]        __x64_sys_ioctl+0x83/0xb0
        [ 1297.080232]        do_syscall_64+0x33/0x80
        [ 1297.080680]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
        [ 1297.081139]
      		 other info that might help us debug this:
      
        [ 1297.082536]  Possible unsafe locking scenario:
      
        [ 1297.083510]        CPU0                    CPU1
        [ 1297.084005]        ----                    ----
        [ 1297.084500]   lock(&fs_info->qgroup_ioctl_lock);
        [ 1297.084994]                                lock(sb_internal#2);
        [ 1297.085485]                                lock(&fs_info->qgroup_ioctl_lock);
        [ 1297.085974]   lock(sb_internal#2);
        [ 1297.086454]
      		  *** DEADLOCK ***
        [ 1297.087880] 3 locks held by btrfs/189080:
        [ 1297.088324]  #0: ffff9f2725731470 (sb_writers#14){.+.+}-{0:0}, at: btrfs_ioctl+0xa73/0x36f0 [btrfs]
        [ 1297.088799]  #1: ffff9f2702b60cc0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl+0x1f4d/0x36f0 [btrfs]
        [ 1297.089284]  #2: ffff9f2702b61a08 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_enable+0x3b/0xa70 [btrfs]
        [ 1297.089771]
      		 stack backtrace:
        [ 1297.090662] CPU: 5 PID: 189080 Comm: btrfs Not tainted 5.10.0-rc4-btrfs-next-73 #1
        [ 1297.091132] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        [ 1297.092123] Call Trace:
        [ 1297.092629]  dump_stack+0x8d/0xb5
        [ 1297.093115]  check_noncircular+0xff/0x110
        [ 1297.093596]  check_prev_add+0x91/0xc60
        [ 1297.094076]  ? kvm_clock_read+0x14/0x30
        [ 1297.094553]  ? kvm_sched_clock_read+0x5/0x10
        [ 1297.095029]  __lock_acquire+0x1740/0x3110
        [ 1297.095510]  lock_acquire+0xd8/0x490
        [ 1297.095993]  ? btrfs_quota_enable+0xaf/0xa70 [btrfs]
        [ 1297.096476]  start_transaction+0x3c5/0x760 [btrfs]
        [ 1297.096962]  ? btrfs_quota_enable+0xaf/0xa70 [btrfs]
        [ 1297.097451]  btrfs_quota_enable+0xaf/0xa70 [btrfs]
        [ 1297.097941]  ? btrfs_ioctl+0x1f4d/0x36f0 [btrfs]
        [ 1297.098429]  btrfs_ioctl+0x2c60/0x36f0 [btrfs]
        [ 1297.098904]  ? do_user_addr_fault+0x20c/0x430
        [ 1297.099382]  ? kvm_clock_read+0x14/0x30
        [ 1297.099854]  ? kvm_sched_clock_read+0x5/0x10
        [ 1297.100328]  ? sched_clock+0x5/0x10
        [ 1297.100801]  ? sched_clock_cpu+0x12/0x180
        [ 1297.101272]  ? __x64_sys_ioctl+0x83/0xb0
        [ 1297.101739]  __x64_sys_ioctl+0x83/0xb0
        [ 1297.102207]  do_syscall_64+0x33/0x80
        [ 1297.102673]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
        [ 1297.103148] RIP: 0033:0x7f773ff65d87
      
      This is because during the quota enable ioctl we first lock the mutex
      qgroup_ioctl_lock and then start a transaction, and starting a transaction
      acquires a fs freeze semaphore (at the VFS level). However, every other
      code path, except for the quota disable ioctl path, does the opposite:
      we start a transaction and then lock the mutex.

      So fix this by making the quota enable and disable paths start the
      transaction without having the mutex locked, and then, after starting the
      transaction, lock the mutex and check if some other task already enabled
      or disabled the quotas, bailing out with success if that was the case.
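
      A sketch of that reordered quota enable path (a user-space model, not
      the kernel code; quota_enable and quotas_enabled are stand-ins): start
      the transaction first, then take the mutex and re-check whether another
      task already enabled quotas:

        /* Model of the reordered quota enable path (not the kernel code). */
        #include <pthread.h>
        #include <stdbool.h>
        #include <stdio.h>

        static pthread_mutex_t qgroup_ioctl_lock = PTHREAD_MUTEX_INITIALIZER;
        static bool quotas_enabled;

        static void start_transaction(void) { puts("start transaction (takes sb_internal)"); }
        static void end_transaction(void)   { puts("end transaction"); }

        static int quota_enable(void)
        {
            /* Transaction first, mutex second: same order as everyone else. */
            start_transaction();
            pthread_mutex_lock(&qgroup_ioctl_lock);
            if (!quotas_enabled) {
                quotas_enabled = true;      /* create quota tree, items, ... */
                puts("quotas enabled");
            } else {
                puts("another task already enabled quotas, nothing to do");
            }
            pthread_mutex_unlock(&qgroup_ioctl_lock);
            end_transaction();
            return 0;
        }

        int main(void)
        {
            quota_enable();     /* enables quotas                      */
            quota_enable();     /* sees them enabled and bails with 0  */
            return 0;
        }
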
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      a855fbe6
    • F
      btrfs: do nofs allocations when adding and removing qgroup relations · 7aa6d359
      Committed by Filipe Manana
      When adding or removing a qgroup relation we do a GFP_KERNEL
      allocation, which is not safe because we are holding a transaction
      handle open, and that can deadlock if the allocator needs to
      recurse into the filesystem. So just surround those allocations with a
      nofs context.
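
      The nofs context here refers to the kernel's memalloc_nofs_save() and
      memalloc_nofs_restore() helpers. A user-space model of that scoped-flag
      pattern (not the actual patch; fs_alloc and nofs_depth are stand-ins)
      might look like this:

        /* User-space model of a "nofs" allocation scope (not kernel code). */
        #include <stdio.h>
        #include <stdlib.h>

        static _Thread_local unsigned int nofs_depth;   /* models PF_MEMALLOC_NOFS */

        static unsigned int nofs_save(void) { return nofs_depth++; }
        static void nofs_restore(unsigned int old) { nofs_depth = old; }

        static void *fs_alloc(size_t size)
        {
            if (nofs_depth)
                printf("alloc %zu bytes, NOFS: reclaim must not re-enter the fs\n", size);
            else
                printf("alloc %zu bytes, GFP_KERNEL: reclaim may re-enter the fs\n", size);
            return malloc(size);
        }

        int main(void)
        {
            unsigned int flag;
            void *p;

            flag = nofs_save();     /* we hold a transaction handle here */
            p = fs_alloc(64);       /* safe: cannot deadlock on the fs   */
            nofs_restore(flag);
            free(p);
            return 0;
        }
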
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      7aa6d359
    • F
      btrfs: fix lockdep splat when reading qgroup config on mount · 3d05cad3
      Committed by Filipe Manana
      Lockdep reported the following splat when running test btrfs/190 from
      fstests:
      
        [ 9482.126098] ======================================================
        [ 9482.126184] WARNING: possible circular locking dependency detected
        [ 9482.126281] 5.10.0-rc4-btrfs-next-73 #1 Not tainted
        [ 9482.126365] ------------------------------------------------------
        [ 9482.126456] mount/24187 is trying to acquire lock:
        [ 9482.126534] ffffa0c869a7dac0 (&fs_info->qgroup_rescan_lock){+.+.}-{3:3}, at: qgroup_rescan_init+0x43/0xf0 [btrfs]
        [ 9482.126647]
      		 but task is already holding lock:
        [ 9482.126777] ffffa0c892ebd3a0 (btrfs-quota-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x120 [btrfs]
        [ 9482.126886]
      		 which lock already depends on the new lock.
      
        [ 9482.127078]
      		 the existing dependency chain (in reverse order) is:
        [ 9482.127213]
      		 -> #1 (btrfs-quota-00){++++}-{3:3}:
        [ 9482.127366]        lock_acquire+0xd8/0x490
        [ 9482.127436]        down_read_nested+0x45/0x220
        [ 9482.127528]        __btrfs_tree_read_lock+0x27/0x120 [btrfs]
        [ 9482.127613]        btrfs_read_lock_root_node+0x41/0x130 [btrfs]
        [ 9482.127702]        btrfs_search_slot+0x514/0xc30 [btrfs]
        [ 9482.127788]        update_qgroup_status_item+0x72/0x140 [btrfs]
        [ 9482.127877]        btrfs_qgroup_rescan_worker+0xde/0x680 [btrfs]
        [ 9482.127964]        btrfs_work_helper+0xf1/0x600 [btrfs]
        [ 9482.128039]        process_one_work+0x24e/0x5e0
        [ 9482.128110]        worker_thread+0x50/0x3b0
        [ 9482.128181]        kthread+0x153/0x170
        [ 9482.128256]        ret_from_fork+0x22/0x30
        [ 9482.128327]
      		 -> #0 (&fs_info->qgroup_rescan_lock){+.+.}-{3:3}:
        [ 9482.128464]        check_prev_add+0x91/0xc60
        [ 9482.128551]        __lock_acquire+0x1740/0x3110
        [ 9482.128623]        lock_acquire+0xd8/0x490
        [ 9482.130029]        __mutex_lock+0xa3/0xb30
        [ 9482.130590]        qgroup_rescan_init+0x43/0xf0 [btrfs]
        [ 9482.131577]        btrfs_read_qgroup_config+0x43a/0x550 [btrfs]
        [ 9482.132175]        open_ctree+0x1228/0x18a0 [btrfs]
        [ 9482.132756]        btrfs_mount_root.cold+0x13/0xed [btrfs]
        [ 9482.133325]        legacy_get_tree+0x30/0x60
        [ 9482.133866]        vfs_get_tree+0x28/0xe0
        [ 9482.134392]        fc_mount+0xe/0x40
        [ 9482.134908]        vfs_kern_mount.part.0+0x71/0x90
        [ 9482.135428]        btrfs_mount+0x13b/0x3e0 [btrfs]
        [ 9482.135942]        legacy_get_tree+0x30/0x60
        [ 9482.136444]        vfs_get_tree+0x28/0xe0
        [ 9482.136949]        path_mount+0x2d7/0xa70
        [ 9482.137438]        do_mount+0x75/0x90
        [ 9482.137923]        __x64_sys_mount+0x8e/0xd0
        [ 9482.138400]        do_syscall_64+0x33/0x80
        [ 9482.138873]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
        [ 9482.139346]
      		 other info that might help us debug this:
      
        [ 9482.140735]  Possible unsafe locking scenario:
      
        [ 9482.141594]        CPU0                    CPU1
        [ 9482.142011]        ----                    ----
        [ 9482.142411]   lock(btrfs-quota-00);
        [ 9482.142806]                                lock(&fs_info->qgroup_rescan_lock);
        [ 9482.143216]                                lock(btrfs-quota-00);
        [ 9482.143629]   lock(&fs_info->qgroup_rescan_lock);
        [ 9482.144056]
      		  *** DEADLOCK ***
      
        [ 9482.145242] 2 locks held by mount/24187:
        [ 9482.145637]  #0: ffffa0c8411c40e8 (&type->s_umount_key#44/1){+.+.}-{3:3}, at: alloc_super+0xb9/0x400
        [ 9482.146061]  #1: ffffa0c892ebd3a0 (btrfs-quota-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x120 [btrfs]
        [ 9482.146509]
      		 stack backtrace:
        [ 9482.147350] CPU: 1 PID: 24187 Comm: mount Not tainted 5.10.0-rc4-btrfs-next-73 #1
        [ 9482.147788] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        [ 9482.148709] Call Trace:
        [ 9482.149169]  dump_stack+0x8d/0xb5
        [ 9482.149628]  check_noncircular+0xff/0x110
        [ 9482.150090]  check_prev_add+0x91/0xc60
        [ 9482.150561]  ? kvm_clock_read+0x14/0x30
        [ 9482.151017]  ? kvm_sched_clock_read+0x5/0x10
        [ 9482.151470]  __lock_acquire+0x1740/0x3110
        [ 9482.151941]  ? __btrfs_tree_read_lock+0x27/0x120 [btrfs]
        [ 9482.152402]  lock_acquire+0xd8/0x490
        [ 9482.152887]  ? qgroup_rescan_init+0x43/0xf0 [btrfs]
        [ 9482.153354]  __mutex_lock+0xa3/0xb30
        [ 9482.153826]  ? qgroup_rescan_init+0x43/0xf0 [btrfs]
        [ 9482.154301]  ? qgroup_rescan_init+0x43/0xf0 [btrfs]
        [ 9482.154768]  ? qgroup_rescan_init+0x43/0xf0 [btrfs]
        [ 9482.155226]  qgroup_rescan_init+0x43/0xf0 [btrfs]
        [ 9482.155690]  btrfs_read_qgroup_config+0x43a/0x550 [btrfs]
        [ 9482.156160]  open_ctree+0x1228/0x18a0 [btrfs]
        [ 9482.156643]  btrfs_mount_root.cold+0x13/0xed [btrfs]
        [ 9482.157108]  ? rcu_read_lock_sched_held+0x5d/0x90
        [ 9482.157567]  ? kfree+0x31f/0x3e0
        [ 9482.158030]  legacy_get_tree+0x30/0x60
        [ 9482.158489]  vfs_get_tree+0x28/0xe0
        [ 9482.158947]  fc_mount+0xe/0x40
        [ 9482.159403]  vfs_kern_mount.part.0+0x71/0x90
        [ 9482.159875]  btrfs_mount+0x13b/0x3e0 [btrfs]
        [ 9482.160335]  ? rcu_read_lock_sched_held+0x5d/0x90
        [ 9482.160805]  ? kfree+0x31f/0x3e0
        [ 9482.161260]  ? legacy_get_tree+0x30/0x60
        [ 9482.161714]  legacy_get_tree+0x30/0x60
        [ 9482.162166]  vfs_get_tree+0x28/0xe0
        [ 9482.162616]  path_mount+0x2d7/0xa70
        [ 9482.163070]  do_mount+0x75/0x90
        [ 9482.163525]  __x64_sys_mount+0x8e/0xd0
        [ 9482.163986]  do_syscall_64+0x33/0x80
        [ 9482.164437]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
        [ 9482.164902] RIP: 0033:0x7f51e907caaa
      
      This happens because at btrfs_read_qgroup_config() we can call
      qgroup_rescan_init() while holding a read lock on a quota btree leaf,
      acquired by the previous call to btrfs_search_slot_for_read(), and
      qgroup_rescan_init() acquires the mutex qgroup_rescan_lock.
      
      A qgroup rescan worker does the opposite: it acquires the mutex
      qgroup_rescan_lock, at btrfs_qgroup_rescan_worker(), and then tries to
      update the qgroup status item in the quota btree through the call to
      update_qgroup_status_item(). This inversion of locking order
      between the qgroup_rescan_lock mutex and quota btree locks causes the
      splat.
      
      Fix this simply by releasing and freeing the path before calling
      qgroup_rescan_init() at btrfs_read_qgroup_config().
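
      A user-space model of the resulting behaviour (not the kernel code; the
      lock names mirror the ones in the splat above): the mount path releases
      the quota tree lock before taking qgroup_rescan_lock, so it never holds
      both locks at once:

        /* Model of the resulting lock order (not the kernel code). */
        #include <pthread.h>
        #include <stdio.h>

        static pthread_rwlock_t quota_tree_lock = PTHREAD_RWLOCK_INITIALIZER; /* btrfs-quota-00 */
        static pthread_mutex_t qgroup_rescan_lock = PTHREAD_MUTEX_INITIALIZER;

        static void read_qgroup_config(void)
        {
            pthread_rwlock_rdlock(&quota_tree_lock);    /* btrfs_search_slot_for_read() */
            puts("read qgroup items from the quota tree");
            pthread_rwlock_unlock(&quota_tree_lock);    /* btrfs_free_path(): leaf lock dropped */

            /* Only now take the mutex the rescan worker also uses, so the
             * mount path never holds both locks at once. */
            pthread_mutex_lock(&qgroup_rescan_lock);
            puts("qgroup_rescan_init(): set up rescan state");
            pthread_mutex_unlock(&qgroup_rescan_lock);
        }

        int main(void)
        {
            read_qgroup_config();
            return 0;
        }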
      
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      3d05cad3
  9. 14 Nov 2020, 1 commit
    • Q
      btrfs: qgroup: don't commit transaction when we already hold the handle · 6f23277a
      Committed by Qu Wenruo
      [BUG]
      When running the following script, btrfs will trigger an ASSERT():
      
        #!/bin/bash
        mkfs.btrfs -f $dev
        mount $dev $mnt
        xfs_io -f -c "pwrite 0 1G" $mnt/file
        sync
        btrfs quota enable $mnt
        btrfs quota rescan -w $mnt
      
        # Manually set the limit below current usage
        btrfs qgroup limit 512M $mnt $mnt
      
        # Crash happens
        touch $mnt/file
      
      The dmesg looks like this:
      
        assertion failed: refcount_read(&trans->use_count) == 1, in fs/btrfs/transaction.c:2022
        ------------[ cut here ]------------
        kernel BUG at fs/btrfs/ctree.h:3230!
        invalid opcode: 0000 [#1] SMP PTI
        RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs]
         btrfs_commit_transaction.cold+0x11/0x5d [btrfs]
         try_flush_qgroup+0x67/0x100 [btrfs]
         __btrfs_qgroup_reserve_meta+0x3a/0x60 [btrfs]
         btrfs_delayed_update_inode+0xaa/0x350 [btrfs]
         btrfs_update_inode+0x9d/0x110 [btrfs]
         btrfs_dirty_inode+0x5d/0xd0 [btrfs]
         touch_atime+0xb5/0x100
         iterate_dir+0xf1/0x1b0
         __x64_sys_getdents64+0x78/0x110
         do_syscall_64+0x33/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7fb5afe588db
      
      [CAUSE]
      In try_flush_qgroup(), we assume we don't hold a transaction handle at
      all. This is true for data reservations and mostly true for metadata:
      data space reservation always happens before we start a transaction,
      and for most metadata operations we reserve space in
      start_transaction().
      
      But there is an exception, btrfs_delayed_inode_reserve_metadata().
      It holds a transaction handle, while still trying to reserve extra
      metadata space.
      
      When we hit EDQUOT inside btrfs_delayed_inode_reserve_metadata(), the
      qgroup code will join the current transaction and commit it, while we
      are still holding a transaction handle.
      
      [FIX]
      Let's check current->journal_info before we join the transaction.

      If current->journal_info is unset or BTRFS_SEND_TRANS_STUB, it means
      we are not holding a transaction handle, thus we are able to join and
      then commit the transaction.

      If current->journal_info is a valid transaction handle, we avoid
      committing the transaction and just end it.
      
      This is less effective than committing the current transaction, as it
      won't free reserved metadata space, but we may still free some data
      space before new data writes happen.
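
      A plain-C model of that check (not the kernel function; journal_info
      and SEND_TRANS_STUB here are stand-ins for current->journal_info and
      BTRFS_SEND_TRANS_STUB):

        /* Model of the current->journal_info check (not the kernel function). */
        #include <stdio.h>

        static int send_stub;                   /* stands in for BTRFS_SEND_TRANS_STUB */
        #define SEND_TRANS_STUB ((void *)&send_stub)

        static void *journal_info;              /* models current->journal_info */

        static void flush_qgroup_space(void)
        {
            if (journal_info == NULL || journal_info == SEND_TRANS_STUB)
                puts("no real handle held: join and commit the transaction");
            else
                puts("handle already held: just end the transaction, no commit");
        }

        int main(void)
        {
            static int some_trans_handle;

            journal_info = NULL;
            flush_qgroup_space();               /* safe to commit   */

            journal_info = &some_trans_handle;  /* we hold a handle */
            flush_qgroup_space();               /* must not commit  */
            return 0;
        }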
      
      Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1178634
      Fixes: c53e9653 ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT")
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      6f23277a
  10. 05 Nov 2020, 1 commit
  11. 26 Oct 2020, 1 commit
    • J
      btrfs: drop the path before adding qgroup items when enabling qgroups · 5223cc60
      Committed by Josef Bacik
      When enabling qgroups we walk the tree_root and then add a qgroup item
      for every root that we have.  This creates a lock dependency on the
      tree_root and qgroup_root, which results in the following lockdep splat
      (with tree locks using rwsem), e.g. in tests btrfs/017 or btrfs/022:
      
        ======================================================
        WARNING: possible circular locking dependency detected
        5.9.0-default+ #1299 Not tainted
        ------------------------------------------------------
        btrfs/24552 is trying to acquire lock:
        ffff9142dfc5f630 (btrfs-quota-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
      
        but task is already holding lock:
        ffff9142dfc5d0b0 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #1 (btrfs-root-00){++++}-{3:3}:
      	 __lock_acquire+0x3fb/0x730
      	 lock_acquire.part.0+0x6a/0x130
      	 down_read_nested+0x46/0x130
      	 __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
      	 __btrfs_read_lock_root_node+0x3a/0x50 [btrfs]
      	 btrfs_search_slot_get_root+0x11d/0x290 [btrfs]
      	 btrfs_search_slot+0xc3/0x9f0 [btrfs]
      	 btrfs_insert_item+0x6e/0x140 [btrfs]
      	 btrfs_create_tree+0x1cb/0x240 [btrfs]
      	 btrfs_quota_enable+0xcd/0x790 [btrfs]
      	 btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs]
      	 __x64_sys_ioctl+0x83/0xa0
      	 do_syscall_64+0x2d/0x70
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        -> #0 (btrfs-quota-00){++++}-{3:3}:
      	 check_prev_add+0x91/0xc30
      	 validate_chain+0x491/0x750
      	 __lock_acquire+0x3fb/0x730
      	 lock_acquire.part.0+0x6a/0x130
      	 down_read_nested+0x46/0x130
      	 __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
      	 __btrfs_read_lock_root_node+0x3a/0x50 [btrfs]
      	 btrfs_search_slot_get_root+0x11d/0x290 [btrfs]
      	 btrfs_search_slot+0xc3/0x9f0 [btrfs]
      	 btrfs_insert_empty_items+0x58/0xa0 [btrfs]
      	 add_qgroup_item.part.0+0x72/0x210 [btrfs]
      	 btrfs_quota_enable+0x3bb/0x790 [btrfs]
      	 btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs]
      	 __x64_sys_ioctl+0x83/0xa0
      	 do_syscall_64+0x2d/0x70
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        other info that might help us debug this:
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(btrfs-root-00);
      				 lock(btrfs-quota-00);
      				 lock(btrfs-root-00);
          lock(btrfs-quota-00);
      
         *** DEADLOCK ***
      
        5 locks held by btrfs/24552:
         #0: ffff9142df431478 (sb_writers#10){.+.+}-{0:0}, at: mnt_want_write_file+0x22/0xa0
         #1: ffff9142f9b10cc0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl_quota_ctl+0x7b/0xe0 [btrfs]
         #2: ffff9142f9b11a08 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_enable+0x3b/0x790 [btrfs]
         #3: ffff9142df431698 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x406/0x510 [btrfs]
         #4: ffff9142dfc5d0b0 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
      
        stack backtrace:
        CPU: 1 PID: 24552 Comm: btrfs Not tainted 5.9.0-default+ #1299
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
        Call Trace:
         dump_stack+0x77/0x97
         check_noncircular+0xf3/0x110
         check_prev_add+0x91/0xc30
         validate_chain+0x491/0x750
         __lock_acquire+0x3fb/0x730
         lock_acquire.part.0+0x6a/0x130
         ? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
         ? lock_acquire+0xc4/0x140
         ? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
         down_read_nested+0x46/0x130
         ? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
         __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
         ? btrfs_root_node+0xd9/0x200 [btrfs]
         __btrfs_read_lock_root_node+0x3a/0x50 [btrfs]
         btrfs_search_slot_get_root+0x11d/0x290 [btrfs]
         btrfs_search_slot+0xc3/0x9f0 [btrfs]
         btrfs_insert_empty_items+0x58/0xa0 [btrfs]
         add_qgroup_item.part.0+0x72/0x210 [btrfs]
         btrfs_quota_enable+0x3bb/0x790 [btrfs]
         btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs]
         __x64_sys_ioctl+0x83/0xa0
         do_syscall_64+0x2d/0x70
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fix this by dropping the path whenever we find a root item, adding the
      qgroup item, and then re-looking up the root item we found so we can
      continue processing roots.
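
      A simplified model of that loop (user-space C, not the kernel code;
      find_next_root is a stand-in for the btrfs_search_slot() based
      iteration): remember the key of the root item, drop the path, add the
      qgroup item, then re-search from just past the remembered key:

        /* Simplified model of the "drop path, insert, re-search" loop
         * (not the kernel code). */
        #include <stdio.h>

        struct key { int objectid; };

        /* Pretend the tree_root contains root items 256..258. */
        static int find_next_root(const struct key *start, struct key *found)
        {
            int id = start->objectid < 256 ? 256 : start->objectid;

            if (id > 258)
                return 1;               /* no more root items */
            found->objectid = id;
            return 0;
        }

        int main(void)
        {
            struct key search = { .objectid = 0 };
            struct key found;

            while (find_next_root(&search, &found) == 0) {
                /* Drop the tree_root path before touching the quota tree,
                 * then add the qgroup item for this root. */
                printf("release path, add qgroup item for root %d\n",
                       found.objectid);
                /* Re-search from just past the item we handled. */
                search.objectid = found.objectid + 1;
            }
            return 0;
        }
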
      Reported-by: David Sterba <dsterba@suse.com>
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      5223cc60
  12. 07 Oct 2020, 1 commit
  13. 27 Jul 2020, 12 commits
  14. 25 May 2020, 1 commit
    • Q
      btrfs: qgroup: mark qgroup inconsistent if we're inheriting snapshot to a new qgroup · cbab8ade
      Committed by Qu Wenruo
      [BUG]
      For the following operations, qgroup accounting is guaranteed to be
      wrong because the snapshot is added to a new qgroup:
      
        # mkfs.btrfs -f $dev
        # mount $dev $mnt
        # btrfs qgroup en $mnt
        # btrfs subv create $mnt/src
        # xfs_io -f -c "pwrite 0 1m" $mnt/src/file
        # sync
        # btrfs qgroup create 1/0 $mnt/src
        # btrfs subv snapshot -i 1/0 $mnt/src $mnt/snapshot
        # btrfs qgroup show -prce $mnt/src
        qgroupid         rfer         excl     max_rfer     max_excl parent  child
        --------         ----         ----     --------     -------- ------  -----
        0/5          16.00KiB     16.00KiB         none         none ---     ---
        0/257         1.02MiB     16.00KiB         none         none ---     ---
        0/258         1.02MiB     16.00KiB         none         none 1/0     ---
        1/0             0.00B        0.00B         none         none ---     0/258
      	        ^^^^^^^^^^^^^^^^^^^^
      
      [CAUSE]
      The problem is that in btrfs_qgroup_inherit() we don't have a good
      enough check to determine whether the new relation would break the
      existing accounting.

      Unlike btrfs_add_qgroup_relation(), which has a proper check to
      determine whether we can do a quick update without a rescan, in
      btrfs_qgroup_inherit() we can even assign a snapshot to multiple
      qgroups.
      
      [FIX]
      Fix it by manually marking qgroup inconsistent for snapshot inheritance.
      
      For subvolume creation, since all its extents are exclusively owned, we
      don't need to rescan.
      
      In theory, we should do a relation check like quick_update_accounting()
      when doing qgroup inheritance and inform the user about the qgroup
      accounting inconsistency.

      But we don't have a good mechanism to relay that back to the user in
      the snapshot creation context, thus we can only silently mark the
      qgroup inconsistent.

      Anyway, users shouldn't use qgroup inheritance during snapshot
      creation; they should add the qgroup relationship after snapshot
      creation with 'btrfs qgroup assign', which has a much better interface
      to inform the user about the qgroup inconsistency and kicks in a
      rescan automatically.
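
      A tiny model of the fix's effect (not the kernel function; the flag
      value below is only illustrative, standing in for
      BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT): snapshot creation with an
      inherit target marks the status inconsistent, while plain subvolume
      creation does not:

        /* Tiny model of the fix's effect (not the kernel function). */
        #include <stdbool.h>
        #include <stdio.h>

        #define QGROUP_STATUS_FLAG_INCONSISTENT (1u << 2)   /* illustrative value */

        static unsigned int qgroup_status_flags;

        static void qgroup_inherit(bool is_snapshot, bool has_inherit_target)
        {
            /* Snapshots assigned to an existing qgroup may break accounting,
             * so force a rescan by marking the status inconsistent. */
            if (is_snapshot && has_inherit_target)
                qgroup_status_flags |= QGROUP_STATUS_FLAG_INCONSISTENT;
        }

        int main(void)
        {
            qgroup_inherit(false, false);   /* subvol create: extents are exclusive */
            printf("after subvol create: inconsistent=%d\n",
                   !!(qgroup_status_flags & QGROUP_STATUS_FLAG_INCONSISTENT));

            qgroup_inherit(true, true);     /* snapshot -i 1/0 */
            printf("after snapshot -i:    inconsistent=%d\n",
                   !!(qgroup_status_flags & QGROUP_STATUS_FLAG_INCONSISTENT));
            return 0;
        }
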
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      cbab8ade
  15. 24 Mar 2020, 3 commits
    • J
      btrfs: move the root freeing stuff into btrfs_put_root · 8c38938c
      Committed by Josef Bacik
      There are a few different ways to free roots: either you allocated them
      yourself and you just do

      free_extent_buffer(root->node);
      free_extent_buffer(root->commit_root);
      btrfs_put_root(root);
      
      Which is the pattern for log roots.  Or for snapshots/subvolumes that
      are being dropped you simply call btrfs_free_fs_root() which does all
      the cleanup for you.
      
      Unify this all into btrfs_put_root(), so that we don't free up things
      associated with the root until the last reference is dropped.  This
      makes the root freeing code much more significant.
      
      The only caveat is that at close_ctree() time we have to free the extent
      buffers for all of our main roots (extent_root, chunk_root, etc.) because
      we have to drop the btree_inode and we'll run into issues if we hold
      onto those nodes until ->kill_sb() time. This will be addressed in the
      future when we kill the btree_inode.
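
      A minimal user-space model of that unified put (not the btrfs
      implementation; struct root here is a stand-in for struct btrfs_root):
      resources hang off the root and are released only when the last
      reference is dropped:

        /* Minimal model of a unified, refcounted put (not the btrfs code). */
        #include <stdio.h>
        #include <stdlib.h>

        struct root {
            int refs;
            char *node;                 /* stand-ins for the extent buffers */
            char *commit_root;
        };

        static struct root *get_root(struct root *r) { r->refs++; return r; }

        static void put_root(struct root *r)
        {
            if (--r->refs)
                return;                 /* someone still holds a reference */
            free(r->node);              /* last reference: free everything */
            free(r->commit_root);
            free(r);
            puts("root fully freed on last put");
        }

        int main(void)
        {
            struct root *r = calloc(1, sizeof(*r));

            r->refs = 1;
            r->node = malloc(16);
            r->commit_root = malloc(16);

            get_root(r);                /* e.g. a second user of the root */
            put_root(r);                /* not freed yet                  */
            put_root(r);                /* freed here                     */
            return 0;
        }
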
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      8c38938c
    • Q
      btrfs: qgroup: Remove the unnecessary spin lock for qgroup_rescan_running · daf475c9
      Committed by Qu Wenruo
      After the previous patch, qgroup_rescan_running is protected by
      btrfs_fs_info::qgroup_rescan_lock, so there is no need for the extra
      spinlock.
      Suggested-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      daf475c9
    • Q
      btrfs: qgroup: ensure qgroup_rescan_running is only set when the worker is at least queued · d61acbbf
      Committed by Qu Wenruo
      [BUG]
      There are some reports about btrfs waiting forever to unmount itself,
      with the following call trace:
      
        INFO: task umount:4631 blocked for more than 491 seconds.
              Tainted: G               X  5.3.8-2-default #1
        "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        umount          D    0  4631   3337 0x00000000
        Call Trace:
        ([<00000000174adf7a>] __schedule+0x342/0x748)
         [<00000000174ae3ca>] schedule+0x4a/0xd8
         [<00000000174b1f08>] schedule_timeout+0x218/0x420
         [<00000000174af10c>] wait_for_common+0x104/0x1d8
         [<000003ff804d6994>] btrfs_qgroup_wait_for_completion+0x84/0xb0 [btrfs]
         [<000003ff8044a616>] close_ctree+0x4e/0x380 [btrfs]
         [<0000000016fa3136>] generic_shutdown_super+0x8e/0x158
         [<0000000016fa34d6>] kill_anon_super+0x26/0x40
         [<000003ff8041ba88>] btrfs_kill_super+0x28/0xc8 [btrfs]
         [<0000000016fa39f8>] deactivate_locked_super+0x68/0x98
         [<0000000016fcb198>] cleanup_mnt+0xc0/0x140
         [<0000000016d6a846>] task_work_run+0xc6/0x110
         [<0000000016d04f76>] do_notify_resume+0xae/0xb8
         [<00000000174b30ae>] system_call+0xe2/0x2c8
      
      [CAUSE]
      The problem happens when we have called qgroup_rescan_init() but have
      not queued the worker. It is mostly caused by error handling.
      
      	Qgroup ioctl thread		|	Unmount thread
      ----------------------------------------+-----------------------------------
      					|
      btrfs_qgroup_rescan()			|
      |- qgroup_rescan_init()			|
      |  |- qgroup_rescan_running = true;	|
      |					|
      |- trans = btrfs_join_transaction()	|
      |  Some error happened			|
      |					|
      |- btrfs_qgroup_rescan() returns error	|
         But qgroup_rescan_running == true;	|
      					| close_ctree()
      					| |- btrfs_qgroup_wait_for_completion()
      					|    |- running == true;
      					|    |- wait_for_completion();
      
      btrfs_qgroup_rescan_worker is never queued, thus no one is going to wake
      up close_ctree() and we get a deadlock.
      
      All involved qgroup_rescan_init() callers are:
      
      - btrfs_qgroup_rescan()
        The example above. It's possible to trigger the deadlock when an
        error happens.
      
      - btrfs_quota_enable()
        Not possible. Just after qgroup_rescan_init() we queue the work.
      
      - btrfs_read_qgroup_config()
        It's possible to trigger the deadlock. It only initializes the work;
        the work queueing happens in btrfs_qgroup_rescan_resume().
        Thus if an error happens in between, a deadlock is possible.
      
      We shouldn't set fs_info->qgroup_rescan_running just in
      qgroup_rescan_init(), as at that stage we haven't yet queued the qgroup
      rescan worker to run.
      
      [FIX]
      Set qgroup_rescan_running before queueing the work, so that we ensure
      the rescan work is queued when we wait for it.
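
      A user-space model of that ordering (not the kernel code; the names are
      stand-ins): the running flag is flipped only together with queueing the
      worker, so a waiter can rely on the worker actually having been
      started:

        /* Model of the ordering fix (not the kernel code): the running flag
         * is set only together with queueing the worker. */
        #include <pthread.h>
        #include <stdbool.h>
        #include <stdio.h>

        static pthread_mutex_t qgroup_rescan_lock = PTHREAD_MUTEX_INITIALIZER;
        static bool qgroup_rescan_running;

        static void *rescan_worker(void *arg)
        {
            (void)arg;
            puts("rescan worker runs");
            pthread_mutex_lock(&qgroup_rescan_lock);
            qgroup_rescan_running = false;      /* completion: waiters can stop */
            pthread_mutex_unlock(&qgroup_rescan_lock);
            return NULL;
        }

        static int queue_rescan_worker(pthread_t *t)
        {
            /* Flip the flag here, next to the queueing, not in rescan_init(),
             * so an early error can never leave it set without a worker. */
            pthread_mutex_lock(&qgroup_rescan_lock);
            qgroup_rescan_running = true;
            pthread_mutex_unlock(&qgroup_rescan_lock);
            return pthread_create(t, NULL, rescan_worker, NULL);
        }

        int main(void)
        {
            pthread_t t;

            if (queue_rescan_worker(&t) == 0)
                pthread_join(t, NULL);      /* models wait_for_completion() */
            return 0;
        }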
      
      Fixes: 8d9eddad ("Btrfs: fix qgroup rescan worker initialization")
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      [ Changed subject and cause analysis, use a smaller fix ]
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      d61acbbf