1. 07 10月, 2020 2 次提交
    • Q
      btrfs: qgroup: fix qgroup meta rsv leak for subvolume operations · e85fde51
      Qu Wenruo 提交于
      [BUG]
      When quota is enabled for TEST_DEV, generic/013 sometimes fails like this:
      
        generic/013 14s ... _check_dmesg: something found in dmesg (see xfstests-dev/results//generic/013.dmesg)
      
      And with the following metadata leak:
      
        BTRFS warning (device dm-3): qgroup 0/1370 has unreleased space, type 2 rsv 49152
        ------------[ cut here ]------------
        WARNING: CPU: 2 PID: 47912 at fs/btrfs/disk-io.c:4078 close_ctree+0x1dc/0x323 [btrfs]
        Call Trace:
         btrfs_put_super+0x15/0x17 [btrfs]
         generic_shutdown_super+0x72/0x110
         kill_anon_super+0x18/0x30
         btrfs_kill_super+0x17/0x30 [btrfs]
         deactivate_locked_super+0x3b/0xa0
         deactivate_super+0x40/0x50
         cleanup_mnt+0x135/0x190
         __cleanup_mnt+0x12/0x20
         task_work_run+0x64/0xb0
         __prepare_exit_to_usermode+0x1bc/0x1c0
         __syscall_return_slowpath+0x47/0x230
         do_syscall_64+0x64/0xb0
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        ---[ end trace a6cfd45ba80e4e06 ]---
        BTRFS error (device dm-3): qgroup reserved space leaked
        BTRFS info (device dm-3): disk space caching is enabled
        BTRFS info (device dm-3): has skinny extents
      
      [CAUSE]
      The qgroup preallocated meta rsv operations of that offending root are:
      
        btrfs_delayed_inode_reserve_metadata: rsv_meta_prealloc root=1370 num_bytes=131072
        btrfs_delayed_inode_reserve_metadata: rsv_meta_prealloc root=1370 num_bytes=131072
        btrfs_subvolume_reserve_metadata: rsv_meta_prealloc root=1370 num_bytes=49152
        btrfs_delayed_inode_release_metadata: convert_meta_prealloc root=1370 num_bytes=-131072
        btrfs_delayed_inode_release_metadata: convert_meta_prealloc root=1370 num_bytes=-131072
      
      It's pretty obvious that, we reserve qgroup meta rsv in
      btrfs_subvolume_reserve_metadata(), but doesn't have corresponding
      release/convert calls in btrfs_subvolume_release_metadata().
      
      This leads to the leakage.
      
      [FIX]
      To fix this bug, we should follow what we're doing in
      btrfs_delalloc_reserve_metadata(), where we reserve qgroup space, and
      add it to block_rsv->qgroup_rsv_reserved.
      
      And free the qgroup reserved metadata space when releasing the
      block_rsv.
      
      To do this, we need to change the btrfs_subvolume_release_metadata() to
      accept btrfs_root, and record the qgroup_to_release number, and call
      btrfs_qgroup_convert_reserved_meta() for it.
      
      Fixes: 733e03a0 ("btrfs: qgroup: Split meta rsv type into meta_prealloc and meta_pertrans")
      CC: stable@vger.kernel.org # 4.19+
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      e85fde51
    • J
      btrfs: change nr to u64 in btrfs_start_delalloc_roots · b4912139
      Josef Bacik 提交于
      We have btrfs_wait_ordered_roots() which takes a u64 for nr, but
      btrfs_start_delalloc_roots() that takes an int for nr, which makes using
      them in conjunction, especially for something like (u64)-1, annoying and
      inconsistent.  Fix btrfs_start_delalloc_roots() to take a u64 for nr and
      adjust start_delalloc_inodes() and it's callers appropriately.
      
      This means we've adjusted start_delalloc_inodes() to take a pointer of
      nr since we want to preserve the ability for start-delalloc_inodes() to
      return an error, so simply make it do the nr adjusting as necessary.
      
      Part of adjusting the callers to this means changing
      btrfs_writeback_inodes_sb_nr() to take a u64 for items.  This may be
      confusing because it seems unrelated, but the caller of
      btrfs_writeback_inodes_sb_nr() already passes in a u64, it's just the
      function variable that needs to be changed.
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Tested-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      b4912139
  2. 14 9月, 2020 1 次提交
  3. 27 8月, 2020 1 次提交
    • J
      btrfs: fix potential deadlock in the search ioctl · a48b73ec
      Josef Bacik 提交于
      With the conversion of the tree locks to rwsem I got the following
      lockdep splat:
      
        ======================================================
        WARNING: possible circular locking dependency detected
        5.8.0-rc7-00165-g04ec4da5f45f-dirty #922 Not tainted
        ------------------------------------------------------
        compsize/11122 is trying to acquire lock:
        ffff889fabca8768 (&mm->mmap_lock#2){++++}-{3:3}, at: __might_fault+0x3e/0x90
      
        but task is already holding lock:
        ffff889fe720fe40 (btrfs-fs-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #2 (btrfs-fs-00){++++}-{3:3}:
      	 down_write_nested+0x3b/0x70
      	 __btrfs_tree_lock+0x24/0x120
      	 btrfs_search_slot+0x756/0x990
      	 btrfs_lookup_inode+0x3a/0xb4
      	 __btrfs_update_delayed_inode+0x93/0x270
      	 btrfs_async_run_delayed_root+0x168/0x230
      	 btrfs_work_helper+0xd4/0x570
      	 process_one_work+0x2ad/0x5f0
      	 worker_thread+0x3a/0x3d0
      	 kthread+0x133/0x150
      	 ret_from_fork+0x1f/0x30
      
        -> #1 (&delayed_node->mutex){+.+.}-{3:3}:
      	 __mutex_lock+0x9f/0x930
      	 btrfs_delayed_update_inode+0x50/0x440
      	 btrfs_update_inode+0x8a/0xf0
      	 btrfs_dirty_inode+0x5b/0xd0
      	 touch_atime+0xa1/0xd0
      	 btrfs_file_mmap+0x3f/0x60
      	 mmap_region+0x3a4/0x640
      	 do_mmap+0x376/0x580
      	 vm_mmap_pgoff+0xd5/0x120
      	 ksys_mmap_pgoff+0x193/0x230
      	 do_syscall_64+0x50/0x90
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        -> #0 (&mm->mmap_lock#2){++++}-{3:3}:
      	 __lock_acquire+0x1272/0x2310
      	 lock_acquire+0x9e/0x360
      	 __might_fault+0x68/0x90
      	 _copy_to_user+0x1e/0x80
      	 copy_to_sk.isra.32+0x121/0x300
      	 search_ioctl+0x106/0x200
      	 btrfs_ioctl_tree_search_v2+0x7b/0xf0
      	 btrfs_ioctl+0x106f/0x30a0
      	 ksys_ioctl+0x83/0xc0
      	 __x64_sys_ioctl+0x16/0x20
      	 do_syscall_64+0x50/0x90
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        other info that might help us debug this:
      
        Chain exists of:
          &mm->mmap_lock#2 --> &delayed_node->mutex --> btrfs-fs-00
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(btrfs-fs-00);
      				 lock(&delayed_node->mutex);
      				 lock(btrfs-fs-00);
          lock(&mm->mmap_lock#2);
      
         *** DEADLOCK ***
      
        1 lock held by compsize/11122:
         #0: ffff889fe720fe40 (btrfs-fs-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180
      
        stack backtrace:
        CPU: 17 PID: 11122 Comm: compsize Kdump: loaded Not tainted 5.8.0-rc7-00165-g04ec4da5f45f-dirty #922
        Hardware name: Quanta Tioga Pass Single Side 01-0030993006/Tioga Pass Single Side, BIOS F08_3A18 12/20/2018
        Call Trace:
         dump_stack+0x78/0xa0
         check_noncircular+0x165/0x180
         __lock_acquire+0x1272/0x2310
         lock_acquire+0x9e/0x360
         ? __might_fault+0x3e/0x90
         ? find_held_lock+0x72/0x90
         __might_fault+0x68/0x90
         ? __might_fault+0x3e/0x90
         _copy_to_user+0x1e/0x80
         copy_to_sk.isra.32+0x121/0x300
         ? btrfs_search_forward+0x2a6/0x360
         search_ioctl+0x106/0x200
         btrfs_ioctl_tree_search_v2+0x7b/0xf0
         btrfs_ioctl+0x106f/0x30a0
         ? __do_sys_newfstat+0x5a/0x70
         ? ksys_ioctl+0x83/0xc0
         ksys_ioctl+0x83/0xc0
         __x64_sys_ioctl+0x16/0x20
         do_syscall_64+0x50/0x90
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The problem is we're doing a copy_to_user() while holding tree locks,
      which can deadlock if we have to do a page fault for the copy_to_user().
      This exists even without my locking changes, so it needs to be fixed.
      Rework the search ioctl to do the pre-fault and then
      copy_to_user_nofault for the copying.
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      a48b73ec
  4. 27 7月, 2020 8 次提交
  5. 17 6月, 2020 1 次提交
  6. 25 5月, 2020 4 次提交
    • D
      btrfs: simplify iget helpers · 0202e83f
      David Sterba 提交于
      The inode lookup starting at btrfs_iget takes the full location key,
      while only the objectid is used to match the inode, because the lookup
      happens inside the given root thus the inode number is unique.
      The entire location key is properly set up in btrfs_init_locked_inode.
      
      Simplify the helpers and pass only inode number, renaming it to 'ino'
      instead of 'objectid'. This allows to remove temporary variables key,
      saving some stack space.
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      0202e83f
    • D
      btrfs: simplify root lookup by id · 56e9357a
      David Sterba 提交于
      The main function to lookup a root by its id btrfs_get_fs_root takes the
      whole key, while only using the objectid. The value of offset is preset
      to (u64)-1 but not actually used until btrfs_find_root that does the
      actual search.
      
      Switch btrfs_get_fs_root to use only objectid and remove all local
      variables that existed just for the lookup. The actual key for search is
      set up in btrfs_get_fs_root, reusing another key variable.
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      56e9357a
    • R
      btrfs: reduce lock contention when creating snapshot · c11fbb6e
      Robbie Ko 提交于
      When creating a snapshot, ordered extents need to be flushed and this
      can take a long time.
      
      In create_snapshot there are two locks held when this happens:
      
        1. Destination directory inode lock
        2. Global subvolume semaphore
      
      This will unnecessarily block other operations like subvolume destroy,
      create, or setflag until the snapshot is created.
      
      We can fix that by moving the flush outside the locked section as this
      does not depend on the aforementioned locks.  The code factors out the
      snapshot related work from create_snapshot to btrfs_mksnapshot.
      
      __btrfs_ioctl_snap_create
        btrfs_mksubvol
          create_subvol
        btrfs_mksnapshot
          <flush>
          btrfs_mksubvol
            create_snapshot
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NRobbie Ko <robbieko@synology.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      c11fbb6e
    • Q
      btrfs: rename BTRFS_ROOT_REF_COWS to BTRFS_ROOT_SHAREABLE · 92a7cc42
      Qu Wenruo 提交于
      The name BTRFS_ROOT_REF_COWS is not very clear about the meaning.
      
      In fact, that bit can only be set to those trees:
      
      - Subvolume roots
      - Data reloc root
      - Reloc roots for above roots
      
      All other trees won't get this bit set.  So just by the result, it is
      obvious that, roots with this bit set can have tree blocks shared with
      other trees.  Either shared by snapshots, or by reloc roots (an special
      snapshot created by relocation).
      
      This patch will rename BTRFS_ROOT_REF_COWS to BTRFS_ROOT_SHAREABLE to
      make it easier to understand, and update all comment mentioning
      "reference counted" to follow the rename.
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      92a7cc42
  7. 24 3月, 2020 22 次提交
  8. 24 1月, 2020 1 次提交
    • F
      Btrfs: make deduplication with range including the last block work · 831d2fa2
      Filipe Manana 提交于
      Since btrfs was migrated to use the generic VFS helpers for clone and
      deduplication, it stopped allowing for the last block of a file to be
      deduplicated when the source file size is not sector size aligned (when
      eof is somewhere in the middle of the last block). There are two reasons
      for that:
      
      1) The generic code always rounds down, to a multiple of the block size,
         the range's length for deduplications. This means we end up never
         deduplicating the last block when the eof is not block size aligned,
         even for the safe case where the destination range's end offset matches
         the destination file's size. That rounding down operation is done at
         generic_remap_check_len();
      
      2) Because of that, the btrfs specific code does not expect anymore any
         non-aligned range length's for deduplication and therefore does not
         work if such nona-aligned length is given.
      
      This patch addresses that second part, and it depends on a patch that
      fixes generic_remap_check_len(), in the VFS, which was submitted ealier
      and has the following subject:
      
        "fs: allow deduplication of eof block into the end of the destination file"
      
      These two patches address reports from users that started seeing lower
      deduplication rates due to the last block never being deduplicated when
      the file size is not aligned to the filesystem's block size.
      
      Link: https://lore.kernel.org/linux-btrfs/2019-1576167349.500456@svIo.N5dq.dFFD/
      CC: stable@vger.kernel.org # 5.1+
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      831d2fa2