1. 16 5月, 2022 1 次提交
  2. 06 5月, 2022 2 次提交
  3. 24 3月, 2022 2 次提交
    • J
      btrfs: zoned: remove left over ASSERT checking for single profile · 62ed0bf7
      Johannes Thumshirn 提交于
      With commit dcf5652291f6 ("btrfs: zoned: allow DUP on meta-data block
      groups") we started allowing DUP on metadata block groups, so the
      ASSERT()s in btrfs_can_activate_zone() and btrfs_zoned_get_device() are
      no longer valid and in fact even harmful.
      
      Fixes: dcf5652291f6 ("btrfs: zoned: allow DUP on meta-data block groups")
      CC: stable@vger.kernel.org # 5.17
      Signed-off-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      62ed0bf7
    • J
      btrfs: zoned: traverse devices under chunk_mutex in btrfs_can_activate_zone · 0b9e6676
      Johannes Thumshirn 提交于
      btrfs_can_activate_zone() can be called with the device_list_mutex already
      held, which will lead to a deadlock:
      
      insert_dev_extents() // Takes device_list_mutex
      `-> insert_dev_extent()
       `-> btrfs_insert_empty_item()
        `-> btrfs_insert_empty_items()
         `-> btrfs_search_slot()
          `-> btrfs_cow_block()
           `-> __btrfs_cow_block()
            `-> btrfs_alloc_tree_block()
             `-> btrfs_reserve_extent()
              `-> find_free_extent()
               `-> find_free_extent_update_loop()
                `-> can_allocate_chunk()
                 `-> btrfs_can_activate_zone() // Takes device_list_mutex again
      
      Instead of using the RCU on fs_devices->device_list we
      can use fs_devices->alloc_list, protected by the chunk_mutex to traverse
      the list of active devices.
      
      We are in the chunk allocation thread. The newer chunk allocation
      happens from the devices in the fs_device->alloc_list protected by the
      chunk_mutex.
      
        btrfs_create_chunk()
          lockdep_assert_held(&info->chunk_mutex);
          gather_device_info
            list_for_each_entry(device, &fs_devices->alloc_list, dev_alloc_list)
      
      Also, a device that reappears after the mount won't join the alloc_list
      yet and, it will be in the dev_list, which we don't want to consider in
      the context of the chunk alloc.
      
        [15.166572] WARNING: possible recursive locking detected
        [15.167117] 5.17.0-rc6-dennis #79 Not tainted
        [15.167487] --------------------------------------------
        [15.167733] kworker/u8:3/146 is trying to acquire lock:
        [15.167733] ffff888102962ee0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: find_free_extent+0x15a/0x14f0 [btrfs]
        [15.167733]
        [15.167733] but task is already holding lock:
        [15.167733] ffff888102962ee0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: btrfs_create_pending_block_groups+0x20a/0x560 [btrfs]
        [15.167733]
        [15.167733] other info that might help us debug this:
        [15.167733]  Possible unsafe locking scenario:
        [15.167733]
        [15.171834]        CPU0
        [15.171834]        ----
        [15.171834]   lock(&fs_devs->device_list_mutex);
        [15.171834]   lock(&fs_devs->device_list_mutex);
        [15.171834]
        [15.171834]  *** DEADLOCK ***
        [15.171834]
        [15.171834]  May be due to missing lock nesting notation
        [15.171834]
        [15.171834] 5 locks held by kworker/u8:3/146:
        [15.171834]  #0: ffff888100050938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1c3/0x5a0
        [15.171834]  #1: ffffc9000067be80 ((work_completion)(&fs_info->async_data_reclaim_work)){+.+.}-{0:0}, at: process_one_work+0x1c3/0x5a0
        [15.176244]  #2: ffff88810521e620 (sb_internal){.+.+}-{0:0}, at: flush_space+0x335/0x600 [btrfs]
        [15.176244]  #3: ffff888102962ee0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: btrfs_create_pending_block_groups+0x20a/0x560 [btrfs]
        [15.176244]  #4: ffff8881152e4b78 (btrfs-dev-00){++++}-{3:3}, at: __btrfs_tree_lock+0x27/0x130 [btrfs]
        [15.179641]
        [15.179641] stack backtrace:
        [15.179641] CPU: 1 PID: 146 Comm: kworker/u8:3 Not tainted 5.17.0-rc6-dennis #79
        [15.179641] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
        [15.179641] Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs]
        [15.179641] Call Trace:
        [15.179641]  <TASK>
        [15.179641]  dump_stack_lvl+0x45/0x59
        [15.179641]  __lock_acquire.cold+0x217/0x2b2
        [15.179641]  lock_acquire+0xbf/0x2b0
        [15.183838]  ? find_free_extent+0x15a/0x14f0 [btrfs]
        [15.183838]  __mutex_lock+0x8e/0x970
        [15.183838]  ? find_free_extent+0x15a/0x14f0 [btrfs]
        [15.183838]  ? find_free_extent+0x15a/0x14f0 [btrfs]
        [15.183838]  ? lock_is_held_type+0xd7/0x130
        [15.183838]  ? find_free_extent+0x15a/0x14f0 [btrfs]
        [15.183838]  find_free_extent+0x15a/0x14f0 [btrfs]
        [15.183838]  ? _raw_spin_unlock+0x24/0x40
        [15.183838]  ? btrfs_get_alloc_profile+0x106/0x230 [btrfs]
        [15.187601]  btrfs_reserve_extent+0x131/0x260 [btrfs]
        [15.187601]  btrfs_alloc_tree_block+0xb5/0x3b0 [btrfs]
        [15.187601]  __btrfs_cow_block+0x138/0x600 [btrfs]
        [15.187601]  btrfs_cow_block+0x10f/0x230 [btrfs]
        [15.187601]  btrfs_search_slot+0x55f/0xbc0 [btrfs]
        [15.187601]  ? lock_is_held_type+0xd7/0x130
        [15.187601]  btrfs_insert_empty_items+0x2d/0x60 [btrfs]
        [15.187601]  btrfs_create_pending_block_groups+0x2b3/0x560 [btrfs]
        [15.187601]  __btrfs_end_transaction+0x36/0x2a0 [btrfs]
        [15.192037]  flush_space+0x374/0x600 [btrfs]
        [15.192037]  ? find_held_lock+0x2b/0x80
        [15.192037]  ? btrfs_async_reclaim_data_space+0x49/0x180 [btrfs]
        [15.192037]  ? lock_release+0x131/0x2b0
        [15.192037]  btrfs_async_reclaim_data_space+0x70/0x180 [btrfs]
        [15.192037]  process_one_work+0x24c/0x5a0
        [15.192037]  worker_thread+0x4a/0x3d0
      
      Fixes: a85f05e5 ("btrfs: zoned: avoid chunk allocation if active block group has enough space")
      CC: stable@vger.kernel.org # 5.16+
      Reviewed-by: NAnand Jain <anand.jain@oracle.com>
      Signed-off-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      0b9e6676
  4. 14 3月, 2022 5 次提交
  5. 07 1月, 2022 2 次提交
    • N
      btrfs: zoned: fix chunk allocation condition for zoned allocator · 82187d2e
      Naohiro Aota 提交于
      The ZNS specification defines a limit on the number of "active"
      zones. That limit impose us to limit the number of block groups which
      can be used for an allocation at the same time. Not to exceed the
      limit, we reuse the existing active block groups as much as possible
      when we can't activate any other zones without sacrificing an already
      activated block group in commit a85f05e5 ("btrfs: zoned: avoid
      chunk allocation if active block group has enough space").
      
      However, the check is wrong in two ways. First, it checks the
      condition for every raid index (ffe_ctl->index). Even if it reaches
      the condition and "ffe_ctl->max_extent_size >=
      ffe_ctl->min_alloc_size" is met, there can be other block groups
      having enough space to hold ffe_ctl->num_bytes. (Actually, this won't
      happen in the current zoned code as it only supports SINGLE
      profile. But, it can happen once it enables other RAID types.)
      
      Second, it checks the active zone availability depending on the
      raid index. The raid index is just an index for
      space_info->block_groups, so it has nothing to do with chunk allocation.
      
      These mistakes are causing a faulty allocation in a certain
      situation. Consider we are running zoned btrfs on a device whose
      max_active_zone == 0 (no limit). And, suppose no block group have a
      room to fit ffe_ctl->num_bytes but some room to meet
      ffe_ctl->min_alloc_size (i.e. max_extent_size > num_bytes >=
      min_alloc_size).
      
      In this situation, the following occur:
      
      - With SINGLE raid_index, it reaches the chunk allocation checking
        code
      - The check returns true because we can activate a new zone (no limit)
      - But, before allocating the chunk, it iterates to the next raid index
        (RAID5)
      - Since there are no RAID5 block groups on zoned mode, it again
        reaches the check code
      - The check returns false because of btrfs_can_activate_zone()'s "if
        (raid_index != BTRFS_RAID_SINGLE)" part
      - That results in returning -ENOSPC without allocating a new chunk
      
      As a result, we end up hitting -ENOSPC too early.
      
      Move the check to the right place in the can_allocate_chunk() hook,
      and do the active zone check depending on the allocation flag, not on
      the raid index.
      
      CC: stable@vger.kernel.org # 5.16
      Signed-off-by: NNaohiro Aota <naohiro.aota@wdc.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      82187d2e
    • J
      btrfs: zoned: simplify btrfs_check_meta_write_pointer · 8fdf54fe
      Johannes Thumshirn 提交于
      btrfs_check_meta_write_pointer() will always be called with a NULL
      'cache_ret' argument.
      
      As there's no need to check if we have a valid block_group passed in
      remove these checks.
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      8fdf54fe
  6. 03 1月, 2022 2 次提交
  7. 08 12月, 2021 1 次提交
  8. 27 10月, 2021 16 次提交
  9. 26 10月, 2021 1 次提交
  10. 23 8月, 2021 2 次提交
  11. 22 7月, 2021 1 次提交
  12. 22 6月, 2021 1 次提交
  13. 21 6月, 2021 3 次提交
  14. 04 6月, 2021 1 次提交