1. 23 Aug, 2021 12 commits
    • btrfs: reset replace target device to allocation state on close · 0d977e0e
      Committed by Desmond Cheong Zhi Xi
      This crash was observed with a failed assertion on device close:
      
        BTRFS: Transaction aborted (error -28)
        WARNING: CPU: 1 PID: 3902 at fs/btrfs/extent-tree.c:2150 btrfs_run_delayed_refs+0x1d2/0x1e0 [btrfs]
        Modules linked in: btrfs blake2b_generic libcrc32c crc32c_intel xor zstd_decompress zstd_compress xxhash lzo_compress lzo_decompress raid6_pq loop
        CPU: 1 PID: 3902 Comm: kworker/u8:4 Not tainted 5.14.0-rc5-default+ #1532
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
        Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
        RIP: 0010:btrfs_run_delayed_refs+0x1d2/0x1e0 [btrfs]
        RSP: 0018:ffffb7a5452d7d80 EFLAGS: 00010282
        RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: ffffffffabee13c4 RDI: 00000000ffffffff
        RBP: ffff97834176a378 R08: 0000000000000001 R09: 0000000000000001
        R10: 0000000000000000 R11: 0000000000000001 R12: ffff97835195d388
        R13: 0000000005b08000 R14: ffff978385484000 R15: 000000000000016c
        FS:  0000000000000000(0000) GS:ffff9783bd800000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 000056190d003fe8 CR3: 000000002a81e005 CR4: 0000000000170ea0
        Call Trace:
         flush_space+0x197/0x2f0 [btrfs]
         btrfs_async_reclaim_metadata_space+0x139/0x300 [btrfs]
         process_one_work+0x262/0x5e0
         worker_thread+0x4c/0x320
         ? process_one_work+0x5e0/0x5e0
         kthread+0x144/0x170
         ? set_kthread_struct+0x40/0x40
         ret_from_fork+0x1f/0x30
        irq event stamp: 19334989
        hardirqs last  enabled at (19334997): [<ffffffffab0e0c87>] console_unlock+0x2b7/0x400
        hardirqs last disabled at (19335006): [<ffffffffab0e0d0d>] console_unlock+0x33d/0x400
        softirqs last  enabled at (19334900): [<ffffffffaba0030d>] __do_softirq+0x30d/0x574
        softirqs last disabled at (19334893): [<ffffffffab0721ec>] irq_exit_rcu+0x12c/0x140
        ---[ end trace 45939e308e0dd3c7 ]---
        BTRFS: error (device vdd) in btrfs_run_delayed_refs:2150: errno=-28 No space left
        BTRFS info (device vdd): forced readonly
        BTRFS warning (device vdd): failed setting block group ro: -30
        BTRFS info (device vdd): suspending dev_replace for unmount
        assertion failed: !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state), in fs/btrfs/volumes.c:1150
        ------------[ cut here ]------------
        kernel BUG at fs/btrfs/ctree.h:3431!
        invalid opcode: 0000 [#1] PREEMPT SMP
        CPU: 1 PID: 3982 Comm: umount Tainted: G        W         5.14.0-rc5-default+ #1532
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
        RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs]
        RSP: 0018:ffffb7a5454c7db8 EFLAGS: 00010246
        RAX: 0000000000000068 RBX: ffff978364b91c00 RCX: 0000000000000000
        RDX: 0000000000000000 RSI: ffffffffabee13c4 RDI: 00000000ffffffff
        RBP: ffff9783523a4c00 R08: 0000000000000001 R09: 0000000000000001
        R10: 0000000000000000 R11: 0000000000000001 R12: ffff9783523a4d18
        R13: 0000000000000000 R14: 0000000000000004 R15: 0000000000000003
        FS:  00007f61c8f42800(0000) GS:ffff9783bd800000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 000056190cffa810 CR3: 0000000030b96002 CR4: 0000000000170ea0
        Call Trace:
         btrfs_close_one_device.cold+0x11/0x55 [btrfs]
         close_fs_devices+0x44/0xb0 [btrfs]
         btrfs_close_devices+0x48/0x160 [btrfs]
         generic_shutdown_super+0x69/0x100
         kill_anon_super+0x14/0x30
         btrfs_kill_super+0x12/0x20 [btrfs]
         deactivate_locked_super+0x2c/0xa0
         cleanup_mnt+0x144/0x1b0
         task_work_run+0x59/0xa0
         exit_to_user_mode_loop+0xe7/0xf0
         exit_to_user_mode_prepare+0xaf/0xf0
         syscall_exit_to_user_mode+0x19/0x50
         do_syscall_64+0x4a/0x90
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      This happens when close_ctree is called while a dev_replace hasn't
      completed. In close_ctree, we suspend the dev_replace, but keep the
      replace target around so that we can resume the dev_replace procedure
      when we mount the root again. This is the call trace:
      
        close_ctree():
          btrfs_dev_replace_suspend_for_unmount();
          btrfs_close_devices():
            btrfs_close_fs_devices():
              btrfs_close_one_device():
                ASSERT(!test_bit(BTRFS_DEV_STATE_REPLACE_TGT,
                       &device->dev_state));
      
      However, since the replace target sticks around, there is a device
      with BTRFS_DEV_STATE_REPLACE_TGT set on close, and we fail the
      assertion in btrfs_close_one_device.
      
      To fix this, if we come across the replace target device when
      closing, we should properly reset it back to allocation state. This
      fix also ensures that if a non-target device has a corrupted state and
      has the BTRFS_DEV_STATE_REPLACE_TGT bit set, the assertion will still
      catch the error.
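
      The close-time rule described above can be modelled in user space. This
      is only a sketch, not the kernel patch: the struct, the MODEL_* constants
      and model_close_one_device() are hypothetical names, and the devid value
      and bit position are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_REPLACE_DEVID     0ULL       /* assumed devid of the replace target */
#define MODEL_STATE_REPLACE_TGT (1u << 0)  /* assumed bit position */

struct model_device {
	uint64_t devid;
	unsigned int dev_state;   /* bitmask of MODEL_STATE_* flags */
};

/*
 * Close one device. Returns true when the state check that stands in for
 * the ASSERT passes, false when it would have fired.
 */
bool model_close_one_device(struct model_device *device)
{
	/* The real replace target is expected to carry the bit: reset it. */
	if (device->devid == MODEL_REPLACE_DEVID)
		device->dev_state &= ~MODEL_STATE_REPLACE_TGT;

	/* Any other device still carrying the bit has a corrupted state. */
	return !(device->dev_state & MODEL_STATE_REPLACE_TGT);
}
```

      In this model a suspended replace target closes cleanly, while a
      non-target device with the bit set is still reported.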
      Reported-by: David Sterba <dsterba@suse.com>
      Fixes: b2a61667 ("btrfs: fix rw device counting in __btrfs_free_extra_devids")
      CC: stable@vger.kernel.org # 4.19+
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: fix NULL pointer dereference when deleting device by invalid id · e4571b8c
      Committed by Qu Wenruo
      [BUG]
      It's easy to trigger a NULL pointer dereference just by removing a
      non-existent device id:
      
       # mkfs.btrfs -f -m single -d single /dev/test/scratch1 \
      				     /dev/test/scratch2
       # mount /dev/test/scratch1 /mnt/btrfs
       # btrfs device remove 3 /mnt/btrfs
      
      Then we have the following kernel NULL pointer dereference:
      
       BUG: kernel NULL pointer dereference, address: 0000000000000000
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 0 P4D 0
       Oops: 0000 [#1] PREEMPT SMP NOPTI
       CPU: 9 PID: 649 Comm: btrfs Not tainted 5.14.0-rc3-custom+ #35
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
       RIP: 0010:btrfs_rm_device+0x4de/0x6b0 [btrfs]
        btrfs_ioctl+0x18bb/0x3190 [btrfs]
        ? lock_is_held_type+0xa5/0x120
        ? find_held_lock.constprop.0+0x2b/0x80
        ? do_user_addr_fault+0x201/0x6a0
        ? lock_release+0xd2/0x2d0
        ? __x64_sys_ioctl+0x83/0xb0
        __x64_sys_ioctl+0x83/0xb0
        do_syscall_64+0x3b/0x90
        entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      [CAUSE]
      Commit a27a94c2 ("btrfs: Make btrfs_find_device_by_devspec return
      btrfs_device directly") moves the "missing" device path check into
      btrfs_rm_device().
      
      But btrfs_rm_device() itself can hit a case where it only receives
      @devid, with NULL as @device_path.
      
      In that case, calling strcmp() on NULL will trigger the NULL pointer
      dereference.
      
      Before that commit, we handle the "missing" case inside
      btrfs_find_device_by_devspec(), which will not check @device_path at all
      if @devid is provided, thus no way to trigger the bug.
      
      [FIX]
      Before calling strcmp(), also make sure @device_path is not NULL.
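
      The guard amounts to checking the pointer before comparing it. A minimal
      user-space sketch, where path_is_missing() is a hypothetical helper and
      not a function from the kernel:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * True only when a path was supplied and it is the literal "missing".
 * When the caller passed only a devid, device_path is NULL and strcmp()
 * is never reached thanks to short-circuit evaluation.
 */
bool path_is_missing(const char *device_path)
{
	return device_path != NULL && strcmp(device_path, "missing") == 0;
}
```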
      
      Fixes: a27a94c2 ("btrfs: Make btrfs_find_device_by_devspec return btrfs_device directly")
      CC: stable@vger.kernel.org # 5.4+
      Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: introduce btrfs_search_backwards function · 0ff40a91
      Committed by Marcos Paulo de Souza
      It's a common practice to start a search using offset (u64)-1, which is
      the u64 maximum value, meaning that we want the search_slot function to
      land on the last item with the same objectid and type.

      Once we are in this position, it's a matter of searching backwards
      by calling btrfs_previous_item, which will check if we need to go to
      a previous leaf and perform other necessary checks, to be sure that we
      are at the last offset of the same objectid and type.

      The new btrfs_search_backwards function does all these steps when
      necessary, and can be used to avoid code duplication.
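
      The two steps can be illustrated on a flat sorted array instead of a
      B-tree. This is a sketch under that simplifying assumption; model_key
      and model_search_backwards() are hypothetical names, and the leaf
      handling done by the real helpers is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct model_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Three-way comparison in (objectid, type, offset) order. */
int model_key_cmp(const struct model_key *a, const struct model_key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/*
 * Return the index of the last item with the given objectid and type,
 * or -1 if there is none. items[] must be sorted by model_key_cmp().
 */
long model_search_backwards(const struct model_key *items, size_t n,
			    uint64_t objectid, uint8_t type)
{
	const struct model_key probe = { objectid, type, UINT64_MAX };
	size_t lo = 0, hi = n;

	/* Step 1: find the first slot whose key is >= the (u64)-1 probe. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (model_key_cmp(&items[mid], &probe) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}

	/* An item may really sit at offset (u64)-1: exact hit. */
	if (lo < n && model_key_cmp(&items[lo], &probe) == 0)
		return (long)lo;

	/* Step 2: step back one item and check objectid and type still match. */
	if (lo == 0)
		return -1;
	if (items[lo - 1].objectid != objectid || items[lo - 1].type != type)
		return -1;
	return (long)(lo - 1);
}
```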
      Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: simplify return values in btrfs_check_raid_min_devices · efc222f8
      Committed by Anand Jain
      Function btrfs_check_raid_min_devices() returns error code from the enum
      btrfs_err_code and it starts from 1. So there is no need to check if ret
      is > 0. So drop this check and also drop the local variable ret.
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: allow degenerate raid0/raid10 · b2f78e88
      Committed by David Sterba
      The data on raid0 and raid10 are supposed to be spread over multiple
      devices, so the minimum constraints are set to 2 and 4 respectively.
      This is an artificial limit and there's some interest to remove it.
      
      Change this to allow raid0 on one device and raid10 on two devices. This
      works as expected, e.g. when converting or removing devices.

      The only difference is when raid0 on two devices gets one device
      removed. An unpatched kernel would silently create a single profile,
      while a patched kernel keeps raid0.

      The motivation is to preserve the profile type for as long as possible
      through some intermediate state (device removal, conversion), or
      when there are disks of different sizes: with raid0 the otherwise
      unusable space of the last device will be used too. Similarly for
      raid10, though the two largest devices would need to be the same size.

      An unpatched kernel will mount and use the degenerate profiles just fine
      but won't allow any operation that would not satisfy the stricter device
      number constraints, e.g. not allowing to go from 3 to 2 devices for
      raid10 or various profile conversions.
      
      Example output:
      
        # btrfs fi us -T .
        Overall:
            Device size:                  10.00GiB
            Device allocated:              1.01GiB
            Device unallocated:            8.99GiB
            Device missing:                  0.00B
            Used:                        200.61MiB
            Free (estimated):              9.79GiB      (min: 9.79GiB)
            Free (statfs, df):             9.79GiB
            Data ratio:                       1.00
            Metadata ratio:                   1.00
            Global reserve:                3.25MiB      (used: 0.00B)
            Multiple profiles:                  no
      
                      Data      Metadata  System
        Id Path       RAID0     single    single   Unallocated
        -- ---------- --------- --------- -------- -----------
         1 /dev/sda10   1.00GiB   8.00MiB  1.00MiB     8.99GiB
        -- ---------- --------- --------- -------- -----------
           Total        1.00GiB   8.00MiB  1.00MiB     8.99GiB
           Used       200.25MiB 352.00KiB 16.00KiB
      
        # btrfs dev us .
        /dev/sda10, ID: 1
           Device size:            10.00GiB
           Device slack:              0.00B
           Data,RAID0/1:            1.00GiB
           Metadata,single:         8.00MiB
           System,single:           1.00MiB
           Unallocated:             8.99GiB
      
      Note "Data,RAID0/1", with btrfs-progs 5.13+ the number of devices per
      profile is printed.
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: subpage: reject raid56 filesystem and profile conversion · c8050b3b
      Committed by Qu Wenruo
      RAID56 is not only unsafe due to its write-hole problem, but also has
      tons of hardcoded PAGE_SIZE.
      
      Disable it for subpage support for now.
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: constify and cleanup variables in comparators · 214cc184
      Committed by David Sterba
      Comparators just read the data and thus get const parameters. This
      should also be preserved by the local variables, so update all
      comparators passed to sort or bsearch.
      
      Cleanups:
      
      - unnecessary casts are dropped
      - btrfs_cmp_device_free_bytes is cleaned up to follow the common pattern
        and 'inline' is dropped as the function address is taken
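
      The pattern looks like this in a user-space comparator for qsort(). A
      sketch only: model_dev_info and model_cmp_free_bytes() are hypothetical,
      and the descending order is an assumption chosen for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct model_dev_info {
	uint64_t devid;
	uint64_t free_bytes;
};

/*
 * Orders devices by free bytes, largest first. Not 'inline': its address
 * is taken when it is passed to qsort().
 */
int model_cmp_free_bytes(const void *a, const void *b)
{
	/* No cast is needed from const void *, and the locals stay const. */
	const struct model_dev_info *da = a;
	const struct model_dev_info *db = b;

	if (da->free_bytes > db->free_bytes)
		return -1;
	if (da->free_bytes < db->free_bytes)
		return 1;
	return 0;
}
```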
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: simplify data stripe calculation helpers · d58ede8d
      Committed by David Sterba
      There are two helpers doing the same calculations based on nparity and
      ncopies. calc_data_stripes can be simplified into one expression; so far
      we don't have a profile with both copies and parity, so there's no
      effective change. calc_stripe_length should reuse the helper and not
      repeat the same calculation.
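
      One expression consistent with this description subtracts the parity
      stripes and divides by the number of copies. The sketch below is a
      user-space model with hypothetical names (model_raid_attr,
      model_calc_data_stripes); it is not quoted from the patch.

```c
#include <assert.h>

/* Per-profile attributes; field names follow the commit message. */
struct model_raid_attr {
	int ncopies;   /* copies kept of each piece of data */
	int nparity;   /* stripes that hold parity */
};

/*
 * With no parity this divides by the copy count; with a single copy it
 * just subtracts the parity stripes. No profile has both, so one formula
 * covers every case.
 */
int model_calc_data_stripes(const struct model_raid_attr *attr, int num_stripes)
{
	return (num_stripes - attr->nparity) / attr->ncopies;
}
```

      Assuming the usual attributes (raid10: two copies, no parity; raid5: one
      copy, one parity stripe), four stripes give 2 and 3 data stripes
      respectively.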
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: merge alloc_device helpers · fe4f46d4
      Committed by David Sterba
      The device allocation is split into two functions, but one just calls the
      other and they're far apart in the file. Merge them together.
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: uninline btrfs_bg_flags_to_raid_index · 500a44c9
      Committed by David Sterba
      The helper does a simple translation from block group flags to index to
      the btrfs_raid_array table. There's no apparent reason to inline the
      function, the translation happens usually once per function and is not
      called in a loop.
      
      Making it a proper function saves quite some binary code (x86_64,
      release config):
      
         text    data     bss     dec     hex filename
      1164011   19253   14912 1198176  124860 pre/btrfs.ko
      1161559   19253   14912 1195724  123ecc post/btrfs.ko
      
      DELTA: -2451
      
      Also add the const attribute as there are no side effects; this could
      help the compiler optimize a few things without the function body.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: use btrfs_next_leaf instead of btrfs_next_item when slots > nritems · ad9a9378
      Committed by Marcos Paulo de Souza
      After calling btrfs_search_slot, it is common practice to check whether
      the slot found is past the last item in the current leaf, and if
      so, search for the same key in the next leaf by calling btrfs_next_leaf,
      which calls btrfs_next_old_leaf to do the job.
      
      Calling btrfs_next_item in the same situation would end up in the same
      code flow, since
      
      * btrfs_next_item
        * btrfs_next_old_item
          * if slot >= nritems(curr_leaf)
            btrfs_next_old_leaf
      
      Change btrfs_verify_dev_extents and calculate_emulated_zone_size
      functions to use btrfs_next_leaf in the same situation.
      Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: make btrfs_finish_chunk_alloc private to block-group.c · 2eadb9e7
      Committed by Nikolay Borisov
      One of the final things that must be done to add a new chunk is
      inserting its device extent items in the device tree. They describe
      the portion of allocated device physical space during phase 1 of
      chunk allocation. This is currently done in btrfs_finish_chunk_alloc
      whose name isn't very informative. What's more, this function is only
      used in block-group.c but is defined as public. There isn't anything
      special about it that would warrant it being defined in volumes.c.
      
      Just move btrfs_finish_chunk_alloc and alloc_chunk_dev_extent to
      block-group.c, make the former static and rename both functions to
      insert_dev_extents and insert_dev_extent respectively.
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  2. 29 Jul, 2021 1 commit
    • btrfs: fix rw device counting in __btrfs_free_extra_devids · b2a61667
      Committed by Desmond Cheong Zhi Xi
      When removing a writeable device in __btrfs_free_extra_devids, the rw
      device count should be decremented.
      
      This error was caught by Syzbot which reported a warning in
      close_fs_devices:
      
        WARNING: CPU: 1 PID: 9355 at fs/btrfs/volumes.c:1168 close_fs_devices+0x763/0x880 fs/btrfs/volumes.c:1168
        Modules linked in:
        CPU: 0 PID: 9355 Comm: syz-executor552 Not tainted 5.13.0-rc1-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:close_fs_devices+0x763/0x880 fs/btrfs/volumes.c:1168
        RSP: 0018:ffffc9000333f2f0 EFLAGS: 00010293
        RAX: ffffffff8365f5c3 RBX: 0000000000000001 RCX: ffff888029afd4c0
        RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
        RBP: ffff88802846f508 R08: ffffffff8365f525 R09: ffffed100337d128
        R10: ffffed100337d128 R11: 0000000000000000 R12: dffffc0000000000
        R13: ffff888019be8868 R14: 1ffff1100337d10d R15: 1ffff1100337d10a
        FS:  00007f6f53828700(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 000000000047c410 CR3: 00000000302a6000 CR4: 00000000001506f0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         btrfs_close_devices+0xc9/0x450 fs/btrfs/volumes.c:1180
         open_ctree+0x8e1/0x3968 fs/btrfs/disk-io.c:3693
         btrfs_fill_super fs/btrfs/super.c:1382 [inline]
         btrfs_mount_root+0xac5/0xc60 fs/btrfs/super.c:1749
         legacy_get_tree+0xea/0x180 fs/fs_context.c:592
         vfs_get_tree+0x86/0x270 fs/super.c:1498
         fc_mount fs/namespace.c:993 [inline]
         vfs_kern_mount+0xc9/0x160 fs/namespace.c:1023
         btrfs_mount+0x3d3/0xb50 fs/btrfs/super.c:1809
         legacy_get_tree+0xea/0x180 fs/fs_context.c:592
         vfs_get_tree+0x86/0x270 fs/super.c:1498
         do_new_mount fs/namespace.c:2905 [inline]
         path_mount+0x196f/0x2be0 fs/namespace.c:3235
         do_mount fs/namespace.c:3248 [inline]
         __do_sys_mount fs/namespace.c:3456 [inline]
         __se_sys_mount+0x2f9/0x3b0 fs/namespace.c:3433
         do_syscall_64+0x3f/0xb0 arch/x86/entry/common.c:47
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      The warning triggered because fs_devices->rw_devices was not 0 after
      closing all devices. Here is the call trace that was observed:
      
        btrfs_mount_root():
          btrfs_scan_one_device():
            device_list_add();   <---------------- device added
          btrfs_open_devices():
            open_fs_devices():
              btrfs_open_one_device();   <-------- writable device opened,
      	                                     rw device count ++
          btrfs_fill_super():
            open_ctree():
              btrfs_free_extra_devids():
      	  __btrfs_free_extra_devids();  <--- writable device removed,
      	                              rw device count not decremented
      	  fail_tree_roots:
      	    btrfs_close_devices():
      	      close_fs_devices();   <------- rw device count off by 1
      
      As a note, prior to commit cf89af14 ("btrfs: dev-replace: fail
      mount if we don't have replace item with target device"), rw_devices
      was decremented on removing a writable device in
      __btrfs_free_extra_devids only if the BTRFS_DEV_STATE_REPLACE_TGT bit
      was not set for the device. However, this check does not need to be
      reinstated as it is now redundant and incorrect.
      
      In __btrfs_free_extra_devids, we skip removing the device if it is the
      target for replacement. This is done by checking whether device->devid
      == BTRFS_DEV_REPLACE_DEVID. Since BTRFS_DEV_STATE_REPLACE_TGT is set
      only on the device with devid BTRFS_DEV_REPLACE_DEVID, no devices
      should have the BTRFS_DEV_STATE_REPLACE_TGT bit set after the check,
      and so it's redundant to test for that bit.
      
      Additionally, following commit 82372bc8 ("Btrfs: make
      the logic of source device removing more clear"), rw_devices is
      incremented whenever a writeable device is added to the alloc
      list (including the target device in btrfs_dev_replace_finishing), so
      all removals of writable devices from the alloc list should also be
      accompanied by a decrement to rw_devices.
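
      The invariant in the last paragraph, that every removal of a writable
      device from the alloc list is paired with a decrement, can be sketched
      as a small user-space model. All names here are hypothetical and the
      structs are heavily simplified stand-ins for the kernel ones.

```c
#include <assert.h>
#include <stdbool.h>

struct model_fs_devices {
	int num_devices;
	int rw_devices;   /* must be 0 once every device is closed */
};

struct model_device {
	bool writeable;
	bool on_alloc_list;
};

/* Open path: a writable device joins the alloc list and is counted. */
void model_open_device(struct model_fs_devices *fs, struct model_device *dev)
{
	fs->num_devices++;
	if (dev->writeable) {
		dev->on_alloc_list = true;
		fs->rw_devices++;
	}
}

/* Removal path: leaving the alloc list always decrements the counter. */
void model_remove_device(struct model_fs_devices *fs, struct model_device *dev)
{
	if (dev->on_alloc_list) {
		dev->on_alloc_list = false;
		fs->rw_devices--;
	}
	fs->num_devices--;
}
```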
      
      Reported-by: syzbot+a70e2ad0879f160b9217@syzkaller.appspotmail.com
      Fixes: cf89af14 ("btrfs: dev-replace: fail mount if we don't have replace item with target device")
      CC: stable@vger.kernel.org # 5.10+
      Tested-by: syzbot+a70e2ad0879f160b9217@syzkaller.appspotmail.com
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  3. 07 Jul, 2021 1 commit
    • btrfs: rework chunk allocation to avoid exhaustion of the system chunk array · 79bd3712
      Committed by Filipe Manana
      Commit eafa4fd0 ("btrfs: fix exhaustion of the system chunk array
      due to concurrent allocations") fixed a problem that resulted in
      exhausting the system chunk array in the superblock when there are many
      tasks allocating chunks in parallel. Basically too many tasks enter the
      first phase of chunk allocation without previous tasks having finished
      their second phase of allocation, resulting in too many system chunks
      being allocated. That was originally observed when running the fallocate
      tests of stress-ng on a PowerPC machine, using a node size of 64K.
      
      However that commit also introduced a deadlock where a task in phase 1 of
      the chunk allocation waited for another task that had allocated a system
      chunk to finish its phase 2, but that other task was waiting on an extent
      buffer lock held by the first task, therefore resulting in both tasks not
      making any progress. That change was later reverted by a patch with the
      subject "btrfs: fix deadlock with concurrent chunk allocations involving
      system chunks", since there is no simple and short solution to address it
      and the deadlock is relatively easy to trigger on zoned filesystems, while
      the system chunk array exhaustion is not so common.
      
      This change reworks the chunk allocation to avoid the system chunk array
      exhaustion. It accomplishes that by making the first phase of chunk
      allocation do the updates of the device items in the chunk btree and the
      insertion of the new chunk item in the chunk btree. This is done while
      under the protection of the chunk mutex (fs_info->chunk_mutex), in the
      same critical section that checks for available system space, allocates
      a new system chunk if needed and reserves system chunk space. This way
      we do not have chunk space reserved until the second phase completes.
      
      The same logic is applied to chunk removal as well, since it keeps
      reserved system space long after it is done updating the chunk btree.
      
      For direct allocation of system chunks, the previous behaviour remains,
      because otherwise we would deadlock on extent buffers of the chunk btree.
      Changes to the chunk btree are by and large done by chunk allocation and chunk
      removal, which first reserve chunk system space and then later do changes
      to the chunk btree. The other remaining cases are uncommon and correspond
      to adding a device, removing a device and resizing a device. All these
      other cases do not pre-reserve system space, they modify the chunk btree
      right away, so they don't hold reserved space for a long period like chunk
      allocation and chunk removal do.
      
      The diff of this change is huge, but more than half of it is just addition
      of comments describing both how things work regarding chunk allocation and
      removal, including both the new behavior and the parts of the old behavior
      that did not change.
      
      CC: stable@vger.kernel.org # 5.12+
      Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Tested-by: Naohiro Aota <naohiro.aota@wdc.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Tested-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  4. 22 Jun, 2021 2 commits
    • btrfs: ensure relocation never runs while we have send operations running · 1cea5cf0
      Committed by Filipe Manana
      Relocation and send do not play well together because while send is
      running a block group can be relocated, a transaction committed and
      the respective disk extents get re-allocated and written to or discarded
      while send is about to do something with the extents.
      
      This was explained in commit 9e967495 ("Btrfs: prevent send failures
      and crashes due to concurrent relocation"), which prevented balance and
      send from running in parallel but it did not address one remaining case
      where chunk relocation can happen: shrinking a device (and device deletion
      which shrinks a device's size to 0 before deleting the device).
      
      We also have now one more case where relocation is triggered: on zoned
      filesystems partially used block groups get relocated by a background
      thread, introduced in commit 18bb8bbf ("btrfs: zoned: automatically
      reclaim zones").
      
      So make sure that instead of preventing balance from running when there
      are ongoing send operations, we prevent relocation from happening.
      This uses the infrastructure recently added by a patch that has the
      subject: "btrfs: add cancellable chunk relocation support".
      
      It also adds a spinlock used only for mutual exclusion between send and
      relocation. Previously fs_info->balance_mutex was used, which made an
      attempt to run send block while waiting for balance to finish, and that
      can take a long time on large filesystems.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: fix typos in comments · 1a9fd417
      Committed by David Sterba
      Fix typos that have snuck in since the last round. Found by codespell.
      Signed-off-by: David Sterba <dsterba@suse.com>
  5. 21 Jun, 2021 2 commits
  6. 01 Jun, 2021 1 commit
  7. 14 May, 2021 1 commit
  8. 21 Apr, 2021 4 commits
  9. 19 Apr, 2021 2 commits
  10. 09 Apr, 2021 1 commit
  11. 18 Mar, 2021 1 commit
    • btrfs: do not initialize dev stats if we have no dev_root · 82d62d06
      Committed by Josef Bacik
      Neal reported a panic when trying to use -o rescue=all:
      
        BUG: kernel NULL pointer dereference, address: 0000000000000030
        PGD 0 P4D 0
        Oops: 0000 [#1] SMP PTI
        CPU: 0 PID: 4095 Comm: mount Not tainted 5.11.0-0.rc7.149.fc34.x86_64 #1
        RIP: 0010:btrfs_device_init_dev_stats+0x4c/0x1f0
        RSP: 0018:ffffa60285fbfb68 EFLAGS: 00010246
        RAX: 0000000000000000 RBX: ffff88b88f806498 RCX: ffff88b82e7a2a10
        RDX: ffffa60285fbfb97 RSI: ffff88b82e7a2a10 RDI: 0000000000000000
        RBP: ffff88b88f806b3c R08: 0000000000000000 R09: 0000000000000000
        R10: ffff88b82e7a2a10 R11: 0000000000000000 R12: ffff88b88f806a00
        R13: ffff88b88f806478 R14: ffff88b88f806a00 R15: ffff88b82e7a2a10
        FS:  00007f698be1ec40(0000) GS:ffff88b937e00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000030 CR3: 0000000092c9c006 CR4: 00000000003706f0
        Call Trace:
        ? btrfs_init_dev_stats+0x1f/0xf0
        btrfs_init_dev_stats+0x62/0xf0
        open_ctree+0x1019/0x15ff
        btrfs_mount_root.cold+0x13/0xfa
        legacy_get_tree+0x27/0x40
        vfs_get_tree+0x25/0xb0
        vfs_kern_mount.part.0+0x71/0xb0
        btrfs_mount+0x131/0x3d0
        ? legacy_get_tree+0x27/0x40
        ? btrfs_show_options+0x640/0x640
        legacy_get_tree+0x27/0x40
        vfs_get_tree+0x25/0xb0
        path_mount+0x441/0xa80
        __x64_sys_mount+0xf4/0x130
        do_syscall_64+0x33/0x40
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f698c04e52e
      
      This happens because we unconditionally attempt to initialize device
      stats on mount, but we may not have been able to read the device root.
      Fix this by skipping initializing the device stats if we do not have a
      device root.
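
      The fix boils down to an early return when the device root is absent. A
      user-space sketch with hypothetical types (model_fs_info, model_root)
      and a flag standing in for the real work of reading the stats items:

```c
#include <assert.h>
#include <stddef.h>

struct model_root {
	int unused;
};

struct model_fs_info {
	struct model_root *dev_root;   /* NULL when the device tree could not be read */
	int stats_initialized;
};

/* Returns 0 in both cases: without a device root there is nothing to load. */
int model_init_dev_stats(struct model_fs_info *fs_info)
{
	if (fs_info->dev_root == NULL)
		return 0;                   /* skip instead of dereferencing NULL */

	fs_info->stats_initialized = 1;     /* stand-in for reading the stats items */
	return 0;
}
```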
      Reported-by: Neal Gompa <ngompa13@gmail.com>
      CC: stable@vger.kernel.org # 5.11+
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  12. 09 Feb, 2021 10 commits
    • btrfs: zoned: relocate block group to repair IO failure in zoned filesystems · f7ef5287
      Committed by Naohiro Aota
      When a bad checksum is found and the filesystem has a mirror of the
      damaged data, we read the correct data from the mirror and write it to
      the damaged blocks. This, however, violates the sequential write
      constraints of a zoned block device.
      
      We can consider three methods to repair an IO failure in zoned filesystems:
      
      (1) Reset and rewrite the damaged zone
      (2) Allocate new device extent and replace the damaged device extent to
          the new extent
      (3) Relocate the corresponding block group
      
      Method (1) is most similar to a behavior done with regular devices.
      However, it also wipes non-damaged data in the same device extent, and
      so it unnecessary degrades non-damaged data.
      
      Method (2) is much like device replacing but done in the same device. It
      is safe because it keeps the device extent until the replacing finish.
      However, extending device replacing is non-trivial. It assumes
      "src_dev->physical == dst_dev->physical". Also, the extent mapping
      replacing function should be extended to support replacing device extent
      position in one device.
      
      Method (3) invokes relocation of the damaged block group and is
      straightforward to implement. It relocates all the mirrored device
      extents, so it potentially is a more costly operation than method (1) or
      (2). But it relocates only used extents, which reduces the total IO size.
      
      Let's apply method (3) for now. In the future, we can extend device-replace
      and apply method (2).
      
      To prevent a block group from being relocated multiple times when
      multiple IO errors hit it, this commit introduces a "relocating_repair"
      bit to show that it is now being relocated to repair IO failures. It
      also uses a new kthread, "btrfs-relocating-repair", so that the
      relocation process does not block the IO path.
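      
      The guard described above can be illustrated with a minimal userspace
      sketch. It assumes a hypothetical struct block_group holding a C11
      atomic_flag; the kernel uses its own bit and locking, so this only
      shows the "first error wins" idea, not the actual implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a block group; not the kernel's structure. */
struct block_group {
	atomic_flag relocating_repair; /* set while a repair relocation is pending */
};

/*
 * Returns true only for the first caller. Later IO errors on the same
 * block group find the flag already set and must not queue another
 * relocation.
 */
static bool mark_relocating_repair(struct block_group *bg)
{
	return !atomic_flag_test_and_set(&bg->relocating_repair);
}
```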
      
      This commit also supports repairing in the scrub process.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      f7ef5287
    • N
      btrfs: zoned: implement copying for zoned device-replace · de17addc
      Naohiro Aota committed
      This is patch 3/4 to implement device-replace on zoned filesystems.
      
      This commit implements copying. To do this, it tracks the write pointer
      during the device replace process. As device-replace's copy process is
      smart enough to only copy used extents on the source device, we have to
      fill the gap to honor the sequential write requirement in the target
      device.
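      
      The gap filling mentioned above comes down to simple arithmetic on
      the tracked write pointer. The sketch below is an illustration with
      hypothetical helper names (gap_to_fill, advance_write_pointer), not
      the functions added by the commit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of filler bytes that must be written to the target zone before
 * an extent starting at extent_start can be copied. write_pointer is the
 * next writable position in the target zone.
 */
static uint64_t gap_to_fill(uint64_t write_pointer, uint64_t extent_start)
{
	assert(extent_start >= write_pointer); /* sequential zones cannot rewind */
	return extent_start - write_pointer;
}

/* Advance the tracked write pointer past the filler and the copied extent. */
static uint64_t advance_write_pointer(uint64_t write_pointer,
				      uint64_t extent_start,
				      uint64_t extent_len)
{
	return write_pointer + gap_to_fill(write_pointer, extent_start) +
	       extent_len;
}
```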
      
      The device-replace process on zoned filesystems must copy or clone all
      the extents in the source device exactly once. So, we need to ensure
      allocations started just before the dev-replace process to have their
      corresponding extent information in the B-trees.
      finish_extent_writes_for_zoned() implements that functionality, which
      basically is the removed code in the commit 042528f8 ("Btrfs: fix
      block group remaining RO forever after error during device replace").
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      de17addc
    • N
      btrfs: zoned: implement cloning for zoned device-replace · 6143c23c
      Naohiro Aota committed
      This is patch 2/4 to implement device replace for zoned filesystems.
      
      In zoned mode, a block group must be either copied (from the source
      device to the target device) or cloned (to both devices).
      
      Implement the cloning part. If a block group targeted by an IO is marked
      to copy, we should not clone the IO to the destination device, because
      the block group is eventually copied by the replace process.
      
      This commit also handles cloning of device reset.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      6143c23c
    • N
      btrfs: zoned: use ZONE_APPEND write for zoned mode · d8e3fb10
      Naohiro Aota committed
      Enable zone append writing for zoned mode. When using zone append, a
      bio is issued to the start of a target zone and the device decides where
      to place it inside the zone. Upon completion the device reports the actual
      written position back to the host.
      
      Three parts are necessary to enable zone append mode. First, modify the
      bio to use REQ_OP_ZONE_APPEND in btrfs_submit_bio_hook() and adjust the
      bi_sector to point to the beginning of the zone.
      
      Second, record the returned physical address (and disk/partno) to the
      ordered extent in end_bio_extent_writepage() after the bio has been
      completed. We cannot resolve the physical address to the logical address
      because we can neither take locks nor allocate a buffer in this end_bio
      context. So, we need to record the physical address to resolve it later
      in btrfs_finish_ordered_io().
      
      And finally, rewrite the logical addresses of the extent mapping and
      checksum data according to the physical address using btrfs_rmap_block.
      If the returned address matches the originally allocated address, we can
      skip this rewriting process.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      d8e3fb10
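      
      The zone append flow described in the commit above can be modelled
      in userspace. This is a sketch under stated assumptions: struct zone,
      zone_append() and needs_rewrite() are hypothetical names, and the
      model ignores bios, locking and the physical-to-logical translation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one sequential zone; not a kernel structure. */
struct zone {
	uint64_t start;         /* first byte of the zone */
	uint64_t capacity;      /* zone length in bytes */
	uint64_t write_pointer; /* offset of the next write, relative to start */
};

/*
 * Model of a zone append: the caller only names the zone and the device
 * picks the position. Returns the physical address actually written, or
 * UINT64_MAX if the data does not fit.
 */
static uint64_t zone_append(struct zone *z, uint64_t len)
{
	uint64_t written_at;

	if (len > z->capacity - z->write_pointer)
		return UINT64_MAX;
	written_at = z->start + z->write_pointer;
	z->write_pointer += len;
	return written_at;
}

/* The mapping only needs rewriting when the device chose another position. */
static bool needs_rewrite(uint64_t allocated, uint64_t written_at)
{
	return allocated != written_at;
}
```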
    • N
      btrfs: zoned: handle REQ_OP_ZONE_APPEND as writing · cfe94440
      Naohiro Aota committed
      Zoned filesystems use REQ_OP_ZONE_APPEND bios for writing to actual
      devices.
      
      Let btrfs_end_bio() and btrfs_op() be aware of it, by mapping
      REQ_OP_ZONE_APPEND to BTRFS_MAP_WRITE and using btrfs_op() instead of
      bio_op().
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      cfe94440
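      
      The mapping described in the commit above might look roughly like
      the following. The enums here are stand-ins invented for this sketch
      (the kernel uses REQ_OP_* and BTRFS_MAP_* values), so treat it as an
      illustration of the switch, not the kernel's code.

```c
#include <assert.h>

/* Stand-in enums for illustration only. */
enum req_op { OP_READ, OP_WRITE, OP_DISCARD, OP_ZONE_APPEND };
enum map_op { MAP_READ, MAP_WRITE, MAP_DISCARD };

/* Translate a block-layer operation into the mapping operation. */
static enum map_op map_op_for(enum req_op op)
{
	switch (op) {
	case OP_DISCARD:
		return MAP_DISCARD;
	case OP_WRITE:
	case OP_ZONE_APPEND: /* zone append is a write as far as mapping goes */
		return MAP_WRITE;
	case OP_READ:
	default:
		return MAP_READ;
	}
}
```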
    • N
      btrfs: zoned: verify device extent is aligned to zone · 381a696e
      Naohiro Aota committed
      Add a check in verify_one_dev_extent() to ensure that a device extent on
      a zoned block device is aligned to the respective zone boundary.
      
      If it isn't, mark the filesystem as unclean.
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      381a696e
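      
      A zone alignment check of the kind described in the commit above
      can be sketched as follows. The helper name is hypothetical, and
      checking the length as well as the offset is an assumption of this
      sketch; the commit message only says the extent must be aligned to
      the zone boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if the device extent starts and ends on zone boundaries.
 * zone_size need not be a power of two in this sketch.
 */
static bool dev_extent_zone_aligned(uint64_t physical_offset,
				    uint64_t physical_len,
				    uint64_t zone_size)
{
	assert(zone_size > 0);
	return physical_offset % zone_size == 0 &&
	       physical_len % zone_size == 0;
}
```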
    • N
      btrfs: zoned: implement zoned chunk allocator · 1cd6121f
      Naohiro Aota committed
      Implement a zoned chunk and device extent allocator. One device zone
      becomes a device extent so that a zone reset affects only this device
      extent and does not change the state of blocks in the neighbor device
      extents.
      
      To implement the allocator, we need to extend the following functions for
      a zoned filesystem.
      
      - init_alloc_chunk_ctl
      - dev_extent_search_start
      - dev_extent_hole_check
      - decide_stripe_size
      
      init_alloc_chunk_ctl_zoned() is mostly the same as the regular one. It
      always sets the stripe_size to the zone size and aligns the parameters to
      the zone size.
      
      dev_extent_search_start() only aligns the start offset to zone boundaries.
      We don't care about the first 1MB as a regular filesystem does, because
      the first two zones are reserved for superblock logging anyway.
      
      dev_extent_hole_check_zoned() checks if zones in the given hole are either
      conventional or empty sequential zones. Also, it skips zones reserved for
      superblock logging.
      
      With the change to the hole, the new hole may now contain pending extents.
      So, in this case, loop again to check that.
      
      Finally, decide_stripe_size_zoned() should shrink the number of devices
      instead of stripe size because we need to honor stripe_size == zone_size.
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      1cd6121f
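      
      The last point of the commit above, shrinking the number of devices
      while keeping stripe_size == zone_size, can be sketched in a heavily
      simplified form. This assumes one stripe per device, no parity and
      a single copy; the helper name is hypothetical and the kernel
      function handles more parameters.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns how many devices the chunk may span so that
 * ndevs * zone_size <= max_chunk_size, with the stripe size fixed to
 * the zone size. Simplified: one stripe per device, no parity, one copy.
 */
static unsigned int shrink_ndevs_for_zoned(unsigned int ndevs,
					   uint64_t zone_size,
					   uint64_t max_chunk_size)
{
	assert(zone_size > 0);
	if ((uint64_t)ndevs * zone_size > max_chunk_size)
		ndevs = (unsigned int)(max_chunk_size / zone_size);
	return ndevs;
}
```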
    • N
      btrfs: zoned: defer loading zone info after opening trees · 73651042
      Naohiro Aota committed
      This is a preparation patch to implement zone emulation on a regular
      device.
      
      To emulate a zoned filesystem on a regular (non-zoned) device, we need to
      decide on an emulated zone size. Instead of making it a compile-time static
      value, we'll make it configurable at mkfs time. Since we have the
      restriction that one zone == one device extent, we can determine the
      emulated zone size from the size of a device extent. We can extend
      btrfs_get_dev_zone_info() to show a regular device filled with
      conventional zones once the zone size is decided.
      
      The current call site of btrfs_get_dev_zone_info() during the mount process
      is earlier than loading the file system trees, so we don't know the
      size of a device extent at this point. Thus we can't slice a regular device
      into conventional zones.
      
      This patch introduces btrfs_get_dev_zone_info_all_devices() to load the zone
      info for all the devices. And, it places this function in open_ctree()
      after loading the trees.
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      73651042
    • M
      btrfs: let callers of btrfs_get_io_geometry pass the em · 42034313
      Michal Rostecki committed
      Before this change, the btrfs_get_io_geometry() function was calling
      btrfs_get_chunk_map() to get the extent mapping, necessary for
      calculating the I/O geometry. It was using that extent mapping only
      internally and freeing the pointer after its execution.
      
      That resulted in calling btrfs_get_chunk_map() de facto twice by the
      __btrfs_map_block() function. It was calling btrfs_get_io_geometry()
      first and then calling btrfs_get_chunk_map() directly to get the extent
      mapping, used by the rest of the function.
      
      Change that to passing the extent mapping to the btrfs_get_io_geometry()
      function as an argument.
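      
      The effect of passing the mapping down can be shown with a small
      userspace model that counts lookups. All names here (struct
      chunk_map, get_chunk_map, stripe_offset, map_block) are hypothetical
      stand-ins, not the btrfs functions or their signatures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the extent mapping of a chunk. */
struct chunk_map {
	uint64_t start;
	uint64_t stripe_len;
};

static int lookups; /* counts simulated tree searches */
static struct chunk_map the_map = { 0, 65536 };

/* Stand-in for the tree search; every call counts as one lookup. */
static struct chunk_map *get_chunk_map(uint64_t logical)
{
	(void)logical;
	lookups++;
	return &the_map;
}

/* The geometry helper takes the map from its caller instead of searching. */
static uint64_t stripe_offset(const struct chunk_map *em, uint64_t logical)
{
	return (logical - em->start) % em->stripe_len;
}

/* The caller looks the map up once and reuses it for the geometry. */
static uint64_t map_block(uint64_t logical)
{
	struct chunk_map *em = get_chunk_map(logical);

	return stripe_offset(em, logical);
}
```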
      
      This could improve performance in some cases. For very large
      filesystems, i.e. several thousand allocated chunks, this not only
      avoids searching the rbtree twice, saving time, it may also help
      reduce contention on the lock that protects the tree. Think of
      writeback starting for multiple inodes, other tasks allocating or
      removing chunks, and anything else that requires access to the rbtree.
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Michal Rostecki <mrostecki@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ add Filipe's analysis ]
      Signed-off-by: David Sterba <dsterba@suse.com>
      42034313
    • N
      btrfs: consolidate btrfs_previous_item ret val handling in btrfs_shrink_device · 7056bf69
      Nikolay Borisov committed
      Instead of having three 'if' statements to handle a non-zero return
      value, consolidate them into one 'if (ret)'. That way the code is more
      obvious:
      
       - Always drop delete_unused_bgs_mutex if ret is non-zero
       - If ret is negative -> goto done
       - If it's 1 -> reset ret to 0, release the path and finish the loop.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      7056bf69
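      
      The consolidated handling listed in the commit above can be
      modelled outside the kernel. In this sketch struct loop_state and
      handle_previous_item_ret() are hypothetical; the booleans stand in
      for unlocking the mutex, releasing the path and leaving the loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state standing in for the kernel side effects. */
struct loop_state {
	bool mutex_held;    /* models delete_unused_bgs_mutex */
	bool path_released; /* models releasing the search path */
	bool stop;          /* leave the loop */
};

/*
 * ret < 0: error, ret == 1: no previous item, ret == 0: item found.
 * Returns the value the caller should keep in ret.
 */
static int handle_previous_item_ret(int ret, struct loop_state *st)
{
	if (ret) {
		st->mutex_held = false; /* always drop the mutex */
		st->stop = true;        /* both cases leave the loop */
		if (ret > 0) {
			ret = 0;        /* "not found" is not an error */
			st->path_released = true;
		}
	}
	return ret;
}
```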
  13. 28 Jan, 2021 1 commit
  14. 26 Jan, 2021 1 commit
    • S
      btrfs: fix lockdep warning due to seqcount_mutex on 32bit arch · c41ec452
      Su Yue committed
      This effectively reverts commit d5c82388 ("btrfs: convert
      data_seqcount to seqcount_mutex_t").
      
      While running fstests on a 32-bit test box, many tests failed because of
      warnings in dmesg. One of those warnings (btrfs/003):
      
        [66.441317] WARNING: CPU: 6 PID: 9251 at include/linux/seqlock.h:279 btrfs_remove_chunk+0x58b/0x7b0 [btrfs]
        [66.441446] CPU: 6 PID: 9251 Comm: btrfs Tainted: G           O      5.11.0-rc4-custom+ #5
        [66.441449] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014
        [66.441451] EIP: btrfs_remove_chunk+0x58b/0x7b0 [btrfs]
        [66.441472] EAX: 00000000 EBX: 00000001 ECX: c576070c EDX: c6b15803
        [66.441475] ESI: 10000000 EDI: 00000000 EBP: c56fbcfc ESP: c56fbc70
        [66.441477] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010246
        [66.441481] CR0: 80050033 CR2: 05c8da20 CR3: 04b20000 CR4: 00350ed0
        [66.441485] Call Trace:
        [66.441510]  btrfs_relocate_chunk+0xb1/0x100 [btrfs]
        [66.441529]  ? btrfs_lookup_block_group+0x17/0x20 [btrfs]
        [66.441562]  btrfs_balance+0x8ed/0x13b0 [btrfs]
        [66.441586]  ? btrfs_ioctl_balance+0x333/0x3c0 [btrfs]
        [66.441619]  ? __this_cpu_preempt_check+0xf/0x11
        [66.441643]  btrfs_ioctl_balance+0x333/0x3c0 [btrfs]
        [66.441664]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
        [66.441683]  btrfs_ioctl+0x414/0x2ae0 [btrfs]
        [66.441700]  ? __lock_acquire+0x35f/0x2650
        [66.441717]  ? lockdep_hardirqs_on+0x87/0x120
        [66.441720]  ? lockdep_hardirqs_on_prepare+0xd0/0x1e0
        [66.441724]  ? call_rcu+0x2d3/0x530
        [66.441731]  ? __might_fault+0x41/0x90
        [66.441736]  ? kvm_sched_clock_read+0x15/0x50
        [66.441740]  ? sched_clock+0x8/0x10
        [66.441745]  ? sched_clock_cpu+0x13/0x180
        [66.441750]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
        [66.441750]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
        [66.441768]  __ia32_sys_ioctl+0x165/0x8a0
        [66.441773]  ? __this_cpu_preempt_check+0xf/0x11
        [66.441785]  ? __might_fault+0x89/0x90
        [66.441791]  __do_fast_syscall_32+0x54/0x80
        [66.441796]  do_fast_syscall_32+0x32/0x70
        [66.441801]  do_SYSENTER_32+0x15/0x20
        [66.441805]  entry_SYSENTER_32+0x9f/0xf2
        [66.441808] EIP: 0xab7b5549
        [66.441814] EAX: ffffffda EBX: 00000003 ECX: c4009420 EDX: bfa91f5c
        [66.441816] ESI: 00000003 EDI: 00000001 EBP: 00000000 ESP: bfa91e98
        [66.441818] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000292
        [66.441833] irq event stamp: 42579
        [66.441835] hardirqs last  enabled at (42585): [<c60eb065>] console_unlock+0x495/0x590
        [66.441838] hardirqs last disabled at (42590): [<c60eafd5>] console_unlock+0x405/0x590
        [66.441840] softirqs last  enabled at (41698): [<c601b76c>] call_on_stack+0x1c/0x60
        [66.441843] softirqs last disabled at (41681): [<c601b76c>] call_on_stack+0x1c/0x60
      
        ========================================================================
        btrfs_remove_chunk+0x58b/0x7b0:
        __seqprop_mutex_assert at linux/./include/linux/seqlock.h:279
        (inlined by) btrfs_device_set_bytes_used at linux/fs/btrfs/volumes.h:212
        (inlined by) btrfs_remove_chunk at linux/fs/btrfs/volumes.c:2994
        ========================================================================
      
      The warning is produced by lockdep_assert_held() in
      __seqprop_mutex_assert() if CONFIG_LOCKDEP is enabled.
      And volumes.c:2994 is btrfs_device_set_bytes_used() with mutex lock
      fs_info->chunk_mutex held already.
      
      After adding some debug prints, the cause turned out to be that
      __alloc_device() is often called with a NULL @fs_info (during the
      scanning ioctl). Inside the function, btrfs_device_data_ordered_init()
      expands to seqcount_mutex_init(). In this scenario, its second
      parameter info->chunk_mutex is &NULL->chunk_mutex, which unexpectedly
      equals offsetof(struct btrfs_fs_info, chunk_mutex). Thus,
      seqcount_mutex_init() is called with a bogus mutex pointer, and later
      the btrfs_device_get/set helpers trigger lockdep warnings.
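      
      Why a NULL base pointer yields a small non-NULL member address can
      be shown with offsetof(). The struct below is a hypothetical layout
      for illustration (the real btrfs_fs_info is far larger), and the
      address is computed with integer arithmetic so that no null pointer
      is dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; chunk_mutex stands in for the embedded mutex. */
struct fs_info_like {
	long other_fields[4];
	int chunk_mutex;
};

/*
 * Address of the member for a given base address. With base == 0 the
 * result is just the member offset: a non-zero value that passes a
 * NULL check but does not point at a real mutex.
 */
static uintptr_t member_address(uintptr_t base)
{
	return base + offsetof(struct fs_info_like, chunk_mutex);
}
```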
      
      The device and filesystem object lifetimes are different and we'd have
      to synchronize initialization of the btrfs_device::data_seqcount with
      the fs_info, possibly using some additional synchronization. It would
      still not prevent concurrent access to the seqcount lock when it's used
      for read and initialization.
      
      Commit d5c82388 ("btrfs: convert data_seqcount to seqcount_mutex_t")
      does not mention a particular problem being fixed, so the revert should not
      cause any harm and we'll get the lockdep warning fixed.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=210139
      Reported-by: Erhard F <erhard_f@mailbox.org>
      Fixes: d5c82388 ("btrfs: convert data_seqcount to seqcount_mutex_t")
      CC: stable@vger.kernel.org # 5.10
      CC: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: Su Yue <l@damenly.su>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      c41ec452