1. 17 Dec, 2021 1 commit
  2. 16 Dec, 2021 4 commits
    • btrfs: fix missing blkdev_put() call in btrfs_scan_one_device() · 4989d4a0
      Shin'ichiro Kawasaki authored
      The function btrfs_scan_one_device() calls blkdev_get_by_path() and
      blkdev_put() to get and release its target block device. However, when
      btrfs_sb_log_location_bdev() fails, blkdev_put() is not called and the
      block device is left without clean up. This triggered failure of fstests
      generic/085. Fix the failure path of btrfs_sb_log_location_bdev() to
      call blkdev_put().
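      
      The failure-path pattern described above can be sketched in user space as
      follows. This is only an illustration: fake_bdev, fake_get(), fake_put()
      and fake_log_location() are hypothetical stand-ins, not kernel APIs, and
      the real fix may be structured differently.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a block device handle; not a kernel type. */
struct fake_bdev {
	int open_count; /* incremented on get, decremented on put */
};

static struct fake_bdev *fake_get(struct fake_bdev *dev)
{
	dev->open_count++;
	return dev;
}

static void fake_put(struct fake_bdev *dev)
{
	dev->open_count--;
}

/* Stand-in for a step that can fail after the device was acquired. */
static int fake_log_location(int should_fail)
{
	return should_fail ? -EIO : 0;
}

/* Every exit path taken after fake_get() goes through the out_put label. */
static int scan_one_device(struct fake_bdev *dev, int should_fail)
{
	struct fake_bdev *bdev = fake_get(dev);
	int ret;

	ret = fake_log_location(should_fail);
	if (ret)
		goto out_put; /* a direct "return ret" here would leak the handle */

	/* ... further work on the device would go here ... */
	ret = 0;
out_put:
	fake_put(bdev);
	return ret;
}
```

      Calling scan_one_device() with should_fail set leaves open_count at zero,
      which is the property the missing put call violated.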
      
      Fixes: 12659251 ("btrfs: implement log-structured superblock for ZONED mode")
      CC: stable@vger.kernel.org # 5.15+
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: fix warning when freeing leaf after subvolume creation failure · 212a58fd
      Filipe Manana authored
      When creating a subvolume, at ioctl.c:create_subvol(), if we fail to
      insert the root item for the new subvolume into the root tree, we can
      trigger the following warning:
      
      [78961.741046] WARNING: CPU: 0 PID: 4079814 at fs/btrfs/extent-tree.c:3357 btrfs_free_tree_block+0x2af/0x310 [btrfs]
      [78961.743344] Modules linked in:
      [78961.749440]  dm_snapshot dm_thin_pool (...)
      [78961.773648] CPU: 0 PID: 4079814 Comm: fsstress Not tainted 5.16.0-rc4-btrfs-next-108 #1
      [78961.775198] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      [78961.777266] RIP: 0010:btrfs_free_tree_block+0x2af/0x310 [btrfs]
      [78961.778398] Code: 17 00 48 85 (...)
      [78961.781067] RSP: 0018:ffffaa4001657b28 EFLAGS: 00010202
      [78961.781877] RAX: 0000000000000213 RBX: ffff897f8a796910 RCX: 0000000000000000
      [78961.782780] RDX: 0000000000000000 RSI: 0000000011004000 RDI: 00000000ffffffff
      [78961.783764] RBP: ffff8981f490e800 R08: 0000000000000001 R09: 0000000000000000
      [78961.784740] R10: 0000000000000000 R11: 0000000000000001 R12: ffff897fc963fcc8
      [78961.785665] R13: 0000000000000001 R14: ffff898063548000 R15: ffff898063548000
      [78961.786620] FS:  00007f31283c6b80(0000) GS:ffff8982ace00000(0000) knlGS:0000000000000000
      [78961.787717] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [78961.788598] CR2: 00007f31285c3000 CR3: 000000023fcc8003 CR4: 0000000000370ef0
      [78961.789568] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [78961.790585] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [78961.791684] Call Trace:
      [78961.792082]  <TASK>
      [78961.792359]  create_subvol+0x5d1/0x9a0 [btrfs]
      [78961.793054]  btrfs_mksubvol+0x447/0x4c0 [btrfs]
      [78961.794009]  ? preempt_count_add+0x49/0xa0
      [78961.794705]  __btrfs_ioctl_snap_create+0x123/0x190 [btrfs]
      [78961.795712]  ? _copy_from_user+0x66/0xa0
      [78961.796382]  btrfs_ioctl_snap_create_v2+0xbb/0x140 [btrfs]
      [78961.797392]  btrfs_ioctl+0xd1e/0x35c0 [btrfs]
      [78961.798172]  ? __slab_free+0x10a/0x360
      [78961.798820]  ? rcu_read_lock_sched_held+0x12/0x60
      [78961.799664]  ? lock_release+0x223/0x4a0
      [78961.800321]  ? lock_acquired+0x19f/0x420
      [78961.800992]  ? rcu_read_lock_sched_held+0x12/0x60
      [78961.801796]  ? trace_hardirqs_on+0x1b/0xe0
      [78961.802495]  ? _raw_spin_unlock_irqrestore+0x3e/0x60
      [78961.803358]  ? kmem_cache_free+0x321/0x3c0
      [78961.804071]  ? __x64_sys_ioctl+0x83/0xb0
      [78961.804711]  __x64_sys_ioctl+0x83/0xb0
      [78961.805348]  do_syscall_64+0x3b/0xc0
      [78961.805969]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [78961.806830] RIP: 0033:0x7f31284bc957
      [78961.807517] Code: 3c 1c 48 f7 d8 (...)
      
      This is because we are calling btrfs_free_tree_block() on an extent
      buffer that is dirty. Fix that by cleaning the extent buffer, with
      btrfs_clean_tree_block(), before freeing it.
      
      This was triggered by test case generic/475 from fstests.
      
      Fixes: 67addf29 ("btrfs: fix metadata extent leak after failure to create subvolume")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: fix invalid delayed ref after subvolume creation failure · 7a163608
      Filipe Manana authored
      When creating a subvolume, at ioctl.c:create_subvol(), if we fail to
      insert the new root's root item into the root tree, we are freeing the
      metadata extent we reserved for the new root to prevent a metadata
      extent leak, as we don't abort the transaction at that point (since
      there is nothing at that point that is irreversible).
      
      However we allocated the metadata extent for the new root which we are
      creating for the new subvolume, so its delayed reference refers to the
      ID of this new root. But when we free the metadata extent we pass the
      root of the subvolume where the new subvolume is located to
      btrfs_free_tree_block() - this is incorrect because this will generate
      a delayed reference that refers to the ID of the parent subvolume's root,
      and not to the ID of the new root.
      
      This results in a failure when running delayed references that leads to
      a transaction abort and a trace like the following:
      
      [3868.738042] RIP: 0010:__btrfs_free_extent+0x709/0x950 [btrfs]
      [3868.739857] Code: 68 0f 85 e6 fb ff (...)
      [3868.742963] RSP: 0018:ffffb0e9045cf910 EFLAGS: 00010246
      [3868.743908] RAX: 00000000fffffffe RBX: 00000000fffffffe RCX: 0000000000000002
      [3868.745312] RDX: 00000000fffffffe RSI: 0000000000000002 RDI: ffff90b0cd793b88
      [3868.746643] RBP: 000000000e5d8000 R08: 0000000000000000 R09: ffff90b0cd793b88
      [3868.747979] R10: 0000000000000002 R11: 00014ded97944d68 R12: 0000000000000000
      [3868.749373] R13: ffff90b09afe4a28 R14: 0000000000000000 R15: ffff90b0cd793b88
      [3868.750725] FS:  00007f281c4a8b80(0000) GS:ffff90b3ada00000(0000) knlGS:0000000000000000
      [3868.752275] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [3868.753515] CR2: 00007f281c6a5000 CR3: 0000000108a42006 CR4: 0000000000370ee0
      [3868.754869] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [3868.756228] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [3868.757803] Call Trace:
      [3868.758281]  <TASK>
      [3868.758655]  ? btrfs_merge_delayed_refs+0x178/0x1c0 [btrfs]
      [3868.759827]  __btrfs_run_delayed_refs+0x2b1/0x1250 [btrfs]
      [3868.761047]  btrfs_run_delayed_refs+0x86/0x210 [btrfs]
      [3868.762069]  ? lock_acquired+0x19f/0x420
      [3868.762829]  btrfs_commit_transaction+0x69/0xb20 [btrfs]
      [3868.763860]  ? _raw_spin_unlock+0x29/0x40
      [3868.764614]  ? btrfs_block_rsv_release+0x1c2/0x1e0 [btrfs]
      [3868.765870]  create_subvol+0x1d8/0x9a0 [btrfs]
      [3868.766766]  btrfs_mksubvol+0x447/0x4c0 [btrfs]
      [3868.767669]  ? preempt_count_add+0x49/0xa0
      [3868.768444]  __btrfs_ioctl_snap_create+0x123/0x190 [btrfs]
      [3868.769639]  ? _copy_from_user+0x66/0xa0
      [3868.770391]  btrfs_ioctl_snap_create_v2+0xbb/0x140 [btrfs]
      [3868.771495]  btrfs_ioctl+0xd1e/0x35c0 [btrfs]
      [3868.772364]  ? __slab_free+0x10a/0x360
      [3868.773198]  ? rcu_read_lock_sched_held+0x12/0x60
      [3868.774121]  ? lock_release+0x223/0x4a0
      [3868.774863]  ? lock_acquired+0x19f/0x420
      [3868.775634]  ? rcu_read_lock_sched_held+0x12/0x60
      [3868.776530]  ? trace_hardirqs_on+0x1b/0xe0
      [3868.777373]  ? _raw_spin_unlock_irqrestore+0x3e/0x60
      [3868.778280]  ? kmem_cache_free+0x321/0x3c0
      [3868.779011]  ? __x64_sys_ioctl+0x83/0xb0
      [3868.779718]  __x64_sys_ioctl+0x83/0xb0
      [3868.780387]  do_syscall_64+0x3b/0xc0
      [3868.781059]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [3868.781953] RIP: 0033:0x7f281c59e957
      [3868.782585] Code: 3c 1c 48 f7 d8 4c (...)
      [3868.785867] RSP: 002b:00007ffe1f83e2b8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
      [3868.787198] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f281c59e957
      [3868.788450] RDX: 00007ffe1f83e2c0 RSI: 0000000050009418 RDI: 0000000000000003
      [3868.789748] RBP: 00007ffe1f83f300 R08: 0000000000000000 R09: 00007ffe1f83fe36
      [3868.791214] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000003
      [3868.792468] R13: 0000000000000003 R14: 00007ffe1f83e2c0 R15: 00000000000003cc
      [3868.793765]  </TASK>
      [3868.794037] irq event stamp: 0
      [3868.794548] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
      [3868.795670] hardirqs last disabled at (0): [<ffffffff98294214>] copy_process+0x934/0x2040
      [3868.797086] softirqs last  enabled at (0): [<ffffffff98294214>] copy_process+0x934/0x2040
      [3868.798309] softirqs last disabled at (0): [<0000000000000000>] 0x0
      [3868.799284] ---[ end trace be24c7002fe27747 ]---
      [3868.799928] BTRFS info (device dm-0): leaf 241188864 gen 1268 total ptrs 214 free space 469 owner 2
      [3868.801133] BTRFS info (device dm-0): refs 2 lock_owner 225627 current 225627
      [3868.802056]  item 0 key (237436928 169 0) itemoff 16250 itemsize 33
      [3868.802863]          extent refs 1 gen 1265 flags 2
      [3868.803447]          ref#0: tree block backref root 1610
      (...)
      [3869.064354]  item 114 key (241008640 169 0) itemoff 12488 itemsize 33
      [3869.065421]          extent refs 1 gen 1268 flags 2
      [3869.066115]          ref#0: tree block backref root 1689
      (...)
      [3869.403834] BTRFS error (device dm-0): unable to find ref byte nr 241008640 parent 0 root 1622  owner 0 offset 0
      [3869.405641] BTRFS: error (device dm-0) in __btrfs_free_extent:3076: errno=-2 No such entry
      [3869.407138] BTRFS: error (device dm-0) in btrfs_run_delayed_refs:2159: errno=-2 No such entry
      
      Fix this by passing the new subvolume's root ID to btrfs_free_tree_block().
      This requires changing the root argument of btrfs_free_tree_block() from
      struct btrfs_root * to a u64, since at this point during the subvolume
      creation we have not yet created the struct btrfs_root for the new
      subvolume, and btrfs_free_tree_block() only needs a root ID and nothing
      else from a struct btrfs_root.
      
      This was triggered by test case generic/475 from fstests.
      
      Fixes: 67addf29 ("btrfs: fix metadata extent leak after failure to create subvolume")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: check WRITE_ERR when trying to read an extent buffer · 651740a5
      Josef Bacik authored
      Filipe reported a hang when we have errors on btrfs.  This turned out to
      be a side-effect of my fix c2e39305 ("btrfs: clear extent buffer
      uptodate when we fail to write it") which made it so we clear
      EXTENT_BUFFER_UPTODATE on an eb when we fail to write it out.
      
      Below is a paste of Filipe's analysis, which he got from using drgn to
      debug the hang:
      
      """
      btree readahead code calls read_extent_buffer_pages(), sets ->io_pages to
      a value while writeback of all pages has not yet completed:
         --> writeback for the first 3 pages finishes, we clear
             EXTENT_BUFFER_UPTODATE from eb on the first page when we get an
             error.
         --> at this point eb->io_pages is 1 and we cleared Uptodate bit from the
             first 3 pages
         --> read_extent_buffer_pages() does not see EXTENT_BUFFER_UPTODATE() so
             it continues, it's able to lock the pages since we obviously don't
             hold the pages locked during writeback
         --> read_extent_buffer_pages() then computes 'num_reads' as 3, and sets
             eb->io_pages to 3, since only the first page does not have Uptodate
             bit set at this point
         --> writeback for the remaining page completes, we ended decrementing
             eb->io_pages by 1, resulting in eb->io_pages == 2, and therefore
             never calling end_extent_buffer_writeback(), so
             EXTENT_BUFFER_WRITEBACK remains in the eb's flags
         --> of course, when the read bio completes, it doesn't and shouldn't
             call end_extent_buffer_writeback()
         --> we should clear EXTENT_BUFFER_UPTODATE only after all pages of
             the eb finished writeback?  or maybe make the read pages code
             wait for writeback of all pages of the eb to complete before
             checking which pages need to be read, touch ->io_pages, submit
             read bio, etc
      
      writeback bit never cleared means we can hang when aborting a
      transaction, at:
      
          btrfs_cleanup_one_transaction()
             btrfs_destroy_marked_extents()
               wait_on_extent_buffer_writeback()
      """
      
      This is a problem because our writes are not synchronized with reads in
      any way.  We clear the UPTODATE flag and then we can easily come in and
      try to read the EB while we're still waiting on other bio's to
      complete.
      
      We have two options here. We could lock all the pages, and then check
      whether eb->io_pages != 0 to know if we already have an outstanding
      write on the eb.
      
      Or we can simply check to see if we have WRITE_ERR set on this extent
      buffer.  We set this bit _before_ we clear UPTODATE, so if the read gets
      triggered because we aren't UPTODATE because of a write error we're
      guaranteed to have WRITE_ERR set, and in this case we can simply return
      -EIO.  This will fix the reported hang.
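      
      The second option can be sketched as below. This is a single-threaded
      illustration with hypothetical names (fake_eb, EB_UPTODATE, EB_WRITE_ERR);
      it is not the kernel code and ignores the atomic bit operations and page
      locking that a real implementation needs.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag bits standing in for the extent buffer state bits. */
#define EB_UPTODATE  (1UL << 0)
#define EB_WRITE_ERR (1UL << 1)

struct fake_eb {
	unsigned long flags;
	int io_pages; /* pages with I/O still in flight */
};

/* Write failure: record the error first, then drop the uptodate state. */
static void fake_write_failed(struct fake_eb *eb)
{
	eb->flags |= EB_WRITE_ERR;
	eb->flags &= ~EB_UPTODATE;
}

/* Read path: never touch io_pages when a write error has been recorded. */
static int fake_read_eb(struct fake_eb *eb, int num_pages)
{
	if (eb->flags & EB_UPTODATE)
		return 0; /* nothing to read */
	if (eb->flags & EB_WRITE_ERR)
		return -EIO; /* not uptodate because a write failed */
	eb->io_pages = num_pages; /* normal read: account the pages to read */
	return 0;
}
```

      Because the error bit is set before the uptodate bit is cleared, a reader
      that finds the buffer not uptodate due to a write error also finds the
      error bit and backs off without overwriting io_pages.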
      
      Reported-by: Filipe Manana <fdmanana@suse.com>
      Fixes: c2e39305 ("btrfs: clear extent buffer uptodate when we fail to write it")
      CC: stable@vger.kernel.org # 5.4+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  3. 14 Dec, 2021 5 commits
    • btrfs: fix missing last dir item offset update when logging directory · 1b2e5e5c
      Filipe Manana authored
      When logging a directory, once we finish processing a leaf that is full
      of dir items, if we find the next leaf was not modified in the current
      transaction, we grab the first key of that next leaf and log it to mark
      the end of a key range boundary.
      
      However we did not update the value of ctx->last_dir_item_offset, which
      tracks the offset of the last logged key. As a result, a subsequent
      logging of the same directory in the current transaction may not realize
      that the key was already logged, and may add it to the middle of a batch
      that starts with a lower key, resulting later in a leaf with one key
      that is duplicated and at non-consecutive slots. When that happens we get
      an error later when writing out the leaf, reporting that there is a pair
      of keys in wrong order. The report is something like the following:
      
      Dec 13 21:44:50 kernel: BTRFS critical (device dm-0): corrupt leaf:
      root=18446744073709551610 block=118444032 slot=21, bad key order, prev
      (704687 84 4146773349) current (704687 84 1063561078)
      Dec 13 21:44:50 kernel: BTRFS info (device dm-0): leaf 118444032 gen
      91449 total ptrs 39 free space 546 owner 18446744073709551610
      Dec 13 21:44:50 kernel:         item 0 key (704687 1 0) itemoff 3835
      itemsize 160
      Dec 13 21:44:50 kernel:                 inode generation 35532 size
      1026 mode 40755
      Dec 13 21:44:50 kernel:         item 1 key (704687 12 704685) itemoff
      3822 itemsize 13
      Dec 13 21:44:50 kernel:         item 2 key (704687 24 3817753667)
      itemoff 3736 itemsize 86
      Dec 13 21:44:50 kernel:         item 3 key (704687 60 0) itemoff 3728 itemsize 8
      Dec 13 21:44:50 kernel:         item 4 key (704687 72 0) itemoff 3720 itemsize 8
      Dec 13 21:44:50 kernel:         item 5 key (704687 84 140445108)
      itemoff 3666 itemsize 54
      Dec 13 21:44:50 kernel:                 dir oid 704793 type 1
      Dec 13 21:44:50 kernel:         item 6 key (704687 84 298800632)
      itemoff 3599 itemsize 67
      Dec 13 21:44:50 kernel:                 dir oid 707849 type 2
      Dec 13 21:44:50 kernel:         item 7 key (704687 84 476147658)
      itemoff 3532 itemsize 67
      Dec 13 21:44:50 kernel:                 dir oid 707901 type 2
      Dec 13 21:44:50 kernel:         item 8 key (704687 84 633818382)
      itemoff 3471 itemsize 61
      Dec 13 21:44:50 kernel:                 dir oid 704694 type 2
      Dec 13 21:44:50 kernel:         item 9 key (704687 84 654256665)
      itemoff 3403 itemsize 68
      Dec 13 21:44:50 kernel:                 dir oid 707841 type 1
      Dec 13 21:44:50 kernel:         item 10 key (704687 84 995843418)
      itemoff 3331 itemsize 72
      Dec 13 21:44:50 kernel:                 dir oid 2167736 type 1
      Dec 13 21:44:50 kernel:         item 11 key (704687 84 1063561078)
      itemoff 3278 itemsize 53
      Dec 13 21:44:50 kernel:                 dir oid 704799 type 2
      Dec 13 21:44:50 kernel:         item 12 key (704687 84 1101156010)
      itemoff 3225 itemsize 53
      Dec 13 21:44:50 kernel:                 dir oid 704696 type 1
      Dec 13 21:44:50 kernel:         item 13 key (704687 84 2521936574)
      itemoff 3173 itemsize 52
      Dec 13 21:44:50 kernel:                 dir oid 704704 type 2
      Dec 13 21:44:50 kernel:         item 14 key (704687 84 2618368432)
      itemoff 3112 itemsize 61
      Dec 13 21:44:50 kernel:                 dir oid 704738 type 1
      Dec 13 21:44:50 kernel:         item 15 key (704687 84 2676316190)
      itemoff 3046 itemsize 66
      Dec 13 21:44:50 kernel:                 dir oid 2167729 type 1
      Dec 13 21:44:50 kernel:         item 16 key (704687 84 3319104192)
      itemoff 2986 itemsize 60
      Dec 13 21:44:50 kernel:                 dir oid 704745 type 2
      Dec 13 21:44:50 kernel:         item 17 key (704687 84 3908046265)
      itemoff 2929 itemsize 57
      Dec 13 21:44:50 kernel:                 dir oid 2167734 type 1
      Dec 13 21:44:50 kernel:         item 18 key (704687 84 3945713089)
      itemoff 2857 itemsize 72
      Dec 13 21:44:50 kernel:                 dir oid 2167730 type 1
      Dec 13 21:44:50 kernel:         item 19 key (704687 84 4077169308)
      itemoff 2795 itemsize 62
      Dec 13 21:44:50 kernel:                 dir oid 704688 type 1
      Dec 13 21:44:50 kernel:         item 20 key (704687 84 4146773349)
      itemoff 2727 itemsize 68
      Dec 13 21:44:50 kernel:                 dir oid 707892 type 1
      Dec 13 21:44:50 kernel:         item 21 key (704687 84 1063561078)
      itemoff 2674 itemsize 53
      Dec 13 21:44:50 kernel:                 dir oid 704799 type 2
      Dec 13 21:44:50 kernel:         item 22 key (704687 96 2) itemoff 2612
      itemsize 62
      Dec 13 21:44:50 kernel:         item 23 key (704687 96 6) itemoff 2551
      itemsize 61
      Dec 13 21:44:50 kernel:         item 24 key (704687 96 7) itemoff 2498
      itemsize 53
      Dec 13 21:44:50 kernel:         item 25 key (704687 96 12) itemoff
      2446 itemsize 52
      Dec 13 21:44:50 kernel:         item 26 key (704687 96 14) itemoff
      2385 itemsize 61
      Dec 13 21:44:50 kernel:         item 27 key (704687 96 18) itemoff
      2325 itemsize 60
      Dec 13 21:44:50 kernel:         item 28 key (704687 96 24) itemoff
      2271 itemsize 54
      Dec 13 21:44:50 kernel:         item 29 key (704687 96 28) itemoff
      2218 itemsize 53
      Dec 13 21:44:50 kernel:         item 30 key (704687 96 62) itemoff
      2150 itemsize 68
      Dec 13 21:44:50 kernel:         item 31 key (704687 96 66) itemoff
      2083 itemsize 67
      Dec 13 21:44:50 kernel:         item 32 key (704687 96 75) itemoff
      2015 itemsize 68
      Dec 13 21:44:50 kernel:         item 33 key (704687 96 79) itemoff
      1948 itemsize 67
      Dec 13 21:44:50 kernel:         item 34 key (704687 96 82) itemoff
      1882 itemsize 66
      Dec 13 21:44:50 kernel:         item 35 key (704687 96 83) itemoff
      1810 itemsize 72
      Dec 13 21:44:50 kernel:         item 36 key (704687 96 85) itemoff
      1753 itemsize 57
      Dec 13 21:44:50 kernel:         item 37 key (704687 96 87) itemoff
      1681 itemsize 72
      Dec 13 21:44:50 kernel:         item 38 key (704694 1 0) itemoff 1521
      itemsize 160
      Dec 13 21:44:50 kernel:                 inode generation 35534 size 30
      mode 40755
      Dec 13 21:44:50 kernel: BTRFS error (device dm-0): block=118444032
      write time tree block corruption detected
      
      So fix that by adding the missing update of ctx->last_dir_item_offset with
      the offset of the boundary key.
      
      Reported-by: Chris Murphy <lists@colorremedies.com>
      Link: https://lore.kernel.org/linux-btrfs/CAJCQCtT+RSzpUjbMq+UfzNUMe1X5+1G+DnAGbHC=OZ=iRS24jg@mail.gmail.com/
      Fixes: dc287224 ("btrfs: keep track of the last logged keys when logging a directory")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: fix double free of anon_dev after failure to create subvolume · 33fab972
      Filipe Manana authored
      When creating a subvolume, at create_subvol(), we allocate an anonymous
      device and later call btrfs_get_new_fs_root(), which in turn just calls
      btrfs_get_root_ref(). There we call btrfs_init_fs_root() which assigns
      the anonymous device to the root, but if after that call there's an error,
      when we jump to 'fail' label, we call btrfs_put_root(), which frees the
      anonymous device and then returns an error that is propagated back to
      create_subvol(). Then create_subvol() frees the anonymous device again.
      
      When this happens, if the anonymous device was not reallocated after
      the first time it was freed with btrfs_put_root(), we get a kernel
      message like the following:
      
        (...)
        [13950.282466] BTRFS: error (device dm-0) in create_subvol:663: errno=-5 IO failure
        [13950.283027] ida_free called for id=65 which is not allocated.
        [13950.285974] BTRFS info (device dm-0): forced readonly
        (...)
      
      If the anonymous device gets reallocated by another btrfs filesystem
      or any other kernel subsystem, then bad things can happen.
      
      So fix this by setting the root's anonymous device to 0 at
      btrfs_get_root_ref(), before we call btrfs_put_root(), if an error
      happened.
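      
      The ownership hand-back described above can be sketched as follows. All
      names here (fake_id_alloc(), fake_id_free(), fake_root and the fake_*
      functions) are hypothetical user-space stand-ins chosen for illustration,
      not the btrfs implementation.

```c
#include <assert.h>
#include <errno.h>

#define FAKE_MAX_IDS 8

static int id_allocated[FAKE_MAX_IDS]; /* slot 0 is reserved as "no id" */
static int bad_frees;                  /* counts frees of unallocated ids */

static int fake_id_alloc(void)
{
	for (int i = 1; i < FAKE_MAX_IDS; i++) {
		if (!id_allocated[i]) {
			id_allocated[i] = 1;
			return i;
		}
	}
	return 0;
}

static void fake_id_free(int id)
{
	if (id <= 0 || id >= FAKE_MAX_IDS)
		return; /* 0 means "nothing to free" */
	if (!id_allocated[id]) {
		bad_frees++; /* analogous to "ida_free called for id=..." */
		return;
	}
	id_allocated[id] = 0;
}

struct fake_root {
	int anon_dev;
};

/* Dropping the root also releases whatever id it still owns. */
static void fake_put_root(struct fake_root *root)
{
	fake_id_free(root->anon_dev);
	root->anon_dev = 0;
}

static int fake_get_root_ref(struct fake_root *root, int anon_dev, int fail)
{
	root->anon_dev = anon_dev;
	if (fail) {
		root->anon_dev = 0; /* hand ownership back to the caller */
		fake_put_root(root);
		return -EIO;
	}
	return 0;
}

static int fake_create_subvol(int fail)
{
	struct fake_root root = { 0 };
	int anon_dev = fake_id_alloc();
	int ret = fake_get_root_ref(&root, anon_dev, fail);

	if (ret) {
		fake_id_free(anon_dev); /* the only free on the error path */
		return ret;
	}
	fake_put_root(&root);
	return 0;
}
```

      Without the reset to 0 in fake_get_root_ref(), the error path would free
      the same id twice and bad_frees would become non-zero.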
      
      Fixes: 2dfb1e43 ("btrfs: preallocate anon block device at first phase of snapshot creation")
      CC: stable@vger.kernel.org # 5.10+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: fix memory leak in __add_inode_ref() · f35838a6
      Jianglei Nie authored
      Line 1169 (#3) allocates a memory chunk for victim_name by kmalloc(),
      but when the function returns in line 1184 (#4), victim_name allocated
      by line 1169 (#3) is not freed, which leads to a memory leak.
      A similar snippet earlier in this function allocates a memory chunk for
      victim_name in line 1104 (#1) and correctly releases it in line 1116 (#2).
      
      We should kfree() victim_name when the return value of backref_in_log()
      is less than zero and before the function returns in line 1184 (#4).
      
      1057 static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
      1058 				  struct btrfs_root *root,
      1059 				  struct btrfs_path *path,
      1060 				  struct btrfs_root *log_root,
      1061 				  struct btrfs_inode *dir,
      1062 				  struct btrfs_inode *inode,
      1063 				  u64 inode_objectid, u64 parent_objectid,
      1064 				  u64 ref_index, char *name, int namelen,
      1065 				  int *search_done)
      1066 {
      
      1104 	victim_name = kmalloc(victim_name_len, GFP_NOFS);
      	// #1: kmalloc (victim_name-1)
      1105 	if (!victim_name)
      1106 		return -ENOMEM;
      
      1112	ret = backref_in_log(log_root, &search_key,
      1113			parent_objectid, victim_name,
      1114			victim_name_len);
      1115	if (ret < 0) {
      1116		kfree(victim_name); // #2: kfree (victim_name-1)
      1117		return ret;
      1118	} else if (!ret) {
      
      1169 	victim_name = kmalloc(victim_name_len, GFP_NOFS);
      	// #3: kmalloc (victim_name-2)
      1170 	if (!victim_name)
      1171 		return -ENOMEM;
      
      1180 	ret = backref_in_log(log_root, &search_key,
      1181 			parent_objectid, victim_name,
      1182 			victim_name_len);
      1183 	if (ret < 0) {
      1184 		return ret; // #4: missing kfree (victim_name-2)
      1185 	} else if (!ret) {
      
      1241 	return 0;
      1242 }
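      
      A user-space sketch of the corrected error path at #4 is shown below. It
      uses malloc()/free() with a live-allocation counter in place of
      kmalloc()/kfree(), and fake_backref_in_log() is a hypothetical stand-in
      for backref_in_log(); it is meant only to illustrate the missing free.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

static int live_allocs; /* allocations not yet freed */

static void *counted_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_allocs++;
	return p;
}

static void counted_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* Hypothetical lookup: negative on error, positive when the name is found. */
static int fake_backref_in_log(const char *name, int fail)
{
	(void)name;
	return fail ? -EIO : 1;
}

static int check_victim(const char *src, size_t len, int fail)
{
	char *victim_name = counted_alloc(len);
	int ret;

	if (!victim_name)
		return -ENOMEM;
	memcpy(victim_name, src, len);

	ret = fake_backref_in_log(victim_name, fail);
	if (ret < 0) {
		counted_free(victim_name); /* the free that was missing at #4 */
		return ret;
	}

	counted_free(victim_name);
	return 0;
}
```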
      
      Fixes: d3316c82 ("btrfs: Properly handle backref_in_log retval")
      CC: stable@vger.kernel.org # 5.10+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Jianglei Nie <niejianglei2021@163.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • fget: clarify and improve __fget_files() implementation · e386dfc5
      Linus Torvalds authored
      Commit 054aa8d4 ("fget: check that the fd still exists after getting
      a ref to it") fixed a race with getting a reference to a file just as it
      was being closed.  It was a fairly minimal patch, and I didn't think
      re-checking the file pointer lookup would be a measurable overhead,
      since it was all right there and cached.
      
      But I was wrong, as pointed out by the kernel test robot.
      
      The 'poll2' case of the will-it-scale.per_thread_ops benchmark regressed
      quite noticeably.  Admittedly it seems to be a very artificial test:
      doing "poll()" system calls on regular files in a very tight loop in
      multiple threads.
      
      That means that basically all the time is spent just looking up file
      descriptors without ever doing anything useful with them (not that doing
      'poll()' on a regular file is useful to begin with).  And as a result it
      shows the extra "re-check fd" cost as a sore thumb.
      
      Happily, the regression is fixable by just writing the code to look up
      the fd to be better and clearer.  There's still a cost to verify the
      file pointer, but now it's basically in the noise even for that
      benchmark that does nothing else - and the code is more understandable
      and has better comments too.
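      
      The "take a reference, then re-check the fd" idea can be sketched with
      C11 atomics as below. This is not the kernel's __fget_files() code:
      fake_file, fake_fdt and the helper names are hypothetical, and the sketch
      omits the RCU protection that keeps the file memory valid while the
      kernel performs the lookup.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct fake_file {
	atomic_long count; /* reference count; 0 means being torn down */
};

#define FAKE_NR_FDS 4
static _Atomic(struct fake_file *) fake_fdt[FAKE_NR_FDS];

/* Take a reference only if the count is still non-zero. */
static int fake_get_unless_zero(struct fake_file *f)
{
	long c = atomic_load(&f->count);

	while (c != 0) {
		/* On failure, c is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&f->count, &c, c + 1))
			return 1;
	}
	return 0;
}

static void fake_put(struct fake_file *f)
{
	atomic_fetch_sub(&f->count, 1);
}

static struct fake_file *fake_fget(unsigned int fd)
{
	for (;;) {
		struct fake_file *f;

		if (fd >= FAKE_NR_FDS)
			return NULL;
		f = atomic_load(&fake_fdt[fd]);
		if (!f)
			return NULL;
		if (!fake_get_unless_zero(f))
			continue; /* file is going away, look the slot up again */
		/* Re-check: the slot may have been closed or reused meanwhile. */
		if (atomic_load(&fake_fdt[fd]) != f) {
			fake_put(f);
			continue;
		}
		return f;
	}
}
```

      The re-check costs one extra load of the slot, which is the overhead the
      commit message describes as now being in the noise.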
      
      [ Side note: this patch is also a classic case of one that looks very
        messy with the default greedy Myers diff - it's much more legible with
        either the patience or histogram diff algorithm ]
      
      Link: https://lore.kernel.org/lkml/20211210053743.GA36420@xsang-OptiPlex-9020/
      Link: https://lore.kernel.org/lkml/20211213083154.GA20853@linux.intel.com/
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Tested-by: Carel Si <beibei.si@intel.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • io-wq: drop wqe lock before creating new worker · d800c65c
      Jens Axboe authored
      We have two io-wq creation paths:
      
      - On queue enqueue
      - When a worker goes to sleep
      
      The latter invokes worker creation with the wqe->lock held, but that can
      run into problems if we end up exiting and need to cancel the queued work.
      syzbot caught this:
      
      ============================================
      WARNING: possible recursive locking detected
      5.16.0-rc4-syzkaller #0 Not tainted
      --------------------------------------------
      iou-wrk-6468/6471 is trying to acquire lock:
      ffff88801aa98018 (&wqe->lock){+.+.}-{2:2}, at: io_worker_cancel_cb+0xb7/0x210 fs/io-wq.c:187
      
      but task is already holding lock:
      ffff88801aa98018 (&wqe->lock){+.+.}-{2:2}, at: io_wq_worker_sleeping+0xb6/0x140 fs/io-wq.c:700
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&wqe->lock);
        lock(&wqe->lock);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      1 lock held by iou-wrk-6468/6471:
       #0: ffff88801aa98018 (&wqe->lock){+.+.}-{2:2}, at: io_wq_worker_sleeping+0xb6/0x140 fs/io-wq.c:700
      
      stack backtrace:
      CPU: 1 PID: 6471 Comm: iou-wrk-6468 Not tainted 5.16.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0x1dc/0x2d8 lib/dump_stack.c:106
       print_deadlock_bug kernel/locking/lockdep.c:2956 [inline]
       check_deadlock kernel/locking/lockdep.c:2999 [inline]
       validate_chain+0x5984/0x8240 kernel/locking/lockdep.c:3788
       __lock_acquire+0x1382/0x2b00 kernel/locking/lockdep.c:5027
       lock_acquire+0x19f/0x4d0 kernel/locking/lockdep.c:5637
       __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
       _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:154
       io_worker_cancel_cb+0xb7/0x210 fs/io-wq.c:187
       io_wq_cancel_tw_create fs/io-wq.c:1220 [inline]
       io_queue_worker_create+0x3cf/0x4c0 fs/io-wq.c:372
       io_wq_worker_sleeping+0xbe/0x140 fs/io-wq.c:701
       sched_submit_work kernel/sched/core.c:6295 [inline]
       schedule+0x67/0x1f0 kernel/sched/core.c:6323
       schedule_timeout+0xac/0x300 kernel/time/timer.c:1857
       wait_woken+0xca/0x1b0 kernel/sched/wait.c:460
       unix_msg_wait_data net/unix/unix_bpf.c:32 [inline]
       unix_bpf_recvmsg+0x7f9/0xe20 net/unix/unix_bpf.c:77
       unix_stream_recvmsg+0x214/0x2c0 net/unix/af_unix.c:2832
       sock_recvmsg_nosec net/socket.c:944 [inline]
       sock_recvmsg net/socket.c:962 [inline]
       sock_read_iter+0x3a7/0x4d0 net/socket.c:1035
       call_read_iter include/linux/fs.h:2156 [inline]
       io_iter_do_read fs/io_uring.c:3501 [inline]
       io_read fs/io_uring.c:3558 [inline]
       io_issue_sqe+0x144c/0x9590 fs/io_uring.c:6671
       io_wq_submit_work+0x2d8/0x790 fs/io_uring.c:6836
       io_worker_handle_work+0x808/0xdd0 fs/io-wq.c:574
       io_wqe_worker+0x395/0x870 fs/io-wq.c:630
       ret_from_fork+0x1f/0x30
      
      We can safely drop the lock before doing work creation, making the two
      contexts the same in that regard.
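      
      The locking change can be sketched as below, using a simple non-recursive
      fake lock that counts recursive acquisition attempts. fake_wqe and the
      fake_* helpers are hypothetical names for illustration; the real io-wq
      code uses a spinlock and different helpers.

```c
#include <assert.h>

struct fake_wqe {
	int lock_held;          /* 1 while the fake lock is taken */
	int recursive_attempts; /* lock taken while already held */
	int nr_running;
	int created;
};

static void fake_lock(struct fake_wqe *wqe)
{
	if (wqe->lock_held)
		wqe->recursive_attempts++; /* would deadlock on a real spinlock */
	wqe->lock_held = 1;
}

static void fake_unlock(struct fake_wqe *wqe)
{
	wqe->lock_held = 0;
}

/* Worker creation may need the lock itself, e.g. to cancel queued work. */
static void fake_queue_worker_create(struct fake_wqe *wqe)
{
	fake_lock(wqe);
	wqe->created++;
	fake_unlock(wqe);
}

/* Sleeping path: decide under the lock, create only after dropping it. */
static void fake_worker_sleeping(struct fake_wqe *wqe)
{
	int need_create;

	fake_lock(wqe);
	wqe->nr_running--;
	need_create = (wqe->nr_running == 0);
	fake_unlock(wqe);

	if (need_create)
		fake_queue_worker_create(wqe);
}
```

      If fake_queue_worker_create() were called before fake_unlock(),
      recursive_attempts would be incremented, which corresponds to the
      recursive locking that lockdep reported.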
      
      Reported-by: syzbot+b18b8be69df33a3918e9@syzkaller.appspotmail.com
      Fixes: 71a85387 ("io-wq: check for wq exit after adding new worker task_work")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. 11 Dec, 2021 4 commits
    • io-wq: check for wq exit after adding new worker task_work · 71a85387
      Jens Axboe authored
      We check IO_WQ_BIT_EXIT before attempting to create a new worker, and
      wq exit cancels pending work if we have any. But it's possible to have
      a race between the two, where creation checks exit finding it not set,
      but we're in the process of exiting. The exit side will cancel pending
      creation task_work, but there's a gap where we add task_work after we've
      canceled existing creations at exit time.
      
      Fix this by checking the EXIT bit post adding the creation task_work.
      If it's set, run the same cancelation that exit does.
      
      Reported-and-tested-by: syzbot+b60c982cb0efc5e05a47@syzkaller.appspotmail.com
      Reviewed-by: Hao Xu <haoxu@linux.alibaba.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      71a85387
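The check-after-add pattern described in the commit above can be sketched in userspace C. This is only an illustration under stated assumptions: `struct wq`, `wq_init`, `wq_exit`, `queue_create` and the `race_hook` parameter are hypothetical names, and the counter stands in for queued creation task_work; it is not the io-wq code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work queue state; not the kernel's struct io_wq. */
struct wq {
    atomic_bool exiting;   /* stands in for IO_WQ_BIT_EXIT */
    atomic_int pending;    /* queued worker-creation requests */
};

static void wq_init(struct wq *wq)
{
    atomic_init(&wq->exiting, false);
    atomic_init(&wq->pending, 0);
}

/* Drop every queued creation request. */
static void cancel_pending(struct wq *wq)
{
    atomic_store(&wq->pending, 0);
}

/* Exit side: set the exit flag, then cancel what is queued so far. */
static void wq_exit(struct wq *wq)
{
    atomic_store(&wq->exiting, true);
    cancel_pending(wq);
}

/*
 * Creation side. race_hook (may be NULL) runs between the first exit
 * check and the enqueue; it exists only so the race window can be
 * exercised deterministically in a single thread.
 */
static bool queue_create(struct wq *wq, void (*race_hook)(struct wq *))
{
    if (atomic_load(&wq->exiting))
        return false;

    if (race_hook)
        race_hook(wq);

    atomic_fetch_add(&wq->pending, 1);

    /* Re-check after adding: exit may have run in the gap above. */
    if (atomic_load(&wq->exiting)) {
        cancel_pending(wq);
        return false;
    }
    return true;
}
```

Calling `queue_create(&wq, wq_exit)` simulates exit racing in after the first check; the second check notices it and runs the same cancellation that exit performs, so no request is left queued.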
    • J
      io_uring: ensure task_work gets run as part of cancelations · 78a78060
      Jens Axboe committed
      If we successfully cancel a work item but that work item needs to be
      processed through task_work, then we can be sleeping uninterruptibly
      in io_uring_cancel_generic() and never process it. Hence we don't
      make forward progress and we end up with an uninterruptible sleep
      warning.
      
      While in there, correct a comment that should be IFF, not IIF.
      
      Reported-and-tested-by: syzbot+21e6887c0be14181206d@syzkaller.appspotmail.com
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      78a78060
    • J
      nfsd: fix use-after-free due to delegation race · 548ec080
      J. Bruce Fields committed
      A delegation break could arrive as soon as we've called vfs_setlease.  A
      delegation break runs a callback which immediately (in
      nfsd4_cb_recall_prepare) adds the delegation to del_recall_lru.  If we
      then exit nfs4_set_delegation without hashing the delegation, it will be
      freed as soon as the callback is done with it, without ever being
      removed from del_recall_lru.
      
      Symptoms show up later as use-after-free or list corruption warnings,
      usually in the laundromat thread.
      
      I suspect aba2072f "nfsd: grant read delegations to clients holding
      writes" made this bug easier to hit, but I looked as far back as v3.0
      and it looks to me it already had the same problem.  So I'm not sure
      where the bug was introduced; it may have been there from the beginning.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      548ec080
    • A
      nfsd: Fix nsfd startup race (again) · b10252c7
      Alexander Sverdlin committed
      Commit bd5ae928 ("nfsd: register pernet ops last, unregister first")
      has re-opened rpc_pipefs_event() race against nfsd_net_id registration
      (register_pernet_subsys()) which has been fixed by commit bb7ffbf2
      ("nfsd: fix nsfd startup race triggering BUG_ON").
      
      Restore the order of register_pernet_subsys() vs register_cld_notifier().
      Add WARN_ON() to prevent a future regression.
      
      Crash info:
      Unable to handle kernel NULL pointer dereference at virtual address 0000000000000012
      CPU: 8 PID: 345 Comm: mount Not tainted 5.4.144-... #1
      pc : rpc_pipefs_event+0x54/0x120 [nfsd]
      lr : rpc_pipefs_event+0x48/0x120 [nfsd]
      Call trace:
       rpc_pipefs_event+0x54/0x120 [nfsd]
       blocking_notifier_call_chain
       rpc_fill_super
       get_tree_keyed
       rpc_fs_get_tree
       vfs_get_tree
       do_mount
       ksys_mount
       __arm64_sys_mount
       el0_svc_handler
       el0_svc
      
      Fixes: bd5ae928 ("nfsd: register pernet ops last, unregister first")
      Cc: stable@vger.kernel.org
      Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      b10252c7
  5. 10 Dec, 2021 4 commits
    • X
      aio: Fix incorrect usage of eventfd_signal_allowed() · 4b374986
      Xie Yongji committed
      We should defer eventfd_signal() to the workqueue when
      eventfd_signal_allowed() returns false, rather than when it
      returns true.
      
      Fixes: b542e383 ("eventfd: Make signal recursion protection a task bit")
      Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
      Link: https://lore.kernel.org/r/20210913111928.98-1-xieyongji@bytedance.com
      Reviewed-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      4b374986
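The corrected condition from the commit above can be illustrated with a small userspace sketch. All names here (`signal_allowed`, `complete_event`, `enum signal_path`) are hypothetical, and the helper assumes that signalling is disallowed exactly when the caller is already inside an eventfd wakeup; it is not the aio code itself.

```c
#include <stdbool.h>

/* Hypothetical outcome of completing an event; not a kernel type. */
enum signal_path { SIGNAL_INLINE, SIGNAL_DEFERRED };

/*
 * Stand-in for eventfd_signal_allowed(): assumed to be true only when
 * the caller is not already inside an eventfd wakeup.
 */
static bool signal_allowed(bool in_eventfd_wakeup)
{
    return !in_eventfd_wakeup;
}

/*
 * Defer to a worker when inline signalling is NOT allowed.
 * The bug described above had this condition inverted.
 */
static enum signal_path complete_event(bool in_eventfd_wakeup)
{
    if (!signal_allowed(in_eventfd_wakeup))
        return SIGNAL_DEFERRED;   /* would be queued to a workqueue */
    return SIGNAL_INLINE;         /* safe to signal directly */
}
```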
    • E
      aio: fix use-after-free due to missing POLLFREE handling · 50252e4b
      Eric Biggers committed
      signalfd_poll() and binder_poll() are special in that they use a
      waitqueue whose lifetime is the current task, rather than the struct
      file as is normally the case.  This is okay for blocking polls, since a
      blocking poll occurs within one task; however, non-blocking polls
      require another solution.  This solution is for the queue to be cleared
      before it is freed, by sending a POLLFREE notification to all waiters.
      
      Unfortunately, only eventpoll handles POLLFREE.  A second type of
      non-blocking poll, aio poll, was added in kernel v4.18, and it doesn't
      handle POLLFREE.  This allows a use-after-free to occur if a signalfd or
      binder fd is polled with aio poll, and the waitqueue gets freed.
      
      Fix this by making aio poll handle POLLFREE.
      
      A patch by Ramji Jiyani <ramjiyani@google.com>
      (https://lore.kernel.org/r/20211027011834.2497484-1-ramjiyani@google.com)
      tried to do this by making aio_poll_wake() always complete the request
      inline if POLLFREE is seen.  However, that solution had two bugs.
      First, it introduced a deadlock, as it unconditionally locked the aio
      context while holding the waitqueue lock, which inverts the normal
      locking order.  Second, it didn't consider that POLLFREE notifications
      are missed while the request has been temporarily de-queued.
      
      The second problem was solved by my previous patch.  This patch then
      properly fixes the use-after-free by handling POLLFREE in a
      deadlock-free way.  It does this by taking advantage of the fact that
      freeing of the waitqueue is RCU-delayed, similar to what eventpoll does.
      
      Fixes: 2c14fa83 ("aio: implement IOCB_CMD_POLL")
      Cc: <stable@vger.kernel.org> # v4.18+
      Link: https://lore.kernel.org/r/20211209010455.42744-6-ebiggers@kernel.org
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      50252e4b
    • E
      aio: keep poll requests on waitqueue until completed · 363bee27
      Eric Biggers committed
      Currently, aio_poll_wake() will always remove the poll request from the
      waitqueue.  Then, if aio_poll_complete_work() sees that none of the
      polled events are ready and the request isn't cancelled, it re-adds the
      request to the waitqueue.  (This can easily happen when polling a file
      that doesn't pass an event mask when waking up its waitqueue.)
      
      This is fundamentally broken for two reasons:
      
        1. If a wakeup occurs between vfs_poll() and the request being
           re-added to the waitqueue, it will be missed because the request
           wasn't on the waitqueue at the time.  Therefore, IOCB_CMD_POLL
           might never complete even if the polled file is ready.
      
        2. When the request isn't on the waitqueue, there is no way to be
           notified that the waitqueue is being freed (which happens when its
           lifetime is shorter than the struct file's).  This is supposed to
           happen via the waitqueue entries being woken up with POLLFREE.
      
      Therefore, leave the requests on the waitqueue until they are actually
      completed (or cancelled).  To keep track of when aio_poll_complete_work
      needs to be scheduled, use new fields in struct poll_iocb.  Remove the
      'done' field which is now redundant.
      
      Note that this is consistent with how sys_poll() and eventpoll work;
      their wakeup functions do *not* remove the waitqueue entries.
      
      Fixes: 2c14fa83 ("aio: implement IOCB_CMD_POLL")
      Cc: <stable@vger.kernel.org> # v4.18+
      Link: https://lore.kernel.org/r/20211209010455.42744-5-ebiggers@kernel.org
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      363bee27
    • E
      signalfd: use wake_up_pollfree() · 9537bae0
      Eric Biggers committed
      wake_up_poll() uses nr_exclusive=1, so it's not guaranteed to wake up
      all exclusive waiters.  Yet, POLLFREE *must* wake up all waiters.  epoll
      and aio poll are fortunately not affected by this, but it's very
      fragile.  Thus, the new function wake_up_pollfree() has been introduced.
      
      Convert signalfd to use wake_up_pollfree().
      
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Fixes: d80e731e ("epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree()")
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20211209010455.42744-4-ebiggers@kernel.org
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      9537bae0
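Why nr_exclusive=1 cannot guarantee that every waiter is woken can be seen in a simplified model of the wakeup walk. The sketch below is an assumption-laden illustration: `struct waiter` and `wake_up_n` are hypothetical, and treating a limit of 0 as "no limit" is the convention chosen for this sketch, not a statement about the kernel's wait queue code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical waiter record; the kernel's wait queue entry is richer. */
struct waiter {
    bool exclusive;
    bool woken;
};

/*
 * Wake waiters in queue order. Every visited waiter is woken; the walk
 * stops once nr_exclusive exclusive waiters have been woken.
 * nr_exclusive == 0 means "no limit", so every waiter is woken.
 * Returns the number of waiters woken.
 */
static size_t wake_up_n(struct waiter *q, size_t len, int nr_exclusive)
{
    size_t woken = 0;

    for (size_t i = 0; i < len; i++) {
        q[i].woken = true;
        woken++;
        if (q[i].exclusive && nr_exclusive > 0 && --nr_exclusive == 0)
            break;
    }
    return woken;
}
```

With two exclusive waiters at the head of the queue, a limit of 1 stops after the first one, which is the fragility the commit message points out for POLLFREE; a limit of 0 in this sketch reaches all of them.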
  6. 09 Dec, 2021 1 commit
  7. 08 Dec, 2021 12 commits
    • Q
      btrfs: replace the BUG_ON in btrfs_del_root_ref with proper error handling · 8289ed9f
      Qu Wenruo committed
      I hit the BUG_ON() with the generic/475 test case, and to my surprise, all
      callers of btrfs_del_root_ref() are already aborting the transaction, thus
      there is no need for such a BUG_ON(); just go to the @out label and the
      caller will properly handle the error.
      
      CC: stable@vger.kernel.org # 5.4+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      8289ed9f
    • J
      btrfs: zoned: clear data relocation bg on zone finish · 5911f538
      Johannes Thumshirn committed
      When finishing a zone that is used by a dedicated data relocation
      block group, also remove its reference from fs_info, so we're not trying
      to use a full block group for allocations during data relocation, which
      will always fail.
      
      The result is we're not making any forward progress and end up in a
      deadlock situation.
      
      Fixes: c2707a25 ("btrfs: zoned: add a dedicated data relocation block group")
      Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
      Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      5911f538
    • J
      btrfs: free exchange changeset on failures · da5e817d
      Johannes Thumshirn committed
      Fstests runs on my VMs have shown several kmemleak reports like the following.
      
        unreferenced object 0xffff88811ae59080 (size 64):
          comm "xfs_io", pid 12124, jiffies 4294987392 (age 6.368s)
          hex dump (first 32 bytes):
            00 c0 1c 00 00 00 00 00 ff cf 1c 00 00 00 00 00  ................
            90 97 e5 1a 81 88 ff ff 90 97 e5 1a 81 88 ff ff  ................
          backtrace:
            [<00000000ac0176d2>] ulist_add_merge+0x60/0x150 [btrfs]
            [<0000000076e9f312>] set_state_bits+0x86/0xc0 [btrfs]
            [<0000000014fe73d6>] set_extent_bit+0x270/0x690 [btrfs]
            [<000000004f675208>] set_record_extent_bits+0x19/0x20 [btrfs]
            [<00000000b96137b1>] qgroup_reserve_data+0x274/0x310 [btrfs]
            [<0000000057e9dcbb>] btrfs_check_data_free_space+0x5c/0xa0 [btrfs]
            [<0000000019c4511d>] btrfs_delalloc_reserve_space+0x1b/0xa0 [btrfs]
            [<000000006d37e007>] btrfs_dio_iomap_begin+0x415/0x970 [btrfs]
            [<00000000fb8a74b8>] iomap_iter+0x161/0x1e0
            [<0000000071dff6ff>] __iomap_dio_rw+0x1df/0x700
            [<000000002567ba53>] iomap_dio_rw+0x5/0x20
            [<0000000072e555f8>] btrfs_file_write_iter+0x290/0x530 [btrfs]
            [<000000005eb3d845>] new_sync_write+0x106/0x180
            [<000000003fb505bf>] vfs_write+0x24d/0x2f0
            [<000000009bb57d37>] __x64_sys_pwrite64+0x69/0xa0
            [<000000003eba3fdf>] do_syscall_64+0x43/0x90
      
      In case btrfs_qgroup_reserve_data() or btrfs_delalloc_reserve_metadata()
      fail, the allocated extent_changeset will not be freed.
      
      So in btrfs_check_data_free_space() and btrfs_delalloc_reserve_space()
      free the allocated extent_changeset to get rid of the allocated memory.
      
      The issue currently only happens in the direct IO write path, but only
      after 65b3c08606e5 ("btrfs: fix ENOSPC failure when attempting direct IO
      write into NOCOW range"), and also at defrag_one_locked_target(). Every
      other place is always calling extent_changeset_free() even if its call
      to btrfs_delalloc_reserve_space() or btrfs_check_data_free_space() has
      failed.
      
      CC: stable@vger.kernel.org # 5.15+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      da5e817d
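The shape of the fix above, freeing the changeset inside the reserving helper when it fails, can be sketched in userspace C. The names below (`struct changeset`, `reserve_data`, `check_data_free_space`, the `simulate_failure` flag) are hypothetical stand-ins chosen for illustration and do not reproduce the btrfs functions.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct extent_changeset. */
struct changeset {
    unsigned long bytes_changed;
};

/* Hypothetical reservation step that may fail after allocating *out. */
static int reserve_data(struct changeset **out, int simulate_failure)
{
    if (!*out) {
        *out = calloc(1, sizeof(**out));
        if (!*out)
            return -ENOMEM;
    }
    return simulate_failure ? -ENOSPC : 0;
}

/*
 * Wrapper in the spirit of the fix: if the reservation fails, free the
 * changeset here and reset the caller's pointer, so callers that bail
 * out early on error no longer leak it.
 */
static int check_data_free_space(struct changeset **reserved,
                                 int simulate_failure)
{
    int ret = reserve_data(reserved, simulate_failure);

    if (ret < 0) {
        free(*reserved);
        *reserved = NULL;
    }
    return ret;
}
```

Resetting the pointer to NULL keeps callers that still free the changeset on their own error path safe, since freeing NULL is a no-op.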
    • N
      btrfs: fix re-dirty process of tree-log nodes · 84c25448
      Naohiro Aota committed
      There is a report of a transaction abort of -EAGAIN with the following
      script.
      
        #!/bin/sh
      
        for d in sda sdb; do
                mkfs.btrfs -d single -m single -f /dev/${d}
        done
      
        mount /dev/sda /mnt/test
        mount /dev/sdb /mnt/scratch
      
        for dir in test scratch; do
                echo 3 >/proc/sys/vm/drop_caches
                fio --directory=/mnt/${dir} --name=fio.${dir} --rw=read --size=50G --bs=64m \
                        --numjobs=$(nproc) --time_based --ramp_time=5 --runtime=480 \
                        --group_reporting |& tee /dev/shm/fio.${dir}
                echo 3 >/proc/sys/vm/drop_caches
        done
      
        for d in sda sdb; do
                umount /dev/${d}
        done
      
      The stack trace is shown below.
      
        [3310.967991] BTRFS: error (device sda) in btrfs_commit_transaction:2341: errno=-11 unknown (Error while writing out transaction)
        [3310.968060] BTRFS info (device sda): forced readonly
        [3310.968064] BTRFS warning (device sda): Skipping commit of aborted transaction.
        [3310.968065] ------------[ cut here ]------------
        [3310.968066] BTRFS: Transaction aborted (error -11)
        [3310.968074] WARNING: CPU: 14 PID: 1684 at fs/btrfs/transaction.c:1946 btrfs_commit_transaction.cold+0x209/0x2c8
        [3310.968131] CPU: 14 PID: 1684 Comm: fio Not tainted 5.14.10-300.fc35.x86_64 #1
        [3310.968135] Hardware name: DIAWAY Tartu/Tartu, BIOS V2.01.B10 04/08/2021
        [3310.968137] RIP: 0010:btrfs_commit_transaction.cold+0x209/0x2c8
        [3310.968144] RSP: 0018:ffffb284ce393e10 EFLAGS: 00010282
        [3310.968147] RAX: 0000000000000026 RBX: ffff973f147b0f60 RCX: 0000000000000027
        [3310.968149] RDX: ffff974ecf098a08 RSI: 0000000000000001 RDI: ffff974ecf098a00
        [3310.968150] RBP: ffff973f147b0f08 R08: 0000000000000000 R09: ffffb284ce393c48
        [3310.968151] R10: ffffb284ce393c40 R11: ffffffff84f47468 R12: ffff973f101bfc00
        [3310.968153] R13: ffff971f20cf2000 R14: 00000000fffffff5 R15: ffff973f147b0e58
        [3310.968154] FS:  00007efe65468740(0000) GS:ffff974ecf080000(0000) knlGS:0000000000000000
        [3310.968157] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [3310.968158] CR2: 000055691bcbe260 CR3: 000000105cfa4001 CR4: 0000000000770ee0
        [3310.968160] PKRU: 55555554
        [3310.968161] Call Trace:
        [3310.968167]  ? dput+0xd4/0x300
        [3310.968174]  btrfs_sync_file+0x3f1/0x490
        [3310.968180]  __x64_sys_fsync+0x33/0x60
        [3310.968185]  do_syscall_64+0x3b/0x90
        [3310.968190]  entry_SYSCALL_64_after_hwframe+0x44/0xae
        [3310.968194] RIP: 0033:0x7efe6557329b
        [3310.968200] RSP: 002b:00007ffe0236ebc0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
        [3310.968203] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007efe6557329b
        [3310.968204] RDX: 0000000000000000 RSI: 00007efe58d77010 RDI: 0000000000000006
        [3310.968205] RBP: 0000000004000000 R08: 0000000000000000 R09: 00007efe58d77010
        [3310.968207] R10: 0000000016cacc0c R11: 0000000000000293 R12: 00007efe5ce95980
        [3310.968208] R13: 0000000000000000 R14: 00007efe6447c790 R15: 0000000c80000000
        [3310.968212] ---[ end trace 1a346f4d3c0d96ba ]---
        [3310.968214] BTRFS: error (device sda) in cleanup_transaction:1946: errno=-11 unknown
      
      The abort occurs because of a write hole while writing out freed tree
      nodes of a tree-log tree. For zoned btrfs, we re-dirty a freed tree
      node to ensure btrfs can write the region and does not leave a hole on
      write on a zoned device. The current code fails to re-dirty a node
      when the tree-log tree's depth is greater than or equal to 2. That leads
      to a transaction abort with -EAGAIN.
      
      Fix the issue by properly re-dirtying a node on walking up the tree.
      
      Fixes: d3575156 ("btrfs: zoned: redirty released extent buffers")
      CC: stable@vger.kernel.org # 5.12+
      Link: https://github.com/kdave/btrfs-progs/issues/415
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      84c25448
    • J
      btrfs: call mapping_set_error() on btree inode with a write error · 68b85589
      Josef Bacik committed
      generic/484 fails sometimes with compression on because the write ends
      up small enough that it goes into the btree.  This means that we never
      call mapping_set_error() on the inode itself, because the page gets
      marked as fine when we inline it into the metadata.  When the metadata
      writeback happens we see it and abort the transaction properly and mark
      the fs as readonly, however we don't do the mapping_set_error() on
      anything.  In syncfs() we will simply return 0 if the sb is marked
      read-only, so we can't check for this in our syncfs callback.  The only
      way the error gets returned is if we called mapping_set_error() on
      something.  Fix this by calling mapping_set_error() on the btree inode
      mapping.  This allows us to properly return an error on syncfs and pass
      generic/484 with compression on.
      
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      68b85589
    • J
      btrfs: clear extent buffer uptodate when we fail to write it · c2e39305
      Josef Bacik committed
      I got dmesg errors on generic/281 on our overnight fstests.  Looking at
      the history this happens occasionally, with errors like this
      
        WARNING: CPU: 0 PID: 673217 at fs/btrfs/extent_io.c:6848 assert_eb_page_uptodate+0x3f/0x50
        CPU: 0 PID: 673217 Comm: kworker/u4:13 Tainted: G        W         5.16.0-rc2+ #469
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
        Workqueue: btrfs-cache btrfs_work_helper
        RIP: 0010:assert_eb_page_uptodate+0x3f/0x50
        RSP: 0018:ffffae598230bc60 EFLAGS: 00010246
        RAX: 0017ffffc0002112 RBX: ffffebaec4100900 RCX: 0000000000001000
        RDX: ffffebaec45733c7 RSI: ffffebaec4100900 RDI: ffff9fd98919f340
        RBP: 0000000000000d56 R08: ffff9fd98e300000 R09: 0000000000000000
        R10: 0001207370a91c50 R11: 0000000000000000 R12: 00000000000007b0
        R13: ffff9fd98919f340 R14: 0000000001500000 R15: 0000000001cb0000
        FS:  0000000000000000(0000) GS:ffff9fd9fbc00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f549fcf8940 CR3: 0000000114908004 CR4: 0000000000370ef0
        Call Trace:
      
         extent_buffer_test_bit+0x3f/0x70
         free_space_test_bit+0xa6/0xc0
         load_free_space_tree+0x1d6/0x430
         caching_thread+0x454/0x630
         ? rcu_read_lock_sched_held+0x12/0x60
         ? rcu_read_lock_sched_held+0x12/0x60
         ? rcu_read_lock_sched_held+0x12/0x60
         ? lock_release+0x1f0/0x2d0
         btrfs_work_helper+0xf2/0x3e0
         ? lock_release+0x1f0/0x2d0
         ? finish_task_switch.isra.0+0xf9/0x3a0
         process_one_work+0x270/0x5a0
         worker_thread+0x55/0x3c0
         ? process_one_work+0x5a0/0x5a0
         kthread+0x174/0x1a0
         ? set_kthread_struct+0x40/0x40
         ret_from_fork+0x1f/0x30
      
      This happens because we're trying to read from an extent buffer page that
      is !PageUptodate.  This happens because we will clear the page uptodate
      when we have an IO error, but we don't clear the extent buffer uptodate.
      If we do a read later and find this extent buffer we'll think it's valid
      and not return an error, and then trip over this warning.
      
      Fix this by also clearing uptodate on the extent buffer when this
      happens, so that we get an error when we do a btrfs_search_slot() and
      find this block later.
      
      CC: stable@vger.kernel.org # 5.4+
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      c2e39305
    • J
      btrfs: fail if fstrim_range->start == U64_MAX · f981fec1
      Josef Bacik committed
      We've always been failing generic/260 because it's testing things we
      actually don't care about and thus won't fail for.  However we probably
      should fail for fstrim_range->start == U64_MAX since we clearly can't
      trim anything past that.  This in combination with an update to
      generic/260 will allow us to pass this test properly.
      
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      f981fec1
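The early rejection described in the commit above amounts to a single range check. The sketch below is a userspace illustration only: `struct trim_range` and `validate_trim_range` are hypothetical names, and returning -EINVAL is an assumption, since the commit message only says that the request should fail.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical mirror of the start/len/minlen fields of struct fstrim_range. */
struct trim_range {
    uint64_t start;
    uint64_t len;
    uint64_t minlen;
};

/*
 * Reject a start offset of UINT64_MAX up front: nothing can lie past
 * it, so there is nothing to trim. The -EINVAL return value is an
 * assumption of this sketch.
 */
static int validate_trim_range(const struct trim_range *range)
{
    if (range->start == UINT64_MAX)
        return -EINVAL;
    return 0;
}
```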
    • D
      btrfs: fix error pointer dereference in btrfs_ioctl_rm_dev_v2() · d815b3f2
      Dan Carpenter committed
      If memdup_user() fails the error handling will crash when it tries
      to kfree() an error pointer.  Just return directly because there is
      no cleanup required.
      
      Fixes: 1a15eb72 ("btrfs: use btrfs_get_dev_args_from_path in dev removal ioctls")
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      d815b3f2
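The error-pointer pitfall in the commit above can be reproduced in miniature in userspace. The helpers below (`err_ptr`, `ptr_err`, `is_err`, `dup_args`, `remove_device`) are hypothetical imitations written for this sketch; they mimic the convention of encoding a negative errno in a pointer value but are not the kernel's implementation.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's error-pointer helpers. */
static void *err_ptr(long error) { return (void *)(intptr_t)error; }
static long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
static int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical memdup_user() stand-in: returns a copy or an error pointer. */
static void *dup_args(const void *src, size_t len)
{
    void *copy;

    if (!src)
        return err_ptr(-EFAULT);
    copy = malloc(len);
    if (!copy)
        return err_ptr(-ENOMEM);
    memcpy(copy, src, len);
    return copy;
}

/*
 * Return directly when duplication fails. Jumping to a shared cleanup
 * label that frees `args` would pass an error pointer to free().
 */
static long remove_device(const void *user_args, size_t len)
{
    void *args = dup_args(user_args, len);

    if (is_err(args))
        return ptr_err(args);

    /* ... the real work would use args here ... */
    free(args);
    return 0;
}
```

An error pointer is not NULL, so a cleanup path that relies on "freeing NULL is harmless" does not protect against it; the early return is the simplest way to keep it away from the allocator.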
    • S
      tracefs: Set all files to the same group ownership as the mount option · 48b27b6b
      Steven Rostedt (VMware) committed
      As people have been asking to allow non-root processes to have access to
      the tracefs directory, it was considered best to only allow groups to have
      access to the directory, where it is easier to just set the tracefs file
      system to a specific group (as other would be too dangerous), and that way
      the admins could pick which processes would have access to tracefs.
      
      Unfortunately, this broke tooling on Android that expected the other bit
      to be set. For some special cases, for non-root tools to trace the system,
      tracefs would be mounted and the permissions of the top level directory
      changed, which gave all running tasks access to the tracing
      directory. Even though this would be dangerous to do in a
      production environment, for testing environments this can be useful.
      
      Now with the new changes to not allow other (which is still the proper
      thing to do), it breaks the testing tooling. Now more code needs to be
      loaded on the system to change ownership of the tracing directory.
      
      The real solution is to have tracefs honor the gid=xxx option when
      mounting. That is,
      
      (tracing group tracing has value 1003)
      
       mount -t tracefs -o gid=1003 tracefs /sys/kernel/tracing
      
      should have it that all files in the tracing directory should be of the
      given group.
      
      Copy the logic from d_walk() from dcache.c and simplify it for the mount
      case of tracefs if gid is set. All the files in tracefs will be walked and
      their group will be set to the value passed in.
      
      Link: https://lkml.kernel.org/r/20211207171729.2a54e1b3@gandalf.local.home
      
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reported-by: Kalesh Singh <kaleshsingh@google.com>
      Reported-by: Yabin Cui <yabinc@google.com>
      Fixes: 49d67e44 ("tracefs: Have tracefs directories not set OTH permission bits by default")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      48b27b6b
    • S
      tracefs: Have new files inherit the ownership of their parent · ee7f3666
      Steven Rostedt (VMware) committed
      If directories in tracefs have their ownership changed, then any new files
      and directories that are created under those directories should inherit
      the ownership of the directory they are created in.
      
      Link: https://lkml.kernel.org/r/20211208075720.4855d180@gandalf.local.home
      
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Yabin Cui <yabinc@google.com>
      Cc: Christian Brauner <christian.brauner@ubuntu.com>
      Cc: stable@vger.kernel.org
      Fixes: 4282d606 ("tracefs: Add new tracefs file system")
      Reported-by: Kalesh Singh <kaleshsingh@google.com>
      Reported: https://lore.kernel.org/all/CAC_TJve8MMAv+H_NdLSJXZUSoxOEq2zB_pVaJ9p=7H6Bu3X76g@mail.gmail.com/
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      ee7f3666
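The inheritance rule from the commit above, where a new entry takes the owner and group of the directory it is created in, can be modelled with a tiny tree. `struct node` and `node_init` are hypothetical names for this sketch and greatly simplify what an inode carries.

```c
#include <stddef.h>

/* Hypothetical, simplified inode: only the ownership fields matter here. */
struct node {
    unsigned int uid;
    unsigned int gid;
    const struct node *parent;
};

/*
 * Initialise a new entry. With a parent, the entry copies the parent's
 * current owner and group; without one (the root), it falls back to the
 * defaults passed in by the caller.
 */
static void node_init(struct node *n, const struct node *parent,
                      unsigned int default_uid, unsigned int default_gid)
{
    n->parent = parent;
    n->uid = parent ? parent->uid : default_uid;
    n->gid = parent ? parent->gid : default_gid;
}
```

If a directory's group is changed to 1003 after creation, any entry initialised under it afterwards picks up 1003 rather than the filesystem-wide default.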
    • V
      cifs: Fix crash on unload of cifs_arc4.ko · 51a08bde
      Vincent Whitchurch committed
      The exit function is wrongly placed in the __init section and this leads
      to a crash when the module is unloaded.  Just remove both the init and
      exit functions since this module does not need them.
      
      Fixes: 71c02863 ("cifs: fork arc4 and create a separate module...")
      Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
      Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
      Cc: stable@vger.kernel.org # 5.15
      Signed-off-by: Steve French <stfrench@microsoft.com>
      51a08bde
    • D
      xfs: remove all COW fork extents when remounting readonly · 089558bc
      Darrick J. Wong committed
      As part of multiple customer escalations due to file data corruption
      after copy on write operations, I wrote some fstests that use fsstress
      to hammer on COW to shake things loose.  Regrettably, I caught some
      filesystem shutdowns due to incorrect rmap operations with the following
      loop:
      
      mount <filesystem>				# (0)
      fsstress <run only readonly ops> &		# (1)
      while true; do
      	fsstress <run all ops>
      	mount -o remount,ro			# (2)
      	fsstress <run only readonly ops>
      	mount -o remount,rw			# (3)
      done
      
      When (2) happens, notice that (1) is still running.  xfs_remount_ro will
      call xfs_blockgc_stop to walk the inode cache to free all the COW
      extents, but the blockgc mechanism races with (1)'s reader threads to
      take IOLOCKs and loses, which means that it doesn't clean them all out.
      Call such a file (A).
      
      When (3) happens, xfs_remount_rw calls xfs_reflink_recover_cow, which
      walks the ondisk refcount btree and frees any COW extent that it finds.
      This function does not check the inode cache, which means that incore
      COW forks of inode (A) are now inconsistent with the ondisk metadata.  If
      one of those former COW extents is allocated and mapped into another
      file (B) and someone triggers a COW to the stale reservation in (A), A's
      dirty data will be written into (B) and once that's done, those blocks
      will be transferred to (A)'s data fork without bumping the refcount.
      
      The results are catastrophic -- file (B) and the refcount btree are now
      corrupt.  Solve this race by forcing the xfs_blockgc_free_space to run
      synchronously, which causes xfs_icwalk to return to inodes that were
      skipped because the blockgc code couldn't take the IOLOCK.  This is safe
      to do here because the VFS has already prohibited new writer threads.
      
      Fixes: 10ddf64e ("xfs: remove leftover CoW reservations when remounting ro")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
      089558bc
  8. 07 Dec, 2021 3 commits
    • J
      netfs: fix parameter of cleanup() · 3cfef1b6
      Jeffle Xu committed
      The order of these two parameters is just reversed. gcc didn't warn on
      that, probably because 'void *' can be converted from or to other
      pointer types without warning.
      
      Cc: stable@vger.kernel.org
      Fixes: 3d3c9504 ("netfs: Provide readahead and readpage netfs helpers")
      Fixes: e1b1240c ("netfs: Add write_begin helper")
      Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Link: https://lore.kernel.org/r/20211207031449.100510-1-jefflexu@linux.alibaba.com/ # v1
      3cfef1b6
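The reason the compiler stayed silent about the reversed arguments in the commit above can be shown with a short C sketch. The types and functions here (`struct mapping`, `struct priv_data`, `cleanup`, `call_cleanup_swapped`, `call_cleanup_fixed`) are hypothetical; the only point being illustrated is that C converts to and from `void *` implicitly.

```c
#include <stddef.h>

/* Hypothetical types standing in for the two cleanup() arguments. */
struct mapping { int id; };
struct priv_data { int id; };

/* Record what the callback received, so the mix-up can be observed. */
static const void *last_mapping;
static const void *last_priv;

/* A callback taking one typed pointer and one untyped pointer. */
static void cleanup(struct mapping *mapping, void *priv)
{
    last_mapping = mapping;
    last_priv = priv;
}

/*
 * Reversed call: `priv` (void *) converts implicitly to struct mapping *,
 * and `mapping` converts implicitly to void *, so C accepts this
 * without a diagnostic.
 */
static void call_cleanup_swapped(struct mapping *mapping, void *priv)
{
    cleanup(priv, mapping);
}

/* Corrected call: arguments in the order the callback declares. */
static void call_cleanup_fixed(struct mapping *mapping, void *priv)
{
    cleanup(mapping, priv);
}
```

Giving the private argument a concrete pointer type instead of `void *` would turn the swapped call into an incompatible-pointer diagnostic, at the cost of a less generic callback signature.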
    • D
      netfs: Fix lockdep warning from taking sb_writers whilst holding mmap_lock · 598ad0bd
      David Howells committed
      Taking sb_writers whilst holding mmap_lock isn't allowed and will result in
      a lockdep warning like that below.  The problem comes from cachefiles
      needing to take the sb_writers lock in order to do a write to the cache,
      but being asked to do this by netfslib called from readpage, readahead or
      write_begin[1].
      
      Fix this by always offloading the write to the cache off to a worker
      thread.  The main thread doesn't need to wait for it, so deadlock can be
      avoided.
      
      This can be tested by running the quick xfstests on something like afs or
      ceph with lockdep enabled.
      
      WARNING: possible circular locking dependency detected
      5.15.0-rc1-build2+ #292 Not tainted
      ------------------------------------------------------
      holetest/65517 is trying to acquire lock:
      ffff88810c81d730 (mapping.invalidate_lock#3){.+.+}-{3:3}, at: filemap_fault+0x276/0x7a5
      
      but task is already holding lock:
      ffff8881595b53e8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x28d/0x59c
      
      which lock already depends on the new lock.
      
      
      the existing dependency chain (in reverse order) is:
      
      -> #2 (&mm->mmap_lock#2){++++}-{3:3}:
             validate_chain+0x3c4/0x4a8
             __lock_acquire+0x89d/0x949
             lock_acquire+0x2dc/0x34b
             __might_fault+0x87/0xb1
             strncpy_from_user+0x25/0x18c
             removexattr+0x7c/0xe5
             __do_sys_fremovexattr+0x73/0x96
             do_syscall_64+0x67/0x7a
             entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      -> #1 (sb_writers#10){.+.+}-{0:0}:
             validate_chain+0x3c4/0x4a8
             __lock_acquire+0x89d/0x949
             lock_acquire+0x2dc/0x34b
             cachefiles_write+0x2b3/0x4bb
             netfs_rreq_do_write_to_cache+0x3b5/0x432
             netfs_readpage+0x2de/0x39d
             filemap_read_page+0x51/0x94
             filemap_get_pages+0x26f/0x413
             filemap_read+0x182/0x427
             new_sync_read+0xf0/0x161
             vfs_read+0x118/0x16e
             ksys_read+0xb8/0x12e
             do_syscall_64+0x67/0x7a
             entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      -> #0 (mapping.invalidate_lock#3){.+.+}-{3:3}:
             check_noncircular+0xe4/0x129
             check_prev_add+0x16b/0x3a4
             validate_chain+0x3c4/0x4a8
             __lock_acquire+0x89d/0x949
             lock_acquire+0x2dc/0x34b
             down_read+0x40/0x4a
             filemap_fault+0x276/0x7a5
             __do_fault+0x96/0xbf
             do_fault+0x262/0x35a
             __handle_mm_fault+0x171/0x1b5
             handle_mm_fault+0x12a/0x233
             do_user_addr_fault+0x3d2/0x59c
             exc_page_fault+0x85/0xa5
             asm_exc_page_fault+0x1e/0x30
      
      other info that might help us debug this:
      
      Chain exists of:
        mapping.invalidate_lock#3 --> sb_writers#10 --> &mm->mmap_lock#2
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&mm->mmap_lock#2);
                                     lock(sb_writers#10);
                                     lock(&mm->mmap_lock#2);
        lock(mapping.invalidate_lock#3);
      
       *** DEADLOCK ***
      
      1 lock held by holetest/65517:
       #0: ffff8881595b53e8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x28d/0x59c
      
      stack backtrace:
      CPU: 0 PID: 65517 Comm: holetest Not tainted 5.15.0-rc1-build2+ #292
      Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
      Call Trace:
       dump_stack_lvl+0x45/0x59
       check_noncircular+0xe4/0x129
       ? print_circular_bug+0x207/0x207
       ? validate_chain+0x461/0x4a8
       ? add_chain_block+0x88/0xd9
       ? hlist_add_head_rcu+0x49/0x53
       check_prev_add+0x16b/0x3a4
       validate_chain+0x3c4/0x4a8
       ? check_prev_add+0x3a4/0x3a4
       ? mark_lock+0xa5/0x1c6
       __lock_acquire+0x89d/0x949
       lock_acquire+0x2dc/0x34b
       ? filemap_fault+0x276/0x7a5
       ? rcu_read_unlock+0x59/0x59
       ? add_to_page_cache_lru+0x13c/0x13c
       ? lock_is_held_type+0x7b/0xd3
       down_read+0x40/0x4a
       ? filemap_fault+0x276/0x7a5
       filemap_fault+0x276/0x7a5
       ? pagecache_get_page+0x2dd/0x2dd
       ? __lock_acquire+0x8bc/0x949
       ? pte_offset_kernel.isra.0+0x6d/0xc3
       __do_fault+0x96/0xbf
       ? do_fault+0x124/0x35a
       do_fault+0x262/0x35a
       ? handle_pte_fault+0x1c1/0x20d
       __handle_mm_fault+0x171/0x1b5
       ? handle_pte_fault+0x20d/0x20d
       ? __lock_release+0x151/0x254
       ? mark_held_locks+0x1f/0x78
       ? rcu_read_unlock+0x3a/0x59
       handle_mm_fault+0x12a/0x233
       do_user_addr_fault+0x3d2/0x59c
       ? pgtable_bad+0x70/0x70
       ? rcu_read_lock_bh_held+0xab/0xab
       exc_page_fault+0x85/0xa5
       ? asm_exc_page_fault+0x8/0x30
       asm_exc_page_fault+0x1e/0x30
      RIP: 0033:0x40192f
      Code: ff 48 89 c3 48 8b 05 50 28 00 00 48 85 ed 7e 23 31 d2 4b 8d 0c 2f eb 0a 0f 1f 00 48 8b 05 39 28 00 00 48 0f af c2 48 83 c2 01 <48> 89 1c 01 48 39 d5 7f e8 8b 0d f2 27 00 00 31 c0 85 c9 74 0e 8b
      RSP: 002b:00007f9931867eb0 EFLAGS: 00010202
      RAX: 0000000000000000 RBX: 00007f9931868700 RCX: 00007f993206ac00
      RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00007ffc13e06ee0
      RBP: 0000000000000100 R08: 0000000000000000 R09: 00007f9931868700
      R10: 00007f99318689d0 R11: 0000000000000202 R12: 00007ffc13e06ee0
      R13: 0000000000000c00 R14: 00007ffc13e06e00 R15: 00007f993206a000
      
      Fixes: 726218fd ("netfs: Define an interface to talk to a cache")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Jeff Layton <jlayton@kernel.org>
      cc: Jan Kara <jack@suse.cz>
      cc: linux-cachefs@redhat.com
      cc: linux-fsdevel@vger.kernel.org
      Link: https://lore.kernel.org/r/20210922110420.GA21576@quack2.suse.cz/ [1]
      Link: https://lore.kernel.org/r/163887597541.1596626.2668163316598972956.stgit@warthog.procyon.org.uk/ # v1
      598ad0bd
    • J
      io-wq: remove spurious bit clear on task_work addition · e47498af
      Jens Axboe committed
      There's a small race here where the task_work could finish and drop
      the worker itself, so that by the time that task_work_add() returns
      with a successful addition we've already put the worker.
      
      The worker callbacks clear this bit themselves, so we don't actually
      need to manually clear it in the caller. Get rid of it.
      
      Reported-by: syzbot+b60c982cb0efc5e05a47@syzkaller.appspotmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      e47498af
  9. 04 Dec, 2021 5 commits
  10. 03 Dec, 2021 1 commit