1. 14 Mar, 2022 2 commits
  2. 02 Mar, 2022 1 commit
    • btrfs: tree-checker: use u64 for item data end to avoid overflow · a6ab66eb
      Committed by Su Yue
      A user reported an array-index-out-of-bounds access while
      mounting a crafted image:
      
        [350.411942 ] loop0: detected capacity change from 0 to 262144
        [350.427058 ] BTRFS: device fsid a62e00e8-e94e-4200-8217-12444de93c2e devid 1 transid 8 /dev/loop0 scanned by systemd-udevd (1044)
        [350.428564 ] BTRFS info (device loop0): disk space caching is enabled
        [350.428568 ] BTRFS info (device loop0): has skinny extents
        [350.429589 ]
        [350.429619 ] UBSAN: array-index-out-of-bounds in fs/btrfs/struct-funcs.c:161:1
        [350.429636 ] index 1048096 is out of range for type 'page *[16]'
        [350.429650 ] CPU: 0 PID: 9 Comm: kworker/u8:1 Not tainted 5.16.0-rc4
        [350.429652 ] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
        [350.429653 ] Workqueue: btrfs-endio-meta btrfs_work_helper [btrfs]
        [350.429772 ] Call Trace:
        [350.429774 ]  <TASK>
        [350.429776 ]  dump_stack_lvl+0x47/0x5c
        [350.429780 ]  ubsan_epilogue+0x5/0x50
        [350.429786 ]  __ubsan_handle_out_of_bounds+0x66/0x70
        [350.429791 ]  btrfs_get_16+0xfd/0x120 [btrfs]
        [350.429832 ]  check_leaf+0x754/0x1a40 [btrfs]
        [350.429874 ]  ? filemap_read+0x34a/0x390
        [350.429878 ]  ? load_balance+0x175/0xfc0
        [350.429881 ]  validate_extent_buffer+0x244/0x310 [btrfs]
        [350.429911 ]  btrfs_validate_metadata_buffer+0xf8/0x100 [btrfs]
        [350.429935 ]  end_bio_extent_readpage+0x3af/0x850 [btrfs]
        [350.429969 ]  ? newidle_balance+0x259/0x480
        [350.429972 ]  end_workqueue_fn+0x29/0x40 [btrfs]
        [350.429995 ]  btrfs_work_helper+0x71/0x330 [btrfs]
        [350.430030 ]  ? __schedule+0x2fb/0xa40
        [350.430033 ]  process_one_work+0x1f6/0x400
        [350.430035 ]  ? process_one_work+0x400/0x400
        [350.430036 ]  worker_thread+0x2d/0x3d0
        [350.430037 ]  ? process_one_work+0x400/0x400
        [350.430038 ]  kthread+0x165/0x190
        [350.430041 ]  ? set_kthread_struct+0x40/0x40
        [350.430043 ]  ret_from_fork+0x1f/0x30
        [350.430047 ]  </TASK>
        [350.430047 ]
        [350.430077 ] BTRFS warning (device loop0): bad eb member start: ptr 0xffe20f4e start 20975616 member offset 4293005178 size 2
      
      btrfs check reports:
        corrupt leaf: root=3 block=20975616 physical=20975616 slot=1, unexpected
        item end, have 4294971193 expect 3897
      
      The first slot item offset is 4293005033 and the size is 1966160.
      In check_leaf(), we use btrfs_item_end() to check the item boundary
      against the extent_buffer data size. However, the return type of
      btrfs_item_end() is u32, so (u32)(4293005033 + 1966160) == 3897:
      the sum overflows, and the wrapped result 3897 equals the leaf data
      size, so it looks valid.
      
      Fix it by using a u64 variable to store the item data end in
      check_leaf() to avoid the u32 overflow.
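
The wrap-around described above can be reproduced with plain integer arithmetic. The following is a minimal sketch, not the kernel code: the helper names are hypothetical stand-ins for the 32-bit and 64-bit ways of computing an item end.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not the kernel's btrfs_item_end(). */

/* Item end computed in 32 bits: the sum wraps modulo 2^32. */
static uint32_t item_end_u32(uint32_t offset, uint32_t size)
{
	return offset + size;
}

/* Item end computed in 64 bits: cannot wrap for 32-bit inputs. */
static uint64_t item_end_u64(uint32_t offset, uint32_t size)
{
	return (uint64_t)offset + size;
}

/* Returns 1 if the 64-bit item end equals the expected end, 0 otherwise. */
static int item_end_ok(uint32_t offset, uint32_t size, uint64_t expected_end)
{
	return item_end_u64(offset, size) == expected_end;
}
```

With the values from the report, `item_end_u32(4293005033u, 1966160u)` yields 3897, while `item_end_u64()` yields 4294971193, the value printed by btrfs check, so a comparison against 3897 only fails when the end is kept in 64 bits.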
      
      This commit fixes the invalid memory access shown in the stack
      trace.  However, the metadata profile of the image is DUP and the
      other copy of the leaf is fine, so the image can be mounted
      successfully. But when umount is called, the ASSERT in
      btrfs_mark_buffer_dirty() is triggered because the only node in the
      extent tree has 0 items and an invalid owner. That is solved by
      another commit,
      "btrfs: check extent buffer owner against the owner rootid".
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215299
      Reported-by: Wenqing Liu <wenqingliu0120@gmail.com>
      CC: stable@vger.kernel.org # 4.19+
      Signed-off-by: Su Yue <l@damenly.su>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  3. 31 Jan, 2022 2 commits
    • btrfs: tree-checker: check item_size for dev_item · ea1d1ca4
      Committed by Su Yue
      Check the item size before accessing the device item to avoid
      out-of-bounds access, similar to the inode_item check.
      
      Signed-off-by: Su Yue <l@damenly.su>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: tree-checker: check item_size for inode_item · 0c982944
      Committed by Su Yue
      While mounting a crafted image, an out-of-bounds access happens:
      
        [350.429619] UBSAN: array-index-out-of-bounds in fs/btrfs/struct-funcs.c:161:1
        [350.429636] index 1048096 is out of range for type 'page *[16]'
        [350.429650] CPU: 0 PID: 9 Comm: kworker/u8:1 Not tainted 5.16.0-rc4 #1
        [350.429652] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
        [350.429653] Workqueue: btrfs-endio-meta btrfs_work_helper [btrfs]
        [350.429772] Call Trace:
        [350.429774]  <TASK>
        [350.429776]  dump_stack_lvl+0x47/0x5c
        [350.429780]  ubsan_epilogue+0x5/0x50
        [350.429786]  __ubsan_handle_out_of_bounds+0x66/0x70
        [350.429791]  btrfs_get_16+0xfd/0x120 [btrfs]
        [350.429832]  check_leaf+0x754/0x1a40 [btrfs]
        [350.429874]  ? filemap_read+0x34a/0x390
        [350.429878]  ? load_balance+0x175/0xfc0
        [350.429881]  validate_extent_buffer+0x244/0x310 [btrfs]
        [350.429911]  btrfs_validate_metadata_buffer+0xf8/0x100 [btrfs]
        [350.429935]  end_bio_extent_readpage+0x3af/0x850 [btrfs]
        [350.429969]  ? newidle_balance+0x259/0x480
        [350.429972]  end_workqueue_fn+0x29/0x40 [btrfs]
        [350.429995]  btrfs_work_helper+0x71/0x330 [btrfs]
        [350.430030]  ? __schedule+0x2fb/0xa40
        [350.430033]  process_one_work+0x1f6/0x400
        [350.430035]  ? process_one_work+0x400/0x400
        [350.430036]  worker_thread+0x2d/0x3d0
        [350.430037]  ? process_one_work+0x400/0x400
        [350.430038]  kthread+0x165/0x190
        [350.430041]  ? set_kthread_struct+0x40/0x40
        [350.430043]  ret_from_fork+0x1f/0x30
        [350.430047]  </TASK>
        [350.430077] BTRFS warning (device loop0): bad eb member start: ptr 0xffe20f4e start 20975616 member offset 4293005178 size 2
      
      check_leaf() is checking the leaf:
      
        corrupt leaf: root=4 block=29396992 slot=1, bad key order, prev (16140901064495857664 1 0) current (1 204 12582912)
        leaf 29396992 items 6 free space 3565 generation 6 owner DEV_TREE
        leaf 29396992 flags 0x1(WRITTEN) backref revision 1
        fs uuid a62e00e8-e94e-4200-8217-12444de93c2e
        chunk uuid cecbd0f7-9ca0-441e-ae9f-f782f9732bd8
      	  item 0 key (16140901064495857664 INODE_ITEM 0) itemoff 3955 itemsize 40
      		  generation 0 transid 0 size 0 nbytes 17592186044416
      		  block group 0 mode 52667 links 33 uid 0 gid 2104132511 rdev 94223634821136
      		  sequence 100305 flags 0x2409000a(none)
      		  atime 0.0 (1970-01-01 08:00:00)
      		  ctime 2973280098083405823.4294967295 (-269783007-01-01 21:37:03)
      		  mtime 18446744071572723616.4026825121 (1902-04-16 12:40:00)
      		  otime 9249929404488876031.4294967295 (622322949-04-16 04:25:58)
      	  item 1 key (1 DEV_EXTENT 12582912) itemoff 3907 itemsize 48
      		  dev extent chunk_tree 3
      		  chunk_objectid 256 chunk_offset 12582912 length 8388608
      		  chunk_tree_uuid cecbd0f7-9ca0-441e-ae9f-f782f9732bd8
      
      The corrupted leaf of the device tree has an inode item. The leaf passed
      the checksum and other checks in validate_extent_buffer() until
      check_leaf_item(). Because the key type is BTRFS_INODE_ITEM,
      check_inode_item() is called even though we are in the device tree.
      Since item offset + sizeof(struct btrfs_inode_item) > eb->len, an
      out-of-bounds access is triggered.
      
      The item end vs leaf boundary check has been done before
      check_leaf_item(), so fix it by checking the item size in
      check_inode_item() before accessing the inode item in the extent buffer.
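
The pattern of validating the declared item size before reading a fixed-size structure can be sketched as follows. This is an illustrative model only: `struct demo_item`, `read_demo_item()` and `DEMO_EUCLEAN` are hypothetical names, and the real check lives in check_inode_item().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in error code for illustration (EUCLEAN is 117 on Linux). */
#define DEMO_EUCLEAN 117

/* Hypothetical fixed-size on-disk record standing in for an inode item. */
struct demo_item {
	uint64_t generation;
	uint64_t size;
	uint32_t mode;
	uint32_t flags;
};

/*
 * Copy an item out of a leaf buffer, but only after checking that the
 * declared item size matches the structure and that the item lies
 * entirely inside the buffer.
 */
static int read_demo_item(const uint8_t *leaf, size_t leaf_len,
			  uint32_t item_offset, uint32_t item_size,
			  struct demo_item *out)
{
	/* Reject items whose declared size does not match the structure. */
	if (item_size != sizeof(*out))
		return -DEMO_EUCLEAN;
	/* Boundary check done in 64 bits so the sum cannot wrap. */
	if ((uint64_t)item_offset + item_size > leaf_len)
		return -DEMO_EUCLEAN;
	memcpy(out, leaf + item_offset, sizeof(*out));
	return 0;
}
```

In the dump above, item 0 declares an itemsize of 40, which is smaller than the inode item structure, so a size check of this kind rejects it before any field is read.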
      
      All other check functions in check_leaf_item(), except
      check_dev_item(), already have their item size checks.
      The commit for check_dev_item() follows.
      
      No regressions were observed while running fstests.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215299
      CC: stable@vger.kernel.org # 5.10+
      CC: Wenqing Liu <wenqingliu0120@gmail.com>
      Signed-off-by: Su Yue <l@damenly.su>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  4. 03 Jan, 2022 2 commits
  5. 23 Aug, 2021 3 commits
    • btrfs: add ro compat flags to inodes · 77eea05e
      Committed by Boris Burkov
      Currently, inode flags are fully backwards incompatible in btrfs. If we
      introduce a new inode flag, then tree-checker will detect it and fail.
      This can even cause us to fail to mount entirely. To make it possible to
      introduce new flags which can be read-only compatible, like VERITY, we
      add new ro flags to btrfs without treating them quite so harshly in
      tree-checker. A read-only file system can survive an unexpected flag,
      and can be mounted.
      
      As for the implementation, it unfortunately gets a little complicated.
      
      The on-disk representation of the inode, btrfs_inode_item, has an __le64
      for flags but the in-memory representation, btrfs_inode, uses a u32.
      David Sterba had the nice idea that we could reclaim those wasted 32 bits
      on disk and use them for the new ro_compat flags.
      
      It turns out that the tree-checker code which checks for unknown flags
      is broken, and ignores the upper 32 bits we are hoping to use. The issue
      is that the flags use the literal 1 rather than 1ULL, so the flags are
      signed ints, and one of them is specifically (1 << 31). As a result, the
      mask which ORs the flags is a negative integer on machines where int is
      32 bit twos complement. When tree-checker evaluates the expression:
      
        btrfs_inode_flags(leaf, iitem) & ~BTRFS_INODE_FLAG_MASK
      
      The mask is something like 0x80000abc, which gets promoted to u64 with
      sign extension to 0xffffffff80000abc. Negating that 64 bit mask leaves
      all the upper bits zeroed, and we can't detect unexpected flags.
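
The effect of the signed mask can be shown in isolation. This is a minimal sketch under stated assumptions: the mask constants are hypothetical values chosen to mirror the 0x80000abc example, not the real BTRFS_INODE_FLAG_MASK.

```c
#include <stdint.h>

/* Hypothetical masks mirroring the 0x80000abc example in the text. */
#define DEMO_SIGNED_MASK   ((int32_t)(INT32_MIN | 0xabc))
#define DEMO_UNSIGNED_MASK 0x80000abcULL

/*
 * Bits of flags not covered by a signed 32-bit mask. ~mask is evaluated
 * as a 32-bit int and then converted to 64 bits; the result has all
 * upper 32 bits clear, the same value obtained by sign-extending the
 * mask first and complementing it afterwards.
 */
static uint64_t unknown_bits_signed(uint64_t flags, int32_t mask)
{
	return flags & ~mask;
}

/* Same test with an unsigned 64-bit mask: upper bits stay visible. */
static uint64_t unknown_bits_unsigned(uint64_t flags, uint64_t mask)
{
	return flags & ~mask;
}
```

A flag at bit 32 passed to `unknown_bits_signed()` comes back as 0, so it goes undetected, whereas `unknown_bits_unsigned()` returns the bit and the unexpected flag is caught.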
      
      This suggests that we can't use those bits after all. Luckily, we have
      good reason to believe that they are zero anyway. Inode flags are
      metadata, which is always checksummed, so any bit flips that would
      introduce 1s would cause a checksum failure anyway (excluding the
      improbable case of the checksum being corrupted in exactly the
      matching way).
      
      Further, unless the 1 << 31 flag is used, the cast to u64 of the 32 bit
      inode flag should preserve its value and not add leading ones
      (at least for twos complement). The only place that flag
      (BTRFS_INODE_ROOT_ITEM_INIT) is used is in a special inode embedded in
      the root item, and indeed for that inode we see 0xffffffff80000000 as
      the flags on disk. However, that inode is never seen by tree checker,
      nor is it used in a context where verity might be meaningful.
      Theoretically, a future ro flag might cause trouble on that inode, so we
      should proactively clean up that mess before it does.
      
      With the introduction of the new ro flags, keep two separate unsigned
      masks and check them against the appropriate u32. Since we no longer run
      afoul of sign extension, this also stops writing out 0xffffffff80000000
      in root_item inodes going forward.
      
      Signed-off-by: Boris Burkov <boris@bur.io>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: tree-checker: add missing stripe checks for raid1c3/4 profiles · 6c154ba4
      Committed by David Sterba
      The stripe checks for raid1c3/raid1c4 are missing in the sequence in
      btrfs_check_chunk_valid.
      
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: tree-checker: use table values for stripe checks · 0ac6e06b
      Committed by David Sterba
      There are hardcoded values in several checks regarding chunks and stripe
      constraints. We have that defined in the raid table and ought to use it.
      
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  6. 19 Apr, 2021 1 commit
  7. 23 Feb, 2021 1 commit
    • btrfs: tree-checker: do not error out if extent ref hash doesn't match · 1119a72e
      Committed by Josef Bacik
      The tree checker checks the extent ref hash at read and write time to
      make sure we do not corrupt the file system.  Generally extent
      references go inline, but if we have enough of them we need to make an
      item, which looks like
      
      key.objectid	= <bytenr>
      key.type	= <BTRFS_EXTENT_DATA_REF_KEY|BTRFS_TREE_BLOCK_REF_KEY>
      key.offset	= hash(tree, owner, offset)
      
      However, if key.offset collides with an unrelated extent reference, we
      simply do key.offset++ until we get something that doesn't collide.
      Obviously the result no longer matches the hash at tree checker time,
      and thus we error out while writing out the transaction.  This is
      relatively easy to reproduce; simply do something like the following
      
        xfs_io -f -c "pwrite 0 1M" file
        offset=2
      
        for i in {0..10000}
        do
      	  xfs_io -c "reflink file 0 ${offset}M 1M" file
      	  offset=$(( offset + 2 ))
        done
      
        xfs_io -c "reflink file 0 17999258914816 1M" file
        xfs_io -c "reflink file 0 35998517829632 1M" file
        xfs_io -c "reflink file 0 53752752058368 1M" file
      
        btrfs filesystem sync
      
      And the sync will error out because we'll abort the transaction.  The
      magic values above are used because they generate hash collisions with
      the first file in the main subvol.
      
      The fix for this is to remove the hash value check from tree checker, as
      we have no idea which offset ours should belong to.
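
Why the stored offset can legitimately differ from the hash is easier to see in a toy model. The sketch below is not btrfs code: `struct demo_tree` is a hypothetical stand-in for the keys already present, and every insert is treated as an unrelated reference.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_KEYS 8

/* Hypothetical stand-in for the set of key offsets already in the tree. */
struct demo_tree {
	uint64_t offsets[DEMO_MAX_KEYS];
	size_t count;
};

/* Returns 1 if the offset is already taken, 0 otherwise. */
static int demo_contains(const struct demo_tree *t, uint64_t offset)
{
	for (size_t i = 0; i < t->count; i++)
		if (t->offsets[i] == offset)
			return 1;
	return 0;
}

/*
 * Insert a reference whose hash is `hash`. On collision, bump the
 * offset until a free one is found. Returns the offset actually used
 * (it is recorded only while the toy table has room).
 */
static uint64_t demo_insert(struct demo_tree *t, uint64_t hash)
{
	uint64_t offset = hash;

	while (demo_contains(t, offset))
		offset++;
	if (t->count < DEMO_MAX_KEYS)
		t->offsets[t->count++] = offset;
	return offset;
}
```

Three inserts with the same hash 100 end up at offsets 100, 101 and 102, so a checker that insists on `key.offset == hash` would reject the second and third entries even though they are valid.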
      
      Reported-by: Tuomas Lähdekorpi <tuomas.lahdekorpi@gmail.com>
      Fixes: 0785a9aa ("btrfs: tree-checker: Add EXTENT_DATA_REF check")
      CC: stable@vger.kernel.org # 5.4+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ add comment ]
      Signed-off-by: David Sterba <dsterba@suse.com>
  8. 08 Jan, 2021 1 commit
    • btrfs: tree-checker: check if chunk item end overflows · 347fb0cf
      Committed by Su Yue
      While mounting a crafted image provided by a user, the kernel panics
      due to an invalid chunk item whose end is less than its start.
      
        [66.387422] loop: module loaded
        [66.389773] loop0: detected capacity change from 262144 to 0
        [66.427708] BTRFS: device fsid a62e00e8-e94e-4200-8217-12444de93c2e devid 1 transid 12 /dev/loop0 scanned by mount (613)
        [66.431061] BTRFS info (device loop0): disk space caching is enabled
        [66.431078] BTRFS info (device loop0): has skinny extents
        [66.437101] BTRFS error: insert state: end < start 29360127 37748736
        [66.437136] ------------[ cut here ]------------
        [66.437140] WARNING: CPU: 16 PID: 613 at fs/btrfs/extent_io.c:557 insert_state.cold+0x1a/0x46 [btrfs]
        [66.437369] CPU: 16 PID: 613 Comm: mount Tainted: G           O      5.11.0-rc1-custom #45
        [66.437374] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014
        [66.437378] RIP: 0010:insert_state.cold+0x1a/0x46 [btrfs]
        [66.437420] RSP: 0018:ffff93e5414c3908 EFLAGS: 00010286
        [66.437427] RAX: 0000000000000000 RBX: 0000000001bfffff RCX: 0000000000000000
        [66.437431] RDX: 0000000000000000 RSI: ffffffffb90d4660 RDI: 00000000ffffffff
        [66.437434] RBP: ffff93e5414c3938 R08: 0000000000000001 R09: 0000000000000001
        [66.437438] R10: ffff93e5414c3658 R11: 0000000000000000 R12: ffff8ec782d72aa0
        [66.437441] R13: ffff8ec78bc71628 R14: 0000000000000000 R15: 0000000002400000
        [66.437447] FS:  00007f01386a8580(0000) GS:ffff8ec809000000(0000) knlGS:0000000000000000
        [66.437451] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [66.437455] CR2: 00007f01382fa000 CR3: 0000000109a34000 CR4: 0000000000750ee0
        [66.437460] PKRU: 55555554
        [66.437464] Call Trace:
        [66.437475]  set_extent_bit+0x652/0x740 [btrfs]
        [66.437539]  set_extent_bits_nowait+0x1d/0x20 [btrfs]
        [66.437576]  add_extent_mapping+0x1e0/0x2f0 [btrfs]
        [66.437621]  read_one_chunk+0x33c/0x420 [btrfs]
        [66.437674]  btrfs_read_chunk_tree+0x6a4/0x870 [btrfs]
        [66.437708]  ? kvm_sched_clock_read+0x18/0x40
        [66.437739]  open_ctree+0xb32/0x1734 [btrfs]
        [66.437781]  ? bdi_register_va+0x1b/0x20
        [66.437788]  ? super_setup_bdi_name+0x79/0xd0
        [66.437810]  btrfs_mount_root.cold+0x12/0xeb [btrfs]
        [66.437854]  ? __kmalloc_track_caller+0x217/0x3b0
        [66.437873]  legacy_get_tree+0x34/0x60
        [66.437880]  vfs_get_tree+0x2d/0xc0
        [66.437888]  vfs_kern_mount.part.0+0x78/0xc0
        [66.437897]  vfs_kern_mount+0x13/0x20
        [66.437902]  btrfs_mount+0x11f/0x3c0 [btrfs]
        [66.437940]  ? kfree+0x5ff/0x670
        [66.437944]  ? __kmalloc_track_caller+0x217/0x3b0
        [66.437962]  legacy_get_tree+0x34/0x60
        [66.437974]  vfs_get_tree+0x2d/0xc0
        [66.437983]  path_mount+0x48c/0xd30
        [66.437998]  __x64_sys_mount+0x108/0x140
        [66.438011]  do_syscall_64+0x38/0x50
        [66.438018]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
        [66.438023] RIP: 0033:0x7f0138827f6e
        [66.438033] RSP: 002b:00007ffecd79edf8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
        [66.438040] RAX: ffffffffffffffda RBX: 00007f013894c264 RCX: 00007f0138827f6e
        [66.438044] RDX: 00005593a4a41360 RSI: 00005593a4a33690 RDI: 00005593a4a3a6c0
        [66.438047] RBP: 00005593a4a33440 R08: 0000000000000000 R09: 0000000000000001
        [66.438050] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
        [66.438054] R13: 00005593a4a3a6c0 R14: 00005593a4a41360 R15: 00005593a4a33440
        [66.438078] irq event stamp: 18169
        [66.438082] hardirqs last  enabled at (18175): [<ffffffffb81154bf>] console_unlock+0x4ff/0x5f0
        [66.438088] hardirqs last disabled at (18180): [<ffffffffb8115427>] console_unlock+0x467/0x5f0
        [66.438092] softirqs last  enabled at (16910): [<ffffffffb8a00fe2>] asm_call_irq_on_stack+0x12/0x20
        [66.438097] softirqs last disabled at (16905): [<ffffffffb8a00fe2>] asm_call_irq_on_stack+0x12/0x20
        [66.438103] ---[ end trace e114b111db64298b ]---
        [66.438107] BTRFS error: found node 12582912 29360127 on insert of 37748736 29360127
        [66.438127] BTRFS critical: panic in extent_io_tree_panic:679: locking error: extent tree was modified by another thread while locked (errno=-17 Object already exists)
        [66.441069] ------------[ cut here ]------------
        [66.441072] kernel BUG at fs/btrfs/extent_io.c:679!
        [66.442064] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
        [66.443018] CPU: 16 PID: 613 Comm: mount Tainted: G        W  O      5.11.0-rc1-custom #45
        [66.444538] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014
        [66.446223] RIP: 0010:extent_io_tree_panic.isra.0+0x23/0x25 [btrfs]
        [66.450878] RSP: 0018:ffff93e5414c3948 EFLAGS: 00010246
        [66.451840] RAX: 0000000000000000 RBX: 0000000001bfffff RCX: 0000000000000000
        [66.453141] RDX: 0000000000000000 RSI: ffffffffb90d4660 RDI: 00000000ffffffff
        [66.454445] RBP: ffff93e5414c3948 R08: 0000000000000001 R09: 0000000000000001
        [66.455743] R10: ffff93e5414c3658 R11: 0000000000000000 R12: ffff8ec782d728c0
        [66.457055] R13: ffff8ec78bc71628 R14: ffff8ec782d72aa0 R15: 0000000002400000
        [66.458356] FS:  00007f01386a8580(0000) GS:ffff8ec809000000(0000) knlGS:0000000000000000
        [66.459841] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [66.460895] CR2: 00007f01382fa000 CR3: 0000000109a34000 CR4: 0000000000750ee0
        [66.462196] PKRU: 55555554
        [66.462692] Call Trace:
        [66.463139]  set_extent_bit.cold+0x30/0x98 [btrfs]
        [66.464049]  set_extent_bits_nowait+0x1d/0x20 [btrfs]
        [66.490466]  add_extent_mapping+0x1e0/0x2f0 [btrfs]
        [66.514097]  read_one_chunk+0x33c/0x420 [btrfs]
        [66.534976]  btrfs_read_chunk_tree+0x6a4/0x870 [btrfs]
        [66.555718]  ? kvm_sched_clock_read+0x18/0x40
        [66.575758]  open_ctree+0xb32/0x1734 [btrfs]
        [66.595272]  ? bdi_register_va+0x1b/0x20
        [66.614638]  ? super_setup_bdi_name+0x79/0xd0
        [66.633809]  btrfs_mount_root.cold+0x12/0xeb [btrfs]
        [66.652938]  ? __kmalloc_track_caller+0x217/0x3b0
        [66.671925]  legacy_get_tree+0x34/0x60
        [66.690300]  vfs_get_tree+0x2d/0xc0
        [66.708221]  vfs_kern_mount.part.0+0x78/0xc0
        [66.725808]  vfs_kern_mount+0x13/0x20
        [66.742730]  btrfs_mount+0x11f/0x3c0 [btrfs]
        [66.759350]  ? kfree+0x5ff/0x670
        [66.775441]  ? __kmalloc_track_caller+0x217/0x3b0
        [66.791750]  legacy_get_tree+0x34/0x60
        [66.807494]  vfs_get_tree+0x2d/0xc0
        [66.823349]  path_mount+0x48c/0xd30
        [66.838753]  __x64_sys_mount+0x108/0x140
        [66.854412]  do_syscall_64+0x38/0x50
        [66.869673]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
        [66.885093] RIP: 0033:0x7f0138827f6e
        [66.945613] RSP: 002b:00007ffecd79edf8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
        [66.977214] RAX: ffffffffffffffda RBX: 00007f013894c264 RCX: 00007f0138827f6e
        [66.994266] RDX: 00005593a4a41360 RSI: 00005593a4a33690 RDI: 00005593a4a3a6c0
        [67.011544] RBP: 00005593a4a33440 R08: 0000000000000000 R09: 0000000000000001
        [67.028836] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
        [67.045812] R13: 00005593a4a3a6c0 R14: 00005593a4a41360 R15: 00005593a4a33440
        [67.216138] ---[ end trace e114b111db64298c ]---
        [67.237089] RIP: 0010:extent_io_tree_panic.isra.0+0x23/0x25 [btrfs]
        [67.325317] RSP: 0018:ffff93e5414c3948 EFLAGS: 00010246
        [67.347946] RAX: 0000000000000000 RBX: 0000000001bfffff RCX: 0000000000000000
        [67.371343] RDX: 0000000000000000 RSI: ffffffffb90d4660 RDI: 00000000ffffffff
        [67.394757] RBP: ffff93e5414c3948 R08: 0000000000000001 R09: 0000000000000001
        [67.418409] R10: ffff93e5414c3658 R11: 0000000000000000 R12: ffff8ec782d728c0
        [67.441906] R13: ffff8ec78bc71628 R14: ffff8ec782d72aa0 R15: 0000000002400000
        [67.465436] FS:  00007f01386a8580(0000) GS:ffff8ec809000000(0000) knlGS:0000000000000000
        [67.511660] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [67.535047] CR2: 00007f01382fa000 CR3: 0000000109a34000 CR4: 0000000000750ee0
        [67.558449] PKRU: 55555554
        [67.581146] note: mount[613] exited with preempt_count 2
      
      The image has a chunk item which has a logical start 37748736 and length
      18446744073701163008 (-8M). The calculated end 29360127 overflows.
      EEXIST was caught by insert_state() because of the duplicate end and
      extent_io_tree_panic() was called.
      
      Add overflow check of chunk item end to tree checker so it can be
      detected early at mount time.
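
The overflow in this report can be checked with 64-bit unsigned arithmetic. This is a minimal sketch with hypothetical helper names; it shows one portable way to detect the wrap, not necessarily the form used in the tree checker.

```c
#include <stdint.h>

/* Returns 1 if start + length wraps around 64 bits, 0 otherwise. */
static int chunk_end_overflows(uint64_t start, uint64_t length)
{
	return length > UINT64_MAX - start;
}

/* Inclusive end as computed with wrapping 64-bit arithmetic. */
static uint64_t chunk_end_wrapping(uint64_t start, uint64_t length)
{
	return start + length - 1;
}
```

For the values above, `chunk_end_wrapping(37748736, 18446744073701163008)` gives 29360127, the end seen in the "end < start" message, and `chunk_end_overflows()` flags the same pair before any end is computed.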
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=208929
      CC: stable@vger.kernel.org # 4.19+
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Su Yue <l@damenly.su>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  9. 08 Dec, 2020 4 commits
    • btrfs: tree-checker: annotate all error branches as unlikely · c7c01a4a
      Committed by David Sterba
      The tree checker is called many times as it verifies metadata at
      read/write time. The checks follow a simple pattern:
      
        if (error_condition) {
      	  report_error();
      	  return -EUCLEAN;
        }
      
      All the error reporting functions are annotated as __cold, which is
      supposed to hint the compiler to move the statement block out of the hot
      path. This does not seem to happen that often.
      
      As the error condition is expected to be false almost always, we can
      annotate it with 'unlikely' as this satisfies one of the few use cases
      for the annotation. The expected outcome is a stronger hint to the
      compiler to reorder the checks
      
        test
        jump to exit
        test
        jump to exit
        ...
      
      which can be observed in the asm of e.g. check_dir_item,
      btrfs_check_chunk_valid, check_root_item or check_leaf.
      
      There's a measurable run time improvement reported by Josef: the testing
      workload went from 655 MiB/s to 677 MiB/s, which is about +3%.
      
      There should be no functional changes, but some of the conditions have
      been rewritten to produce a more readable result, and some lines are
      longer than 80 characters for the sake of readability.
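
The annotated check pattern can be sketched outside the kernel with the GCC/Clang builtin that such annotations are commonly built on. The names `demo_unlikely`, `demo_check_level` and `DEMO_EUCLEAN` are hypothetical and used for illustration only.

```c
#include <stdint.h>

/* GCC/Clang builtin; assumed here as the basis for the annotation. */
#define demo_unlikely(x) __builtin_expect(!!(x), 0)

/* Stand-in error code for illustration (EUCLEAN is 117 on Linux). */
#define DEMO_EUCLEAN 117

/* One check in the "test, jump to exit" style described above. */
static int demo_check_level(uint8_t level, uint8_t max_level)
{
	if (demo_unlikely(level >= max_level)) {
		/* Error reporting would go here, off the hot path. */
		return -DEMO_EUCLEAN;
	}
	return 0;
}
```

The annotation does not change the result of the check; it only tells the compiler which branch to lay out as the fall-through path.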
      
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: switch cached fs_info::csum_size from u16 to u32 · 223486c2
      Committed by David Sterba
      The fs_info value is 32bit, so switch the local u16 variables as well.
      This leads to better generated assembly because the movzwl
      zero-extensions are no longer needed.
      
      This simple change will shave some bytes on x86_64 and release config:
      
         text    data     bss     dec     hex filename
      1090000   17980   14912 1122892  11224c pre/btrfs.ko
      1089794   17980   14912 1122686  11217e post/btrfs.ko
      
      DELTA: -206
      
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: use cached value of fs_info::csum_size everywhere · 55fc29be
      Committed by David Sterba
      btrfs_get_16 shows up in the system performance profiles (helper to read
      16bit values from on-disk structures). This is partially because of the
      checksum size that's frequently read along with data reads/writes, other
      u16 uses are from item size or directory entries.
      
      Replace all calls to btrfs_super_csum_size by the cached value from
      fs_info.
      
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: add set/get accessors for root_item::drop_level · c8422684
      Committed by David Sterba
      The drop_level member is used directly unlike all the other int types in
      root_item. Add the definition and use it everywhere. The type is u8 so
      there's no conversion necessary and the helpers are properly inlined,
      this is for consistency.
      
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  10. 24 Nov, 2020 1 commit
  11. 14 Nov, 2020 1 commit
  12. 26 Oct, 2020 1 commit
  13. 07 Oct, 2020 1 commit
  14. 27 Aug, 2020 1 commit
  15. 25 May, 2020 1 commit
  16. 20 Jan, 2020 5 commits
  17. 13 Dec, 2019 2 commits
    • Btrfs: make tree checker detect checksum items with overlapping ranges · ad1d8c43
      Committed by Filipe Manana
      Having checksum items, either on the checksums tree or in a log tree, that
      represent ranges that overlap each other is a sign of corruption. Such a
      case confuses the checksum lookup code and can result in not being able to
      find checksums or in finding stale checksums.
      
      So add a check for such a case.
      
      This is motivated by a recent fix for a case where a log tree had checksum
      items covering ranges that overlap each other due to extent cloning, and
      resulted in missing checksums after replaying the log tree. It also helps
      detect past issues such as stale and outdated checksums due to overlapping,
      commit 27b9a812 ("Btrfs: fix csum tree corruption, duplicate and
      outdated checksums").
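
An overlap check between two adjacent checksum items can be sketched as below. This is an illustration under assumptions: the helper names are hypothetical, and the checksum size and sector size are passed in rather than read from the filesystem.

```c
#include <stdint.h>

/* End (exclusive) of the byte range covered by one checksum item. */
static uint64_t csum_range_end(uint64_t start, uint32_t item_size,
			       uint32_t csum_size, uint32_t sectorsize)
{
	/* One checksum of csum_size bytes covers one sector. */
	return start + (uint64_t)(item_size / csum_size) * sectorsize;
}

/* Returns 1 if the previous item's range runs past the current start. */
static int csum_ranges_overlap(uint64_t prev_start, uint32_t prev_item_size,
			       uint64_t cur_start, uint32_t csum_size,
			       uint32_t sectorsize)
{
	return csum_range_end(prev_start, prev_item_size,
			      csum_size, sectorsize) > cur_start;
}
```

Assuming 4-byte checksums and 4096-byte sectors, an item at 85975040 with itemsize 8 covers up to 85983232, so a following item that starts at 85979136 overlaps it while one starting at 85983232 does not.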
      
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: tree-checker: Fix error format string for size_t · 994bf9cd
      Committed by Andreas Färber
      Argument BTRFS_FILE_EXTENT_INLINE_DATA_START is defined as offsetof(),
      which returns type size_t, so we need %zu instead of %lu.
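
A small standalone example shows the portable way to print an offsetof() result. The structure and macro here are hypothetical; only the use of %zu for a size_t argument reflects the fix described above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical structure; only the offsetof() result matters here. */
struct demo_extent {
	uint64_t generation;
	uint8_t type;
	uint8_t inline_data[1];
};

/* offsetof() yields a size_t, whose width differs between 32 and 64 bit. */
#define DEMO_INLINE_DATA_START offsetof(struct demo_extent, inline_data)

/* Format the range with %zu so the size_t argument matches everywhere. */
static int format_inline_start(char *buf, size_t len)
{
	return snprintf(buf, len, "expect [%zu, %u)",
			DEMO_INLINE_DATA_START, 4096u);
}
```

Using %lu instead would only be correct where size_t happens to be unsigned long, which is why the warning showed up on 32-bit ARM.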
      
      This fixes a build warning on 32-bit ARM:
      
        ../fs/btrfs/tree-checker.c: In function 'check_extent_data_item':
        ../fs/btrfs/tree-checker.c:230:43: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'unsigned int' [-Wformat=]
          230 |     "invalid item size, have %u expect [%lu, %u)",
              |                                         ~~^
              |                                           long unsigned int
              |                                         %u
      
      Fixes: 153a6d29 ("btrfs: tree-checker: Check item size before reading file extent type")
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Andreas Färber <afaerber@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  18. 19 Nov, 2019 2 commits
  19. 18 Nov, 2019 6 commits
  20. 26 Oct, 2019 1 commit
    • btrfs: tree-checker: Fix wrong check on max devid · 8bb177d1
      Committed by Qu Wenruo
      [BUG]
      The following script will cause a false alert on the devid check.
        #!/bin/bash
      
        dev1=/dev/test/test
        dev2=/dev/test/scratch1
        mnt=/mnt/btrfs
      
        umount $dev1 &> /dev/null
        umount $dev2 &> /dev/null
        umount $mnt &> /dev/null
      
        mkfs.btrfs -f $dev1
      
        mount $dev1 $mnt
      
        _fail()
        {
                echo "!!! FAILED !!!"
                exit 1
        }
      
        for ((i = 0; i < 4096; i++)); do
                btrfs dev add -f $dev2 $mnt || _fail
                btrfs dev del $dev1 $mnt || _fail
                dev_tmp=$dev1
                dev1=$dev2
                dev2=$dev_tmp
        done
      
      [CAUSE]
      Tree-checker uses BTRFS_MAX_DEVS() and BTRFS_MAX_DEVS_SYS_CHUNK() as
      the upper limit for devid.  But we can have devid holes, as the script
      above shows.
      
      So the check for devid is incorrect and could cause a false alert.
      
      [FIX]
      Just remove the whole devid check.  We don't have any hard requirement
      for devid assignment.
      
      Furthermore, even if devid gets corrupted by a bitflip, we still have
      dev extents verification at mount time, so corrupted data won't sneak
      in.
      
      This fixes fstests btrfs/194.
      
      Reported-by: Anand Jain <anand.jain@oracle.com>
      Fixes: ab4ba2e1 ("btrfs: tree-checker: Verify dev item")
      CC: stable@vger.kernel.org # 5.2+
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  21. 09 Sep, 2019 1 commit
    • btrfs: Detect unbalanced tree with empty leaf before crashing btree operations · 62fdaa52
      Committed by Qu Wenruo
      [BUG]
      With a crafted image, btrfs will panic at btree operations:
      
        kernel BUG at fs/btrfs/ctree.c:3894!
        invalid opcode: 0000 [#1] SMP PTI
        CPU: 0 PID: 1138 Comm: btrfs-transacti Not tainted 5.0.0-rc8+ #9
        RIP: 0010:__push_leaf_left+0x6b6/0x6e0
        RSP: 0018:ffffc0bd4128b990 EFLAGS: 00010246
        RAX: 0000000000000000 RBX: ffffa0a4ab8f0e38 RCX: 0000000000000000
        RDX: ffffa0a280000000 RSI: 0000000000000000 RDI: ffffa0a4b3814000
        RBP: ffffc0bd4128ba38 R08: 0000000000001000 R09: ffffc0bd4128b948
        R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000240
        R13: ffffa0a4b556fb60 R14: ffffa0a4ab8f0af0 R15: ffffa0a4ab8f0af0
        FS: 0000000000000000(0000) GS:ffffa0a4b7a00000(0000) knlGS:0000000000000000
        CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f2461c80020 CR3: 000000022b32a006 CR4: 00000000000206f0
        Call Trace:
        ? _cond_resched+0x1a/0x50
        push_leaf_left+0x179/0x190
        btrfs_del_items+0x316/0x470
        btrfs_del_csums+0x215/0x3a0
        __btrfs_free_extent.isra.72+0x5a7/0xbe0
        __btrfs_run_delayed_refs+0x539/0x1120
        btrfs_run_delayed_refs+0xdb/0x1b0
        btrfs_commit_transaction+0x52/0x950
        ? start_transaction+0x94/0x450
        transaction_kthread+0x163/0x190
        kthread+0x105/0x140
        ? btrfs_cleanup_transaction+0x560/0x560
        ? kthread_destroy_worker+0x50/0x50
        ret_from_fork+0x35/0x40
        Modules linked in:
        ---[ end trace c2425e6e89b5558f ]---
      
      [CAUSE]
      The offending csum tree looks like this:
      
        checksum tree key (CSUM_TREE ROOT_ITEM 0)
        node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
      	  ...
      	  key (EXTENT_CSUM EXTENT_CSUM 85975040) block 29630464 gen 17
      	  key (EXTENT_CSUM EXTENT_CSUM 89911296) block 29642752 gen 17 <<<
      	  key (EXTENT_CSUM EXTENT_CSUM 92274688) block 29646848 gen 17
      	  ...
      
        leaf 29630464 items 6 free space 1 generation 17 owner CSUM_TREE
      	  item 0 key (EXTENT_CSUM EXTENT_CSUM 85975040) itemoff 3987 itemsize 8
      		  range start 85975040 end 85983232 length 8192
      	  ...
        leaf 29642752 items 0 free space 3995 generation 17 owner 0
      		      ^ empty leaf            invalid owner ^
      
        leaf 29646848 items 1 free space 602 generation 17 owner CSUM_TREE
      	  item 0 key (EXTENT_CSUM EXTENT_CSUM 92274688) itemoff 627 itemsize 3368
      		  range start 92274688 end 95723520 length 3448832
      
      So we have a corrupted csum tree where one tree leaf is completely
      empty, causing an unbalanced btree and thus leading to an unexpected
      btree balance error.
      
      [FIX]
      For this particular case, we handle it in two directions to catch it:
      - Check if the tree block is empty through btrfs_verify_level_key(),
        so that invalid tree blocks won't be read out through
        btrfs_search_slot() and its variants.
      
      - Check for 0 tree owner in tree checker.
        No tree uses 0 as its tree owner, so detect it and reject it at
        tree block read time.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202821
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>