    btrfs: fix u32 overflows when left shifting stripe_nr · a7299a18
    Qu Wenruo committed
    [BUG]
    David reported that an ASSERT() is triggered during an fio load on 8
    devices with data/raid6 and metadata/raid1c3:
    
      fio --rw=randrw --randrepeat=1 --size=3000m \
    	  --bsrange=512b-64k --bs_unaligned \
    	  --ioengine=libaio --fsync=1024 \
    	  --name=job0 --name=job1 \
    
    The ASSERT() is from rbio_add_bio() of raid56.c:
    
    	ASSERT(orig_logical >= full_stripe_start &&
    	       orig_logical + orig_len <= full_stripe_start +
    	       rbio->nr_data * BTRFS_STRIPE_LEN);
    
    This checks whether the target rbio crosses the full stripe boundary.
    
      [100.789] assertion failed: orig_logical >= full_stripe_start && orig_logical + orig_len <= full_stripe_start + rbio->nr_data * BTRFS_STRIPE_LEN, in fs/btrfs/raid56.c:1622
      [100.795] ------------[ cut here ]------------
      [100.796] kernel BUG at fs/btrfs/raid56.c:1622!
      [100.797] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
      [100.798] CPU: 1 PID: 100 Comm: kworker/u8:4 Not tainted 6.4.0-rc6-default+ #124
      [100.799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
      [100.802] Workqueue: writeback wb_workfn (flush-btrfs-1)
      [100.803] RIP: 0010:rbio_add_bio+0x204/0x210 [btrfs]
      [100.806] RSP: 0018:ffff888104a8f300 EFLAGS: 00010246
      [100.808] RAX: 00000000000000a1 RBX: ffff8881075907e0 RCX: ffffed1020951e01
      [100.809] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000001
      [100.811] RBP: 0000000141d20000 R08: 0000000000000001 R09: ffff888104a8f04f
      [100.813] R10: ffffed1020951e09 R11: 0000000000000003 R12: ffff88810e87f400
      [100.815] R13: 0000000041d20000 R14: 0000000144529000 R15: ffff888101524000
      [100.817] FS:  0000000000000000(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000
      [100.821] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [100.822] CR2: 000055d54e44c270 CR3: 000000010a9a1006 CR4: 00000000003706a0
      [100.824] Call Trace:
      [100.825]  <TASK>
      [100.825]  ? die+0x32/0x80
      [100.826]  ? do_trap+0x12d/0x160
      [100.827]  ? rbio_add_bio+0x204/0x210 [btrfs]
      [100.827]  ? rbio_add_bio+0x204/0x210 [btrfs]
      [100.829]  ? do_error_trap+0x90/0x130
      [100.830]  ? rbio_add_bio+0x204/0x210 [btrfs]
      [100.831]  ? handle_invalid_op+0x2c/0x30
      [100.833]  ? rbio_add_bio+0x204/0x210 [btrfs]
      [100.835]  ? exc_invalid_op+0x29/0x40
      [100.836]  ? asm_exc_invalid_op+0x16/0x20
      [100.837]  ? rbio_add_bio+0x204/0x210 [btrfs]
      [100.837]  raid56_parity_write+0x64/0x270 [btrfs]
      [100.838]  btrfs_submit_chunk+0x26e/0x800 [btrfs]
      [100.840]  ? btrfs_bio_init+0x80/0x80 [btrfs]
      [100.841]  ? release_pages+0x503/0x6d0
      [100.842]  ? folio_unlock+0x2f/0x60
      [100.844]  ? __folio_put+0x60/0x60
      [100.845]  ? btrfs_do_readpage+0xae0/0xae0 [btrfs]
      [100.847]  btrfs_submit_bio+0x21/0x60 [btrfs]
      [100.847]  submit_one_bio+0x6a/0xb0 [btrfs]
      [100.849]  extent_write_cache_pages+0x395/0x680 [btrfs]
      [100.850]  ? __extent_writepage+0x520/0x520 [btrfs]
      [100.851]  ? mark_usage+0x190/0x190
      [100.852]  extent_writepages+0xdb/0x130 [btrfs]
      [100.853]  ? extent_write_locked_range+0x480/0x480 [btrfs]
      [100.854]  ? mark_usage+0x190/0x190
      [100.854]  ? attach_extent_buffer_page+0x220/0x220 [btrfs]
      [100.855]  ? reacquire_held_locks+0x178/0x280
      [100.856]  ? writeback_sb_inodes+0x245/0x7f0
      [100.857]  do_writepages+0x102/0x2e0
      [100.858]  ? page_writeback_cpu_online+0x10/0x10
      [100.859]  ? __lock_release.isra.0+0x14a/0x4d0
      [100.860]  ? reacquire_held_locks+0x280/0x280
      [100.861]  ? __lock_acquired+0x1e9/0x3d0
      [100.862]  ? do_raw_spin_lock+0x1b0/0x1b0
      [100.863]  __writeback_single_inode+0x94/0x450
      [100.864]  writeback_sb_inodes+0x372/0x7f0
      [100.864]  ? lock_sync+0xd0/0xd0
      [100.865]  ? do_raw_spin_unlock+0x93/0xf0
      [100.866]  ? sync_inode_metadata+0xc0/0xc0
      [100.867]  ? rwsem_optimistic_spin+0x340/0x340
      [100.868]  __writeback_inodes_wb+0x70/0x130
      [100.869]  wb_writeback+0x2d1/0x530
      [100.869]  ? __writeback_inodes_wb+0x130/0x130
      [100.870]  ? lockdep_hardirqs_on_prepare.part.0+0xf1/0x1c0
      [100.870]  wb_do_writeback+0x3eb/0x480
      [100.871]  ? wb_writeback+0x530/0x530
      [100.871]  ? mark_lock_irq+0xcd0/0xcd0
      [100.871]  wb_workfn+0xe0/0x3f0
    
    [CAUSE]
    Commit a97699d1 ("btrfs: replace map_lookup->stripe_len by
    BTRFS_STRIPE_LEN") changed how we calculate the map length, to reduce
    u64 divisions.
    
    The function btrfs_max_io_len() returns the length up to the stripe
    boundary.
    
    It calculates the full stripe start offset (inside the chunk) by the
    following code:
    
    		*full_stripe_start =
    			rounddown(*stripe_nr, nr_data_stripes(map)) <<
    			BTRFS_STRIPE_LEN_SHIFT;
    
    The calculation itself is fine, but the type of the value returned by
    rounddown() depends on both @stripe_nr (which is u32) and
    nr_data_stripes() (which returns int).
    
    Thus the result is also u32, and the left shift is then done in 32
    bits, which can overflow u32.
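
    The truncation can be reproduced in userspace. The sketch below is
    only an illustration: stripe_offset_u32() and STRIPE_LEN_SHIFT are
    hypothetical names, and the shift of 16 assumes the 64 KiB
    BTRFS_STRIPE_LEN.

```c
#include <stdint.h>

/* Assumption: mirrors BTRFS_STRIPE_LEN_SHIFT for a 64 KiB stripe length. */
#define STRIPE_LEN_SHIFT 16

/*
 * Hypothetical helper showing the buggy pattern: the operand is 32 bits
 * wide, so the shift is evaluated in 32 bits and the high bits are lost
 * before the result is widened to 64 bits for the return value.
 */
static uint64_t stripe_offset_u32(uint32_t stripe_nr)
{
	return stripe_nr << STRIPE_LEN_SHIFT;
}
```

    With these assumptions, stripe number 0x10000 (a 4 GiB offset inside
    the chunk) comes back as 0 instead of 0x100000000.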
    
    If such an overflow happens, @full_stripe_start will be far smaller
    than @offset, causing the later "full_stripe_len - (offset -
    *full_stripe_start)" to underflow. The resulting length calculation
    then has no stripe boundary limit, so a write bio can exceed the
    stripe boundary.
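
    To see how the underflow removes the limit, here is a minimal sketch
    of the quoted subtraction. len_to_full_stripe_end() is a hypothetical
    function, not the kernel's code, and the values in the usage note are
    made up for a layout with 6 data stripes.

```c
#include <stdint.h>

/* Assumption: 64 KiB stripe length, as with BTRFS_STRIPE_LEN. */
#define STRIPE_LEN (1ULL << 16)

/*
 * Hypothetical helper: bytes from @offset to the end of the full stripe
 * that starts at @full_stripe_start. All arithmetic is unsigned 64-bit,
 * so a too-small @full_stripe_start makes the subtraction wrap around.
 */
static uint64_t len_to_full_stripe_end(uint64_t offset,
				       uint64_t full_stripe_start,
				       uint32_t nr_data)
{
	uint64_t full_stripe_len = (uint64_t)nr_data * STRIPE_LEN;

	return full_stripe_len - (offset - full_stripe_start);
}
```

    For offset 0x100021000 and the correct start 0x100020000, the result
    is 0x5f000, which stays inside the 0x60000-byte full stripe. With the
    truncated start 0x20000, the result wraps to a value far above
    0x60000, so it no longer limits the bio length.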
    
    There are some other locations like this, where a u32 @stripe_nr is
    left shifted, which can lead to a similar overflow.
    
    [FIX]
    Fix all left shifts of @stripe_nr by adding a type cast to u64 before
    the left shift.
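
    A userspace sketch of the cast-before-shift pattern follows. The names
    round_down_u32() and full_stripe_start_fixed() are hypothetical
    stand-ins (the former for the kernel's rounddown() macro), and the
    shift of 16 again assumes a 64 KiB stripe length.

```c
#include <stdint.h>

/* Assumption: mirrors BTRFS_STRIPE_LEN_SHIFT for a 64 KiB stripe length. */
#define STRIPE_LEN_SHIFT 16

/* Hypothetical stand-in for rounddown(): round @x down to a multiple of @y. */
static uint32_t round_down_u32(uint32_t x, uint32_t y)
{
	return x - (x % y);
}

/*
 * The stripe number still fits in 32 bits, but it is widened to 64 bits
 * before the shift, so the byte offset keeps its high bits.
 */
static uint64_t full_stripe_start_fixed(uint32_t stripe_nr, uint32_t nr_data)
{
	return (uint64_t)round_down_u32(stripe_nr, nr_data) << STRIPE_LEN_SHIFT;
}
```

    With 6 data stripes, stripe number 65540 rounds down to 65538, and the
    widened shift yields 0x100020000 rather than the truncated 0x20000.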
    
    The @stripe_nr and similar variables involved record the stripe
    number inside the chunk, which is small enough to fit into u32, but
    the corresponding offset inside the chunk can not fit into u32.
    
    Thus those specific left shifts need a type cast to u64. To keep the
    fix minimal, this patch only adds the casts and does not touch the
    variables themselves; the code will be cleaned up in the future.
    
    Reported-by: David Sterba <dsterba@suse.com>
    Fixes: a97699d1 ("btrfs: replace map_lookup->stripe_len by BTRFS_STRIPE_LEN")
    Tested-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>