1. 27 October 2021 (7 commits)
    • btrfs: handle errors properly inside btrfs_submit_compressed_read() · 86ccbb4d
      Committed by Qu Wenruo
      There are quite a few BUG_ON()s inside btrfs_submit_compressed_read();
      notably, all errors inside the for() loop rely on BUG_ON() to handle
      -ENOMEM.
      
      Handle these errors properly by:
      
      - Wait for submitted bios to finish first
        Use the wait_var_event()/wake_up_var() APIs to wait without
        introducing extra memory overhead inside compressed_bio.
        This allows us to wait for any submitted bio to finish, while still
        keeping the compressed_bio from being freed.
      
      - Introduce finish_compressed_bio_read() to finish the compressed_bio
      
      - Properly end the bio and finish compressed_bio when error happens
      
      Now in btrfs_submit_compressed_read() even when the bio submission
      failed, we can properly handle the error without triggering BUG_ON().
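      The drain-then-release scheme described above can be sketched as a
      userspace analog (illustrative names and a simulated wait loop, not the
      kernel code): every submitted unit holds one count, a failed submission
      stops the loop, and the context is only finished after all in-flight
      units have drained to zero.

      ```c
      #include <assert.h>
      #include <stdatomic.h>

      /* Userspace analog (NOT kernel code) of the error path above. */
      struct cbio_sim {
          atomic_int in_flight;  /* analog of submitted-but-unfinished bios */
          int finished;          /* set once, only when nothing is in flight */
      };

      static void unit_endio(struct cbio_sim *cb)
      {
          atomic_fetch_sub(&cb->in_flight, 1);  /* one unit completed */
      }

      /* Returns 0 on success, -1 when submission fails at unit @fail_at. */
      static int submit_units(struct cbio_sim *cb, int nr, int fail_at)
      {
          int err = 0;

          for (int i = 0; i < nr; i++) {
              if (i == fail_at) {       /* e.g. -ENOMEM from a submit call */
                  err = -1;
                  break;
              }
              atomic_fetch_add(&cb->in_flight, 1);
          }
          /* wait (here: simulate completions) until submitted units finish */
          while (atomic_load(&cb->in_flight) > 0)
              unit_endio(cb);
          /* ...only then is it safe to finish the context, error or not */
          cb->finished = 1;
          return err;
      }
      ```

      Either way the invariant holds: the context is finished exactly once,
      and never while anything is still in flight.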
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: subpage: add bitmap for PageChecked flag · e4f94347
      Committed by Qu Wenruo
      Although btrfs has very limited usage of the PageChecked flag, it is
      still a page flag that is not yet subpage compatible.

      Fix it by introducing btrfs_subpage::checked_offset to do the
      conversion.
      
      Most call sites, especially the free-space cache, COW fixup and
      btrfs_invalidatepage(), all work in full page mode anyway.

      The other call sites work in subpage compatible mode.
      
      Some call sites need extra modification:
      
      - btrfs_drop_pages()
        Needs an extra parameter to get the real range in which we need to
        clear the checked flag.

        Also, since btrfs_drop_pages() will accept pages beyond the dirtied
        range, update btrfs_subpage_clamp_range() to handle such a case by
        setting @len to 0 if the page is beyond the target range.
      
      - btrfs_invalidatepage()
        We need to call the subpage helper before calling
        __btrfs_releasepage(), or it will trigger an ASSERT() as
        page->private will already be cleared.

      - btrfs_verify_data_csum()
        In theory we don't need the io_bio->csum check anymore, but it won't
        hurt.  Just change the comment.
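      The range clamping described in the btrfs_drop_pages() item above can
      be sketched as follows (a hypothetical userspace helper with a
      simplified signature, not the btrfs function): clamp the target range
      to one page, and report @len == 0 when the page lies entirely outside
      the range instead of producing a bogus length.

      ```c
      #include <assert.h>

      #define PAGE_SIZE 65536UL  /* assume a 64K page for illustration */

      /* Clamp [*start, *start + *len) to the page starting at page_start. */
      static void clamp_range(unsigned long page_start,
                              unsigned long *start, unsigned long *len)
      {
          unsigned long page_end = page_start + PAGE_SIZE;
          unsigned long end = *start + *len;

          if (*start >= page_end || end <= page_start) {
              *len = 0;               /* page is beyond the target range */
              return;
          }
          if (*start < page_start)
              *start = page_start;
          if (end > page_end)
              end = page_end;
          *len = end - *start;
      }
      ```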
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: introduce compressed_bio::pending_sectors to trace compressed bio · 6ec9765d
      Committed by Qu Wenruo
      For btrfs_submit_compressed_read() and btrfs_submit_compressed_write(),
      we have a pretty weird dance around compressed_bio::pending_bios:
      
        btrfs_submit_compressed_read/write()
        {
      	cb = kmalloc()
      	refcount_set(&cb->pending_bios, 0);
      	bio = btrfs_alloc_bio();
      
      	/* NOTE here, we haven't yet submitted any bio */
      	refcount_set(&cb->pending_bios, 1);
      
      	for (pg_index = 0; pg_index < cb->nr_pages; pg_index++) {
      		if (submit) {
      			/* Here we submit bio, but we always have one
      			 * extra pending_bios */
      			refcount_inc(&cb->pending_bios);
      			ret = btrfs_map_bio();
      		}
      	}
      
      	/* Submit the last bio */
      	ret = btrfs_map_bio();
        }
      
      There are two reasons why we do this:
      
      - compressed_bio::pending_bios is a refcount
        Thus once it drops to 0, it cannot be increased again.

      - To ensure the compressed_bio is not freed by some submitted bio
        Without the extra count, a submitted bio that finished before the
        next bio was submitted could free the compressed_bio prematurely.
      
      But the above code is confusing, and we can do better by introducing a
      new member, compressed_bio::pending_sectors.

      Now we use compressed_bio::pending_sectors to indicate whether we have
      any pending sectors under IO or not yet submitted.

      If pending_sectors == 0, we're definitely at the last bio of the
      compressed_bio, and it is OK to release it.
      
      Now the workflow looks like this:
      
        btrfs_submit_compressed_read/write()
        {
      	cb = kmalloc()
      	atomic_set(&cb->pending_bios, 0);
      	refcount_set(&cb->pending_sectors,
      		     compressed_len >> sectorsize_bits);
      	bio = btrfs_alloc_bio();
      
      	for (pg_index = 0; pg_index < cb->nr_pages; pg_index++) {
      		if (submit) {
      			refcount_inc(&cb->pending_bios);
      			ret = btrfs_map_bio();
      		}
      	}
      
      	/* Submit the last bio */
      	refcount_inc(&cb->pending_bios);
      	ret = btrfs_map_bio();
        }
      
      For now we still need pending_bios for later error handling, but will
      remove pending_bios eventually after properly handling the errors.
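      The pending_sectors lifecycle described above can be sketched in a
      userspace analog (assumed names, single-threaded, not the kernel
      code): the count starts at the total number of sectors, each finished
      sector drops it by one, and whoever drops it to zero releases the
      context.

      ```c
      #include <assert.h>

      struct cbio {
          unsigned long pending_sectors;
          int released;
      };

      static void cbio_init(struct cbio *cb, unsigned long compressed_len,
                            unsigned int sectorsize_bits)
      {
          /* mirrors: refcount_set(&cb->pending_sectors,
           *                       compressed_len >> sectorsize_bits); */
          cb->pending_sectors = compressed_len >> sectorsize_bits;
          cb->released = 0;
      }

      /* Called once per completed sector; returns 1 for the last caller. */
      static int cbio_sector_done(struct cbio *cb)
      {
          if (--cb->pending_sectors == 0) {
              /* safe: no sector is still under IO or awaiting submission */
              cb->released = 1;
              return 1;
          }
          return 0;
      }
      ```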
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: subpage: make add_ra_bio_pages() compatible · 6a404910
      Committed by Qu Wenruo
      [BUG]
      If we remove the subpage limitation in add_ra_bio_pages(), then read a
      compressed extent which has part of its range in the next page, like
      the following inode layout:
      
      	0	32K	64K	96K	128K
      	|<--------------|-------------->|
      
      Btrfs will trigger ASSERT() in endio function:
      
        assertion failed: atomic_read(&subpage->readers) >= nbits
        ------------[ cut here ]------------
        kernel BUG at fs/btrfs/ctree.h:3431!
        Internal error: Oops - BUG: 0 [#1] SMP
        Workqueue: btrfs-endio btrfs_work_helper [btrfs]
        Call trace:
         assertfail.constprop.0+0x28/0x2c [btrfs]
         btrfs_subpage_end_reader+0x148/0x14c [btrfs]
         end_page_read+0x8c/0x100 [btrfs]
         end_bio_extent_readpage+0x320/0x6b0 [btrfs]
         bio_endio+0x15c/0x1dc
         end_workqueue_fn+0x44/0x64 [btrfs]
         btrfs_work_helper+0x74/0x250 [btrfs]
         process_one_work+0x1d4/0x47c
         worker_thread+0x180/0x400
         kthread+0x11c/0x120
         ret_from_fork+0x10/0x30
        ---[ end trace c8b7b552d3bb408c ]---
      
      [CAUSE]
      When we read the page range [0, 64K), we find it's a compressed extent,
      and we will try to add extra pages in add_ra_bio_pages() to avoid
      reading the same compressed extent.
      
      But when we add such a page into the read bio, it doesn't follow the
      behavior of btrfs_do_readpage() and properly set subpage::readers.

      This means that for page [64K, 128K), its subpage::readers is still 0.

      And when endio is executed on both pages, since page [64K, 128K) has 0
      subpage::readers, it triggers the above ASSERT().
      
      [FIX]
      Function add_ra_bio_pages() is far from subpage compatible; it always
      assumes PAGE_SIZE == sectorsize, thus when it skips to the next range
      it just skips PAGE_SIZE bytes.
      
      Make it subpage compatible by:
      
      - Skip to the next page properly when needed
        If we find the page is already present in the page cache, we need to
        skip to the next page.  For that case, we shouldn't just skip
        PAGE_SIZE bytes, but use @pg_index to calculate the next bytenr and
        continue.
      
      - Only add the page range covered by current extent map
        We need to calculate which range is covered by current extent map and
        only add that part into the read bio.
      
      - Update subpage::readers before submitting the bio
      
      - Use proper cursor other than confusing @last_offset
      
      - Calculate the missed threshold based on sector size
        It no longer counts missed pages, as for 64K page size we have at
        most 3 pages to skip (if aligned, only 2 pages).
      
      - Add ASSERT() to make sure our bytenr is always aligned
      
      - Add comment for the function
        Add a special note for subpage case, as the function won't really
        work well for subpage cases.
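      The "skip to the next page" arithmetic described above can be shown
      with a small illustrative helper (hypothetical, not the btrfs code):
      compute the page index from the current bytenr and jump to the first
      byte of the following page, instead of blindly adding PAGE_SIZE, which
      on a 64K-page/4K-sector system would land mid-page again.

      ```c
      #include <assert.h>

      #define PAGE_SHIFT_64K 16
      #define PAGE_SIZE_64K  (1UL << PAGE_SHIFT_64K)

      /* First bytenr of the page after the one containing @bytenr. */
      static unsigned long next_page_bytenr(unsigned long bytenr)
      {
          unsigned long pg_index = bytenr >> PAGE_SHIFT_64K;

          return (pg_index + 1) << PAGE_SHIFT_64K;
      }
      ```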
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: remove unused parameter nr_pages in add_ra_bio_pages() · cd9255be
      Committed by Qu Wenruo
      Variable @nr_pages only gets increased but is never used.  Remove it.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: rename struct btrfs_io_bio to btrfs_bio · c3a3b19b
      Committed by Qu Wenruo
      Previously we had "struct btrfs_bio", which records IO context for
      mirrored IO and RAID56, and "struct btrfs_io_bio", which records extra
      btrfs specific info for logical bytenr bios.

      With "btrfs_bio" renamed to "btrfs_io_context", we are safe to rename
      "btrfs_io_bio" to "btrfs_bio", which is a more suitable name now.
      
      The struct btrfs_bio changes meaning by this commit. There was a
      suggested name like btrfs_logical_bio but it's a bit long and we'd
      prefer to use a shorter name.
      
      This could be a concern for backports to older kernels where the
      different meaning could possibly cause confusion or bugs. Comparing the
      new and old structures, there's no overlap among the struct members so a
      build would break in case of incorrect backport.
      
      We haven't had many backports to bio code anyway so this is more of a
      theoretical cause of bugs and a matter of precaution but we'll need to
      keep the semantic change in mind.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: remove btrfs_bio_alloc() helper · cd8e0cca
      Committed by Qu Wenruo
      The helper btrfs_bio_alloc() is almost the same as btrfs_io_bio_alloc(),
      except it's allocating using BIO_MAX_VECS as @nr_iovecs, and initializes
      bio->bi_iter.bi_sector.
      
      However the naming itself is not using "btrfs_io_bio" to indicate that
      its parameter is "struct btrfs_io_bio", and it can be easily confused
      with "struct btrfs_bio".

      Considering that assigning bio->bi_iter.bi_sector is such simple work
      and there are already tons of call sites doing that manually, there is
      no need to do it in a helper.

      Remove the btrfs_bio_alloc() helper, and enhance btrfs_io_bio_alloc()
      to provide a fail-safe value for its @nr_iovecs.
      
      And then replace all btrfs_bio_alloc() callers with
      btrfs_io_bio_alloc().
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  2. 23 August 2021 (5 commits)
    • btrfs: rework btrfs_decompress_buf2page() · 1c3dc173
      Committed by Qu Wenruo
      There are several bugs inside the function btrfs_decompress_buf2page()
      
      - @start_byte doesn't take bvec.bv_offset into consideration
        Thus it can't handle the case where the target range is not page
        aligned.
      
      - Too many helper variables
        There are tons of helper variables, @buf_offset, @current_buf_start,
        @start_byte, @prev_start_byte, @working_bytes, @bytes.
        This hurts anyone who wants to read the function.
      
      - No obvious main cursor for the iteration
        A new problem caused by the previous one.

      - Comments for the parameter list make no sense
        Like @buf_start: is it the offset into @buf, or the offset inside
        the full decompressed extent? (Spoiler alert: the latter.)
        And @total_out acts more like @buf_start + @size_of_buf.

        The worst is @disk_start.
        Its real meaning is the file offset of the full decompressed
        extent.
      
      This patch will rework the whole function by:
      
      - Add a proper comment with ASCII art to explain the parameter list
      
      - Rework parameter list
        The old @buf_start is renamed to @decompressed, to show how many bytes
        are already decompressed inside the full decompressed extent.
        The old @total_out is replaced by @buf_len, which is the decompressed
        data size.
        For old @disk_start and @bio, just pass @compressed_bio in.
      
      - Use single main cursor
        The main cursor will be @cur_file_offset, to show what's the current
        file offset.
        Other helper variables will be declared inside the main loop, and only
        minimal amount of helper variables:
        * offset_inside_decompressed_buf:	The only real helper
        * copy_start_file_offset:		File offset we start memcpy
        * bvec_file_offset:			File offset of current bvec
      
      Even with all these extensive comments, the final function is still
      smaller than the original function, which is definitely a win.
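      The single-cursor scheme above can be sketched with a small overlap
      computation (illustrative helper using the renamed parameters from the
      description; assumed behavior, not the btrfs function): given how much
      of the extent is already decompressed and a destination file range,
      find where in the buffer the copy starts and how long it is.

      ```c
      #include <assert.h>

      struct copy_plan {
          unsigned long offset_inside_buf; /* offset into the decompressed buf */
          unsigned long copy_len;          /* bytes to memcpy, 0 if no overlap */
      };

      /* @decompressed: bytes already decompressed before this buffer, i.e.
       * the file offset of buf[0] inside the full decompressed extent.
       * Assumes the cursor @cur_file_offset never lags behind @decompressed. */
      static struct copy_plan plan_copy(unsigned long decompressed,
                                        unsigned long buf_len,
                                        unsigned long cur_file_offset,
                                        unsigned long dest_end)
      {
          struct copy_plan p = {0, 0};
          unsigned long buf_start = decompressed;
          unsigned long buf_end = decompressed + buf_len;

          if (cur_file_offset >= buf_end || dest_end <= buf_start)
              return p;                    /* no overlap with this buffer */
          p.offset_inside_buf = cur_file_offset - buf_start;
          p.copy_len = (dest_end < buf_end ? dest_end : buf_end)
                       - cur_file_offset;
          return p;
      }
      ```

      The main loop then just advances the cursor by copy_len after each
      memcpy, which is what removes the pile of helper variables.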
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: grab correct extent map for subpage compressed extent read · 557023ea
      Committed by Qu Wenruo
      [BUG]
      When subpage compressed read-write support is enabled, btrfs/038
      always fails with EIO.
      
      A simplified script can easily trigger the problem:
      
        mkfs.btrfs -f -s 4k $dev
        mount $dev $mnt -o compress=lzo
      
        xfs_io -f -c "truncate 118811" $mnt/foo
        xfs_io -c "pwrite -S 0x0d -b 39987 92267 39987" $mnt/foo > /dev/null
      
        sync
        btrfs subvolume snapshot -r $mnt $mnt/mysnap1
      
        xfs_io -c "pwrite -S 0x3e -b 80000 200000 80000" $mnt/foo > /dev/null
        sync
      
        xfs_io -c "pwrite -S 0xdc -b 10000 250000 10000" $mnt/foo > /dev/null
        xfs_io -c "pwrite -S 0xff -b 10000 300000 10000" $mnt/foo > /dev/null
      
        sync
        btrfs subvolume snapshot -r $mnt $mnt/mysnap2
      
        cat $mnt/mysnap2/foo
        # Above cat will fail due to EIO
      
      [CAUSE]
      The problem is in btrfs_submit_compressed_read().
      
      When it tries to grab the extent map of the read range, it uses the
      following call:
      
      	em = lookup_extent_mapping(em_tree,
        				   page_offset(bio_first_page_all(bio)),
      				   fs_info->sectorsize);
      
      The problem is in the page_offset(bio_first_page_all(bio)) part.
      
      The offending inode has the following file extent layout
      
              item 10 key (257 EXTENT_DATA 131072) itemoff 15639 itemsize 53
                      generation 8 type 1 (regular)
                      extent data disk byte 13680640 nr 4096
                      extent data offset 0 nr 4096 ram 4096
                      extent compression 0 (none)
              item 11 key (257 EXTENT_DATA 135168) itemoff 15586 itemsize 53
                      generation 8 type 1 (regular)
                      extent data disk byte 0 nr 0
              item 12 key (257 EXTENT_DATA 196608) itemoff 15533 itemsize 53
                      generation 8 type 1 (regular)
                      extent data disk byte 13676544 nr 4096
                      extent data offset 0 nr 53248 ram 86016
                      extent compression 2 (lzo)
      
      And the bio passed in has the following parameters:
      
      page_offset(bio_first_page_all(bio))	= 131072
      bio_first_bvec_all(bio)->bv_offset	= 65536
      
      If we use page_offset(bio_first_page_all(bio)) without adding
      bv_offset, we will get an extent map for file offset 131072, not
      196608.

      This means we read uncompressed data from disk, and the later
      decompression will definitely fail.
      
      [FIX]
      Take bv_offset into consideration when trying to grab an extent map.
      
      And add an ASSERT() to ensure we're really getting a compressed extent.
      
      Thankfully this won't affect anything but subpage, thus we only need
      to ensure this patch gets merged before we enable basic subpage
      support.
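      The fix reduces to one addition, which can be checked against the
      numbers quoted above (a trivial illustrative helper, not the btrfs
      code): the extent map lookup offset must include the first bvec's
      offset within its page.

      ```c
      #include <assert.h>

      /* File offset to use for the extent map lookup: page offset of the
       * bio's first page plus the first bvec's offset inside that page. */
      static unsigned long em_lookup_offset(unsigned long page_offset,
                                            unsigned int bv_offset)
      {
          return page_offset + bv_offset;
      }
      ```

      With the values from the bug report, 131072 + 65536 = 196608, which is
      exactly the key of item 12 (the lzo-compressed extent), not item 10.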
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: disable compressed readahead for subpage · ca62e85d
      Committed by Qu Wenruo
      For current subpage support, we only support 64K page size with 4K
      sector size.
      
      This makes compressed readahead less effective, as the maximum
      compressed extent size is only 128K, just 2x the page size.

      On the other hand, the function add_ra_bio_pages() still assumes
      sectorsize == PAGE_SIZE, and a code change may affect 4K page size
      systems.

      So let's disable compressed readahead for subpage for now.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: compression: drop kmap/kunmap from generic helpers · 4c2bf276
      Committed by David Sterba
      The pages in compressed_pages are not from highmem anymore so we can
      drop the mapping for checksum calculation and inline extent.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: drop from __GFP_HIGHMEM all allocations · b0ee5e1e
      Committed by David Sterba
      The highmem flag is used for allocating pages for compression and for
      raid56 pages. High memory makes sense on 32bit systems but is not
      without problems. On 64bit systems it's just another layer of wrappers.
      
      The time the pages are allocated for compression or raid56 is relatively
      short (about a transaction commit), so the pages are not blocked
      indefinitely. As the number of pages depends on the amount of data being
      written/read, there's a theoretical problem. A fast device on a 32bit
      system could use most of the low memory pool, while with the highmem
      allocation that would not happen. This was possibly the original idea
      long time ago, but nowadays we optimize for 64bit systems.
      
      This patch removes all usage of the __GFP_HIGHMEM flag for page
      allocation, the kmap/kunmap are still in place and will be removed in
      followup patches. Remaining is masking out the bit in
      alloc_extent_state and __lookup_free_space_inode, that can safely stay.
      Signed-off-by: David Sterba <dsterba@suse.com>
  3. 29 July 2021 (1 commit)
  4. 22 June 2021 (1 commit)
  5. 21 June 2021 (7 commits)
  6. 28 May 2021 (1 commit)
    • btrfs: fix compressed writes that cross stripe boundary · 4c80a97d
      Committed by Qu Wenruo
      [BUG]
      When running btrfs/027 with "-o compress" mount option, it always
      crashes with the following call trace:
      
        BTRFS critical (device dm-4): mapping failed logical 298901504 bio len 12288 len 8192
        ------------[ cut here ]------------
        kernel BUG at fs/btrfs/volumes.c:6651!
        invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
        CPU: 5 PID: 31089 Comm: kworker/u24:10 Tainted: G           OE     5.13.0-rc2-custom+ #26
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        Workqueue: btrfs-delalloc btrfs_work_helper [btrfs]
        RIP: 0010:btrfs_map_bio.cold+0x58/0x5a [btrfs]
        Call Trace:
         btrfs_submit_compressed_write+0x2d7/0x470 [btrfs]
         submit_compressed_extents+0x3b0/0x470 [btrfs]
         ? mark_held_locks+0x49/0x70
         btrfs_work_helper+0x131/0x3e0 [btrfs]
         process_one_work+0x28f/0x5d0
         worker_thread+0x55/0x3c0
         ? process_one_work+0x5d0/0x5d0
         kthread+0x141/0x160
         ? __kthread_bind_mask+0x60/0x60
         ret_from_fork+0x22/0x30
        ---[ end trace 63113a3a91f34e68 ]---
      
      [CAUSE]
      The critical message before the crash means we have a bio at logical
      bytenr 298901504 length 12288, but only 8192 bytes can fit into one
      stripe, the remaining 4096 bytes go to another stripe.
      
      In btrfs, all bios are properly split to avoid cross stripe boundary,
      but commit 764c7c9a ("btrfs: zoned: fix parallel compressed writes")
      changed the behavior for compressed writes.
      
      Previously, if we found our new page couldn't be fitted into the
      current stripe, i.e. the "submit == 1" case, we submitted the current
      bio without adding the current page.

        submit = btrfs_bio_fits_in_stripe(page, PAGE_SIZE, bio, 0);

        page->mapping = NULL;
        if (submit || bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
      
      But after the modification, we will add the page no matter whether it
      crosses the stripe boundary, leading to the above crash.

        submit = btrfs_bio_fits_in_stripe(page, PAGE_SIZE, bio, 0);

        if (pg_index == 0 && use_append)
                len = bio_add_zone_append_page(bio, page, PAGE_SIZE, 0);
        else
                len = bio_add_page(bio, page, PAGE_SIZE, 0);

        page->mapping = NULL;
        if (submit || len < PAGE_SIZE) {
      
      [FIX]
      It's no longer possible to revert to the original code style as we have
      two different bio_add_*_page() calls now.
      
      The new fix is to skip the bio_add_*_page() call when @submit is true.

      Also, to avoid @len being used uninitialized, always initialize it to
      zero.

      If @submit is true, @len will not be checked.
      If @submit is not true, @len will be the return value of the
      bio_add_*_page() call.
      Either way, the behavior is still the same as the old code.
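      The fixed control flow can be condensed into a tiny userspace sketch
      (the bio helper is a stub and all names are illustrative): the page is
      only added when we are not about to submit, so @len stays at its
      initialized zero, and is never consulted, in the @submit case.

      ```c
      #include <assert.h>

      #define PAGE_SZ 4096u

      /* stub: pretend the page always fits when we do try to add it */
      static unsigned int stub_bio_add_page(void)
      {
          return PAGE_SZ;
      }

      /* Returns 1 when the current bio must be submitted before this page. */
      static int must_submit_first(int submit)
      {
          unsigned int len = 0;   /* always initialized, as the fix requires */

          if (!submit)
              len = stub_bio_add_page();

          /* when submit is true, len is never meaningfully consulted */
          return submit || len < PAGE_SZ;
      }
      ```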
      Reported-by: Josef Bacik <josef@toxicpanda.com>
      Fixes: 764c7c9a ("btrfs: zoned: fix parallel compressed writes")
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  7. 20 May 2021 (1 commit)
    • btrfs: zoned: fix parallel compressed writes · 764c7c9a
      Committed by Johannes Thumshirn
      When multiple processes write data to the same block group on a
      compressed zoned filesystem, the underlying device could report I/O
      errors and data corruption is possible.
      
      This happens because on a zoned filesystem, compressed data writes
      were sent to the device via a REQ_OP_WRITE instead of a
      REQ_OP_ZONE_APPEND operation. But with REQ_OP_WRITE and parallel
      submission it cannot be guaranteed that the data is always submitted
      aligned to the underlying zone's write pointer.

      The change to using REQ_OP_ZONE_APPEND instead of REQ_OP_WRITE on a
      zoned filesystem is non-intrusive on a regular filesystem or when
      submitting to a conventional zone on a zoned filesystem, as it is
      guarded by btrfs_use_zone_append.
      Reported-by: David Sterba <dsterba@suse.com>
      Fixes: 9d294a68 ("btrfs: zoned: enable to mount ZONED incompat flag")
      CC: stable@vger.kernel.org # 5.12.x: e380adfc: btrfs: zoned: pass start block to btrfs_use_zone_append
      CC: stable@vger.kernel.org # 5.12.x
      Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  8. 06 May 2021 (1 commit)
    • btrfs: use memzero_page() instead of open coded kmap pattern · d048b9c2
      Committed by Ira Weiny
      There are many places where kmap/memset/kunmap patterns occur.
      
      Use the newly lifted memzero_page() to eliminate direct uses of kmap and
      leverage the new core functions use of kmap_local_page().
      
      The development of this patch was aided by the following coccinelle
      script:
      
      // <smpl>
      // SPDX-License-Identifier: GPL-2.0-only
      // Find kmap/memset/kunmap pattern and replace with memset*page calls
      //
      // NOTE: Offsets and other expressions may be more complex than what the script
      // will automatically generate.  Therefore a catchall rule is provided to find
      // the pattern which then must be evaluated by hand.
      //
      // Confidence: Low
      // Copyright: (C) 2021 Intel Corporation
      // URL: http://coccinelle.lip6.fr/
      // Comments:
      // Options:
      
      //
      // Then the memset pattern
      //
      @ memset_rule1 @
      expression page, V, L, Off;
      identifier ptr;
      type VP;
      @@
      
      (
      -VP ptr = kmap(page);
      |
      -ptr = kmap(page);
      |
      -VP ptr = kmap_atomic(page);
      |
      -ptr = kmap_atomic(page);
      )
      <+...
      (
      -memset(ptr, 0, L);
      +memzero_page(page, 0, L);
      |
      -memset(ptr + Off, 0, L);
      +memzero_page(page, Off, L);
      |
      -memset(ptr, V, L);
      +memset_page(page, V, 0, L);
      |
      -memset(ptr + Off, V, L);
      +memset_page(page, V, Off, L);
      )
      ...+>
      (
      -kunmap(page);
      |
      -kunmap_atomic(ptr);
      )
      
      // Remove any pointers left unused
      @
      depends on memset_rule1
      @
      identifier memset_rule1.ptr;
      type VP, VP1;
      @@
      
      -VP ptr;
      	... when != ptr;
      ? VP1 ptr;
      
      //
      // Catch all
      //
      @ memset_rule2 @
      expression page;
      identifier ptr;
      expression GenTo, GenSize, GenValue;
      type VP;
      @@
      
      (
      -VP ptr = kmap(page);
      |
      -ptr = kmap(page);
      |
      -VP ptr = kmap_atomic(page);
      |
      -ptr = kmap_atomic(page);
      )
      <+...
      (
      //
      // Some call sites have complex expressions within the memset/memcpy
      // The follow are catch alls which need to be evaluated by hand.
      //
      -memset(GenTo, 0, GenSize);
      +memzero_pageExtra(page, GenTo, GenSize);
      |
      -memset(GenTo, GenValue, GenSize);
      +memset_pageExtra(page, GenValue, GenTo, GenSize);
      )
      ...+>
      (
      -kunmap(page);
      |
      -kunmap_atomic(ptr);
      )
      
      // Remove any pointers left unused
      @
      depends on memset_rule2
      @
      identifier memset_rule2.ptr;
      type VP, VP1;
      @@
      
      -VP ptr;
      	... when != ptr;
      ? VP1 ptr;
      
      // </smpl>
      
      Link: https://lkml.kernel.org/r/20210309212137.2610186-4-ira.weiny@intel.com
      Signed-off-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Cc: Chris Mason <clm@fb.com>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 20 April 2021 (1 commit)
    • btrfs: handle remount to no compress during compression · 1d8ba9e7
      Committed by Qu Wenruo
      [BUG]
      When running btrfs/071 with inode_need_compress() removed from
      compress_file_range(), we got the following crash:
      
        BUG: kernel NULL pointer dereference, address: 0000000000000018
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        Workqueue: btrfs-delalloc btrfs_work_helper [btrfs]
        RIP: 0010:compress_file_range+0x476/0x7b0 [btrfs]
        Call Trace:
         ? submit_compressed_extents+0x450/0x450 [btrfs]
         async_cow_start+0x16/0x40 [btrfs]
         btrfs_work_helper+0xf2/0x3e0 [btrfs]
         process_one_work+0x278/0x5e0
         worker_thread+0x55/0x400
         ? process_one_work+0x5e0/0x5e0
         kthread+0x168/0x190
         ? kthread_create_worker_on_cpu+0x70/0x70
         ret_from_fork+0x22/0x30
        ---[ end trace 65faf4eae941fa7d ]---
      
      This is already after the patch "btrfs: inode: fix NULL pointer
      dereference if inode doesn't need compression."
      
      [CAUSE]
      @pages is firstly created by kcalloc() in compress_file_extent():
                      pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
      
      Then passed to btrfs_compress_pages() to be utilized there:
      
                      ret = btrfs_compress_pages(...
                                                 pages,
                                                 &nr_pages,
                                                 ...);
      
      btrfs_compress_pages() will initialize each page as output, in
      zlib_compress_pages() we have:
      
                              pages[nr_pages] = out_page;
                              nr_pages++;
      
      Normally this is completely fine, but there is a special case inside
      btrfs_compress_pages() itself:

              switch (type) {
              default:
                      return -E2BIG;
              }

      In this case, we modify neither @pages nor @out_pages, leaving them
      untouched, so when we clean up the pages we can hit the NULL pointer
      dereference again:
      
              if (pages) {
                      for (i = 0; i < nr_pages; i++) {
                              WARN_ON(pages[i]->mapping);
                              put_page(pages[i]);
                      }
              ...
              }
      
      Since pages[i] are all initialized to zero, and btrfs_compress_pages()
      doesn't change them at all, accessing pages[i]->mapping would lead to
      a NULL pointer dereference.
      
      This is not possible with the current kernel, as we check
      inode_need_compress() before doing the pages allocation.
      But if we're going to remove that inode_need_compress() check in
      compress_file_extent(), then it's going to be a problem.
      
      [FIX]
      When btrfs_compress_pages() hits its default case, set @out_pages to 0
      to prevent such a problem from happening.
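      The fix can be sketched with simplified, hypothetical signatures (not
      the btrfs prototypes): the unsupported-type path must report zero
      output pages so the caller's cleanup loop, which walks
      pages[0..nr_pages), never dereferences a NULL entry.

      ```c
      #include <assert.h>

      #define E2BIG_ERR 7

      /* Simplified stand-in for btrfs_compress_pages(): on the default
       * (unsupported type) path, nothing was written into @pages, so the
       * fix is to report *out_pages = 0 before returning the error. */
      static int compress_pages(int type, unsigned long *out_pages)
      {
          switch (type) {
          case 1: /* a supported algorithm would fill pages and *out_pages */
              *out_pages = 4;
              return 0;
          default:
              *out_pages = 0;   /* the fix: no output pages were produced */
              return -E2BIG_ERR;
          }
      }
      ```

      With *out_pages forced to 0, the caller's put_page() loop simply does
      zero iterations instead of touching the zeroed pages array.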
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212331
      CC: stable@vger.kernel.org # 5.10+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  10. 19 April 2021 (1 commit)
    • btrfs: convert kmap to kmap_local_page, simple cases · 58c1a35c
      Committed by Ira Weiny
      Use a simple coccinelle script to help convert the most common
      kmap()/kunmap() patterns to kmap_local_page()/kunmap_local().
      
      Note that some kmaps which were caught by this script needed to be
      handled by hand because of the strict unmapping order of kunmap_local()
      so they are not included in this patch.  But this script got us started.
      
      There's another temp variable added for the final length write to the
      first page so it does not interfere with cpage_out that is used for
      mapping other pages.
      
      The development of this patch was aided by the following script:
      
      // <smpl>
      // SPDX-License-Identifier: GPL-2.0-only
      // Find kmap and replace with kmap_local_page then mark kunmap
      //
      // Confidence: Low
      // Copyright: (C) 2021 Intel Corporation
      // URL: http://coccinelle.lip6.fr/
      
      @ catch_all @
      expression e, e2;
      @@
      
      (
      -kmap(e)
      +kmap_local_page(e)
      )
      ...
      (
      -kunmap(...)
      +kunmap_local()
      )
      
      // </smpl>
      Signed-off-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  11. 26 February 2021 (1 commit)
    • btrfs: use memcpy_[to|from]_page() and kmap_local_page() · 3590ec58
      Committed by Ira Weiny
      There are many places where the pattern kmap/memcpy/kunmap occurs.
      
      This pattern was lifted to the core common functions
      memcpy_[to|from]_page().
      
      Use these new functions to reduce the code, eliminate direct uses of
      kmap, and leverage the new core functions use of kmap_local_page().
      
      Also, there is one place where a kmap/memcpy is followed by an
      optional memset.  Here we leave the kmap open coded to avoid
      remapping the page, but use kmap_local_page() directly.
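      As a rough userspace sketch of the collapsed pattern (struct page here
      is just a plain buffer, and the bounds assert stands in for the
      kernel's page mapping; the real helpers wrap kmap_local_page() +
      memcpy() + kunmap_local()):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Userspace stand-in for a kernel page: just a backing buffer. */
struct page { unsigned char data[PAGE_SIZE]; };

/* Model of memcpy_to_page(): map, copy into the page at offset, unmap. */
static void memcpy_to_page(struct page *page, size_t offset,
                           const void *from, size_t len)
{
	assert(offset + len <= PAGE_SIZE);	/* copy must stay in the page */
	memcpy(page->data + offset, from, len);
}

/* Model of memcpy_from_page(): the same collapse for the read direction. */
static void memcpy_from_page(void *to, const struct page *page,
                             size_t offset, size_t len)
{
	assert(offset + len <= PAGE_SIZE);
	memcpy(to, page->data + offset, len);
}
```

      Each call replaces a three-step kmap/memcpy/kunmap sequence, which is
      exactly the reduction the coccinelle rules below automate.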
      
      Development of this patch was aided by the coccinelle script:
      
      // <smpl>
      // SPDX-License-Identifier: GPL-2.0-only
      // Find kmap/memcpy/kunmap pattern and replace with memcpy*page calls
      //
      // NOTE: Offsets and other expressions may be more complex than what the script
      // will automatically generate.  Therefore a catchall rule is provided to find
      // the pattern which then must be evaluated by hand.
      //
      // Confidence: Low
      // Copyright: (C) 2021 Intel Corporation
      // URL: http://coccinelle.lip6.fr/
      // Comments:
      // Options:
      
      //
      // simple memcpy version
      //
      @ memcpy_rule1 @
      expression page, T, F, B, Off;
      identifier ptr;
      type VP;
      @@
      
      (
      -VP ptr = kmap(page);
      |
      -ptr = kmap(page);
      |
      -VP ptr = kmap_atomic(page);
      |
      -ptr = kmap_atomic(page);
      )
      <+...
      (
      -memcpy(ptr + Off, F, B);
      +memcpy_to_page(page, Off, F, B);
      |
      -memcpy(ptr, F, B);
      +memcpy_to_page(page, 0, F, B);
      |
      -memcpy(T, ptr + Off, B);
      +memcpy_from_page(T, page, Off, B);
      |
      -memcpy(T, ptr, B);
      +memcpy_from_page(T, page, 0, B);
      )
      ...+>
      (
      -kunmap(page);
      |
      -kunmap_atomic(ptr);
      )
      
      // Remove any pointers left unused
      @
      depends on memcpy_rule1
      @
      identifier memcpy_rule1.ptr;
      type VP, VP1;
      @@
      
      -VP ptr;
      	... when != ptr;
      ? VP1 ptr;
      
      //
      // Some callers kmap without a temp pointer
      //
      @ memcpy_rule2 @
      expression page, T, Off, F, B;
      @@
      
      <+...
      (
      -memcpy(kmap(page) + Off, F, B);
      +memcpy_to_page(page, Off, F, B);
      |
      -memcpy(kmap(page), F, B);
      +memcpy_to_page(page, 0, F, B);
      |
      -memcpy(T, kmap(page) + Off, B);
      +memcpy_from_page(T, page, Off, B);
      |
      -memcpy(T, kmap(page), B);
      +memcpy_from_page(T, page, 0, B);
      )
      ...+>
      -kunmap(page);
      // No need for the ptr variable removal
      
      //
      // Catch all
      //
      @ memcpy_rule3 @
      expression page;
      expression GenTo, GenFrom, GenSize;
      identifier ptr;
      type VP;
      @@
      
      (
      -VP ptr = kmap(page);
      |
      -ptr = kmap(page);
      |
      -VP ptr = kmap_atomic(page);
      |
      -ptr = kmap_atomic(page);
      )
      <+...
      (
      //
      // Some call sites have complex expressions within the memcpy
      // match a catch all to be evaluated by hand.
      //
      -memcpy(GenTo, GenFrom, GenSize);
      +memcpy_to_pageExtra(page, GenTo, GenFrom, GenSize);
      +memcpy_from_pageExtra(GenTo, page, GenFrom, GenSize);
      )
      ...+>
      (
      -kunmap(page);
      |
      -kunmap_atomic(ptr);
      )
      
      // Remove any pointers left unused
      @
      depends on memcpy_rule3
      @
      identifier memcpy_rule3.ptr;
      type VP, VP1;
      @@
      
      -VP ptr;
      	... when != ptr;
      ? VP1 ptr;
      
      // </smpl>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      3590ec58
  12. 23 Feb, 2021  2 commits
    • Q
      btrfs: make check_compressed_csum() to be subpage compatible · 04d4ba4c
      Committed by Qu Wenruo
      Currently check_compressed_csum() completely relies on sectorsize ==
      PAGE_SIZE to do checksum verification for compressed extents.
      
      To make it subpage compatible, this patch will:
      - Do extra calculation for the csum range
        Since we have multiple sectors inside a page, we need to only hash
        the range we want, not the full page anymore.
      
      - Do sector-by-sector hash inside the page
      
      With this patch and previous conversion on
      btrfs_submit_compressed_read(), now we can read subpage compressed
      extents properly, and do proper csum verification.
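      The shape of the per-sector loop can be sketched in userspace as
      follows; the toy byte-sum checksum is only a stand-in for the real
      csum algorithm, and the helper names are illustrative, not from the
      patch:

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE  4096u
#define SECTORSIZE 1024u	/* subpage case: sectorsize < PAGE_SIZE */

/* Stand-in for the real checksum function: a simple byte sum. */
static uint32_t toy_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum += buf[i];
	return sum;
}

/*
 * Verify each sector of one page against its expected csum, instead of
 * hashing the whole page.  Returns the index of the first mismatching
 * sector, or -1 if all sectors match.
 */
static int check_page_csums(const uint8_t page[PAGE_SIZE],
                            const uint32_t expected[PAGE_SIZE / SECTORSIZE])
{
	for (unsigned int i = 0; i < PAGE_SIZE / SECTORSIZE; i++) {
		if (toy_csum(page + i * SECTORSIZE, SECTORSIZE) != expected[i])
			return (int)i;
	}
	return -1;
}
```

      The point is the range calculation: each iteration hashes only
      SECTORSIZE bytes at offset i * SECTORSIZE, never the full page.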
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      04d4ba4c
    • Q
      btrfs: make btrfs_submit_compressed_read() subpage compatible · be6a1361
      Committed by Qu Wenruo
      For compressed reads, we always submit page reads using the page size.
      This doesn't work well with subpage, as with subpage one page can
      contain several sectors.  Such a submission will read a range beyond
      what we want and cause problems.
      
      Thankfully, to make it subpage compatible, we only need to change how
      the last page of the compressed extent is read.
      
      Instead of always adding a full page to the compressed read bio, if
      we're at the last page, calculate the size using the compressed
      length, so that we only add part of the range into the compressed
      read bio.
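      The last-page calculation amounts to clamping against the compressed
      length; a minimal sketch (the helper name is hypothetical, the real
      code computes this inline):

```c
#include <stdint.h>

#define PAGE_SIZE 4096u

/*
 * Bytes of page @index to add to the compressed read bio: a full page,
 * except possibly for the last page of a @compressed_len byte extent.
 * The caller only iterates over pages that belong to the extent.
 */
static uint32_t compressed_page_read_size(uint32_t compressed_len,
                                          uint32_t index)
{
	uint32_t remaining = compressed_len - index * PAGE_SIZE;

	return remaining < PAGE_SIZE ? remaining : PAGE_SIZE;
}
```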
      
      Since we are here, also change the PAGE_SIZE used in
      lookup_extent_mapping() to sectorsize.
      This modification won't cause any functional change, as
      lookup_extent_mapping() can handle the case where the search range is
      larger than found extent range.
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      be6a1361
  13. 09 Feb, 2021  1 commit
    • Q
      btrfs: introduce btrfs_subpage for data inodes · 32443de3
      Committed by Qu Wenruo
      To support subpage sector size, data also needs extra info to track
      which sectors in a page are uptodate/dirty/...
      
      This patch attaches a btrfs_subpage structure to the pages of data
      inodes, and detaches it when the page is freed.
      
      This patch also slightly changes the timing when
      set_page_extent_mapped() is called to make sure:
      
      - We have page->mapping set
        page->mapping->host is used to grab btrfs_fs_info, thus we can only
        call this function after page is mapped to an inode.
      
        One call site attaches pages to inode manually, thus we have to modify
        the timing of set_page_extent_mapped() a bit.
      
      - As soon as possible, before other operations
        Since memory allocation can fail, we have to do extra error handling.
        Calling set_page_extent_mapped() as soon as possible can simplify
        the error handling for several call sites.
      
      The idea is pretty much the same as iomap_page, but with more bitmaps
      for btrfs specific cases.
      
      Currently the plan is to switch to iomap if iomap can provide
      sector-aligned writeback (write back only the dirty sectors, not the
      full page; data balance requires this feature).
      
      So we will stick to btrfs specific bitmap for now.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      32443de3
  14. 10 Dec, 2020  2 commits
    • Q
      btrfs: refactor btrfs_lookup_bio_sums to handle out-of-order bvecs · 6275193e
      Committed by Qu Wenruo
      Refactor btrfs_lookup_bio_sums() by:
      
      - Remove the @file_offset parameter
        There are two factors making the @file_offset parameter useless:
      
        * For csum lookup in csum tree, file offset makes no sense
          We only need disk_bytenr, which is unrelated to file_offset
      
        * page_offset (file offset) of each bvec is not contiguous.
          Pages can be added to the same bio as long as their on-disk bytenr
          is contiguous, meaning we could have pages at different file offsets
          in the same bio.
      
        Thus passing file_offset makes no sense any more.
        The only user of file_offset is for data reloc inode, we will use
        a new function, search_file_offset_in_bio(), to handle it.
      
      - Extract the csum tree lookup into search_csum_tree()
        The new function will handle the csum search in csum tree.
        The return value is the same as btrfs_find_ordered_sum(), returning
        the number of found sectors which have checksum.
      
      - Change how we do the main loop
        The only needed info from bio is:
        * the on-disk bytenr
        * the length
      
        After extracting the above info, we can do the search without bio
        at all, which makes the main loop much simpler:
      
      	for (cur_disk_bytenr = orig_disk_bytenr;
      	     cur_disk_bytenr < orig_disk_bytenr + orig_len;
      	     cur_disk_bytenr += count * sectorsize) {
      
      		/* Lookup csum tree */
      		count = search_csum_tree(fs_info, path, cur_disk_bytenr,
      					 search_len, csum_dst);
      		if (!count) {
      			/* Csum hole handling */
      		}
      	}
      
      - Use a single variable as the source to calculate all other offsets
        Instead of many different variables, we use only one main
        variable, cur_disk_bytenr, which represents the current disk bytenr.
      
        All involved values can be calculated from that variable, and
        all those variables will only be visible in the inner loop.
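      As a rough sketch of deriving everything from the single cursor (the
      struct and function names here are illustrative, not from the actual
      patch):

```c
#include <stdint.h>

typedef uint64_t u64;
typedef uint32_t u32;

#define SECTORSIZE 4096u

/* Values the inner loop needs, all derived from one cursor variable. */
struct csum_pos {
	u64 search_len;		/* bytes left to look up in the csum tree */
	u32 sector_index;	/* index into the csum destination array */
};

static struct csum_pos csum_pos_from_cursor(u64 orig_disk_bytenr,
                                            u64 orig_len,
                                            u64 cur_disk_bytenr)
{
	struct csum_pos pos;

	pos.search_len = orig_disk_bytenr + orig_len - cur_disk_bytenr;
	pos.sector_index = (u32)((cur_disk_bytenr - orig_disk_bytenr) /
				 SECTORSIZE);
	return pos;
}
```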
      
      The above refactoring makes btrfs_lookup_bio_sums() way more robust than
      it used to be, especially related to the file offset lookup.  Now
      file_offset lookup is only related to data reloc inode, otherwise we
      don't need to bother file_offset at all.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      6275193e
    • D
      btrfs: drop casts of bio bi_sector · 1201b58b
      Committed by David Sterba
      Since commit 72deb455 ("block: remove CONFIG_LBDAF") (5.2) the
      sector_t type is u64 on all arches and configs, so we don't need to
      typecast it.  It used to be unsigned long, and the result of sector
      size shifts was not guaranteed to fit in the type.
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      1201b58b
  15. 08 Dec, 2020  5 commits
  16. 07 Oct, 2020  1 commit
  17. 27 Jul, 2020  2 commits