1. 06 Dec, 2022 16 commits
  2. 24 Oct, 2022 1 commit
    • Q
      btrfs: make thaw time super block check to also verify checksum · 3d17adea
      Qu Wenruo committed
      Previous commit a05d3c91 ("btrfs: check superblock to ensure the fs
      was not modified at thaw time") only checks the content of the super
      block, but it doesn't really check if the on-disk super block has a
      matching checksum.
      
      This patch will add the checksum verification to thaw time superblock
      verification.
      
      This involves the following extra changes:
      
      - Export btrfs_check_super_csum()
        As we need to call it in super.c.
      
      - Change the argument list of btrfs_check_super_csum()
        Instead of passing a char *, directly pass struct btrfs_super_block *
        pointer.
      
      - Verify that our checksum type didn't change before checking the
        checksum value, like it's done at mount time
      
      Fixes: a05d3c91 ("btrfs: check superblock to ensure the fs was not modified at thaw time")
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      3d17adea
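The order of checks described above (checksum type first, then checksum value) can be sketched in a few lines. This is a toy model, not the kernel code: `MountedFs`, `check_super_at_thaw` and `make_super` are hypothetical names, and SHA-256 is used only because it is available in the Python standard library. The layout assumed here is a 4096-byte super block whose first 32 bytes hold the checksum of the remaining bytes.

```python
import hashlib
from dataclasses import dataclass

CSUM_SIZE = 32          # bytes reserved for the checksum at the start of the super block
SUPER_INFO_SIZE = 4096  # size of the on-disk super block


@dataclass
class MountedFs:
    csum_type: str  # hypothetical label for the algorithm chosen at mount time


def compute_csum(raw: bytes) -> bytes:
    # The checksum covers everything after the checksum field itself.
    return hashlib.sha256(raw[CSUM_SIZE:SUPER_INFO_SIZE]).digest()


def check_super_at_thaw(fs: MountedFs, disk_csum_type: str, raw: bytes) -> str:
    # 1. The checksum type must match what was in use at mount time.
    if disk_csum_type != fs.csum_type:
        return "csum type changed"
    # 2. Only then is the stored checksum compared with a fresh one.
    if raw[:CSUM_SIZE] != compute_csum(raw):
        return "csum mismatch"
    return "ok"


def make_super(payload: bytes) -> bytes:
    # Helper for the example: build a 4096-byte block with a valid checksum.
    body = payload.ljust(SUPER_INFO_SIZE - CSUM_SIZE, b"\0")
    return hashlib.sha256(body).digest() + body
```

Checking the type first matters because comparing a checksum computed with the wrong algorithm would only ever report a generic mismatch.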
  3. 26 Sep, 2022 7 commits
    • Q
      btrfs: relax block-group-tree feature dependency checks · d7f67ac9
      Qu Wenruo committed
      [BUG]
      When a user mistakenly tries to clear the block group tree, which can
      not be done through mount options, by using "-o clear_cache,space_cache=v2",
      the following error is triggered on a fs with the block-group-tree feature:
      
        BTRFS info (device dm-1): force clearing of disk cache
        BTRFS info (device dm-1): using free space tree
        BTRFS info (device dm-1): clearing free space tree
        BTRFS info (device dm-1): clearing compat-ro feature flag for FREE_SPACE_TREE (0x1)
        BTRFS info (device dm-1): clearing compat-ro feature flag for FREE_SPACE_TREE_VALID (0x2)
        BTRFS error (device dm-1): block-group-tree feature requires fres-space-tree and no-holes
        BTRFS error (device dm-1): super block corruption detected before writing it to disk
        BTRFS: error (device dm-1) in write_all_supers:4318: errno=-117 Filesystem corrupted (unexpected superblock corruption detected)
        BTRFS warning (device dm-1: state E): Skipping commit of aborted transaction.
      
      [CAUSE]
      Although the dependency for block-group-tree feature is just an
      artificial one (to reduce test matrix), we put the dependency check into
      btrfs_validate_super().
      
      This is too strict, and during space cache clearing, we will have a
      window where free space tree is cleared, and we need to commit the super
      block.
      
      In that window, we had block group tree without v2 cache, and triggered
      the artificial dependency check.
      
      This is not necessary at all, especially for such a soft dependency.
      
      [FIX]
      Introduce a new helper, btrfs_check_features(), to do all the runtime
      limitation checks, including:
      
      - Unsupported incompat flags check
      
      - Unsupported compat RO flags check
      
      - Setting missing incompat flags
      
      - Artificial feature dependency checks
        Currently only block group tree will rely on this.
      
      - Subpage runtime check for v1 cache
      
      With this helper, we can move quite some checks from
      open_ctree()/btrfs_remount() into it, and just call it after
      btrfs_parse_options().
      
      Now "-o clear_cache,space_cache=v2" will not trigger the above error
      anymore.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ edit messages ]
      Signed-off-by: David Sterba <dsterba@suse.com>
      d7f67ac9
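The effect of moving the artificial dependency check out of the super block write path can be illustrated with a toy model. Everything here is hypothetical (string flag names, `write_super`, the `check_deps_on_write` switch); it only mirrors the reasoning in the commit message, where the free space tree flags are briefly absent while the cache is being cleared.

```python
def dependency_errors(compat_ro: set, incompat: set) -> list:
    """Artificial (soft) dependency: block group tree wants v2 cache and no-holes."""
    errors = []
    if "BLOCK_GROUP_TREE" in compat_ro:
        if "FREE_SPACE_TREE_VALID" not in compat_ro or "NO_HOLES" not in incompat:
            errors.append("block-group-tree requires free-space-tree and no-holes")
    return errors


def write_super(compat_ro: set, incompat: set, check_deps_on_write: bool) -> str:
    # check_deps_on_write=True models the old behaviour, where the dependency
    # check ran on every super block write and hit the transient window.
    if check_deps_on_write and dependency_errors(compat_ro, incompat):
        return "abort: super block corruption detected"
    return "written"


def check_features(compat_ro: set, incompat: set) -> list:
    """Runtime check, intended to run once after mount options are parsed."""
    return dependency_errors(compat_ro, incompat)
```

With the free space tree flags temporarily cleared, `write_super(..., check_deps_on_write=True)` aborts, while the relaxed variant writes the super block and leaves the dependency to the mount-time check.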
    • J
      btrfs: separate out the extent state and extent buffer init code · a62a3bd9
      Josef Bacik committed
      In order to help separate the extent buffer from the extent io tree code
      we need to break up the init functions.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      a62a3bd9
    • Q
      btrfs: enhance unsupported compat RO flags handling · 81d5d614
      Qu Wenruo committed
      Currently there are two corner cases where compat RO flags are not
      handled correctly:
      
      - Remount
        We can still mount the fs RO with compat RO flags, then remount it RW.
        We should not allow any write into a fs with unsupported RO flags.
      
      - Still try to search block group items
        In fact, behavior/on-disk format change to extent tree should not
        need a full incompat flag.
      
        And since we can ensure fs with unsupported RO flags never got any
        writes (with above case fixed), then we can even skip block group
        items search at mount time.
      
      This patch will enhance the handling of unsupported RO compat flags by:
      
      - Reject read-write remount if there are unsupported RO compat flags
      
      - Use dummy block group items directly for unsupported RO compat flags
        In fact, only changes to chunk/subvolume/root/csum trees should need
        incompat flags.
      
      The latter part should allow future change to extent tree to be compat
      RO flags.
      
      Thus this patch also needs to be backported to all stable trees.
      
      CC: stable@vger.kernel.org # 4.9+
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      81d5d614
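The two rules from this commit (no read-write remount, no block group item search when unsupported compat RO flags are present) reduce to a bitmask test. The sketch below is illustrative only: the `SUPPORTED_COMPAT_RO` mask and the function names are assumptions made for the example, not the kernel's definitions.

```python
# Toy mask of compat RO flags this "kernel" understands (illustrative values).
SUPPORTED_COMPAT_RO = 0x1 | 0x2


def unsupported_compat_ro(flags: int) -> int:
    """Return the bits that are set on disk but not understood."""
    return flags & ~SUPPORTED_COMPAT_RO


def can_mount(flags: int, read_only: bool) -> bool:
    # Unknown compat RO flags still allow a read-only mount.
    return read_only or unsupported_compat_ro(flags) == 0


def can_remount_rw(flags: int) -> bool:
    # Rule 1: never allow writes into a fs with unsupported compat RO flags.
    return unsupported_compat_ro(flags) == 0


def needs_block_group_search(flags: int) -> bool:
    # Rule 2: with unsupported flags the fs never gets written, so dummy
    # block group items are enough and the search can be skipped.
    return unsupported_compat_ro(flags) == 0
```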
    • Q
      btrfs: dump all space infos if we abort transaction due to ENOSPC · 8e327b9c
      Qu Wenruo committed
      We have hit some transaction aborts due to -ENOSPC internally.
      
      Normally we should always reserve enough space for metadata for every
      transaction, thus hitting -ENOSPC should really indicate some cases we
      didn't expect.
      
      But unfortunately current error reporting will only give a kernel
      warning and stack trace, not really helpful to debug what's causing the
      problem.
      
      And mount option debug_enospc can only help when user can reproduce the
      problem, but under most cases, such transaction abort by -ENOSPC is
      really hard to reproduce.
      
      So this patch will dump all space infos (data, metadata, system) when we
      abort the first transaction with -ENOSPC.
      
      This should at least provide some clue to us.
      
      The example of a dump would look like this:
      
        BTRFS: Transaction aborted (error -28)
        WARNING: CPU: 8 PID: 3366 at fs/btrfs/transaction.c:2137 btrfs_commit_transaction+0xf81/0xfb0 [btrfs]
        <call trace skipped>
        ---[ end trace 0000000000000000 ]---
        BTRFS info (device dm-1: state A): dumping space info:
        BTRFS info (device dm-1: state A): space_info DATA has 6791168 free, is not full
        BTRFS info (device dm-1: state A): space_info total=8388608, used=1597440, pinned=0, reserved=0, may_use=0, readonly=0 zone_unusable=0
        BTRFS info (device dm-1: state A): space_info METADATA has 257114112 free, is not full
        BTRFS info (device dm-1: state A): space_info total=268435456, used=131072, pinned=180224, reserved=65536, may_use=10878976, readonly=65536 zone_unusable=0
        BTRFS info (device dm-1: state A): space_info SYSTEM has 8372224 free, is not full
        BTRFS info (device dm-1: state A): space_info total=8388608, used=16384, pinned=0, reserved=0, may_use=0, readonly=0 zone_unusable=0
        BTRFS info (device dm-1: state A): global_block_rsv: size 3670016 reserved 3670016
        BTRFS info (device dm-1: state A): trans_block_rsv: size 0 reserved 0
        BTRFS info (device dm-1: state A): chunk_block_rsv: size 0 reserved 0
        BTRFS info (device dm-1: state A): delayed_block_rsv: size 4063232 reserved 4063232
        BTRFS info (device dm-1: state A): delayed_refs_rsv: size 3145728 reserved 3145728
        BTRFS: error (device dm-1: state A) in btrfs_commit_transaction:2137: errno=-28 No space left
        BTRFS info (device dm-1: state EA): forced readonly
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      8e327b9c
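When reading such a dump, it helps to check how the "free" figure relates to the counters on the following line. A small sketch, assuming the line format shown in the example output; the subtraction formula is inferred from those three dumps, and `parse_counters` and `free_bytes` are hypothetical helper names.

```python
import re


def parse_counters(line: str) -> dict:
    """Extract every key=value counter from a 'space_info total=...' line."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}


def free_bytes(counters: dict) -> int:
    # Free space as reported in the dump: total minus every accounted category.
    accounted = ("used", "pinned", "reserved", "may_use", "readonly", "zone_unusable")
    return counters["total"] - sum(counters[name] for name in accounted)


metadata_line = (
    "BTRFS info (device dm-1: state A): space_info total=268435456, "
    "used=131072, pinned=180224, reserved=65536, may_use=10878976, "
    "readonly=65536 zone_unusable=0"
)
print(free_bytes(parse_counters(metadata_line)))  # prints 257114112
```

The result matches the "METADATA has 257114112 free" line in the dump; the same arithmetic reproduces the DATA and SYSTEM figures.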
    • Q
      btrfs: check superblock to ensure the fs was not modified at thaw time · a05d3c91
      Qu Wenruo committed
      [BACKGROUND]
      There is an incident report in which a user hibernated the system with
      a btrfs on a removable device still mounted.
      
      Then, by accident, that btrfs got mounted and modified by another
      system/OS, and was then returned to the hibernated system.
      
      After resuming from hibernation, new writes went to the victim btrfs.
      
      Now the fs is completely broken, since the underlying btrfs is no longer
      the same as it was before the hibernation, and the user lost their data
      due to various transid mismatches.
      
      [REPRODUCER]
      We can emulate the situation using the following small script:
      
        truncate -s 1G $dev
        mkfs.btrfs -f $dev
        mount $dev $mnt
        fsstress -w -d $mnt -n 500
        sync
        xfs_freeze -f $mnt
        cp $dev $dev.backup
      
        # There is no way to mount the same cloned fs on the same system,
        # as the conflicting fsid will be rejected by btrfs.
        # Thus here we have to wipe the fs using a different btrfs.
        mkfs.btrfs -f $dev.backup
      
        dd if=$dev.backup of=$dev bs=1M
        xfs_freeze -u $mnt
        fsstress -w -d $mnt -n 20
        umount $mnt
        btrfs check $dev
      
      The final fsck will fail because some tree blocks have an incorrect fsid.
      
      This is enough to emulate the problem hit by the unfortunate user.
      
      [ENHANCEMENT]
      Although such case should not be that common, it can still happen from
      time to time.
      
      From the view of btrfs, we can detect any unexpected super block change,
      and if there is any unexpected change, we just mark the fs read-only,
      and thaw the fs.
      
      By this we can limit the damage to minimal, and I hope no one would lose
      their data by this anymore.
      Suggested-by: Goffredo Baroncelli <kreijack@libero.it>
      Link: https://lore.kernel.org/linux-btrfs/83bf3b4b-7f4c-387a-b286-9251e3991e34@bluemole.com/
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      a05d3c91
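The detection idea from the [ENHANCEMENT] section can be sketched as a comparison between what the running filesystem expects and what is found on disk at thaw time. This is a simplified model under stated assumptions: only two fields (`fsid`, `generation`) are compared, and `SuperSummary`, `changed_fields` and `thaw` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuperSummary:
    fsid: str        # filesystem UUID
    generation: int  # transaction id of the last committed super block


def changed_fields(expected: SuperSummary, on_disk: SuperSummary) -> list:
    """List the fields that differ between memory and disk."""
    return [
        name
        for name in ("fsid", "generation")
        if getattr(expected, name) != getattr(on_disk, name)
    ]


def thaw(expected: SuperSummary, on_disk: SuperSummary) -> str:
    # Any unexpected change means someone else modified the device while it
    # was frozen: keep the fs usable for reads, but refuse further writes.
    return "read-only" if changed_fields(expected, on_disk) else "read-write"
```

In the reproducer above, the `mkfs.btrfs -f $dev.backup` step gives the copied-back image a different fsid, which is exactly the kind of change this comparison would flag.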
    • C
      btrfs: move btrfs_bio allocation to volumes.c · d45cfb88
      Christoph Hellwig committed
      volumes.c is the place that implements the storage layer using the
      btrfs_bio structure, so move the bio_set and allocation helpers there
      as well.
      
      To make up for the new initialization boilerplate, merge the two
      init/exit helpers in extent_io.c into a single one.
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Tested-by: Nikolay Borisov <nborisov@suse.com>
      Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      d45cfb88
    • M
      btrfs: don't print information about space cache or tree every remount · dbecac26
      Maciej S. Szmigiero committed
      btrfs currently prints information about space cache or free space tree
      being in use on every remount, regardless of whether such a remount
      actually enabled or disabled one of these features.
      
      This is actually unnecessary since providing remount options changing the
      state of these features will explicitly print the appropriate notice.
      
      Let's instead print such unconditional information just on an initial mount
      to avoid filling the kernel log when, for example, laptop-mode-tools
      remount the fs on some events.
      Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      dbecac26
  4. 25 Jul, 2022 6 commits
    • D
      btrfs: use mask for all RAID1* profiles in btrfs_calc_avail_data_space · d09cb9e1
      David Sterba committed
      There's a sequence of hard coded values for RAID1 profiles that are
      already stored in the raid_attr table that should be used instead.
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      d09cb9e1
    • Q
      btrfs: use named constant for reserved device space · 37f85ec3
      Qu Wenruo committed
      There's a reserved space on each device of size 1MiB that can be used by
      bootloaders or to avoid accidental overwrite. Use a symbolic constant
      with the explaining comment instead of hard coding the value and
      multiple comments.
      
      Note: since btrfs-progs v4.1, mkfs.btrfs will reserve the first 1MiB for
      the primary super block (at offset 64KiB), until then the range could
      have been used by mistake. Kernel has been always respecting the 1MiB
      range for writes.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ update changelog ]
      Signed-off-by: David Sterba <dsterba@suse.com>
      37f85ec3
    • C
      btrfs: remove btrfs_end_io_wq · d7b9416f
      Christoph Hellwig committed
      All read bios that go through btrfs_map_bio need to be completed in
      user context.  And read I/Os are the most common and timing critical
      in almost any file system workload.
      
      Embed a work_struct into struct btrfs_bio and use it to complete all
      read bios submitted through btrfs_map, using the REQ_META flag to decide
      which workqueue they are placed on.
      
      This removes the need for a separate 128 byte allocation (typically
      rounded up to 192 bytes by slab) for all reads with a size increase
      of 24 bytes for struct btrfs_bio.  Future patches will reorganize
      struct btrfs_bio to make use of this extra space for writes as well.
      
      (All sizes are based on a typical 64-bit non-debug build)
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: David Sterba <dsterba@suse.com>
      d7b9416f
    • C
      btrfs: don't use btrfs_bio_wq_end_io for compressed writes · fed8a72d
      Christoph Hellwig committed
      Compressed write bio completion is the only user of btrfs_bio_wq_end_io
      for writes, and the use of btrfs_bio_wq_end_io is a little suboptimal
      here as we only really need user context for the final completion of a
      compressed_bio structure, and not every single bio completion.
      
      Add a work_struct to struct compressed_bio instead and use that to call
      finish_compressed_bio_write.  This allows removing all handling of
      write bios in the btrfs_bio_wq_end_io infrastructure.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: David Sterba <dsterba@suse.com>
      fed8a72d
    • Q
      btrfs: add trace event for submitted RAID56 bio · b8bea09a
      Qu Wenruo committed
      Add tracepoint for better insight to how the RAID56 data are submitted.
      
      The output looks like this: (trace event header and UUID skipped)
      
         raid56_read_partial: full_stripe=389152768 devid=3 type=DATA1 offset=32768 opf=0x0 physical=323059712 len=32768
         raid56_read_partial: full_stripe=389152768 devid=1 type=DATA2 offset=0 opf=0x0 physical=67174400 len=65536
         raid56_write_stripe: full_stripe=389152768 devid=3 type=DATA1 offset=0 opf=0x1 physical=323026944 len=32768
         raid56_write_stripe: full_stripe=389152768 devid=2 type=PQ1 offset=0 opf=0x1 physical=323026944 len=32768
      
      The above debug output is from a 32K data write into an empty RAID56
      data chunk.
      
      Some explanation on the event output:
      
        full_stripe:	the logical bytenr of the full stripe
        devid:	btrfs devid
        type:		raid stripe type.
               	DATA1:	the first data stripe
               	DATA2:	the second data stripe
               	PQ1:	the P stripe
               	PQ2:	the Q stripe
        offset:	the offset inside the stripe.
        opf:		the bio op type
        physical:	the physical offset the bio is for
        len:		the length of the bio
      
      The first two lines are from partial RMW read, which is reading the
      remaining data stripes from disks.
      
      The last two lines are for full stripe RMW write, which is writing the
      involved two 32K stripes (one for DATA1 stripe, one for P stripe), as
      shown by len=32768 in the output above.
      The stripe for DATA2 doesn't need to be written.
      
      There are 5 types of trace events:
      
      - raid56_read_partial
        Read remaining data for regular read/write path.
      
      - raid56_write_stripe
        Write the modified stripes for regular read/write path.
      
      - raid56_scrub_read_recover
        Read remaining data for scrub recovery path.
      
      - raid56_scrub_write_stripe
        Write the modified stripes for scrub path.
      
      - raid56_scrub_read
        Read remaining data for scrub path.
      
      Also, since the trace events are included at super.c, we have to export
      needed structure definitions to 'raid56.h' and include the header in
      super.c, or we're unable to access those members.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ reformat comments ]
      Signed-off-by: David Sterba <dsterba@suse.com>
      b8bea09a
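For post-processing such trace output, each event line can be turned into a dictionary. A minimal sketch, assuming the abbreviated line format shown above (trace event header and UUID already stripped); `parse_raid56_event` is a hypothetical helper, not part of any tracing tool.

```python
def parse_raid56_event(line: str) -> dict:
    """Split 'event_name: key=value key=value ...' into typed fields."""
    name, _, rest = line.strip().partition(": ")
    fields = dict(item.split("=", 1) for item in rest.split())
    # Byte offsets, lengths and ids are decimal; opf is printed in hex.
    for key in ("full_stripe", "devid", "offset", "physical", "len"):
        fields[key] = int(fields[key])
    fields["opf"] = int(fields["opf"], 16)
    fields["event"] = name
    return fields


sample = (
    "raid56_write_stripe: full_stripe=389152768 devid=2 type=PQ1 "
    "offset=0 opf=0x1 physical=323026944 len=32768"
)
event = parse_raid56_event(sample)
print(event["event"], event["type"], event["len"])  # prints raid56_write_stripe PQ1 32768
```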
    • D
      btrfs: fix typos in comments · 143823cf
      David Sterba committed
      Codespell has found a few typos.
      Signed-off-by: David Sterba <dsterba@suse.com>
      143823cf
  5. 04 Jul, 2022 1 commit
    • R
      mm: shrinkers: provide shrinkers with names · e33c267a
      Roman Gushchin committed
      Currently shrinkers are anonymous objects.  For debugging purposes they
      can be identified by count/scan function names, but it's not always
      useful: e.g.  for superblock's shrinkers it's nice to have at least an
      idea of to which superblock the shrinker belongs.
      
      This commit adds names to shrinkers.  register_shrinker() and
      prealloc_shrinker() functions are extended to take a format and arguments
      to build a name.
      
      In some cases it's not possible to determine a good name at the time when
      a shrinker is allocated.  For such cases shrinker_debugfs_rename() is
      provided.
      
      The expected format is:
          <subsystem>-<shrinker_type>[:<instance>]-<id>
      For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair.
      
      After this change the shrinker debugfs directory looks like:
        $ cd /sys/kernel/debug/shrinker/
        $ ls
          dquota-cache-16     sb-devpts-28     sb-proc-47       sb-tmpfs-42
          mm-shadow-18        sb-devtmpfs-5    sb-proc-48       sb-tmpfs-43
          mm-zspool:zram0-34  sb-hugetlbfs-17  sb-pstore-31     sb-tmpfs-44
          rcu-kfree-0         sb-hugetlbfs-33  sb-rootfs-2      sb-tmpfs-49
          sb-aio-20           sb-iomem-12      sb-securityfs-6  sb-tracefs-13
          sb-anon_inodefs-15  sb-mqueue-21     sb-selinuxfs-22  sb-xfs:vda1-36
          sb-bdev-3           sb-nsfs-4        sb-sockfs-8      sb-zsmalloc-19
          sb-bpf-32           sb-pipefs-14     sb-sysfs-26      thp-deferred_split-10
          sb-btrfs:vda2-24    sb-proc-25       sb-tmpfs-1       thp-zero-9
          sb-cgroup2-30       sb-proc-39       sb-tmpfs-27      xfs-buf:vda1-37
          sb-configfs-23      sb-proc-41       sb-tmpfs-29      xfs-inodegc:vda1-38
          sb-dax-11           sb-proc-45       sb-tmpfs-35
          sb-debugfs-7        sb-proc-46       sb-tmpfs-40
      
      [roman.gushchin@linux.dev: fix build warnings]
        Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle
      Reported-by: kernel test robot <lkp@intel.com>
      Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev
      Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      e33c267a
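The naming scheme `<subsystem>-<shrinker_type>[:<instance>]-<id>` is regular enough to be split back into its parts, for example when scripting over the debugfs directory. The regular expression below is an assumption based only on the names listed above, and `parse_shrinker_name` is a hypothetical helper.

```python
import re

# <subsystem>-<shrinker_type>[:<instance>]-<id>
SHRINKER_NAME = re.compile(
    r"^(?P<subsystem>[^-]+)-(?P<type>[^:]+?)(?::(?P<instance>.+))?-(?P<id>\d+)$"
)


def parse_shrinker_name(name: str) -> dict:
    """Return the components of a shrinker debugfs directory name."""
    match = SHRINKER_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected shrinker name: {name!r}")
    parts = match.groupdict()
    parts["id"] = int(parts["id"])
    return parts


print(parse_shrinker_name("sb-btrfs:vda2-24"))
# prints {'subsystem': 'sb', 'type': 'btrfs', 'instance': 'vda2', 'id': 24}
```

Names without an instance, such as `thp-deferred_split-10`, yield `instance` set to `None`.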
  6. 07 Jun, 2022 1 commit
  7. 06 Jun, 2022 1 commit
    • Q
      btrfs: prevent remounting to v1 space cache for subpage mount · 0591f040
      Qu Wenruo committed
      Upstream commit 9f73f1ae ("btrfs: force v2 space cache usage for
      subpage mount") forces subpage mount to use v2 cache, to avoid
      deprecated v1 cache which doesn't support subpage properly.
      
      But there is a loophole: the user can still remount to v1 cache.
      
      The existing check only gives users a warning, but does not actually
      prevent the remount.
      
      Although remounting to v1 will not cause any problems since the v1 cache
      will always be marked invalid when mounted with a different page size,
      it's still better to prevent v1 cache at all for subpage mounts.
      
      Fixes: 9f73f1ae ("btrfs: force v2 space cache usage for subpage mount")
      CC: stable@vger.kernel.org # 5.15+
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      0591f040
  8. 16 May, 2022 3 commits
  9. 14 Mar, 2022 3 commits
    • S
      btrfs: add filesystems state details to error messages · c067da87
      Sweet Tea Dorminy committed
      When a filesystem goes read-only due to an error, multiple errors tend
      to be reported, some of which are knock-on failures. Logging fs_states
      in btrfs_handle_fs_error() and btrfs_printk() helps distinguish the
      first error from subsequent messages which may only exist due to an
      error state.
      
      Under the new format, most initial errors will look like:
      `BTRFS: error (device loop0) in ...`
      while subsequent errors will begin with:
      `error (device loop0: state E) in ...`
      
      An initial transaction abort error will look like
      `error (device loop0: state A) in ...`
      and subsequent messages will contain
      `(device loop0: state EA) in ...`
      
      In addition to the error states we can also print other states that are
      temporary, like remounting, device replace, or indicate a global state
      that may affect functionality.
      
      Now implemented:
      
      E - filesystem error detected
      A - transaction aborted
      L - log tree errors
      
      M - remounting in progress
      R - device replace in progress
      C - data checksums not verified (mounted with ignoredatacsums)
      Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      c067da87
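The state letters make it possible to classify log lines mechanically. A small sketch, assuming the `(device <name>: state <letters>)` prefix format shown in the examples above; the `parse_states` helper and the regular expression are illustrative, not part of btrfs.

```python
import re

# Letters and meanings as listed in the commit message.
STATE_LETTERS = {
    "E": "filesystem error detected",
    "A": "transaction aborted",
    "L": "log tree errors",
    "M": "remounting in progress",
    "R": "device replace in progress",
    "C": "data checksums not verified",
}

STATE_PREFIX = re.compile(r"\(device [^():]+: state ([A-Z]+)\)")


def parse_states(message: str) -> list:
    """Return the decoded states of a message, or [] for a first error."""
    match = STATE_PREFIX.search(message)
    if match is None:
        return []  # no state recorded yet: this is likely the initial error
    return [STATE_LETTERS.get(letter, "unknown") for letter in match.group(1)]


print(parse_states("error (device loop0: state EA) in ..."))
# prints ['filesystem error detected', 'transaction aborted']
```

A message without a state suffix, such as `BTRFS: error (device loop0) in ...`, returns an empty list, which is how the first error can be told apart from knock-on failures.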
    • J
      btrfs: disable space cache related mount options for extent tree v2 · 63cd070d
      Josef Bacik committed
      We cannot fall back on the slow caching for extent tree v2, which means
      we can't just arbitrarily clear the free space trees at mount time.
      Furthermore we can't do v1 space cache with extent tree v2.  Simply
      ignore these mount options for extent tree v2 as they aren't relevant.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      63cd070d
    • A
      btrfs: match stale devices by dev_t · 16cab91a
      Anand Jain committed
      After the commit "btrfs: harden identification of the stale device", we
      don't have to match the device path anymore. Instead, we match the dev_t.
      So pass in the dev_t instead of the device path, in the call chain
      btrfs_forget_devices()->btrfs_free_stale_devices().
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      16cab91a
  10. 22 Jan, 2022 1 commit