1. 10 Dec, 2020 7 commits
    • btrfs: keep sb cache_generation consistent with space_cache · 94846229
      Committed by Boris Burkov
      When mounting, btrfs uses the cache_generation in the super block to
      determine if space cache v1 is in use. However, by mounting with
      nospace_cache or space_cache=v2, it is possible to disable space cache
      v1, which does not result in un-setting cache_generation back to 0.
      
      In order to base some logic, like mount option printing in /proc/mounts,
      on the current state of the space cache rather than just the values of
      the mount option, keep the value of cache_generation consistent with the
      status of space cache v1.
      
      We ensure that cache_generation > 0 iff the file system is using
      space_cache v1. This requires committing a transaction on any mount
      which changes whether we are using v1. (v1->nospace_cache, v1->v2,
      nospace_cache->v1, v2->v1).
      
      Since the mechanism for writing out the cache generation is transaction
      commit, but we want some finer grained control over when we un-set it,
      we can't just rely on the SPACE_CACHE mount option, and introduce an
      fs_info flag that mount can use when it wants to unset the generation.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Boris Burkov <boris@bur.io>
      Signed-off-by: David Sterba <dsterba@suse.com>
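The invariant stated in this commit can be modelled in a few lines. This is a minimal Python sketch, not the kernel code: the `SpaceCache` enum and both helper functions are hypothetical names introduced only for illustration.

```python
from enum import Enum


class SpaceCache(Enum):
    """Space cache modes named in the commit message (toy model)."""
    NONE = "nospace_cache"
    V1 = "space_cache=v1"
    V2 = "space_cache=v2"


def generation_is_consistent(cache_generation: int, mode: SpaceCache) -> bool:
    """Invariant from the commit: cache_generation > 0 iff v1 is in use."""
    return (cache_generation > 0) == (mode is SpaceCache.V1)


def mount_needs_commit(old: SpaceCache, new: SpaceCache) -> bool:
    """A transaction commit is needed only when the use of v1 changes."""
    return (old is SpaceCache.V1) != (new is SpaceCache.V1)
```

Under this model the four transitions listed in the message (v1->nospace_cache, v1->v2, nospace_cache->v1, v2->v1) require a commit, while nospace_cache->v2 does not.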
    • btrfs: clear oneshot options on mount and remount · 8cd29088
      Committed by Boris Burkov
      Some options only apply during mount time and are cleared at the end
      of mount. For now, the example is USEBACKUPROOT, but CLEAR_CACHE also
      fits the bill, and this is a preparation patch for also clearing that
      option.
      
      One subtlety is that the current code only resets USEBACKUPROOT on rw
      mounts, but the option is meaningfully "consumed" by a ro mount, so it
      feels appropriate to clear in that case as well. A subsequent read-write
      remount would not go through open_ctree, which is the only place that
      checks the option, so the change should be benign.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Boris Burkov <boris@bur.io>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
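The idea of a one-shot option can be sketched as a bit mask that is stripped once mount has consumed it. This is a toy Python model under stated assumptions: the `MountOpt` values are arbitrary, and `clear_oneshot_options` is a hypothetical name, not the kernel helper.

```python
from enum import IntFlag


class MountOpt(IntFlag):
    """Toy subset of mount option bits; the numeric values are arbitrary."""
    USEBACKUPROOT = 1
    CLEAR_CACHE = 2
    COMPRESS = 4


# Per the commit message, USEBACKUPROOT is the one-shot option for now.
# CLEAR_CACHE is only named as a candidate for a later patch.
ONESHOT = MountOpt.USEBACKUPROOT


def clear_oneshot_options(opts: MountOpt) -> MountOpt:
    """Return the options with every one-shot bit removed."""
    return MountOpt(int(opts) & ~int(ONESHOT))
```

In this model the clearing applies to read-only and read-write mounts alike, which mirrors the subtlety the message describes.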
    • btrfs: lift read-write mount setup from mount and remount · 44c0ca21
      Committed by Boris Burkov
      Mounting rw and remounting from ro to rw naturally share invariants and
      functionality which result in a correctly setup rw filesystem. Luckily,
      there is even a strong unity in the code which implements them. In
      mount's open_ctree, these operations mostly happen after an early return
      for ro file systems, and in remount, they happen in a section devoted to
      remounting ro->rw, after some remount specific validation passes.
      
      However, there are unfortunately a few differences. There are small
      deviations in the order of some of the operations, remount does not
      start orphan cleanup in root_tree or fs_tree, remount does not create
      the free space tree, and remount does not handle "one-shot" mount
      options like clear_cache and uuid tree rescan.
      
      Since we want to add building the free space tree to remount, and also
      to start the same orphan cleanup process on a filesystem mounted as ro
      then remounted rw, we would benefit from unifying the logic between the
      two code paths.
      
      This patch only lifts the existing common functionality, and leaves a
      natural path for fixing the discrepancies.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Boris Burkov <boris@bur.io>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: remove inode number cache feature · 5297199a
      Committed by Nikolay Borisov
      It's been deprecated since commit b547a88e ("btrfs: start
      deprecation of mount option inode_cache") which enumerates the reasons.
      
      A filesystem that uses the feature (mount -o inode_cache) tracks the
      inode numbers in bitmaps; that data stays on the filesystem after this
      patch. The size is roughly 5MiB for 1M inodes [1], which is considered
      small enough to be left there. Removal of the change can be implemented
      in btrfs-progs if needed.
      
      [1] https://lore.kernel.org/linux-btrfs/20201127145836.GZ6430@twin.jikos.cz/
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ update changelog ]
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: disallow space_cache in ZONED mode · 5d1ab66c
      Committed by Naohiro Aota
      As updates to the space cache v1 are in-place, the space cache cannot be
      located over sequential zones and there are no guarantees that the device
      will have enough conventional zones to store this cache. Resolve this
      problem by completely disabling the space cache v1. This does not
      introduce any problems with sequential block groups: all the free space
      is located after the allocation pointer and there is no free space before
      the pointer. There is no need to have such a cache.
      
      Note: we can technically use free-space-tree (space cache v2) on ZONED
      mode. But, since ZONED mode now always allocates extents in a block
      group sequentially regardless of underlying device zone type, it's no
      use to enable and maintain the tree.
      
      For the same reason, NODATACOW is also disabled.
      
      In summary, ZONED will disable:
      
      | Disabled features | Reason                                              |
      |-------------------+-----------------------------------------------------|
      | RAID/DUP          | Cannot handle two zone append writes to different   |
      |                   | zones                                               |
      |-------------------+-----------------------------------------------------|
      | space_cache (v1)  | In-place updating                                   |
      | NODATACOW         | In-place updating                                   |
      |-------------------+-----------------------------------------------------|
      | fallocate         | Reserved extent will be a write hole                |
      |-------------------+-----------------------------------------------------|
      | MIXED_BG          | Allocated metadata region will be write holes for   |
      |                   | data writes                                         |
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
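The table above can be read as a lookup from feature to reason. The following Python sketch only restates the table; the `zoned_conflicts` helper and the use of plain strings as feature names are assumptions made for illustration and do not reflect how the kernel performs the check.

```python
# Features the commit message lists as disabled in ZONED mode, with reasons.
ZONED_DISABLED = {
    "RAID/DUP": "cannot handle two zone append writes to different zones",
    "space_cache (v1)": "in-place updating",
    "NODATACOW": "in-place updating",
    "fallocate": "reserved extent will be a write hole",
    "MIXED_BG": "allocated metadata region will be write holes for data writes",
}


def zoned_conflicts(requested):
    """Return (feature, reason) pairs for requested features ZONED disables."""
    return [(f, ZONED_DISABLED[f]) for f in requested if f in ZONED_DISABLED]
```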
    • btrfs: check and enable ZONED mode · b70f5097
      Committed by Naohiro Aota
      Introduce function btrfs_check_zoned_mode() to check if ZONED flag is
      enabled on the file system and if the file system consists of zoned
      devices with equal zone size.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
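The device-side condition described here (all devices zoned, one common zone size) might be sketched as below. This is a simplified Python model: the `Device` class and `devices_ok_for_zoned` are hypothetical, and the real btrfs_check_zoned_mode() works on kernel structures and may apply further rules that the message does not spell out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Device:
    """Toy device description; zone_size is in bytes."""
    is_zoned: bool
    zone_size: int


def devices_ok_for_zoned(devices: List[Device]) -> bool:
    """True if every device is zoned and all share one zone size."""
    if not all(d.is_zoned for d in devices):
        return False
    # A set collapses equal sizes; more than one entry means a mismatch.
    return len({d.zone_size for d in devices}) <= 1
```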
    • btrfs: get zone information of zoned block devices · 5b316468
      Committed by Naohiro Aota
      If a zoned block device is found, get its zone information (number of
      zones and zone size). To avoid costly run-time zone report
      commands to test the device zones type during block allocation, attach
      the seq_zones bitmap to the device structure to indicate if a zone is
      sequential or accepts random writes. Also attach the empty_zones
      bitmap to indicate if a zone is empty or not.
      
      This patch also introduces the helper function btrfs_dev_is_sequential()
      to test if the zone storing a block is a sequential write required zone
      and btrfs_dev_is_empty_zone() to test if the zone is an empty zone.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
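The bitmap lookups can be illustrated with a small model. This Python sketch uses plain integers as bitmaps and a hypothetical `ZoneInfo` class; only the names seq_zones and empty_zones come from the commit message, and the method names merely echo the kernel helpers without reproducing their signatures.

```python
class ZoneInfo:
    """Toy per-device zone information with integer bitmaps."""

    def __init__(self, zone_size, seq_zones, empty_zones):
        self.zone_size = zone_size      # bytes per zone
        self.seq_zones = seq_zones      # bit i set: zone i is sequential
        self.empty_zones = empty_zones  # bit i set: zone i is empty

    def zone_number(self, pos):
        """Index of the zone that contains byte offset pos."""
        return pos // self.zone_size

    def is_sequential(self, pos):
        """Test the seq_zones bit for the zone storing pos."""
        return bool((self.seq_zones >> self.zone_number(pos)) & 1)

    def is_empty_zone(self, pos):
        """Test the empty_zones bit for the zone storing pos."""
        return bool((self.empty_zones >> self.zone_number(pos)) & 1)
```

Answering both questions from in-memory bitmaps is what lets block allocation avoid issuing a zone report command per lookup.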
  2. 08 Dec, 2020 10 commits
  3. 07 Oct, 2020 2 commits
  4. 20 Aug, 2020 1 commit
    • btrfs: reset compression level for lzo on remount · 282dd7d7
      Committed by Marcos Paulo de Souza
      Currently a user can mount with "-o compress", which sets the
      compression algorithm to zlib and uses the default compression level
      for zlib (3):
      
        relatime,compress=zlib:3,space_cache
      
      If the user remounts the fs using "-o compress=lzo", then the old
      compress_level is used:
      
        relatime,compress=lzo:3,space_cache
      
      But lzo does not expose any tunable compression level. The same happens
      if we set any compress argument with different level, also with zstd.
      
      Fix this by resetting the compress_level when compress=lzo is
      specified.  With the fix applied, lzo is shown without compress level:
      
        relatime,compress=lzo,space_cache
      
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
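The fixed behaviour can be sketched as an option parser that always resets the level for lzo. This is a toy Python model covering only zlib and lzo; `parse_compress` and `show_compress` are hypothetical names, and the default level 3 is the zlib default quoted in the message.

```python
ZLIB_DEFAULT_LEVEL = 3  # default stated in the commit message


def parse_compress(option):
    """Parse "compress" or "compress=<algo>[:<level>]" into (algo, level)."""
    _, _, arg = option.partition("=")
    algo, _, level = arg.partition(":")
    algo = algo or "zlib"  # a bare "compress" selects zlib
    if algo == "lzo":
        return ("lzo", 0)  # the fix: lzo has no tunable level, so reset it
    if algo == "zlib":
        return ("zlib", int(level) if level else ZLIB_DEFAULT_LEVEL)
    raise ValueError(f"algorithm not covered by this sketch: {algo}")


def show_compress(algo, level):
    """Format the option the way the mount listings in the message show it."""
    return f"compress={algo}:{level}" if level else f"compress={algo}"
```

Because the level is derived from the new option alone, a stale level from an earlier zlib mount cannot leak into the lzo listing.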
  5. 11 Aug, 2020 3 commits
    • btrfs: make sure SB_I_VERSION doesn't get unset by remount · faa00889
      Committed by Josef Bacik
      There's some inconsistency around SB_I_VERSION handling with mount and
      remount.  Since we don't really want it to be off ever just work around
      this by making sure we don't get the flag cleared on remount.
      
      There's a tiny cpu cost of setting the bit, otherwise all changes to
      i_version also change some of the times (ctime/mtime) so the inode needs
      to be synced. We wouldn't save anything by disabling it.
      Reported-by: Eric Sandeen <sandeen@redhat.com>
      CC: stable@vger.kernel.org # 5.4+
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ add perf impact analysis ]
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: don't show full path of bind mounts in subvol= · 3ef3959b
      Committed by Josef Bacik
      Chris Murphy reported a problem where rpm ostree will bind mount a bunch
      of things for whatever voodoo it's doing.  But when it does this
      /proc/mounts shows something like
      
        /dev/sda /mnt/test btrfs rw,relatime,subvolid=256,subvol=/foo 0 0
        /dev/sda /mnt/test/baz btrfs rw,relatime,subvolid=256,subvol=/foo/bar 0 0
      
      Despite subvolid=256 being subvol=/foo.  This is because we're just
      spitting out the dentry of the mount point, which in the case of bind
      mounts is the source path for the mountpoint.  Instead we should spit
      out the path to the actual subvol.  Fix this by looking up the name for
      the subvolid we have mounted.  With this fix the same test looks like
      this
      
        /dev/sda /mnt/test btrfs rw,relatime,subvolid=256,subvol=/foo 0 0
        /dev/sda /mnt/test/baz btrfs rw,relatime,subvolid=256,subvol=/foo 0 0
      Reported-by: Chris Murphy <chris@colorremedies.com>
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: fix messages after changing compression level by remount · 27942c99
      Committed by David Sterba
      Reported by Forza on IRC that remounting with compression options does
      not reflect the change in level, or at least it does not appear to do so
      according to the messages:
      
        mount -o compress=zstd:1 /dev/sda /mnt
        mount -o remount,compress=zstd:15 /mnt
      
      does not print the change to the level to syslog:
      
        [   41.366060] BTRFS info (device vda): use zstd compression, level 1
        [   41.368254] BTRFS info (device vda): disk space caching is enabled
        [   41.390429] BTRFS info (device vda): disk space caching is enabled
      
      What really happens is that the message is lost but the level is actually
      changed.
      
      There's another weird output, if compression is reset to 'no':
      
        [   45.413776] BTRFS info (device vda): use no compression, level 4
      
      To fix that, save the previous compression level and print the message
      in that case too and use separate message for 'no' compression.
      
      CC: stable@vger.kernel.org # 4.19+
      Signed-off-by: David Sterba <dsterba@suse.com>
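The decision described in the fix (compare against the saved type and level, and use a separate message when compression is turned off) can be sketched as follows. This is a Python model under assumptions: `compression_messages` is a hypothetical name, and the exact wording of the 'no' compression message is assumed rather than quoted from the patch.

```python
def compression_messages(old, new):
    """Return the log lines for a remount from old to new compression.

    old and new are (type, level) tuples; type None means no compression.
    """
    old_type, old_level = old
    new_type, new_level = new
    if new_type is None:
        # Separate message without a level (wording assumed).
        return ["use no compression"] if old_type is not None else []
    if (new_type, new_level) != (old_type, old_level):
        # Report a change of type or of level only.
        return [f"use {new_type} compression, level {new_level}"]
    return []
```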
  6. 27 Jul, 2020 7 commits
    • btrfs: open-code remount flag setting in btrfs_remount · 88c4703f
      Committed by Johannes Thumshirn
      When we're (re)mounting a btrfs filesystem we set the
      BTRFS_FS_STATE_REMOUNTING state in fs_info to serialize against async
      reclaim or defrags.
      
      This flag is set in btrfs_remount_prepare() called by btrfs_remount().
      As btrfs_remount_prepare() does nothing but setting this flag and
      doesn't have a second caller, we can just open-code the flag setting in
      btrfs_remount().
      
      Similarly, open-code the clearing of the flag by moving it out of
      btrfs_remount_cleanup() into btrfs_remount() to be symmetrical.
      Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: document special case error codes for fs errors · 59131393
      Committed by Josef Bacik
      We've had some discussions about what to do in certain scenarios for
      error codes, specifically EUCLEAN and EROFS.  Document these near the
      error handling code so it's clear what their intentions are.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: don't traverse into the seed devices in show_devname · 4faf55b0
      Committed by Anand Jain
      ->show_devname currently shows the lowest devid in the list. As the seed
      devices have the lowest devid in the sprouted filesystem, userland
      tools such as findmnt end up seeing the seed device instead of the device
      from the read-writable sprouted filesystem, as shown below.
      
       mount /dev/sda /btrfs
       mount: /btrfs: WARNING: device write-protected, mounted read-only.
      
       findmnt --output SOURCE,TARGET,UUID /btrfs
       SOURCE   TARGET UUID
       /dev/sda /btrfs 899f7027-3e46-4626-93e7-7d4c9ad19111
      
       btrfs dev add -f /dev/sdb /btrfs
      
       umount /btrfs
       mount /dev/sdb /btrfs
      
       findmnt --output SOURCE,TARGET,UUID /btrfs
       SOURCE   TARGET UUID
       /dev/sda /btrfs 899f7027-3e46-4626-93e7-7d4c9ad19111
      
      All sprouts from a single seed will show the same seed device and the
      same fsid. That's confusing.
      This is causing problems in our prototype as there isn't any reference
      to the sprout file-system(s) which is being used for actual read and
      write.
      
      This was added in the patch which implemented the show_devname in btrfs
      commit 9c5085c1 ("Btrfs: implement ->show_devname").
      I tried to look for any particular reason that we need to show the seed
      device, there isn't any.
      
      So instead, do not traverse through the seed devices, just show the
      lowest devid in the sprouted fsid.
      
      After the patch:
      
       mount /dev/sda /btrfs
       mount: /btrfs: WARNING: device write-protected, mounted read-only.
      
       findmnt --output SOURCE,TARGET,UUID /btrfs
       SOURCE   TARGET UUID
       /dev/sda /btrfs 899f7027-3e46-4626-93e7-7d4c9ad19111
      
       btrfs dev add -f /dev/sdb /btrfs
       mount -o rw,remount /dev/sdb /btrfs
      
       findmnt --output SOURCE,TARGET,UUID /btrfs
       SOURCE   TARGET UUID
       /dev/sdb /btrfs 595ca0e6-b82e-46b5-b9e2-c72a6928be48
      
       mount /dev/sda /btrfs1
       mount: /btrfs1: WARNING: device write-protected, mounted read-only.
      
       btrfs dev add -f /dev/sdc /btrfs1
      
       findmnt --output SOURCE,TARGET,UUID /btrfs1
       SOURCE   TARGET  UUID
       /dev/sdc /btrfs1 ca1dbb7a-8446-4f95-853c-a20f3f82bdbb
      
       cat /proc/self/mounts | grep btrfs
       /dev/sdb /btrfs btrfs rw,relatime,noacl,space_cache,subvolid=5,subvol=/ 0 0
       /dev/sdc /btrfs1 btrfs ro,relatime,noacl,space_cache,subvolid=5,subvol=/ 0 0
      Reported-by: Martin K. Petersen <martin.petersen@oracle.com>
      CC: stable@vger.kernel.org # 4.19+
      Tested-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
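The difference between the old and new lookups can be modelled with a seed chain. This is a toy Python sketch: `FsDevices` and both functions are hypothetical, and the devid values 1 and 2 are assumed for the example rather than taken from the message.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FsDevices:
    """Toy device list: devid -> path, plus an optional seed filesystem."""
    devices: Dict[int, str]
    seed: Optional["FsDevices"] = None


def show_devname_with_seeds(fs):
    """Old behaviour: lowest devid across the sprout and all of its seeds."""
    best = None
    cur = fs
    while cur is not None:
        for devid, path in cur.devices.items():
            if best is None or devid < best[0]:
                best = (devid, path)
        cur = cur.seed
    return best[1]


def show_devname(fs):
    """New behaviour: lowest devid in the sprouted filesystem only."""
    return fs.devices[min(fs.devices)]
```

With a seed on /dev/sda and a sprout device on /dev/sdb, the first function reports the seed and the second reports /dev/sdb, matching the before and after findmnt output above.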
    • btrfs: remove deprecated mount option subvolrootid · b90a4ab6
      Committed by David Sterba
      The option subvolrootid used to be a workaround for mounting subvolumes
      and ineffective since 5e2a4b25 ("btrfs: deprecate subvolrootid mount
      option"). We have subvol= that works and we don't need to keep the
      cruft, let's remove it.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: remove deprecated mount option alloc_start · d801e7a3
      Committed by David Sterba
      The mount option alloc_start has no effect since 0d0c71b3 ("btrfs:
      obsolete and remove mount option alloc_start") which has details why
      it's been deprecated. We can remove it.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: start deprecation of mount option inode_cache · b547a88e
      Committed by David Sterba
      Estimated time of removal of the functionality is 5.11, the option will
      be still parsed but will have no effect.
      
      Reasons for deprecation and removal:
      
      - very poor naming choice of the mount option, it's supposed to cache
        and reuse the inode _numbers_, but it sounds like some generic cache
        for inodes
      
      - the only known usecase where this option would make sense is on a
        32bit architecture where inode numbers in one subvolume would be
        exhausted due to 32bit inode::i_ino
      
      - the cache is stored on disk, consumes space, needs to be loaded and
        written back
      
      - new inode number allocation is slower due to lookups into the cache
        (compared to a simple increment which is the default)
      
      - uses the free-space-cache code that is going to be deprecated as well
        in the future
      
      Known problems:
      
      - since 2011, returning EEXIST when there's not enough space in a page
        to store all checksums, see commit 4b9465cb ("Btrfs: add mount -o
        inode_cache")
      
      Remaining issues:
      
      - if the option was enabled, new inodes created, the option disabled
        again, the cache is still stored on the devices and there's currently
        no way to remove it
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: introduce "rescue=" mount option · 74ef0018
      Committed by Qu Wenruo
      This patch introduces a new "rescue=" mount option group for all mount
      options for data recovery.
      
      Different rescue sub options are separated by ':'. E.g.
      "ro,rescue=nologreplay:usebackuproot".
      
      The original plan was to use ';', but ';' needs to be escaped/quoted,
      or it will be interpreted by bash, similar to '|'.
      
      And obviously, the user can specify rescue options one by one like:
      "ro,rescue=nologreplay,rescue=usebackuproot".
      
      The following mount options are converted to "rescue=", old mount
      options are deprecated but still available for compatibility purpose:
      
      - usebackuproot
        Now it's "rescue=usebackuproot"
      
      - nologreplay
        Now it's "rescue=nologreplay"
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
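The option syntax described above can be sketched as a small parser. This is a Python model, not the kernel's option parsing: `parse_rescue` is a hypothetical name, and only the two sub-options named in the message are recognised.

```python
RESCUE_OPTIONS = {"usebackuproot", "nologreplay"}


def parse_rescue(mount_options):
    """Collect rescue sub-options from a comma-separated mount option string."""
    enabled = set()
    for opt in mount_options.split(","):
        key, sep, value = opt.partition("=")
        if key == "rescue" and sep:
            # Sub-options inside one rescue= group are separated by ':'.
            for sub in value.split(":"):
                if sub not in RESCUE_OPTIONS:
                    raise ValueError(f"unknown rescue option: {sub}")
                enabled.add(sub)
        elif opt in RESCUE_OPTIONS:
            # Deprecated standalone spelling, still accepted.
            enabled.add(opt)
    return enabled
```

Both spellings from the message, "ro,rescue=nologreplay:usebackuproot" and "ro,rescue=nologreplay,rescue=usebackuproot", yield the same set in this model.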
  7. 02 Jul, 2020 1 commit
  8. 25 May, 2020 4 commits
  9. 24 Mar, 2020 5 commits