1. 27 7月, 2020 4 次提交
    • N
      btrfs: always initialize btrfs_bio::tgtdev_map/raid_map pointers · 608769a4
      Nikolay Borisov 提交于
      Since btrfs_bio always contains the extra space for the tgtdev_map and
      raid_maps it's pointless to make the assignment iff specific conditions
      are met.
      
      Instead, always assign the pointers to their correct value at allocation
      time. To accommodate this change also move code a bit in
      __btrfs_map_block so that btrfs_bio::stripes array is always initialized
      before the raid_map, subsequently move the call to sort_parity_stripes
      in the 'if' building the raid_map, retaining the old behavior.
      
      To better understand the change, there are 2 aspects to this:
      
      1. The original code is harder to grasp because the calculations for
         initializing raid_map/tgtdev ponters are apart from the initial
         allocation of memory. Having them predicated on 2 separate checks
         doesn't help that either... So by moving the initialisation in
         alloc_btrfs_bio puts everything together.
      
      2. tgtdev/raid_maps are now always initialized despite sometimes they
         might be equal i.e __btrfs_map_block_for_discard calls
         alloc_btrfs_bio with tgtdev = 0 but their usage should be predicated
         on external checks i.e. just because those pointers are non-null
         doesn't mean they are valid per-se. And actually while taking another
         look at __btrfs_map_block I saw a discrepancy:
      
         Original code initialised tgtdev_map if the following check is true:
      
      	   if (dev_replace_is_ongoing && dev_replace->tgtdev != NULL)
      
         However, further down tgtdev_map is only used if the following check
         is true:
      
      	if (dev_replace_is_ongoing && dev_replace->tgtdev != NULL && need_full_stripe(op))
      
        e.g. the additional need_full_stripe(op) predicate is there.
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      [ copy more details from mail discussion ]
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      608769a4
    • N
      btrfs: don't check for btrfs_device::bdev in btrfs_end_bio · 3eee86c8
      Nikolay Borisov 提交于
      btrfs_map_bio ensures that all submitted bios to devices have valid
      btrfs_device::bdev so this check can be removed from btrfs_end_bio. This
      check was added in june 2012 597a60fa ("Btrfs: don't count I/O
      statistic read errors for missing devices") but then in October of the
      same year another commit de1ee92a ("Btrfs: recheck bio against
      block device when we map the bio") started checking for the presence of
      btrfs_device::bdev before actually issuing the bio.
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      3eee86c8
    • N
      btrfs: record btrfs_device directly in btrfs_io_bio · c31efbdf
      Nikolay Borisov 提交于
      Instead of recording stripe_index and using that to access correct
      btrfs_device from btrfs_bio::stripes record the btrfs_device in
      btrfs_io_bio. This will enable endio handlers to increment device
      error counters on checksum errors.
      Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      c31efbdf
    • D
      btrfs: allow use of global block reserve for balance item deletion · 3502a8c0
      David Sterba 提交于
      On a filesystem with exhausted metadata, but still enough to start
      balance, it's possible to hit this error:
      
      [324402.053842] BTRFS info (device loop0): 1 enospc errors during balance
      [324402.060769] BTRFS info (device loop0): balance: ended with status: -28
      [324402.172295] BTRFS: error (device loop0) in reset_balance_state:3321: errno=-28 No space left
      
      It fails inside reset_balance_state and turns the filesystem to
      read-only, which is unnecessary and should be fixed too, but the problem
      is caused by lack for space when the balance item is deleted. This is a
      one-time operation and from the same rank as unlink that is allowed to
      use the global block reserve. So do the same for the balance item.
      
      Status of the filesystem (100GiB) just after the balance fails:
      
      $ btrfs fi df mnt
      Data, single: total=80.01GiB, used=38.58GiB
      System, single: total=4.00MiB, used=16.00KiB
      Metadata, single: total=19.99GiB, used=19.48GiB
      GlobalReserve, single: total=512.00MiB, used=50.11MiB
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      3502a8c0
  2. 22 7月, 2020 1 次提交
    • B
      btrfs: fix mount failure caused by race with umount · 48cfa61b
      Boris Burkov 提交于
      It is possible to cause a btrfs mount to fail by racing it with a slow
      umount. The crux of the sequence is generic_shutdown_super not yet
      calling sop->put_super before btrfs_mount_root calls btrfs_open_devices.
      If that occurs, btrfs_open_devices will decide the opened counter is
      non-zero, increment it, and skip resetting fs_devices->total_rw_bytes to
      0. From here, mount will call sget which will result in grab_super
      trying to take the super block umount semaphore. That semaphore will be
      held by the slow umount, so mount will block. Before up-ing the
      semaphore, umount will delete the super block, resulting in mount's sget
      reliably allocating a new one, which causes the mount path to dutifully
      fill it out, and increment total_rw_bytes a second time, which causes
      the mount to fail, as we see double the expected bytes.
      
      Here is the sequence laid out in greater detail:
      
      CPU0                                                    CPU1
      down_write sb->s_umount
      btrfs_kill_super
        kill_anon_super(sb)
          generic_shutdown_super(sb);
            shrink_dcache_for_umount(sb);
            sync_filesystem(sb);
            evict_inodes(sb); // SLOW
      
                                                    btrfs_mount_root
                                                      btrfs_scan_one_device
                                                      fs_devices = device->fs_devices
                                                      fs_info->fs_devices = fs_devices
                                                      // fs_devices-opened makes this a no-op
                                                      btrfs_open_devices(fs_devices, mode, fs_type)
                                                      s = sget(fs_type, test, set, flags, fs_info);
                                                        find sb in s_instances
                                                        grab_super(sb);
                                                          down_write(&s->s_umount); // blocks
      
            sop->put_super(sb)
              // sb->fs_devices->opened == 2; no-op
            spin_lock(&sb_lock);
            hlist_del_init(&sb->s_instances);
            spin_unlock(&sb_lock);
            up_write(&sb->s_umount);
                                                          return 0;
                                                        retry lookup
                                                        don't find sb in s_instances (deleted by CPU0)
                                                        s = alloc_super
                                                        return s;
                                                      btrfs_fill_super(s, fs_devices, data)
                                                        open_ctree // fs_devices total_rw_bytes improperly set!
                                                          btrfs_read_chunk_tree
                                                            read_one_dev // increment total_rw_bytes again!!
                                                            super_total_bytes < fs_devices->total_rw_bytes // ERROR!!!
      
      To fix this, we clear total_rw_bytes from within btrfs_read_chunk_tree
      before the calls to read_one_dev, while holding the sb umount semaphore
      and the uuid mutex.
      
      To reproduce, it is sufficient to dirty a decent number of inodes, then
      quickly umount and mount.
      
        for i in $(seq 0 500)
        do
          dd if=/dev/zero of="/mnt/foo/$i" bs=1M count=1
        done
        umount /mnt/foo&
        mount /mnt/foo
      
      does the trick for me.
      
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: NBoris Burkov <boris@bur.io>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      48cfa61b
  3. 25 5月, 2020 5 次提交
    • A
      btrfs: drop stale reference to volume_mutex · ae3e715f
      Anand Jain 提交于
      Commit dccdb07b ("btrfs: kill btrfs_fs_info::volume_mutex") removed
      the last use of the volume_mutex, forgetting to update the comment.
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      ae3e715f
    • A
      btrfs: free alien device after device add · 7f551d96
      Anand Jain 提交于
      When an old device has new fsid through 'btrfs device add -f <dev>' our
      fs_devices list has an alien device in one of the fs_devices lists.
      
      By having an alien device in fs_devices, we have two issues so far
      
      1. missing device does not not show as missing in the userland
      
      2. degraded mount will fail
      
      Both issues are caused by the fact that there's an alien device in the
      fs_devices list. (Alien means that it does not belong to the filesystem,
      identified by fsid, or does not contain btrfs filesystem at all, eg. due
      to overwrite).
      
      A device can be scanned/added through the control device ioctls
      SCAN_DEV, DEVICES_READY or by ADD_DEV.
      
      And device coming through the control device is checked against the all
      other devices in the lists, but this was not the case for ADD_DEV.
      
      This patch fixes both issues above by removing the alien device.
      
      CC: stable@vger.kernel.org # 5.4+
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      7f551d96
    • A
      btrfs: include non-missing as a qualifier for the latest_bdev · 998a0671
      Anand Jain 提交于
      btrfs_free_extra_devids() updates fs_devices::latest_bdev to point to
      the bdev with greatest device::generation number.  For a typical-missing
      device the generation number is zero so fs_devices::latest_bdev will
      never point to it.
      
      But if the missing device is due to alienation [1], then
      device::generation is not zero and if it is greater or equal to the rest
      of device  generations in the list, then fs_devices::latest_bdev ends up
      pointing to the missing device and reports the error like [2].
      
      [1] We maintain devices of a fsid (as in fs_device::fsid) in the
      fs_devices::devices list, a device is considered as an alien device
      if its fsid does not match with the fs_device::fsid
      
      Consider a working filesystem with raid1:
      
        $ mkfs.btrfs -f -d raid1 -m raid1 /dev/sda /dev/sdb
        $ mount /dev/sda /mnt-raid1
        $ umount /mnt-raid1
      
      While mnt-raid1 was unmounted the user force-adds one of its devices to
      another btrfs filesystem:
      
        $ mkfs.btrfs -f /dev/sdc
        $ mount /dev/sdc /mnt-single
        $ btrfs dev add -f /dev/sda /mnt-single
      
      Now the original mnt-raid1 fails to mount in degraded mode, because
      fs_devices::latest_bdev is pointing to the alien device.
      
        $ mount -o degraded /dev/sdb /mnt-raid1
      
      [2]
      mount: wrong fs type, bad option, bad superblock on /dev/sdb,
             missing codepage or helper program, or other error
      
             In some cases useful info is found in syslog - try
             dmesg | tail or so.
      
        kernel: BTRFS warning (device sdb): devid 1 uuid 072a0192-675b-4d5a-8640-a5cf2b2c704d is missing
        kernel: BTRFS error (device sdb): failed to read devices
        kernel: BTRFS error (device sdb): open_ctree failed
      
      Fix the root cause by checking if the device is not missing before it
      can be considered for the fs_devices::latest_bdev.
      
      CC: stable@vger.kernel.org # 4.19+
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      998a0671
    • A
      btrfs: drop useless goto in open_fs_devices · 1ed802c9
      Anand Jain 提交于
      There is no need of goto out in open_fs_devices() as there is nothing
      special done there.
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      1ed802c9
    • N
      btrfs: make btrfs_read_disk_super return struct btrfs_disk_super · b335eab8
      Nikolay Borisov 提交于
      Instead of returning both the page and the super block structure, make
      btrfs_read_disk_super just return a pointer to struct btrfs_disk_super.
      As a result the function signature is simplified. Also,
      read_cache_page_gfp can never return NULL so check its return value only
      for IS_ERR.
      Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      b335eab8
  4. 24 3月, 2020 27 次提交
  5. 24 1月, 2020 3 次提交