1. 27 7月, 2020 33 次提交
  2. 25 7月, 2020 1 次提交
  3. 24 7月, 2020 2 次提交
  4. 23 7月, 2020 1 次提交
  5. 22 7月, 2020 3 次提交
    • B
      btrfs: fix mount failure caused by race with umount · 48cfa61b
      Boris Burkov 提交于
      It is possible to cause a btrfs mount to fail by racing it with a slow
      umount. The crux of the sequence is generic_shutdown_super not yet
      calling sop->put_super before btrfs_mount_root calls btrfs_open_devices.
      If that occurs, btrfs_open_devices will decide the opened counter is
      non-zero, increment it, and skip resetting fs_devices->total_rw_bytes to
      0. From here, mount will call sget which will result in grab_super
      trying to take the super block umount semaphore. That semaphore will be
      held by the slow umount, so mount will block. Before up-ing the
      semaphore, umount will delete the super block, resulting in mount's sget
      reliably allocating a new one, which causes the mount path to dutifully
      fill it out, and increment total_rw_bytes a second time, which causes
      the mount to fail, as we see double the expected bytes.
      
      Here is the sequence laid out in greater detail:
      
      CPU0                                                    CPU1
      down_write sb->s_umount
      btrfs_kill_super
        kill_anon_super(sb)
          generic_shutdown_super(sb);
            shrink_dcache_for_umount(sb);
            sync_filesystem(sb);
            evict_inodes(sb); // SLOW
      
                                                    btrfs_mount_root
                                                      btrfs_scan_one_device
                                                      fs_devices = device->fs_devices
                                                      fs_info->fs_devices = fs_devices
                                                      // fs_devices-opened makes this a no-op
                                                      btrfs_open_devices(fs_devices, mode, fs_type)
                                                      s = sget(fs_type, test, set, flags, fs_info);
                                                        find sb in s_instances
                                                        grab_super(sb);
                                                          down_write(&s->s_umount); // blocks
      
            sop->put_super(sb)
              // sb->fs_devices->opened == 2; no-op
            spin_lock(&sb_lock);
            hlist_del_init(&sb->s_instances);
            spin_unlock(&sb_lock);
            up_write(&sb->s_umount);
                                                          return 0;
                                                        retry lookup
                                                        don't find sb in s_instances (deleted by CPU0)
                                                        s = alloc_super
                                                        return s;
                                                      btrfs_fill_super(s, fs_devices, data)
                                                        open_ctree // fs_devices total_rw_bytes improperly set!
                                                          btrfs_read_chunk_tree
                                                            read_one_dev // increment total_rw_bytes again!!
                                                            super_total_bytes < fs_devices->total_rw_bytes // ERROR!!!
      
      To fix this, we clear total_rw_bytes from within btrfs_read_chunk_tree
      before the calls to read_one_dev, while holding the sb umount semaphore
      and the uuid mutex.
      
      To reproduce, it is sufficient to dirty a decent number of inodes, then
      quickly umount and mount.
      
        for i in $(seq 0 500)
        do
          dd if=/dev/zero of="/mnt/foo/$i" bs=1M count=1
        done
        umount /mnt/foo&
        mount /mnt/foo
      
      does the trick for me.
      
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: NBoris Burkov <boris@bur.io>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      48cfa61b
    • R
      btrfs: fix page leaks after failure to lock page for delalloc · 5909ca11
      Robbie Ko 提交于
      When locking pages for delalloc, we check if it's dirty and mapping still
      matches. If it does not match, we need to return -EAGAIN and release all
      pages. Only the current page was put though, iterate over all the
      remaining pages too.
      
      CC: stable@vger.kernel.org # 4.14+
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NRobbie Ko <robbieko@synology.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      5909ca11
    • Q
      btrfs: qgroup: fix data leak caused by race between writeback and truncate · fa91e4aa
      Qu Wenruo 提交于
      [BUG]
      When running tests like generic/013 on test device with btrfs quota
      enabled, it can normally lead to data leak, detected at unmount time:
      
        BTRFS warning (device dm-3): qgroup 0/5 has unreleased space, type 0 rsv 4096
        ------------[ cut here ]------------
        WARNING: CPU: 11 PID: 16386 at fs/btrfs/disk-io.c:4142 close_ctree+0x1dc/0x323 [btrfs]
        RIP: 0010:close_ctree+0x1dc/0x323 [btrfs]
        Call Trace:
         btrfs_put_super+0x15/0x17 [btrfs]
         generic_shutdown_super+0x72/0x110
         kill_anon_super+0x18/0x30
         btrfs_kill_super+0x17/0x30 [btrfs]
         deactivate_locked_super+0x3b/0xa0
         deactivate_super+0x40/0x50
         cleanup_mnt+0x135/0x190
         __cleanup_mnt+0x12/0x20
         task_work_run+0x64/0xb0
         __prepare_exit_to_usermode+0x1bc/0x1c0
         __syscall_return_slowpath+0x47/0x230
         do_syscall_64+0x64/0xb0
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        ---[ end trace caf08beafeca2392 ]---
        BTRFS error (device dm-3): qgroup reserved space leaked
      
      [CAUSE]
      In the offending case, the offending operations are:
      2/6: writev f2X[269 1 0 0 0 0] [1006997,67,288] 0
      2/7: truncate f2X[269 1 0 0 48 1026293] 18388 0
      
      The following sequence of events could happen after the writev():
      	CPU1 (writeback)		|		CPU2 (truncate)
      -----------------------------------------------------------------
      btrfs_writepages()			|
      |- extent_write_cache_pages()		|
         |- Got page for 1003520		|
         |  1003520 is Dirty, no writeback	|
         |  So (!clear_page_dirty_for_io())   |
         |  gets called for it		|
         |- Now page 1003520 is Clean.	|
         |					| btrfs_setattr()
         |					| |- btrfs_setsize()
         |					|    |- truncate_setsize()
         |					|       New i_size is 18388
         |- __extent_writepage()		|
         |  |- page_offset() > i_size		|
            |- btrfs_invalidatepage()		|
      	 |- Page is clean, so no qgroup |
      	    callback executed
      
      This means, the qgroup reserved data space is not properly released in
      btrfs_invalidatepage() as the page is Clean.
      
      [FIX]
      Instead of checking the dirty bit of a page, call
      btrfs_qgroup_free_data() unconditionally in btrfs_invalidatepage().
      
      As qgroup rsv are completely bound to the QGROUP_RESERVED bit of
      io_tree, not bound to page status, thus we won't cause double freeing
      anyway.
      
      Fixes: 0b34c261 ("btrfs: qgroup: Prevent qgroup->reserved from going subzero")
      CC: stable@vger.kernel.org # 4.14+
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      fa91e4aa