1. 23 Nov, 2020 1 commit
    • A
      btrfs: dev-replace: fail mount if we don't have replace item with target device · 2db7fb1b
      Anand Jain authored
      commit cf89af14 upstream.
      
      If there is a device BTRFS_DEV_REPLACE_DEVID without the device replace
      item, then the filesystem is in an inconsistent state. This is either
      corruption or a crafted image.  Fail the mount, as this needs a closer
      look at what is actually wrong.
      
      As of now, if BTRFS_DEV_REPLACE_DEVID is present without the replace
      item, __btrfs_free_extra_devids() determines that there is an
      extra device and frees it, but the mount continues.
      However, we were wrong in keeping track of the rw_devices, so the syzbot
      testcase failed:
      
        WARNING: CPU: 1 PID: 3612 at fs/btrfs/volumes.c:1166 close_fs_devices.part.0+0x607/0x800 fs/btrfs/volumes.c:1166
        Kernel panic - not syncing: panic_on_warn set ...
        CPU: 1 PID: 3612 Comm: syz-executor.2 Not tainted 5.9.0-rc4-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        Call Trace:
         __dump_stack lib/dump_stack.c:77 [inline]
         dump_stack+0x198/0x1fd lib/dump_stack.c:118
         panic+0x347/0x7c0 kernel/panic.c:231
         __warn.cold+0x20/0x46 kernel/panic.c:600
         report_bug+0x1bd/0x210 lib/bug.c:198
         handle_bug+0x38/0x90 arch/x86/kernel/traps.c:234
         exc_invalid_op+0x14/0x40 arch/x86/kernel/traps.c:254
         asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:536
        RIP: 0010:close_fs_devices.part.0+0x607/0x800 fs/btrfs/volumes.c:1166
        RSP: 0018:ffffc900091777e0 EFLAGS: 00010246
        RAX: 0000000000040000 RBX: ffffffffffffffff RCX: ffffc9000c8b7000
        RDX: 0000000000040000 RSI: ffffffff83097f47 RDI: 0000000000000007
        RBP: dffffc0000000000 R08: 0000000000000001 R09: ffff8880988a187f
        R10: 0000000000000000 R11: 0000000000000001 R12: ffff88809593a130
        R13: ffff88809593a1ec R14: ffff8880988a1908 R15: ffff88809593a050
         close_fs_devices fs/btrfs/volumes.c:1193 [inline]
         btrfs_close_devices+0x95/0x1f0 fs/btrfs/volumes.c:1179
         open_ctree+0x4984/0x4a2d fs/btrfs/disk-io.c:3434
         btrfs_fill_super fs/btrfs/super.c:1316 [inline]
         btrfs_mount_root.cold+0x14/0x165 fs/btrfs/super.c:1672
      
      The fix: when we determine that there is no replace item, fail the
      mount if a replace target device (devid 0) is present.
      
      CC: stable@vger.kernel.org # 4.19+
      Reported-by: syzbot+4cfe71a4da060be47502@syzkaller.appspotmail.com
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
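
The mount-time rule described in this commit can be sketched as a small userspace model. This is only an illustration, not the kernel patch: `demo_device`, `demo_has_devid` and `demo_check_replace_state` are hypothetical names, and the choice of `-EUCLEAN` as the error code is an assumption, since the commit text only says the mount fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#ifndef EUCLEAN
#define EUCLEAN 117 /* "Structure needs cleaning" on Linux */
#endif

/* devid 0 is reserved for the replace target (BTRFS_DEV_REPLACE_DEVID). */
#define DEMO_DEV_REPLACE_DEVID 0ULL

/* Hypothetical, simplified stand-in for one scanned device. */
struct demo_device {
	uint64_t devid;
};

/* Return true if any scanned device carries the given devid. */
static bool demo_has_devid(const struct demo_device *devs, size_t n,
			   uint64_t devid)
{
	assert(devs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (devs[i].devid == devid)
			return true;
	return false;
}

/*
 * Model of the check: with no replace item on disk, a device with
 * devid 0 must not exist. Returns 0 if consistent, negative errno if not.
 */
static int demo_check_replace_state(const struct demo_device *devs, size_t n,
				    bool have_replace_item)
{
	if (!have_replace_item &&
	    demo_has_devid(devs, n, DEMO_DEV_REPLACE_DEVID))
		return -EUCLEAN; /* assumed errno for "fail the mount" */
	return 0;
}
```

A caller would run the check once while reading the replace state at mount time and abort the mount on a nonzero return, instead of freeing the stray device and continuing.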
  2. 16 Nov, 2020 1 commit
    • A
      btrfs: fix replace of seed device · 3f4ce8ee
      Anand Jain authored
      stable inclusion
      from linux-4.19.155
      commit c9830b1f2b4a7105541cf8796b75e5860805c6fa
      
      --------------------------------
      
      [ Upstream commit c6a5d954 ]
      
      If you replace a seed device in a sprouted fs, it appears to have
      successfully replaced the seed device, but if you look closely, it
      didn't.  Here is an example.
      
        $ mkfs.btrfs /dev/sda
        $ btrfstune -S1 /dev/sda
        $ mount /dev/sda /btrfs
        $ btrfs device add /dev/sdb /btrfs
        $ umount /btrfs
        $ btrfs device scan --forget
        $ mount -o device=/dev/sda /dev/sdb /btrfs
        $ btrfs replace start -f /dev/sda /dev/sdc /btrfs
        $ echo $?
        0
      
        BTRFS info (device sdb): dev_replace from /dev/sda (devid 1) to /dev/sdc started
        BTRFS info (device sdb): dev_replace from /dev/sda (devid 1) to /dev/sdc finished
      
        $ btrfs fi show
        Label: none  uuid: ab2c88b7-be81-4a7e-9849-c3666e7f9f4f
      	  Total devices 2 FS bytes used 256.00KiB
      	  devid    1 size 3.00GiB used 520.00MiB path /dev/sdc
      	  devid    2 size 3.00GiB used 896.00MiB path /dev/sdb
      
        Label: none  uuid: 10bd3202-0415-43af-96a8-d5409f310a7e
      	  Total devices 1 FS bytes used 128.00KiB
      	  devid    1 size 3.00GiB used 536.00MiB path /dev/sda
      
      So according to the replace start command and the kernel log, the
      replace was successful. Now let's try a clean mount.
      
        $ umount /btrfs
        $ btrfs device scan --forget
      
        $ mount -o device=/dev/sdc /dev/sdb /btrfs
        mount: /btrfs: wrong fs type, bad option, bad superblock on /dev/sdb, missing codepage or helper program, or other error.
      
        [  636.157517] BTRFS error (device sdc): failed to read chunk tree: -2
        [  636.180177] BTRFS error (device sdc): open_ctree failed
      
      That's because, per the dev items, it is still looking for the
      original seed device.
      
       $ btrfs inspect-internal dump-tree -d /dev/sdb
      
      	item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 16185 itemsize 98
      		devid 1 total_bytes 3221225472 bytes_used 545259520
      		io_align 4096 io_width 4096 sector_size 4096 type 0
      		generation 6 start_offset 0 dev_group 0
      		seek_speed 0 bandwidth 0
      		uuid 59368f50-9af2-4b17-91da-8a783cc418d4  <--- seed uuid
      		fsid 10bd3202-0415-43af-96a8-d5409f310a7e  <--- seed fsid
      	item 1 key (DEV_ITEMS DEV_ITEM 2) itemoff 16087 itemsize 98
      		devid 2 total_bytes 3221225472 bytes_used 939524096
      		io_align 4096 io_width 4096 sector_size 4096 type 0
      		generation 0 start_offset 0 dev_group 0
      		seek_speed 0 bandwidth 0
      		uuid 56a0a6bc-4630-4998-8daf-3c3030c4256a  <- sprout uuid
      		fsid ab2c88b7-be81-4a7e-9849-c3666e7f9f4f <- sprout fsid
      
      But the replace target has the following uuid+fsid in its superblock,
      which doesn't match the expected uuid+fsid in its dev item.
      
        $ btrfs in dump-super /dev/sdc | egrep '^generation|dev_item.uuid|dev_item.fsid|devid'
        generation	20
        dev_item.uuid	59368f50-9af2-4b17-91da-8a783cc418d4
        dev_item.fsid	ab2c88b7-be81-4a7e-9849-c3666e7f9f4f [match]
        dev_item.devid	1
      
      So if you provide the original seed device, the mount succeeds.
      This is what has been happening so far in test case btrfs/163.
      
        $ btrfs device scan --forget
        $ mount -o device=/dev/sda /dev/sdb /btrfs
      
      Fix in this patch:
      If a seed is not sprouted, it cannot be replaced, because it is a
      read-only filesystem on a read-only device. Similarly, in a sprouted
      filesystem the seed device is still read-only. So disallow replacing
      a seed device: you can only add a new device and then delete the seed
      device. If a replace is attempted, it returns -EINVAL.
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
  3. 14 Apr, 2020 1 commit
  4. 27 Dec, 2019 6 commits
    • A
      btrfs: merge btrfs_find_device and find_device · d23290e7
      Anand Jain authored
      mainline inclusion
      from mainline-v5.1-rc7
      commit 09ba3bc9
      category: bugfix
      bugzilla: 13690
      CVE: CVE-2019-18885
      
      -------------------------------------------------
      
      Both btrfs_find_device() and find_device() do the same thing, except
      that the latter does not take the seed device into account in the device
      scanning context. We can merge them.
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Conflicts:
        fs/btrfs/volumes.c
      [yyl: adjust context]
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Reviewed-by: Yi Zhang <yi.zhang@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • A
      btrfs: refactor btrfs_find_device() take fs_devices as argument · ab7de67e
      Anand Jain authored
      mainline inclusion
      from mainline-v5.1-rc7
      commit e4319cd9
      category: bugfix
      bugzilla: 13690
      CVE: CVE-2019-18885
      
      -------------------------------------------------
      
      btrfs_find_device() accepts fs_info as an argument and retrieves
      fs_devices from fs_info.
      
      Instead use fs_devices, so that this function can be used in non-mount
      (during device scanning) context as well.
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Conflicts:
        fs/btrfs/volumes.c
      [yyl: adjust context]
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Reviewed-by: Yi Zhang <yi.zhang@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
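
A lookup that takes the device set directly, with a switch for whether seed filesystems are searched, might look roughly like the following userspace model. The types and the function `demo_find_device` are hypothetical simplifications; the real helper also matches on uuid and fsid, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one device. */
struct demo_device {
	uint64_t devid;
};

/* Hypothetical stand-in for a device set with an optional seed chain. */
struct demo_fs_devices {
	const struct demo_device *devices;
	size_t num_devices;
	const struct demo_fs_devices *seed; /* next seed set, or NULL */
};

/*
 * Find a device by devid. With seed == false only the given set is
 * searched (scan context); with seed == true the seed chain is walked too.
 */
static const struct demo_device *
demo_find_device(const struct demo_fs_devices *fs_devices, uint64_t devid,
		 bool seed)
{
	assert(fs_devices != NULL);
	while (fs_devices) {
		for (size_t i = 0; i < fs_devices->num_devices; i++)
			if (fs_devices->devices[i].devid == devid)
				return &fs_devices->devices[i];
		if (!seed)
			break;
		fs_devices = fs_devices->seed;
	}
	return NULL;
}
```

Because the function no longer needs a mounted filesystem handle, the same code path can serve both the mount-time lookup and the device-scan lookup, which is the point of the two refactoring commits.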
    • N
      btrfs: Ensure replaced device doesn't have pending chunk allocation · 4a377467
      Nikolay Borisov authored
      commit debd1c06 upstream.
      
      Recent FITRIM work, namely bbbf7243 ("btrfs: combine device update
      operations during transaction commit") combined the way certain
      operations are recorded in a transaction. As a result an ASSERT was added
      in dev_replace_finish to ensure the new code works correctly.
      Unfortunately I got reports that it's possible to trigger the assert,
      meaning that during a device replace it's possible to have an unfinished
      chunk allocation on the source device.
      
      This is supposed to be prevented by the fact that a transaction is
      committed before finishing the replace operation and later acquiring the
      chunk mutex. This is not sufficient, since between the transaction
      commit and the acquisition of the chunk mutex a chunk can still be
      allocated, depending on the workload running on the replaced device.
      This bug has been present ever since device replace was introduced, but
      there was never code that checked for it.
      
      The correct fix is to ensure that there is no pending device
      modification operation when the chunk mutex is acquired and, if there
      is, to repeat the transaction commit. Unfortunately it's not possible to
      just exclude the source device from btrfs_fs_devices::dev_alloc_list
      since this causes ENOSPC to be hit in transaction commit.
      
      Fixing that in another way would need to add special cases to handle the
      last writes and forbid new ones. The looped transaction fix is more
      obvious, and can be easily backported. The runtime of dev-replace is
      long so there's no noticeable delay caused by that.
      Reported-by: David Sterba <dsterba@suse.com>
      Fixes: 391cd9df ("Btrfs: fix unprotected alloc list insertion during the finishing procedure of replace")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
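
The "looped transaction" fix can be illustrated with a single-threaded userspace model: commit, check for a pending chunk allocation where the chunk mutex would be held, and repeat if one slipped in. Everything here is a hypothetical simplification (`demo_device`, `demo_workload`, `demo_commit_transaction`, `demo_replace_finishing`); real locking is only indicated in comments, and the racing workload is simulated by a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the source device of the replace. */
struct demo_device {
	bool has_pending_chunks;
};

/* Simulated workload: allocations that will still race with commits. */
struct demo_workload {
	unsigned int racing_allocs;
};

/* A commit flushes pending chunks, but a racing allocation may follow. */
static void demo_commit_transaction(struct demo_device *src,
				    struct demo_workload *w)
{
	src->has_pending_chunks = false;
	if (w->racing_allocs > 0) {
		w->racing_allocs--;
		src->has_pending_chunks = true;
	}
}

/* Repeat the commit until the device is quiet; return the commit count. */
static unsigned int demo_replace_finishing(struct demo_device *src,
					   struct demo_workload *w)
{
	unsigned int commits = 0;

	assert(src != NULL && w != NULL);
	for (;;) {
		demo_commit_transaction(src, w);
		commits++;
		/* The chunk mutex would be taken here. */
		if (!src->has_pending_chunks)
			break; /* proceed with the mutex still held */
		/* The chunk mutex would be dropped here before retrying. */
	}
	return commits;
}
```

With two racing allocations the loop commits three times before it can proceed, which matches the commit's argument that the extra commits are cheap relative to the runtime of a whole replace.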
    • A
      btrfs: fix use-after-free due to race between replace start and cancel · 55cf3ae3
      Anand Jain authored
      [ Upstream commit d189dd70 ]
      
      The device replace cancel thread can race with the replace start thread
      and if fs_info::scrubs_running is not yet set, btrfs_scrub_cancel() will
      fail to stop the scrub thread.
      
      The scrub thread continues with the scrub for replace, which then
      tries to write to the target device, which has already been freed by
      the cancel thread.
      
      scrub_setup_ctx() warns as tgtdev is NULL.
      
        struct scrub_ctx *scrub_setup_ctx(struct btrfs_device *dev, int is_dev_replace)
        {
        ...
      	  if (is_dev_replace) {
      		  WARN_ON(!fs_info->dev_replace.tgtdev);  <===
      		  sctx->pages_per_wr_bio = SCRUB_PAGES_PER_WR_BIO;
      		  sctx->wr_tgtdev = fs_info->dev_replace.tgtdev;
      		  sctx->flush_all_writes = false;
      	  }
      
        [ 6724.497655] BTRFS info (device sdb): dev_replace from /dev/sdb (devid 1) to /dev/sdc started
        [ 6753.945017] BTRFS info (device sdb): dev_replace from /dev/sdb (devid 1) to /dev/sdc canceled
        [ 6852.426700] WARNING: CPU: 0 PID: 4494 at fs/btrfs/scrub.c:622 scrub_setup_ctx.isra.19+0x220/0x230 [btrfs]
        ...
        [ 6852.428928] RIP: 0010:scrub_setup_ctx.isra.19+0x220/0x230 [btrfs]
        ...
        [ 6852.432970] Call Trace:
        [ 6852.433202]  btrfs_scrub_dev+0x19b/0x5c0 [btrfs]
        [ 6852.433471]  btrfs_dev_replace_start+0x48c/0x6a0 [btrfs]
        [ 6852.433800]  btrfs_dev_replace_by_ioctl+0x3a/0x60 [btrfs]
        [ 6852.434097]  btrfs_ioctl+0x2476/0x2d20 [btrfs]
        [ 6852.434365]  ? do_sigaction+0x7d/0x1e0
        [ 6852.434623]  do_vfs_ioctl+0xa9/0x6c0
        [ 6852.434865]  ? syscall_trace_enter+0x1c8/0x310
        [ 6852.435124]  ? syscall_trace_enter+0x1c8/0x310
        [ 6852.435387]  ksys_ioctl+0x60/0x90
        [ 6852.435663]  __x64_sys_ioctl+0x16/0x20
        [ 6852.435907]  do_syscall_64+0x50/0x180
        [ 6852.436150]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Further, as the replace thread enters scrub_write_page_to_dev_replace()
      without the target device it panics:
      
        static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx,
      				      struct scrub_page *spage)
        {
        ...
      	bio_set_dev(bio, sbio->dev->bdev); <======
      
        [ 6929.715145] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
        ..
        [ 6929.717106] Workqueue: btrfs-scrub btrfs_scrub_helper [btrfs]
        [ 6929.717420] RIP: 0010:scrub_write_page_to_dev_replace+0xb4/0x260 [btrfs]
        ..
        [ 6929.721430] Call Trace:
        [ 6929.721663]  scrub_write_block_to_dev_replace+0x3f/0x60 [btrfs]
        [ 6929.721975]  scrub_bio_end_io_worker+0x1af/0x490 [btrfs]
        [ 6929.722277]  normal_work_helper+0xf0/0x4c0 [btrfs]
        [ 6929.722552]  process_one_work+0x1f4/0x520
        [ 6929.722805]  ? process_one_work+0x16e/0x520
        [ 6929.723063]  worker_thread+0x46/0x3d0
        [ 6929.723313]  kthread+0xf8/0x130
        [ 6929.723544]  ? process_one_work+0x520/0x520
        [ 6929.723800]  ? kthread_delayed_work_timer_fn+0x80/0x80
        [ 6929.724081]  ret_from_fork+0x3a/0x50
      
      Fix this by letting btrfs_dev_replace_finishing() do the job of
      cleaning up after the cancel, including freeing the target device.
      btrfs_dev_replace_finishing() is called when btrfs_scrub_dev() returns,
      along with the scrub return status.
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
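
The ownership rule behind this fix (cancel only requests a stop, and the finishing step alone frees the target) can be sketched in a single-threaded userspace model. All names are hypothetical (`demo_replace`, `demo_start`, `demo_cancel`, `demo_scrub`, `demo_finishing`), an `int` allocation stands in for the target device, and the use of `-ECANCELED` as the scrub status for a cancelled run is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

enum demo_state { DEMO_STARTED, DEMO_FINISHED, DEMO_CANCELED };

struct demo_replace {
	enum demo_state state;
	int *tgtdev;           /* stand-in for the target device object */
	bool cancel_requested;
};

static int demo_start(struct demo_replace *r)
{
	r->tgtdev = malloc(sizeof(*r->tgtdev));
	if (!r->tgtdev)
		return -ENOMEM;
	*r->tgtdev = 0;
	r->state = DEMO_STARTED;
	r->cancel_requested = false;
	return 0;
}

/* Cancel never frees the target; it only asks the scrub to stop. */
static void demo_cancel(struct demo_replace *r)
{
	r->cancel_requested = true;
}

/* The scrub can rely on the target existing for its whole lifetime. */
static int demo_scrub(struct demo_replace *r)
{
	assert(r->tgtdev != NULL);
	if (r->cancel_requested)
		return -ECANCELED;
	*r->tgtdev += 1; /* pretend to write to the target */
	return 0;
}

/* Single place that cleans up, driven by the scrub return status. */
static void demo_finishing(struct demo_replace *r, int scrub_ret)
{
	if (scrub_ret == -ECANCELED) {
		free(r->tgtdev);
		r->tgtdev = NULL;
		r->state = DEMO_CANCELED;
	} else {
		r->state = DEMO_FINISHED; /* target stays in use */
	}
}
```

Since only the path that ran the scrub tears the target down, a cancel that arrives before the scrub has registered itself can no longer free memory the scrub is about to use.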
    • A
      btrfs: dev-replace: go back to suspend state if another EXCL_OP is running · 0a64c690
      Anand Jain authored
      commit 05c49e6b upstream.
      
      In a scenario where balance and replace coexist as below,
      
        - start balance
        - pause balance
        - start replace
        - reboot
      
      and when the system restarts, balance resumes first. The replace then
      attempts to restart but fails, as the EXCL_OP lock is already held
      by the balance. If so, put the replace state back to
      BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED.
      
      Fixes: 010a47bd ("btrfs: add proper safety check before resuming dev-replace")
      CC: stable@vger.kernel.org # 4.18+
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • A
      btrfs: dev-replace: go back to suspended state if target device is missing · 5ba72b6e
      Anand Jain authored
      commit 0d228ece upstream.
      
      At the time of a forced unmount we put the running replace into
      BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED state. Suppose the target
      device is missing when the system comes back.

      In that case, let the replace state remain
      BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED instead of
      BTRFS_IOCTL_DEV_REPLACE_STATE_STARTED, as there isn't any matching scrub
      running as part of the replace.
      
      Fixes: e93c89c1 ("Btrfs: add new sources for device replace code")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
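
Taken together, the two "go back to suspended" commits describe a small decision made when a replace is resumed after a restart. A userspace sketch of that decision follows; the enum, `demo_replace` and `demo_resume` are hypothetical names, and the two boolean inputs stand in for "target device found" and "another exclusive operation already holds the lock".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_state {
	DEMO_NEVER_STARTED,
	DEMO_STARTED,
	DEMO_FINISHED,
	DEMO_CANCELED,
	DEMO_SUSPENDED,
};

struct demo_replace {
	enum demo_state state;
};

/*
 * Decide the in-memory state when resuming at mount time.
 * Returns true only if the replace (and its scrub) actually restarts.
 */
static bool demo_resume(struct demo_replace *r, bool tgtdev_present,
			bool excl_op_busy)
{
	assert(r != NULL);

	/* Only an interrupted replace is a candidate for resuming. */
	if (r->state != DEMO_STARTED && r->state != DEMO_SUSPENDED)
		return false;

	/* No target device, or balance got the lock first: stay suspended. */
	if (!tgtdev_present || excl_op_busy) {
		r->state = DEMO_SUSPENDED;
		return false;
	}

	r->state = DEMO_STARTED;
	return true;
}
```

The invariant the sketch tries to capture is that STARTED is reported only when a scrub is really running on behalf of the replace.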
  5. 14 Nov, 2018 1 commit
    • J
      btrfs: fix error handling in btrfs_dev_replace_start · 18bdce0e
      Jeff Mahoney authored
      commit 5c06147128fbbdf7a84232c5f0d808f53153defe upstream.
      
      When we fail to start a transaction in btrfs_dev_replace_start, we leave
      dev_replace->replace_state set to STARTED but clear ->srcdev and
      ->tgtdev.  Later, that can result in an Oops in
      btrfs_dev_replace_progress when having state set to STARTED or SUSPENDED
      implies that ->srcdev is valid.
      
      Also fix error handling when the state is already STARTED or SUSPENDED
      while starting.  That, too, will clear ->srcdev and ->tgtdev even though
      it doesn't own them.  This should be an impossible case to hit since we
      should be protected by the BTRFS_FS_EXCL_OP bit being set.  Let's add an
      ASSERT there while we're at it.
      
      Fixes: e93c89c1 (Btrfs: add new sources for device replace code)
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
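
The invariant this commit restores is that a STARTED or SUSPENDED state always comes with a valid source device. A userspace model of the corrected error path is sketched below; `demo_dev_replace`, `demo_replace_start` and `demo_state_is_consistent` are hypothetical names, `trans_err` simulates a failure to start the transaction, and resetting the state to "never started" on failure is an assumption about how the fix behaves.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_state { DEMO_NEVER_STARTED, DEMO_STARTED, DEMO_SUSPENDED };

struct demo_dev_replace {
	enum demo_state state;
	const void *srcdev;
	const void *tgtdev;
};

/* STARTED or SUSPENDED must imply that srcdev is valid. */
static bool demo_state_is_consistent(const struct demo_dev_replace *r)
{
	if (r->state == DEMO_STARTED || r->state == DEMO_SUSPENDED)
		return r->srcdev != NULL;
	return true;
}

/* trans_err: 0 on success, negative errno if the transaction fails. */
static int demo_replace_start(struct demo_dev_replace *r, const void *src,
			      const void *tgt, int trans_err)
{
	/* Should be impossible while the exclusive-op bit is held. */
	assert(r->state != DEMO_STARTED && r->state != DEMO_SUSPENDED);

	r->state = DEMO_STARTED;
	r->srcdev = src;
	r->tgtdev = tgt;

	if (trans_err) {
		/* Undo the state together with the device pointers. */
		r->state = DEMO_NEVER_STARTED;
		r->srcdev = NULL;
		r->tgtdev = NULL;
		return trans_err;
	}
	return 0;
}
```

A progress query that dereferences `srcdev` whenever the state is STARTED or SUSPENDED is then safe, because the failure path no longer leaves STARTED behind with cleared pointers.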
  6. 06 Aug, 2018 6 commits
  7. 29 May, 2018 8 commits
  8. 12 Apr, 2018 1 commit
  9. 31 Mar, 2018 1 commit
  10. 26 Mar, 2018 6 commits
  11. 22 Jan, 2018 3 commits
  12. 16 Aug, 2017 2 commits
  13. 17 Jul, 2017 1 commit
    • D
      VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb) · bc98a42c
      David Howells authored
      Firstly by applying the following with coccinelle's spatch:
      
      	@@ expression SB; @@
      	-SB->s_flags & MS_RDONLY
      	+sb_rdonly(SB)
      
      to effect the conversion to sb_rdonly(sb), then by applying:
      
      	@@ expression A, SB; @@
      	(
      	-(!sb_rdonly(SB)) && A
      	+!sb_rdonly(SB) && A
      	|
      	-A != (sb_rdonly(SB))
      	+A != sb_rdonly(SB)
      	|
      	-A == (sb_rdonly(SB))
      	+A == sb_rdonly(SB)
      	|
      	-!(sb_rdonly(SB))
      	+!sb_rdonly(SB)
      	|
      	-A && (sb_rdonly(SB))
      	+A && sb_rdonly(SB)
      	|
      	-A || (sb_rdonly(SB))
      	+A || sb_rdonly(SB)
      	|
      	-(sb_rdonly(SB)) != A
      	+sb_rdonly(SB) != A
      	|
      	-(sb_rdonly(SB)) == A
      	+sb_rdonly(SB) == A
      	|
      	-(sb_rdonly(SB)) && A
      	+sb_rdonly(SB) && A
      	|
      	-(sb_rdonly(SB)) || A
      	+sb_rdonly(SB) || A
      	)
      
      	@@ expression A, B, SB; @@
      	(
      	-(sb_rdonly(SB)) ? 1 : 0
      	+sb_rdonly(SB)
      	|
      	-(sb_rdonly(SB)) ? A : B
      	+sb_rdonly(SB) ? A : B
      	)
      
      to remove left over excess bracketage and finally by applying:
      
      	@@ expression A, SB; @@
      	(
      	-(A & MS_RDONLY) != sb_rdonly(SB)
      	+(bool)(A & MS_RDONLY) != sb_rdonly(SB)
      	|
      	-(A & MS_RDONLY) == sb_rdonly(SB)
      	+(bool)(A & MS_RDONLY) == sb_rdonly(SB)
      	)
      
      to make comparisons against the result of sb_rdonly() (which is a bool)
      work correctly.
      Signed-off-by: David Howells <dhowells@redhat.com>
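
The helper introduced by this conversion, and the reason for the `(bool)` cast in the last semantic patch, can be shown with a small userspace model. `demo_super_block`, `demo_sb_rdonly` and the two comparison helpers are hypothetical names. `DEMO_MS_RDONLY` mirrors the value 1 of `MS_RDONLY`, while `DEMO_OTHER_FLAG` is an invented higher bit used only to show how an uncast mask comparison can go wrong in general.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MS_RDONLY  1UL       /* same value as MS_RDONLY */
#define DEMO_OTHER_FLAG (1UL << 4) /* invented flag, for illustration only */

/* Hypothetical stand-in for struct super_block. */
struct demo_super_block {
	unsigned long s_flags;
};

/* Model of the helper: the masked flag collapses to a bool. */
static inline bool demo_sb_rdonly(const struct demo_super_block *sb)
{
	assert(sb != NULL);
	return sb->s_flags & DEMO_MS_RDONLY;
}

/* Uncast comparison: compares the raw masked value against 0 or 1. */
static bool demo_raw_compare(unsigned long flags, unsigned long mask, bool b)
{
	return (flags & mask) == b;
}

/* Cast comparison: normalises the masked value to 0 or 1 first. */
static bool demo_cast_compare(unsigned long flags, unsigned long mask, bool b)
{
	return (bool)(flags & mask) == b;
}
```

For a mask whose value is 1 both comparisons agree, but for any other bit the uncast form compares, for example, 16 against 1 and reports a mismatch, so the cast keeps the comparison correct regardless of which bit the flag occupies.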
  14. 30 Jun, 2017 1 commit
  15. 18 Apr, 2017 1 commit