1. 29 8月, 2020 1 次提交
  2. 28 8月, 2020 3 次提交
  3. 27 8月, 2020 9 次提交
    • Q
      btrfs: tree-checker: fix the error message for transid error · f96d6960
      Qu Wenruo 提交于
      The error message for inode transid is the same as for inode generation,
      which makes us unable to detect the real problem.
      Reported-by: NTyler Richmond <t.d.richmond@gmail.com>
      Fixes: 496245ca ("btrfs: tree-checker: Verify inode item")
      CC: stable@vger.kernel.org # 5.4+
      Reviewed-by: NMarcos Paulo de Souza <mpdesouza@suse.com>
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      f96d6960
    • J
      btrfs: set the lockdep class for log tree extent buffers · d3beaa25
      Josef Bacik 提交于
      These are special extent buffers that get rewound in order to lookup
      the state of the tree at a specific point in time.  As such they do not
      go through the normal initialization paths that set their lockdep class,
      so handle them appropriately when they are created and before they are
      locked.
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      d3beaa25
    • J
      btrfs: set the correct lockdep class for new nodes · ad244665
      Josef Bacik 提交于
      When flipping over to the rw_semaphore I noticed I'd get a lockdep splat
      in replace_path(), which is weird because we're swapping the reloc root
      with the actual target root.  Turns out this is because we're using the
      root->root_key.objectid as the root id for the newly allocated tree
      block when setting the lockdep class, however we need to be using the
      actual owner of this new block, which is saved in owner.
      
      The affected path is through btrfs_copy_root as all other callers of
      btrfs_alloc_tree_block (which calls init_new_buffer) have root_objectid
      == root->root_key.objectid .
      
      CC: stable@vger.kernel.org # 5.4+
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      ad244665
    • J
      btrfs: allocate scrub workqueues outside of locks · e89c4a9c
      Josef Bacik 提交于
      I got the following lockdep splat while testing:
      
        ======================================================
        WARNING: possible circular locking dependency detected
        5.8.0-rc7-00172-g021118712e59 #932 Not tainted
        ------------------------------------------------------
        btrfs/229626 is trying to acquire lock:
        ffffffff828513f0 (cpu_hotplug_lock){++++}-{0:0}, at: alloc_workqueue+0x378/0x450
      
        but task is already holding lock:
        ffff889dd3889518 (&fs_info->scrub_lock){+.+.}-{3:3}, at: btrfs_scrub_dev+0x11c/0x630
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #7 (&fs_info->scrub_lock){+.+.}-{3:3}:
      	 __mutex_lock+0x9f/0x930
      	 btrfs_scrub_dev+0x11c/0x630
      	 btrfs_dev_replace_by_ioctl.cold.21+0x10a/0x1d4
      	 btrfs_ioctl+0x2799/0x30a0
      	 ksys_ioctl+0x83/0xc0
      	 __x64_sys_ioctl+0x16/0x20
      	 do_syscall_64+0x50/0x90
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        -> #6 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
      	 __mutex_lock+0x9f/0x930
      	 btrfs_run_dev_stats+0x49/0x480
      	 commit_cowonly_roots+0xb5/0x2a0
      	 btrfs_commit_transaction+0x516/0xa60
      	 sync_filesystem+0x6b/0x90
      	 generic_shutdown_super+0x22/0x100
      	 kill_anon_super+0xe/0x30
      	 btrfs_kill_super+0x12/0x20
      	 deactivate_locked_super+0x29/0x60
      	 cleanup_mnt+0xb8/0x140
      	 task_work_run+0x6d/0xb0
      	 __prepare_exit_to_usermode+0x1cc/0x1e0
      	 do_syscall_64+0x5c/0x90
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        -> #5 (&fs_info->tree_log_mutex){+.+.}-{3:3}:
      	 __mutex_lock+0x9f/0x930
      	 btrfs_commit_transaction+0x4bb/0xa60
      	 sync_filesystem+0x6b/0x90
      	 generic_shutdown_super+0x22/0x100
      	 kill_anon_super+0xe/0x30
      	 btrfs_kill_super+0x12/0x20
      	 deactivate_locked_super+0x29/0x60
      	 cleanup_mnt+0xb8/0x140
      	 task_work_run+0x6d/0xb0
      	 __prepare_exit_to_usermode+0x1cc/0x1e0
      	 do_syscall_64+0x5c/0x90
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        -> #4 (&fs_info->reloc_mutex){+.+.}-{3:3}:
      	 __mutex_lock+0x9f/0x930
      	 btrfs_record_root_in_trans+0x43/0x70
      	 start_transaction+0xd1/0x5d0
      	 btrfs_dirty_inode+0x42/0xd0
      	 touch_atime+0xa1/0xd0
      	 btrfs_file_mmap+0x3f/0x60
      	 mmap_region+0x3a4/0x640
      	 do_mmap+0x376/0x580
      	 vm_mmap_pgoff+0xd5/0x120
      	 ksys_mmap_pgoff+0x193/0x230
      	 do_syscall_64+0x50/0x90
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        -> #3 (&mm->mmap_lock#2){++++}-{3:3}:
      	 __might_fault+0x68/0x90
      	 _copy_to_user+0x1e/0x80
      	 perf_read+0x141/0x2c0
      	 vfs_read+0xad/0x1b0
      	 ksys_read+0x5f/0xe0
      	 do_syscall_64+0x50/0x90
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        -> #2 (&cpuctx_mutex){+.+.}-{3:3}:
      	 __mutex_lock+0x9f/0x930
      	 perf_event_init_cpu+0x88/0x150
      	 perf_event_init+0x1db/0x20b
      	 start_kernel+0x3ae/0x53c
      	 secondary_startup_64+0xa4/0xb0
      
        -> #1 (pmus_lock){+.+.}-{3:3}:
      	 __mutex_lock+0x9f/0x930
      	 perf_event_init_cpu+0x4f/0x150
      	 cpuhp_invoke_callback+0xb1/0x900
      	 _cpu_up.constprop.26+0x9f/0x130
      	 cpu_up+0x7b/0xc0
      	 bringup_nonboot_cpus+0x4f/0x60
      	 smp_init+0x26/0x71
      	 kernel_init_freeable+0x110/0x258
      	 kernel_init+0xa/0x103
      	 ret_from_fork+0x1f/0x30
      
        -> #0 (cpu_hotplug_lock){++++}-{0:0}:
      	 __lock_acquire+0x1272/0x2310
      	 lock_acquire+0x9e/0x360
      	 cpus_read_lock+0x39/0xb0
      	 alloc_workqueue+0x378/0x450
      	 __btrfs_alloc_workqueue+0x15d/0x200
      	 btrfs_alloc_workqueue+0x51/0x160
      	 scrub_workers_get+0x5a/0x170
      	 btrfs_scrub_dev+0x18c/0x630
      	 btrfs_dev_replace_by_ioctl.cold.21+0x10a/0x1d4
      	 btrfs_ioctl+0x2799/0x30a0
      	 ksys_ioctl+0x83/0xc0
      	 __x64_sys_ioctl+0x16/0x20
      	 do_syscall_64+0x50/0x90
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        other info that might help us debug this:
      
        Chain exists of:
          cpu_hotplug_lock --> &fs_devs->device_list_mutex --> &fs_info->scrub_lock
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(&fs_info->scrub_lock);
      				 lock(&fs_devs->device_list_mutex);
      				 lock(&fs_info->scrub_lock);
          lock(cpu_hotplug_lock);
      
         *** DEADLOCK ***
      
        2 locks held by btrfs/229626:
         #0: ffff88bfe8bb86e0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: btrfs_scrub_dev+0xbd/0x630
         #1: ffff889dd3889518 (&fs_info->scrub_lock){+.+.}-{3:3}, at: btrfs_scrub_dev+0x11c/0x630
      
        stack backtrace:
        CPU: 15 PID: 229626 Comm: btrfs Kdump: loaded Not tainted 5.8.0-rc7-00172-g021118712e59 #932
        Hardware name: Quanta Tioga Pass Single Side 01-0030993006/Tioga Pass Single Side, BIOS F08_3A18 12/20/2018
        Call Trace:
         dump_stack+0x78/0xa0
         check_noncircular+0x165/0x180
         __lock_acquire+0x1272/0x2310
         lock_acquire+0x9e/0x360
         ? alloc_workqueue+0x378/0x450
         cpus_read_lock+0x39/0xb0
         ? alloc_workqueue+0x378/0x450
         alloc_workqueue+0x378/0x450
         ? rcu_read_lock_sched_held+0x52/0x80
         __btrfs_alloc_workqueue+0x15d/0x200
         btrfs_alloc_workqueue+0x51/0x160
         scrub_workers_get+0x5a/0x170
         btrfs_scrub_dev+0x18c/0x630
         ? start_transaction+0xd1/0x5d0
         btrfs_dev_replace_by_ioctl.cold.21+0x10a/0x1d4
         btrfs_ioctl+0x2799/0x30a0
         ? do_sigaction+0x102/0x250
         ? lockdep_hardirqs_on_prepare+0xca/0x160
         ? _raw_spin_unlock_irq+0x24/0x30
         ? trace_hardirqs_on+0x1c/0xe0
         ? _raw_spin_unlock_irq+0x24/0x30
         ? do_sigaction+0x102/0x250
         ? ksys_ioctl+0x83/0xc0
         ksys_ioctl+0x83/0xc0
         __x64_sys_ioctl+0x16/0x20
         do_syscall_64+0x50/0x90
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      This happens because we're allocating the scrub workqueues under the
      scrub and device list mutex, which brings in a whole host of other
      dependencies.
      
      Because the work queue allocation is done with GFP_KERNEL, it can
      trigger reclaim, which can lead to a transaction commit, which in turns
      needs the device_list_mutex, it can lead to a deadlock. A different
      problem for which this fix is a solution.
      
      Fix this by moving the actual allocation outside of the
      scrub lock, and then only take the lock once we're ready to actually
      assign them to the fs_info.  We'll now have to cleanup the workqueues in
      a few more places, so I've added a helper to do the refcount dance to
      safely free the workqueues.
      
      CC: stable@vger.kernel.org # 5.4+
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      e89c4a9c
    • J
      btrfs: fix potential deadlock in the search ioctl · a48b73ec
      Josef Bacik 提交于
      With the conversion of the tree locks to rwsem I got the following
      lockdep splat:
      
        ======================================================
        WARNING: possible circular locking dependency detected
        5.8.0-rc7-00165-g04ec4da5f45f-dirty #922 Not tainted
        ------------------------------------------------------
        compsize/11122 is trying to acquire lock:
        ffff889fabca8768 (&mm->mmap_lock#2){++++}-{3:3}, at: __might_fault+0x3e/0x90
      
        but task is already holding lock:
        ffff889fe720fe40 (btrfs-fs-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #2 (btrfs-fs-00){++++}-{3:3}:
      	 down_write_nested+0x3b/0x70
      	 __btrfs_tree_lock+0x24/0x120
      	 btrfs_search_slot+0x756/0x990
      	 btrfs_lookup_inode+0x3a/0xb4
      	 __btrfs_update_delayed_inode+0x93/0x270
      	 btrfs_async_run_delayed_root+0x168/0x230
      	 btrfs_work_helper+0xd4/0x570
      	 process_one_work+0x2ad/0x5f0
      	 worker_thread+0x3a/0x3d0
      	 kthread+0x133/0x150
      	 ret_from_fork+0x1f/0x30
      
        -> #1 (&delayed_node->mutex){+.+.}-{3:3}:
      	 __mutex_lock+0x9f/0x930
      	 btrfs_delayed_update_inode+0x50/0x440
      	 btrfs_update_inode+0x8a/0xf0
      	 btrfs_dirty_inode+0x5b/0xd0
      	 touch_atime+0xa1/0xd0
      	 btrfs_file_mmap+0x3f/0x60
      	 mmap_region+0x3a4/0x640
      	 do_mmap+0x376/0x580
      	 vm_mmap_pgoff+0xd5/0x120
      	 ksys_mmap_pgoff+0x193/0x230
      	 do_syscall_64+0x50/0x90
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        -> #0 (&mm->mmap_lock#2){++++}-{3:3}:
      	 __lock_acquire+0x1272/0x2310
      	 lock_acquire+0x9e/0x360
      	 __might_fault+0x68/0x90
      	 _copy_to_user+0x1e/0x80
      	 copy_to_sk.isra.32+0x121/0x300
      	 search_ioctl+0x106/0x200
      	 btrfs_ioctl_tree_search_v2+0x7b/0xf0
      	 btrfs_ioctl+0x106f/0x30a0
      	 ksys_ioctl+0x83/0xc0
      	 __x64_sys_ioctl+0x16/0x20
      	 do_syscall_64+0x50/0x90
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        other info that might help us debug this:
      
        Chain exists of:
          &mm->mmap_lock#2 --> &delayed_node->mutex --> btrfs-fs-00
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(btrfs-fs-00);
      				 lock(&delayed_node->mutex);
      				 lock(btrfs-fs-00);
          lock(&mm->mmap_lock#2);
      
         *** DEADLOCK ***
      
        1 lock held by compsize/11122:
         #0: ffff889fe720fe40 (btrfs-fs-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180
      
        stack backtrace:
        CPU: 17 PID: 11122 Comm: compsize Kdump: loaded Not tainted 5.8.0-rc7-00165-g04ec4da5f45f-dirty #922
        Hardware name: Quanta Tioga Pass Single Side 01-0030993006/Tioga Pass Single Side, BIOS F08_3A18 12/20/2018
        Call Trace:
         dump_stack+0x78/0xa0
         check_noncircular+0x165/0x180
         __lock_acquire+0x1272/0x2310
         lock_acquire+0x9e/0x360
         ? __might_fault+0x3e/0x90
         ? find_held_lock+0x72/0x90
         __might_fault+0x68/0x90
         ? __might_fault+0x3e/0x90
         _copy_to_user+0x1e/0x80
         copy_to_sk.isra.32+0x121/0x300
         ? btrfs_search_forward+0x2a6/0x360
         search_ioctl+0x106/0x200
         btrfs_ioctl_tree_search_v2+0x7b/0xf0
         btrfs_ioctl+0x106f/0x30a0
         ? __do_sys_newfstat+0x5a/0x70
         ? ksys_ioctl+0x83/0xc0
         ksys_ioctl+0x83/0xc0
         __x64_sys_ioctl+0x16/0x20
         do_syscall_64+0x50/0x90
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The problem is we're doing a copy_to_user() while holding tree locks,
      which can deadlock if we have to do a page fault for the copy_to_user().
      This exists even without my locking changes, so it needs to be fixed.
      Rework the search ioctl to do the pre-fault and then
      copy_to_user_nofault for the copying.
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      a48b73ec
    • J
      btrfs: drop path before adding new uuid tree entry · 9771a5cf
      Josef Bacik 提交于
      With the conversion of the tree locks to rwsem I got the following
      lockdep splat:
      
        ======================================================
        WARNING: possible circular locking dependency detected
        5.8.0-rc7-00167-g0d7ba0c5b375-dirty #925 Not tainted
        ------------------------------------------------------
        btrfs-uuid/7955 is trying to acquire lock:
        ffff88bfbafec0f8 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180
      
        but task is already holding lock:
        ffff88bfbafef2a8 (btrfs-uuid-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #1 (btrfs-uuid-00){++++}-{3:3}:
      	 down_read_nested+0x3e/0x140
      	 __btrfs_tree_read_lock+0x39/0x180
      	 __btrfs_read_lock_root_node+0x3a/0x50
      	 btrfs_search_slot+0x4bd/0x990
      	 btrfs_uuid_tree_add+0x89/0x2d0
      	 btrfs_uuid_scan_kthread+0x330/0x390
      	 kthread+0x133/0x150
      	 ret_from_fork+0x1f/0x30
      
        -> #0 (btrfs-root-00){++++}-{3:3}:
      	 __lock_acquire+0x1272/0x2310
      	 lock_acquire+0x9e/0x360
      	 down_read_nested+0x3e/0x140
      	 __btrfs_tree_read_lock+0x39/0x180
      	 __btrfs_read_lock_root_node+0x3a/0x50
      	 btrfs_search_slot+0x4bd/0x990
      	 btrfs_find_root+0x45/0x1b0
      	 btrfs_read_tree_root+0x61/0x100
      	 btrfs_get_root_ref.part.50+0x143/0x630
      	 btrfs_uuid_tree_iterate+0x207/0x314
      	 btrfs_uuid_rescan_kthread+0x12/0x50
      	 kthread+0x133/0x150
      	 ret_from_fork+0x1f/0x30
      
        other info that might help us debug this:
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(btrfs-uuid-00);
      				 lock(btrfs-root-00);
      				 lock(btrfs-uuid-00);
          lock(btrfs-root-00);
      
         *** DEADLOCK ***
      
        1 lock held by btrfs-uuid/7955:
         #0: ffff88bfbafef2a8 (btrfs-uuid-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180
      
        stack backtrace:
        CPU: 73 PID: 7955 Comm: btrfs-uuid Kdump: loaded Not tainted 5.8.0-rc7-00167-g0d7ba0c5b375-dirty #925
        Hardware name: Quanta Tioga Pass Single Side 01-0030993006/Tioga Pass Single Side, BIOS F08_3A18 12/20/2018
        Call Trace:
         dump_stack+0x78/0xa0
         check_noncircular+0x165/0x180
         __lock_acquire+0x1272/0x2310
         lock_acquire+0x9e/0x360
         ? __btrfs_tree_read_lock+0x39/0x180
         ? btrfs_root_node+0x1c/0x1d0
         down_read_nested+0x3e/0x140
         ? __btrfs_tree_read_lock+0x39/0x180
         __btrfs_tree_read_lock+0x39/0x180
         __btrfs_read_lock_root_node+0x3a/0x50
         btrfs_search_slot+0x4bd/0x990
         btrfs_find_root+0x45/0x1b0
         btrfs_read_tree_root+0x61/0x100
         btrfs_get_root_ref.part.50+0x143/0x630
         btrfs_uuid_tree_iterate+0x207/0x314
         ? btree_readpage+0x20/0x20
         btrfs_uuid_rescan_kthread+0x12/0x50
         kthread+0x133/0x150
         ? kthread_create_on_node+0x60/0x60
         ret_from_fork+0x1f/0x30
      
      This problem exists because we have two different rescan threads,
      btrfs_uuid_scan_kthread which creates the uuid tree, and
      btrfs_uuid_tree_iterate that goes through and updates or deletes any out
      of date roots.  The problem is they both do things in different order.
      btrfs_uuid_scan_kthread() reads the tree_root, and then inserts entries
      into the uuid_root.  btrfs_uuid_tree_iterate() scans the uuid_root, but
      then does a btrfs_get_fs_root() which can read from the tree_root.
      
      It's actually easy enough to not be holding the path in
      btrfs_uuid_scan_kthread() when we add a uuid entry, as we already drop
      it further down and re-start the search when we loop.  So simply move
      the path release before we add our entry to the uuid tree.
      
      This also fixes a problem where we're holding a path open after we do
      btrfs_end_transaction(), which has it's own problems.
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      9771a5cf
    • M
      btrfs: block-group: fix free-space bitmap threshold · e3e39c72
      Marcos Paulo de Souza 提交于
      [BUG]
      After commit 9afc6649 ("btrfs: block-group: refactor how we read one
      block group item"), cache->length is being assigned after calling
      btrfs_create_block_group_cache. This causes a problem since
      set_free_space_tree_thresholds calculates the free-space threshold to
      decide if the free-space tree should convert from extents to bitmaps.
      
      The current code calls set_free_space_tree_thresholds with cache->length
      being 0, which then makes cache->bitmap_high_thresh zero. This implies
      the system will always use bitmap instead of extents, which is not
      desired if the block group is not fragmented.
      
      This behavior can be seen by a test that expects to repair systems
      with FREE_SPACE_EXTENT and FREE_SPACE_BITMAP, but the current code only
      created FREE_SPACE_BITMAP.
      
      [FIX]
      Call set_free_space_tree_thresholds after setting cache->length. There
      is now a WARN_ON in set_free_space_tree_thresholds to help preventing
      the same mistake to happen again in the future.
      
      Link: https://github.com/kdave/btrfs-progs/issues/251
      Fixes: 9afc6649 ("btrfs: block-group: refactor how we read one block group item")
      CC: stable@vger.kernel.org # 5.8+
      Reviewed-by: NQu Wenruo <wqu@suse.com>
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NMarcos Paulo de Souza <mpdesouza@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      e3e39c72
    • J
      io_uring: clear req->result on IOPOLL re-issue · 56450c20
      Jens Axboe 提交于
      Make sure we clear req->result, which was set to -EAGAIN for retry
      purposes, when moving it to the reissue list. Otherwise we can end up
      retrying a request more than once, which leads to weird results in
      the io-wq handling (and other spots).
      
      Cc: stable@vger.kernel.org
      Reported-by: NAndres Freund <andres@anarazel.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      56450c20
    • J
      io_uring: make offset == -1 consistent with preadv2/pwritev2 · 0fef9483
      Jens Axboe 提交于
      The man page for io_uring generally claims were consistent with what
      preadv2 and pwritev2 accept, but turns out there's a slight discrepancy
      in how offset == -1 is handled for pipes/streams. preadv doesn't allow
      it, but preadv2 does. This currently causes io_uring to return -EINVAL
      if that is attempted, but we should allow that as documented.
      
      This change makes us consistent with preadv2/pwritev2 for just passing
      in a NULL ppos for streams if the offset is -1.
      
      Cc: stable@vger.kernel.org # v5.7+
      Reported-by: NBenedikt Ames <wisp3rwind@posteo.eu>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      0fef9483
  4. 26 8月, 2020 3 次提交
  5. 25 8月, 2020 2 次提交
  6. 24 8月, 2020 6 次提交
    • J
      ceph: fix inode number handling on arches with 32-bit ino_t · ebce3eb2
      Jeff Layton 提交于
      Tuan and Ulrich mentioned that they were hitting a problem on s390x,
      which has a 32-bit ino_t value, even though it's a 64-bit arch (for
      historical reasons).
      
      I think the current handling of inode numbers in the ceph driver is
      wrong. It tries to use 32-bit inode numbers on 32-bit arches, but that's
      actually not a problem. 32-bit arches can deal with 64-bit inode numbers
      just fine when userland code is compiled with LFS support (the common
      case these days).
      
      What we really want to do is just use 64-bit numbers everywhere, unless
      someone has mounted with the ino32 mount option. In that case, we want
      to ensure that we hash the inode number down to something that will fit
      in 32 bits before presenting the value to userland.
      
      Add new helper functions that do this, and only do the conversion before
      presenting these values to userland in getattr and readdir.
      
      The inode table hashvalue is changed to just cast the inode number to
      unsigned long, as low-order bits are the most likely to vary anyway.
      
      While it's not strictly required, we do want to put something in
      inode->i_ino. Instead of basing it on BITS_PER_LONG, however, base it on
      the size of the ino_t type.
      
      NOTE: This is a user-visible change on 32-bit arches:
      
      1/ inode numbers will be seen to have changed between kernel versions.
         32-bit arches will see large inode numbers now instead of the hashed
         ones they saw before.
      
      2/ any really old software not built with LFS support may start failing
         stat() calls with -EOVERFLOW on inode numbers >2^32. Nothing much we
         can do about these, but hopefully the intersection of people running
         such code on ceph will be very small.
      
      The workaround for both problems is to mount with "-o ino32".
      
      [ idryomov: changelog tweak ]
      
      URL: https://tracker.ceph.com/issues/46828Reported-by: NUlrich Weigand <Ulrich.Weigand@de.ibm.com>
      Reported-and-Tested-by: NTuan Hoang1 <Tuan.Hoang1@ibm.com>
      Signed-off-by: NJeff Layton <jlayton@kernel.org>
      Reviewed-by: N"Yan, Zheng" <zyan@redhat.com>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      ebce3eb2
    • B
      gfs2: add some much needed cleanup for log flushes that fail · 462582b9
      Bob Peterson 提交于
      When a log flush fails due to io errors, it signals the failure but does
      not clean up after itself very well. This is because buffers are added to
      the transaction tr_buf and tr_databuf queue, but the io error causes
      gfs2_log_flush to bypass the "after_commit" functions responsible for
      dequeueing the bd elements. If the bd elements are added to the ail list
      before the error, function ail_drain takes care of dequeueing them.
      But if they haven't gotten that far, the elements are forgotten and
      make the transactions unable to be freed.
      
      This patch introduces new function trans_drain which drains the bd
      elements from the transaction so they can be freed properly.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      462582b9
    • M
      binfmt_flat: revert "binfmt_flat: don't offset the data start" · 2217b982
      Max Filippov 提交于
      binfmt_flat loader uses the gap between text and data to store data
      segment pointers for the libraries. Even in the absence of shared
      libraries it stores at least one pointer to the executable's own data
      segment. Text and data can go back to back in the flat binary image and
      without offsetting data segment last few instructions in the text
      segment may get corrupted by the data segment pointer.
      
      Fix it by reverting commit a2357223 ("binfmt_flat: don't offset the
      data start").
      
      Cc: stable@vger.kernel.org
      Fixes: a2357223 ("binfmt_flat: don't offset the data start")
      Signed-off-by: NMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: NGreg Ungerer <gerg@linux-m68k.org>
      2217b982
    • G
      treewide: Use fallthrough pseudo-keyword · df561f66
      Gustavo A. R. Silva 提交于
      Replace the existing /* fall through */ comments and its variants with
      the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
      fall-through markings when it is the case.
      
      [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-throughSigned-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>
      df561f66
    • P
      io-wq: fix hang after cancelling pending hashed work · 204361a7
      Pavel Begunkov 提交于
      Don't forget to update wqe->hash_tail after cancelling a pending work
      item, if it was hashed.
      
      Cc: stable@vger.kernel.org # 5.7+
      Reported-by: NDmitry Shulyak <yashulyak@gmail.com>
      Fixes: 86f3cd1b ("io-wq: handle hashed writes in chains")
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      204361a7
    • J
      io_uring: don't recurse on tsk->sighand->siglock with signalfd · fd7d6de2
      Jens Axboe 提交于
      If an application is doing reads on signalfd, and we arm the poll handler
      because there's no data available, then the wakeup can recurse on the
      tasks sighand->siglock as the signal delivery from task_work_add() will
      use TWA_SIGNAL and that attempts to lock it again.
      
      We can detect the signalfd case pretty easily by comparing the poll->head
      wait_queue_head_t with the target task signalfd wait queue. Just use
      normal task wakeup for this case.
      
      Cc: stable@vger.kernel.org # v5.7+
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      fd7d6de2
  7. 23 8月, 2020 2 次提交
  8. 22 8月, 2020 3 次提交
    • D
      afs: Fix NULL deref in afs_dynroot_depopulate() · 5e0b17b0
      David Howells 提交于
      If an error occurs during the construction of an afs superblock, it's
      possible that an error occurs after a superblock is created, but before
      we've created the root dentry.  If the superblock has a dynamic root
      (ie.  what's normally mounted on /afs), the afs_kill_super() will call
      afs_dynroot_depopulate() to unpin any created dentries - but this will
      oops if the root hasn't been created yet.
      
      Fix this by skipping that bit of code if there is no root dentry.
      
      This leads to an oops looking like:
      
      	general protection fault, ...
      	KASAN: null-ptr-deref in range [0x0000000000000068-0x000000000000006f]
      	...
      	RIP: 0010:afs_dynroot_depopulate+0x25f/0x529 fs/afs/dynroot.c:385
      	...
      	Call Trace:
      	 afs_kill_super+0x13b/0x180 fs/afs/super.c:535
      	 deactivate_locked_super+0x94/0x160 fs/super.c:335
      	 afs_get_tree+0x1124/0x1460 fs/afs/super.c:598
      	 vfs_get_tree+0x89/0x2f0 fs/super.c:1547
      	 do_new_mount fs/namespace.c:2875 [inline]
      	 path_mount+0x1387/0x2070 fs/namespace.c:3192
      	 do_mount fs/namespace.c:3205 [inline]
      	 __do_sys_mount fs/namespace.c:3413 [inline]
      	 __se_sys_mount fs/namespace.c:3390 [inline]
      	 __x64_sys_mount+0x27f/0x300 fs/namespace.c:3390
      	 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      which is oopsing on this line:
      
      	inode_lock(root->d_inode);
      
      presumably because sb->s_root was NULL.
      
      Fixes: 0da0b7fd ("afs: Display manually added cells in dynamic root mount")
      Reported-by: syzbot+c1eff8205244ae7e11a6@syzkaller.appspotmail.com
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5e0b17b0
    • P
      squashfs: avoid bio_alloc() failure with 1Mbyte blocks · f26044c8
      Phillip Lougher 提交于
      This is a regression introduced by the patch "migrate from ll_rw_block
      usage to BIO".
      
      Bio_alloc() is limited to 256 pages (1 Mbyte).  This can cause a failure
      when reading 1 Mbyte block filesystems.  The problem is a datablock can be
      fully (or almost uncompressed), requiring 256 pages, but, because blocks
      are not aligned to page boundaries, it may require 257 pages to read.
      
      Bio_kmalloc() can handle 1024 pages, and so use this for the edge
      condition.
      
      Fixes: 93e72b3c ("squashfs: migrate from ll_rw_block usage to BIO")
      Reported-by: NNicolas Prochazka <nicolas.prochazka@gmail.com>
      Reported-by: NTomoatsu Shimada <shimada@walbrix.com>
      Signed-off-by: NPhillip Lougher <phillip@squashfs.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NGuenter Roeck <groeck@chromium.org>
      Cc: Philippe Liard <pliard@google.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Adrien Schildknecht <adrien+dev@schischi.me>
      Cc: Daniel Rosenberg <drosen@google.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200815035637.15319-1-phillip@squashfs.org.ukSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f26044c8
    • J
      romfs: fix uninitialized memory leak in romfs_dev_read() · bcf85fce
      Jann Horn 提交于
      romfs has a superblock field that limits the size of the filesystem; data
      beyond that limit is never accessed.
      
      romfs_dev_read() fetches a caller-supplied number of bytes from the
      backing device.  It returns 0 on success or an error code on failure;
      therefore, its API can't represent short reads, it's all-or-nothing.
      
      However, when romfs_dev_read() detects that the requested operation would
      cross the filesystem size limit, it currently silently truncates the
      requested number of bytes.  This e.g.  means that when the content of a
      file with size 0x1000 starts one byte before the filesystem size limit,
      ->readpage() will only fill a single byte of the supplied page while
      leaving the rest uninitialized, leaking that uninitialized memory to
      userspace.
      
      Fix it by returning an error code instead of truncating the read when the
      requested read operation would go beyond the end of the filesystem.
      
      Fixes: da4458bd ("NOMMU: Make it possible for RomFS to use MTD devices directly")
      Signed-off-by: NJann Horn <jannh@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200818013202.2246365-1-jannh@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bcf85fce
  9. 21 8月, 2020 3 次提交
    • B
      btrfs: detect nocow for swap after snapshot delete · a84d5d42
      Boris Burkov 提交于
      can_nocow_extent and btrfs_cross_ref_exist both rely on a heuristic for
      detecting a must cow condition which is not exactly accurate, but saves
      unnecessary tree traversal. The incorrect assumption is that if the
      extent was created in a generation smaller than the last snapshot
      generation, it must be referenced by that snapshot. That is true, except
      the snapshot could have since been deleted, without affecting the last
      snapshot generation.
      
      The original patch claimed a performance win from this check, but it
      also leads to a bug where you are unable to use a swapfile if you ever
      snapshotted the subvolume it's in. Make the check slower and more strict
      for the swapon case, without modifying the general cow checks as a
      compromise. Turning swap on does not seem to be a particularly
      performance sensitive operation, so incurring a possibly unnecessary
      btrfs_search_slot seems worthwhile for the added usability.
      
      Note: Until the snapshot is competely cleaned after deletion,
      check_committed_refs will still cause the logic to think that cow is
      necessary, so the user must until 'btrfs subvolu sync' finished before
      activating the swapfile swapon.
      
      CC: stable@vger.kernel.org # 5.4+
      Suggested-by: NOmar Sandoval <osandov@osandov.com>
      Signed-off-by: NBoris Burkov <boris@bur.io>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      a84d5d42
    • J
      btrfs: check the right error variable in btrfs_del_dir_entries_in_log · fb2fecba
      Josef Bacik 提交于
      With my new locking code dbench is so much faster that I tripped over a
      transaction abort from ENOSPC.  This turned out to be because
      btrfs_del_dir_entries_in_log was checking for ret == -ENOSPC, but this
      function sets err on error, and returns err.  So instead of properly
      marking the inode as needing a full commit, we were returning -ENOSPC
      and aborting in __btrfs_unlink_inode.  Fix this by checking the proper
      variable so that we return the correct thing in the case of ENOSPC.
      
      The ENOENT needs to be checked, because btrfs_lookup_dir_item_index()
      can return -ENOENT if the dir item isn't in the tree log (which would
      happen if we hadn't fsync'ed this guy).  We actually handle that case in
      __btrfs_unlink_inode, so it's an expected error to get back.
      
      Fixes: 4a500fd1 ("Btrfs: Metadata ENOSPC handling for tree log")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      [ add note and comment about ENOENT ]
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      fb2fecba
    • D
      afs: Fix key ref leak in afs_put_operation() · ba8e4207
      David Howells 提交于
      The afs_put_operation() function needs to put the reference to the key
      that's authenticating the operation.
      
      Fixes: e49c7b2f ("afs: Build an abstraction around an "operation" concept")
      Reported-by: NDave Botsch <botsch@cnf.cornell.edu>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ba8e4207
  10. 20 8月, 2020 8 次提交
    • P
      io_uring: kill extra iovec=NULL in import_iovec() · 867a23ea
      Pavel Begunkov 提交于
      If io_import_iovec() returns an error, return iovec is undefined and
      must not be used, so don't set it to NULL when failing.
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      867a23ea
    • P
      io_uring: comment on kfree(iovec) checks · f261c168
      Pavel Begunkov 提交于
      kfree() handles NULL pointers well, but io_{read,write}() checks it
      because of performance reasons. Leave a comment there for those who are
      tempted to patch it.
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      f261c168
    • P
      io_uring: fix racy req->flags modification · bb175342
      Pavel Begunkov 提交于
      Setting and clearing REQ_F_OVERFLOW in io_uring_cancel_files() and
      io_cqring_overflow_flush() are racy, because they might be called
      asynchronously.
      
      REQ_F_OVERFLOW flag in only needed for files cancellation, so if it can
      be guaranteed that requests _currently_ marked inflight can't be
      overflown, the problem will be solved with removing the flag
      altogether.
      
      That's how the patch works, it removes inflight status of a request
      in io_cqring_fill_event() whenever it should be thrown into CQ-overflow
      list. That's Ok to do, because no opcode specific handling can be done
      after io_cqring_fill_event(), the same assumption as with "struct
      io_completion" patches.
      And it already have a good place for such cleanups, which is
      io_clean_op(). A nice side effect of this is removing this inflight
      check from the hot path.
      
      note on synchronisation: now __io_cqring_fill_event() may be taking two
      spinlocks simultaneously, completion_lock and inflight_lock. It's fine,
      because we never do that in reverse order, and CQ-overflow of inflight
      requests shouldn't happen often.
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      bb175342
    • J
      io_uring: use system_unbound_wq for ring exit work · fc666777
      Jens Axboe 提交于
      We currently use system_wq, which is unbounded in terms of number of
      workers. This means that if we're exiting tons of rings at the same
      time, then we'll briefly spawn tons of event kworkers just for a very
      short blocking time as the rings exit.
      
      Use system_unbound_wq instead, which has a sane cap on the concurrency
      level.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      fc666777
    • F
      btrfs: fix space cache memory leak after transaction abort · bbc37d6e
      Filipe Manana 提交于
      If a transaction aborts it can cause a memory leak of the pages array of
      a block group's io_ctl structure. The following steps explain how that can
      happen:
      
      1) Transaction N is committing, currently in state TRANS_STATE_UNBLOCKED
         and it's about to start writing out dirty extent buffers;
      
      2) Transaction N + 1 already started and another task, task A, just called
         btrfs_commit_transaction() on it;
      
      3) Block group B was dirtied (extents allocated from it) by transaction
         N + 1, so when task A calls btrfs_start_dirty_block_groups(), at the
         very beginning of the transaction commit, it starts writeback for the
         block group's space cache by calling btrfs_write_out_cache(), which
         allocates the pages array for the block group's io_ctl with a call to
         io_ctl_init(). Block group A is added to the io_list of transaction
         N + 1 by btrfs_start_dirty_block_groups();
      
      4) While transaction N's commit is writing out the extent buffers, it gets
         an IO error and aborts transaction N, also setting the file system to
         RO mode;
      
      5) Task A has already returned from btrfs_start_dirty_block_groups(), is at
         btrfs_commit_transaction() and has set transaction N + 1 state to
         TRANS_STATE_COMMIT_START. Immediately after that it checks that the
         filesystem was turned to RO mode, due to transaction N's abort, and
         jumps to the "cleanup_transaction" label. After that we end up at
         btrfs_cleanup_one_transaction() which calls btrfs_cleanup_dirty_bgs().
         That helper finds block group B in the transaction's io_list but it
         never releases the pages array of the block group's io_ctl, resulting in
         a memory leak.
      
      In fact at the point when we are at btrfs_cleanup_dirty_bgs(), the pages
      array points to pages that were already released by us at
      __btrfs_write_out_cache() through the call to io_ctl_drop_pages(). We end
      up freeing the pages array only after waiting for the ordered extent to
      complete through btrfs_wait_cache_io(), which calls io_ctl_free() to do
      that. But in the transaction abort case we don't wait for the space cache's
      ordered extent to complete through a call to btrfs_wait_cache_io(), so
      that's why we end up with a memory leak - we wait for the ordered extent
      to complete indirectly by shutting down the work queues and waiting for
      any jobs in them to complete before returning from close_ctree().
      
      We can solve the leak simply by freeing the pages array right after
      releasing the pages (with the call to io_ctl_drop_pages()) at
      __btrfs_write_out_cache(), since we will never use it anymore after that
      and the pages array points to already released pages at that point, which
      is currently not a problem since no one will use it after that, but not a
      good practice anyway since it can easily lead to use-after-free issues.
      
      So fix this by freeing the pages array right after releasing the pages at
      __btrfs_write_out_cache().
      
      This issue can often be reproduced with test case generic/475 from fstests
      and kmemleak can detect it and reports it with the following trace:
      
      unreferenced object 0xffff9bbf009fa600 (size 512):
        comm "fsstress", pid 38807, jiffies 4298504428 (age 22.028s)
        hex dump (first 32 bytes):
          00 a0 7c 4d 3d ed ff ff 40 a0 7c 4d 3d ed ff ff  ..|M=...@.|M=...
          80 a0 7c 4d 3d ed ff ff c0 a0 7c 4d 3d ed ff ff  ..|M=.....|M=...
        backtrace:
          [<00000000f4b5cfe2>] __kmalloc+0x1a8/0x3e0
          [<0000000028665e7f>] io_ctl_init+0xa7/0x120 [btrfs]
          [<00000000a1f95b2d>] __btrfs_write_out_cache+0x86/0x4a0 [btrfs]
          [<00000000207ea1b0>] btrfs_write_out_cache+0x7f/0xf0 [btrfs]
          [<00000000af21f534>] btrfs_start_dirty_block_groups+0x27b/0x580 [btrfs]
          [<00000000c3c23d44>] btrfs_commit_transaction+0xa6f/0xe70 [btrfs]
          [<000000009588930c>] create_subvol+0x581/0x9a0 [btrfs]
          [<000000009ef2fd7f>] btrfs_mksubvol+0x3fb/0x4a0 [btrfs]
          [<00000000474e5187>] __btrfs_ioctl_snap_create+0x119/0x1a0 [btrfs]
          [<00000000708ee349>] btrfs_ioctl_snap_create_v2+0xb0/0xf0 [btrfs]
          [<00000000ea60106f>] btrfs_ioctl+0x12c/0x3130 [btrfs]
          [<000000005c923d6d>] __x64_sys_ioctl+0x83/0xb0
          [<0000000043ace2c9>] do_syscall_64+0x33/0x80
          [<00000000904efbce>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      CC: stable@vger.kernel.org # 4.9+
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      bbc37d6e
    • D
      btrfs: use the correct const function attribute for btrfs_get_num_csums · 604997b4
      David Sterba 提交于
      The build robot reports
      
      compiler: h8300-linux-gcc (GCC) 9.3.0
         In file included from fs/btrfs/tests/extent-map-tests.c:8:
      >> fs/btrfs/tests/../ctree.h:2166:8: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
          2166 | size_t __const btrfs_get_num_csums(void);
               |        ^~~~~~~
      
      The function attribute for const does not follow the expected scheme and
      in this case is confused with a const type qualifier.
      Reported-by: Nkernel test robot <lkp@intel.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      604997b4
    • M
      btrfs: reset compression level for lzo on remount · 282dd7d7
      Marcos Paulo de Souza 提交于
      Currently a user can set mount "-o compress" which will set the
      compression algorithm to zlib, and use the default compress level for
      zlib (3):
      
        relatime,compress=zlib:3,space_cache
      
      If the user remounts the fs using "-o compress=lzo", then the old
      compress_level is used:
      
        relatime,compress=lzo:3,space_cache
      
      But lzo does not expose any tunable compression level. The same happens
      if we set any compress argument with different level, also with zstd.
      
      Fix this by resetting the compress_level when compress=lzo is
      specified.  With the fix applied, lzo is shown without compress level:
      
        relatime,compress=lzo,space_cache
      
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: NMarcos Paulo de Souza <mpdesouza@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      282dd7d7
    • J
      btrfs: handle errors from async submission · c965d640
      Johannes Thumshirn 提交于
      Btrfs' async submit mechanism is able to handle errors in the submission
      path and the meta-data async submit function correctly passes the error
      code to the caller.
      
      In btrfs_submit_bio_start() and btrfs_submit_bio_start_direct_io() we're
      not handling the errors returned by btrfs_csum_one_bio() correctly though
      and simply call BUG_ON(). This is unnecessary as the caller of these two
      functions - run_one_async_start - correctly checks for the return values
      and sets the status of the async_submit_bio. The actual bio submission
      will be handled later on by run_one_async_done only if
      async_submit_bio::status is 0, so the data won't be written if we
      encountered an error in the checksum process.
      
      Simply return the error from btrfs_csum_one_bio() to the async submitters,
      like it's done in btree_submit_bio_start().
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      c965d640