1. 05 10月, 2019 16 次提交
    • R
      ext4: fix warning inside ext4_convert_unwritten_extents_endio · 775e3e73
      Rakesh Pandit 提交于
      commit e3d550c2c4f2f3dba469bc3c4b83d9332b4e99e1 upstream.
      
      Really enable warning when CONFIG_EXT4_DEBUG is set and fix missing
      first argument.  This was introduced in commit ff95ec22 ("ext4:
      add warning to ext4_convert_unwritten_extents_endio") and splitting
      extents inside endio would trigger it.
      
      Fixes: ff95ec22 ("ext4: add warning to ext4_convert_unwritten_extents_endio")
      Signed-off-by: NRakesh Pandit <rakesh@tuxera.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      775e3e73
    • F
      Btrfs: fix race setting up and completing qgroup rescan workers · bacff03b
      Filipe Manana 提交于
      commit 13fc1d271a2e3ab8a02071e711add01fab9271f6 upstream.
      
      There is a race between setting up a qgroup rescan worker and completing
      a qgroup rescan worker that can lead to callers of the qgroup rescan wait
      ioctl to either not wait for the rescan worker to complete or to hang
      forever due to missing wake ups. The following diagram shows a sequence
      of steps that illustrates the race.
      
              CPU 1                                                         CPU 2                                  CPU 3
      
       btrfs_ioctl_quota_rescan()
        btrfs_qgroup_rescan()
         qgroup_rescan_init()
          mutex_lock(&fs_info->qgroup_rescan_lock)
          spin_lock(&fs_info->qgroup_lock)
      
          fs_info->qgroup_flags |=
            BTRFS_QGROUP_STATUS_FLAG_RESCAN
      
          init_completion(
            &fs_info->qgroup_rescan_completion)
      
          fs_info->qgroup_rescan_running = true
      
          mutex_unlock(&fs_info->qgroup_rescan_lock)
          spin_unlock(&fs_info->qgroup_lock)
      
          btrfs_init_work()
           --> starts the worker
      
                                                              btrfs_qgroup_rescan_worker()
                                                               mutex_lock(&fs_info->qgroup_rescan_lock)
      
                                                               fs_info->qgroup_flags &=
                                                                 ~BTRFS_QGROUP_STATUS_FLAG_RESCAN
      
                                                               mutex_unlock(&fs_info->qgroup_rescan_lock)
      
                                                               starts transaction, updates qgroup status
                                                               item, etc
      
                                                                                                                 btrfs_ioctl_quota_rescan()
                                                                                                                  btrfs_qgroup_rescan()
                                                                                                                   qgroup_rescan_init()
                                                                                                                    mutex_lock(&fs_info->qgroup_rescan_lock)
                                                                                                                    spin_lock(&fs_info->qgroup_lock)
      
                                                                                                                    fs_info->qgroup_flags |=
                                                                                                                      BTRFS_QGROUP_STATUS_FLAG_RESCAN
      
                                                                                                                    init_completion(
                                                                                                                      &fs_info->qgroup_rescan_completion)
      
                                                                                                                    fs_info->qgroup_rescan_running = true
      
                                                                                                                    mutex_unlock(&fs_info->qgroup_rescan_lock)
                                                                                                                    spin_unlock(&fs_info->qgroup_lock)
      
                                                                                                                    btrfs_init_work()
                                                                                                                     --> starts another worker
      
                                                               mutex_lock(&fs_info->qgroup_rescan_lock)
      
                                                               fs_info->qgroup_rescan_running = false
      
                                                               mutex_unlock(&fs_info->qgroup_rescan_lock)
      
      							 complete_all(&fs_info->qgroup_rescan_completion)
      
      Before the rescan worker started by the task at CPU 3 completes, if
      another task calls btrfs_ioctl_quota_rescan(), it will get -EINPROGRESS
      because the flag BTRFS_QGROUP_STATUS_FLAG_RESCAN is set at
      fs_info->qgroup_flags, which is expected and correct behaviour.
      
      However if other task calls btrfs_ioctl_quota_rescan_wait() before the
      rescan worker started by the task at CPU 3 completes, it will return
      immediately without waiting for the new rescan worker to complete,
      because fs_info->qgroup_rescan_running is set to false by CPU 2.
      
      This race is making test case btrfs/171 (from fstests) to fail often:
      
        btrfs/171 9s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad)
      #      --- tests/btrfs/171.out     2018-09-16 21:30:48.505104287 +0100
      #      +++ /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad      2019-09-19 02:01:36.938486039 +0100
      #      @@ -1,2 +1,3 @@
      #       QA output created by 171
      #      +ERROR: quota rescan failed: Operation now in progress
      #       Silence is golden
      #      ...
      #      (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/171.out /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad'  to see the entire diff)
      
      That is because the test calls the btrfs-progs commands "qgroup quota
      rescan -w", "qgroup assign" and "qgroup remove" in a sequence that makes
      calls to the rescan start ioctl fail with -EINPROGRESS (note the "btrfs"
      commands 'qgroup assign' and 'qgroup remove' often call the rescan start
      ioctl after calling the qgroup assign ioctl,
      btrfs_ioctl_qgroup_assign()), since previous waits didn't actually wait
      for a rescan worker to complete.
      
      Another problem the race can cause is missing wake ups for waiters,
      since the call to complete_all() happens outside a critical section and
      after clearing the flag BTRFS_QGROUP_STATUS_FLAG_RESCAN. In the sequence
      diagram above, if we have a waiter for the first rescan task (executed
      by CPU 2), then fs_info->qgroup_rescan_completion.wait is not empty, and
      if after the rescan worker clears BTRFS_QGROUP_STATUS_FLAG_RESCAN and
      before it calls complete_all() against
      fs_info->qgroup_rescan_completion, the task at CPU 3 calls
      init_completion() against fs_info->qgroup_rescan_completion which
      re-initilizes its wait queue to an empty queue, therefore causing the
      rescan worker at CPU 2 to call complete_all() against an empty queue,
      never waking up the task waiting for that rescan worker.
      
      Fix this by clearing BTRFS_QGROUP_STATUS_FLAG_RESCAN and setting
      fs_info->qgroup_rescan_running to false in the same critical section,
      delimited by the mutex fs_info->qgroup_rescan_lock, as well as doing the
      call to complete_all() in that same critical section. This gives the
      protection needed to avoid rescan wait ioctl callers not waiting for a
      running rescan worker and the lost wake ups problem, since setting that
      rescan flag and boolean as well as initializing the wait queue is done
      already in a critical section delimited by that mutex (at
      qgroup_rescan_init()).
      
      Fixes: 57254b6e ("Btrfs: add ioctl to wait for qgroup rescan completion")
      Fixes: d2c609b8 ("btrfs: properly track when rescan worker is running")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bacff03b
    • Q
      btrfs: qgroup: Fix reserved data space leak if we have multiple reserve calls · b5c42ef0
      Qu Wenruo 提交于
      commit d4e204948fe3e0dc8e1fbf3f8f3290c9c2823be3 upstream.
      
      [BUG]
      The following script can cause btrfs qgroup data space leak:
      
        mkfs.btrfs -f $dev
        mount $dev -o nospace_cache $mnt
      
        btrfs subv create $mnt/subv
        btrfs quota en $mnt
        btrfs quota rescan -w $mnt
        btrfs qgroup limit 128m $mnt/subv
      
        for (( i = 0; i < 3; i++)); do
                # Create 3 64M holes for latter fallocate to fail
                truncate -s 192m $mnt/subv/file
                xfs_io -c "pwrite 64m 4k" $mnt/subv/file > /dev/null
                xfs_io -c "pwrite 128m 4k" $mnt/subv/file > /dev/null
                sync
      
                # it's supposed to fail, and each failure will leak at least 64M
                # data space
                xfs_io -f -c "falloc 0 192m" $mnt/subv/file &> /dev/null
                rm $mnt/subv/file
                sync
        done
      
        # Shouldn't fail after we removed the file
        xfs_io -f -c "falloc 0 64m" $mnt/subv/file
      
      [CAUSE]
      Btrfs qgroup data reserve code allow multiple reservations to happen on
      a single extent_changeset:
      E.g:
      	btrfs_qgroup_reserve_data(inode, &data_reserved, 0, SZ_1M);
      	btrfs_qgroup_reserve_data(inode, &data_reserved, SZ_1M, SZ_2M);
      	btrfs_qgroup_reserve_data(inode, &data_reserved, 0, SZ_4M);
      
      Btrfs qgroup code has its internal tracking to make sure we don't
      double-reserve in above example.
      
      The only pattern utilizing this feature is in the main while loop of
      btrfs_fallocate() function.
      
      However btrfs_qgroup_reserve_data()'s error handling has a bug in that
      on error it clears all ranges in the io_tree with EXTENT_QGROUP_RESERVED
      flag but doesn't free previously reserved bytes.
      
      This bug has a two fold effect:
      - Clearing EXTENT_QGROUP_RESERVED ranges
        This is the correct behavior, but it prevents
        btrfs_qgroup_check_reserved_leak() to catch the leakage as the
        detector is purely EXTENT_QGROUP_RESERVED flag based.
      
      - Leak the previously reserved data bytes.
      
      The bug manifests when N calls to btrfs_qgroup_reserve_data are made and
      the last one fails, leaking space reserved in the previous ones.
      
      [FIX]
      Also free previously reserved data bytes when btrfs_qgroup_reserve_data
      fails.
      
      Fixes: 52472553 ("btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b5c42ef0
    • Q
      btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space · c521bfa8
      Qu Wenruo 提交于
      commit bab32fc069ce8829c416e8737c119f62a57970f9 upstream.
      
      [BUG]
      Under the following case with qgroup enabled, if some error happened
      after we have reserved delalloc space, then in error handling path, we
      could cause qgroup data space leakage:
      
      From btrfs_truncate_block() in inode.c:
      
      	ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
      					   block_start, blocksize);
      	if (ret)
      		goto out;
      
       again:
      	page = find_or_create_page(mapping, index, mask);
      	if (!page) {
      		btrfs_delalloc_release_space(inode, data_reserved,
      					     block_start, blocksize, true);
      		btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize, true);
      		ret = -ENOMEM;
      		goto out;
      	}
      
      [CAUSE]
      In the above case, btrfs_delalloc_reserve_space() will call
      btrfs_qgroup_reserve_data() and mark the io_tree range with
      EXTENT_QGROUP_RESERVED flag.
      
      In the error handling path, we have the following call stack:
      btrfs_delalloc_release_space()
      |- btrfs_free_reserved_data_space()
         |- btrsf_qgroup_free_data()
            |- __btrfs_qgroup_release_data(reserved=@reserved, free=1)
               |- qgroup_free_reserved_data(reserved=@reserved)
                  |- clear_record_extent_bits();
                  |- freed += changeset.bytes_changed;
      
      However due to a completion bug, qgroup_free_reserved_data() will clear
      EXTENT_QGROUP_RESERVED flag in BTRFS_I(inode)->io_failure_tree, other
      than the correct BTRFS_I(inode)->io_tree.
      Since io_failure_tree is never marked with that flag,
      btrfs_qgroup_free_data() will not free any data reserved space at all,
      causing a leakage.
      
      This type of error handling can only be triggered by errors outside of
      qgroup code. So EDQUOT error from qgroup can't trigger it.
      
      [FIX]
      Fix the wrong target io_tree.
      Reported-by: NJosef Bacik <josef@toxicpanda.com>
      Fixes: bc42bda2 ("btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges")
      CC: stable@vger.kernel.org # 4.14+
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c521bfa8
    • N
      btrfs: Relinquish CPUs in btrfs_compare_trees · 067f82a0
      Nikolay Borisov 提交于
      commit 6af112b11a4bc1b560f60a618ac9c1dcefe9836e upstream.
      
      When doing any form of incremental send the parent and the child trees
      need to be compared via btrfs_compare_trees. This  can result in long
      loop chains without ever relinquishing the CPU. This causes softlockup
      detector to trigger when comparing trees with a lot of items. Example
      report:
      
      watchdog: BUG: soft lockup - CPU#0 stuck for 24s! [snapperd:16153]
      CPU: 0 PID: 16153 Comm: snapperd Not tainted 5.2.9-1-default #1 openSUSE Tumbleweed (unreleased)
      Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
      pstate: 40000005 (nZcv daif -PAN -UAO)
      pc : __ll_sc_arch_atomic_sub_return+0x14/0x20
      lr : btrfs_release_extent_buffer_pages+0xe0/0x1e8 [btrfs]
      sp : ffff00001273b7e0
      Call trace:
       __ll_sc_arch_atomic_sub_return+0x14/0x20
       release_extent_buffer+0xdc/0x120 [btrfs]
       free_extent_buffer.part.0+0xb0/0x118 [btrfs]
       free_extent_buffer+0x24/0x30 [btrfs]
       btrfs_release_path+0x4c/0xa0 [btrfs]
       btrfs_free_path.part.0+0x20/0x40 [btrfs]
       btrfs_free_path+0x24/0x30 [btrfs]
       get_inode_info+0xa8/0xf8 [btrfs]
       finish_inode_if_needed+0xe0/0x6d8 [btrfs]
       changed_cb+0x9c/0x410 [btrfs]
       btrfs_compare_trees+0x284/0x648 [btrfs]
       send_subvol+0x33c/0x520 [btrfs]
       btrfs_ioctl_send+0x8a0/0xaf0 [btrfs]
       btrfs_ioctl+0x199c/0x2288 [btrfs]
       do_vfs_ioctl+0x4b0/0x820
       ksys_ioctl+0x84/0xb8
       __arm64_sys_ioctl+0x28/0x38
       el0_svc_common.constprop.0+0x7c/0x188
       el0_svc_handler+0x34/0x90
       el0_svc+0x8/0xc
      
      Fix this by adding a call to cond_resched at the beginning of the main
      loop in btrfs_compare_trees.
      
      Fixes: 7069830a ("Btrfs: add btrfs_compare_trees function")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      067f82a0
    • F
      Btrfs: fix use-after-free when using the tree modification log · b08344be
      Filipe Manana 提交于
      commit efad8a853ad2057f96664328a0d327a05ce39c76 upstream.
      
      At ctree.c:get_old_root(), we are accessing a root's header owner field
      after we have freed the respective extent buffer. This results in an
      use-after-free that can lead to crashes, and when CONFIG_DEBUG_PAGEALLOC
      is set, results in a stack trace like the following:
      
        [ 3876.799331] stack segment: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
        [ 3876.799363] CPU: 0 PID: 15436 Comm: pool Not tainted 5.3.0-rc3-btrfs-next-54 #1
        [ 3876.799385] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
        [ 3876.799433] RIP: 0010:btrfs_search_old_slot+0x652/0xd80 [btrfs]
        (...)
        [ 3876.799502] RSP: 0018:ffff9f08c1a2f9f0 EFLAGS: 00010286
        [ 3876.799518] RAX: ffff8dd300000000 RBX: ffff8dd85a7a9348 RCX: 000000038da26000
        [ 3876.799538] RDX: 0000000000000000 RSI: ffffe522ce368980 RDI: 0000000000000246
        [ 3876.799559] RBP: dae1922adadad000 R08: 0000000008020000 R09: ffffe522c0000000
        [ 3876.799579] R10: ffff8dd57fd788c8 R11: 000000007511b030 R12: ffff8dd781ddc000
        [ 3876.799599] R13: ffff8dd9e6240578 R14: ffff8dd6896f7a88 R15: ffff8dd688cf90b8
        [ 3876.799620] FS:  00007f23ddd97700(0000) GS:ffff8dda20200000(0000) knlGS:0000000000000000
        [ 3876.799643] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [ 3876.799660] CR2: 00007f23d4024000 CR3: 0000000710bb0005 CR4: 00000000003606f0
        [ 3876.799682] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [ 3876.799703] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [ 3876.799723] Call Trace:
        [ 3876.799735]  ? do_raw_spin_unlock+0x49/0xc0
        [ 3876.799749]  ? _raw_spin_unlock+0x24/0x30
        [ 3876.799779]  resolve_indirect_refs+0x1eb/0xc80 [btrfs]
        [ 3876.799810]  find_parent_nodes+0x38d/0x1180 [btrfs]
        [ 3876.799841]  btrfs_check_shared+0x11a/0x1d0 [btrfs]
        [ 3876.799870]  ? extent_fiemap+0x598/0x6e0 [btrfs]
        [ 3876.799895]  extent_fiemap+0x598/0x6e0 [btrfs]
        [ 3876.799913]  do_vfs_ioctl+0x45a/0x700
        [ 3876.799926]  ksys_ioctl+0x70/0x80
        [ 3876.799938]  ? trace_hardirqs_off_thunk+0x1a/0x20
        [ 3876.799953]  __x64_sys_ioctl+0x16/0x20
        [ 3876.799965]  do_syscall_64+0x62/0x220
        [ 3876.799977]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [ 3876.799993] RIP: 0033:0x7f23e0013dd7
        (...)
        [ 3876.800056] RSP: 002b:00007f23ddd96ca8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        [ 3876.800078] RAX: ffffffffffffffda RBX: 00007f23d80210f8 RCX: 00007f23e0013dd7
        [ 3876.800099] RDX: 00007f23d80210f8 RSI: 00000000c020660b RDI: 0000000000000003
        [ 3876.800626] RBP: 000055fa2a2a2440 R08: 0000000000000000 R09: 00007f23ddd96d7c
        [ 3876.801143] R10: 00007f23d8022000 R11: 0000000000000246 R12: 00007f23ddd96d80
        [ 3876.801662] R13: 00007f23ddd96d78 R14: 00007f23d80210f0 R15: 00007f23ddd96d80
        (...)
        [ 3876.805107] ---[ end trace e53161e179ef04f9 ]---
      
      Fix that by saving the root's header owner field into a local variable
      before freeing the root's extent buffer, and then use that local variable
      when needed.
      
      Fixes: 30b0463a ("Btrfs: fix accessing the root pointer in tree mod log functions")
      CC: stable@vger.kernel.org # 3.10+
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NAnand Jain <anand.jain@oracle.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b08344be
    • C
      btrfs: fix allocation of free space cache v1 bitmap pages · 4874c6fe
      Christophe Leroy 提交于
      commit 3acd48507dc43eeeb0a1fe965b8bad91cab904a7 upstream.
      
      Various notifications of type "BUG kmalloc-4096 () : Redzone
      overwritten" have been observed recently in various parts of the kernel.
      After some time, it has been made a relation with the use of BTRFS
      filesystem and with SLUB_DEBUG turned on.
      
      [   22.809700] BUG kmalloc-4096 (Tainted: G        W        ): Redzone overwritten
      
      [   22.810286] INFO: 0xbe1a5921-0xfbfc06cd. First byte 0x0 instead of 0xcc
      [   22.810866] INFO: Allocated in __load_free_space_cache+0x588/0x780 [btrfs] age=22 cpu=0 pid=224
      [   22.811193] 	__slab_alloc.constprop.26+0x44/0x70
      [   22.811345] 	kmem_cache_alloc_trace+0xf0/0x2ec
      [   22.811588] 	__load_free_space_cache+0x588/0x780 [btrfs]
      [   22.811848] 	load_free_space_cache+0xf4/0x1b0 [btrfs]
      [   22.812090] 	cache_block_group+0x1d0/0x3d0 [btrfs]
      [   22.812321] 	find_free_extent+0x680/0x12a4 [btrfs]
      [   22.812549] 	btrfs_reserve_extent+0xec/0x220 [btrfs]
      [   22.812785] 	btrfs_alloc_tree_block+0x178/0x5f4 [btrfs]
      [   22.813032] 	__btrfs_cow_block+0x150/0x5d4 [btrfs]
      [   22.813262] 	btrfs_cow_block+0x194/0x298 [btrfs]
      [   22.813484] 	commit_cowonly_roots+0x44/0x294 [btrfs]
      [   22.813718] 	btrfs_commit_transaction+0x63c/0xc0c [btrfs]
      [   22.813973] 	close_ctree+0xf8/0x2a4 [btrfs]
      [   22.814107] 	generic_shutdown_super+0x80/0x110
      [   22.814250] 	kill_anon_super+0x18/0x30
      [   22.814437] 	btrfs_kill_super+0x18/0x90 [btrfs]
      [   22.814590] INFO: Freed in proc_cgroup_show+0xc0/0x248 age=41 cpu=0 pid=83
      [   22.814841] 	proc_cgroup_show+0xc0/0x248
      [   22.814967] 	proc_single_show+0x54/0x98
      [   22.815086] 	seq_read+0x278/0x45c
      [   22.815190] 	__vfs_read+0x28/0x17c
      [   22.815289] 	vfs_read+0xa8/0x14c
      [   22.815381] 	ksys_read+0x50/0x94
      [   22.815475] 	ret_from_syscall+0x0/0x38
      
      Commit 69d24804 ("btrfs: use copy_page for copying pages instead of
      memcpy") changed the way bitmap blocks are copied. But allthough bitmaps
      have the size of a page, they were allocated with kzalloc().
      
      Most of the time, kzalloc() allocates aligned blocks of memory, so
      copy_page() can be used. But when some debug options like SLAB_DEBUG are
      activated, kzalloc() may return unaligned pointer.
      
      On powerpc, memcpy(), copy_page() and other copying functions use
      'dcbz' instruction which provides an entire zeroed cacheline to avoid
      memory read when the intention is to overwrite a full line. Functions
      like memcpy() are writen to care about partial cachelines at the start
      and end of the destination, but copy_page() assumes it gets pages. As
      pages are naturally cache aligned, copy_page() doesn't care about
      partial lines. This means that when copy_page() is called with a
      misaligned pointer, a few leading bytes are zeroed.
      
      To fix it, allocate bitmaps through kmem_cache instead of using kzalloc()
      The cache pool is created with PAGE_SIZE alignment constraint.
      Reported-by: NErhard F. <erhard_f@mailbox.org>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204371
      Fixes: 69d24804 ("btrfs: use copy_page for copying pages instead of memcpy")
      Cc: stable@vger.kernel.org # 4.19+
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      [ rename to btrfs_free_space_bitmap ]
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4874c6fe
    • M
      ovl: filter of trusted xattr results in audit · 934243a7
      Mark Salyzyn 提交于
      commit 5c2e9f346b815841f9bed6029ebcb06415caf640 upstream.
      
      When filtering xattr list for reading, presence of trusted xattr
      results in a security audit log.  However, if there is other content
      no errno will be set, and if there isn't, the errno will be -ENODATA
      and not -EPERM as is usually associated with a lack of capability.
      The check does not block the request to list the xattrs present.
      
      Switch to ns_capable_noaudit to reflect a more appropriate check.
      Signed-off-by: NMark Salyzyn <salyzyn@android.com>
      Cc: linux-security-module@vger.kernel.org
      Cc: kernel-team@android.com
      Cc: stable@vger.kernel.org # v3.18+
      Fixes: a082c6f6 ("ovl: filter trusted xattr for non-admin")
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      934243a7
    • D
      ovl: Fix dereferencing possible ERR_PTR() · e7265adc
      Ding Xiang 提交于
      commit 97f024b9171e74c4443bbe8a8dce31b917f97ac5 upstream.
      
      if ovl_encode_real_fh() fails, no memory was allocated
      and the error in the error-valued pointer should be returned.
      
      Fixes: 9b6faee0 ("ovl: check ERR_PTR() return value from ovl_encode_fh()")
      Signed-off-by: NDing Xiang <dingxiang@cmss.chinamobile.com>
      Cc: <stable@vger.kernel.org> # v4.16+
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e7265adc
    • S
      smb3: allow disabling requesting leases · 2e96c933
      Steve French 提交于
      commit 3e7a02d47872081f4b6234a9f72500f1d10f060c upstream.
      
      In some cases to work around server bugs or performance
      problems it can be helpful to be able to disable requesting
      SMB2.1/SMB3 leases on a particular mount (not to all servers
      and all shares we are mounted to). Add new mount parm
      "nolease" which turns off requesting leases on directory
      or file opens.  Currently the only way to disable leases is
      globally through a module load parameter. This is more
      granular.
      Suggested-by: NPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2e96c933
    • B
      gfs2: clear buf_in_tr when ending a transaction in sweep_bh_for_rgrps · e0c1e6e5
      Bob Peterson 提交于
      commit f0b444b349e33ae0d3dd93e25ca365482a5d17d4 upstream.
      
      In function sweep_bh_for_rgrps, which is a helper for punch_hole,
      it uses variable buf_in_tr to keep track of when it needs to commit
      pending block frees on a partial delete that overflows the
      transaction created for the delete. The problem is that the
      variable was initialized at the start of function sweep_bh_for_rgrps
      but it was never cleared, even when starting a new transaction.
      
      This patch reinitializes the variable when the transaction is
      ended, so the next transaction starts out with it cleared.
      
      Fixes: d552a2b9 ("GFS2: Non-recursive delete")
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e0c1e6e5
    • K
      binfmt_elf: Do not move brk for INTERP-less ET_EXEC · 29ecf8ca
      Kees Cook 提交于
      commit 7be3cb019db1cbd5fd5ffe6d64a23fefa4b6f229 upstream.
      
      When brk was moved for binaries without an interpreter, it should have
      been limited to ET_DYN only. In other words, the special case was an
      ET_DYN that lacks an INTERP, not just an executable that lacks INTERP.
      The bug manifested for giant static executables, where the brk would end
      up in the middle of the text area on 32-bit architectures.
      Reported-and-tested-by: NRichard Kojedzinszky <richard@kojedz.in>
      Fixes: bbdc6076d2e5 ("binfmt_elf: move brk out of mmap when doing direct loader exec")
      Cc: stable@vger.kernel.org
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      29ecf8ca
    • V
      fuse: fix missing unlock_page in fuse_writepage() · ad411629
      Vasily Averin 提交于
      commit d5880c7a8620290a6c90ced7a0e8bd0ad9419601 upstream.
      
      unlock_page() was missing in case of an already in-flight write against the
      same page.
      Signed-off-by: NVasily Averin <vvs@virtuozzo.com>
      Fixes: ff17be08 ("fuse: writepage: skip already in flight")
      Cc: <stable@vger.kernel.org> # v3.13
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ad411629
    • Y
      ceph: use ceph_evict_inode to cleanup inode's resource · e9bcaf82
      Yan, Zheng 提交于
      [ Upstream commit 87bc5b895d94a0f40fe170d4cf5771c8e8f85d15 ]
      
      remove_session_caps() relies on __wait_on_freeing_inode(), to wait for
      freeing inode to remove its caps. But VFS wakes freeing inode waiters
      before calling destroy_inode().
      
      [ jlayton: mainline moved to ->free_inode before the original patch was
      	   merged. This backport reinstates ceph_destroy_inode and just
      	   has it do the call_rcu call. ]
      
      Cc: stable@vger.kernel.org
      Link: https://tracker.ceph.com/issues/40102Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
      Reviewed-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      e9bcaf82
    • S
      Revert "ceph: use ceph_evict_inode to cleanup inode's resource" · 72f0fff3
      Sasha Levin 提交于
      This reverts commit 81281039.
      
      The backport was incorrect and was causing kernel panics. Revert and
      re-apply a correct backport from Jeff Layton.
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      72f0fff3
    • Q
      btrfs: extent-tree: Make sure we only allocate extents from block groups with the same type · c5dbd74f
      Qu Wenruo 提交于
      [ Upstream commit 2a28468e525f3924efed7f29f2bc5a2926e7e19a ]
      
      [BUG]
      With fuzzed image and MIXED_GROUPS super flag, we can hit the following
      BUG_ON():
      
        kernel BUG at fs/btrfs/delayed-ref.c:491!
        invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
        CPU: 0 PID: 1849 Comm: sync Tainted: G           O      5.2.0-custom #27
        RIP: 0010:update_existing_head_ref.cold+0x44/0x46 [btrfs]
        Call Trace:
         add_delayed_ref_head+0x20c/0x2d0 [btrfs]
         btrfs_add_delayed_tree_ref+0x1fc/0x490 [btrfs]
         btrfs_free_tree_block+0x123/0x380 [btrfs]
         __btrfs_cow_block+0x435/0x500 [btrfs]
         btrfs_cow_block+0x110/0x240 [btrfs]
         btrfs_search_slot+0x230/0xa00 [btrfs]
         ? __lock_acquire+0x105e/0x1e20
         btrfs_insert_empty_items+0x67/0xc0 [btrfs]
         alloc_reserved_file_extent+0x9e/0x340 [btrfs]
         __btrfs_run_delayed_refs+0x78e/0x1240 [btrfs]
         ? kvm_clock_read+0x18/0x30
         ? __sched_clock_gtod_offset+0x21/0x50
         btrfs_run_delayed_refs.part.0+0x4e/0x180 [btrfs]
         btrfs_run_delayed_refs+0x23/0x30 [btrfs]
         btrfs_commit_transaction+0x53/0x9f0 [btrfs]
         btrfs_sync_fs+0x7c/0x1c0 [btrfs]
         ? __ia32_sys_fdatasync+0x20/0x20
         sync_fs_one_sb+0x23/0x30
         iterate_supers+0x95/0x100
         ksys_sync+0x62/0xb0
         __ia32_sys_sync+0xe/0x20
         do_syscall_64+0x65/0x240
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      [CAUSE]
      This situation is caused by several factors:
      - Fuzzed image
        The extent tree of this fs missed one backref for extent tree root.
        So we can allocated space from that slot.
      
      - MIXED_BG feature
        Super block has MIXED_BG flag.
      
      - No mixed block groups exists
        All block groups are just regular ones.
      
      This makes data space_info->block_groups[] contains metadata block
      groups.  And when we reserve space for data, we can use space in
      metadata block group.
      
      Then we hit the following file operations:
      
      - fallocate
        We need to allocate data extents.
        find_free_extent() choose to use the metadata block to allocate space
        from, and choose the space of extent tree root, since its backref is
        missing.
      
        This generate one delayed ref head with is_data = 1.
      
      - extent tree update
        We need to update extent tree at run_delayed_ref time.
      
        This generate one delayed ref head with is_data = 0, for the same
        bytenr of old extent tree root.
      
      Then we trigger the BUG_ON().
      
      [FIX]
      The quick fix here is to check block_group->flags before using it.
      
      The problem can only happen for MIXED_GROUPS fs. Regular filesystems
      won't have space_info with DATA|METADATA flag, and no way to hit the
      bug.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203255Reported-by: NJungyeon Yoon <jungyeon.yoon@gmail.com>
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      c5dbd74f
  2. 01 10月, 2019 6 次提交
    • C
      f2fs: use generic EFSBADCRC/EFSCORRUPTED · 59a5cea4
      Chao Yu 提交于
      [ Upstream commit 10f966bbf521bb9b2e497bbca496a5141f4071d0 ]
      
      f2fs uses EFAULT as error number to indicate filesystem is corrupted
      all the time, but generic filesystems use EUCLEAN for such condition,
      we need to change to follow others.
      
      This patch adds two new macros as below to wrap more generic error
      code macros, and spread them in code.
      
      EFSBADCRC	EBADMSG		/* Bad CRC detected */
      EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
      Reported-by: NPavel Machek <pavel@ucw.cz>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Acked-by: NPavel Machek <pavel@ucw.cz>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      59a5cea4
    • D
      xfs: don't crash on null attr fork xfs_bmapi_read · 649836fe
      Darrick J. Wong 提交于
      [ Upstream commit 8612de3f7ba6e900465e340516b8313806d27b2d ]
      
      Zorro Lang reported a crash in generic/475 if we try to inactivate a
      corrupt inode with a NULL attr fork (stack trace shortened somewhat):
      
      RIP: 0010:xfs_bmapi_read+0x311/0xb00 [xfs]
      RSP: 0018:ffff888047f9ed68 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: ffff888047f9f038 RCX: 1ffffffff5f99f51
      RDX: 0000000000000002 RSI: 0000000000000008 RDI: 0000000000000012
      RBP: ffff888002a41f00 R08: ffffed10005483f0 R09: ffffed10005483ef
      R10: ffffed10005483ef R11: ffff888002a41f7f R12: 0000000000000004
      R13: ffffe8fff53b5768 R14: 0000000000000005 R15: 0000000000000001
      FS:  00007f11d44b5b80(0000) GS:ffff888114200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000ef6000 CR3: 000000002e176003 CR4: 00000000001606e0
      Call Trace:
       xfs_dabuf_map.constprop.18+0x696/0xe50 [xfs]
       xfs_da_read_buf+0xf5/0x2c0 [xfs]
       xfs_da3_node_read+0x1d/0x230 [xfs]
       xfs_attr_inactive+0x3cc/0x5e0 [xfs]
       xfs_inactive+0x4c8/0x5b0 [xfs]
       xfs_fs_destroy_inode+0x31b/0x8e0 [xfs]
       destroy_inode+0xbc/0x190
       xfs_bulkstat_one_int+0xa8c/0x1200 [xfs]
       xfs_bulkstat_one+0x16/0x20 [xfs]
       xfs_bulkstat+0x6fa/0xf20 [xfs]
       xfs_ioc_bulkstat+0x182/0x2b0 [xfs]
       xfs_file_ioctl+0xee0/0x12a0 [xfs]
       do_vfs_ioctl+0x193/0x1000
       ksys_ioctl+0x60/0x90
       __x64_sys_ioctl+0x6f/0xb0
       do_syscall_64+0x9f/0x4d0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x7f11d39a3e5b
      
      The "obvious" cause is that the attr ifork is null despite the inode
      claiming an attr fork having at least one extent, but it's not so
      obvious why we ended up with an inode in that state.
      Reported-by: NZorro Lang <zlang@redhat.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204031Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBill O'Donnell <billodo@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      649836fe
    • C
      f2fs: fix to do sanity check on segment bitmap of LFS curseg · 303f6d6b
      Chao Yu 提交于
      [ Upstream commit c854f4d681365498f53ba07843a16423625aa7e9 ]
      
      As Jungyeon Reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203233
      
      - Reproduces
      gcc poc_13.c
      ./run.sh f2fs
      
      - Kernel messages
       F2FS-fs (sdb): Bitmap was wrongly set, blk:4608
       kernel BUG at fs/f2fs/segment.c:2133!
       RIP: 0010:update_sit_entry+0x35d/0x3e0
       Call Trace:
        f2fs_allocate_data_block+0x16c/0x5a0
        do_write_page+0x57/0x100
        f2fs_do_write_node_page+0x33/0xa0
        __write_node_page+0x270/0x4e0
        f2fs_sync_node_pages+0x5df/0x670
        f2fs_write_checkpoint+0x364/0x13a0
        f2fs_sync_fs+0xa3/0x130
        f2fs_do_sync_file+0x1a6/0x810
        do_fsync+0x33/0x60
        __x64_sys_fsync+0xb/0x10
        do_syscall_64+0x43/0x110
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The testcase fails because that, in fuzzed image, current segment was
      allocated with LFS type, its .next_blkoff should point to an unused
      block address, but actually, its bitmap shows it's not. So during
      allocation, f2fs crash when setting bitmap.
      
      Introducing sanity_check_curseg() to check such inconsistence of
      current in-used segment.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      303f6d6b
    • C
      Revert "f2fs: avoid out-of-range memory access" · 73d90f57
      Chao Yu 提交于
      [ Upstream commit a37d0862d17411edb67677a580a6f505ec2225f6 ]
      
      As Pavel Machek reported:
      
      "We normally use -EUCLEAN to signal filesystem corruption. Plus, it is
      good idea to report it to the syslog and mark filesystem as "needing
      fsck" if filesystem can do that."
      
      Still we need improve the original patch with:
      - use unlikely keyword
      - add message print
      - return EUCLEAN
      
      However, after rethink this patch, I don't think we should add such
      condition check here as below reasons:
      - We have already checked the field in f2fs_sanity_check_ckpt(),
      - If there is fs corrupt or security vulnerability, there is nothing
      to guarantee the field is integrated after the check, unless we do
      the check before each of its use, however no filesystem does that.
      - We only have similar check for bitmap, which was added due to there
      is bitmap corruption happened on f2fs' runtime in product.
      - There are so many key fields in SB/CP/NAT did have such check
      after f2fs_sanity_check_{sb,cp,..}.
      
      So I propose to revert this unneeded check.
      
      This reverts commit 56f3ce675103e3fb9e631cfb4131fc768bc23e9a.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      73d90f57
    • S
      f2fs: check all the data segments against all node ones · ef2baa03
      Surbhi Palande 提交于
      [ Upstream commit 1166c1f2f69117ad254189ca781287afa6e550b6 ]
      
      As a part of the sanity checking while mounting, distinct segment number
      assignment to data and node segments is verified. Fixing a small bug in
      this verification between node and data segments. We need to check all
      the data segments with all the node segments.
      
      Fixes: 042be0f849e5f ("f2fs: fix to do sanity check with current segment number")
      Signed-off-by: NSurbhi Palande <csurbhi@gmail.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ef2baa03
    • A
      CIFS: fix deadlock in cached root handling · e867ef11
      Aurelien Aptel 提交于
      commit 7e5a70ad88b1e6f6d9b934b2efb41afff496820f upstream.
      
      Prevent deadlock between open_shroot() and
      cifs_mark_open_files_invalid() by releasing the lock before entering
      SMB2_open, taking it again after and checking if we still need to use
      the result.
      
      Link: https://lore.kernel.org/linux-cifs/684ed01c-cbca-2716-bc28-b0a59a0f8521@prodrive-technologies.com/T/#u
      Fixes: 3d4ef9a1 ("smb3: fix redundant opens on root")
      Signed-off-by: NAurelien Aptel <aaptel@suse.com>
      Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      e867ef11
  3. 21 9月, 2019 9 次提交
  4. 19 9月, 2019 2 次提交
    • R
      ubifs: Correctly use tnc_next() in search_dh_cookie() · 72cd230b
      Richard Weinberger 提交于
      commit bacfa94b08027b9f66ede7044972e3b066766b3e upstream.
      
      Commit c877154d fixed an uninitialized variable and optimized
      the function to not call tnc_next() in the first iteration of the
      loop. While this seemed perfectly legit and wise, it turned out to
      be illegal.
      If the lookup function does not find an exact match it will rewind
      the cursor by 1.
      The rewinded cursor will not match the name hash we are looking for
      and this results in a spurious -ENOENT.
      So we need to move to the next entry in case of an non-exact match,
      but not if the match was exact.
      
      While we are here, update the documentation to avoid further confusion.
      
      Cc: Hyunchul Lee <hyc.lee@gmail.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Fixes: c877154d ("ubifs: Fix uninitialized variable in search_dh_cookie()")
      Fixes: 781f675e ("ubifs: Fix unlink code wrt. double hash lookups")
      Signed-off-by: NRichard Weinberger <richard@nod.at>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      72cd230b
    • F
      Btrfs: fix assertion failure during fsync and use of stale transaction · 7cbd49cf
      Filipe Manana 提交于
      commit 410f954cb1d1c79ae485dd83a175f21954fd87cd upstream.
      
      Sometimes when fsync'ing a file we need to log that other inodes exist and
      when we need to do that we acquire a reference on the inodes and then drop
      that reference using iput() after logging them.
      
      That generally is not a problem except if we end up doing the final iput()
      (dropping the last reference) on the inode and that inode has a link count
      of 0, which can happen in a very short time window if the logging path
      gets a reference on the inode while it's being unlinked.
      
      In that case we end up getting the eviction callback, btrfs_evict_inode(),
      invoked through the iput() call chain which needs to drop all of the
      inode's items from its subvolume btree, and in order to do that, it needs
      to join a transaction at the helper function evict_refill_and_join().
      However because the task previously started a transaction at the fsync
      handler, btrfs_sync_file(), it has current->journal_info already pointing
      to a transaction handle and therefore evict_refill_and_join() will get
      that transaction handle from btrfs_join_transaction(). From this point on,
      two different problems can happen:
      
      1) evict_refill_and_join() will often change the transaction handle's
         block reserve (->block_rsv) and set its ->bytes_reserved field to a
         value greater than 0. If evict_refill_and_join() never commits the
         transaction, the eviction handler ends up decreasing the reference
         count (->use_count) of the transaction handle through the call to
         btrfs_end_transaction(), and after that point we have a transaction
         handle with a NULL ->block_rsv (which is the value prior to the
         transaction join from evict_refill_and_join()) and a ->bytes_reserved
         value greater than 0. If after the eviction/iput completes the inode
         logging path hits an error or it decides that it must fallback to a
         transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a
         non-zero value from btrfs_log_dentry_safe(), and because of that
         non-zero value it tries to commit the transaction using a handle with
         a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes
         the transaction commit hit an assertion failure at
         btrfs_trans_release_metadata() because ->bytes_reserved is not zero but
         the ->block_rsv is NULL. The produced stack trace for that is like the
         following:
      
         [192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816
         [192922.917553] ------------[ cut here ]------------
         [192922.917922] kernel BUG at fs/btrfs/ctree.h:3532!
         [192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
         [192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G        W         5.1.4-btrfs-next-47 #1
         [192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
         [192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs]
         (...)
         [192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286
         [192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000
         [192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838
         [192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000
         [192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980
         [192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8
         [192922.923200] FS:  00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000
         [192922.923579] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
         [192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0
         [192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
         [192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
         [192922.925105] Call Trace:
         [192922.925505]  btrfs_trans_release_metadata+0x10c/0x170 [btrfs]
         [192922.925911]  btrfs_commit_transaction+0x3e/0xaf0 [btrfs]
         [192922.926324]  btrfs_sync_file+0x44c/0x490 [btrfs]
         [192922.926731]  do_fsync+0x38/0x60
         [192922.927138]  __x64_sys_fdatasync+0x13/0x20
         [192922.927543]  do_syscall_64+0x60/0x1c0
         [192922.927939]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
         (...)
         [192922.934077] ---[ end trace f00808b12068168f ]---
      
      2) If evict_refill_and_join() decides to commit the transaction, it will
         be able to do it, since the nested transaction join only increments the
         transaction handle's ->use_count reference counter and it does not
         prevent the transaction from getting committed. This means that after
         eviction completes, the fsync logging path will be using a transaction
         handle that refers to an already committed transaction. What happens
         when using such a stale transaction can be unpredictable, we are at
         least having a use-after-free on the transaction handle itself, since
         the transaction commit will call kmem_cache_free() against the handle
         regardless of its ->use_count value, or we can end up silently losing
         all the updates to the log tree after that iput() in the logging path,
         or using a transaction handle that in the meanwhile was allocated to
         another task for a new transaction, etc, pretty much unpredictable
         what can happen.
      
      In order to fix both of them, instead of using iput() during logging, use
      btrfs_add_delayed_iput(), so that the logging path of fsync never drops
      the last reference on an inode, that step is offloaded to a safe context
      (usually the cleaner kthread).
      
      The assertion failure issue was sporadically triggered by the test case
      generic/475 from fstests, which loads the dm error target while fsstress
      is running, which lead to fsync failing while logging inodes with -EIO
      errors and then trying later to commit the transaction, triggering the
      assertion failure.
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7cbd49cf
  5. 16 9月, 2019 7 次提交
    • C
      ext4: unsigned int compared against zero · ff693225
      Colin Ian King 提交于
      [ Upstream commit fbbbbd2f28aec991f3fbc248df211550fbdfd58c ]
      
      There are two cases where u32 variables n and err are being checked
      for less than zero error values, the checks is always false because
      the variables are not signed. Fix this by making the variables ints.
      
      Addresses-Coverity: ("Unsigned compared against 0")
      Fixes: 345c0dbf3a30 ("ext4: protect journal inode's blocks using block_validity")
      Signed-off-by: NColin Ian King <colin.king@canonical.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ff693225
    • T
      ext4: fix block validity checks for journal inodes using indirect blocks · 292666d2
      Theodore Ts'o 提交于
      [ Upstream commit 170417c8c7bb2cbbdd949bf5c443c0c8f24a203b ]
      
      Commit 345c0dbf3a30 ("ext4: protect journal inode's blocks using
      block_validity") failed to add an exception for the journal inode in
      ext4_check_blockref(), which is the function used by ext4_get_branch()
      for indirect blocks.  This caused attempts to read from the ext3-style
      journals to fail with:
      
      [  848.968550] EXT4-fs error (device sdb7): ext4_get_branch:171: inode #8: block 30343695: comm jbd2/sdb7-8: invalid block
      
      Fix this by adding the missing exception check.
      
      Fixes: 345c0dbf3a30 ("ext4: protect journal inode's blocks using block_validity")
      Reported-by: NArthur Marsh <arthur.marsh@internode.on.net>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      292666d2
    • T
      ext4: don't perform block validity checks on the journal inode · 97fbf573
      Theodore Ts'o 提交于
      [ Upstream commit 0a944e8a6c66ca04c7afbaa17e22bf208a8b37f0 ]
      
      Since the journal inode is already checked when we added it to the
      block validity's system zone, if we check it again, we'll just trigger
      a failure.
      
      This was causing failures like this:
      
      [   53.897001] EXT4-fs error (device sda): ext4_find_extent:909: inode
      #8: comm jbd2/sda-8: pblk 121667583 bad header/extent: invalid extent entries - magic f30a, entries 8, max 340(340), depth 0(0)
      [   53.931430] jbd2_journal_bmap: journal block not found at offset 49 on sda-8
      [   53.938480] Aborting journal on device sda-8.
      
      ... but only if the system was under enough memory pressure that
      logical->physical mapping for the journal inode gets pushed out of the
      extent cache.  (This is why it wasn't noticed earlier.)
      
      Fixes: 345c0dbf3a30 ("ext4: protect journal inode's blocks using block_validity")
      Reported-by: NDan Rue <dan.rue@linaro.org>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Tested-by: NNaresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      97fbf573
    • T
      NFSv4: Fix delegation state recovery · 652993a5
      Trond Myklebust 提交于
      [ Upstream commit 5eb8d18ca0e001c6055da2b7f30d8f6dca23a44f ]
      
      Once we clear the NFS_DELEGATED_STATE flag, we're telling
      nfs_delegation_claim_opens() that we're done recovering all open state
      for that stateid, so we really need to ensure that we test for all
      open modes that are currently cached and recover them before exiting
      nfs4_open_delegation_recall().
      
      Fixes: 24311f88 ("NFSv4: Recovery of recalled read delegations...")
      Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
      Cc: stable@vger.kernel.org # v4.3+
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      652993a5
    • N
      pstore: Fix double-free in pstore_mkfile() failure path · 5e9a2ce6
      Norbert Manthey 提交于
      [ Upstream commit 4c6d80e1144bdf48cae6b602ae30d41f3e5c76a9 ]
      
      The pstore_mkfile() function is passed a pointer to a struct
      pstore_record. On success it consumes this 'record' pointer and
      references it from the created inode.
      
      On failure, however, it may or may not free the record. There are even
      two different code paths which return -ENOMEM -- one of which does and
      the other doesn't free the record.
      
      Make the behaviour deterministic by never consuming and freeing the
      record when returning failure, allowing the caller to do the cleanup
      consistently.
      Signed-off-by: NNorbert Manthey <nmanthey@amazon.de>
      Link: https://lore.kernel.org/r/1562331960-26198-1-git-send-email-nmanthey@amazon.de
      Fixes: 83f70f07 ("pstore: Do not duplicate record metadata")
      Fixes: 1dfff7dd ("pstore: Pass record contents instead of copying")
      Cc: stable@vger.kernel.org
      [kees: also move "private" allocation location, rename inode cleanup label]
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      5e9a2ce6
    • J
      btrfs: correctly validate compression type · 1c13c9c4
      Johannes Thumshirn 提交于
      [ Upstream commit aa53e3bfac7205fb3a8815ac1c937fd6ed01b41e ]
      
      Nikolay reported the following KASAN splat when running btrfs/048:
      
      [ 1843.470920] ==================================================================
      [ 1843.471971] BUG: KASAN: slab-out-of-bounds in strncmp+0x66/0xb0
      [ 1843.472775] Read of size 1 at addr ffff888111e369e2 by task btrfs/3979
      
      [ 1843.473904] CPU: 3 PID: 3979 Comm: btrfs Not tainted 5.2.0-rc3-default #536
      [ 1843.475009] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [ 1843.476322] Call Trace:
      [ 1843.476674]  dump_stack+0x7c/0xbb
      [ 1843.477132]  ? strncmp+0x66/0xb0
      [ 1843.477587]  print_address_description+0x114/0x320
      [ 1843.478256]  ? strncmp+0x66/0xb0
      [ 1843.478740]  ? strncmp+0x66/0xb0
      [ 1843.479185]  __kasan_report+0x14e/0x192
      [ 1843.479759]  ? strncmp+0x66/0xb0
      [ 1843.480209]  kasan_report+0xe/0x20
      [ 1843.480679]  strncmp+0x66/0xb0
      [ 1843.481105]  prop_compression_validate+0x24/0x70
      [ 1843.481798]  btrfs_xattr_handler_set_prop+0x65/0x160
      [ 1843.482509]  __vfs_setxattr+0x71/0x90
      [ 1843.483012]  __vfs_setxattr_noperm+0x84/0x130
      [ 1843.483606]  vfs_setxattr+0xac/0xb0
      [ 1843.484085]  setxattr+0x18c/0x230
      [ 1843.484546]  ? vfs_setxattr+0xb0/0xb0
      [ 1843.485048]  ? __mod_node_page_state+0x1f/0xa0
      [ 1843.485672]  ? _raw_spin_unlock+0x24/0x40
      [ 1843.486233]  ? __handle_mm_fault+0x988/0x1290
      [ 1843.486823]  ? lock_acquire+0xb4/0x1e0
      [ 1843.487330]  ? lock_acquire+0xb4/0x1e0
      [ 1843.487842]  ? mnt_want_write_file+0x3c/0x80
      [ 1843.488442]  ? debug_lockdep_rcu_enabled+0x22/0x40
      [ 1843.489089]  ? rcu_sync_lockdep_assert+0xe/0x70
      [ 1843.489707]  ? __sb_start_write+0x158/0x200
      [ 1843.490278]  ? mnt_want_write_file+0x3c/0x80
      [ 1843.490855]  ? __mnt_want_write+0x98/0xe0
      [ 1843.491397]  __x64_sys_fsetxattr+0xba/0xe0
      [ 1843.492201]  ? trace_hardirqs_off_thunk+0x1a/0x1c
      [ 1843.493201]  do_syscall_64+0x6c/0x230
      [ 1843.493988]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [ 1843.495041] RIP: 0033:0x7fa7a8a7707a
      [ 1843.495819] Code: 48 8b 0d 21 de 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 be 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ee dd 2b 00 f7 d8 64 89 01 48
      [ 1843.499203] RSP: 002b:00007ffcb73bca38 EFLAGS: 00000202 ORIG_RAX: 00000000000000be
      [ 1843.500210] RAX: ffffffffffffffda RBX: 00007ffcb73bda9d RCX: 00007fa7a8a7707a
      [ 1843.501170] RDX: 00007ffcb73bda9d RSI: 00000000006dc050 RDI: 0000000000000003
      [ 1843.502152] RBP: 00000000006dc050 R08: 0000000000000000 R09: 0000000000000000
      [ 1843.503109] R10: 0000000000000002 R11: 0000000000000202 R12: 00007ffcb73bda91
      [ 1843.504055] R13: 0000000000000003 R14: 00007ffcb73bda82 R15: ffffffffffffffff
      
      [ 1843.505268] Allocated by task 3979:
      [ 1843.505771]  save_stack+0x19/0x80
      [ 1843.506211]  __kasan_kmalloc.constprop.5+0xa0/0xd0
      [ 1843.506836]  setxattr+0xeb/0x230
      [ 1843.507264]  __x64_sys_fsetxattr+0xba/0xe0
      [ 1843.507886]  do_syscall_64+0x6c/0x230
      [ 1843.508429]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      [ 1843.509558] Freed by task 0:
      [ 1843.510188] (stack is not available)
      
      [ 1843.511309] The buggy address belongs to the object at ffff888111e369e0
                      which belongs to the cache kmalloc-8 of size 8
      [ 1843.514095] The buggy address is located 2 bytes inside of
                      8-byte region [ffff888111e369e0, ffff888111e369e8)
      [ 1843.516524] The buggy address belongs to the page:
      [ 1843.517561] page:ffff88813f478d80 refcount:1 mapcount:0 mapping:ffff88811940c300 index:0xffff888111e373b8 compound_mapcount: 0
      [ 1843.519993] flags: 0x4404000010200(slab|head)
      [ 1843.520951] raw: 0004404000010200 ffff88813f48b008 ffff888119403d50 ffff88811940c300
      [ 1843.522616] raw: ffff888111e373b8 000000000016000f 00000001ffffffff 0000000000000000
      [ 1843.524281] page dumped because: kasan: bad access detected
      
      [ 1843.525936] Memory state around the buggy address:
      [ 1843.526975]  ffff888111e36880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [ 1843.528479]  ffff888111e36900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [ 1843.530138] >ffff888111e36980: fc fc fc fc fc fc fc fc fc fc fc fc 02 fc fc fc
      [ 1843.531877]                                                        ^
      [ 1843.533287]  ffff888111e36a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [ 1843.534874]  ffff888111e36a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [ 1843.536468] ==================================================================
      
      This is caused by supplying a too short compression value ('lz') in the
      test-case and comparing it to 'lzo' with strncmp() and a length of 3.
      strncmp() read past the 'lz' when looking for the 'o' and thus caused an
      out-of-bounds read.
      
      Introduce a new check 'btrfs_compress_is_valid_type()' which not only
      checks the user-supplied value against known compression types, but also
      employs checks for too short values.
      Reported-by: NNikolay Borisov <nborisov@suse.com>
      Fixes: 272e5326c783 ("btrfs: prop: fix vanished compression property after failed set")
      CC: stable@vger.kernel.org # 5.1+
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      1c13c9c4
    • Y
      ceph: use ceph_evict_inode to cleanup inode's resource · 81281039
      Yan, Zheng 提交于
      [ Upstream commit 87bc5b895d94a0f40fe170d4cf5771c8e8f85d15 ]
      
      remove_session_caps() relies on __wait_on_freeing_inode(), to wait for
      freeing inode to remove its caps. But VFS wakes freeing inode waiters
      before calling destroy_inode().
      
      Cc: stable@vger.kernel.org
      Link: https://tracker.ceph.com/issues/40102Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
      Reviewed-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      81281039