1. 08 Feb, 2021 36 commits
    • io_uring: dont kill fasync under completion_lock · b3a9880e
      Committed by Pavel Begunkov
      stable inclusion
      from stable-5.10.12
      commit 7bccd1c19128140b9fefaa43808924c6932bef5b
      bugzilla: 47876
      
      --------------------------------
      
      [ Upstream commit 4aa84f2f ]
      
            CPU0                    CPU1
             ----                    ----
        lock(&new->fa_lock);
                                     local_irq_disable();
                                     lock(&ctx->completion_lock);
                                     lock(&new->fa_lock);
        <Interrupt>
          lock(&ctx->completion_lock);
      
       *** DEADLOCK ***
      
      Move kill_fasync() out of io_commit_cqring() to io_cqring_ev_posted(),
      so that completion_lock is not held while it runs. That avoids the
      reported deadlock, and it is also nice to shorten the locking time and
      untangle nested locks (compl_lock -> wq_head::lock).
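
The pattern described above (publish under the lock, notify only after the lock is dropped) can be sketched in user space as follows. This is a minimal illustration, not the kernel code: the `toy_*` names are hypothetical stand-ins, and the "lock" is a plain flag so that the example can check whether the notifier ever ran with it held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; none of these names are kernel APIs. */
struct toy_ring {
    bool lock_held;           /* models ctx->completion_lock */
    unsigned cq_tail;         /* completions published so far */
    unsigned notified;        /* how many times the notifier ran */
    bool notified_under_lock; /* set if the notifier ever ran with the lock held */
};

static void toy_lock(struct toy_ring *r)
{
    assert(!r->lock_held);
    r->lock_held = true;
}

static void toy_unlock(struct toy_ring *r)
{
    assert(r->lock_held);
    r->lock_held = false;
}

/* Models kill_fasync(): it takes its own lock, so it should not nest. */
static void toy_notify(struct toy_ring *r)
{
    if (r->lock_held)
        r->notified_under_lock = true;
    r->notified++;
}

/* Models io_commit_cqring(): only publishes, called with the lock held. */
static void toy_commit(struct toy_ring *r)
{
    assert(r->lock_held);
    r->cq_tail++;
}

/* Models io_cqring_ev_posted(): called after the lock has been dropped. */
static void toy_ev_posted(struct toy_ring *r)
{
    toy_notify(r);
}

/* One completion: commit inside the critical section, notify outside it. */
static void toy_complete_one(struct toy_ring *r)
{
    toy_lock(r);
    toy_commit(r);
    toy_unlock(r);
    toy_ev_posted(r);
}
```

Because the notifier runs outside the critical section, the two locks are never nested, which is what removes the lock-order inversion shown in the lockdep report.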
      
      Cc: stable@vger.kernel.org # 5.5+
      Reported-by: syzbot+91ca3f25bd7f795f019c@syzkaller.appspotmail.com
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • io_uring: fix skipping disabling sqo on exec · 417ce493
      Committed by Pavel Begunkov
      stable inclusion
      from stable-5.10.12
      commit 186725a80c4e931b6fe31b94d66c989d5f2354c1
      bugzilla: 47876
      
      --------------------------------
      
      [ Upstream commit 0b5cd6c3 ]
      
      If there are no requests at the time __io_uring_task_cancel() is called,
      tctx_inflight() returns zero and it terminates without getting a chance
      to go through __io_uring_files_cancel() and do
      io_disable_sqo_submit(). And we absolutely want submissions disabled by
      the time cancellation ends.
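
A minimal sketch of the control-flow problem, assuming a toy context in place of the real io_uring structures (all `toy_*` names are hypothetical): when the disable step lives only inside the drain loop, an empty loop skips it, so the sketch's fixed variant disables unconditionally before draining.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the ring context. */
struct toy_ctx {
    bool sqo_dead; /* models the sqo_dead flag */
    int inflight;  /* models what tctx_inflight() would report */
};

static void toy_disable_sqo_submit(struct toy_ctx *ctx)
{
    ctx->sqo_dead = true;
}

/* Buggy shape: the disable only happens inside the loop body. */
static void toy_task_cancel_buggy(struct toy_ctx *ctx)
{
    while (ctx->inflight > 0) {
        toy_disable_sqo_submit(ctx);
        ctx->inflight--; /* pretend one request was cancelled */
    }
}

/* Fixed shape: disable unconditionally, then drain whatever is in flight. */
static void toy_task_cancel_fixed(struct toy_ctx *ctx)
{
    toy_disable_sqo_submit(ctx);
    while (ctx->inflight > 0)
        ctx->inflight--;
}
```

With zero requests in flight, the buggy variant returns with `sqo_dead` still false, which is the case the commit message describes.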
      
      Cc: stable@vger.kernel.org # 5.5+
      Reported-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Fixes: d9d05217 ("io_uring: stop SQPOLL submit on creator's death")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • io_uring: fix uring_flush in exit_files() warning · 8940e1a8
      Committed by Pavel Begunkov
      stable inclusion
      from stable-5.10.12
      commit 54b4c4f9aba9e5d1ef6877f42a57895b189107c9
      bugzilla: 47876
      
      --------------------------------
      
      [ Upstream commit 4325cb49 ]
      
      WARNING: CPU: 1 PID: 11100 at fs/io_uring.c:9096
      	io_uring_flush+0x326/0x3a0 fs/io_uring.c:9096
      RIP: 0010:io_uring_flush+0x326/0x3a0 fs/io_uring.c:9096
      Call Trace:
       filp_close+0xb4/0x170 fs/open.c:1280
       close_files fs/file.c:401 [inline]
       put_files_struct fs/file.c:416 [inline]
       put_files_struct+0x1cc/0x350 fs/file.c:413
       exit_files+0x7e/0xa0 fs/file.c:433
       do_exit+0xc22/0x2ae0 kernel/exit.c:820
       do_group_exit+0x125/0x310 kernel/exit.c:922
       get_signal+0x3e9/0x20a0 kernel/signal.c:2770
       arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811
       handle_signal_work kernel/entry/common.c:147 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
       exit_to_user_mode_prepare+0x148/0x250 kernel/entry/common.c:201
       __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
       syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      An SQPOLL ring creator task may have gotten rid of its file note during
      exit and called io_disable_sqo_submit(), but the io_uring is still left
      referenced through fdtable, which will be put during close_files() and
      cause a false positive warning.
      
      First split the warning into two for more clarity about which one is hit,
      and then add an sqo_dead check to handle the described case.
      
      Cc: stable@vger.kernel.org # 5.5+
      Reported-by: syzbot+a32b546d58dde07875a1@syzkaller.appspotmail.com
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • io_uring: fix false positive sqo warning on flush · 79ee5666
      Committed by Pavel Begunkov
      stable inclusion
      from stable-5.10.12
      commit 0682759126bc761c325325ca809ae99c93fda2a0
      bugzilla: 47876
      
      --------------------------------
      
      [ Upstream commit 6b393a1f ]
      
      WARNING: CPU: 1 PID: 9094 at fs/io_uring.c:8884
      	io_disable_sqo_submit+0x106/0x130 fs/io_uring.c:8884
      Call Trace:
       io_uring_flush+0x28b/0x3a0 fs/io_uring.c:9099
       filp_close+0xb4/0x170 fs/open.c:1280
       close_fd+0x5c/0x80 fs/file.c:626
       __do_sys_close fs/open.c:1299 [inline]
       __se_sys_close fs/open.c:1297 [inline]
       __x64_sys_close+0x2f/0xa0 fs/open.c:1297
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      io_uring's final close() may be triggered by any task, not only the
      creator. That is handled well by io_uring_flush(), including the SQPOLL
      case, though a warning in io_disable_sqo_submit() will fire falsely. Fix
      it by moving this warning out to the only call site that matters.
      
      Cc: stable@vger.kernel.org # 5.5+
      Reported-by: syzbot+2f5d1785dc624932da78@syzkaller.appspotmail.com
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • io_uring: do sqo disable on install_fd error · 43eac3ed
      Committed by Pavel Begunkov
      stable inclusion
      from stable-5.10.12
      commit 8cb6f4da831bc51145aee3a923f03114121dea6b
      bugzilla: 47876
      
      --------------------------------
      
      [ Upstream commit 06585c49 ]
      
      WARNING: CPU: 0 PID: 8494 at fs/io_uring.c:8717
      	io_ring_ctx_wait_and_kill+0x4f2/0x600 fs/io_uring.c:8717
      Call Trace:
       io_uring_release+0x3e/0x50 fs/io_uring.c:8759
       __fput+0x283/0x920 fs/file_table.c:280
       task_work_run+0xdd/0x190 kernel/task_work.c:140
       tracehook_notify_resume include/linux/tracehook.h:189 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:174 [inline]
       exit_to_user_mode_prepare+0x249/0x250 kernel/entry/common.c:201
       __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
       syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      A failed io_uring_install_fd() is a special case: we don't call
      io_ring_ctx_wait_and_kill() directly but defer it to fput, though we
      still need to call io_disable_sqo_submit() beforehand.
      
      Note: this doesn't fix any real problem, just a warning. That's because
      the sqring won't be available to userspace in this case, and so SQPOLL
      won't submit anything.
      
      Cc: stable@vger.kernel.org # 5.5+
      Reported-by: syzbot+9c9c35374c0ecac06516@syzkaller.appspotmail.com
      Fixes: d9d05217 ("io_uring: stop SQPOLL submit on creator's death")
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • io_uring: fix null-deref in io_disable_sqo_submit · 8173ba66
      Committed by Pavel Begunkov
      stable inclusion
      from stable-5.10.12
      commit 0e3562e3b2aeb4a6aa4615185a8f59c51cade61b
      bugzilla: 47876
      
      --------------------------------
      
      [ Upstream commit b4411616 ]
      
      general protection fault, probably for non-canonical address
      	0xdffffc0000000022: 0000 [#1] KASAN: null-ptr-deref
      	in range [0x0000000000000110-0x0000000000000117]
      RIP: 0010:io_ring_set_wakeup_flag fs/io_uring.c:6929 [inline]
      RIP: 0010:io_disable_sqo_submit+0xdb/0x130 fs/io_uring.c:8891
      Call Trace:
       io_uring_create fs/io_uring.c:9711 [inline]
       io_uring_setup+0x12b1/0x38e0 fs/io_uring.c:9739
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      io_disable_sqo_submit() might be called before user rings were
      allocated, don't do io_ring_set_wakeup_flag() in those cases.
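
The guard described above can be sketched as follows, assuming a toy context whose `rings` pointer may still be NULL during early setup. The `toy_*` names and the flag value are hypothetical and only illustrate the shape of the check, not the actual kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SQ_NEED_WAKEUP 1u /* hypothetical flag bit */

struct toy_rings {
    unsigned sq_flags;
};

struct toy_ctx2 {
    bool sqo_dead;
    struct toy_rings *rings; /* NULL until the user rings are allocated */
};

/* Dereferences ctx->rings, so callers must check it first. */
static void toy_set_wakeup_flag(struct toy_ctx2 *ctx)
{
    ctx->rings->sq_flags |= TOY_SQ_NEED_WAKEUP;
}

static void toy_disable_sqo_submit2(struct toy_ctx2 *ctx)
{
    ctx->sqo_dead = true;
    /* Only touch the rings if they exist; setup may have failed earlier. */
    if (ctx->rings)
        toy_set_wakeup_flag(ctx);
}
```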
      
      Cc: stable@vger.kernel.org # 5.5+
      Reported-by: syzbot+ab412638aeb652ded540@syzkaller.appspotmail.com
      Fixes: d9d05217 ("io_uring: stop SQPOLL submit on creator's death")
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • io_uring: stop SQPOLL submit on creator's death · 63238c74
      Committed by Pavel Begunkov
      stable inclusion
      from stable-5.10.12
      commit a63d9157571b52f7339d6db4c2ab7bc3bfe527c0
      bugzilla: 47876
      
      --------------------------------
      
      [ Upstream commit d9d05217 ]
      
      When the creator of an SQPOLL io_uring dies (i.e. sqo_task), we don't want
      its internals like ->files and ->mm to be poked by the SQPOLL task; it
      has never been nice and recently got racy. That can happen when the
      owner undergoes destruction and the SQPOLL task tries to submit new
      requests in parallel, and so calls io_sq_thread_acquire*().
      
      This patch halts SQPOLL submissions when sqo_task dies by introducing
      the sqo_dead flag. Once set, the SQPOLL task must not do any submission,
      which is synchronised by uring_lock as well as the new flag.
      
      The tricky part is to make sure that disabling always happens, which
      means either the ring is discovered by the creator's do_exit() -> cancel,
      or, if the final close() happens before that, it's done by the creator.
      The latter is guaranteed by the fact that for SQPOLL the creator task,
      and only it, holds exactly one file note, so either it pins the ring up
      to do_exit() or it is removed by the creator on the final put in flush
      (see comments in uring_flush() around file->f_count == 2).
      
      One more place that can trigger io_sq_thread_acquire_*() is
      __io_req_task_submit(). Shoot off requests on sqo_dead there, even
      though actually we don't need to. That's because cancellation of
      sqo_task should wait for the request before going any further.
      
      note 1: io_disable_sqo_submit() does io_ring_set_wakeup_flag() so the
      caller would enter the ring to get an error, but it still doesn't
      guarantee that the flag won't be cleared.
      
      note 2: if final __userspace__ close happens not from the creator
      task, the file note will pin the ring until the task dies.
      
      Cc: stable@vger.kernel.org # 5.5+
      Fixes: b1b6b5a3 ("kernel/io_uring: cancel io_uring before task works")
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • io_uring: add warn_once for io_uring_flush() · db9b4d68
      Committed by Pavel Begunkov
      stable inclusion
      from stable-5.10.12
      commit da67631a33c342528245817cc61e36dd945665b0
      bugzilla: 47876
      
      --------------------------------
      
      [ Upstream commit 6b5733eb ]
      
      files_cancel() should cancel all relevant requests and drop file notes,
      so we should never have file notes after that, including on-exit fput
      and flush. Add a WARN_ONCE to be sure.
      
      Cc: stable@vger.kernel.org # 5.5+
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • io_uring: inline io_uring_attempt_task_drop() · 8157becf
      Committed by Pavel Begunkov
      stable inclusion
      from stable-5.10.12
      commit 18f31594ee52ed1f364e376767fb839935fd899c
      bugzilla: 47876
      
      --------------------------------
      
      [ Upstream commit 4f793dc4 ]
      
      A simple preparation change inlining io_uring_attempt_task_drop() into
      io_uring_flush().
      
      Cc: stable@vger.kernel.org # 5.5+
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • kernel/io_uring: cancel io_uring before task works · 56ea41ef
      Committed by Pavel Begunkov
      stable inclusion
      from stable-5.10.12
      commit 7bf3fb6243a3b153ab1854b331ec19d67a4878bb
      bugzilla: 47876
      
      --------------------------------
      
      [ Upstream commit b1b6b5a3 ]
      
      Cancelling io_uring requests requires either being able to run
      currently enqueued task_works or having the task_work infrastructure
      shut down by that moment. Otherwise io_uring_cancel_files() may be
      waiting for requests that won't ever complete.
      
      Go with the first way and do cancellations before setting PF_EXITING and
      so before putting the task_work infrastructure into a transition state
      where task_work_run() had better not be called.
      
      Cc: stable@vger.kernel.org # 5.5+
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • jffs2: move jffs2_init_inode_info() just after allocating inode · 3f9d6b86
      Committed by zhangyi (F)
      hulk inclusion
      category: bugfix
      bugzilla: 47443
      CVE: NA
      ---------------------------
      
      Commit 4fdcfab5 ("jffs2: fix use-after-free on symlink
      traversal") exposes a problem of freeing uninitialized memory, because
      it moves the freeing of f->target to jffs2_i_callback(), and f->target
      may not be initialized in some error paths of allocating a jffs2 inode
      (e.g. jffs2_iget()->iget_locked()->
      destroy_inode()->..->jffs2_i_callback()->kfree(f->target)).
      
      Fix this by initializing the jffs2_inode_info just after allocating it.
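
The idea can be sketched in user space as below, assuming a toy inode structure whose destructor frees `target` unconditionally. The `toy_*` names are hypothetical; the point is only that initialising right after allocation makes every later error path safe to run through the destructor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct toy_inode_info {
    char *target; /* models f->target, freed by the destroy callback */
};

static void toy_init_inode_info(struct toy_inode_info *f)
{
    f->target = NULL;
}

/* Models the destroy callback: frees f->target unconditionally. */
static void toy_destroy_inode(struct toy_inode_info *f)
{
    free(f->target);
    free(f);
}

/*
 * Returns a new inode or NULL. 'fail_early' models an error path that is
 * hit right after allocation, before target would normally be set.
 */
static struct toy_inode_info *toy_alloc_inode(bool fail_early, const char *target)
{
    struct toy_inode_info *f = malloc(sizeof(*f));

    if (!f)
        return NULL;
    toy_init_inode_info(f); /* initialise before anything can fail */

    if (fail_early) {
        toy_destroy_inode(f); /* safe: free(NULL) is a no-op */
        return NULL;
    }

    f->target = malloc(strlen(target) + 1);
    if (!f->target) {
        toy_destroy_inode(f);
        return NULL;
    }
    strcpy(f->target, target);
    return f;
}
```

Without the early `toy_init_inode_info()` call, the `fail_early` path would pass an indeterminate pointer to `free()`, which is the user-space analogue of the problem described above.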
      Reported-by: Guohua Zhong <zhongguohua1@huawei.com>
      Reported-by: Huaijie Yi <yihuaijie@huawei.com>
      Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
      Reviewed-by: Yang Erkun <yangerkun@huawei.com>
      [backport from hulk-4.4]
      Signed-off-by: yangerkun <yangerkun@huawei.com>
      Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • jffs2: protect no-raw-node-ref check of inocache by erase_completion_lock · a7d48da7
      Committed by Hou Tao
      euler inclusion
      category: bugfix
      bugzilla: 47446
      CVE: NA
      --------------------------------------------------
      
      In jffs2_do_clear_inode(), we will check whether or not there is any
      jffs2_raw_node_ref associated with the current inocache. If there
      is no raw-node-ref, the inocache could be freed. And if there are
      still some jffs2_raw_node_ref linked in inocache->nodes, the inocache
      could not be freed and its free will be decided by
      jffs2_remove_node_refs_from_ino_list().
      
      However there is a race between jffs2_do_clear_inode() and
      jffs2_remove_node_refs_from_ino_list() as shown in the following
      scenario:
      
      CPU 0                   CPU 1
      in sys_unlink()         in jffs2_garbage_collect_pass()
      
      jffs2_do_unlink
        f->inocache->pino_nlink = 0
        set_nlink(inode, 0)
      
                              // contains all raw-node-refs of the unlinked inode
                              start GC a jeb
      
      iput_final
      jffs2_evict_inode
      jffs2_do_clear_inode
        acquire f->sem
          mark all refs as obsolete
      
                              GC complete
                              jeb is moved to erase_pending_list
                              jffs2_erase_pending_blocks
                                jffs2_free_jeb_node_refs
                                  jffs2_remove_node_refs_from_ino_list
      
          f->inocache = INO_STATE_CHECKEDABSENT
      
                                    // no raw-node-ref is associated with the
                                    // inocache of the unlinked inode
                                    ic->nodes == (void *)ic && ic->pino_nlink == 0
                                      jffs2_del_ino_cache
      
          f->inodecache->nodes == f->nodes
            // double-free occurs
            jffs2_del_ino_cache
      
      A double free of the inocache will lead to all kinds of weird behaviours.
      The following BUG_ON is one case in which two active inodes use the same
      inocache (the freed inocache is reused by a new inode, then the inocache
      is double-freed and reused by another new inode):
      
        jffs2: Raw node at 0x006c6000 wasn't in node lists for ino #662249
        ------------[ cut here ]------------
        kernel BUG at fs/jffs2/gc.c:645!
        invalid opcode: 0000 [#1] PREEMPT SMP
        Modules linked in: nandsim
        CPU: 0 PID: 15837 Comm: cp Not tainted 4.4.172 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
        RIP: [<ffffffff816f1256>] jffs2_garbage_collect_live+0x1578/0x1593
        Call Trace:
         [<ffffffff8154b8aa>] jffs2_garbage_collect_pass+0xf6a/0x15d0
         [<ffffffff81541bbd>] jffs2_reserve_space+0x2bd/0x8a0
         [<ffffffff81546a62>] jffs2_do_create+0x52/0x480
         [<ffffffff8153c9f2>] jffs2_create+0xe2/0x2a0
         [<ffffffff8133bed7>] vfs_create+0xe7/0x220
         [<ffffffff81340ab4>] path_openat+0x11f4/0x1c00
         [<ffffffff81343635>] do_filp_open+0xa5/0x140
         [<ffffffff813288ed>] do_sys_open+0x19d/0x320
         [<ffffffff81328a96>] SyS_open+0x26/0x30
         [<ffffffff81c3f8f8>] entry_SYSCALL_64_fastpath+0x18/0x73
        ---[ end trace dd5c02f1653e8cac ]---
      
      Fix it by protecting the no-raw-node-ref check with erase_completion_lock.
      The call to jffs2_set_inocache_state() also needs to move under
      erase_completion_lock, otherwise the inocache may be leaked because
      jffs2_del_ino_cache() invoked by jffs2_remove_node_refs_from_ino_list()
      may find that the state of the inocache is still INO_STATE_CHECKING and
      will not free the inocache.
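
A simplified, single-threaded model of why doing the state change and the emptiness check in one critical section yields exactly one free. Assumptions are stated loudly: the `toy_*` names and the two-state enum are hypothetical, each function body stands for one section run under the same lock, and the two possible orderings of those sections are simply tried in turn.

```c
#include <assert.h>

/* Hypothetical two-state model; the real state machine has more states. */
enum toy_state { TOY_IN_PROGRESS, TOY_CHECKEDABSENT };

struct toy_inocache {
    enum toy_state state;
    int raw_refs;   /* models the number of raw node refs still linked */
    int pino_nlink;
    int freed;      /* counts frees; anything above 1 is a double free */
};

/* Models the delete helper: refuses to free while clearing is in progress. */
static void toy_del_ino_cache(struct toy_inocache *ic)
{
    if (ic->state == TOY_IN_PROGRESS)
        return;
    ic->freed++;
}

/* Tail of the clear-inode path; the whole body is one critical section. */
static void toy_clear_inode_tail(struct toy_inocache *ic)
{
    ic->state = TOY_CHECKEDABSENT;
    if (ic->raw_refs == 0)
        toy_del_ino_cache(ic);
}

/* Erase path dropping the last refs; also one critical section. */
static void toy_remove_node_refs(struct toy_inocache *ic)
{
    ic->raw_refs = 0;
    if (ic->pino_nlink == 0)
        toy_del_ino_cache(ic);
}
```

Whichever section runs first, the other one observes its result, so only one of them ends up freeing the cache entry, and neither ordering leaks it.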
      
      Link: http://lists.infradead.org/pipermail/linux-mtd/2019-February/087764.html
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Reviewed-by: Wei Fang <fangwei1@huawei.com>
      Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
      [cherry-pick from hulk-4.4]
      Signed-off-by: yangerkun <yangerkun@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • jffs2: handle INO_STATE_CLEARING in jffs2_do_read_inode() · 1b482fce
      Committed by Hou Tao
      hulk inclusion
      category: bugfix
      bugzilla: 47446
      CVE: NA
      --------------------------
      
      For inode that fails to be created midway, GC procedure may
      try to GC its dnode, and in the following case BUG() will be
      triggered:
      
      CPU 0                       CPU 1
      in jffs2_do_create()        in jffs2_garbage_collect_pass()
      
      jffs2_write_dnode succeed
      // for dirent
      jffs2_reserve_space fail
      
      			    inum = ic->ino
      			    nlink = ic->pino_nlink (> 0)
      
      iget_failed
        make_bad_inode
          remove_inode_hash
        iput
          jffs2_evict_inode
            jffs2_do_clear_inode
              jffs2_set_inocache_state(INO_STATE_CLEARING)
      
      			    jffs2_gc_fetch_inode
      			      jffs2_iget
      			        // a new inode is created because
      			        // the old inode had been unhashed
      			        iget_locked
      			      jffs2_do_read_inode
      			        jffs2_get_ino_cache
      				// assert BUG()
      				f->inocache->state = INO_STATE_CLEARING
      
      Fix it by waiting for its state to change to INO_STATE_CHECKEDABSENT.
      
      Link: http://lists.infradead.org/pipermail/linux-mtd/2019-February/087762.html
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Reviewed-by: Wei Fang <fangwei1@huawei.com>
      Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
      [cherry-pick from hulk-4.4]
      Signed-off-by: yangerkun <yangerkun@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • jffs2: reset pino_nlink to 0 when inode creation failed · 97a21e98
      Committed by Hou Tao
      hulk inclusion
      category: bugfix
      bugzilla: 47446
      CVE: NA
      -------------------------------------------------
      
      This lets jffs2_do_clear_inode() mark all flash nodes used by
      the inode as obsolete so that the GC procedure can reclaim them;
      otherwise this flash space would never be reclaimed.
      
      Link: http://lists.infradead.org/pipermail/linux-mtd/2019-February/087763.html
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Reviewed-by: Wei Fang <fangwei1@huawei.com>
      Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
      Signed-off-by: yangerkun <yangerkun@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • jffs2: GC deadlock reading a page that is used in jffs2_write_begin() · 5c27edc5
      Committed by Kyeong Yoo
      hulk inclusion
      category: bugfix
      bugzilla: 47446
      CVE: NA
      -------------------------------------------------
      
      The GC task can deadlock in read_cache_page() because it may attempt
      to release a page that is actually allocated by another task in
      jffs2_write_begin().
      The reason is that in jffs2_write_begin() there is a small window in
      which a cache page is allocated for use but not yet set Uptodate.
      
      This ends up with a deadlock between two tasks:
      1) A task (e.g. file copy)
         - jffs2_write_begin() locks a cache page
         - jffs2_write_end() tries to lock "alloc_sem" from
      	 jffs2_reserve_space() <-- STUCK
      2) GC task (jffs2_gcd_mtd3)
         - jffs2_garbage_collect_pass() locks "alloc_sem"
         - try to lock the same cache page in read_cache_page() <-- STUCK
      
      So to avoid this deadlock, hold "alloc_sem" in jffs2_write_begin()
      while reading data in a cache page.
      Signed-off-by: Kyeong Yoo <kyeong.yoo@alliedtelesis.co.nz>
      Link: http://lists.infradead.org/pipermail/linux-mtd/2017-July/075581.html
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Reviewed-by: Wei Fang <fangwei1@huawei.com>
      Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
      
      [backport from hulk-4.4]
      Conflicts:
      	fs/jffs2/file.c
      Signed-off-by: yangerkun <yangerkun@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • jffs2: make the overwritten xattr invisible after remount · 939350e6
      Committed by Hou Tao
      euler inclusion
      category: bugfix
      bugzilla: 47447
      CVE: NA
      -------------------------------------------------
      
      For xattr modification, we do not write a new jffs2_raw_xref with a
      delete marker into flash, so if an xattr is modified and then removed,
      and the old xref & xdatum are not erased by GC, then after reboot or
      remount the new xattr xref will be dead but the old xattr xref
      will be alive, and we will get the overwritten xattr instead of a
      non-existent error when reading the removed xattr.
      
      Fix it by writing the deletion mark for xattr overwrite.
      
      Fixes: 8a13695c ("[JFFS2][XATTR] rid unnecessary writing of delete marker.")
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Acked-by: Miao Xie <miaoxie@huawei.com>
      Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
      [cherry-pick from hulk-4.4]
      Signed-off-by: yangerkun <yangerkun@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • fs/pipe: allow sendfile() to pipe again · 893baf1e
      Committed by Johannes Berg
      stable inclusion
      from stable-5.10.11
      commit e8572713897eb9e4bfaef90bf15d5dd00d7126fc
      bugzilla: 47621
      
      --------------------------------
      
      commit f8ad8187 upstream.
      
      After commit 36e2c742 ("fs: don't allow splice read/write
      without explicit ops") sendfile() could no longer send data
      from a real file to a pipe, breaking for example certain cgit
      setups (e.g. when running behind fcgiwrap), because in this
      case cgit will try to do exactly this: sendfile() to a pipe.
      
      Fix this by using iter_file_splice_write for the splice_write
      method of pipes, as suggested by Christoph.
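
For reference, the user-space operation that regressed can be exercised with a small program like the sketch below. It is an illustration only: the helper name `sendfile_to_pipe` and the temporary file path are made up for this example, and it assumes a Linux system where `/tmp` is writable. On a kernel affected by the regression, the sendfile() call would be expected to fail rather than transfer data.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Writes 'len' bytes of 'data' to a temporary regular file, sends them into
 * a pipe with sendfile(), and reads them back from the pipe into 'out'.
 * Returns the number of bytes read back, or -1 on any failure.
 * 'len' is assumed to be small enough to fit in the pipe buffer.
 */
static ssize_t sendfile_to_pipe(const char *data, size_t len,
                                char *out, size_t outcap)
{
    char path[] = "/tmp/sendfile-demo-XXXXXX";
    int pipefd[2];
    ssize_t sent;
    ssize_t got = -1;
    off_t off = 0;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path); /* the open descriptor keeps the file alive */

    if (write(fd, data, len) != (ssize_t)len)
        goto out_file;
    if (pipe(pipefd) < 0)
        goto out_file;

    /* Regular file as the source, pipe write end as the destination. */
    sent = sendfile(pipefd[1], fd, &off, len);
    if (sent > 0)
        got = read(pipefd[0], out, outcap);

    close(pipefd[0]);
    close(pipefd[1]);
out_file:
    close(fd);
    return got;
}
```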
      
      Cc: stable@vger.kernel.org
      Fixes: 36e2c742 ("fs: don't allow splice read/write without explicit ops")
      Suggested-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
    • kernfs: wire up ->splice_read and ->splice_write · d21249e8
      Committed by Christoph Hellwig
      stable inclusion
      from stable-5.10.11
      commit 0b6672fd778cd92caed7206ba520a3f056d10484
      bugzilla: 47621
      
      --------------------------------
      
      commit f2d6c270 upstream.
      
      Wire up the splice_read and splice_write methods to the default
      helpers using ->read_iter and ->write_iter now that those are
      implemented for kernfs.  This restores support to use splice and
      sendfile on kernfs files.
      
      Fixes: 36e2c742 ("fs: don't allow splice read/write without explicit ops")
      Reported-by: Siddharth Gupta <sidgup@codeaurora.org>
      Tested-by: Siddharth Gupta <sidgup@codeaurora.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20210120204631.274206-4-hch@lst.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
    • kernfs: implement ->write_iter · ecbce6b4
      Committed by Christoph Hellwig
      stable inclusion
      from stable-5.10.11
      commit 11167454e9cbfa95856fea3f8e5428b4215a534c
      bugzilla: 47621
      
      --------------------------------
      
      commit cc099e0b upstream.
      
      Switch kernfs to implement the write_iter method instead of plain old
      write to prepare for supporting splice and sendfile again.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20210120204631.274206-3-hch@lst.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
    • kernfs: implement ->read_iter · 3eee7d1a
      Committed by Christoph Hellwig
      stable inclusion
      from stable-5.10.11
      commit 6ce10b6481cd46040bf3c8f3daec08d3fafa30f4
      bugzilla: 47621
      
      --------------------------------
      
      commit 4eaad21a upstream.
      
      Switch kernfs to implement the read_iter method instead of plain old
      read to prepare for supporting splice and sendfile again.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20210120204631.274206-2-hch@lst.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
    • cachefiles: Drop superfluous readpages aops NULL check · 9074c8b7
      Committed by Takashi Iwai
      stable inclusion
      from stable-5.10.11
      commit 76e2b0b65d47206754084233d268d57ade2a988e
      bugzilla: 47621
      
      --------------------------------
      
      commit db58465f upstream.
      
      After the recent actions to convert readpages aops to readahead, the
      NULL checks of readpages aops in cachefiles_read_or_alloc_page() may
      hit falsely.  Worse, it's an ASSERT() call, and this panics.
      
      Drop the superfluous NULL checks for fixing this regression.
      
      [DH: Note that cachefiles never actually used readpages, so this check was
       never actually necessary]
      
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=208883
      BugLink: https://bugzilla.opensuse.org/show_bug.cgi?id=1175245
      Fixes: 9ae326a6 ("CacheFiles: A cache that backs onto a mounted filesystem")
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
    • io_uring: fix short read retries for non-reg files · 3b0df637
      Committed by Pavel Begunkov
      stable inclusion
      from stable-5.10.11
      commit 2df15ef2a9cc58142d7acf1393db3fe5434f44c2
      bugzilla: 47621
      
      --------------------------------
      
      commit 9a173346 upstream.
      
      Sockets and other non-regular files may actually expect short reads to
      happen, so don't retry reads for them. Non-reg files don't set
      FMODE_BUF_RASYNC, so there is no second/retry do_read, and we can filter
      out those cases after first do_read() attempt with ret>0.
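      
      The filtering described above can be sketched in plain C. This is a
      minimal userspace illustration, not the kernel code:
      should_retry_short_read() and its is_regular parameter are hypothetical
      names standing in for the check io_uring performs on the request's file.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical helper: decide whether a read that returned `ret` bytes
 * out of `requested` should be retried to fill the rest of the buffer.
 */
static bool should_retry_short_read(ssize_t ret, size_t requested,
				    bool is_regular)
{
	if (ret <= 0)
		return false;		/* error or EOF: nothing to retry */
	if ((size_t)ret >= requested)
		return false;		/* full read: already done */
	/* Sockets, pipes and the like may legitimately return short reads. */
	return is_regular;
}
```

      With this shape, a short read on a socket is returned to the caller as
      is, while a short read on a regular file is still eligible for a retry.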
      
      Cc: stable@vger.kernel.org # 5.9+
      Suggested-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      3b0df637
    • J
      io_uring: fix SQPOLL IORING_OP_CLOSE cancelation state · dd91344e
      Jens Axboe committed
      stable inclusion
      from stable-5.10.11
      commit f3ac7a5996d7cd739664c5f71cab4f8da03937e7
      bugzilla: 47621
      
      --------------------------------
      
      commit 607ec89e upstream.
      
      IORING_OP_CLOSE is special in terms of cancelation, since it has an
      intermediate state where we've removed the file descriptor but haven't
      closed the file yet. For that reason, it's currently marked with
      IO_WQ_WORK_NO_CANCEL to prevent cancelation. This ensures that the op
      is always run even if canceled, to prevent leaving us with a live file
      but an fd that is gone. However, with SQPOLL, since a cancel request
      doesn't carry any resources on behalf of the request being canceled, if
      we cancel before any of the close op has been run, we can end up with
      io-wq not having the ->files assigned. This can result in the following
      oops reported by Joseph:
      
      BUG: kernel NULL pointer dereference, address: 00000000000000d8
      PGD 800000010b76f067 P4D 800000010b76f067 PUD 10b462067 PMD 0
      Oops: 0000 [#1] SMP PTI
      CPU: 1 PID: 1788 Comm: io_uring-sq Not tainted 5.11.0-rc4 #1
      Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      RIP: 0010:__lock_acquire+0x19d/0x18c0
      Code: 00 00 8b 1d fd 56 dd 08 85 db 0f 85 43 05 00 00 48 c7 c6 98 7b 95 82 48 c7 c7 57 96 93 82 e8 9a bc f5 ff 0f 0b e9 2b 05 00 00 <48> 81 3f c0 ca 67 8a b8 00 00 00 00 41 0f 45 c0 89 04 24 e9 81 fe
      RSP: 0018:ffffc90001933828 EFLAGS: 00010002
      RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000d8
      RBP: 0000000000000246 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      R13: 0000000000000000 R14: ffff888106e8a140 R15: 00000000000000d8
      FS:  0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000000000d8 CR3: 0000000106efa004 CR4: 00000000003706e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       lock_acquire+0x31a/0x440
       ? close_fd_get_file+0x39/0x160
       ? __lock_acquire+0x647/0x18c0
       _raw_spin_lock+0x2c/0x40
       ? close_fd_get_file+0x39/0x160
       close_fd_get_file+0x39/0x160
       io_issue_sqe+0x1334/0x14e0
       ? lock_acquire+0x31a/0x440
       ? __io_free_req+0xcf/0x2e0
       ? __io_free_req+0x175/0x2e0
       ? find_held_lock+0x28/0xb0
       ? io_wq_submit_work+0x7f/0x240
       io_wq_submit_work+0x7f/0x240
       io_wq_cancel_cb+0x161/0x580
       ? io_wqe_wake_worker+0x114/0x360
       ? io_uring_get_socket+0x40/0x40
       io_async_find_and_cancel+0x3b/0x140
       io_issue_sqe+0xbe1/0x14e0
       ? __lock_acquire+0x647/0x18c0
       ? __io_queue_sqe+0x10b/0x5f0
       __io_queue_sqe+0x10b/0x5f0
       ? io_req_prep+0xdb/0x1150
       ? mark_held_locks+0x6d/0xb0
       ? mark_held_locks+0x6d/0xb0
       ? io_queue_sqe+0x235/0x4b0
       io_queue_sqe+0x235/0x4b0
       io_submit_sqes+0xd7e/0x12a0
       ? _raw_spin_unlock_irq+0x24/0x30
       ? io_sq_thread+0x3ae/0x940
       io_sq_thread+0x207/0x940
       ? do_wait_intr_irq+0xc0/0xc0
       ? __ia32_sys_io_uring_enter+0x650/0x650
       kthread+0x134/0x180
       ? kthread_create_worker_on_cpu+0x90/0x90
       ret_from_fork+0x1f/0x30
      
      Fix this by moving the IO_WQ_WORK_NO_CANCEL until _after_ we've modified
      the fdtable. Canceling before this point is totally fine, and running
      it in the io-wq context _after_ that point is also fine.
      
      For 5.12, we'll handle this internally and get rid of the no-cancel
      flag, as IORING_OP_CLOSE is the only user of it.
      
      Cc: stable@vger.kernel.org
      Fixes: b5dba59e ("io_uring: add support for IORING_OP_CLOSE")
      Reported-by: "Abaci <abaci@linux.alibaba.com>"
      Reviewed-and-tested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      dd91344e
    • J
      io_uring: iopoll requests should also wake task ->in_idle state · f7aecdbf
      Jens Axboe committed
      stable inclusion
      from stable-5.10.11
      commit ca75872dd9f3db7893113b8fca6f2c874a4cbccf
      bugzilla: 47621
      
      --------------------------------
      
      commit c93cc9e1 upstream.
      
      If we're freeing/finishing iopoll requests, ensure we check if the task
      is idling in terms of cancelation. Otherwise we could end up waiting
      forever in __io_uring_task_cancel() if the task has active iopoll
      requests that need cancelation.
      
      Cc: stable@vger.kernel.org # 5.9+
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      f7aecdbf
    • X
      proc_sysctl: fix oops caused by incorrect command parameters · f07e87be
      Xiaoming Ni committed
      stable inclusion
      from stable-5.10.11
      commit cb5fe25c822057c49e61229a4e83ba27c3e24c17
      bugzilla: 47621
      
      --------------------------------
      
      commit 697edcb0 upstream.
      
      process_sysctl_arg() does not check whether val is empty before
      invoking strlen(val).  If the command line parameter () is incorrectly
      configured and val is empty, an oops is triggered.
      
      For example:
        if "hung_task_panic=1" is incorrectly written as "hung_task_panic", an
        oops is triggered. The call stack is as follows:
          Kernel command line: .... hung_task_panic
          ......
          Call trace:
          __pi_strlen+0x10/0x98
          parse_args+0x278/0x344
          do_sysctl_args+0x8c/0xfc
          kernel_init+0x5c/0xf4
          ret_from_fork+0x10/0x30
      
      To fix it, check whether "val" is empty when "param" is a sysctl field.
      Error codes are returned in the failure branch, and error logs are
      generated by parse_args().
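      
      The check can be illustrated with a small userspace sketch. It is not
      the kernel function: sysctl_value_len() is a hypothetical stand-in, and
      it assumes the parser hands over a NULL val when a parameter is written
      without "=value".

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the value check: returns the value length on
 * success, or -EINVAL when no usable value was supplied.
 */
static int sysctl_value_len(const char *val)
{
	size_t len;

	if (!val)
		return -EINVAL;	/* "hung_task_panic" with no "=1" */

	len = strlen(val);	/* safe: val is known to be non-NULL here */
	if (len == 0)
		return -EINVAL;	/* "hung_task_panic=" with an empty value */

	return (int)len;
}
```

      Returning an error code instead of dereferencing val leaves the
      reporting to the caller, which matches the description above of
      parse_args() generating the error logs.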
      
      Link: https://lkml.kernel.org/r/20210118133029.28580-1-nixiaoming@huawei.com
      Fixes: 3db978d4 ("kernel/sysctl: support setting sysctl parameters from kernel command line")
      Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Iurii Zaikin <yzaikin@google.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Heiner Kallweit <hkallweit1@gmail.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: <stable@vger.kernel.org>	[5.8+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      f07e87be
    • R
      cifs: do not fail __smb_send_rqst if non-fatal signals are pending · f20287f9
      Ronnie Sahlberg committed
      stable inclusion
      from stable-5.10.11
      commit 2edf2c9f3e5e7a6fbeaa40b9a4ef65b4dfc97405
      bugzilla: 47621
      
      --------------------------------
      
      commit 214a5ea0 upstream.
      
      RHBZ 1848178
      
      The original intent of returning an error in this function
      in the patch:
        "CIFS: Mask off signals when sending SMB packets"
      was to avoid interrupting packet send in the middle of
      sending the data (and thus breaking an SMB connection),
      but we also don't want to fail the request for non-fatal
      signals even before we have had a chance to try to
      send it (the reported problem could be reproduced e.g.
      by exiting a child process when the parent process was in
      the midst of calling futimens to update a file's timestamps).
      
      In addition, since the signal may remain pending when we enter the
      sending loop, we may end up not sending the whole packet before
      TCP buffers become full. In this case the code returns -EINTR
      but what we need here is to return -ERESTARTSYS instead to
      allow system calls to be restarted.
      
      Fixes: b30c74c7 ("CIFS: Mask off signals when sending SMB packets")
      Cc: stable@vger.kernel.org # v5.1+
      Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      f20287f9
    • J
      btrfs: print the actual offset in btrfs_root_name · c1924092
      Josef Bacik committed
      stable inclusion
      from stable-5.10.11
      commit dbba7a38b0074412b22b8ac41092015e1dae12ae
      bugzilla: 47621
      
      --------------------------------
      
      [ Upstream commit 71008734 ]
      
      We're supposed to print the root_key.offset in btrfs_root_name in the
      case of a reloc root, not the objectid.  Fix this helper to take the key
      so we have access to the offset when we need it.
      
      Fixes: 457f1864 ("btrfs: pretty print leaked root name")
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      c1924092
    • T
      nfsd: Don't set eof on a truncated READ_PLUS · b26532c0
      Trond Myklebust committed
      stable inclusion
      from stable-5.10.11
      commit 6533681890902e3b59bbceaea311760b3791c28d
      bugzilla: 47621
      
      --------------------------------
      
      [ Upstream commit b68f0cbd ]
      
      If the READ_PLUS operation was truncated due to an error, then ensure we
      clear the 'eof' flag.
      
      Fixes: 9f0b5792 ("NFSD: Encode a full READ_PLUS reply")
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      b26532c0
    • T
      nfsd: Fixes for nfsd4_encode_read_plus_data() · 03b89074
      Trond Myklebust committed
      stable inclusion
      from stable-5.10.11
      commit de82ec8e5e8cba33f84ebef26478b636e94a90fb
      bugzilla: 47621
      
      --------------------------------
      
      [ Upstream commit 72d78717 ]
      
      Ensure that we encode the data payload + padding, and that we truncate
      the preallocated buffer to the actual read size.
      
      Fixes: 528b8493 ("NFSD: Add READ_PLUS data support")
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      03b89074
    • M
      io_uring: flush timeouts that should already have expired · 057d754e
      Marcelo Diop-Gonzalez committed
      stable inclusion
      from stable-5.10.11
      commit 2ca824c79376453e7e3df60437324b36043ff29b
      bugzilla: 47621
      
      --------------------------------
      
      [ Upstream commit f010505b ]
      
      Right now io_flush_timeouts() checks if the current number of events
      is equal to ->timeout.target_seq, but this will miss some timeouts if
      there have been more than 1 event added since the last time they were
      flushed (possible in io_submit_flush_completions(), for example). Fix
      it by recording the last sequence at which timeouts were flushed so
      that the number of events seen can be compared to the number of events
      needed without overflow.
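      
      The overflow-safe comparison can be sketched as follows. This is an
      illustration under assumptions, not the kernel implementation:
      timeout_is_due() and its parameter names are hypothetical, and the
      counters are assumed to be free-running 32-bit sequence numbers.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: is a timeout waiting for sequence `target_seq` due
 * once the event counter has reached `cur_seq`, given that timeouts were
 * last flushed at `last_flush_seq`?
 */
static bool timeout_is_due(uint32_t target_seq, uint32_t cur_seq,
			   uint32_t last_flush_seq)
{
	/*
	 * Unsigned subtraction wraps modulo 2^32, so both deltas remain
	 * meaningful even if the counters overflowed since the last flush.
	 */
	uint32_t events_needed = target_seq - last_flush_seq;
	uint32_t events_got = cur_seq - last_flush_seq;

	return events_got >= events_needed;
}
```

      An equality test between cur_seq and target_seq would miss the case
      where the counter jumped past the target in one step; comparing the two
      deltas catches it.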
      
      Signed-off-by: Marcelo Diop-Gonzalez <marcelo827@gmail.com>
      Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      057d754e
    • E
      fs: fix lazytime expiration handling in __writeback_single_inode() · e14ae526
      Eric Biggers committed
      stable inclusion
      from stable-5.10.11
      commit 13ef6bccab397c02d5a48d236316fd5f626f8b01
      bugzilla: 47621
      
      --------------------------------
      
      commit 1e249cb5 upstream.
      
      When lazytime is enabled and an inode is being written due to its
      in-memory updated timestamps having expired, either due to a sync() or
      syncfs() system call or due to dirtytime_expire_interval having elapsed,
      the VFS needs to inform the filesystem so that the filesystem can copy
      the inode's timestamps out to the on-disk data structures.
      
      This is done by __writeback_single_inode() calling
      mark_inode_dirty_sync(), which then calls ->dirty_inode(I_DIRTY_SYNC).
      
      However, this occurs after __writeback_single_inode() has already
      cleared the dirty flags from ->i_state.  This causes two bugs:
      
      - mark_inode_dirty_sync() redirties the inode, causing it to remain
        dirty.  This wastefully causes the inode to be written twice.  But
        more importantly, it breaks cases where sync_filesystem() is expected
        to clean dirty inodes.  This includes the FS_IOC_REMOVE_ENCRYPTION_KEY
        ioctl (as reported at
        https://lore.kernel.org/r/20200306004555.GB225345@gmail.com), as well
        as possibly filesystem freezing (freeze_super()).
      
      - Since ->i_state doesn't contain I_DIRTY_TIME when ->dirty_inode() is
        called from __writeback_single_inode() for lazytime expiration,
        xfs_fs_dirty_inode() ignores the notification.  (XFS only cares about
        lazytime expirations, and it assumes that i_state will contain
        I_DIRTY_TIME during those.)  Therefore, lazy timestamps aren't
        persisted by sync(), syncfs(), or dirtytime_expire_interval on XFS.
      
      Fix this by moving the call to mark_inode_dirty_sync() to earlier in
      __writeback_single_inode(), before the dirty flags are cleared from
      i_state.  This makes filesystems be properly notified of the timestamp
      expiration, and it avoids incorrectly redirtying the inode.
      
      This fixes xfstest generic/580 (which tests
      FS_IOC_REMOVE_ENCRYPTION_KEY) when run on ext4 or f2fs with lazytime
      enabled.  It also fixes the new lazytime xfstest I've proposed, which
      reproduces the above-mentioned XFS bug
      (https://lore.kernel.org/r/20210105005818.92978-1-ebiggers@kernel.org).
      
      Alternatively, we could call ->dirty_inode(I_DIRTY_SYNC) directly.  But
      due to the introduction of I_SYNC_QUEUED, mark_inode_dirty_sync() is the
      right thing to do because mark_inode_dirty_sync() now knows not to move
      the inode to a writeback list if it is currently queued for sync.
      
      Fixes: 0ae45f63 ("vfs: add support for a lazytime mount option")
      Cc: stable@vger.kernel.org
      Depends-on: 5afced3b ("writeback: Avoid skipping inode writeback")
      Link: https://lore.kernel.org/r/20210112190253.64307-2-ebiggers@kernel.org
      Suggested-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      e14ae526
    • F
      btrfs: send: fix invalid clone operations when cloning from the same file and root · fc5ac7cd
      Filipe Manana committed
      stable inclusion
      from stable-5.10.11
      commit adc11110d1e58b575b669f7d76982dac4220ea10
      bugzilla: 47621
      
      --------------------------------
      
      commit 518837e6 upstream.
      
      When an incremental send finds an extent that is shared, it checks which
      file extent items in the range refer to that extent, and for those it
      emits clone operations, while for others it emits regular write operations
      to avoid corruption at the destination (as described and fixed by commit
      d906d49f ("Btrfs: send, fix file corruption due to incorrect cloning
      operations")).
      
      However when the root we are cloning from is the send root, we are cloning
      from the inode currently being processed and the source file range has
      several extent items that partially point to the desired extent, with an
      offset smaller than the offset in the file extent item for the range we
      want to clone into, it can cause the algorithm to issue a clone operation
      that starts at the current eof of the file being processed in the receiver
      side, in which case the receiver will fail, with EINVAL, when attempting
      to execute the clone operation.
      
      Example reproducer:
      
        $ cat test-send-clone.sh
        #!/bin/bash
      
        DEV=/dev/sdi
        MNT=/mnt/sdi
      
        mkfs.btrfs -f $DEV >/dev/null
        mount $DEV $MNT
      
        # Create our test file with a single and large extent (1M) and with
        # different content for different file ranges that will be reflinked
        # later.
        xfs_io -f \
               -c "pwrite -S 0xab 0 128K" \
               -c "pwrite -S 0xcd 128K 128K" \
               -c "pwrite -S 0xef 256K 256K" \
               -c "pwrite -S 0x1a 512K 512K" \
               $MNT/foobar
      
        btrfs subvolume snapshot -r $MNT $MNT/snap1
        btrfs send -f /tmp/snap1.send $MNT/snap1
      
        # Now do a series of changes to our file such that we end up with
        # different parts of the extent reflinked into different file offsets
        # and we overwrite a large part of the extent too, so no file extent
        # items refer to that part that was overwritten. This used to confuse
        # the algorithm used by the kernel to figure out which file ranges to
        # clone, making it attempt to clone from a source range starting at
        # the current eof of the file, resulting in the receiver to fail since
        # it is an invalid clone operation.
        #
        xfs_io -c "reflink $MNT/foobar 64K 1M 960K" \
               -c "reflink $MNT/foobar 0K 512K 256K" \
               -c "reflink $MNT/foobar 512K 128K 256K" \
               -c "pwrite -S 0x73 384K 640K" \
               $MNT/foobar
      
        btrfs subvolume snapshot -r $MNT $MNT/snap2
        btrfs send -f /tmp/snap2.send -p $MNT/snap1 $MNT/snap2
      
        echo -e "\nFile digest in the original filesystem:"
        md5sum $MNT/snap2/foobar
      
        # Now unmount the filesystem, create a new one, mount it and try to
        # apply both send streams to recreate both snapshots.
        umount $DEV
      
        mkfs.btrfs -f $DEV >/dev/null
        mount $DEV $MNT
      
        btrfs receive -f /tmp/snap1.send $MNT
        btrfs receive -f /tmp/snap2.send $MNT
      
        # Must match what we got in the original filesystem of course.
        echo -e "\nFile digest in the new filesystem:"
        md5sum $MNT/snap2/foobar
      
        umount $MNT
      
      When running the reproducer, the incremental send operation fails due to
      an invalid clone operation:
      
        $ ./test-send-clone.sh
        wrote 131072/131072 bytes at offset 0
        128 KiB, 32 ops; 0.0015 sec (80.906 MiB/sec and 20711.9741 ops/sec)
        wrote 131072/131072 bytes at offset 131072
        128 KiB, 32 ops; 0.0013 sec (90.514 MiB/sec and 23171.6148 ops/sec)
        wrote 262144/262144 bytes at offset 262144
        256 KiB, 64 ops; 0.0025 sec (98.270 MiB/sec and 25157.2327 ops/sec)
        wrote 524288/524288 bytes at offset 524288
        512 KiB, 128 ops; 0.0052 sec (95.730 MiB/sec and 24506.9883 ops/sec)
        Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap1'
        At subvol /mnt/sdi/snap1
        linked 983040/983040 bytes at offset 1048576
        960 KiB, 1 ops; 0.0006 sec (1.419 GiB/sec and 1550.3876 ops/sec)
        linked 262144/262144 bytes at offset 524288
        256 KiB, 1 ops; 0.0020 sec (120.192 MiB/sec and 480.7692 ops/sec)
        linked 262144/262144 bytes at offset 131072
        256 KiB, 1 ops; 0.0018 sec (133.833 MiB/sec and 535.3319 ops/sec)
        wrote 655360/655360 bytes at offset 393216
        640 KiB, 160 ops; 0.0093 sec (66.781 MiB/sec and 17095.8436 ops/sec)
        Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap2'
        At subvol /mnt/sdi/snap2
      
        File digest in the original filesystem:
        9c13c61cb0b9f5abf45344375cb04dfa  /mnt/sdi/snap2/foobar
        At subvol snap1
        At snapshot snap2
        ERROR: failed to clone extents to foobar: Invalid argument
      
        File digest in the new filesystem:
        132f0396da8f48d2e667196bff882cfc  /mnt/sdi/snap2/foobar
      
      The clone operation is invalid because its source range starts at the
      current eof of the file in the receiver, causing the receiver to get
      an EINVAL error from the clone operation when attempting it.
      
      For the example above, what happens is the following:
      
      1) When processing the extent at file offset 1M, the algorithm checks that
         the extent is shared and can be (fully or partially) found at file
         offset 0.
      
         At this point the file has a size (and eof) of 1M at the receiver;
      
      2) It finds that our extent item at file offset 1M has a data offset of
         64K and, since the file extent item at file offset 0 has a data offset
         of 0, it issues a clone operation, from the same file and root, that
         has a source range offset of 64K, destination offset of 1M and a length
         of 64K, since the extent item at file offset 0 refers only to the first
         128K of the shared extent.
      
         After this clone operation, the file size (and eof) at the receiver is
         increased from 1M to 1088K (1M + 64K);
      
      3) Now there's still 896K (960K - 64K) of data left to clone or write, so
         it checks for the next file extent item, which starts at file offset
         128K. This file extent item has a data offset of 0 and a length of
         256K, so a clone operation with a source range offset of 256K, a
         destination offset of 1088K (1M + 64K) and length of 128K is issued.
      
         After this operation the file size (and eof) at the receiver increases
         from 1088K to 1216K (1088K + 128K);
      
      4) Now there's still 768K (896K - 128K) of data left to clone or write, so
         it checks for the next file extent item, located at file offset 384K.
         This file extent item points to a different extent, not the one we want
         to clone, with a length of 640K. So we issue a write operation into the
         file range 1216K (1088K + 128K, end of the last clone operation), with
         a length of 640K and with a data matching the one we can find for that
         range in send root.
      
         After this operation, the file size (and eof) at the receiver increases
         from 1216K to 1856K (1216K + 640K);
      
      5) Now there's still 128K (768K - 640K) of data left to clone or write, so
         we look into the file extent item, which is for file offset 1M and it
         points to the extent we want to clone, with a data offset of 64K and a
         length of 960K.
      
         However this matches the file offset we started with, the start of the
         range to clone into. So we can't for sure find any file extent item
         from here onwards with the rest of the data we want to clone, yet we
         proceed and since the file extent item points to the shared extent,
         with a data offset of 64K, we issue a clone operation with a source
         range starting at file offset 1856K, which matches the file extent
         item's offset, 1M, plus the amount of data cloned and written so far,
         which is 64K (step 2) + 128K (step 3) + 640K (step 4). This clone
         operation is invalid since the source range offset matches the current
         eof of the file in the receiver. We should have stopped looking for
         extents to clone at this point and instead fall back to a write, which
         would simply contain the data in the file range from 1856K to
         1856K + 128K.
      
      So fix this by stopping the loop that looks for file ranges to clone at
      clone_range() when we reach the current eof of the file being processed,
      if we are cloning from the same file and using the send root as the clone
      root. This ensures any data not yet cloned will be sent to the receiver
      through a write operation.
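      
      The offsets in steps 2 to 5 can be checked with a short C sketch. It
      only reproduces the arithmetic of the example; the function names are
      hypothetical, and can_clone_from() is an assumed simplification of the
      stop condition, not the code in clone_range().

```c
#include <stdbool.h>
#include <stdint.h>

#define KIB(n) ((uint64_t)(n) * 1024)

/* Receiver-side file size (eof) after steps 2, 3 and 4 of the example. */
static uint64_t eof_after_step4(void)
{
	uint64_t eof = KIB(1024);	/* 1M before step 2 */

	eof += KIB(64);		/* step 2: clone of 64K  -> 1088K */
	eof += KIB(128);	/* step 3: clone of 128K -> 1216K */
	eof += KIB(640);	/* step 4: write of 640K -> 1856K */
	return eof;
}

/* Source offset the unfixed algorithm picks for the clone in step 5. */
static uint64_t step5_clone_source(void)
{
	/* File extent item's offset (1M) plus the data handled so far. */
	return KIB(1024) + KIB(64) + KIB(128) + KIB(640);
}

/*
 * Assumed stop condition when cloning from the same file in the send
 * root: a source range starting at or beyond the current eof is invalid.
 */
static bool can_clone_from(uint64_t src_offset, uint64_t cur_eof)
{
	return src_offset < cur_eof;
}
```

      Both helpers evaluate to 1856K, so the step 5 clone would start exactly
      at the receiver's eof, which is the case the fix turns into a write.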
      
      A test case for fstests will follow soon.
      
      Reported-by: Massimo B. <massimo.b@gmx.net>
      Link: https://lore.kernel.org/linux-btrfs/6ae34776e85912960a253a8327068a892998e685.camel@gmx.net/
      Fixes: 11f2069c ("Btrfs: send, allow clone operations within the same file")
      CC: stable@vger.kernel.org # 5.5+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      fc5ac7cd
    • J
      btrfs: don't clear ret in btrfs_start_dirty_block_groups · 0bd793ad
      Josef Bacik committed
      stable inclusion
      from stable-5.10.11
      commit 018abb50891e4faf051de2ac01cb041f3904e1d1
      bugzilla: 47621
      
      --------------------------------
      
      commit 34d1eb0e upstream.
      
      If we fail to update a block group item in the loop we'll break, however
      we'll do btrfs_run_delayed_refs and lose our error value in ret, and
      thus not clean up properly.  Fix this by only running the delayed refs
      if there was no failure.
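      
      The pattern behind this fix can be sketched in userspace C. Everything
      here is hypothetical scaffolding: run_delayed_refs_stub() stands in for
      btrfs_run_delayed_refs(), and the update_results array stands in for
      the per-item update results inside the loop.

```c
#include <stddef.h>

/* Counts calls so a caller can see whether the follow-up step ran. */
static int delayed_refs_runs;

/* Hypothetical stub for the follow-up step; always succeeds. */
static int run_delayed_refs_stub(void)
{
	delayed_refs_runs++;
	return 0;
}

static int start_dirty_groups_sketch(const int *update_results, size_t n)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		ret = update_results[i];	/* result of one item update */
		if (ret)
			break;			/* stop on the first failure */
	}

	/*
	 * Run the follow-up step only on success, so an error already
	 * stored in ret is not overwritten by its return value.
	 */
	if (!ret)
		ret = run_delayed_refs_stub();

	return ret;
}
```

      Calling the stub unconditionally would return 0 here even after a
      failed update, which is the lost-error behaviour described above.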
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      0bd793ad
    • J
      btrfs: fix lockdep splat in btrfs_recover_relocation · d26e08a4
      Josef Bacik committed
      stable inclusion
      from stable-5.10.11
      commit 14e17e90bfaaf0392d8a48744f91d81ea121fd10
      bugzilla: 47621
      
      --------------------------------
      
      commit fb286100 upstream.
      
      While testing the error paths of relocation I hit the following lockdep
      splat:
      
        ======================================================
        WARNING: possible circular locking dependency detected
        5.10.0-rc6+ #217 Not tainted
        ------------------------------------------------------
        mount/779 is trying to acquire lock:
        ffffa0e676945418 (&fs_info->balance_mutex){+.+.}-{3:3}, at: btrfs_recover_balance+0x2f0/0x340
      
        but task is already holding lock:
        ffffa0e60ee31da8 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x100
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #2 (btrfs-root-00){++++}-{3:3}:
      	 down_read_nested+0x43/0x130
      	 __btrfs_tree_read_lock+0x27/0x100
      	 btrfs_read_lock_root_node+0x31/0x40
      	 btrfs_search_slot+0x462/0x8f0
      	 btrfs_update_root+0x55/0x2b0
      	 btrfs_drop_snapshot+0x398/0x750
      	 clean_dirty_subvols+0xdf/0x120
      	 btrfs_recover_relocation+0x534/0x5a0
      	 btrfs_start_pre_rw_mount+0xcb/0x170
      	 open_ctree+0x151f/0x1726
      	 btrfs_mount_root.cold+0x12/0xea
      	 legacy_get_tree+0x30/0x50
      	 vfs_get_tree+0x28/0xc0
      	 vfs_kern_mount.part.0+0x71/0xb0
      	 btrfs_mount+0x10d/0x380
      	 legacy_get_tree+0x30/0x50
      	 vfs_get_tree+0x28/0xc0
      	 path_mount+0x433/0xc10
      	 __x64_sys_mount+0xe3/0x120
      	 do_syscall_64+0x33/0x40
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        -> #1 (sb_internal#2){.+.+}-{0:0}:
      	 start_transaction+0x444/0x700
      	 insert_balance_item.isra.0+0x37/0x320
      	 btrfs_balance+0x354/0xf40
      	 btrfs_ioctl_balance+0x2cf/0x380
      	 __x64_sys_ioctl+0x83/0xb0
      	 do_syscall_64+0x33/0x40
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        -> #0 (&fs_info->balance_mutex){+.+.}-{3:3}:
      	 __lock_acquire+0x1120/0x1e10
      	 lock_acquire+0x116/0x370
      	 __mutex_lock+0x7e/0x7b0
      	 btrfs_recover_balance+0x2f0/0x340
      	 open_ctree+0x1095/0x1726
      	 btrfs_mount_root.cold+0x12/0xea
      	 legacy_get_tree+0x30/0x50
      	 vfs_get_tree+0x28/0xc0
      	 vfs_kern_mount.part.0+0x71/0xb0
      	 btrfs_mount+0x10d/0x380
      	 legacy_get_tree+0x30/0x50
      	 vfs_get_tree+0x28/0xc0
      	 path_mount+0x433/0xc10
      	 __x64_sys_mount+0xe3/0x120
      	 do_syscall_64+0x33/0x40
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        other info that might help us debug this:
      
        Chain exists of:
          &fs_info->balance_mutex --> sb_internal#2 --> btrfs-root-00
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(btrfs-root-00);
      				 lock(sb_internal#2);
      				 lock(btrfs-root-00);
          lock(&fs_info->balance_mutex);
      
         *** DEADLOCK ***
      
        2 locks held by mount/779:
         #0: ffffa0e60dc040e0 (&type->s_umount_key#47/1){+.+.}-{3:3}, at: alloc_super+0xb5/0x380
         #1: ffffa0e60ee31da8 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x100
      
        stack backtrace:
        CPU: 0 PID: 779 Comm: mount Not tainted 5.10.0-rc6+ #217
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
        Call Trace:
         dump_stack+0x8b/0xb0
         check_noncircular+0xcf/0xf0
         ? trace_call_bpf+0x139/0x260
         __lock_acquire+0x1120/0x1e10
         lock_acquire+0x116/0x370
         ? btrfs_recover_balance+0x2f0/0x340
         __mutex_lock+0x7e/0x7b0
         ? btrfs_recover_balance+0x2f0/0x340
         ? btrfs_recover_balance+0x2f0/0x340
         ? rcu_read_lock_sched_held+0x3f/0x80
         ? kmem_cache_alloc_trace+0x2c4/0x2f0
         ? btrfs_get_64+0x5e/0x100
         btrfs_recover_balance+0x2f0/0x340
         open_ctree+0x1095/0x1726
         btrfs_mount_root.cold+0x12/0xea
         ? rcu_read_lock_sched_held+0x3f/0x80
         legacy_get_tree+0x30/0x50
         vfs_get_tree+0x28/0xc0
         vfs_kern_mount.part.0+0x71/0xb0
         btrfs_mount+0x10d/0x380
         ? __kmalloc_track_caller+0x2f2/0x320
         legacy_get_tree+0x30/0x50
         vfs_get_tree+0x28/0xc0
         ? capable+0x3a/0x60
         path_mount+0x433/0xc10
         __x64_sys_mount+0xe3/0x120
         do_syscall_64+0x33/0x40
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      This is straightforward to fix: simply release the path before we set up
      the balance_ctl.
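The report above can be reproduced in miniature. The following is a userspace sketch, not kernel code: it records "lock B was taken while lock A was held" edges the way the lockdep chain is printed, and flags an acquisition that would close a cycle. All names (`demo_lock`, `demo_recover_balance`, the enum values) are hypothetical stand-ins for the three lock classes named in the chain, and the model ignores read/write lock modes and everything else lockdep tracks.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the three lock classes named in the report. */
enum { BALANCE_MUTEX, SB_INTERNAL, TREE_ROOT, NR_LOCKS };

static bool edge[NR_LOCKS][NR_LOCKS]; /* edge[a][b]: b was taken while a was held */
static bool held[NR_LOCKS];

/* Depth-first search over the recorded dependency edges. */
static bool reachable(int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int n = 0; n < NR_LOCKS; n++)
        if (edge[from][n] && !seen[n] && reachable(n, to, seen))
            return true;
    return false;
}

/* Take a lock; return false if this acquisition closes a dependency cycle. */
static bool demo_lock(int lock)
{
    bool ok = true;

    for (int h = 0; h < NR_LOCKS; h++) {
        bool seen[NR_LOCKS] = { false };

        if (!held[h])
            continue;
        /* A known path lock -> ... -> h plus the new edge h -> lock is a cycle. */
        if (reachable(lock, h, seen))
            ok = false;
        edge[h][lock] = true;
    }
    held[lock] = true;
    return ok;
}

static void demo_unlock(int lock)
{
    held[lock] = false;
}

/* Seed the chain from the report: balance_mutex -> sb_internal -> tree root. */
static void demo_seed_chain(void)
{
    memset(edge, 0, sizeof(edge));
    memset(held, 0, sizeof(held));

    demo_lock(BALANCE_MUTEX);
    demo_lock(SB_INTERNAL);
    demo_unlock(SB_INTERNAL);
    demo_unlock(BALANCE_MUTEX);

    demo_lock(SB_INTERNAL);
    demo_lock(TREE_ROOT);
    demo_unlock(TREE_ROOT);
    demo_unlock(SB_INTERNAL);
}

/* Model of the mount path: returns true if the lock order is safe. */
static bool demo_recover_balance(bool release_path_first)
{
    bool ok;

    demo_seed_chain();
    demo_lock(TREE_ROOT);           /* the tree search leaves the path locked */
    if (release_path_first)
        demo_unlock(TREE_ROOT);     /* the fix: drop the path before the mutex */
    ok = demo_lock(BALANCE_MUTEX);
    demo_unlock(BALANCE_MUTEX);
    demo_unlock(TREE_ROOT);
    return ok;
}
```

Calling `demo_recover_balance(false)` models the old ordering and reports the cycle. `demo_recover_balance(true)` models releasing the path first, so the mutex is taken with no tree lock held and no new edge is recorded.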
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      d26e08a4
    • J
      btrfs: do not double free backref nodes on error · c4b1a4ed
      Josef Bacik committed
      stable inclusion
      from stable-5.10.11
      commit 5169a289fc8c860c1f29883053116cbef2123eaf
      bugzilla: 47621
      
      --------------------------------
      
      commit 49ecc679 upstream.
      
      Zygo reported the following KASAN splat:
      
        BUG: KASAN: use-after-free in btrfs_backref_cleanup_node+0x18a/0x420
        Read of size 8 at addr ffff888112402950 by task btrfs/28836
      
        CPU: 0 PID: 28836 Comm: btrfs Tainted: G        W         5.10.0-e35f27394290-for-next+ #23
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
        Call Trace:
         dump_stack+0xbc/0xf9
         ? btrfs_backref_cleanup_node+0x18a/0x420
         print_address_description.constprop.8+0x21/0x210
         ? record_print_text.cold.34+0x11/0x11
         ? btrfs_backref_cleanup_node+0x18a/0x420
         ? btrfs_backref_cleanup_node+0x18a/0x420
         kasan_report.cold.10+0x20/0x37
         ? btrfs_backref_cleanup_node+0x18a/0x420
         __asan_load8+0x69/0x90
         btrfs_backref_cleanup_node+0x18a/0x420
         btrfs_backref_release_cache+0x83/0x1b0
         relocate_block_group+0x394/0x780
         ? merge_reloc_roots+0x4a0/0x4a0
         btrfs_relocate_block_group+0x26e/0x4c0
         btrfs_relocate_chunk+0x52/0x120
         btrfs_balance+0xe2e/0x1900
         ? check_flags.part.50+0x6c/0x1e0
         ? btrfs_relocate_chunk+0x120/0x120
         ? kmem_cache_alloc_trace+0xa06/0xcb0
         ? _copy_from_user+0x83/0xc0
         btrfs_ioctl_balance+0x3a7/0x460
         btrfs_ioctl+0x24c8/0x4360
         ? __kasan_check_read+0x11/0x20
         ? check_chain_key+0x1f4/0x2f0
         ? __asan_loadN+0xf/0x20
         ? btrfs_ioctl_get_supported_features+0x30/0x30
         ? kvm_sched_clock_read+0x18/0x30
         ? check_chain_key+0x1f4/0x2f0
         ? lock_downgrade+0x3f0/0x3f0
         ? handle_mm_fault+0xad6/0x2150
         ? do_vfs_ioctl+0xfc/0x9d0
         ? ioctl_file_clone+0xe0/0xe0
         ? check_flags.part.50+0x6c/0x1e0
         ? check_flags.part.50+0x6c/0x1e0
         ? check_flags+0x26/0x30
         ? lock_is_held_type+0xc3/0xf0
         ? syscall_enter_from_user_mode+0x1b/0x60
         ? do_syscall_64+0x13/0x80
         ? rcu_read_lock_sched_held+0xa1/0xd0
         ? __kasan_check_read+0x11/0x20
         ? __fget_light+0xae/0x110
         __x64_sys_ioctl+0xc3/0x100
         do_syscall_64+0x37/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f4c4bdfe427
      
        Allocated by task 28836:
         kasan_save_stack+0x21/0x50
         __kasan_kmalloc.constprop.18+0xbe/0xd0
         kasan_kmalloc+0x9/0x10
         kmem_cache_alloc_trace+0x410/0xcb0
         btrfs_backref_alloc_node+0x46/0xf0
         btrfs_backref_add_tree_node+0x60d/0x11d0
         build_backref_tree+0xc5/0x700
         relocate_tree_blocks+0x2be/0xb90
         relocate_block_group+0x2eb/0x780
         btrfs_relocate_block_group+0x26e/0x4c0
         btrfs_relocate_chunk+0x52/0x120
         btrfs_balance+0xe2e/0x1900
         btrfs_ioctl_balance+0x3a7/0x460
         btrfs_ioctl+0x24c8/0x4360
         __x64_sys_ioctl+0xc3/0x100
         do_syscall_64+0x37/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        Freed by task 28836:
         kasan_save_stack+0x21/0x50
         kasan_set_track+0x20/0x30
         kasan_set_free_info+0x1f/0x30
         __kasan_slab_free+0xf3/0x140
         kasan_slab_free+0xe/0x10
         kfree+0xde/0x200
         btrfs_backref_error_cleanup+0x452/0x530
         build_backref_tree+0x1a5/0x700
         relocate_tree_blocks+0x2be/0xb90
         relocate_block_group+0x2eb/0x780
         btrfs_relocate_block_group+0x26e/0x4c0
         btrfs_relocate_chunk+0x52/0x120
         btrfs_balance+0xe2e/0x1900
         btrfs_ioctl_balance+0x3a7/0x460
         btrfs_ioctl+0x24c8/0x4360
         __x64_sys_ioctl+0xc3/0x100
         do_syscall_64+0x37/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      This occurred because we freed our backref node in
      btrfs_backref_error_cleanup(), but then tried to free it again in
      btrfs_backref_release_cache().  This is because
      btrfs_backref_release_cache() will cycle through all of the
      cache->leaves nodes and free them up.  However,
      btrfs_backref_error_cleanup() freed the backref node with
      btrfs_backref_free_node(), which simply kfree()d the backref node
      without unlinking it from the cache.  Change this to a
      btrfs_backref_drop_node(), which does the appropriate cleanup and
      removes the node from the cache->leaves list, so when we go to free the
      remaining cache we don't trip over items we've already dropped.
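The difference between the two calls can be sketched in userspace C. This is an illustration under simplifying assumptions, not the btrfs implementation: every `demo_*` name is hypothetical, nodes come from a static pool so that a stale node can be inspected safely, and only list membership is modelled (the real functions also handle edges and other bookkeeping).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a backref node on the cache's leaves list. */
struct demo_node {
    struct demo_node *prev, *next;
    bool allocated;
};

struct demo_cache {
    struct demo_node leaves;    /* list head sentinel */
};

#define DEMO_POOL 4
static struct demo_node demo_pool[DEMO_POOL];

static void demo_cache_init(struct demo_cache *c)
{
    c->leaves.next = c->leaves.prev = &c->leaves;
    for (size_t i = 0; i < DEMO_POOL; i++)
        demo_pool[i].allocated = false;
}

/* Take a free pool slot and link it at the tail of the leaves list. */
static struct demo_node *demo_alloc_node(struct demo_cache *c)
{
    for (size_t i = 0; i < DEMO_POOL; i++) {
        struct demo_node *n = &demo_pool[i];

        if (n->allocated)
            continue;
        n->allocated = true;
        n->prev = c->leaves.prev;
        n->next = &c->leaves;
        c->leaves.prev->next = n;
        c->leaves.prev = n;
        return n;
    }
    return NULL;
}

/* Like a bare kfree(): releases the memory but leaves the node on the list. */
static void demo_free_node(struct demo_node *n)
{
    n->allocated = false;
}

/* Unlink from the list first, then free. */
static void demo_drop_node(struct demo_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = n;
    demo_free_node(n);
}

/* Free everything still on the list; count nodes that were already freed. */
static int demo_release_cache(struct demo_cache *c)
{
    int stale = 0;

    while (c->leaves.next != &c->leaves) {
        struct demo_node *n = c->leaves.next;

        if (!n->allocated)
            stale++;            /* in the kernel this is the use-after-free */
        demo_drop_node(n);
    }
    return stale;
}

/* Error path followed by cache teardown; returns the stale-node count. */
static int demo_run(bool use_drop)
{
    struct demo_cache cache;
    struct demo_node *a;

    demo_cache_init(&cache);
    a = demo_alloc_node(&cache);
    (void)demo_alloc_node(&cache);
    if (use_drop)
        demo_drop_node(a);
    else
        demo_free_node(a);
    return demo_release_cache(&cache);
}
```

Here `demo_run(false)` leaves one freed node on the list for the teardown to trip over, while `demo_run(true)` leaves none.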
      
      Fixes: 75bfb9af ("Btrfs: cleanup error handling in build_backref_tree")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      c4b1a4ed
    • J
      btrfs: don't get an EINTR during drop_snapshot for reloc · 51c06482
      Josef Bacik committed
      stable inclusion
      from stable-5.10.11
      commit 9e2fc8f10c9175e7f5d4bd636036ef427bb3eae9
      bugzilla: 47621
      
      --------------------------------
      
      commit 18d3bff4 upstream.
      
      This was partially fixed by f3e3d9cc ("btrfs: avoid possible signal
      interruption of btrfs_drop_snapshot() on relocation tree").  However, it
      missed a spot when we restart a trans handle because we need to end the
      transaction.  The fix is the same: simply use btrfs_join_transaction()
      instead of btrfs_start_transaction() when deleting reloc roots.
      
      Fixes: f3e3d9cc ("btrfs: avoid possible signal interruption of btrfs_drop_snapshot() on relocation tree")
      CC: stable@vger.kernel.org # 5.4+
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      51c06482
  2. 29 Jan, 2021 3 commits
  3. 28 Jan, 2021 1 commit