1. 28 Jan, 2021 1 commit
    •
      io_uring: fix flush cqring overflow list while TASK_INTERRUPTIBLE · 6195ba09
      Hao Xu committed
      Abaci reported the following warning:
      
      [   27.073425] do not call blocking ops when !TASK_RUNNING; state=1 set at [] prepare_to_wait_exclusive+0x3a/0xc0
      [   27.075805] WARNING: CPU: 0 PID: 951 at kernel/sched/core.c:7853 __might_sleep+0x80/0xa0
      [   27.077604] Modules linked in:
      [   27.078379] CPU: 0 PID: 951 Comm: a.out Not tainted 5.11.0-rc3+ #1
      [   27.079637] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [   27.080852] RIP: 0010:__might_sleep+0x80/0xa0
      [   27.081835] Code: 65 48 8b 04 25 80 71 01 00 48 8b 90 c0 15 00 00 48 8b 70 18 48 c7 c7 08 39 95 82 c6 05 f9 5f de 08 01 48 89 d1 e8 00 c6 fa ff  0b eb bf 41 0f b6 f5 48 c7 c7 40 23 c9 82 e8 f3 48 ec 00 eb a7
      [   27.084521] RSP: 0018:ffffc90000fe3ce8 EFLAGS: 00010286
      [   27.085350] RAX: 0000000000000000 RBX: ffffffff82956083 RCX: 0000000000000000
      [   27.086348] RDX: ffff8881057a0000 RSI: ffffffff8118cc9e RDI: ffff88813bc28570
      [   27.087598] RBP: 00000000000003a7 R08: 0000000000000001 R09: 0000000000000001
      [   27.088819] R10: ffffc90000fe3e00 R11: 00000000fffef9f0 R12: 0000000000000000
      [   27.089819] R13: 0000000000000000 R14: ffff88810576eb80 R15: ffff88810576e800
      [   27.091058] FS:  00007f7b144cf740(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
      [   27.092775] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   27.093796] CR2: 00000000022da7b8 CR3: 000000010b928002 CR4: 00000000003706f0
      [   27.094778] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   27.095780] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   27.097011] Call Trace:
      [   27.097685]  __mutex_lock+0x5d/0xa30
      [   27.098565]  ? prepare_to_wait_exclusive+0x71/0xc0
      [   27.099412]  ? io_cqring_overflow_flush.part.101+0x6d/0x70
      [   27.100441]  ? lockdep_hardirqs_on_prepare+0xe9/0x1c0
      [   27.101537]  ? _raw_spin_unlock_irqrestore+0x2d/0x40
      [   27.102656]  ? trace_hardirqs_on+0x46/0x110
      [   27.103459]  ? io_cqring_overflow_flush.part.101+0x6d/0x70
      [   27.104317]  io_cqring_overflow_flush.part.101+0x6d/0x70
      [   27.105113]  io_cqring_wait+0x36e/0x4d0
      [   27.105770]  ? find_held_lock+0x28/0xb0
      [   27.106370]  ? io_uring_remove_task_files+0xa0/0xa0
      [   27.107076]  __x64_sys_io_uring_enter+0x4fb/0x640
      [   27.107801]  ? rcu_read_lock_sched_held+0x59/0xa0
      [   27.108562]  ? lockdep_hardirqs_on_prepare+0xe9/0x1c0
      [   27.109684]  ? syscall_enter_from_user_mode+0x26/0x70
      [   27.110731]  do_syscall_64+0x2d/0x40
      [   27.111296]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   27.112056] RIP: 0033:0x7f7b13dc8239
      [   27.112663] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05  3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 ec 2c 00 f7 d8 64 89 01 48
      [   27.115113] RSP: 002b:00007ffd6d7f5c88 EFLAGS: 00000286 ORIG_RAX: 00000000000001aa
      [   27.116562] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7b13dc8239
      [   27.117961] RDX: 000000000000478e RSI: 0000000000000000 RDI: 0000000000000003
      [   27.118925] RBP: 00007ffd6d7f5cb0 R08: 0000000020000040 R09: 0000000000000008
      [   27.119773] R10: 0000000000000001 R11: 0000000000000286 R12: 0000000000400480
      [   27.120614] R13: 00007ffd6d7f5d90 R14: 0000000000000000 R15: 0000000000000000
      [   27.121490] irq event stamp: 5635
      [   27.121946] hardirqs last  enabled at (5643): [] console_unlock+0x5c4/0x740
      [   27.123476] hardirqs last disabled at (5652): [] console_unlock+0x4e7/0x740
      [   27.125192] softirqs last  enabled at (5272): [] __do_softirq+0x3c5/0x5aa
      [   27.126430] softirqs last disabled at (5267): [] asm_call_irq_on_stack+0xf/0x20
      [   27.127634] ---[ end trace 289d7e28fa60f928 ]---
      
      This is caused by calling io_cqring_overflow_flush(), which may sleep,
      after prepare_to_wait_exclusive() has set the task state to
      TASK_INTERRUPTIBLE.

      Reported-by: Abaci <abaci@linux.alibaba.com>
      Fixes: 6c503150 ("io_uring: patch up IOPOLL overflow_flush sync")
      Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      6195ba09
  2. 27 Jan, 2021 2 commits
    •
      io_uring: fix wqe->lock/completion_lock deadlock · 907d1df3
      Pavel Begunkov committed
      Joseph reports the following deadlock:
      
      CPU0:
      ...
      io_kill_linked_timeout  // &ctx->completion_lock
      io_commit_cqring
      __io_queue_deferred
      __io_queue_async_work
      io_wq_enqueue
      io_wqe_enqueue  // &wqe->lock
      
      CPU1:
      ...
      __io_uring_files_cancel
      io_wq_cancel_cb
      io_wqe_cancel_pending_work  // &wqe->lock
      io_cancel_task_cb  // &ctx->completion_lock
      
      Only __io_queue_deferred() calls queue_async_work() while holding
      ctx->completion_lock; enqueue drained requests via io_req_task_queue()
      instead.

      Cc: stable@vger.kernel.org # 5.9+
      Reported-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      907d1df3
    •
      io_uring: fix cancellation taking mutex while TASK_UNINTERRUPTIBLE · ca70f00b
      Pavel Begunkov committed
      do not call blocking ops when !TASK_RUNNING; state=2 set at
      	[<00000000ced9dbfc>] prepare_to_wait+0x1f4/0x3b0
      	kernel/sched/wait.c:262
      WARNING: CPU: 1 PID: 19888 at kernel/sched/core.c:7853
      	__might_sleep+0xed/0x100 kernel/sched/core.c:7848
      RIP: 0010:__might_sleep+0xed/0x100 kernel/sched/core.c:7848
      Call Trace:
       __mutex_lock_common+0xc4/0x2ef0 kernel/locking/mutex.c:935
       __mutex_lock kernel/locking/mutex.c:1103 [inline]
       mutex_lock_nested+0x1a/0x20 kernel/locking/mutex.c:1118
       io_wq_submit_work+0x39a/0x720 fs/io_uring.c:6411
       io_run_cancel fs/io-wq.c:856 [inline]
       io_wqe_cancel_pending_work fs/io-wq.c:990 [inline]
       io_wq_cancel_cb+0x614/0xcb0 fs/io-wq.c:1027
       io_uring_cancel_files fs/io_uring.c:8874 [inline]
       io_uring_cancel_task_requests fs/io_uring.c:8952 [inline]
       __io_uring_files_cancel+0x115d/0x19e0 fs/io_uring.c:9038
       io_uring_files_cancel include/linux/io_uring.h:51 [inline]
       do_exit+0x2e6/0x2490 kernel/exit.c:780
       do_group_exit+0x168/0x2d0 kernel/exit.c:922
       get_signal+0x16b5/0x2030 kernel/signal.c:2770
       arch_do_signal_or_restart+0x8e/0x6a0 arch/x86/kernel/signal.c:811
       handle_signal_work kernel/entry/common.c:147 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
       exit_to_user_mode_prepare+0xac/0x1e0 kernel/entry/common.c:201
       __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
       syscall_exit_to_user_mode+0x48/0x190 kernel/entry/common.c:302
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Rewrite io_uring_cancel_files() to mimic __io_uring_task_cancel()'s
      counting scheme, so it does all the heavy work before setting
      TASK_UNINTERRUPTIBLE.
      
      Cc: stable@vger.kernel.org # 5.9+
      Reported-by: syzbot+f655445043a26a7cfab8@syzkaller.appspotmail.com
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      [axboe: fix inverted task check]
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ca70f00b
  3. 26 Jan, 2021 1 commit
  4. 25 Jan, 2021 5 commits
    •
      io_uring: only call io_cqring_ev_posted() if events were posted · b18032bb
      Jens Axboe committed
      This normally doesn't cause any extra harm, but it does mean that we'll
      increment the eventfd notification count, if one has been registered
      with the ring. This can confuse applications, when they see more
      notifications on the eventfd side than are available in the ring.
      
      Do the nice thing and only increment this count, if we actually posted
      (or even overflowed) events.

      Reported-and-tested-by: Dan Melnic <dmm@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b18032bb
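A minimal userspace sketch of the idea in the commit above, not the kernel code: the flush routine counts what it actually posted and signals only when that count is nonzero. `struct ring`, `signal_eventfd()`, and `flush_and_notify()` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a completion ring with an eventfd counter. */
struct ring {
    unsigned pending;        /* completions waiting to be posted */
    unsigned posted;         /* completions visible to the application */
    unsigned eventfd_count;  /* notifications delivered on the eventfd side */
};

/* Stand-in for the notification step: one increment per call. */
static void signal_eventfd(struct ring *r)
{
    r->eventfd_count++;
}

/*
 * Post whatever is pending and notify only if at least one event was
 * posted. Returns true when a notification was sent.
 */
static bool flush_and_notify(struct ring *r)
{
    assert(r != NULL);

    unsigned posted_now = r->pending;
    r->posted += posted_now;
    r->pending = 0;

    if (posted_now == 0)
        return false;        /* nothing new: leave the eventfd count alone */

    signal_eventfd(r);
    return true;
}
```

With this shape, a flush that finds nothing to post leaves the notification count unchanged, so the application never sees more notifications than there are events in the ring.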
    •
      io_uring: if we see flush on exit, cancel related tasks · 84965ff8
      Jens Axboe committed
      Ensure we match tasks that belong to a dead or dying task as well, as we
      need to reap those in addition to those belonging to the exiting task.
      
      Cc: stable@vger.kernel.org # 5.9+
      Reported-by: Josef Grieb <josef.grieb@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      84965ff8
    •
      proc_sysctl: fix oops caused by incorrect command parameters · 697edcb0
      Xiaoming Ni committed
      process_sysctl_arg() does not check whether val is empty before
      invoking strlen(val).  If a command line parameter is incorrectly
      configured and val is empty, an oops is triggered.
      
      For example:
        "hung_task_panic=1" is incorrectly written as "hung_task_panic", and an
        oops is triggered. The call stack is as follows:
          Kernel command line: .... hung_task_panic
          ......
          Call trace:
          __pi_strlen+0x10/0x98
          parse_args+0x278/0x344
          do_sysctl_args+0x8c/0xfc
          kernel_init+0x5c/0xf4
          ret_from_fork+0x10/0x30
      
      To fix it, check whether "val" is empty when "param" is a sysctl field.
      Error codes are returned in the failure branch, and error logs are
      generated by parse_args().
      
      Link: https://lkml.kernel.org/r/20210118133029.28580-1-nixiaoming@huawei.com
      Fixes: 3db978d4 ("kernel/sysctl: support setting sysctl parameters from kernel command line")
      Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Iurii Zaikin <yzaikin@google.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Heiner Kallweit <hkallweit1@gmail.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: <stable@vger.kernel.org>	[5.8+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      697edcb0
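The check described in the commit above can be sketched in plain userspace C. This is an illustration under assumptions, not the kernel patch: `handle_sysctl_arg()` is a hypothetical handler, and it assumes `val` arrives as NULL when the parameter was written without `=value`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical handler for a "name=value" boot parameter.
 * Assumption: val is NULL when the parameter had no "=value" part.
 * Returns the length of the value, or -EINVAL when no value was given.
 */
static long handle_sysctl_arg(const char *param, const char *val)
{
    assert(param != NULL);

    if (val == NULL)
        return -EINVAL;      /* reject before strlen() can dereference NULL */

    return (long)strlen(val);
}
```

Returning an error code here, rather than silently skipping the parameter, leaves it to the caller to log the failure, which matches the commit's note that the error logs are generated by parse_args().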
    •
      io_uring: account io_uring internal files as REQ_F_INFLIGHT · 02a13674
      Jens Axboe committed
      We need to actively cancel anything that introduces a potential circular
      loop, where io_uring holds a reference to itself. If the file in question
      is an io_uring file, then add the request to the inflight list.
      
      Cc: stable@vger.kernel.org # 5.9+
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      02a13674
    •
      io_uring: fix sleeping under spin in __io_clean_op · 9d5c8190
      Pavel Begunkov committed
      [   27.629441] BUG: sleeping function called from invalid context
      	at fs/file.c:402
      [   27.631317] in_atomic(): 1, irqs_disabled(): 1, non_block: 0,
      	pid: 1012, name: io_wqe_worker-0
      [   27.633220] 1 lock held by io_wqe_worker-0/1012:
      [   27.634286]  #0: ffff888105e26c98 (&ctx->completion_lock)
      	{....}-{2:2}, at: __io_req_complete.part.102+0x30/0x70
      [   27.649249] Call Trace:
      [   27.649874]  dump_stack+0xac/0xe3
      [   27.650666]  ___might_sleep+0x284/0x2c0
      [   27.651566]  put_files_struct+0xb8/0x120
      [   27.652481]  __io_clean_op+0x10c/0x2a0
      [   27.653362]  __io_cqring_fill_event+0x2c1/0x350
      [   27.654399]  __io_req_complete.part.102+0x41/0x70
      [   27.655464]  io_openat2+0x151/0x300
      [   27.656297]  io_issue_sqe+0x6c/0x14e0
      [   27.660991]  io_wq_submit_work+0x7f/0x240
      [   27.662890]  io_worker_handle_work+0x501/0x8a0
      [   27.664836]  io_wqe_worker+0x158/0x520
      [   27.667726]  kthread+0x134/0x180
      [   27.669641]  ret_from_fork+0x1f/0x30
      
      Instead of cleaning files on overflow, move overflow cancellation back
      into io_uring_cancel_files(). Previously it was racy to clear the
      REQ_F_OVERFLOW flag, but we got rid of it, and can do it through
      repeated attempts targeting all matching requests.

      Reported-by: Abaci <abaci@linux.alibaba.com>
      Reported-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      9d5c8190
  5. 23 Jan, 2021 3 commits
    •
      cifs: do not fail __smb_send_rqst if non-fatal signals are pending · 214a5ea0
      Ronnie Sahlberg committed
      RHBZ 1848178
      
      The original intent of returning an error in this function
      in the patch:
        "CIFS: Mask off signals when sending SMB packets"
      was to avoid interrupting packet send in the middle of
      sending the data (and thus breaking an SMB connection),
      but we also don't want to fail the request for non-fatal
      signals even before we have had a chance to try to
      send it (the reported problem could be reproduced e.g.
      by exiting a child process when the parent process was in
      the midst of calling futimens to update a file's timestamps).
      
      In addition, since the signal may remain pending when we enter the
      sending loop, we may end up not sending the whole packet before
      TCP buffers become full. In this case the code returns -EINTR
      but what we need here is to return -ERESTARTSYS instead to
      allow system calls to be restarted.
      
      Fixes: b30c74c7 ("CIFS: Mask off signals when sending SMB packets")
      Cc: stable@vger.kernel.org # v5.1+
      Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      214a5ea0
    •
      io_uring: fix short read retries for non-reg files · 9a173346
      Pavel Begunkov committed
      Sockets and other non-regular files may actually expect short reads to
      happen, so don't retry reads for them. Because non-reg files don't set
      FMODE_BUF_RASYNC, they won't do a second/retry do_read(), so we can
      filter out those cases after the first do_read() attempt with ret>0.
      
      Cc: stable@vger.kernel.org # 5.9+
      Suggested-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      9a173346
    •
      io_uring: fix SQPOLL IORING_OP_CLOSE cancelation state · 607ec89e
      Jens Axboe committed
      IORING_OP_CLOSE is special in terms of cancelation, since it has an
      intermediate state where we've removed the file descriptor but haven't
      closed the file yet. For that reason, it's currently marked with
      IO_WQ_WORK_NO_CANCEL to prevent cancelation. This ensures that the op
      is always run even if canceled, to prevent leaving us with a live file
      but an fd that is gone. However, with SQPOLL, since a cancel request
      doesn't carry any resources on behalf of the request being canceled, if
      we cancel before any of the close op has been run, we can end up with
      io-wq not having the ->files assigned. This can result in the following
      oops reported by Joseph:
      
      BUG: kernel NULL pointer dereference, address: 00000000000000d8
      PGD 800000010b76f067 P4D 800000010b76f067 PUD 10b462067 PMD 0
      Oops: 0000 [#1] SMP PTI
      CPU: 1 PID: 1788 Comm: io_uring-sq Not tainted 5.11.0-rc4 #1
      Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      RIP: 0010:__lock_acquire+0x19d/0x18c0
      Code: 00 00 8b 1d fd 56 dd 08 85 db 0f 85 43 05 00 00 48 c7 c6 98 7b 95 82 48 c7 c7 57 96 93 82 e8 9a bc f5 ff 0f 0b e9 2b 05 00 00 <48> 81 3f c0 ca 67 8a b8 00 00 00 00 41 0f 45 c0 89 04 24 e9 81 fe
      RSP: 0018:ffffc90001933828 EFLAGS: 00010002
      RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000d8
      RBP: 0000000000000246 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      R13: 0000000000000000 R14: ffff888106e8a140 R15: 00000000000000d8
      FS:  0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000000000d8 CR3: 0000000106efa004 CR4: 00000000003706e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       lock_acquire+0x31a/0x440
       ? close_fd_get_file+0x39/0x160
       ? __lock_acquire+0x647/0x18c0
       _raw_spin_lock+0x2c/0x40
       ? close_fd_get_file+0x39/0x160
       close_fd_get_file+0x39/0x160
       io_issue_sqe+0x1334/0x14e0
       ? lock_acquire+0x31a/0x440
       ? __io_free_req+0xcf/0x2e0
       ? __io_free_req+0x175/0x2e0
       ? find_held_lock+0x28/0xb0
       ? io_wq_submit_work+0x7f/0x240
       io_wq_submit_work+0x7f/0x240
       io_wq_cancel_cb+0x161/0x580
       ? io_wqe_wake_worker+0x114/0x360
       ? io_uring_get_socket+0x40/0x40
       io_async_find_and_cancel+0x3b/0x140
       io_issue_sqe+0xbe1/0x14e0
       ? __lock_acquire+0x647/0x18c0
       ? __io_queue_sqe+0x10b/0x5f0
       __io_queue_sqe+0x10b/0x5f0
       ? io_req_prep+0xdb/0x1150
       ? mark_held_locks+0x6d/0xb0
       ? mark_held_locks+0x6d/0xb0
       ? io_queue_sqe+0x235/0x4b0
       io_queue_sqe+0x235/0x4b0
       io_submit_sqes+0xd7e/0x12a0
       ? _raw_spin_unlock_irq+0x24/0x30
       ? io_sq_thread+0x3ae/0x940
       io_sq_thread+0x207/0x940
       ? do_wait_intr_irq+0xc0/0xc0
       ? __ia32_sys_io_uring_enter+0x650/0x650
       kthread+0x134/0x180
       ? kthread_create_worker_on_cpu+0x90/0x90
       ret_from_fork+0x1f/0x30
      
      Fix this by moving the IO_WQ_WORK_NO_CANCEL until _after_ we've modified
      the fdtable. Canceling before this point is totally fine, and running
      it in the io-wq context _after_ that point is also fine.
      
      For 5.12, we'll handle this internally and get rid of the no-cancel
      flag, as IORING_OP_CLOSE is the only user of it.
      
      Cc: stable@vger.kernel.org
      Fixes: b5dba59e ("io_uring: add support for IORING_OP_CLOSE")
      Reported-by: "Abaci <abaci@linux.alibaba.com>"
      Reviewed-and-tested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      607ec89e
  6. 22 Jan, 2021 3 commits
  7. 21 Jan, 2021 1 commit
  8. 18 Jan, 2021 7 commits
    •
      btrfs: don't clear ret in btrfs_start_dirty_block_groups · 34d1eb0e
      Josef Bacik committed
      If we fail to update a block group item in the loop we'll break, however
      we'll do btrfs_run_delayed_refs and lose our error value in ret, and
      thus not clean up properly.  Fix this by only running the delayed refs
      if there was no failure.
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      34d1eb0e
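The error-preservation pattern from the commit above, sketched in userspace C. The names `update_item()`, `run_followup()`, and `process_items()` are hypothetical stand-ins; this only illustrates how an unconditional follow-up call can overwrite an earlier error and how guarding it avoids that.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-item update: fails with -EIO for a negative item. */
static int update_item(int item)
{
    return item < 0 ? -EIO : 0;
}

static int followup_calls;   /* counts how often the follow-up step ran */

/* Hypothetical follow-up step that returns its own status. */
static int run_followup(void)
{
    followup_calls++;
    return 0;
}

static int process_items(const int *items, size_t n)
{
    int ret = 0;

    assert(items != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        ret = update_item(items[i]);
        if (ret)
            break;           /* stop at the first failure */
    }

    /*
     * Run the follow-up only when the loop succeeded. Assigning its
     * result to ret unconditionally would replace the earlier error.
     */
    if (!ret)
        ret = run_followup();

    return ret;
}
```

The caller then still sees the original failure and can take its own cleanup path.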
    •
      btrfs: fix lockdep splat in btrfs_recover_relocation · fb286100
      Josef Bacik committed
      While testing the error paths of relocation I hit the following lockdep
      splat:
      
        ======================================================
        WARNING: possible circular locking dependency detected
        5.10.0-rc6+ #217 Not tainted
        ------------------------------------------------------
        mount/779 is trying to acquire lock:
        ffffa0e676945418 (&fs_info->balance_mutex){+.+.}-{3:3}, at: btrfs_recover_balance+0x2f0/0x340
      
        but task is already holding lock:
        ffffa0e60ee31da8 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x100
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #2 (btrfs-root-00){++++}-{3:3}:
      	 down_read_nested+0x43/0x130
      	 __btrfs_tree_read_lock+0x27/0x100
      	 btrfs_read_lock_root_node+0x31/0x40
      	 btrfs_search_slot+0x462/0x8f0
      	 btrfs_update_root+0x55/0x2b0
      	 btrfs_drop_snapshot+0x398/0x750
      	 clean_dirty_subvols+0xdf/0x120
      	 btrfs_recover_relocation+0x534/0x5a0
      	 btrfs_start_pre_rw_mount+0xcb/0x170
      	 open_ctree+0x151f/0x1726
      	 btrfs_mount_root.cold+0x12/0xea
      	 legacy_get_tree+0x30/0x50
      	 vfs_get_tree+0x28/0xc0
      	 vfs_kern_mount.part.0+0x71/0xb0
      	 btrfs_mount+0x10d/0x380
      	 legacy_get_tree+0x30/0x50
      	 vfs_get_tree+0x28/0xc0
      	 path_mount+0x433/0xc10
      	 __x64_sys_mount+0xe3/0x120
      	 do_syscall_64+0x33/0x40
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        -> #1 (sb_internal#2){.+.+}-{0:0}:
      	 start_transaction+0x444/0x700
      	 insert_balance_item.isra.0+0x37/0x320
      	 btrfs_balance+0x354/0xf40
      	 btrfs_ioctl_balance+0x2cf/0x380
      	 __x64_sys_ioctl+0x83/0xb0
      	 do_syscall_64+0x33/0x40
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        -> #0 (&fs_info->balance_mutex){+.+.}-{3:3}:
      	 __lock_acquire+0x1120/0x1e10
      	 lock_acquire+0x116/0x370
      	 __mutex_lock+0x7e/0x7b0
      	 btrfs_recover_balance+0x2f0/0x340
      	 open_ctree+0x1095/0x1726
      	 btrfs_mount_root.cold+0x12/0xea
      	 legacy_get_tree+0x30/0x50
      	 vfs_get_tree+0x28/0xc0
      	 vfs_kern_mount.part.0+0x71/0xb0
      	 btrfs_mount+0x10d/0x380
      	 legacy_get_tree+0x30/0x50
      	 vfs_get_tree+0x28/0xc0
      	 path_mount+0x433/0xc10
      	 __x64_sys_mount+0xe3/0x120
      	 do_syscall_64+0x33/0x40
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        other info that might help us debug this:
      
        Chain exists of:
          &fs_info->balance_mutex --> sb_internal#2 --> btrfs-root-00
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(btrfs-root-00);
      				 lock(sb_internal#2);
      				 lock(btrfs-root-00);
          lock(&fs_info->balance_mutex);
      
         *** DEADLOCK ***
      
        2 locks held by mount/779:
         #0: ffffa0e60dc040e0 (&type->s_umount_key#47/1){+.+.}-{3:3}, at: alloc_super+0xb5/0x380
         #1: ffffa0e60ee31da8 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x100
      
        stack backtrace:
        CPU: 0 PID: 779 Comm: mount Not tainted 5.10.0-rc6+ #217
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
        Call Trace:
         dump_stack+0x8b/0xb0
         check_noncircular+0xcf/0xf0
         ? trace_call_bpf+0x139/0x260
         __lock_acquire+0x1120/0x1e10
         lock_acquire+0x116/0x370
         ? btrfs_recover_balance+0x2f0/0x340
         __mutex_lock+0x7e/0x7b0
         ? btrfs_recover_balance+0x2f0/0x340
         ? btrfs_recover_balance+0x2f0/0x340
         ? rcu_read_lock_sched_held+0x3f/0x80
         ? kmem_cache_alloc_trace+0x2c4/0x2f0
         ? btrfs_get_64+0x5e/0x100
         btrfs_recover_balance+0x2f0/0x340
         open_ctree+0x1095/0x1726
         btrfs_mount_root.cold+0x12/0xea
         ? rcu_read_lock_sched_held+0x3f/0x80
         legacy_get_tree+0x30/0x50
         vfs_get_tree+0x28/0xc0
         vfs_kern_mount.part.0+0x71/0xb0
         btrfs_mount+0x10d/0x380
         ? __kmalloc_track_caller+0x2f2/0x320
         legacy_get_tree+0x30/0x50
         vfs_get_tree+0x28/0xc0
         ? capable+0x3a/0x60
         path_mount+0x433/0xc10
         __x64_sys_mount+0xe3/0x120
         do_syscall_64+0x33/0x40
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      This is straightforward to fix: simply release the path before we set up
      the balance_ctl.

      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      fb286100
    •
      btrfs: do not double free backref nodes on error · 49ecc679
      Josef Bacik committed
      Zygo reported the following KASAN splat:
      
        BUG: KASAN: use-after-free in btrfs_backref_cleanup_node+0x18a/0x420
        Read of size 8 at addr ffff888112402950 by task btrfs/28836
      
        CPU: 0 PID: 28836 Comm: btrfs Tainted: G        W         5.10.0-e35f27394290-for-next+ #23
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
        Call Trace:
         dump_stack+0xbc/0xf9
         ? btrfs_backref_cleanup_node+0x18a/0x420
         print_address_description.constprop.8+0x21/0x210
         ? record_print_text.cold.34+0x11/0x11
         ? btrfs_backref_cleanup_node+0x18a/0x420
         ? btrfs_backref_cleanup_node+0x18a/0x420
         kasan_report.cold.10+0x20/0x37
         ? btrfs_backref_cleanup_node+0x18a/0x420
         __asan_load8+0x69/0x90
         btrfs_backref_cleanup_node+0x18a/0x420
         btrfs_backref_release_cache+0x83/0x1b0
         relocate_block_group+0x394/0x780
         ? merge_reloc_roots+0x4a0/0x4a0
         btrfs_relocate_block_group+0x26e/0x4c0
         btrfs_relocate_chunk+0x52/0x120
         btrfs_balance+0xe2e/0x1900
         ? check_flags.part.50+0x6c/0x1e0
         ? btrfs_relocate_chunk+0x120/0x120
         ? kmem_cache_alloc_trace+0xa06/0xcb0
         ? _copy_from_user+0x83/0xc0
         btrfs_ioctl_balance+0x3a7/0x460
         btrfs_ioctl+0x24c8/0x4360
         ? __kasan_check_read+0x11/0x20
         ? check_chain_key+0x1f4/0x2f0
         ? __asan_loadN+0xf/0x20
         ? btrfs_ioctl_get_supported_features+0x30/0x30
         ? kvm_sched_clock_read+0x18/0x30
         ? check_chain_key+0x1f4/0x2f0
         ? lock_downgrade+0x3f0/0x3f0
         ? handle_mm_fault+0xad6/0x2150
         ? do_vfs_ioctl+0xfc/0x9d0
         ? ioctl_file_clone+0xe0/0xe0
         ? check_flags.part.50+0x6c/0x1e0
         ? check_flags.part.50+0x6c/0x1e0
         ? check_flags+0x26/0x30
         ? lock_is_held_type+0xc3/0xf0
         ? syscall_enter_from_user_mode+0x1b/0x60
         ? do_syscall_64+0x13/0x80
         ? rcu_read_lock_sched_held+0xa1/0xd0
         ? __kasan_check_read+0x11/0x20
         ? __fget_light+0xae/0x110
         __x64_sys_ioctl+0xc3/0x100
         do_syscall_64+0x37/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f4c4bdfe427
      
        Allocated by task 28836:
         kasan_save_stack+0x21/0x50
         __kasan_kmalloc.constprop.18+0xbe/0xd0
         kasan_kmalloc+0x9/0x10
         kmem_cache_alloc_trace+0x410/0xcb0
         btrfs_backref_alloc_node+0x46/0xf0
         btrfs_backref_add_tree_node+0x60d/0x11d0
         build_backref_tree+0xc5/0x700
         relocate_tree_blocks+0x2be/0xb90
         relocate_block_group+0x2eb/0x780
         btrfs_relocate_block_group+0x26e/0x4c0
         btrfs_relocate_chunk+0x52/0x120
         btrfs_balance+0xe2e/0x1900
         btrfs_ioctl_balance+0x3a7/0x460
         btrfs_ioctl+0x24c8/0x4360
         __x64_sys_ioctl+0xc3/0x100
         do_syscall_64+0x37/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        Freed by task 28836:
         kasan_save_stack+0x21/0x50
         kasan_set_track+0x20/0x30
         kasan_set_free_info+0x1f/0x30
         __kasan_slab_free+0xf3/0x140
         kasan_slab_free+0xe/0x10
         kfree+0xde/0x200
         btrfs_backref_error_cleanup+0x452/0x530
         build_backref_tree+0x1a5/0x700
         relocate_tree_blocks+0x2be/0xb90
         relocate_block_group+0x2eb/0x780
         btrfs_relocate_block_group+0x26e/0x4c0
         btrfs_relocate_chunk+0x52/0x120
         btrfs_balance+0xe2e/0x1900
         btrfs_ioctl_balance+0x3a7/0x460
         btrfs_ioctl+0x24c8/0x4360
         __x64_sys_ioctl+0xc3/0x100
         do_syscall_64+0x37/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      This occurred because we freed our backref node in
      btrfs_backref_error_cleanup(), but then tried to free it again in
      btrfs_backref_release_cache().  This is because
      btrfs_backref_release_cache() will cycle through all of the
      cache->leaves nodes and free them up.  However
      btrfs_backref_error_cleanup() freed the backref node with
      btrfs_backref_free_node(), which simply kfree()d the backref node
      without unlinking it from the cache.  Change this to a
      btrfs_backref_drop_node(), which does the appropriate cleanup and
      removes the node from the cache->leaves list, so when we go to free the
      remaining cache we don't trip over items we've already dropped.
      
      Fixes: 75bfb9af ("Btrfs: cleanup error handling in build_backref_tree")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      49ecc679
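The difference between freeing a node and dropping it from its cache first can be sketched with a small userspace list. This is a simplified model under assumptions, not the btrfs code: `struct cache`, `cache_add()`, `cache_drop_node()`, and `cache_release()` are hypothetical names, and the cache is reduced to a single circular list with a sentinel head.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct node {
    struct node *prev, *next;
};

/* Hypothetical cache: a circular doubly linked list with a sentinel. */
struct cache {
    struct node head;
    size_t count;
};

static void cache_init(struct cache *c)
{
    c->head.prev = c->head.next = &c->head;
    c->count = 0;
}

/* Allocate a node and link it at the front of the cache list. */
static struct node *cache_add(struct cache *c)
{
    struct node *n = malloc(sizeof(*n));
    if (n == NULL)
        return NULL;
    n->prev = &c->head;
    n->next = c->head.next;
    c->head.next->prev = n;
    c->head.next = n;
    c->count++;
    return n;
}

/* Unlink the node from the cache list, then free it. */
static void cache_drop_node(struct cache *c, struct node *n)
{
    assert(c->count > 0);
    n->prev->next = n->next;
    n->next->prev = n->prev;
    c->count--;
    free(n);
}

/* Free every node still linked; returns how many were freed. */
static size_t cache_release(struct cache *c)
{
    size_t freed = 0;
    while (c->head.next != &c->head) {
        cache_drop_node(c, c->head.next);
        freed++;
    }
    return freed;
}
```

Calling `free()` directly on a linked node would leave it reachable from the list, so a later `cache_release()` would touch and free it a second time. Dropping it through `cache_drop_node()` means the release pass only sees nodes that are still live.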
    •
      btrfs: don't get an EINTR during drop_snapshot for reloc · 18d3bff4
      Josef Bacik committed
      This was partially fixed by f3e3d9cc ("btrfs: avoid possible signal
      interruption of btrfs_drop_snapshot() on relocation tree"), however it
      missed a spot when we restart a trans handle because we need to end the
      transaction.  The fix is the same, simply use btrfs_join_transaction()
      instead of btrfs_start_transaction() when deleting reloc roots.
      
      Fixes: f3e3d9cc ("btrfs: avoid possible signal interruption of btrfs_drop_snapshot() on relocation tree")
      CC: stable@vger.kernel.org # 5.4+
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      18d3bff4
    •
      udf: fix the problem that the disc content is not displayed · 5cdc4a69
      lianzhi chang committed
      When the capacity of the disc is large (assuming the 4.7G
      specification), the disc (UDF file system) may be burned
      multiple times in Windows (multisession usage). When the
      remaining capacity of the disc is less than 300M (estimated
      value, for reference only) and the disc is opened on a Linux
      system, its content is displayed as blank (the kernel
      says "No VRS found"). Windows can display the contents of the
      disc normally.

      Through analysis, in the udf_check_vsd function in
      "fs/udf/super.c", the actual value of VSD_MAX_SECTOR_OFFSET may
      be much larger than 0x800000. In the current code, the type of
      sbi->s_session is "__s32", so when the remaining capacity of
      the disc is less than 300M (take a set of test values:
      sector=3154903040, sbi->s_session=1540464,
      sb->s_blocksize_bits=11), the calculation
      "sbi->s_session << sb->s_blocksize_bits" overflows. Therefore,
      it is necessary to convert s_session to "loff_t" (when
      udf_check_vsd starts, the value assigned to _sector is also
      converted in this way), so that the result does not overflow
      and the content of the disc can be displayed normally.

      Link: https://lore.kernel.org/r/20210114075741.30448-1-changlianzhi@uniontech.com
      Signed-off-by: lianzhi chang <changlianzhi@uniontech.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • J
      fs/cifs: Simplify bool comparison. · 16a78851
      Jiapeng Zhong committed
      Fix the following warning:
      
      ./fs/cifs/connect.c: WARNING: Comparison of 0/1 to bool variable
      Reported-by: Abaci Robot <abaci@linux.alibaba.com>
      Signed-off-by: Jiapeng Zhong <abaci-bugfix@linux.alibaba.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
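
The kind of cleanup this warning asks for can be sketched as follows. The struct and function names are hypothetical and chosen only for illustration; they are not taken from fs/cifs/connect.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical type for illustration only. */
struct demo_flags {
	bool enabled;
};

/* Style that triggers "Comparison of 0/1 to bool variable". */
static int is_enabled_flagged(const struct demo_flags *f)
{
	if (f->enabled == 1)
		return 1;
	return 0;
}

/* Simplified form: test the bool directly. Behaviour is unchanged. */
static int is_enabled_simplified(const struct demo_flags *f)
{
	if (f->enabled)
		return 1;
	return 0;
}
```

Both functions return the same result for every input; the change is purely stylistic.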
    • J
      fs/cifs: Assign boolean values to a bool variable · 2be449fc
      Jiapeng Zhong committed
      Fix the following coccicheck warnings:
      
      ./fs/cifs/connect.c:3386:2-21: WARNING: Assignment of 0/1 to
      bool variable.
      Reported-by: Abaci Robot <abaci@linux.alibaba.com>
      Signed-off-by: Jiapeng Zhong <abaci-bugfix@linux.alibaba.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
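
A small sketch of the pattern behind this coccicheck warning, again with hypothetical names that are not taken from fs/cifs/connect.c: a bool field is assigned `true` or `false` rather than 1 or 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical type for illustration only. */
struct demo_opts {
	bool feature_on;
};

/* Style that triggers "Assignment of 0/1 to bool variable". */
static void enable_flagged(struct demo_opts *o)
{
	o->feature_on = 1;
}

/* Preferred form: assign a boolean constant. Behaviour is unchanged. */
static void enable_preferred(struct demo_opts *o)
{
	o->feature_on = true;
}
```

The stored value is identical in both cases; the second form simply states the intent in terms of the variable's type.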
  9. 17 Jan, 2021 6 commits
    • P
      io_uring: fix skipping disabling sqo on exec · 0b5cd6c3
      Pavel Begunkov committed
      If there are no requests at the time __io_uring_task_cancel() is called,
      tctx_inflight() returns zero and it terminates without getting a chance
      to go through __io_uring_files_cancel() and do
      io_disable_sqo_submit(). And we absolutely want them disabled by the
      time cancellation ends.
      Reported-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Fixes: d9d05217 ("io_uring: stop SQPOLL submit on creator's death")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • P
      io_uring: fix uring_flush in exit_files() warning · 4325cb49
      Pavel Begunkov committed
      WARNING: CPU: 1 PID: 11100 at fs/io_uring.c:9096
      	io_uring_flush+0x326/0x3a0 fs/io_uring.c:9096
      RIP: 0010:io_uring_flush+0x326/0x3a0 fs/io_uring.c:9096
      Call Trace:
       filp_close+0xb4/0x170 fs/open.c:1280
       close_files fs/file.c:401 [inline]
       put_files_struct fs/file.c:416 [inline]
       put_files_struct+0x1cc/0x350 fs/file.c:413
       exit_files+0x7e/0xa0 fs/file.c:433
       do_exit+0xc22/0x2ae0 kernel/exit.c:820
       do_group_exit+0x125/0x310 kernel/exit.c:922
       get_signal+0x3e9/0x20a0 kernel/signal.c:2770
       arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811
       handle_signal_work kernel/entry/common.c:147 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
       exit_to_user_mode_prepare+0x148/0x250 kernel/entry/common.c:201
       __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
       syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      An SQPOLL ring creator task may have gotten rid of its file note during
      exit and called io_disable_sqo_submit(), but the io_uring is still left
      referenced through fdtable, which will be put during close_files() and
      cause a false positive warning.
      
      First split the warning into two for more clarity about which one is
      hit, then add an sqo_dead check to handle the described case.
      
      Reported-by: syzbot+a32b546d58dde07875a1@syzkaller.appspotmail.com
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • P
      io_uring: fix false positive sqo warning on flush · 6b393a1f
      Pavel Begunkov committed
      WARNING: CPU: 1 PID: 9094 at fs/io_uring.c:8884
      	io_disable_sqo_submit+0x106/0x130 fs/io_uring.c:8884
      Call Trace:
       io_uring_flush+0x28b/0x3a0 fs/io_uring.c:9099
       filp_close+0xb4/0x170 fs/open.c:1280
       close_fd+0x5c/0x80 fs/file.c:626
       __do_sys_close fs/open.c:1299 [inline]
       __se_sys_close fs/open.c:1297 [inline]
       __x64_sys_close+0x2f/0xa0 fs/open.c:1297
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      io_uring's final close() may be triggered by any task, not only the
      creator. io_uring_flush() handles that well, including the SQPOLL
      case, but a warning in io_disable_sqo_submit() will fire falsely.
      Fix it by moving this warning out to the only call site that matters.
      
      Reported-by: syzbot+2f5d1785dc624932da78@syzkaller.appspotmail.com
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • J
      io_uring: iopoll requests should also wake task ->in_idle state · c93cc9e1
      Jens Axboe committed
      If we're freeing/finishing iopoll requests, ensure we check whether the
      task is idling for cancelation. Otherwise we could end up waiting
      forever in __io_uring_task_cancel() if the task has active iopoll
      requests that need cancelation.
      
      Cc: stable@vger.kernel.org # 5.9+
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • L
      mm: don't play games with pinned pages in clear_page_refs · 9348b73c
      Linus Torvalds committed
      Turning a pinned page read-only breaks the pinning after COW.  Don't do it.
      
      The whole "track page soft dirty" state doesn't work with pinned pages
      anyway, since the page might be dirtied by the pinning entity without
      ever being noticed in the page tables.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • L
      mm: fix clear_refs_write locking · 29a951df
      Linus Torvalds committed
      Turning page table entries read-only requires the mmap_sem held for
      writing.
      
      So stop doing the odd games with turning things from read locks to write
      locks and back.  Just get the write lock.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 16 Jan, 2021 6 commits
  11. 14 Jan, 2021 5 commits