1. 02 Feb, 2021 29 commits
  2. 29 Jan, 2021 3 commits
    • P
      io_uring: reinforce cancel on flush during exit · 3a7efd1a
      Pavel Begunkov committed
      What 84965ff8 ("io_uring: if we see flush on exit, cancel related tasks")
      really wants is to cancel all relevant REQ_F_INFLIGHT requests reliably.
      That can be achieved by io_uring_cancel_files(), but we'll miss it when
      calling io_uring_cancel_task_requests(files=NULL) from io_uring_flush(),
      because it will go through __io_uring_cancel_task_requests().
      
      Just always call io_uring_cancel_files() during cancel; it's good enough
      for now.
      
      Cc: stable@vger.kernel.org # 5.9+
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3a7efd1a
    • P
      io_uring: fix sqo ownership false positive warning · 70b2c60d
      Pavel Begunkov committed
      WARNING: CPU: 0 PID: 21359 at fs/io_uring.c:9042
          io_uring_cancel_task_requests+0xe55/0x10c0 fs/io_uring.c:9042
      Call Trace:
       io_uring_flush+0x47b/0x6e0 fs/io_uring.c:9227
       filp_close+0xb4/0x170 fs/open.c:1295
       close_files fs/file.c:403 [inline]
       put_files_struct fs/file.c:418 [inline]
       put_files_struct+0x1cc/0x350 fs/file.c:415
       exit_files+0x7e/0xa0 fs/file.c:435
       do_exit+0xc22/0x2ae0 kernel/exit.c:820
       do_group_exit+0x125/0x310 kernel/exit.c:922
       get_signal+0x427/0x20f0 kernel/signal.c:2773
       arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811
       handle_signal_work kernel/entry/common.c:147 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
       exit_to_user_mode_prepare+0x148/0x250 kernel/entry/common.c:201
       __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
       syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Now that io_uring_cancel_task_requests() can be called directly rather
      than only through file notes, remove a WARN_ONCE() there that gives us
      false positives. That check is not very important and we catch it in
      other places.
      
      Fixes: 84965ff8 ("io_uring: if we see flush on exit, cancel related tasks")
      Cc: stable@vger.kernel.org # 5.9+
      Reported-by: syzbot+3e3d9bd0c6ce9efbc3ef@syzkaller.appspotmail.com
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      70b2c60d
    • P
      io_uring: fix list corruption for splice file_get · f609cbb8
      Pavel Begunkov committed
      kernel BUG at lib/list_debug.c:29!
      Call Trace:
       __list_add include/linux/list.h:67 [inline]
       list_add include/linux/list.h:86 [inline]
       io_file_get+0x8cc/0xdb0 fs/io_uring.c:6466
       __io_splice_prep+0x1bc/0x530 fs/io_uring.c:3866
       io_splice_prep fs/io_uring.c:3920 [inline]
       io_req_prep+0x3546/0x4e80 fs/io_uring.c:6081
       io_queue_sqe+0x609/0x10d0 fs/io_uring.c:6628
       io_submit_sqe fs/io_uring.c:6705 [inline]
       io_submit_sqes+0x1495/0x2720 fs/io_uring.c:6953
       __do_sys_io_uring_enter+0x107d/0x1f30 fs/io_uring.c:9353
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      io_file_get() may be called from splice, and so REQ_F_INFLIGHT may
      already be set.
      
      Fixes: 02a13674 ("io_uring: account io_uring internal files as REQ_F_INFLIGHT")
      Cc: stable@vger.kernel.org # 5.9+
      Reported-by: syzbot+6879187cf57845801267@syzkaller.appspotmail.com
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      f609cbb8
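The failure mode described above (adding an entry to a list it is already on) can be illustrated with a small userspace model. This is a sketch only, not the kernel patch: `list_node`, `REQ_INFLIGHT`, `struct request`, and `track_inflight()` are hypothetical stand-ins chosen for illustration, and the assumption is that checking the flag before linking is enough to keep the entry from being added twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, loosely modelled on list_head. */
struct list_node {
    struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
    head->prev = head->next = head;
}

/* Insert node right after head. Calling this twice for the same node
 * corrupts the list, which is the class of bug the trace reports. */
static void list_add_front(struct list_node *node, struct list_node *head)
{
    node->next = head->next;
    node->prev = head;
    head->next->prev = node;
    head->next = node;
}

static size_t list_length(const struct list_node *head)
{
    size_t n = 0;
    for (const struct list_node *p = head->next; p != head; p = p->next)
        n++;
    return n;
}

#define REQ_INFLIGHT 0x1u /* hypothetical stand-in for REQ_F_INFLIGHT */

struct request {
    unsigned int flags;
    struct list_node inflight_entry;
};

/* Track a request at most once; returns true if it was newly added. */
static bool track_inflight(struct request *req, struct list_node *inflight)
{
    if (req->flags & REQ_INFLIGHT)
        return false; /* already linked: a second add would corrupt the list */
    req->flags |= REQ_INFLIGHT;
    list_add_front(&req->inflight_entry, inflight);
    return true;
}
```

Calling `track_inflight()` twice on the same request links it only once, so a second file lookup for the same request (as splice may do) leaves the list intact in this model.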
  3. 28 Jan, 2021 1 commit
    • H
      io_uring: fix flush cqring overflow list while TASK_INTERRUPTIBLE · 6195ba09
      Hao Xu committed
      Abaci reported the following warning:
      
      [   27.073425] do not call blocking ops when !TASK_RUNNING; state=1 set at [] prepare_to_wait_exclusive+0x3a/0xc0
      [   27.075805] WARNING: CPU: 0 PID: 951 at kernel/sched/core.c:7853 __might_sleep+0x80/0xa0
      [   27.077604] Modules linked in:
      [   27.078379] CPU: 0 PID: 951 Comm: a.out Not tainted 5.11.0-rc3+ #1
      [   27.079637] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [   27.080852] RIP: 0010:__might_sleep+0x80/0xa0
      [   27.081835] Code: 65 48 8b 04 25 80 71 01 00 48 8b 90 c0 15 00 00 48 8b 70 18 48 c7 c7 08 39 95 82 c6 05 f9 5f de 08 01 48 89 d1 e8 00 c6 fa ff  0b eb bf 41 0f b6 f5 48 c7 c7 40 23 c9 82 e8 f3 48 ec 00 eb a7
      [   27.084521] RSP: 0018:ffffc90000fe3ce8 EFLAGS: 00010286
      [   27.085350] RAX: 0000000000000000 RBX: ffffffff82956083 RCX: 0000000000000000
      [   27.086348] RDX: ffff8881057a0000 RSI: ffffffff8118cc9e RDI: ffff88813bc28570
      [   27.087598] RBP: 00000000000003a7 R08: 0000000000000001 R09: 0000000000000001
      [   27.088819] R10: ffffc90000fe3e00 R11: 00000000fffef9f0 R12: 0000000000000000
      [   27.089819] R13: 0000000000000000 R14: ffff88810576eb80 R15: ffff88810576e800
      [   27.091058] FS:  00007f7b144cf740(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
      [   27.092775] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   27.093796] CR2: 00000000022da7b8 CR3: 000000010b928002 CR4: 00000000003706f0
      [   27.094778] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   27.095780] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   27.097011] Call Trace:
      [   27.097685]  __mutex_lock+0x5d/0xa30
      [   27.098565]  ? prepare_to_wait_exclusive+0x71/0xc0
      [   27.099412]  ? io_cqring_overflow_flush.part.101+0x6d/0x70
      [   27.100441]  ? lockdep_hardirqs_on_prepare+0xe9/0x1c0
      [   27.101537]  ? _raw_spin_unlock_irqrestore+0x2d/0x40
      [   27.102656]  ? trace_hardirqs_on+0x46/0x110
      [   27.103459]  ? io_cqring_overflow_flush.part.101+0x6d/0x70
      [   27.104317]  io_cqring_overflow_flush.part.101+0x6d/0x70
      [   27.105113]  io_cqring_wait+0x36e/0x4d0
      [   27.105770]  ? find_held_lock+0x28/0xb0
      [   27.106370]  ? io_uring_remove_task_files+0xa0/0xa0
      [   27.107076]  __x64_sys_io_uring_enter+0x4fb/0x640
      [   27.107801]  ? rcu_read_lock_sched_held+0x59/0xa0
      [   27.108562]  ? lockdep_hardirqs_on_prepare+0xe9/0x1c0
      [   27.109684]  ? syscall_enter_from_user_mode+0x26/0x70
      [   27.110731]  do_syscall_64+0x2d/0x40
      [   27.111296]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   27.112056] RIP: 0033:0x7f7b13dc8239
      [   27.112663] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05  3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 ec 2c 00 f7 d8 64 89 01 48
      [   27.115113] RSP: 002b:00007ffd6d7f5c88 EFLAGS: 00000286 ORIG_RAX: 00000000000001aa
      [   27.116562] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7b13dc8239
      [   27.117961] RDX: 000000000000478e RSI: 0000000000000000 RDI: 0000000000000003
      [   27.118925] RBP: 00007ffd6d7f5cb0 R08: 0000000020000040 R09: 0000000000000008
      [   27.119773] R10: 0000000000000001 R11: 0000000000000286 R12: 0000000000400480
      [   27.120614] R13: 00007ffd6d7f5d90 R14: 0000000000000000 R15: 0000000000000000
      [   27.121490] irq event stamp: 5635
      [   27.121946] hardirqs last  enabled at (5643): [] console_unlock+0x5c4/0x740
      [   27.123476] hardirqs last disabled at (5652): [] console_unlock+0x4e7/0x740
      [   27.125192] softirqs last  enabled at (5272): [] __do_softirq+0x3c5/0x5aa
      [   27.126430] softirqs last disabled at (5267): [] asm_call_irq_on_stack+0xf/0x20
      [   27.127634] ---[ end trace 289d7e28fa60f928 ]---
      
      This is caused by calling io_cqring_overflow_flush(), which may sleep,
      after calling prepare_to_wait_exclusive(), which sets the task state to
      TASK_INTERRUPTIBLE.
      
      Reported-by: Abaci <abaci@linux.alibaba.com>
      Fixes: 6c503150 ("io_uring: patch up IOPOLL overflow_flush sync")
      Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      6195ba09
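The ordering problem in this commit can be shown with a tiny userspace model. It is a sketch under stated assumptions, not the actual patch (the commit message gives only the cause): `task_model`, `overflow_flush()`, and the two wait helpers are hypothetical names, and the "violation" counter merely imitates what `__might_sleep()` complains about.

```c
#include <assert.h>

enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

struct task_model {
    enum task_state state;
    int sleep_violations; /* blocking calls made while not TASK_RUNNING */
    int flushes;          /* how many times the flush ran */
};

/* Stand-in for a function that may sleep, e.g. one that takes a mutex. */
static void overflow_flush(struct task_model *t)
{
    if (t->state != TASK_RUNNING)
        t->sleep_violations++;
    t->flushes++;
}

/* Models prepare_to_wait_exclusive(): the task is marked as about to sleep. */
static void prepare_to_wait_model(struct task_model *t)
{
    t->state = TASK_INTERRUPTIBLE;
}

/* Models finish_wait(): the task is runnable again. */
static void finish_wait_model(struct task_model *t)
{
    t->state = TASK_RUNNING;
}

/* Problematic ordering: the blocking flush runs after the state change. */
static void wait_flush_after_prepare(struct task_model *t)
{
    prepare_to_wait_model(t);
    overflow_flush(t);
    finish_wait_model(t);
}

/* Safer ordering: the blocking flush runs while still TASK_RUNNING. */
static void wait_flush_before_prepare(struct task_model *t)
{
    overflow_flush(t);
    prepare_to_wait_model(t);
    finish_wait_model(t);
}
```

In this model only `wait_flush_after_prepare()` records a violation; moving the flush ahead of the state change is one way to avoid it.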
  4. 27 Jan, 2021 2 commits
    • P
      io_uring: fix wqe->lock/completion_lock deadlock · 907d1df3
      Pavel Begunkov committed
      Joseph reports the following deadlock:
      
      CPU0:
      ...
      io_kill_linked_timeout  // &ctx->completion_lock
      io_commit_cqring
      __io_queue_deferred
      __io_queue_async_work
      io_wq_enqueue
      io_wqe_enqueue  // &wqe->lock
      
      CPU1:
      ...
      __io_uring_files_cancel
      io_wq_cancel_cb
      io_wqe_cancel_pending_work  // &wqe->lock
      io_cancel_task_cb  // &ctx->completion_lock
      
      Only __io_queue_deferred() calls queue_async_work() while holding
      ctx->completion_lock. Enqueue drained requests via io_req_task_queue()
      instead.
      
      Cc: stable@vger.kernel.org # 5.9+
      Reported-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      907d1df3
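The lock inversion in this commit (one path takes completion_lock then wqe->lock, the other takes them in the opposite order) can be sketched with a single-threaded model. Everything here is an illustrative assumption rather than kernel code: `lock_model`, `ctx_model`, and the function names are hypothetical, and the hand-off queue only imitates the idea of deferring the enqueue until completion_lock is no longer held.

```c
#include <assert.h>
#include <stdbool.h>

struct lock_model { bool held; };

struct ctx_model {
    struct lock_model completion_lock;
    struct lock_model wqe_lock;
    int deferred;     /* drained requests waiting to be queued */
    int task_queue;   /* requests handed off for later execution */
    int wq_pending;   /* requests sitting in the worker queue */
    bool nested_seen; /* wqe_lock was taken while completion_lock was held */
};

static void lock(struct lock_model *l)   { assert(!l->held); l->held = true; }
static void unlock(struct lock_model *l) { assert(l->held);  l->held = false; }

/* Enqueue one request to the worker queue; needs wqe_lock. */
static void wq_enqueue(struct ctx_model *c)
{
    if (c->completion_lock.held)
        c->nested_seen = true; /* completion_lock -> wqe_lock ordering */
    lock(&c->wqe_lock);
    c->wq_pending++;
    unlock(&c->wqe_lock);
}

/* Inverted ordering: enqueues while completion_lock is still held. */
static void queue_deferred_nested(struct ctx_model *c)
{
    lock(&c->completion_lock);
    while (c->deferred > 0) {
        c->deferred--;
        wq_enqueue(c);
    }
    unlock(&c->completion_lock);
}

/* Hand-off: under completion_lock, only move requests to a task queue. */
static void queue_deferred_handoff(struct ctx_model *c)
{
    lock(&c->completion_lock);
    while (c->deferred > 0) {
        c->deferred--;
        c->task_queue++;
    }
    unlock(&c->completion_lock);
}

/* Runs later, with completion_lock released, and does the real enqueue. */
static void run_task_queue(struct ctx_model *c)
{
    while (c->task_queue > 0) {
        c->task_queue--;
        wq_enqueue(c);
    }
}
```

With the hand-off variant the two locks are never nested in this model, so the opposite ordering on the cancellation path has nothing to deadlock against.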
    • P
      io_uring: fix cancellation taking mutex while TASK_UNINTERRUPTIBLE · ca70f00b
      Pavel Begunkov committed
      do not call blocking ops when !TASK_RUNNING; state=2 set at
      	[<00000000ced9dbfc>] prepare_to_wait+0x1f4/0x3b0
      	kernel/sched/wait.c:262
      WARNING: CPU: 1 PID: 19888 at kernel/sched/core.c:7853
      	__might_sleep+0xed/0x100 kernel/sched/core.c:7848
      RIP: 0010:__might_sleep+0xed/0x100 kernel/sched/core.c:7848
      Call Trace:
       __mutex_lock_common+0xc4/0x2ef0 kernel/locking/mutex.c:935
       __mutex_lock kernel/locking/mutex.c:1103 [inline]
       mutex_lock_nested+0x1a/0x20 kernel/locking/mutex.c:1118
       io_wq_submit_work+0x39a/0x720 fs/io_uring.c:6411
       io_run_cancel fs/io-wq.c:856 [inline]
       io_wqe_cancel_pending_work fs/io-wq.c:990 [inline]
       io_wq_cancel_cb+0x614/0xcb0 fs/io-wq.c:1027
       io_uring_cancel_files fs/io_uring.c:8874 [inline]
       io_uring_cancel_task_requests fs/io_uring.c:8952 [inline]
       __io_uring_files_cancel+0x115d/0x19e0 fs/io_uring.c:9038
       io_uring_files_cancel include/linux/io_uring.h:51 [inline]
       do_exit+0x2e6/0x2490 kernel/exit.c:780
       do_group_exit+0x168/0x2d0 kernel/exit.c:922
       get_signal+0x16b5/0x2030 kernel/signal.c:2770
       arch_do_signal_or_restart+0x8e/0x6a0 arch/x86/kernel/signal.c:811
       handle_signal_work kernel/entry/common.c:147 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
       exit_to_user_mode_prepare+0xac/0x1e0 kernel/entry/common.c:201
       __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
       syscall_exit_to_user_mode+0x48/0x190 kernel/entry/common.c:302
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Rewrite io_uring_cancel_files() to mimic __io_uring_task_cancel()'s
      counting scheme, so it does all the heavy work before setting
      TASK_UNINTERRUPTIBLE.
      
      Cc: stable@vger.kernel.org # 5.9+
      Reported-by: syzbot+f655445043a26a7cfab8@syzkaller.appspotmail.com
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      [axboe: fix inverted task check]
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ca70f00b
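A counting scheme of the kind this commit refers to can be sketched as a loop that samples the in-flight count, does the heavy cancellation while still runnable, and only then marks the task as sleeping. The model below is a userspace approximation under assumptions: `cancel_model`, `cancel_some()`, `wait_for_progress()`, and `cancel_all()` are hypothetical names, and "sleeping" is simulated by pretending one request completes.

```c
#include <assert.h>

enum cancel_state { STATE_RUNNING, STATE_UNINTERRUPTIBLE };

struct cancel_model {
    enum cancel_state state;
    int inflight;   /* requests still outstanding */
    int violations; /* heavy work attempted while not STATE_RUNNING */
    int sleeps;     /* times the loop had to wait for progress */
};

/* Heavy step that may block: cancels up to `batch` requests. */
static void cancel_some(struct cancel_model *m, int batch)
{
    if (m->state != STATE_RUNNING)
        m->violations++;
    m->inflight -= (batch < m->inflight) ? batch : m->inflight;
}

/* Stand-in for schedule(): pretend one request completes while we sleep. */
static void wait_for_progress(struct cancel_model *m)
{
    m->sleeps++;
    if (m->inflight > 0)
        m->inflight--;
}

static void cancel_all(struct cancel_model *m, int batch)
{
    for (;;) {
        int before = m->inflight;         /* sample the count first */
        if (before == 0)
            break;
        cancel_some(m, batch);            /* heavy work, still runnable */
        m->state = STATE_UNINTERRUPTIBLE; /* like prepare_to_wait() */
        if (m->inflight == before)        /* no progress: really sleep */
            wait_for_progress(m);
        m->state = STATE_RUNNING;         /* like finish_wait() */
    }
}
```

Because the count is sampled before the heavy step and compared only after the state change, the loop never calls a blocking function while marked as sleeping, and it skips the sleep whenever the count has already moved.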
  5. 26 Jan, 2021 1 commit
  6. 25 Jan, 2021 4 commits
    • J
      io_uring: only call io_cqring_ev_posted() if events were posted · b18032bb
      Jens Axboe committed
      This normally doesn't cause any extra harm, but it does mean that we'll
      increment the eventfd notification count, if one has been registered
      with the ring. This can confuse applications, when they see more
      notifications on the eventfd side than are available in the ring.
      
      Do the nice thing and only increment this count, if we actually posted
      (or even overflowed) events.
      Reported-and-tested-by: Dan Melnic <dmm@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b18032bb
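The behaviour described in this commit, signalling the eventfd only when completions were actually posted, can be modelled in a few lines. This is an illustrative sketch, not the kernel change: `ring_model` and the function names are hypothetical, and a plain counter stands in for the eventfd.

```c
#include <assert.h>
#include <stdbool.h>

struct ring_model {
    unsigned int cq_entries_ready; /* completions visible in the ring */
    unsigned int eventfd_count;    /* notifications seen by the application */
};

/* Post n completions; report whether anything was actually posted. */
static bool post_completions(struct ring_model *r, unsigned int n)
{
    r->cq_entries_ready += n;
    return n > 0;
}

/* Stand-in for signalling the registered eventfd. */
static void notify(struct ring_model *r)
{
    r->eventfd_count++;
}

/* Unconditional variant: notifies even when nothing was posted. */
static void flush_always_notify(struct ring_model *r, unsigned int n)
{
    post_completions(r, n);
    notify(r);
}

/* Conditional variant: notifies only if events were posted. */
static void flush_notify_if_posted(struct ring_model *r, unsigned int n)
{
    if (post_completions(r, n))
        notify(r);
}
```

With the unconditional variant an empty flush still bumps the notification count, so the application can see more notifications than ring entries; the conditional variant keeps the two in step.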
    • J
      io_uring: if we see flush on exit, cancel related tasks · 84965ff8
      Jens Axboe committed
      Ensure we match tasks that belong to a dead or dying task as well, as we
      need to reap those in addition to those belonging to the exiting task.
      
      Cc: stable@vger.kernel.org # 5.9+
      Reported-by: Josef Grieb <josef.grieb@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      84965ff8
    • J
      io_uring: account io_uring internal files as REQ_F_INFLIGHT · 02a13674
      Jens Axboe committed
      We need to actively cancel anything that introduces a potential circular
      loop, where io_uring holds a reference to itself. If the file in question
      is an io_uring file, then add the request to the inflight list.
      
      Cc: stable@vger.kernel.org # 5.9+
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      02a13674
    • P
      io_uring: fix sleeping under spin in __io_clean_op · 9d5c8190
      Pavel Begunkov committed
      [   27.629441] BUG: sleeping function called from invalid context
      	at fs/file.c:402
      [   27.631317] in_atomic(): 1, irqs_disabled(): 1, non_block: 0,
      	pid: 1012, name: io_wqe_worker-0
      [   27.633220] 1 lock held by io_wqe_worker-0/1012:
      [   27.634286]  #0: ffff888105e26c98 (&ctx->completion_lock)
      	{....}-{2:2}, at: __io_req_complete.part.102+0x30/0x70
      [   27.649249] Call Trace:
      [   27.649874]  dump_stack+0xac/0xe3
      [   27.650666]  ___might_sleep+0x284/0x2c0
      [   27.651566]  put_files_struct+0xb8/0x120
      [   27.652481]  __io_clean_op+0x10c/0x2a0
      [   27.653362]  __io_cqring_fill_event+0x2c1/0x350
      [   27.654399]  __io_req_complete.part.102+0x41/0x70
      [   27.655464]  io_openat2+0x151/0x300
      [   27.656297]  io_issue_sqe+0x6c/0x14e0
      [   27.660991]  io_wq_submit_work+0x7f/0x240
      [   27.662890]  io_worker_handle_work+0x501/0x8a0
      [   27.664836]  io_wqe_worker+0x158/0x520
      [   27.667726]  kthread+0x134/0x180
      [   27.669641]  ret_from_fork+0x1f/0x30
      
      Instead of cleaning files on overflow, move overflow cancellation back
      into io_uring_cancel_files(). Previously it was racy to clear the
      REQ_F_OVERFLOW flag, but we got rid of it, and can now do the
      cancellation through repeated attempts targeting all matching requests.
      
      Reported-by: Abaci <abaci@linux.alibaba.com>
      Reported-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      9d5c8190