1. 26 Feb, 2021 3 commits
    • io-wq: improve manager/worker handling over exec · 4fb6ac32
      Committed by Jens Axboe
      exec will cancel any threads, including the ones that io-wq is using. This
      isn't a problem, in fact we'd prefer it to be that way since it means we
      know that any async work cancels naturally without having to handle it
      proactively.
      
      But it does mean that we need to set up a new manager, as the manager and
      workers are gone. Handle this at queue time, and cancel work if we fail.
      Since the manager can go away without us noticing, ensure that the manager
      itself holds a reference to the 'wq' as well. Rename io_wq_destroy() to
      io_wq_put() to reflect that.
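
The reference-holding pattern described above can be sketched in user space roughly as follows. This is only an illustration: `wq_sketch` and its helpers are hypothetical names, not the kernel's `struct io_wq` or its real functions, and the sketch assumes a plain atomic counter rather than whatever refcount type the kernel uses.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the io-wq structure; not the kernel's struct io_wq. */
struct wq_sketch {
	atomic_int refs;
};

/* The creator starts out holding the first reference. */
struct wq_sketch *wq_sketch_create(void)
{
	struct wq_sketch *wq = malloc(sizeof(*wq));

	if (wq)
		atomic_init(&wq->refs, 1);
	return wq;
}

/* The manager takes its own reference so 'wq' outlives the creator if needed. */
void wq_sketch_get(struct wq_sketch *wq)
{
	atomic_fetch_add(&wq->refs, 1);
}

/* Drop one reference; returns true if this call freed the object. */
bool wq_sketch_put(struct wq_sketch *wq)
{
	if (atomic_fetch_sub(&wq->refs, 1) == 1) {
		free(wq);
		return true;
	}
	return false;
}
```

With this shape, a "put" name fits better than a "destroy" name, because the caller cannot know whether its call is the one that frees the object.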
      
      In the future we can now simplify exec cancelation handling, for now just
      leave it the same.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: ensure SQPOLL startup is triggered before error shutdown · eb85890b
      Committed by Jens Axboe
      syzbot reports the following hang:
      
      INFO: task syz-executor.0:12538 can't die for more than 143 seconds.
      task:syz-executor.0  state:D stack:28352 pid:12538 ppid:  8423 flags:0x00004004
      Call Trace:
       context_switch kernel/sched/core.c:4324 [inline]
       __schedule+0x90c/0x21a0 kernel/sched/core.c:5075
       schedule+0xcf/0x270 kernel/sched/core.c:5154
       schedule_timeout+0x1db/0x250 kernel/time/timer.c:1868
       do_wait_for_common kernel/sched/completion.c:85 [inline]
       __wait_for_common kernel/sched/completion.c:106 [inline]
       wait_for_common kernel/sched/completion.c:117 [inline]
       wait_for_completion+0x168/0x270 kernel/sched/completion.c:138
       io_sq_thread_finish+0x96/0x580 fs/io_uring.c:7152
       io_sq_offload_create fs/io_uring.c:7929 [inline]
       io_uring_create fs/io_uring.c:9465 [inline]
       io_uring_setup+0x1fb2/0x2c20 fs/io_uring.c:9550
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      which is due to exiting after the SQPOLL thread has been created, but
      hasn't been started yet. Ensure that we always complete the startup
      side when waiting for it to exit.
      
      Reported-by: syzbot+c927c937cba8ef66dd4a@syzkaller.appspotmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io-wq: make buffered file write hashed work map per-ctx · e941894e
      Committed by Jens Axboe
      Before the io-wq thread change, we maintained a hash work map and lock
      per-node per-ring. That wasn't ideal, as we really wanted it to be per
      ring. But now that we have per-task workers, the hash map ends up being
      just per-task. That'll work just fine for the normal case of having
      one task use a ring, but if you share the ring between tasks, then it's
      considerably worse than it was before.
      
      Make the hash map per ctx instead, which provides full per-ctx buffered
      write serialization on hashed writes.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 24 Feb, 2021 5 commits
  3. 22 Feb, 2021 7 commits
  4. 21 Feb, 2021 7 commits
    • io_uring: fix leaving invalid req->flags · ebf4a5db
      Committed by Pavel Begunkov
      sqe->flags are a subset of req flags, so if copied incorrectly they may
      spill into in-kernel flags and wreak havoc, e.g. by setting REQ_F_INFLIGHT.
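
One way to guard against this class of bug is to validate the user-supplied bits before they ever reach the request. The sketch below is an assumption-laden illustration, not the actual patch: the mask value, the `SKETCH_REQ_F_INFLIGHT` bit position and the `sketch_*` names are all hypothetical.

```c
#include <errno.h>

/* Hypothetical layout: low bits are user-settable, higher bits are in-kernel. */
#define SKETCH_SQE_VALID_FLAGS	0x3fu
#define SKETCH_REQ_F_INFLIGHT	(1u << 8)

struct sketch_req {
	unsigned int flags;
};

/* Store user flags only if they stay inside the user-visible subset. */
int sketch_init_flags(struct sketch_req *req, unsigned int sqe_flags)
{
	if (sqe_flags & ~SKETCH_SQE_VALID_FLAGS) {
		/* Never leave invalid bits behind for later cleanup code to see. */
		req->flags = 0;
		return -EINVAL;
	}
	req->flags = sqe_flags;
	return 0;
}
```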
      
      Fixes: 5be9ad1e ("io_uring: optimise io_init_req() flags setting")
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: wait potential ->release() on resurrect · 88f171ab
      Committed by Pavel Begunkov
      There is a short window where percpu_refs are already turned zero, but
      we try to do resurrect(). Play nicer and wait for ->release() to happen
      in this case and proceed as if everything is ok. One downside for ctx refs
      is that we can ignore signal_pending() on a rare occasion, but someone
      else should check for it later if needed.
      
      Cc: <stable@vger.kernel.org> # 5.5+
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: keep generic rsrc infra generic · f2303b1f
      Committed by Pavel Begunkov
      io_rsrc_ref_quiesce() is a generic resource function, though now it
      was wired to allocate and initialise ref nodes with file-specific
      callbacks/etc. Keep it sane by passing in as parameters everything we
      need for initialisation, otherwise it will hurt us badly one day.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: zero ref_node after killing it · e6cb007c
      Committed by Pavel Begunkov
      After a rsrc/files reference node's refs are killed, it must never be
      used. And that's how it works, it either assigns a new node or kills the
      whole data table.
      
      Let's explicitly NULL it. That shouldn't be necessary, but if something
      goes wrong I'd rather catch a NULL dereference than use a dangling
      pointer.
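
The defensive idiom can be sketched in a few lines of user-space C. The names `sketch_node` and `sketch_table` are hypothetical, and the sketch collapses "kill the refs" and the eventual release into a single `free()` for brevity, which is not how the kernel's percpu refs work.

```c
#include <stdlib.h>

struct sketch_node {
	int refs;
};

struct sketch_table {
	struct sketch_node *node;
};

/* After killing the node, clear the pointer so stale use faults loudly. */
void sketch_kill_node(struct sketch_table *t)
{
	free(t->node);		/* stands in for kill + release */
	t->node = NULL;		/* a later dereference is now a clean NULL fault */
}
```
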
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: make the !CONFIG_NET helpers a bit more robust · 99a10081
      Committed by Jens Axboe
      With the prep and prep async split, we now have potentially 3 helpers
      that need to be defined for !CONFIG_NET. Add some helpers to do just
      that.
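
A macro that stamps out all three stubs per opcode might look roughly like the sketch below. The macro name, the `sketch_` prefix and the single-argument signatures are assumptions made for illustration; the real helpers in fs/io_uring.c may differ in name and signature.

```c
#include <errno.h>

struct sketch_req;	/* opaque, hypothetical request type */

/* Define the issue, prep and prep_async stubs for one opcode in one go. */
#define SKETCH_NETOP_STUBS(op)					\
int sketch_##op(struct sketch_req *req)				\
{								\
	(void)req;						\
	return -EOPNOTSUPP;					\
}								\
int sketch_##op##_prep(struct sketch_req *req)			\
{								\
	(void)req;						\
	return -EOPNOTSUPP;					\
}								\
int sketch_##op##_prep_async(struct sketch_req *req)		\
{								\
	(void)req;						\
	return -EOPNOTSUPP;					\
}

/* One line per opcode, so a forgotten helper cannot break the build. */
SKETCH_NETOP_STUBS(sendmsg)
SKETCH_NETOP_STUBS(recvmsg)
```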
      
      Fixes the following compile error on !CONFIG_NET:
      
      fs/io_uring.c:6171:10: error: implicit declaration of function
      'io_sendmsg_prep_async'; did you mean 'io_req_prep_async'?
      [-Werror=implicit-function-declaration]
         return io_sendmsg_prep_async(req);
                   ^~~~~~~~~~~~~~~~~~~~~
      	     io_req_prep_async
      
      Fixes: 93642ef8 ("io_uring: split sqe-prep and async setup")
      Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: don't hold uring_lock when calling io_run_task_work* · 8bad28d8
      Committed by Hao Xu
      Abaci reported the below issue:
      [  141.400455] hrtimer: interrupt took 205853 ns
      [  189.869316] process 'usr/local/ilogtail/ilogtail_0.16.26' started with executable stack
      [  250.188042]
      [  250.188327] ============================================
      [  250.189015] WARNING: possible recursive locking detected
      [  250.189732] 5.11.0-rc4 #1 Not tainted
      [  250.190267] --------------------------------------------
      [  250.190917] a.out/7363 is trying to acquire lock:
      [  250.191506] ffff888114dbcbe8 (&ctx->uring_lock){+.+.}-{3:3}, at: __io_req_task_submit+0x29/0xa0
      [  250.192599]
      [  250.192599] but task is already holding lock:
      [  250.193309] ffff888114dbfbe8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_register+0xad/0x210
      [  250.194426]
      [  250.194426] other info that might help us debug this:
      [  250.195238]  Possible unsafe locking scenario:
      [  250.195238]
      [  250.196019]        CPU0
      [  250.196411]        ----
      [  250.196803]   lock(&ctx->uring_lock);
      [  250.197420]   lock(&ctx->uring_lock);
      [  250.197966]
      [  250.197966]  *** DEADLOCK ***
      [  250.197966]
      [  250.198837]  May be due to missing lock nesting notation
      [  250.198837]
      [  250.199780] 1 lock held by a.out/7363:
      [  250.200373]  #0: ffff888114dbfbe8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_register+0xad/0x210
      [  250.201645]
      [  250.201645] stack backtrace:
      [  250.202298] CPU: 0 PID: 7363 Comm: a.out Not tainted 5.11.0-rc4 #1
      [  250.203144] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [  250.203887] Call Trace:
      [  250.204302]  dump_stack+0xac/0xe3
      [  250.204804]  __lock_acquire+0xab6/0x13a0
      [  250.205392]  lock_acquire+0x2c3/0x390
      [  250.205928]  ? __io_req_task_submit+0x29/0xa0
      [  250.206541]  __mutex_lock+0xae/0x9f0
      [  250.207071]  ? __io_req_task_submit+0x29/0xa0
      [  250.207745]  ? 0xffffffffa0006083
      [  250.208248]  ? __io_req_task_submit+0x29/0xa0
      [  250.208845]  ? __io_req_task_submit+0x29/0xa0
      [  250.209452]  ? __io_req_task_submit+0x5/0xa0
      [  250.210083]  __io_req_task_submit+0x29/0xa0
      [  250.210687]  io_async_task_func+0x23d/0x4c0
      [  250.211278]  task_work_run+0x89/0xd0
      [  250.211884]  io_run_task_work_sig+0x50/0xc0
      [  250.212464]  io_sqe_files_unregister+0xb2/0x1f0
      [  250.213109]  __io_uring_register+0x115a/0x1750
      [  250.213718]  ? __x64_sys_io_uring_register+0xad/0x210
      [  250.214395]  ? __fget_files+0x15a/0x260
      [  250.214956]  __x64_sys_io_uring_register+0xbe/0x210
      [  250.215620]  ? trace_hardirqs_on+0x46/0x110
      [  250.216205]  do_syscall_64+0x2d/0x40
      [  250.216731]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  250.217455] RIP: 0033:0x7f0fa17e5239
      [  250.218034] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05  3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 ec 2c 00 f7 d8 64 89 01 48
      [  250.220343] RSP: 002b:00007f0fa1eeac48 EFLAGS: 00000246 ORIG_RAX: 00000000000001ab
      [  250.221360] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0fa17e5239
      [  250.222272] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000008
      [  250.223185] RBP: 00007f0fa1eeae20 R08: 0000000000000000 R09: 0000000000000000
      [  250.224091] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      [  250.224999] R13: 0000000000021000 R14: 0000000000000000 R15: 00007f0fa1eeb700
      
      This is caused by calling io_run_task_work_sig() to do work under
      uring_lock while the caller io_sqe_files_unregister() already held
      uring_lock.
      To fix this issue, briefly drop uring_lock when calling
      io_run_task_work_sig(). There are three things to take care of:

      - hold uring_lock in io_ring_ctx_free() around io_sqe_files_unregister();
          this is for consistency of lock/unlock.
      - add a new fixed rsrc ref node before dropping uring_lock;
          it's not safe to do io_uring_enter-->percpu_ref_get() with a dying one.
      - check whether rsrc_data->refs is dying to avoid a parallel
          io_sqe_files_unregister().
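
The core of the fix, dropping the lock around work that takes the same lock, can be sketched in user space with a pthread mutex. Everything here is hypothetical scaffolding (`sketch_ctx`, `sketch_task_work`, `sketch_unregister`); it only mirrors the locking shape, not the ref-node handling listed above.

```c
#include <pthread.h>

struct sketch_ctx {
	pthread_mutex_t lock;	/* stands in for ctx->uring_lock */
	int pending;		/* queued task work items */
	int ran;		/* completed task work items */
};

/* Task work takes the lock itself, like __io_req_task_submit() in the trace. */
void sketch_task_work(struct sketch_ctx *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	ctx->pending--;
	ctx->ran++;
	pthread_mutex_unlock(&ctx->lock);
}

/* Must be called with ctx->lock held; returns with it held. */
void sketch_unregister(struct sketch_ctx *ctx)
{
	while (ctx->pending > 0) {
		/* Drop the lock so the task work can take it without self-deadlock. */
		pthread_mutex_unlock(&ctx->lock);
		sketch_task_work(ctx);
		pthread_mutex_lock(&ctx->lock);
	}
}
```

Because other threads can run while the lock is dropped, any state examined before the unlock has to be rechecked after relocking, which is why the loop rereads `pending` each time.
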
      Reported-by: Abaci <abaci@linux.alibaba.com>
      Fixes: 1ffc5422 ("io_uring: fix io_sqe_files_unregister() hangs")
      Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
      [axboe: fixes from Pavel folded in]
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: fail io-wq submission from a task_work · a3df7698
      Committed by Pavel Begunkov
      In case of failure io_wq_submit_work() needs to post a CQE and so
      potentially take uring_lock. The safest way to deal with it is to do
      that from under task_work where we can safely take the lock.

      Also, as io_iopoll_check() holds the lock tight and releases it
      reluctantly, this will play nicer in the future with notifying an
      iopolling task about such newly pending failed requests.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  5. 19 Feb, 2021 12 commits
  6. 18 Feb, 2021 1 commit
  7. 17 Feb, 2021 1 commit
  8. 14 Feb, 2021 3 commits
  9. 13 Feb, 2021 1 commit