1. 28 Sep, 2021 1 commit
  2. 25 Sep, 2021 1 commit
  3. 14 Sep, 2021 1 commit
  4. 13 Sep, 2021 2 commits
  5. 09 Sep, 2021 2 commits
    • Q
      io-wq: fix memory leak in create_io_worker() · 66e70be7
      Qiang.zhang authored
      BUG: memory leak
      unreferenced object 0xffff888126fcd6c0 (size 192):
        comm "syz-executor.1", pid 11934, jiffies 4294983026 (age 15.690s)
        backtrace:
          [<ffffffff81632c91>] kmalloc_node include/linux/slab.h:609 [inline]
          [<ffffffff81632c91>] kzalloc_node include/linux/slab.h:732 [inline]
          [<ffffffff81632c91>] create_io_worker+0x41/0x1e0 fs/io-wq.c:739
          [<ffffffff8163311e>] io_wqe_create_worker fs/io-wq.c:267 [inline]
          [<ffffffff8163311e>] io_wqe_enqueue+0x1fe/0x330 fs/io-wq.c:866
          [<ffffffff81620b64>] io_queue_async_work+0xc4/0x200 fs/io_uring.c:1473
          [<ffffffff8162c59c>] __io_queue_sqe+0x34c/0x510 fs/io_uring.c:6933
          [<ffffffff8162c7ab>] io_req_task_submit+0x4b/0xa0 fs/io_uring.c:2233
          [<ffffffff8162cb48>] io_async_task_func+0x108/0x1c0 fs/io_uring.c:5462
          [<ffffffff816259e3>] tctx_task_work+0x1b3/0x3a0 fs/io_uring.c:2158
          [<ffffffff81269b43>] task_work_run+0x73/0xb0 kernel/task_work.c:164
          [<ffffffff812dcdd1>] tracehook_notify_signal include/linux/tracehook.h:212 [inline]
          [<ffffffff812dcdd1>] handle_signal_work kernel/entry/common.c:146 [inline]
          [<ffffffff812dcdd1>] exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
          [<ffffffff812dcdd1>] exit_to_user_mode_prepare+0x151/0x180 kernel/entry/common.c:209
          [<ffffffff843ff25d>] __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
          [<ffffffff843ff25d>] syscall_exit_to_user_mode+0x1d/0x40 kernel/entry/common.c:302
          [<ffffffff843fa4a2>] do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
          [<ffffffff84600068>] entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      When create_io_thread() returns an error and creation is not retried,
      the worker object needs to be freed.
      
      Reported-by: syzbot+65454c239241d3d647da@syzkaller.appspotmail.com
      Signed-off-by: Qiang.zhang <qiang.zhang@windriver.com>
      Link: https://lore.kernel.org/r/20210909115822.181188-1-qiang.zhang@windriver.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      66e70be7
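The cleanup rule in this commit message (free the worker when thread creation fails and no retry is scheduled) can be sketched in plain user-space C. This is a minimal model, not the kernel code: `struct worker`, `start_thread_stub()`, `create_worker()`, `destroy_worker()`, and the `live_workers` counter are hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's worker object. */
struct worker {
	int id;
};

/* Counts live allocations so that a leak is observable in this model. */
int live_workers;

/* Hypothetical stand-in for thread creation: fails when asked to. */
static int start_thread_stub(struct worker *w, bool fail)
{
	(void)w;
	return fail ? -1 : 0;
}

/*
 * Allocate a worker and start its thread. If the start fails and the
 * caller will not retry, the allocation is released before returning.
 */
struct worker *create_worker(int id, bool fail, bool will_retry)
{
	struct worker *w = calloc(1, sizeof(*w));

	if (!w)
		return NULL;
	live_workers++;
	w->id = id;

	if (start_thread_stub(w, fail) == 0)
		return w;

	if (will_retry)
		return w;	/* ownership passes to the retry path */

	/* No retry: free here, otherwise the object would be leaked. */
	free(w);
	live_workers--;
	return NULL;
}

/* Release a worker that was successfully created. */
void destroy_worker(struct worker *w)
{
	assert(w != NULL);
	free(w);
	live_workers--;
}
```

In this sketch a failed, non-retried creation leaves `live_workers` unchanged, which is the property the leak report above shows was violated.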
    • J
      io-wq: fix silly logic error in io_task_work_match() · 3b33e3f4
      Jens Axboe authored
      We check for the func with an OR condition, which means it always ends
      up being false and we never match the task_work we want to cancel. In
      the unexpected case that we do exit with that pending, we can trigger
      a hang waiting for a worker to exit, but it was never created. syzbot
      reports that as such:
      
      INFO: task syz-executor687:8514 blocked for more than 143 seconds.
            Not tainted 5.14.0-syzkaller #0
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      task:syz-executor687 state:D stack:27296 pid: 8514 ppid:  8479 flags:0x00024004
      Call Trace:
       context_switch kernel/sched/core.c:4940 [inline]
       __schedule+0x940/0x26f0 kernel/sched/core.c:6287
       schedule+0xd3/0x270 kernel/sched/core.c:6366
       schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1857
       do_wait_for_common kernel/sched/completion.c:85 [inline]
       __wait_for_common kernel/sched/completion.c:106 [inline]
       wait_for_common kernel/sched/completion.c:117 [inline]
       wait_for_completion+0x176/0x280 kernel/sched/completion.c:138
       io_wq_exit_workers fs/io-wq.c:1162 [inline]
       io_wq_put_and_exit+0x40c/0xc70 fs/io-wq.c:1197
       io_uring_clean_tctx fs/io_uring.c:9607 [inline]
       io_uring_cancel_generic+0x5fe/0x740 fs/io_uring.c:9687
       io_uring_files_cancel include/linux/io_uring.h:16 [inline]
       do_exit+0x265/0x2a30 kernel/exit.c:780
       do_group_exit+0x125/0x310 kernel/exit.c:922
       get_signal+0x47f/0x2160 kernel/signal.c:2868
       arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:865
       handle_signal_work kernel/entry/common.c:148 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
       exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:209
       __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
       syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:302
       do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x445cd9
      RSP: 002b:00007fc657f4b308 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
      RAX: 0000000000000001 RBX: 00000000004cb448 RCX: 0000000000445cd9
      RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00000000004cb44c
      RBP: 00000000004cb440 R08: 000000000000000e R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 000000000049b154
      R13: 0000000000000003 R14: 00007fc657f4b400 R15: 0000000000022000
      
      While in there, also decrement acct->nr_workers. This isn't strictly
      needed as we're exiting, but let's make sure the accounting matches up.
      
      Fixes: 3146cba9 ("io-wq: make worker creation resilient against signals")
      Reported-by: syzbot+f62d3e0a4ea4f38f5326@syzkaller.appspotmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3b33e3f4
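The logic error described in this commit message is easy to reproduce in isolation. The following is a minimal sketch under stated assumptions: `create_cb`, `create_cont_cb`, and `unrelated_cb` are hypothetical callbacks standing in for the task_work handlers, and `match_buggy()` / `match_fixed()` are illustrative names, not the kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*work_fn)(void);

/* Hypothetical callbacks standing in for the two worker-creation handlers. */
void create_cb(void) {}
void create_cont_cb(void) {}
void unrelated_cb(void) {}

/*
 * Buggy variant: func can never equal both callbacks at once, so at
 * least one "!=" is always true and the function always returns false.
 */
bool match_buggy(work_fn func)
{
	assert(func != NULL);
	if (func != create_cb || func != create_cont_cb)
		return false;
	return true;
}

/* Fixed variant: reject only when func is neither of the two callbacks. */
bool match_fixed(work_fn func)
{
	assert(func != NULL);
	if (func != create_cb && func != create_cont_cb)
		return false;
	return true;
}
```

With the OR condition, even the callbacks that should match are rejected, so the pending creation is never found and cancelled.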
  6. 08 Sep, 2021 1 commit
    • P
      io-wq: fix cancellation on create-worker failure · 713b9825
      Pavel Begunkov authored
      WARNING: CPU: 0 PID: 10392 at fs/io_uring.c:1151 req_ref_put_and_test
      fs/io_uring.c:1151 [inline]
      WARNING: CPU: 0 PID: 10392 at fs/io_uring.c:1151 req_ref_put_and_test
      fs/io_uring.c:1146 [inline]
      WARNING: CPU: 0 PID: 10392 at fs/io_uring.c:1151
      io_req_complete_post+0xf5b/0x1190 fs/io_uring.c:1794
      Modules linked in:
      Call Trace:
       tctx_task_work+0x1e5/0x570 fs/io_uring.c:2158
       task_work_run+0xe0/0x1a0 kernel/task_work.c:164
       tracehook_notify_signal include/linux/tracehook.h:212 [inline]
       handle_signal_work kernel/entry/common.c:146 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
       exit_to_user_mode_prepare+0x232/0x2a0 kernel/entry/common.c:209
       __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
       syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:302
       do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      When io_wqe_enqueue() -> io_wqe_create_worker() fails, we can't just
      call io_run_cancel() to clean up the request: it's already enqueued via
      io_wqe_insert_work() and will be executed either by some other worker or
      during cancellation (e.g. in io_wq_put_and_exit()).
      Reported-by: Hao Sun <sunhao.th@gmail.com>
      Fixes: 3146cba9 ("io-wq: make worker creation resilient against signals")
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/93b9de0fcf657affab0acfd675d4abcd273ee863.1631092071.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      713b9825
  7. 03 Sep, 2021 2 commits
    • J
      io-wq: make worker creation resilient against signals · 3146cba9
      Jens Axboe authored
      If a task is queueing async work and also handling signals, then we can
      run into the case where create_io_thread() is interrupted and returns
      failure because of that. If this happens for creating the first worker
      in a group, then that worker will never get created and we can hang the
      ring.
      
      If we do get a fork failure, retry from task_work. With signals we have
      to be a bit careful as we cannot simply queue as task_work, as we'll
      still have signals pending at that point. Punt over a normal workqueue
      first and then create from task_work after that.
      
      Lastly, ensure that we handle fatal worker creation failures. Worker
      creation failures are normally not fatal; only if we fail to create one
      in an empty worker group can we not make progress. Right now that is
      ignored. Ensure that we handle it and run cancel on the work item.
      
      There are two paths that create new workers - one is the "existing worker
      going to sleep", and the other is "no workers found for this work, create
      one". The former is never fatal, as workers do exist in the group. Only
      the latter needs to be carefully handled.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3146cba9
    • J
      io-wq: get rid of FIXED worker flag · 05c5f4ee
      Jens Axboe authored
      It makes the logic easier to follow if we just get rid of the fixed worker
      flag, and simply ensure that we never exit the last worker in the group.
      This also means that no particular worker is special.
      
      Just track the last timeout state, and if we have hit it and no work
      is pending, check if there are other workers. If yes, then we can exit
      this one safely.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      05c5f4ee
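The exit rule described in this commit message (exit only after an idle timeout, with no work pending, and only if another worker remains in the group) can be written as a small predicate. This is a sketch under assumptions: `struct worker_group` and `worker_may_exit()` are hypothetical names, and the kernel's actual bookkeeping is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-group accounting. */
struct worker_group {
	int nr_workers;	/* workers currently in the group, including the caller */
};

/*
 * Decide whether an idle worker may exit. last_timeout records that the
 * previous idle wait timed out. A worker exits only when it has timed
 * out, nothing is pending, and at least one other worker remains, so
 * the last worker in the group never exits.
 */
bool worker_may_exit(const struct worker_group *grp, bool last_timeout,
		     bool work_pending)
{
	assert(grp->nr_workers >= 1);	/* the caller itself is a member */

	if (!last_timeout || work_pending)
		return false;
	return grp->nr_workers > 1;
}
```

Because the decision depends only on group state, no individual worker needs a special "fixed" flag.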
  8. 02 Sep, 2021 2 commits
    • J
      io-wq: only exit on fatal signals · 15e20db2
      Jens Axboe authored
      If the application uses io_uring and also relies heavily on signals
      for communication, that can cause io-wq workers to spuriously exit
      just because the parent has a signal pending. Just ignore signals
      unless they are fatal.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      15e20db2
    • J
      io-wq: split bounded and unbounded work into separate lists · f95dc207
      Jens Axboe authored
      We've got a few issues that all boil down to the fact that we have one
      list of pending work items, yet two different types of workers to
      serve them. This causes some oddities around workers switching type and
      even hashed work vs regular work on the same bounded list.
      
      Just separate them out cleanly, similarly to how we already do
      accounting of what is running. That provides a clean separation and
      removes some corner cases that can cause stalls when handling IO
      that is punted to io-wq.
      
      Fixes: ecc53c48 ("io-wq: check max_worker limits if a worker transitions bound state")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      f95dc207
  9. 01 Sep, 2021 3 commits
    • J
      io-wq: fix queue stalling race · 0242f642
      Jens Axboe authored
      We need to set the stalled bit early, before we drop the lock for adding
      us to the stall hash queue. If not, then we can race with new work being
      queued between adding us to the stall hash and io_worker_handle_work()
      marking us stalled.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      0242f642
    • J
      io-wq: ensure that hash wait lock is IRQ disabling · 08bdbd39
      Jens Axboe authored
      A previous commit removed the IRQ safety of the worker and wqe locks,
      but that left one spot of the hash wait lock now being done without
      already having IRQs disabled.
      
      Ensure that we use the right locking variant for the hashed waitqueue
      lock.
      
      Fixes: a9a4aa9f ("io-wq: wqe and worker locks no longer need to be IRQ safe")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      08bdbd39
    • J
      io-wq: fix race between adding work and activating a free worker · 94ffb0a2
      Jens Axboe authored
      The attempt to find and activate a free worker for new work is currently
      combined with creating a new one if we don't find one, but that opens
      io-wq up to a race where the worker that is found and activated can
      put itself to sleep without knowing that it has been selected to perform
      this new work.
      
      Fix this by moving the activation into where we add the new work item,
      then we can retain it within the wqe->lock scope and eliminate the race
      with the worker itself checking inside the lock, but sleeping outside of
      it.
      
      Cc: stable@vger.kernel.org
      Reported-by: Andres Freund <andres@anarazel.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      94ffb0a2
  10. 30 Aug, 2021 3 commits
    • J
      io-wq: fix wakeup race when adding new work · 87df7fb9
      Jens Axboe authored
      When new work is added, io_wqe_enqueue() checks if we need to wake or
      create a new worker. But that check is done outside the lock that
      otherwise synchronizes us with a worker going to sleep, so we can end
      up in the following situation:
      
      CPU0				CPU1
      lock
      insert work
      unlock
      atomic_read(nr_running) != 0
      				lock
      				atomic_dec(nr_running)
      no wakeup needed
      
      Hold the wqe lock around the "need to wakeup" check. Then we can also get
      rid of the temporary work_flags variable, as we know the work will remain
      valid as long as we hold the lock.
      
      Cc: stable@vger.kernel.org
      Reported-by: Andres Freund <andres@anarazel.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      87df7fb9
    • J
      io-wq: wqe and worker locks no longer need to be IRQ safe · a9a4aa9f
      Jens Axboe authored
      io_uring no longer queues async work off completion handlers that run in
      hard or soft interrupt context, and that use case was the only reason that
      io-wq had to use IRQ safe locks for wqe and worker locks.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      a9a4aa9f
    • J
      io-wq: check max_worker limits if a worker transitions bound state · ecc53c48
      Jens Axboe authored
      For the two places where new workers are created, we diligently check if
      we are allowed to create a new worker. If we're currently at the limit
      of how many workers of a given type we can have, then we don't create
      any new ones.
      
      If you have a mixed workload with various types of bound and unbounded
      work, then it can happen that a worker finishes one type of work and
      is then transitioned to the other type. For this case, we don't check
      if we are actually allowed to do so. This can cause io-wq to temporarily
      exceed the allowed number of workers for a given type.
      
      When retrieving work, check that the types match. If they don't, check
      if we are allowed to transition to the other type. If not, then don't
      handle the new work.
      
      Cc: stable@vger.kernel.org
      Reported-by: Johannes Lundberg <johalun0@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ecc53c48
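The transition check described in this commit message can be modeled as follows. This is a sketch under assumptions, not the kernel implementation: `struct pool`, `enum acct_type`, and `worker_may_take()` are hypothetical names, and moving the accounting inside the same function is a simplification made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical work/worker categories. */
enum acct_type { ACCT_BOUND = 0, ACCT_UNBOUND = 1, ACCT_NR };

struct acct {
	int nr_workers;		/* workers currently of this type */
	int max_workers;	/* limit for this type */
};

struct pool {
	struct acct acct[ACCT_NR];
};

/*
 * A worker currently of type *cur wants to handle work of type want.
 * If the types match, nothing changes. Otherwise the worker may only
 * transition when the target type is below its limit; if not, the work
 * is left for another worker.
 */
bool worker_may_take(struct pool *p, enum acct_type *cur, enum acct_type want)
{
	if (*cur == want)
		return true;

	if (p->acct[want].nr_workers >= p->acct[want].max_workers)
		return false;	/* transition would exceed the limit */

	assert(p->acct[*cur].nr_workers > 0);
	p->acct[*cur].nr_workers--;
	p->acct[want].nr_workers++;
	*cur = want;
	return true;
}
```

Refusing the transition keeps each type's worker count at or below its limit even under a mixed workload.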
  11. 29 Aug, 2021 1 commit
    • J
      io-wq: provide a way to limit max number of workers · 2e480058
      Jens Axboe authored
      io-wq divides work into two categories:
      
      1) Work that completes in a bounded time, like reading from a regular file
         or a block device. This type of work is limited based on the size of
         the SQ ring.
      
      2) Work that may never complete, we call this unbounded work. The number
         of workers here is just limited by RLIMIT_NPROC.
      
      For various use cases, it's handy to have the kernel limit the maximum
      number of pending workers for both categories. Provide a way to do this
      with a new IORING_REGISTER_IOWQ_MAX_WORKERS operation.
      
      IORING_REGISTER_IOWQ_MAX_WORKERS takes an array of two integers and sets
      the max worker count to what is being passed in for each category. The
      old values are returned into that same array. If 0 is being passed in for
      either category, it simply returns the current value.
      
      The value is capped at RLIMIT_NPROC. This actually isn't that important
      as it's more of a hint, if we're exceeding the value then our attempt
      to fork a new worker will fail. This happens naturally already if more
      than one node is in the system, as these values are per-node internally
      for io-wq.
      Reported-by: Johannes Lundberg <johalun0@gmail.com>
      Link: https://github.com/axboe/liburing/issues/420
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      2e480058
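The in/out semantics described for IORING_REGISTER_IOWQ_MAX_WORKERS (two integers in, old values written back, 0 meaning "query only", values capped at a limit) can be illustrated with a user-space model. This is only a sketch of the described behavior, not the kernel code or a real API: `struct wq_limits`, `set_max_workers()`, the `nproc_limit` parameter, and the index order (0 for bounded, 1 for unbounded) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the two per-category limits. */
struct wq_limits {
	unsigned int max_workers[2];	/* assumed order: [0] bounded, [1] unbounded */
};

/*
 * For each category: a non-zero entry in values sets the new maximum
 * (capped at nproc_limit, standing in for RLIMIT_NPROC); a zero entry
 * leaves the maximum unchanged. In both cases the previous maximum is
 * written back into the same array slot.
 */
void set_max_workers(struct wq_limits *wq, unsigned int values[2],
		     unsigned int nproc_limit)
{
	assert(nproc_limit > 0);

	for (size_t i = 0; i < 2; i++) {
		unsigned int old = wq->max_workers[i];
		unsigned int want = values[i];

		if (want > nproc_limit)
			want = nproc_limit;
		if (want != 0)
			wq->max_workers[i] = want;
		values[i] = old;
	}
}
```

Passing `{ 0, 0 }` in this model acts as a pure query: both limits stay as they are and the array comes back holding the current values.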
  12. 24 Aug, 2021 2 commits
    • H
      io-wq: move nr_running and worker_refs out of wqe->lock protection · 79dca184
      Hao Xu authored
      We don't need to protect nr_running and worker_refs by wqe->lock, so
      narrow the range of raw_spin_lock_irq - raw_spin_unlock_irq
      Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
      Link: https://lore.kernel.org/r/20210810125554.99229-1-haoxu@linux.alibaba.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      79dca184
    • J
      io-wq: remove GFP_ATOMIC allocation off schedule out path · d3e9f732
      Jens Axboe authored
      Daniel reports that the v5.14-rc4-rt4 kernel throws a BUG when running
      stress-ng:
      
      | [   90.202543] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:35
      | [   90.202549] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 2047, name: iou-wrk-2041
      | [   90.202555] CPU: 5 PID: 2047 Comm: iou-wrk-2041 Tainted: G        W         5.14.0-rc4-rt4+ #89
      | [   90.202559] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
      | [   90.202561] Call Trace:
      | [   90.202577]  dump_stack_lvl+0x34/0x44
      | [   90.202584]  ___might_sleep.cold+0x87/0x94
      | [   90.202588]  rt_spin_lock+0x19/0x70
      | [   90.202593]  ___slab_alloc+0xcb/0x7d0
      | [   90.202598]  ? newidle_balance.constprop.0+0xf5/0x3b0
      | [   90.202603]  ? dequeue_entity+0xc3/0x290
      | [   90.202605]  ? io_wqe_dec_running.isra.0+0x98/0xe0
      | [   90.202610]  ? pick_next_task_fair+0xb9/0x330
      | [   90.202612]  ? __schedule+0x670/0x1410
      | [   90.202615]  ? io_wqe_dec_running.isra.0+0x98/0xe0
      | [   90.202618]  kmem_cache_alloc_trace+0x79/0x1f0
      | [   90.202621]  io_wqe_dec_running.isra.0+0x98/0xe0
      | [   90.202625]  io_wq_worker_sleeping+0x37/0x50
      | [   90.202628]  schedule+0x30/0xd0
      | [   90.202630]  schedule_timeout+0x8f/0x1a0
      | [   90.202634]  ? __bpf_trace_tick_stop+0x10/0x10
      | [   90.202637]  io_wqe_worker+0xfd/0x320
      | [   90.202641]  ? finish_task_switch.isra.0+0xd3/0x290
      | [   90.202644]  ? io_worker_handle_work+0x670/0x670
      | [   90.202646]  ? io_worker_handle_work+0x670/0x670
      | [   90.202649]  ret_from_fork+0x22/0x30
      
      which is due to the RT kernel not liking a GFP_ATOMIC allocation inside
      a raw spinlock. Besides that not working on RT, doing any kind of
      allocation from inside schedule() is kind of nasty and should be avoided
      if at all possible.
      
      This particular path happens when an io-wq worker goes to sleep, and we
      need a new worker to handle pending work. We currently allocate a small
      data item to hold the information we need to create a new worker, but we
      can instead include this data in the io_worker struct itself and just
      protect it with a single bit lock. We only really need one per worker
      anyway, as we will have run pending work between two sleep cycles.
      
      https://lore.kernel.org/lkml/20210804082418.fbibprcwtzyt5qax@beryllium.lan/
      Reported-by: Daniel Wagner <dwagner@suse.de>
      Tested-by: Daniel Wagner <dwagner@suse.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      d3e9f732
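The approach described in this commit message, embedding the creation data in the worker itself and guarding it with a single bit lock instead of allocating, can be sketched with C11 atomics. This is a user-space approximation under assumptions: `struct worker`, `CREATE_PENDING_BIT`, `queue_create()`, and `complete_create()` are hypothetical names, and `atomic_fetch_or` / `atomic_fetch_and` stand in for the kernel's bit-lock primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define CREATE_PENDING_BIT 0x1UL	/* hypothetical flag: slot is in use */

/* Hypothetical worker with the "create a new worker" request embedded. */
struct worker {
	atomic_ulong create_state;	/* bit 0 set while a request is queued */
	int create_index;		/* payload stored in place, no allocation */
};

void worker_init(struct worker *w)
{
	atomic_init(&w->create_state, 0UL);
	w->create_index = 0;
}

/* Try to claim the embedded slot; false if a request is already in flight. */
bool queue_create(struct worker *w, int index)
{
	unsigned long old = atomic_fetch_or_explicit(&w->create_state,
						     CREATE_PENDING_BIT,
						     memory_order_acquire);
	if (old & CREATE_PENDING_BIT)
		return false;
	w->create_index = index;
	return true;
}

/* Called once the request has been handled; releases the slot. */
int complete_create(struct worker *w)
{
	int index = w->create_index;
	unsigned long old = atomic_fetch_and_explicit(&w->create_state,
						      ~CREATE_PENDING_BIT,
						      memory_order_release);
	assert(old & CREATE_PENDING_BIT);	/* slot must have been held */
	(void)old;
	return index;
}
```

Since claiming the slot never allocates or sleeps, a path like this can be taken from a context where allocation is undesirable; a second request while one is pending is simply rejected.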
  13. 10 Aug, 2021 2 commits
  14. 06 Aug, 2021 2 commits
    • H
      io-wq: fix lack of acct->nr_workers < acct->max_workers judgement · 21698274
      Hao Xu authored
      This check should be performed before we create an io-worker.
      
      Fixes: 685fe7fe ("io-wq: eliminate the need for a manager thread")
      Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      21698274
    • H
      io-wq: fix no lock protection of acct->nr_worker · 3d4e4fac
      Hao Xu authored
      acct->nr_worker is accessed without lock protection. Consider the
      case where two callers call io_wqe_wake_worker() in parallel on two
      CPUs, one being the original context and the other an io-worker (by
      calling io_wqe_enqueue(wqe, linked)); this may cause nr_worker to be
      larger than max_worker.
      Let's fix it by adding a lock for it, and let's do nr_workers++ before
      create_io_worker. There may be an edge case where the first caller fails
      to create an io-worker, but the second caller doesn't know it and then
      quits creating an io-worker as well:
      
      say nr_worker = max_worker - 1
              cpu 0                        cpu 1
         io_wqe_wake_worker()          io_wqe_wake_worker()
            nr_worker < max_worker
            nr_worker++
            create_io_worker()         nr_worker == max_worker
               failed                  return
            return
      
      But the chance of this case is very slim.
      
      Fixes: 685fe7fe ("io-wq: eliminate the need for a manager thread")
      Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
      [axboe: fix unconditional create_io_worker() call]
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3d4e4fac
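The "check and increment under a lock, then create" ordering from this commit message can be sketched in user-space C. This is a model under assumptions: `struct acct` here is a hypothetical simplification, the spinlock is a minimal C11 atomic stand-in for the kernel lock, `create_ok()` / `create_fail()` are stub creators, and rolling the count back on failure is an assumption of this sketch rather than something the message states.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical accounting with a minimal spinlock standing in for the kernel lock. */
struct acct {
	atomic_bool locked;
	int nr_workers;
	int max_workers;
};

void acct_init(struct acct *a, int max_workers)
{
	atomic_init(&a->locked, false);
	a->nr_workers = 0;
	a->max_workers = max_workers;
}

static void acct_lock(struct acct *a)
{
	while (atomic_exchange_explicit(&a->locked, true, memory_order_acquire))
		;	/* spin until the lock is free */
}

static void acct_unlock(struct acct *a)
{
	atomic_store_explicit(&a->locked, false, memory_order_release);
}

/* Stub creators (hypothetical): return 0 on success, non-zero on failure. */
int create_ok(void) { return 0; }
int create_fail(void) { return -1; }

/*
 * Reserve a slot under the lock before creating, so two concurrent
 * callers cannot both pass the limit check. On creation failure the
 * reservation is rolled back (an assumption of this sketch).
 */
bool try_create_worker(struct acct *a, int (*create_fn)(void))
{
	bool reserved = false;

	acct_lock(a);
	if (a->nr_workers < a->max_workers) {
		a->nr_workers++;
		reserved = true;
	}
	acct_unlock(a);

	if (!reserved)
		return false;

	if (create_fn() != 0) {
		acct_lock(a);
		assert(a->nr_workers > 0);
		a->nr_workers--;
		acct_unlock(a);
		return false;
	}
	return true;
}
```

The edge case from the message is visible here: between a failed caller's increment and its rollback, a second caller can observe the limit as reached and give up.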
  15. 05 Aug, 2021 1 commit
  16. 24 Jul, 2021 1 commit
    • J
      io_uring: explicitly catch any illegal async queue attempt · 991468dc
      Jens Axboe authored
      Catch an illegal case to queue async from an unrelated task that got
      the ring fd passed to it. This should not be possible to hit, but
      better be proactive and catch it explicitly. io-wq is extended to
      check for early IO_WQ_WORK_CANCEL being set on a work item as well,
      so it can run the request through the normal cancelation path.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      991468dc
  17. 18 Jun, 2021 3 commits
  18. 16 Jun, 2021 2 commits
  19. 14 Jun, 2021 4 commits
  20. 26 May, 2021 2 commits
    • Z
      io-wq: Fix UAF when wakeup wqe in hash waitqueue · 3743c172
      Zqiang authored
      BUG: KASAN: use-after-free in __wake_up_common+0x637/0x650
      Read of size 8 at addr ffff8880304250d8 by task iou-wrk-28796/28802
      
      Call Trace:
       __dump_stack [inline]
       dump_stack+0x141/0x1d7
       print_address_description.constprop.0.cold+0x5b/0x2c6
       __kasan_report [inline]
       kasan_report.cold+0x7c/0xd8
       __wake_up_common+0x637/0x650
       __wake_up_common_lock+0xd0/0x130
       io_worker_handle_work+0x9dd/0x1790
       io_wqe_worker+0xb2a/0xd40
       ret_from_fork+0x1f/0x30
      
      Allocated by task 28798:
       kzalloc_node [inline]
       io_wq_create+0x3c4/0xdd0
       io_init_wq_offload [inline]
       io_uring_alloc_task_context+0x1bf/0x6b0
       __io_uring_add_task_file+0x29a/0x3c0
       io_uring_add_task_file [inline]
       io_uring_install_fd [inline]
       io_uring_create [inline]
       io_uring_setup+0x209a/0x2bd0
       do_syscall_64+0x3a/0xb0
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Freed by task 28798:
       kfree+0x106/0x2c0
       io_wq_destroy+0x182/0x380
       io_wq_put [inline]
       io_wq_put_and_exit+0x7a/0xa0
       io_uring_clean_tctx [inline]
       __io_uring_cancel+0x428/0x530
       io_uring_files_cancel
       do_exit+0x299/0x2a60
       do_group_exit+0x125/0x310
       get_signal+0x47f/0x2150
       arch_do_signal_or_restart+0x2a8/0x1eb0
       handle_signal_work [inline]
       exit_to_user_mode_loop [inline]
       exit_to_user_mode_prepare+0x171/0x280
       __syscall_exit_to_user_mode_work [inline]
       syscall_exit_to_user_mode+0x19/0x60
       do_syscall_64+0x47/0xb0
       entry_SYSCALL_64_after_hwframe
      
      Consider the following scenario, where the hash waitqueue is shared by
      io-wq1 and io-wq2. (note: wqe is worker)
      
      io-wq1:worker2     | locks bit1
      io-wq2:worker1     | waits bit1
      io-wq1:worker3     | waits bit1
      
      io-wq1:worker2     | completes all wqe bit1 work items
      io-wq1:worker2     | drop bit1, exit
      
      io-wq2:worker1     | locks bit1
      io-wq1:worker3     | cannot lock bit1, waits bit1 and exits
      io-wq1             | exits and frees io-wq1
      io-wq2:worker1     | drops bit1
      io-wq1:worker3     | is woken up, even though wqe is freed
      
      After all iou-wrk threads belonging to io-wq1 have exited, remove the
      wqe from the hash waitqueue; this guarantees that no wqe belonging to
      io-wq1 remains in the hash waitqueue.
      
      Reported-by: syzbot+6cb11ade52aa17095297@syzkaller.appspotmail.com
      Signed-off-by: Zqiang <qiang.zhang@windriver.com>
      Link: https://lore.kernel.org/r/20210526050826.30500-1-qiang.zhang@windriver.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3743c172
    • P
      io_uring/io-wq: close io-wq full-stop gap · 17a91051
      Pavel Begunkov authored
      There is an old problem with io-wq cancellation where requests should be
      killed and are in io-wq but are not discoverable, e.g. in @next_hashed
      or @linked vars of io_worker_handle_work(). It adds some unreliability
      to individual request cancellation, but also may potentially get
      __io_uring_cancel() stuck. For instance:
      
      1) An __io_uring_cancel()'s cancellation round has not found any
         request but there are some as described.
      2) __io_uring_cancel() goes to sleep
      3) Then workers wake up and try to execute those hidden requests
         that happen to be unbound.
      
      As we already cancel all requests of io-wq there, set IO_WQ_BIT_EXIT
      in advance, so preventing 3) from executing unbound requests. The
      workers will initially break looping because of getting a signal as they
      are threads of the dying/exec()'ing user task.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/abfcf8c54cb9e8f7bfbad7e9a0cc5433cc70bdc2.1621781238.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      17a91051
  21. 21 Apr, 2021 1 commit
  22. 12 Apr, 2021 1 commit