1. 06 8月, 2021 2 次提交
    • H
      io-wq: fix lack of acct->nr_workers < acct->max_workers judgement · 21698274
      Hao Xu 提交于
      There should be this judgement before we create an io-worker
      
      Fixes: 685fe7fe ("io-wq: eliminate the need for a manager thread")
      Signed-off-by: NHao Xu <haoxu@linux.alibaba.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      21698274
    • H
      io-wq: fix no lock protection of acct->nr_worker · 3d4e4fac
      Hao Xu 提交于
      There is an acct->nr_worker visit without lock protection. Think about
      the case: two callers call io_wqe_wake_worker(), one is the original
      context and the other one is an io-worker(by calling
      io_wqe_enqueue(wqe, linked)), on two cpus paralelly, this may cause
      nr_worker to be larger than max_worker.
      Let's fix it by adding lock for it, and let's do nr_workers++ before
      create_io_worker. There may be a edge cause that the first caller fails
      to create an io-worker, but the second caller doesn't know it and then
      quit creating io-worker as well:
      
      say nr_worker = max_worker - 1
              cpu 0                        cpu 1
         io_wqe_wake_worker()          io_wqe_wake_worker()
            nr_worker < max_worker
            nr_worker++
            create_io_worker()         nr_worker == max_worker
               failed                  return
            return
      
      But the chance of this case is very slim.
      
      Fixes: 685fe7fe ("io-wq: eliminate the need for a manager thread")
      Signed-off-by: NHao Xu <haoxu@linux.alibaba.com>
      [axboe: fix unconditional create_io_worker() call]
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      3d4e4fac
  2. 05 8月, 2021 1 次提交
  3. 24 7月, 2021 1 次提交
    • J
      io_uring: explicitly catch any illegal async queue attempt · 991468dc
      Jens Axboe 提交于
      Catch an illegal case to queue async from an unrelated task that got
      the ring fd passed to it. This should not be possible to hit, but
      better be proactive and catch it explicitly. io-wq is extended to
      check for early IO_WQ_WORK_CANCEL being set on a work item as well,
      so it can run the request through the normal cancelation path.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      991468dc
  4. 18 6月, 2021 3 次提交
  5. 16 6月, 2021 2 次提交
  6. 14 6月, 2021 4 次提交
  7. 26 5月, 2021 2 次提交
    • Z
      io-wq: Fix UAF when wakeup wqe in hash waitqueue · 3743c172
      Zqiang 提交于
      BUG: KASAN: use-after-free in __wake_up_common+0x637/0x650
      Read of size 8 at addr ffff8880304250d8 by task iou-wrk-28796/28802
      
      Call Trace:
       __dump_stack [inline]
       dump_stack+0x141/0x1d7
       print_address_description.constprop.0.cold+0x5b/0x2c6
       __kasan_report [inline]
       kasan_report.cold+0x7c/0xd8
       __wake_up_common+0x637/0x650
       __wake_up_common_lock+0xd0/0x130
       io_worker_handle_work+0x9dd/0x1790
       io_wqe_worker+0xb2a/0xd40
       ret_from_fork+0x1f/0x30
      
      Allocated by task 28798:
       kzalloc_node [inline]
       io_wq_create+0x3c4/0xdd0
       io_init_wq_offload [inline]
       io_uring_alloc_task_context+0x1bf/0x6b0
       __io_uring_add_task_file+0x29a/0x3c0
       io_uring_add_task_file [inline]
       io_uring_install_fd [inline]
       io_uring_create [inline]
       io_uring_setup+0x209a/0x2bd0
       do_syscall_64+0x3a/0xb0
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Freed by task 28798:
       kfree+0x106/0x2c0
       io_wq_destroy+0x182/0x380
       io_wq_put [inline]
       io_wq_put_and_exit+0x7a/0xa0
       io_uring_clean_tctx [inline]
       __io_uring_cancel+0x428/0x530
       io_uring_files_cancel
       do_exit+0x299/0x2a60
       do_group_exit+0x125/0x310
       get_signal+0x47f/0x2150
       arch_do_signal_or_restart+0x2a8/0x1eb0
       handle_signal_work[inline]
       exit_to_user_mode_loop [inline]
       exit_to_user_mode_prepare+0x171/0x280
       __syscall_exit_to_user_mode_work [inline]
       syscall_exit_to_user_mode+0x19/0x60
       do_syscall_64+0x47/0xb0
       entry_SYSCALL_64_after_hwframe
      
      There are the following scenarios, hash waitqueue is shared by
      io-wq1 and io-wq2. (note: wqe is worker)
      
      io-wq1:worker2     | locks bit1
      io-wq2:worker1     | waits bit1
      io-wq1:worker3     | waits bit1
      
      io-wq1:worker2     | completes all wqe bit1 work items
      io-wq1:worker2     | drop bit1, exit
      
      io-wq2:worker1     | locks bit1
      io-wq1:worker3     | can not locks bit1, waits bit1 and exit
      io-wq1             | exit and free io-wq1
      io-wq2:worker1     | drops bit1
      io-wq1:worker3     | be waked up, even though wqe is freed
      
      After all iou-wrk belonging to io-wq1 have exited, remove wqe
      form hash waitqueue, it is guaranteed that there will be no more
      wqe belonging to io-wq1 in the hash waitqueue.
      
      Reported-by: syzbot+6cb11ade52aa17095297@syzkaller.appspotmail.com
      Signed-off-by: NZqiang <qiang.zhang@windriver.com>
      Link: https://lore.kernel.org/r/20210526050826.30500-1-qiang.zhang@windriver.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
      3743c172
    • P
      io_uring/io-wq: close io-wq full-stop gap · 17a91051
      Pavel Begunkov 提交于
      There is an old problem with io-wq cancellation where requests should be
      killed and are in io-wq but are not discoverable, e.g. in @next_hashed
      or @linked vars of io_worker_handle_work(). It adds some unreliability
      to individual request canellation, but also may potentially get
      __io_uring_cancel() stuck. For instance:
      
      1) An __io_uring_cancel()'s cancellation round have not found any
         request but there are some as desribed.
      2) __io_uring_cancel() goes to sleep
      3) Then workers wake up and try to execute those hidden requests
         that happen to be unbound.
      
      As we already cancel all requests of io-wq there, set IO_WQ_BIT_EXIT
      in advance, so preventing 3) from executing unbound requests. The
      workers will initially break looping because of getting a signal as they
      are threads of the dying/exec()'ing user task.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/abfcf8c54cb9e8f7bfbad7e9a0cc5433cc70bdc2.1621781238.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
      17a91051
  8. 21 4月, 2021 1 次提交
  9. 12 4月, 2021 5 次提交
  10. 09 4月, 2021 1 次提交
    • P
      io-wq: cancel unbounded works on io-wq destroy · c60eb049
      Pavel Begunkov 提交于
      WARNING: CPU: 5 PID: 227 at fs/io_uring.c:8578 io_ring_exit_work+0xe6/0x470
      RIP: 0010:io_ring_exit_work+0xe6/0x470
      Call Trace:
       process_one_work+0x206/0x400
       worker_thread+0x4a/0x3d0
       kthread+0x129/0x170
       ret_from_fork+0x22/0x30
      
      INFO: task lfs-openat:2359 blocked for more than 245 seconds.
      task:lfs-openat      state:D stack:    0 pid: 2359 ppid:     1 flags:0x00000004
      Call Trace:
       ...
       wait_for_completion+0x8b/0xf0
       io_wq_destroy_manager+0x24/0x60
       io_wq_put_and_exit+0x18/0x30
       io_uring_clean_tctx+0x76/0xa0
       __io_uring_files_cancel+0x1b9/0x2e0
       do_exit+0xc0/0xb40
       ...
      
      Even after io-wq destroy has been issued io-wq worker threads will
      continue executing all left work items as usual, and may hang waiting
      for I/O that won't ever complete (aka unbounded).
      
      [<0>] pipe_read+0x306/0x450
      [<0>] io_iter_do_read+0x1e/0x40
      [<0>] io_read+0xd5/0x330
      [<0>] io_issue_sqe+0xd21/0x18a0
      [<0>] io_wq_submit_work+0x6c/0x140
      [<0>] io_worker_handle_work+0x17d/0x400
      [<0>] io_wqe_worker+0x2c0/0x330
      [<0>] ret_from_fork+0x22/0x30
      
      Cancel all unbounded I/O instead of executing them. This changes the
      user visible behaviour, but that's inevitable as io-wq is not per task.
      Suggested-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/cd4b543154154cba055cf86f351441c2174d7f71.1617842918.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
      c60eb049
  11. 01 4月, 2021 1 次提交
  12. 28 3月, 2021 1 次提交
  13. 26 3月, 2021 1 次提交
    • J
      io-wq: fix race around pending work on teardown · f5d2d23b
      Jens Axboe 提交于
      syzbot reports that it's triggering the warning condition on having
      pending work on shutdown:
      
      WARNING: CPU: 1 PID: 12346 at fs/io-wq.c:1061 io_wq_destroy fs/io-wq.c:1061 [inline]
      WARNING: CPU: 1 PID: 12346 at fs/io-wq.c:1061 io_wq_put+0x153/0x260 fs/io-wq.c:1072
      Modules linked in:
      CPU: 1 PID: 12346 Comm: syz-executor.5 Not tainted 5.12.0-rc2-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:io_wq_destroy fs/io-wq.c:1061 [inline]
      RIP: 0010:io_wq_put+0x153/0x260 fs/io-wq.c:1072
      Code: 8d e8 71 90 ea 01 49 89 c4 41 83 fc 40 7d 4f e8 33 4d 97 ff 42 80 7c 2d 00 00 0f 85 77 ff ff ff e9 7a ff ff ff e8 1d 4d 97 ff <0f> 0b eb b9 8d 6b ff 89 ee 09 de bf ff ff ff ff e8 18 51 97 ff 09
      RSP: 0018:ffffc90001ebfb08 EFLAGS: 00010293
      RAX: ffffffff81e16083 RBX: ffff888019038040 RCX: ffff88801e86b780
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000040
      RBP: 1ffff1100b2f8a80 R08: ffffffff81e15fce R09: ffffed100b2f8a82
      R10: ffffed100b2f8a82 R11: 0000000000000000 R12: 0000000000000000
      R13: dffffc0000000000 R14: ffff8880597c5400 R15: ffff888019038000
      FS:  00007f8dcd89c700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055e9a054e160 CR3: 000000001dfb8000 CR4: 00000000001506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       io_uring_clean_tctx+0x1b7/0x210 fs/io_uring.c:8802
       __io_uring_files_cancel+0x13c/0x170 fs/io_uring.c:8820
       io_uring_files_cancel include/linux/io_uring.h:47 [inline]
       do_exit+0x258/0x2340 kernel/exit.c:780
       do_group_exit+0x168/0x2d0 kernel/exit.c:922
       get_signal+0x1734/0x1ef0 kernel/signal.c:2773
       arch_do_signal_or_restart+0x3c/0x610 arch/x86/kernel/signal.c:811
       handle_signal_work kernel/entry/common.c:147 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
       exit_to_user_mode_prepare+0xac/0x1e0 kernel/entry/common.c:208
       __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
       syscall_exit_to_user_mode+0x48/0x180 kernel/entry/common.c:301
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x465f69
      
      which shouldn't happen, but seems to be possible due to a race on whether
      or not the io-wq manager sees a fatal signal first, or whether the io-wq
      workers do. If we race with queueing work and then send a fatal signal to
      the owning task, and the io-wq worker sees that before the manager sets
      IO_WQ_BIT_EXIT, then it's possible to have the worker exit and leave work
      behind.
      
      Just turn the WARN_ON_ONCE() into a cancelation condition instead.
      
      Reported-by: syzbot+77a738a6bc947bf639ca@syzkaller.appspotmail.com
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      f5d2d23b
  14. 22 3月, 2021 1 次提交
  15. 21 3月, 2021 1 次提交
  16. 13 3月, 2021 1 次提交
    • J
      io_uring: allow IO worker threads to be frozen · 16efa4fc
      Jens Axboe 提交于
      With the freezer using the proper signaling to notify us of when it's
      time to freeze a thread, we can re-enable normal freezer usage for the
      IO threads. Ensure that SQPOLL, io-wq, and the io-wq manager call
      try_to_freeze() appropriately, and remove the default setting of
      PF_NOFREEZE from create_io_thread().
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      16efa4fc
  17. 10 3月, 2021 3 次提交
  18. 08 3月, 2021 1 次提交
  19. 07 3月, 2021 1 次提交
    • J
      io-wq: fix race in freeing 'wq' and worker access · 886d0137
      Jens Axboe 提交于
      Ran into a use-after-free on the main io-wq struct, wq. It has a worker
      ref and completion event, but the manager itself isn't holding a
      reference. This can lead to a race where the manager thinks there are
      no workers and exits, but a worker is being added. That leads to the
      following trace:
      
      BUG: KASAN: use-after-free in io_wqe_worker+0x4c0/0x5e0
      Read of size 8 at addr ffff888108baa8a0 by task iou-wrk-3080422/3080425
      
      CPU: 5 PID: 3080425 Comm: iou-wrk-3080422 Not tainted 5.12.0-rc1+ #110
      Hardware name: Micro-Star International Co., Ltd. MS-7C60/TRX40 PRO 10G (MS-7C60), BIOS 1.60 05/13/2020
      Call Trace:
       dump_stack+0x90/0xbe
       print_address_description.constprop.0+0x67/0x28d
       ? io_wqe_worker+0x4c0/0x5e0
       kasan_report.cold+0x7b/0xd4
       ? io_wqe_worker+0x4c0/0x5e0
       __asan_load8+0x6d/0xa0
       io_wqe_worker+0x4c0/0x5e0
       ? io_worker_handle_work+0xc00/0xc00
       ? recalc_sigpending+0xe5/0x120
       ? io_worker_handle_work+0xc00/0xc00
       ? io_worker_handle_work+0xc00/0xc00
       ret_from_fork+0x1f/0x30
      
      Allocated by task 3080422:
       kasan_save_stack+0x23/0x60
       __kasan_kmalloc+0x80/0xa0
       kmem_cache_alloc_node_trace+0xa0/0x480
       io_wq_create+0x3b5/0x600
       io_uring_alloc_task_context+0x13c/0x380
       io_uring_add_task_file+0x109/0x140
       __x64_sys_io_uring_enter+0x45f/0x660
       do_syscall_64+0x32/0x80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Freed by task 3080422:
       kasan_save_stack+0x23/0x60
       kasan_set_track+0x20/0x40
       kasan_set_free_info+0x24/0x40
       __kasan_slab_free+0xe8/0x120
       kfree+0xa8/0x400
       io_wq_put+0x14a/0x220
       io_wq_put_and_exit+0x9a/0xc0
       io_uring_clean_tctx+0x101/0x140
       __io_uring_files_cancel+0x36e/0x3c0
       do_exit+0x169/0x1340
       __x64_sys_exit+0x34/0x40
       do_syscall_64+0x32/0x80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Have the manager itself hold a reference, and now both drop points drop
      and complete if we hit zero, and the manager can unconditionally do a
      wait_for_completion() instead of having a race between reading the ref
      count and waiting if it was non-zero.
      
      Fixes: fb3a1f6c ("io-wq: have manager wait for all workers to exit")
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      886d0137
  20. 05 3月, 2021 2 次提交
    • J
      io-wq: kill hashed waitqueue before manager exits · 09ca6c40
      Jens Axboe 提交于
      If we race with shutting down the io-wq context and someone queueing
      a hashed entry, then we can exit the manager with it armed. If it then
      triggers after the manager has exited, we can have a use-after-free where
      io_wqe_hash_wake() attempts to wake a now gone manager process.
      
      Move the killing of the hashed write queue into the manager itself, so
      that we know we've killed it before the task exits.
      
      Fixes: e941894e ("io-wq: make buffered file write hashed work map per-ctx")
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      09ca6c40
    • J
      io_uring: move to using create_io_thread() · 46fe18b1
      Jens Axboe 提交于
      This allows us to do task creation and setup without needing to use
      completions to try and synchronize with the starting thread. Get rid of
      the old io_wq_fork_thread() wrapper, and the 'wq' and 'worker' startup
      completion events - we can now do setup before the task is running.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      46fe18b1
  21. 04 3月, 2021 5 次提交
    • J
      io-wq: ensure all pending work is canceled on exit · f0127254
      Jens Axboe 提交于
      If we race on shutting down the io-wq, then we should ensure that any
      work that was queued after workers shutdown is canceled. Harden the
      add work check a bit too, checking for IO_WQ_BIT_EXIT and cancel if
      it's set.
      
      Add a WARN_ON() for having any work before we kill the io-wq context.
      
      Reported-by: syzbot+91b4b56ead187d35c9d3@syzkaller.appspotmail.com
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      f0127254
    • J
      io_uring: ensure that threads freeze on suspend · e4b4a13f
      Jens Axboe 提交于
      Alex reports that his system fails to suspend using 5.12-rc1, with the
      following dump:
      
      [  240.650300] PM: suspend entry (deep)
      [  240.650748] Filesystems sync: 0.000 seconds
      [  240.725605] Freezing user space processes ...
      [  260.739483] Freezing of tasks failed after 20.013 seconds (3 tasks refusing to freeze, wq_busy=0):
      [  260.739497] task:iou-mgr-446     state:S stack:    0 pid:  516 ppid:   439 flags:0x00004224
      [  260.739504] Call Trace:
      [  260.739507]  ? sysvec_apic_timer_interrupt+0xb/0x81
      [  260.739515]  ? pick_next_task_fair+0x197/0x1cde
      [  260.739519]  ? sysvec_reschedule_ipi+0x2f/0x6a
      [  260.739522]  ? asm_sysvec_reschedule_ipi+0x12/0x20
      [  260.739525]  ? __schedule+0x57/0x6d6
      [  260.739529]  ? del_timer_sync+0xb9/0x115
      [  260.739533]  ? schedule+0x63/0xd5
      [  260.739536]  ? schedule_timeout+0x219/0x356
      [  260.739540]  ? __next_timer_interrupt+0xf1/0xf1
      [  260.739544]  ? io_wq_manager+0x73/0xb1
      [  260.739549]  ? io_wq_create+0x262/0x262
      [  260.739553]  ? ret_from_fork+0x22/0x30
      [  260.739557] task:iou-mgr-517     state:S stack:    0 pid:  522 ppid:   439 flags:0x00004224
      [  260.739561] Call Trace:
      [  260.739563]  ? sysvec_apic_timer_interrupt+0xb/0x81
      [  260.739566]  ? pick_next_task_fair+0x16f/0x1cde
      [  260.739569]  ? sysvec_apic_timer_interrupt+0xb/0x81
      [  260.739571]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
      [  260.739574]  ? __schedule+0x5b7/0x6d6
      [  260.739578]  ? del_timer_sync+0x70/0x115
      [  260.739581]  ? schedule_timeout+0x211/0x356
      [  260.739585]  ? __next_timer_interrupt+0xf1/0xf1
      [  260.739588]  ? io_wq_check_workers+0x15/0x11f
      [  260.739592]  ? io_wq_manager+0x69/0xb1
      [  260.739596]  ? io_wq_create+0x262/0x262
      [  260.739600]  ? ret_from_fork+0x22/0x30
      [  260.739603] task:iou-wrk-517     state:S stack:    0 pid:  523 ppid:   439 flags:0x00004224
      [  260.739607] Call Trace:
      [  260.739609]  ? __schedule+0x5b7/0x6d6
      [  260.739614]  ? schedule+0x63/0xd5
      [  260.739617]  ? schedule_timeout+0x219/0x356
      [  260.739621]  ? __next_timer_interrupt+0xf1/0xf1
      [  260.739624]  ? task_thread.isra.0+0x148/0x3af
      [  260.739628]  ? task_thread_unbound+0xa/0xa
      [  260.739632]  ? task_thread_bound+0x7/0x7
      [  260.739636]  ? ret_from_fork+0x22/0x30
      [  260.739647] OOM killer enabled.
      [  260.739648] Restarting tasks ... done.
      [  260.740077] PM: suspend exit
      
      Play nice and ensure that any thread we create will call try_to_freeze()
      at an opportune time so that memory suspend can proceed. For the io-wq
      worker threads, mark them as PF_NOFREEZE. They could potentially be
      blocked for a long time.
      Reported-by: NAlex Xu (Hello71) <alex_y_xu@yahoo.ca>
      Tested-by: NAlex Xu (Hello71) <alex_y_xu@yahoo.ca>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      e4b4a13f
    • J
      io-wq: fix error path leak of buffered write hash map · dc7bbc9e
      Jens Axboe 提交于
      The 'err' path should include the hash put, we already grabbed a reference
      once we get that far.
      
      Fixes: e941894e ("io-wq: make buffered file write hashed work map per-ctx")
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      dc7bbc9e
    • J
      io_uring: move cred assignment into io_issue_sqe() · 5730b27e
      Jens Axboe 提交于
      If we move it in there, then we no longer have to care about it in io-wq.
      This means we can drop the cred handling in io-wq, and we can drop the
      REQ_F_WORK_INITIALIZED flag and async init functions as that was the last
      user of it since we moved to the new workers. Then we can also drop
      io_wq_work->creds, and just hold the personality u16 in there instead.
      Suggested-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      5730b27e
    • J
      io-wq: provide an io_wq_put_and_exit() helper · afcc4015
      Jens Axboe 提交于
      If we put the io-wq from io_uring, we really want it to exit. Provide
      a helper that does that for us. Couple that with not having the manager
      hold a reference to the 'wq' and the normal SQPOLL exit will tear down
      the io-wq context appropriate.
      
      On the io-wq side, our wq context is per task, so only the task itself
      is manipulating ->manager and hence it's safe to check and clear without
      any extra locking. We just need to ensure that the manager task stays
      around, in case it exits.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      afcc4015