1. 07 3月, 2021 1 次提交
    • J
      io-wq: fix race in freeing 'wq' and worker access · 886d0137
      Jens Axboe 提交于
      Ran into a use-after-free on the main io-wq struct, wq. It has a worker
      ref and completion event, but the manager itself isn't holding a
      reference. This can lead to a race where the manager thinks there are
      no workers and exits, but a worker is being added. That leads to the
      following trace:
      
      BUG: KASAN: use-after-free in io_wqe_worker+0x4c0/0x5e0
      Read of size 8 at addr ffff888108baa8a0 by task iou-wrk-3080422/3080425
      
      CPU: 5 PID: 3080425 Comm: iou-wrk-3080422 Not tainted 5.12.0-rc1+ #110
      Hardware name: Micro-Star International Co., Ltd. MS-7C60/TRX40 PRO 10G (MS-7C60), BIOS 1.60 05/13/2020
      Call Trace:
       dump_stack+0x90/0xbe
       print_address_description.constprop.0+0x67/0x28d
       ? io_wqe_worker+0x4c0/0x5e0
       kasan_report.cold+0x7b/0xd4
       ? io_wqe_worker+0x4c0/0x5e0
       __asan_load8+0x6d/0xa0
       io_wqe_worker+0x4c0/0x5e0
       ? io_worker_handle_work+0xc00/0xc00
       ? recalc_sigpending+0xe5/0x120
       ? io_worker_handle_work+0xc00/0xc00
       ? io_worker_handle_work+0xc00/0xc00
       ret_from_fork+0x1f/0x30
      
      Allocated by task 3080422:
       kasan_save_stack+0x23/0x60
       __kasan_kmalloc+0x80/0xa0
       kmem_cache_alloc_node_trace+0xa0/0x480
       io_wq_create+0x3b5/0x600
       io_uring_alloc_task_context+0x13c/0x380
       io_uring_add_task_file+0x109/0x140
       __x64_sys_io_uring_enter+0x45f/0x660
       do_syscall_64+0x32/0x80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Freed by task 3080422:
       kasan_save_stack+0x23/0x60
       kasan_set_track+0x20/0x40
       kasan_set_free_info+0x24/0x40
       __kasan_slab_free+0xe8/0x120
       kfree+0xa8/0x400
       io_wq_put+0x14a/0x220
       io_wq_put_and_exit+0x9a/0xc0
       io_uring_clean_tctx+0x101/0x140
       __io_uring_files_cancel+0x36e/0x3c0
       do_exit+0x169/0x1340
       __x64_sys_exit+0x34/0x40
       do_syscall_64+0x32/0x80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Have the manager itself hold a reference, and now both drop points drop
      and complete if we hit zero, and the manager can unconditionally do a
      wait_for_completion() instead of having a race between reading the ref
      count and waiting if it was non-zero.
      
      Fixes: fb3a1f6c ("io-wq: have manager wait for all workers to exit")
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      886d0137
  2. 05 3月, 2021 2 次提交
    • J
      io-wq: kill hashed waitqueue before manager exits · 09ca6c40
      Jens Axboe 提交于
      If we race with shutting down the io-wq context and someone queueing
      a hashed entry, then we can exit the manager with it armed. If it then
      triggers after the manager has exited, we can have a use-after-free where
      io_wqe_hash_wake() attempts to wake a now gone manager process.
      
      Move the killing of the hashed write queue into the manager itself, so
      that we know we've killed it before the task exits.
      
      Fixes: e941894e ("io-wq: make buffered file write hashed work map per-ctx")
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      09ca6c40
    • J
      io_uring: move to using create_io_thread() · 46fe18b1
      Jens Axboe 提交于
      This allows us to do task creation and setup without needing to use
      completions to try and synchronize with the starting thread. Get rid of
      the old io_wq_fork_thread() wrapper, and the 'wq' and 'worker' startup
      completion events - we can now do setup before the task is running.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      46fe18b1
  3. 04 3月, 2021 10 次提交
    • J
      io-wq: ensure all pending work is canceled on exit · f0127254
      Jens Axboe 提交于
      If we race on shutting down the io-wq, then we should ensure that any
      work that was queued after workers shutdown is canceled. Harden the
      add work check a bit too, checking for IO_WQ_BIT_EXIT and cancel if
      it's set.
      
      Add a WARN_ON() for having any work before we kill the io-wq context.
      
      Reported-by: syzbot+91b4b56ead187d35c9d3@syzkaller.appspotmail.com
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      f0127254
    • J
      io_uring: ensure that threads freeze on suspend · e4b4a13f
      Jens Axboe 提交于
      Alex reports that his system fails to suspend using 5.12-rc1, with the
      following dump:
      
      [  240.650300] PM: suspend entry (deep)
      [  240.650748] Filesystems sync: 0.000 seconds
      [  240.725605] Freezing user space processes ...
      [  260.739483] Freezing of tasks failed after 20.013 seconds (3 tasks refusing to freeze, wq_busy=0):
      [  260.739497] task:iou-mgr-446     state:S stack:    0 pid:  516 ppid:   439 flags:0x00004224
      [  260.739504] Call Trace:
      [  260.739507]  ? sysvec_apic_timer_interrupt+0xb/0x81
      [  260.739515]  ? pick_next_task_fair+0x197/0x1cde
      [  260.739519]  ? sysvec_reschedule_ipi+0x2f/0x6a
      [  260.739522]  ? asm_sysvec_reschedule_ipi+0x12/0x20
      [  260.739525]  ? __schedule+0x57/0x6d6
      [  260.739529]  ? del_timer_sync+0xb9/0x115
      [  260.739533]  ? schedule+0x63/0xd5
      [  260.739536]  ? schedule_timeout+0x219/0x356
      [  260.739540]  ? __next_timer_interrupt+0xf1/0xf1
      [  260.739544]  ? io_wq_manager+0x73/0xb1
      [  260.739549]  ? io_wq_create+0x262/0x262
      [  260.739553]  ? ret_from_fork+0x22/0x30
      [  260.739557] task:iou-mgr-517     state:S stack:    0 pid:  522 ppid:   439 flags:0x00004224
      [  260.739561] Call Trace:
      [  260.739563]  ? sysvec_apic_timer_interrupt+0xb/0x81
      [  260.739566]  ? pick_next_task_fair+0x16f/0x1cde
      [  260.739569]  ? sysvec_apic_timer_interrupt+0xb/0x81
      [  260.739571]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
      [  260.739574]  ? __schedule+0x5b7/0x6d6
      [  260.739578]  ? del_timer_sync+0x70/0x115
      [  260.739581]  ? schedule_timeout+0x211/0x356
      [  260.739585]  ? __next_timer_interrupt+0xf1/0xf1
      [  260.739588]  ? io_wq_check_workers+0x15/0x11f
      [  260.739592]  ? io_wq_manager+0x69/0xb1
      [  260.739596]  ? io_wq_create+0x262/0x262
      [  260.739600]  ? ret_from_fork+0x22/0x30
      [  260.739603] task:iou-wrk-517     state:S stack:    0 pid:  523 ppid:   439 flags:0x00004224
      [  260.739607] Call Trace:
      [  260.739609]  ? __schedule+0x5b7/0x6d6
      [  260.739614]  ? schedule+0x63/0xd5
      [  260.739617]  ? schedule_timeout+0x219/0x356
      [  260.739621]  ? __next_timer_interrupt+0xf1/0xf1
      [  260.739624]  ? task_thread.isra.0+0x148/0x3af
      [  260.739628]  ? task_thread_unbound+0xa/0xa
      [  260.739632]  ? task_thread_bound+0x7/0x7
      [  260.739636]  ? ret_from_fork+0x22/0x30
      [  260.739647] OOM killer enabled.
      [  260.739648] Restarting tasks ... done.
      [  260.740077] PM: suspend exit
      
      Play nice and ensure that any thread we create will call try_to_freeze()
      at an opportune time so that memory suspend can proceed. For the io-wq
      worker threads, mark them as PF_NOFREEZE. They could potentially be
      blocked for a long time.
      Reported-by: NAlex Xu (Hello71) <alex_y_xu@yahoo.ca>
      Tested-by: NAlex Xu (Hello71) <alex_y_xu@yahoo.ca>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      e4b4a13f
    • J
      io-wq: fix error path leak of buffered write hash map · dc7bbc9e
      Jens Axboe 提交于
      The 'err' path should include the hash put, we already grabbed a reference
      once we get that far.
      
      Fixes: e941894e ("io-wq: make buffered file write hashed work map per-ctx")
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      dc7bbc9e
    • J
      io_uring: move cred assignment into io_issue_sqe() · 5730b27e
      Jens Axboe 提交于
      If we move it in there, then we no longer have to care about it in io-wq.
      This means we can drop the cred handling in io-wq, and we can drop the
      REQ_F_WORK_INITIALIZED flag and async init functions as that was the last
      user of it since we moved to the new workers. Then we can also drop
      io_wq_work->creds, and just hold the personality u16 in there instead.
      Suggested-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      5730b27e
    • J
      io-wq: provide an io_wq_put_and_exit() helper · afcc4015
      Jens Axboe 提交于
      If we put the io-wq from io_uring, we really want it to exit. Provide
      a helper that does that for us. Couple that with not having the manager
      hold a reference to the 'wq' and the normal SQPOLL exit will tear down
      the io-wq context appropriate.
      
      On the io-wq side, our wq context is per task, so only the task itself
      is manipulating ->manager and hence it's safe to check and clear without
      any extra locking. We just need to ensure that the manager task stays
      around, in case it exits.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      afcc4015
    • J
      io-wq: fix double put of 'wq' in error path · 470ec4ed
      Jens Axboe 提交于
      We are already freeing the wq struct in both spots, so don't put it and
      get it freed twice.
      
      Reported-by: syzbot+7bf785eedca35ca05501@syzkaller.appspotmail.com
      Fixes: 4fb6ac32 ("io-wq: improve manager/worker handling over exec")
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      470ec4ed
    • J
      io-wq: wait for manager exit on wq destroy · d364d9e5
      Jens Axboe 提交于
      The manager waits for the workers, hence the manager is always valid if
      workers are running. Now also have wq destroy wait for the manager on
      exit, so we now everything is gone.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      d364d9e5
    • J
      io-wq: rename wq->done completion to wq->started · dbf99620
      Jens Axboe 提交于
      This is a leftover from a different use cases, it's used to wait for
      the manager to startup. Rename it as such.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      dbf99620
    • J
      io-wq: don't ask for a new worker if we're exiting · 613eeb60
      Jens Axboe 提交于
      If we're in the process of shutting down the async context, then don't
      create new workers if we already have at least the fixed one.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      613eeb60
    • J
      io-wq: have manager wait for all workers to exit · fb3a1f6c
      Jens Axboe 提交于
      Instead of having to wait separately on workers and manager, just have
      the manager wait on the workers. We use an atomic_t for the reference
      here, as we need to start at 0 and allow increment from that. Since the
      number of workers is naturally capped by the allowed nr of processes,
      and that uses an int, there is no risk of overflow.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      fb3a1f6c
  4. 02 3月, 2021 1 次提交
  5. 26 2月, 2021 3 次提交
    • J
      io-wq: remove now unused IO_WQ_BIT_ERROR · d6ce7f67
      Jens Axboe 提交于
      This flag is now dead, remove it.
      
      Fixes: 1cbd9c2b ("io-wq: don't create any IO workers upfront")
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      d6ce7f67
    • J
      io-wq: improve manager/worker handling over exec · 4fb6ac32
      Jens Axboe 提交于
      exec will cancel any threads, including the ones that io-wq is using. This
      isn't a problem, in fact we'd prefer it to be that way since it means we
      know that any async work cancels naturally without having to handle it
      proactively.
      
      But it does mean that we need to setup a new manager, as the manager and
      workers are gone. Handle this at queue time, and cancel work if we fail.
      Since the manager can go away without us noticing, ensure that the manager
      itself holds a reference to the 'wq' as well. Rename io_wq_destroy() to
      io_wq_put() to reflect that.
      
      In the future we can now simplify exec cancelation handling, for now just
      leave it the same.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      4fb6ac32
    • J
      io-wq: make buffered file write hashed work map per-ctx · e941894e
      Jens Axboe 提交于
      Before the io-wq thread change, we maintained a hash work map and lock
      per-node per-ring. That wasn't ideal, as we really wanted it to be per
      ring. But now that we have per-task workers, the hash map ends up being
      just per-task. That'll work just fine for the normal case of having
      one task use a ring, but if you share the ring between tasks, then it's
      considerably worse than it was before.
      
      Make the hash map per ctx instead, which provides full per-ctx buffered
      write serialization on hashed writes.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      e941894e
  6. 24 2月, 2021 3 次提交
    • J
      io-wq: fix race around io_worker grabbing · eb2de941
      Jens Axboe 提交于
      There's a small window between lookup dropping the reference to the
      worker and calling wake_up_process() on the worker task, where the worker
      itself could have exited. We ensure that the worker struct itself is
      valid, but worker->task may very well be gone by the time we issue the
      wakeup.
      
      Fix the race by using a completion triggered by the reference going to
      zero, and having exit wait for that completion before proceeding.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      eb2de941
    • J
      io-wq: fix races around manager/worker creation and task exit · 8b3e78b5
      Jens Axboe 提交于
      These races have always been there, they are just more apparent now that
      we do early cancel of io-wq when the task exits.
      
      Ensure that the io-wq manager sets task state correctly to not miss
      wakeups for task creation. This is important if we get a wakeup after
      having marked ourselves as TASK_INTERRUPTIBLE. If we do end up creating
      workers, then we flip the state back to running, making the subsequent
      schedule() a no-op. Also increment the wq ref count before forking the
      thread, to avoid a use-after-free.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      8b3e78b5
    • J
      io-wq: remove nr_process accounting · 728f13e7
      Jens Axboe 提交于
      We're now just using fork like we would from userspace, so there's no
      need to try and impose extra restrictions or accounting on the user
      side of things. That's already being done for us. That also means we
      don't have to pass in the user_struct anymore, that's correctly inherited
      through ->creds on fork.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      728f13e7
  7. 22 2月, 2021 9 次提交
  8. 13 2月, 2021 1 次提交
    • J
      io-wq: clear out worker ->fs and ->files · e06aa2e9
      Jens Axboe 提交于
      By default, kernel threads have init_fs and init_files assigned. In the
      past, this has triggered security problems, as commands that don't ask
      for (and hence don't get assigned) fs/files from the originating task
      can then attempt path resolution etc with access to parts of the system
      they should not be able to.
      
      Rather than add checks in the fs code for misuse, just set these to
      NULL. If we do attempt to use them, then the resulting code will oops
      rather than provide access to something that it should not permit.
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      e06aa2e9
  9. 04 2月, 2021 1 次提交
  10. 02 2月, 2021 1 次提交
  11. 21 12月, 2020 1 次提交
  12. 10 12月, 2020 1 次提交
  13. 05 11月, 2020 1 次提交
  14. 22 10月, 2020 1 次提交
  15. 21 10月, 2020 1 次提交
  16. 17 10月, 2020 3 次提交