1. 20 Oct 2022, 1 commit
    • io-wq: Fix memory leak in worker creation · 996d3efe
      Authored by Rafael Mendonca
      If the CPU mask allocation for a node fails, then the memory allocated for
      the 'io_wqe' struct of the current node doesn't get freed on the error
      handling path, since it has not yet been added to the 'wqes' array.
      
      This was spotted when fuzzing v6.1-rc1 with Syzkaller:
      BUG: memory leak
      unreferenced object 0xffff8880093d5000 (size 1024):
        comm "syz-executor.2", pid 7701, jiffies 4295048595 (age 13.900s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000cb463369>] __kmem_cache_alloc_node+0x18e/0x720
          [<00000000147a3f9c>] kmalloc_node_trace+0x2a/0x130
          [<000000004e107011>] io_wq_create+0x7b9/0xdc0
          [<00000000c38b2018>] io_uring_alloc_task_context+0x31e/0x59d
          [<00000000867399da>] __io_uring_add_tctx_node.cold+0x19/0x1ba
          [<000000007e0e7a79>] io_uring_setup.cold+0x1b80/0x1dce
          [<00000000b545e9f6>] __x64_sys_io_uring_setup+0x5d/0x80
          [<000000008a8a7508>] do_syscall_64+0x5d/0x90
          [<000000004ac08bec>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Fixes: 0e03496d ("io-wq: use private CPU mask")
      Cc: stable@vger.kernel.org
      Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com>
      Link: https://lore.kernel.org/r/20221020014710.902201-1-rafaelmendsr@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
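As a hedged sketch of the corrected pattern (all structure and function names below are illustrative, not the actual fs/io-wq.c code): a node whose secondary allocation fails must be freed directly, because the shared error path only walks the array the node was never added to.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-ins; these are not the real kernel structures. */
struct cpu_mask { unsigned long bits; };
struct node_ctx { struct cpu_mask *mask; };

#define NR_NODES 4

/* 'fail_at' lets the sketch inject a CPU-mask allocation failure. */
int wq_create(struct node_ctx *nodes[NR_NODES], int fail_at)
{
    int i;

    for (i = 0; i < NR_NODES; i++) {
        struct node_ctx *node = calloc(1, sizeof(*node));

        if (!node)
            goto err;
        node->mask = (i == fail_at) ? NULL
                                    : calloc(1, sizeof(*node->mask));
        if (!node->mask) {
            /* The fix: 'node' is not yet in nodes[], so the shared
             * error path below cannot free it; free it here. */
            free(node);
            goto err;
        }
        nodes[i] = node;        /* published: error path can find it now */
    }
    return 0;
err:
    while (i--) {               /* free only the published nodes */
        free(nodes[i]->mask);
        free(nodes[i]);
        nodes[i] = NULL;
    }
    return -1;
}
```

The key point is the extra `free(node)` on the partial-initialization path; without it, the object is unreachable from any cleanup code.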
2. 04 Aug 2022, 1 commit
    • audit, io_uring, io-wq: Fix memory leak in io_sq_thread() and io_wqe_worker() · f482aa98
      Authored by Peilin Ye
      Currently @audit_context is allocated twice for io_uring workers:
      
        1. copy_process() calls audit_alloc();
        2. io_sq_thread() or io_wqe_worker() calls audit_alloc_kernel() (which
           is effectively audit_alloc()) and overwrites @audit_context,
           causing:
      
        BUG: memory leak
        unreferenced object 0xffff888144547400 (size 1024):
      <...>
          hex dump (first 32 bytes):
            00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00  ................
            00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          backtrace:
            [<ffffffff8135cfc3>] audit_alloc+0x133/0x210
            [<ffffffff81239e63>] copy_process+0xcd3/0x2340
            [<ffffffff8123b5f3>] create_io_thread+0x63/0x90
            [<ffffffff81686604>] create_io_worker+0xb4/0x230
            [<ffffffff81686f68>] io_wqe_enqueue+0x248/0x3b0
            [<ffffffff8167663a>] io_queue_iowq+0xba/0x200
            [<ffffffff816768b3>] io_queue_async+0x113/0x180
            [<ffffffff816840df>] io_req_task_submit+0x18f/0x1a0
            [<ffffffff816841cd>] io_apoll_task_func+0xdd/0x120
            [<ffffffff8167d49f>] tctx_task_work+0x11f/0x570
            [<ffffffff81272c4e>] task_work_run+0x7e/0xc0
            [<ffffffff8125a688>] get_signal+0xc18/0xf10
            [<ffffffff8111645b>] arch_do_signal_or_restart+0x2b/0x730
            [<ffffffff812ea44e>] exit_to_user_mode_prepare+0x5e/0x180
            [<ffffffff844ae1b2>] syscall_exit_to_user_mode+0x12/0x20
            [<ffffffff844a7e80>] do_syscall_64+0x40/0x80
      
      Then,
      
        3. io_sq_thread() or io_wqe_worker() frees @audit_context using
           audit_free();
        4. do_exit() eventually calls audit_free() again, which is okay
           because audit_free() does a NULL check.
      
      As suggested by Paul Moore, fix it by deleting audit_alloc_kernel() and
      redundant audit_free() calls.
      
      Fixes: 5bd2182d ("audit,io_uring,io-wq: add some basic audit support to io_uring")
      Suggested-by: Paul Moore <paul@paul-moore.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Acked-by: Paul Moore <paul@paul-moore.com>
      Link: https://lore.kernel.org/r/20220803222343.31673-1-yepeilin.cs@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
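A toy model of the double-allocation leak (counters stand in for real allocations; the names follow the commit but this is not kernel code): the second allocator overwrites the pointer installed by the first, making the first context unreachable.

```c
#include <assert.h>

struct task { int audit_context; };   /* 0 = no context attached */

static int live_contexts;             /* outstanding allocations */

void audit_alloc(struct task *t)      /* what copy_process() does */
{
    t->audit_context = 1;
    live_contexts++;
}

void audit_free(struct task *t)       /* NULL-checked: safe to call twice */
{
    if (t->audit_context) {
        t->audit_context = 0;
        live_contexts--;
    }
}

/* Before the fix: the worker re-allocated, overwriting (and leaking)
 * the context copy_process() had already installed. */
void worker_start_buggy(struct task *t)
{
    t->audit_context = 1;             /* old context now unreachable */
    live_contexts++;
}
```

The fixed path is simply not calling the second allocator at all, which is what deleting audit_alloc_kernel() achieves.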
  3. 25 Jul 2022, 3 commits
  4. 30 Apr 2022, 1 commit
  5. 11 Mar 2022, 2 commits
  6. 10 Mar 2022, 3 commits
    • io-wq: use IO_WQ_ACCT_NR rather than hardcoded number · 86127bb1
      Authored by Hao Xu
      It's better to use the defined enum constant, not a hardcoded number,
      to size the array.
      Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
      Link: https://lore.kernel.org/r/20220206095241.121485-4-haoxu@linux.alibaba.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
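The pattern the commit applies can be sketched in a few lines (the enum names mirror io-wq; the struct is a simplified illustration): a trailing `_NR` enumerator sizes the array, so adding an accounting class automatically grows it.

```c
#include <assert.h>

enum {
    IO_WQ_ACCT_BOUND,
    IO_WQ_ACCT_UNBOUND,
    IO_WQ_ACCT_NR,               /* always last: number of entries */
};

struct io_wqe_acct { int nr_workers; };

struct io_wqe_sketch {           /* illustrative, not the real struct */
    struct io_wqe_acct acct[IO_WQ_ACCT_NR];   /* was: acct[2] */
};
```

If a new accounting class is ever inserted before `IO_WQ_ACCT_NR`, the array resizes without any other edit, which is the whole point of avoiding the literal `2`.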
    • io-wq: reduce acct->lock crossing functions lock/unlock · e13fb1fe
      Authored by Hao Xu
      Reduce acct->lock lock/unlock pairs that span different functions, to
      make the code clearer.
      Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
      Link: https://lore.kernel.org/r/20220206095241.121485-3-haoxu@linux.alibaba.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io-wq: decouple work_list protection from the big wqe->lock · 42abc95f
      Authored by Hao Xu
      wqe->lock is overloaded: it currently protects acct->work_list, the
      hash state, nr_workers, wqe->free_list, and more. Let's first get the
      work_list out of the wqe->lock mess by introducing a dedicated lock for
      the work list. This is the first step toward resolving the huge
      contention between work insertion and work consumption.
      Benefits:
        - split locking for the bound and unbound work lists
        - reduced contention between work_list access and (worker) free_list
          access
      
      For the hash state, since a work item for a given file cannot be on
      both the bound and unbound work lists, the two lists never touch the
      same hash entry, so the new lock works well to protect the hash state
      too.
      
      Results:
      set max_unbound_worker = 4, test with echo-server:
      nice -n -15 ./io_uring_echo_server -p 8081 -f -n 1000 -l 16
      (-n connection, -l workload)
      before this patch:
      Samples: 2M of event 'cycles:ppp', Event count (approx.): 1239982111074
      Overhead  Command          Shared Object         Symbol
        28.59%  iou-wrk-10021    [kernel.vmlinux]      [k] native_queued_spin_lock_slowpath
         8.89%  io_uring_echo_s  [kernel.vmlinux]      [k] native_queued_spin_lock_slowpath
         6.20%  iou-wrk-10021    [kernel.vmlinux]      [k] _raw_spin_lock
         2.45%  io_uring_echo_s  [kernel.vmlinux]      [k] io_prep_async_work
         2.36%  iou-wrk-10021    [kernel.vmlinux]      [k] _raw_spin_lock_irqsave
         2.29%  iou-wrk-10021    [kernel.vmlinux]      [k] io_worker_handle_work
         1.29%  io_uring_echo_s  [kernel.vmlinux]      [k] io_wqe_enqueue
         1.06%  iou-wrk-10021    [kernel.vmlinux]      [k] io_wqe_worker
         1.06%  io_uring_echo_s  [kernel.vmlinux]      [k] _raw_spin_lock
         1.03%  iou-wrk-10021    [kernel.vmlinux]      [k] __schedule
         0.99%  iou-wrk-10021    [kernel.vmlinux]      [k] tcp_sendmsg_locked
      
      with this patch:
      Samples: 1M of event 'cycles:ppp', Event count (approx.): 708446691943
      Overhead  Command          Shared Object         Symbol
        16.86%  iou-wrk-10893    [kernel.vmlinux]      [k] native_queued_spin_lock_slowpat
         9.10%  iou-wrk-10893    [kernel.vmlinux]      [k] _raw_spin_lock
         4.53%  io_uring_echo_s  [kernel.vmlinux]      [k] native_queued_spin_lock_slowpat
         2.87%  iou-wrk-10893    [kernel.vmlinux]      [k] io_worker_handle_work
         2.57%  iou-wrk-10893    [kernel.vmlinux]      [k] _raw_spin_lock_irqsave
         2.56%  io_uring_echo_s  [kernel.vmlinux]      [k] io_prep_async_work
         1.82%  io_uring_echo_s  [kernel.vmlinux]      [k] _raw_spin_lock
         1.33%  iou-wrk-10893    [kernel.vmlinux]      [k] io_wqe_worker
         1.26%  io_uring_echo_s  [kernel.vmlinux]      [k] try_to_wake_up
      
      spin_lock contention drops from 28.59% + 8.89% = 37.48% to
      16.86% + 4.53% = 21.39%.
      TPS is similar, while CPU usage drops from almost 400% to 350%.
      Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
      Link: https://lore.kernel.org/r/20220206095241.121485-2-haoxu@linux.alibaba.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
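A hedged sketch of the lock split (pthread mutexes stand in for kernel spinlocks, the work list is reduced to a counter, and all names are illustrative): each accounting class carries its own work-list lock, so bound and unbound enqueues no longer serialize on the single per-wqe lock.

```c
#include <assert.h>
#include <pthread.h>

struct acct_sketch {
    pthread_mutex_t lock;        /* protects only this acct's work list */
    int work_items;
};

struct wqe_sketch {
    pthread_mutex_t lock;        /* still covers free_list, nr_workers... */
    struct acct_sketch acct[2];  /* 0 = bound, 1 = unbound */
};

void wqe_init(struct wqe_sketch *wqe)
{
    pthread_mutex_init(&wqe->lock, NULL);
    for (int i = 0; i < 2; i++) {
        pthread_mutex_init(&wqe->acct[i].lock, NULL);
        wqe->acct[i].work_items = 0;
    }
}

void wqe_enqueue(struct wqe_sketch *wqe, int unbound)
{
    struct acct_sketch *acct = &wqe->acct[unbound ? 1 : 0];

    /* Insertion takes only the per-acct lock; the big wqe->lock is
     * untouched, so consumers of the other list do not contend here. */
    pthread_mutex_lock(&acct->lock);
    acct->work_items++;
    pthread_mutex_unlock(&acct->lock);
}
```

With two independent locks, a bound-list producer and an unbound-list consumer can proceed in parallel, which is where the perf numbers above come from.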
  7. 20 Jan 2022, 1 commit
  8. 19 Jan 2022, 5 commits
  9. 08 Jan 2022, 1 commit
  10. 14 Dec 2021, 1 commit
    • io-wq: drop wqe lock before creating new worker · d800c65c
      Authored by Jens Axboe
      We have two io-wq creation paths:
      
      - On queue enqueue
      - When a worker goes to sleep
      
      The latter invokes worker creation with the wqe->lock held, but that can
      run into problems if we end up exiting and need to cancel the queued work.
      syzbot caught this:
      
      ============================================
      WARNING: possible recursive locking detected
      5.16.0-rc4-syzkaller #0 Not tainted
      --------------------------------------------
      iou-wrk-6468/6471 is trying to acquire lock:
      ffff88801aa98018 (&wqe->lock){+.+.}-{2:2}, at: io_worker_cancel_cb+0xb7/0x210 fs/io-wq.c:187
      
      but task is already holding lock:
      ffff88801aa98018 (&wqe->lock){+.+.}-{2:2}, at: io_wq_worker_sleeping+0xb6/0x140 fs/io-wq.c:700
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&wqe->lock);
        lock(&wqe->lock);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      1 lock held by iou-wrk-6468/6471:
       #0: ffff88801aa98018 (&wqe->lock){+.+.}-{2:2}, at: io_wq_worker_sleeping+0xb6/0x140 fs/io-wq.c:700
      
      stack backtrace:
      CPU: 1 PID: 6471 Comm: iou-wrk-6468 Not tainted 5.16.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0x1dc/0x2d8 lib/dump_stack.c:106
       print_deadlock_bug kernel/locking/lockdep.c:2956 [inline]
       check_deadlock kernel/locking/lockdep.c:2999 [inline]
       validate_chain+0x5984/0x8240 kernel/locking/lockdep.c:3788
       __lock_acquire+0x1382/0x2b00 kernel/locking/lockdep.c:5027
       lock_acquire+0x19f/0x4d0 kernel/locking/lockdep.c:5637
       __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
       _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:154
       io_worker_cancel_cb+0xb7/0x210 fs/io-wq.c:187
       io_wq_cancel_tw_create fs/io-wq.c:1220 [inline]
       io_queue_worker_create+0x3cf/0x4c0 fs/io-wq.c:372
       io_wq_worker_sleeping+0xbe/0x140 fs/io-wq.c:701
       sched_submit_work kernel/sched/core.c:6295 [inline]
       schedule+0x67/0x1f0 kernel/sched/core.c:6323
       schedule_timeout+0xac/0x300 kernel/time/timer.c:1857
       wait_woken+0xca/0x1b0 kernel/sched/wait.c:460
       unix_msg_wait_data net/unix/unix_bpf.c:32 [inline]
       unix_bpf_recvmsg+0x7f9/0xe20 net/unix/unix_bpf.c:77
       unix_stream_recvmsg+0x214/0x2c0 net/unix/af_unix.c:2832
       sock_recvmsg_nosec net/socket.c:944 [inline]
       sock_recvmsg net/socket.c:962 [inline]
       sock_read_iter+0x3a7/0x4d0 net/socket.c:1035
       call_read_iter include/linux/fs.h:2156 [inline]
       io_iter_do_read fs/io_uring.c:3501 [inline]
       io_read fs/io_uring.c:3558 [inline]
       io_issue_sqe+0x144c/0x9590 fs/io_uring.c:6671
       io_wq_submit_work+0x2d8/0x790 fs/io_uring.c:6836
       io_worker_handle_work+0x808/0xdd0 fs/io-wq.c:574
       io_wqe_worker+0x395/0x870 fs/io-wq.c:630
       ret_from_fork+0x1f/0x30
      
      We can safely drop the lock before doing work creation, making the two
      contexts the same in that regard.
      
      Reported-by: syzbot+b18b8be69df33a3918e9@syzkaller.appspotmail.com
      Fixes: 71a85387 ("io-wq: check for wq exit after adding new worker task_work")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
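A minimal sketch of the fix, with illustrative names and a pthread mutex standing in for wqe->lock: worker creation may itself need the lock (e.g. to cancel on wq exit), so the sleeping path must not hold it across the call; with a non-recursive lock that self-deadlocks, exactly as lockdep reported.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t wqe_lock = PTHREAD_MUTEX_INITIALIZER;
static int workers_created;

static void queue_worker_create(void)
{
    /* May need wqe_lock internally, e.g. to cancel queued work on exit.
     * If the caller already holds it, this deadlocks. */
    pthread_mutex_lock(&wqe_lock);
    workers_created++;
    pthread_mutex_unlock(&wqe_lock);
}

void worker_sleeping(void)
{
    int need_new_worker;

    pthread_mutex_lock(&wqe_lock);
    need_new_worker = 1;             /* decide under the lock */
    pthread_mutex_unlock(&wqe_lock); /* the fix: drop it first */

    if (need_new_worker)
        queue_worker_create();       /* now safe to take wqe_lock again */
}
```

This makes the sleep path match the enqueue path, which already called creation without the lock held.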
  11. 11 Dec 2021, 1 commit
    • io-wq: check for wq exit after adding new worker task_work · 71a85387
      Authored by Jens Axboe
      We check IO_WQ_BIT_EXIT before attempting to create a new worker, and
      wq exit cancels pending work if we have any. But it's possible to have
      a race between the two, where creation checks exit finding it not set,
      but we're in the process of exiting. The exit side will cancel pending
      creation task_work, but there's a gap where we add task_work after we've
      canceled existing creations at exit time.
      
      Fix this by checking the EXIT bit post adding the creation task_work.
      If it's set, run the same cancelation that exit does.
      
      Reported-and-tested-by: syzbot+b60c982cb0efc5e05a47@syzkaller.appspotmail.com
      Reviewed-by: Hao Xu <haoxu@linux.alibaba.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
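The re-check can be sketched as follows (names and the `racer` test hook are illustrative, not the real io-wq API): after queueing the creation task_work, look at the EXIT bit again; if exit raced in after its own cancellation pass, run the same cancellation here.

```c
#include <assert.h>
#include <stdatomic.h>

#define IO_WQ_BIT_EXIT 0

static atomic_int wq_state;
static int pending_creations, cancelled_creations;

/* Test hook standing in for the racing exit path. */
static void set_exit_bit(void)
{
    atomic_store(&wq_state, 1 << IO_WQ_BIT_EXIT);
}

int queue_worker_create(void (*racer)(void))
{
    if (atomic_load(&wq_state) & (1 << IO_WQ_BIT_EXIT))
        return -1;                    /* pre-existing check */

    pending_creations++;              /* "task_work added" */
    if (racer)
        racer();                      /* exit races in right here */

    /* Exit may have run its cancellation pass between the check above
     * and the add; re-check and cancel this creation ourselves. */
    if (atomic_load(&wq_state) & (1 << IO_WQ_BIT_EXIT)) {
        pending_creations--;
        cancelled_creations++;
        return -1;
    }
    return 0;
}
```

The hook only exists to make the window deterministic in a single-threaded sketch; the kernel race is between two real contexts.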
  12. 07 Dec 2021, 1 commit
    • io-wq: remove spurious bit clear on task_work addition · e47498af
      Authored by Jens Axboe
      There's a small race here where the task_work could finish and drop
      the worker itself, so that by the time that task_work_add() returns
      with a successful addition we've already put the worker.
      
      The worker callbacks clear this bit themselves, so we don't actually
      need to manually clear it in the caller. Get rid of it.
      
      Reported-by: syzbot+b60c982cb0efc5e05a47@syzkaller.appspotmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  13. 03 Dec 2021, 1 commit
  14. 12 Nov 2021, 1 commit
    • io-wq: serialize hash clear with wakeup · d3e3c102
      Authored by Jens Axboe
      We need to ensure that we serialize the stalled and hash bits with the
      wait_queue wait handler, or we could be racing with someone modifying
      the hashed state after we find it busy, but before we then give up and
      wait for it to be cleared. This can cause random delays or stalls when
      handling buffered writes for many files, where some of these files cause
      hash collisions between the worker threads.
      
      Cc: stable@vger.kernel.org
      Reported-by: Daniel Black <daniel@mariadb.org>
      Fixes: e941894e ("io-wq: make buffered file write hashed work map per-ctx")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
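The serialization can be sketched with pthread primitives standing in for the kernel wait_queue (all names illustrative): the busy state is re-checked under the same lock the waker takes, so a clear-and-wake cannot slip in between "found busy" and "went to sleep".

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t hash_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  hash_cond = PTHREAD_COND_INITIALIZER;
static int hash_busy;

void hash_clear_and_wake(void)        /* the side clearing the bit */
{
    pthread_mutex_lock(&hash_lock);
    hash_busy = 0;
    pthread_cond_broadcast(&hash_cond);
    pthread_mutex_unlock(&hash_lock);
}

void wait_for_hash(void)              /* the worker that found it busy */
{
    pthread_mutex_lock(&hash_lock);
    while (hash_busy)                 /* re-check under the same lock */
        pthread_cond_wait(&hash_cond, &hash_lock);
    pthread_mutex_unlock(&hash_lock);
}
```

Without the shared lock around the re-check, the wake can fire between the unlocked test and the sleep, and the worker stalls until an unrelated wakeup, which matches the random delays described above.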
  15. 03 Nov 2021, 1 commit
  16. 29 Oct 2021, 1 commit
    • io-wq: remove worker to owner tw dependency · 1d5f5ea7
      Authored by Pavel Begunkov
      INFO: task iou-wrk-6609:6612 blocked for more than 143 seconds.
            Not tainted 5.15.0-rc5-syzkaller #0
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      task:iou-wrk-6609    state:D stack:27944 pid: 6612 ppid:  6526 flags:0x00004006
      Call Trace:
       context_switch kernel/sched/core.c:4940 [inline]
       __schedule+0xb44/0x5960 kernel/sched/core.c:6287
       schedule+0xd3/0x270 kernel/sched/core.c:6366
       schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1857
       do_wait_for_common kernel/sched/completion.c:85 [inline]
       __wait_for_common kernel/sched/completion.c:106 [inline]
       wait_for_common kernel/sched/completion.c:117 [inline]
       wait_for_completion+0x176/0x280 kernel/sched/completion.c:138
       io_worker_exit fs/io-wq.c:183 [inline]
       io_wqe_worker+0x66d/0xc40 fs/io-wq.c:597
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
      
      io-wq worker may submit a task_work to the master task and upon
      io_worker_exit() wait for the tw to get executed. The problem appears
      when the master task is waiting in coredump.c:
      
      468                     freezer_do_not_count();
      469                     wait_for_completion(&core_state->startup);
      470                     freezer_count();
      
      Apparently, some dependency on child threads gets everything stuck.
      Work around it by cancelling the task_work callback that causes this
      before entering the io_worker_exit() wait.
      
      P.S. A better option is probably to not submit the tw elevating the
      refcount in the first place, but let's leave that exercise for the
      future.
      
      Cc: stable@vger.kernel.org
      Reported-and-tested-by: syzbot+27d62ee6f256b186883e@syzkaller.appspotmail.com
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/142a716f4ed936feae868959059154362bfa8c19.1635509451.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  17. 23 Oct 2021, 1 commit
  18. 20 Oct 2021, 1 commit
  19. 19 Oct 2021, 1 commit
  20. 28 Sep 2021, 1 commit
  21. 25 Sep 2021, 1 commit
  22. 20 Sep 2021, 1 commit
    • audit,io_uring,io-wq: add some basic audit support to io_uring · 5bd2182d
      Authored by Paul Moore
      This patch adds basic auditing to io_uring operations, regardless of
      their context.  This is accomplished by allocating audit_context
      structures for the io-wq worker and io_uring SQPOLL kernel threads
      as well as explicitly auditing the io_uring operations in
      io_issue_sqe().  Individual io_uring operations can bypass auditing
      through the "audit_skip" field in the struct io_op_def definition for
      the operation; although great care must be taken so that security
      relevant io_uring operations do not bypass auditing; please contact
      the audit mailing list (see the MAINTAINERS file) with any questions.
      
      The io_uring operations are audited using a new AUDIT_URINGOP record,
      an example is shown below:
      
        type=UNKNOWN[1336] msg=audit(1631800225.981:37289):
          uring_op=19 success=yes exit=0 items=0 ppid=15454 pid=15681
          uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0
          subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
          key=(null)
      
      Thanks to Richard Guy Briggs for review and feedback.
      Signed-off-by: Paul Moore <paul@paul-moore.com>
  23. 14 Sep 2021, 1 commit
  24. 13 Sep 2021, 2 commits
  25. 09 Sep 2021, 2 commits
    • io-wq: fix memory leak in create_io_worker() · 66e70be7
      Authored by Qiang.zhang
      BUG: memory leak
      unreferenced object 0xffff888126fcd6c0 (size 192):
        comm "syz-executor.1", pid 11934, jiffies 4294983026 (age 15.690s)
        backtrace:
          [<ffffffff81632c91>] kmalloc_node include/linux/slab.h:609 [inline]
          [<ffffffff81632c91>] kzalloc_node include/linux/slab.h:732 [inline]
          [<ffffffff81632c91>] create_io_worker+0x41/0x1e0 fs/io-wq.c:739
          [<ffffffff8163311e>] io_wqe_create_worker fs/io-wq.c:267 [inline]
          [<ffffffff8163311e>] io_wqe_enqueue+0x1fe/0x330 fs/io-wq.c:866
          [<ffffffff81620b64>] io_queue_async_work+0xc4/0x200 fs/io_uring.c:1473
          [<ffffffff8162c59c>] __io_queue_sqe+0x34c/0x510 fs/io_uring.c:6933
          [<ffffffff8162c7ab>] io_req_task_submit+0x4b/0xa0 fs/io_uring.c:2233
          [<ffffffff8162cb48>] io_async_task_func+0x108/0x1c0 fs/io_uring.c:5462
          [<ffffffff816259e3>] tctx_task_work+0x1b3/0x3a0 fs/io_uring.c:2158
          [<ffffffff81269b43>] task_work_run+0x73/0xb0 kernel/task_work.c:164
          [<ffffffff812dcdd1>] tracehook_notify_signal include/linux/tracehook.h:212 [inline]
          [<ffffffff812dcdd1>] handle_signal_work kernel/entry/common.c:146 [inline]
          [<ffffffff812dcdd1>] exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
          [<ffffffff812dcdd1>] exit_to_user_mode_prepare+0x151/0x180 kernel/entry/common.c:209
          [<ffffffff843ff25d>] __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
          [<ffffffff843ff25d>] syscall_exit_to_user_mode+0x1d/0x40 kernel/entry/common.c:302
          [<ffffffff843fa4a2>] do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
          [<ffffffff84600068>] entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      When create_io_thread() returns an error and there is no retry, the
      worker object needs to be freed.
      
      Reported-by: syzbot+65454c239241d3d647da@syzkaller.appspotmail.com
      Signed-off-by: Qiang.zhang <qiang.zhang@windriver.com>
      Link: https://lore.kernel.org/r/20210909115822.181188-1-qiang.zhang@windriver.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
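A hedged sketch of the fix (illustrative names; `thread_ok` injects the failure that create_io_thread() produced in the report): if spawning the worker thread fails and there is no retry, the just-allocated worker object must be freed on the way out instead of being leaked.

```c
#include <assert.h>
#include <stdlib.h>

struct io_worker_sketch { int flags; };   /* illustrative */

struct io_worker_sketch *create_io_worker(int thread_ok)
{
    struct io_worker_sketch *worker = calloc(1, sizeof(*worker));

    if (!worker)
        return NULL;
    if (!thread_ok) {          /* thread creation failed, no retry */
        free(worker);          /* the fix: don't leak the object */
        return NULL;
    }
    return worker;             /* handed over to the new thread */
}
```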
    • io-wq: fix silly logic error in io_task_work_match() · 3b33e3f4
      Authored by Jens Axboe
      We check for the func with an OR condition, which means it always ends
      up being false and we never match the task_work we want to cancel. In
      the unexpected case that we do exit with that pending, we can trigger
      a hang waiting for a worker to exit, but it was never created. syzbot
      reports that as such:
      
      INFO: task syz-executor687:8514 blocked for more than 143 seconds.
            Not tainted 5.14.0-syzkaller #0
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      task:syz-executor687 state:D stack:27296 pid: 8514 ppid:  8479 flags:0x00024004
      Call Trace:
       context_switch kernel/sched/core.c:4940 [inline]
       __schedule+0x940/0x26f0 kernel/sched/core.c:6287
       schedule+0xd3/0x270 kernel/sched/core.c:6366
       schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1857
       do_wait_for_common kernel/sched/completion.c:85 [inline]
       __wait_for_common kernel/sched/completion.c:106 [inline]
       wait_for_common kernel/sched/completion.c:117 [inline]
       wait_for_completion+0x176/0x280 kernel/sched/completion.c:138
       io_wq_exit_workers fs/io-wq.c:1162 [inline]
       io_wq_put_and_exit+0x40c/0xc70 fs/io-wq.c:1197
       io_uring_clean_tctx fs/io_uring.c:9607 [inline]
       io_uring_cancel_generic+0x5fe/0x740 fs/io_uring.c:9687
       io_uring_files_cancel include/linux/io_uring.h:16 [inline]
       do_exit+0x265/0x2a30 kernel/exit.c:780
       do_group_exit+0x125/0x310 kernel/exit.c:922
       get_signal+0x47f/0x2160 kernel/signal.c:2868
       arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:865
       handle_signal_work kernel/entry/common.c:148 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
       exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:209
       __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
       syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:302
       do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x445cd9
      RSP: 002b:00007fc657f4b308 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
      RAX: 0000000000000001 RBX: 00000000004cb448 RCX: 0000000000445cd9
      RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00000000004cb44c
      RBP: 00000000004cb440 R08: 000000000000000e R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 000000000049b154
      R13: 0000000000000003 R14: 00007fc657f4b400 R15: 0000000000022000
      
      While in there, also decrement acct->nr_workers. This isn't strictly
      needed as we're exiting, but let's make sure the accounting matches up.
      
      Fixes: 3146cba9 ("io-wq: make worker creation resilient against signals")
      Reported-by: syzbot+f62d3e0a4ea4f38f5326@syzkaller.appspotmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
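The logic error can be modeled directly (callback names follow the io-wq ones, but this is a standalone illustration): for two distinct callbacks, `(func != a || func != b)` is true for every `func`, so the buggy match always returns false and the task_work is never cancelled.

```c
#include <assert.h>
#include <stdbool.h>

typedef void (*cb_t)(void);

static void create_worker_cb(void)   {}
static void create_worker_cont(void) {}
static void unrelated_cb(void)       {}

bool match_buggy(cb_t func)
{
    /* Always true: func cannot equal both callbacks at once. */
    if (func != create_worker_cb || func != create_worker_cont)
        return false;                 /* taken unconditionally */
    return true;
}

bool match_fixed(cb_t func)
{
    /* Reject only when it matches neither callback. */
    if (func != create_worker_cb && func != create_worker_cont)
        return false;
    return true;
}
```

De Morgan's laws make the intent explicit: "not (a or b)" is "(not a) and (not b)", so the rejection test needs `&&`, not `||`.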
  26. 08 Sep 2021, 1 commit
    • io-wq: fix cancellation on create-worker failure · 713b9825
      Authored by Pavel Begunkov
      WARNING: CPU: 0 PID: 10392 at fs/io_uring.c:1151 req_ref_put_and_test
      fs/io_uring.c:1151 [inline]
      WARNING: CPU: 0 PID: 10392 at fs/io_uring.c:1151 req_ref_put_and_test
      fs/io_uring.c:1146 [inline]
      WARNING: CPU: 0 PID: 10392 at fs/io_uring.c:1151
      io_req_complete_post+0xf5b/0x1190 fs/io_uring.c:1794
      Modules linked in:
      Call Trace:
       tctx_task_work+0x1e5/0x570 fs/io_uring.c:2158
       task_work_run+0xe0/0x1a0 kernel/task_work.c:164
       tracehook_notify_signal include/linux/tracehook.h:212 [inline]
       handle_signal_work kernel/entry/common.c:146 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
       exit_to_user_mode_prepare+0x232/0x2a0 kernel/entry/common.c:209
       __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
       syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:302
       do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      When io_wqe_enqueue() -> io_wqe_create_worker() fails, we can't just
      call io_run_cancel() to clean up the request, it's already enqueued via
      io_wqe_insert_work() and will be executed either by some other worker
      during cancellation (e.g. in io_wq_put_and_exit()).
      Reported-by: Hao Sun <sunhao.th@gmail.com>
      Fixes: 3146cba9 ("io-wq: make worker creation resilient against signals")
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/93b9de0fcf657affab0acfd675d4abcd863.1631092071.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  27. 03 Sep 2021, 2 commits
    • io-wq: make worker creation resilient against signals · 3146cba9
      Authored by Jens Axboe
      If a task is queueing async work and also handling signals, then we can
      run into the case where create_io_thread() is interrupted and returns
      failure because of that. If this happens for creating the first worker
      in a group, then that worker will never get created and we can hang the
      ring.
      
      If we do get a fork failure, retry from task_work. With signals we have
      to be a bit careful as we cannot simply queue as task_work, as we'll
      still have signals pending at that point. Punt over a normal workqueue
      first and then create from task_work after that.
      
      Lastly, ensure that we handle fatal worker creation failures. Worker
      creation failures are normally not fatal; only if we fail to create one
      in an empty worker group can we not make progress. Right now that is
      ignored; ensure that we handle it and run cancel on the work item.
      
      There are two paths that create new workers - one is the "existing worker
      going to sleep", and the other is "no workers found for this work, create
      one". The former is never fatal, as workers do exist in the group. Only
      the latter needs to be carefully handled.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io-wq: get rid of FIXED worker flag · 05c5f4ee
      Authored by Jens Axboe
      It makes the logic easier to follow if we just get rid of the fixed worker
      flag, and simply ensure that we never exit the last worker in the group.
      This also means that no particular worker is special.
      
      Just track the last timeout state, and if we have hit it and no work
      is pending, check if there are other workers. If yes, then we can exit
      this one safely.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
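The flag-free exit rule described above reduces to one predicate (a sketch with illustrative names, not the real io-wq code): on an idle timeout with no pending work, a worker may exit only if it is not the last one in its group, so no particular worker needs a "fixed" marker.

```c
#include <assert.h>
#include <stdbool.h>

bool worker_may_exit(bool timed_out, bool work_pending, int nr_workers)
{
    if (!timed_out || work_pending)
        return false;
    return nr_workers > 1;    /* never exit the last worker */
}
```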
  28. 02 Sep 2021, 1 commit
    • io-wq: only exit on fatal signals · 15e20db2
      Authored by Jens Axboe
      If the application uses io_uring and also relies heavily on signals
      for communication, that can cause io-wq workers to spuriously exit
      just because the parent has a signal pending. Just ignore signals
      unless they are fatal.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>