1. 19 2月, 2021 6 次提交
  2. 18 2月, 2021 1 次提交
  3. 17 2月, 2021 1 次提交
  4. 16 2月, 2021 1 次提交
  5. 14 2月, 2021 3 次提交
  6. 13 2月, 2021 4 次提交
  7. 12 2月, 2021 11 次提交
  8. 11 2月, 2021 3 次提交
    • J
      io_uring: assign file_slot prior to calling io_sqe_file_register() · e68a3ff8
      Jens Axboe 提交于
      We use the assigned slot in io_sqe_file_register(), and a previous
      patch moved the assignment to after we have called it. This isn't
      super pretty, and will get cleaned up in the future. For now, fix
      the regression by restoring the previous assignment/clear of the
      file_slot.
      
      Fixes: ea64ec02 ("io_uring: deduplicate file table slot calculation")
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      e68a3ff8
    • C
      io_uring: remove redundant initialization of variable ret · 4a245479
      Colin Ian King 提交于
      The variable ret is being initialized with a value that is never read
      and it is being updated later with a new value.  The initialization is
      redundant and can be removed.
      
      Addresses-Coverity: ("Unused value")
      Fixes: b63534c4 ("io_uring: re-issue block requests that failed because of resources")
      Signed-off-by: NColin Ian King <colin.king@canonical.com>
      Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      4a245479
    • P
      io_uring: unpark SQPOLL thread for cancelation · 34343786
      Pavel Begunkov 提交于
      We park SQPOLL task before going into io_uring_cancel_files(), so the
      task won't run task_works including those that might be important for
      the cancellation passes. In this case it's io_poll_remove_one(), which
      frees requests via io_put_req_deferred().
      
      Unpark it for while waiting, it's ok as we disable submissions
      beforehand, so no new requests will be generated.
      
      INFO: task syz-executor893:8493 blocked for more than 143 seconds.
      Call Trace:
       context_switch kernel/sched/core.c:4327 [inline]
       __schedule+0x90c/0x21a0 kernel/sched/core.c:5078
       schedule+0xcf/0x270 kernel/sched/core.c:5157
       io_uring_cancel_files fs/io_uring.c:8912 [inline]
       io_uring_cancel_task_requests+0xe70/0x11a0 fs/io_uring.c:8979
       __io_uring_files_cancel+0x110/0x1b0 fs/io_uring.c:9067
       io_uring_files_cancel include/linux/io_uring.h:51 [inline]
       do_exit+0x2fe/0x2ae0 kernel/exit.c:780
       do_group_exit+0x125/0x310 kernel/exit.c:922
       __do_sys_exit_group kernel/exit.c:933 [inline]
       __se_sys_exit_group kernel/exit.c:931 [inline]
       __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:931
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Cc: stable@vger.kernel.org # 5.5+
      Reported-by: syzbot+695b03d82fa8e4901b06@syzkaller.appspotmail.com
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      34343786
  9. 10 2月, 2021 10 次提交
    • J
      io_uring: place ring SQ/CQ arrays under memcg memory limits · 26bfa89e
      Jens Axboe 提交于
      Instead of imposing rlimit memlock limits for the rings themselves,
      ensure that we account them properly under memcg with __GFP_ACCOUNT.
      We retain rlimit memlock for registered buffers, this is just for the
      ring arrays themselves.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      26bfa89e
    • J
      io_uring: enable kmemcg account for io_uring requests · 91f245d5
      Jens Axboe 提交于
      This puts io_uring under the memory cgroups accounting and limits for
      requests.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      91f245d5
    • J
      io_uring: enable req cache for IRQ driven IO · c7dae4ba
      Jens Axboe 提交于
      This is the last class of requests that cannot utilize the req alloc
      cache. Add a per-ctx req cache that is protected by the completion_lock,
      and refill our submit side cache when it gets over our batch count.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      c7dae4ba
    • H
      io_uring: fix possible deadlock in io_uring_poll · ed670c3f
      Hao Xu 提交于
      Abaci reported follow issue:
      
      [   30.615891] ======================================================
      [   30.616648] WARNING: possible circular locking dependency detected
      [   30.617423] 5.11.0-rc3-next-20210115 #1 Not tainted
      [   30.618035] ------------------------------------------------------
      [   30.618914] a.out/1128 is trying to acquire lock:
      [   30.619520] ffff88810b063868 (&ep->mtx){+.+.}-{3:3}, at: __ep_eventpoll_poll+0x9f/0x220
      [   30.620505]
      [   30.620505] but task is already holding lock:
      [   30.621218] ffff88810e952be8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_enter+0x3f0/0x5b0
      [   30.622349]
      [   30.622349] which lock already depends on the new lock.
      [   30.622349]
      [   30.623289]
      [   30.623289] the existing dependency chain (in reverse order) is:
      [   30.624243]
      [   30.624243] -> #1 (&ctx->uring_lock){+.+.}-{3:3}:
      [   30.625263]        lock_acquire+0x2c7/0x390
      [   30.625868]        __mutex_lock+0xae/0x9f0
      [   30.626451]        io_cqring_overflow_flush.part.95+0x6d/0x70
      [   30.627278]        io_uring_poll+0xcb/0xd0
      [   30.627890]        ep_item_poll.isra.14+0x4e/0x90
      [   30.628531]        do_epoll_ctl+0xb7e/0x1120
      [   30.629122]        __x64_sys_epoll_ctl+0x70/0xb0
      [   30.629770]        do_syscall_64+0x2d/0x40
      [   30.630332]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   30.631187]
      [   30.631187] -> #0 (&ep->mtx){+.+.}-{3:3}:
      [   30.631985]        check_prevs_add+0x226/0xb00
      [   30.632584]        __lock_acquire+0x1237/0x13a0
      [   30.633207]        lock_acquire+0x2c7/0x390
      [   30.633740]        __mutex_lock+0xae/0x9f0
      [   30.634258]        __ep_eventpoll_poll+0x9f/0x220
      [   30.634879]        __io_arm_poll_handler+0xbf/0x220
      [   30.635462]        io_issue_sqe+0xa6b/0x13e0
      [   30.635982]        __io_queue_sqe+0x10b/0x550
      [   30.636648]        io_queue_sqe+0x235/0x470
      [   30.637281]        io_submit_sqes+0xcce/0xf10
      [   30.637839]        __x64_sys_io_uring_enter+0x3fb/0x5b0
      [   30.638465]        do_syscall_64+0x2d/0x40
      [   30.638999]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   30.639643]
      [   30.639643] other info that might help us debug this:
      [   30.639643]
      [   30.640618]  Possible unsafe locking scenario:
      [   30.640618]
      [   30.641402]        CPU0                    CPU1
      [   30.641938]        ----                    ----
      [   30.642664]   lock(&ctx->uring_lock);
      [   30.643425]                                lock(&ep->mtx);
      [   30.644498]                                lock(&ctx->uring_lock);
      [   30.645668]   lock(&ep->mtx);
      [   30.646321]
      [   30.646321]  *** DEADLOCK ***
      [   30.646321]
      [   30.647642] 1 lock held by a.out/1128:
      [   30.648424]  #0: ffff88810e952be8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_enter+0x3f0/0x5b0
      [   30.649954]
      [   30.649954] stack backtrace:
      [   30.650592] CPU: 1 PID: 1128 Comm: a.out Not tainted 5.11.0-rc3-next-20210115 #1
      [   30.651554] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [   30.652290] Call Trace:
      [   30.652688]  dump_stack+0xac/0xe3
      [   30.653164]  check_noncircular+0x11e/0x130
      [   30.653747]  ? check_prevs_add+0x226/0xb00
      [   30.654303]  check_prevs_add+0x226/0xb00
      [   30.654845]  ? add_lock_to_list.constprop.49+0xac/0x1d0
      [   30.655564]  __lock_acquire+0x1237/0x13a0
      [   30.656262]  lock_acquire+0x2c7/0x390
      [   30.656788]  ? __ep_eventpoll_poll+0x9f/0x220
      [   30.657379]  ? __io_queue_proc.isra.88+0x180/0x180
      [   30.658014]  __mutex_lock+0xae/0x9f0
      [   30.658524]  ? __ep_eventpoll_poll+0x9f/0x220
      [   30.659112]  ? mark_held_locks+0x5a/0x80
      [   30.659648]  ? __ep_eventpoll_poll+0x9f/0x220
      [   30.660229]  ? _raw_spin_unlock_irqrestore+0x2d/0x40
      [   30.660885]  ? trace_hardirqs_on+0x46/0x110
      [   30.661471]  ? __io_queue_proc.isra.88+0x180/0x180
      [   30.662102]  ? __ep_eventpoll_poll+0x9f/0x220
      [   30.662696]  __ep_eventpoll_poll+0x9f/0x220
      [   30.663273]  ? __ep_eventpoll_poll+0x220/0x220
      [   30.663875]  __io_arm_poll_handler+0xbf/0x220
      [   30.664463]  io_issue_sqe+0xa6b/0x13e0
      [   30.664984]  ? __lock_acquire+0x782/0x13a0
      [   30.665544]  ? __io_queue_proc.isra.88+0x180/0x180
      [   30.666170]  ? __io_queue_sqe+0x10b/0x550
      [   30.666725]  __io_queue_sqe+0x10b/0x550
      [   30.667252]  ? __fget_files+0x131/0x260
      [   30.667791]  ? io_req_prep+0xd8/0x1090
      [   30.668316]  ? io_queue_sqe+0x235/0x470
      [   30.668868]  io_queue_sqe+0x235/0x470
      [   30.669398]  io_submit_sqes+0xcce/0xf10
      [   30.669931]  ? xa_load+0xe4/0x1c0
      [   30.670425]  __x64_sys_io_uring_enter+0x3fb/0x5b0
      [   30.671051]  ? lockdep_hardirqs_on_prepare+0xde/0x180
      [   30.671719]  ? syscall_enter_from_user_mode+0x2b/0x80
      [   30.672380]  do_syscall_64+0x2d/0x40
      [   30.672901]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   30.673503] RIP: 0033:0x7fd89c813239
      [   30.673962] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05  3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 ec 2c 00 f7 d8 64 89 01 48
      [   30.675920] RSP: 002b:00007ffc65a7c628 EFLAGS: 00000217 ORIG_RAX: 00000000000001aa
      [   30.676791] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd89c813239
      [   30.677594] RDX: 0000000000000000 RSI: 0000000000000014 RDI: 0000000000000003
      [   30.678678] RBP: 00007ffc65a7c720 R08: 0000000000000000 R09: 0000000003000000
      [   30.679492] R10: 0000000000000000 R11: 0000000000000217 R12: 0000000000400ff0
      [   30.680282] R13: 00007ffc65a7c840 R14: 0000000000000000 R15: 0000000000000000
      
      This might happen if we do epoll_wait on a uring fd while reading/writing
      the former epoll fd in a sqe in the former uring instance.
      So let's don't flush cqring overflow list, just do a simple check.
      Reported-by: NAbaci <abaci@linux.alibaba.com>
      Fixes: 6c503150 ("io_uring: patch up IOPOLL overflow_flush sync")
      Signed-off-by: NHao Xu <haoxu@linux.alibaba.com>
      Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      ed670c3f
    • P
      io_uring: defer flushing cached reqs · e5d1bc0a
      Pavel Begunkov 提交于
      Awhile there are requests in the allocation cache -- use them, only if
      those ended go for the stashed memory in comp.free_list. As list
      manipulation are generally heavy and are not good for caches, flush them
      all or as much as can in one go.
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      [axboe: return success/failure from io_flush_cached_reqs()]
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      e5d1bc0a
    • P
      io_uring: take comp_state from ctx · c5eef2b9
      Pavel Begunkov 提交于
      __io_queue_sqe() is always called with a non-NULL comp_state, which is
      taken directly from context. Don't pass it around but infer from ctx.
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      c5eef2b9
    • J
      io_uring: enable req cache for task_work items · 65453d1e
      Jens Axboe 提交于
      task_work is run without utilizing the req alloc cache, so any deferred
      items don't get to take advantage of either the alloc or free side of it.
      With task_work now being wrapped by io_uring, we can use the ctx
      completion state to both use the req cache and the completion flush
      batching.
      
      With this, the only request type that cannot take advantage of the req
      cache is IRQ driven IO for regular files / block devices. Anything else,
      including IOPOLL polled IO to those same tyes, will take advantage of it.
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      65453d1e
    • J
      io_uring: provide FIFO ordering for task_work · 7cbf1722
      Jens Axboe 提交于
      task_work is a LIFO list, due to how it's implemented as a lockless
      list. For long chains of task_work, this can be problematic as the
      first entry added is the last one processed. Similarly, we'd waste
      a lot of CPU cycles reversing this list.
      
      Wrap the task_work so we have a single task_work entry per task per
      ctx, and use that to run it in the right order.
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      7cbf1722
    • J
      io_uring: use persistent request cache · 1b4c351f
      Jens Axboe 提交于
      Now that we have the submit_state in the ring itself, we can have io_kiocb
      allocations that are persistent across invocations. This reduces the time
      spent doing slab allocations and frees.
      
      [sil: rebased]
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      1b4c351f
    • P
      io_uring: feed reqs back into alloc cache · 6ff119a6
      Pavel Begunkov 提交于
      Make io_req_free_batch(), which is used for inline executed requests and
      IOPOLL, to return requests back into the allocation cache, so avoid
      most of kmalloc()/kfree() for those cases.
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      6ff119a6