- 17 March 2022, 10 commits
Committed by Pavel Begunkov
io_commit_cqring() is currently always called under the spinlock section, so it's best to keep it as slim as possible. Move __io_commit_cqring_flush() out of it and into ev_posted*(). If the fast checks fail and this post-processing is required, we'll reacquire ->completion_lock, which is fine as we don't care about the performance of draining and offset timeouts.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ec4e81fd720d3bc7bca8cb9152e080dad1a052f1.1647481208.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
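A minimal sketch of the post-move shape; the field names (off_timeout_used, drain_active) are assumptions from the surrounding io_uring code, not quoted from the patch:
```
/* the flush now happens outside the hot ->completion_lock section */
static void io_cqring_ev_posted(struct io_ring_ctx *ctx)
{
	if (unlikely(ctx->off_timeout_used || ctx->drain_active))
		__io_commit_cqring_flush(ctx);	/* reacquires ->completion_lock if needed */
	/* ... fast-path wakeups follow ... */
}
```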
-
Committed by Pavel Begunkov
A preparation patch, which moves a fast ->io_ev_fd check out of io_eventfd_signal() into ev_posted*(). Compilers are smart enough for this to not change anything, but we will need it later.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ec4091ac76d43912b73917e8db651c2dac4b7b01.1647481208.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov
smp_mb() in io_cqring_ev_posted_iopoll() is only there because of waitqueue_active(). However, a non-SQPOLL IOPOLL ring doesn't wake the CQ, and so the barrier there is useless. Kill it; it's usually pretty expensive.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d72e8ef6f7a3f6a72e18fad8409f7d47afc8da7d.1647481208.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov
Use io_req_complete_failed() in kiocb_done(). This cleans up the code, but also ensures that a provided buffer is correctly freed on failure.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a4880106fcf199d5810707fe2d17126fcdf18bc4.1647481208.git.asml.silence@gmail.com
[axboe: split from previous patch]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov
It's never a good idea to put provided buffers without notifying userspace, as it leads to userspace leaks, so add io_put_kbuf() in io_req_complete_failed(). The fail helper is called by all sorts of requests, but it's still safe to do, as io_put_kbuf() will return 0 for all requests that don't support, and so don't expect, provided buffers.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a4880106fcf199d5810707fe2d17126fcdf18bc4.1647481208.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov
io_fill_cqe*() is not always the best way to post CQEs, since there is enough infrastructure on top of it. Replace a raw call to a variant of it inside io_timeout_cancel(), which also saves us some bloat and might help with batching later.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/46113ec4345764b4aef3b384ce38cceabaeedcbb.1647481208.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov
Restore consistency in the __io_fill_cqe* helpers, always honouring the "io_" prefix and adding "req" when we're passing in a request.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bd016ff5c1a4f74687828069d2619d8a65e0c6d7.1647481208.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
With commit "io_uring: cache req->apoll->events in req->cflags" applied, we now have just io_poll_remove_entries() dipping into req->apoll when it isn't strictly necessary. Mark poll and double-poll with a flag, so we know if we need to look at apoll->double_poll. This avoids pulling in those cachelines if we don't need them. The common case is that the poll wake handler already removed these entries while hot off the completion path.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
When we arm poll on behalf of a different type of request, like a network receive, we allocate req->apoll as our poll entry. Running network workloads shows io_poll_check_events() as the most expensive part of io_uring, and it's all due to having to pull in req->apoll instead of just the request, which we already have hot. Cache poll->events in req->cflags, which isn't used until the request completes anyway. This isn't strictly needed for regular poll, where req->poll.events is used and thus already hot, but for the sake of unification we do it all around. This saves 3-4% of overhead in certain request workloads.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
This serves two purposes:
- We now have the last cacheline mostly unused for generic workloads, instead of having to pull in the poll refs explicitly for workloads that rely on poll arming.
- It shrinks the io_kiocb from 232 to 224 bytes.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 16 March 2022, 1 commit
Committed by Jens Axboe
Particularly for networked workloads, io_uring intensively uses its poll-based backend to get a notification when data/space is available. Profiling workloads, we see 3-4% of alloc+free that is directly attributed to just the apoll allocation and free (the rest being skb alloc+free). For the fast path, we have ctx->uring_lock held already for both issue and the inline completions, and we can utilize that to avoid any extra locking needed for a basic recycling cache for the apoll entries, on both the alloc and free side. Double poll still requires an allocation, but those are rare and not a fast-path item. With the simple cache in place, we see a 3-4% reduction in overhead for the workload.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
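A sketch of the alloc-side recycling described above; the list names and the reuse of the poll waitqueue entry as the cache link are assumptions about the implementation:
```
static struct async_poll *io_get_apoll(struct io_ring_ctx *ctx,
				       unsigned int issue_flags)
{
	struct async_poll *apoll;

	/* fast path: ->uring_lock already held, pop from the cache */
	if (!(issue_flags & IO_URING_F_UNLOCKED) &&
	    !list_empty(&ctx->apoll_cache)) {
		apoll = list_first_entry(&ctx->apoll_cache, struct async_poll,
					 poll.wait.entry);
		list_del_init(&apoll->poll.wait.entry);
	} else {
		apoll = kmalloc(sizeof(*apoll), GFP_ATOMIC);
	}
	return apoll;
}
```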
-
- 12 March 2022, 1 commit
Committed by Jens Axboe
Julia and the kernel test robot report that the prep handling for this command inadvertently checks one field twice:
fs/io_uring.c:4338:42-56: duplicated argument to && or ||
Get rid of it.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Fixes: 4f57f06c ("io_uring: add support for IORING_OP_MSG_RING command")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 11 March 2022, 7 commits
Committed by Jens Axboe
By default, io_uring will stop submitting a batch of requests if we run into an error submitting a request. This isn't strictly necessary, as the error result is passed out-of-band via a CQE anyway, and it can be a bit confusing for some applications. Provide a way to set up a ring that will continue submitting on error, once the error CQE has been posted. There's still one case that will break out of submission: if we fail allocating a request, then we'll still return -ENOMEM. We could in theory post a CQE for that condition too, even if we never got a request. Leave that for a potential follow-up.
Reported-by: Dylan Yudaken <dylany@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
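A minimal sketch of opting in, assuming the IORING_SETUP_SUBMIT_ALL flag from the 5.18 uapi headers is the one this commit introduces, and assuming liburing is available:
```
#include <liburing.h>
#include <stdio.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_params p = { };

	/* keep consuming the SQ ring on a failed request; the failure is
	 * reported via that request's CQE instead of aborting the batch */
	p.flags = IORING_SETUP_SUBMIT_ALL;
	if (io_uring_queue_init_params(8, &ring, &p) < 0) {
		perror("io_uring_queue_init_params");
		return 1;
	}
	io_uring_queue_exit(&ring);
	return 0;
}
```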
-
Committed by Jens Axboe
If we are using provided buffers, it's less than useful to have a buffer selected and pinned if a request needs to go async or arms poll to trigger a notification for when we can process it. Recycle the buffer in those cases, so we don't pin it for the duration of the request.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
If we drop buffers between scheduling a retry, then we need to re-import when we start the request again.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
Most of the logic in io_read() deals with regular files, and in some ways it would make sense to split the handling into S_IFREG and others. But at least for retry, we don't need to bother setting up a bunch of state just to abort in the loop later. In particular, don't bother forcing setup of async data for a normal non-vectored read when we don't need it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Olivier Langlois
The sqpoll thread can be used for performing NAPI busy poll, in a similar way as it does io polling for file systems supporting direct access that bypasses the page cache. The other way io_uring can be used for NAPI busy poll is by calling io_uring_enter() to get events. If the user specifies a timeout value, it is distributed between polling and sleeping using the system-wide setting /proc/sys/net/core/busy_poll. The changes have been tested with this program: https://github.com/lano1106/io_uring_udp_ping and the results are:
Without sqpoll:
NAPI busy loop disabled: rtt min/avg/max/mdev = 40.631/42.050/58.667/1.547 us
NAPI busy loop enabled: rtt min/avg/max/mdev = 30.619/31.753/61.433/1.456 us
With sqpoll:
NAPI busy loop disabled: rtt min/avg/max/mdev = 42.087/44.438/59.508/1.533 us
NAPI busy loop enabled: rtt min/avg/max/mdev = 35.779/37.347/52.201/0.924 us
Co-developed-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Olivier Langlois <olivier@trillion01.com>
Link: https://lore.kernel.org/r/810bd9408ffc510ff08269e78dca9df4af0b9e4e.1646777484.git.olivier@trillion01.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Olivier Langlois
Move up the block manipulating the sig variable, so that the code which may encounter an error and exit runs first, before the rest of the function executes; this avoids useless computation.
Signed-off-by: Olivier Langlois <olivier@trillion01.com>
Link: https://lore.kernel.org/r/84513f7cc1b1fb31d8f4cb910aee033391d036b4.1646777484.git.olivier@trillion01.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
This adds support for IORING_OP_MSG_RING, which allows an SQE to signal another ring. That allows either waking up someone waiting on the ring, or even passing a 64-bit value via the user_data field in the CQE. sqe->fd must contain the fd of a ring that should receive the CQE. sqe->off will be propagated to cqe->user_data on the target ring, and sqe->len will be propagated to cqe->res. The resulting CQE will have IORING_CQE_F_MSG set in its flags, to indicate that this CQE was generated from a messaging request rather than an SQE issued locally on that ring. This effectively allows passing a 64-bit and a 32-bit quantity between the two rings. This request type has the following request-specific error cases:
- -EBADFD. Set if sqe->fd doesn't point to a file descriptor of the io_uring type.
- -EOVERFLOW. Set if we were not able to deliver a request to the target ring.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
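A sketch of preparing such an SQE with raw field access; the field mapping follows the text above, and headers that define the IORING_OP_MSG_RING opcode (5.18+) are assumed:
```
#include <errno.h>
#include <liburing.h>
#include <string.h>

/* queue a CQE onto the ring identified by target_fd */
static int queue_msg_ring(struct io_uring *ring, int target_fd,
			  __u64 msg_data, __u32 msg_res)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

	if (!sqe)
		return -EAGAIN;
	memset(sqe, 0, sizeof(*sqe));
	sqe->opcode = IORING_OP_MSG_RING;
	sqe->fd = target_fd;	/* must be an io_uring fd, else -EBADFD */
	sqe->off = msg_data;	/* becomes cqe->user_data on the target ring */
	sqe->len = msg_res;	/* becomes cqe->res on the target ring */
	return io_uring_submit(ring);
}
```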
-
- 10 March 2022, 15 commits
Committed by Jens Axboe
In testing high-frequency workloads with provided buffers, we spend a lot of time allocating and freeing the buffer units themselves. Rather than repeatedly freeing and allocating them, add a recycling cache instead. There are two caches:
- ctx->io_buffers_cache. This is the one we grab from in the submission path, and it's protected by ctx->uring_lock. For inline completions, we can recycle straight back to this cache without needing any extra locking.
- ctx->io_buffers_comp. If we're not under uring_lock, then we use this list to recycle buffers. It's protected by completion_lock.
On adding a new buffer, check io_buffers_cache. If it's empty, check if we can splice entries from io_buffers_comp. This removes about 5-10% of the overhead from provided buffers, bringing it pretty close to the non-provided path.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
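The splice step might look roughly like this; a sketch built from the description above, where only the two list names come from the text:
```
/* called under ->uring_lock in the submission path */
static struct io_buffer *io_buffer_get(struct io_ring_ctx *ctx)
{
	if (list_empty(&ctx->io_buffers_cache)) {
		spin_lock(&ctx->completion_lock);
		/* refill the uring_lock-protected cache in one go */
		list_splice_init(&ctx->io_buffers_comp, &ctx->io_buffers_cache);
		spin_unlock(&ctx->completion_lock);
	}
	if (list_empty(&ctx->io_buffers_cache))
		return NULL;	/* caller falls back to allocating */
	return list_first_entry(&ctx->io_buffers_cache, struct io_buffer, list);
}
```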
-
Committed by Jens Axboe
Lots of workloads use multiple threads, in which case the file table is shared between them. This makes getting and putting the ring file descriptor for each io_uring_enter(2) system call more expensive, as it involves an atomic get and put for each call. Similarly to how we allow registering normal file descriptors to avoid this overhead, add support for an io_uring_register(2) API that allows registering the ring fds themselves:
1) IORING_REGISTER_RING_FDS - takes an array of io_uring_rsrc_update structs, and registers them with the task.
2) IORING_UNREGISTER_RING_FDS - takes an array of io_uring_rsrc_update structs, and unregisters them.
When a ring fd is registered, it is internally represented by an offset. This offset is returned to the application, and the application then uses this offset and sets IORING_ENTER_REGISTERED_RING for the io_uring_enter(2) system call. This works just like using a registered file descriptor, rather than a real one, in an SQE, where IOSQE_FIXED_FILE gets set to tell io_uring that we're using an internal offset/descriptor rather than a real file descriptor. In initial testing, this provides a nice bump in performance for threaded applications in real-world cases where the batch count (i.e. the number of requests submitted per io_uring_enter(2) invocation) is low. In a microbenchmark submitting NOP requests, we see the following increases in performance:
Requests per syscall   Baseline     Registered   Increase
----------------------------------------------------------------
1                      ~7030K       ~8080K       +15%
2                      ~13120K      ~14800K      +13%
4                      ~22740K      ~25300K      +11%
Co-developed-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
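A raw-syscall sketch of the registration flow. The struct layout matches the uapi io_uring_rsrc_update; treating offset = -1U as "let the kernel pick a slot" and the write-back of the chosen offset are assumptions about the interface:
```
#include <linux/io_uring.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* register ring_fd with the task; the returned slot is then used together
 * with IORING_ENTER_REGISTERED_RING in later io_uring_enter(2) calls */
static int register_ring_fd(int ring_fd)
{
	struct io_uring_rsrc_update up;
	int ret;

	memset(&up, 0, sizeof(up));
	up.data = ring_fd;	/* the real fd being registered */
	up.offset = -1U;	/* let the kernel choose a free slot */

	ret = syscall(__NR_io_uring_register, ring_fd,
		      IORING_REGISTER_RING_FDS, &up, 1);
	if (ret < 0)
		return ret;
	return up.offset;	/* kernel wrote back the chosen offset */
}
```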
-
Committed by Dylan Yudaken
Fix an incorrect name reference in a comment: ki_filp does not exist in the struct, but file does.
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220224105157.1332353-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Dylan Yudaken
There is a slight optimisation to be had by calculating the correct pos pointer inside io_kiocb_update_pos and then using it later. Code size drops by a bit:
000000000000a1b0 0000000000000400 t io_read
000000000000a5b0 0000000000000319 t io_write
vs
000000000000a1b0 00000000000003f6 t io_read
000000000000a5b0 0000000000000310 t io_write
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Dylan Yudaken
Update kiocb->ki_pos at execution time rather than in io_prep_rw(). io_prep_rw() happens before the job is enqueued to a worker, and so the offset might be read multiple times before being executed once. This ensures that the file position in a set of _linked_ SQEs will only be obtained after earlier SQEs have completed, and so will include their incremented file position.
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
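A sketch of what the deferred lookup can look like; the function shape is inferred from the description, and the REQ_F_CUR_POS handling is an assumption:
```
/* called at issue time, not at prep time */
static void io_kiocb_update_pos(struct io_kiocb *req)
{
	struct kiocb *kiocb = &req->rw.kiocb;

	if (kiocb->ki_pos == -1 && !(req->file->f_mode & FMODE_STREAM)) {
		/* only now read f_pos, so linked SQEs observe the position
		 * advanced by earlier completed SQEs */
		req->flags |= REQ_F_CUR_POS;
		kiocb->ki_pos = req->file->f_pos;
	}
}
```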
-
Committed by Dylan Yudaken
io_kiocb_ppos is called in both branches, and it seems the compiler does not fuse this. Fusing it removes a few bytes from loop_rw_iter.
Before:
$ nm -S fs/io_uring.o | grep loop_rw_iter
0000000000002430 0000000000000124 t loop_rw_iter
After:
$ nm -S fs/io_uring.o | grep loop_rw_iter
0000000000002430 000000000000010d t loop_rw_iter
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Olivier Langlois
Avoid testing TIF_NOTIFY_SIGNAL twice by calling task_sigpending() directly from io_run_task_work_sig().
Signed-off-by: Olivier Langlois <olivier@trillion01.com>
Link: https://lore.kernel.org/r/bd7c0495f7656e803e5736708591bb665e6eaacd.1645041650.git.olivier@trillion01.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
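The resulting helper plausibly looks like this (a sketch): signal_pending() would test TIF_NOTIFY_SIGNAL a second time, while task_sigpending() only tests TIF_SIGPENDING:
```
static inline int io_run_task_work_sig(void)
{
	if (io_run_task_work())
		return 1;
	if (test_thread_flag(TIF_NOTIFY_SIGNAL))
		return -ERESTARTSYS;	/* the notify case, checked once */
	if (task_sigpending(current))
		return -EINTR;		/* a plain pending signal */
	return 0;
}
```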
-
Committed by Stefan Roesch
This makes the io_uring tracepoints consistent. Where it makes sense, the tracepoints start with the following four fields:
- context (ring)
- request
- user_data
- opcode
Signed-off-by: Stefan Roesch <shr@fb.com>
Link: https://lore.kernel.org/r/20220214180430.70572-3-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Stefan Roesch
This introduces the __fill_cqe function. This is necessary to correctly issue the io_uring_complete tracepoint.
Signed-off-by: Stefan Roesch <shr@fb.com>
Link: https://lore.kernel.org/r/20220214180430.70572-2-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Nathan Chancellor
Clang warns:
fs/io_uring.c:9396:9: warning: variable 'ret' is uninitialized when used here [-Wuninitialized]
        return ret;
               ^~~
fs/io_uring.c:9373:13: note: initialize the variable 'ret' to silence this warning
        int fd, ret;
                ^
                 = 0
1 warning generated.
Just return 0 directly and reduce the scope of ret to the if statement, as that is the only place it is used; this is how the function was before the Fixes commit.
Fixes: 1a75fac9a0f9 ("io_uring: avoid ring quiesce while registering/unregistering eventfd")
Link: https://github.com/ClangBuiltLinux/linux/issues/1579
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20220207162410.1013466-1-nathan@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Usama Arif
None of the opcodes in io_uring_register use ring quiesce anymore. Hence io_register_op_must_quiesce always returns false and io_ctx_quiesce is never called.
Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Link: https://lore.kernel.org/r/20220204145117.1186568-6-usama.arif@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Usama Arif
IORING_SETUP_R_DISABLED prevents submitting requests, so there will be no requests until IORING_REGISTER_ENABLE_RINGS is called. And IORING_REGISTER_RESTRICTIONS works only before IORING_REGISTER_ENABLE_RINGS is called. Hence ring quiesce is not needed for these opcodes.
Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Link: https://lore.kernel.org/r/20220204145117.1186568-5-usama.arif@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Usama Arif
This is done using the RCU data structure (io_ev_fd). eventfd_async is moved from io_ring_ctx to io_ev_fd, which is RCU-protected, hence avoiding ring quiesce, which is much more expensive than an RCU lock. The place where eventfd_async is read is already under rcu_read_lock, so there is no extra RCU read-side critical section needed.
Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Link: https://lore.kernel.org/r/20220204145117.1186568-4-usama.arif@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Usama Arif
This is done by creating a new RCU data structure (io_ev_fd) as part of io_ring_ctx that holds the eventfd_ctx. The function io_eventfd_signal is executed under rcu_read_lock with a single rcu_dereference of io_ev_fd, so that if another thread unregisters the eventfd while io_eventfd_signal is still being executed, the eventfd_signal for which io_eventfd_signal was called completes successfully. The process of registering/unregistering the eventfd is already done under uring_lock, so multiple threads won't enter a race condition while registering/unregistering it. With the above approach, ring quiesce can be avoided, which is much more expensive than using an RCU lock. On the system tested, io_uring_register with IORING_REGISTER_EVENTFD takes less than 1ms with the RCU lock, compared to 15ms before with ring quiesce.
Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Link: https://lore.kernel.org/r/20220204145117.1186568-3-usama.arif@bytedance.com
[axboe: long line fixups]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
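The pattern described boils down to an RCU-protected pointer plus a read-side lock; a sketch with member names taken from the text (the exact struct layout is an assumption):
```
struct io_ev_fd {
	struct eventfd_ctx *cq_ev_fd;
	struct rcu_head rcu;	/* freed via call_rcu() on unregister */
};

static void io_eventfd_signal(struct io_ring_ctx *ctx)
{
	struct io_ev_fd *ev_fd;

	rcu_read_lock();
	ev_fd = rcu_dereference(ctx->io_ev_fd);
	/* a concurrent unregister defers the free until we leave this
	 * read-side section, so the signal below remains safe */
	if (ev_fd)
		eventfd_signal(ev_fd->cq_ev_fd, 1);
	rcu_read_unlock();
}
```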
-
Committed by Usama Arif
The information on whether an eventfd is registered is not very useful, and would result in the tracepoint being enclosed in rcu_read_lock in a later patch that tries to avoid ring quiesce for registering an eventfd.
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Link: https://lore.kernel.org/r/20220204145117.1186568-2-usama.arif@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 23 February 2022, 1 commit
Committed by Dylan Yudaken
io_rsrc_ref_quiesce will unlock the uring while it waits for references to the io_rsrc_data to be killed. There are other places that might add references to the data via calls to io_rsrc_node_switch. There is a race condition where such a reference can be added after the completion has been signalled. At this point the io_rsrc_ref_quiesce call will wake up and relock the uring, assuming the data is unused and can be freed, although it is actually being used. To fix this, check in io_rsrc_ref_quiesce whether a resource has been revived.
Reported-by: syzbot+ca8bf833622a1662745b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220222161751.995746-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 21 February 2022, 1 commit
Committed by Jens Axboe
If an application calls io_uring_enter(2) with a timespec passed in, convert that timespec to ktime_t rather than jiffies. The latter does not provide the granularity the application may expect, and may in fact provide different granularity on different systems, depending on what the HZ value is configured at. Turn the timespec into an absolute ktime_t, and use that with schedule_hrtimeout() instead.
Link: https://github.com/axboe/liburing/issues/531
Cc: stable@vger.kernel.org
Reported-by: Bob Chen <chenbo.chen@alibaba-inc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
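The conversion amounts to roughly the following; a sketch, since io_uring's actual wait loop carries more state around this:
```
#include <linux/hrtimer.h>
#include <linux/ktime.h>
#include <linux/sched.h>
#include <linux/time64.h>

static int io_wait_hrtimeout(struct timespec64 *ts)
{
	/* absolute deadline: immune to HZ, retains ns granularity */
	ktime_t timeout = ktime_add_ns(ktime_get(), timespec64_to_ns(ts));

	set_current_state(TASK_INTERRUPTIBLE);
	/* HRTIMER_MODE_ABS means a loop retrying after spurious wakeups
	 * still terminates at the intended deadline */
	return schedule_hrtimeout(&timeout, HRTIMER_MODE_ABS);
}
```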
-
- 15 February 2022, 1 commit
Committed by Eric Dumazet
Looping ~65535 times doing kmalloc() calls can trigger soft lockups, especially with DEBUG features (like KASAN).
[ 253.536212] watchdog: BUG: soft lockup - CPU#64 stuck for 26s! [b219417889:12575]
[ 253.544433] Modules linked in: vfat fat i2c_mux_pca954x i2c_mux spidev cdc_acm xhci_pci xhci_hcd sha3_generic gq(O)
[ 253.544451] CPU: 64 PID: 12575 Comm: b219417889 Tainted: G S O 5.17.0-smp-DEV #801
[ 253.544457] RIP: 0010:kernel_text_address (./include/asm-generic/sections.h:192 ./include/linux/kallsyms.h:29 kernel/extable.c:67 kernel/extable.c:98)
[ 253.544464] Code: 0f 93 c0 48 c7 c1 e0 63 d7 a4 48 39 cb 0f 92 c1 20 c1 0f b6 c1 5b 5d c3 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 53 48 89 fb <48> c7 c0 00 00 80 a0 41 be 01 00 00 00 48 39 c7 72 0c 48 c7 c0 40
[ 253.544468] RSP: 0018:ffff8882d8baf4c0 EFLAGS: 00000246
[ 253.544471] RAX: 1ffff1105b175e00 RBX: ffffffffa13ef09a RCX: 00000000a13ef001
[ 253.544474] RDX: ffffffffa13ef09a RSI: ffff8882d8baf558 RDI: ffffffffa13ef09a
[ 253.544476] RBP: ffff8882d8baf4d8 R08: ffff8882d8baf5e0 R09: 0000000000000004
[ 253.544479] R10: ffff8882d8baf5e8 R11: ffffffffa0d59a50 R12: ffff8882eab20380
[ 253.544481] R13: ffffffffa0d59a50 R14: dffffc0000000000 R15: 1ffff1105b175eb0
[ 253.544483] FS: 00000000016d3380(0000) GS:ffff88af48c00000(0000) knlGS:0000000000000000
[ 253.544486] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 253.544488] CR2: 00000000004af0f0 CR3: 00000002eabfa004 CR4: 00000000003706e0
[ 253.544491] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 253.544492] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 253.544494] Call Trace:
[ 253.544496] <TASK>
[ 253.544498] ? io_queue_sqe (fs/io_uring.c:7143)
[ 253.544505] __kernel_text_address (kernel/extable.c:78)
[ 253.544508] unwind_get_return_address (arch/x86/kernel/unwind_frame.c:19)
[ 253.544514] arch_stack_walk (arch/x86/kernel/stacktrace.c:27)
[ 253.544517] ? io_queue_sqe (fs/io_uring.c:7143)
[ 253.544521] stack_trace_save (kernel/stacktrace.c:123)
[ 253.544527] ____kasan_kmalloc (mm/kasan/common.c:39 mm/kasan/common.c:45 mm/kasan/common.c:436 mm/kasan/common.c:515)
[ 253.544531] ? ____kasan_kmalloc (mm/kasan/common.c:39 mm/kasan/common.c:45 mm/kasan/common.c:436 mm/kasan/common.c:515)
[ 253.544533] ? __kasan_kmalloc (mm/kasan/common.c:524)
[ 253.544535] ? kmem_cache_alloc_trace (./include/linux/kasan.h:270 mm/slab.c:3567)
[ 253.544541] ? io_issue_sqe (fs/io_uring.c:4556 fs/io_uring.c:4589 fs/io_uring.c:6828)
[ 253.544544] ? __io_queue_sqe (fs/io_uring.c:?)
[ 253.544551] __kasan_kmalloc (mm/kasan/common.c:524)
[ 253.544553] kmem_cache_alloc_trace (./include/linux/kasan.h:270 mm/slab.c:3567)
[ 253.544556] ? io_issue_sqe (fs/io_uring.c:4556 fs/io_uring.c:4589 fs/io_uring.c:6828)
[ 253.544560] io_issue_sqe (fs/io_uring.c:4556 fs/io_uring.c:4589 fs/io_uring.c:6828)
[ 253.544564] ? __kasan_slab_alloc (mm/kasan/common.c:45 mm/kasan/common.c:436 mm/kasan/common.c:469)
[ 253.544567] ? __kasan_slab_alloc (mm/kasan/common.c:39 mm/kasan/common.c:45 mm/kasan/common.c:436 mm/kasan/common.c:469)
[ 253.544569] ? kmem_cache_alloc_bulk (mm/slab.h:732 mm/slab.c:3546)
[ 253.544573] ? __io_alloc_req_refill (fs/io_uring.c:2078)
[ 253.544578] ? io_submit_sqes (fs/io_uring.c:7441)
[ 253.544581] ? __se_sys_io_uring_enter (fs/io_uring.c:10154 fs/io_uring.c:10096)
[ 253.544584] ? __x64_sys_io_uring_enter (fs/io_uring.c:10096)
[ 253.544587] ? do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 253.544590] ? entry_SYSCALL_64_after_hwframe (??:?)
[ 253.544596] __io_queue_sqe (fs/io_uring.c:?)
[ 253.544600] io_queue_sqe (fs/io_uring.c:7143)
[ 253.544603] io_submit_sqe (fs/io_uring.c:?)
[ 253.544608] io_submit_sqes (fs/io_uring.c:?)
[ 253.544612] __se_sys_io_uring_enter (fs/io_uring.c:10154 fs/io_uring.c:10096)
[ 253.544616] __x64_sys_io_uring_enter (fs/io_uring.c:10096)
[ 253.544619] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 253.544623] entry_SYSCALL_64_after_hwframe (??:?)
Fixes: ddf0322d ("io_uring: add IORING_OP_PROVIDE_BUFFERS")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: io-uring <io-uring@vger.kernel.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20220215041003.2394784-1-eric.dumazet@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
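The fix adds a schedule point to the buffer-provisioning loop; a sketch of the shape, with the loop variables assumed from the IORING_OP_PROVIDE_BUFFERS path:
```
/* nbufs can be as large as ~65535 per request */
while (!ret && nbufs--) {
	struct io_buffer *buf = kmalloc(sizeof(*buf), GFP_KERNEL);

	if (!buf) {
		ret = -ENOMEM;
		break;
	}
	/* ... initialize and link the buffer ... */
	cond_resched();	/* the added schedule point; avoids the soft lockup */
}
```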
-
- 07 February 2022, 2 commits
Committed by Shakeel Butt
On an overcommitted system running multiple workloads of varying priorities, it is preferable to trigger the oom-killer to kill a low-priority workload than to let a high-priority workload receive ENOMEMs. On our memory-overcommitted systems, we are seeing a lot of ENOMEMs instead of oom-kills because the io_uring_setup callchain uses the __GFP_NORETRY gfp flag, which avoids the oom-killer. Let's remove it and allow the oom-killer to kill a lower-priority job.
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Link: https://lore.kernel.org/r/20220125051736.2981459-1-shakeelb@google.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
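An illustration of the flag change; the allocation site shown is an assumption, and the point is only the removal of __GFP_NORETRY:
```
/* before: the allocator gives up early and userspace sees -ENOMEM */
ptr = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN |
			       __GFP_COMP | __GFP_NORETRY, get_order(size));

/* after: the allocator may retry and invoke the oom-killer */
ptr = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN |
			       __GFP_COMP, get_order(size));
```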
-
Committed by Alviro Iskandar Setiawan
In io_recv(), if import_single_range() fails, the @flags variable is uninitialized, and then we goto out_free. After the goto, the compiler doesn't know that (ret < min_ret) is always true, so it thinks the "if ((flags & MSG_WAITALL) ..." path could be taken. The complaint comes from gcc-9 (Debian 9.3.0-22) 9.3.0:
```
fs/io_uring.c:5238 io_recvfrom() error: uninitialized symbol 'flags'
```
Fix this by bypassing the @ret and @flags check when import_single_range() fails. Reasons:
1. import_single_range() only returns -EFAULT when it fails.
2. At that point, @flags is uninitialized and shouldn't be read.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reported-by: "Chen, Rong A" <rong.a.chen@intel.com>
Link: https://lore.gnuweeb.org/timl/d33bb5a9-8173-f65b-f653-51fc0681c6d6@intel.com/
Cc: Pavel Begunkov <asml.silence@gmail.com>
Suggested-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Fixes: 7297ce3d ("io_uring: improve send/recv error handling")
Signed-off-by: Alviro Iskandar Setiawan <alviro.iskandar@gmail.com>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Link: https://lore.kernel.org/r/20220207140533.565411-1-ammarfaizi2@gnuweeb.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 28 January 2022, 1 commit
Committed by Usama Arif
io_ring_ctx is not used in the function.
Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Link: https://lore.kernel.org/r/20220127140444.4016585-1-usama.arif@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-