1. 02 Sep 2021, 1 commit
    • io-wq: split bounded and unbounded work into separate lists · f95dc207
      By Jens Axboe
      We've got a few issues that all boil down to the fact that we have one
      list of pending work items, yet two different types of workers to
      serve them. This causes some oddities around workers switching type and
      even hashed work vs regular work on the same bounded list.
      
      Just separate them out cleanly, similarly to how we already do
      accounting of what is running. That provides a clean separation and
      removes some corner cases that can cause stalls when handling IO
      that is punted to io-wq.
      
      Fixes: ecc53c48 ("io-wq: check max_worker limits if a worker transitions bound state")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
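      A minimal sketch of the data-structure shape this change describes, with
      illustrative names (not the actual io-wq definitions): pending work lives
      on one list per accounting class, so bounded and unbounded items can
      never mix on the same list.

          /* Illustrative sketch only, not actual io-wq code. */
          enum acct_type { ACCT_BOUND, ACCT_UNBOUND, ACCT_NR };

          struct work_item {
              struct work_item *next;
          };

          struct acct {
              unsigned nr_workers;        /* running workers of this class */
              unsigned max_workers;
              struct work_item *head;     /* pending work for this class only */
              struct work_item **tail;
          };

          /* Queue work onto the list matching its class; workers of one
           * type now only ever scan their own list. */
          static void queue_work(struct acct acct[ACCT_NR],
                                 enum acct_type type, struct work_item *w)
          {
              struct acct *a = &acct[type];

              w->next = NULL;
              *a->tail = w;
              a->tail = &w->next;
          }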
  2. 01 Sep 2021, 7 commits
    • io-wq: fix queue stalling race · 0242f642
      By Jens Axboe
      We need to set the stalled bit early, before we drop the lock for adding
      us to the stall hash queue. If not, then we can race with new work being
      queued between adding us to the stall hash and io_worker_handle_work()
      marking us stalled.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
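      A userspace analogue of the ordering this fix enforces, sketched with
      illustrative names: the stalled flag must be set while the lock is still
      held, so a concurrent queuer cannot slip in between.

          #include <pthread.h>
          #include <stdbool.h>

          static pthread_mutex_t wqe_lock = PTHREAD_MUTEX_INITIALIZER;
          static bool stalled;

          /* Worker: mark ourselves stalled *before* dropping the lock.
           * If the flag were set after unlock, work queued in between
           * would see stalled == false and never wake us. */
          static void worker_stall(void)
          {
              pthread_mutex_lock(&wqe_lock);
              stalled = true;               /* set early, under the lock */
              /* ...add ourselves to the stall hash queue... */
              pthread_mutex_unlock(&wqe_lock);
          }

          /* Queuer: clearing the stall under the same lock is race-free. */
          static void queue_work_clear_stall(void)
          {
              pthread_mutex_lock(&wqe_lock);
              if (stalled) {
                  stalled = false;
                  /* ...wake a stalled worker... */
              }
              pthread_mutex_unlock(&wqe_lock);
          }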
    • io_uring: don't submit half-prepared drain request · b8ce1b9d
      By Pavel Begunkov
      [ 3784.910888] BUG: kernel NULL pointer dereference, address: 0000000000000020
      [ 3784.910904] RIP: 0010:__io_file_supports_nowait+0x5/0xc0
      [ 3784.910926] Call Trace:
      [ 3784.910928]  ? io_read+0x17c/0x480
      [ 3784.910945]  io_issue_sqe+0xcb/0x1840
      [ 3784.910953]  __io_queue_sqe+0x44/0x300
      [ 3784.910959]  io_req_task_submit+0x27/0x70
      [ 3784.910962]  tctx_task_work+0xeb/0x1d0
      [ 3784.910966]  task_work_run+0x61/0xa0
      [ 3784.910968]  io_run_task_work_sig+0x53/0xa0
      [ 3784.910975]  __x64_sys_io_uring_enter+0x22/0x30
      [ 3784.910977]  do_syscall_64+0x3d/0x90
      [ 3784.910981]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      io_drain_req() runs before the checks for REQ_F_FAIL, which protect us
      from submitting an under-prepared request (e.g. one that failed in
      io_init_req()). Fail such drained requests as well.
      
      Fixes: a8295b98 ("io_uring: fix failed linkchain code logic")
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/e411eb9924d47a131b1e200b26b675df0c2b7627.1630415423.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
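      The control-flow shape of the fix, sketched in illustrative C (the flag
      name mirrors the commit text; the helpers are stand-ins, not the actual
      io_uring source):

          /* Sketch: check REQ_F_FAIL before draining, so a request that
           * failed during init is completed with its error instead of
           * being deferred and later issued half-prepared. */
          #define REQ_F_FAIL (1U << 8)        /* illustrative bit value */

          struct req { unsigned flags; int result; };

          static void fail_req(struct req *r)  { /* complete with r->result */ }
          static void drain_req(struct req *r) { /* defer behind prior reqs */ }
          static void issue_req(struct req *r) { /* submit for execution */ }

          static void queue_req(struct req *r, int needs_drain)
          {
              if (r->flags & REQ_F_FAIL) {    /* fail drained requests too */
                  fail_req(r);
                  return;
              }
              if (needs_drain)
                  drain_req(r);
              else
                  issue_req(r);
          }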
    • io_uring: fix queueing half-created requests · c6d3d9cb
      By Pavel Begunkov
      [   27.259845] general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] SMP KASAN PTI
      [   27.261043] KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
      [   27.263730] RIP: 0010:sock_from_file+0x20/0x90
      [   27.272444] Call Trace:
      [   27.272736]  io_sendmsg+0x98/0x600
      [   27.279216]  io_issue_sqe+0x498/0x68d0
      [   27.281142]  __io_queue_sqe+0xab/0xb50
      [   27.285830]  io_req_task_submit+0xbf/0x1b0
      [   27.286306]  tctx_task_work+0x178/0xad0
      [   27.288211]  task_work_run+0xe2/0x190
      [   27.288571]  exit_to_user_mode_prepare+0x1a1/0x1b0
      [   27.289041]  syscall_exit_to_user_mode+0x19/0x50
      [   27.289521]  do_syscall_64+0x48/0x90
      [   27.289871]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      io_req_complete_failed() -> io_req_complete_post() ->
      io_req_task_queue() would still try to enqueue a hard linked request,
      which can be half prepared (e.g. failed init), so we can't allow
      that to happen.
      
      Fixes: a8295b98 ("io_uring: fix failed linkchain code logic")
      Reported-by: syzbot+f9704d1878e290eddf73@syzkaller.appspotmail.com
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/70b513848c1000f88bd75965504649c6bb1415c0.1630415423.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
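      A sketch of the idea, with illustrative types and names: when failing the
      head of a hard-linked chain, fail every link directly instead of
      re-queueing the next one, since later links may never have finished prep.

          struct linked_req { struct linked_req *link; };

          static void complete_failed(struct linked_req *r) { /* post -ECANCELED */ }

          /* Fail the whole chain in place; never hand a half-prepared
           * link back to the task-work queue for issuing. */
          static void fail_link_chain(struct linked_req *r)
          {
              while (r) {
                  struct linked_req *next = r->link;

                  r->link = NULL;
                  complete_failed(r);
                  r = next;
              }
          }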
    • io-wq: ensure that hash wait lock is IRQ disabling · 08bdbd39
      By Jens Axboe
      A previous commit removed the IRQ safety of the worker and wqe locks,
      but that left one acquisition of the hash wait lock being done without
      IRQs already disabled.
      
      Ensure that we use the right locking variant for the hashed waitqueue
      lock.
      
      Fixes: a9a4aa9f ("io-wq: wqe and worker locks no longer need to be IRQ safe")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
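      The locking pattern at issue, as a kernel-flavored fragment (illustrative
      context, not the literal patch): a lock that can also be taken with IRQs
      disabled elsewhere must be acquired with an _irq variant here.

          /* IRQs are no longer guaranteed to be disabled at this call
           * site, so take the hashed waitqueue lock with the _irq
           * variant (previously a plain spin_lock() sufficed). */
          spin_lock_irq(&wq->hash->wait.lock);
          list_del_init(&wqe->wait.entry);
          spin_unlock_irq(&wq->hash->wait.lock);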
    • io_uring: retry in case of short read on block device · 7db30437
      By Ming Lei
      In case of buffered reading from a block device, when a short read
      happens, we should retry to read more; otherwise the IO will be
      completed partially. For example, the following fio job expects to
      read 2MB, but it can read only 1MB or less:
      
          fio --name=onessd --filename=/dev/nvme0n1 --filesize=2M \
      	--rw=randread --bs=2M --direct=0 --overwrite=0 --numjobs=1 \
      	--iodepth=1 --time_based=0 --runtime=2 --ioengine=io_uring \
      	--registerfiles --fixedbufs --gtod_reduce=1 --group_reporting
      
      Fix the issue by allowing short-read retry for block devices, which
      do in fact set FMODE_BUF_RASYNC.
      
      Fixes: 9a173346 ("io_uring: fix short read retries for non-reg files")
      Cc: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/20210821150751.1290434-1-ming.lei@redhat.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
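      A minimal sketch of the resulting retry condition, as an illustrative
      userspace helper (the real check lives inside io_uring's read path):
      block devices join regular files as candidates for full short-read retry.

          #include <stdbool.h>
          #include <sys/stat.h>

          /* Regular files and block devices have a known size, so a short
           * buffered read is not EOF and is worth retrying to completion. */
          static bool should_retry_short_read(const struct stat *st)
          {
              return S_ISREG(st->st_mode) || S_ISBLK(st->st_mode);
          }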
    • io_uring: IORING_OP_WRITE needs hash_reg_file set · 7b3188e7
      By Jens Axboe
      During some testing, it became evident that using IORING_OP_WRITE doesn't
      hash buffered writes like the other write commands do. That's simply
      an oversight, and can cause performance regressions when doing buffered
      writes with this command.
      
      Correct that and add the flag, so that buffered writes are correctly
      hashed when using the non-iovec based write command.
      
      Cc: stable@vger.kernel.org
      Fixes: 3a6820f2 ("io_uring: add non-vectored read/write commands")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
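      The shape of the op-table change, sketched with an illustrative struct
      (not the kernel's actual io_op_def): the non-vectored write opcode now
      carries the same hashing flag as the vectored one.

          struct op_def {
              unsigned needs_file : 1;
              unsigned hash_reg_file : 1;  /* serialize buffered work per inode */
          };

          static const struct op_def op_defs[] = {
              /* [IORING_OP_WRITEV] = */ { .needs_file = 1, .hash_reg_file = 1 },
              /* [IORING_OP_WRITE]  = */ { .needs_file = 1, .hash_reg_file = 1 },
              /*                        ^ the missing flag this commit adds */
          };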
    • io-wq: fix race between adding work and activating a free worker · 94ffb0a2
      By Jens Axboe
      The attempt to find and activate a free worker for new work is currently
      combined with creating a new one if we don't find one, but that opens
      io-wq up to a race where the worker that is found and activated can
      put itself to sleep without knowing that it has been selected to perform
      this new work.
      
      Fix this by moving the activation into where we add the new work item,
      then we can retain it within the wqe->lock scope and eliminate the race
      with the worker itself checking inside the lock, but sleeping outside of
      it.
      
      Cc: stable@vger.kernel.org
      Reported-by: Andres Freund <andres@anarazel.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
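      A userspace analogue of the reordering, sketched with illustrative names:
      the wakeup happens while the insert path still holds the lock, and the
      worker's check-then-sleep is atomic against that same lock.

          #include <pthread.h>
          #include <stdbool.h>

          static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
          static pthread_cond_t  free_worker = PTHREAD_COND_INITIALIZER;
          static bool work_pending;

          /* Insert path: activate a free worker inside the lock, so a
           * worker checking for work under the same lock can no longer
           * decide to sleep after having been selected. */
          static void insert_work(void)
          {
              pthread_mutex_lock(&lock);
              work_pending = true;
              pthread_cond_signal(&free_worker);
              pthread_mutex_unlock(&lock);
          }

          /* Worker path: the check and the sleep are one atomic step
           * with respect to the lock the insert path holds. */
          static void worker_wait_for_work(void)
          {
              pthread_mutex_lock(&lock);
              while (!work_pending)
                  pthread_cond_wait(&free_worker, &lock);
              work_pending = false;
              pthread_mutex_unlock(&lock);
          }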
  3. 30 Aug 2021, 5 commits
  4. 29 Aug 2021, 2 commits
    • io_uring: support CLOCK_BOOTTIME/REALTIME for timeouts · 50c1df2b
      By Jens Axboe
      Certain use cases want to use CLOCK_BOOTTIME or CLOCK_REALTIME instead
      of the default CLOCK_MONOTONIC.
      
      Add IORING_TIMEOUT_BOOTTIME and IORING_TIMEOUT_REALTIME flags that
      allow timeouts and linked timeouts to use the selected clock source.
      
      Only one clock source may be selected, and we -EINVAL the request if more
      than one is given. If neither BOOTTIME nor REALTIME is selected, the
      previous default of MONOTONIC is used.
      
      Link: https://github.com/axboe/liburing/issues/369
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
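      From userspace this looks roughly as follows; a minimal sketch, assuming
      liburing and kernel headers new enough to define the new flag:

          #include <liburing.h>

          /* Arm a 5-second timeout measured against CLOCK_BOOTTIME, so
           * time spent suspended counts toward expiry. */
          static int arm_boottime_timeout(struct io_uring *ring)
          {
              struct __kernel_timespec ts = { .tv_sec = 5, .tv_nsec = 0 };
              struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

              if (!sqe)
                  return -1;
              io_uring_prep_timeout(sqe, &ts, 0, IORING_TIMEOUT_BOOTTIME);
              return io_uring_submit(ring);
          }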
    • io-wq: provide a way to limit max number of workers · 2e480058
      By Jens Axboe
      io-wq divides work into two categories:
      
      1) Work that completes in a bounded time, like reading from a regular file
         or a block device. This type of work is limited based on the size of
         the SQ ring.
      
      2) Work that may never complete, we call this unbounded work. The
         number of workers here is limited only by RLIMIT_NPROC.
      
      For various use cases, it's handy to have the kernel limit the maximum
      number of workers for both categories. Provide a way to do so with a
      new IORING_REGISTER_IOWQ_MAX_WORKERS operation.
      
      IORING_REGISTER_IOWQ_MAX_WORKERS takes an array of two integers and sets
      the max worker count to what is being passed in for each category. The
      old values are returned into that same array. If 0 is being passed in for
      either category, it simply returns the current value.
      
      The value is capped at RLIMIT_NPROC. This actually isn't that important
      as it's more of a hint, if we're exceeding the value then our attempt
      to fork a new worker will fail. This happens naturally already if more
      than one node is in the system, as these values are per-node internally
      for io-wq.
      Reported-by: Johannes Lundberg <johalun0@gmail.com>
      Link: https://github.com/axboe/liburing/issues/420
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
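      A minimal usage sketch via the raw register syscall (assuming kernel
      headers that define IORING_REGISTER_IOWQ_MAX_WORKERS); the previous
      limits come back in the same two-element array, and a 0 in either slot
      just reads the current value:

          #include <linux/io_uring.h>
          #include <sys/syscall.h>
          #include <unistd.h>

          static int cap_iowq_workers(int ring_fd)
          {
              /* [0] = bounded workers, [1] = unbounded workers */
              unsigned int counts[2] = { 8, 4 };

              return syscall(__NR_io_uring_register, ring_fd,
                             IORING_REGISTER_IOWQ_MAX_WORKERS, counts, 2);
          }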
  5. 27 Aug 2021, 5 commits
  6. 26 Aug 2021, 1 commit
  7. 25 Aug 2021, 4 commits
  8. 24 Aug 2021, 15 commits