1. 16 June 2021: 6 commits
  2. 14 June 2021: 24 commits
  3. 11 June 2021: 2 commits
  4. 30 May 2021: 1 commit
  5. 27 May 2021: 1 commit
    • io_uring: fix data race to avoid potential NULL-deref · b16ef427
      Authored by Marco Elver
      Commit ba5ef6dc ("io_uring: fortify tctx/io_wq cleanup") introduced
      setting tctx->io_wq to NULL a bit earlier. This has caused KCSAN to
      detect a data race between accesses to tctx->io_wq:
      
        write to 0xffff88811d8df330 of 8 bytes by task 3709 on cpu 1:
         io_uring_clean_tctx                  fs/io_uring.c:9042 [inline]
         __io_uring_cancel                    fs/io_uring.c:9136
         io_uring_files_cancel                include/linux/io_uring.h:16 [inline]
         do_exit                              kernel/exit.c:781
         do_group_exit                        kernel/exit.c:923
         get_signal                           kernel/signal.c:2835
         arch_do_signal_or_restart            arch/x86/kernel/signal.c:789
         handle_signal_work                   kernel/entry/common.c:147 [inline]
         exit_to_user_mode_loop               kernel/entry/common.c:171 [inline]
         ...
        read to 0xffff88811d8df330 of 8 bytes by task 6412 on cpu 0:
         io_uring_try_cancel_iowq             fs/io_uring.c:8911 [inline]
         io_uring_try_cancel_requests         fs/io_uring.c:8933
         io_ring_exit_work                    fs/io_uring.c:8736
         process_one_work                     kernel/workqueue.c:2276
         ...
      
      With the config used, KCSAN only reports data races with value changes:
      this implies that in the case here we also know that tctx->io_wq was
      non-NULL. Therefore, depending on interleaving, we may end up with:
      
                    [CPU 0]                 |        [CPU 1]
        io_uring_try_cancel_iowq()          | io_uring_clean_tctx()
          if (!tctx->io_wq) // false        |   ...
          ...                               |   tctx->io_wq = NULL
          io_wq_cancel_cb(tctx->io_wq, ...) |   ...
            -> NULL-deref                   |
      
      Note: It is likely that thus far we've gotten lucky and the compiler
      optimizes the double-read into a single read into a register -- but this
      is never guaranteed, and can easily change with a different config!
      
      Fix the data race by restoring the previous behaviour, where both the
      setting of io_wq to NULL and the put of the wq are _serialized_ after
      concurrent io_uring_try_cancel_iowq() via acquisition of the uring_lock
      and removal of the node in io_uring_del_task_file().
      
      Fixes: ba5ef6dc ("io_uring: fortify tctx/io_wq cleanup")
      Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
      Reported-by: syzbot+bf2b3d0435b9b728946c@syzkaller.appspotmail.com
      Signed-off-by: Marco Elver <elver@google.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Link: https://lore.kernel.org/r/20210527092547.2656514-1-elver@google.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
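      A minimal standalone sketch of the idea behind this fix, using userspace
      pthreads instead of the kernel's uring_lock and simplified, hypothetical
      types (struct tctx and struct wq here are illustrative, not the real
      io_uring structures): both sides touch io_wq only under the same lock,
      and the cancellation side reads the pointer into a local exactly once,
      so the check and the use can never observe different values.

        #include <pthread.h>
        #include <stdio.h>
        #include <stdlib.h>

        struct wq { int dummy; };

        struct tctx {
            pthread_mutex_t lock;   /* stands in for uring_lock */
            struct wq *io_wq;       /* cleared by teardown */
        };

        /* Cancellation side: a single read of io_wq, under the lock. */
        static void try_cancel(struct tctx *t)
        {
            pthread_mutex_lock(&t->lock);
            struct wq *wq = t->io_wq;            /* one read into a local */
            if (wq)
                printf("cancel via wq %p\n", (void *)wq);
            pthread_mutex_unlock(&t->lock);
        }

        /* Teardown side: free and clear io_wq under the same lock. */
        static void clean_tctx(struct tctx *t)
        {
            pthread_mutex_lock(&t->lock);
            free(t->io_wq);
            t->io_wq = NULL;
            pthread_mutex_unlock(&t->lock);
        }

        int main(void)
        {
            struct tctx t;

            pthread_mutex_init(&t.lock, NULL);
            t.io_wq = malloc(sizeof(*t.io_wq));
            try_cancel(&t);                      /* sees a valid wq */
            clean_tctx(&t);
            try_cancel(&t);                      /* sees NULL, skips deref */
            pthread_mutex_destroy(&t.lock);
            return 0;
        }

      Compile with -pthread; the only point is that the unsynchronized double
      read of a pointer another task may set to NULL is replaced by one read
      taken under the lock that also serializes the writer.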
  6. 26 May 2021: 1 commit
    • io_uring/io-wq: close io-wq full-stop gap · 17a91051
      Authored by Pavel Begunkov
      There is an old problem with io-wq cancellation where requests that
      should be killed are in io-wq but are not discoverable, e.g. in the
      @next_hashed or @linked vars of io_worker_handle_work(). This adds some
      unreliability to individual request cancellation, but may also get
      __io_uring_cancel() stuck. For instance:
      
      1) An __io_uring_cancel() cancellation round has not found any
         request, but there are some hidden ones as described above.
      2) __io_uring_cancel() goes to sleep.
      3) Then workers wake up and try to execute those hidden requests,
         which happen to be unbound.
      
      As we already cancel all requests of io-wq there, set IO_WQ_BIT_EXIT
      in advance, thus preventing 3) from executing unbound requests. The
      workers will break out of their loops early because they receive a
      signal, as they are threads of the dying/exec()'ing user task.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/abfcf8c54cb9e8f7bfbad7e9a0cc5433cc70bdc2.1621781238.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
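      A minimal standalone sketch of the ordering described above, using C11
      atomics in userspace rather than the real io-wq machinery (exit_bit,
      worker_run and cancel_all are hypothetical names, not the io-wq API):
      the cancellation path raises the exit flag before it walks the queues,
      so a worker that still holds a "hidden" item re-checks the flag and
      drops the item instead of running it.

        #include <stdatomic.h>
        #include <stdbool.h>
        #include <stdio.h>

        static atomic_bool exit_bit;       /* stands in for IO_WQ_BIT_EXIT */

        struct work { const char *name; };

        /* Worker side: re-check the flag before running a privately
         * held (not queue-visible) item. */
        static void worker_run(struct work *hidden)
        {
            if (atomic_load(&exit_bit)) {
                printf("dropping %s: wq is exiting\n", hidden->name);
                return;                    /* cancel instead of execute */
            }
            printf("executing %s\n", hidden->name);
        }

        /* Cancellation side: set the exit bit first, then cancel the
         * discoverable requests. */
        static void cancel_all(void)
        {
            atomic_store(&exit_bit, true); /* close the gap up front */
            /* ... walk and cancel the visible queues here ... */
        }

        int main(void)
        {
            struct work hidden = { .name = "linked-request" };

            cancel_all();
            worker_run(&hidden);           /* hidden work is now dropped */
            return 0;
        }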
  7. 20 May 2021: 1 commit
  8. 17 May 2021: 1 commit
  9. 14 May 2021: 3 commits