Commits · 63809137ebb58f0aa2ce359117422686e3304f45 · openeuler / Kernel

25 Jul, 2022 40 commits

io_uring: flush notifiers after sendzc · 63809137

Authored by Pavel Begunkov on Jul 12, 2022

Allow flushing notifiers as part of a sendzc request by setting the
IORING_SENDZC_FLUSH flag. When the sendzc request succeeds, it will
flush the used [active] notifier.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e0b4d9a6797e2fd6092824fe42953db7a519bbc8.1657643355.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

63809137

io_uring: add notification slot registration · bc24d6bd

Authored by Pavel Begunkov on Jul 12, 2022

Let userspace register and unregister notification slots.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a0aa8161fe3ebb2a4cc6e5dbd0cffb96e6881cf5.1657643355.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

bc24d6bd

io_uring: cache struct io_notif · eb4a299b

Authored by Pavel Begunkov on Jul 12, 2022

kmalloc'ing struct io_notif is too expensive when done frequently, so
cache them like many other resources in io_uring. Keep two lists: the
first one is where we get notifiers from, and it's protected by
->uring_lock. The second, to which we queue released notifiers, is
protected by ->completion_lock. We splice one list into the other when
needed.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9dec18f7fcbab9f4bd40b96e5ae158b119945230.1657643355.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
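The two-list scheme described in this message can be sketched in userspace C. This is only a model under stated assumptions: the names (struct notif, notif_cache, notif_alloc and so on) are hypothetical, the locks are reduced to comments, and malloc stands in for kmalloc. It is not the kernel code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a cached object; not the kernel's struct io_notif. */
struct notif {
	struct notif *next;
};

struct notif_cache {
	struct notif *alloc_list;	/* allocation side; kernel: ->uring_lock */
	struct notif *free_list;	/* release side; kernel: ->completion_lock */
};

/* Release path: queue the object on the list completions may touch. */
static void notif_release(struct notif_cache *c, struct notif *n)
{
	n->next = c->free_list;
	c->free_list = n;
}

/* Move everything released so far over to the allocation side. */
static void notif_splice(struct notif_cache *c)
{
	while (c->free_list) {
		struct notif *n = c->free_list;

		c->free_list = n->next;
		n->next = c->alloc_list;
		c->alloc_list = n;
	}
}

/* Allocation path: use the cache, splice if it is empty, else malloc. */
static struct notif *notif_alloc(struct notif_cache *c)
{
	struct notif *n;

	if (!c->alloc_list)
		notif_splice(c);
	n = c->alloc_list;
	if (n) {
		c->alloc_list = n->next;
		return n;
	}
	return malloc(sizeof(*n));
}
```

The point of keeping two lists is that the hot allocation path only needs the lock it usually holds already, and the second lock is taken only when a splice is required.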

eb4a299b

io_uring: add zc notification infrastructure · eb42cebb

Authored by Pavel Begunkov on Jul 12, 2022

Add the internal part of send zerocopy notifications. There are two main
structures. The first one is struct io_notif, which carries struct
ubuf_info inside and maps 1:1 to it. io_uring will bind a number of
zerocopy send requests to it and ask to complete (aka flush) it. When
flushed and all attached requests and skbs complete, it'll generate one
and only one CQE. Notifiers are intended to be passed into the network
layer as struct msghdr::msg_ubuf.

The second concept is notification slots. Userspace will be able to
register an array of slots and subsequently address them by their index
in the array. Slots are independent of each other. Each slot can have
only one notifier at a time (called the active notifier) but many
notifiers during its lifetime. While active, a notifier is not going to
post any completion, but userspace can attach requests to it by
specifying the corresponding slot while issuing send zc requests.
Eventually, userspace will want to "flush" the notifier, losing any way
to attach new requests to it. However, it can use the next automatically
added notifier of this slot, or a notifier of any other slot.

When the network layer is done with all enqueued skbs attached to a
notifier and no longer needs the user data specified in them, the
flushed notifier will post a CQE.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3ecf54c31a85762bf679b0a432c9f43ecf7e61cc.1657643355.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
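The slot and notifier lifetime rules above can be modelled with a plain reference count. The sketch below is an assumption-laden illustration, not the kernel implementation: struct notifier, struct slot, slot_attach(), slot_flush() and the total_cqes counter are all hypothetical names, and the model assumes the slot itself holds one reference on its active notifier.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model: counts how many CQEs notifiers have posted. */
static int total_cqes;

struct notifier {
	int refs;	/* one for the owning slot, plus one per attached send */
};

struct slot {
	struct notifier *active;	/* exactly one active notifier at a time */
};

static struct notifier *notifier_new(void)
{
	struct notifier *n = malloc(sizeof(*n));

	if (n)
		n->refs = 1;	/* the slot's own reference */
	return n;
}

/* Drop a reference; the last one posts the single CQE. */
static void notifier_put(struct notifier *n)
{
	if (--n->refs == 0) {
		total_cqes++;
		free(n);
	}
}

static bool slot_init(struct slot *s)
{
	s->active = notifier_new();
	return s->active != NULL;
}

/* A send request attaches to whatever notifier is active in the slot. */
static struct notifier *slot_attach(struct slot *s)
{
	s->active->refs++;
	return s->active;
}

/* Flush: install a fresh notifier, then drop the slot's ref on the old one. */
static bool slot_flush(struct slot *s)
{
	struct notifier *old = s->active;
	struct notifier *fresh = notifier_new();

	if (!fresh)
		return false;
	s->active = fresh;
	notifier_put(old);
	return true;
}
```

In this model a flushed notifier with outstanding sends stays silent until the last attached request calls notifier_put(), which matches the "one and only one CQE" rule in the message.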

eb42cebb

io_uring: export io_put_task() · e70cb608

Authored by Pavel Begunkov on Jul 12, 2022

Make io_put_task() available to non-core parts of io_uring; we'll need
it for the notification infrastructure.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3686807d4c03b72e389947b0e8692d4d44334ef0.1657643355.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e70cb608

io_uring: ensure REQ_F_ISREG is set async offload · f6b543fd

Authored by Jens Axboe on Jul 21, 2022

If we're offloading requests directly to io-wq because IOSQE_ASYNC was
set in the sqe, we can miss hashing writes appropriately because we
haven't set REQ_F_ISREG yet. This can cause a performance regression
with buffered writes, as io-wq then no longer correctly serializes writes
to that file.

Ensure that we set the flags in io_prep_async_work(), which will cause
the io-wq work item to be hashed appropriately.

Fixes: 584b0180 ("io_uring: move read/write file prep state into actual opcode handler")
Link: https://lore.kernel.org/io-uring/20220608080054.GB22428@xsang-OptiPlex-9020/
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f6b543fd

io_uring: Don't require reinitable percpu_ref · 48904229

Authored by Michal Koutný on Jul 15, 2022

Commit 8bb649ee ("io_uring: remove ring quiesce for
io_uring_register") removed the workflow relying on reinit/resurrection
of the percpu_ref; hence, initialization with that requested is a relic.

This is based on code review, this causes no real bug (and theoretically
can't). Technically it's a revert of commit 21482896 ("io_uring:
initialize percpu refcounters using PERCU_REF_ALLOW_REINIT") but since
the flag omission is now justified, I'm not making this a revert.

Fixes: 8bb649ee ("io_uring: remove ring quiesce for io_uring_register")
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

48904229

io_uring: add netmsg cache · 43e0bbbd

Authored by Jens Axboe on Jul 07, 2022

For recvmsg/sendmsg, if they don't complete inline, we currently need
to allocate a struct io_async_msghdr for each request. This is a
somewhat large struct.

Hook up sendmsg/recvmsg to use the io_alloc_cache. This reduces the
alloc + free overhead considerably, yielding 4-5% of extra performance
running netbench.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

43e0bbbd

io_uring: impose max limit on apoll cache · 9731bc98

Authored by Jens Axboe on Jul 07, 2022

Caches like this tend to grow to the peak size, and then never get any
smaller. Impose a max limit on the size, to prevent it from growing too
big.

A somewhat randomly chosen 512 is the max size we'll allow the cache
to get. If a batch of frees comes in and would bring it over that, we
simply start kfree'ing the surplus.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
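A capped free-list cache of the kind described here can be sketched as follows. This is a userspace illustration with hypothetical names (struct alloc_cache, cache_put(), cache_get(), CACHE_MAX_ENTRIES); only the limit of 512 comes from the message, and free() stands in for kfree().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_MAX_ENTRIES 512	/* the cap named in the commit message */

struct cache_entry {
	struct cache_entry *next;
};

struct alloc_cache {
	struct cache_entry *head;
	unsigned int nr_cached;
};

/* Try to cache a freed object; returns false once the cap is reached. */
static bool cache_put(struct alloc_cache *c, struct cache_entry *e)
{
	if (c->nr_cached >= CACHE_MAX_ENTRIES)
		return false;
	e->next = c->head;
	c->head = e;
	c->nr_cached++;
	return true;
}

/* Pop a cached object, or NULL if the cache is empty. */
static struct cache_entry *cache_get(struct alloc_cache *c)
{
	struct cache_entry *e = c->head;

	if (e) {
		c->head = e->next;
		c->nr_cached--;
	}
	return e;
}

/* Free path: cache the object if there is room, otherwise really free it. */
static void cache_free_obj(struct alloc_cache *c, struct cache_entry *e)
{
	if (!cache_put(c, e))
		free(e);
}
```

With this shape, a burst of 600 frees leaves 512 objects cached and releases the remaining 88 immediately, so the cache cannot stay pinned at an arbitrary peak size.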

9731bc98

io_uring: add abstraction around apoll cache · 9b797a37

Authored by Jens Axboe on Jul 07, 2022

In preparation for adding limits, and one more user, abstract out the
core bits of the allocation+free cache.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

9b797a37

io_uring: move apoll cache to poll.c · 9da7471e

Authored by Jens Axboe on Jul 07, 2022

This is where it's used, move the flush handler in there.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

9da7471e

io_uring: only trace one of complete or overflow · e0486f3f

Authored by Dylan Yudaken on Jun 30, 2022

In overflow we see a duplicate line in the trace, and in some cases 3
lines (if the initial io_post_aux_cqe fails).
Instead, just trace once for each CQE.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220630091231.1456789-13-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e0486f3f

io_uring: add allow_overflow to io_post_aux_cqe · 52120f0f

Authored by Dylan Yudaken on Jun 30, 2022

Some use cases of io_post_aux_cqe would not want to overflow as is, but
might want to change the flags/result. For example, multishot receive
requires in-order CQEs, and so if there is an overflow it would need to
stop receiving until the overflow is taken care of.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220630091231.1456789-8-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

52120f0f

io_uring: let to set a range for file slot allocation · 6e73dffb

Authored by Pavel Begunkov on Jun 25, 2022

io_uring recently gained an option to allocate a file index for
operations registering fixed files. However, it's utterly unusable with
mixed approaches, where for some of the files userspace knows better
where to place them, as it may race and users don't have any sane way to
pick a slot and hope it will not be taken.

Let userspace register a range of fixed file slots in which the
auto-allocation happens. The use case is splitting the fixed table in
two parts, where one of them is used for auto-allocation and the other
for slot-specified operations.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/66ab0394e436f38437cf7c44676e1920d09687ad.1656154403.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
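The split-table idea can be illustrated with a small allocator that only searches a registered range. Everything here is a hypothetical userspace model: struct file_table, alloc_slot() and the choice of -ENFILE for "no free slot in range" are assumptions made for the sketch, not a statement about the kernel's exact behaviour.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a fixed file table with an auto-allocation range. */
struct file_table {
	bool *used;			/* used[i] is true when slot i is taken */
	unsigned int nr;		/* total number of slots */
	unsigned int alloc_start;	/* first slot open to auto-allocation */
	unsigned int alloc_end;		/* one past the last such slot */
};

/*
 * Pick the first free slot inside [alloc_start, alloc_end).
 * Slots outside the range are never touched, so userspace can keep
 * placing files there by explicit index without racing the allocator.
 */
static int alloc_slot(struct file_table *t)
{
	unsigned int end = t->alloc_end < t->nr ? t->alloc_end : t->nr;
	unsigned int i;

	for (i = t->alloc_start; i < end; i++) {
		if (!t->used[i]) {
			t->used[i] = true;
			return (int)i;
		}
	}
	return -ENFILE;	/* assumed error code for an exhausted range */
}
```

For an 8-slot table with the range set to [4, 8), slots 0 to 3 stay under the application's explicit control while automatic allocation is confined to the upper half.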

6e73dffb

io_uring: remove ctx->refs pinning on enter · fbb8bb02

Authored by Pavel Begunkov on Jun 25, 2022

io_uring_enter() takes ctx->refs, which was previously preventing racing
with register quiesce. However, as register now doesn't touch the refs,
we can freely kill extra ctx pinning and rely on the fact that we're
holding a file reference preventing the ring from being destroyed.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a11c57ad33a1be53541fce90669c1b79cf4d8940.1656153286.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

fbb8bb02

io_uring: don't check file ops of registered rings · 3273c440

Authored by Pavel Begunkov on Jun 25, 2022

Registered rings are by definition io_uring files, so we don't need to
additionally verify them.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/425cd64fd885b8e329a46c205ee811987691baaf.1656153286.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3273c440

io_uring: remove extra TIF_NOTIFY_SIGNAL check · ad8b261d

Authored by Pavel Begunkov on Jun 25, 2022

io_run_task_work() accounts for TIF_NOTIFY_SIGNAL, so there is no need
to have a second check in io_run_task_work_sig().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/52ce41a592ad904511697f432141e5690fd4b968.1656153285.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ad8b261d

io_uring: fuse fallback_node and normal tw node · 3218e5d3

Authored by Pavel Begunkov on Jun 25, 2022

Now that both normal and fallback paths use llist, just keep one node
head in struct io_task_work and kill off ->fallback_node.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d04ebde409f7b162fe247b361b4486b193293e46.1656153285.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3218e5d3

io_uring: add sync cancelation API through io_uring_register() · 78a861b9

Authored by Jens Axboe on Jun 18, 2022

The io_uring cancelation API is async, like any other API that we expose
there. For the case of finding a request to cancel, or not finding one,
it is fully sync in that when submission returns, the CQE for both the
cancelation request and the targeted request have been posted to the
CQ ring.

However, if the targeted work is being executed by io-wq, the API can
only start the act of canceling it. This makes it difficult to use in
some circumstances, as the caller then has to wait for the CQEs to come
in and match on the same cancelation data there.

Provide an IORING_REGISTER_SYNC_CANCEL command for io_uring_register()
that does sync cancelations, always. For the io-wq case, it'll wait
for the cancelation to come in before returning. The only expected
returns from this API are:

0		Request found and canceled fine.
> 0		Requests found and canceled. Only happens if asked to
		cancel multiple requests, and if the work wasn't in
		progress.
-ENOENT		Request not found.
-ETIME		A timeout on the operation was requested, but the timeout
		expired before we could cancel.

and we won't get -EALREADY via this API.

If the timeout value passed in is -1 (tv_sec and tv_nsec), then that
means that no timeout is requested. Otherwise, the timespec passed in
is the amount of time the sync cancel will wait for a successful
cancelation.

Link: https://github.com/axboe/liburing/discussions/608
Signed-off-by: Jens Axboe <axboe@kernel.dk>
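A caller-side sketch of how the return values listed above might be handled. The enum, struct cancel_timeout, classify_sync_cancel() and timeout_is_disabled() are hypothetical helpers written for this illustration; they do not call into io_uring and only encode the table and the -1 timeout convention from the message.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical classification of a sync cancel return value. */
enum sync_cancel_outcome {
	SC_CANCELED_ONE,	/* 0: request found and canceled */
	SC_CANCELED_MANY,	/* > 0: several requests canceled */
	SC_NOT_FOUND,		/* -ENOENT */
	SC_TIMED_OUT,		/* -ETIME */
	SC_UNEXPECTED,		/* anything else, e.g. -EALREADY is not expected */
};

/* Hypothetical stand-in for the timespec passed with the request. */
struct cancel_timeout {
	long long tv_sec;
	long long tv_nsec;
};

static enum sync_cancel_outcome classify_sync_cancel(int ret)
{
	if (ret == 0)
		return SC_CANCELED_ONE;
	if (ret > 0)
		return SC_CANCELED_MANY;
	if (ret == -ENOENT)
		return SC_NOT_FOUND;
	if (ret == -ETIME)
		return SC_TIMED_OUT;
	return SC_UNEXPECTED;
}

/* Both fields set to -1 means "wait without a timeout". */
static bool timeout_is_disabled(const struct cancel_timeout *ts)
{
	return ts->tv_sec == -1 && ts->tv_nsec == -1;
}
```

Treating every other negative value as unexpected keeps the caller honest about the contract: with the sync API there should be no need for a separate path that waits for a later CQE.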

78a861b9

io_uring: trace task_work_run · c6dd763c

Authored by Dylan Yudaken on Jun 22, 2022

Trace task_work_run to help provide stats on how often task work is run
and what batch sizes are coming through.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220622134028.2013417-9-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c6dd763c

io_uring: batch task_work · 3a0c037b

Authored by Dylan Yudaken on Jun 22, 2022

Batching task work up is an important performance optimisation, as
task_work_add is expensive.

In order to keep the semantics, replace the task_list with a fake node
while processing the old list, and then do a cmpxchg at the end to see if
there is more work.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220622134028.2013417-6-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3a0c037b

io_uring: introduce llist helpers · 923d1592

Authored by Dylan Yudaken on Jun 22, 2022

Introduce helpers to atomically switch llist.

Will later move this into common code.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220622134028.2013417-5-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

923d1592

io_uring: lockless task list · f88262e6

Authored by Dylan Yudaken on Jun 22, 2022

With networking use cases we see contention on the spinlock used to
protect the task_list when multiple threads try and add completions at once.
Instead we can use a lockless list, and assume that the first caller to
add to the list is responsible for kicking off task work.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220622134028.2013417-4-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
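The "first adder kicks off task work" rule can be shown with a lock-free push built on C11 atomics. This is a userspace sketch with hypothetical names (struct lnode, struct lhead, lockless_add(), lockless_del_all()); it mirrors the general shape of a lockless singly linked list rather than reproducing the kernel's llist code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct lnode {
	struct lnode *next;
};

struct lhead {
	_Atomic(struct lnode *) first;
};

/*
 * Push a node without taking a lock. Returns true when the list was
 * empty before the push, which tells the caller that it is the one
 * responsible for scheduling the consumer.
 */
static bool lockless_add(struct lhead *h, struct lnode *n)
{
	struct lnode *first = atomic_load(&h->first);

	do {
		n->next = first;
	} while (!atomic_compare_exchange_weak(&h->first, &first, n));

	return first == NULL;
}

/* Consumer side: detach the whole list in one atomic exchange. */
static struct lnode *lockless_del_all(struct lhead *h)
{
	return atomic_exchange(&h->first, (struct lnode *)NULL);
}
```

Because only the push that observes an empty list returns true, concurrent producers agree on a single "kicker" without any shared spinlock, which is the contention the message wants to avoid.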

f88262e6

io_uring: remove __io_req_task_work_add · c34398a8

Authored by Dylan Yudaken on Jun 22, 2022

This is no longer needed as there is only one caller.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220622134028.2013417-3-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c34398a8

io_uring: remove priority tw list optimisation · ed5ccb3b

Authored by Dylan Yudaken on Jun 22, 2022

This optimisation has some built-in assumptions that make it easy to
introduce bugs. It also does not have clear wins that make it worth keeping.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220622134028.2013417-2-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ed5ccb3b

io_uring: consistent naming for inline completion · 9da070b1

Authored by Pavel Begunkov on Jun 20, 2022

Improve naming of the inline/deferred completion helper so it's
consistent with its *_post counterpart. Add some comments and extra
lockdeps to ensure the locking is done right.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/797c619943dac06529e9d3fcb16e4c3cde6ad1a3.1655684496.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

9da070b1

io_uring: add io_commit_cqring_flush() · 46929b08

Authored by Pavel Begunkov on Jun 20, 2022

Since __io_commit_cqring_flush users moved to different files, introduce
io_commit_cqring_flush() helper and encapsulate all flags testing details
inside.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0da03887435dd9869ffe46dcd3962bf104afcca3.1655684496.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

46929b08

io_uring: introduce locking helpers for CQE posting · 25399321

Authored by Pavel Begunkov on Jun 20, 2022

	spin_lock(&ctx->completion_lock);
	/* post CQEs */
	io_commit_cqring(ctx);
	spin_unlock(&ctx->completion_lock);
	io_cqring_ev_posted(ctx);

We have many places repeating this sequence, and the three-function
unlock section is not perfect from the maintenance perspective and also
makes it harder to add new locking/sync tricks.

Introduce two helpers: io_cq_lock(), which is simple and only grabs
->completion_lock, and io_cq_unlock_post(), which encapsulates the
three-call section.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/fe0c682bf7f7b55d9be55b0d034be9c1949277dc.1655684496.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
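The pairing of the two helpers can be sketched in userspace as below. This is a model, not the kernel code: struct ring_ctx, cq_lock() and cq_unlock_post() are hypothetical names, an atomic_flag spin loop stands in for spin_lock(), and "commit" and "event posted" are reduced to a tail copy and a counter.

```c
#include <stdatomic.h>

/* Hypothetical model of the CQ-side state touched while posting CQEs. */
struct ring_ctx {
	atomic_flag completion_lock;	/* stand-in for the spinlock */
	unsigned int cached_tail;	/* producer-private tail */
	unsigned int published_tail;	/* tail visible to consumers */
	unsigned int wakeups;		/* how often waiters were notified */
};

/* Model of the commit step: publish what was queued under the lock. */
static void commit_cqring(struct ring_ctx *ctx)
{
	ctx->published_tail = ctx->cached_tail;
}

/* Model of the "event posted" step: wake whoever waits on the CQ. */
static void cqring_ev_posted(struct ring_ctx *ctx)
{
	ctx->wakeups++;
}

/* Counterpart of io_cq_lock(): only takes the lock. */
static void cq_lock(struct ring_ctx *ctx)
{
	while (atomic_flag_test_and_set(&ctx->completion_lock))
		;	/* spin until the lock is free */
}

/* Counterpart of io_cq_unlock_post(): commit, unlock, then notify. */
static void cq_unlock_post(struct ring_ctx *ctx)
{
	commit_cqring(ctx);
	atomic_flag_clear(&ctx->completion_lock);
	cqring_ev_posted(ctx);
}
```

A caller then writes cq_lock(), posts its CQEs, and calls cq_unlock_post(), so the order of the three closing steps lives in exactly one place.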

25399321

io_uring: hide eventfd assumptions in eventfd paths · 305bef98

Authored by Pavel Begunkov on Jun 20, 2022

Some io_uring-eventfd users assume that there won't be spurious wakeups.
That assumption has to be honoured by all io_cqring_ev_posted() callers,
which is inconvenient and from time to time leads to problems but should
be maintained to not break the userspace.

Instead of making the callers track whether a CQE was posted or not, hide
it inside io_eventfd_signal(). It saves ->cached_cq_tail it saw last time
and triggers the eventfd only when ->cached_cq_tail changed since then.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0ffc66bae37a2513080b601e4370e147faaa72c5.1655684496.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
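The tail-tracking trick can be reduced to a few lines. The sketch below is a hypothetical model (struct evfd_state, eventfd_signal_if_new() and the signals counter are invented for illustration) that only shows the "remember the last tail seen, signal on change" logic, not real eventfd handling.

```c
#include <stdbool.h>

/* Hypothetical model of the state needed to suppress spurious signals. */
struct evfd_state {
	unsigned int cached_cq_tail;		/* advanced whenever a CQE is posted */
	unsigned int last_signalled_tail;	/* tail seen at the previous signal */
	unsigned int signals;			/* number of eventfd signals sent */
};

/*
 * Signal only if the CQ tail moved since the last call. Callers no
 * longer need to track whether they actually posted a CQE.
 */
static bool eventfd_signal_if_new(struct evfd_state *s)
{
	if (s->cached_cq_tail == s->last_signalled_tail)
		return false;	/* nothing new: would be a spurious wakeup */

	s->last_signalled_tail = s->cached_cq_tail;
	s->signals++;
	return true;
}
```

Calling the helper twice in a row without posting anything in between yields a single signal, which is the guarantee the message says existing userspace relies on.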

305bef98

io_uring: fix multi ctx cancellation · affa87db

Authored by Pavel Begunkov on Jun 20, 2022

io_uring_try_cancel_requests() loops until there is nothing left to do
with the ring; however, there might be several rings and they might have
dependencies between them, e.g. via poll requests.

Instead of cancelling rings one by one, try to cancel them all and only
then loop over if we still potentially have some work to do.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8d491fe02d8ac4c77ff38061cf86b9a827e8845c.1655684496.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
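Why the outer loop has to cover all rings can be seen in a toy model. The code below is a hypothetical illustration (struct ring, ring_try_cancel() and cancel_all() are made-up names, and a "dependency" is simplified to "cannot make progress while the other ring still has work"); it is not how the kernel tracks cross-ring dependencies.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ring model: pending work that may depend on another ring. */
struct ring {
	int pending;
	struct ring *depends_on;	/* may be NULL */
};

/* One cancellation attempt; returns true if the ring may still have work. */
static bool ring_try_cancel(struct ring *r)
{
	if (r->pending == 0)
		return false;
	if (r->depends_on && r->depends_on->pending > 0)
		return true;	/* blocked for now, retry on a later pass */
	r->pending--;
	return true;
}

/*
 * Try every ring once per pass and repeat while any of them reported
 * possible work. Returns the number of passes taken.
 */
static unsigned int cancel_all(struct ring **rings, size_t nr)
{
	unsigned int passes = 0;
	bool again;
	size_t i;

	do {
		again = false;
		for (i = 0; i < nr; i++) {
			if (ring_try_cancel(rings[i]))
				again = true;
		}
		passes++;
	} while (again);

	return passes;
}
```

In this model, looping on the first ring alone until it is empty would never terminate when it depends on a ring that has not been visited yet, whereas the pass-over-all-rings loop makes progress on the second ring and then finishes the first.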

affa87db

io_uring: remove ->flush_cqes optimisation · d9dee430

Authored by Pavel Begunkov on Jun 19, 2022

It's not clear how widely used IOSQE_CQE_SKIP_SUCCESS is, and how often
the ->flush_cqes flag prevents a completion from being flushed. Sometimes
it's a high level of concurrency that enables it at least for one CQE,
but sometimes it doesn't save much because nobody is waiting on the CQ.

Remove the ->flush_cqes flag and the optimisation; it should benefit the
normal use case. Note that there is no spurious eventfd problem with
that, as checks for spuriousness were incorporated into
io_eventfd_signal().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/692e81eeddccc096f449a7960365fa7b4a18f8e6.1655637157.git.asml.silence@gmail.com
[axboe: remove now dead state->flush_cqes variable]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d9dee430

io_uring: move io_eventfd_signal() · a830ffd2

Authored by Pavel Begunkov on Jun 19, 2022

Move io_eventfd_signal() in the sources without any changes and kill its
forward declaration.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9ebebb3f6f56f5a5448a621e0b6a537720c43334.1655637157.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

a830ffd2

io_uring: remove extra io_commit_cqring() · d142c3ec

Authored by Pavel Begunkov on Jun 19, 2022

We don't post events in __io_commit_cqring_flush() anymore but send all
requests to tw, so there is no need to do io_commit_cqring() there.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f2481e32375e749be89c42e4804268b608722cef.1655637157.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d142c3ec

io_uring: clean up tracing events · 48863ffd

Authored by Pavel Begunkov on Jun 16, 2022

We have lots of trace events accepting an io_uring request and wanting
to print some of its fields like user_data, opcode, flags and so on.
However, as trace points were unaware of io_uring structures, we had to
pass all the fields as arguments. Teach trace/events/io_uring.h about
struct io_kiocb and stop the misery of passing a horde of arguments to
trace helpers.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/40ff72f92798114e56d400f2b003beb6cde6ef53.1655384063.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

48863ffd

io_uring: kill extra io_uring_types.h includes · 27a9d66f

Authored by Pavel Begunkov on Jun 16, 2022

io_uring/io_uring.h already includes io_uring_types.h, so there is no
need to include it every time. Kill it in a bunch of places; this
prepares us for the following patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/94d8c943fbe0ef949981c508ddcee7fc1c18850f.1655384063.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

27a9d66f

io_uring: change ->cqe_cached invariant for CQE32 · b3659a65

Authored by Pavel Begunkov on Jun 17, 2022

With IORING_SETUP_CQE32, ->cqe_cached doesn't store a real address but
rather an implicit offset into cqes. Store the real cqe pointer and
increment it accordingly if CQE32.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1ee1838cba16bed96381a006950b36ba640d998c.1655455613.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
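The "real pointer, larger stride" invariant can be sketched as follows. This is a userspace model with hypothetical names (struct cqe, struct cq, get_cqe()); it assumes a 16-byte base CQE and that a CQE32 entry occupies two consecutive base-sized slots, and it leaves out the tail accounting the kernel also performs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte completion entry. */
struct cqe {
	uint64_t user_data;
	int32_t res;
	uint32_t flags;
};

struct cq {
	struct cqe *cqe_cached;		/* real address of the next free entry */
	struct cqe *cqe_sentinel;	/* one past the last usable entry */
	bool cqe32;			/* big-CQE mode: entries are twice as large */
};

/*
 * Hand out the next entry. The cached pointer is always a real address,
 * so big-CQE mode only changes how far it is advanced.
 */
static struct cqe *get_cqe(struct cq *cq)
{
	struct cqe *cqe = cq->cqe_cached;
	size_t stride = cq->cqe32 ? 2 : 1;

	if ((size_t)(cq->cqe_sentinel - cqe) < stride)
		return NULL;	/* no room left in this cached window */

	cq->cqe_cached = cqe + stride;
	return cqe;
}
```

Keeping a real pointer means callers can fill the returned entry directly, with no index-to-address conversion that has to know about the entry size.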

b3659a65

io_uring: introduce io_req_cqe_overflow() · 68494a65

Authored by Pavel Begunkov on Jun 17, 2022

__io_fill_cqe_req() is hot and inlined, so we want it to be as small as
possible. Add io_req_cqe_overflow(), which accepts only a request and
does all overflow accounting, and use it to replace two calls to the
6-argument io_cqring_event_overflow().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/048b9fbcce56814d77a1a540409c98c3d383edcb.1655455613.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

68494a65

io_uring: don't inline __io_get_cqe() · faf88dde

Authored by Pavel Begunkov on Jun 17, 2022

__io_get_cqe() is not as hot as io_get_cqe(), so there is no need to
inline it; this sheds ~500B from the binary.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c1ac829198a881b7af8710926f99a3559b9f24c0.1655455613.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

faf88dde

io_uring: don't expose io_fill_cqe_aux() · d245bca6

Authored by Pavel Begunkov on Jun 17, 2022

Deduplicate some code and add a helper for filling an aux CQE, locking
and notification.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b7c6557c8f9dc5c4cfb01292116c682a0ff61081.1655455613.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d245bca6

io_uring: mutex locked poll hashing · 9ca9fb24

Authored by Pavel Begunkov on Jun 16, 2022

Currently we do two extra spin lock/unlock pairs to add a poll/apoll
request to the cancellation hash table and remove it from there.

On the submission side we often already hold ->uring_lock, and tw
completion is likely to hold it as well. Add a second cancellation hash
table protected by ->uring_lock. Out of concern for latency, because the
mutex has to be locked on the completion side, use the new table only in
the following cases:

1) IORING_SETUP_SINGLE_ISSUER: only one task grabs uring_lock, so there
   is little to no contention and so the main tw handler will almost
   always end up grabbing it before calling callbacks.

2) IORING_SETUP_SQPOLL: same as with single issuer, only one task is
   a major user of ->uring_lock.

3) apoll: we normally grab the lock on the completion side anyway to
   execute the request, so it's free.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1bbad9c78c454b7b92f100bbf46730a37df7194f.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

9ca9fb24

openeuler / Kernel synced successfully almost 2 years ago