  1. 31 May 2022, 1 commit
    • io_uring: let IORING_OP_FILES_UPDATE support choosing fixed file slots · a7c41b46
      Committed by Xiaoguang Wang
      One big issue with the file registration feature is that it needs user
      space apps to maintain free slot info about io_uring's fixed file table,
      which really is a burden for development. io_uring now supports choosing
      a free file slot for user space apps by using the IORING_FILE_INDEX_ALLOC
      flag in accept, open, and socket operations, but these require the app to
      use direct accept or direct open, which not all apps are prepared to use yet.
      
      To make the registration feature easier to use for apps that still need
      real fds, let IORING_OP_FILES_UPDATE support choosing fixed file slots:
      the picked slots are stored back in the fd array, and the cqe returns the
      number of slots allocated.
      Suggested-by: Hao Xu <howeyxu@tencent.com>
      Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      [axboe: move flag to uapi io_uring header, change goto to break, init]
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
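
      A minimal sketch of the new flow, assuming liburing exposes the
      io_uring_prep_files_update() helper and the IORING_FILE_INDEX_ALLOC
      constant, and that a (sparse) fixed file table was registered earlier;
      error handling is trimmed:

      #include <liburing.h>

      /* Ask the kernel to pick free fixed-file slots for 'nr' regular fds. */
      static int update_alloc_slots(struct io_uring *ring, int *fds, unsigned nr)
      {
              struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
              struct io_uring_cqe *cqe;
              int ret;

              /* offset == IORING_FILE_INDEX_ALLOC: the kernel chooses the slots
               * and writes the picked slot indices back into the fds array. */
              io_uring_prep_files_update(sqe, fds, nr, IORING_FILE_INDEX_ALLOC);
              io_uring_submit(ring);
              io_uring_wait_cqe(ring, &cqe);
              ret = cqe->res;         /* number of slots allocated, or -errno */
              io_uring_cqe_seen(ring, cqe);
              return ret;
      }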
  2. 18 May 2022, 1 commit
    • io_uring: add support for ring mapped supplied buffers · c7fb1942
      Committed by Jens Axboe
      Provided buffers allow an application to supply io_uring with buffers
      that can then be grabbed for a read/receive request, when the data
      source is ready to deliver data. The existing scheme relies on using
      IORING_OP_PROVIDE_BUFFERS to do that, but it can be difficult to use
      in real world applications. It's pretty efficient if the application
      is able to supply back batches of provided buffers when they have been
      consumed and the application is ready to recycle them, but if
      fragmentation occurs in the buffer space, it can become difficult to
      supply enough buffers at a time. This hurts efficiency.
      
      Add a register op, IORING_REGISTER_PBUF_RING, which allows an application
      to set up a shared queue for each buffer group of provided buffers. The
      application can then supply buffers simply by adding them to this ring,
      and the kernel can consume them just as easily. The ring shares the head
      with the application; the tail remains private in the kernel.
      
      Provided buffers set up with IORING_REGISTER_PBUF_RING cannot use
      IORING_OP_{PROVIDE,REMOVE}_BUFFERS to add or remove entries; they must
      use the mapped ring. Mapped provided buffer rings can co-exist with
      normal provided buffers, just not within the same group ID.
      
      To gauge the overhead of the existing scheme and evaluate the mapped ring
      approach, a simple NOP benchmark was written. It uses a ring of 128
      entries, and submits/completes 32 at a time. 'Replenish' is how
      many buffers are provided back at a time after they have been
      consumed:
      
      Test			Replenish			NOPs/sec
      ================================================================
      No provided buffers	NA				~30M
      Provided buffers	32				~16M
      Provided buffers	 1				~10M
      Ring buffers		32				~27M
      Ring buffers		 1				~27M
      
      The ring mapped buffers perform almost as well as not using provided
      buffers at all, and they don't care whether you provide 1 or more back at
      the same time. This means applications can just replenish as they go,
      rather than needing to batch and compact, further reducing overhead in the
      application. The NOP benchmark above doesn't need to do any compaction,
      so that overhead isn't even reflected in the above test.
      Co-developed-by: Dylan Yudaken <dylany@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
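
      A condensed sketch of registering one such ring and recycling a consumed
      buffer, assuming the liburing wrappers io_uring_register_buf_ring() and
      io_uring_buf_ring_init()/add()/advance() are available; group id 0 and
      128 entries are arbitrary choices:

      #include <liburing.h>
      #include <stdlib.h>

      #define BR_ENTRIES      128
      #define BR_BGID         0

      /* Allocate and register the shared buffer ring for group BR_BGID. */
      static struct io_uring_buf_ring *setup_pbuf_ring(struct io_uring *ring)
      {
              struct io_uring_buf_reg reg = { 0 };
              struct io_uring_buf_ring *br;

              if (posix_memalign((void **) &br, 4096,
                                 BR_ENTRIES * sizeof(struct io_uring_buf)))
                      return NULL;

              reg.ring_addr = (unsigned long) br;
              reg.ring_entries = BR_ENTRIES;
              reg.bgid = BR_BGID;
              if (io_uring_register_buf_ring(ring, &reg, 0))
                      return NULL;

              io_uring_buf_ring_init(br);
              return br;
      }

      /* Hand a single consumed buffer back: no batching or compaction needed. */
      static void replenish_one(struct io_uring_buf_ring *br, void *buf,
                                unsigned int len, unsigned short bid)
      {
              io_uring_buf_ring_add(br, buf, len, bid,
                                    io_uring_buf_ring_mask(BR_ENTRIES), 0);
              io_uring_buf_ring_advance(br, 1);  /* make it visible to the kernel */
      }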
  3. 14 May 2022, 1 commit
  4. 13 May 2022, 2 commits
    • io_uring: add flag for allocating a fully sparse direct descriptor space · a8da73a3
      Committed by Jens Axboe
      Currently, to set up a fully sparse descriptor space upfront, the app needs
      to allocate an array of the full size, memset it to -1, and then pass
      that in. Make this a bit easier by allowing a flag that simply does
      this internally rather than needing to copy each slot separately.
      
      This works with IORING_REGISTER_FILES2 as the flag is set in struct
      io_uring_rsrc_register, and is only allowed when the type is
      IORING_RSRC_FILE as this doesn't make sense for registered buffers.
      Reviewed-by: Hao Xu <howeyxu@tencent.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
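
      A short sketch, assuming liburing's io_uring_register_files_sparse()
      wrapper, which fills in struct io_uring_rsrc_register with the
      IORING_RSRC_REGISTER_SPARSE flag; the table size of 1024 is arbitrary:

      #include <liburing.h>

      /* Reserve a fully sparse direct-descriptor table up front, with no
       * -1-filled fd array built and copied from userspace. */
      static int setup_sparse_files(struct io_uring *ring)
      {
              return io_uring_register_files_sparse(ring, 1024);
      }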
    • io_uring: allow allocated fixed files for openat/openat2 · 1339f24b
      Committed by Jens Axboe
      If the application passes in IORING_FILE_INDEX_ALLOC as the file_slot,
      then that's a hint to allocate a fixed file descriptor rather than have
      one be passed in directly.
      
      This can be useful for having io_uring manage the direct descriptor space.
      
      Normal open direct requests will complete with 0 for success, and < 0
      in case of error. If io_uring is asked to allocate the direct descriptor,
      then the direct descriptor is returned in case of success.
      Reviewed-by: Hao Xu <howeyxu@tencent.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
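
      A sketch of an allocated direct open, assuming a liburing version whose
      io_uring_prep_openat_direct() accepts IORING_FILE_INDEX_ALLOC as its
      file_index argument:

      #include <fcntl.h>
      #include <liburing.h>

      static int open_direct_alloc(struct io_uring *ring, const char *path)
      {
              struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
              struct io_uring_cqe *cqe;
              int slot;

              io_uring_prep_openat_direct(sqe, AT_FDCWD, path, O_RDONLY, 0,
                                          IORING_FILE_INDEX_ALLOC);
              io_uring_submit(ring);
              io_uring_wait_cqe(ring, &cqe);
              slot = cqe->res;  /* allocated fixed slot on success, -errno on error */
              io_uring_cqe_seen(ring, cqe);
              return slot;
      }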
  5. 11 May 2022, 1 commit
  6. 09 May 2022, 2 commits
  7. 06 May 2022, 1 commit
  8. 30 April 2022, 3 commits
  9. 26 April 2022, 1 commit
  10. 25 April 2022, 6 commits
  11. 11 April 2022, 1 commit
    • io_uring: flag the fact that linked file assignment is sane · c4212f3e
      Committed by Jens Axboe
      Give applications a way to tell if the kernel supports sane linked files,
      as in files being assigned at the right time to be able to reliably
      do <open file direct into slot X><read file from slot X> while using
      IOSQE_IO_LINK to order them.
      
      Not really a bug fix, but flag it as such so that it gets pulled in with
      backports of the deferred file assignment.
      
      Fixes: 6bf9c47a ("io_uring: defer file assignment")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
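
      A sketch of probing for this at startup, assuming the feature bit is the
      IORING_FEAT_LINKED_FILE flag in the uapi header:

      #include <liburing.h>

      static int supports_linked_file(void)
      {
              struct io_uring_params p = { 0 };
              struct io_uring ring;

              if (io_uring_queue_init_params(8, &ring, &p) < 0)
                      return 0;
              io_uring_queue_exit(&ring);
              return !!(p.features & IORING_FEAT_LINKED_FILE);
      }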
  12. 24 March 2022, 1 commit
    • io_uring: remove IORING_CQE_F_MSG · 7ef66d18
      Committed by Jens Axboe
      This was introduced with the message ring opcode, but isn't strictly
      required for the request itself. The sender can encode what is needed
      in user_data, which is passed to the receiver. It's unclear if having
      a separate flag that essentially says "This CQE did not originate from
      an SQE on this ring" provides any real utility to applications. While
      we can always re-introduce a flag to provide this information, we cannot
      take it away at a later point in time.
      
      Remove the flag while we still can, before it's in a released kernel.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  13. 11 March 2022, 2 commits
    • io_uring: allow submissions to continue on error · bcbb7bf6
      Committed by Jens Axboe
      By default, io_uring will stop submitting a batch of requests if we run
      into an error submitting a request. This isn't strictly necessary, as
      the error result is passed out-of-band via a CQE anyway. And it can be
      a bit confusing for some applications.
      
      Provide a way to set up a ring that will continue submitting on error,
      after the error CQE has been posted.
      
      There's still one case that will break out of submission. If we fail
      allocating a request, then we'll still return -ENOMEM. We could in theory
      post a CQE for that condition too even if we never got a request. Leave
      that for a potential followup.
      Reported-by: Dylan Yudaken <dylany@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
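
      A sketch of opting in at setup time, assuming the new flag is exposed as
      IORING_SETUP_SUBMIT_ALL:

      #include <liburing.h>

      /* Ring that keeps submitting the rest of a batch even if one request
       * fails at submit time; the failure still shows up as a CQE. */
      static int init_submit_all_ring(struct io_uring *ring)
      {
              return io_uring_queue_init(64, ring, IORING_SETUP_SUBMIT_ALL);
      }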
    • io_uring: add support for IORING_OP_MSG_RING command · 4f57f06c
      Committed by Jens Axboe
      This adds support for IORING_OP_MSG_RING, which allows an SQE to signal
      another ring. That allows either waking up someone waiting on the ring,
      or even passing a 64-bit value via the user_data field in the CQE.
      
      sqe->fd must contain the fd of a ring that should receive the CQE.
      sqe->off will be propagated to the cqe->user_data on the target ring,
      and sqe->len will be propagated to cqe->res. The resulting CQE will have
      IORING_CQE_F_MSG set in its flags, to indicate that this CQE was generated
      from a messaging request rather than an SQE issued locally on that ring.
      This effectively allows passing a 64-bit and a 32-bit quantity between
      the two rings.
      
      This request type has the following request specific error cases:
      
      - -EBADFD. Set if the sqe->fd doesn't point to a file descriptor that is
        of the io_uring type.
      - -EOVERFLOW. Set if we were not able to deliver a request to the target
        ring.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
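
      A sketch of signalling another ring, assuming liburing's
      io_uring_prep_msg_ring() helper; the len and data values are placeholders:

      #include <liburing.h>

      static int signal_other_ring(struct io_uring *ring, int target_ring_fd)
      {
              struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

              /* len -> cqe->res and data -> cqe->user_data on the target ring */
              io_uring_prep_msg_ring(sqe, target_ring_fd, 0x1234, 0xcafef00d, 0);
              return io_uring_submit(ring);
      }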
  14. 10 March 2022, 1 commit
    • io_uring: add support for registering ring file descriptors · e7a6c00d
      Committed by Jens Axboe
      Lots of workloads use multiple threads, in which case the file table is
      shared between them. This makes getting and putting the ring file
      descriptor for each io_uring_enter(2) system call more expensive, as it
      involves an atomic get and put for each call.
      
      Similarly to how we allow registering normal file descriptors to avoid
      this overhead, add support for an io_uring_register(2) API that allows
      registering the ring fds themselves:
      
      1) IORING_REGISTER_RING_FDS - takes an array of io_uring_rsrc_update
         structs, and registers them with the task.
      2) IORING_UNREGISTER_RING_FDS - takes an array of io_uring_rsrc_update
         structs, and unregisters them.
      
      When a ring fd is registered, it is internally represented by an offset.
      This offset is returned to the application, and the application then
      uses this offset and sets IORING_ENTER_REGISTERED_RING for the
      io_uring_enter(2) system call. This works just like using a registered
      file descriptor, rather than a real one, in an SQE, where
      IOSQE_FIXED_FILE gets set to tell io_uring that we're using an internal
      offset/descriptor rather than a real file descriptor.
      
      In initial testing, this provides a nice bump in performance for
      threaded applications in real world cases where the batch count (eg
      number of requests submitted per io_uring_enter(2) invocation) is low.
      In a microbenchmark, submitting NOP requests, we see the following
      increases in performance:
      
      Requests per syscall	Baseline	Registered	Increase
      ----------------------------------------------------------------
      1			 ~7030K		 ~8080K		+15%
      2			~13120K		~14800K		+13%
      4			~22740K		~25300K		+11%
      Co-developed-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
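
      A sketch of per-thread usage, assuming liburing's io_uring_register_ring_fd()
      wrapper, which registers the fd and makes subsequent liburing calls pass
      IORING_ENTER_REGISTERED_RING automatically:

      #include <liburing.h>

      static void use_registered_ring(struct io_uring *ring)
      {
              /* Returns 1 on success; on older kernels the plain fd path keeps working. */
              if (io_uring_register_ring_fd(ring) != 1)
                      return;
              /* io_uring_submit()/io_uring_wait_cqe() now use the registered
               * offset instead of fdget/fdput on the real ring fd. */
      }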
  15. 25 November 2021, 1 commit
    • io_uring: add option to skip CQE posting · 04c76b41
      Committed by Pavel Begunkov
      Emitting a CQE is expensive from the kernel's perspective. Often, it's
      also not convenient for userspace, which spends cycles on processing it,
      and it just complicates the logic. A similar problem exists for linked
      requests, where we post a CQE for each request in the link.
      
      Introduce a new flag, IOSQE_CQE_SKIP_SUCCESS, to help with this.
      When set and a request completes successfully, it won't generate a CQE.
      When it fails, it produces a CQE, but all following linked requests will
      be CQE-less, regardless of whether they have IOSQE_CQE_SKIP_SUCCESS set.
      The notion of "fail" is the same as for link failing-cancellation, where
      it's opcode dependent, and _usually_ result >= 0 is a success, but not
      always.
      
      Linked timeouts are a bit special. When the request it's linked to was
      never attempted, e.g. because of failed linked requests, it follows
      the description above. Otherwise, whether a linked timeout will post a
      completion or not solely depends on IOSQE_CQE_SKIP_SUCCESS of that
      linked timeout request. Linked timeouts never "fail" during execution, so
      for them it's unconditional. It's expected that users don't really care
      about the result of it but rely solely on the result of the master
      request. Another reason for such treatment is that it's racy, and the
      timeout callback may be running while the master request posts its
      completion.
      
      use case 1:
      If one doesn't care about the results of some requests, e.g. normal
      timeouts, just set IOSQE_CQE_SKIP_SUCCESS. An error result will still be
      posted and needs to be handled.
      
      use case 2:
      Set IOSQE_CQE_SKIP_SUCCESS for all requests of a link but the last,
      and it'll post a completion only for the last one if everything goes
      right; otherwise there will be only one CQE, for the first failed
      request.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/0220fbe06f7cf99e6fc71b4297bb1cb6c0e89c2c.1636559119.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
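
      A sketch of use case 2, a two-request link where only the tail posts a
      CQE on success; fd, buf and len are placeholders:

      #include <liburing.h>

      static void write_then_fsync_quiet(struct io_uring *ring, int fd,
                                         const void *buf, unsigned len)
      {
              struct io_uring_sqe *sqe;

              sqe = io_uring_get_sqe(ring);
              io_uring_prep_write(sqe, fd, buf, len, 0);
              /* silent on success; an error still posts a CQE and breaks the link */
              sqe->flags |= IOSQE_IO_LINK | IOSQE_CQE_SKIP_SUCCESS;

              sqe = io_uring_get_sqe(ring);
              io_uring_prep_fsync(sqe, fd, 0);  /* only this one completes normally */

              io_uring_submit(ring);
      }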
  16. 19 October 2021, 1 commit
  17. 14 September 2021, 1 commit
  18. 30 August 2021, 1 commit
  19. 29 August 2021, 2 commits
    • io_uring: support CLOCK_BOOTTIME/REALTIME for timeouts · 50c1df2b
      Committed by Jens Axboe
      Certain use cases want to use CLOCK_BOOTTIME or CLOCK_REALTIME rather than
      the default CLOCK_MONOTONIC.
      
      Add IORING_TIMEOUT_BOOTTIME and IORING_TIMEOUT_REALTIME flags that
      allow timeouts and linked timeouts to use the selected clock source.
      
      Only one clock source may be selected, and we -EINVAL the request if more
      than one is given. If neither BOOTTIME nor REALTIME is selected, the
      previous default of MONOTONIC is used.
      
      Link: https://github.com/axboe/liburing/issues/369
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
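
      A sketch of arming a 5-second timeout against CLOCK_REALTIME instead of
      the default clock:

      #include <liburing.h>

      static void arm_realtime_timeout(struct io_uring *ring)
      {
              static struct __kernel_timespec ts = { .tv_sec = 5 };
              struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

              io_uring_prep_timeout(sqe, &ts, 0, IORING_TIMEOUT_REALTIME);
              io_uring_submit(ring);
      }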
    • io-wq: provide a way to limit max number of workers · 2e480058
      Committed by Jens Axboe
      io-wq divides work into two categories:
      
      1) Work that completes in a bounded time, like reading from a regular file
         or a block device. This type of work is limited based on the size of
         the SQ ring.
      
      2) Work that may never complete; we call this unbounded work. The number
         of workers here is limited only by RLIMIT_NPROC.
      
      For various use cases, it's handy to have the kernel limit the maximum
      number of pending workers for both categories. Provide a way to do this
      with a new IORING_REGISTER_IOWQ_MAX_WORKERS operation.
      
      IORING_REGISTER_IOWQ_MAX_WORKERS takes an array of two integers and sets
      the max worker count to what is being passed in for each category. The
      old values are returned into that same array. If 0 is being passed in for
      either category, it simply returns the current value.
      
      The value is capped at RLIMIT_NPROC. This actually isn't that important,
      as it's more of a hint; if we exceed the value, our attempt to fork a new
      worker will simply fail. This happens naturally already if more than one
      node is in the system, as these values are per-node internally for io-wq.
      Reported-by: Johannes Lundberg <johalun0@gmail.com>
      Link: https://github.com/axboe/liburing/issues/420
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
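
      A sketch, assuming liburing's io_uring_register_iowq_max_workers()
      wrapper; the limits of 8 bounded and 4 unbounded workers are arbitrary:

      #include <liburing.h>

      static int limit_iowq_workers(struct io_uring *ring)
      {
              unsigned int vals[2] = { 8, 4 };  /* [0] = bounded, [1] = unbounded */

              /* The previous limits are written back into vals[] by the kernel. */
              return io_uring_register_iowq_max_workers(ring, vals);
      }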
  20. 25 August 2021, 1 commit
  21. 24 August 2021, 3 commits
  22. 01 July 2021, 1 commit
  23. 18 June 2021, 1 commit
    • io_uring: allow user configurable IO thread CPU affinity · fe76421d
      Committed by Jens Axboe
      io-wq defaults to per-node masks for IO workers. This works fine by
      default, but isn't particularly handy for workloads that prefer more
      specific affinities, for either performance or isolation reasons.
      
      This adds IORING_REGISTER_IOWQ_AFF that allows the user to pass in a CPU
      mask that is then applied to IO thread workers, and an
      IORING_UNREGISTER_IOWQ_AFF that simply resets the masks back to the
      default of per-node.
      
      Note that no care is given to existing IO threads; they will need to go
      through a reschedule before the new affinity takes effect if they are
      already running or sleeping.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
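
      A sketch of pinning the IO workers of this ring to CPUs 0-1, assuming
      liburing's io_uring_register_iowq_aff() wrapper:

      #define _GNU_SOURCE
      #include <sched.h>
      #include <liburing.h>

      static int pin_iowq_workers(struct io_uring *ring)
      {
              cpu_set_t mask;

              CPU_ZERO(&mask);
              CPU_SET(0, &mask);
              CPU_SET(1, &mask);
              return io_uring_register_iowq_aff(ring, sizeof(mask), &mask);
      }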
  24. 11 June 2021, 2 commits
  25. 26 April 2021, 2 commits