提交 · 960e432dfa5927892a9b170d14de874597b84849 · openeuler / Kernel

12 11月, 2019 6 次提交

io_uring: use correct "is IO worker" helper · 960e432d

由 Jens Axboe 提交于 11月 12, 2019

Since we switched to io-wq, the dependent link optimization for when to
pass back work inline has been broken. Fix this by providing a suitable
io-wq helper for io_uring to use to detect when to do this.

Fixes: 561fb04a ("io_uring: replace workqueue usage with io-wq")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

960e432d

io_uring: fix -ENOENT issue with linked timer with short timeout · 76a46e06

由 Jens Axboe 提交于 11月 10, 2019

If you prep a read (for example) that needs to get punted to async
context with a timer, if the timeout is sufficiently short, the timer
request will get completed with -ENOENT as it could not find the read.

The issue is that we prep and start the timer before we start the read.
Hence the timer can trigger before the read is even started, and the end
result is then that the timer completes with -ENOENT, while the read
starts instead of being cancelled by the timer.

Fix this by splitting the linked timer into two parts:

1) Prep and validate the linked timer
2) Start timer

The read is then started between steps 1 and 2, so we know that the
timer will always have a consistent view of the read request state.
Reported-by: NHrvoje Zeba <zeba.hrvoje@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

76a46e06

io_uring: don't do flush cancel under inflight_lock · 768134d4

由 Jens Axboe 提交于 11月 10, 2019

We can't safely cancel under the inflight lock. If the work hasn't been
started yet, then io_wq_cancel_work() simply marks the work as cancelled
and invokes the work handler. But if the work completion needs to grab
the inflight lock because it's grabbing user files, then we'll deadlock
trying to finish the work as we already hold that lock.

Instead grab a reference to the request, if it isn't already zero. If
it's zero, then we know it's going through completion anyway, and we
can safely ignore it. If it's not zero, then we can drop the lock and
attempt to cancel from there.

This also fixes a missing finish_wait() at the end of
io_uring_cancel_files().
Signed-off-by: NJens Axboe <axboe@kernel.dk>

768134d4

io_uring: flag SQPOLL busy condition to userspace · c1edbf5f

由 Jens Axboe 提交于 11月 10, 2019

Now that we have backpressure, for SQPOLL, we have one more condition
that warrants flagging that the application needs to enter the kernel:
we failed to submit IO due to backpressure. Make sure we catch that
and flag it appropriately.

If we run into backpressure issues with the SQPOLL thread, flag it
as such to the application by setting IORING_SQ_NEED_WAKEUP. This will
cause the application to enter the kernel, and that will flush the
backlog and clear the condition.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c1edbf5f

io_uring: make ASYNC_CANCEL work with poll and timeout · 47f46768

由 Jens Axboe 提交于 11月 09, 2019

It's a little confusing that we have multiple types of command
cancellation opcodes now that we have a generic one. Make the generic
one work with POLL_ADD and TIMEOUT commands as well, that makes for an
easier to use API for the application. The fact that they currently
don't is a bit confusing.

Add a helper that takes care of it, so we can user it from both
IORING_OP_ASYNC_CANCEL and from the linked timeout cancellation.
Reported-by: NHrvoje Zeba <zeba.hrvoje@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

47f46768

io_uring: provide fallback request for OOM situations · 0ddf92e8

由 Jens Axboe 提交于 11月 08, 2019

One thing that really sucks for userspace APIs is if the kernel passes
back -ENOMEM/-EAGAIN for resource shortages. The application really has
no idea of what to do in those cases. Should it try and reap
completions? Probably a good idea. Will it solve the issue? Who knows.

This patch adds a simple fallback mechanism if we fail to allocate
memory for a request. If we fail allocating memory from the slab for a
request, we punt to a pre-allocated request. There's just one of these
per io_ring_ctx, but the important part is if we ever return -EBUSY to
the application, the applications knows that it can wait for events and
make forward progress when events have completed. This is the important
part.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

0ddf92e8

11 11月, 2019 6 次提交

io_uring: convert accept4() -ERESTARTSYS into -EINTR · 8e3cca12

由 Jens Axboe 提交于 11月 09, 2019

If we cancel a pending accept operating with a signal, we get
-ERESTARTSYS returned. Turn that into -EINTR for userspace, we should
not be return -ERESTARTSYS.

Fixes: 17f2fe35 ("io_uring: add support for IORING_OP_ACCEPT")
Reported-by: NHrvoje Zeba <zeba.hrvoje@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

8e3cca12

io_uring: fix error clear of ->file_table in io_sqe_files_register() · 46568e9b

由 Jens Axboe 提交于 11月 10, 2019

syzbot reports that when using failslab and friends, we can get a double
free in io_sqe_files_unregister():

BUG: KASAN: double-free or invalid-free in
io_sqe_files_unregister+0x20b/0x300 fs/io_uring.c:3185

CPU: 1 PID: 8819 Comm: syz-executor452 Not tainted 5.4.0-rc6-next-20191108
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x197/0x210 lib/dump_stack.c:118
  print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374
  kasan_report_invalid_free+0x65/0xa0 mm/kasan/report.c:468
  __kasan_slab_free+0x13a/0x150 mm/kasan/common.c:450
  kasan_slab_free+0xe/0x10 mm/kasan/common.c:480
  __cache_free mm/slab.c:3426 [inline]
  kfree+0x10a/0x2c0 mm/slab.c:3757
  io_sqe_files_unregister+0x20b/0x300 fs/io_uring.c:3185
  io_ring_ctx_free fs/io_uring.c:3998 [inline]
  io_ring_ctx_wait_and_kill+0x348/0x700 fs/io_uring.c:4060
  io_uring_release+0x42/0x50 fs/io_uring.c:4068
  __fput+0x2ff/0x890 fs/file_table.c:280
  ____fput+0x16/0x20 fs/file_table.c:313
  task_work_run+0x145/0x1c0 kernel/task_work.c:113
  exit_task_work include/linux/task_work.h:22 [inline]
  do_exit+0x904/0x2e60 kernel/exit.c:817
  do_group_exit+0x135/0x360 kernel/exit.c:921
  __do_sys_exit_group kernel/exit.c:932 [inline]
  __se_sys_exit_group kernel/exit.c:930 [inline]
  __x64_sys_exit_group+0x44/0x50 kernel/exit.c:930
  do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x43f2c8
Code: 31 b8 c5 f7 ff ff 48 8b 5c 24 28 48 8b 6c 24 30 4c 8b 64 24 38 4c 8b
6c 24 40 4c 8b 74 24 48 4c 8b 7c 24 50 48 83 c4 58 c3 66 <0f> 1f 84 00 00
00 00 00 48 8d 35 59 ca 00 00 0f b6 d2 48 89 fb 48
RSP: 002b:00007ffd5b976008 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000043f2c8
RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
RBP: 00000000004bf0a8 R08: 00000000000000e7 R09: ffffffffffffffd0
R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000001
R13: 00000000006d1180 R14: 0000000000000000 R15: 0000000000000000

This happens if we fail allocating the file tables. For that case we do
free the file table correctly, but we forget to set it to NULL. This
means that ring teardown will see it as being non-NULL, and attempt to
free it again.

Fix this by clearing the file_table pointer if we free the table.

Reported-by: syzbot+3254bc44113ae1e331ee@syzkaller.appspotmail.com
Fixes: 65e19f54 ("io_uring: support for larger fixed file sets")
Reviewed-by: NBob Liu <bob.liu@oracle.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

46568e9b

io_uring: separate the io_free_req and io_free_req_find_next interface · c69f8dbe

由 Jackie Liu 提交于 11月 09, 2019

Similar to the distinction between io_put_req and io_put_req_find_next,
io_free_req has been modified similarly, with no functional changes.
Signed-off-by: NJackie Liu <liuyun01@kylinos.cn>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c69f8dbe

io_uring: keep io_put_req only responsible for release and put req · ec9c02ad

由 Jackie Liu 提交于 11月 08, 2019

We already have io_put_req_find_next to find the next req of the link.
we should not use the io_put_req function to find them. They should be
functions of the same level.
Signed-off-by: NJackie Liu <liuyun01@kylinos.cn>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ec9c02ad

io_uring: remove passed in 'ctx' function parameter ctx if possible · a197f664

由 Jackie Liu 提交于 11月 08, 2019

Many times, the core of the function is req, and req has already set
req->ctx at initialization time, so there is no need to pass in the
ctx from the caller.

Cleanup, no functional change.
Signed-off-by: NJackie Liu <liuyun01@kylinos.cn>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

a197f664

io_uring: reduce/pack size of io_ring_ctx · 206aefde

由 Jens Axboe 提交于 11月 07, 2019

With the recent flurry of additions and changes to io_uring, the
layout of io_ring_ctx has become a bit stale. We're right now at
704 bytes in size on my x86-64 build, or 11 cachelines. This
patch does two things:

- We have to completion structs embedded, that we only use for
  quiesce of the ctx (or shutdown) and for sqthread init cases.
  That 2x32 bytes right there, let's dynamically allocate them.

- Reorder the struct a bit with an eye on cachelines, use cases,
  and holes.

With this patch, we're down to 512 bytes, or 8 cachelines.
Reviewed-by: NJackie Liu <liuyun01@kylinos.cn>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

206aefde

08 11月, 2019 2 次提交

io_uring: properly mark async work as bounded vs unbounded · 5f8fd2d3

由 Jens Axboe 提交于 11月 07, 2019

Now that io-wq supports separating the two request lifetime types, mark
the following IO as having unbounded runtimes:

- Any read/write to a non-regular file
- Any specific networked IO
- Any poll command
Signed-off-by: NJens Axboe <axboe@kernel.dk>

5f8fd2d3

io-wq: add support for bounded vs unbunded work · c5def4ab

由 Jens Axboe 提交于 11月 07, 2019

io_uring supports request types that basically have two different
lifetimes:

1) Bounded completion time. These are requests like disk reads or writes,
   which we know will finish in a finite amount of time.
2) Unbounded completion time. These are generally networked IO, where we
   have no idea how long they will take to complete. Another example is
   POLL commands.

This patch provides support for io-wq to handle these differently, so we
don't starve bounded requests by tying up workers for too long. By default
all work is bounded, unless otherwise specified in the work item.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c5def4ab

10 11月, 2019 1 次提交

io_uring: add support for backlogged CQ ring · 1d7bb1d5

由 Jens Axboe 提交于 11月 06, 2019

Currently we drop completion events, if the CQ ring is full. That's fine
for requests with bounded completion times, but it may make it harder or
impossible to use io_uring with networked IO where request completion
times are generally unbounded. Or with POLL, for example, which is also
unbounded.

After this patch, we never overflow the ring, we simply store requests
in a backlog for later flushing. This flushing is done automatically by
the kernel. To prevent the backlog from growing indefinitely, if the
backlog is non-empty, we apply back pressure on IO submissions. Any
attempt to submit new IO with a non-empty backlog will get an -EBUSY
return from the kernel. This is a signal to the application that it has
backlogged CQ events, and that it must reap those before being allowed
to submit more IO.

Note that if we do return -EBUSY, we will have filled whatever
backlogged events into the CQ ring first, if there's room. This means
the application can safely reap events WITHOUT entering the kernel and
waiting for them, they are already available in the CQ ring.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

1d7bb1d5

08 11月, 2019 4 次提交

io_uring: pass in io_kiocb to fill/add CQ handlers · 78e19bbe

由 Jens Axboe 提交于 11月 06, 2019

This is in preparation for handling CQ ring overflow a bit smarter. We
should not have any functional changes in this patch. Most of the
changes are fairly straight forward, the only ones that stick out a bit
are the ones that change __io_free_req() to take the reference count
into account. If the request hasn't been submitted yet, we know it's
safe to simply ignore references and free it. But let's clean these up
too, as later patches will depend on the caller doing the right thing if
the completion logging grabs a reference to the request.
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

78e19bbe

io_uring: make io_cqring_events() take 'ctx' as argument · 84f97dc2

由 Jens Axboe 提交于 11月 06, 2019

The rings can be derived from the ctx, and we need the ctx there for
a future change.

No functional changes in this patch.
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

84f97dc2

io_uring: add support for linked SQE timeouts · 2665abfd

由 Jens Axboe 提交于 11月 05, 2019

While we have support for generic timeouts, we don't have a way to tie
a timeout to a specific SQE. The generic timeouts simply trigger wakeups
on the CQ ring.

This adds support for IORING_OP_LINK_TIMEOUT. This command is only valid
as a link to a previous command. The timeout specific can be either
relative or absolute, following the same rules as IORING_OP_TIMEOUT. If
the timeout triggers before the dependent command completes, it will
attempt to cancel that command. Likewise, if the dependent command
completes before the timeout triggers, it will cancel the timeout.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

2665abfd

io_uring: abstract out io_async_cancel_one() helper · e977d6d3

由 Jens Axboe 提交于 11月 05, 2019

We're going to need this helper in a future patch, so move it out
of io_async_cancel() and into its own separate function.

No functional changes in this patch.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e977d6d3

07 11月, 2019 5 次提交

io_uring: use inlined struct sqe_submit · 267bc904

由 Pavel Begunkov 提交于 11月 07, 2019

req->submit is always up-to-date, use it directly
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

267bc904

io_uring: Use submit info inlined into req · 50585b9a

由 Pavel Begunkov 提交于 11月 07, 2019

Stack allocated struct sqe_submit is passed down to the submission path
along with a request (a.k.a. struct io_kiocb), and will be copied into
req->submit for async requests.

As space for it is already allocated, fill req->submit in the first
place instead of using on-stack one. As a result:

1. sqe->submit is the only place for sqe_submit and is always valid,
so we don't need to track which one to use.
2. don't need to copy in case of async
3. allows to simplify the code by not carrying it as an argument all
the way down
4. allows to reduce number of function arguments / potentially improve
spilling

The downside is that stack is most probably be cached, that's not true
for just allocated memory for a request. Another concern is cache
pollution. Though, a request would be touched and fetched along with
req->submit at some point anyway, so shouldn't be a problem.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

50585b9a

io_uring: allocate io_kiocb upfront · 196be95c

由 Pavel Begunkov 提交于 11月 07, 2019

Let io_submit_sqes() to allocate io_kiocb before fetching an sqe.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

196be95c

io_uring: io_queue_link*() right after submit · e5eb6366

由 Pavel Begunkov 提交于 11月 06, 2019

After a call to io_submit_sqe(), it's already known whether it needs
to queue a link or not. Do it there, as it's simplier and doesn't keep
an extra variable across the loop.

Reviewed-by：Bob Liu <bob.liu@oracle.com>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e5eb6366

io_uring: Merge io_submit_sqes and io_ring_submit · ae9428ca

由 Pavel Begunkov 提交于 11月 06, 2019

io_submit_sqes() and io_ring_submit() are doing the same stuff with
a little difference. Deduplicate them.

Reviewed-by：Bob Liu <bob.liu@oracle.com>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ae9428ca

06 11月, 2019 3 次提交

io_uring: kill dead REQ_F_LINK_DONE flag · 3aa5fa03

由 Jens Axboe 提交于 11月 05, 2019

We had no more use for this flag after the conversion to io-wq, kill it
off.

Fixes: 561fb04a ("io_uring: replace workqueue usage with io-wq")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

3aa5fa03

io_uring: fixup a few spots where link failure isn't flagged · f1f40853

由 Jens Axboe 提交于 11月 05, 2019

If a request fails, we need to ensure we set REQ_F_FAIL_LINK on it if
REQ_F_LINK is set. Any failure in the chain should break the chain.

We were missing a few spots where this should be done. It might be nice
to generalize this somewhat at some point, as long as we factor in the
fact that failure looks different for each request type.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f1f40853

io_uring: enable optimized link handling for IORING_OP_POLL_ADD · 89723d0b

由 Jens Axboe 提交于 11月 05, 2019

As introduced by commit:

ba816ad6 ("io_uring: run dependent links inline if possible")

enable inline dependent link running for poll commands.
io_poll_complete_work() is the most important change, as it allows a
linked sequence of { POLL, READ } (for example) to proceed inline
instead of needing to get punted to another async context. The
submission side only potentially matters for sqthread, but may as well
include that bit.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

89723d0b

04 11月, 2019 1 次提交

io_uring: add completion trace event · 51c3ff62

由 Jens Axboe 提交于 11月 03, 2019

We currently don't have a completion event trace, add one of those. And
to better be able to match up submissions and completions, add user_data
to the submission trace as well.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

51c3ff62

01 11月, 2019 2 次提交

io_uring: set -EINTR directly when a signal wakes up in io_cqring_wait · e9ffa5c2

由 Jackie Liu 提交于 10月 29, 2019

We didn't use -ERESTARTSYS to tell the application layer to restart the
system call, but instead return -EINTR. we can set -EINTR directly when
wakeup by the signal, which can help us save an assignment operation and
comparison operation.
Reviewed-by: NBob Liu <bob.liu@oracle.com>
Signed-off-by: NJackie Liu <liuyun01@kylinos.cn>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e9ffa5c2

io_uring: support for generic async request cancel · 62755e35

由 Jens Axboe 提交于 10月 28, 2019

This adds support for IORING_OP_ASYNC_CANCEL, which will attempt to
cancel requests that have been punted to async context and are now
in-flight. This works for regular read/write requests to files, as
long as they haven't been started yet. For socket based IO (or things
like accept4(2)), we can cancel work that is already running as well.

To cancel a request, the sqe must have ->addr set to the user_data of
the request it wishes to cancel. If the request is cancelled
successfully, the original request is completed with -ECANCELED
and the cancel request is completed with a result of 0. If the
request was already running, the original may or may not complete
in error. The cancel request will complete with -EALREADY for that
case. And finally, if the request to cancel wasn't found, the cancel
request is completed with -ENOENT.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

62755e35

30 10月, 2019 10 次提交

io_uring: io_wq_create() returns an error pointer, not NULL · 975c99a5

由 Jens Axboe 提交于 10月 30, 2019

syzbot reported an issue where we crash at setup time if failslab is
used. The issue is that io_wq_create() returns an error pointer on
failure, not NULL. Hence io_uring thought the io-wq was setup just
fine, but in reality it's a garbage error pointer.

Use IS_ERR() instead of a NULL check, and assign ret appropriately.

Reported-by: syzbot+221cc24572a2fed23b6b@syzkaller.appspotmail.com
Fixes: 561fb04a ("io_uring: replace workqueue usage with io-wq")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

975c99a5

io_uring: fix race with canceling timeouts · 842f9612

由 Jens Axboe 提交于 10月 29, 2019

If we get -1 from hrtimer_try_to_cancel(), we know that the timer
is running. Hence leave all completion to the timeout handler. If
we don't, we can corrupt the list and miss a completion.

Fixes: 11365043 ("io_uring: add support for canceling timeout requests")
Reported-by: NHrvoje Zeba <zeba.hrvoje@gmail.com>
Tested-by: NHrvoje Zeba <zeba.hrvoje@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

842f9612

io_uring: support for larger fixed file sets · 65e19f54

由 Jens Axboe 提交于 10月 26, 2019

There's been a few requests for supporting more fixed files than 1024.
This isn't really tricky to do, we just need to split up the file table
into multiple tables and index appropriately. As we do so, reduce the
max single file table to 512. This enables us to do single page allocs
always for the tables, which is an improvement over the situation prior.

This patch adds support for up to 64K files, which should be enough for
everyone.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

65e19f54

io_uring: protect fixed file indexing with array_index_nospec() · b7620121

由 Jens Axboe 提交于 10月 26, 2019

We index the file tables with a user given value. After we check
it's within our limits, use array_index_nospec() to prevent any
spectre attacks here.
Suggested-by: NJann Horn <jannh@google.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

b7620121

io_uring: add support for IORING_OP_ACCEPT · 17f2fe35

由 Jens Axboe 提交于 10月 17, 2019

This allows an application to call accept4() in an async fashion. Like
other opcodes, we first try a non-blocking accept, then punt to async
context if we have to.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

17f2fe35

io_uring: io_uring: add support for async work inheriting files · fcb323cc

由 Jens Axboe 提交于 10月 24, 2019

This is in preparation for adding opcodes that need to add new files
in a process file table, system calls like open(2) or accept4(2).

If an opcode needs this, it must set IO_WQ_WORK_NEEDS_FILES in the work
item. If work that needs to get punted to async context have this
set, the async worker will assume the original task file table before
executing the work.

Note that opcodes that need access to the current files of an
application cannot be done through IORING_SETUP_SQPOLL.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

fcb323cc

io_uring: replace workqueue usage with io-wq · 561fb04a

由 Jens Axboe 提交于 10月 24, 2019

Drop various work-arounds we have for workqueues:

- We no longer need the async_list for tracking sequential IO.

- We don't have to maintain our own mm tracking/setting.

- We don't need a separate workqueue for buffered writes. This didn't
  even work that well to begin with, as it was suboptimal for multiple
  buffered writers on multiple files.

- We can properly cancel pending interruptible work. This fixes
  deadlocks with particularly socket IO, where we cannot cancel them
  when the io_uring is closed. Hence the ring will wait forever for
  these requests to complete, which may never happen. This is different
  from disk IO where we know requests will complete in a finite amount
  of time.

- Due to being able to cancel work interruptible work that is already
  running, we can implement file table support for work. We need that
  for supporting system calls that add to a process file table.

- It gets us one step closer to adding async support for any system
  call.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

561fb04a

io_uring: Fix mm_fault with READ/WRITE_FIXED · 95a1b3ff

由 Pavel Begunkov 提交于 10月 27, 2019

Commit fb5ccc98 ("io_uring: Fix broken links with offloading")
introduced a potential performance regression with unconditionally
taking mm even for READ/WRITE_FIXED operations.

Return the logic handling it back. mm-faulted requests will go through
the generic submission path, so honoring links and drains, but will
fail further on req->has_user check.

Fixes: fb5ccc98 ("io_uring: Fix broken links with offloading")
Cc: stable@vger.kernel.org # v5.4
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

95a1b3ff

io_uring: remove index from sqe_submit · fa456228

由 Pavel Begunkov 提交于 10月 27, 2019

submit->index is used only for inbound check in submission path (i.e.
head < ctx->sq_entries). However, it always will be true, as
1. it's already validated by io_get_sqring()
2. ctx->sq_entries can't be changedd in between, because of held
ctx->uring_lock and ctx->refs.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

fa456228

io_uring: add set of tracing events · c826bd7a

由 Dmitrii Dolgov 提交于 10月 15, 2019

To trace io_uring activity one can get an information from workqueue and
io trace events, but looks like some parts could be hard to identify via
this approach. Making what happens inside io_uring more transparent is
important to be able to reason about many aspects of it, hence introduce
the set of tracing events.

All such events could be roughly divided into two categories:

* those, that are helping to understand correctness (from both kernel
  and an application point of view). E.g. a ring creation, file
  registration, or waiting for available CQE. Proposed approach is to
  get a pointer to an original structure of interest (ring context, or
  request), and then find relevant events. io_uring_queue_async_work
  also exposes a pointer to work_struct, to be able to track down
  corresponding workqueue events.

* those, that provide performance related information. Mostly it's about
  events that change the flow of requests, e.g. whether an async work
  was queued, or delayed due to some dependencies. Another important
  case is how io_uring optimizations (e.g. registered files) are
  utilized.
Signed-off-by: NDmitrii Dolgov <9erthalion6@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c826bd7a

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功