1. 12 April 2021, 10 commits
  2. 09 April 2021, 1 commit
  3. 08 April 2021, 1 commit
  4. 03 April 2021, 1 commit
    •
      io_uring: fix !CONFIG_BLOCK compilation failure · e82ad485
      Committed by Jens Axboe
      kernel test robot correctly pinpoints a compilation failure if
      CONFIG_BLOCK isn't set:
      
      fs/io_uring.c: In function '__io_complete_rw':
      >> fs/io_uring.c:2509:48: error: implicit declaration of function 'io_rw_should_reissue'; did you mean 'io_rw_reissue'? [-Werror=implicit-function-declaration]
          2509 |  if ((res == -EAGAIN || res == -EOPNOTSUPP) && io_rw_should_reissue(req)) {
               |                                                ^~~~~~~~~~~~~~~~~~~~
               |                                                io_rw_reissue
          cc1: some warnings being treated as errors
      
      Ensure that we have a stub declaration of io_rw_should_reissue() for
      !CONFIG_BLOCK.
      
      Fixes: 230d50d4 ("io_uring: move reissue into regular IO path")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      e82ad485
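The fix follows the standard kernel pattern: a real definition under CONFIG_BLOCK and a stub otherwise, so callers compile identically in both configurations. A minimal user-space sketch of that pattern (the struct layout and the -EAGAIN stand-in value are simplifications, not the kernel's code):

```c
#include <stdbool.h>

/* Simulated kernel config: comment this out to build the !CONFIG_BLOCK
 * variant; both configurations must compile. */
#define CONFIG_BLOCK 1

struct io_kiocb { int result; };

#ifdef CONFIG_BLOCK
/* Real implementation: decide whether a failed request may be reissued. */
static bool io_rw_should_reissue(struct io_kiocb *req)
{
	return req->result == -11; /* stand-in for -EAGAIN */
}
#else
/* Stub for !CONFIG_BLOCK: never reissue; keeps the caller compiling. */
static bool io_rw_should_reissue(struct io_kiocb *req)
{
	(void)req;
	return false;
}
#endif

/* The caller compiles unchanged in both configurations. */
static int complete_rw(struct io_kiocb *req)
{
	return io_rw_should_reissue(req) ? 1 /* reissue */ : 0 /* complete */;
}
```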
  5. 02 April 2021, 1 commit
    •
      io_uring: move reissue into regular IO path · 230d50d4
      Committed by Jens Axboe
      It's non-obvious how retry is done for block backed files, when it happens
      off the kiocb done path. It also makes it tricky to deal with the iov_iter
      handling.
      
      Just mark the req as needing a reissue, and handle it from the
      submission path instead. This makes it directly obvious that we're not
      re-importing the iovec from userspace past the submit point, and it means
      that we can just reuse our usual -EAGAIN retry path from the read/write
      handling.
      
      At some point in the future, we'll gain the ability to always reliably
      return -EAGAIN through the stack. A previous attempt on the block side
      didn't pan out and got reverted, hence the need to check for this
      information out-of-band right now.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      230d50d4
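The mark-and-reissue idea can be sketched as a flag the completion side sets and only the submission side acts on. REQ_F_REISSUE mirrors the flag the commit introduces; everything else here (the struct, the demo callback, the -EAGAIN stand-in) is illustrative:

```c
#define REQ_F_REISSUE (1u << 0)	/* mirrors the flag the commit adds */

struct req {
	unsigned int flags;
	int result;
	int attempts;
};

/* Completion side: on -EAGAIN, don't retry here; just mark the request. */
static void complete_rw(struct req *r, int res)
{
	if (res == -11 /* stand-in for -EAGAIN */)
		r->flags |= REQ_F_REISSUE;
	else
		r->result = res;
}

/* Submission side: the one place that notices the mark and retries,
 * reusing the normal retry path instead of re-importing the iovec. */
static int submit(struct req *r, int (*issue)(struct req *))
{
	do {
		r->flags &= ~REQ_F_REISSUE;
		r->attempts++;
		complete_rw(r, issue(r));
	} while (r->flags & REQ_F_REISSUE);
	return r->result;
}

/* Demo issue callback: fails with -EAGAIN once, then succeeds. */
static int demo_issue(struct req *r)
{
	return r->attempts < 2 ? -11 : 100;
}

static int demo_run(void)
{
	struct req r = { 0, 0, 0 };
	return submit(&r, demo_issue);
}
```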
  6. 01 April 2021, 3 commits
  7. 31 March 2021, 1 commit
    •
      io_uring: drop sqd lock before handling signals for SQPOLL · 82734c5b
      Committed by Jens Axboe
      Don't call into get_signal() with the sqd mutex held, it'll fail if we're
      freezing the task and we'll get complaints on locks still being held:
      
      ====================================
      WARNING: iou-sqp-8386/8387 still has locks held!
      5.12.0-rc4-syzkaller #0 Not tainted
      ------------------------------------
      1 lock held by iou-sqp-8386/8387:
       #0: ffff88801e1d2470 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread+0x24c/0x13a0 fs/io_uring.c:6731
      
       stack backtrace:
       CPU: 1 PID: 8387 Comm: iou-sqp-8386 Not tainted 5.12.0-rc4-syzkaller #0
       Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
       Call Trace:
        __dump_stack lib/dump_stack.c:79 [inline]
        dump_stack+0x141/0x1d7 lib/dump_stack.c:120
        try_to_freeze include/linux/freezer.h:66 [inline]
        get_signal+0x171a/0x2150 kernel/signal.c:2576
        io_sq_thread+0x8d2/0x13a0 fs/io_uring.c:6748
      
      Fold the get_signal() case in with the parking checks, as we need to drop
      the lock in both cases, and since we need to be checking for parking when
      juggling the lock anyway.
      
      Reported-by: syzbot+796d767eb376810256f5@syzkaller.appspotmail.com
      Fixes: dbe1bdbb ("io_uring: handle signals for IO threads like a normal thread")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      82734c5b
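The shape of the fix, dropping the lock around the call that may block and retaking it afterwards, can be sketched with a trivial lock-depth stand-in (all names and the "freezer complains if locks are held" behaviour are simulated, not the kernel's):

```c
#include <stdbool.h>

static int sqd_lock_depth;	/* stand-in for sqd->lock */

static void sqd_lock(void)   { sqd_lock_depth++; }
static void sqd_unlock(void) { sqd_lock_depth--; }

/* Stand-in for get_signal()/try_to_freeze(): the freezer complains
 * (here: returns false) if any lock is still held when it runs. */
static bool get_signal_checked(void)
{
	return sqd_lock_depth == 0;
}

/* The fix: drop sqd->lock before handling signals and retake it after,
 * folded in with the parking checks that already juggle the lock. */
static bool sq_thread_handle_signal(void)
{
	bool ok;

	sqd_unlock();
	ok = get_signal_checked();
	sqd_lock();
	return ok;
}

static bool demo_signal_path(void)
{
	bool ok;

	sqd_lock();			/* the SQPOLL loop holds the lock */
	ok = sq_thread_handle_signal();	/* dropped around get_signal() */
	sqd_unlock();
	return ok && sqd_lock_depth == 0;
}
```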
  8. 29 March 2021, 2 commits
  9. 28 March 2021, 6 commits
  10. 26 March 2021, 1 commit
  11. 24 March 2021, 1 commit
    •
      io_uring: do ctx sqd ejection in a clear context · a185f1db
      Committed by Pavel Begunkov
      WARNING: CPU: 1 PID: 27907 at fs/io_uring.c:7147 io_sq_thread_park+0xb5/0xd0 fs/io_uring.c:7147
      CPU: 1 PID: 27907 Comm: iou-sqp-27905 Not tainted 5.12.0-rc4-syzkaller #0
      RIP: 0010:io_sq_thread_park+0xb5/0xd0 fs/io_uring.c:7147
      Call Trace:
       io_ring_ctx_wait_and_kill+0x214/0x700 fs/io_uring.c:8619
       io_uring_release+0x3e/0x50 fs/io_uring.c:8646
       __fput+0x288/0x920 fs/file_table.c:280
       task_work_run+0xdd/0x1a0 kernel/task_work.c:140
       io_run_task_work fs/io_uring.c:2238 [inline]
       io_run_task_work fs/io_uring.c:2228 [inline]
       io_uring_try_cancel_requests+0x8ec/0xc60 fs/io_uring.c:8770
       io_uring_cancel_sqpoll+0x1cf/0x290 fs/io_uring.c:8974
       io_sqpoll_cancel_cb+0x87/0xb0 fs/io_uring.c:8907
       io_run_task_work_head+0x58/0xb0 fs/io_uring.c:1961
       io_sq_thread+0x3e2/0x18d0 fs/io_uring.c:6763
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
      
      It may happen that the last ctx ref is dropped in
      io_uring_cancel_sqpoll(), so the fput callback (i.e.
      io_uring_release()) is enqueued through task_work and run by the same
      cancellation. As it's deeply nested we can't do parking or take
      sqd->lock there, because its state is unclear. So avoid ctx ejection
      from the sqd list in io_ring_ctx_wait_and_kill() and do it in a clear
      context in io_ring_exit_work().
      
      Fixes: f6d54255 ("io_uring: halt SQO submission on ctx exit")
      Reported-by: syzbot+e3a3f84f5cecf61f0583@syzkaller.appspotmail.com
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/e90df88b8ff2cabb14a7534601d35d62ab4cb8c7.1616496707.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      a185f1db
  12. 22 March 2021, 4 commits
  13. 21 March 2021, 1 commit
    •
      io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL · 0031275d
      Committed by Stefan Metzmacher
      Without that it's not safe to use them in a linked combination with
      others.
      
      Now combinations like IORING_OP_SENDMSG followed by IORING_OP_SPLICE
      should be possible.
      
      We already handle short reads and writes for the following opcodes:
      
      - IORING_OP_READV
      - IORING_OP_READ_FIXED
      - IORING_OP_READ
      - IORING_OP_WRITEV
      - IORING_OP_WRITE_FIXED
      - IORING_OP_WRITE
      - IORING_OP_SPLICE
      - IORING_OP_TEE
      
      Now we have it for these as well:
      
      - IORING_OP_SENDMSG
      - IORING_OP_SEND
      - IORING_OP_RECVMSG
      - IORING_OP_RECV
      
      For IORING_OP_RECVMSG we also check for the MSG_TRUNC and MSG_CTRUNC
      flags in order to call req_set_fail_links().
      
      There might be applications around that depend on the behavior
      that even short send[msg]()/recv[msg]() returns continue an
      IOSQE_IO_LINK chain.
      
      It's very unlikely that such applications pass in MSG_WAITALL,
      which is only defined in 'man 2 recvmsg', but not in 'man 2 sendmsg'.
      
      It's expected that the low level sock_sendmsg() call just ignores
      MSG_WAITALL, as MSG_ZEROCOPY is also ignored without explicitly set
      SO_ZEROCOPY.
      
      We also expect the caller to know about the implicit truncation to
      MAX_RW_COUNT, which we don't detect.
      
      cc: netdev@vger.kernel.org
      Link: https://lore.kernel.org/r/c4e1a4cc0d905314f4d5dc567e65a7b09621aab3.1615908477.git.metze@samba.org
      Signed-off-by: Stefan Metzmacher <metze@samba.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      0031275d
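The resulting check can be sketched as a predicate: with MSG_WAITALL a short transfer fails the link, and for recvmsg any truncation reported via MSG_TRUNC/MSG_CTRUNC does too. This is a user-space sketch with locally defined flag values (as on Linux in <sys/socket.h>), not the kernel code:

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define MSG_WAITALL 0x100	/* values as in <sys/socket.h> on Linux */
#define MSG_TRUNC   0x20
#define MSG_CTRUNC  0x08

/* Decide whether a send/recv result should fail an IOSQE_IO_LINK chain:
 * with MSG_WAITALL a short transfer is an error, and for recvmsg so is
 * any truncation reported in the returned msg flags. */
static bool short_transfer_fails_link(ssize_t ret, size_t expected,
				      int msg_flags, int returned_flags)
{
	if (ret < 0)
		return true;			/* plain error */
	if ((msg_flags & MSG_WAITALL) && (size_t)ret < expected)
		return true;			/* short with MSG_WAITALL */
	if (returned_flags & (MSG_TRUNC | MSG_CTRUNC))
		return true;			/* recvmsg truncation */
	return false;
}
```

Without MSG_WAITALL a short transfer still continues the chain, preserving the behaviour existing applications may rely on.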
  14. 18 March 2021, 4 commits
  15. 15 March 2021, 3 commits
    •
      io_uring: fix sqpoll cancellation via task_work · b7f5a0bf
      Committed by Pavel Begunkov
      Running sqpoll cancellations via task_work_run() is a bad idea
      because it depends on other task works being run, but those may be
      stuck in the currently running task_work_run() because of how it
      works (splicing the list in batches).
      
      Enqueue and run them through a separate callback head, namely
      struct io_sq_data::park_task_work. As a nice bonus we now precisely
      control where it's run, that's much safer than guessing where it can
      happen as it was before.
      Reported-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b7f5a0bf
    •
      io_uring: add generic callback_head helpers · 9b465711
      Committed by Pavel Begunkov
      We already have helpers to run/add a callback_head, but they take a
      ctx and work with ctx->exit_task_work. Extract generic versions of
      them implemented in terms of struct callback_head; they will be used
      later.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      9b465711
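The extracted helpers boil down to push/run operations on a singly linked callback list. This single-threaded sketch omits the atomic xchg/cmpxchg the kernel versions need; the node layout mirrors struct callback_head, but the helper names are illustrative:

```c
#include <stddef.h>

/* Mirrors struct callback_head: a singly linked node plus a function. */
struct callback_head {
	struct callback_head *next;
	void (*func)(struct callback_head *);
};

/* Generic add: push onto a list head. The ctx-specific helpers in the
 * commit become thin wrappers around helpers shaped like this. */
static void task_work_add_head(struct callback_head **head,
			       struct callback_head *work)
{
	work->next = *head;
	*head = work;
}

/* Generic run: detach the whole list first, then execute each callback,
 * so callbacks may safely re-add work to the (now empty) head. */
static void task_work_run_head(struct callback_head **head)
{
	struct callback_head *node = *head;

	*head = NULL;
	while (node) {
		struct callback_head *next = node->next;

		node->func(node);
		node = next;
	}
}

/* Demo: count how many callbacks a run executes. */
static int run_count;

static void count_cb(struct callback_head *cb)
{
	(void)cb;
	run_count++;
}

static int demo_run_two(void)
{
	struct callback_head *head = NULL;
	struct callback_head a = { NULL, count_cb };
	struct callback_head b = { NULL, count_cb };

	run_count = 0;
	task_work_add_head(&head, &a);
	task_work_add_head(&head, &b);
	task_work_run_head(&head);
	return run_count;
}
```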
    •
      io_uring: fix concurrent parking · 9e138a48
      Committed by Pavel Begunkov
      If the task running io_sq_thread_park() gets rescheduled right after
      set_bit(), another task can park()/unpark() before it gets back to
      mutex_lock(), with SQPOLL locking again and continuing to run without
      ever seeing that first set_bit(SHOULD_PARK), so it won't even try to
      drop the mutex for parking.
      
      It will get parked eventually when SQPOLL drops the lock for reschedule,
      but may be problematic and will get in the way of further fixes.
      
      Account the number of tasks waiting to park with a new atomic
      variable, park_pending, and adjust SHOULD_PARK accordingly. It
      doesn't entirely replace the SHOULD_PARK bit with this atomic var,
      because it's convenient to have it as a bit in the state and it will
      help with optimisations later.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      9e138a48
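The park_pending scheme can be sketched with C11 atomics: parkers count themselves before setting SHOULD_PARK, and unpark only clears the bit when the last pending parker leaves, so a rescheduled parker's set_bit() is never lost. Field and function names mirror the commit; the mutex juggling is elided:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SHOULD_PARK (1u << 0)	/* illustrative state bit */

struct sq_data {
	atomic_uint state;		/* holds SHOULD_PARK among other bits */
	atomic_int  park_pending;	/* tasks waiting to park */
};

/* park(): count ourselves first, then set the bit; the bit then stays
 * observable until the last waiter is gone, even across races. */
static void sq_thread_park(struct sq_data *sqd)
{
	atomic_fetch_add(&sqd->park_pending, 1);
	atomic_fetch_or(&sqd->state, SHOULD_PARK);
	/* ...mutex_lock(&sqd->lock) would follow here... */
}

/* unpark(): only drop SHOULD_PARK when no other parker is still
 * pending, so a waiter rescheduled after its set_bit() isn't lost. */
static void sq_thread_unpark(struct sq_data *sqd)
{
	if (atomic_fetch_sub(&sqd->park_pending, 1) == 1)
		atomic_fetch_and(&sqd->state, ~SHOULD_PARK);
	/* ...mutex_unlock(&sqd->lock) would follow here... */
}

static bool should_park(struct sq_data *sqd)
{
	return atomic_load(&sqd->state) & SHOULD_PARK;
}

/* Demo: with two parkers, the first unpark must not clear the bit. */
static bool demo_two_parkers(void)
{
	struct sq_data sqd = { 0, 0 };
	bool still;

	sq_thread_park(&sqd);
	sq_thread_park(&sqd);		/* second task parks concurrently */
	sq_thread_unpark(&sqd);
	still = should_park(&sqd);	/* one parker pending: bit kept */
	sq_thread_unpark(&sqd);
	return still && !should_park(&sqd);
}
```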