1. 20 Mar 2020, 2 commits
  2. 15 Mar 2020, 1 commit
    • io_uring: NULL-deref for IOSQE_{ASYNC,DRAIN} · f1d96a8f
      Committed by Pavel Begunkov
      Processing links, io_submit_sqe() prepares requests, drops sqes, and
      passes them with sqe=NULL to io_queue_sqe(). There, IOSQE_DRAIN and/or
      IOSQE_ASYNC requests will go through the same prep, which doesn't expect
      sqe=NULL and fails with a NULL pointer dereference.
      
      Always do the full prepare, including io_alloc_async_ctx(), for linked
      requests; the second preparation can then be skipped.
      
      Cc: stable@vger.kernel.org # 5.5
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 09 Mar 2020, 1 commit
    • io_uring: ensure RCU callback ordering with rcu_barrier() · 805b13ad
      Committed by Jens Axboe
      After more careful studying, Paul informs me that we cannot rely on
      ordering of RCU callbacks in the way that the tagged commit did.
      The current construct looks like this:
      
      	void C(struct rcu_head *rhp)
      	{
      		do_something(rhp);
      		call_rcu(&p->rh, B);
      	}
      
      	call_rcu(&p->rh, A);
      	call_rcu(&p->rh, C);
      
      and we're relying on ordering between A and B, which isn't guaranteed.
      Make this explicit instead, and have a work item issue the rcu_barrier()
      to ensure that A has run before we manually execute B.
      
      While thorough testing never showed this issue, whether it triggers
      depends on the per-cpu RCU callback load. The updated method simplifies
      the code as well, and eliminates the need to maintain an rcu_head in
      the fileset data.
      
      Fixes: c1e2148f ("io_uring: free fixed_file_data after RCU grace period")
      Reported-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
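      A minimal sketch of the construct described above (illustrative only, not
      the io_uring code itself; names such as two_stage and stage_a are
      invented): a work item uses rcu_barrier() to guarantee that a previously
      queued call_rcu() callback has finished before the second stage runs.

      	struct two_stage {
      		struct rcu_head rh;
      		struct work_struct work;
      	};

      	static void stage_a(struct rcu_head *rhp)
      	{
      		/* runs after a grace period; must complete before stage B */
      	}

      	static void stage_b_work(struct work_struct *work)
      	{
      		struct two_stage *p = container_of(work, struct two_stage, work);

      		/* rcu_barrier() waits for all previously queued call_rcu()
      		 * callbacks, so stage_a() has definitely run by now */
      		rcu_barrier();
      		kfree(p);	/* stage B: now safe to free */
      	}

      	static void teardown(struct two_stage *p)
      	{
      		call_rcu(&p->rh, stage_a);
      		/* rcu_barrier() may sleep, so run stage B from a work item
      		 * rather than from another RCU callback */
      		INIT_WORK(&p->work, stage_b_work);
      		schedule_work(&p->work);
      	}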
  4. 07 Mar 2020, 2 commits
    • io_uring: fix lockup with timeouts · f0e20b89
      Committed by Pavel Begunkov
      There is a recipe to deadlock the kernel: submit a timeout sqe with a
      linked_timeout (e.g.  test_single_link_timeout_ception() from liburing),
      and SIGKILL the process.
      
      Then io_kill_timeouts() takes @ctx->completion_lock, but the timeout
      isn't flagged with REQ_F_COMP_LOCKED, so it will try to grab the lock a
      second time during io_put_free() when cancelling the linked timeout.
      The same can probably happen at the other io_kill_timeout() call site,
      io_commit_cqring().
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: free fixed_file_data after RCU grace period · c1e2148f
      Committed by Jens Axboe
      The percpu refcount protects this structure, and we can have an atomic
      switch in progress when exiting. This makes it unsafe to just free the
      struct normally, and can trigger the following KASAN warning:
      
      BUG: KASAN: use-after-free in percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
      Read of size 1 at addr ffff888181a19a30 by task swapper/0/0
      
      CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.6.0-rc4+ #5747
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      Call Trace:
       <IRQ>
       dump_stack+0x76/0xa0
       print_address_description.constprop.0+0x3b/0x60
       ? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
       ? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
       __kasan_report.cold+0x1a/0x3d
       ? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
       percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
       rcu_core+0x370/0x830
       ? percpu_ref_exit+0x50/0x50
       ? rcu_note_context_switch+0x7b0/0x7b0
       ? run_rebalance_domains+0x11d/0x140
       __do_softirq+0x10a/0x3e9
       irq_exit+0xd5/0xe0
       smp_apic_timer_interrupt+0x86/0x200
       apic_timer_interrupt+0xf/0x20
       </IRQ>
      RIP: 0010:default_idle+0x26/0x1f0
      
      Fix this by punting the final exit and free of the struct to RCU; then
      we know that it's safe to do so. Jann suggested the approach of using a
      double rcu callback to achieve this. It's important that we do a nested
      call_rcu() callback, as otherwise the free could be ordered before the
      atomic switch, even if the latter was already queued.
      
      Reported-by: syzbot+e017e49c39ab484ac87a@syzkaller.appspotmail.com
      Suggested-by: Jann Horn <jannh@google.com>
      Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  5. 28 Feb 2020, 1 commit
  6. 27 Feb 2020, 2 commits
    • io_uring: define and set show_fdinfo only if procfs is enabled · bebdb65e
      Committed by Tobias Klauser
      Follow the pattern used with other *_show_fdinfo functions and only
      define and use io_uring_show_fdinfo and its helper functions if
      CONFIG_PROC_FS is set.
      
      Fixes: 87ce955b ("io_uring: add ->show_fdinfo() for the io_uring file descriptor")
      Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
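      The pattern referred to above, in a rough sketch (not the exact hunk from
      the commit): the fdinfo hook is only compiled, and only wired into the
      file_operations, when CONFIG_PROC_FS is set.

      	#ifdef CONFIG_PROC_FS
      	static void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
      	{
      		/* dump SQ/CQ and fixed-file state for /proc/<pid>/fdinfo/<fd> */
      	}
      	#endif

      	static const struct file_operations io_uring_fops = {
      		/* ...other methods... */
      	#ifdef CONFIG_PROC_FS
      		.show_fdinfo	= io_uring_show_fdinfo,
      	#endif
      	};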
    • io_uring: drop file set ref put/get on switch · dd3db2a3
      Committed by Jens Axboe
      Dan reports that he triggered a warning on ring exit doing some testing:
      
      percpu ref (io_file_data_ref_zero) <= 0 (0) after switching to atomic
      WARNING: CPU: 3 PID: 0 at lib/percpu-refcount.c:160 percpu_ref_switch_to_atomic_rcu+0xe8/0xf0
      Modules linked in:
      CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.6.0-rc3+ #5648
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      RIP: 0010:percpu_ref_switch_to_atomic_rcu+0xe8/0xf0
      Code: e7 ff 55 e8 eb d2 80 3d bd 02 d2 00 00 75 8b 48 8b 55 d8 48 c7 c7 e8 70 e6 81 c6 05 a9 02 d2 00 01 48 8b 75 e8 e8 3a d0 c5 ff <0f> 0b e9 69 ff ff ff 90 55 48 89 fd 53 48 89 f3 48 83 ec 28 48 83
      RSP: 0018:ffffc90000110ef8 EFLAGS: 00010292
      RAX: 0000000000000045 RBX: 7fffffffffffffff RCX: 0000000000000000
      RDX: 0000000000000045 RSI: ffffffff825be7a5 RDI: ffffffff825bc32c
      RBP: ffff8881b75eac38 R08: 000000042364b941 R09: 0000000000000045
      R10: ffffffff825beb40 R11: ffffffff825be78a R12: 0000607e46005aa0
      R13: ffff888107dcdd00 R14: 0000000000000000 R15: 0000000000000009
      FS:  0000000000000000(0000) GS:ffff8881b9d80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f49e6a5ea20 CR3: 00000001b747c004 CR4: 00000000001606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <IRQ>
       rcu_core+0x1e4/0x4d0
       __do_softirq+0xdb/0x2f1
       irq_exit+0xa0/0xb0
       smp_apic_timer_interrupt+0x60/0x140
       apic_timer_interrupt+0xf/0x20
       </IRQ>
      RIP: 0010:default_idle+0x23/0x170
      Code: ff eb ab cc cc cc cc 0f 1f 44 00 00 41 54 55 53 65 8b 2d 10 96 92 7e 0f 1f 44 00 00 e9 07 00 00 00 0f 00 2d 21 d0 51 00 fb f4 <65> 8b 2d f6 95 92 7e 0f 1f 44 00 00 5b 5d 41 5c c3 65 8b 05 e5 95
      
      Turns out that this is due to percpu_ref_switch_to_atomic() only
      grabbing a reference to the percpu refcount if it's not already in
      atomic mode. io_uring drops a ref and re-gets it when switching back to
      percpu mode. We attempt to protect against this with the FFD_F_ATOMIC
      bit, but that isn't reliable.
      
      We don't actually need to juggle these refcounts between the atomic and
      percpu switches; we can just do them once we've switched to atomic mode.
      This removes the need for FFD_F_ATOMIC.
      
      Fixes: 05f3fb3c ("io_uring: avoid ring quiesce for fixed file set unregister and update")
      Reported-by: Dan Melnic <dmm@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  7. 26 Feb 2020, 2 commits
    • io_uring: import_single_range() returns 0/-ERROR · 3a901598
      Committed by Jens Axboe
      Unlike the other core import helpers, import_single_range() returns 0 on
      success, not the length imported. This means that links depending on the
      result of the non-vectored IORING_OP_{READ,WRITE}, which were added for
      5.5, get errored when they should not be.
      
      Fixes: 3a6820f2 ("io_uring: add non-vectored read/write commands")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
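      A hedged caller sketch of the convention described above (illustrative,
      not the actual io_uring hunk; 'buf' and 'len' are assumed to come from
      the caller): on success the byte count must be taken from the iov_iter,
      not from the return value.

      	struct iovec iov;
      	struct iov_iter iter;
      	ssize_t ret;

      	ret = import_single_range(READ, buf, len, &iov, &iter);
      	if (ret < 0)
      		return ret;	/* -EFAULT etc. */

      	/* ret is 0 here; the imported length lives in the iterator */
      	return iov_iter_count(&iter);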
    • io_uring: pick up link work on submit reference drop · 2a44f467
      Committed by Jens Axboe
      If work completes inline, then we should pick up a dependent link item
      in __io_queue_sqe() as well. If we don't do so, we're forced to go async
      with that item, which is suboptimal.
      
      This also fixes an issue with io_put_req_find_next(), which always looks
      up the next work item. That should only be done if we're dropping the
      last reference to the request, to prevent multiple lookups of the same
      work item.
      
      Outside of being a fix, this also enables a good cleanup series for 5.7,
      where we never have to pass 'nxt' around or into the work handlers.
      Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  8. 25 Feb 2020, 1 commit
    • io_uring: fix poll_list race for SETUP_IOPOLL|SETUP_SQPOLL · bdcd3eab
      Committed by Xiaoguang Wang
      After making ext4 support the iopoll method (letting ext4_file_operations'
      iopoll method be iomap_dio_iopoll()), we found that fio can easily hang in
      fio_ioring_getevents() with the fio job below:
          rm -f testfile; sync;
          sudo fio -name=fiotest -filename=testfile -iodepth=128 -thread
      -rw=write -ioengine=io_uring  -hipri=1 -sqthread_poll=1 -direct=1
      -bs=4k -size=10G -numjobs=8 -runtime=2000 -group_reporting
      i.e. with IORING_SETUP_SQPOLL and IORING_SETUP_IOPOLL enabled.
      
      There are two issues that result in this hang. One is that when
      IORING_SETUP_SQPOLL and IORING_SETUP_IOPOLL are enabled, fio does not
      use io_uring_enter to get completed events; it relies on the kernel
      io_sq_thread to poll for completed events.

      The other is a race: when io_submit_sqes() in io_sq_thread() submits a
      batch of sqes, the variable 'inflight' records the number of submitted
      reqs, and io_sq_thread then polls for the reqs that have been added to
      poll_list. But note, if some of the previous reqs were punted to an io
      worker, those reqs won't show up in poll_list in time. io_sq_thread()
      will therefore poll for only part of the previously submitted reqs, then
      find poll_list empty and reset 'inflight' to zero. If the app just waits
      for these deferred reqs and does not wake up io_sq_thread again, a hang
      happens.

      For an app that entirely relies on io_sq_thread to poll for completed
      requests, let io_iopoll_req_issued() wake up io_sq_thread properly when
      adding a new element to poll_list, and, when io_sq_thread prepares to
      sleep, check whether poll_list is empty again; if it is not empty,
      continue polling.
      Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
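      A rough sketch of the wakeup half of that fix (illustrative; the
      ctx->sqo_wait wait-queue name is assumed from the io_uring context of
      that era rather than quoted from the patch): after a hipri request is
      added to poll_list under SQPOLL, nudge a sleeping io_sq_thread.

      	/* after list_add_tail(&req->list, &ctx->poll_list); */
      	if ((ctx->flags & IORING_SETUP_SQPOLL) &&
      	    wq_has_sleeper(&ctx->sqo_wait))
      		wake_up(&ctx->sqo_wait);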
  9. 24 Feb 2020, 2 commits
  10. 22 Feb 2020, 2 commits
    • io_uring: fix __io_iopoll_check deadlock in io_sq_thread · c7849be9
      Committed by Xiaoguang Wang
      Since commit a3a0e43f ("io_uring: don't enter poll loop if we have
      CQEs pending"), if we already have events pending, we won't enter the
      poll loop. When SETUP_IOPOLL and SETUP_SQPOLL are both enabled, if the
      app has been terminated without reaping the pending events already in
      the cq ring, and there are some reqs in poll_list, io_sq_thread will
      enter __io_iopoll_check(), find pending events, and return; this loop
      never gets a chance to exit.

      I have seen this issue in fio stress tests. To fix it, let io_sq_thread
      call io_iopoll_getevents() with the 'min' argument set to zero, and
      remove __io_iopoll_check().
      
      Fixes: a3a0e43f ("io_uring: don't enter poll loop if we have CQEs pending")
      Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: prevent sq_thread from spinning when it should stop · 7143b5ac
      Committed by Stefano Garzarella
      This patch drops 'cur_mm' before calling cond_resched(), to prevent
      the sq_thread from spinning even when the user process is finished.
      
      Before this patch, if the user process ended without closing the
      io_uring fd, the sq_thread continues to spin until the
      'sq_thread_idle' timeout ends.
      
      In the worst case where the 'sq_thread_idle' parameter is bigger than
      INT_MAX, the sq_thread will spin forever.
      
      Fixes: 6c271ce2 ("io_uring: add submission polling")
      Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
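      A minimal sketch of the idle-path ordering described above, using the mm
      helpers of that kernel generation (not a verbatim quote of the patch):
      release the borrowed mm before yielding, so a finished user process no
      longer keeps the sq_thread attached to it.

      	if (cur_mm) {
      		unuse_mm(cur_mm);
      		mmput(cur_mm);
      		cur_mm = NULL;
      	}
      	cond_resched();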
  11. 19 Feb 2020, 2 commits
  12. 17 Feb 2020, 1 commit
  13. 14 Feb 2020, 1 commit
    • io_uring: prune request from overflow list on flush · 2ca10259
      Committed by Jens Axboe
      Carter reported an issue where he could produce a stall on ring exit,
      when we're cleaning up requests that match the given file table. For
      this particular test case, a combination of a few things caused the
      issue:
      
      - The cq ring was overflown
      - The request being canceled was in the overflow list
      
      The combination of the above means that the cq overflow list holds a
      reference to the request. The request is canceled correctly, but since
      the overflow list holds a reference to it, the final put won't happen.
      Since the final put doesn't happen, the request remains on the inflight
      list, and hence we never finish the cancelation flush.
      
      Fix this by removing requests from the overflow list if we're canceling
      them.
      
      Cc: stable@vger.kernel.org # 5.5
      Reported-by: Carter Li 李通洲 <carter.li@eoitek.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  14. 10 Feb 2020, 2 commits
  15. 09 Feb 2020, 11 commits
  16. 07 Feb 2020, 3 commits
    • io_uring: fix deferred req iovec leak · 1e95081c
      Committed by Pavel Begunkov
      After a defer, a request will be prepared (which includes allocating an
      iovec if needed) and then submitted through io_wq_submit_work(), not
      through a custom handler (e.g. io_rw_async()/io_sendrecv_async()).
      However, this leaks the iovec, as the request is in io-wq and the code
      goes as follows:
      
      io_read() {
      	if (!io_wq_current_is_worker())
      		kfree(iovec);
      }
      
      Put all deallocation logic in io_{read,write,send,recv}(), which will
      leave the memory in place if going async with -EAGAIN.
      
      It also fixes a leak after failed io_alloc_async_ctx() in
      io_{recv,send}_msg().
      
      Cc: stable@vger.kernel.org # 5.5
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
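      A hypothetical sketch of the ownership rule this establishes (all
      example_* names are invented, not from the commit): the opcode handler
      frees its own iovec on every path except when it returns -EAGAIN to go
      async, in which case the async context takes over the iovec for the
      retry.

      	static int example_read(struct example_req *req, bool force_nonblock)
      	{
      		struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
      		int ret;

      		ret = example_import_iovec(req, &iovec);
      		if (ret < 0)
      			return ret;

      		ret = example_do_read(req, force_nonblock);
      		if (ret == -EAGAIN && force_nonblock) {
      			/* going async: hand the iovec to the async context,
      			 * do NOT free it here */
      			example_setup_async_ctx(req, iovec, inline_vecs);
      			return -EAGAIN;
      		}

      		if (iovec != inline_vecs)
      			kfree(iovec);	/* every other path frees it here */
      		return ret;
      	}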
    • io_uring: fix 1-bit bitfields to be unsigned · e1d85334
      Committed by Randy Dunlap
      Make bitfields of size 1 bit be unsigned (since there is no room
      for the sign bit).
      This clears up the sparse warnings:
      
        CHECK   ../fs/io_uring.c
      ../fs/io_uring.c:207:50: error: dubious one-bit signed bitfield
      ../fs/io_uring.c:208:55: error: dubious one-bit signed bitfield
      ../fs/io_uring.c:209:63: error: dubious one-bit signed bitfield
      ../fs/io_uring.c:210:54: error: dubious one-bit signed bitfield
      ../fs/io_uring.c:211:57: error: dubious one-bit signed bitfield
      
      Found by sight and then verified with sparse.
      
      Fixes: 69b3e546 ("io_uring: change io_ring_ctx bool fields into bit fields")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: io-uring@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
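      A small user-space illustration of why sparse complains (not kernel
      code): with a plain 'int' bitfield of width 1, the single bit is the
      sign bit, so on compilers that treat it as signed the field can only
      hold 0 and -1.

      	#include <stdio.h>

      	struct bad  { int      ready : 1; };	/* dubious one-bit signed bitfield */
      	struct good { unsigned ready : 1; };	/* holds 0 or 1 as intended */

      	int main(void)
      	{
      		struct bad  b = { .ready = 1 };	/* gcc may warn here, which is the point */
      		struct good g = { .ready = 1 };

      		/* with gcc, b.ready reads back as -1, so (b.ready == 1) is false */
      		printf("bad: %d  good: %d\n", b.ready, g.ready);
      		return 0;
      	}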
    • io_uring: get rid of delayed mm check · 1cb1edb2
      Committed by Pavel Begunkov
      Fail fast if we can't grab the mm, so that past that point requests
      always have an mm when required. This allows us to remove req->user
      altogether.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  17. 05 Feb 2020, 2 commits
    • io_uring: cleanup fixed file data table references · 2faf852d
      Committed by Jens Axboe
      syzbot reports a use-after-free in io_ring_file_ref_switch() when it
      tries to switch back to percpu mode. When we put the final reference to
      the table by calling percpu_ref_kill_and_confirm(), we don't want the
      zero reference to queue async work for flushing the potentially queued
      up items. We currently do a few flush_work() calls, but they merely
      paper over the issue, since the work item may not have been queued yet,
      depending on when the percpu-ref callback gets run.
      
      Coming into the file unregister, we know we have the ring quiesced.
      io_ring_file_ref_switch() can check whether the ref is dying and, if so,
      not queue anything async at that point. Once the ref has been confirmed
      killed, flush any potential items manually.
      
      Reported-by: syzbot+7caeaea49c2c8a591e3d@syzkaller.appspotmail.com
      Fixes: 05f3fb3c ("io_uring: avoid ring quiesce for fixed file set unregister and update")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
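      A hedged fragment of the check described above (illustrative; the struct
      and field names are assumed, not quoted from the patch): if the ref has
      already been killed by the unregister path, the switch helper bails out
      instead of queueing async work.

      	static void example_file_ref_switch(struct fixed_file_data *data)
      	{
      		if (percpu_ref_is_dying(&data->refs))
      			return;	/* unregister path flushes remaining work itself */

      		percpu_ref_switch_to_percpu(&data->refs);
      	}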
    • io_uring: spin for sq thread to idle on shutdown · df069d80
      Committed by Jens Axboe
      As part of io_uring shutdown, we cancel work that is pending and won't
      necessarily complete on its own. That includes requests like poll
      commands and timeouts.
      
      If we're using SQPOLL for kernel-side submission and we shut down the
      ring immediately after queueing such work, we can race with the sqthread
      doing the submission. This means we may miss cancelling some work, which
      results in the io_uring shutdown hanging forever.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  18. 04 Feb 2020, 2 commits