提交 · a566c5562d41b99f11c8224b2a3010e60ad93acf · openeuler / Kernel

14 6月, 2021 9 次提交

io_uring: remove dependency on ring->sq/cq_entries · a566c556

由 Pavel Begunkov 提交于 5月 16, 2021

We have numbers of {sq,cq} entries cached in ctx, don't look up them in
user-shared rings as 1) it may fetch additional cacheline 2) user may
change it and so it's always error prone.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/745d31bc2da41283ddd0489ef784af5c8d6310e9.1621201931.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

a566c556

io_uring: better locality for rsrc fields · b13a8918

由 Pavel Begunkov 提交于 5月 16, 2021

ring has two types of resource-related fields: used for request
submission, and field needed for update/registration. Reshuffle them
into these two groups for better locality and readability. The second
group is not in the hot path, so it's natural to place them somewhere in
the end. Also update an outdated comment.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/05b34795bb4440f4ec4510f08abd5a31830f8ca0.1621201931.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

b13a8918

io_uring: shuffle rarely used ctx fields · b986af7e

由 Pavel Begunkov 提交于 5月 16, 2021

There is a bunch of scattered around ctx fields that are almost never
used, e.g. only on ring exit, plunge them to the end, better locality,
better aesthetically.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/782ff94b00355923eae757d58b1a47821b5b46d4.1621201931.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

b986af7e

io_uring: make fail flag not link specific · 93d2bcd2

由 Pavel Begunkov 提交于 5月 16, 2021

The main difference is in req_set_fail_links() renamed into
req_set_fail(), which now sets REQ_F_FAIL_LINK/REQ_F_FAIL flag
unconditional on whether it has been a link or not. It only matters in
io_disarm_next(), which already handles it well, and all calls to it
have a fast path checking REQ_F_LINK/HARDLINK.

It looks cleaner, and sheds binary size
text data bss dec hex filename
84235 12390 8 96633 17979 ./fs/io_uring.o
84151 12414 8 96573 1793d ./fs/io_uring.o
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e2224154dd6e53b665ac835d29436b177872fa10.1621201931.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

93d2bcd2

io_uring: get rid of files in exit cancel · 3dd0c97a

由 Pavel Begunkov 提交于 5月 16, 2021

We don't match against files on cancellation anymore, so no need to drag
around files_struct anymore, just pass a flag telling whether only
inflight or all requests should be killed.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7bfc5409a78f8e2d6b27dec3293ec2d248677348.1621201931.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

3dd0c97a

io_uring: simplify waking sqo_sq_wait · acfb381d

由 Pavel Begunkov 提交于 5月 16, 2021

Going through submission in __io_sq_thread() and still having a full SQ
is rather unexpected, so remove a check for SQ fullness and just wake up
whoever wait on sqo_sq_wait. Also skip if it doesn't do submission in
the first place, likely may to happen for SQPOLL sharing and/or IOPOLL.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e2e91751e87b1a39f8d63ef884aaff578123f61e.1621201931.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

acfb381d

io_uring: remove unused park_task_work · 21f2fc08

由 Pavel Begunkov 提交于 5月 16, 2021

As sqpoll cancel via task_work is killed, remove everything related to
park_task_work as it's not used anymore.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/310d8b76a2fbbf3e139373500e04ad9af7ee3dbb.1621201931.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

21f2fc08

io_uring: improve sq_thread waiting check · aaa9f0f4

由 Pavel Begunkov 提交于 5月 16, 2021

If SQPOLL task finds a ring requesting it to continue running, no need
to set wake flag to rest of the rings as it will be cleared in a moment
anyway, so hide it in a single sqd->ctx_list loop.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1ee5a696d9fd08645994c58ee147d149a8957d94.1621201931.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

aaa9f0f4

io_uring: improve sqpoll event/state handling · e4b6d902

由 Pavel Begunkov 提交于 5月 16, 2021

As sqd->state changes rarely, don't check every event one by one but
look them all at once. Add a helper function. Also don't go into event
waiting sleeping with STOP flag set.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/645025f95c7eeec97f88ff497785f4f1d6f3966f.1621201931.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

e4b6d902

11 6月, 2021 2 次提交

io_uring: add feature flag for rsrc tags · 9690557e

由 Pavel Begunkov 提交于 6月 10, 2021

Add IORING_FEAT_RSRC_TAGS indicating that io_uring supports a bunch of
new IORING_REGISTER operations, in particular
IORING_REGISTER_[FILES[,UPDATE]2,BUFFERS[2,UPDATE]] that support rsrc
tagging, and also indicating implemented dynamic fixed buffer updates.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9b995d4045b6c6b4ab7510ca124fd25ac2203af7.1623339162.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

9690557e

io_uring: change registration/upd/rsrc tagging ABI · 992da01a

由 Pavel Begunkov 提交于 6月 10, 2021

There are ABI moments about recently added rsrc registration/update and
tagging that might become a nuisance in the future. First,
IORING_REGISTER_RSRC[_UPD] hide different types of resources under it,
so breaks fine control over them by restrictions. It works for now, but
once those are wanted under restrictions it would require a rework.

It was also inconvenient trying to fit a new resource not supporting
all the features (e.g. dynamic update) into the interface, so better
to return to IORING_REGISTER_* top level dispatching.

Second, register/update were considered to accept a type of resource,
however that's not a good idea because there might be several ways of
registration of a single resource type, e.g. we may want to add
non-contig buffers or anything more exquisite as dma mapped memory.
So, remove IORING_RSRC_[FILE,BUFFER] out of the ABI, and place them
internally for now to limit changes.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9b554897a7c17ad6e3becc48dfed2f7af9f423d5.1623339162.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

992da01a

30 5月, 2021 1 次提交

io_uring: fix misaccounting fix buf pinned pages · 216e5835

由 Pavel Begunkov 提交于 5月 29, 2021

As Andres reports "... io_sqe_buffer_register() doesn't initialize imu.
io_buffer_account_pin() does imu->acct_pages++, before calling
io_account_mem(ctx, imu->acct_pages).", leading to evevntual -ENOMEM.

Initialise the field.
Reported-by: NAndres Freund <andres@anarazel.de>
Fixes: 41edf1a5 ("io_uring: keep table of pointers to ubufs")
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/438a6f46739ae5e05d9c75a0c8fa235320ff367c.1622285901.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

216e5835

27 5月, 2021 1 次提交

io_uring: fix data race to avoid potential NULL-deref · b16ef427

由 Marco Elver 提交于 5月 27, 2021

Commit ba5ef6dc ("io_uring: fortify tctx/io_wq cleanup") introduced
setting tctx->io_wq to NULL a bit earlier. This has caused KCSAN to
detect a data race between accesses to tctx->io_wq:

  write to 0xffff88811d8df330 of 8 bytes by task 3709 on cpu 1:
   io_uring_clean_tctx                  fs/io_uring.c:9042 [inline]
   __io_uring_cancel                    fs/io_uring.c:9136
   io_uring_files_cancel                include/linux/io_uring.h:16 [inline]
   do_exit                              kernel/exit.c:781
   do_group_exit                        kernel/exit.c:923
   get_signal                           kernel/signal.c:2835
   arch_do_signal_or_restart            arch/x86/kernel/signal.c:789
   handle_signal_work                   kernel/entry/common.c:147 [inline]
   exit_to_user_mode_loop               kernel/entry/common.c:171 [inline]
   ...
  read to 0xffff88811d8df330 of 8 bytes by task 6412 on cpu 0:
   io_uring_try_cancel_iowq             fs/io_uring.c:8911 [inline]
   io_uring_try_cancel_requests         fs/io_uring.c:8933
   io_ring_exit_work                    fs/io_uring.c:8736
   process_one_work                     kernel/workqueue.c:2276
   ...

With the config used, KCSAN only reports data races with value changes:
this implies that in the case here we also know that tctx->io_wq was
non-NULL. Therefore, depending on interleaving, we may end up with:

              [CPU 0]                 |        [CPU 1]
  io_uring_try_cancel_iowq()          | io_uring_clean_tctx()
    if (!tctx->io_wq) // false        |   ...
    ...                               |   tctx->io_wq = NULL
    io_wq_cancel_cb(tctx->io_wq, ...) |   ...
      -> NULL-deref                   |

Note: It is likely that thus far we've gotten lucky and the compiler
optimizes the double-read into a single read into a register -- but this
is never guaranteed, and can easily change with a different config!

Fix the data race by restoring the previous behaviour, where both
setting io_wq to NULL and put of the wq are _serialized_ after
concurrent io_uring_try_cancel_iowq() via acquisition of the uring_lock
and removal of the node in io_uring_del_task_file().

Fixes: ba5ef6dc ("io_uring: fortify tctx/io_wq cleanup")
Suggested-by: NPavel Begunkov <asml.silence@gmail.com>
Reported-by: syzbot+bf2b3d0435b9b728946c@syzkaller.appspotmail.com
Signed-off-by: NMarco Elver <elver@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20210527092547.2656514-1-elver@google.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

b16ef427

26 5月, 2021 1 次提交

io_uring/io-wq: close io-wq full-stop gap · 17a91051

由 Pavel Begunkov 提交于 5月 23, 2021

There is an old problem with io-wq cancellation where requests should be
killed and are in io-wq but are not discoverable, e.g. in @next_hashed
or @linked vars of io_worker_handle_work(). It adds some unreliability
to individual request canellation, but also may potentially get
__io_uring_cancel() stuck. For instance:

1) An __io_uring_cancel()'s cancellation round have not found any
   request but there are some as desribed.
2) __io_uring_cancel() goes to sleep
3) Then workers wake up and try to execute those hidden requests
   that happen to be unbound.

As we already cancel all requests of io-wq there, set IO_WQ_BIT_EXIT
in advance, so preventing 3) from executing unbound requests. The
workers will initially break looping because of getting a signal as they
are threads of the dying/exec()'ing user task.

Cc: stable@vger.kernel.org
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/abfcf8c54cb9e8f7bfbad7e9a0cc5433cc70bdc2.1621781238.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

17a91051

20 5月, 2021 1 次提交

io_uring: fortify tctx/io_wq cleanup · ba5ef6dc

由 Pavel Begunkov 提交于 5月 20, 2021

We don't want anyone poking into tctx->io_wq awhile it's being destroyed
by io_wq_put_and_exit(), and even though it shouldn't even happen, if
buggy would be preferable to get a NULL-deref instead of subtle delayed
failure or UAF.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/827b021de17926fd807610b3e53a5a5fa8530856.1621513214.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

ba5ef6dc

17 5月, 2021 1 次提交

io_uring: don't modify req->poll for rw · 7a274727

由 Pavel Begunkov 提交于 5月 17, 2021

__io_queue_proc() is used by both poll and apoll, so we should not
access req->poll directly but selecting right struct io_poll_iocb
depending on use case.

Reported-and-tested-by: syzbot+a84b8783366ecb1c65d0@syzkaller.appspotmail.com
Fixes: ea6a693d ("io_uring: disable multishot poll for double poll add cases")
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4a6a1de31142d8e0250fe2dfd4c8923d82a5bbfc.1621251795.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

7a274727

14 5月, 2021 3 次提交

io_uring: increase max number of reg buffers · 489809e2

由 Pavel Begunkov 提交于 5月 14, 2021

Since recent changes instead of storing a large array of struct
io_mapped_ubuf, we store pointers to them, that is 4 times slimmer and
we should not to so worry about restricting max number of registererd
buffer slots, increase the limit 4 times.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d3dee1da37f46da416aa96a16bf9e5094e10584d.1620990371.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

489809e2

io_uring: further remove sqpoll limits on opcodes · 2d74d042

由 Pavel Begunkov 提交于 5月 14, 2021

There are three types of requests that left disabled for sqpoll, namely
epoll ctx, statx, and resources update. Since SQPOLL task is now closely
mimics a userspace thread, remove the restrictions.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/909b52d70c45636d8d7897582474ea5aab5eed34.1620990306.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

2d74d042

io_uring: fix ltout double free on completion race · 447c19f3

由 Pavel Begunkov 提交于 5月 14, 2021

Always remove linked timeout on io_link_timeout_fn() from the master
request link list, otherwise we may get use-after-free when first
io_link_timeout_fn() puts linked timeout in the fail path, and then
will be found and put on master's free.

Cc: stable@vger.kernel.org # 5.10+
Fixes: 90cd7e42 ("io_uring: track link timeout's master explicitly")
Reported-and-tested-by: syzbot+5a864149dd970b546223@syzkaller.appspotmail.com
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/69c46bf6ce37fec4fdcd98f0882e18eb07ce693a.1620990121.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

447c19f3

09 5月, 2021 1 次提交

io_uring: fix link timeout refs · a298232e

由 Pavel Begunkov 提交于 5月 07, 2021

WARNING: CPU: 0 PID: 10242 at lib/refcount.c:28 refcount_warn_saturate+0x15b/0x1a0 lib/refcount.c:28
RIP: 0010:refcount_warn_saturate+0x15b/0x1a0 lib/refcount.c:28
Call Trace:
 __refcount_sub_and_test include/linux/refcount.h:283 [inline]
 __refcount_dec_and_test include/linux/refcount.h:315 [inline]
 refcount_dec_and_test include/linux/refcount.h:333 [inline]
 io_put_req fs/io_uring.c:2140 [inline]
 io_queue_linked_timeout fs/io_uring.c:6300 [inline]
 __io_queue_sqe+0xbef/0xec0 fs/io_uring.c:6354
 io_submit_sqe fs/io_uring.c:6534 [inline]
 io_submit_sqes+0x2bbd/0x7c50 fs/io_uring.c:6660
 __do_sys_io_uring_enter fs/io_uring.c:9240 [inline]
 __se_sys_io_uring_enter+0x256/0x1d60 fs/io_uring.c:9182

io_link_timeout_fn() should put only one reference of the linked timeout
request, however in case of racing with the master request's completion
first io_req_complete() puts one and then io_put_req_deferred() is
called.

Cc: stable@vger.kernel.org # 5.12+
Fixes: 9ae1f8dd ("io_uring: fix inconsistent lock state")
Reported-by: syzbot+a2910119328ce8e7996f@syzkaller.appspotmail.com
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ff51018ff29de5ffa76f09273ef48cb24c720368.1620417627.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

a298232e

06 5月, 2021 1 次提交

io_uring: truncate lengths larger than MAX_RW_COUNT on provide buffers · d1f82808

由 Thadeu Lima de Souza Cascardo 提交于 5月 05, 2021

Read and write operations are capped to MAX_RW_COUNT. Some read ops rely on
that limit, and that is not guaranteed by the IORING_OP_PROVIDE_BUFFERS.

Truncate those lengths when doing io_add_buffers, so buffer addresses still
use the uncapped length.

Also, take the chance and change struct io_buffer len member to __u32, so
it matches struct io_provide_buffer len member.

This fixes CVE-2021-3491, also reported as ZDI-CAN-13546.

Fixes: ddf0322d ("io_uring: add IORING_OP_PROVIDE_BUFFERS")
Reported-by: Billy Jheng Bing-Jhong (@st424204)
Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

d1f82808

30 4月, 2021 7 次提交

io_uring: Fix memory leak in io_sqe_buffers_register() · bb6659cc

由 Zqiang 提交于 4月 30, 2021

unreferenced object 0xffff8881123bf0a0 (size 32):
comm "syz-executor557", pid 8384, jiffies 4294946143 (age 12.360s)
backtrace:
[<ffffffff81469b71>] kmalloc_node include/linux/slab.h:579 [inline]
[<ffffffff81469b71>] kvmalloc_node+0x61/0xf0 mm/util.c:587
[<ffffffff815f0b3f>] kvmalloc include/linux/mm.h:795 [inline]
[<ffffffff815f0b3f>] kvmalloc_array include/linux/mm.h:813 [inline]
[<ffffffff815f0b3f>] kvcalloc include/linux/mm.h:818 [inline]
[<ffffffff815f0b3f>] io_rsrc_data_alloc+0x4f/0xc0 fs/io_uring.c:7164
[<ffffffff815f26d8>] io_sqe_buffers_register+0x98/0x3d0 fs/io_uring.c:8383
[<ffffffff815f84a7>] __io_uring_register+0xf67/0x18c0 fs/io_uring.c:9986
[<ffffffff81609222>] __do_sys_io_uring_register fs/io_uring.c:10091 [inline]
[<ffffffff81609222>] __se_sys_io_uring_register fs/io_uring.c:10071 [inline]
[<ffffffff81609222>] __x64_sys_io_uring_register+0x112/0x230 fs/io_uring.c:10071
[<ffffffff842f616a>] do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
[<ffffffff84400068>] entry_SYSCALL_64_after_hwframe+0x44/0xae

Fix data->tags memory leak, through io_rsrc_data_free() to release
data memory space.

Reported-by: syzbot+0f32d05d8b6cd8d7ea3e@syzkaller.appspotmail.com
Signed-off-by: NZqiang <qiang.zhang@windriver.com>
Link: https://lore.kernel.org/r/20210430082515.13886-1-qiang.zhang@windriver.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

bb6659cc

io_uring: Fix premature return from loop and memory leak · cf3770e7

由 Colin Ian King 提交于 4月 29, 2021

Currently the -EINVAL error return path is leaking memory allocated
to data. Fix this by not returning immediately but instead setting
the error return variable to -EINVAL and breaking out of the loop.

Kudos to Pavel Begunkov for suggesting a correct fix.
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20210429104602.62676-1-colin.king@canonical.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

cf3770e7

io_uring: fix unchecked error in switch_start() · 47b228ce

由 Pavel Begunkov 提交于 4月 29, 2021

io_rsrc_node_switch_start() can fail, don't forget to check returned
error code.

Reported-by: syzbot+a4715dd4b7c866136f79@syzkaller.appspotmail.com
Fixes: eae071c9 ("io_uring: prepare fixed rw for dynanic buffers")
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c4c06e2f3f0c8e43bd8d0a266c79055bcc6b6e60.1619693112.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

47b228ce

io_uring: allow empty slots for reg buffers · 6224843d

由 Pavel Begunkov 提交于 4月 28, 2021

Allow empty reg buffer slots any request using which should fail. This
allows users to not register all buffers in advance, but do it lazily
and/or on demand via updates. That is achieved by setting iov_base and
iov_len to zero for registration and/or buffer updates. Empty buffer
can't have a non-zero tag.

Implementation details: to not add extra overhead to io_import_fixed(),
create a dummy buffer crafted to fail any request using it, and set it
to all empty buffer slots.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7e95e4d700082baaf010c648c72ac764c9cc8826.1619611868.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

6224843d

io_uring: add more build check for uapi · b0d658ec

由 Pavel Begunkov 提交于 4月 27, 2021

Add a couple of BUILD_BUG_ON() checking some rsrc uapi structs and SQE
flags.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ff960df4d5026b9fb5bfd80994b9d3667d3926da.1619536280.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

b0d658ec

io_uring: dont overlap internal and user req flags · dddca226

由 Pavel Begunkov 提交于 4月 27, 2021

CQE flags take one byte that we store in req->flags together with other
REQ_F_* internal flags. CQE flags are copied directly into req and then
verified that requires some handling on failures, e.g. to make sure that
that copy doesn't set some of the internal flags.

Move all internal flags to take bits after the first byte, so we don't
need extra handling and make it safer overall.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b8b5b02d1ab9d786fcc7db4a3fe86db6b70b8987.1619536280.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

dddca226

io_uring: fix drain with rsrc CQEs · 2840f710

由 Pavel Begunkov 提交于 4月 27, 2021

Resource emitted CQEs are not bound to requests, so fix up counters used
for DRAIN/defer logic.

Fixes: b60c8dce ("io_uring: preparation for rsrc tagging")
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2b32f5f0a40d5928c3466d028f936e167f0654be.1619536280.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

2840f710

27 4月, 2021 2 次提交

io_uring: maintain drain logic for multishot poll requests · 7b289c38

由 Hao Xu 提交于 4月 13, 2021

Now that we have multishot poll requests, one SQE can emit multiple
CQEs. given below example:
    sqe0(multishot poll)-->sqe1-->sqe2(drain req)
sqe2 is designed to issue after sqe0 and sqe1 completed, but since sqe0
is a multishot poll request, sqe2 may be issued after sqe0's event
triggered twice before sqe1 completed. This isn't what users leverage
drain requests for.
Here the solution is to wait for multishot poll requests fully
completed.
To achieve this, we should reconsider the req_need_defer equation, the
original one is:

    all_sqes(excluding dropped ones) == all_cqes(including dropped ones)

This means we issue a drain request when all the previous submitted
SQEs have generated their CQEs.
Now we should consider multishot requests, we deduct all the multishot
CQEs except the cancellation one, In this way a multishot poll request
behave like a normal request, so:
    all_sqes == all_cqes - multishot_cqes(except cancellations)

Here we introduce cq_extra for it.
Signed-off-by: NHao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/1618298439-136286-1-git-send-email-haoxu@linux.alibaba.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

7b289c38

io_uring: Check current->io_uring in io_uring_cancel_sqpoll · 6d042ffb

由 Palash Oswal 提交于 4月 27, 2021

syzkaller identified KASAN: null-ptr-deref Write in
io_uring_cancel_sqpoll.

io_uring_cancel_sqpoll is called by io_sq_thread before calling
io_uring_alloc_task_context. This leads to current->io_uring being NULL.
io_uring_cancel_sqpoll should not have to deal with threads where
current->io_uring is NULL.

In order to cast a wider safety net, perform input sanitisation directly
in io_uring_cancel_sqpoll and return for NULL value of current->io_uring.
This is safe since if current->io_uring isn't set, then there's no way
for the task to have submitted any requests.

Reported-by: syzbot+be51ca5a4d97f017cd50@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: NPalash Oswal <hello@oswalpalash.com>
Link: https://lore.kernel.org/r/20210427125148.21816-1-hello@oswalpalash.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

6d042ffb

26 4月, 2021 10 次提交

io_uring: fix NULL reg-buffer · 0b8c0e7c

由 Pavel Begunkov 提交于 4月 26, 2021

io_import_fixed() doesn't expect a registered buffer slot to be NULL and
would fail stumbling on it. We don't allow it, but if during
__io_sqe_buffers_update() rsrc removal succeeds but following register
fails, we'll get such a situation.

Do it atomically and don't remove buffers until we sure that a new one
can be set.

Fixes: 634d00df ("io_uring: add full-fledged dynamic buffers support")
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/830020f9c387acddd51962a3123b5566571b8c6d.1619446608.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

0b8c0e7c

io_uring: simplify SQPOLL cancellations · 9f59a9d8

由 Pavel Begunkov 提交于 4月 25, 2021

All sqpoll rings (even sharing sqpoll task) are currently dead bound
to the task that created them, iow when owner task dies it kills all
its SQPOLL rings and their inflight requests via task_work infra. It's
neither the nicist way nor the most convenient as adds extra
locking/waiting and dependencies.

Leave it alone and rely on SIGKILL being delivered on its thread group
exit, so there are only two cases left:

1) thread group is dying, so sqpoll task gets a signal and exit itself
cancelling all requests.

2) an sqpoll ring is dying. Because refs_kill() is called the sqpoll not
going to submit any new request, and that's what we need. And
io_ring_exit_work() will do all the cancellation itself before
actually killing ctx, so sqpoll doesn't need to worry about it.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3cd7f166b9c326a2c932b70e71a655b03257b366.1619389911.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

9f59a9d8

io_uring: fix work_exit sqpoll cancellations · 28090c13

由 Pavel Begunkov 提交于 4月 25, 2021

After closing an SQPOLL ring, io_ring_exit_work() kicks in and starts
doing cancellations via io_uring_try_cancel_requests(). It will go
through io_uring_try_cancel_iowq(), which uses ctx->tctx_list, but as
SQPOLL task don't have a ctx note, its io-wq won't be reachable and so
is left not cancelled.

It will eventually cancelled when one of the tasks dies, but if a thread
group survives for long and changes rings, it will spawn lots of
unreclaimed resources and live locked works.

Cancel SQPOLL task's io-wq separately in io_ring_exit_work().

Cc: stable@vger.kernel.org
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a71a7fe345135d684025bb529d5cb1d8d6b46e10.1619389911.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

28090c13

io_uring: Fix uninitialized variable up.resv · 615cee49

由 Colin Ian King 提交于 4月 26, 2021

The variable up.resv is not initialized and is being checking for a
non-zero value in the call to _io_register_rsrc_update. Fix this by
explicitly setting the variable to 0.

Addresses-Coverity: ("Uninitialized scalar variable)"
Fixes: c3bdad02 ("io_uring: add generic rsrc update with tags")
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210426094735.8320-1-colin.king@canonical.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

615cee49

io_uring: fix invalid error check after malloc · a2b4198c

由 Pavel Begunkov 提交于 4月 26, 2021

Now we allocate io_mapped_ubuf instead of bvec, so we clearly have to
check its address after allocation.

Fixes: 41edf1a5 ("io_uring: keep table of pointers to ubufs")
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d28eb1bc4384284f69dbce35b9f70c115ff6176f.1619392565.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

a2b4198c

io_uring: io_sq_thread() no longer needs to reset current->pf_io_worker · a2a7cc32

由 Stefan Metzmacher 提交于 4月 25, 2021

This is done by create_io_thread() now.
Signed-off-by: NStefan Metzmacher <metze@samba.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

a2a7cc32

io_uring: update sq_thread_idle after ctx deleted · 2b4ae19c

由 Hao Xu 提交于 4月 24, 2021

we shall update sq_thread_idle anytime we do ctx deletion from ctx_list

Fixes:734551df ("io_uring: fix shared sqpoll cancellation hangs")
Signed-off-by: NHao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/1619256380-236460-1-git-send-email-haoxu@linux.alibaba.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

2b4ae19c

io_uring: add full-fledged dynamic buffers support · 634d00df

由 Pavel Begunkov 提交于 4月 25, 2021

Hook buffers into all rsrc infrastructure, including tagging and
updates.
Suggested-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/119ed51d68a491dae87eb55fb467a47870c86aad.1619356238.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

634d00df

io_uring: implement fixed buffers registration similar to fixed files · bd54b6fe

由 Bijan Mottahedeh 提交于 4月 25, 2021

Apply fixed_rsrc functionality for fixed buffers support.
Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com>
[rebase, remove multi-level tables, fix unregister on exit]
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/17035f4f75319dc92962fce4fc04bc0afb5a68dc.1619356238.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

bd54b6fe

io_uring: prepare fixed rw for dynanic buffers · eae071c9

由 Pavel Begunkov 提交于 4月 25, 2021

With dynamic buffer updates, registered buffers in the table may change
at any moment. First of all we want to prevent future races between
updating and importing (i.e. io_import_fixed()), where the latter one
may happen without uring_lock held, e.g. from io-wq.

Save the first loaded io_mapped_ubuf buffer and reuse.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/21a2302d07766ae956640b6f753292c45200fe8f.1619356238.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

eae071c9

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功