1. 27 Nov 2021, 1 commit
    • io_uring: fix soft lockup when call __io_remove_buffers · 1d0254e6
      Committed by Ye Bin
      I hit the following issue:
      [ 567.094140] __io_remove_buffers: [1]start ctx=0xffff8881067bf000 bgid=65533 buf=0xffff8881fefe1680
      [  594.360799] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u32:5:108]
      [  594.364987] Modules linked in:
      [  594.365405] irq event stamp: 604180238
      [  594.365906] hardirqs last  enabled at (604180237): [<ffffffff93fec9bd>] _raw_spin_unlock_irqrestore+0x2d/0x50
      [  594.367181] hardirqs last disabled at (604180238): [<ffffffff93fbbadb>] sysvec_apic_timer_interrupt+0xb/0xc0
      [  594.368420] softirqs last  enabled at (569080666): [<ffffffff94200654>] __do_softirq+0x654/0xa9e
      [  594.369551] softirqs last disabled at (569080575): [<ffffffff913e1d6a>] irq_exit_rcu+0x1ca/0x250
      [  594.370692] CPU: 2 PID: 108 Comm: kworker/u32:5 Tainted: G            L    5.15.0-next-20211112+ #88
      [  594.371891] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
      [  594.373604] Workqueue: events_unbound io_ring_exit_work
      [  594.374303] RIP: 0010:_raw_spin_unlock_irqrestore+0x33/0x50
      [  594.375037] Code: 48 83 c7 18 53 48 89 f3 48 8b 74 24 10 e8 55 f5 55 fd 48 89 ef e8 ed a7 56 fd 80 e7 02 74 06 e8 43 13 7b fd fb bf 01 00 00 00 <e8> f8 78 474
      [  594.377433] RSP: 0018:ffff888101587a70 EFLAGS: 00000202
      [  594.378120] RAX: 0000000024030f0d RBX: 0000000000000246 RCX: 1ffffffff2f09106
      [  594.379053] RDX: 0000000000000000 RSI: ffffffff9449f0e0 RDI: 0000000000000001
      [  594.379991] RBP: ffffffff9586cdc0 R08: 0000000000000001 R09: fffffbfff2effcab
      [  594.380923] R10: ffffffff977fe557 R11: fffffbfff2effcaa R12: ffff8881b8f3def0
      [  594.381858] R13: 0000000000000246 R14: ffff888153a8b070 R15: 0000000000000000
      [  594.382787] FS:  0000000000000000(0000) GS:ffff888399c00000(0000) knlGS:0000000000000000
      [  594.383851] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  594.384602] CR2: 00007fcbe71d2000 CR3: 00000000b4216000 CR4: 00000000000006e0
      [  594.385540] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  594.386474] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  594.387403] Call Trace:
      [  594.387738]  <TASK>
      [  594.388042]  find_and_remove_object+0x118/0x160
      [  594.389321]  delete_object_full+0xc/0x20
      [  594.389852]  kfree+0x193/0x470
      [  594.390275]  __io_remove_buffers.part.0+0xed/0x147
      [  594.390931]  io_ring_ctx_free+0x342/0x6a2
      [  594.392159]  io_ring_exit_work+0x41e/0x486
      [  594.396419]  process_one_work+0x906/0x15a0
      [  594.399185]  worker_thread+0x8b/0xd80
      [  594.400259]  kthread+0x3bf/0x4a0
      [  594.401847]  ret_from_fork+0x22/0x30
      [  594.402343]  </TASK>
      
      Message from syslogd@localhost at Nov 13 09:09:54 ...
      kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u32:5:108]
      [  596.793660] __io_remove_buffers: [2099199]start ctx=0xffff8881067bf000 bgid=65533 buf=0xffff8881fefe1680
      
      We can reproduce this issue with the following syzkaller log:
      r0 = syz_io_uring_setup(0x401, &(0x7f0000000300), &(0x7f0000003000/0x2000)=nil, &(0x7f0000ff8000/0x4000)=nil, &(0x7f0000000280)=<r1=>0x0, &(0x7f0000000380)=<r2=>0x0)
      sendmsg$ETHTOOL_MSG_FEATURES_SET(0xffffffffffffffff, &(0x7f0000003080)={0x0, 0x0, &(0x7f0000003040)={&(0x7f0000000040)=ANY=[], 0x18}}, 0x0)
      syz_io_uring_submit(r1, r2, &(0x7f0000000240)=@IORING_OP_PROVIDE_BUFFERS={0x1f, 0x5, 0x0, 0x401, 0x1, 0x0, 0x100, 0x0, 0x1, {0xfffd}}, 0x0)
      io_uring_enter(r0, 0x3a2d, 0x0, 0x0, 0x0, 0x0)
      
      The cause of the above issue is that 'buf->list' has 2,100,000 nodes;
      freeing them all in one uninterrupted loop monopolizes the CPU and
      triggers the soft lockup. To solve this, add a scheduling point to the
      loop in '__io_remove_buffers', as sketched below.
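      A minimal sketch of the fix, assuming the 5.16-era shape of
      __io_remove_buffers() (illustrative, not the verbatim patch); the
      functional change is the cond_resched() call inside the potentially
      huge free loop:

          static int __io_remove_buffers(struct io_ring_ctx *ctx,
                                         struct io_buffer *buf,
                                         __u16 bgid, unsigned nbufs)
          {
              unsigned i = 0;

              if (!nbufs)
                  return 0;

              /* the head kbuf is the list itself; it can chain millions
               * of nodes (2,100,000 in the log above) */
              while (!list_empty(&buf->list)) {
                  struct io_buffer *nxt;

                  nxt = list_first_entry(&buf->list, struct io_buffer, list);
                  list_del(&nxt->list);
                  kfree(nxt);
                  if (++i == nbufs)
                      return i;
                  cond_resched();    /* the fix: yield so the watchdog can't fire */
              }
              i++;
              kfree(buf);
              xa_erase(&ctx->io_buffers, bgid);
              return i;
          }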
      With the schedule point added, we reran the regression and got the
      following data:
      [  240.141864] __io_remove_buffers: [1]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00
      [  268.408260] __io_remove_buffers: [1]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180
      [  275.899234] __io_remove_buffers: [2099199]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00
      [  296.741404] __io_remove_buffers: [1]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380
      [  305.090059] __io_remove_buffers: [2099199]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180
      [  325.415746] __io_remove_buffers: [1]start ctx=0xffff8881b92d1000 bgid=65533 buf=0xffff8881a17d8f00
      [  333.160318] __io_remove_buffers: [2099199]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380
      ...
      
      Fixes: 8bab4c09 ("io_uring: allow conditional reschedule for intensive iterators")
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Link: https://lore.kernel.org/r/20211122024737.2198530-1-yebin10@huawei.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 26 Nov 2021, 2 commits
    • io_uring: fix link traversal locking · 6af3f48b
      Committed by Pavel Begunkov
      WARNING: inconsistent lock state
      5.16.0-rc2-syzkaller #0 Not tainted
      inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
      ffff888078e11418 (&ctx->timeout_lock
      ){?.+.}-{2:2}
      , at: io_timeout_fn+0x6f/0x360 fs/io_uring.c:5943
      {HARDIRQ-ON-W} state was registered at:
        [...]
        spin_unlock_irq include/linux/spinlock.h:399 [inline]
        __io_poll_remove_one fs/io_uring.c:5669 [inline]
        __io_poll_remove_one fs/io_uring.c:5654 [inline]
        io_poll_remove_one+0x236/0x870 fs/io_uring.c:5680
        io_poll_remove_all+0x1af/0x235 fs/io_uring.c:5709
        io_ring_ctx_wait_and_kill+0x1cc/0x322 fs/io_uring.c:9534
        io_uring_release+0x42/0x46 fs/io_uring.c:9554
        __fput+0x286/0x9f0 fs/file_table.c:280
        task_work_run+0xdd/0x1a0 kernel/task_work.c:164
        exit_task_work include/linux/task_work.h:32 [inline]
        do_exit+0xc14/0x2b40 kernel/exit.c:832
      
      674ee8e1 ("io_uring: correct link-list traversal locking") fixed a
      data race but introduced a possible deadlock and inconsistency in irq
      states. E.g.
      
      io_poll_remove_all()
          spin_lock_irq(timeout_lock)
          io_poll_remove_one()
              spin_lock/unlock_irq(poll_lock);
          spin_unlock_irq(timeout_lock)
      
      Another type of problem is freeing a request while holding
      ->timeout_lock, which may lead to a deadlock in
      io_commit_cqring() -> io_flush_timeouts() and other places.

      Having 3 nested locks is also too ugly. Add io_match_task_safe(), which
      briefly takes and releases timeout_lock internally for race prevention,
      so the actual request cancellation / free / etc. code doesn't run with
      it held.
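      A sketch of the new helper, following the shape of the upstream patch
      (field and helper names assume that era's fs/io_uring.c, not
      re-verified verbatim): the timeout_lock critical section is confined
      to the link walk itself, so no caller holds it across cancellation or
      freeing.

          /* Walk the link chain; caller must guard against racing timeouts. */
          static bool io_match_linked(struct io_kiocb *head)
          {
              struct io_kiocb *req;

              io_for_each_link(req, head) {
                  if (req->flags & REQ_F_INFLIGHT)
                      return true;
              }
              return false;
          }

          /* As io_match_task(), but safe against racing linked timeouts.
           * The caller must NOT hold ->timeout_lock. */
          static bool io_match_task_safe(struct io_kiocb *head,
                                         struct task_struct *task,
                                         bool cancel_all)
          {
              bool matched;

              if (task && head->task != task)
                  return false;
              if (cancel_all)
                  return true;

              if (head->flags & REQ_F_LINK_TIMEOUT) {
                  struct io_ring_ctx *ctx = head->ctx;

                  /* protect against races with linked timeouts */
                  spin_lock_irq(&ctx->timeout_lock);
                  matched = io_match_linked(head);
                  spin_unlock_irq(&ctx->timeout_lock);
              } else {
                  matched = io_match_linked(head);
              }
              return matched;
          }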
      
      Reported-by: syzbot+ff49a3059d49b0ca0eec@syzkaller.appspotmail.com
      Reported-by: syzbot+847f02ec20a6609a328b@syzkaller.appspotmail.com
      Reported-by: syzbot+3368aadcd30425ceb53b@syzkaller.appspotmail.com
      Reported-by: syzbot+51ce8887cdef77c9ac83@syzkaller.appspotmail.com
      Reported-by: syzbot+3cb756a49d2f394a9ee3@syzkaller.appspotmail.com
      Fixes: 674ee8e1 ("io_uring: correct link-list traversal locking")
      Cc: stable@kernel.org # 5.15+
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/397f7ebf3f4171f1abe41f708ac1ecb5766f0b68.1637937097.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: fail cancellation for EXITING tasks · 617a8948
      Committed by Pavel Begunkov
      WARNING: CPU: 1 PID: 20 at fs/io_uring.c:6269 io_try_cancel_userdata+0x3c5/0x640 fs/io_uring.c:6269
      CPU: 1 PID: 20 Comm: kworker/1:0 Not tainted 5.16.0-rc1-syzkaller #0
      Workqueue: events io_fallback_req_func
      RIP: 0010:io_try_cancel_userdata+0x3c5/0x640 fs/io_uring.c:6269
      Call Trace:
       <TASK>
       io_req_task_link_timeout+0x6b/0x1e0 fs/io_uring.c:6886
       io_fallback_req_func+0xf9/0x1ae fs/io_uring.c:1334
       process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298
       worker_thread+0x658/0x11f0 kernel/workqueue.c:2445
       kthread+0x405/0x4f0 kernel/kthread.c:327
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
       </TASK>
      
      We need the original task's context to do cancellations, so if that
      task is dying and the callback is executed in fallback mode, fail the
      cancellation attempt.
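      A sketch of the check, assuming the shape of io_req_task_link_timeout()
      at the time (illustrative; the default errno below is an assumption):

          static void io_req_task_link_timeout(struct io_kiocb *req, bool *locked)
          {
              struct io_kiocb *prev = req->timeout.prev;
              int ret = -ENOENT;    /* assumed default when cancel is skipped */

              if (prev) {
                  /* the fix: a dying task lacks the context needed for
                   * cancellation, so don't attempt it from fallback mode */
                  if (!(req->task->flags & PF_EXITING))
                      ret = io_try_cancel_userdata(req, prev->user_data);
                  io_req_complete_post(req, ret ?: -ETIME, 0);
                  io_put_req(prev);
              } else {
                  io_req_complete_post(req, -ETIME, 0);
              }
          }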
      
      Fixes: 89b263f6 ("io_uring: run linked timeouts from task_work")
      Cc: stable@kernel.org # 5.15+
      Reported-by: syzbot+ab0cfe96c2b3cd1c1153@syzkaller.appspotmail.com
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/4c41c5f379c6941ad5a07cd48cb66ed62199cf7e.1637937097.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 23 Nov 2021, 1 commit
  4. 17 Nov 2021, 1 commit
  5. 08 Nov 2021, 1 commit
  6. 05 Nov 2021, 1 commit
  7. 03 Nov 2021, 1 commit
  8. 02 Nov 2021, 1 commit
  9. 29 Oct 2021, 1 commit
    • io_uring: harder fdinfo sq/cq ring iterating · f75d1183
      Committed by Jens Axboe
      The ring iteration is racy, which isn't necessarily a problem, except
      that it can cause us to iterate the whole ring. That isn't desired or
      ideal, and it can lead to excessive runtimes when reading fdinfo.

      Cap the iteration at tail - head or the ring size, whichever is
      smaller. While in there, clean up the ring masking and just dump the
      raw values along with the masks. That provides more useful debug info;
      a sketch of the capped iteration follows.
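      A sketch of the capped SQ iteration, assuming the fdinfo variable
      names of that era's fs/io_uring.c (r is ctx->rings; treat the names as
      approximations rather than the verbatim patch):

          unsigned int sq_head = READ_ONCE(r->sq.head);
          unsigned int sq_tail = READ_ONCE(r->sq.tail);
          unsigned int sq_mask = ctx->sq_entries - 1;
          /* cap at tail - head OR the ring size, whichever is smaller */
          unsigned int sq_entries = min(sq_tail - sq_head, ctx->sq_entries);
          unsigned int i;

          for (i = 0; i < sq_entries; i++) {
              unsigned int entry = i + sq_head;
              /* mask only when indexing; dump the raw values as-is */
              unsigned int sq_idx = READ_ONCE(ctx->sq_array[entry & sq_mask]);
              struct io_uring_sqe *sqe;

              if (sq_idx > sq_mask)
                  continue;
              sqe = &ctx->sq_sqes[sq_idx];
              seq_printf(m, "%5u: opcode:%d, fd:%d, user_data:%llu\n",
                         sq_idx, sqe->opcode, sqe->fd,
                         (unsigned long long) sqe->user_data);
          }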
      
      Fixes: 83f84356 ("io_uring: add more uring info to fdinfo for debug")
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  10. 27 Oct 2021, 1 commit
  11. 26 Oct 2021, 1 commit
  12. 25 Oct 2021, 7 commits
  13. 23 Oct 2021, 1 commit
    • io_uring: implement async hybrid mode for pollable requests · 90fa0288
      Committed by Hao Xu
      The current logic of requests with IOSQE_ASYNC is first queueing it to
      io-worker, then execute it in a synchronous way. For unbound works like
      pollable requests(e.g. read/write a socketfd), the io-worker may stuck
      there waiting for events for a long time. And thus other works wait in
      the list for a long time too.
      Let's introduce a new way for unbound works (currently pollable
      requests), with this a request will first be queued to io-worker, then
      executed in a nonblock try rather than a synchronous way. Failure of
      that leads it to arm poll stuff and then the worker can begin to handle
      other works.
      The detail process of this kind of requests is:
      
      step1: original context:
                 queue it to io-worker
      step2: io-worker context:
                 nonblock try(the old logic is a synchronous try here)
                     |
                     |--fail--> arm poll
                                  |
                                  |--(fail/ready)-->synchronous issue
                                  |
                                  |--(succeed)-->worker finish it's job, tw
                                                 take over the req
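      In C, a schematic version of that flow (the wrapper name
      io_wq_submit_work_hybrid() is hypothetical; io_issue_sqe(),
      io_arm_poll_handler() and the IO_APOLL_* results follow that era's
      fs/io_uring.c, so this is a sketch rather than the exact patch):

          /* io-worker context: step 2 of the diagram above */
          static void io_wq_submit_work_hybrid(struct io_kiocb *req)
          {
              /* nonblocking try instead of the old synchronous issue */
              int ret = io_issue_sqe(req, IO_URING_F_NONBLOCK);

              if (ret != -EAGAIN)
                  return;    /* completed (or failed) without blocking */

              /* nonblock try failed: arm poll so this worker is freed up */
              switch (io_arm_poll_handler(req)) {
              case IO_APOLL_OK:
                  return;    /* task_work takes over when the fd is ready */
              case IO_APOLL_READY:
              case IO_APOLL_ABORTED:
                  io_issue_sqe(req, 0);    /* synchronous fallback */
                  return;
              }
          }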
      
      This works much better than the old IOSQE_ASYNC logic in cases where
      unbound max_worker is relatively small. In that case, the number of
      io-workers easily climbs to max_worker, new workers cannot be created,
      and the running workers are stuck handling old work in IOSQE_ASYNC
      mode.

      On my 64-core machine, with unbound max_worker set to 20, running an
      echo server (arguments: register_file, 1000 connections, 12-byte
      messages) turns out:
      original IOSQE_ASYNC: 76664.151 tps
      after this patch: 166934.985 tps
      Suggested-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
      Link: https://lore.kernel.org/r/20211018133445.103438-1-haoxu@linux.alibaba.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  14. 22 Oct 2021, 1 commit
  15. 20 Oct 2021, 4 commits
  16. 19 Oct 2021, 15 commits