- 19 October 2021, 5 commits
-
-
Committed by Pavel Begunkov

io_dismantle_req() is hot, and not _too_ huge. Inline it; there are 3 call sites, which hopefully will turn into 2 in the future.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bdd2dc30716cac270c2403e99bccd6286e4ae201.1631115443.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov

->ios_left is only used to decide whether to plug or not; kill it to avoid that extra accounting and just use the initial submission number. There is not much difference in when plugging gets enabled (this version does it in a few more cases), but all the major ones should be covered well.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f13993bcf5b477f9a7d52881fc49f9457ea9870a.1631115443.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe

I recently had to look at a production problem where a request ended up getting the dreaded -EINVAL error on submit. The most used and hence most useless of error codes, as it just tells you that something was wrong with your request, but not more than that. Let's dump the full sqe contents if a request fails at issue time; that'll allow easier diagnosing of a wide variety of issues.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
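As a rough illustration of the diagnostics described above (the helper name and print format here are hypothetical, not the committed code; only the struct io_uring_sqe field names are real):

    /* Hypothetical sketch: dump the main sqe fields when issue fails. */
    static void io_dump_failed_sqe(const struct io_uring_sqe *sqe, int error)
    {
            pr_debug("io_uring issue failed %d: opcode %u, flags 0x%x, fd %d, "
                     "off %llu, addr 0x%llx, len %u, user_data %llu\n",
                     error, sqe->opcode, sqe->flags, sqe->fd,
                     (unsigned long long)sqe->off,
                     (unsigned long long)sqe->addr,
                     sqe->len, (unsigned long long)sqe->user_data);
    }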
-
Committed by Jens Axboe

Wire up using an io_comp_batch for f_op->iopoll(). If the lower stack supports it, we can handle high rates of polled IO more efficiently. This raises the single core efficiency on my system from ~6.1M IOPS to ~6.6M IOPS running a random read workload at depth 128 on two gen2 Optane drives.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe

struct io_comp_batch contains a list head and a completion handler, which will allow batches of IO completions to be handled more efficiently. For now, no functional changes in this patch; we just define the io_comp_batch structure and add the argument to the file_operations iopoll handler.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
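A sketch matching that description (layout inferred from the text above; treat the exact fields as illustrative):

    /* Sketch: a batch of completed requests plus a handler to flush them. */
    struct io_comp_batch {
            struct request  *req_list;      /* completed requests, chained */
            bool            need_ts;        /* completions want timestamps */
            void (*complete)(struct io_comp_batch *); /* flush the batch */
    };

    /* The file_operations iopoll handler gains the new argument: */
    int (*iopoll)(struct kiocb *kiocb, struct io_comp_batch *iob,
                  unsigned int flags);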
-
- 18 October 2021, 3 commits
-
-
Committed by Christoph Hellwig

There is no point in sleeping for the expected I/O completion timeout in the io_uring async polling model, as we never poll for a specific I/O.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig

Switch the boolean spin argument to blk_poll to passing a set of flags instead. This will allow controlling polling behavior in a more fine-grained way.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-10-hch@lst.de
[axboe: adapt to changed io_uring iopoll]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
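Schematically, the interface change looks like this (flag names believed to match the series; treat the exact bit values as illustrative):

    /* Before: a bare boolean */
    int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin);

    /* After: a flags word, leaving room for finer-grained control */
    #define BLK_POLL_ONESHOT        (1 << 0)  /* poll the hardware only once */
    #define BLK_POLL_NOSLEEP        (1 << 1)  /* don't sleep awaiting completion */

    int blk_poll(struct request_queue *q, blk_qc_t cookie, unsigned int flags);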
-
Committed by Christoph Hellwig

syscall-level code can't just poke into the details of the poll cookie, which is private information of the block layer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012111226.760968-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 14 October 2021, 1 commit
-
-
Committed by Hao Xu

Grab the uring lock when we are in the io-worker context, but not in the original or system-wq contexts, since we already hold it in those two situations.

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Fixes: b66ceaf3 ("io_uring: move iopoll reissue into regular IO path")
Link: https://lore.kernel.org/r/20211014140400.50235-1-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
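The fix boils down to a conditional-lock helper along these lines (a sketch; call sites elided):

    /* Take ctx->uring_lock only if the calling context doesn't already hold
     * it: true for the io-worker path, false for the original task and
     * system-wq paths, which hold the lock on entry. */
    static void io_ring_submit_lock(struct io_ring_ctx *ctx, bool needs_lock)
    {
            if (needs_lock)
                    mutex_lock(&ctx->uring_lock);
    }

    static void io_ring_submit_unlock(struct io_ring_ctx *ctx, bool needs_lock)
    {
            if (needs_lock)
                    mutex_unlock(&ctx->uring_lock);
    }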
-
- 02 October 2021, 1 commit
-
-
Committed by Pavel Begunkov

We have never supported fasync properly; it would only fire when there is something polling io_uring, which makes it useless. The original support came in through the initial io_uring merge for 5.1. Since it's broken and nobody has reported it, get rid of the fasync bits.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2f7ca3d344d406d34fa6713824198915c41cea86.1633080236.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 25 September 2021, 8 commits
-
-
Committed by Pavel Begunkov

As of recently, open/accept are able to manipulate the fixed file table, but it's inconsistent that close can't. Close the gap, keeping the API the same as with open/accept, i.e. via sqe->file_slot.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
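From userspace this looks roughly like the following, assuming a liburing new enough to provide io_uring_prep_close_direct() and an already set-up ring:

    /* Close whatever file is installed in fixed-file slot 3; the slot
     * index is the fixed-table position, not a regular fd. */
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    io_uring_prep_close_direct(sqe, 3);
    io_uring_submit(&ring);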
-
Committed by Pavel Begunkov

We don't retry short writes, so we would never get to async setup in io_write() in that case. Thus ret2 > 0 is always false and iov_iter_advance() is never used. Apparently, the same was found by Coverity, which complains about the code.

Fixes: cd658695 ("io_uring: use iov_iter state save/restore helpers")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5b33e61034748ef1022766efc0fb8854cfcf749c.1632500058.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe

There's no reason to punt it unconditionally; we just need to ensure that the submit lock grabbing is conditional.

Fixes: 05f3fb3c ("io_uring: avoid ring quiesce for fixed file set unregister and update")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe

For each provided buffer, we allocate a struct io_buffer to hold the data associated with it. As a large number of buffers can be provided, account that data with memcg.

Fixes: ddf0322d ("io_uring: add IORING_OP_PROVIDE_BUFFERS")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
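The change amounts to allocating with an accounted GFP mask, e.g.:

    /* GFP_KERNEL_ACCOUNT charges the allocation to the task's memory
     * cgroup, so userspace can't pile up unbounded, unaccounted
     * provided-buffer metadata. */
    struct io_buffer *buf = kmalloc(sizeof(*buf), GFP_KERNEL_ACCOUNT);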
-
Committed by Jens Axboe

If we have a lot of threads and rings, the tctx list can get quite big. This is especially true if we keep creating new threads and rings. Likewise for the provided buffers list. Be nice and insert a conditional reschedule point while iterating the nodes for deletion.

Link: https://lore.kernel.org/io-uring/00000000000064b6b405ccb41113@google.com/
Reported-by: syzbot+111d2a03f51f5ae73775@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
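The shape of the fix, sketched here with a generic list walk (the real code iterates the tctx and provided-buffer structures):

    /* Schematic: tear down a potentially huge list without hogging the CPU. */
    list_for_each_entry_safe(node, tmp, &tctx_list, list) {
            list_del(&node->list);
            kfree(node);
            cond_resched();         /* reschedule point for big lists */
    }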
-
Committed by Hao Xu

For multishot mode, there may be cases like:

    iowq                                  original context
    io_poll_add
      _arm_poll()
      mask = vfs_poll() is not 0
      if mask
            (2) io_poll_complete()
      compl_unlock
                                          (interruption happens,
                                           tw queued to original context)
                                          io_poll_task_func()
                                            compl_lock
                                            (3) done = io_poll_complete() is true
                                            compl_unlock
                                            put req ref
      (1) if (poll->flags & EPOLLONESHOT)
            put req ref

The EPOLLONESHOT flag in (1) may be from (2) or (3), so there are multiple combinations that can cause a ref underflow. Let's address it by:

- checking the return value in (2) as done
- changing (1) to if (done); this way we only do the ref put in (1) if the oneshot flag came from (2)
- doing the poll.done check in io_poll_task_func(), so that we won't put the ref a second time

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210922101238.7177-4-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Hao Xu

We should set EPOLLONESHOT if cqring_fill_event() returns false, since io_poll_add() decides whether to put the req or not based on it.

Fixes: 5082620f ("io_uring: terminate multishot poll for CQ ring overflow")
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210922101238.7177-3-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Hao Xu

If poll arming and poll completion run in parallel, there may be races. For instance, run io_poll_add in iowq and io_poll_task_func in the original context, then:

    iowq                                  original context
    io_poll_add
      vfs_poll
      (interruption happens,
       tw queued to original context)
                                          io_poll_task_func
                                            generate cqe
                                            del from cancel_hash[]
      if !poll.done
        insert to cancel_hash[]

The entry is left dangling in cancel_hash[]; there's a similar case for fast poll. Fix it by setting poll.done = true when deleting from cancel_hash[].

Fixes: 5082620f ("io_uring: terminate multishot poll for CQ ring overflow")
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210922101238.7177-2-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 15 September 2021, 3 commits
-
-
Committed by Pavel Begunkov

230d50d4 ("io_uring: move reissue into regular IO path") made non-IOPOLL I/O not retry from the ki_complete handler. Follow in its steps and do the same for IOPOLL. Same problems, same implementation, same -EAGAIN assumptions.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f80dfee2d5fa7678f0052a8ab3cfca9496a112ca.1631699928.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe

Get rid of the need to do re-expand and revert on an iterator when we encounter a short IO, or a failure that warrants a retry. Use the new state save/restore helpers instead. We keep the iov_iter_state persistent across retries, if we need to restart the read or write operation. If there's a pending retry, the operation will always exit with the state correctly saved.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
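The save/restore pattern looks like this (a minimal sketch; queue_async_retry() is a hypothetical stand-in for the actual retry path):

    struct iov_iter_state state;
    ssize_t ret;

    iov_iter_save_state(iter, &state);      /* snapshot before attempting IO */
    ret = call_read_iter(file, kiocb, iter);
    if (ret == -EAGAIN) {
            iov_iter_restore(iter, &state); /* rewind, then retry later */
            return queue_async_retry(kiocb);/* hypothetical retry hook */
    }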
-
Committed by Jens Axboe

A common complaint is that using O_NONBLOCK files with io_uring can be a bit of a pain. Be a bit nicer and allow normal retry IFF the file does support async behavior. This makes it possible to use io_uring more reliably with O_NONBLOCK files, for use cases where it isn't possible or feasible to modify the file flags.

Cc: stable@vger.kernel.org
Reported-and-tested-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 14 September 2021, 3 commits
-
-
Committed by Pavel Begunkov

It might be inconvenient that direct open/accept deviates from the update semantics and fails if the slot is taken, instead of removing a file sitting there. Implement this auto-removal. Note that removal might need to allocate and so may fail. However, if an empty slot is specified, it's guaranteed not to fail on the fd installation side for valid userspace programs. That's needed for users who can't tolerate such failures, e.g. accept where the other end never retries.

Suggested-by: Franz-B. Tuneke <franz-bernhard.tuneke@tu-dortmund.de>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c896f14ea46b0eaa6c09d93149e665c2c37979b4.1631632300.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Xiaoguang Wang

Move the get_timespec() section in io_cqring_wait() before the sigmask saving, otherwise we'll fail to restore the sigmask once get_timespec() returns an error.

Fixes: c73ebb68 ("io_uring: add timeout support for io_uring_enter()")
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210914143852.9663-1-xiaoguang.wang@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
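Schematically, the corrected ordering is (a fragment of the syscall path, with unrelated details dropped):

    struct timespec64 ts;

    if (uts) {
            if (get_timespec64(&ts, uts))
                    return -EFAULT;         /* nothing to undo yet */
    }
    if (sig) {
            ret = set_user_sigmask(sig, sigsz);
            if (ret)
                    return ret;             /* mask untouched on failure */
    }
    /* ... wait for completions; restore the sigmask on the way out ... */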
-
Committed by Jens Axboe

We need to re-check sqd->thread after we've dropped the lock. Pin the sqd before doing the lockdep lock dance, and check if the thread is alive after that. It's either NULL or alive, as the SQPOLL thread cannot exit without holding the same sqd->lock.

Reported-and-tested-by: syzbot+337de45f13a4fd54d708@syzkaller.appspotmail.com
Fixes: fa84693b ("io_uring: ensure IORING_REGISTER_IOWQ_MAX_WORKERS works with SQPOLL")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 13 September 2021, 1 commit
-
-
Committed by Jens Axboe

When setting up the next segment, we check what type the iter is and handle it accordingly. However, when incrementing the processed amount we do not, and both the iter advance and the addr/len adjustment happen, regardless of type. Split the increment side just like we do on the setup side.

Fixes: 4017eb91 ("io_uring: make loop_rw_iter() use original user supplied pointers")
Cc: stable@vger.kernel.org
Reported-by: Valentina Palmiotti <vpalmiotti@gmail.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 10 September 2021, 1 commit
-
-
Committed by Hao Xu

The build check of __REQ_F_LAST_BIT should use "larger than", not "equal to or larger than". It's perfectly valid to have __REQ_F_LAST_BIT be 32, as that means that the last valid bit is 31, which does fit in the type.

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210907032243.114190-1-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
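In code, the off-by-one is simply (sketch):

    /* Before: rejects the perfectly valid __REQ_F_LAST_BIT == 32 */
    BUILD_BUG_ON(__REQ_F_LAST_BIT >= 8 * sizeof(int));
    /* After: with __REQ_F_LAST_BIT == 32 the last used bit is 31, which fits */
    BUILD_BUG_ON(__REQ_F_LAST_BIT >  8 * sizeof(int));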
-
- 09 September 2021, 3 commits
-
-
Committed by Pavel Begunkov

When we cancel a timeout we should mark it with REQ_F_FAIL, so that linked requests are cancelled as well, and not queued for further execution.

Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/fff625b44eeced3a5cae79f60e6acf3fbdf8f990.1631192135.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe

The SQPOLL thread dictates the lock order, and we hold the ctx->uring_lock for all the registration opcodes. We also hold a ref to the ctx, and we do drop the lock for other reasons to quiesce, so it's fine to drop the ctx lock temporarily to grab the sqd->lock. This fixes the following lockdep splat:

    ======================================================
    WARNING: possible circular locking dependency detected
    5.14.0-syzkaller #0 Not tainted
    ------------------------------------------------------
    syz-executor.5/25433 is trying to acquire lock:
    ffff888023426870 (&sqd->lock){+.+.}-{3:3}, at: io_register_iowq_max_workers fs/io_uring.c:10551 [inline]
    ffff888023426870 (&sqd->lock){+.+.}-{3:3}, at: __io_uring_register fs/io_uring.c:10757 [inline]
    ffff888023426870 (&sqd->lock){+.+.}-{3:3}, at: __do_sys_io_uring_register+0x10aa/0x2e70 fs/io_uring.c:10792

    but task is already holding lock:
    ffff8880885b40a8 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_register+0x2e1/0x2e70 fs/io_uring.c:10791

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&ctx->uring_lock){+.+.}-{3:3}:
           __mutex_lock_common kernel/locking/mutex.c:596 [inline]
           __mutex_lock+0x131/0x12f0 kernel/locking/mutex.c:729
           __io_sq_thread fs/io_uring.c:7291 [inline]
           io_sq_thread+0x65a/0x1370 fs/io_uring.c:7368
           ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

    -> #0 (&sqd->lock){+.+.}-{3:3}:
           check_prev_add kernel/locking/lockdep.c:3051 [inline]
           check_prevs_add kernel/locking/lockdep.c:3174 [inline]
           validate_chain kernel/locking/lockdep.c:3789 [inline]
           __lock_acquire+0x2a07/0x54a0 kernel/locking/lockdep.c:5015
           lock_acquire kernel/locking/lockdep.c:5625 [inline]
           lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5590
           __mutex_lock_common kernel/locking/mutex.c:596 [inline]
           __mutex_lock+0x131/0x12f0 kernel/locking/mutex.c:729
           io_register_iowq_max_workers fs/io_uring.c:10551 [inline]
           __io_uring_register fs/io_uring.c:10757 [inline]
           __do_sys_io_uring_register+0x10aa/0x2e70 fs/io_uring.c:10792
           do_syscall_x64 arch/x86/entry/common.c:50 [inline]
           do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
           entry_SYSCALL_64_after_hwframe+0x44/0xae

    other info that might help us debug this:

     Possible unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(&ctx->uring_lock);
                                   lock(&sqd->lock);
                                   lock(&ctx->uring_lock);
      lock(&sqd->lock);

     *** DEADLOCK ***

Fixes: 2e480058 ("io-wq: provide a way to limit max number of workers")
Reported-by: syzbot+97fa56483f69d677969f@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
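The resulting lock dance, in outline (a sketch; the ctx reference held across the unlock is what makes the temporary drop safe):

    if (sqd) {
            /* consistent order: sqd->lock before ctx->uring_lock */
            mutex_unlock(&ctx->uring_lock);
            mutex_lock(&sqd->lock);
            mutex_lock(&ctx->uring_lock);
            tctx = sqd->thread ? sqd->thread->io_uring : NULL;
    } else {
            tctx = current->io_uring;
    }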
-
Committed by Pavel Begunkov

In case of !SQPOLL, io_cqring_ev_posted_iopoll() doesn't provide a memory barrier required by waitqueue_active(&ctx->poll_wait). There is a wq_has_sleeper(), which does smp_mb() inside, but it's called only for SQPOLL.

Fixes: 5fd46178 ("io_uring: be smarter about waking multiple CQ ring waiters")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2982e53bcea2274006ed435ee2a77197107d8a29.1631130542.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
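The usual waker-side idiom covers it; wq_has_sleeper() bundles the barrier with the emptiness check (sketch):

    /* waitqueue_active() alone can miss a waiter added just beforehand;
     * wq_has_sleeper() issues smp_mb() first, pairing with the barrier
     * on the wait side. */
    if (wq_has_sleeper(&ctx->poll_wait))
            wake_up_interruptible(&ctx->poll_wait);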
-
- 04 September 2021, 1 commit
-
-
Committed by Pavel Begunkov

    [ 74.211232] BUG: KASAN: stack-out-of-bounds in iov_iter_revert+0x809/0x900
    [ 74.212778] Read of size 8 at addr ffff888025dc78b8 by task syz-executor.0/828
    [ 74.214756] CPU: 0 PID: 828 Comm: syz-executor.0 Not tainted 5.14.0-rc3-next-20210730 #1
    [ 74.216525] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
    [ 74.219033] Call Trace:
    [ 74.219683]  dump_stack_lvl+0x8b/0xb3
    [ 74.220706]  print_address_description.constprop.0+0x1f/0x140
    [ 74.224226]  kasan_report.cold+0x7f/0x11b
    [ 74.226085]  iov_iter_revert+0x809/0x900
    [ 74.227960]  io_write+0x57d/0xe40
    [ 74.232647]  io_issue_sqe+0x4da/0x6a80
    [ 74.242578]  __io_queue_sqe+0x1ac/0xe60
    [ 74.245358]  io_submit_sqes+0x3f6e/0x76a0
    [ 74.248207]  __do_sys_io_uring_enter+0x90c/0x1a20
    [ 74.257167]  do_syscall_64+0x3b/0x90
    [ 74.257984]  entry_SYSCALL_64_after_hwframe+0x44/0xae

    old_size = iov_iter_count();
    ...
    iov_iter_revert(old_size - iov_iter_count());

If iov_iter_revert() is done based on the initial size as above, and the iter is truncated and not reexpanded in the middle, it miscalculates borders, causing problems. This trace is due to no one reexpanding after generic_write_checks(). Now iters store how many bytes have been truncated, so reexpand them to the initial state right before reverting.

Cc: stable@vger.kernel.org
Reported-by: Palash Oswal <oswalpalash@gmail.com>
Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Reported-and-tested-by: syzbot+9671693590ef5aad8953@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 03 September 2021, 4 commits
-
-
Committed by Xiaoguang Wang

IIUC, IORING_POLL_ADD_MULTI is similar to epoll's edge-triggered mode; that means once one pure poll request returns one event (cqe), we'll need to read or write continually until EAGAIN is returned. Then I think there is a possible poll-event-lost race in multishot mode:

    t1    poll request add         |                         |
    t2                             |                         |
    t3    event happens            |                         |
    t4    task work add            |                         |
    t5                             | task work run           |
    t6                             | commit one cqe          |
    t7                             |                         | user app handles cqe
    t8                             | new event happen        |
    t9                             | add back to waitqueue   |
    t10                            |                         |

After t6 but before t9, if a new event happens, there'll be no wakeup operation: the user app has picked up this cqe in t7 and reads or writes until EAGAIN is returned, so the new event that happens in t8 will be lost, though this race window may be small.

To fix this possible race, add the poll request back to the waitqueue before committing the cqe.

Fixes: 88e41cf9 ("io_uring: add multishot mode for IORING_OP_POLL_ADD")
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210903142436.5767-1-xiaoguang.wang@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov

io_submit_flush_completions() may enqueue linked requests for task_work execution, so don't leave tctx_task_work() right after the tw list is exhausted, but try to flush and then retry.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0755d4c2c36301447c63bdd4146c10477cea4249.1630539342.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov

Not passing issue_flags from kiocb_done() into __io_complete_rw() means that completion batching for this case is disabled, e.g. for most buffered reads.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b2689462835c3ee28a5999ef4f9a581e24be04a2.1630539342.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe

SQPOLL has a different thread doing submissions; we need to check for that and use the right task context when updating the worker values. Just hold the sqd->lock across the operation; this ensures that the thread cannot go away while we poke at ->io_uring.

Link: https://github.com/axboe/liburing/issues/420
Fixes: 2e480058 ("io-wq: provide a way to limit max number of workers")
Reported-by: Johannes Lundberg <johalun0@gmail.com>
Tested-by: Johannes Lundberg <johalun0@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 01 September 2021, 4 commits
-
-
Committed by Pavel Begunkov

    [ 3784.910888] BUG: kernel NULL pointer dereference, address: 0000000000000020
    [ 3784.910904] RIP: 0010:__io_file_supports_nowait+0x5/0xc0
    [ 3784.910926] Call Trace:
    [ 3784.910928]  ? io_read+0x17c/0x480
    [ 3784.910945]  io_issue_sqe+0xcb/0x1840
    [ 3784.910953]  __io_queue_sqe+0x44/0x300
    [ 3784.910959]  io_req_task_submit+0x27/0x70
    [ 3784.910962]  tctx_task_work+0xeb/0x1d0
    [ 3784.910966]  task_work_run+0x61/0xa0
    [ 3784.910968]  io_run_task_work_sig+0x53/0xa0
    [ 3784.910975]  __x64_sys_io_uring_enter+0x22/0x30
    [ 3784.910977]  do_syscall_64+0x3d/0x90
    [ 3784.910981]  entry_SYSCALL_64_after_hwframe+0x44/0xae

io_drain_req() goes before the checks for REQ_F_FAIL, which protect us from submitting under-prepared requests (e.g. ones that failed in io_init_req()). Fail such drained requests as well.

Fixes: a8295b98 ("io_uring: fix failed linkchain code logic")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e411eb9924d47a131b1e200b26b675df0c2b7627.1630415423.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov

    [ 27.259845] general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] SMP KASAN PTI
    [ 27.261043] KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
    [ 27.263730] RIP: 0010:sock_from_file+0x20/0x90
    [ 27.272444] Call Trace:
    [ 27.272736]  io_sendmsg+0x98/0x600
    [ 27.279216]  io_issue_sqe+0x498/0x68d0
    [ 27.281142]  __io_queue_sqe+0xab/0xb50
    [ 27.285830]  io_req_task_submit+0xbf/0x1b0
    [ 27.286306]  tctx_task_work+0x178/0xad0
    [ 27.288211]  task_work_run+0xe2/0x190
    [ 27.288571]  exit_to_user_mode_prepare+0x1a1/0x1b0
    [ 27.289041]  syscall_exit_to_user_mode+0x19/0x50
    [ 27.289521]  do_syscall_64+0x48/0x90
    [ 27.289871]  entry_SYSCALL_64_after_hwframe+0x44/0xae

io_req_complete_failed() -> io_req_complete_post() -> io_req_task_queue() would still try to enqueue a hard linked request, which can be half prepared (e.g. a failed init), so we can't allow that to happen.

Fixes: a8295b98 ("io_uring: fix failed linkchain code logic")
Reported-by: syzbot+f9704d1878e290eddf73@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/70b513848c1000f88bd75965504649c6bb1415c0.1630415423.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Ming Lei

In case of buffered reading from a block device, when a short read happens, we should retry to read more; otherwise the IO will be completed only partially. For example, the following fio job expects to read 2MB, but it can only read 1MB or less:

    fio --name=onessd --filename=/dev/nvme0n1 --filesize=2M \
        --rw=randread --bs=2M --direct=0 --overwrite=0 --numjobs=1 \
        --iodepth=1 --time_based=0 --runtime=2 --ioengine=io_uring \
        --registerfiles --fixedbufs --gtod_reduce=1 --group_reporting

Fix the issue by allowing short read retries for block devices, which do actually set FMODE_BUF_RASYNC.

Fixes: 9a173346 ("io_uring: fix short read retries for non-reg files")
Cc: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20210821150751.1290434-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe

During some testing, it became evident that using IORING_OP_WRITE doesn't hash buffered writes like the other write commands do. That's simply an oversight, and it can cause performance regressions when doing buffered writes with this command. Correct that and add the flag, so that buffered writes are correctly hashed when using the non-iovec based write command.

Cc: stable@vger.kernel.org
Fixes: 3a6820f2 ("io_uring: add non-vectored read/write commands")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
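The fix is a one-flag addition to the io-wq op table entry for IORING_OP_WRITE, roughly (neighboring fields recalled approximately, not verbatim):

    [IORING_OP_WRITE] = {
            .needs_file             = 1,
            .hash_reg_file          = 1,    /* the previously missing flag */
            .unbound_nonreg_file    = 1,
            .pollout                = 1,
            .plug                   = 1,
    },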
-
- 30 August 2021, 2 commits
-
-
Committed by Pavel Begunkov

We allow updating normal timeouts; add support for adjusting the timings of linked timeouts as well.

Reported-by: Victor Stewart <v@nametag.social>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov

A preparation patch. Keep all queued linked timeouts in a list, so they may be found and updated.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-