提交 · 01a6f356e20303bb00c3f9a5c9c9bfc8b2f04b6f · openanolis / cloud-kernel

09 6月, 2020 5 次提交

psi: Fix a division error in psi poll() · 01a6f356

由 Johannes Weiner 提交于 12月 03, 2019

task #28327019

commit c3466952ca1514158d7c16c9cfc48c27d5c5dc0f upstream

The psi window size is a u64 an can be up to 10 seconds right now,
which exceeds the lower 32 bits of the variable. We currently use
div_u64 for it, which is meant only for 32-bit divisors. The result is
garbage pressure sampling values and even potential div0 crashes.

Use div64_u64.
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NSuren Baghdasaryan <surenb@google.com>
Cc: Jingfeng Xie <xiejingfeng@linux.alibaba.com>
Link: https://lkml.kernel.org/r/20191203183524.41378-3-hannes@cmpxchg.orgSigned-off-by: Nzhongjiang-ali <zhongjiang-ali@linux.alibaba.com>
Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>

01a6f356

fs/binfmt_elf.c: allocate initialized memory in fill_thread_core_info() · be1c1391

由 Alexander Potapenko 提交于 5月 27, 2020

to #27338374

commit 1d605416fb7175e1adf094251466caa52093b413 upstream.

KMSAN reported uninitialized data being written to disk when dumping
core.  As a result, several kilobytes of kmalloc memory may be written
to the core file and then read by a non-privileged user.
Reported-by: Nsam <sunhaoyl@outlook.com>
Signed-off-by: NAlexander Potapenko <glider@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Acked-by: NKees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200419100848.63472-1-glider@google.com
Link: https://github.com/google/kmsan/issues/76Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Reference: CVE-2020-10732
Signed-off-by: NShile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

be1c1391

KVM: SVM: Fix potential memory leak in svm_cpu_init() · 262d517b

由 Miaohe Lin 提交于 1月 04, 2020

to #27338374

commit d80b64ff297e40c2b6f7d7abc1b3eba70d22a068 upstream.

When kmalloc memory for sd->sev_vmcbs failed, we forget to free the page
held by sd->save_area. Also get rid of the var r as '-ENOMEM' is actually
the only possible outcome here.
Reviewed-by: NLiran Alon <liran.alon@oracle.com>
Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Reference: CVE-2020-12768
Signed-off-by: NShile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

262d517b

netlabel: cope with NULL catmap · 2d4b9c66

由 Paolo Abeni 提交于 5月 12, 2020

to #27338374

commit eead1c2ea2509fd754c6da893a94f0e69e83ebe4 upstream.

The cipso and calipso code can set the MLS_CAT attribute on
successful parsing, even if the corresponding catmap has
not been allocated, as per current configuration and external
input.

Later, selinux code tries to access the catmap if the MLS_CAT flag
is present via netlbl_catmap_getlong(). That may cause null ptr
dereference while processing incoming network traffic.

Address the issue setting the MLS_CAT flag only if the catmap is
really allocated. Additionally let netlbl_catmap_getlong() cope
with NULL catmap.
Reported-by: NMatthew Sheets <matthew.sheets@gd-ms.com>
Fixes: 4b8feff2 ("netlabel: fix the horribly broken catmap functions")
Fixes: ceba1832 ("calipso: Set the calipso socket label to match the secattr.")
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NPaul Moore <paul@paul-moore.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Reference: CVE-2020-10711
Signed-off-by: NShile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

2d4b9c66

selinux: properly handle multiple messages in selinux_netlink_send() · 0b3509b8

由 Paul Moore 提交于 4月 28, 2020

to #27338374

commit fb73974172ffaaf57a7c42f35424d9aece1a5af6 upstream.

Fix the SELinux netlink_send hook to properly handle multiple netlink
messages in a single sk_buff; each message is parsed and subject to
SELinux access control.  Prior to this patch, SELinux only inspected
the first message in the sk_buff.

Cc: stable@vger.kernel.org
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Reviewed-by: NStephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: NPaul Moore <paul@paul-moore.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Reference: CVE-2020-10751
Signed-off-by: NShile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

0b3509b8

08 6月, 2020 1 次提交

sched/fair: Don't NUMA balance for kthreads · 3b541a29

由 Jens Axboe 提交于 5月 26, 2020

fix #28384496

commit 18f855e574d9799a0e7489f8ae6fd8447d0dd74a upstream

Stefano reported a crash with using SQPOLL with io_uring:

  BUG: kernel NULL pointer dereference, address: 00000000000003b0
  CPU: 2 PID: 1307 Comm: io_uring-sq Not tainted 5.7.0-rc7 #11
  RIP: 0010:task_numa_work+0x4f/0x2c0
  Call Trace:
   task_work_run+0x68/0xa0
   io_sq_thread+0x252/0x3d0
   kthread+0xf9/0x130
   ret_from_fork+0x35/0x40

which is task_numa_work() oopsing on current->mm being NULL.

The task work is queued by task_tick_numa(), which checks if current->mm is
NULL at the time of the call. But this state isn't necessarily persistent,
if the kthread is using use_mm() to temporarily adopt the mm of a task.

Change the task_tick_numa() check to exclude kernel threads in general,
as it doesn't make sense to attempt ot balance for kthreads anyway.
Reported-by: NStefano Garzarella <sgarzare@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NIngo Molnar <mingo@kernel.org>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/865de121-8190-5d30-ece5-3b097dc74431@kernel.dkSigned-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NShanpei Chen <shanpeic@linux.alibaba.com>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

3b541a29

05 6月, 2020 2 次提交

KVM: polling: add architecture backend to disable polling · 15d3ec88

由 Christian Borntraeger 提交于 3月 05, 2019

fix #28092200

commit cdd6ad3ac63d2fa320baefcf92a02a918375c30f upstream

There are cases where halt polling is unwanted. For example when running
KVM on an over committed LPAR we rather want to give back the CPU to
neighbour LPARs instead of polling. Let us provide a callback that
allows architectures to disable polling.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Nchenxiangzuo <cxz18821786681@linux.alibaba.com>
Reviewed-by: NCaspar Zhang <caspar@linux.alibaba.com>

15d3ec88

KVM: x86: fix missing prototypes · c053cc11

由 Paolo Bonzini 提交于 2月 13, 2020

to #28092200

commit d970a325561da5e611596cbb06475db3755ce823 upstream

Reported with "make W=1" due to -Wmissing-prototypes.
Reported-by: NQian Cai <cai@lca.pw>
Reviewed-by: NMiaohe Lin <linmiaohe@huawei.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nchenxiangzuo <cxz18821786681@linux.alibaba.com>
Reviewed-by: NCaspar Zhang <caspar@linux.alibaba.com>

c053cc11

04 6月, 2020 32 次提交

io_uring: reset -EBUSY error when io sq thread is waken up · 2abbba20

由 Xiaoguang Wang 提交于 5月 20, 2020

to #28170604

commit d4ae271dfaae2a5f41c015f2f20d62a1deeec734 upstream

In io_sq_thread(), currently if we get an -EBUSY error and go to sleep,
we will won't clear it again, which will result in io_sq_thread() will
never have a chance to submit sqes again. Below test program test.c
can reveal this bug:

int main(int argc, char *argv[])
{
        struct io_uring ring;
        int i, fd, ret;
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;
        struct iovec *iovecs;
        void *buf;
        struct io_uring_params p;

        if (argc < 2) {
                printf("%s: file\n", argv[0]);
                return 1;
        }

        memset(&p, 0, sizeof(p));
        p.flags = IORING_SETUP_SQPOLL;
        ret = io_uring_queue_init_params(4, &ring, &p);
        if (ret < 0) {
                fprintf(stderr, "queue_init: %s\n", strerror(-ret));
                return 1;
        }

        fd = open(argv[1], O_RDONLY | O_DIRECT);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        iovecs = calloc(10, sizeof(struct iovec));
        for (i = 0; i < 10; i++) {
                if (posix_memalign(&buf, 4096, 4096))
                        return 1;
                iovecs[i].iov_base = buf;
                iovecs[i].iov_len = 4096;
        }

        ret = io_uring_register_files(&ring, &fd, 1);
        if (ret < 0) {
                fprintf(stderr, "%s: register %d\n", __FUNCTION__, ret);
                return ret;
        }

        for (i = 0; i < 10; i++) {
                sqe = io_uring_get_sqe(&ring);
                if (!sqe)
                        break;

                io_uring_prep_readv(sqe, 0, &iovecs[i], 1, 0);
                sqe->flags |= IOSQE_FIXED_FILE;

                ret = io_uring_submit(&ring);
                sleep(1);
                printf("submit %d\n", i);
        }

        for (i = 0; i < 10; i++) {
                io_uring_wait_cqe(&ring, &cqe);
                printf("receive: %d\n", i);
                if (cqe->res != 4096) {
                        fprintf(stderr, "ret=%d, wanted 4096\n", cqe->res);
                        ret = 1;
                }
                io_uring_cqe_seen(&ring, cqe);
        }

        close(fd);
        io_uring_queue_exit(&ring);
        return 0;
}
sudo ./test testfile
above command will hang on the tenth request, to fix this bug, when io
sq_thread is waken up, we reset the variable 'ret' to be zero.
Suggested-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

2abbba20

io_uring: don't add non-IO requests to iopoll pending list · 4d7ac019

由 Jens Axboe 提交于 5月 19, 2020

to #28170604

commit b532576ed39efe3b351ae8320b2ab67a4c4c3719 upstream

We normally disable any commands that aren't specifically poll commands
for a ring that is setup for polling, but we do allow buffer provide and
remove commands to support buffer selection for polled IO. Once a
request is issued, we add it to the poll list to poll for completion. But
we should not do that for non-IO commands, as those request complete
inline immediately and aren't pollable. If we do, we can leave requests
on the iopoll list after they are freed.

Fixes: ddf0322db79c ("io_uring: add IORING_OP_PROVIDE_BUFFERS")
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

4d7ac019

io_uring: don't use kiocb.private to store buf_index · d90653af

由 Bijan Mottahedeh 提交于 5月 19, 2020

to #28170604

commit 4f4eeba87cc731b200bff9372d14a80f5996b277 upstream

kiocb.private is used in iomap_dio_rw() so store buf_index separately.
Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com>

Move 'buf_index' to a hole in io_kiocb.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

d90653af

io_uring: cancel work if task_work_add() fails · 44e75116

由 Jens Axboe 提交于 5月 18, 2020

to #28170604

commit e3aabf9554fd04eb14cd44ae7583fc9d40edd250 upstream

We currently move it to the io_wqe_manager for execution, but we cannot
safely do so as we may lack some of the state to execute it out of
context. As we cancel work anyway when the ring/task exits, just mark
this request as canceled and io_async_task_func() will do the right
thing.

Fixes: aa96bf8a9ee3 ("io_uring: use io-wq manager as backup task if task is exiting")
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

44e75116

io_uring: remove dead check in io_splice() · f1645783

由 Jens Axboe 提交于 5月 17, 2020

to #28170604

commit 948a7749454b1712f1b2f2429f9493eb3e4a89b0 upstream

We checked for 'force_nonblock' higher up, so it's definitely false
at this point. Kill the check, it's a remnant of when we tried to do
inline splice without always punting to async context.

Fixes: 2fb3e82284fc ("io_uring: punt splice async because of inode mutex")
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

f1645783

io_uring: fix FORCE_ASYNC req preparation · c8fc1875

由 Pavel Begunkov 提交于 5月 17, 2020

to #28170604

commit bd2ab18a1d6267446eae1b47dd839050452bdf7f upstream

As for other not inlined requests, alloc req->io for FORCE_ASYNC reqs,
so they can be prepared properly.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

c8fc1875

io_uring: don't prepare DRAIN reqs twice · 3c2eefb7

由 Pavel Begunkov 提交于 5月 17, 2020

to #28170604

commit 650b548129b60b0d23508351800108196f4aa89f upstream

If req->io is not NULL, it's already prepared. Don't do it again,
it's dangerous.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

3c2eefb7

io_uring: initialize ctx->sqo_wait earlier · c73bbba3

由 Jens Axboe 提交于 5月 17, 2020

to #28170604

commit 583863ed918136412ddf14de2e12534f17cfdc6f upstream

Ensure that ctx->sqo_wait is initialized as soon as the ctx is allocated,
instead of deferring it to the offload setup. This fixes a syzbot
reported lockdep complaint, which is really due to trying to wake_up
on an uninitialized wait queue:

RSP: 002b:00007fffb1fb9aa8 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441319
RDX: 0000000000000001 RSI: 0000000020000140 RDI: 000000000000047b
RBP: 0000000000010475 R08: 0000000000000001 R09: 00000000004002c8
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402260
R13: 00000000004022f0 R14: 0000000000000000 R15: 0000000000000000
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 1 PID: 7090 Comm: syz-executor222 Not tainted 5.7.0-rc1-next-20200415-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x188/0x20d lib/dump_stack.c:118
 assign_lock_key kernel/locking/lockdep.c:913 [inline]
 register_lock_class+0x1664/0x1760 kernel/locking/lockdep.c:1225
 __lock_acquire+0x104/0x4c50 kernel/locking/lockdep.c:4234
 lock_acquire+0x1f2/0x8f0 kernel/locking/lockdep.c:4934
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
 _raw_spin_lock_irqsave+0x8c/0xbf kernel/locking/spinlock.c:159
 __wake_up_common_lock+0xb4/0x130 kernel/sched/wait.c:122
 io_cqring_ev_posted+0xa5/0x1e0 fs/io_uring.c:1160
 io_poll_remove_all fs/io_uring.c:4357 [inline]
 io_ring_ctx_wait_and_kill+0x2bc/0x5a0 fs/io_uring.c:7305
 io_uring_create fs/io_uring.c:7843 [inline]
 io_uring_setup+0x115e/0x22b0 fs/io_uring.c:7870
 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
 entry_SYSCALL_64_after_hwframe+0x49/0xb3
RIP: 0033:0x441319
Code: e8 5c ae 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 bb 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fffb1fb9aa8 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9

Reported-by: syzbot+8c91f5d054e998721c57@syzkaller.appspotmail.com
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

c73bbba3

io_uring: polled fixed file must go through free iteration · b850f336

由 Jens Axboe 提交于 5月 13, 2020

to #28170604

commit 9d9e88a24c1f20ebfc2f28b1762ce78c0b9e1cb3 upstream

When we changed the file registration handling, it became important to
iterate the bulk request freeing list for fixed files as well, or we
miss dropping the fixed file reference. If not, we're leaking references,
and we'll get a kworker stuck waiting for file references to disappear.

This also means we can remove the special casing of fixed vs non-fixed
files, we need to iterate for both and we can just rely on
__io_req_aux_free() doing io_put_file() instead of doing it manually.

Fixes: 055895537302 ("io_uring: refactor file register/unregister/update handling")
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

b850f336

io_uring: fix zero len do_splice() · a49e9093

由 Pavel Begunkov 提交于 5月 04, 2020

to #28170604

commit c96874265cd04b4bd4a8e114ac9af039a6d83cfe upstream

do_splice() doesn't expect len to be 0. Just always return 0 in this
case as splice(2) does.

Fixes: 7d67af2c0134 ("io_uring: add splice(2) support")
Reported-by: NJann Horn <jannh@google.com>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

a49e9093

io_uring: don't use 'fd' for openat/openat2/statx · bb87d8ed

由 Jens Axboe 提交于 5月 07, 2020

to #28170604

commit 63ff822358b276137059520cf16e587e8073e80f upstream

We currently make some guesses as when to open this fd, but in reality
we have no business (or need) to do so at all. In fact, it makes certain
things fail, like O_PATH.

Remove the fd lookup from these opcodes, we're just passing the 'fd' to
generic helpers anyway. With that, we can also remove the special casing
of fd values in io_req_needs_file(), and the 'fd_non_neg' check that
we have. And we can ensure that we only read sqe->fd once.

This fixes O_PATH usage with openat/openat2, and ditto statx path side
oddities.

Cc: stable@vger.kernel.org: # v5.6
Reported-by: NMax Kellermann <mk@cm4all.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

bb87d8ed

io_uring: handle -EFAULT properly in io_uring_setup() · 7e9de51e

由 Xiaoguang Wang 提交于 5月 05, 2020

to #28170604

commit 7f13657d141346125f4d0bb93eab4777f40c406e upstream

If copy_to_user() in io_uring_setup() failed, we'll leak many kernel
resources, which will be recycled until process terminates. This bug
can be reproduced by using mprotect to set params to PROT_READ. To fix
this issue, refactor io_uring_create() a bit to add a new 'struct
io_uring_params __user *params' parameter and move the copy_to_user()
in io_uring_setup() to io_uring_setup(), if copy_to_user() failed,
we can free kernel resource properly.
Suggested-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

7e9de51e

io_uring: fix mismatched finish_wait() calls in io_uring_cancel_files() · 5af1d631

由 Xiaoguang Wang 提交于 4月 26, 2020

to #28170604

commit d8f1b9716cfd1a1f74c0fedad40c5f65a25aa208 upstream

The prepare_to_wait() and finish_wait() calls in io_uring_cancel_files()
are mismatched. Currently I don't see any issues related this bug, just
find it by learning codes.
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

5af1d631

io_uring: punt splice async because of inode mutex · 62483d76

由 Pavel Begunkov 提交于 5月 01, 2020

to #28170604

commit 2fb3e82284fca40afbde5351907f0a5b3be717f9 upstream

Nonblocking do_splice() still may wait for some time on an inode mutex.
Let's play safe and always punt it async.
Reported-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

62483d76

io_uring: check non-sync defer_list carefully · 6a6ddd60

由 Pavel Begunkov 提交于 5月 01, 2020

to #28170604

commit 4ee3631451c9a62e6b6bc7ee51fb9a5b34e33509 upstream

io_req_defer() do double-checked locking. Use proper helpers for that,
i.e. list_empty_careful().
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

6a6ddd60

io_uring: fix extra put in sync_file_range() · 63c46871

由 Pavel Begunkov 提交于 5月 01, 2020

to #28170604

commit 7759a0bfadceef3910d0e50f86d63b6ed58b4e70 upstream

[   40.179474] refcount_t: underflow; use-after-free.
[   40.179499] WARNING: CPU: 6 PID: 1848 at lib/refcount.c:28 refcount_warn_saturate+0xae/0xf0
...
[   40.179612] RIP: 0010:refcount_warn_saturate+0xae/0xf0
[   40.179617] Code: 28 44 0a 01 01 e8 d7 01 c2 ff 0f 0b 5d c3 80 3d 15 44 0a 01 00 75 91 48 c7 c7 b8 f5 75 be c6 05 05 44 0a 01 01 e8 b7 01 c2 ff <0f> 0b 5d c3 80 3d f3 43 0a 01 00 0f 85 6d ff ff ff 48 c7 c7 10 f6
[   40.179619] RSP: 0018:ffffb252423ebe18 EFLAGS: 00010286
[   40.179623] RAX: 0000000000000000 RBX: ffff98d65e929400 RCX: 0000000000000000
[   40.179625] RDX: 0000000000000001 RSI: 0000000000000086 RDI: 00000000ffffffff
[   40.179627] RBP: ffffb252423ebe18 R08: 0000000000000001 R09: 000000000000055d
[   40.179629] R10: 0000000000000c8c R11: 0000000000000001 R12: 0000000000000000
[   40.179631] R13: ffff98d68c434400 R14: ffff98d6a9cbaa20 R15: ffff98d6a609ccb8
[   40.179634] FS:  0000000000000000(0000) GS:ffff98d6af580000(0000) knlGS:0000000000000000
[   40.179636] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   40.179638] CR2: 00000000033e3194 CR3: 000000006480a003 CR4: 00000000003606e0
[   40.179641] Call Trace:
[   40.179652]  io_put_req+0x36/0x40
[   40.179657]  io_free_work+0x15/0x20
[   40.179661]  io_worker_handle_work+0x2f5/0x480
[   40.179667]  io_wqe_worker+0x2a9/0x360
[   40.179674]  ? _raw_spin_unlock_irqrestore+0x24/0x40
[   40.179681]  kthread+0x12c/0x170
[   40.179685]  ? io_worker_handle_work+0x480/0x480
[   40.179690]  ? kthread_park+0x90/0x90
[   40.179695]  ret_from_fork+0x35/0x40
[   40.179702] ---[ end trace 85027405f00110aa ]---

Opcode handler must never put submission ref, but that's what
io_sync_file_range_finish() do. use io_steal_work() there.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

63c46871

io_uring: use cond_resched() in io_ring_ctx_wait_and_kill() · 8e086891

由 Xiaoguang Wang 提交于 5月 01, 2020

to #28170604

commit 3fd44c86711f71156b586c22b0495c58f69358bb upstream

While working on to make io_uring sqpoll mode support syscalls that need
struct files_struct, I got cpu soft lockup in io_ring_ctx_wait_and_kill(),

    while (ctx->sqo_thread && !wq_has_sleeper(&ctx->sqo_wait))
        cpu_relax();

above loop never has an chance to exit, it's because preempt isn't enabled
in the kernel, and the context calling io_ring_ctx_wait_and_kill() and
io_sq_thread() run in the same cpu, if io_sq_thread calls a cond_resched()
yield cpu and another context enters above loop, then io_sq_thread() will
always in runqueue and never exit.

Use cond_resched() can fix this issue.

 Reported-by: syzbot+66243bb7126c410cefe6@syzkaller.appspotmail.com
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

8e086891

io_uring: use proper references for fallback_req locking · 078c3690

由 Bijan Mottahedeh 提交于 4月 29, 2020

to #28170604

commit dd461af65946de060bff2dab08a63676d2731afe upstream

Use ctx->fallback_req address for test_and_set_bit_lock() and
clear_bit_unlock().
Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

078c3690

io_uring: only force async punt if poll based retry can't handle it · f00e9380

由 Jens Axboe 提交于 4月 28, 2020

to #28170604

commit 490e89676a523c688343d6cb8ca5f5dc476414df upstream

We do blocking retry from our poll handler, if the file supports polled
notifications. Only mark the request as needing an async worker if we
can't poll for it.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

f00e9380

io_uring: enable poll retry for any file with ->read_iter / ->write_iter · 97433f24

由 Jens Axboe 提交于 4月 28, 2020

to #28170604

commit af197f50ac53fff1241598c73ca606754a3bb808 upstream

We can have files like eventfd where it's perfectly fine to do poll
based retry on them, right now io_file_supports_async() doesn't take
that into account.

Pass in data direction and check the f_op instead of just always needing
an async worker.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

97433f24

io_uring: statx must grab the file table for valid fd · 105d00dc

由 Jens Axboe 提交于 4月 27, 2020

to #28170604

commit 5b0bbee4732cbd58aa98213d4c11a366356bba3d upstream

Clay reports that OP_STATX fails for a test case with a valid fd
and empty path:

 -- Test 0: statx:fd 3: SUCCEED, file mode 100755
 -- Test 1: statx:path ./uring_statx: SUCCEED, file mode 100755
 -- Test 2: io_uring_statx:fd 3: FAIL, errno 9: Bad file descriptor
 -- Test 3: io_uring_statx:path ./uring_statx: SUCCEED, file mode 100755

This is due to statx not grabbing the process file table, hence we can't
lookup the fd in async context. If the fd is valid, ensure that we grab
the file table so we can grab the file from async context.

Cc: stable@vger.kernel.org # v5.6
Reported-by: NClay Harris <bugs@claycon.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

105d00dc

net: make socket read/write_iter() honor IOCB_NOWAIT · 23ebac4d

由 Jens Axboe 提交于 12月 09, 2019

to #28170604

commit ebfcd8955c0b52eb793bcbc9e71140e3d0cdb228 upstream

The socket read/write helpers only look at the file O_NONBLOCK. not
the iocb IOCB_NOWAIT flag. This breaks users like preadv2/pwritev2
and io_uring that rely on not having the file itself marked nonblocking,
but rather the iocb itself.

Cc: netdev@vger.kernel.org
Acked-by: NDavid Miller <davem@davemloft.net>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

23ebac4d

io_uring: only restore req->work for req that needs do completion · 73a1c707

由 Xiaoguang Wang 提交于 4月 19, 2020

to #28170604

commit 44575a67314b3768d4415252271e8f60c5c70118 upstream

When testing io_uring IORING_FEAT_FAST_POLL feature, I got below panic:
BUG: kernel NULL pointer dereference, address: 0000000000000030
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 5 PID: 2154 Comm: io_uring_echo_s Not tainted 5.6.0+ #359
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
RIP: 0010:io_wq_submit_work+0xf/0xa0
Code: ff ff ff be 02 00 00 00 e8 ae c9 19 00 e9 58 ff ff ff 66 0f 1f
84 00 00 00 00 00 0f 1f 44 00 00 41 54 49 89 fc 55 53 48 8b 2f <8b>
45 30 48 8d 9d 48 ff ff ff 25 01 01 00 00 83 f8 01 75 07 eb 2a
RSP: 0018:ffffbef543e93d58 EFLAGS: 00010286
RAX: ffffffff84364f50 RBX: ffffa3eb50f046b8 RCX: 0000000000000000
RDX: ffffa3eb0efc1840 RSI: 0000000000000006 RDI: ffffa3eb50f046b8
RBP: 0000000000000000 R08: 00000000fffd070d R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffa3eb50f046b8
R13: ffffa3eb0efc2088 R14: ffffffff85b69be0 R15: ffffa3eb0effa4b8
FS:  00007fe9f69cc4c0(0000) GS:ffffa3eb5ef40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000030 CR3: 0000000020410000 CR4: 00000000000006e0
Call Trace:
 task_work_run+0x6d/0xa0
 do_exit+0x39a/0xb80
 ? get_signal+0xfe/0xbc0
 do_group_exit+0x47/0xb0
 get_signal+0x14b/0xbc0
 ? __x64_sys_io_uring_enter+0x1b7/0x450
 do_signal+0x2c/0x260
 ? __x64_sys_io_uring_enter+0x228/0x450
 exit_to_usermode_loop+0x87/0xf0
 do_syscall_64+0x209/0x230
 entry_SYSCALL_64_after_hwframe+0x49/0xb3
RIP: 0033:0x7fe9f64f8df9
Code: Bad RIP value.

task_work_run calls io_wq_submit_work unexpectedly, it's obvious that
struct callback_head's func member has been changed. After looking into
codes, I found this issue is still due to the union definition:
    union {
        /*
         * Only commands that never go async can use the below fields,
         * obviously. Right now only IORING_OP_POLL_ADD uses them, and
         * async armed poll handlers for regular commands. The latter
         * restore the work, if needed.
         */
        struct {
            struct callback_head	task_work;
            struct hlist_node	hash_node;
            struct async_poll	*apoll;
        };
        struct io_wq_work	work;
    };

When task_work_run has multiple work to execute, the work that calls
io_poll_remove_all() will do req->work restore for  non-poll request
always, but indeed if a non-poll request has been added to a new
callback_head, subsequent callback will call io_async_task_func() to
handle this request, that means we should not do the restore work
for such non-poll request. Meanwhile in io_async_task_func(), we should
drop submit ref when req has been canceled.

Fix both issues.

Fixes: b1f573bd15fd ("io_uring: restore req->work when canceling poll request")
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

Use io_double_put_req()
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

73a1c707

io_uring: don't count rqs failed after current one · 68373156

由 Pavel Begunkov 提交于 4月 15, 2020

to #28170604

commit 31af27c7cc9f675d93a135dca99e6413f9096f1d upstream

When checking for draining with __req_need_defer(), it tries to match
how many requests were sent before a current one with number of already
completed. Dropped SQEs are included in req->sequence, and they won't
ever appear in CQ. To compensate for that, __req_need_defer() substracts
ctx->cached_sq_dropped.
However, what it should really use is number of SQEs dropped __before__
the current one. In other words, any submitted request shouldn't
shouldn't affect dequeueing from the drain queue of previously submitted
ones.

Instead of saving proper ctx->cached_sq_dropped in each request,
substract from req->sequence it at initialisation, so it includes number
of properly submitted requests.

note: it also changes behaviour of timeouts, but
1. it's already diverge from the description because of using SQ
2. the description is ambiguous regarding dropped SQEs
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

68373156

io_uring: kill already cached timeout.seq_offset · 3aecf4ea

由 Pavel Begunkov 提交于 4月 15, 2020

to #28170604

commit b55ce732004989c85bf9d858c03e6d477cf9023b upstream

req->timeout.count and req->io->timeout.seq_offset store the same value,
which is sqe->off. Kill the second one
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

3aecf4ea

io_uring: fix cached_sq_head in io_timeout() · 94173879

由 Pavel Begunkov 提交于 4月 15, 2020

to #28170604

commit 22cad1585c6bc6caf2688701004cf2af6865cbe0 upstream

io_timeout() can be executed asynchronously by a worker and without
holding ctx->uring_lock

1. using ctx->cached_sq_head there is racy there
2. it should count events from a moment of timeout's submission, but
not execution

Use req->sequence.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

94173879

io_uring: only post events in io_poll_remove_all() if we completed some · 40bee6c9

由 Jens Axboe 提交于 4月 13, 2020

to #28170604

commit 8e2e1faf28b3e66430f55f0b0ee83370ecc150af upstream

syzbot reports this crash:

BUG: unable to handle page fault for address: ffffffffffffffe8
PGD f96e17067 P4D f96e17067 PUD f96e19067 PMD 0
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
CPU: 55 PID: 211750 Comm: trinity-c127 Tainted: G    B        L    5.7.0-rc1-next-20200413 #4
Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 04/12/2017
RIP: 0010:__wake_up_common+0x98/0x290
el/sched/wait.c:87
Code: 40 4d 8d 78 e8 49 8d 7f 18 49 39 fd 0f 84 80 00 00 00 e8 6b bd 2b 00 49 8b 5f 18 45 31 e4 48 83 eb 18 4c 89 ff e8 08 bc 2b 00 <45> 8b 37 41 f6 c6 04 75 71 49 8d 7f 10 e8 46 bd 2b 00 49 8b 47 10
RSP: 0018:ffffc9000adbfaf0 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffffffffffffffe8 RCX: ffffffffaa9636b8
RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffffffffffffffe8
RBP: ffffc9000adbfb40 R08: fffffbfff582c5fd R09: fffffbfff582c5fd
R10: ffffffffac162fe3 R11: fffffbfff582c5fc R12: 0000000000000000
R13: ffff888ef82b0960 R14: ffffc9000adbfb80 R15: ffffffffffffffe8
FS:  00007fdcba4c4740(0000) GS:ffff889033780000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffe8 CR3: 0000000f776a0004 CR4: 00000000001606e0
Call Trace:
 __wake_up_common_lock+0xea/0x150
ommon_lock at kernel/sched/wait.c:124
 ? __wake_up_common+0x290/0x290
 ? lockdep_hardirqs_on+0x16/0x2c0
 __wake_up+0x13/0x20
 io_cqring_ev_posted+0x75/0xe0
v_posted at fs/io_uring.c:1160
 io_ring_ctx_wait_and_kill+0x1c0/0x2f0
l at fs/io_uring.c:7305
 io_uring_create+0xa8d/0x13b0
 ? io_req_defer_prep+0x990/0x990
 ? __kasan_check_write+0x14/0x20
 io_uring_setup+0xb8/0x130
 ? io_uring_create+0x13b0/0x13b0
 ? check_flags.part.28+0x220/0x220
 ? lockdep_hardirqs_on+0x16/0x2c0
 __x64_sys_io_uring_setup+0x31/0x40
 do_syscall_64+0xcc/0xaf0
 ? syscall_return_slowpath+0x580/0x580
 ? lockdep_hardirqs_off+0x1f/0x140
 ? entry_SYSCALL_64_after_hwframe+0x3e/0xb3
 ? trace_hardirqs_off_caller+0x3a/0x150
 ? trace_hardirqs_off_thunk+0x1a/0x1c
 entry_SYSCALL_64_after_hwframe+0x49/0xb3
RIP: 0033:0x7fdcb9dd76ed
Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6b 57 2c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe7fd4e4f8 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
RAX: ffffffffffffffda RBX: 00000000000001a9 RCX: 00007fdcb9dd76ed
RDX: fffffffffffffffc RSI: 0000000000000000 RDI: 0000000000005d54
RBP: 00000000000001a9 R08: 0000000e31d3caa7 R09: 0082400004004000
R10: ffffffffffffffff R11: 0000000000000246 R12: 0000000000000002
R13: 00007fdcb842e058 R14: 00007fdcba4c46c0 R15: 00007fdcb842e000
Modules linked in: bridge stp llc nfnetlink cn brd vfat fat ext4 crc16 mbcache jbd2 loop kvm_intel kvm irqbypass intel_cstate intel_uncore dax_pmem intel_rapl_perf dax_pmem_core ip_tables x_tables xfs sd_mod tg3 firmware_class libphy hpsa scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: binfmt_misc]
CR2: ffffffffffffffe8
---[ end trace f9502383d57e0e22 ]---
RIP: 0010:__wake_up_common+0x98/0x290
Code: 40 4d 8d 78 e8 49 8d 7f 18 49 39 fd 0f 84 80 00 00 00 e8 6b bd 2b 00 49 8b 5f 18 45 31 e4 48 83 eb 18 4c 89 ff e8 08 bc 2b 00 <45> 8b 37 41 f6 c6 04 75 71 49 8d 7f 10 e8 46 bd 2b 00 49 8b 47 10
RSP: 0018:ffffc9000adbfaf0 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffffffffffffffe8 RCX: ffffffffaa9636b8
RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffffffffffffffe8
RBP: ffffc9000adbfb40 R08: fffffbfff582c5fd R09: fffffbfff582c5fd
R10: ffffffffac162fe3 R11: fffffbfff582c5fc R12: 0000000000000000
R13: ffff888ef82b0960 R14: ffffc9000adbfb80 R15: ffffffffffffffe8
FS:  00007fdcba4c4740(0000) GS:ffff889033780000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffe8 CR3: 0000000f776a0004 CR4: 00000000001606e0
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x29800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception ]—

which is due to error injection (or allocation failure) preventing the
rings from being setup. On shutdown, we attempt to remove any pending
requests, and for poll request, we call io_cqring_ev_posted() when we've
killed poll requests. However, since the rings aren't setup, we won't
find any poll requests. Make the calling of io_cqring_ev_posted()
dependent on actually having completed requests. This fixes this setup
corner case, and removes spurious calls if we remove poll requests and
don't find any.
Reported-by: NQian Cai <cai@lca.pw>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

40bee6c9

io_uring: io_async_task_func() should check and honor cancelation · 2cfc4842

由 Jens Axboe 提交于 4月 13, 2020

to #28170604

commit 2bae047ec9576da72d5003487de0bb93e747fff7 upstream

If the request has been marked as canceled, don't try and issue it.
Instead just fill a canceled event and finish the request.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

2cfc4842

io_uring: check for need to re-wait in polled async handling · 2e7787a6

由 Jens Axboe 提交于 4月 13, 2020

to #28170604

commit 74ce6ce43d4fc6ce15efb21378d9ef26125c298b upstream

We added this for just the regular poll requests in commit a6ba632d2c24
("io_uring: retry poll if we got woken with non-matching mask"), we
should do the same for the poll handler used pollable async requests.
Move the re-wait check and arm into a helper, and call it from
io_async_task_func() as well.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

2e7787a6

io_uring: correct O_NONBLOCK check for splice punt · 874faab1

由 Jens Axboe 提交于 4月 12, 2020

to #28170604

commit 88357580854aab29d27e1a443575caaedd081612 upstream

The splice file punt check uses file->f_mode to check for O_NONBLOCK,
but it should be checking file->f_flags. This leads to punting even
for files that have O_NONBLOCK set, which isn't necessary. This equates
to checking for FMODE_PATH, which will never be set on the fd in
question.

Fixes: 7d67af2c0134 ("io_uring: add splice(2) support")
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

874faab1

io_uring: restore req->work when canceling poll request · 8ce2eca8

由 Xiaoguang Wang 提交于 4月 12, 2020

to #28170604

commit b1f573bd15fda2e19ea66a4d26fae8be1b12791d upstream

When running liburing test case 'accept', I got below warning:
RED: Invalid credentials
RED: At include/linux/cred.h:285
RED: Specified credentials: 00000000d02474a0
RED: ->magic=4b, put_addr=000000005b4f46e9
RED: ->usage=-1699227648, subscr=-25693
RED: ->*uid = { 256,-25693,-25693,65534 }
RED: ->*gid = { 0,-1925859360,-1789740800,-1827028688 }
RED: ->security is 00000000258c136e
eneral protection fault, probably for non-canonical address 0xdead4ead00000000: 0000 [#1] SMP PTI
PU: 21 PID: 2037 Comm: accept Not tainted 5.6.0+ #318
ardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
IP: 0010:dump_invalid_creds+0x16f/0x184
ode: 48 8b 83 88 00 00 00 48 3d ff 0f 00 00 76 29 48 89 c2 81 e2 00 ff ff ff 48
81 fa 00 6b 6b 6b 74 17 5b 48 c7 c7 4b b1 10 8e 5d <8b> 50 04 41 5c 8b 30 41 5d
e9 67 e3 04 00 5b 5d 41 5c 41 5d c3 0f
SP: 0018:ffffacc1039dfb38 EFLAGS: 00010087
AX: dead4ead00000000 RBX: ffff9ba39319c100 RCX: 0000000000000007
DX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8e10b14b
BP: ffffffff8e108476 R08: 0000000000000000 R09: 0000000000000001
10: 0000000000000000 R11: ffffacc1039df9e5 R12: 000000009552b900
13: 000000009319c130 R14: ffff9ba39319c100 R15: 0000000000000246
S:  00007f96b2bfc4c0(0000) GS:ffff9ba39f340000(0000) knlGS:0000000000000000
S:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
R2: 0000000000401870 CR3: 00000007db7a4000 CR4: 00000000000006e0
all Trace:
__invalid_creds+0x48/0x4a
__io_req_aux_free+0x2e8/0x3b0
? io_poll_remove_one+0x2a/0x1d0
__io_free_req+0x18/0x200
io_free_req+0x31/0x350
io_poll_remove_one+0x17f/0x1d0
io_poll_cancel.isra.80+0x6c/0x80
io_async_find_and_cancel+0x111/0x120
io_issue_sqe+0x181/0x10e0
? __lock_acquire+0x552/0xae0
? lock_acquire+0x8e/0x310
? fs_reclaim_acquire.part.97+0x5/0x30
__io_queue_sqe.part.100+0xc4/0x580
? io_submit_sqes+0x751/0xbd0
? rcu_read_lock_sched_held+0x32/0x40
io_submit_sqes+0x9ba/0xbd0
? __x64_sys_io_uring_enter+0x2b2/0x460
? __x64_sys_io_uring_enter+0xaf/0x460
? find_held_lock+0x2d/0x90
? __x64_sys_io_uring_enter+0x111/0x460
__x64_sys_io_uring_enter+0x2d7/0x460
do_syscall_64+0x5a/0x230
entry_SYSCALL_64_after_hwframe+0x49/0xb3

After looking into codes, it turns out that this issue is because we didn't
restore the req->work, which is changed in io_arm_poll_handler(), req->work
is a union with below struct:
	struct {
		struct callback_head	task_work;
		struct hlist_node	hash_node;
		struct async_poll	*apoll;
	};
If we forget to restore, members in struct io_wq_work would be invalid,
restore the req->work to fix this issue.
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

Get rid of not needed 'need_restore' variable.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

8ce2eca8

io_uring: move all request init code in one place · 7b895592

由 Pavel Begunkov 提交于 4月 12, 2020

to #28170604

commit ef4ff581102a917a69877feca2e5347e2f3e458c upstream

Requests initialisation is scattered across several functions, namely
io_init_req(), io_submit_sqes(), io_submit_sqe(). Put it
in io_init_req() for better data locality and code clarity.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

7b895592

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功