- 22 2月, 2021 9 次提交
-
-
由 Jens Axboe 提交于
Remove the bool return, and the checking for it in the caller. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Instead of using regular kthread kernel threads, create kernel threads that are like a real thread that the task would create. This ensures that we get all the context that we need, without having to carry that state around. This greatly reduces the code complexity, and the risk of missing state for a given request type. With the move away from kthread, we can also dump everything related to assigned state to the new threads. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Just grab it from the worker itself, which we're already passing in. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Move it outside of the io_ring_ctx, and tie it to the io_uring task context. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We don't support attach anymore, so doesn't make sense to carry the use_refs reference count. Get rid of it. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Moving towards making the io_wq per ring per task, so we can't really share it between rings. Which is fine, since we've now dropped some of that fat from it. Retain compatibility with how attaching works, so that any attempt to attach to an fd that doesn't exist, or isn't an io_uring fd, will fail like it did before. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
When the manager thread starts up, it creates a worker per node for the given context. Just let these get created dynamically, like we do for adding further workers. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We hit this case when the task is exiting, and we need somewhere to do background cleanup of requests. Instead of relying on the io-wq task manager to do this work for us, just stuff it somewhere where we can safely run it ourselves directly. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Do run task_work before io_uring_register(), that might make a first quiesce round much nicer. We generally do that for any syscall invocation to avoid spurious -EINTR/-ERESTARTSYS, for task_work that we generate. This patch brings io_uring_register() inline with the two other io_uring syscalls. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 21 2月, 2021 8 次提交
-
-
由 Pavel Begunkov 提交于
sqe->flags are subset of req flags, so incorrectly copied may span into in-kernel flags and wreck havoc, e.g. by setting REQ_F_INFLIGHT. Fixes: 5be9ad1e ("io_uring: optimise io_init_req() flags setting") Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
There is a short window where percpu_refs are already turned zero, but we try to do resurrect(). Play nicer and wait for ->release() to happen in this case and proceed as everything is ok. One downside for ctx refs is that we can ignore signal_pending() on a rare occasion, but someone else should check for it later if needed. Cc: <stable@vger.kernel.org> # 5.5+ Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
io_rsrc_ref_quiesce() is a generic resource function, though now it was wired to allocate and initialise ref nodes with file-specific callbacks/etc. Keep it sane by passing in as a parameters everything we need for initialisations, otherwise it will hurt us badly one day. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
After a rsrc/files reference node's refs are killed, it must never be used. And that's how it works, it either assigns a new node or kills the whole data table. Let's explicitly NULL it, that shouldn't be necessary, but if something would go wrong I'd rather catch a NULL dereference to using a dangling pointer. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
With the prep and prep async split, we now have potentially 3 helpers that need to be defined for !CONFIG_NET. Add some helpers to do just that. Fixes the following compile error on !CONFIG_NET: fs/io_uring.c:6171:10: error: implicit declaration of function 'io_sendmsg_prep_async'; did you mean 'io_req_prep_async'? [-Werror=implicit-function-declaration] return io_sendmsg_prep_async(req); ^~~~~~~~~~~~~~~~~~~~~ io_req_prep_async Fixes: 93642ef8 ("io_uring: split sqe-prep and async setup") Reported-by: NNaresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Hao Xu 提交于
Abaci reported the below issue: [ 141.400455] hrtimer: interrupt took 205853 ns [ 189.869316] process 'usr/local/ilogtail/ilogtail_0.16.26' started with executable stack [ 250.188042] [ 250.188327] ============================================ [ 250.189015] WARNING: possible recursive locking detected [ 250.189732] 5.11.0-rc4 #1 Not tainted [ 250.190267] -------------------------------------------- [ 250.190917] a.out/7363 is trying to acquire lock: [ 250.191506] ffff888114dbcbe8 (&ctx->uring_lock){+.+.}-{3:3}, at: __io_req_task_submit+0x29/0xa0 [ 250.192599] [ 250.192599] but task is already holding lock: [ 250.193309] ffff888114dbfbe8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_register+0xad/0x210 [ 250.194426] [ 250.194426] other info that might help us debug this: [ 250.195238] Possible unsafe locking scenario: [ 250.195238] [ 250.196019] CPU0 [ 250.196411] ---- [ 250.196803] lock(&ctx->uring_lock); [ 250.197420] lock(&ctx->uring_lock); [ 250.197966] [ 250.197966] *** DEADLOCK *** [ 250.197966] [ 250.198837] May be due to missing lock nesting notation [ 250.198837] [ 250.199780] 1 lock held by a.out/7363: [ 250.200373] #0: ffff888114dbfbe8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_register+0xad/0x210 [ 250.201645] [ 250.201645] stack backtrace: [ 250.202298] CPU: 0 PID: 7363 Comm: a.out Not tainted 5.11.0-rc4 #1 [ 250.203144] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 250.203887] Call Trace: [ 250.204302] dump_stack+0xac/0xe3 [ 250.204804] __lock_acquire+0xab6/0x13a0 [ 250.205392] lock_acquire+0x2c3/0x390 [ 250.205928] ? __io_req_task_submit+0x29/0xa0 [ 250.206541] __mutex_lock+0xae/0x9f0 [ 250.207071] ? __io_req_task_submit+0x29/0xa0 [ 250.207745] ? 0xffffffffa0006083 [ 250.208248] ? __io_req_task_submit+0x29/0xa0 [ 250.208845] ? __io_req_task_submit+0x29/0xa0 [ 250.209452] ? __io_req_task_submit+0x5/0xa0 [ 250.210083] __io_req_task_submit+0x29/0xa0 [ 250.210687] io_async_task_func+0x23d/0x4c0 [ 250.211278] task_work_run+0x89/0xd0 [ 250.211884] io_run_task_work_sig+0x50/0xc0 [ 250.212464] io_sqe_files_unregister+0xb2/0x1f0 [ 250.213109] __io_uring_register+0x115a/0x1750 [ 250.213718] ? __x64_sys_io_uring_register+0xad/0x210 [ 250.214395] ? __fget_files+0x15a/0x260 [ 250.214956] __x64_sys_io_uring_register+0xbe/0x210 [ 250.215620] ? trace_hardirqs_on+0x46/0x110 [ 250.216205] do_syscall_64+0x2d/0x40 [ 250.216731] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 250.217455] RIP: 0033:0x7f0fa17e5239 [ 250.218034] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 ec 2c 00 f7 d8 64 89 01 48 [ 250.220343] RSP: 002b:00007f0fa1eeac48 EFLAGS: 00000246 ORIG_RAX: 00000000000001ab [ 250.221360] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0fa17e5239 [ 250.222272] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000008 [ 250.223185] RBP: 00007f0fa1eeae20 R08: 0000000000000000 R09: 0000000000000000 [ 250.224091] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ 250.224999] R13: 0000000000021000 R14: 0000000000000000 R15: 00007f0fa1eeb700 This is caused by calling io_run_task_work_sig() to do work under uring_lock while the caller io_sqe_files_unregister() already held uring_lock. To fix this issue, briefly drop uring_lock when calling io_run_task_work_sig(), and there are two things to concern: - hold uring_lock in io_ring_ctx_free() around io_sqe_files_unregister() this is for consistency of lock/unlock. - add new fixed rsrc ref node before dropping uring_lock it's not safe to do io_uring_enter-->percpu_ref_get() with a dying one. - check if rsrc_data->refs is dying to avoid parallel io_sqe_files_unregister Reported-by: NAbaci <abaci@linux.alibaba.com> Fixes: 1ffc5422 ("io_uring: fix io_sqe_files_unregister() hangs") Suggested-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NHao Xu <haoxu@linux.alibaba.com> [axboe: fixes from Pavel folded in] Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
In case of failure io_wq_submit_work() needs to post an CQE and so potentially take uring_lock. The safest way to deal with it is to do that from under task_work where we can safely take the lock. Also, as io_iopoll_check() holds the lock tight and releases it reluctantly, it will play nicer in the furuter with notifying an iopolling task about new such pending failed requests. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Al Viro 提交于
After switching to non-RCU mode, we want nd->depth to match the number of entries in nd->stack[] that need eventual path_put(). legitimize_links() takes care of that on failures; unfortunately, failure exits added for LOOKUP_CACHED do not. We could add the logics for that into those failure exits, both in try_to_unlazy() and in try_to_unlazy_next(), but since both checks are immediately followed by legitimize_links() and there's no calls of legitimize_links() other than those two... It's easier to move the check (and required handling of nd->depth on failure) into legitimize_links() itself. [caught by Jens: ... and since we are zeroing ->depth here, we need to do drop_links() first] Fixes: 6c6ec2b0 "fs: add support for LOOKUP_CACHED" Tested-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 19 2月, 2021 13 次提交
-
-
由 Pavel Begunkov 提交于
[ 97.866748] a.out/2890 is trying to acquire lock: [ 97.867829] ffff8881046763e8 (&ctx->uring_lock){+.+.}-{3:3}, at: io_wq_submit_work+0x155/0x240 [ 97.869735] [ 97.869735] but task is already holding lock: [ 97.871033] ffff88810dfe0be8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_enter+0x3f0/0x5b0 [ 97.873074] [ 97.873074] other info that might help us debug this: [ 97.874520] Possible unsafe locking scenario: [ 97.874520] [ 97.875845] CPU0 [ 97.876440] ---- [ 97.877048] lock(&ctx->uring_lock); [ 97.877961] lock(&ctx->uring_lock); [ 97.878881] [ 97.878881] *** DEADLOCK *** [ 97.878881] [ 97.880341] May be due to missing lock nesting notation [ 97.880341] [ 97.881952] 1 lock held by a.out/2890: [ 97.882873] #0: ffff88810dfe0be8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_enter+0x3f0/0x5b0 [ 97.885108] [ 97.885108] stack backtrace: [ 97.890457] Call Trace: [ 97.891121] dump_stack+0xac/0xe3 [ 97.891972] __lock_acquire+0xab6/0x13a0 [ 97.892940] lock_acquire+0x2c3/0x390 [ 97.894894] __mutex_lock+0xae/0x9f0 [ 97.901101] io_wq_submit_work+0x155/0x240 [ 97.902112] io_wq_cancel_cb+0x162/0x490 [ 97.904126] io_async_find_and_cancel+0x3b/0x140 [ 97.905247] io_issue_sqe+0x86d/0x13e0 [ 97.909122] __io_queue_sqe+0x10b/0x550 [ 97.913971] io_queue_sqe+0x235/0x470 [ 97.914894] io_submit_sqes+0xcce/0xf10 [ 97.917872] __x64_sys_io_uring_enter+0x3fb/0x5b0 [ 97.921424] do_syscall_64+0x2d/0x40 [ 97.922329] entry_SYSCALL_64_after_hwframe+0x44/0xa9 While holding uring_lock, e.g. from inline execution, async cancel request may attempt cancellations through io_wq_submit_work, which may try to grab a lock. Delay it to task_work, so we do it from a clean context and don't have to worry about locking. Cc: <stable@vger.kernel.org> # 5.5+ Fixes: c07e6719 ("io_uring: hold uring_lock while completing failed polled io in io_wq_submit_work()") Reported-by: NAbaci <abaci@linux.alibaba.com> Reported-by: NHao Xu <haoxu@linux.alibaba.com> Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jiri Bohac 提交于
Both pstore_compress() and decompress_record() use a mistyped config option name ("PSTORE_COMPRESSION" instead of "PSTORE_COMPRESS"). As a result compression and decompression of pstore records was always disabled. Use the correct config option name. Signed-off-by: NJiri Bohac <jbohac@suse.cz> Fixes: fd49e032 ("pstore: Fix linking when crypto API disabled") Acked-by: NMatteo Croce <mcroce@microsoft.com> Signed-off-by: NKees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210218111547.johvp5klpv3xrpnn@dwarf.suse.cz
-
由 Pavel Begunkov 提交于
Instead of marking a link with REQ_F_FAIL_LINK on an error and delaying its failing to the caller, do it eagerly right when after getting an error in io_submit_sqe(). This renders FAIL_LINK checks in io_queue_link_head() useless and we can skip it. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Now, as we can do async setup without holding an SQE, we can skip doing io_req_defer_prep() for link heads, it will be tried to be executed inline and follows all the rules of the non-linked requests. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Now as preparations are split from async setup, we can do the first one pretty early not spilling it across multiple call sites. And after it's done SQE is not needed anymore and we can save on passing it deeply into the submission stack. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
There are two kinds of opcode-specific preparations we do. The first is just initialising req with what is always needed for an opcode and reading all non-generic SQE fields. And the second is copying some of the stuff like iovec preparing to punt a request to somewhere async, e.g. to io-wq or for draining. For requests that have tried an inline execution but still needing to be punted, the second prep type is done by the opcode handler itself. Currently, we don't explicitly split those preparation steps, but combining both of them into io_*_prep(), altering the behaviour by allocating ->async_data. That's pretty messy and hard to follow and also gets in the way of some optimisations. Split the steps, leave the first type as where it is now, and put the second into a new io_req_prep_async() helper. It may make us to do opcode switch twice, but it's worth it. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
If we get an error in io_init_req() for a request that would have been linked, we break the submission but still issue a partially composed link, that's nasty, fail it instead. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Move struct io_submit_link into submit_state, which is a part of a submission state and so belongs to it. It saves us from explicitly passing it, and init/deinit is now nicely hidden in io_submit_state_[start,end]. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Behaves identically, just move io_init_req() call into the beginning of io_submit_sqes(). That looks better unloads io_submit_sqes(). Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
A preparation patch, symbol to symbol move io_init_req() + io_check_restriction() a bit up. The submission path is pretty settled down, so don't worry about backports and move the functions instead of relying on forward declarations in the future. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
IORING_OP_SYNC_FILE_RANGE is marked as .needs_file, so the common path will take care of assigning and validating req->file, no need to duplicate it in io_sfr_prep(). Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Follow io_*_prep() naming pattern, there are only fsync and sfr that don't do that. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
@i and @Submitted are very much coupled together, and there is no need to keep them both. Remove @i, it doesn't change generated binary but helps to keep a single source of truth. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 18 2月, 2021 1 次提交
-
-
由 Pavel Begunkov 提交于
Don't forget to free iovec read inline completion and bunch of other cases that do "goto done" before setting up an async context. Fixes: 5ea5dd45 ("io_uring: inline io_read()'s iovec freeing") Reported-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 17 2月, 2021 1 次提交
-
-
由 Jens Axboe 提交于
We add task_work from any context, hence we need to ensure that we can tolerate it being from IRQ context as well. Fixes: 7cbf1722 ("io_uring: provide FIFO ordering for task_work") Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 16 2月, 2021 2 次提交
-
-
由 Jens Axboe 提交于
If this is attempted by an io-wq kthread, then return -EOPNOTSUPP as we don't currently support that. Once we can get task_pid_ptr() doing the right thing, then this can go away again. Use PF_IO_WORKER for this to speciically target the io_uring workers. Modify the /proc/self/ check to use PF_IO_WORKER as well. Cc: stable@vger.kernel.org Fixes: 8d4c3e76 ("proc: don't allow async path resolution of /proc/self components") Reported-by: NEric W. Biederman <ebiederm@xmission.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Laurent Vivier 提交于
It can be useful to the interpreter to know which flags are in use. For instance, knowing if the preserve-argv[0] is in use would allow to skip the pathname argument. This patch uses an unused auxiliary vector, AT_FLAGS, to add a flag to inform interpreter if the preserve-argv[0] is enabled. Note by Helge Deller: The real-world user of this patch is qemu-user, which needs to know if it has to preserve the argv[0]. See Debian bug #970460. Signed-off-by: NLaurent Vivier <laurent@vivier.eu> Reviewed-by: NYunQiang Su <ysu@wavecomp.com> URL: http://bugs.debian.org/970460Signed-off-by: NHelge Deller <deller@gmx.de>
-
- 14 2月, 2021 5 次提交
-
-
由 Wang ShaoBo 提交于
Fix to return PTR_ERR() error code from the error handling case instead fo 0 in function alloc_wbufs(), as done elsewhere in this function. Fixes: 6a98bc46 ("ubifs: Add authentication nodes to journal") Signed-off-by: NWang ShaoBo <bobo.shaobowang@huawei.com> Reviewed-by: NSascha Hauer <s.hauer@pengutronix.de> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Heiko Carstens 提交于
s390 and alpha are the only 64 bit architectures with a 32-bit ino_t. Since this is quite unusual this causes bugs from time to time. See e.g. commit ebce3eb2 ("ceph: fix inode number handling on arches with 32-bit ino_t") for an example. This (obviously) also prevents s390 and alpha to use 64-bit ino_t for tmpfs. See commit b85a7a8b ("tmpfs: disallow CONFIG_TMPFS_INODE64 on s390"). Therefore switch both s390 and alpha to 64-bit ino_t. This should only have an effect on the ustat system call. To prevent ABI breakage define struct ustat compatible to the old layout and change sys_ustat() accordingly. Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NHeiko Carstens <hca@linux.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Jens Axboe 提交于
Be nice and prune these upfront, in case the ring is being shared and one of the tasks is going away. This is a bit more important now that we account the allocations. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We have three different ones, put it in a helper for easy calling. This is in preparation for doing it outside of ring freeing as well. With that in mind, also ensure that we do the proper locking for safe calling from a context where the ring it still live. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
No changes in this patch, just allows a caller to pass in a targeted task that we must match for freeing requests in the cache. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 13 2月, 2021 1 次提交
-
-
由 Jaegeuk Kim 提交于
Let's allow mounting readonly partition. We're able to recovery later once we have it as read-write back. Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-