- 24 February 2021, 1 commit
-
-
Committed by Jens Axboe
We're now just using fork like we would from userspace, so there's no need to try and impose extra restrictions or accounting on the user side of things. That's already being done for us. That also means we don't have to pass in the user_struct anymore; that's correctly inherited through ->creds on fork.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 22 February 2021, 9 commits
-
-
Committed by Jens Axboe
We want to use this in io_uring proper as well, for the SQPOLL thread. Rename it from fork_thread() to io_wq_fork_thread(), and make it available through the io-wq.h header.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
If the worker isn't on the free_list, don't attempt to delete it from there.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
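A minimal sketch of the guard this describes, assuming io-wq's convention of an IO_WORKER_F_FREE flag marking workers that are actually linked into the free_list (names follow the io-wq sources of that era, not the patch verbatim):

```c
static void io_worker_exit(struct io_worker *worker)
{
	struct io_wqe *wqe = worker->wqe;

	/* ... */
	raw_spin_lock_irq(&wqe->lock);
	/* only unlink workers that were actually put on the free_list */
	if (worker->flags & IO_WORKER_F_FREE)
		hlist_nulls_del_rcu(&worker->nulls_node);
	list_del_rcu(&worker->all_list);
	raw_spin_unlock_irq(&wqe->lock);
	/* ... */
}
```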
-
Committed by Jens Axboe
We are no longer grabbing state, so there is no need to maintain an IO identity that we COW if there are changes.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
Remove the bool return, and the checking for it in the caller.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
Instead of using regular kthread kernel threads, create kernel threads that are like a real thread that the task would create. This ensures that we get all the context that we need, without having to carry that state around. This greatly reduces the code complexity and the risk of missing state for a given request type. With the move away from kthread, we can also drop everything related to carrying assigned state over to the new threads.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
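A sketch of what such a fork-from-task helper can look like, built on kernel_clone() with the clone flags a userspace thread would use. The exact flag set and argument packing below are an assumption based on the commit text:

```c
/* fork off the current task, sharing what a userspace thread shares */
static pid_t fork_thread(int (*fn)(void *), void *arg)
{
	unsigned long flags = CLONE_FS | CLONE_FILES | CLONE_SIGHAND |
			      CLONE_THREAD | CLONE_IO | CLONE_VM |
			      CLONE_UNTRACED;
	struct kernel_clone_args args = {
		.flags		= (lower_32_bits(flags) & ~CSIGNAL),
		.exit_signal	= (lower_32_bits(flags) & CSIGNAL),
		/* kernel threads reuse stack/stack_size for fn/arg */
		.stack		= (unsigned long)fn,
		.stack_size	= (unsigned long)arg,
	};

	return kernel_clone(&args);
}
```

Because the clone happens off the submitting task, credentials, mm, files and namespaces come along for free, which is exactly the state the old kthread model had to capture by hand.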
-
Committed by Jens Axboe
Just grab it from the worker itself, which we're already passing in.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
We don't support attach anymore, so it doesn't make sense to carry the use_refs reference count. Get rid of it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
When the manager thread starts up, it creates a worker per node for the given context. Just let these get created dynamically, like we do when adding further workers.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
We hit this case when the task is exiting, and we need somewhere to do background cleanup of requests. Instead of relying on the io-wq task manager to do this work for us, just stuff it somewhere where we can safely run it ourselves directly.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
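The shape of "run it ourselves directly" can be pictured with the io-wq cancel path below, a sketch assuming the do_work/free_work callback pair io-wq already carries:

```c
static void io_run_cancel(struct io_wq_work *work, struct io_wqe *wqe)
{
	struct io_wq *wq = wqe->wq;

	do {
		/* mark it cancelled so the handler takes the cleanup path */
		work->flags |= IO_WQ_WORK_CANCEL;
		wq->do_work(work);
		work = wq->free_work(work);
	} while (work);
}
```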
-
- 13 February 2021, 1 commit
-
-
Committed by Jens Axboe
By default, kernel threads have init_fs and init_files assigned. In the past, this has triggered security problems, as commands that don't ask for (and hence don't get assigned) fs/files from the originating task can then attempt path resolution etc. with access to parts of the system they should not be able to reach. Rather than add checks in the fs code for misuse, just set these to NULL. If we do attempt to use them, then the resulting code will oops rather than provide access to something that it should not permit.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
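For illustration only, a guess at the shape of the change. exit_fs() and exit_files() are stock helpers that NULL the task pointers and drop the references; whether the patch uses them, and where the call sits, is an assumption:

```c
/* sketch, helper name and placement hypothetical */
static void kthread_shed_fs_files(void)
{
	/* drops the reference and leaves current->fs = NULL */
	exit_fs(current);
	/* drops the reference and leaves current->files = NULL */
	exit_files(current);
}
```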
-
- 04 February 2021, 1 commit
-
-
Committed by Pavel Begunkov
Saving one lock/unlock for io-wq is not super important, but it adds some ugliness to the code. More importantly, an atomic dec that doesn't bring the count to zero doesn't give the right ordering/barriers on some archs, so io_steal_work() may pretty easily get subtly and completely broken. Return back the 2-step io-wq work exchange and clean it up.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 02 February 2021, 1 commit
-
-
Committed by Jens Axboe
It's no longer used, as IORING_OP_CLOSE got rid of the need to flag requests as uncancelable. Kill it off.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 21 December 2020, 1 commit
-
-
Committed by Jens Axboe
io_uring no longer issues full cancelations on the io-wq, so remove any remnants of this code and the IO_WQ_BIT_CANCEL flag.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 10 December 2020, 1 commit
-
-
Committed by Pavel Begunkov
Instead of iterating over each request and cancelling it individually in io_uring_cancel_files(), try to cancel all matching requests and use ->inflight_list only to check if there is anything left. In many cases it should be faster, and we can reuse a lot of code from task cancellation.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
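A sketch of what bulk matching looks like on the io-wq side, assuming the io_wq_cancel_cb() predicate interface; the predicate body and the final `true` ("cancel all matches, not just the first") are illustrative:

```c
/* hypothetical predicate: match every work item of a given task */
static bool io_cancel_task_cb(struct io_wq_work *work, void *data)
{
	struct io_kiocb *req = container_of(work, struct io_kiocb, work);

	return req->task == data;
}

/* somewhere in the cancel path: sweep all matches in one call */
io_wq_cancel_cb(ctx->io_wq, io_cancel_task_cb, task, true);
```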
-
- 05 November 2020, 1 commit
-
-
Committed by Jens Axboe
This can't currently happen, but will be possible shortly. Handle missing files just like we handle not being able to grab a needed mm, and mark the request as needing cancelation.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 22 October 2020, 1 commit
-
-
Committed by Jens Axboe
We correctly set io-wq NUMA node affinities when the io-wq context is set up, but if an entire node's CPU set is offlined and then brought back online, the per-node affinities are broken. Ensure that we set them again whenever a CPU comes online, so that we always track the right node affinity. The usual cpuhp notifiers are used to drive it.
Reported-by: Zhang Qiang <qiang.zhang@windriver.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
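Wiring this up with the multi-instance cpuhp API could look roughly like the sketch below; the iteration helper and state name are assumptions based on the description:

```c
static bool io_wq_worker_affinity(struct io_worker *worker, void *data)
{
	/* re-pin each worker to the CPUs of its node */
	set_cpus_allowed_ptr(worker->task,
			     cpumask_of_node(worker->wqe->node));
	return false;	/* keep iterating over all workers */
}

static int io_wq_cpu_online(unsigned int cpu, struct hlist_node *node)
{
	struct io_wq *wq = hlist_entry_safe(node, struct io_wq, cpuhp_node);
	int i;

	rcu_read_lock();
	for_each_node(i)
		io_wq_for_each_worker(wq->wqes[i], io_wq_worker_affinity, NULL);
	rcu_read_unlock();
	return 0;
}

/* registered once at init; each io_wq then adds its cpuhp_node instance */
io_wq_online = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "io-wq/online",
				       io_wq_cpu_online, NULL);
```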
-
- 21 October 2020, 1 commit
-
-
Committed by Jens Axboe
This one was missed in the earlier conversion; it should be included like any of the other IO identity flags. Make sure we restore to RLIM_INFINITY when dropping the personality again.
Fixes: 98447d65 ("io_uring: move io identity items into separate struct")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
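In io-wq's identity-switching code, the apply/restore pair can be pictured like this. The helper name is hypothetical, and the fsize field and IO_WQ_WORK_FSIZE flag are assumptions from the identity-flag scheme described above:

```c
static void io_wq_adjust_fsize(struct io_wq_work *work)
{
	struct rlimit *fsize = &current->signal->rlim[RLIMIT_FSIZE];

	if (work->flags & IO_WQ_WORK_FSIZE)
		fsize->rlim_cur = work->identity->fsize;
	else if (fsize->rlim_cur != RLIM_INFINITY)
		fsize->rlim_cur = RLIM_INFINITY;	/* drop personality */
}
```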
-
- 17 October 2020, 5 commits
-
-
Committed by Jens Axboe
Make sure the async io-wq workers inherit the loginuid and sessionid from the original task, and restore them to unset once we're done with the async work item. While at it, disable the ability for kernel threads to write to their own loginuid.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
io-wq contains a pointer to the identity, which we just hold in io_kiocb for now. This is in preparation for putting it outside io_kiocb. The only exception is struct files_struct, which we'll need different rules for to avoid a circular dependency. No functional changes in this patch.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
We solely rely on work->work_flags now, so use that for proper checking and clearing/dropping of the various identity items.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
We have a number of bits that decide what context to inherit. Set up io-wq flags for these instead. This is in preparation for always having the various members set, but not always needing them for all requests. No intended functional changes in this patch.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
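The flag set can be imagined as bits alongside the existing io-wq work flags; the names match the items the surrounding commits mention, but the exact values below are illustrative:

```c
enum {
	IO_WQ_WORK_CANCEL	= 1,
	IO_WQ_WORK_HASHED	= 2,
	IO_WQ_WORK_UNBOUND	= 4,
	/* context-inheritance bits replacing per-request booleans */
	IO_WQ_WORK_FILES	= 8,
	IO_WQ_WORK_FS		= 16,
	IO_WQ_WORK_MM		= 32,
	IO_WQ_WORK_CREDS	= 64,
	IO_WQ_WORK_BLKCG	= 128,
	IO_WQ_WORK_FSIZE	= 256,
};
```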
-
Committed by Jens Axboe
There was an assumption that kthread_create_on_node() would properly set NUMA affinities in terms of CPUs allowed, but it doesn't. Make sure we do this when creating an io-wq context on NUMA.
Cc: stable@vger.kernel.org
Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
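The fix amounts to binding the worker explicitly after creation, along these lines (a sketch; kthread_bind_mask() is the stock kernel helper, the surrounding function is io-wq's worker-creation path with allocation elided):

```c
static bool create_io_worker(struct io_wq *wq, struct io_wqe *wqe, int index)
{
	struct io_worker *worker;

	/* ... allocate and initialise `worker` ... */
	worker->task = kthread_create_on_node(io_wqe_worker, worker,
					      wqe->node, "io_wqe_worker-%d/%d",
					      index, wqe->node);
	if (IS_ERR(worker->task))
		return false;
	/* the create call only sets the memory node, not the CPU mask */
	kthread_bind_mask(worker->task, cpumask_of_node(wqe->node));
	/* ... link the worker in ... */
	wake_up_process(worker->task);
	return true;
}
```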
-
- 01 October 2020, 5 commits
-
-
Committed by Jens Axboe
This flag is no longer used; remove it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Hillf Danton
The smart syzbot has found a reproducer for the following issue:

==================================================================
BUG: KASAN: use-after-free in instrument_atomic_write include/linux/instrumented.h:71 [inline]
BUG: KASAN: use-after-free in atomic_inc include/asm-generic/atomic-instrumented.h:240 [inline]
BUG: KASAN: use-after-free in io_wqe_inc_running fs/io-wq.c:301 [inline]
BUG: KASAN: use-after-free in io_wq_worker_running+0xde/0x110 fs/io-wq.c:613
Write of size 4 at addr ffff8882183db08c by task io_wqe_worker-0/7771

CPU: 0 PID: 7771 Comm: io_wqe_worker-0 Not tainted 5.9.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x198/0x1fd lib/dump_stack.c:118
 print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383
 __kasan_report mm/kasan/report.c:513 [inline]
 kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
 check_memory_region_inline mm/kasan/generic.c:186 [inline]
 check_memory_region+0x13d/0x180 mm/kasan/generic.c:192
 instrument_atomic_write include/linux/instrumented.h:71 [inline]
 atomic_inc include/asm-generic/atomic-instrumented.h:240 [inline]
 io_wqe_inc_running fs/io-wq.c:301 [inline]
 io_wq_worker_running+0xde/0x110 fs/io-wq.c:613
 schedule_timeout+0x148/0x250 kernel/time/timer.c:1879
 io_wqe_worker+0x517/0x10e0 fs/io-wq.c:580
 kthread+0x3b5/0x4a0 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

Allocated by task 7768:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
 kasan_set_track mm/kasan/common.c:56 [inline]
 __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:461
 kmem_cache_alloc_node_trace+0x17b/0x3f0 mm/slab.c:3594
 kmalloc_node include/linux/slab.h:572 [inline]
 kzalloc_node include/linux/slab.h:677 [inline]
 io_wq_create+0x57b/0xa10 fs/io-wq.c:1064
 io_init_wq_offload fs/io_uring.c:7432 [inline]
 io_sq_offload_start fs/io_uring.c:7504 [inline]
 io_uring_create fs/io_uring.c:8625 [inline]
 io_uring_setup+0x1836/0x28e0 fs/io_uring.c:8694
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Freed by task 21:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
 kasan_set_track+0x1c/0x30 mm/kasan/common.c:56
 kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355
 __kasan_slab_free+0xd8/0x120 mm/kasan/common.c:422
 __cache_free mm/slab.c:3418 [inline]
 kfree+0x10e/0x2b0 mm/slab.c:3756
 __io_wq_destroy fs/io-wq.c:1138 [inline]
 io_wq_destroy+0x2af/0x460 fs/io-wq.c:1146
 io_finish_async fs/io_uring.c:6836 [inline]
 io_ring_ctx_free fs/io_uring.c:7870 [inline]
 io_ring_exit_work+0x1e4/0x6d0 fs/io_uring.c:7954
 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269
 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
 kthread+0x3b5/0x4a0 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

The buggy address belongs to the object at ffff8882183db000 which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 140 bytes inside of 1024-byte region [ffff8882183db000, ffff8882183db400)
The buggy address belongs to the page:
page:000000009bada22b refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2183db
flags: 0x57ffe0000000200(slab)
raw: 057ffe0000000200 ffffea0008604c48 ffffea00086a8648 ffff8880aa040700
raw: 0000000000000000 ffff8882183db000 0000000100000002 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8882183daf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff8882183db000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8882183db080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                      ^
 ffff8882183db100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8882183db180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

which is down to the comment below,

	/* all workers gone, wq exit can proceed */
	if (!nr_workers && refcount_dec_and_test(&wqe->wq->refs))
		complete(&wqe->wq->done);

because there might be multiple wqes in a wq, and we would wait for every worker in every wqe to go home before releasing the wq's resources on destroy. To that end, rework the wq's refcount by making it independent of the tracking of workers, because after all they are two different things, and keeping it balanced when workers come and go. Note that the manager kthread, like other workers, now holds a reference to the wq during its lifetime. Finally, to help destroy the wq, check IO_WQ_BIT_EXIT upon creating a worker and do nothing for an exiting wq.

Cc: stable@vger.kernel.org # v5.5+
Reported-by: syzbot+45fa0a195b941764e0f0@syzkaller.appspotmail.com
Reported-by: syzbot+9af99580130003da82b1@syzkaller.appspotmail.com
Cc: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
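The reworked lifetime rule can be sketched as: the wq refcount counts users of the wq (workers plus the manager), not per-wqe worker totals, and the final put fires ->done. Helper names here are illustrative; the field names follow the report above:

```c
static void io_wq_put(struct io_wq *wq)
{
	/* last user gone: let io_wq_destroy() proceed */
	if (refcount_dec_and_test(&wq->refs))
		complete(&wq->done);
}

static void io_worker_exit(struct io_worker *worker)
{
	struct io_wq *wq = worker->wqe->wq;

	/* ... unlink and free the worker ... */
	io_wq_put(wq);	/* balanced by refcount_inc() at worker creation */
}
```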
-
Committed by Dennis Zhou
There are a few operations that are offloaded to the worker threads. In this case, we lose process context and end up in kthread context. This results in IOs not being accounted to the issuing cgroup, and they consequently end up attributed to root. Just like others, adopt the personality of the blkcg too when issuing via the workqueues. For the SQPOLL thread, it will live and attach in the inited cgroup's context.
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
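Adopting the blkcg from a work item boils down to kthread_associate_blkcg(), roughly as below. This is a sketch: it assumes the io identity carries a blkcg_css pointer and the worker caches the currently attached one:

```c
#ifdef CONFIG_BLK_CGROUP
static void io_wq_switch_blkcg(struct io_worker *worker,
			       struct io_wq_work *work)
{
	if (work->identity->blkcg_css != worker->blkcg_css) {
		/* bill subsequent IO from this kthread to the new css */
		kthread_associate_blkcg(work->identity->blkcg_css);
		worker->blkcg_css = work->identity->blkcg_css;
	}
}
#endif
```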
-
Committed by Sebastian Andrzej Siewior
During a context switch the scheduler invokes wq_worker_sleeping() with disabled preemption. Disabling preemption is needed because it protects access to `worker->sleeping'. As an optimisation, it avoids invoking schedule() within the schedule path as part of a possible wake up (thus preempt_enable_no_resched() afterwards). The io-wq has been added to the mix in the same section with disabled preemption. This breaks on PREEMPT_RT because io_wq_worker_sleeping() acquires a spinlock_t. Also, within schedule() the spinlock_t must be acquired after tsk_is_pi_blocked(), otherwise it will block on the sleeping lock again while scheduling out. While playing with `io_uring-bench' I didn't notice a significant latency spike after converting io_wqe::lock to a raw_spinlock_t; the latency was more or less the same. In order to keep the spinlock_t, it would have to be moved after the tsk_is_pi_blocked() check, which would introduce a branch instruction into the hot path. The lock is used to maintain the `work_list' and wakes one task up at most. Should io_wqe_cancel_pending_work() cause latency spikes while searching for a specific item, then it would need to drop the lock during iterations. revert_creds() is also invoked under the lock; according to debugging, cred::non_rcu is 0, otherwise it would have to be moved outside of the locked section because put_cred_rcu()->free_uid() acquires a sleeping lock. Convert io_wqe::lock to a raw_spinlock_t.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
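The conversion itself is mechanical; what matters is the lock type. A sketch of the pattern, with unrelated fields elided:

```c
struct io_wqe {
	/*
	 * Was spinlock_t; on PREEMPT_RT that is a sleeping lock, which
	 * must not be taken from the preemption-disabled schedule path.
	 */
	raw_spinlock_t lock;
	struct io_wq_work_list work_list;
	/* ... */
};

static void io_wqe_insert_work(struct io_wqe *wqe, struct io_wq_work *work)
{
	raw_spin_lock_irq(&wqe->lock);
	wq_list_add_tail(&work->list, &wqe->work_list);
	raw_spin_unlock_irq(&wqe->lock);
}
```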
-
Committed by Jens Axboe
If we don't get and assign the namespace for the async work, then certain paths just don't work properly (like /dev/stdin, /proc/mounts, etc.). Anything that references the current namespace of the given task should be assigned for async work on behalf of that task.
Cc: stable@vger.kernel.org # v5.5+
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 24 August 2020, 1 commit
-
-
Committed by Pavel Begunkov
Don't forget to update wqe->hash_tail after cancelling a pending work item, if it was hashed.
Cc: stable@vger.kernel.org # 5.7+
Reported-by: Dmitry Shulyak <yashulyak@gmail.com>
Fixes: 86f3cd1b ("io-wq: handle hashed writes in chains")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
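The fix is essentially the following bookkeeping when unlinking a pending work item; this is close to the upstream helper, but treat it as a sketch:

```c
static void io_wqe_remove_pending(struct io_wqe *wqe,
				  struct io_wq_work *work,
				  struct io_wq_work_node *prev)
{
	unsigned int hash = io_get_work_hash(work);
	struct io_wq_work *prev_work = NULL;

	if (io_wq_is_hashed(work) && work == wqe->hash_tail[hash]) {
		if (prev)
			prev_work = container_of(prev, struct io_wq_work, list);
		/* the chain's new tail is the previous hashed entry, if any */
		if (prev_work && io_get_work_hash(prev_work) == hash)
			wqe->hash_tail[hash] = prev_work;
		else
			wqe->hash_tail[hash] = NULL;
	}
	wq_list_del(&wqe->work_list, &work->list, prev);
}
```

Without this, a later hashed submission could append behind a stale hash_tail pointing at a freed request.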
-
- 25 July 2020, 2 commits
-
-
Committed by Pavel Begunkov
Linked requests are hashed; remove a comment stating otherwise. Also move the hash bits to emphasise that we don't carry them across loop iterations, but set them every time.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov
RLIMIT_FSIZE is needed only for execution from an io-wq context, hence move all preparations from the hot path to io-wq work setup.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 27 June 2020, 1 commit
-
-
Committed by Pavel Begunkov
It's easier to return the next work from ->do_work() than to have an in-out argument. Looks nicer and easier to compile. Also, merge io_wq_assign_next() into its only user.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
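The interface change, sketched: the handler returns the follow-up work (e.g. the next linked request) instead of writing through a pointer argument. Typedef and body below are illustrative:

```c
/* before: void (*do_work)(struct io_wq_work **work);  in-out argument */
/* after:  the handler simply returns the next item, or NULL */
typedef struct io_wq_work *(io_wq_work_fn)(struct io_wq_work *);

static struct io_wq_work *io_wq_submit_work(struct io_wq_work *work)
{
	struct io_wq_work *next = NULL;

	/* ... issue `work`; if it has a linked request, set `next` ... */
	return next;
}
```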
-
- 15 June 2020, 3 commits
-
-
Committed by Pavel Begunkov
If a process is going away, io_uring_flush() will cancel only one request with a matching pid. Cancel all of them.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov
This adds support for cancelling all io-wq works matching a predicate. It isn't used yet, so there is no change in observable behaviour.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov
Go over all pending lists and cancel works there, and only then try to match running requests. No functional changes here, just a preparation for bulk cancellation.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 11 June 2020, 3 commits
-
-
Committed by Christoph Hellwig
Some architectures like arm64 and s390 require USER_DS to be set for kernel threads to access user address space, which is the whole purpose of kthread_use_mm, but others like x86 don't. That has led to a huge mess where some callers are fixed up once they are tested on said architectures, while others linger around, and yet others like io_uring try to do "clever" optimizations for what usually is just a trivial assignment to a member in the thread_struct for most architectures. Make kthread_use_mm set USER_DS, and kthread_unuse_mm restore the previous value instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: http://lkml.kernel.org/r/20200404094101.672954-7-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
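Typical caller pattern after this change, as a sketch (the wrapper name is hypothetical): no per-architecture set_fs() juggling is needed around the user-space access anymore.

```c
static int kthread_copy_from_user(void *dst, const void __user *ubuf,
				  size_t len, struct mm_struct *mm)
{
	int ret = 0;

	kthread_use_mm(mm);	/* now also switches to USER_DS */
	if (copy_from_user(dst, ubuf, len))
		ret = -EFAULT;
	kthread_unuse_mm(mm);	/* restores the previous address limit */
	return ret;
}
```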
-
Committed by Christoph Hellwig
Switch the function documentation to kerneldoc comments, and add WARN_ON_ONCE asserts that the calling thread is a kernel thread and does not have ->mm set (or has ->mm set, in the case of unuse_mm). Also give the functions a kthread_ prefix to better document the use case.
[hch@lst.de: fix a comment typo, cover the newly merged use_mm/unuse_mm caller in vfio]
Link: http://lkml.kernel.org/r/20200416053158.586887-3-hch@lst.de
[sfr@canb.auug.org.au: powerpc/vas: fix up for {un}use_mm() rename]
Link: http://lkml.kernel.org/r/20200422163935.5aa93ba5@canb.auug.org.au
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [usb]
Acked-by: Haren Myneni <haren@linux.ibm.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Link: http://lkml.kernel.org/r/20200404094101.672954-6-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Christoph Hellwig
Patch series "improve use_mm / unuse_mm", v2.

This series improves the use_mm / unuse_mm interface by better documenting the assumptions, and by taking the set_fs manipulations spread over the callers into the core API.

This patch (of 3):

Use the proper API instead.
Link: http://lkml.kernel.org/r/20200404094101.672954-1-hch@lst.de

These helpers are only for use with kernel threads, and I will tie them more into the kthread infrastructure going forward. Also move the prototypes to kthread.h - mmu_context.h was a little weird to start with, as it otherwise contains very low-level MM bits.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: http://lkml.kernel.org/r/20200404094101.672954-1-hch@lst.de
Link: http://lkml.kernel.org/r/20200416053158.586887-1-hch@lst.de
Link: http://lkml.kernel.org/r/20200404094101.672954-5-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 09 June 2020, 1 commit
-
-
Committed by Pavel Begunkov
io_uring is the only user of io-wq, and now it uses only one io-wq callback for all its requests, namely io_wq_submit_work(). Instead of storing the work->runner callback in each instance of io_wq_work, keep it in io-wq itself.

Pros:
- reduces io_wq_work size
- more robust: ->func won't be invalidated with mem{cpy,set}(req)
- helps other work

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 04 April 2020, 1 commit
-
-
Committed by Jens Axboe
If the original task is exiting (or has exited), then the task work will not get queued properly. Allow for using the io-wq manager task to queue this work for execution, and ensure that the io-wq manager notices and runs this work if woken up (or exiting).
Reported-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-