- 18 12月, 2018 7 次提交
-
-
由 Jens Axboe 提交于
Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
In preparation of handing in iocbs in a different fashion as well. Also make it clear that the iocb being passed in isn't modified, by marking it const throughout. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Replace the percpu_ref_put() + kmem_cache_free() with a call to iocb_put() instead. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Plugging is meant to optimize submission of a string of IOs, if we don't have more than 2 being submitted, don't bother setting up a plug. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
It's 192 bytes, fairly substantial. Most items don't need to be cleared, especially not upfront. Clear the ones we do need to clear, and leave the other ones for setup when the iocb is prepared and submitted. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
This is in preparation for certain types of IO not needing a ring reserveration. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We know this is a read/write request, but in preparation for having different kinds of those, ensure that we call the assigned handler instead of assuming it's aio_complete_rq(). Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 12 12月, 2018 1 次提交
-
-
由 Jeff Moyer 提交于
Matthew pointed out that the ioctx_table is susceptible to spectre v1, because the index can be controlled by an attacker. The below patch should mitigate the attack for all of the aio system calls. Cc: stable@vger.kernel.org Reported-by: NMatthew Wilcox <willy@infradead.org> Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 05 12月, 2018 1 次提交
-
-
由 Christoph Hellwig 提交于
No one is going to poll for aio (yet), so we must clear the HIPRI flag, as we would otherwise send it down the poll queues, where no one will be polling for completions. Signed-off-by: NChristoph Hellwig <hch@lst.de> IOCB_HIPRI, not RWF_HIPRI. Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 20 11月, 2018 1 次提交
-
-
由 Damien Le Moal 提交于
For cases when the application does not specify aio_reqprio for an aio, fallback to use get_current_ioprio() to obtain the task I/O priority last set using ioprio_set() rather than the hardcoded IOPRIO_CLASS_NONE value. Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Reviewed-by: NAdam Manzanares <adam.manzanares@wdc.com> Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 17 11月, 2018 1 次提交
-
-
由 Jens Axboe 提交于
If the ioprio capability check fails, we return without putting the file pointer. Fixes: d9a08a9e ("fs: Add aio iopriority support") Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 27 8月, 2018 1 次提交
-
-
由 Arnd Bergmann 提交于
Christoph Hellwig suggested a slightly different path for handling backwards compatibility with the 32-bit time_t based system calls: Rather than simply reusing the compat_sys_* entry points on 32-bit architectures unchanged, we get rid of those entry points and the compat_time types by renaming them to something that makes more sense on 32-bit architectures (which don't have a compat mode otherwise), and then share the entry points under the new name with the 64-bit architectures that use them for implementing the compatibility. The following types and interfaces are renamed here, and moved from linux/compat_time.h to linux/time32.h: old new --- --- compat_time_t old_time32_t struct compat_timeval struct old_timeval32 struct compat_timespec struct old_timespec32 struct compat_itimerspec struct old_itimerspec32 ns_to_compat_timeval() ns_to_old_timeval32() get_compat_itimerspec64() get_old_itimerspec32() put_compat_itimerspec64() put_old_itimerspec32() compat_get_timespec64() get_old_timespec32() compat_put_timespec64() put_old_timespec32() As we already have aliases in place, this patch addresses only the instances that are relevant to the system call interface in particular, not those that occur in device drivers and other modules. Those will get handled separately, while providing the 64-bit version of the respective interfaces. I'm not renaming the timex, rusage and itimerval structures, as we are still debating what the new interface will look like, and whether we will need a replacement at all. This also doesn't change the names of the syscall entry points, which can be done more easily when we actually switch over the 32-bit architectures to use them, at that point we need to change COMPAT_SYSCALL_DEFINEx to SYSCALL_DEFINEx with a new name, e.g. with a _time32 suffix. Suggested-by: NChristoph Hellwig <hch@infradead.org> Link: https://lore.kernel.org/lkml/20180705222110.GA5698@infradead.org/Signed-off-by: NArnd Bergmann <arnd@arndb.de>
-
- 06 8月, 2018 3 次提交
-
-
由 Christoph Hellwig 提交于
If we get a keyed wakeup for a aio poll waitqueue and wake can acquire the ctx_lock without spinning we can just complete the iocb straight from the wakeup callback to avoid a context switch. Signed-off-by: NChristoph Hellwig <hch@lst.de> Tested-by: NAvi Kivity <avi@scylladb.com>
-
由 Christoph Hellwig 提交于
Simple one-shot poll through the io_submit() interface. To poll for a file descriptor the application should submit an iocb of type IOCB_CMD_POLL. It will poll the fd for the events specified in the the first 32 bits of the aio_buf field of the iocb. Unlike poll or epoll without EPOLLONESHOT this interface always works in one shot mode, that is once the iocb is completed, it will have to be resubmitted. Signed-off-by: NChristoph Hellwig <hch@lst.de> Tested-by: NAvi Kivity <avi@scylladb.com>
-
由 Christoph Hellwig 提交于
This is needed to prevent races caused by the way the ->poll API works. To avoid introducing overhead for other users of the iocbs we initialize it to zero and only do refcount operations if it is non-zero in the completion path. Signed-off-by: NChristoph Hellwig <hch@lst.de> Tested-by: NAvi Kivity <avi@scylladb.com>
-
- 18 7月, 2018 1 次提交
-
-
由 Christoph Hellwig 提交于
glibc uses a different defintion of sigset_t than the kernel does, and the current version would pull in both. To fix this just do not expose the type at all - this somewhat mirrors pselect() where we do not even have a type for the magic sigmask argument, but just use pointer arithmetics. Fixes: 7a074e96 ("aio: implement io_pgetevents") Signed-off-by: NChristoph Hellwig <hch@lst.de> Reported-by: NAdrian Reber <adrian@lisas.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 12 7月, 2018 2 次提交
-
-
由 Al Viro 提交于
takes inode, vfsmount, name, O_... flags and file_operations and either returns a new struct file (in which case inode reference we held is consumed) or returns ERR_PTR(), in which case no refcounts are altered. converted aio_private_file() and sock_alloc_file() to it Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
... so that it could set both ->f_flags and ->f_mode, without callers having to set ->f_flags manually. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 29 6月, 2018 1 次提交
-
-
由 Linus Torvalds 提交于
The poll() changes were not well thought out, and completely unexplained. They also caused a huge performance regression, because "->poll()" was no longer a trivial file operation that just called down to the underlying file operations, but instead did at least two indirect calls. Indirect calls are sadly slow now with the Spectre mitigation, but the performance problem could at least be largely mitigated by changing the "->get_poll_head()" operation to just have a per-file-descriptor pointer to the poll head instead. That gets rid of one of the new indirections. But that doesn't fix the new complexity that is completely unwarranted for the regular case. The (undocumented) reason for the poll() changes was some alleged AIO poll race fixing, but we don't make the common case slower and more complex for some uncommon special case, so this all really needs way more explanations and most likely a fundamental redesign. [ This revert is a revert of about 30 different commits, not reverted individually because that would just be unnecessarily messy - Linus ] Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 6月, 2018 1 次提交
-
-
由 Christoph Hellwig 提交于
The ->poll_mask() operation has a mask of events that the caller is interested in, but not all implementations might take it into account. Mask the return value to only the requested events, similar to what the poll and epoll code does. Reported-by: NAvi Kivity <avi@scylladb.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 05 6月, 2018 1 次提交
-
-
由 Adam Manzanares 提交于
Previously the value was ignored. Signed-off-by: NAdam Manzanares <adam.manzanares@wdc.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 31 5月, 2018 2 次提交
-
-
由 Adam Manzanares 提交于
This is the per-I/O equivalent of the ioprio_set system call. When IOCB_FLAG_IOPRIO is set on the iocb aio_flags field, then we set the newly added kiocb ki_ioprio field to the value in the iocb aio_reqprio field. This patch depends on block: add ioprio_check_cap function. Signed-off-by: NAdam Manzanares <adam.manzanares@wdc.com> Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Adam Manzanares 提交于
In order to avoid kiocb bloat for per command iopriority support, rw_hint is converted from enum to a u16. Added a guard around ki_hint assignment. Signed-off-by: NAdam Manzanares <adam.manzanares@wdc.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 30 5月, 2018 6 次提交
-
-
由 Al Viro 提交于
as it is, the logics in native io_submit(2) is "if asked for more than LONG_MAX/sizeof(pointer) iocbs to submit, don't bother with more than LONG_MAX/sizeof(pointer)" (i.e. 512M requests on 32bit and 1E requests on 64bit) while compat io_submit(2) goes with "stop after the first PAGE_SIZE/sizeof(pointer) iocbs", i.e. 1K or so. Which is * inconsistent * *way* too much in native case * possibly too little in compat one and * wrong anyway, since the natural point where we ought to stop bothering is ctx->nr_events Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
get rid of insane "copy array of 32bit pointers into an array of native ones" glue. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
The logics for 'avail' is * not past the tail of cyclic buffer * no more than asked * not past the end of buffer * not past the end of a page Unobfuscate the last part. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
... so just make them return 0 when caller does not need to destroy iocb Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
We really want iocb out of io_cancel(2) reach before we start tearing it down. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 29 5月, 2018 1 次提交
-
-
由 Christoph Hellwig 提交于
Looks like this got lost in a merge. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 26 5月, 2018 4 次提交
-
-
由 Christoph Hellwig 提交于
If we can acquire ctx_lock without spinning we can just remove our iocb from the active_reqs list, and thus complete the iocbs from the wakeup context. Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Christoph Hellwig 提交于
Simple one-shot poll through the io_submit() interface. To poll for a file descriptor the application should submit an iocb of type IOCB_CMD_POLL. It will poll the fd for the events specified in the the first 32 bits of the aio_buf field of the iocb. Unlike poll or epoll without EPOLLONESHOT this interface always works in one shot mode, that is once the iocb is completed, it will have to be resubmitted. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
With the current aio code there is no need for the magic KIOCB_CANCELLED value, as a cancelation just kicks the driver to queue the completion ASAP, with all actual completion handling done in another thread. Given that both the completion path and cancelation take the context lock there is no need for magic cmpxchg loops either. If we remove iocbs from the active list after calling ->ki_cancel (but with ctx_lock still held), we can also rely on the invariant thay anything found on the list has a ->ki_cancel callback and can be cancelled, further simplifing the code. Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Christoph Hellwig 提交于
No need to pass the key field to lookup_iocb to compare it with KIOCB_KEY, as we can do that right after retrieving it from userspace. Also move the KIOCB_KEY definition to aio.c as it is an internal value not used by any other place in the kernel. Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 24 5月, 2018 1 次提交
-
-
由 Al Viro 提交于
If io_destroy() gets to cancelling everything that can be cancelled and gets to kiocb_cancel() calling the function driver has left in ->ki_cancel, it becomes vulnerable to a race with IO completion. At that point req is already taken off the list and aio_complete() does *NOT* spin until we (in free_ioctx_users()) releases ->ctx_lock. As the result, it proceeds to kiocb_free(), freing req just it gets passed to ->ki_cancel(). Fix is simple - remove from the list after the call of kiocb_cancel(). All instances of ->ki_cancel() already have to cope with the being called with iocb still on list - that's what happens in io_cancel(2). Cc: stable@kernel.org Fixes: 0460fef2 "aio: use cancellation list lazily" Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 22 5月, 2018 1 次提交
-
-
由 Al Viro 提交于
kill_ioctx() used to have an explicit RCU delay between removing the reference from ->ioctx_table and percpu_ref_kill() dropping the refcount. At some point that delay had been removed, on the theory that percpu_ref_kill() itself contained an RCU delay. Unfortunately, that was the wrong kind of RCU delay and it didn't care about rcu_read_lock() used by lookup_ioctx(). As the result, we could get ctx freed right under lookup_ioctx(). Tejun has fixed that in a6d7cff4 ("fs/aio: Add explicit RCU grace period when freeing kioctx"); however, that fix is not enough. Suppose io_destroy() from one thread races with e.g. io_setup() from another; CPU1 removes the reference from current->mm->ioctx_table[...] just as CPU2 has picked it (under rcu_read_lock()). Then CPU1 proceeds to drop the refcount, getting it to 0 and triggering a call of free_ioctx_users(), which proceeds to drop the secondary refcount and once that reaches zero calls free_ioctx_reqs(). That does INIT_RCU_WORK(&ctx->free_rwork, free_ioctx); queue_rcu_work(system_wq, &ctx->free_rwork); and schedules freeing the whole thing after RCU delay. In the meanwhile CPU2 has gotten around to percpu_ref_get(), bumping the refcount from 0 to 1 and returned the reference to io_setup(). Tejun's fix (that queue_rcu_work() in there) guarantees that ctx won't get freed until after percpu_ref_get(). Sure, we'd increment the counter before ctx can be freed. Now we are out of rcu_read_lock() and there's nothing to stop freeing of the whole thing. Unfortunately, CPU2 assumes that since it has grabbed the reference, ctx is *NOT* going away until it gets around to dropping that reference. The fix is obvious - use percpu_ref_tryget_live() and treat failure as miss. It's not costlier than what we currently do in normal case, it's safe to call since freeing *is* delayed and it closes the race window - either lookup_ioctx() comes before percpu_ref_kill() (in which case ctx->users won't reach 0 until the caller of lookup_ioctx() drops it) or lookup_ioctx() fails, ctx->users is unaffected and caller of lookup_ioctx() doesn't see the object in question at all. Cc: stable@kernel.org Fixes: a6d7cff4 "fs/aio: Add explicit RCU grace period when freeing kioctx" Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 03 5月, 2018 4 次提交
-
-
由 Christoph Hellwig 提交于
This is the io_getevents equivalent of ppoll/pselect and allows to properly mix signals and aio completions (especially with IOCB_CMD_POLL) and atomically executes the following sequence: sigset_t origmask; pthread_sigmask(SIG_SETMASK, &sigmask, &origmask); ret = io_getevents(ctx, min_nr, nr, events, timeout); pthread_sigmask(SIG_SETMASK, &origmask, NULL); Note that unlike many other signal related calls we do not pass a sigmask size, as that would get us to 7 arguments, which aren't easily supported by the syscall infrastructure. It seems a lot less painful to just add a new syscall variant in the unlikely case we're going to increase the sigset size. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
Simple workqueue offload for now, but prepared for adding a real aio_fsync method if the need arises. Based on an earlier patch from Dave Chinner. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
Don't reference the kiocb structure from the common aio code, and move any use of it into helper specific to the read/write path. This is in preparation for aio_poll support that wants to use the space for different fields. Signed-off-by: NChristoph Hellwig <hch@lst.de> Acked-by: NJeff Moyer <jmoyer@redhat.com> Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
If we release the lockdep write protection token before calling into ->write_iter and thus never access the file pointer after an -EIOCBQUEUED return from ->write_iter or ->read_iter we don't need this extra reference. Signed-off-by: NChristoph Hellwig <hch@lst.de>
-