- 09 9月, 2017 1 次提交
-
-
由 Jérôme Glisse 提交于
Introduce a new migration mode that allow to offload the copy to a device DMA engine. This changes the workflow of migration and not all address_space migratepage callback can support this. This is intended to be use by migrate_vma() which itself is use for thing like HMM (see include/linux/hmm.h). No additional per-filesystem migratepage testing is needed. I disables MIGRATE_SYNC_NO_COPY in all problematic migratepage() callback and i added comment in those to explain why (part of this patch). The commit message is unclear it should say that any callback that wish to support this new mode need to be aware of the difference in the migration flow from other mode. Some of these callbacks do extra locking while copying (aio, zsmalloc, balloon, ...) and for DMA to be effective you want to copy multiple pages in one DMA operations. But in the problematic case you can not easily hold the extra lock accross multiple call to this callback. Usual flow is: For each page { 1 - lock page 2 - call migratepage() callback 3 - (extra locking in some migratepage() callback) 4 - migrate page state (freeze refcount, update page cache, buffer head, ...) 5 - copy page 6 - (unlock any extra lock of migratepage() callback) 7 - return from migratepage() callback 8 - unlock page } The new mode MIGRATE_SYNC_NO_COPY: 1 - lock multiple pages For each page { 2 - call migratepage() callback 3 - abort in all problematic migratepage() callback 4 - migrate page state (freeze refcount, update page cache, buffer head, ...) } // finished all calls to migratepage() callback 5 - DMA copy multiple pages 6 - unlock all the pages To support MIGRATE_SYNC_NO_COPY in the problematic case we would need a new callback migratepages() (for instance) that deals with multiple pages in one transaction. Because the problematic cases are not important for current usage I did not wanted to complexify this patchset even more for no good reason. Link: http://lkml.kernel.org/r/20170817000548.32038-14-jglisse@redhat.comSigned-off-by: NJérôme Glisse <jglisse@redhat.com> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Nellans <dnellans@nvidia.com> Cc: Evgeny Baskakov <ebaskakov@nvidia.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mark Hairgrove <mhairgrove@nvidia.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Sherry Cheung <SCheung@nvidia.com> Cc: Subhash Gutti <sgutti@nvidia.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Bob Liu <liubo95@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 9月, 2017 1 次提交
-
-
Currently, aio-nr is incremented in steps of 'num_possible_cpus() * 8' for io_setup(nr_events, ..) with 'nr_events < num_possible_cpus() * 4': ioctx_alloc() ... nr_events = max(nr_events, num_possible_cpus() * 4); nr_events *= 2; ... ctx->max_reqs = nr_events; ... aio_nr += ctx->max_reqs; .... This limits the number of aio contexts actually available to much less than aio-max-nr, and is increasingly worse with greater number of CPUs. For example, with 64 CPUs, only 256 aio contexts are actually available (with aio-max-nr = 65536) because the increment is 512 in that scenario. Note: 65536 [max aio contexts] / (64*4*2) [increment per aio context] is 128, but make it 256 (double) as counting against 'aio-max-nr * 2': ioctx_alloc() ... if (aio_nr + nr_events > (aio_max_nr * 2UL) || ... goto err_ctx; ... This patch uses the original value of nr_events (from userspace) to increment aio-nr and count against aio-max-nr, which resolves those. Signed-off-by: NMauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Reported-by: NLekshmi C. Pillai <lekshmi.cpillai@in.ibm.com> Tested-by: NLekshmi C. Pillai <lekshmi.cpillai@in.ibm.com> Tested-by: NPaul Nguyen <nguyenp@us.ibm.com> Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>
-
- 28 6月, 2017 1 次提交
-
-
由 Jens Axboe 提交于
Reviewed-by: NAndreas Dilger <adilger@dilger.ca> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 20 6月, 2017 2 次提交
-
-
由 Goldwyn Rodrigues 提交于
RWF_NOWAIT informs kernel to bail out if an AIO request will block for reasons such as file allocations, or a writeback triggered, or would block while allocating requests while performing direct I/O. RWF_NOWAIT is translated to IOCB_NOWAIT for iocb->ki_flags. FMODE_AIO_NOWAIT is a flag which identifies the file opened is capable of returning -EAGAIN if the AIO call will block. This must be set by supporting filesystems in the ->open() call. Filesystems xfs, btrfs and ext4 would be supported in the following patches. Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Goldwyn Rodrigues 提交于
aio_rw_flags is introduced in struct iocb (using aio_reserved1) which will carry the RWF_* flags. We cannot use aio_flags because they are not checked for validity which may break existing applications. Note, the only place RWF_HIPRI comes in effect is dio_await_one(). All the rest of the locations, aio code return -EIOCBQUEUED before the checks for RWF_HIPRI. Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 02 3月, 2017 1 次提交
-
-
由 Ingo Molnar 提交于
sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> Fix up affected files that include this signal functionality via sched.h. Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 25 2月, 2017 1 次提交
-
-
由 Mike Rapoport 提交于
When a non-cooperative userfaultfd monitor copies pages in the background, it may encounter regions that were already unmapped. Addition of UFFD_EVENT_UNMAP allows the uffd monitor to track precisely changes in the virtual memory layout. Since there might be different uffd contexts for the affected VMAs, we first should create a temporary representation for the unmap event for each uffd context and then notify them one by one to the appropriate userfault file descriptors. The event notification occurs after the mmap_sem has been released. [arnd@arndb.de: fix nommu build] Link: http://lkml.kernel.org/r/20170203165141.3665284-1-arnd@arndb.de [mhocko@suse.com: fix nommu build] Link: http://lkml.kernel.org/r/20170202091503.GA22823@dhcp22.suse.cz Link: http://lkml.kernel.org/r/1485542673-24387-3-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NArnd Bergmann <arnd@arndb.de> Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 20 2月, 2017 1 次提交
-
-
由 Miklos Szeredi 提交于
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
- 15 1月, 2017 1 次提交
-
-
由 Shaohua Li 提交于
lockdep reports a warnning. file_start_write/file_end_write only acquire/release the lock for regular files. So checking the files in aio side too. [ 453.532141] ------------[ cut here ]------------ [ 453.533011] WARNING: CPU: 1 PID: 1298 at ../kernel/locking/lockdep.c:3514 lock_release+0x434/0x670 [ 453.533011] DEBUG_LOCKS_WARN_ON(depth <= 0) [ 453.533011] Modules linked in: [ 453.533011] CPU: 1 PID: 1298 Comm: fio Not tainted 4.9.0+ #964 [ 453.533011] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.0-1.fc24 04/01/2014 [ 453.533011] ffff8803a24b7a70 ffffffff8196cffb ffff8803a24b7ae8 0000000000000000 [ 453.533011] ffff8803a24b7ab8 ffffffff81091ee1 ffff8803a5dba700 00000dba00000008 [ 453.533011] ffffed0074496f59 ffff8803a5dbaf54 ffff8803ae0f8488 fffffffffffffdef [ 453.533011] Call Trace: [ 453.533011] [<ffffffff8196cffb>] dump_stack+0x67/0x9c [ 453.533011] [<ffffffff81091ee1>] __warn+0x111/0x130 [ 453.533011] [<ffffffff81091f97>] warn_slowpath_fmt+0x97/0xb0 [ 453.533011] [<ffffffff81091f00>] ? __warn+0x130/0x130 [ 453.533011] [<ffffffff8191b789>] ? blk_finish_plug+0x29/0x60 [ 453.533011] [<ffffffff811205d4>] lock_release+0x434/0x670 [ 453.533011] [<ffffffff8198af94>] ? import_single_range+0xd4/0x110 [ 453.533011] [<ffffffff81322195>] ? rw_verify_area+0x65/0x140 [ 453.533011] [<ffffffff813aa696>] ? aio_write+0x1f6/0x280 [ 453.533011] [<ffffffff813aa6c9>] aio_write+0x229/0x280 [ 453.533011] [<ffffffff813aa4a0>] ? aio_complete+0x640/0x640 [ 453.533011] [<ffffffff8111df20>] ? debug_check_no_locks_freed+0x1a0/0x1a0 [ 453.533011] [<ffffffff8114793a>] ? debug_lockdep_rcu_enabled.part.2+0x1a/0x30 [ 453.533011] [<ffffffff81147985>] ? debug_lockdep_rcu_enabled+0x35/0x40 [ 453.533011] [<ffffffff812a92be>] ? __might_fault+0x7e/0xf0 [ 453.533011] [<ffffffff813ac9bc>] do_io_submit+0x94c/0xb10 [ 453.533011] [<ffffffff813ac2ae>] ? do_io_submit+0x23e/0xb10 [ 453.533011] [<ffffffff813ac070>] ? SyS_io_destroy+0x270/0x270 [ 453.533011] [<ffffffff8111d7b3>] ? mark_held_locks+0x23/0xc0 [ 453.533011] [<ffffffff8100201a>] ? trace_hardirqs_on_thunk+0x1a/0x1c [ 453.533011] [<ffffffff813acb90>] SyS_io_submit+0x10/0x20 [ 453.533011] [<ffffffff824f96aa>] entry_SYSCALL_64_fastpath+0x18/0xad [ 453.533011] [<ffffffff81119190>] ? trace_hardirqs_off_caller+0xc0/0x110 [ 453.533011] ---[ end trace b2fbe664d1cc0082 ]--- Cc: Dmitry Monakhov <dmonakhov@openvz.org> Cc: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 26 12月, 2016 1 次提交
-
-
由 Thomas Gleixner 提交于
ktime is a union because the initial implementation stored the time in scalar nanoseconds on 64 bit machine and in a endianess optimized timespec variant for 32bit machines. The Y2038 cleanup removed the timespec variant and switched everything to scalar nanoseconds. The union remained, but become completely pointless. Get rid of the union and just keep ktime_t as simple typedef of type s64. The conversion was done with coccinelle and some manual mopping up. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
-
- 25 12月, 2016 1 次提交
-
-
由 Linus Torvalds 提交于
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 23 12月, 2016 1 次提交
-
-
由 Al Viro 提交于
... and fix the minor buglet in compat io_submit() - native one kills ioctx as cleanup when put_user() fails. Get rid of bogus compat_... in !CONFIG_AIO case, while we are at it - they should simply fail with ENOSYS, same as for native counterparts. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 05 12月, 2016 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 31 10月, 2016 4 次提交
-
-
由 Jan Kara 提交于
Currently we dropped freeze protection of aio writes just after IO was submitted. Thus aio write could be in flight while the filesystem was frozen and that could result in unexpected situation like aio completion wanting to convert extent type on frozen filesystem. Testcase from Dmitry triggering this is like: for ((i=0;i<60;i++));do fsfreeze -f /mnt ;sleep 1;fsfreeze -u /mnt;done & fio --bs=4k --ioengine=libaio --iodepth=128 --size=1g --direct=1 \ --runtime=60 --filename=/mnt/file --name=rand-write --rw=randwrite Fix the problem by dropping freeze protection only once IO is completed in aio_complete(). Reported-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: NJan Kara <jack@suse.cz> [hch: forward ported on top of various VFS and aio changes] Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
Pass the ABI iocb structure to aio_setup_rw and let it handle the non-vectored I/O case as well. With that and a new helper for the AIO return value handling we can now define new aio_read and aio_write helpers that implement reads and writes in a self-contained way without duplicating too much code. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
Otherwise we might dereference an already freed file and/or inode when aio_complete is called before we return from the read_iter or write_iter method. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 28 9月, 2016 1 次提交
-
-
由 Rasmus Villemoes 提交于
Using a local variable we can prevent gcc from reloading aio_ring_file->f_inode->i_mapping twice, eliminating 2x2 dependent loads. Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 16 9月, 2016 1 次提交
-
-
由 Jann Horn 提交于
This ensures that do_mmap() won't implicitly make AIO memory mappings executable if the READ_IMPLIES_EXEC personality flag is set. Such behavior is problematic because the security_mmap_file LSM hook doesn't catch this case, potentially permitting an attacker to bypass a W^X policy enforced by SELinux. I have tested the patch on my machine. To test the behavior, compile and run this: #define _GNU_SOURCE #include <unistd.h> #include <sys/personality.h> #include <linux/aio_abi.h> #include <err.h> #include <stdlib.h> #include <stdio.h> #include <sys/syscall.h> int main(void) { personality(READ_IMPLIES_EXEC); aio_context_t ctx = 0; if (syscall(__NR_io_setup, 1, &ctx)) err(1, "io_setup"); char cmd[1000]; sprintf(cmd, "cat /proc/%d/maps | grep -F '/[aio]'", (int)getpid()); system(cmd); return 0; } In the output, "rw-s" is good, "rwxs" is bad. Signed-off-by: NJann Horn <jann@thejh.net> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 24 5月, 2016 1 次提交
-
-
由 Michal Hocko 提交于
aio_setup_ring waits for mmap_sem in writable mode. If the waiting task gets killed by the oom killer it would block oom_reaper from asynchronous address space reclaim and reduce the chances of timely OOM resolving. Wait for the lock in the killable mode and return with EINTR if the task got killed while waiting. This will also expedite the return to the userspace and do_exit. Signed-off-by: NMichal Hocko <mhocko@suse.com> Acked-by: NJeff Moyer <jmoyer@redhat.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Benamin LaHaise <bcrl@kvack.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 4月, 2016 1 次提交
-
-
由 Al Viro 提交于
the value is never used after that point Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 05 9月, 2015 1 次提交
-
-
由 Oleg Nesterov 提交于
vma->vm_ops->mremap() looks more natural and clean in move_vma(), and this way ->mremap() can have more users. Say, vdso. While at it, s/aio_ring_remap/aio_ring_mremap/. Note: this is the minimal change before ->mremap() finds another user in file_operations; this method should have more arguments, and it can be used to kill arch_remap(). Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NPavel Emelyanov <xemul@parallels.com> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Hugh Dickins <hughd@google.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 4月, 2015 1 次提交
-
-
由 Jens Axboe 提交于
exit_aio() currently serializes killing io contexts. Each context killing ends up having to do percpu_ref_kill(), which in turns has to wait for an RCU grace period. This can take a long time, depending on the number of contexts. And there's no point in doing them serially, when we could be waiting for all of them in one fell swoop. This patches makes my fio thread offload test case exit 0.2s instead of almost 6s. Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 12 4月, 2015 8 次提交
-
-
由 Al Viro 提交于
... avoiding write_iter/fcntl races. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
no remaining users Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
We check if ->ki_pos is positive. However, by that point we have already done rw_verify_area(), which would have rejected such unless the file had been one of /dev/mem, /dev/kmem and /proc/kcore. All of which do not have vectored rw methods, so we would've bailed out even earlier. This check had been introduced before rw_verify_area() had been added there - in fact, it was a subset of checks done on sync paths by rw_verify_area() (back then the /dev/mem exception didn't exist at all). The rest of checks (mandatory locking, etc.) hadn't been added until later. Unfortunately, by the time the call of rw_verify_area() got added, the /dev/mem exception had already appeared, so it wasn't obvious that the older explicit check downstream had become dead code. It *is* a dead code, though, since the few files for which the exception applies do not have ->aio_{read,write}() or ->{read,write}_iter() and for them we won't reach that check anyway. What's more, even if we ever introduce vectored methods for /dev/mem and friends, they'll have to cope with negative positions anyway, since readv(2) and writev(2) are using the same checks as read(2) and write(2) - i.e. rw_verify_area(). Let's bury it. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Way, way back kiocb used to be picked from arrays, so ioctx_alloc() checked for multiplication overflow when calculating the size of such array. By the time fs/aio.c went into the tree (in 2002) they were already allocated one-by-one by kmem_cache_alloc(), so that check had already become pointless. Let's bury it... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
identical to import_single_range() Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
We don't need req in either of those. We don't need nr_segs in caller. We don't really need len in caller either - iov_iter_count(&iter) will do. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
the only non-trivial detail is that we do it before rw_verify_area(), so we'd better cap the length ourselves in aio_setup_single_rw() case (for vectored case rw_copy_check_uvector() will do that for us). Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 07 4月, 2015 2 次提交
-
-
由 Al Viro 提交于
If we fail past the aio_setup_ring(), we need to destroy the mapping. We don't need to care about anybody having found ctx, or added requests to it, since the last failure exit is exactly the failure to make ctx visible to lookups. Reproducer (based on one by Joe Mario <jmario@redhat.com>): void count(char *p) { char s[80]; printf("%s: ", p); fflush(stdout); sprintf(s, "/bin/cat /proc/%d/maps|/bin/fgrep -c '/[aio] (deleted)'", getpid()); system(s); } int main() { io_context_t *ctx; int created, limit, i, destroyed; FILE *f; count("before"); if ((f = fopen("/proc/sys/fs/aio-max-nr", "r")) == NULL) perror("opening aio-max-nr"); else if (fscanf(f, "%d", &limit) != 1) fprintf(stderr, "can't parse aio-max-nr\n"); else if ((ctx = calloc(limit, sizeof(io_context_t))) == NULL) perror("allocating aio_context_t array"); else { for (i = 0, created = 0; i < limit; i++) { if (io_setup(1000, ctx + created) == 0) created++; } for (i = 0, destroyed = 0; i < created; i++) if (io_destroy(ctx[i]) == 0) destroyed++; printf("created %d, failed %d, destroyed %d\n", created, limit - created, destroyed); count("after"); } } Found-by: NJoe Mario <jmario@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
teach ->mremap() method to return an error and have it fail for aio mappings in process of being killed Note that in case of ->mremap() failure we need to undo move_page_tables() we'd already done; we could call ->mremap() first, but then the failure of move_page_tables() would require undoing whatever _successful_ ->mremap() has done, which would be a lot more headache in general. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 14 3月, 2015 2 次提交
-
-
由 Christoph Hellwig 提交于
Most callers in the kernel want to perform synchronous file I/O, but still have to bloat the stack with a full struct kiocb. Split out the parts needed in filesystem code from those in the aio code, and only allocate those needed to pass down argument on the stack. The aio code embedds the generic iocb in the one it allocates and can easily get back to it by using container_of. Also add a ->ki_complete method to struct kiocb, this is used to call into the aio code and thus removes the dependency on aio for filesystems impementing asynchronous operations. It will also allow other callers to substitute their own completion callback. We also add a new ->ki_flags field to work around the nasty layering violation recently introduced in commit 5e33f6 ("usb: gadget: ffs: add eventfd notification about ffs events"). Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
The AIO interface is fairly complex because it tries to allow filesystems to always work async and then wakeup a synchronous caller through aio_complete. It turns out that basically no one was doing this to avoid the complexity and context switches, and we've already fixed up the remaining users and can now get rid of this case. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 13 3月, 2015 1 次提交
-
-
由 Christoph Hellwig 提交于
There is no need to pass the total request length in the kiocb, as we already get passed in through the iov_iter argument. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 20 2月, 2015 1 次提交
-
-
由 Kinglong Mee 提交于
Have defined pr_fmt as below in fs/aio.c, so remove duplicate function name in pr_debug message. #define pr_fmt(fmt) "%s: " fmt, __func__ Signed-off-by: NKinglong Mee <kinglongmee@gmail.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 04 2月, 2015 1 次提交
-
-
由 Dave Chinner 提交于
Under CONFIG_DEBUG_ATOMIC_SLEEP=y, aio_read_event_ring() will throw warnings like the following due to being called from wait_event context: WARNING: CPU: 0 PID: 16006 at kernel/sched/core.c:7300 __might_sleep+0x7f/0x90() do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff810d85a3>] prepare_to_wait_event+0x63/0x110 Modules linked in: CPU: 0 PID: 16006 Comm: aio-dio-fcntl-r Not tainted 3.19.0-rc6-dgc+ #705 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 ffffffff821c0372 ffff88003c117cd8 ffffffff81daf2bd 000000000000d8d8 ffff88003c117d28 ffff88003c117d18 ffffffff8109beda ffff88003c117cf8 ffffffff821c115e 0000000000000061 0000000000000000 00007ffffe4aa300 Call Trace: [<ffffffff81daf2bd>] dump_stack+0x4c/0x65 [<ffffffff8109beda>] warn_slowpath_common+0x8a/0xc0 [<ffffffff8109bf56>] warn_slowpath_fmt+0x46/0x50 [<ffffffff810d85a3>] ? prepare_to_wait_event+0x63/0x110 [<ffffffff810d85a3>] ? prepare_to_wait_event+0x63/0x110 [<ffffffff810bdfcf>] __might_sleep+0x7f/0x90 [<ffffffff81db8344>] mutex_lock+0x24/0x45 [<ffffffff81216b7c>] aio_read_events+0x4c/0x290 [<ffffffff81216fac>] read_events+0x1ec/0x220 [<ffffffff810d8650>] ? prepare_to_wait_event+0x110/0x110 [<ffffffff810fdb10>] ? hrtimer_get_res+0x50/0x50 [<ffffffff8121899d>] SyS_io_getevents+0x4d/0xb0 [<ffffffff81dba5a9>] system_call_fastpath+0x12/0x17 ---[ end trace bde69eaf655a4fea ]--- There is not actually a bug here, so annotate the code to tell the debug logic that everything is just fine and not to fire a false positive. Signed-off-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>
-
- 21 1月, 2015 2 次提交
-
-
由 Christoph Hellwig 提交于
Now that we never use the backing_dev_info pointer in struct address_space we can simply remove it and save 4 to 8 bytes in every inode. Signed-off-by: NChristoph Hellwig <hch@lst.de> Acked-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Reviewed-by: NTejun Heo <tj@kernel.org> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Christoph Hellwig 提交于
Since "BDI: Provide backing device capability information [try #3]" the backing_dev_info structure also provides flags for the kind of mmap operation available in a nommu environment, which is entirely unrelated to it's original purpose. Introduce a new nommu-only file operation to provide this information to the nommu mmap code instead. Splitting this from the backing_dev_info structure allows to remove lots of backing_dev_info instance that aren't otherwise needed, and entirely gets rid of the concept of providing a backing_dev_info for a character device. It also removes the need for the mtd_inodefs filesystem. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NTejun Heo <tj@kernel.org> Acked-by: NBrian Norris <computersforpeace@gmail.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-