- 02 9月, 2020 21 次提交
-
-
由 Pavel Begunkov 提交于
to #28736503 commit 59960b9deb5354e4cdb0b6ed3a3b653a2b4eb602 upstream Don't leave garbage in req.work before punting async on -EAGAIN in io_iopoll_queue(). [ 140.922099] general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] PREEMPT SMP PTI ... [ 140.922105] RIP: 0010:io_worker_handle_work+0x1db/0x480 ... [ 140.922114] Call Trace: [ 140.922118] ? __next_timer_interrupt+0xe0/0xe0 [ 140.922119] io_wqe_worker+0x2a9/0x360 [ 140.922121] ? _raw_spin_unlock_irqrestore+0x24/0x40 [ 140.922124] kthread+0x12c/0x170 [ 140.922125] ? io_worker_handle_work+0x480/0x480 [ 140.922126] ? kthread_park+0x90/0x90 [ 140.922127] ret_from_fork+0x22/0x30 Fixes: 7cdaf587de7c ("io_uring: avoid whole io_wq_work copy for requests completed inline") Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Bin Yu 提交于
task #28924046 This patch enables extra hardware errata for ARM64 N1 platform and just for aarch64, not for x86. Signed-off-by: NBin Yu <jkchen@linux.alibaba.com> Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
-
由 Catalin Marinas 提交于
task #28924046 [ Upstream commit 27a22fbdeedd6c5c451cf5f830d51782bf50c3a2 ] Clang reports a warning on the __tlbi(aside1is, 0) macro expansion since the value size does not match the register size specified in the inline asm. Construct the ASID value using the __TLBI_VADDR() macro. Fixes: 222fc0c8503d ("arm64: compat: Workaround Neoverse-N1 #1542419 for compat user-space") Reported-by: NNathan Chancellor <natechancellor@gmail.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NBin Yu <jkchen@linux.alibaba.com> Reviewed-by: NBaolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Nzou cao <zoucao@linux.alibaba.com>
-
由 James Morse 提交于
task #28924046 [ Upstream commit 222fc0c8503d98cec3cb2bac2780cdd21a6e31c0 ] Compat user-space is unable to perform ICIMVAU instructions from user-space. Instead it uses a compat-syscall. Add the workaround for Neoverse-N1 #1542419 to this code path. Signed-off-by: NJames Morse <james.morse@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NBin Yu <jkchen@linux.alibaba.com> Reviewed-by: NBaolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Nzou cao <zoucao@linux.alibaba.com>
-
由 James Morse 提交于
task #28924046 [ Upstream commit ee9d90be9dda ] Systems affected by Neoverse-N1 #1542419 support DIC so do not need to perform icache maintenance once new instructions are cleaned to the PoU. For the errata workaround, the kernel hides DIC from user-space, so that the unnecessary cache maintenance can be trapped by firmware. To reduce the number of traps, produce a fake IminLine value based on PAGE_SIZE. Signed-off-by: NJames Morse <james.morse@arm.com> Reviewed-by: NSuzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NBin Yu <jkchen@linux.alibaba.com> Reviewed-by: NBaolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Nzou cao <zoucao@linux.alibaba.com>
-
由 James Morse 提交于
task #28924046 [ Upstream commit 05460849c3b5 ] Cores affected by Neoverse-N1 #1542419 could execute a stale instruction when a branch is updated to point to freshly generated instructions. To workaround this issue we need user-space to issue unnecessary icache maintenance that we can trap. Start by hiding CTR_EL0.DIC. Backport change: Modify the ARM64_WORKAROUND_1542419 from 45 to 37 Reviewed-by: NSuzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: NJames Morse <james.morse@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NBin Yu <jkchen@linux.alibaba.com> Reviewed-by: NBaolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Nzou cao <zoucao@linux.alibaba.com>
-
由 Suzuki K Poulose 提交于
task #28924046 [ Upstream commit 4afe8e79da92 ] When there is a mismatch in the CTR_EL0 field, we trap access to CTR from EL0 on all CPUs to expose the safe value. However, we could skip trapping on a CPU which matches the safe value. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: NSuzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NBin Yu <jkchen@linux.alibaba.com> Reviewed-by: NBaolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Nzou cao <zoucao@linux.alibaba.com>
-
由 James Clark 提交于
task #28924046 [ Upstream commit 9e282b739466 ] The source of the event codes and description text was the Neoverse N1 technical reference manual at: http://infocenter.arm.com/help/topic/com.arm.doc.100616_0301_01_en/neoverse_n1_trm_100616_0301_01_en.pdf The Cortex-A76 shares the same event IDs as the Neoverse N1 and they can be viewed at: https://static.docs.arm.com/100798/0400/cortex_a76_trm_100798_0400_00_en.pdfSigned-off-by: NJames Clark <james.clark@arm.com> Cc: "linux-perf-users@vger.kernel.org" <linux-perf-users@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jeremy Linton <jeremy.linton@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: james clark <james.clark@arm.com> Cc: nd <nd@arm.com> Link: http://lore.kernel.org/lkml/20190902160713.1425-2-james.clark@arm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: NBin Yu <jkchen@linux.alibaba.com> Reviewed-by: NBaolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Nzou cao <zoucao@linux.alibaba.com>
-
由 James Morse 提交于
task #28924046 [ Upstream commit 3276cc248964 ] Neoverse-N1 affected by #1349291 may report an Uncontained RAS Error as Unrecoverable. The kernel's architecture code already considers Unrecoverable errors as fatal as without kernel-first support no further error-handling is possible. Now that KVM attributes SError to the host/guest more precisely the host's architecture code will always handle host errors that become pending during world-switch. Errors misclassified by this errata that affected the guest will be re-injected to the guest as an implementation-defined SError, which can be uncontained. Until kernel-first support is implemented, no workaround is needed for this issue. Signed-off-by: NJames Morse <james.morse@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NBin Yu <jkchen@linux.alibaba.com> Reviewed-by: NBaolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Nzou cao <zoucao@linux.alibaba.com>
-
由 Marc Zyngier 提交于
task #28924046 [ Upstream commit a5325089bd05 ] We already mitigate erratum 1188873 affecting Cortex-A76 and Neoverse-N1 r0p0 to r2p0. It turns out that revisions r0p0 to r3p1 of the same cores are affected by erratum 1418040, which has the same workaround as 1188873. Let's expand the range of affected revisions to match 1418040, and repaint all occurences of 1188873 to 1418040. Whilst we're there, do a bit of reformating in silicon-errata.txt and drop a now unnecessary dependency on ARM_ARCH_TIMER_OOL_WORKAROUND. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NBin Yu <jkchen@linux.alibaba.com> Reviewed-by: NBaolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Nzou cao <zoucao@linux.alibaba.com>
-
由 Marc Zyngier 提交于
task #28924046 [Upstream commit 0f80cad3124f986d0e46c14d46b8da06d87a2bf4] We currently deal with ARM64_ERRATUM_1188873 by always trapping EL0 accesses for both instruction sets. Although nothing wrong comes out of that, people trying to squeeze the last drop of performance from buggy HW find this over the top. Oh well. Let's change the mitigation by flipping the counter enable bit on return to userspace. Non-broken HW gets an extra branch on the fast path, which is hopefully not the end of the world. The arch timer workaround is also removed. Acked-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NWill Deacon <will.deacon@arm.com> Reviewed-by: NBaolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Nzou cao <zoucao@linux.alibaba.com>
-
由 Marc Zyngier 提交于
task #28924046 [ Upstream commit 6989303a3b2d864fd8e17d3fa3365d3e9649a598 ] Neoverse-N1 is also affected by ARM64_ERRATUM_1188873, so let's add it to the list of affected CPUs. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> [will: Update silicon-errata.txt] Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NBin Yu <jkchen@linux.alibaba.com> Reviewed-by: NBaolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Nzou cao <zoucao@linux.alibaba.com>
-
由 Marc Zyngier 提交于
task #28924046 [Upstream commit c2b5bba3967a ] Since ARM64_ERRATUM_1188873 only affects AArch32 EL0, it makes some sense that it should depend on COMPAT. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NBin Yu <jkchen@linux.alibaba.com> Reviewed-by: NBaolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Nzou cao <zoucao@linux.alibaba.com>
-
由 Arnd Bergmann 提交于
task #28924046 [Upstream commit 040f340134751d73bd03ee92fabb992946c55b3d] arm64_1188873_read_cntvct_el0() is protected by the correct CONFIG_ARM64_ERRATUM_1188873 #ifdef, but the only reference to it is also inside of an CONFIG_ARM_ARCH_TIMER_OOL_WORKAROUND section, and causes a warning if that is disabled: drivers/clocksource/arm_arch_timer.c:323:20: error: 'arm64_1188873_read_cntvct_el0' defined but not used [-Werror=unused-function] Since the erratum requires that we always apply the workaround in the timer driver, select that symbol as we do for SoC specific errata. Fixes: 95b861a4a6d9 ("arm64: arch_timer: Add workaround for ARM erratum 1188873") Acked-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NBin Yu <jkchen@linux.alibaba.com> Reviewed-by: NBaolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Nzou cao <zoucao@linux.alibaba.com>
-
由 Marc Zyngier 提交于
task #28924046 [ Upstream commit 95b861a4a6d94f64d5242605569218160ebacdbe ] When running on Cortex-A76, a timer access from an AArch32 EL0 task may end up with a corrupted value or register. The workaround for this is to trap these accesses at EL1/EL2 and execute them there. This only affects versions r0p0, r1p0 and r2p0 of the CPU. Backport change: The patch modifies ARM64_WORKAROUND_1188873 from 35 to 36 and the ARM_CPU_PART_CORTEX_A76 is deleted because a previous patch has been modified. Acked-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NBin Yu <jkchen@linux.alibaba.com> Reviewed-by: NBaolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Nzou cao <zoucao@linux.alibaba.com>
-
由 Marc Zyngier 提交于
task #28924046 [ Upstream commit 0cf57b86859c] New CPU, new part number. You know the drill. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NBin Yu <jkchen@linux.alibaba.com> Reviewed-by: NBaolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Nzou cao <zoucao@linux.alibaba.com>
-
由 Zhiyuan Hou 提交于
fix #28925962 In faas's VPC network mode, we should enable this feature to support multi-tenancy's snat on a node. Signed-off-by: NZhiyuan Hou <zhiyuan2048@linux.alibaba.com> Acked-by: NDust Li <dust.li@linux.alibaba.com>
-
由 Alex Williamson 提交于
to #28892961 commit abafbc551fddede3e0a08dee1dcde08fc0eb8476 upstream. Accessing the disabled memory space of a PCI device would typically result in a master abort response on conventional PCI, or an unsupported request on PCI express. The user would generally see these as a -1 response for the read return data and the write would be silently discarded, possibly with an uncorrected, non-fatal AER error triggered on the host. Some systems however take it upon themselves to bring down the entire system when they see something that might indicate a loss of data, such as this discarded write to a disabled memory space. To avoid this, we want to try to block the user from accessing memory spaces while they're disabled. We start with a semaphore around the memory enable bit, where writers modify the memory enable state and must be serialized, while readers make use of the memory region and can access in parallel. Writers include both direct manipulation via the command register, as well as any reset path where the internal mechanics of the reset may both explicitly and implicitly disable memory access, and manipulation of the MSI-X configuration, where the MSI-X vector table resides in MMIO space of the device. Readers include the read and write file ops to access the vfio device fd offsets as well as memory mapped access. In the latter case, we make use of our new vma list support to zap, or invalidate, those memory mappings in order to force them to be faulted back in on access. Our semaphore usage will stall user access to MMIO spaces across internal operations like reset, but the user might experience new behavior when trying to access the MMIO space while disabled via the PCI command register. Access via read or write while disabled will return -EIO and access via memory maps will result in a SIGBUS. This is expected to be compatible with known use cases and potentially provides better error handling capabilities than present in the hardware, while avoiding the more readily accessible and severe platform error responses that might otherwise occur. Fixes: CVE-2020-12888 Reviewed-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com> [ shile: fixed conflicts in drivers/vfio/pci/vfio_pci.c drivers/vfio/pci/vfio_pci_private.h ] Signed-off-by: NShile Zhang <shile.zhang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Alex Williamson 提交于
to #28892961 commit 11c4cd07ba111a09f49625f9e4c851d83daf0a22 upstream. Rather than calling remap_pfn_range() when a region is mmap'd, setup a vm_ops handler to support dynamic faulting of the range on access. This allows us to manage a list of vmas actively mapping the area that we can later use to invalidate those mappings. The open callback invalidates the vma range so that all tracking is inserted in the fault handler and removed in the close handler. Reviewed-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com> Fixes: CVE-2020-12888 [ shile: fixed conflicts in vfio_pci_private.h ] Signed-off-by: NShile Zhang <shile.zhang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 chenxiangzuo 提交于
fix #27418285 We introduce a boot parametter 'deferred_meminit' for defer page init feature. Default it is disabled, and we can pass 'deferred_meminit' to enable it. Signed-off-by: Nchenxiangzuo <cxz18821786681@linux.alibaba.com> Reviewed-by: NXu Yu <xuyu@linux.alibaba.com> Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Acked-by: NShile Zhang <shile.zhang@linux.alibaba.com>
-
由 Dominik Brodowski 提交于
fix #29051137 commit 5906056e52e9ee5e130d880443e83016f892b5dd upstream While at it, add a few comments which config options #ifdef and #else statements refer to. Fixes: 86d333a8cc7f (cpufreq: intel_pstate: Add base_frequency attribute) Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net> Acked-by: NSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: NTianjia Zhang <tianjia.zhang@linux.alibaba.com> Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
-
- 29 6月, 2020 19 次提交
-
-
由 Xiaoguang Wang 提交于
to #28736503 commit 405a5d2b2762f2a9813efdee93274d4e7bf607a1 upstream Basically IORING_OP_POLL_ADD command and async armed poll handlers for regular commands don't touch io_wq_work, so only REQ_F_WORK_INITIALIZED is set, can we do io_wq_work copy and restore. Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Jens Axboe 提交于
to #28736503 commit 18bceab101adde8f38de76016bc77f3f25cf22f4 upstream Some file descriptors use separate waitqueues for their f_ops->poll() handler, most commonly one for read and one for write. The io_uring poll implementation doesn't work with that, as the 2nd poll_wait() call will cause the io_uring poll request to -EINVAL. This affects (at least) tty devices and /dev/random as well. This is a big problem for event loops where some file descriptors work, and others don't. With this fix, io_uring handles multiple waitqueues. Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Xiaoguang Wang 提交于
to #28736503 commit 65a6543da386838f935d2f03f452c5c0acff2a68 upstream While testing io_uring in arm, we found sometimes io_sq_thread() keeps polling io requests even though there are not inflight io requests in block layer. After some investigations, found a possible race about io_kiocb.flags, see below race codes: 1) in the end of io_write() or io_read() req->flags &= ~REQ_F_NEED_CLEANUP; kfree(iovec); return ret; 2) in io_complete_rw_iopoll() if (res != -EAGAIN) req->flags |= REQ_F_IOPOLL_COMPLETED; In IOPOLL mode, io requests still maybe completed by interrupt, then above codes are not safe, concurrent modifications to req->flags, which is not protected by lock or is not atomic modifications. I also had disassemble io_complete_rw_iopoll() in arm: req->flags |= REQ_F_IOPOLL_COMPLETED; 0xffff000008387b18 <+76>: ldr w0, [x19,#104] 0xffff000008387b1c <+80>: orr w0, w0, #0x1000 0xffff000008387b20 <+84>: str w0, [x19,#104] Seems that the "req->flags |= REQ_F_IOPOLL_COMPLETED;" is load and modification, two instructions, which obviously is not atomic. To fix this issue, add a new iopoll_completed in io_kiocb to indicate whether io request is completed. Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Jens Axboe 提交于
to #28736503 commit 310672552f4aea2ad50704711aa3cdd45f5441e9 upstream If the request is still hashed in io_async_task_func(), then it cannot have been canceled and it's pointless to check. So save that check. Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Xiaoguang Wang 提交于
to #28736503 commit 7cdaf587de7c6f494b8433fded19f7728e70e1ef upstream If requests can be submitted and completed inline, we don't need to initialize whole io_wq_work in io_init_req(), which is an expensive operation, add a new 'REQ_F_WORK_INITIALIZED' to determine whether io_wq_work is initialized and add a helper io_req_init_async(), users must call io_req_init_async() for the first time touching any members of io_wq_work. I use /dev/nullb0 to evaluate performance improvement in my physical machine: modprobe null_blk nr_devices=1 completion_nsec=0 sudo taskset -c 60 fio -name=fiotest -filename=/dev/nullb0 -iodepth=128 -thread -rw=read -ioengine=io_uring -direct=1 -bs=4k -size=100G -numjobs=1 -time_based -runtime=120 before this patch: Run status group 0 (all jobs): READ: bw=724MiB/s (759MB/s), 724MiB/s-724MiB/s (759MB/s-759MB/s), io=84.8GiB (91.1GB), run=120001-120001msec With this patch: Run status group 0 (all jobs): READ: bw=761MiB/s (798MB/s), 761MiB/s-761MiB/s (798MB/s-798MB/s), io=89.2GiB (95.8GB), run=120001-120001msec About 5% improvement. Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Jens Axboe 提交于
to #28736503 commit c5b856255cbc3b664d686a83fa9397a835e063de upstream We can assume that O_NONBLOCK is always honored, even if we don't have a ->read/write_iter() for the file type. Also unify the read/write checking for allowing async punt, having the write side factoring in the REQ_F_NOWAIT flag as well. Cc: stable@vger.kernel.org Fixes: 490e89676a52 ("io_uring: only force async punt if poll based retry can't handle it") Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Pavel Begunkov 提交于
to #28736503 commit f5fa38c59cb0b40633dee5cdf7465801be3e4928 upstream io_uring is the only user of io-wq, and now it uses only io-wq callback for all its requests, namely io_wq_submit_work(). Instead of storing work->runner callback in each instance of io_wq_work, keep it in io-wq itself. pros: - reduces io_wq_work size - more robust -- ->func won't be invalidated with mem{cpy,set}(req) - helps other work Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Pavel Begunkov 提交于
to #28736503 commit d4c81f38522f3e7f4be1b472ef9988d0ed7f3696 upstream Remove io_link_work_cb() -- the last custom work.func. Not the prettiest thing, but works. Instead of queueing a linked timeout in io_link_work_cb() mark a request with REQ_F_QUEUE_TIMEOUT and do enqueueing based on the flag in io_wq_submit_work(). Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Pavel Begunkov 提交于
to #28736503 commit ac45abc0e2a8ed16ecc0eea039fe762ddfefbcad upstream In preparation of getting rid of work.func, this removes almost all custom instances of it, leaving only io_wq_submit_work() and io_link_work_cb(). And the last one will be dealt later. Nothing fancy, just routinely remove *_finish() function and inline what's left. E.g. remove io_fsync_finish() + inline __io_fsync() into io_fsync(). As no users of io_req_cancelled() are left, delete it as well. The patch adds extra switch lookup on cold-ish path, but that's overweighted by nice diffstat and other benefits of the following patches. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Pavel Begunkov 提交于
to #28736503 commit 3af73b286ccee493dc055fc58da02b2dc7a5304d upstream Relying on having a specific work.func is dangerous, even if an opcode handler set it itself. E.g. io_wq_assign_next() can modify it. io_close() sets a custom work.func to indicate that __close_fd_get_file() was already called. Fortunately, there is no bugs with io_wq_assign_next() and close yet. Still, do it safe and always be prepared to be called through io_wq_submit_work(). Zero req->close.put_file in prep, and call __close_fd_get_file() IFF it's NULL. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Denis Efremov 提交于
to #28736503 commit a8c73c1a614f6da6c0b04c393f87447e28cb6de4 upstream Use kvfree() to free the pages and vmas, since they are allocated by kvmalloc_array() in a loop. Fixes: d4ef647510b1 ("io_uring: avoid page allocation warnings") Signed-off-by: NDenis Efremov <efremov@linux.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200605093203.40087-1-efremov@linux.comSigned-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Bijan Mottahedeh 提交于
to #28736503 commit efe68c1ca8f49e8c06afd74b699411bfbb8ba1ff upstream Account for the number of provided buffers when validating the address range. Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Jens Axboe 提交于
to #28736503 commit dddb3e26f6d88c5344d28cb5ff9d3d6fa05c4f7a upstream We already have the buffer selected, but we should set the iter list again. Cc: stable@vger.kernel.org # v5.7 Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Pavel Begunkov 提交于
to #28736503 commit d2b6f48b691ed67569786c332f0173b918d3fd1b upstream Fail recv/send in case of IORING_SETUP_IOPOLL earlier during prep, so it'd be done only once. Removes duplication as well Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Pavel Begunkov 提交于
to #28736503 commit ec65fea5a8d7a82d3137dd2a44197eb577da111f upstream io_openat_prep() and io_openat2_prep() are identical except for how struct open_how is built. Deduplicate it with a helper. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Pavel Begunkov 提交于
to #28736503 commit 25e72d1012b30bdff712b563e6141a4f311d28d6 upstream build_open_how() is just adjusting open_flags/mode. Do it once during prep. It looks better than storing raw values for the future. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Pavel Begunkov 提交于
to #28736503 commit 3232dd02af65f2d01be641120d2a710176b0c7a7 upstream IORING_SETUP_IOPOLL is defined only for read/write, other opcodes should be disallowed, otherwise it'll get an error as below. Also refuse open/close with SQPOLL, as the polling thread wouldn't know which file table to use. RIP: 0010:io_iopoll_getevents+0x111/0x5a0 Call Trace: ? _raw_spin_unlock_irqrestore+0x24/0x40 ? do_send_sig_info+0x64/0x90 io_iopoll_reap_events.part.0+0x5e/0xa0 io_ring_ctx_wait_and_kill+0x132/0x1c0 io_uring_release+0x20/0x30 __fput+0xcd/0x230 ____fput+0xe/0x10 task_work_run+0x67/0xa0 do_exit+0x353/0xb10 ? handle_mm_fault+0xd4/0x200 ? syscall_trace_enter+0x18c/0x2c0 do_group_exit+0x43/0xa0 __x64_sys_exit_group+0x18/0x20 do_syscall_64+0x60/0x1e0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> [axboe: allow provide/remove buffers and files update] Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Jens Axboe 提交于
to #28736503 commit fd2206e4e97b5bae422d9f2f9ebbc79bc97e44a5 upstream A previous commit enabled this functionality, which also enabled O_PATH to work correctly with io_uring. But we can't safely close the ring itself, as the file handle isn't reference counted inside io_uring_enter(). Instead of jumping through hoops to enable ring closure, add a "soft" ->needs_file option, ->needs_file_no_error. This enables O_PATH file descriptors to work, but still catches the case of trying to close the ring itself. Reported-by: NJann Horn <jannh@google.com> Fixes: 904fbcb115c8 ("io_uring: remove 'fd is io_uring' from close path") Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Pavel Begunkov 提交于
to #28736503 commit 7b53d59859bc932b37895d2d37388e7fa29af7a5 upstream Overflowed requests in io_uring_cancel_files() should be shed only of inflight and overflowed refs. All other left references are owned by someone else. If refcount_sub_and_test() fails, it will go further and put put extra ref, don't do that. Also, don't need to do io_wq_cancel_work() for overflowed reqs, they will be let go shortly anyway. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-