Commits · 64d14fe581202e2c42915216dc1b53132f9d82f8 · openanolis / cloud-kernel

Sep 02, 2020: 31 commits

io_uring: name sq thread and ref completions · 64d14fe5

Committed by Jens Axboe on Jun 28, 2020

to #28736503

commit 0f158b4cf20e7983d5b33878a6aad118cfac4f05 upstream

We used to have three completions, now we just have two. With the two,
let's not allocate them dynamically, just embed them in the ctx and
name them appropriately.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>


io_uring: acquire 'mm' for task_work for SQPOLL · f0f8d236

Committed by Jens Axboe on Jun 28, 2020

to #28736503

commit 9d8426a09195e2dcf2aa249de2aaadd792d491c7 upstream

If we're unlucky with timing, we could be running task_work after
having dropped the memory context in the sq thread. Since dropping
the context requires a runnable task state, we cannot reliably drop
it as part of our check-for-work loop in io_sq_thread(). Instead,
abstract out the mm acquire for the sq thread into a helper, and call
it from the async task work handler.

Cc: stable@vger.kernel.org # v5.7
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>


io_uring: add memory barrier to synchronize io_kiocb's result and iopoll_completed · a1db6be6

Committed by Xiaoguang Wang on Jun 28, 2020

to #28736503

commit bbde017a32b32d2fa8d5fddca25fade20132abf8 upstream

In io_complete_rw_iopoll(), the stores to io_kiocb's result and
iopoll_completed are two independent store operations. To ensure that
once iopoll_completed is seen as true, req->result is also visible to
the CPU executing io_do_iopoll(), a proper memory barrier must be used.

And in io_do_iopoll(), we check whether req->result is EAGAIN, if it is,
we'll need to issue this io request using io-wq again. In order to just
issue a single smp_rmb() on the completion side, move the re-submit work
to io_iopoll_complete().
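The store/load pairing described above can be illustrated in user space with C11 atomics. This is a minimal sketch under stated assumptions, not the kernel implementation: struct req, complete_req(), poll_req() and wait_for_result() are hypothetical names, and release/acquire atomics stand in for the kernel's smp_wmb()/smp_rmb() pair.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for io_kiocb: a plain result plus a completion flag. */
struct req {
    long result;                  /* written before the flag is published */
    atomic_bool iopoll_completed; /* published last */
};

/* Completion side (cf. io_complete_rw_iopoll()): store the result, then
 * publish the flag with release semantics so the result store cannot be
 * reordered after it (the role smp_wmb() plays in the kernel). */
static void complete_req(struct req *r, long res)
{
    r->result = res;
    atomic_store_explicit(&r->iopoll_completed, true, memory_order_release);
}

/* Polling side (cf. io_do_iopoll()): once the flag is observed with
 * acquire semantics (the role of smp_rmb()), the result is visible too. */
static bool poll_req(struct req *r, long *out)
{
    if (!atomic_load_explicit(&r->iopoll_completed, memory_order_acquire))
        return false;
    *out = r->result;
    return true;
}

static void *completer(void *arg)
{
    complete_req(arg, 4096);
    return NULL;
}

/* Complete a request on another thread and busy-poll for its result. */
static long wait_for_result(void)
{
    struct req r;
    pthread_t t;
    long out;

    r.result = 0;
    atomic_init(&r.iopoll_completed, false);
    if (pthread_create(&t, NULL, completer, &r) != 0)
        return -1;
    while (!poll_req(&r, &out))
        ; /* spin, as IOPOLL does */
    pthread_join(t, NULL);
    return out;
}
```

Without the release/acquire pair, a weakly ordered CPU such as arm64 could let the poller observe the flag before the result.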

Cc: stable@vger.kernel.org
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
[axboe: don't set ->iopoll_completed for -EAGAIN retry]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>


io_uring: don't fail links for EAGAIN error in IOPOLL mode · b0dc0883

Committed by Xiaoguang Wang on Jun 28, 2020

to #28736503

commit 2d7d67920e5c8e0854df23ca77da2dd5880ce5dd upstream

In IOPOLL mode, for an EAGAIN error we'll try to submit the io request
again using io-wq, so don't fail the rest of the links if this io
request has links.

Cc: stable@vger.kernel.org
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>


io_uring: cancel by ->task not pid · aa04321d

Committed by Pavel Begunkov on Jun 28, 2020

to #28736503

commit 801dd57bd1d8c2c253f43635a3045bfa32a810b3 upstream

For an exiting process it tries to cancel all its inflight requests. Use
req->task to match such instead of work.pid. We always have req->task
set, and it will be valid because we're matching only current exiting
task.

Also, remove work.pid and everything related, it's useless now.
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>


io_uring: lazy get task · 090d6c4e

Committed by Pavel Begunkov on Jun 28, 2020

to #28736503

commit 4dd2824d6d5914949b5fe589538bc2622d84c5dd upstream

There will be multiple places where req->task is used, so refcount-pin
it lazily with introduced *io_{get,put}_req_task(). We need to always
have valid ->task for cancellation reasons, but don't care about pinning
it in some cases. That's why it sets req->task in io_req_init() and
implements get/put laziness with a flag.

This also removes using @current from polling io_arm_poll_handler(),
etc., but doesn't change observable behaviour.
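The lazy pinning scheme can be sketched in plain C. This is a hypothetical model, not the io_uring code: struct task, struct req, req_init(), req_get_task() and req_put_task() are illustrative names, a plain int stands in for the task refcount, and the flag is modelled on the REQ_F_TASK_PINNED idea the message describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for task_struct and io_kiocb. */
struct task { int usage; };          /* reference count */

#define REQ_F_TASK_PINNED (1u << 0)  /* set once a reference is held */

struct req {
    struct task *task;   /* always set at init, for cancellation matching */
    unsigned int flags;
};

static void req_init(struct req *req, struct task *current)
{
    req->task = current; /* recorded, but not pinned yet */
    req->flags = 0;
}

/* Take a reference only the first time a path actually needs one. */
static void req_get_task(struct req *req)
{
    if (req->flags & REQ_F_TASK_PINNED)
        return;
    req->task->usage++;
    req->flags |= REQ_F_TASK_PINNED;
}

/* Drop the reference only if one was taken. */
static void req_put_task(struct req *req)
{
    if (!(req->flags & REQ_F_TASK_PINNED))
        return;
    req->task->usage--;
    req->flags &= ~REQ_F_TASK_PINNED;
}
```

Requests that never reach a path needing the task keep the pointer for matching but never touch the refcount.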
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>


io_uring: batch cancel in io_uring_cancel_files() · 037c3f51

Committed by Pavel Begunkov on Jun 28, 2020

to #28736503

commit 67c4d9e693e3bb7fb968af24e3584f821a78ba56 upstream

Instead of waiting for each request one by one, first try to cancel all
of them in a batched manner, and then go over inflight_list/etc to reap
leftovers.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>


io_uring: cancel all task's requests on exit · f4c5149f

Committed by Pavel Begunkov on Jun 28, 2020

to #28736503

commit 44e728b8aae0bb6d4229129083974f9dea43f50b upstream

If a process is going away, io_uring_flush() will cancel only 1
request with a matching pid. Cancel all of them.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>


io-wq: add an option to cancel all matched reqs · 4b786826

Committed by Pavel Begunkov on Jun 28, 2020

to #28736503

commit 4f26bda1522c35d2701fc219368c7101c17005c1 upstream

This adds support for cancelling all io-wq works matching a predicate.
It isn't used yet, so no change in observable behaviour.
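Predicate-based cancellation can be sketched as follows. This is a hypothetical model, not the io-wq API: struct work, work_match_fn, match_task() and cancel_matching() are illustrative names. Matching is by owner task pointer rather than pid (as in "io_uring: cancel by ->task not pid"), and pending works are scanned before running ones (as in "io-wq: reorder cancellation pending -> running"); a running work is only flagged, since it cannot simply be dequeued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum work_state { WORK_PENDING, WORK_RUNNING, WORK_CANCELLED };

struct work {
    const void *task;       /* owner, matched by pointer rather than pid */
    enum work_state state;
    bool cancel_requested;  /* running work is only signalled, not removed */
};

typedef bool (*work_match_fn)(const struct work *w, const void *data);

static bool match_task(const struct work *w, const void *data)
{
    return w->task == data;
}

/* Cancel works matching @match: pending ones first, then running ones.
 * If @cancel_all is false, stop at the first match.
 * Returns the number of works cancelled or signalled. */
static size_t cancel_matching(struct work *works, size_t n,
                              work_match_fn match, const void *data,
                              bool cancel_all)
{
    size_t hits = 0;

    /* Pass 1: pending works can be removed outright. */
    for (size_t i = 0; i < n; i++) {
        if (works[i].state == WORK_PENDING && match(&works[i], data)) {
            works[i].state = WORK_CANCELLED;
            hits++;
            if (!cancel_all)
                return hits;
        }
    }
    /* Pass 2: running works can only be asked to stop. */
    for (size_t i = 0; i < n; i++) {
        if (works[i].state == WORK_RUNNING && match(&works[i], data)) {
            works[i].cancel_requested = true;
            hits++;
            if (!cancel_all)
                return hits;
        }
    }
    return hits;
}
```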
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>


io-wq: reorder cancellation pending -> running · 570a7c05

Committed by Pavel Begunkov on Jun 28, 2020

to #28736503

commit f4c2665e33f48904f2766d644df33fb3fd54b5ec upstream

Go over all pending lists and cancel works there, and only then
try to match running requests. No functional changes here, just a
preparation for bulk cancellation.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>


io_uring: fix lazy work init · 819f9648

Committed by Pavel Begunkov on Jun 28, 2020

to #28736503

commit 59960b9deb5354e4cdb0b6ed3a3b653a2b4eb602 upstream

Don't leave garbage in req.work before punting async on -EAGAIN
in io_iopoll_queue().

[  140.922099] general protection fault, probably for non-canonical
     address 0xdead000000000100: 0000 [#1] PREEMPT SMP PTI
...
[  140.922105] RIP: 0010:io_worker_handle_work+0x1db/0x480
...
[  140.922114] Call Trace:
[  140.922118]  ? __next_timer_interrupt+0xe0/0xe0
[  140.922119]  io_wqe_worker+0x2a9/0x360
[  140.922121]  ? _raw_spin_unlock_irqrestore+0x24/0x40
[  140.922124]  kthread+0x12c/0x170
[  140.922125]  ? io_worker_handle_work+0x480/0x480
[  140.922126]  ? kthread_park+0x90/0x90
[  140.922127]  ret_from_fork+0x22/0x30

Fixes: 7cdaf587de7c ("io_uring: avoid whole io_wq_work copy for requests completed inline")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>


alinux: config: add N1 config to aarch64 · 4a9a2a96

Committed by Bin Yu on Jun 30, 2020

task #28924046

This patch enables the extra hardware errata workarounds for the
ARM64 Neoverse N1 platform, for aarch64 only, not for x86.
Signed-off-by: Bin Yu <jkchen@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>


arm64: Silence clang warning on mismatched value/register sizes · dcbb4923

Committed by Catalin Marinas on Oct 28, 2019

task #28924046

[ Upstream commit 27a22fbdeedd6c5c451cf5f830d51782bf50c3a2 ]

Clang reports a warning on the __tlbi(aside1is, 0) macro expansion since
the value size does not match the register size specified in the inline
asm. Construct the ASID value using the __TLBI_VADDR() macro.

Fixes: 222fc0c8503d ("arm64: compat: Workaround Neoverse-N1 #1542419 for compat user-space")
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Bin Yu <jkchen@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: zou cao <zoucao@linux.alibaba.com>


arm64: compat: Workaround Neoverse-N1 #1542419 for compat user-space · 98aeb744

Committed by James Morse on Oct 17, 2019

task #28924046

[ Upstream commit 222fc0c8503d98cec3cb2bac2780cdd21a6e31c0 ]

Compat user-space is unable to perform ICIMVAU instructions from
user-space. Instead it uses a compat-syscall. Add the workaround for
Neoverse-N1 #1542419 to this code path.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Bin Yu <jkchen@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: zou cao <zoucao@linux.alibaba.com>


arm64: Fake the IminLine size on systems affected by Neoverse-N1 #1542419 · 46c9f81c

Committed by James Morse on Oct 17, 2019

task #28924046

[ Upstream commit ee9d90be9dda ]
Systems affected by Neoverse-N1 #1542419 support DIC so do not need to
perform icache maintenance once new instructions are cleaned to the PoU.
For the errata workaround, the kernel hides DIC from user-space, so that
the unnecessary cache maintenance can be trapped by firmware.

To reduce the number of traps, produce a fake IminLine value based on
PAGE_SIZE.
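The value user space ends up reading can be computed as a small bit-manipulation sketch. Assumptions, stated plainly: CTR_EL0.IminLine occupies bits [3:0] and encodes log2 of the smallest I-cache line size in 4-byte words, CTR_EL0.DIC is bit 29, and pages are 4 KiB (PAGE_SHIFT 12). The helper names erratum_1542419_ctr() and icache_line_bytes() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CTR_IMINLINE_MASK 0xfULL         /* CTR_EL0.IminLine, bits [3:0] */
#define CTR_DIC_BIT       (1ULL << 29)   /* CTR_EL0.DIC */
#define PAGE_SHIFT        12             /* assumes 4 KiB pages */

/* CTR_EL0 as exposed to user space with the workaround active:
 * DIC hidden, and IminLine faked so one "line" spans a whole page. */
static uint64_t erratum_1542419_ctr(uint64_t ctr)
{
    ctr &= ~CTR_DIC_BIT;
    ctr &= ~CTR_IMINLINE_MASK;
    ctr |= (PAGE_SHIFT - 2) & CTR_IMINLINE_MASK; /* log2(words per line) */
    return ctr;
}

/* IminLine encodes log2 of the line size in 4-byte words. */
static uint64_t icache_line_bytes(uint64_t ctr)
{
    return 4ULL << (ctr & CTR_IMINLINE_MASK);
}
```

Since IminLine counts 4-byte words, PAGE_SHIFT - 2 yields a line size of exactly PAGE_SIZE, so user space issues one trapped maintenance operation per page instead of one per real cache line.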
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Bin Yu <jkchen@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: zou cao <zoucao@linux.alibaba.com>


arm64: errata: Hide CTR_EL0.DIC on systems affected by Neoverse-N1 #1542419 · c1191e5c

Committed by James Morse on Oct 17, 2019

task #28924046

[ Upstream commit 05460849c3b5 ]
Cores affected by Neoverse-N1 #1542419 could execute a stale instruction
when a branch is updated to point to freshly generated instructions.

To work around this issue we need user-space to issue unnecessary
icache maintenance that we can trap. Start by hiding CTR_EL0.DIC.

Backport change:
Change ARM64_WORKAROUND_1542419 from 45 to 37.
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Bin Yu <jkchen@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: zou cao <zoucao@linux.alibaba.com>


arm64: cpufeature: Trap CTR_EL0 access only where it is necessary · 63447ec2

Committed by Suzuki K Poulose on Oct 09, 2018

task #28924046

[ Upstream commit 4afe8e79da92 ]
When there is a mismatch in the CTR_EL0 field, we trap
access to CTR from EL0 on all CPUs to expose the safe
value. However, we could skip trapping on a CPU which
matches the safe value.
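The per-CPU decision can be sketched as a mask comparison. This is a hypothetical model, not the cpufeature code: cpu_needs_ctr_trap() and count_trapping_cpus() are illustrative names, and strict_mask is an assumed parameter standing for the CTR_EL0 fields that must match the system-wide safe value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A CPU needs EL0 CTR_EL0 reads trapped only if its strictly checked
 * fields differ from the system-wide safe value. */
static bool cpu_needs_ctr_trap(uint64_t cpu_ctr, uint64_t safe_ctr,
                               uint64_t strict_mask)
{
    return (cpu_ctr & strict_mask) != (safe_ctr & strict_mask);
}

/* Count the CPUs that would trap, instead of trapping on every CPU. */
static size_t count_trapping_cpus(const uint64_t *ctrs, size_t ncpus,
                                  uint64_t safe_ctr, uint64_t strict_mask)
{
    size_t n = 0;

    for (size_t i = 0; i < ncpus; i++)
        n += cpu_needs_ctr_trap(ctrs[i], safe_ctr, strict_mask);
    return n;
}
```

For example, in a system where two CPUs report a larger minimum line size than the third, only those two would trap, while the CPU that already matches the safe value reads CTR_EL0 directly.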

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Bin Yu <jkchen@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: zou cao <zoucao@linux.alibaba.com>


perf tools: Add PMU event JSON files for ARM Cortex-A76 and Neoverse N1. · 9faaf0ee

Committed by James Clark on Sep 02, 2019

task #28924046

[ Upstream commit 9e282b739466 ]

The source of the event codes and description text was the Neoverse N1
technical reference manual at:

  http://infocenter.arm.com/help/topic/com.arm.doc.100616_0301_01_en/neoverse_n1_trm_100616_0301_01_en.pdf

The Cortex-A76 shares the same event IDs as the Neoverse N1 and they
can be viewed at:

  https://static.docs.arm.com/100798/0400/cortex_a76_trm_100798_0400_00_en.pdf
Signed-off-by: James Clark <james.clark@arm.com>
Cc: "linux-perf-users@vger.kernel.org" <linux-perf-users@vger.kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: james clark <james.clark@arm.com>
Cc: nd <nd@arm.com>
Link: http://lore.kernel.org/lkml/20190902160713.1425-2-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Bin Yu <jkchen@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: zou cao <zoucao@linux.alibaba.com>


arm64: Update silicon-errata.txt for Neoverse-N1 #1349291 · 52e57fc5

Committed by James Morse on Jun 18, 2019

task #28924046

[ Upstream commit 3276cc248964 ]

Neoverse-N1 affected by #1349291 may report an Uncontained RAS Error
as Unrecoverable. The kernel's architecture code already considers
Unrecoverable errors as fatal, as without kernel-first support no
further error-handling is possible.

Now that KVM attributes SError to the host/guest more precisely,
the host's architecture code will always handle host errors that
become pending during world-switch.
Errors misclassified by this erratum that affected the guest will be
re-injected to the guest as an implementation-defined SError, which can
be uncontained.

Until kernel-first support is implemented, no workaround is needed
for this issue.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Bin Yu <jkchen@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: zou cao <zoucao@linux.alibaba.com>


arm64: Handle erratum 1418040 as a superset of erratum 1188873 · ca26fded

Committed by Marc Zyngier on May 23, 2019

task #28924046

[ Upstream commit a5325089bd05 ]

We already mitigate erratum 1188873 affecting Cortex-A76 and
Neoverse-N1 r0p0 to r2p0. It turns out that revisions r0p0 to
r3p1 of the same cores are affected by erratum 1418040, which
has the same workaround as 1188873.

Let's expand the range of affected revisions to match 1418040,
and repaint all occurrences of 1188873 to 1418040. Whilst we're
there, do a bit of reformatting in silicon-errata.txt and drop
a now unnecessary dependency on ARM_ARCH_TIMER_OOL_WORKAROUND.
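The widened revision range can be illustrated with a MIDR_EL1 decoding sketch. Assumptions: the architectural MIDR_EL1 layout (implementer [31:24], variant [23:20], architecture [19:16], part number [15:4], revision [3:0]), Arm's implementer code 0x41, and part numbers 0xD0B (Cortex-A76) and 0xD0C (Neoverse-N1). The helpers affected_by_1418040() and make_midr() are hypothetical, not the kernel's MIDR_RANGE machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIDR_IMPL(m)    (((m) >> 24) & 0xff)
#define MIDR_VARIANT(m) (((m) >> 20) & 0xf)
#define MIDR_PARTNUM(m) (((m) >> 4) & 0xfff)
#define MIDR_REV(m)     ((m) & 0xf)

#define ARM_CPU_IMP_ARM          0x41
#define ARM_CPU_PART_CORTEX_A76  0xD0B
#define ARM_CPU_PART_NEOVERSE_N1 0xD0C

/* Is this core a Cortex-A76 or Neoverse-N1 in r0p0..r3p1? */
static bool affected_by_1418040(uint32_t midr)
{
    uint32_t part = MIDR_PARTNUM(midr);

    if (MIDR_IMPL(midr) != ARM_CPU_IMP_ARM)
        return false;
    if (part != ARM_CPU_PART_CORTEX_A76 && part != ARM_CPU_PART_NEOVERSE_N1)
        return false;
    /* Compare rXpY as one number: variant in the high nibble. */
    uint32_t rv = (MIDR_VARIANT(midr) << 4) | MIDR_REV(midr);
    return rv <= ((3u << 4) | 1u); /* r0p0 lower bound is always met */
}

/* Build a MIDR value for an Arm core (architecture field 0xF). */
static uint32_t make_midr(uint32_t part, uint32_t variant, uint32_t rev)
{
    return (ARM_CPU_IMP_ARM << 24) | (variant << 20) | (0xfu << 16) |
           (part << 4) | rev;
}
```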
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Bin Yu <jkchen@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: zou cao <zoucao@linux.alibaba.com>


arm64: Restrict ARM64_ERRATUM_1188873 mitigation to AArch32 · 6ea79e78

Committed by Marc Zyngier on Apr 15, 2019

task #28924046

[Upstream commit 0f80cad3124f986d0e46c14d46b8da06d87a2bf4]

We currently deal with ARM64_ERRATUM_1188873 by always trapping EL0
accesses for both instruction sets. Although nothing wrong comes out
of that, people trying to squeeze the last drop of performance from
buggy HW find this over the top. Oh well.

Let's change the mitigation by flipping the counter enable bit
on return to userspace. Non-broken HW gets an extra branch on
the fast path, which is hopefully not the end of the world.
The arch timer workaround is also removed.
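The new mitigation can be modelled as a pure function computing the counter-access control value on return to user space. Assumptions: CNTKCTL_EL1 bit 1 (EL0VCTEN) enables EL0 access to the virtual counter; cntkctl_for_task() and its boolean parameters are hypothetical, and the real change lives in assembly on the exit path rather than in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CNTKCTL_EL0VCTEN (1u << 1) /* EL0 access to the virtual counter */

/* On an affected CPU, AArch32 tasks get virtual-counter access disabled
 * so their reads trap and are emulated; AArch64 tasks keep direct access.
 * Unaffected CPUs only pay for the first branch. */
static uint32_t cntkctl_for_task(uint32_t cntkctl, bool cpu_affected,
                                 bool task_is_aarch32)
{
    if (!cpu_affected)
        return cntkctl;
    if (task_is_aarch32)
        return cntkctl & ~CNTKCTL_EL0VCTEN;
    return cntkctl | CNTKCTL_EL0VCTEN;
}
```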
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: zou cao <zoucao@linux.alibaba.com>


arm64: Apply ARM64_ERRATUM_1188873 to Neoverse-N1 · b429dc25

Committed by Marc Zyngier on Apr 15, 2019

task #28924046

[ Upstream commit 6989303a3b2d864fd8e17d3fa3365d3e9649a598 ]

Neoverse-N1 is also affected by ARM64_ERRATUM_1188873, so let's
add it to the list of affected CPUs.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
[will: Update silicon-errata.txt]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Bin Yu <jkchen@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: zou cao <zoucao@linux.alibaba.com>


arm64: Make ARM64_ERRATUM_1188873 depend on COMPAT · 5d14f280

Committed by Marc Zyngier on Apr 15, 2019

task #28924046

[Upstream commit c2b5bba3967a ]

Since ARM64_ERRATUM_1188873 only affects AArch32 EL0, it makes some
sense that it should depend on COMPAT.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Bin Yu <jkchen@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: zou cao <zoucao@linux.alibaba.com>


arm64: arch_timer: avoid unused function warning · 5be9c8fa

Committed by Arnd Bergmann on Oct 02, 2018

task #28924046

[Upstream commit 040f340134751d73bd03ee92fabb992946c55b3d]

arm64_1188873_read_cntvct_el0() is protected by the correct
CONFIG_ARM64_ERRATUM_1188873 #ifdef, but the only reference to it is
also inside a CONFIG_ARM_ARCH_TIMER_OOL_WORKAROUND section,
and causes a warning if that is disabled:

drivers/clocksource/arm_arch_timer.c:323:20: error: 'arm64_1188873_read_cntvct_el0' defined but not used [-Werror=unused-function]

Since the erratum requires that we always apply the workaround
in the timer driver, select that symbol as we do for SoC
specific errata.

Fixes: 95b861a4a6d9 ("arm64: arch_timer: Add workaround for ARM erratum 1188873")
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Bin Yu <jkchen@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: zou cao <zoucao@linux.alibaba.com>


arm64: arch_timer: Add workaround for ARM erratum 1188873 · 101ddbde

Committed by Marc Zyngier on Sep 27, 2018

task #28924046

[ Upstream commit 95b861a4a6d94f64d5242605569218160ebacdbe ]

When running on Cortex-A76, a timer access from an AArch32 EL0
task may end up with a corrupted value or register. The workaround for
this is to trap these accesses at EL1/EL2 and execute them there.

This only affects versions r0p0, r1p0 and r2p0 of the CPU.

Backport change:
This patch changes ARM64_WORKAROUND_1188873 from 35 to 36, and the
ARM_CPU_PART_CORTEX_A76 change is dropped because a previous patch
already made it.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Bin Yu <jkchen@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: zou cao <zoucao@linux.alibaba.com>


arm64: Add part number for Neoverse N1 · 0c455077

Committed by Marc Zyngier on Apr 15, 2019

task #28924046

[ Upstream commit 0cf57b86859c]

New CPU, new part number. You know the drill.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Bin Yu <jkchen@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: zou cao <zoucao@linux.alibaba.com>


configs: enable conntrack_zone option · fce8bc83

Committed by Zhiyuan Hou on Jun 03, 2020

fix #28925962

In FaaS VPC network mode, this feature must be enabled to support
multi-tenant SNAT on a single node.
Signed-off-by: Zhiyuan Hou <zhiyuan2048@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>


vfio-pci: Invalidate mmaps and block MMIO access on disabled memory · aa30f9c9

Committed by Alex Williamson on Apr 22, 2020

to #28892961

commit abafbc551fddede3e0a08dee1dcde08fc0eb8476 upstream.

Accessing the disabled memory space of a PCI device would typically
result in a master abort response on conventional PCI, or an
unsupported request on PCI express.  The user would generally see
these as a -1 response for the read return data and the write would be
silently discarded, possibly with an uncorrected, non-fatal AER error
triggered on the host.  Some systems however take it upon themselves
to bring down the entire system when they see something that might
indicate a loss of data, such as this discarded write to a disabled
memory space.

To avoid this, we want to try to block the user from accessing memory
spaces while they're disabled.  We start with a semaphore around the
memory enable bit, where writers modify the memory enable state and
must be serialized, while readers make use of the memory region and
can access in parallel.  Writers include both direct manipulation via
the command register, as well as any reset path where the internal
mechanics of the reset may both explicitly and implicitly disable
memory access, and manipulation of the MSI-X configuration, where the
MSI-X vector table resides in MMIO space of the device.  Readers
include the read and write file ops to access the vfio device fd
offsets as well as memory mapped access.  In the latter case, we make
use of our new vma list support to zap, or invalidate, those memory
mappings in order to force them to be faulted back in on access.

Our semaphore usage will stall user access to MMIO spaces across
internal operations like reset, but the user might experience new
behavior when trying to access the MMIO space while disabled via the
PCI command register.  Access via read or write while disabled will
return -EIO and access via memory maps will result in a SIGBUS.  This
is expected to be compatible with known use cases and potentially
provides better error handling capabilities than present in the
hardware, while avoiding the more readily accessible and severe
platform error responses that might otherwise occur.

Fixes: CVE-2020-12888
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

[ shile: fixed conflicts in
	drivers/vfio/pci/vfio_pci.c
	drivers/vfio/pci/vfio_pci_private.h ]
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>


vfio-pci: Fault mmaps to enable vma tracking · c3da9844

Committed by Alex Williamson on Apr 28, 2020

to #28892961

commit 11c4cd07ba111a09f49625f9e4c851d83daf0a22 upstream.

Rather than calling remap_pfn_range() when a region is mmap'd, set up
a vm_ops handler to support dynamic faulting of the range on access.
This allows us to manage a list of vmas actively mapping the area that
we can later use to invalidate those mappings.  The open callback
invalidates the vma range so that all tracking is inserted in the
fault handler and removed in the close handler.
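The open/fault/close/zap life cycle can be modelled with a small array-backed list. This is a hypothetical sketch, not the vfio-pci implementation: struct vma, struct tracker, MAX_VMAS and the vma_* helpers are illustrative, and "populated" stands in for a range being mapped in the page tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VMAS 8

struct vma { bool populated; };

struct tracker {
    struct vma *vmas[MAX_VMAS];
    size_t count;
};

static bool is_tracked(const struct tracker *t, const struct vma *v)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->vmas[i] == v)
            return true;
    return false;
}

/* Open: start unpopulated, so tracking only ever happens at fault time. */
static void vma_open(struct vma *v) { v->populated = false; }

/* Fault: populate the mapping and start tracking it (once). */
static int vma_fault(struct tracker *t, struct vma *v)
{
    if (!is_tracked(t, v)) {
        if (t->count == MAX_VMAS)
            return -1;
        t->vmas[t->count++] = v;
    }
    v->populated = true;
    return 0;
}

/* Close: stop tracking the mapping (swap-remove). */
static void vma_close(struct tracker *t, struct vma *v)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->vmas[i] == v) {
            t->vmas[i] = t->vmas[--t->count];
            return;
        }
    }
}

/* Zap: invalidate every tracked mapping; later accesses fault again. */
static void zap_all(struct tracker *t)
{
    for (size_t i = 0; i < t->count; i++)
        t->vmas[i]->populated = false;
    t->count = 0;
}
```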
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

Fixes: CVE-2020-12888
[ shile: fixed conflicts in vfio_pci_private.h ]
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>


alinux: introduce deferred_meminit boot parameter · 05f6ed40

Committed by chenxiangzuo on Jun 29, 2020

fix #27418285

We introduce a boot parameter 'deferred_meminit' for the deferred
page init feature. It is disabled by default; pass 'deferred_meminit'
on the kernel command line to enable it.
Signed-off-by: chenxiangzuo <cxz18821786681@linux.alibaba.com>
Reviewed-by: Xu Yu <xuyu@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Shile Zhang <shile.zhang@linux.alibaba.com>


cpufreq: intel_pstate: Fix compilation for !CONFIG_ACPI · 55952de7

Committed by Dominik Brodowski on Oct 23, 2018

fix #29051137

commit 5906056e52e9ee5e130d880443e83016f892b5dd upstream

While at it, add a few comments indicating which config options the
#ifdef and #else statements refer to.

Fixes: 86d333a8cc7f (cpufreq: intel_pstate: Add base_frequency attribute)
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>


Jun 29, 2020: 9 commits

io_uring: avoid unnecessary io_wq_work copy for fast poll feature · b2607733

Committed by Xiaoguang Wang on Jun 10, 2020

to #28736503

commit 405a5d2b2762f2a9813efdee93274d4e7bf607a1 upstream

Basically, the IORING_OP_POLL_ADD command and async armed poll handlers
for regular commands don't touch io_wq_work, so io_wq_work only needs to
be copied and restored when REQ_F_WORK_INITIALIZED is set.
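The conditional save/restore can be sketched as follows. This is a hypothetical model, not the io_uring code: struct work, struct req, save_work() and restore_work() are illustrative, and it assumes (for illustration only) that the poll path may clobber the storage behind req->work.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define REQ_F_WORK_INITIALIZED (1u << 0)

struct work { int creds; int files; };  /* hypothetical io_wq_work fields */

struct req {
    unsigned int flags;
    struct work work;
};

/* Save req->work before the poll path may reuse it, but only if it was
 * ever initialized; otherwise there is nothing meaningful to preserve. */
static bool save_work(const struct req *req, struct work *saved)
{
    if (!(req->flags & REQ_F_WORK_INITIALIZED))
        return false;
    memcpy(saved, &req->work, sizeof(*saved));
    return true;
}

/* Restore only what was actually saved. */
static void restore_work(struct req *req, const struct work *saved,
                         bool did_save)
{
    if (did_save)
        memcpy(&req->work, saved, sizeof(*saved));
}
```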
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>


io_uring: allow POLL_ADD with double poll_wait() users · cfbe7e8b

Committed by Jens Axboe on May 15, 2020

to #28736503

commit 18bceab101adde8f38de76016bc77f3f25cf22f4 upstream

Some file descriptors use separate waitqueues for their f_ops->poll()
handler, most commonly one for read and one for write. The io_uring
poll implementation doesn't work with that, as the 2nd poll_wait()
call will cause the io_uring poll request to fail with -EINVAL.

This affects (at least) tty devices and /dev/random as well. This is a
big problem for event loops where some file descriptors work, and others
don't.

With this fix, io_uring handles multiple waitqueues.
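The difference can be sketched as a poll-queue callback with a second ("double") slot. This is a hypothetical model, not io_uring's queue_proc: struct waitqueue, struct poll_req and poll_queue_proc() are illustrative. The first poll_wait() uses the primary entry, the second uses the extra entry, and only a third would still be rejected.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct waitqueue { int waiters; };

struct poll_req {
    struct waitqueue *head;         /* primary entry */
    struct waitqueue *double_head;  /* second entry, e.g. the write queue */
    int error;
};

/* Called once per poll_wait() made by the file's ->poll() handler. */
static void poll_queue_proc(struct poll_req *pr, struct waitqueue *wq)
{
    if (pr->error)
        return;
    if (!pr->head) {
        pr->head = wq;
    } else if (!pr->double_head) {
        pr->double_head = wq;    /* previously this case failed */
    } else {
        pr->error = -EINVAL;     /* more than two waitqueues */
        return;
    }
    wq->waiters++;
}
```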
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>


io_uring: fix io_kiocb.flags modification race in IOPOLL mode · d3101fc3

Committed by Xiaoguang Wang on Jun 11, 2020

to #28736503

commit 65a6543da386838f935d2f03f452c5c0acff2a68 upstream

While testing io_uring on arm, we found that io_sq_thread() sometimes
keeps polling io requests even though there are no inflight io requests
in the block layer. After some investigation, we found a possible race
on io_kiocb.flags; see the racing code below:
  1) in the end of io_write() or io_read()
    req->flags &= ~REQ_F_NEED_CLEANUP;
    kfree(iovec);
    return ret;

  2) in io_complete_rw_iopoll()
    if (res != -EAGAIN)
        req->flags |= REQ_F_IOPOLL_COMPLETED;

In IOPOLL mode, io requests may still be completed by interrupt, so the
code above is not safe: req->flags is modified concurrently, and those
modifications are neither protected by a lock nor atomic. I also
disassembled io_complete_rw_iopoll() on arm:
   req->flags |= REQ_F_IOPOLL_COMPLETED;
   0xffff000008387b18 <+76>:    ldr     w0, [x19,#104]
   0xffff000008387b1c <+80>:    orr     w0, w0, #0x1000
   0xffff000008387b20 <+84>:    str     w0, [x19,#104]

So "req->flags |= REQ_F_IOPOLL_COMPLETED;" is a load, modify and store
sequence of three instructions, which is obviously not atomic.

To fix this issue, add a new iopoll_completed in io_kiocb to indicate
whether io request is completed.
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>


io_uring: async task poll trigger cleanup · 9c90c6a4

Committed by Jens Axboe on May 17, 2020

to #28736503

commit 310672552f4aea2ad50704711aa3cdd45f5441e9 upstream

If the request is still hashed in io_async_task_func(), then it cannot
have been canceled and it's pointless to check. So save that check.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>


io_uring: avoid whole io_wq_work copy for requests completed inline · 1d6d088b

Committed by Xiaoguang Wang on Jun 10, 2020

to #28736503

commit 7cdaf587de7c6f494b8433fded19f7728e70e1ef upstream

If requests can be submitted and completed inline, we don't need to
initialize the whole io_wq_work in io_init_req(), which is an expensive
operation. Add a new flag, REQ_F_WORK_INITIALIZED, to record whether
io_wq_work has been initialized, and add a helper, io_req_init_async();
users must call io_req_init_async() before touching any member of
io_wq_work for the first time.

I used /dev/nullb0 to evaluate the performance improvement on my
physical machine:
  modprobe null_blk nr_devices=1 completion_nsec=0
  sudo taskset -c 60 fio -name=fiotest -filename=/dev/nullb0 -iodepth=128 \
  -thread -rw=read -ioengine=io_uring -direct=1 -bs=4k -size=100G -numjobs=1 \
  -time_based -runtime=120

before this patch:
Run status group 0 (all jobs):
   READ: bw=724MiB/s (759MB/s), 724MiB/s-724MiB/s (759MB/s-759MB/s),
   io=84.8GiB (91.1GB), run=120001-120001msec

With this patch:
Run status group 0 (all jobs):
   READ: bw=761MiB/s (798MB/s), 761MiB/s-761MiB/s (798MB/s-798MB/s),
   io=89.2GiB (95.8GB), run=120001-120001msec

About 5% improvement.
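The lazy-initialization helper described above can be sketched in plain C. This is a hypothetical model, not io_req_init_async() itself: struct work, struct req and req_init_async() are illustrative, and the oversized work struct only mimics io_wq_work being expensive to clear. The garbage fill in the usage below mirrors the failure fixed by "io_uring: fix lazy work init", where req.work was punted without being initialized.

```c
#include <assert.h>
#include <string.h>

#define REQ_F_WORK_INITIALIZED (1u << 0)

struct work { int flags; void *files; char big[256]; }; /* hypothetical */

struct req { unsigned int flags; struct work work; };

/* Clear req->work only the first time an async path touches it, and
 * record that with a flag. Requests completed inline never pay for the
 * memset; repeated calls must not wipe state set in between. */
static void req_init_async(struct req *req)
{
    if (req->flags & REQ_F_WORK_INITIALIZED)
        return;
    memset(&req->work, 0, sizeof(req->work));
    req->flags |= REQ_F_WORK_INITIALIZED;
}
```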
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>


io_uring: allow O_NONBLOCK async retry · c0810993

Committed by Jens Axboe on Jun 09, 2020

to #28736503

commit c5b856255cbc3b664d686a83fa9397a835e063de upstream

We can assume that O_NONBLOCK is always honored, even if we don't
have a ->read/write_iter() for the file type. Also unify the read/write
checking for allowing async punt, having the write side factoring in the
REQ_F_NOWAIT flag as well.

Cc: stable@vger.kernel.org
Fixes: 490e89676a52 ("io_uring: only force async punt if poll based retry can't handle it")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>


io_wq: add per-wq work handler instead of per work · c349b387

Committed by Pavel Begunkov on Jun 08, 2020

to #28736503

commit f5fa38c59cb0b40633dee5cdf7465801be3e4928 upstream

io_uring is the only user of io-wq, and now it uses only one io-wq callback
for all its requests, namely io_wq_submit_work(). Instead of storing
work->runner callback in each instance of io_wq_work, keep it in io-wq
itself.

pros:
- reduces io_wq_work size
- more robust -- ->func won't be invalidated with mem{cpy,set}(req)
- helps other work
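Moving the handler from each work item into the pool can be sketched as follows. This is a hypothetical model, not the io-wq structures: struct io_wq here carries only a do_work pointer, and submit_work() and worker_run() are illustrative names. Each work item no longer carries a function pointer, so it shrinks, and a memcpy/memset of the surrounding request cannot clobber the handler.

```c
#include <assert.h>
#include <stddef.h>

struct io_wq_work { int id; };   /* no per-work ->func pointer any more */

typedef void (*work_fn)(struct io_wq_work *work);

/* The handler lives once in the pool, not in every work item. */
struct io_wq {
    work_fn do_work;
};

static int handled;

static void submit_work(struct io_wq_work *work)
{
    handled += work->id;
}

/* A worker runs every queued item through the pool-wide handler. */
static void worker_run(struct io_wq *wq, struct io_wq_work *works, size_t n)
{
    for (size_t i = 0; i < n; i++)
        wq->do_work(&works[i]);
}
```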
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>


io_uring: don't arm a timeout through work.func · efc07284

Committed by Pavel Begunkov on Jun 08, 2020

to #28736503

commit d4c81f38522f3e7f4be1b472ef9988d0ed7f3696 upstream

Remove io_link_work_cb() -- the last custom work.func.
Not the prettiest thing, but works. Instead of queueing a linked timeout
in io_link_work_cb() mark a request with REQ_F_QUEUE_TIMEOUT and do
enqueueing based on the flag in io_wq_submit_work().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>


io_uring: remove custom ->func handlers · 7e2014fa

Committed by Pavel Begunkov on Jun 08, 2020

to #28736503

commit ac45abc0e2a8ed16ecc0eea039fe762ddfefbcad upstream

In preparation for getting rid of work.func, this removes almost all
custom instances of it, leaving only io_wq_submit_work() and
io_link_work_cb(). The last one will be dealt with later.

Nothing fancy, just routinely remove the *_finish() functions and inline
what's left. E.g. remove io_fsync_finish() + inline __io_fsync() into
io_fsync().

As no users of io_req_cancelled() are left, delete it as well. The patch
adds an extra switch lookup on a cold-ish path, but that's outweighed by
the nice diffstat and other benefits of the following patches.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

