提交 · 0885bdfb0b65ec59e45c86a3e036b30509146563 · openanolis / cloud-kernel

29 6月, 2020 40 次提交

io_uring: use kvfree() in io_sqe_buffer_register() · 0885bdfb

由 Denis Efremov 提交于 6月 05, 2020

to #28736503

commit a8c73c1a614f6da6c0b04c393f87447e28cb6de4 upstream

Use kvfree() to free the pages and vmas, since they are allocated by
kvmalloc_array() in a loop.

Fixes: d4ef647510b1 ("io_uring: avoid page allocation warnings")
Signed-off-by: NDenis Efremov <efremov@linux.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200605093203.40087-1-efremov@linux.comSigned-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

0885bdfb

io_uring: validate the full range of provided buffers for access · f48b6274

由 Bijan Mottahedeh 提交于 6月 04, 2020

to #28736503

commit efe68c1ca8f49e8c06afd74b699411bfbb8ba1ff upstream

Account for the number of provided buffers when validating the address
range.
Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

f48b6274

io_uring: re-set iov base/len for buffer select retry · 616bb4fd

由 Jens Axboe 提交于 6月 04, 2020

to #28736503

commit dddb3e26f6d88c5344d28cb5ff9d3d6fa05c4f7a upstream

We already have the buffer selected, but we should set the iter list
again.

Cc: stable@vger.kernel.org # v5.7
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

616bb4fd

io_uring: move send/recv IOPOLL check into prep · 43770e26

由 Pavel Begunkov 提交于 6月 03, 2020

to #28736503

commit d2b6f48b691ed67569786c332f0173b918d3fd1b upstream

Fail recv/send in case of IORING_SETUP_IOPOLL earlier during prep,
so it'd be done only once. Removes duplication as well
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

43770e26

io_uring: deduplicate io_openat{,2}_prep() · 6ead77bf

由 Pavel Begunkov 提交于 6月 03, 2020

to #28736503

commit ec65fea5a8d7a82d3137dd2a44197eb577da111f upstream

io_openat_prep() and io_openat2_prep() are identical except for how
struct open_how is built. Deduplicate it with a helper.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

6ead77bf

io_uring: do build_open_how() only once · d81e76a7

由 Pavel Begunkov 提交于 6月 03, 2020

to #28736503

commit 25e72d1012b30bdff712b563e6141a4f311d28d6 upstream

build_open_how() is just adjusting open_flags/mode. Do it once during
prep. It looks better than storing raw values for the future.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

d81e76a7

io_uring: fix {SQ,IO}POLL with unsupported opcodes · a17a35d5

由 Pavel Begunkov 提交于 6月 03, 2020

to #28736503

commit 3232dd02af65f2d01be641120d2a710176b0c7a7 upstream

IORING_SETUP_IOPOLL is defined only for read/write, other opcodes should
be disallowed, otherwise it'll get an error as below. Also refuse
open/close with SQPOLL, as the polling thread wouldn't know which file
table to use.

RIP: 0010:io_iopoll_getevents+0x111/0x5a0
Call Trace:
 ? _raw_spin_unlock_irqrestore+0x24/0x40
 ? do_send_sig_info+0x64/0x90
 io_iopoll_reap_events.part.0+0x5e/0xa0
 io_ring_ctx_wait_and_kill+0x132/0x1c0
 io_uring_release+0x20/0x30
 __fput+0xcd/0x230
 ____fput+0xe/0x10
 task_work_run+0x67/0xa0
 do_exit+0x353/0xb10
 ? handle_mm_fault+0xd4/0x200
 ? syscall_trace_enter+0x18c/0x2c0
 do_group_exit+0x43/0xa0
 __x64_sys_exit_group+0x18/0x20
 do_syscall_64+0x60/0x1e0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
[axboe: allow provide/remove buffers and files update]
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

a17a35d5

io_uring: disallow close of ring itself · da73d5e3

由 Jens Axboe 提交于 6月 02, 2020

to #28736503

commit fd2206e4e97b5bae422d9f2f9ebbc79bc97e44a5 upstream

A previous commit enabled this functionality, which also enabled O_PATH
to work correctly with io_uring. But we can't safely close the ring
itself, as the file handle isn't reference counted inside
io_uring_enter(). Instead of jumping through hoops to enable ring
closure, add a "soft" ->needs_file option, ->needs_file_no_error. This
enables O_PATH file descriptors to work, but still catches the case of
trying to close the ring itself.
Reported-by: NJann Horn <jannh@google.com>
Fixes: 904fbcb115c8 ("io_uring: remove 'fd is io_uring' from close path")
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

da73d5e3

io_uring: fix overflowed reqs cancellation · edba7ee1

由 Pavel Begunkov 提交于 5月 30, 2020

to #28736503

commit 7b53d59859bc932b37895d2d37388e7fa29af7a5 upstream

Overflowed requests in io_uring_cancel_files() should be shed only of
inflight and overflowed refs. All other left references are owned by
someone else.

If refcount_sub_and_test() fails, it will go further and put put extra
ref, don't do that. Also, don't need to do io_wq_cancel_work()
for overflowed reqs, they will be let go shortly anyway.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

edba7ee1

io_uring: off timeouts based only on completions · 39e5e106

由 Pavel Begunkov 提交于 5月 30, 2020

to #28736503

commit bfe68a221905de37e65394a6d58c1e5f3e545d2f upstream

Offset timeouts wait not for sqe->off non-timeout CQEs, but rather
sqe->off + number of prior inflight requests. Wait exactly for
sqe->off non-timeout completions
Reported-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

39e5e106

io_uring: move timeouts flushing to a helper · 58825d4e

由 Pavel Begunkov 提交于 5月 30, 2020

to #28736503

commit 360428f8c0cd857006a8a3f515946285370489ac upstream

Separate flushing offset timeouts io_commit_cqring() by moving it into a
helper. Just a preparation, makes following patches clearer.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

58825d4e

statx: hide interfaces no longer used by io_uring · c74e7e90

由 Bijan Mottahedeh 提交于 5月 22, 2020

to #28736503

commit 6f88cc176a3358c54bb6c38c8afee3f3a42faf54 upstream

The io_uring interfaces have been replaced by do_statx() and are no
longer needed.
Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

c74e7e90

io_uring: call statx directly · 743f0324

由 Bijan Mottahedeh 提交于 5月 22, 2020

to #28736503

commit e62753e4e2926f249d088cc0517be5ed4efec6d6 upstream

Calling statx directly both simplifies the interface and avoids potential
incompatibilities between sync and async invokations.
Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

743f0324

statx: allow system call to be invoked from io_uring · a5317669

由 Bijan Mottahedeh 提交于 5月 22, 2020

to #28736503

commit 0018784fc84f636d473a0d2a65a34f9d01893c0a upstream

This is a prepatory patch to allow io_uring to invoke statx directly.
Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

a5317669

io_uring: add io_statx structure · 756f2076

由 Bijan Mottahedeh 提交于 5月 22, 2020

to #28736503

commit 1d9e1288039a47dc1189c3c1fed5cf3c215e94b7 upstream

Separate statx data from open in io_kiocb. No functional changes.
Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

756f2076

io_uring: get rid of manual punting in io_close · 492cf1ed

由 Pavel Begunkov 提交于 5月 26, 2020

to #28736503

commit 0bf0eefdab52d9f9f3a1eeda32a4fc7afe4e9219 upstream

io_close() was punting async manually to skip grabbing files. Use
REQ_F_NO_FILE_TABLE instead, and pass it through the generic path
with -EAGAIN.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

492cf1ed

io_uring: separate DRAIN flushing into a cold path · 7f7487a9

由 Pavel Begunkov 提交于 5月 26, 2020

to #28736503

commit 0451894522108d6c72934aff6ef89023743a9ed4 upstream

io_commit_cqring() assembly doesn't look good with extra code handling
drained requests. IOSQE_IO_DRAIN is slow and discouraged to be used in
a hot path, so try to minimise its impact by putting it into a helper
and doing a fast check.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

7f7487a9

io_uring: don't re-read sqe->off in timeout_prep() · 1ddc14b7

由 Pavel Begunkov 提交于 5月 26, 2020

to #28736503

commit 56080b02ed6e71fbc0add2d05a32ed7361dd736a upstream

SQEs are user writable, don't read sqe->off twice in io_timeout_prep()
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

1ddc14b7

io_uring: simplify io_timeout locking · 697ad1d7

由 Pavel Begunkov 提交于 5月 26, 2020

to #28736503

commit 733f5c95e6fdabd05b8dfc15e04512809c9652c2 upstream

Move spin_lock_irq() earlier to have only 1 call site of it in
io_timeout(). It makes the flow easier.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

697ad1d7

io_uring: fix flush req->refs underflow · a1be6c19

由 Pavel Begunkov 提交于 5月 26, 2020

to #28736503

commit 4518a3cc273cf82efdd36522fb1f13baad173c70 upstream

In io_uring_cancel_files(), after refcount_sub_and_test() leaves 0
req->refs, it calls io_put_req(), which would also put a ref. Call
io_free_req() instead.

Cc: stable@vger.kernel.org
Fixes: 2ca10259b418 ("io_uring: prune request from overflow list on flush")
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

a1be6c19

io_uring: don't submit sqes when ctx->refs is dying · bc430a1c

由 Xiaoguang Wang 提交于 5月 20, 2020

to #28736503

commit 6b668c9b7fc6fc0c313cdaee8b75d17f4d954ab5 upstream

When IORING_SETUP_SQPOLL is enabled, io_ring_ctx_wait_and_kill() will wait
for sq thread to idle by busy loop:

    while (ctx->sqo_thread && !wq_has_sleeper(&ctx->sqo_wait))
        cond_resched();

Above loop isn't very CPU friendly, it may introduce a short cpu burst on
the current cpu.

If ctx->refs is dying, we forbid sq_thread from submitting any further
SQEs. Instead they just get discarded when we exit.
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

bc430a1c

io_uring: remove req->needs_fixed_files · 7166d570

由 Pavel Begunkov 提交于 5月 17, 2020

to #28736503

commit 0cdaf760f42eb8e8a714c1cc017423e5da6d4936 upstream

A submission is "async" IIF it's done by SQPOLL thread. Instead of
passing @async flag into io_submit_sqes(), deduce it from ctx->flags.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

7166d570

io_uring: add tee(2) support · ae81f3f3

由 Pavel Begunkov 提交于 5月 17, 2020

to #28736503

commit f2a8d5c7a218b9c24befb756c4eb30aa550ce822 upstream

Add IORING_OP_TEE implementing tee(2) support. Almost identical to
splice bits, but without offsets.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

ae81f3f3

splice: export do_tee() · 53763d3f

由 Pavel Begunkov 提交于 5月 17, 2020

to #28736503

commit 9dafdfc2f0a3ae551711098de3d7b621a469f11a upstream

export do_tee() for use in io_uring
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

53763d3f

io_uring: don't repeat valid flag list · 157739bf

由 Pavel Begunkov 提交于 5月 17, 2020

to #28736503

commit c11368a57be460de889696f6ff8815fbcacf4db2 upstream

req->flags stores all sqe->flags. After checking that sqe->flags are
valid set if IOSQE* flags, no need to double check it, just forward them
all.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

157739bf

io_uring: rename io_file_put() · ff238118

由 Pavel Begunkov 提交于 5月 17, 2020

to #28736503

commit 9f13c35b33fddb186beab9ef21c555a01e45f4d7 upstream

io_file_put() deals with flushing state's file refs, adding "state" to
its name makes it a bit clearer. Also, avoid double check of
state->file in __io_file_get() in some cases.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

ff238118

io_uring: cleanup io_poll_remove_one() logic · 6f3b9878

由 Jens Axboe 提交于 5月 17, 2020

to #28736503

commit 3bfa5bcb26f0b52d7ae8416aa0618fff21aceaaf upstream

We only need apoll in the one section, do the juggling with the work
restoration there. This removes a special case further down as well.

No functional changes in this patch.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

6f3b9878

io_uring: remove obsolete 'state' parameter · 3cd5e53f

由 Xiaoguang Wang 提交于 5月 08, 2020

to #28736503

commit 7d01bd745a8f52ff2883f661235139ab6e7d23e6 upstream

The "struct io_submit_state *state" parameter is not used, remove it.
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

3cd5e53f

ACPI / APEI: Add support for the SDEI GHES Notification type · 512ddcd8

由 James Morse 提交于 1月 29, 2019

fix #28612342

commit f9f05395f384ee858520b6c65d7e3e436af20c53 upstream

If the GHES notification type is SDEI, register the provided event
using the SDEI-GHES helper.

SDEI may be one of two types of event, normal and critical. Critical
events can interrupt normal events, so these must have separate
fixmap slots and locks in case both event types are in use.
Signed-off-by: NJames Morse <james.morse@arm.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: NBaolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: NAlex Shi <alex.shi@linux.alibaba.com>
Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: Nluanshi <zhangliguang@linux.alibaba.com>

512ddcd8

firmware: arm_sdei: Add ACPI GHES registration helper · 9ba61fa5

由 James Morse 提交于 1月 29, 2019

fix #28612342

commit f96935d3bc38a5f4b5188b6470a10e3fb8c3f0cc upstream

APEI's Generic Hardware Error Source structures do not describe
whether the SDEI event is shared or private, as this information is
discoverable via the API.

GHES needs to know whether an event is normal or critical to avoid
sharing locks or fixmap entries, but GHES shouldn't have to know about
the SDEI API.

Add a helper to register the GHES using the appropriate normal or
critical callback.
Signed-off-by: NJames Morse <james.morse@arm.com>
Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: NBaolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: NAlex Shi <alex.shi@linux.alibaba.com>
Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: Nluanshi <zhangliguang@linux.alibaba.com>

9ba61fa5

arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work · 5c61297c

由 James Morse 提交于 5月 01, 2020

fix #28612342

commit 8fcc4ae6faf8b455eeef00bc9ae70744e3b0f462 upstream

APEI is unable to do all of its error handling work in nmi-context, so
it defers non-fatal work onto the irq_work queue. arch_irq_work_raise()
sends an IPI to the calling cpu, but this is not guaranteed to be taken
before returning to user-space.

Unless the exception interrupted a context with irqs-masked,
irq_work_run() can run immediately. Otherwise return -EINPROGRESS to
indicate ghes_notify_sea() found some work to do, but it hasn't
finished yet.

With this apei_claim_sea() returning '0' means this external-abort was
also notification of a firmware-first RAS error, and that APEI has
processed the CPER records.
Signed-off-by: NJames Morse <james.morse@arm.com>
Tested-by: NTyler Baicar <baicar@os.amperecomputing.com>
Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: NBaolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: NAlex Shi <alex.shi@linux.alibaba.com>
Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: Nluanshi <zhangliguang@linux.alibaba.com>

5c61297c

ACPI: APEI: Kick the memory_failure() queue for synchronous errors · 4d6a8607

由 James Morse 提交于 5月 01, 2020

fix #28612342

commit 7f17b4a121d0d50eca22cb1edebf0a157f3e43bf upstream

memory_failure() offlines or repairs pages of memory that have been
discovered to be corrupt. These may be detected by an external
component, (e.g. the memory controller), and notified via an IRQ.
In this case the work is queued as not all of memory_failure()s work
can happen in IRQ context.

If the error was detected as a result of user-space accessing a
corrupt memory location the CPU may take an abort instead. On arm64
this is a 'synchronous external abort', and on a firmware first
system it is replayed using NOTIFY_SEA.

This notification has NMI like properties, (it can interrupt
IRQ-masked code), so the memory_failure() work is queued. If we
return to user-space before the queued memory_failure() work is
processed, we will take the fault again. This loop may cause platform
firmware to exceed some threshold and reboot when Linux could have
recovered from this error.

For NMIlike notifications keep track of whether memory_failure() work
was queued, and make task_work pending to flush out the queue.
To save memory allocations, the task_work is allocated as part of
the ghes_estatus_node, and free()ing it back to the pool is deferred.
Signed-off-by: NJames Morse <james.morse@arm.com>
Tested-by: NTyler Baicar <baicar@os.amperecomputing.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: NBaolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: NAlex Shi <alex.shi@linux.alibaba.com>
Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: Nluanshi <zhangliguang@linux.alibaba.com>

4d6a8607

mm/memory-failure: Add memory_failure_queue_kick() · 14e58c87

由 James Morse 提交于 5月 01, 2020

fix #28612342

commit 062022315e8ad9e0628515dfc756ab54b5fdb26b upstream

The GHES code calls memory_failure_queue() from IRQ context to schedule
work on the current CPU so that memory_failure() can sleep.

For synchronous memory errors the arch code needs to know any signals
that memory_failure() will trigger are pending before it returns to
user-space, possibly when exiting from the IRQ.

Add a helper to kick the memory failure queue, to ensure the scheduled
work has happened. This has to be called from process context, so may
have been migrated from the original cpu. Pass the cpu the work was
queued on.

Change memory_failure_work_func() to permit being called on the 'wrong'
cpu.
Signed-off-by: NJames Morse <james.morse@arm.com>
Tested-by: NTyler Baicar <baicar@os.amperecomputing.com>
Acked-by: NNaoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: NBaolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: NAlex Shi <alex.shi@linux.alibaba.com>
Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: Nluanshi <zhangliguang@linux.alibaba.com>

14e58c87

ACPI / APEI: Use separate fixmap pages for arm64 NMI-like notifications · 6bd13529

由 James Morse 提交于 1月 29, 2019

fix #28612342

commit b972d2eaf0c7021579755eec6b2b79e0f5bc7930 upstream

Now that ghes notification helpers provide the fixmap slots and
take the lock themselves, multiple NMI-like notifications can
be used on arm64.

These should be named after their notification method as they can't
all be called 'NMI'. x86's NOTIFY_NMI already is, change the SEA
fixmap entry to be called FIX_APEI_GHES_SEA.

Future patches can add support for FIX_APEI_GHES_SEI and
FIX_APEI_GHES_SDEI_{NORMAL,CRITICAL}.

Because all of ghes.c builds on both architectures, provide a
constant for each fixmap entry that the architecture will never
use.
Signed-off-by: NJames Morse <james.morse@arm.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: NBaolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: NAlex Shi <alex.shi@linux.alibaba.com>
Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: Nluanshi <zhangliguang@linux.alibaba.com>

6bd13529

ACPI / APEI: Only use queued estatus entry during in_nmi_queue_one_entry() · 0c787924

由 James Morse 提交于 1月 29, 2019

fix #28612342

commit d9f608dc156487b55cb17c2ec591b06e53a6de64 upstream

Each struct ghes has an worst-case sized buffer for storing the
estatus. If an error is being processed by ghes_proc() in process
context this buffer will be in use. If the error source then triggers
an NMI-like notification, the same buffer will be used by
in_nmi_queue_one_entry() to stage the estatus data, before
__process_error() copys it into a queued estatus entry.

Merge __process_error()s work into in_nmi_queue_one_entry() so that
the queued estatus entry is used from the beginning. Use the new
ghes_peek_estatus() to know how much memory to allocate from
the ghes_estatus_pool before reading the records.
Reported-by: NBorislav Petkov <bp@suse.de>
Signed-off-by: NJames Morse <james.morse@arm.com>
Reviewed-by: NBorislav Petkov <bp@suse.de>

Change since v6:
 * Added a comment explaining the 'ack-error, then goto no_work'.
 * Added missing esatus-clearing, which is necessary after reading the GAS,
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: NBaolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: NAlex Shi <alex.shi@linux.alibaba.com>
Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: Nluanshi <zhangliguang@linux.alibaba.com>

0c787924

ACPI / APEI: Split ghes_read_estatus() to allow a peek at the CPER length · 4d8c89f0

由 James Morse 提交于 1月 29, 2019

fix #28612342

commit e00a6e3392cb623b7ac4d61c5e1c1234b4520cad upstream

ghes_read_estatus() reads the record address, then the record's
header, then performs some sanity checks before reading the
records into the provided estatus buffer.

To provide this estatus buffer the caller must know the size of the
records in advance, or always provide a worst-case sized buffer as
happens today for the non-NMI notifications.

Add a function to peek at the record's header to find the size. This
will let the NMI path allocate the right amount of memory before reading
the records, instead of using the worst-case size, and having to copy
the records.

Split ghes_read_estatus() to create __ghes_peek_estatus() which
returns the address and size of the CPER records.
Signed-off-by: NJames Morse <james.morse@arm.com>

Changes since v7:
 * Grammar
 * concistent argument ordering

Changes since v6:
 * Additional buf_addr = 0 error handling
 * Moved checking out of peek-estatus
 * Reworded an error message so we can tell them apart
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: NBaolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: NAlex Shi <alex.shi@linux.alibaba.com>
Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: Nluanshi <zhangliguang@linux.alibaba.com>

4d8c89f0

ACPI / APEI: Make GHES estatus header validation more user friendly · 82e4eb43

由 James Morse 提交于 1月 29, 2019

fix #28612342

commit f2a681b9160b9c80826b3062e71371cfc82b4863 upstream

ghes_read_estatus() checks various lengths in the top-level header to
ensure the CPER records to be read aren't obviously corrupt.

Take the opportunity to make this more user-friendly, printing a
(ratelimited) message about the nature of the header format error.
Suggested-by: NBorislav Petkov <bp@alien8.de>
Signed-off-by: NJames Morse <james.morse@arm.com>
[ rjw: Add missing 'static' ]
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: NBaolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: NAlex Shi <alex.shi@linux.alibaba.com>
Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: Nluanshi <zhangliguang@linux.alibaba.com>

82e4eb43

ACPI / APEI: Pass ghes and estatus separately to avoid a later copy · e5604476

由 James Morse 提交于 1月 29, 2019

fix #28612342

commit f2a7e059aa7a6a22a6f4612f31ee29e726a3bfd0 upstream

The NMI-like notifications scribble over ghes->estatus, before
copying it somewhere else. If this interrupts the ghes_probe() code
calling ghes_proc() on each struct ghes, the data is corrupted.

All the NMI-like notifications should use a queued estatus entry
from the beginning, instead of the ghes version, then copying it.
To do this, break up any use of "ghes->estatus" so that all
functions take the estatus as an argument.

This patch just moves these ghes->estatus dereferences into separate
arguments, no change in behaviour. struct ghes becomes unused in
ghes_clear_estatus() as it only wanted ghes->estatus, which we now
pass directly. This is removed.
Signed-off-by: NJames Morse <james.morse@arm.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: NBaolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: NAlex Shi <alex.shi@linux.alibaba.com>
Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: Nluanshi <zhangliguang@linux.alibaba.com>

e5604476

ACPI / APEI: Let the notification helper specify the fixmap slot · cfc73f7c

由 James Morse 提交于 1月 29, 2019

fix #28612342

commit b484079b9f520cc9a0797d885f1cd7f64b72b1b2 upstream

ghes_copy_tofrom_phys() uses a different fixmap slot depending on in_nmi().
This doesn't work when there are multiple NMI-like notifications, that
could interrupt each other.

As with the locking, move the chosen fixmap_idx to the notification helper.
This only matters for NMI-like notifications, anything calling
ghes_proc() can use the IRQ fixmap slot as its already holding an irqsave
spinlock.

This lets us collapse the ghes_ioremap_pfn_*() helpers.
Signed-off-by: NJames Morse <james.morse@arm.com>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: NBaolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: NAlex Shi <alex.shi@linux.alibaba.com>
Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: Nluanshi <zhangliguang@linux.alibaba.com>

cfc73f7c

ACPI / APEI: Move locking to the notification helper · d56089f5

由 James Morse 提交于 1月 29, 2019

fix #28612342

commit 3b880cbe4df5dd78a2b2279dbe16db9d193412ca upstream

ghes_copy_tofrom_phys() takes different locks depending on in_nmi().
This doesn't work if there are multiple NMI-like notifications, that
can interrupt each other.

Now that NOTIFY_SEA is always called in the same context, move the
lock-taking to the notification helper. The helper will always know
which lock to take. This avoids ghes_copy_tofrom_phys() taking a guess
based on in_nmi().

This splits NOTIFY_NMI and NOTIFY_SEA to use different locks. All
the other notifications use ghes_proc(), and are called in process
or IRQ context. Move the spin_lock_irqsave() around their ghes_proc()
calls.
Signed-off-by: NJames Morse <james.morse@arm.com>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: NBaolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: NAlex Shi <alex.shi@linux.alibaba.com>
Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: Nluanshi <zhangliguang@linux.alibaba.com>

d56089f5

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功