Commit 6802dc37, authored by Aili Yao, committed by Yang Yingliang

mm,hwpoison: return -EHWPOISON to denote that the page has already been poisoned

mainline inclusion
from mainline-v5.13
commit 47af12ba
category: bugfix
bugzilla: 175120
CVE: NA

-------------------------------------------------

When memory_failure() is called with MF_ACTION_REQUIRED on a page that
has already been hwpoisoned, memory_failure() can fail to send SIGBUS
to the affected process, which results in an infinite loop of MCEs.

Currently memory_failure() returns 0 when it is called for an already
hwpoisoned page, so the caller, kill_me_maybe(), can return without
sending SIGBUS to the current process.  An action-required MCE is raised
when the current process accesses the broken memory, so without a SIGBUS
the current process keeps running and soon accesses the error page
again, running into an MCE loop.

This issue can arise, for example, in the following scenarios:

 - Two or more threads access the poisoned page concurrently. If
   local MCE is enabled, the MCE handler handles each MCE event
   independently. So there is a race among the MCE events, and the
   second and later threads fall into the situation in question.

 - If there was a preceding memory error event and memory_failure() for
   that event failed to unmap the error page for some reason, a
   subsequent memory access to the error page triggers the MCE loop
   situation.

To fix the issue, make memory_failure() return an error code when the
error page has already been hwpoisoned.  This allows the memory error
handler to control how it sends signals to userspace.  Also make sure
that any process touching a hwpoisoned page gets a SIGBUS even in the
"already hwpoisoned" path of memory_failure(), as is done in the page
fault path.
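
The return-code change can be illustrated with a small userspace model.
This is only a sketch, not kernel code: hwpoison_flag[],
test_set_hwpoison(), model_memory_failure(), model_handler() and enum
action are hypothetical names invented for this illustration. The
handler is an assumed simplification of a caller that skips SIGBUS only
when the return value is 0.

```c
#include <errno.h>
#include <stdbool.h>

/* EHWPOISON is Linux-specific; fall back to the Linux value elsewhere. */
#ifndef EHWPOISON
#define EHWPOISON 133
#endif

#define NR_PAGES 8

/* Hypothetical stand-in for the per-page HWPoison flag. */
static bool hwpoison_flag[NR_PAGES];

/* Models TestSetPageHWPoison(): set the flag and return its old value. */
static bool test_set_hwpoison(unsigned long pfn)
{
	bool old = hwpoison_flag[pfn];

	hwpoison_flag[pfn] = true;
	return old;
}

/* Simplified model of memory_failure() with this patch applied. */
static int model_memory_failure(unsigned long pfn)
{
	if (pfn >= NR_PAGES)
		return -ENXIO;
	if (test_set_hwpoison(pfn))
		return -EHWPOISON;	/* before the patch: return 0 */
	return 0;
}

enum action { ACTION_NONE, ACTION_SIGBUS };

/*
 * Assumed caller policy: only a return value of 0 lets the task go on
 * without a signal; any error, including -EHWPOISON, leads to SIGBUS.
 */
static enum action model_handler(unsigned long pfn)
{
	int ret = model_memory_failure(pfn);

	return ret == 0 ? ACTION_NONE : ACTION_SIGBUS;
}
```

In this model the first model_handler() call for a page reports
ACTION_NONE and every later call for the same page reports
ACTION_SIGBUS. With the old "return 0" the later calls would also report
ACTION_NONE, which corresponds to the MCE loop described above.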

Link: https://lkml.kernel.org/r/20210521030156.2612074-3-nao.horiguchi@gmail.com
Signed-off-by: Aili Yao <yaoaili@kingsoft.com>
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jue Wang <juew@google.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Parent 81f97f06
@@ -1094,7 +1094,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 	if (TestSetPageHWPoison(head)) {
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 		       pfn);
-		return 0;
+		return -EHWPOISON;
 	}
 	num_poisoned_pages_inc();
@@ -1286,6 +1286,7 @@ int memory_failure(unsigned long pfn, int flags)
 	if (TestSetPageHWPoison(p)) {
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 		       pfn);
+		res = -EHWPOISON;
 		goto unlock_mutex;
 	}
......