提交 · 99cb7a1711beac8141d5c0c5c9a44a716e7b0434 · openanolis / cloud-kernel

30 10月, 2019 13 次提交

x86/mm/cpa: Prevent large page split when ftrace flips RW on kernel text · 99cb7a17

由 Thomas Gleixner 提交于 8月 29, 2019

commit 7af0145067bc429a09ac4047b167c0971c9f0dc7 upstream.

ftrace does not use text_poke() for enabling trace functionality. It uses
its own mechanism and flips the whole kernel text to RW and back to RO.

The CPA rework removed a loop based check of 4k pages which tried to
preserve a large page by checking each 4k page whether the change would
actually cover all pages in the large page.

This resulted in endless loops for nothing as in testing it turned out that
it actually never preserved anything. Of course testing missed to include
ftrace, which is the one and only case which benefitted from the 4k loop.

As a consequence enabling function tracing or ftrace based kprobes results
in a full 4k split of the kernel text, which affects iTLB performance.

The kernel RO protection is the only valid case where this can actually
preserve large pages.

All other static protections (RO data, data NX, PCI, BIOS) are truly
static. So a conflict with those protections which results in a split
should only ever happen when a change of memory next to a protected region
is attempted. But these conflicts are rightfully splitting the large page
to preserve the protected regions. In fact a change to the protected
regions itself is a bug and is warned about.

Add an exception for the static protection check for kernel text RO when
the to be changed region spawns a full large page which allows to preserve
the large mappings. This also prevents the syslog to be spammed about CPA
violations when ftrace is used.

The exception needs to be removed once ftrace switched over to text_poke()
which avoids the whole issue.

Fixes: 585948f4f695 ("x86/mm/cpa: Avoid the 4k pages check completely")
Reported-by: NSong Liu <songliubraving@fb.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Tested-by: NSong Liu <songliubraving@fb.com>
Reviewed-by: NSong Liu <songliubraving@fb.com>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908282355340.1938@nanos.tec.linutronix.deSigned-off-by: NShile Zhang <shile.zhang@linux.alibaba.com>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

99cb7a17

x86/mm: Remove unused variable 'old_pte' · 987e80b6

由 Qian Cai 提交于 3月 01, 2019

commit 24c41220659ecc5576c34c6f23537f8d3949fb05 upstream.

The commit 3a19109e ("x86/mm: Fix try_preserve_large_page() to
handle large PAT bit") fixed try_preserve_large_page() by using the
corresponding pud/pmd prot/pfn interfaces, but left a variable unused
because it no longer used pte_pfn().

Later, the commit 8679de0959e6 ("x86/mm/cpa: Split, rename and clean up
try_preserve_large_page()") renamed try_preserve_large_page() to
__should_split_large_page(), but the unused variable remains.

arch/x86/mm/pageattr.c: In function '__should_split_large_page':
arch/x86/mm/pageattr.c:741:17: warning: variable 'old_pte' set but not
used [-Wunused-but-set-variable]

Fixes: 3a19109e ("x86/mm: Fix try_preserve_large_page() to handle large PAT bit")
Signed-off-by: NQian Cai <cai@lca.pw>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@linux.intel.com
Cc: luto@kernel.org
Cc: peterz@infradead.org
Cc: toshi.kani@hpe.com
Cc: bp@alien8.de
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/20190301152924.94762-1-cai@lca.pwSigned-off-by: NShile Zhang <shile.zhang@linux.alibaba.com>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

987e80b6

x86/mm/cpa: Avoid the 4k pages check completely · 12124b81

由 Thomas Gleixner 提交于 9月 17, 2018

commit 585948f4f695b07204702cfee0f828424af32aa7 upstream.

The extra loop which tries hard to preserve large pages in case of conflicts
with static protection regions turns out to be not preserving anything, at
least not in the experiments which have been conducted.

There might be corner cases in which the code would be able to preserve a
large page oaccsionally, but it's really not worth the extra code and the
cycles wasted in the common case.

Before:

 1G pages checked:                    2
 1G pages sameprot:                   0
 1G pages preserved:                  0
 2M pages checked:                  541
 2M pages sameprot:                 466
 2M pages preserved:                 47
 4K pages checked:                  514
 4K pages set-checked:             7668

After:
 1G pages checked:                    2
 1G pages sameprot:                   0
 1G pages preserved:                  0
 2M pages checked:                  538
 2M pages sameprot:                 466
 2M pages preserved:                 47
 4K pages set-checked:             7668
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NDave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Bin Yang <bin.yang@intel.com>
Cc: Mark Gross <mark.gross@intel.com>
Link: https://lkml.kernel.org/r/20180917143546.589642503@linutronix.deSigned-off-by: NShile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

12124b81

x86/mm/cpa: Do the range check early · 861a2ee6

由 Thomas Gleixner 提交于 9月 17, 2018

commit 9cc9f17a5a0a8564b41b7c5c460e7f078c42d712 upstream.

To avoid excessive 4k wise checks in the common case do a quick check first
whether the requested new page protections conflict with a static
protection area in the large page. If there is no conflict then the
decision whether to preserve or to split the page can be made immediately.

If the requested range covers the full large page, preserve it. Otherwise
split it up. No point in doing a slow crawl in 4k steps.

Before:

 1G pages checked:                    2
 1G pages sameprot:                   0
 1G pages preserved:                  0
 2M pages checked:                  538
 2M pages sameprot:                 466
 2M pages preserved:                 47
 4K pages checked:               560642
 4K pages set-checked:             7668

After:

 1G pages checked:                    2
 1G pages sameprot:                   0
 1G pages preserved:                  0
 2M pages checked:                  541
 2M pages sameprot:                 466
 2M pages preserved:                 47
 4K pages checked:                  514
 4K pages set-checked:             7668
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NDave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Bin Yang <bin.yang@intel.com>
Cc: Mark Gross <mark.gross@intel.com>
Link: https://lkml.kernel.org/r/20180917143546.507259989@linutronix.deSigned-off-by: NShile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

861a2ee6

x86/mm/cpa: Optimize same protection check · d9ce666d

由 Thomas Gleixner 提交于 9月 17, 2018

commit 1c4b406ee89c2c4210f2e19b97d39215f445c316 upstream.

When the existing mapping is correct and the new requested page protections
are the same as the existing ones, then further checks can be omitted and the
large page can be preserved. The slow path 4k wise check will not come up with
a different result.

Before:

 1G pages checked:                    2
 1G pages sameprot:                   0
 1G pages preserved:                  0
 2M pages checked:                  540
 2M pages sameprot:                 466
 2M pages preserved:                 47
 4K pages checked:               800709
 4K pages set-checked:             7668

After:

 1G pages checked:                    2
 1G pages sameprot:                   0
 1G pages preserved:                  0
 2M pages checked:                  538
 2M pages sameprot:                 466
 2M pages preserved:                 47
 4K pages checked:               560642
 4K pages set-checked:             7668
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NDave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Bin Yang <bin.yang@intel.com>
Cc: Mark Gross <mark.gross@intel.com>
Link: https://lkml.kernel.org/r/20180917143546.424477581@linutronix.deSigned-off-by: NShile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

d9ce666d

x86/mm/cpa: Add sanity check for existing mappings · cb3646ec

由 Thomas Gleixner 提交于 9月 17, 2018

commit f61c5ba2885eabc7bc4b0b2f5232f475216ba446 upstream.

With the range check it is possible to do a quick verification that the
current mapping is correct vs. the static protection areas.

In case a incorrect mapping is detected a warning is emitted and the large
page is split up. If the large page is a 2M page, then the split code is
forced to check the static protections for the PTE entries to fix up the
incorrectness. For 1G pages this can't be done easily because that would
require to either find the offending 2M areas before the split or
afterwards. For now just warn about that case and revisit it when reported.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NDave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Bin Yang <bin.yang@intel.com>
Cc: Mark Gross <mark.gross@intel.com>
Link: https://lkml.kernel.org/r/20180917143546.331408643@linutronix.deSigned-off-by: NShile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

cb3646ec

x86/mm/cpa: Avoid static protection checks on unmap · 627c2190

由 Thomas Gleixner 提交于 9月 17, 2018

commit 69c31e69df3d2efc4ad7f53d500fdd920d3865a4 upstream.

If the new pgprot has the PRESENT bit cleared, then conflicts vs. RW/NX are
completely irrelevant.

Before:

 1G pages checked:                    2
 1G pages sameprot:                   0
 1G pages preserved:                  0
 2M pages checked:                  540
 2M pages sameprot:                 466
 2M pages preserved:                 47
 4K pages checked:               800770
 4K pages set-checked:             7668

After:

 1G pages checked:                    2
 1G pages sameprot:                   0
 1G pages preserved:                  0
 2M pages checked:                  540
 2M pages sameprot:                 466
 2M pages preserved:                 47
 4K pages checked:               800709
 4K pages set-checked:             7668
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NDave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Bin Yang <bin.yang@intel.com>
Cc: Mark Gross <mark.gross@intel.com>
Link: https://lkml.kernel.org/r/20180917143546.245849757@linutronix.deSigned-off-by: NShile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

627c2190

x86/mm/cpa: Add large page preservation statistics · b4212f86

由 Thomas Gleixner 提交于 9月 17, 2018

commit 5c280cf6081ff99078e28b51172d78359f194fd9 upstream.

The large page preservation mechanism is just magic and provides no
information at all. Add optional statistic output in debugfs so the magic can
be evaluated. Defaults is off.

Output:

 1G pages checked:                    2
 1G pages sameprot:                   0
 1G pages preserved:                  0
 2M pages checked:                  540
 2M pages sameprot:                 466
 2M pages preserved:                 47
 4K pages checked:               800770
 4K pages set-checked:             7668
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NDave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Bin Yang <bin.yang@intel.com>
Cc: Mark Gross <mark.gross@intel.com>
Link: https://lkml.kernel.org/r/20180917143546.160867778@linutronix.deSigned-off-by: NShile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

b4212f86

x86/mm/cpa: Add debug mechanism · 2037aad1

由 Thomas Gleixner 提交于 9月 17, 2018

commit 4046460b867f8b041c81c26c09d3bcad6d6e814e upstream.

The whole static protection magic is silently fixing up anything which is
handed in. That's just wrong. The offending call sites need to be fixed.

Add a debug mechanism which emits a warning if a requested mapping needs to be
fixed up. The DETECT debug mechanism is really not meant to be enabled except
for developers, so limit the output hard to the protection fixups.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NDave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Bin Yang <bin.yang@intel.com>
Cc: Mark Gross <mark.gross@intel.com>
Link: https://lkml.kernel.org/r/20180917143546.078998733@linutronix.deSigned-off-by: NShile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

2037aad1

x86/mm/cpa: Allow range check for static protections · 28c4f97f

由 Thomas Gleixner 提交于 9月 17, 2018

commit 91ee8f5c1f50a1f4096c178a93a8da46ce3f6cc8 upstream.

Checking static protections only page by page is slow especially for huge
pages. To allow quick checks over a complete range, add the ability to do
that.

Make the checks inclusive so the ranges can be directly used for debug output
later.

No functional change.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NDave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Bin Yang <bin.yang@intel.com>
Cc: Mark Gross <mark.gross@intel.com>
Link: https://lkml.kernel.org/r/20180917143545.995734490@linutronix.deSigned-off-by: NShile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

28c4f97f

x86/mm/cpa: Rework static_protections() · 9bb4606c

由 Thomas Gleixner 提交于 9月 17, 2018

commit afd7969a99e072e6aa0d88511176d4d2f3009fd9 upstream.

static_protections() is pretty unreadable. Split it up into separate checks
for each protection area.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NDave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Bin Yang <bin.yang@intel.com>
Cc: Mark Gross <mark.gross@intel.com>
Link: https://lkml.kernel.org/r/20180917143545.913005317@linutronix.deSigned-off-by: NShile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

9bb4606c

x86/mm/cpa: Split, rename and clean up try_preserve_large_page() · 395a831b

由 Thomas Gleixner 提交于 9月 17, 2018

commit 8679de0959e65ee7f78db6405a8d23e61665751d upstream.

Avoid the extra variable and gotos by splitting the function into the
actual algorithm and a callable function which contains the lock
protection.

Rename it to should_split_large_page() while at it so the return values make
actually sense.

Clean up the code flow, comments and general whitespace damage while at it. No
functional change.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NDave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Bin Yang <bin.yang@intel.com>
Cc: Mark Gross <mark.gross@intel.com>
Link: https://lkml.kernel.org/r/20180917143545.830507216@linutronix.deSigned-off-by: NShile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

395a831b

x86/mm/init32: Mark text and rodata RO in one go · e782ba9f

由 Thomas Gleixner 提交于 9月 17, 2018

commit 2a25dc7c79c92c6cba45c6218c49395173be80bf upstream.

The sequence of marking text and rodata read-only in 32bit init is:

  set_ro(text);
  kernel_set_to_readonly = 1;
  set_ro(rodata);

When kernel_set_to_readonly is 1 it enables the protection mechanism in CPA
for the read only regions. With the upcoming checks for existing mappings
this consequently triggers the warning about an existing mapping being
incorrect vs. static protections because rodata has not been converted yet.

There is no technical reason to split the two, so just combine the RO
protection to convert text and rodata in one go.

Convert the printks to pr_info while at it.
Reported-by: Nkernel test robot <rong.a.chen@intel.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NDave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Bin Yang <bin.yang@intel.com>
Cc: Mark Gross <mark.gross@intel.com>
Link: https://lkml.kernel.org/r/20180917143545.731701535@linutronix.deSigned-off-by: NShile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

e782ba9f

29 10月, 2019 7 次提交

CPX: KVM: x86: expose AVX512_BF16 feature to guest · 6eced5c4

由 Jing Liu 提交于 7月 11, 2019

commit 0b774629512057b4becc705e2495220844e6e795 upstream.

AVX512 BFLOAT16 instructions support 16-bit BFLOAT16 floating-point
format (BF16) for deep learning optimization.

Intel adds AVX512 BFLOAT16 feature in CooperLake, which is CPUID.7.1.EAX[5].

Detailed information of the CPUID bit can be found here,
https://software.intel.com/sites/default/files/managed/c5/15/\
architecture-instruction-set-extensions-programming-reference.pdf.
Signed-off-by: NJing Liu <jing2.liu@linux.intel.com>
[Fix type mismatch in min, changing constant "1" to "1u". - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NLin Wang <lin.x.wang@intel.com>
Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

6eced5c4

CPX: KVM: cpuid: remove has_leaf_count from struct kvm_cpuid_param · dc579cd6

由 Paolo Bonzini 提交于 6月 24, 2019

commit 60cec433c485564bd7caac38a9df5c1ed79ee560 upstream.

The has_leaf_count member was originally added for KVM's paravirtualization
CPUID leaves.  However, since then the leaf count _has_ been added to those
leaves as well, so we can drop that special case.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NLin Wang <lin.x.wang@intel.com>
Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

dc579cd6

CPX: KVM: cpuid: rename do_cpuid_1_ent · 300bae76

由 Paolo Bonzini 提交于 6月 24, 2019

commit 50a9e1a4b1dece60fd79ecdee25db01274a7f291 upstream.

do_cpuid_1_ent does not do the entire processing for a CPUID entry, it
only retrieves the host's values.  Rename it to match reality.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NLin Wang <lin.x.wang@intel.com>
Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

300bae76

CPX: KVM: cpuid: set struct kvm_cpuid_entry2 flags in do_cpuid_1_ent · fc4e269e

由 Paolo Bonzini 提交于 7月 04, 2019

commit d9aadaf689928ba896529cb684729923b21c2de5 upstream.

do_cpuid_1_ent is typically called in two places by __do_cpuid_func
for CPUID functions that have subleafs.  Both places have to set
the KVM_CPUID_FLAG_SIGNIFCANT_INDEX.  Set that flag, and
KVM_CPUID_FLAG_STATEFUL_FUNC as well, directly in do_cpuid_1_ent.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NLin Wang <lin.x.wang@intel.com>
Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

fc4e269e

CPX: KVM: cpuid: extract do_cpuid_7_mask and support multiple subleafs · 82b11806

由 Paolo Bonzini 提交于 7月 04, 2019

commit 54d360d41211006437bebf97513394693bd32623 upstream.

CPUID function 7 has multiple subleafs.  Instead of having nested
switch statements, move the logic to filter supported features to
a separate function, and call it for each subleaf.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NLin Wang <lin.x.wang@intel.com>
Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

82b11806

CPX: KVM: cpuid: do_cpuid_ent works on a whole CPUID function · 9ea2c5e1

由 Paolo Bonzini 提交于 6月 24, 2019

commit ab8bcf64971180e1344ce2c7e70c49b0f24f6b0d upstream.

Rename it as well as __do_cpuid_ent and __do_cpuid_ent_emulated to have
"func" in its name, and drop the index parameter which is always 0.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NLin Wang <lin.x.wang@intel.com>
Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

9ea2c5e1

CPX: x86/cpufeatures: Enumerate the new AVX512 BFLOAT16 instructions · 61c73dc7

由 Fenghua Yu 提交于 6月 17, 2019

commit b302e4b176d00e1cbc80148c5d0aee36751f7480 upstream.

AVX512 BFLOAT16 instructions support 16-bit BFLOAT16 floating-point
format (BF16) for deep learning optimization.

BF16 is a short version of 32-bit single-precision floating-point
format (FP32) and has several advantages over 16-bit half-precision
floating-point format (FP16). BF16 keeps FP32 accumulation after
multiplication without loss of precision, offers more than enough
range for deep learning training tasks, and doesn't need to handle
hardware exception.

AVX512 BFLOAT16 instructions are enumerated in CPUID.7.1:EAX[bit 5]
AVX512_BF16.

CPUID.7.1:EAX contains only feature bits. Reuse the currently empty
word 12 as a pure features word to hold the feature bits including
AVX512_BF16.

Detailed information of the CPUID bit and AVX512 BFLOAT16 instructions
can be found in the latest Intel Architecture Instruction Set Extensions
and Future Features Programming Reference.

 [ bp: Check CPUID(7) subleaf validity before accessing subleaf 1. ]
Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nadav Amit <namit@vmware.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: Robert Hoo <robert.hu@linux.intel.com>
Cc: "Sean J Christopherson" <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
Cc: x86 <x86@kernel.org>
Link: https://lkml.kernel.org/r/1560794416-217638-3-git-send-email-fenghua.yu@intel.comSigned-off-by: NLin Wang <lin.x.wang@intel.com>
Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

61c73dc7

12 10月, 2019 2 次提交

arm/arm64: KVM: Enable 32 bits kvm vcpu events support · 76d210a9

由 Dongjiu Geng 提交于 10月 13, 2018

commit 58bf437ff64eac8aca606e42d7e4623e40b61fa1 upstream

The commit 539aee0e ("KVM: arm64: Share the parts of
get/set events useful to 32bit") shares the get/set events
helper for arm64 and arm32, but forgot to share the cap
extension code.

User space will check whether KVM supports vcpu events by
checking the KVM_CAP_VCPU_EVENTS extension
Acked-by: NJames Morse <james.morse@arm.com>
Reviewed-by : Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: NDongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

76d210a9

arm/arm64: KVM: Rename function kvm_arch_dev_ioctl_check_extension() · 351b99f3

由 Dongjiu Geng 提交于 10月 13, 2018

commit 375bdd3b5d4f7cf146f0df1488b4671b141dd799 upstream

Rename kvm_arch_dev_ioctl_check_extension() to
kvm_arch_vm_ioctl_check_extension(), because it does
not have any relationship with device.

Renaming this function can make code readable.

Cc: James Morse <james.morse@arm.com>
Reviewed-by: NSuzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: NDongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NZou Cao <zoucao@linux.alibaba.com>
Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>

351b99f3

25 9月, 2019 2 次提交

efi/x86/Add missing error handling to old_memmap 1:1 mapping code · 6961885f

由 Gen Zhang 提交于 5月 25, 2019

commit 4e78921ba4dd0aca1cc89168f45039add4183f8e upstream.

The old_memmap flow in efi_call_phys_prolog() performs numerous memory
allocations, and either does not check for failure at all, or it does
but fails to propagate it back to the caller, which may end up calling
into the firmware with an incomplete 1:1 mapping.

So let's fix this by returning NULL from efi_call_phys_prolog() on
memory allocation failures only, and by handling this condition in the
caller. Also, clean up any half baked sets of page tables that we may
have created before returning with a NULL return value.

Note that any failure at this level will trigger a panic() two levels
up, so none of this makes a huge difference, but it is a nice cleanup
nonetheless.

[ardb: update commit log, add efi_call_phys_epilog() call on error path]
Signed-off-by: NGen Zhang <blackgod016574@gmail.com>
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Bradford <robert.bradford@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20190525112559.7917-2-ard.biesheuvel@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

6961885f

powerpc/pseries/dlpar: Fix a missing check in dlpar_parse_cc_property() · d77c75f7

由 Gen Zhang 提交于 5月 26, 2019

commit efa9ace68e487ddd29c2b4d6dd23242158f1f607 upstream.

In dlpar_parse_cc_property(), 'prop->name' is allocated by kstrdup().
kstrdup() may return NULL, so it should be checked and handle error.
And prop should be freed if 'prop->name' is NULL.
Signed-off-by: NGen Zhang <blackgod016574@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

d77c75f7

19 8月, 2019 2 次提交

x86: uaccess: Inhibit speculation past access_ok() in user_access_begin() · 226356df

由 Will Deacon 提交于 1月 19, 2019

commit 6e693b3ffecb0b478c7050b44a4842854154f715 upstream.

Commit 594cc251fdd0 ("make 'user_access_begin()' do 'access_ok()'")
makes the access_ok() check part of the user_access_begin() preceding a
series of 'unsafe' accesses.  This has the desirable effect of ensuring
that all 'unsafe' accesses have been range-checked, without having to
pick through all of the callsites to verify whether the appropriate
checking has been made.

However, the consolidated range check does not inhibit speculation, so
it is still up to the caller to ensure that they are not susceptible to
any speculative side-channel attacks for user addresses that ultimately
fail the access_ok() check.

This is an oversight, so use __uaccess_begin_nospec() to ensure that
speculation is inhibited until the access_ok() check has passed.
Reported-by: NJulien Thierry <julien.thierry@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NShile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

226356df

make 'user_access_begin()' do 'access_ok()' · 6342e75a

由 Linus Torvalds 提交于 1月 04, 2019

commit 594cc251fdd0d231d342d88b2fdff4bc42fb0690 upstream.

Originally, the rule used to be that you'd have to do access_ok()
separately, and then user_access_begin() before actually doing the
direct (optimized) user access.

But experience has shown that people then decide not to do access_ok()
at all, and instead rely on it being implied by other operations or
similar.  Which makes it very hard to verify that the access has
actually been range-checked.

If you use the unsafe direct user accesses, hardware features (either
SMAP - Supervisor Mode Access Protection - on x86, or PAN - Privileged
Access Never - on ARM) do force you to use user_access_begin().  But
nothing really forces the range check.

By putting the range check into user_access_begin(), we actually force
people to do the right thing (tm), and the range check vill be visible
near the actual accesses.  We have way too long a history of people
trying to avoid them.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

[ Shile: fix following conflicts by adding a dummy arguments ]
Conflicts:
	kernel/compat.c
	kernel/exit.c
Signed-off-by: NShile Zhang <shile.zhang@linux.alibaba.com>

6342e75a

17 8月, 2019 5 次提交

x86/kvmclock: set offset for kvm unstable clock · ca14531b

由 Pavel Tatashin 提交于 7月 10, 2019

commit b5179ec4187251a751832193693d6e474d3445ac upstream

VMs may show incorrect uptime and dmesg printk offsets on hypervisors with
unstable clock. The problem is produced when VM is rebooted without exiting
from qemu.

The fix is to calculate clock offset not only for stable clock but for
unstable clock as well, and use kvm_sched_clock_read() which substracts
the offset for both clocks.

This is safe, because pvclock_clocksource_read() does the right thing and
makes sure that clock always goes forward, so once offset is calculated
with unstable clock, we won't get new reads that are smaller than offset,
and thus won't get negative results.

Thank you Jon DeVree for helping to reproduce this issue.

Fixes: 857baa87 ("sched/clock: Enable sched clock early")
Cc: stable@vger.kernel.org
Reported-by: NDominique Martinet <asmadeus@codewreck.org>
Signed-off-by: NPavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NXingjun Liu <xingjun.liu@linux.alibaba.com>
Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>

ca14531b

sched: loadavg: consolidate LOAD_INT, LOAD_FRAC, CALC_LOAD · b0406fce

由 Johannes Weiner 提交于 10月 26, 2018

commit 8508cf3ffad4defa202b303e5b6379efc4cd9054 upstream.

There are several definitions of those functions/macros in places that
mess with fixed-point load averages.  Provide an official version.

[akpm@linux-foundation.org: fix missed conversion in block/blk-iolatency.c]
Link: http://lkml.kernel.org/r/20180828172258.3185-5-hannes@cmpxchg.orgSigned-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: NSuren Baghdasaryan <surenb@google.com>
Tested-by: NDaniel Drake <drake@endlessm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
[Joseph: use stat.mean instead of stat->rqs.mean to solve conflict]
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>

Conflicts:
    block/blk-iolatency.c

b0406fce

kconfig: Disable x86 clocksource watchdog · 89f55e0b

由 Jiufei Xue 提交于 1月 14, 2019

Unstable tsc will trigger clocksource watchdog and disable itself, as a
result other clocksource will be elected as the current clocksource
which will result in performace issue on our servers.

RHEL7 also disabled this feature for some issues, see changelog:
[x86] disable clocksource watchdog (Prarit Bhargava) [914709]
Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

89f55e0b

Revert "x86/tsc: Prepare warp test for TSC adjustment" · 74343ee8

由 Jiufei Xue 提交于 1月 10, 2019

This reverts commit 76d3b851.

The returned value for check_tsc_warp() is useless now, remove it.
Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

74343ee8

Revert "x86/tsc: Try to adjust TSC if sync test fails" · 08fc9dcb

由 Jiufei Xue 提交于 1月 10, 2019

This reverts commit cc4db268.

When we do hot-add and enable vCPU, the time inside the VM jumps and
then VM stucks.
The dmesg shows like this:
[   48.402948] CPU2 has been hot-added
[   48.413774] smpboot: Booting Node 0 Processor 2 APIC 0x2
[   48.415155] kvm-clock: cpu 2, msr 6b615081, secondary cpu clock
[   48.453690] TSC ADJUST compensate: CPU2 observed 139318776350 warp.  Adjust: 139318776350
[  102.060874] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
[  102.060874] clocksource:                       'kvm-clock' wd_now: 1cb1cfc4bf8 wd_last: 1be9588f1fe mask: ffffffffffffffff
[  102.060874] clocksource:                       'tsc' cs_now: 207d794f7e cs_last: 205a32697a mask: ffffffffffffffff
[  102.060874] tsc: Marking TSC unstable due to clocksource watchdog
[  102.070188] KVM setup async PF for cpu 2
[  102.071461] kvm-stealtime: cpu 2, msr 13ba95000
[  102.074530] Will online and init hotplugged CPU: 2

This is because the TSC for the newly added VCPU is initialized to 0
while others are ahead. Guest will do the TSC ADJUST compensate and
cause the time jumps.

Commit bd8fab39("KVM: x86: fix maintaining of kvm_clock stability
on guest CPU hotplug") can fix this problem.  However, the host kernel
version may be older, so do not ajust TSC if sync test fails, just mark
it unstable.
Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

08fc9dcb

16 8月, 2019 8 次提交

KVM: Fix leak vCPU's VMCS value into other pCPU · 2bc73d91

由 Wanpeng Li 提交于 8月 05, 2019

commit 17e433b54393a6269acbcb792da97791fe1592d8 upstream.

After commit d73eb57b80b (KVM: Boost vCPUs that are delivering interrupts), a
five years old bug is exposed. Running ebizzy benchmark in three 80 vCPUs VMs
on one 80 pCPUs Skylake server, a lot of rcu_sched stall warning splatting
in the VMs after stress testing:

 INFO: rcu_sched detected stalls on CPUs/tasks: { 4 41 57 62 77} (detected by 15, t=60004 jiffies, g=899, c=898, q=15073)
 Call Trace:
   flush_tlb_mm_range+0x68/0x140
   tlb_flush_mmu.part.75+0x37/0xe0
   tlb_finish_mmu+0x55/0x60
   zap_page_range+0x142/0x190
   SyS_madvise+0x3cd/0x9c0
   system_call_fastpath+0x1c/0x21

swait_active() sustains to be true before finish_swait() is called in
kvm_vcpu_block(), voluntarily preempted vCPUs are taken into account
by kvm_vcpu_on_spin() loop greatly increases the probability condition
kvm_arch_vcpu_runnable(vcpu) is checked and can be true, when APICv
is enabled the yield-candidate vCPU's VMCS RVI field leaks(by
vmx_sync_pir_to_irr()) into spinning-on-a-taken-lock vCPU's current
VMCS.

This patch fixes it by checking conservatively a subset of events.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Marc Zyngier <Marc.Zyngier@arm.com>
Cc: stable@vger.kernel.org
Fixes: 98f4a146 (KVM: add kvm_arch_vcpu_runnable() test to kvm_vcpu_on_spin() loop)
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

2bc73d91

x86/purgatory: Do not use __builtin_memcpy and __builtin_memset · e0d262a5

由 Nick Desaulniers 提交于 8月 07, 2019

commit 4ce97317f41d38584fb93578e922fcd19e535f5b upstream.

Implementing memcpy and memset in terms of __builtin_memcpy and
__builtin_memset is problematic.

GCC at -O2 will replace calls to the builtins with calls to memcpy and
memset (but will generate an inline implementation at -Os).  Clang will
replace the builtins with these calls regardless of optimization level.
$ llvm-objdump -dr arch/x86/purgatory/string.o | tail

0000000000000339 memcpy:
     339: 48 b8 00 00 00 00 00 00 00 00 movabsq $0, %rax
                000000000000033b:  R_X86_64_64  memcpy
     343: ff e0                         jmpq    *%rax

0000000000000345 memset:
     345: 48 b8 00 00 00 00 00 00 00 00 movabsq $0, %rax
                0000000000000347:  R_X86_64_64  memset
     34f: ff e0

Such code results in infinite recursion at runtime. This is observed
when doing kexec.

Instead, reuse an implementation from arch/x86/boot/compressed/string.c.
This requires to implement a stub function for warn(). Also, Clang may
lower memcmp's that compare against 0 to bcmp's, so add a small definition,
too. See also: commit 5f074f3e192f ("lib/string.c: implement a basic bcmp")

Fixes: 8fc5b4d4 ("purgatory: core purgatory functionality")
Reported-by: NVaibhav Rustagi <vaibhavrustagi@google.com>
Debugged-by: NVaibhav Rustagi <vaibhavrustagi@google.com>
Debugged-by: NManoj Gupta <manojgupta@google.com>
Suggested-by: NAlistair Delva <adelva@google.com>
Signed-off-by: NNick Desaulniers <ndesaulniers@google.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Tested-by: NVaibhav Rustagi <vaibhavrustagi@google.com>
Cc: stable@vger.kernel.org
Link: https://bugs.chromium.org/p/chromium/issues/detail?id=984056
Link: https://lkml.kernel.org/r/20190807221539.94583-1-ndesaulniers@google.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

e0d262a5

s390/dma: provide proper ARCH_ZONE_DMA_BITS value · 5c4689cb

由 Halil Pasic 提交于 7月 24, 2019

[ Upstream commit 1a2dcff881059dedc14fafc8a442664c8dbd60f1 ]

On s390 ZONE_DMA is up to 2G, i.e. ARCH_ZONE_DMA_BITS should be 31 bits.
The current value is 24 and makes __dma_direct_alloc_pages() take a
wrong turn first (but __dma_direct_alloc_pages() recovers then).

Let's correct ARCH_ZONE_DMA_BITS value and avoid wrong turns.
Signed-off-by: NHalil Pasic <pasic@linux.ibm.com>
Reported-by: NPetr Tesarik <ptesarik@suse.cz>
Fixes: c61e9637 ("dma-direct: add support for allocation from ZONE_DMA and ZONE_DMA32")
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

5c4689cb

ARM: dts: bcm: bcm47094: add missing #cells for mdio-bus-mux · bb41940c

由 Arnd Bergmann 提交于 7月 22, 2019

[ Upstream commit 3a9d2569e45cb02769cda26fee4a02126867c934 ]

The mdio-bus-mux has no #address-cells/#size-cells property,
which causes a few dtc warnings:

arch/arm/boot/dts/bcm47094-linksys-panamera.dts:129.4-18: Warning (reg_format): /mdio-bus-mux/mdio@200:reg: property has invalid length (4 bytes) (#address-cells == 2, #size-cells == 1)
arch/arm/boot/dts/bcm47094-linksys-panamera.dtb: Warning (pci_device_bus_num): Failed prerequisite 'reg_format'
arch/arm/boot/dts/bcm47094-linksys-panamera.dtb: Warning (i2c_bus_reg): Failed prerequisite 'reg_format'
arch/arm/boot/dts/bcm47094-linksys-panamera.dtb: Warning (spi_bus_reg): Failed prerequisite 'reg_format'
arch/arm/boot/dts/bcm47094-linksys-panamera.dts:128.22-132.5: Warning (avoid_default_addr_size): /mdio-bus-mux/mdio@200: Relying on default #address-cells value
arch/arm/boot/dts/bcm47094-linksys-panamera.dts:128.22-132.5: Warning (avoid_default_addr_size): /mdio-bus-mux/mdio@200: Relying on default #size-cells value

Add the normal cell numbers.

Link: https://lore.kernel.org/r/20190722145618.1155492-1-arnd@arndb.de
Fixes: 2bebdfcd ("ARM: dts: BCM5301X: Add support for Linksys EA9500")
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NOlof Johansson <olof@lixom.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>

bb41940c

ARM: davinci: fix sleep.S build error on ARMv4 · 19e7df3e

由 Arnd Bergmann 提交于 7月 22, 2019

[ Upstream commit d64b212ea960db4276a1d8372bd98cb861dfcbb0 ]

When building a multiplatform kernel that includes armv4 support,
the default target CPU does not support the blx instruction,
which leads to a build failure:

arch/arm/mach-davinci/sleep.S: Assembler messages:
arch/arm/mach-davinci/sleep.S:56: Error: selected processor does not support `blx ip' in ARM mode

Add a .arch statement in the sources to make this file build.

Link: https://lore.kernel.org/r/20190722145211.1154785-1-arnd@arndb.deAcked-by: NSekhar Nori <nsekhar@ti.com>
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NOlof Johansson <olof@lixom.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>

19e7df3e

x86/purgatory: Use CFLAGS_REMOVE rather than reset KBUILD_CFLAGS · b674f791

由 Nick Desaulniers 提交于 8月 07, 2019

commit b059f801a937d164e03b33c1848bb3dca67c0b04 upstream.

KBUILD_CFLAGS is very carefully built up in the top level Makefile,
particularly when cross compiling or using different build tools.
Resetting KBUILD_CFLAGS via := assignment is an antipattern.

The comment above the reset mentions that -pg is problematic. Other
Makefiles use `CFLAGS_REMOVE_file.o = $(CC_FLAGS_FTRACE)` when
CONFIG_FUNCTION_TRACER is set. Prefer that pattern to wiping out all of
the important KBUILD_CFLAGS then manually having to re-add them. Seems
also that __stack_chk_fail references are generated when using
CONFIG_STACKPROTECTOR or CONFIG_STACKPROTECTOR_STRONG.

Fixes: 8fc5b4d4 ("purgatory: core purgatory functionality")
Reported-by: NVaibhav Rustagi <vaibhavrustagi@google.com>
Suggested-by: NPeter Zijlstra <peterz@infradead.org>
Suggested-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NNick Desaulniers <ndesaulniers@google.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Tested-by: NVaibhav Rustagi <vaibhavrustagi@google.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190807221539.94583-2-ndesaulniers@google.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

b674f791

x86/mm: Sync also unmappings in vmalloc_sync_all() · 9935d7ed

由 Joerg Roedel 提交于 7月 19, 2019

commit 8e998fc24de47c55b47a887f6c95ab91acd4a720 upstream.

With huge-page ioremap areas the unmappings also need to be synced between
all page-tables. Otherwise it can cause data corruption when a region is
unmapped and later re-used.

Make the vmalloc_sync_one() function ready to sync unmappings and make sure
vmalloc_sync_all() iterates over all page-tables even when an unmapped PMD
is found.

Fixes: 5d72b4fb ('x86, mm: support huge I/O mapping capability I/F')
Signed-off-by: NJoerg Roedel <jroedel@suse.de>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NDave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20190719184652.11391-3-joro@8bytes.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

9935d7ed

x86/mm: Check for pfn instead of page in vmalloc_sync_one() · dd524d48

由 Joerg Roedel 提交于 7月 19, 2019

commit 51b75b5b563a2637f9d8dc5bd02a31b2ff9e5ea0 upstream.

Do not require a struct page for the mapped memory location because it
might not exist. This can happen when an ioremapped region is mapped with
2MB pages.

Fixes: 5d72b4fb ('x86, mm: support huge I/O mapping capability I/F')
Signed-off-by: NJoerg Roedel <jroedel@suse.de>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NDave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20190719184652.11391-2-joro@8bytes.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

dd524d48

07 8月, 2019 1 次提交

x86/speculation/swapgs: Exclude ATOMs from speculation through SWAPGS · b88241ae

由 Thomas Gleixner 提交于 7月 17, 2019

commit f36cf386e3fec258a341d446915862eded3e13d8 upstream

Intel provided the following information:

 On all current Atom processors, instructions that use a segment register
 value (e.g. a load or store) will not speculatively execute before the
 last writer of that segment retires. Thus they will not use a
 speculatively written segment value.

That means on ATOMs there is no speculation through SWAPGS, so the SWAPGS
entry paths can be excluded from the extra LFENCE if PTI is disabled.

Create a separate bug flag for the through SWAPGS speculation and mark all
out-of-order ATOMs and AMD/HYGON CPUs as not affected. The in-order ATOMs
are excluded from the whole mitigation mess anyway.
Reported-by: NAndrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NTyler Hicks <tyhicks@canonical.com>
Reviewed-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

b88241ae

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功