提交 · 393ca9ea37fb1c8cd74838282bf569825d18e3bf · openanolis / cloud-kernel

22 5月, 2019 1 次提交

x86/speculation/mds: Revert CPU buffer clear on double fault exit · 393ca9ea

由 Andy Lutomirski 提交于 5月 14, 2019

commit 88640e1dcd089879530a49a8d212d1814678dfe7 upstream.

The double fault ESPFIX path doesn't return to user mode at all --
it returns back to the kernel by simulating a #GP fault.
prepare_exit_to_usermode() will run on the way out of
general_protection before running user code.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jon Masters <jcm@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: 04dcbdb80578 ("x86/speculation/mds: Clear CPU buffers on exit to user")
Link: http://lkml.kernel.org/r/ac97612445c0a44ee10374f6ea79c222fe22a5c4.1557865329.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

393ca9ea

17 5月, 2019 5 次提交

x86/kprobes: Avoid kretprobe recursion bug · 57526050

由 Masami Hiramatsu 提交于 2月 24, 2019

[ Upstream commit b191fa96ea6dc00d331dcc28c1f7db5e075693a0 ]

Avoid kretprobe recursion loop bg by setting a dummy
kprobes to current_kprobe per-CPU variable.

This bug has been introduced with the asm-coded trampoline
code, since previously it used another kprobe for hooking
the function return placeholder (which only has a nop) and
trampoline handler was called from that kprobe.

This revives the old lost kprobe again.

With this fix, we don't see deadlock anymore.

And you can see that all inner-called kretprobe are skipped.

  event_1                                  235               0
  event_2                                19375           19612

The 1st column is recorded count and the 2nd is missed count.
Above shows (event_1 rec) + (event_2 rec) ~= (event_2 missed)
(some difference are here because the counter is racy)
Reported-by: NAndrea Righi <righi.andrea@gmail.com>
Tested-by: NAndrea Righi <righi.andrea@gmail.com>
Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org>
Acked-by: NSteven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: c9becf58 ("[PATCH] kretprobe: kretprobe-booster")
Link: http://lkml.kernel.org/r/155094064889.6137.972160690963039.stgit@devboxSigned-off-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NSasha Levin <alexander.levin@microsoft.com>

57526050

KVM: x86: avoid misreporting level-triggered irqs as edge-triggered in tracing · 0c8afd51

由 Vitaly Kuznetsov 提交于 3月 27, 2019

[ Upstream commit 7a223e06b1a411cef6c4cd7a9b9a33c8d225b10e ]

In __apic_accept_irq() interface trig_mode is int and actually on some code
paths it is set above u8:

kvm_apic_set_irq() extracts it from 'struct kvm_lapic_irq' where trig_mode
is u16. This is done on purpose as e.g. kvm_set_msi_irq() sets it to
(1 << 15) & e->msi.data

kvm_apic_local_deliver sets it to reg & (1 << 15).

Fix the immediate issue by making 'tm' into u16. We may also want to adjust
__apic_accept_irq() interface and use proper sizes for vector, level,
trig_mode but this is not urgent.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

0c8afd51

KVM: fix spectrev1 gadgets · d39f3cc7

由 Paolo Bonzini 提交于 4月 11, 2019

[ Upstream commit 1d487e9bf8ba66a7174c56a0029c54b1eca8f99c ]

These were found with smatch, and then generalized when applicable.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

d39f3cc7

x86/reboot, efi: Use EFI reboot for Acer TravelMate X514-51T · 4074bc37

由 Jian-Hong Pan 提交于 4月 12, 2019

[ Upstream commit 0082517fa4bce073e7cf542633439f26538a14cc ]

Upon reboot, the Acer TravelMate X514-51T laptop appears to complete the
shutdown process, but then it hangs in BIOS POST with a black screen.

The problem is intermittent - at some points it has appeared related to
Secure Boot settings or different kernel builds, but ultimately we have
not been able to identify the exact conditions that trigger the issue to
come and go.

Besides, the EFI mode cannot be disabled in the BIOS of this model.

However, after extensive testing, we observe that using the EFI reboot
method reliably avoids the issue in all cases.

So add a boot time quirk to use EFI reboot on such systems.

Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=203119Signed-off-by: NJian-Hong Pan <jian-hong@endlessm.com>
Signed-off-by: NDaniel Drake <drake@endlessm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Cc: linux@endlessm.com
Link: http://lkml.kernel.org/r/20190412080152.3718-1-jian-hong@endlessm.com
[ Fix !CONFIG_EFI build failure, clarify the code and the changelog a bit. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>

4074bc37

x86/build/lto: Fix truncated .bss with -fdata-sections · 3b51d713

由 Sami Tolvanen 提交于 4月 15, 2019

[ Upstream commit 6a03469a1edc94da52b65478f1e00837add869a3 ]

With CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y, we compile the kernel with
-fdata-sections, which also splits the .bss section.

The new section, with a new .bss.* name, which pattern gets missed by the
main x86 linker script which only expects the '.bss' name. This results
in the discarding of the second part and a too small, truncated .bss
section and an unhappy, non-working kernel.

Use the common BSS_MAIN macro in the linker script to properly capture
and merge all the generated BSS sections.
Signed-off-by: NSami Tolvanen <samitolvanen@google.com>
Reviewed-by: NNick Desaulniers <ndesaulniers@google.com>
Reviewed-by: NKees Cook <keescook@chromium.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20190415164956.124067-1-samitolvanen@google.com
[ Extended the changelog. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>

3b51d713

15 5月, 2019 22 次提交

x86/speculation/mds: Add 'mitigations=' support for MDS · 12a0dad7

由 Josh Poimboeuf 提交于 4月 17, 2019

commit 5c14068f87d04adc73ba3f41c2a303d3c3d1fa12 upstream

Add MDS to the new 'mitigations=' cmdline option.
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

12a0dad7

x86/speculation: Support 'mitigations=' cmdline option · af5332dd

由 Josh Poimboeuf 提交于 4月 12, 2019

commit d68be4c4d31295ff6ae34a8ddfaa4c1a8ff42812 upstream

Configure x86 runtime CPU speculation bug mitigations in accordance with
the 'mitigations=' cmdline option.  This affects Meltdown, Spectre v2,
Speculative Store Bypass, and L1TF.

The default behavior is unchanged.
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz> (on x86)
Reviewed-by: NJiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-arch@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Phil Auld <pauld@redhat.com>
Link: https://lkml.kernel.org/r/6616d0ae169308516cfdf5216bedd169f8a8291b.1555085500.git.jpoimboe@redhat.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

af5332dd

x86/speculation/mds: Print SMT vulnerable on MSBDS with mitigations off · 7ba793ae

由 Konrad Rzeszutek Wilk 提交于 4月 12, 2019

commit e2c3c94788b08891dcf3dbe608f9880523ecd71b upstream

This code is only for CPUs which are affected by MSBDS, but are *not*
affected by the other two MDS issues.

For such CPUs, enabling the mds_idle_clear mitigation is enough to
mitigate SMT.

However if user boots with 'mds=off' and still has SMT enabled, we should
not report that SMT is mitigated:

$cat /sys//devices/system/cpu/vulnerabilities/mds
Vulnerable; SMT mitigated

But rather:
Vulnerable; SMT vulnerable
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NTyler Hicks <tyhicks@canonical.com>
Reviewed-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20190412215118.294906495@localhost.localdomainSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

7ba793ae

x86/speculation/mds: Fix comment · e9ae3226

由 Boris Ostrovsky 提交于 4月 12, 2019

commit cae5ec342645746d617dd420d206e1588d47768a upstream

s/L1TF/MDS/
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NTyler Hicks <tyhicks@canonical.com>
Reviewed-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

e9ae3226

x86/speculation/mds: Add SMT warning message · b9faa465

由 Josh Poimboeuf 提交于 4月 02, 2019

commit 39226ef02bfb43248b7db12a4fdccb39d95318e3 upstream

MDS is vulnerable with SMT.  Make that clear with a one-time printk
whenever SMT first gets enabled.
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NTyler Hicks <tyhicks@canonical.com>
Acked-by: NJiri Kosina <jkosina@suse.cz>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

b9faa465

x86/speculation: Move arch_smt_update() call to after mitigation decisions · b3a63d9c

由 Josh Poimboeuf 提交于 4月 02, 2019

commit 7c3658b20194a5b3209a143f63bc9c643c6a3ae2 upstream

arch_smt_update() now has a dependency on both Spectre v2 and MDS
mitigations.  Move its initial call to after all the mitigation decisions
have been made.
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NTyler Hicks <tyhicks@canonical.com>
Acked-by: NJiri Kosina <jkosina@suse.cz>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

b3a63d9c

x86/speculation/mds: Add mds=full,nosmt cmdline option · f7a119a7

由 Josh Poimboeuf 提交于 4月 02, 2019

commit d71eb0ce109a124b0fa714832823b9452f2762cf upstream

Add the mds=full,nosmt cmdline option.  This is like mds=full, but with
SMT disabled if the CPU is vulnerable.
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NTyler Hicks <tyhicks@canonical.com>
Acked-by: NJiri Kosina <jkosina@suse.cz>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

f7a119a7

Documentation: Move L1TF to separate directory · e3803099

由 Thomas Gleixner 提交于 2月 19, 2019

commit 65fd4cb65b2dad97feb8330b6690445910b56d6a upstream

Move L!TF to a separate directory so the MDS stuff can be added at the
side. Otherwise the all hardware vulnerabilites have their own top level
entry. Should have done that right away.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: NJon Masters <jcm@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

e3803099

x86/speculation/mds: Add mitigation mode VMWERV · c50e81fe

由 Thomas Gleixner 提交于 2月 20, 2019

commit 22dd8365088b6403630b82423cf906491859b65e upstream

In virtualized environments it can happen that the host has the microcode
update which utilizes the VERW instruction to clear CPU buffers, but the
hypervisor is not yet updated to expose the X86_FEATURE_MD_CLEAR CPUID bit
to guests.

Introduce an internal mitigation mode VMWERV which enables the invocation
of the CPU buffer clearing even if X86_FEATURE_MD_CLEAR is not set. If the
system has no updated microcode this results in a pointless execution of
the VERW instruction wasting a few CPU cycles. If the microcode is updated,
but not exposed to a guest then the CPU buffers will be cleared.

That said: Virtual Machines Will Eventually Receive Vaccine
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NJon Masters <jcm@redhat.com>
Tested-by: NJon Masters <jcm@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

c50e81fe

x86/speculation/mds: Add sysfs reporting for MDS · 8230c202

由 Thomas Gleixner 提交于 2月 18, 2019

commit 8a4b06d391b0a42a373808979b5028f5c84d9c6a upstream

Add the sysfs reporting file for MDS. It exposes the vulnerability and
mitigation state similar to the existing files for the other speculative
hardware vulnerabilities.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NJon Masters <jcm@redhat.com>
Tested-by: NJon Masters <jcm@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

8230c202

x86/speculation/mds: Add mitigation control for MDS · 29510670

由 Thomas Gleixner 提交于 2月 18, 2019

commit bc1241700acd82ec69fde98c5763ce51086269f8 upstream

Now that the mitigations are in place, add a command line parameter to
control the mitigation, a mitigation selector function and a SMT update
mechanism.

This is the minimal straight forward initial implementation which just
provides an always on/off mode. The command line parameter is:

  mds=[full|off]

This is consistent with the existing mitigations for other speculative
hardware vulnerabilities.

The idle invocation is dynamically updated according to the SMT state of
the system similar to the dynamic update of the STIBP mitigation. The idle
mitigation is limited to CPUs which are only affected by MSBDS and not any
other variant, because the other variants cannot be mitigated on SMT
enabled systems.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NJon Masters <jcm@redhat.com>
Tested-by: NJon Masters <jcm@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

29510670

x86/speculation/mds: Conditionally clear CPU buffers on idle entry · 4df98b3f

由 Thomas Gleixner 提交于 2月 18, 2019

commit 07f07f55a29cb705e221eda7894dd67ab81ef343 upstream

Add a static key which controls the invocation of the CPU buffer clear
mechanism on idle entry. This is independent of other MDS mitigations
because the idle entry invocation to mitigate the potential leakage due to
store buffer repartitioning is only necessary on SMT systems.

Add the actual invocations to the different halt/mwait variants which
covers all usage sites. mwaitx is not patched as it's not available on
Intel CPUs.

The buffer clear is only invoked before entering the C-State to prevent
that stale data from the idling CPU is spilled to the Hyper-Thread sibling
after the Store buffer got repartitioned and all entries are available to
the non idle sibling.

When coming out of idle the store buffer is partitioned again so each
sibling has half of it available. Now CPU which returned from idle could be
speculatively exposed to contents of the sibling, but the buffers are
flushed either on exit to user space or on VMENTER.

When later on conditional buffer clearing is implemented on top of this,
then there is no action required either because before returning to user
space the context switch will set the condition flag which causes a flush
on the return to user path.

Note, that the buffer clearing on idle is only sensible on CPUs which are
solely affected by MSBDS and not any other variant of MDS because the other
MDS variants cannot be mitigated when SMT is enabled, so the buffer
clearing on idle would be a window dressing exercise.

This intentionally does not handle the case in the acpi/processor_idle
driver which uses the legacy IO port interface for C-State transitions for
two reasons:

 - The acpi/processor_idle driver was replaced by the intel_idle driver
   almost a decade ago. Anything Nehalem upwards supports it and defaults
   to that new driver.

 - The legacy IO port interface is likely to be used on older and therefore
   unaffected CPUs or on systems which do not receive microcode updates
   anymore, so there is no point in adding that.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: NFrederic Weisbecker <frederic@kernel.org>
Reviewed-by: NJon Masters <jcm@redhat.com>
Tested-by: NJon Masters <jcm@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

4df98b3f

x86/kvm/vmx: Add MDS protection when L1D Flush is not active · b39dc9a8

由 Thomas Gleixner 提交于 2月 27, 2019

commit 650b68a0622f933444a6d66936abb3103029413b upstream

CPUs which are affected by L1TF and MDS mitigate MDS with the L1D Flush on
VMENTER when updated microcode is installed.

If a CPU is not affected by L1TF or if the L1D Flush is not in use, then
MDS mitigation needs to be invoked explicitly.

For these cases, follow the host mitigation state and invoke the MDS
mitigation before VMENTER.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: NFrederic Weisbecker <frederic@kernel.org>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NJon Masters <jcm@redhat.com>
Tested-by: NJon Masters <jcm@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

b39dc9a8

x86/speculation/mds: Clear CPU buffers on exit to user · e4fa775b

由 Thomas Gleixner 提交于 2月 18, 2019

commit 04dcbdb8057827b043b3c71aa397c4c63e67d086 upstream

Add a static key which controls the invocation of the CPU buffer clear
mechanism on exit to user space and add the call into
prepare_exit_to_usermode() and do_nmi() right before actually returning.

Add documentation which kernel to user space transition this covers and
explain why some corner cases are not mitigated.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NFrederic Weisbecker <frederic@kernel.org>
Reviewed-by: NJon Masters <jcm@redhat.com>
Tested-by: NJon Masters <jcm@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

e4fa775b

x86/speculation/mds: Add mds_clear_cpu_buffers() · 1f7c31be

由 Thomas Gleixner 提交于 2月 18, 2019

commit 6a9e529272517755904b7afa639f6db59ddb793e upstream

The Microarchitectural Data Sampling (MDS) vulernabilities are mitigated by
clearing the affected CPU buffers. The mechanism for clearing the buffers
uses the unused and obsolete VERW instruction in combination with a
microcode update which triggers a CPU buffer clear when VERW is executed.

Provide a inline function with the assembly magic. The argument of the VERW
instruction must be a memory operand as documented:

  "MD_CLEAR enumerates that the memory-operand variant of VERW (for
   example, VERW m16) has been extended to also overwrite buffers affected
   by MDS. This buffer overwriting functionality is not guaranteed for the
   register operand variant of VERW."

Documentation also recommends to use a writable data segment selector:

  "The buffer overwriting occurs regardless of the result of the VERW
   permission check, as well as when the selector is null or causes a
   descriptor load segment violation. However, for lowest latency we
   recommend using a selector that indicates a valid writable data
   segment."

Add x86 specific documentation about MDS and the internal workings of the
mitigation.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: NFrederic Weisbecker <frederic@kernel.org>
Reviewed-by: NJon Masters <jcm@redhat.com>
Tested-by: NJon Masters <jcm@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

1f7c31be

x86/kvm: Expose X86_FEATURE_MD_CLEAR to guests · de89ff6f

由 Andi Kleen 提交于 1月 18, 2019

commit 6c4dbbd14730c43f4ed808a9c42ca41625925c22 upstream

X86_FEATURE_MD_CLEAR is a new CPUID bit which is set when microcode
provides the mechanism to invoke a flush of various exploitable CPU buffers
by invoking the VERW instruction.

Hand it through to guests so they can adjust their mitigations.

This also requires corresponding qemu changes, which are available
separately.

[ tglx: Massaged changelog ]
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: NFrederic Weisbecker <frederic@kernel.org>
Reviewed-by: NJon Masters <jcm@redhat.com>
Tested-by: NJon Masters <jcm@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

de89ff6f

x86/speculation/mds: Add BUG_MSBDS_ONLY · aca9e8d8

由 Thomas Gleixner 提交于 3月 01, 2019

commit e261f209c3666e842fd645a1e31f001c3a26def9 upstream

This bug bit is set on CPUs which are only affected by Microarchitectural
Store Buffer Data Sampling (MSBDS) and not by any other MDS variant.

This is important because the Store Buffers are partitioned between
Hyper-Threads so cross thread forwarding is not possible. But if a thread
enters or exits a sleep state the store buffer is repartitioned which can
expose data from one thread to the other. This transition can be mitigated.

That means that for CPUs which are only affected by MSBDS SMT can be
enabled, if the CPU is not affected by other SMT sensitive vulnerabilities,
e.g. L1TF. The XEON PHI variants fall into that category. Also the
Silvermont/Airmont ATOMs, but for them it's not really relevant as they do
not support SMT, but mark them for completeness sake.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NFrederic Weisbecker <frederic@kernel.org>
Reviewed-by: NJon Masters <jcm@redhat.com>
Tested-by: NJon Masters <jcm@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

aca9e8d8

x86/speculation/mds: Add basic bug infrastructure for MDS · 2e9104aa

由 Andi Kleen 提交于 1月 18, 2019

commit ed5194c2732c8084af9fd159c146ea92bf137128 upstream

Microarchitectural Data Sampling (MDS), is a class of side channel attacks
on internal buffers in Intel CPUs. The variants are:

 - Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126)
 - Microarchitectural Fill Buffer Data Sampling (MFBDS) (CVE-2018-12130)
 - Microarchitectural Load Port Data Sampling (MLPDS) (CVE-2018-12127)

MSBDS leaks Store Buffer Entries which can be speculatively forwarded to a
dependent load (store-to-load forwarding) as an optimization. The forward
can also happen to a faulting or assisting load operation for a different
memory address, which can be exploited under certain conditions. Store
buffers are partitioned between Hyper-Threads so cross thread forwarding is
not possible. But if a thread enters or exits a sleep state the store
buffer is repartitioned which can expose data from one thread to the other.

MFBDS leaks Fill Buffer Entries. Fill buffers are used internally to manage
L1 miss situations and to hold data which is returned or sent in response
to a memory or I/O operation. Fill buffers can forward data to a load
operation and also write data to the cache. When the fill buffer is
deallocated it can retain the stale data of the preceding operations which
can then be forwarded to a faulting or assisting load operation, which can
be exploited under certain conditions. Fill buffers are shared between
Hyper-Threads so cross thread leakage is possible.

MLDPS leaks Load Port Data. Load ports are used to perform load operations
from memory or I/O. The received data is then forwarded to the register
file or a subsequent operation. In some implementations the Load Port can
contain stale data from a previous operation which can be forwarded to
faulting or assisting loads under certain conditions, which again can be
exploited eventually. Load ports are shared between Hyper-Threads so cross
thread leakage is possible.

All variants have the same mitigation for single CPU thread case (SMT off),
so the kernel can treat them as one MDS issue.

Add the basic infrastructure to detect if the current CPU is affected by
MDS.

[ tglx: Rewrote changelog ]
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: NFrederic Weisbecker <frederic@kernel.org>
Reviewed-by: NJon Masters <jcm@redhat.com>
Tested-by: NJon Masters <jcm@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

2e9104aa

x86/speculation: Consolidate CPU whitelists · 00b76324

由 Thomas Gleixner 提交于 2月 27, 2019

commit 36ad35131adacc29b328b9c8b6277a8bf0d6fd5d upstream

The CPU vulnerability whitelists have some overlap and there are more
whitelists coming along.

Use the driver_data field in the x86_cpu_id struct to denote the
whitelisted vulnerabilities and combine all whitelists into one.
Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NFrederic Weisbecker <frederic@kernel.org>
Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NJon Masters <jcm@redhat.com>
Tested-by: NJon Masters <jcm@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

00b76324

x86/msr-index: Cleanup bit defines · e09450ff

由 Thomas Gleixner 提交于 2月 21, 2019

commit d8eabc37310a92df40d07c5a8afc53cebf996716 upstream

Greg pointed out that speculation related bit defines are using (1 << N)
format instead of BIT(N). Aside of that (1 << N) is wrong as it should use
1UL at least.

Clean it up.

[ Josh Poimboeuf: Fix tools build ]
Reported-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NFrederic Weisbecker <frederic@kernel.org>
Reviewed-by: NJon Masters <jcm@redhat.com>
Tested-by: NJon Masters <jcm@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

e09450ff

kvm: x86: Report STIBP on GET_SUPPORTED_CPUID · ca0056d9

由 Eduardo Habkost 提交于 12月 05, 2018

commit d7b09c827a6cf291f66637a36f46928dd1423184 upstream

Months ago, we have added code to allow direct access to MSR_IA32_SPEC_CTRL
to the guest, which makes STIBP available to guests.  This was implemented
by commits d28b387f ("KVM/VMX: Allow direct access to
MSR_IA32_SPEC_CTRL") and b2ac58f9 ("KVM/SVM: Allow direct access to
MSR_IA32_SPEC_CTRL").

However, we never updated GET_SUPPORTED_CPUID to let userspace know that
STIBP can be enabled in CPUID.  Fix that by updating
kvm_cpuid_8000_0008_ebx_x86_features and kvm_cpuid_7_0_edx_x86_features.
Signed-off-by: NEduardo Habkost <ehabkost@redhat.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

ca0056d9

x86/cpu: Sanitize FAM6_ATOM naming · 1f1bc822

由 Peter Zijlstra 提交于 8月 07, 2018

commit f2c4db1bd80720cd8cb2a5aa220d9bc9f374f04e upstream

Going primarily by:

  https://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors

with additional information gleaned from other related pages; notably:

 - Bonnell shrink was called Saltwell
 - Moorefield is the Merriefield refresh which makes it Airmont

The general naming scheme is: FAM6_ATOM_UARCH_SOCTYPE

  for i in `git grep -l FAM6_ATOM` ; do
	sed -i  -e 's/ATOM_PINEVIEW/ATOM_BONNELL/g'		\
		-e 's/ATOM_LINCROFT/ATOM_BONNELL_MID/'		\
		-e 's/ATOM_PENWELL/ATOM_SALTWELL_MID/g'		\
		-e 's/ATOM_CLOVERVIEW/ATOM_SALTWELL_TABLET/g'	\
		-e 's/ATOM_CEDARVIEW/ATOM_SALTWELL/g'		\
		-e 's/ATOM_SILVERMONT1/ATOM_SILVERMONT/g'	\
		-e 's/ATOM_SILVERMONT2/ATOM_SILVERMONT_X/g'	\
		-e 's/ATOM_MERRIFIELD/ATOM_SILVERMONT_MID/g'	\
		-e 's/ATOM_MOOREFIELD/ATOM_AIRMONT_MID/g'	\
		-e 's/ATOM_DENVERTON/ATOM_GOLDMONT_X/g'		\
		-e 's/ATOM_GEMINI_LAKE/ATOM_GOLDMONT_PLUS/g' ${i}
  done
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: dave.hansen@linux.intel.com
Cc: len.brown@intel.com
Signed-off-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

1f1bc822

10 5月, 2019 2 次提交

perf/x86/intel: Initialize TFA MSR · 7d10436c

由 Peter Zijlstra 提交于 3月 21, 2019

[ Upstream commit d7262457e35dbe239659e62654e56f8ddb814bed ]

Stephane reported that the TFA MSR is not initialized by the kernel,
but the TFA bit could set by firmware or as a leftover from a kexec,
which makes the state inconsistent.
Reported-by: NStephane Eranian <eranian@google.com>
Tested-by: NNelson DSouza <nelson.dsouza@intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: tonyj@suse.com
Link: https://lkml.kernel.org/r/20190321123849.GN6521@hirez.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>

7d10436c

perf/x86/intel: Fix handling of wakeup_events for multi-entry PEBS · 2b791e8e

由 Stephane Eranian 提交于 3月 06, 2019

[ Upstream commit 583feb08e7f7ac9d533b446882eb3a54737a6dbb ]

When an event is programmed with attr.wakeup_events=N (N>0), it means
the caller is interested in getting a user level notification after
N samples have been recorded in the kernel sampling buffer.

With precise events on Intel processors, the kernel uses PEBS.
The kernel tries minimize sampling overhead by verifying
if the event configuration is compatible with multi-entry PEBS mode.
If so, the kernel is notified only when the buffer has reached its threshold.
Other PEBS operates in single-entry mode, the kenrel is notified for each
PEBS sample.

The problem is that the current implementation look at frequency
mode and event sample_type but ignores the wakeup_events field. Thus,
it may not be possible to receive a notification after each precise event.

This patch fixes this problem by disabling multi-entry PEBS if wakeup_events
is non-zero.
Signed-off-by: NStephane Eranian <eranian@google.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: kan.liang@intel.com
Link: https://lkml.kernel.org/r/20190306195048.189514-1-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>

2b791e8e

08 5月, 2019 6 次提交

x86/mm/tlb: Revert "x86/mm: Align TLB invalidation info" · 7a32cbf1

由 Peter Zijlstra 提交于 4月 16, 2019

commit 780e0106d468a2962b16b52fdf42898f2639e0a0 upstream.

Revert the following commit:

  515ab7c4: ("x86/mm: Align TLB invalidation info")

I found out (the hard way) that under some .config options (notably L1_CACHE_SHIFT=7)
and compiler combinations this on-stack alignment leads to a 320 byte
stack usage, which then triggers a KASAN stack warning elsewhere.

Using 320 bytes of stack space for a 40 byte structure is ludicrous and
clearly not right.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Acked-by: NNadav Amit <namit@vmware.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 515ab7c4 ("x86/mm: Align TLB invalidation info")
Link: http://lkml.kernel.org/r/20190416080335.GM7905@worktop.programming.kicks-ass.net
[ Minor changelog edits. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

7a32cbf1

x86/mm: Fix a crash with kmemleak_scan() · c48b027f

由 Qian Cai 提交于 4月 23, 2019

commit 0d02113b31b2017dd349ec9df2314e798a90fa6e upstream.

The first kmemleak_scan() call after boot would trigger the crash below
because this callpath:

  kernel_init
    free_initmem
      mem_encrypt_free_decrypted_mem
        free_init_pages

unmaps memory inside the .bss when DEBUG_PAGEALLOC=y.

kmemleak_init() will register the .data/.bss sections and then
kmemleak_scan() will scan those addresses and dereference them looking
for pointer references. If free_init_pages() frees and unmaps pages in
those sections, kmemleak_scan() will crash if referencing one of those
addresses:

  BUG: unable to handle kernel paging request at ffffffffbd402000
  CPU: 12 PID: 325 Comm: kmemleak Not tainted 5.1.0-rc4+ #4
  RIP: 0010:scan_block
  Call Trace:
   scan_gray_list
   kmemleak_scan
   kmemleak_scan_thread
   kthread
   ret_from_fork

Since kmemleak_free_part() is tolerant to unknown objects (not tracked
by kmemleak), it is fine to call it from free_init_pages() even if not
all address ranges passed to this function are known to kmemleak.

 [ bp: Massage. ]

Fixes: b3f0907c ("x86/mm: Add .bss..decrypted section to hold shared variables")
Signed-off-by: NQian Cai <cai@lca.pw>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190423165811.36699-1-cai@lca.pwSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

c48b027f

x86/mm/KASLR: Fix the size of the direct mapping section · 052c78f5

由 Baoquan He 提交于 4月 04, 2019

commit ec3937107ab43f3e8b2bc9dad95710043c462ff7 upstream.

kernel_randomize_memory() uses __PHYSICAL_MASK_SHIFT to calculate
the maximum amount of system RAM supported. The size of the direct
mapping section is obtained from the smaller one of the below two
values:

  (actual system RAM size + padding size) vs (max system RAM size supported)

This calculation is wrong since commit

  b83ce5ee ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52").

In it, __PHYSICAL_MASK_SHIFT was changed to be 52, regardless of whether
the kernel is using 4-level or 5-level page tables. Thus, it will always
use 4 PB as the maximum amount of system RAM, even in 4-level paging
mode where it should actually be 64 TB.

Thus, the size of the direct mapping section will always
be the sum of the actual system RAM size plus the padding size.

Even when the amount of system RAM is 64 TB, the following layout will
still be used. Obviously KALSR will be weakened significantly.

   |____|_______actual RAM_______|_padding_|______the rest_______|
   0            64TB                                            ~120TB

Instead, it should be like this:

   |____|_______actual RAM_______|_________the rest______________|
   0            64TB                                            ~120TB

The size of padding region is controlled by
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING, which is 10 TB by default.

The above issue only exists when
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING is set to a non-zero value,
which is the case when CONFIG_MEMORY_HOTPLUG is enabled. Otherwise,
using __PHYSICAL_MASK_SHIFT doesn't affect KASLR.

Fix it by replacing __PHYSICAL_MASK_SHIFT with MAX_PHYSMEM_BITS.

 [ bp: Massage commit message. ]

Fixes: b83ce5ee ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52")
Signed-off-by: NBaoquan He <bhe@redhat.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NThomas Garnier <thgarnie@google.com>
Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: frank.ramsay@hpe.com
Cc: herbert@gondor.apana.org.au
Cc: kirill@shutemov.name
Cc: mike.travis@hpe.com
Cc: thgarnie@google.com
Cc: x86-ml <x86@kernel.org>
Cc: yamada.masahiro@socionext.com
Link: https://lkml.kernel.org/r/20190417083536.GE7065@MiWiFi-R3L-srvSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

052c78f5

x86/mce: Improve error message when kernel cannot recover, p2 · 61ff4406

由 Tony Luck 提交于 2月 25, 2019

commit 41f035a86b5b72a4f947c38e94239d20d595352a upstream.

In

  c7d606f5 ("x86/mce: Improve error message when kernel cannot recover")

a case was added for a machine check caused by a DATA access to poison
memory from the kernel. A case should have been added also for an
uncorrectable error during an instruction fetch in the kernel.

Add that extra case so the error message now reads:

  mce: [Hardware Error]: Machine check: Instruction fetch error in kernel

Fixes: c7d606f5 ("x86/mce: Improve error message when kernel cannot recover")
Signed-off-by: NTony Luck <tony.luck@intel.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190225205940.15226-1-tony.luck@intel.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

61ff4406

perf/x86/amd: Update generic hardware cache events for Family 17h · 3f8497cf

由 Kim Phillips 提交于 5月 02, 2019

commit 0e3b74e26280f2cf8753717a950b97d424da6046 upstream.

Add a new amd_hw_cache_event_ids_f17h assignment structure set
for AMD families 17h and above, since a lot has changed.  Specifically:

L1 Data Cache

The data cache access counter remains the same on Family 17h.

For DC misses, PMCx041's definition changes with Family 17h,
so instead we use the L2 cache accesses from L1 data cache
misses counter (PMCx060,umask=0xc8).

For DC hardware prefetch events, Family 17h breaks compatibility
for PMCx067 "Data Prefetcher", so instead, we use PMCx05a "Hardware
Prefetch DC Fills."

L1 Instruction Cache

PMCs 0x80 and 0x81 (32-byte IC fetches and misses) are backward
compatible on Family 17h.

For prefetches, we remove the erroneous PMCx04B assignment which
counts how many software data cache prefetch load instructions were
dispatched.

LL - Last Level Cache

Removing PMCs 7D, 7E, and 7F assignments, as they do not exist
on Family 17h, where the last level cache is L3.  L3 counters
can be accessed using the existing AMD Uncore driver.

Data TLB

On Intel machines, data TLB accesses ("dTLB-loads") are assigned
to counters that count load/store instructions retired.  This
is inconsistent with instruction TLB accesses, where Intel
implementations report iTLB misses that hit in the STLB.

Ideally, dTLB-loads would count higher level dTLB misses that hit
in lower level TLBs, and dTLB-load-misses would report those
that also missed in those lower-level TLBs, therefore causing
a page table walk.  That would be consistent with instruction
TLB operation, remove the redundancy between dTLB-loads and
L1-dcache-loads, and prevent perf from producing artificially
low percentage ratios, i.e. the "0.01%" below:

        42,550,869      L1-dcache-loads
        41,591,860      dTLB-loads
             4,802      dTLB-load-misses          #    0.01% of all dTLB cache hits
         7,283,682      L1-dcache-stores
         7,912,392      dTLB-stores
               310      dTLB-store-misses

On AMD Families prior to 17h, the "Data Cache Accesses" counter is
used, which is slightly better than load/store instructions retired,
but still counts in terms of individual load/store operations
instead of TLB operations.

So, for AMD Families 17h and higher, this patch assigns "dTLB-loads"
to a counter for L1 dTLB misses that hit in the L2 dTLB, and
"dTLB-load-misses" to a counter for L1 DTLB misses that caused
L2 DTLB misses and therefore also caused page table walks.  This
results in a much more accurate view of data TLB performance:

        60,961,781      L1-dcache-loads
             4,601      dTLB-loads
               963      dTLB-load-misses          #   20.93% of all dTLB cache hits

Note that for all AMD families, data loads and stores are combined
in a single accesses counter, so no 'L1-dcache-stores' are reported
separately, and stores are counted with loads in 'L1-dcache-loads'.

Also note that the "% of all dTLB cache hits" string is misleading
because (a) "dTLB cache": although TLBs can be considered caches for
page tables, in this context, it can be misinterpreted as data cache
hits because the figures are similar (at least on Intel), and (b) not
all those loads (technically accesses) technically "hit" at that
hardware level.  "% of all dTLB accesses" would be more clear/accurate.

Instruction TLB

On Intel machines, 'iTLB-loads' measure iTLB misses that hit in the
STLB, and 'iTLB-load-misses' measure iTLB misses that also missed in
the STLB and completed a page table walk.

For AMD Family 17h and above, for 'iTLB-loads' we replace the
erroneous instruction cache fetches counter with PMCx084
"L1 ITLB Miss, L2 ITLB Hit".

For 'iTLB-load-misses' we still use PMCx085 "L1 ITLB Miss,
L2 ITLB Miss", but set a 0xff umask because without it the event
does not get counted.

Branch Predictor (BPU)

PMCs 0xc2 and 0xc3 continue to be valid across all AMD Families.

Node Level Events

Family 17h does not have a PMCx0e9 counter, and corresponding counters
have not been made available publicly, so for now, we mark them as
unsupported for Families 17h and above.

Reference:

  "Open-Source Register Reference For AMD Family 17h Processors Models 00h-2Fh"
  Released 7/17/2018, Publication #56255, Revision 3.03:
  https://www.amd.com/system/files/TechDocs/56255_OSRR.pdf

[ mingo: tidied up the line breaks. ]
Signed-off-by: NKim Phillips <kim.phillips@amd.com>
Cc: <stable@vger.kernel.org> # v4.9+
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Liška <mliska@suse.cz>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Fixes: e40ed154 ("perf/x86: Add perf support for AMD family-17h processors")
Signed-off-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

3f8497cf

KVM: SVM: prevent DBG_DECRYPT and DBG_ENCRYPT overflow · 82e8da1f

由 David Rientjes 提交于 3月 25, 2019

[ Upstream commit b86bc2858b389255cd44555ce4b1e427b2b770c0 ]

This ensures that the address and length provided to DBG_DECRYPT and
DBG_ENCRYPT do not cause an overflow.

At the same time, pass the actual number of pages pinned in memory to
sev_unpin_memory() as a cleanup.
Reported-by: NCfir Cohen <cfir@google.com>
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

82e8da1f

05 5月, 2019 2 次提交

KVM: nVMX: Fix size checks in vmx_set_nested_state · 68d49ff4

由 Jim Mattson 提交于 1月 17, 2019

commit e8ab8d24b488632d07ce5ddb261f1d454114415b upstream.

The size checks in vmx_nested_state are wrong because the calculations
are made based on the size of a pointer to a struct kvm_nested_state
rather than the size of a struct kvm_nested_state.
Reported-by: NFelix Wilhelm  <fwilhelm@google.com>
Signed-off-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NDrew Schmitt <dasch@google.com>
Reviewed-by: NMarc Orr <marcorr@google.com>
Reviewed-by: NPeter Shier <pshier@google.com>
Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
Fixes: 8fcc4b59
Cc: stable@ver.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

68d49ff4

KVM: x86: Whitelist port 0x7e for pre-incrementing %rip · 499bbe73

由 Sean Christopherson 提交于 4月 29, 2019

commit 8764ed55c9705e426d889ff16c26f398bba70b9b upstream.

KVM's recent bug fix to update %rip after emulating I/O broke userspace
that relied on the previous behavior of incrementing %rip prior to
exiting to userspace.  When running a Windows XP guest on AMD hardware,
Qemu may patch "OUT 0x7E" instructions in reaction to the OUT itself.
Because KVM's old behavior was to increment %rip before exiting to
userspace to handle the I/O, Qemu manually adjusted %rip to account for
the OUT instruction.

Arguably this is a userspace bug as KVM requires userspace to re-enter
the kernel to complete instruction emulation before taking any other
actions.  That being said, this is a bit of a grey area and breaking
userspace that has worked for many years is bad.

Pre-increment %rip on OUT to port 0x7e before exiting to userspace to
hack around the issue.

Fixes: 45def77ebf79e ("KVM: x86: update %rip after emulating IO")
Reported-by: NSimon Becherer <simon@becherer.de>
Reported-and-tested-by: NIakov Karpov <srid@rkmail.ru>
Reported-by: NGabriele Balducci <balducci@units.it>
Reported-by: NAntti Antinoja <reader@fennosys.fi>
Cc: stable@vger.kernel.org
Cc: Takashi Iwai <tiwai@suse.com>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

499bbe73

04 5月, 2019 2 次提交

x86/mm: Don't exceed the valid physical address space · 6a364b2e

由 Ralph Campbell 提交于 3月 25, 2019

[ Upstream commit 92c77f7c4d5dfaaf45b2ce19360e69977c264766 ]

valid_phys_addr_range() is used to sanity check the physical address range
of an operation, e.g., access to /dev/mem. It uses __pa(high_memory)
internally.

If memory is populated at the end of the physical address space, then
__pa(high_memory) is outside of the physical address space because:

   high_memory = (void *)__va(max_pfn * PAGE_SIZE - 1) + 1;

For the comparison in valid_phys_addr_range() this is not an issue, but if
CONFIG_DEBUG_VIRTUAL is enabled, __pa() maps to __phys_addr(), which
verifies that the resulting physical address is within the valid physical
address space of the CPU. So in the case that memory is populated at the
end of the physical address space, this is not true and triggers a
VIRTUAL_BUG_ON().

Use __pa(high_memory - 1) to prevent the conversion from going beyond
the end of valid physical addresses.

Fixes: be62a320 ("x86/mm: Limit mmap() of /dev/mem to valid physical addresses")
Signed-off-by: NRalph Campbell <rcampbell@nvidia.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Craig Bergstrom <craigb@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hans Verkuil <hans.verkuil@cisco.com>
Cc: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: Sean Young <sean@mess.org>

Link: https://lkml.kernel.org/r/20190326001817.15413-2-rcampbell@nvidia.comSigned-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>

6a364b2e

x86/realmode: Don't leak the trampoline kernel address · fe71e625

由 Matteo Croce 提交于 3月 26, 2019

[ Upstream commit b929a500d68479163c48739d809cbf4c1335db6f ]

Since commit

  ad67b74d ("printk: hash addresses printed with %p")

at boot "____ptrval____" is printed instead of the trampoline addresses:

  Base memory trampoline at [(____ptrval____)] 99000 size 24576

Remove the print as we don't want to leak kernel addresses and this
statement is not needed anymore.

Fixes: ad67b74d ("printk: hash addresses printed with %p")
Signed-off-by: NMatteo Croce <mcroce@redhat.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190326203046.20787-1-mcroce@redhat.comSigned-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>

fe71e625

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功