1. 06 April 2019, 5 commits
    • kprobes/x86: Blacklist non-attachable interrupt functions · d5813e77
      Committed by Andrea Righi
      [ Upstream commit a50480cb6d61d5c5fc13308479407b628b6bc1c5 ]
      
      These interrupt functions are already non-attachable by kprobes.
      Blacklist them explicitly so that they can show up in
      /sys/kernel/debug/kprobes/blacklist and tools like BCC can use this
      additional information.
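      
      The mechanism behind this kind of blacklisting is the NOKPROBE_SYMBOL()
      macro; a minimal sketch (the function name below is illustrative, not
      necessarily one of the symbols touched by this patch):
      
        /* NOKPROBE_SYMBOL() (linux/kprobes.h) records the symbol in the
         * kprobe blacklist, so registration on it is rejected and the
         * entry shows up in /sys/kernel/debug/kprobes/blacklist. */
        asmlinkage void smp_irq_work_interrupt(struct pt_regs *regs);
        NOKPROBE_SYMBOL(smp_irq_work_interrupt);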
      Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yonghong Song <yhs@fb.com>
      Link: http://lkml.kernel.org/r/20181206095648.GA8249@Dell
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • x86/build: Mark per-CPU symbols as absolute explicitly for LLD · 648b949b
      Committed by Rafael Ávila de Espíndola
      [ Upstream commit d071ae09a4a1414c1433d5ae9908959a7325b0ad ]
      
      Accessing per-CPU variables is done by finding the offset of the
      variable in the per-CPU block and adding it to the address of the
      respective CPU's block.
      
      Section 3.10.8 of ld.bfd's documentation states:
      
        For expressions involving numbers, relative addresses and absolute
        addresses, ld follows these rules to evaluate terms:
      
        Other binary operations, that is, between two relative addresses
        not in the same section, or between a relative address and an
        absolute address, first convert any non-absolute term to an
        absolute address before applying the operator.
      
      Note that LLVM's linker does not adhere to GNU ld's implementation
      and as such requires implicitly-absolute terms to be explicitly marked
      as absolute in the linker script. Otherwise, it currently fails with:
      
        ld.lld: error: ./arch/x86/kernel/vmlinux.lds:153: at least one side of the expression must be absolute
        ld.lld: error: ./arch/x86/kernel/vmlinux.lds:154: at least one side of the expression must be absolute
        Makefile:1040: recipe for target 'vmlinux' failed
      
      This is not a functional change for ld.bfd, which converts the term to
      an absolute symbol anyway, as specified above.
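      
      A sketch of the kind of linker script change this implies, with the
      absolute marking made explicit (the INIT_PER_CPU() shape below is an
      assumption modeled on arch/x86/kernel/vmlinux.lds.S, not the verbatim
      patch):
      
        /* Mark the per-CPU symbol absolute explicitly so that ld.lld
         * evaluates the expression the same way ld.bfd already does. */
        #define INIT_PER_CPU(x) init_per_cpu__##x = ABSOLUTE(x) + __per_cpu_load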
      
      Based on a previous submission by Tri Vo <trong@android.com>.
      Reported-by: Dmitry Golovin <dima@golovin.in>
      Signed-off-by: Rafael Ávila de Espíndola <rafael@espindo.la>
      [ Update commit message per Boris' and Michael's suggestions. ]
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      [ Massage commit message more, fix typos. ]
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Tested-by: Dmitry Golovin <dima@golovin.in>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Cao Jin <caoj.fnst@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tri Vo <trong@android.com>
      Cc: dima@golovin.in
      Cc: morbo@google.com
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20181219190145.252035-1-ndesaulniers@google.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • x86/build: Specify elf_i386 linker emulation explicitly for i386 objects · d2053718
      Committed by George Rimar
      [ Upstream commit 927185c124d62a9a4d35878d7f6d432a166b74e3 ]
      
      The kernel uses the OUTPUT_FORMAT linker script command in its linker
      scripts. Most of the time, the -m option is passed to the linker with
      the correct architecture, but sometimes (at least for x86_64) the -m
      option contradicts the OUTPUT_FORMAT directive.
      
      Specifically, arch/x86/boot and arch/x86/realmode/rm produce i386 object
      files, but are linked with the -m elf_x86_64 linker flag when building
      for x86_64.
      
      The GNU linker manpage doesn't explicitly state any tie-breakers between
      -m and OUTPUT_FORMAT. But with BFD and Gold linkers, OUTPUT_FORMAT
      overrides the emulation value specified with the -m option.
      
      LLVM lld has a different behavior, however. When supplied with
      contradicting -m and OUTPUT_FORMAT values it fails with the following
      error message:
      
        ld.lld: error: arch/x86/realmode/rm/header.o is incompatible with elf_x86_64
      
      Therefore, just add the correct -m after the incorrect one (it overrides
      it), so the linker invocation looks like this:
      
        ld -m elf_x86_64 -z max-page-size=0x200000 -m elf_i386 --emit-relocs -T \
          realmode.lds header.o trampoline_64.o stack.o reboot.o -o realmode.elf
      
      This is not a functional change for GNU ld, because (although not
      explicitly documented) OUTPUT_FORMAT overrides -m EMULATION.
      
      Tested by building x86_64 kernel with GNU gcc/ld toolchain and booting
      it in QEMU.
      
       [ bp: massage and clarify text. ]
      Suggested-by: Dmitry Golovin <dima@golovin.in>
      Signed-off-by: George Rimar <grimar@accesssoftek.com>
      Signed-off-by: Tri Vo <trong@android.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Tested-by: Tri Vo <trong@android.com>
      Tested-by: Nick Desaulniers <ndesaulniers@google.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Michael Matz <matz@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: morbo@google.com
      Cc: ndesaulniers@google.com
      Cc: ruiu@google.com
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190111201012.71210-1-trong@android.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • perf/aux: Make perf_event accessible to setup_aux() · efd85d83
      Committed by Mathieu Poirier
      [ Upstream commit 840018668ce2d96783356204ff282d6c9b0e5f66 ]
      
      When pmu::setup_aux() is called, the coresight PMU needs to know which
      sink to use for the session by looking up the information in the
      event's attr::config2 field.
      
      As such, simply replace the cpu information with the complete perf_event
      structure and change all affected customers.
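      
      A sketch of the resulting interface change (parameter names are
      illustrative; the shape follows the description above):
      
        /* Before: the driver only got the CPU number. */
        void *(*setup_aux)(int cpu, void **pages, int nr_pages, bool overwrite);
        
        /* After: the full perf_event is available, so coresight can read
         * event->attr.config2 to select the sink for the session. */
        void *(*setup_aux)(struct perf_event *event, void **pages,
                           int nr_pages, bool overwrite);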
      Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Reviewed-by: Suzuki Poulouse <suzuki.poulose@arm.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-s390@vger.kernel.org
      Link: http://lkml.kernel.org/r/20190131184714.20388-2-mathieu.poirier@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • x86/hyperv: Fix kernel panic when kexec on HyperV · c3f28d59
      Committed by Kairui Song
      [ Upstream commit 179fb36abb097976997f50733d5b122a29158cba ]
      
      After commit 68bb7bfb ("X86/Hyper-V: Enable IPI enlightenments"),
      kexec fails with a kernel panic:
      
      kexec_core: Starting new kernel
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v3.0 03/02/2018
      RIP: 0010:0xffffc9000001d000
      
      Call Trace:
       ? __send_ipi_mask+0x1c6/0x2d0
       ? hv_send_ipi_mask_allbutself+0x6d/0xb0
       ? mp_save_irq+0x70/0x70
       ? __ioapic_read_entry+0x32/0x50
       ? ioapic_read_entry+0x39/0x50
       ? clear_IO_APIC_pin+0xb8/0x110
       ? native_stop_other_cpus+0x6e/0x170
       ? native_machine_shutdown+0x22/0x40
       ? kernel_kexec+0x136/0x156
      
      That happens if hypercall-based IPIs are used, because the hypercall page
      is reset very early upon kexec reboot, but kexec sends IPIs to stop CPUs,
      which invokes the hypercall and dereferences the unusable page.
      
      To fix this, reset hv_hypercall_pg to NULL before the page is reset to
      avoid any misuse; IPI sending will then fall back to the non-hypercall
      based method. This only happens on kexec/kdump, so just setting the
      pointer to NULL is good enough.
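      
      A minimal sketch of the resulting cleanup path (names follow the x86
      Hyper-V init code; treat the exact sequence as illustrative):
      
        void hyperv_cleanup(void)
        {
        	union hv_x64_msr_hypercall_contents hypercall_msr;
        
        	/* Clear the pointer first so __send_ipi_mask() falls back to
        	 * the non-hypercall IPI path instead of calling into a page
        	 * that is about to become unusable. */
        	hv_hypercall_pg = NULL;
        
        	/* Now the hypercall page itself can be reset safely. */
        	hypercall_msr.as_uint64 = 0;
        	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
        }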
      
      Fixes: 68bb7bfb ("X86/Hyper-V: Enable IPI enlightenments")
      Signed-off-by: Kairui Song <kasong@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Sasha Levin <sashal@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: devel@linuxdriverproject.org
      Link: https://lkml.kernel.org/r/20190306111827.14131-1-kasong@redhat.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  2. 03 April 2019, 3 commits
    • KVM: x86: Emulate MSR_IA32_ARCH_CAPABILITIES on AMD hosts · 3a18eaba
      Committed by Sean Christopherson
      commit 0cf9135b773bf32fba9dd8e6699c1b331ee4b749 upstream.
      
      The CPUID flag ARCH_CAPABILITIES is unconditionally exposed to host
      userspace for all x86 hosts, i.e. KVM advertises ARCH_CAPABILITIES
      regardless of hardware support under the pretense that KVM fully
      emulates MSR_IA32_ARCH_CAPABILITIES.  Unfortunately, only VMX hosts
      handle accesses to MSR_IA32_ARCH_CAPABILITIES (despite KVM_GET_MSRS
      also reporting MSR_IA32_ARCH_CAPABILITIES for all hosts).
      
      Move the MSR_IA32_ARCH_CAPABILITIES handling to common x86 code so
      that it's emulated on AMD hosts.
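      
      A hedged sketch of what "emulated in common x86 code" amounts to: the
      MSR read is served from state cached in the common vcpu structure, so
      it works on both VMX and SVM (field and helper names as commonly used
      in KVM at the time; treat as illustrative):
      
        /* In kvm_get_msr_common(), reachable from both vendor modules: */
        case MSR_IA32_ARCH_CAPABILITIES:
        	if (!msr_info->host_initiated &&
        	    !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
        		return 1;
        	msr_info->data = vcpu->arch.arch_capabilities;
        	break;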
      
      Fixes: 1eaafe91 ("kvm: x86: IA32_ARCH_CAPABILITIES is always supported")
      Cc: stable@vger.kernel.org
      Reported-by: Xiaoyao Li <xiaoyao.li@linux.intel.com>
      Cc: Jim Mattson <jmattson@google.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: x86: update %rip after emulating IO · b9733a74
      Committed by Sean Christopherson
      commit 45def77ebf79e2e8942b89ed79294d97ce914fa0 upstream.
      
      Most (all?) x86 platforms provide a port IO based reset mechanism, e.g.
      OUT 92h or CF9h.  Userspace may emulate said mechanism, i.e. reset a
      vCPU in response to KVM_EXIT_IO, without explicitly announcing to KVM
      that it is doing a reset, e.g. Qemu jams vCPU state and resumes running.
      
      To avoid corrupting %rip after such a reset, commit 0967b7bf ("KVM:
      Skip pio instruction when it is emulated, not executed") changed the
      behavior of PIO handlers, i.e. today's "fast" PIO handling, to skip the
      instruction prior to exiting to userspace.  Full emulation doesn't need
      such tricks because re-emulating the instruction will naturally handle
      %rip being changed to point at the reset vector.
      
      Updating %rip prior to exiting to userspace has several drawbacks:
      
        - Userspace sees the wrong %rip on the exit, e.g. if PIO emulation
          fails it will likely yell about the wrong address.
        - Single step exits to userspace are effectively dropped as
          KVM_EXIT_DEBUG is overwritten with KVM_EXIT_IO.
        - Behavior of PIO emulation is different depending on whether it
          goes down the fast path or the slow path.
      
      Rather than skip the PIO instruction before exiting to userspace,
      snapshot the linear %rip and cancel PIO completion if the current
      value does not match the snapshot.  For a 64-bit vCPU, i.e. the most
      common scenario, the snapshot and comparison have negligible overhead
      as VMCS.GUEST_RIP will be cached regardless, i.e. there is no extra
      VMREAD in this case.
      
      All other alternatives to snapshotting the linear %rip that don't
      rely on an explicit reset announcement suffer from one corner case
      or another.  For example, canceling PIO completion on any write to
      %rip fails if userspace does a save/restore of %rip, and attempting to
      avoid that issue by canceling PIO only if %rip changed then fails if PIO
      collides with the reset %rip.  Attempting to zero in on the exact reset
      vector won't work for APs, which means adding more hooks such as the
      vCPU's MP_STATE, and so on and so forth.
      
      Checking for a linear %rip match technically suffers from corner cases,
      e.g. userspace could theoretically rewrite the underlying code page and
      expect a different instruction to execute, or the guest hardcodes a PIO
      reset at 0xfffffff0, but those are far, far outside of what can be
      considered normal operation.
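      
      A sketch of the snapshot-and-compare (kvm_get_linear_rip() and
      kvm_is_linear_rip() are existing KVM helpers; the surrounding details
      are illustrative):
      
        /* On the KVM_EXIT_IO path, remember where the PIO happened: */
        vcpu->arch.pio.linear_rip = kvm_get_linear_rip(vcpu);
        
        /* In the completion callback after userspace resumes the vCPU: */
        if (unlikely(!kvm_is_linear_rip(vcpu, vcpu->arch.pio.linear_rip)))
        	return 1;	/* %rip moved, e.g. a reset: don't skip anything */
        return kvm_skip_emulated_instruction(vcpu);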
      
      Fixes: 432baf60 ("KVM: VMX: use kvm_fast_pio_in for handling IN I/O")
      Cc: <stable@vger.kernel.org>
      Reported-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/smp: Enforce CONFIG_HOTPLUG_CPU when SMP=y · a0713e81
      Committed by Thomas Gleixner
      commit bebd024e4815b1a170fcd21ead9c2222b23ce9e6 upstream.
      
      The SMT disable 'nosmt' command line argument is not working properly when
      CONFIG_HOTPLUG_CPU is disabled. The teardown of the sibling CPUs, which
      are required to be brought up due to the MCE issues, cannot work. The
      CPUs are then kept in a half-dead state.
      
      As the 'nosmt' functionality has become popular due to the speculative
      hardware vulnerabilities, the half torn down state is not a proper solution
      to the problem.
      
      Enforce CONFIG_HOTPLUG_CPU=y when SMP is enabled so the full operation is
      possible.
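      
      The resulting Kconfig shape (a sketch matching the description above;
      the user-visible prompt goes away and the option simply follows SMP):
      
        config HOTPLUG_CPU
        	def_bool y
        	depends on SMP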
      Reported-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Konrad Wilk <konrad.wilk@oracle.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Mukesh Ojha <mojha@codeaurora.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Micheal Kelley <michael.h.kelley@microsoft.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190326163811.598166056@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  3. 27 March 2019, 2 commits
    • x86/unwind: Add hardcoded ORC entry for NULL · 206a76a6
      Committed by Jann Horn
      commit ac5ceccce5501e43d217c596e4ee859f2a3fef79 upstream.
      
      When the ORC unwinder is invoked for an oops caused by IP==0,
      it currently has no idea what to do because there is no debug information
      for the stack frame of NULL.
      
      But if RIP is NULL, it is very likely that the last successfully executed
      instruction was an indirect CALL/JMP, and it is possible to unwind out in
      the same way as for the first instruction of a normal function. Hardcode
      a corresponding ORC entry.
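      
      A sketch of such an entry: at IP==0 the only stack state is the return
      address just pushed by the indirect CALL, exactly as at the first
      instruction of a normal function (the initializer below uses the ORC
      unwinder's existing constants but is illustrative, not the verbatim
      patch):
      
        /* Fake ORC entry for IP==0: the previous RSP is one word up, where
         * the indirect CALL pushed the return address. */
        static const struct orc_entry null_orc_entry = {
        	.sp_offset = sizeof(long),
        	.sp_reg    = ORC_REG_SP,
        	.bp_reg    = ORC_REG_UNDEFINED,
        	.type      = ORC_TYPE_CALL,
        };
        
        /* In orc_find(): */
        if (ip == 0)
        	return &null_orc_entry;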
      
      With an artificially-added NULL call in prctl_set_seccomp(), before this
      patch, the trace is:
      
      Call Trace:
       ? __x64_sys_prctl+0x402/0x680
       ? __ia32_sys_prctl+0x6e0/0x6e0
       ? __do_page_fault+0x457/0x620
       ? do_syscall_64+0x6d/0x160
       ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      After this patch, the trace looks like this:
      
      Call Trace:
       __x64_sys_prctl+0x402/0x680
       ? __ia32_sys_prctl+0x6e0/0x6e0
       ? __do_page_fault+0x457/0x620
       do_syscall_64+0x6d/0x160
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      prctl_set_seccomp() still doesn't show up in the trace because for some
      reason, tail call optimization is only disabled in builds that use the
      frame pointer unwinder.
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: syzbot <syzbot+ca95b2b7aef9e7cbd6ab@syzkaller.appspotmail.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Michal Marek <michal.lkml@markovi.net>
      Cc: linux-kbuild@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190301031201.7416-2-jannh@google.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/unwind: Handle NULL pointer calls better in frame unwinder · 367ccafb
      Committed by Jann Horn
      commit f4f34e1b82eb4219d8eaa1c7e2e17ca219a6a2b5 upstream.
      
      When the frame unwinder is invoked for an oops caused by a call to NULL, it
      currently skips the parent function because BP still points to the parent's
      stack frame; the (nonexistent) current function only has the first half of
      a stack frame, and BP doesn't point to it yet.
      
      Add a special case for IP==0 that calculates a fake BP from SP, then uses
      the real BP for the next frame.
      
      Note that this handles first_frame specially: Return information about the
      parent function as long as the saved IP is >=first_frame, even if the fake
      BP points below it.
      
      With an artificially-added NULL call in prctl_set_seccomp(), before this
      patch, the trace is:
      
      Call Trace:
       ? prctl_set_seccomp+0x3a/0x50
       __x64_sys_prctl+0x457/0x6f0
       ? __ia32_sys_prctl+0x750/0x750
       do_syscall_64+0x72/0x160
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      After this patch, the trace is:
      
      Call Trace:
       prctl_set_seccomp+0x3a/0x50
       __x64_sys_prctl+0x457/0x6f0
       ? __ia32_sys_prctl+0x750/0x750
       do_syscall_64+0x72/0x160
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: syzbot <syzbot+ca95b2b7aef9e7cbd6ab@syzkaller.appspotmail.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Michal Marek <michal.lkml@markovi.net>
      Cc: linux-kbuild@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190301031201.7416-1-jannh@google.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  4. 24 March 2019, 13 commits
    • KVM: nVMX: Ignore limit checks on VMX instructions using flat segments · 5ffb710b
      Committed by Sean Christopherson
      commit 34333cc6c2cb021662fd32e24e618d1b86de95bf upstream.
      
      Regarding segments with a limit==0xffffffff, the SDM officially states:
      
          When the effective limit is FFFFFFFFH (4 GBytes), these accesses may
          or may not cause the indicated exceptions.  Behavior is
          implementation-specific and may vary from one execution to another.
      
      In practice, all CPUs that support VMX ignore limit checks for "flat
      segments", i.e. an expand-up data or code segment with base=0 and
      limit=0xffffffff.  This is subtly different than wrapping the effective
      address calculation based on the address size, as the flat segment
      behavior also applies to accesses that would wrap the 4g boundary, e.g.
      a 4-byte access starting at 0xffffffff will access linear addresses
      0xffffffff, 0x0, 0x1 and 0x2.
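      
      A hedged sketch of the check this implies in the nested VMX address
      decoding (the helper name exceeds_limit() is hypothetical; only the
      flat-segment condition is taken from the text above):
      
        /* Real CPUs skip limit checks for a flat expand-up segment:
         * base == 0 and limit == 0xffffffff. */
        bool flat = (s.base == 0) && (s.limit == 0xffffffff);
        
        if (!flat && exceeds_limit(&s, off, len))
        	exn = true;	/* leads to #GP(0) or #SS(0) injection */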
      
      Fixes: f9eb4af6 ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: nVMX: Apply addr size mask to effective address for VMX instructions · 29b515c2
      Committed by Sean Christopherson
      commit 8570f9e881e3fde98801bb3a47eef84dd934d405 upstream.
      
      The address size of an instruction affects the effective address, not
      the virtual/linear address.  The final address may still be truncated,
      e.g. to 32-bits outside of long mode, but that happens irrespective of
      the address size, e.g. a 32-bit address size can yield a 64-bit virtual
      address when using FS/GS with a non-zero base.
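      
      In other words (a sketch with illustrative names), the truncation
      belongs on the effective address before the segment base is added,
      not on the final sum:
      
        if (addr_size == 1)		/* 32-bit address size */
        	off &= 0xffffffffull;	/* truncate the effective address... */
        gva = s.base + off;		/* ...before adding the segment base */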
      
      Fixes: 064aea77 ("KVM: nVMX: Decoding memory operands of VMX instructions")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: nVMX: Sign extend displacements of VMX instr's mem operands · 9ce0ffeb
      Committed by Sean Christopherson
      commit 946c522b603f281195af1df91837a1d4d1eb3bc9 upstream.
      
      The VMCS.EXIT_QUALIFICATION field reports the displacements of memory
      operands for various instructions, including VMX instructions, as a
      naturally sized unsigned value, but masks the value by the addr size,
      e.g. given a ModRM encoded as -0x28(%ebp), the -0x28 displacement is
      reported as 0xffffffd8 for a 32-bit address size.  Despite some weird
      wording regarding sign extension, the SDM explicitly states that bits
      beyond the instruction's address size are undefined:
      
          In all cases, bits of this field beyond the instruction’s address
          size are undefined.
      
      Failure to sign extend the displacement results in KVM incorrectly
      treating a negative displacement as a large positive displacement when
      the address size of the VMX instruction is smaller than KVM's native
      size, e.g. a 32-bit address size on a 64-bit KVM.
      
      The very original decoding, added by commit 064aea77 ("KVM: nVMX:
      Decoding memory operands of VMX instructions"), sort of modeled sign
      extension by truncating the final virtual/linear address for a 32-bit
      address size.  I.e. it messed up the effective address but made it work
      by adjusting the final address.
      
      When segmentation checks were added, the truncation logic was kept
      as-is and no sign extension logic was introduced.  In other words, it
      kept calculating the wrong effective address while mostly generating
      the correct virtual/linear address.  As the effective address is what's
      used in the segment limit checks, this results in KVM incorrectly
      injecting #GP/#SS faults due to non-existent segment violations when
      a nested VMM uses negative displacements with an address size smaller
      than KVM's native address size.
      
      Using the -0x28(%ebp) example, an EBP value of 0x1000 will result in
      KVM using 0x100000fd8 as the effective address when checking for a
      segment limit violation.  This causes a 100% failure rate when running
      a 32-bit KVM build as L1 on top of a 64-bit KVM L0.
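      
      A sketch of the corresponding decode fix (cast-based; the upstream
      patch may express the sign extension differently):
      
        /* The displacement in the exit qualification is masked to the
         * address size; sign-extend it so 0xffffffd8 becomes -0x28 again. */
        if (addr_size == 1)		/* 32-bit address size */
        	off = (s32)exit_qualification;
        else if (addr_size == 0)	/* 16-bit address size */
        	off = (s16)exit_qualification;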
      
      Fixes: f9eb4af6 ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: x86/mmu: Do not cache MMIO accesses while memslots are in flux · c235af5a
      Committed by Sean Christopherson
      commit ddfd1730fd829743e41213e32ccc8b4aa6dc8325 upstream.
      
      When installing new memslots, KVM sets bit 0 of the generation number to
      indicate that an update is in-progress.  Until the update is complete,
      there are no guarantees as to whether a vCPU will see the old or the new
      memslots.  Explicitly prevent caching MMIO accesses so as to avoid using
      an access cached from the old memslots after the new memslots have been
      installed.
      
      Note that it is unclear whether or not disabling caching during the
      update window is strictly necessary as there is no definitive
      documentation as to what ordering guarantees KVM provides with respect
      to updating memslots.  That being said, the MMIO spte code does not
      allow reusing sptes created while an update is in-progress, and the
      associated documentation explicitly states:
      
          We do not want to use an MMIO sptes created with an odd generation
          number, ...  If KVM is unlucky and creates an MMIO spte while the
          low bit is 1, the next access to the spte will always be a cache miss.
      
      At the very least, disabling the per-vCPU MMIO cache during updates will
      make its behavior consistent with the MMIO spte behavior and
      documentation.
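      
      A minimal sketch of the added guard (bit 0 of the memslots generation
      is the in-progress marker; the cached fields shown are illustrative):
      
        static void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu,
        				 gva_t gva, gfn_t gfn, unsigned access)
        {
        	u64 gen = kvm_memslots(vcpu->kvm)->generation;
        
        	/* Bit 0 set: a memslot update is in flight, so anything
        	 * cached now might belong to the old or the new memslots. */
        	if (gen & 1)
        		return;
        
        	vcpu->arch.mmio_gva = gva & PAGE_MASK;
        	vcpu->arch.mmio_access = access;
        	vcpu->arch.mmio_gfn = gfn;
        	vcpu->arch.mmio_gen = gen;
        }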
      
      Fixes: 56f17dd3 ("kvm: x86: fix stale mmio cache bug")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: x86/mmu: Detect MMIO generation wrap in any address space · 656e9e5d
      Committed by Sean Christopherson
      commit e1359e2beb8b0a1188abc997273acbaedc8ee791 upstream.
      
      The check to detect a wrap of the MMIO generation explicitly looks for a
      generation number of zero.  Now that unique memslots generation numbers
      are assigned to each address space, only address space 0 will get a
      generation number of exactly zero when wrapping.  E.g. when address
      space 1 goes from 0x7fffe to 0x80002, the MMIO generation number will
      wrap to 0x2.  Adjust the MMIO generation to strip the address space
      modifier prior to checking for a wrap.
      
      Fixes: 4bd518f1 ("KVM: use separate generations for each address space")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: Call kvm_arch_memslots_updated() before updating memslots · 23ad135a
      Committed by Sean Christopherson
      commit 152482580a1b0accb60676063a1ac57b2d12daf6 upstream.
      
      kvm_arch_memslots_updated() is at this point in time an x86-specific
      hook for handling MMIO generation wraparound.  x86 stashes 19 bits of
      the memslots generation number in its MMIO sptes in order to avoid
      full page fault walks for repeat faults on emulated MMIO addresses.
      Because only 19 bits are used, wrapping the MMIO generation number is
      possible, if unlikely.  kvm_arch_memslots_updated() alerts x86 that
      the generation has changed so that it can invalidate all MMIO sptes in
      case the effective MMIO generation has wrapped so as to avoid using a
      stale spte, e.g. a (very) old spte that was created with generation==0.
      
      Given that the purpose of kvm_arch_memslots_updated() is to prevent
      consuming stale entries, it needs to be called before the new generation
      is propagated to memslots.  Invalidating the MMIO sptes after updating
      memslots means that there is a window where a vCPU could dereference
      the new memslots generation, e.g. 0, and incorrectly reuse an old MMIO
      spte that was created with (pre-wrap) generation==0.
      
      Fixes: e59dbe09 ("KVM: Introduce kvm_arch_memslots_updated()")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf/x86/intel/uncore: Fix client IMC events return huge result · 30cedf18
      Committed by Kan Liang
      commit 8041ffd36f42d8521d66dd1e236feb58cecd68bc upstream.
      
      The client IMC bandwidth events currently return very large values:
      
        $ perf stat -e uncore_imc/data_reads/ -e uncore_imc/data_writes/ -I 10000 -a
      
        10.000117222 34,788.76 MiB uncore_imc/data_reads/
        10.000117222 8.26 MiB uncore_imc/data_writes/
        20.000374584 34,842.89 MiB uncore_imc/data_reads/
        20.000374584 10.45 MiB uncore_imc/data_writes/
        30.000633299 37,965.29 MiB uncore_imc/data_reads/
        30.000633299 323.62 MiB uncore_imc/data_writes/
        40.000891548 41,012.88 MiB uncore_imc/data_reads/
        40.000891548 6.98 MiB uncore_imc/data_writes/
        50.001142480 1,125,899,906,621,494.75 MiB uncore_imc/data_reads/
        50.001142480 6.97 MiB uncore_imc/data_writes/
      
      The client IMC events are freerunning counters. They still use the
      old event encoding format (0x1 for data_read and 0x2 for data write).
      The counter bit width is calculated by common code, which assumes that
      the standard encoding format is used for the freerunning counters.
      As a result, incorrect bit width information is calculated.
      
      The patch intends to convert the old client IMC event encoding to the
      standard encoding format.
      
      Current common code uses event->attr.config, which is copied directly
      from user space. We should not implicitly modify it for a converted
      event. The event->hw.config is used to replace the event->attr.config
      in common code.
      
      For client IMC events, the event->attr.config is used to calculate a
      converted event with standard encoding format in the custom
      event_init(). The converted event is stored in event->hw.config.
      For other events of freerunning counters, they already use the standard
      encoding format. The same value as event->attr.config is assigned to
      event->hw.config in common event_init().
      Reported-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Jin Yao <yao.jin@linux.intel.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: stable@kernel.org # v4.18+
      Fixes: 9aae1780 ("perf/x86/intel/uncore: Clean up client IMC uncore")
      Link: https://lkml.kernel.org/r/20190227165729.1861-1-kan.liang@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/kprobes: Prohibit probing on optprobe template code · ee7d297f
      Committed by Masami Hiramatsu
      commit 0192e6535ebe9af68614198ced4fd6d37b778ebf upstream.
      
      Prohibit probing on the optprobe template code, since it is not real
      executable code but a template instruction sequence. If the template
      were modified (e.g. by a probe), the copies made from it would be broken.
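      
      A hedged sketch of the check (the template is bracketed by the
      existing optprobe_template_entry/optprobe_template_end symbols; where
      exactly the patch places the test is not shown here):
      
        /* Reject probes landing inside the optprobe template: it is data
         * that gets copied, not code that is executed in place. */
        if (addr >= (unsigned long)optprobe_template_entry &&
            addr <  (unsigned long)optprobe_template_end)
        	return -EINVAL;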
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andrea Righi <righi.andrea@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Fixes: 9326638c ("kprobes, x86: Use NOKPROBE_SYMBOL() instead of __kprobes annotation")
      Link: http://lkml.kernel.org/r/154998787911.31052.15274376330136234452.stgit@devbox
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xen: fix dom0 boot on huge systems · 468ff43f
      Committed by Juergen Gross
      commit 01bd2ac2f55a1916d81dace12fa8d7ae1c79b5ea upstream.
      
      Commit f7c90c2a ("x86/xen: don't write ptes directly in 32-bit
      PV guests") introduced a regression for booting dom0 on huge systems
      with lots of RAM (in the TB range).
      
      Reason is that on those hosts the p2m list needs to be moved early in
      the boot process and this requires temporary page tables to be created.
      Said commit modified xen_set_pte_init() to use a hypercall for writing
      a PTE, but this requires the page table being in the direct mapped
      area, which is not the case for the temporary page tables used in
      xen_relocate_p2m().
      
      As the page tables are completely written before being linked into the
      actual address space, a plain memory write can be used in
      xen_relocate_p2m() instead of set_pte().
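      
      A one-line sketch of the change in xen_relocate_p2m() (illustrative):
      since the temporary tables are not yet hooked into the live address
      space, a direct store is safe and avoids the hypercall that cannot
      handle non-direct-mapped pages:
      
        /* Before: set_pte(pt + pt_idx, pfn_pte(p2m_pfn, PAGE_KERNEL)); */
        pt[pt_idx] = pfn_pte(p2m_pfn, PAGE_KERNEL);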
      
      Fixes: f7c90c2a ("x86/xen: don't write ptes directly in 32-bit PV guests")
      Cc: stable@vger.kernel.org
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: x86/morus - fix handling chunked inputs and MAY_SLEEP · 66700c89
      Committed by Eric Biggers
      commit 2060e284e9595fc3baed6e035903c05b93266555 upstream.
      
      The x86 MORUS implementations all fail the improved AEAD tests because
      they produce the wrong result with some data layouts.  The issue is that
      they assume that if the skcipher_walk API gives 'nbytes' not aligned to
      the walksize (a.k.a. walk.stride), then it is the end of the data.  In
      fact, this can happen before the end.
      
      Also, when the CRYPTO_TFM_REQ_MAY_SLEEP flag is given, they can
      incorrectly sleep in the skcipher_walk_*() functions while preemption
      has been disabled by kernel_fpu_begin().
      
      Fix these bugs.
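      
      A hedged sketch of the corrected walk pattern (crypt_chunk() and
      crypt_tail() are illustrative stand-ins for the SIMD helpers): whole
      strides are processed until the walk genuinely ends, and calls that
      may sleep stay outside the kernel_fpu_begin()/end() section:
      
        while (walk.nbytes >= walk.stride) {
        	/* 'nbytes' smaller than the stride does NOT mean the data
        	 * has ended; only process whole strides here. */
        	unsigned int n = round_down(walk.nbytes, walk.stride);
        
        	kernel_fpu_begin();
        	crypt_chunk(state, walk.dst.virt.addr, walk.src.virt.addr, n);
        	kernel_fpu_end();
        	/* may sleep with MAY_SLEEP, hence outside the FPU section */
        	skcipher_walk_done(&walk, walk.nbytes - n);
        }
        if (walk.nbytes) {	/* final partial block */
        	kernel_fpu_begin();
        	crypt_tail(state, walk.dst.virt.addr, walk.src.virt.addr,
        		   walk.nbytes);
        	kernel_fpu_end();
        	skcipher_walk_done(&walk, 0);
        }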
      
      Fixes: 56e8e57f ("crypto: morus - Add common SIMD glue code for MORUS")
      Cc: <stable@vger.kernel.org> # v4.18+
      Cc: Ondrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: x86/aesni-gcm - fix crash on empty plaintext · 8a9fcf4a
      Committed by Eric Biggers
      commit 3af349639597fea582a93604734d717e59a0e223 upstream.
      
      gcmaes_crypt_by_sg() dereferences the NULL pointer returned by
      scatterwalk_ffwd() when encrypting an empty plaintext and the source
      scatterlist ends immediately after the associated data.
      
      Fix it by only fast-forwarding to the src/dst data scatterlists if the
      data length is nonzero.
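      
      A sketch of the guard (variable names illustrative): only fast-forward
      past the associated data when there is payload to process:
      
        if (req->cryptlen) {
        	/* With an empty plaintext, scatterwalk_ffwd() can return
        	 * NULL here when the scatterlist ends right after the AAD,
        	 * so skip the fast-forward entirely in that case. */
        	src = scatterwalk_ffwd(src_start, req->src, req->assoclen);
        	if (req->src != req->dst)
        		dst = scatterwalk_ffwd(dst_start, req->dst,
        				       req->assoclen);
        }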
      
      This bug is reproduced by the "rfc4543(gcm(aes))" test vectors when run
      with the new AEAD test manager.
      
      Fixes: e8455207 ("crypto: aesni - Update aesni-intel_glue to use scatter/gather")
      Cc: <stable@vger.kernel.org> # v4.17+
      Cc: Dave Watson <davejwatson@fb.com>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: x86/aegis - fix handling chunked inputs and MAY_SLEEP · 5d2a5172
      Committed by Eric Biggers
      commit ba6771c0a0bc2fac9d6a8759bab8493bd1cffe3b upstream.
      
      The x86 AEGIS implementations all fail the improved AEAD tests because
      they produce the wrong result with some data layouts.  The issue is that
      they assume that if the skcipher_walk API gives 'nbytes' not aligned to
      the walksize (a.k.a. walk.stride), then it is the end of the data.  In
      fact, this can happen before the end.
      
      Also, when the CRYPTO_TFM_REQ_MAY_SLEEP flag is given, they can
      incorrectly sleep in the skcipher_walk_*() functions while preemption
      has been disabled by kernel_fpu_begin().
      
      Fix these bugs.
      
      Fixes: 1d373d4e ("crypto: x86 - Add optimized AEGIS implementations")
      Cc: <stable@vger.kernel.org> # v4.18+
      Cc: Ondrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/CPU: Add Icelake model number · a9503ade
      Committed by Rajneesh Bhardwaj
      [ Upstream commit 8cd8f0ce0d6aafe661cb3d6781c8b82bc696c04d ]
      
      Add the CPUID model number of Icelake (ICL) mobile processors to the
      Intel family list. Icelake U/Y series uses model number 0x7E.
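      
      The addition itself is essentially a one-liner in the Intel family
      header (a sketch; the macro name follows that header's naming
      convention at the time):
      
        /* arch/x86/include/asm/intel-family.h */
        #define INTEL_FAM6_ICELAKE_MOBILE	0x7E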
      Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@linux.intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "David E. Box" <david.e.box@intel.com>
      Cc: dvhart@infradead.org
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: platform-driver-x86@vger.kernel.org
      Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
      Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190214115712.19642-2-rajneesh.bhardwaj@linux.intel.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  5. 19 March 2019, 3 commits
  6. 14 March 2019, 9 commits
    • perf/x86/intel: Implement support for TSX Force Abort · 26b6e018
      Committed by Peter Zijlstra (Intel)
      commit 400816f60c543153656ac74eaf7f36f6b7202378 upstream
      
      Skylake (and later) will receive a microcode update to address a TSX
      erratum. This microcode will, on execution of a TSX instruction
      (speculative or not), use (clobber) PMC3. This update will also provide
      a new MSR to change this behaviour, along with a CPUID bit to enumerate
      the presence of this new MSR.
      
      When the MSR gets set, the microcode will no longer use PMC3 but will
      Force Abort every TSX transaction (upon executing COMMIT).
      
      When TSX Force Abort (TFA) is allowed (the default), the MSR gets set
      when PMC3 gets scheduled and cleared when, after scheduling, PMC3 is
      unused.
      
      When TFA is not allowed, clear PMC3 from all constraints such that it
      will not get used.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86: Add TSX Force Abort CPUID/MSR · fdd82094
      Committed by Peter Zijlstra (Intel)
      commit 52f64909409c17adf54fcf5f9751e0544ca3a6b4 upstream
      
      Skylake systems will receive a microcode update to address a TSX
      erratum. This microcode will (by default) clobber PMC3 when TSX
      instructions are (speculatively or not) executed.
      
      It also provides an MSR to cause all TSX transaction to abort and
      preserve PMC3.
      
      Add the CPUID enumeration and MSR definition.
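      
      A sketch of the two definitions (MSR index and CPUID bit as publicly
      documented for TSX Force Abort; treat as illustrative of the patch):
      
        /* CPUID.(EAX=7,ECX=0):EDX[13] enumerates the new MSR */
        #define X86_FEATURE_TSX_FORCE_ABORT	(18*32+13)
        
        /* Setting bit 0 aborts all RTM transactions and preserves PMC3 */
        #define MSR_TSX_FORCE_ABORT		0x0000010f
        #define MSR_TFA_RTM_FORCE_ABORT_BIT	0
        #define MSR_TFA_RTM_FORCE_ABORT	BIT_ULL(MSR_TFA_RTM_FORCE_ABORT_BIT)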
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf/x86/intel: Generalize dynamic constraint creation · 9e071aa6
      Committed by Peter Zijlstra (Intel)
      commit 11f8b2d65ca9029591c8df26bb6bd063c312b7fe upstream
      
      Such that we can re-use it.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf/x86/intel: Make cpuc allocations consistent · f99f7dae
      Committed by Peter Zijlstra (Intel)
      commit d01b1f96a82e5dd7841a1d39db3abfdaf95f70ab upstream
      
      The cpuc data structure allocation is different between fake and real
      cpuc's; use the same code to init/free both.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/PCI: Fixup RTIT_BAR of Intel Denverton Trace Hub · 36e3673d
      Committed by Alexander Shishkin
      commit 2e095ce7b6ecce2f3e2ff330527f12056ed1e1a1 upstream.
      
      On Denverton's integration of the Intel(R) Trace Hub (for a reference and
      overview see Documentation/trace/intel_th.rst) the reported size of one of
      its resources (RTIT_BAR) doesn't match its actual size, which leads to
      overlaps with other devices' resources.
      
      In practice, it overlaps with XHCI MMIO space, which results in the xhci
      driver bailing out after seeing its registers as 0xffffffff, and perceived
      disappearance of all USB devices:
      
        intel_th_pci 0000:00:1f.7: enabling device (0004 -> 0006)
        xhci_hcd 0000:00:15.0: xHCI host controller not responding, assume dead
        xhci_hcd 0000:00:15.0: xHC not responding in xhci_irq, assume controller is dead
        xhci_hcd 0000:00:15.0: HC died; cleaning up
        usb 1-1: USB disconnect, device number 2
      
      For this reason, we need to resize the RTIT_BAR on Denverton to its actual
      size, which in this case is 4MB.  The corresponding erratum is DNV36 at the
      link below:
      
        DNV36.       Processor Host Root Complex May Incorrectly Route Memory
                     Accesses to Intel® Trace Hub
      
        Problem:     The Intel® Trace Hub RTIT_BAR (B0:D31:F7 offset 20h) is
                     reported as a 2KB memory range.  Due to this erratum, the
                     processor Host Root Complex will forward addresses from
                     RTIT_BAR to RTIT_BAR + 4MB -1 to Intel® Trace Hub.
      
        Implication: Devices assigned within the RTIT_BAR to RTIT_BAR + 4MB -1
                     space may not function correctly.
      
        Workaround:  A BIOS code change has been identified and may be
                     implemented as a workaround for this erratum.
      
        Status:      No Fix.
      
      Note that 5118ccd3 ("intel_th: pci: Add Denverton SOC support") updates
      the Trace Hub driver so it claims the Denverton device, but the resource
      overlap exists regardless of whether that driver is loaded or that commit
      is included.
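      
      A heavily hedged sketch of the shape of such a fixup (the device ID,
      resource index and size check below are assumptions, not taken from
      the patch):
      
        /* Resize the under-reported RTIT_BAR so the full 4MB the root
         * complex actually decodes is reserved and cannot be assigned to
         * another device. */
        static void quirk_intel_th_dnv(struct pci_dev *dev)
        {
        	struct resource *r = &dev->resource[4];	/* RTIT_BAR (assumed) */
        
        	if (r->end == r->start + 0x7ff) {	/* reported as 2KB */
        		r->start = 0;
        		r->end   = 0x3fffff;		/* actually 4MB */
        		r->flags |= IORESOURCE_UNSET;
        	}
        }
        DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x19e1, quirk_intel_th_dnv);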
      
      Link: https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/atom-c3000-family-spec-update.pdf
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      [bhelgaas: include erratum text, clarify relationship with 5118ccd3]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86_64: increase stack size for KASAN_EXTRA · 5edeae21
      Committed by Qian Cai
      [ Upstream commit a8e911d13540487942d53137c156bd7707f66e5d ]
      
      If the kernel is configured with KASAN_EXTRA, the stack size is
      increased significantly because this option sets "-fstack-reuse" to
      "none" in GCC [1].  As a result, it triggers stack overrun quite often
      with a 32k stack size compiled using GCC 8.  For example, this reproducer
      
        https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise06.c
      
      triggers a "corrupted stack end detected inside scheduler" very reliably
      with CONFIG_SCHED_STACK_END_CHECK enabled.
      
      There are just too many functions that could have a large stack with
      KASAN_EXTRA due to large local variables that have been called over and
      over again without being able to reuse the stacks.  Some noticeable ones
      are
      
        size
        7648 shrink_page_list
        3584 xfs_rmap_convert
        3312 migrate_page_move_mapping
        3312 dev_ethtool
        3200 migrate_misplaced_transhuge_page
        3168 copy_process
      
      There are another 49 functions over 2k in size when compiling the kernel
      with "-Wframe-larger-than=" even with a related minimal config on this
      machine.  Hence, it is too much work to change the Makefiles for each
      object to compile without "-fsanitize-address-use-after-scope"
      individually.
      
      [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715#c23
      
      Although there is a patch in GCC 9 to help the situation, GCC 9 probably
      won't be released for a few months, and it will probably take another
      six months to a year for all major distros to include it as a default.
      Hence, the stack usage with KASAN_EXTRA can be revisited again in 2020
      when GCC 9 is everywhere.  Until then, this patch will help users avoid
      stack overrun.
      
      This has already been fixed for arm64 for the same reason via
      6e8830674ea ("arm64: kasan: Increase stack size for KASAN_EXTRA").
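      
      The fix itself is a bump of the x86-64 kernel stack order under this
      configuration (a sketch of arch/x86/include/asm/page_64_types.h; the
      exact ifdef nesting may differ):
      
        #ifdef CONFIG_KASAN
        #ifdef CONFIG_KASAN_EXTRA
        #define KASAN_STACK_ORDER 2	/* 64k stacks */
        #else
        #define KASAN_STACK_ORDER 1	/* 32k stacks */
        #endif
        #else
        #define KASAN_STACK_ORDER 0
        #endif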
      
      Link: http://lkml.kernel.org/r/20190109215209.2903-1-cai@lca.pw
      Signed-off-by: Qian Cai <cai@lca.pw>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • x86/kexec: Don't setup EFI info if EFI runtime is not enabled · e2b45446
      Committed by Kairui Song
      [ Upstream commit 2aa958c99c7fd3162b089a1a56a34a0cdb778de1 ]
      
      Kexec-ing a kernel with "efi=noruntime" on the first kernel's command
      line causes the following null pointer dereference:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
        #PF error: [normal kernel read fault]
        Call Trace:
         efi_runtime_map_copy+0x28/0x30
         bzImage64_load+0x688/0x872
         arch_kexec_kernel_image_load+0x6d/0x70
         kimage_file_alloc_init+0x13e/0x220
         __x64_sys_kexec_file_load+0x144/0x290
         do_syscall_64+0x55/0x1a0
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Just skip the EFI info setup if EFI runtime services are not enabled.
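      
      A sketch of the guard (efi_enabled()/EFI_RUNTIME_SERVICES are the
      standard kernel API; the exact placement in the bzImage64 loader is
      illustrative):
      
        /* Booted with efi=noruntime: there is no runtime map to hand
         * over to the next kernel, so don't touch it. */
        if (!efi_enabled(EFI_RUNTIME_SERVICES))
        	return 0;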
      
       [ bp: Massage commit message. ]
      Suggested-by: Dave Young <dyoung@redhat.com>
      Signed-off-by: Kairui Song <kasong@redhat.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Dave Young <dyoung@redhat.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: bhe@redhat.com
      Cc: David Howells <dhowells@redhat.com>
      Cc: erik.schmauss@intel.com
      Cc: fanc.fnst@cn.fujitsu.com
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: kexec@lists.infradead.org
      Cc: lenb@kernel.org
      Cc: linux-acpi@vger.kernel.org
      Cc: Philipp Rudo <prudo@linux.vnet.ibm.com>
      Cc: rafael.j.wysocki@intel.com
      Cc: robert.moore@intel.com
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Cc: Yannik Sembritzki <yannik@sembritzki.me>
      Link: https://lkml.kernel.org/r/20190118111310.29589-2-kasong@redhat.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • x86/microcode/amd: Don't falsely trick the late loading mechanism · f964a4d2
      Committed by Thomas Lendacky
      [ Upstream commit 912139cfbfa6a2bc1da052314d2c29338dae1f6a ]
      
      The load_microcode_amd() function searches for microcode patches and
      attempts to apply a microcode patch if it is of a different level than
      the currently installed level.
      
      While the processor won't actually load a level that is less than
      what is already installed, the logic wrongly returns UCODE_NEW thus
      signaling to its caller reload_store() that a late loading should be
      attempted.
      
      If the file-system contains an older microcode revision than what is
      currently running, such a late microcode reload can result in these
      misleading messages:
      
        x86/CPU: CPU features have changed after loading microcode, but might not take effect.
        x86/CPU: Please consider either early loading through initrd/built-in or a potential BIOS update.
      
      These messages were issued on a system where SME/SEV are not
      enabled by the BIOS (MSR C001_0010[23] = 0b) because, during boot,
      early_detect_mem_encrypt() is called and clears the SME and SEV
      features in this case.
      
      However, after the wrong late load attempt, get_cpu_cap() is called and
      reloads the SME and SEV feature bits, resulting in the messages.
      
      Update the microcode level check to not attempt microcode loading if the
      current level is greater than, and not only equal to, the current patch
      level.
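      
      In code terms, the comparison becomes strict (a sketch with
      illustrative names):
      
        /* Report new microcode only when the patch is really newer than
         * the running revision; older or equal levels are a no-op. */
        if (p->patch_id > rev)
        	ret = UCODE_NEW;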
      
       [ bp: massage commit message. ]
      
      Fixes: 2613f36e ("x86/microcode: Attempt late loading only when new microcode is present")
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/154894518427.9406.8246222496874202773.stgit@tlendack-t1.amdoffice.net
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • x86/boot/compressed/64: Set EFER.LME=1 in 32-bit trampoline before returning to long mode · 15fb5d73
      Committed by Wei Huang
      [ Upstream commit b677dfae5aa197afc5191755a76a8727ffca538a ]
      
      In some old AMD KVM implementations, the guest's EFER.LME bit is cleared
      by KVM when the hypervisor detects that the guest sets CR0.PG to 0. This
      causes the guest OS to reboot when it tries to return from 32-bit
      trampoline code because the CPU is in an incorrect state: CR4.PAE=1,
      CR0.PG=1, CS.L=1, but EFER.LME=0.  As a precaution, set EFER.LME=1 as
      part of the long mode activation procedure. This extra step won't cause
      any harm when Linux is booted on a bare-metal machine.
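      
      A sketch of the added trampoline step (MSR_EFER and _EFER_LME are the
      standard definitions; the exact placement in head_64.S is
      illustrative):
      
        /* Re-set long mode explicitly; some old AMD KVMs clear EFER.LME
         * when the guest drops CR0.PG, so don't rely on it surviving. */
        movl	$MSR_EFER, %ecx
        rdmsr
        btsl	$_EFER_LME, %eax
        wrmsr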
      Signed-off-by: Wei Huang <wei@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: bp@alien8.de
      Cc: hpa@zytor.com
      Link: https://lkml.kernel.org/r/20190104054411.12489-1-wei@redhat.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  7. 10 March 2019, 2 commits
  8. 06 March 2019, 3 commits